Multi-Threaded Mandelbrot Set Program

Because each pixel in an image of the Mandelbrot Set can be computed independently of the others, rendering the Mandelbrot Set is a great opportunity to take advantage of multiple cores in modern computers. We can start with the Simple Mandelbrot Set Program developed in a prior post, then modify it to support multiple threads. Each thread will compute a row of the image independently of the others.

To coordinate the separate threads and assign rows to compute to each, we will use an atomic_int. The atomic_int is thread safe and can be read and incremented in the same atomic operation. Thus it prevents any race conditions among threads when they try to update the atomic_int.

#include <atomic>

// Atomic int used to assign tasks (row to compute) to the threads
atomic_int counter{0};

// Then in our compute_mandelbrot() function we can fetch and increment the counter
int j = counter.fetch_add(1);

Our compute_mandelbrot() function no longer takes the row to compute as an argument, but instead uses the counter fetch_add() to get the row it should compute. Once the counter returns a value greater than the image_size value, we can quit the thread.

void compute_mandelbrot(double x_min, double x_max, double y_min, double y_max, int max_iteration, int image_size) {
    double width_per_pixel = (x_max - x_min) / image_size;
    double height_per_pixel = (y_max - y_min) / image_size;
    int modulus = ceil(image_size / 40.);     // Used to control progress output
    while (true) {
        int j = counter.fetch_add(1);
        if (j < image_size) {
            image[j] = make_unique<int32_t[]>(image_size);
            // Calculate the imaginary coordinate of the middle of the pixel row
            double y0 = y_max - height_per_pixel * (j + 0.5);
            for (int i = 0; i < image_size; ++i) {
                // Calculate the real coordinate of the middle of the pixel
                double x0 = width_per_pixel * (i + 0.5) + x_min;
                // Compute the escape number of iterations [0, max_iteration)
                int level = compute_escape(complex(x0, y0), max_iteration);
                // Store the escape iterations in the image array.
                image[j][i] = level;
            }
            // Output periodic progress
            if (j % modulus == 0) cout << '.' << flush;
        } else {
            break;
        }
    }
}

Then in our main() function we can launch multiple threads to work the row computations. We do this by creating thread objects as we emplace them on a vector to keep track of them. We do this so that we can wait on each thread to complete before terminating the program.

    // Get the number of CPU cores.
    unsigned int num_cpus = thread::hardware_concurrency();
    if (num_cpus == 0) {
        cout << "Could not determine number of CPUs.  Using 4.\n";
        num_cpus = 4;
    } else {
        cout << "Number of CPU cores: " << num_cpus << endl;
    }
    // Create the threads
    vector<thread> threads;
    for (int i = 0; i < num_cpus; ++i) {
        threads.emplace_back(compute_mandelbrot, x_min, x_max, y_min, y_max, max_iteration, image_size);
    }
    // Wait for all threads to complete
    for (auto& t : threads) {
        t.join();  
    }

When we put it all together we have a program that will take advantage of all the cores available.

Complete mandelbrot_threaded.cpp
/**
* @file mandelbrot_threaded.cpp
* @brief Program to compute the Mandelbrot Set.
* 
* This program can compute images of the Mandelbrot Set in its
* entirety or close-ups of boundary areas.  It will use as many
* CPUs as are available to speed up the rendering.
*
* Requires:
* Linux<br>
* get cli11.hpp at https://github.com/CLIUtils/CLI11/releases
*
* macOS<br>
* brew install cli11<br>
*
* Build:
* Linux<br>
* put CLI.hpp in /usr/local/include <br>
*
* macOS<br>
* Add to .zshrc<br>
* export CPPFLAGS="-I/opt/homebrew/include"<br>
* export LDFLAGS="-L/opt/homebrew/lib"
*
* Compile with<br>
* Linux<br>
* g++ --std=gnu++20 -o mandelbrot_threaded mandelbrot_threaded.cpp

* macOS<br>
* g++ $CPPFLAGS $LDFLAGS --std=gnu++20 -o mandelbrot_threaded mandelbrot_threaded.cpp
*
* Execute:
* mandelbrot_threaded real imaginary radius [--filename=basename]
*     [--image_size=pixels] [--max_iterations=count]
*
* @author Richard Lesh
* @date 2025-03-12
* @version 1.0
*/

#include <CLI/CLI.hpp>
#include <complex>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

#define MAX_PIXEL_LEVELS 256            // 256 for 8-bit

using namespace std;

/**
 * @brief Computes the Unit in the Last Place (ULP) of a double-precision value.
 *
 * This function calculates the distance between the given value @p x and the next
 * representable double greater than @p x in the direction of positive infinity.
 * The result gives a measure of the precision at the value @p x.
 *
 * @param x The input double value.
 * @return The ULP of @p x (i.e., the difference to the next larger representable double).
 *
 * @note If @p x is NaN or infinity, the result may be undefined.
 */
inline double ULP(double x) {
    double next = nextafter(x, numeric_limits<double>::infinity());
    double ulp = next - x;
    return ulp;
}

// Array used to store the image in a thread safe way as an array of arrays
// Each element of the array will be a pointer to an array containing the
// pixels for a row of the image.  Rows will be computed by threads.
unique_ptr<unique_ptr<int32_t[]>[]> image = nullptr;
// Atomic int used to assign tasks (row to compute) to the threads
atomic_int counter{0};

/**
 * @brief Computes the squared magnitude (norm) of a complex number without
 * taking the square root.
 *
 * This function returns the sum of the squares of the real and imaginary parts
 * of the complex number @p z.  It avoids the overhead of computing a square
 * root, making it useful for performance-sensitive contexts like Mandelbrot
 * set calculations where only relative magnitude comparisons are needed.
 *
 * @tparam T The numeric type (e.g., float, double, mpreal).
 * @param z The complex number whose squared magnitude is to be computed.
 * @return The squared magnitude of @p z, equivalent to |z|^2.
 *
 * @see std::norm may be used as a standard alternative, but may perform
 * additional checks.
 */
template<typename T>
inline T norm_squared(complex<T> z) {
    return z.real() * z.real() + z.imag() * z.imag();
}

// Number of pixels that escape
atomic_int num_escaped{0};
// Number of cycles detected
atomic_int num_cycles{0};
// Number of pixels that reach max. iterations
atomic_int num_max_iter{0};

/**
* @brief Compute the number of iterations to escape.
*
* Function computes the number of iterations of the formula
* z(n+1) = z(n)^2 + c that can be executed before the value of |z|
* exceeds the escape limit radius of 2.0.
*
* The function uses Floyd’s Cycle Detection (Tortoise 
* and Hare) algorithm to detect cycles.

* @param c Complex point used to perform the calculation.
* @param max_iteration Number of iterations, if exceeded, when we will declare
*                      as not escaping and therefore in the Mandelbrot set.
* @return Value [0, max_iteration) representing the escape iterations.
*                      0 is considered as not escaping.
*/

int compute_escape(complex<double> c, int max_iteration) {
    complex<double> tortoise{0., 0.};
    complex<double> hare{0., 0.};

    int i = 0;
    while (i < max_iteration && norm_squared(tortoise) < 4.0) {
        tortoise = tortoise * tortoise + c;
        hare = hare * hare + c;
        hare = hare * hare + c; // hare moves twice

        double ulps = ULP(norm(tortoise));
        ulps = 9 * ulps * ulps;
        if (norm_squared(tortoise - hare) < ulps) {
            ++num_cycles;
            return 0L; // cycle detected
        }
        ++i;
    }

    if (i == max_iteration) {   // Return 0 (black) if we are in the set
        ++num_max_iter;
        return 0L;
    } else {
        ++num_escaped;
        return i;
    }
}

/**
* @brief Compute a portion of the Mandelbrot Set.
* 
* Function computes a portion of the Mandelbrot Set row by row.
* The atomic integer counter is used to get the row to compute.
* The row is stored in the image array.
* 
* @param x_min Minimum real axis bound to plot.
* @param x_max Maximum real axis bound to plot.
* @param y_min Minimum imaginary axis bound to plot.
* @param y_max Maximum imaginary axis bound to plot.
* @param max_iteration Limit to the number of iterations we will compute per point.
* @param image_size Size, in pixels, of the image to generate.
*/
void compute_mandelbrot(double x_min, double x_max, double y_min, double y_max, int max_iteration, int image_size) {
    double width_per_pixel = (x_max - x_min) / image_size;
    double height_per_pixel = (y_max - y_min) / image_size;
    int modulus = ceil(image_size / 40.);     // Used to control progress output
    while (true) {
        int j = counter.fetch_add(1);
        if (j < image_size) {
            image[j] = make_unique<int32_t[]>(image_size);
            // Calculate the imaginary coordinate of the middle of the pixel row
            double y0 = y_max - height_per_pixel * (j + 0.5);
            for (int i = 0; i < image_size; ++i) {
                // Calculate the real coordinate of the middle of the pixel
                double x0 = width_per_pixel * (i + 0.5) + x_min;
                // Compute the escape number of iterations [0, max_iteration)
                int level = compute_escape(complex(x0, y0), max_iteration);
                // Store the escape iterations in the image array.
                image[j][i] = level;
            }
            // Output periodic progress
            if (j % modulus == 0) cout << '.' << flush;
        } else {
            break;
        }
    }
}

/**
* @brief Write out a PGM file.
* 
* Function writes out image data to a PGM 8-bit format file. This function 
* normalizes the level values using a histogram.
*
* @param filename Pathspec to the file that will be created.
* @param max_iteration Limit to the number of iterations we used to compute each point.
* @param image_size Size, in pixels, of the image to save.
*/

void output_file(const string &filename, int max_iteration, int image_size) {
    string ext = ".pgm";

    cout << endl << "Creating " << (filename + ext) << "..." << endl;
    ofstream out_file((filename + ext).c_str(), ios::binary);
    if (!out_file) {                    // Check if file opened successfully
        cerr << "Error opening file " << (filename + ext) << " for writing!" << endl;
        exit(1);
    }
    // File header is P5 for binary gray map, x_size y_size, max pixel value
    out_file << "P5\n" << image_size << " " << image_size << "\n" << (MAX_PIXEL_LEVELS - 1) << "\n";

    // Find the maximum iteration count
    int max = 0;
    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            if (image[j][i] > max) max = image[j][i];
        }
    }
    max_iteration = max + 1;
    // Compute bucket size for histogram
    int max_iteration_bits = static_cast<int>(ceil(log(max_iteration)/log(2.0)));
    int bucket_shift_bits = max_iteration_bits > 20 ? max_iteration_bits - 20 : 0;
    vector<int>::size_type histogram_size = (max_iteration >> bucket_shift_bits) + 1;
    // Compute histogram
    vector<int> histogram(histogram_size, 0);
    int totalCount = 0;
    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            if (image[j][i] > 0) {
                ++histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)];
                ++totalCount;
            }
        }
    }
    // Compute cumulative histogram
    int count = 0;
    for (vector<int>::size_type i = 1; i < histogram_size; ++i) {
        count += histogram[i];
        histogram[i] = count;
    }
    // Normalize using histogram
    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            uint8_t level = floor(histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)]
                / static_cast<double>(totalCount + 1) * MAX_PIXEL_LEVELS);
            out_file.write(reinterpret_cast<const char*>(&level), 1);
        }
    }
    out_file.close();
}

/**
* @brief Main function.
* 
* The entry point of the program.
*
* Syntax: mandelbrot_threaded real imaginary radius [--filename=basename]
*     [--image_size=n] [--max_iterations=n]
* 
* real & imaginary specify the coordinates of the Mandelbrot Set to display at
* the center of the image.
*
* radius is the radius in Mandelbrot Set coordinates out to the extreme edges
* of the image.
*
* max_iterations is the maximum number of iterations per pixel allowed to
* determine if it is in the set.  The max_iterations will generally need to get
* larger as radius gets smaller.
*
* image_size is the size in pixels for the width and height of the square image.
*
* filename is the basename of the image file.  It defaults to "image".
*
* @return int Returns 0 on successful execution.
*/
int main(int argc, char** argv) {
    CLI::App app{"Program to generate Mandelbrot Set images."};
    app.set_help_flag("", "");
    argv = app.ensure_utf8(argv);
    // use CLI11 to process command line options.
    vector<string> coords;
    app.add_option("coords", coords, "Location to plot (x y radius)")
        ->expected(3);

    // Optional arguments
    string filename = "image";
    int max_iteration = 10000;
    int image_size = 512;
    bool show_help = false;

    app.add_flag("--help", show_help, "Show help");
    app.add_option("-f,--filename", filename, "Base filename (no extension)");
    app.add_option("-m,--max_iterations", max_iteration, "Max. iterations");
    app.add_option("-s,--image_size", image_size, "Image size in pixels");

    CLI11_PARSE(app, argc, argv);

    if (show_help) {
        cout << app.help() << endl;
        return 0;
    }

    // Get the three required coordinates to plot
    double x, y, radius;
    if (coords.size() == 3) {
        x = stod(coords[vector<string>::size_type(0)]);
        y = stod(coords[vector<string>::size_type(1)]);
        radius = stod(coords[vector<string>::size_type(2)]);
    } else {
        cout << "Arguments x, y and radius required but not found!" << endl;
        return 1;
    }

    // Check the image size
    if (image_size > sqrt(numeric_limits<int>::max())) {
        cout << "Image size can not exceed " << floor(sqrt(numeric_limits<int>::max())) << endl;
        return 1;
    }
    
    // Compute limits of image
    double x_min = x - radius;
    double x_max = x + radius;
    double y_min = y - radius;
    double y_max = y + radius;

    // Output settings
    cout << "Maximum Iterations: " << max_iteration << endl;
    cout << "Image Bounds: " << setprecision(numeric_limits<double>::digits10)
        << x_min << ", " << x_max << ", " << y_min << ", " << y_max << endl;
    cout << "Image Size: " << image_size << " x " << image_size << endl;
    
    // Allocate an array for the row pointers.
    image = make_unique<unique_ptr<int32_t[]>[]>(image_size);

    // Do a simple check to see if our doubles have enough precision
    if ((x_max - x_min) / image_size < ULP(x) ||
        (y_max - y_min) / image_size < ULP(y))
        cout << "Doubles may not have enough precision!" << endl;

    // Get the number of CPU cores.
    unsigned int num_cpus = thread::hardware_concurrency();
    if (num_cpus == 0) {
        cout << "Could not determine number of CPUs.  Using 4.\n";
        num_cpus = 4;
    } else {
        cout << "Number of CPU cores: " << num_cpus << endl;
    }

    // Create the threads
    vector<thread> threads;
    threads.reserve(num_cpus);
    for (int i = 0; i < num_cpus; ++i) {
        threads.emplace_back(&compute_mandelbrot, x_min, x_max, y_min, y_max,
            max_iteration, image_size);
    }
    // Wait for all threads to complete
    for (thread &t: threads) {
        t.join();
    }

    cout << endl;
    cout << "Number of escaped pixels: " << num_escaped << endl;
    cout << "Number of cycles detected: " << num_cycles << endl;
    cout << "Number of max iteration pixels: " << num_max_iter << endl;

    // Generate the output file
    output_file(filename, max_iteration, image_size);
    return 0;
}

Adding Color

Another enhancement that we can make is to add color to our Mandelbrot Images. While the Mandelbrot Set is inherently a grayscale image, many people like to create schemes for colorizing the image. Many of these schemes add so much false color that it can be hard to see the structure of the Mandelbrot Set. We can however, add color without hiding details by changing the colors at the two ends of the grayscale. Instead of using black for rapidly escaping and white for slowly escaping points, we can pick any two colors and blend them according to the number of iterations to escape. This program adds two more command line arguments --low_color and --high_color for rapidly and slowly escaping points respectively. The color arguments are three or six hexadecimal digits for the RGB specification of the color.

Working with colors is easy if we first define a simple structure to hold the RGB values of a color.

/**
* @brief Stucture for storing RGB colors.
*
* Stores the RGB components of an RGB color.
* Can be 8-bit or 16-bit per channel.
* Use Depth = uint8_t or uint16_t
*/
template <typename Depth>
struct Color {
    Depth R;
    Depth G;
    Depth B;

    /**
    * @brief Default Color constructor.
    *
    * Creates a Color object set to white.
    */
    Color() {
        R = G = B = std::numeric_limits<Depth>::max();
    }

    /**
    * @brief Constructor for RGB colors.
    *
    * Each color compontent can be in the range [0, MAX_PIXEL_LEVELS)
    * @param r Red component
    * @param g Green component
    * @param b Blue component
    */
    Color(Depth r, Depth g, Depth b) {
        R = std::max(Depth(0), r);
        G = std::max(Depth(0), g);
        B = std::max(Depth(0), b);
    }

    /**
    * @brief Constructor using hexadecimal string.
    *
    * Each color compontent is specified in hexadecimal in string form
    * "#xxx"  or "#xxxxxx".
    *
    * @param hex Hexadecimal color string
    */
    explicit Color(std::string hex) {
        // Ensure the input is valid
        std::string hex_color = hex;
        if (hex_color[std::string::size_type(0)] == '#') hex_color = hex_color.substr(1);   // Remove `#` if present

        if (hex_color.length() == 3) {                              // Shorthand #RGB format
            hex_color = {hex_color[std::string::size_type(0)], hex_color[std::string::size_type(0)],
                hex_color[std::string::size_type(1)], hex_color[std::string::size_type(1)],
                hex_color[std::string::size_type(2)], hex_color[std::string::size_type(2)]};
        }

        if (hex_color.length() != 6) {
            throw std::invalid_argument("Invalid hex color format");
        }

        // Convert hex to RGB values
        if constexpr (sizeof(Depth) == 2) {
            int v = std::stoi(hex_color.substr(0, 2), nullptr, 16);
            R = static_cast<Depth>(v * 256 + v);
            v = std::stoi(hex_color.substr(2, 2), nullptr, 16);
            G = static_cast<Depth>(v * 256 + v);
            v = std::stoi(hex_color.substr(4, 2), nullptr, 16);
            B = static_cast<Depth>(v * 256 + v);
        } else if constexpr (sizeof(Depth) == 1){
            R = static_cast<Depth>(std::stoi(hex_color.substr(0, 2), nullptr, 16));
            G = static_cast<Depth>(std::stoi(hex_color.substr(2, 2), nullptr, 16));
            B = static_cast<Depth>(std::stoi(hex_color.substr(4, 2), nullptr, 16));
        } else {
            static_assert(false, "Depth type can only be 1 or 2 bytes unsigned.");
        }
    }
    bool operator==(const Color<Depth> &arg) const {
        return R == arg.R && G == arg.G && B == arg.B;
    }
    bool isGray() {
        return R == G && G == B;
    }
    friend std::ostream;
};

Using this datatype we can define our RGB color type.

typedef Color<PIXEL_TYPE> RGB;

To compute the range of colors between the low and high color based on iterations needed to escape, we can setup a blend_colors() method that blends the two colors. We will pass a double in the range [0.0, 1.0) that determines the amount of blending. Passing 0.0 will return BLACK. Values near 0.0 will return the first color passed whereas a value close to 1.0 will return the second color passed.

template <typename Depth>
Color<Depth> blend_colors(double level, const Color<Depth>& color1, const Color<Depth>& color2) {
    if (level <= 0.0) return Color<Depth>(0, 0, 0);
    if (level >= 1.0) return Color<Depth>();;

    return Color<Depth>(
        static_cast<unsigned int>(color1.R + level * (color2.R - color1.R + 1)),
        static_cast<unsigned int>(color1.G + level * (color2.G - color1.G + 1)),
        static_cast<unsigned int>(color1.B + level * (color2.B - color1.B + 1))
    );
}

If we want to get fancy, we can assign a rainbow of colors to the range [0.0, 1.0). There are many ways to do this but one simple method is provided below. It converts the level into a hue value that is then used to compute the RGB components of a color.

template <typename Depth>
Color<Depth> level_2_RGB(double level) {
    if (level <= 0.0) return Color<Depth>(0, 0, 0);
    double H = (log(level + 1)/log(log(2.0)) - 1);
    double R = (sin(2.0 * M_PI * H) + 1.0) / 2.0;
    double G = (sin(2.0 * M_PI * H + 2.0 * M_PI / 3.0) + 1.0) / 2.0;
    double B = (sin(2.0 * M_PI * H + 4.0 * M_PI / 3.0) + 1.0) / 2.0;
    Color<Depth> result(
        static_cast<unsigned int>(floor(R * std::numeric_limits<Depth>::max())),
        static_cast<unsigned int>(floor(G * std::numeric_limits<Depth>::max())),
        static_cast<unsigned int>(floor(B * std::numeric_limits<Depth>::max())));
    return result;
}
Complete Color.hpp
//
// Created by Richard Lesh on 20/05/2025.
// Copyright (c) 2025 Richard Lesh. All rights reserved.
//

#ifndef COLOR_HPP
#define COLOR_HPP
#pragma once

#include <algorithm>
#include <limits>
#include <string>

/**
* @brief Stucture for storing RGB colors.
*
* Stores the RGB components of an RGB color.
* Can be 8-bit or 16-bit per channel.
* Use Depth = uint8_t or uint16_t
*/
template <typename Depth>
struct Color {
    Depth R;
    Depth G;
    Depth B;

    /**
    * @brief Default Color constructor.
    *
    * Creates a Color object set to white.
    */
    Color() {
        R = G = B = std::numeric_limits<Depth>::max();
    }

    /**
    * @brief Constructor for RGB colors.
    *
    * Each color compontent can be in the range [0, MAX_PIXEL_LEVELS)
    * @param r Red component
    * @param g Green component
    * @param b Blue component
    */
    Color(Depth r, Depth g, Depth b) {
        R = std::max(Depth(0), r);
        G = std::max(Depth(0), g);
        B = std::max(Depth(0), b);
    }

    /**
    * @brief Constructor using hexadecimal string.
    *
    * Each color compontent is specified in hexadecimal in string form
    * "#xxx"  or "#xxxxxx".
    *
    * @param hex Hexadecimal color string
    */
    explicit Color(std::string hex) {
        // Ensure the input is valid
        std::string hex_color = hex;
        if (hex_color[std::string::size_type(0)] == '#') hex_color = hex_color.substr(1);   // Remove `#` if present

        if (hex_color.length() == 3) {                              // Shorthand #RGB format
            hex_color = {hex_color[std::string::size_type(0)], hex_color[std::string::size_type(0)],
                hex_color[std::string::size_type(1)], hex_color[std::string::size_type(1)],
                hex_color[std::string::size_type(2)], hex_color[std::string::size_type(2)]};
        }

        if (hex_color.length() != 6) {
            throw std::invalid_argument("Invalid hex color format");
        }

        // Convert hex to RGB values
        if constexpr (sizeof(Depth) == 2) {
            int v = std::stoi(hex_color.substr(0, 2), nullptr, 16);
            R = static_cast<Depth>(v * 256 + v);
            v = std::stoi(hex_color.substr(2, 2), nullptr, 16);
            G = static_cast<Depth>(v * 256 + v);
            v = std::stoi(hex_color.substr(4, 2), nullptr, 16);
            B = static_cast<Depth>(v * 256 + v);
        } else if constexpr (sizeof(Depth) == 1){
            R = static_cast<Depth>(std::stoi(hex_color.substr(0, 2), nullptr, 16));
            G = static_cast<Depth>(std::stoi(hex_color.substr(2, 2), nullptr, 16));
            B = static_cast<Depth>(std::stoi(hex_color.substr(4, 2), nullptr, 16));
        } else {
            static_assert(false, "Depth type can only be 1 or 2 bytes unsigned.");
        }
    }
    bool operator==(const Color<Depth> &arg) const {
        return R == arg.R && G == arg.G && B == arg.B;
    }
    bool isGray() {
        return R == G && G == B;
    }
    friend std::ostream;
};

template <typename Depth>
std::ostream &operator<<(std::ostream &os, const Color<Depth> &rgb) {
    int w = sizeof(Depth) * 2;
    os << std::hex << std::setfill('0') << "RGB(";
    os << std::setw(w) << rgb.R << ",";
    os << std::setw(w) << rgb.G << ",";
    os << std::setw(w) << rgb.B << ")";
    return os;
}

/**
* @brief Function to blend two RGB colors.
*
* Function to blend two RGB colors based on a level in [0.0, 1.0)
* Exactly 0.0 will return black.
*
* @param level Level value to control blending [0.0, 1.0)
* @param color1 Color for near 0.0 level
* @param color2 Color for near 1.0 level
* return Blended color
*/
template <typename Depth>
Color<Depth> blend_colors(double level, const Color<Depth>& color1, const Color<Depth>& color2) {
    if (level <= 0.0) return Color<Depth>(0, 0, 0);
    if (level >= 1.0) return Color<Depth>();;

    return Color<Depth>(
        static_cast<unsigned int>(color1.R + level * (color2.R - color1.R + 1)),
        static_cast<unsigned int>(color1.G + level * (color2.G - color1.G + 1)),
        static_cast<unsigned int>(color1.B + level * (color2.B - color1.B + 1))
    );
}

/**
* @brief Function to generate rainbow colors.
*
* Function to a rainbow of colors based on a level in [0.0, 1.0)
* Exactly 0.0 will return black.
*
* @param level Level value to control rainbow [0.0, 1.0)
* return Rainbow color
*/
template <typename Depth>
Color<Depth> level_2_RGB(double level) {
    if (level <= 0.0) return Color<Depth>(0, 0, 0);
    double H = (log(level + 1)/log(log(2.0)) - 1);
    double R = (sin(2.0 * M_PI * H) + 1.0) / 2.0;
    double G = (sin(2.0 * M_PI * H + 2.0 * M_PI / 3.0) + 1.0) / 2.0;
    double B = (sin(2.0 * M_PI * H + 4.0 * M_PI / 3.0) + 1.0) / 2.0;
    Color<Depth> result(
        static_cast<unsigned int>(floor(R * std::numeric_limits<Depth>::max())),
        static_cast<unsigned int>(floor(G * std::numeric_limits<Depth>::max())),
        static_cast<unsigned int>(floor(B * std::numeric_limits<Depth>::max())));
    return result;
}
#endif //COLOR_HPP

The only remaining thing that we need to do is to modify the output_file() function to generate PPM (Portable Pixel Map) files that store the three RGB channel values for each pixel. We will still write out PGM (Portable Gray Map) files that just have one value per pixel when the default colors of BLACK and WHITE are used.

    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            if (USE_COLOR) {
                RGB rgb(0,0,0);
                if (image[j][i] != 0) {
                    if (use_rainbow)
                        rgb = level_2_RGB<PIXEL_TYPE>(
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1));
                    else
                        rgb = blend_colors(
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1), color1, color2);
                }
                if constexpr (sizeof(PIXEL_TYPE) == 2) {
                    out_file.write(reinterpret_cast<const char*>(&rgb.R) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.R), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.G) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.G), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.B) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.B), 1);
                } else {
                    out_file.write(reinterpret_cast<const char*>(&rgb.R), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.G), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.B), 1);
                }
            } else {
                if constexpr (sizeof(PIXEL_TYPE) == 2) {
                    PIXEL_TYPE level = 0;
                    if (image[j][i] != 0)
                        level = static_cast<uint8_t>(color1.R +
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1) * (color2.R - color1.R + 1));
                    out_file.write(reinterpret_cast<const char*>(&level) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&level), 1);
                } else {
                    PIXEL_TYPE level = 0;
                    if (image[j][i] != 0)
                        level = static_cast<uint8_t>(color1.R +
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1) * (color2.R - color1.R + 1));
                    out_file.write(reinterpret_cast<const char*>(&level), 1);
                }
            }
        }
    }

Putting it all together we have the mandelbrot_threaded_color.cpp program.

Complete mandelbrot_threaded_color.cpp
/**
* @file mandelbrot_threaded_color.cpp
* @brief Program to compute the Mandelbrot Set.
* 
* This program can compute images of the Mandelbrot Set in its
* entirety or close ups of boundary areas.  It will use as many
* CPUs as are available to speed up the rendering.
*
* It also allows two RGB colors to be specified to define the
* coloring of the set based on the escape value of a pixel.
* Colors can be specified using #xxx or #xxxxxx hexadecimal.
*
* Requires:
* Linux<br>
* get cli11.hpp at https://github.com/CLIUtils/CLI11/releases
*
* macOS<br>
* brew install cli11<br>
*
* Build:
* Linux<br>
* put CLI.hpp in /usr/local/include <br>
*
* macOS<br>
* Add to .zshrc<br>
* export CPPFLAGS="-I/opt/homebrew/include"<br>
* export LDFLAGS="-L/opt/homebrew/lib"
*
* Compile with<br>
* Linux<br>
* g++ --std=gnu++20 -o mandelbrot_threaded_color mandelbrot_threaded_color.cpp

* macOS<br>
* g++ $CPPFLAGS $LDFLAGS --std=gnu++20 -o mandelbrot_threaded_color mandelbrot_threaded_color.cpp
*
* Execute:
* mandelbrot_threaded_color real imaginary radius [--filename=basename]
*     [--image_size=pixels] [--max_iterations=count]
      [--low_color=#xxx] [--high_color=#xxx] [--rainbow]
*
* @author Richard Lesh
* @date 2025-03-12
* @version 1.0
*/

#include <CLI/CLI.hpp>
#include <complex>
#include <fstream>
#include <string>
#include <thread>
#include <vector>
#include "Color.hpp"

#define MAX_PIXEL_LEVELS 256            // 256 for 8-bit
bool USE_COLOR = true;                  // True to allow colors to be used
bool USE_RAINBOW = false;               // True to use rainbow colors

#if MAX_PIXEL_LEVELS > 256
    #define PIXEL_TYPE uint16_t
#else
    #define PIXEL_TYPE uint8_t
#endif

using namespace std;

typedef Color<PIXEL_TYPE> RGB;

/**
 * @brief Computes the Unit in the Last Place (ULP) of a double-precision value.
 *
 * This function calculates the distance between the given value @p x and the next
 * representable double greater than @p x in the direction of positive infinity.
 * The result gives a measure of the precision at the value @p x.
 *
 * @param x The input double value.
 * @return The ULP of @p x (i.e., the difference to the next larger representable double).
 *
 * @note If @p x is NaN or infinity, the result may be undefined.
 */
inline double ULP(double x) {
    double next = nextafter(x, numeric_limits<double>::infinity());
    double ulp = next - x;
    return ulp;
}

// Array used to store the image in a thread safe way as an array of arrays
// Each element of the array will be a pointer to an array containing the
// pixels for a row of the image.  Rows will be computed by threads.
unique_ptr<unique_ptr<int32_t[]>[]> image = nullptr;
// Atomic int used to assign tasks (row to compute) to the threads
atomic_int counter{0};

/**
 * @brief Computes the squared magnitude (norm) of a complex number without
 * taking the square root.
 *
 * This function returns the sum of the squares of the real and imaginary parts
 * of the complex number @p z.  It avoids the overhead of computing a square
 * root, making it useful for performance-sensitive contexts like Mandelbrot
 * set calculations where only relative magnitude comparisons are needed.
 *
 * @tparam T The numeric type (e.g., float, double, mpreal).
 * @param z The complex number whose squared magnitude is to be computed.
 * @return The squared magnitude of @p z, equivalent to |z|^2.
 *
 * @see std::norm may be used as a standard alternative, but may perform
 * additional checks.
 */
template<typename T>
inline T norm_squared(complex<T> z) {
    return z.real() * z.real() + z.imag() * z.imag();
}

// Number of pixels that escape
atomic_int num_escaped{0};
// Number of cycles detected
atomic_int num_cycles{0};
// Number of pixels that reach max. iterations
atomic_int num_max_iter{0};

/**
* @brief Compute the number of iterations to escape.
*
* Function computes the number of iterations of the formula
* z(n+1) = z(n)^2 + c that can be executed before the value of |z|
* exceeds the escape limit radius of 2.0.
*
* The function uses Floyd’s Cycle Detection (Tortoise 
* and Hare) algorithm to detect cycles.

* @param c Complex point used to perform the calculation.
* @param max_iteration Number of iterations, if exceeded, when we will declare
*                      as not escaping and therefore in the Mandelbrot set.
* @return Value [0, max_iteration) representing the escape iterations.
*                      0 is considered as not escaping.
*/

int compute_escape(complex<double> c, int max_iteration) {
    complex<double> tortoise{0., 0.};
    complex<double> hare{0., 0.};

    int i = 0;
    while (i < max_iteration && norm_squared(tortoise) < 4.0) {
        tortoise = tortoise * tortoise + c;
        hare = hare * hare + c;
        hare = hare * hare + c; // hare moves twice

        double ulps = ULP(norm(tortoise));
        ulps = 9 * ulps * ulps;
        if (norm_squared(tortoise - hare) < ulps) {
            ++num_cycles;
            return 0L; // cycle detected
        }
        ++i;
    }

    if (i == max_iteration) {   // Return 0 (black) if we are in the set
        ++num_max_iter;
        return 0L;
    } else {
        ++num_escaped;
        return i;
    }
}

/**
* @brief Compute a portion of the Mandelbrot Set.
* 
* Function computes a portion of the Mandelbrot Set row by row.
* The atomic integer counter is used to get the row to compute.
* The row is stored in the image array.
* 
* @param x_min Minimum real axis bound to plot.
* @param x_max Maximum real axis bound to plot.
* @param y_min Minimum imaginary axis bound to plot.
* @param y_max Maximum imaginary axis bound to plot.
* @param max_iteration Limit to the number of iterations we will compute per point.
* @param image_size Size, in pixels, of the image to generate.
*/
void compute_mandelbrot(double x_min, double x_max, double y_min, double y_max, int max_iteration, int image_size) {
    double width_per_pixel = (x_max - x_min) / image_size;
    double height_per_pixel = (y_max - y_min) / image_size;
    int modulus = ceil(image_size / 40.);     // Used to control progress output
    while (true) {
        int j = counter.fetch_add(1);
        if (j < image_size) {
            image[j] = make_unique<int32_t[]>(image_size);
            // Calculate the imaginary coordinate of the middle of the pixel row
            double y0 = y_max - height_per_pixel * (j + 0.5);
            for (int i = 0; i < image_size; ++i) {
                // Calculate the real coordinate of the middle of the pixel
                double x0 = width_per_pixel * (i + 0.5) + x_min;
                // Compute the escape number of iterations [0, max_iteration)
                int level = compute_escape(complex(x0, y0), max_iteration);
                // Store the escape iterations in the image array.
                image[j][i] = level;
            }
            // Output periodic progress
            if (j % modulus == 0) cout << '.' << flush;
        } else {
            break;
        }
    }
}

/**
* @brief Write out a PGM/PPM file.
* 
* Function writes out image data to a PGM or PPM format file.  Can be an 
* 8-bit/16-bit PGM or an 8-bit/16-bit PPM.  This function normalizes the
* level values using a histogram.
* 
* @param filename Pathspec to the file that will be created.
* @param max_iteration Limit to the number of iterations we used to compute each point.
* @param image_size Size, in pixels, of the image to save.
* @param color1 First color to use for near 1 iteration before escaping.
* @param color2 Second color to use for near max_iterations iterations before escaping.
* @param use_rainbow true to use rainbow coloring.
*/
void output_file(const string &filename, int max_iteration, int image_size, RGB color1, RGB color2, bool use_rainbow) {
    string ext = ".pgm";
    if (USE_COLOR) ext = ".ppm";

    cout << endl << "Creating " << (filename + ext) << "..." << endl;
    ofstream out_file((filename + ext).c_str(), ios::binary);
    if (!out_file) {                    // Check if file opened successfully
        cerr << "Error opening file " << (filename + ext) << " for writing!" << endl;
        exit(1);
    }
    if (USE_COLOR) {
        // File header is P6 for binary pix map, x_size y_size, max pixel value per RGB component
        out_file << "P6\n" << image_size << " " << image_size << "\n" << (MAX_PIXEL_LEVELS - 1) << "\n";
    } else {
        // File header is P5 for binary gray map, x_size y_size, max pixel value
        out_file << "P5\n" << image_size << " " << image_size << "\n" << (MAX_PIXEL_LEVELS - 1) << "\n";
    }

    // Find the maximum iteration count
    int max = 0;
    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            if (image[j][i] > max) max = image[j][i];
        }
    }
    max_iteration = max + 1;
    // Compute bucket size for histogram
    int max_iteration_bits = static_cast<int>(ceil(log(max_iteration)/log(2.0)));
    int bucket_shift_bits = max_iteration_bits > 20 ? max_iteration_bits - 20 : 0;
    vector<int>::size_type histogram_size = (max_iteration >> bucket_shift_bits) + 1;
    // Compute histogram
    vector<int> histogram(histogram_size, 0);
    int totalCount = 0;
    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            if (image[j][i] > 0) {
                ++histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)];
                ++totalCount;
            }
        }
    }
    // Compute cumulative histogram
    int count = 0;
    for (vector<int>::size_type i = 1; i < histogram_size; ++i) {
        count += histogram[i];
        histogram[i] = count;
    }
    // Normalize using histogram
    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            if (USE_COLOR) {
                RGB rgb(0,0,0);
                if (image[j][i] != 0) {
                    if (use_rainbow)
                        rgb = level_2_RGB<PIXEL_TYPE>(
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1));
                    else
                        rgb = blend_colors(
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1), color1, color2);
                }
                if constexpr (sizeof(PIXEL_TYPE) == 2) {
                    out_file.write(reinterpret_cast<const char*>(&rgb.R) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.R), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.G) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.G), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.B) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.B), 1);
                } else {
                    out_file.write(reinterpret_cast<const char*>(&rgb.R), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.G), 1);
                    out_file.write(reinterpret_cast<const char*>(&rgb.B), 1);
                }
            } else {
                if constexpr (sizeof(PIXEL_TYPE) == 2) {
                    PIXEL_TYPE level = 0;
                    if (image[j][i] != 0)
                        level = static_cast<uint8_t>(color1.R +
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1) * (color2.R - color1.R + 1));
                    out_file.write(reinterpret_cast<const char*>(&level) + 1, 1);
                    out_file.write(reinterpret_cast<const char*>(&level), 1);
                } else {
                    PIXEL_TYPE level = 0;
                    if (image[j][i] != 0)
                        level = static_cast<uint8_t>(color1.R +
                            histogram[vector<int>::size_type(image[j][i] >> bucket_shift_bits)] /
                            static_cast<double>(totalCount + 1) * (color2.R - color1.R + 1));
                    out_file.write(reinterpret_cast<const char*>(&level), 1);
                }
            }
        }
    }
    out_file.close();
}

/**
* @brief Main function.
* 
* The entry point of the program.
*
* Syntax: mandelbrot_threaded_color real imaginary radius [--filename=basename]
*     [--image_size=n] [--max_iterations=n] [--high_color=#xxx] [--low_color=#xxx]
*
* real & imaginary specify the coordinates of the Mandelbrot Set to display at
* the center of the image.
*
* radius is the radius in Mandelbrot Set coordinates out to the extreme edges
* of the image.
*
* max_iterations is the maximum number of iterations per pixel allowed to
* determine if it is in the set.  The max_iterations will generally need to get
* larger as radius gets smaller.
*
* image_size is the size in pixels for the width and height of the square image.
*
* high_color and low_color are hexadecimal color values.
*
* filename is the basename of the image file.  It defaults to "image".
*
* @return int Returns 0 on successful execution.
*/
int main(int argc, char** argv) {
    CLI::App app{"Program to generate Mandelbrot Set images."};
    app.set_help_flag("", "");
    argv = app.ensure_utf8(argv);
    // use CLI11 to process command line options.
    vector<string> coords;
    app.add_option("coords", coords, "Location to plot (x y radius)")
        ->expected(3);

    // Optional arguments
    string filename = "image";
    int max_iteration = 10000;
    int image_size = 512;
    string high_color = "#FFF";
    string low_color = "#000";
    bool use_rainbow = false;
    bool show_help = false;

    app.add_flag("--help", show_help, "Show help");
    app.add_option("-f,--filename", filename, "Base filename (no extension)");
    app.add_option("-m,--max_iterations", max_iteration, "Max. iterations");
    app.add_option("-s,--image_size", image_size, "Image size in pixels");
    app.add_option("-h,--high_color", high_color, "High color hex code (#xxx or #xxxxxx)");
    app.add_option("-l,--low_color", low_color, "Low color hex code (#xxx or #xxxxxx)");
    app.add_flag("-r,--rainbow", use_rainbow, "Use rainbow colors");

    CLI11_PARSE(app, argc, argv);

    if (show_help) {
        cout << app.help() << endl;
        return 0;
    }

    // Get the three required coordinates to plot
    double x, y, radius;
    if (coords.size() == 3) {
        x = stod(coords[vector<string>::size_type(0)]);
        y = stod(coords[vector<string>::size_type(1)]);
        radius = stod(coords[vector<string>::size_type(2)]);
    } else {
        cout << "Arguments x, y and radius required but not found!" << endl;
        return 1;
    }

    // Check the image size
    if (image_size > sqrt(numeric_limits<int>::max())) {
        cout << "Image size can not exceed " << floor(sqrt(numeric_limits<int>::max())) << endl;
        return 1;
    }

    // Get the low and high colors
    RGB high_RGB = RGB(high_color);
    RGB low_RGB = RGB(low_color);
    if (high_RGB.isGray() && low_RGB.isGray()) USE_COLOR = false;
    if (use_rainbow) USE_COLOR = true;

    // Compute limits of image
    double x_min = x - radius;
    double x_max = x + radius;
    double y_min = y - radius;
    double y_max = y + radius;

    // Output settings
    cout << "Maximum Iterations: " << max_iteration << endl;
    cout << (USE_COLOR ? "Using color" : "Using grayscale") << endl;
    if (use_rainbow) {
        cout << "Rainbow colors" << endl;
    } else {
        cout << "High escape color: " << high_color << endl;
        cout << "Low escape color: " << low_color << endl;
    }
    cout << "Image Bounds: " << setprecision(numeric_limits<double>::digits10)
        << x_min << ", " << x_max << ", " << y_min << ", " << y_max << endl;
    cout << "Image Size: " << image_size << " x " << image_size << endl;
    
    // Allocate an array for the row pointers.
    image = make_unique<unique_ptr<int32_t[]>[]>(image_size);

    // Do a simple check to see if our doubles have enough precision
    if ((x_max - x_min) / image_size < ULP(x) ||
        (y_max - y_min) / image_size < ULP(y))
        cout << "Doubles may not have enough precision!" << endl;

    // Get the number of CPU cores.
    unsigned int num_cpus = thread::hardware_concurrency();
    if (num_cpus == 0) {
        cout << "Could not determine number of CPUs.  Using 4.\n";
        num_cpus = 4;
    } else {
        cout << "Number of CPU cores: " << num_cpus << endl;
    }

    // Create the threads
    vector<thread> threads;
    threads.reserve(num_cpus);
    for (int i = 0; i < num_cpus; ++i) {
        threads.emplace_back(compute_mandelbrot, x_min, x_max, y_min, y_max,
            max_iteration, image_size);
    }
    // Wait for all threads to complete
    for (auto& t : threads) {
        t.join();  
    }

    cout << endl;
    cout << "Number of escaped pixels: " << num_escaped << endl;
    cout << "Number of cycles detected: " << num_cycles << endl;
    cout << "Number of max iteration pixels: " << num_max_iter << endl;

    // Generate the output file
    output_file(filename, max_iteration, image_size, low_RGB, high_RGB, use_rainbow);
    return 0;
}

Here are some examples of colorized images.