C++ Thread Pool

While launching threads is easy with the std::thread class in the C++ standard library, it is handy to have a class that encapsulates launching threads, managing a work queue, and shutting down the threads when all the work is done. Because processing a queue of tasks is one of the most common uses of threads, hiding all the mechanisms involved in a ThreadPool class gives us lots of opportunities for reuse. The standard usage pattern for the ThreadPool class is illustrated below.

    // Lazy initialization of the thread pool.  Default allocates threads
    // equal to the number of cores on the system.
    std::shared_ptr<ThreadPool> pool = ThreadPool::getThreadPool();
    // Change the name of the thread pool.
    pool->setPoolName("Main");
    // Create 100 tasks
    for (int id = 0; id < 100; ++id) {
        // We use enqueue() to add tasks to execute to the work queue for the
        // thread pool.  C++ lambda expressions are an easy way to define the
        // task and then pass it to enqueue().
        pool->enqueue([pool, id] {
            // First thing in the task is to set the thread name.  This can
            // only be done in the running thread.
            pool->setThreadName(to_string(id));
            // Do thread work here...
        });
    }
    // Shutdown the thread pool which will wait till the work queue is emptied
    // and all worker threads quit.
    pool->shutdown();

Example 1

Below in example1.cpp is the complete program that illustrates this usage pattern. We use nanosleep() to simulate an arbitrary work load for the tasks. We also use ThreadPool::safePrint() to safely print output to the console. ThreadPool::safePrint() uses a mutex to make sure threads don't interrupt the output of other threads. Also note that the task to enqueue() is created using a lambda expression. See the post on lambda expressions if you need more info on their use.
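
To make the idea behind safePrint() concrete, here is a minimal sketch of serializing output from several threads with one mutex. It is not the ThreadPool code: guarded_write() and run_writers() are hypothetical names, and the sketch writes to a std::ostringstream instead of the console so the result can be checked afterwards.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: append one whole line to `out` while holding `mtx`,
// so lines written by different threads cannot interleave.
void guarded_write(std::ostream& out, std::mutex& mtx, const std::string& msg) {
    std::lock_guard<std::mutex> lock(mtx);
    out << msg << '\n';
}

// Start `numThreads` threads that each write `linesPerThread` lines, wait
// for all of them, and return everything that was written.
std::string run_writers(int numThreads, int linesPerThread) {
    assert(numThreads > 0 && linesPerThread > 0);
    std::ostringstream out;
    std::mutex mtx;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&out, &mtx, t, linesPerThread] {
            for (int i = 0; i < linesPerThread; ++i) {
                guarded_write(out, mtx,
                              "Thread " + std::to_string(t) + " line " + std::to_string(i));
            }
        });
    }
    for (auto& th : threads) th.join();
    return out.str();
}
```

Calling run_writers(4, 50) should return 200 complete lines. The order of the lines varies from run to run, but no line is ever split by another thread's output.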

example1.cpp
/**
* @file example1.cpp
* @brief Program to demonstrate the ThreadPool class.
*
* This program exercises the ThreadPool class with tasks that just sleep for a
* few seconds.
*
* Compile with<br>
* Linux<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example1 src/example1.cpp src/ThreadPool.cpp

* macOS<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example1 src/example1.cpp src/ThreadPool.cpp
*
* Execute:
* ./example1
*
* @author Richard Lesh
* @date 2025-06-11
* @version 1.0
*/

#include <time.h>       // nanosleep() and struct timespec (POSIX)

#include <iostream>

#include "ThreadPool.hpp"

using namespace std;

int main(int argc, char **argv) {
    cout << "ThreadPool Example 1" << endl;
    // Lazy initialization of the thread pool.  Default allocates threads
    // equal to the number of cores on the system.
    std::shared_ptr<ThreadPool> pool = ThreadPool::getThreadPool();
    // Change the name of the thread pool.
    pool->setPoolName("Main");
    // Create 100 tasks
    for (int id = 0; id < 100; ++id) {
        // We use enqueue() to add tasks to execute to the work queue for the
        // thread pool.  C++ lambda expressions are an easy way to define the
        // task and then pass it to enqueue().
        pool->enqueue([pool, id] {
            // First thing to do in the task is to set the thread name. This can
            // only be done in the running thread.
            SET_THREAD_NAME(pool, to_string(id));
            // Get the description of the thread (pool name : thread name) and
            // print that we are starting the execution of the task.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " starting");
            // Create a timespec struct for the requested time to wait in the
            // thread.  We will wait 0.5 to 9.5 seconds based on the thread id.
            struct timespec req = { .tv_sec = id % 10, .tv_nsec = 500000000 };
            // Timespec struct for the remaining time if we get interrupted.
            struct timespec rem;
            // Use nanosleep() to simulate a workload.
            if (nanosleep(&req, &rem) == -1) {
                ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " interrupted!");
            }
            // Print that the thread is done.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " done");
        });
    }
    // Shutdown the thread pool which will wait till the work queue is emptied
    // and all worker threads quit.
    pool->shutdown();
}
% ./example1
ThreadPool Example 1
Thread Main:0 starting
Thread Main:1 starting
Thread Main:2 starting
Thread Main:3 starting
Thread Main:7 starting
Thread Main:6 starting
Thread Main:4 starting
Thread Main:8 starting
Thread Main:5 starting
Thread Main:9 starting
Thread Main:10 starting
Thread Main:11 starting
Thread Main:10 done
Thread Main:12 starting
Thread Main:0 done
Thread Main:13 starting
Thread Main:1 done
Thread Main:11 done
Thread Main:14 starting
Thread Main:15 starting
Thread Main:2 done
...
Thread Main:98 starting
Thread Main:78 done
Thread Main:99 starting
Thread Main:93 done
Thread Main:79 done
Thread Main:86 done
Thread Main:87 done
Thread Main:94 done
Thread Main:95 done
Thread Main:88 done
Thread Main:96 done
Thread Main:89 done
Thread Main:97 done
Thread Main:98 done
Thread Main:99 done

Example 2

In example2 we develop a more useful example. This example computes an image of the complete Mandelbrot Set. It uses the same ThreadPool pattern but adds the job of computing one line of the image to each task.
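
If the escape-time calculation is new to you, the following sketch shows roughly what each task does for a single pixel. The names escape_iterations() and pixel_center() are illustrative and do not appear in example2.cpp, and the sketch uses std::norm() for the squared magnitude.

```cpp
#include <cassert>
#include <complex>

// Count iterations of z = z*z + c until |z| reaches 2 (tested as |z|^2 >= 4).
// Returns 0 when the point has not escaped after max_iteration steps, which
// this sketch treats as "inside the set".
int escape_iterations(std::complex<double> c, int max_iteration) {
    assert(max_iteration > 0);
    std::complex<double> z{0.0, 0.0};
    int i = 0;
    while (i < max_iteration && std::norm(z) < 4.0) {
        z = z * z + c;
        ++i;
    }
    return i == max_iteration ? 0 : i;
}

// Map a pixel index in [0, size) to the coordinate of the pixel's center
// on an axis that spans [lo, hi].
double pixel_center(int index, int size, double lo, double hi) {
    return lo + (hi - lo) / size * (index + 0.5);
}
```

For instance, c = 0 and c = -1 never escape, so escape_iterations() returns 0 for them, while c = 1 escapes after two iterations. Each row of the image is just this calculation repeated for every pixel center in the row, which is why rows make convenient independent tasks.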

example2.cpp
/**
* @file example2.cpp
* @brief Program to demonstrate the ThreadPool class.  Creates an image of the
* Mandelbrot set.
*
* This program exercises the ThreadPool class with tasks that generate portions
* of the Mandelbrot Set.
*
* Compile with<br>
* Linux<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example2 src/example2.cpp src/ThreadPool.cpp

* macOS<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example2 src/example2.cpp src/ThreadPool.cpp
*
* Execute:
* ./example2
*
* @author Richard Lesh
* @date 2025-06-11
* @version 1.0
*/

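#include <cmath>        // ceil(), floor()
#include <complex>
#include <cstdint>      // int32_t, uint8_t
#include <cstdlib>      // exit()
#include <fstream>
#include <iostream>
#include <memory>       // unique_ptr, make_unique
#include <mutex>
#include <string>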

#include "ThreadPool.hpp"

using namespace std;

// Coordinates and width to generate an image of the whole Mandelbrot Set
double x = -0.75;
double y = 0.0;
double w = 1.5;
double x_min = x - w;
double x_max = x + w;
double y_min = y - w;
double y_max = y + w;
// Don't need many iterations at this scale.
int max_iteration = 75;
// Create a really big image.
int image_size = 10000;

#define MAX_PIXEL_LEVELS 256

// Array used to store the image as an array of arrays.
// Each element of the array will be a pointer to an array containing the
// pixels for a row of the image.  Each thread will allocate the row array
// that it will then compute.

unique_ptr<unique_ptr<int32_t[]>[]> image = make_unique<unique_ptr<int32_t[]>[]>(image_size);

// Need a mutex to control access to the master image array
mutex img_mutex;

/**
* @brief Computes the squared magnitude (norm) of a complex number without
* taking the square root.
*
* This function returns the sum of the squares of the real and imaginary parts
* of the complex number @p z.  It avoids the overhead of computing a square
* root, making it useful for performance-sensitive contexts like Mandelbrot
* set calculations where only relative magnitude comparisons are needed.
*
* @tparam T The numeric type (e.g., float, double, mpreal).
* @param z The complex number whose squared magnitude is to be computed.
* @return The squared magnitude of @p z, equivalent to |z|^2.
*
* @see std::norm may be used as a standard alternative, but may perform
* additional checks.
*/

template<typename T>
inline T norm_squared(complex<T> z) {
    return z.real() * z.real() + z.imag() * z.imag();
}

/**
* @brief Compute the number of iterations to escape.
*
* Function computes the number of iterations of the formula
* z(n+1) = z(n)^2 + c that can be executed before the value of |z|
* exceeds the escape limit radius of 2.0.
*
* @param c Complex point used to perform the calculation.
* @param max_iteration Number of iterations, if exceeded, when we will declare
*                      as not escaping and therefore in the Mandelbrot set.
* @return Value [0, max_iteration) representing the escape iterations.
*                      0 is considered as not escaping.
*/

int compute_escape(complex<double> c, int max_iteration) {
    complex<double> z{0., 0.};

    int i = 0;
    while (i < max_iteration && norm_squared(z) < 4.0) {
        z = z * z + c;
        ++i;
    }

    if (i == max_iteration) {   // Return 0 (black) if we are in the set
        return 0;
    } else {
        return i;
    }
}

/**
* @brief Compute a portion of the Mandelbrot Set.
*
* Function computes a portion of the Mandelbrot Set row by row.
* The row is stored in the image array.
*
* @param row Row to compute [0, image_size)
* @param x_min Minimum real axis bound to plot.
* @param x_max Maximum real axis bound to plot.
* @param y_min Minimum imaginary axis bound to plot.
* @param y_max Maximum imaginary axis bound to plot.
* @param max_iteration Limit to the number of iterations we will compute per point.
* @param image_size Size, in pixels, of the square image to generate.
*/

void compute_mandelbrot(int row, double x_min, double x_max, double y_min, double y_max, int max_iteration, int image_size) {
    double width_per_pixel = (x_max - x_min) / image_size;
    double height_per_pixel = (y_max - y_min) / image_size;
    int modulus = ceil(image_size / 40.);     // Used to control progress output
    if (row >= 0 && row < image_size) {
        // Get a lock on the img_mutex so we can add a new row array for the
        // data that we will compute.  The mutex lock is released when the
        // variable lock goes out of scope.
        {
            lock_guard<mutex> lock(img_mutex);
            image[row] = make_unique<int32_t[]>(image_size);
        }
        // Calculate the imaginary coordinate of the middle of the pixel row
        double y0 = y_max - height_per_pixel * (row + 0.5);
        for (int i = 0; i < image_size; ++i) {
            // Calculate the real coordinate of the middle of the pixel
            double x0 = width_per_pixel * (i + 0.5) + x_min;
            // Compute the escape number of iterations [0, max_iteration)
            int level = compute_escape(complex(x0, y0), max_iteration);
            // Store the escape iterations in the image array.
            image[row][i] = level;
        }
        // Output periodic progress
        if (row % modulus == 0) cout << '.' << flush;
    }
}

/**
* @brief Write out a PGM file.
*
* Function writes out image data to a PGM 8-bit format file.
*
* @param filename Pathspec to the file that will be created.
* @param max_iteration Limit to the number of iterations we used to compute each point.
* @param image_size Size, in pixels, of the image to save.
*/

void output_file(const string &filename, int max_iteration, int image_size) {
    string ext = ".pgm";

    cout << endl << "Creating " << (filename + ext) << "..." << endl;
    ofstream out_file((filename + ext).c_str(), ios::binary);
    if (!out_file) {                    // Check if file opened successfully
        cerr << "Error opening file " << (filename + ext) << " for writing!" << endl;
        exit(1);
    }
    // File header is P5 for binary gray map, x_size y_size, max pixel value
    out_file << "P5\n" << image_size << " " << image_size << "\n" << (MAX_PIXEL_LEVELS - 1) << "\n";

    for (int j = 0; j < image_size; ++j) {
        for (int i = 0; i < image_size; ++i) {
            const uint8_t level = floor(image[j][i] / static_cast<double>(max_iteration) * MAX_PIXEL_LEVELS);
            out_file.write(reinterpret_cast<const char*>(&level), 1);
        }
    }
    out_file.close();
}

int main(int argc, char **argv) {
    cout << "ThreadPool Example 2" << endl;
    // Lazy initialization of the thread pool.  Default allocates threads
    // equal to the number of cores on the system.
    std::shared_ptr<ThreadPool> pool = ThreadPool::getThreadPool();
    // Change the name of the thread pool.
    pool->setPoolName("Mandelbrot");
    // Create tasks for each row's computation
    for (int row = 0; row < image_size; ++row) {
        // We use enqueue() to add tasks to execute to the work queue for the
        // thread pool.  C++ lambda expressions are an easy way to define the
        // task and then pass it to enqueue().
        pool->enqueue([=] {
            // First thing to do in the task is to set the thread name. This can
            // only be done in the running thread.
            SET_THREAD_NAME(pool, to_string(row));
            // Get the description of the thread (pool name : thread name) and
            // print that we are starting the execution of the task.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " starting");
            // Compute one row in the image.
            compute_mandelbrot(row, x_min, x_max, y_min, y_max, max_iteration, image_size);
            // Print that the thread is done.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " done");
        });
    }
    // Shutdown the thread pool which will wait till the work queue is emptied
    // and all worker threads quit.
    pool->shutdown();
    // Save the image.
    output_file("Mandelbrot", max_iteration, image_size);
}
Figure: the Mandelbrot set image written by example2 (Mandelbrot.pgm)

Example 3

In example3.cpp we explore how to set up multiple thread pools as needed. This is easily done using the ThreadPool constructor which takes the number of threads you want to have in the pool. In this example, each task in the main pool creates a sub-pool with four threads and assigns work tasks to the sub-pool.

example3.cpp
/**
* @file example3.cpp
* @brief Program to demonstrate multiple ThreadPool instances.
*
* This program exercises multiple ThreadPool instances with tasks that just
* sleep for a few seconds.
*
* Compile with<br>
* Linux<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example3 src/example3.cpp src/ThreadPool.cpp

* macOS<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example3 src/example3.cpp src/ThreadPool.cpp
*
* Execute:
* ./example3
*
* @author Richard Lesh
* @date 2025-06-11
* @version 1.0
*/

#include <time.h>       // nanosleep() and struct timespec (POSIX)

#include <iostream>
#include <string>

#include "ThreadPool.hpp"

using namespace std;

int main(int argc, char **argv) {
    cout << "ThreadPool Example 3" << endl;
    // Lazy initialization of the main thread pool.  Set for 4 threads.
    std::shared_ptr<ThreadPool> pool = ThreadPool::getThreadPool(4);
    // Change the name of the thread pool.
    pool->setPoolName("Main");
    // Create 16 tasks
    for (int id = 0; id < 16; ++id) {
        // We use enqueue() to add tasks to execute to the work queue for the
        // thread pool.  C++ lambda expressions are an easy way to define the
        // task and then pass it to enqueue().
        pool->enqueue([pool, id] {
            // First thing in the task is to set the thread name.  This can
            // only be done in the running thread.
            SET_THREAD_NAME(pool, to_string(id));
            // Get the description of the thread (pool name : thread name) and
            // print that we are starting the execution of the task.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " starting");
            // Create another pool of four threads to do the work of this thread.
            // Set this pool's name to the thread description of the parent thread.
            ThreadPool subpool(ThreadPool::getThreadDescription(), 4);
            for (char c = 'a'; c <= 'z'; ++c) {
                subpool.enqueue([&subpool, c] {
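                    // Name the thread through the sub-pool (pool is not
                    // captured here) so the description is prefixed with
                    // the sub-pool's name, e.g. Main:0:a.
                    SET_THREAD_NAME((&subpool), string{c});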
                    ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " starting");
                    // Create a timespec struct for the requested time to wait in the
                    // thread.  We will wait 0.5 to 3.5 seconds based on the thread name.
                    struct timespec req = { .tv_sec = c % 4, .tv_nsec = 500000000 };
                    // Timespec struct for the remaining time if we get interrupted.
                    struct timespec rem;
                    // Use nanosleep() to simulate a workload.
                    if (nanosleep(&req, &rem) == -1) {
                        ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " interrupted!");
                    }
                    // Print that the thread is done.
                    ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " done");
                });
            }
            // Shutdown the subpool.
            subpool.shutdown();
            // Print that the thread is done.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " done");
        });
    }
    // Shutdown the thread pool which will wait till the work queue is emptied
    // and all worker threads quit.
    pool->shutdown();
}
% ./example3
ThreadPool Example 3
Thread Main:0 starting
Thread Main:1 starting
Thread Main:2 starting
Thread Main:3 starting
Thread Main:0:d starting
Thread Main:0:h starting
Thread Main:0:a starting
Thread Main:1:a starting
Thread Main:0:b starting
Thread Main:1:d starting
Thread Main:3:a starting
Thread Main:1:h starting
Thread Main:3:c starting
Thread Main:2:h starting
Thread Main:2:b starting
Thread Main:2:a starting
Thread Main:2:d starting
Thread Main:1:b starting
Thread Main:3:b starting
Thread Main:3:d starting
Thread Main:1:h done
Thread Main:1:p starting
Thread Main:0:d done
Thread Main:1:d done
Thread Main:0:p starting
Thread Main:1:z starting
Thread Main:0:h done
...
Thread Main:7:k done
Thread Main:7:j done
Thread Main:5:e done
Thread Main:5:k done
Thread Main:9:g done
Thread Main:9 done
Thread Main:8:f done
Thread Main:8 done
Thread Main:5:g done
Thread Main:5 done
Thread Main:7:g done
Thread Main:7 done

Example 4

In example4.cpp we look at the ability of the ThreadPool class to set task priorities. Priorities are integers (the default is 0), where larger values represent higher priority. Higher-priority tasks waiting in the queue are executed before lower-priority ones. The priority is passed as the second argument to enqueue(). In this example, the tasks are enqueued with ids of 0, 1, 2, etc. The id is also used as the priority, so tasks enqueued later have higher priority and are taken from the queue sooner.
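
The ordering itself comes from std::priority_queue. As a small sketch of how a "less than" comparator on the priority produces highest-first ordering, consider the following; DemoTask, ByPriority, and drain_order() are made-up names for illustration and are not part of the ThreadPool class.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Illustrative task record; only the priority matters for ordering here.
struct DemoTask {
    int priority;
    int id;
};

// "Less than" on priority makes std::priority_queue a max-heap, so top()
// is always a task with the largest priority value.
struct ByPriority {
    bool operator()(const DemoTask& a, const DemoTask& b) const {
        return a.priority < b.priority;
    }
};

// Push tasks with the given priorities (id = position in the input) and
// return the ids in the order a worker would pop them.
std::vector<int> drain_order(const std::vector<int>& priorities) {
    std::priority_queue<DemoTask, std::vector<DemoTask>, ByPriority> tasks;
    for (int id = 0; id < static_cast<int>(priorities.size()); ++id) {
        tasks.push(DemoTask{priorities[id], id});
    }
    std::vector<int> order;
    while (!tasks.empty()) {
        order.push_back(tasks.top().id);
        tasks.pop();
    }
    assert(order.size() == priorities.size());
    return order;
}
```

With priorities {0, 1, 2, 3} the ids come back as 3, 2, 1, 0. Note that std::priority_queue makes no promise about the relative order of tasks with equal priority, so a pool built this way is not strictly first-in, first-out within one priority level.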

example4.cpp
/**
* @file example4.cpp
* @brief Program to demonstrate the ThreadPool class.
*
* This program exercises the ThreadPool class with tasks that just sleep for a
* few seconds.  It gives tasks different priorities to demonstrate
* the priority queue feature of the ThreadPool.
*
* Compile with<br>
* Linux<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example4 src/example4.cpp src/ThreadPool.cpp

* macOS<br>
* CPPFLAGS=-Iinclude
* g++ $CPPFLAGS --std=gnu++20 -O3 -o example4 src/example4.cpp src/ThreadPool.cpp
*
* Execute:
* ./example4
*
* @author Richard Lesh
* @date 2025-06-28
* @version 1.0
*/

#include <time.h>       // nanosleep() and struct timespec (POSIX)

#include <iostream>

#include "ThreadPool.hpp"

using namespace std;

int main(int argc, char **argv) {
    cout << "ThreadPool Example 4" << endl;
    // Lazy initialization of the thread pool.  Default allocates threads
    // equal to the number of cores on the system.
    std::shared_ptr<ThreadPool> pool = ThreadPool::getThreadPool();
    // Change the name of the thread pool.
    pool->setPoolName("Main");
    // Create 100 tasks
    for (int id = 0; id < 100; ++id) {
        // We use enqueue() to add tasks to execute to the work queue for the
        // thread pool.  C++ lambda expressions are an easy way to define the
        // task and then pass it to enqueue().
        // Use the id as the priority level so that later tasks have higher
        // priority than earlier tasks.
        pool->enqueue([pool, id] {
            // First thing in the task is to set the thread name.  This can
            // only be done in the running thread.
            SET_THREAD_NAME(pool, to_string(id));
            // Get the description of the thread (pool name : thread name) and
            // print that we are starting the execution of the task.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " starting");
            // Create a timespec struct for the requested time to wait in the
            // thread.  We will wait 0.5 to 9.5 seconds based on the thread id.
            struct timespec req = { .tv_sec = id / 10, .tv_nsec = 500000000 };
            // Timespec struct for the remaining time if we get interrupted.
            struct timespec rem;
            // Use nanosleep() to simulate a workload.
            if (nanosleep(&req, &rem) == -1) {
                ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " interrupted!");
            }
            // Print that the thread is done.
            ThreadPool::safePrint("Thread " + ThreadPool::getThreadDescription() + " done");
        }, id);
    }
    // Shutdown the thread pool which will wait till the work queue is emptied
    // and all worker threads quit.
    pool->shutdown();
}
% ./example4
ThreadPool Example 4
Thread Main:3 starting
Thread Main:1 starting
Thread Main:0 starting
Thread Main:2 starting
Thread Main:8 starting
Thread Main:7 starting
Thread Main:6 starting
Thread Main:5 starting
Thread Main:4 starting
Thread Main:18 starting
Thread Main:34 starting
Thread Main:69 starting
Thread Main:3 done
Thread Main:4 done
Thread Main:99 starting
Thread Main:7 done
Thread Main:98 starting
Thread Main:97 starting
Thread Main:1 done
Thread Main:0 done
Thread Main:6 done
Thread Main:5 done
Thread Main:8 done
Thread Main:2 done
Thread Main:95 starting
Thread Main:96 starting
Thread Main:92 starting
Thread Main:91 starting
...
Thread Main:11 starting
Thread Main:10 starting
Thread Main:17 done
Thread Main:9 starting
Thread Main:25 done
Thread Main:15 done
Thread Main:24 done
Thread Main:23 done
Thread Main:22 done
Thread Main:9 done
Thread Main:21 done
Thread Main:20 done
Thread Main:13 done
Thread Main:10 done
Thread Main:12 done
Thread Main:11 done
ThreadPool.hpp
/**
* @class ThreadPool
* @brief Class to manage a pool of worker threads.
*
* This class is used to create a pool of worker threads and manage them.
* Tasks are assigned to a common queue that the workers process.
* 
* @author Richard Lesh
* @date 2025-06-10
* @version 1.0
*/

#ifndef THREAD_POOL_HPP
#define THREAD_POOL_HPP
#pragma once

#include <condition_variable>
#include <functional>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

#ifdef NDEBUG
#define SET_THREAD_NAME(pool, name)
#else
#define SET_THREAD_NAME(pool, name) (pool)->setThreadName(name)
#endif

class ThreadPool {
public:
  /**
  * @brief Constructor to create the thread pool and start its threads.
  *
  * @param name       The name of the pool.
  * @param numThreads The number of threads to create in the pool.
  *                   If 0 (the default), the number of processors reported
  *                   by the system is used, or 4 if that is unavailable.
  */
  ThreadPool(std::string name, unsigned int numThreads = 0);

  /**
  * @brief ThreadPool destructor
  *
  * Calls shutdown() on the thread pool if not already shutdown.
  */
  ~ThreadPool() { if (!stop) shutdown(); }

  /**
  * @brief Gets the main singleton ThreadPool.
  *
  * Lazy initialization of the thread pool.  Returns the singleton instance.
  * First call to getThreadPool should specify the number of threads or use the
  * default.  Argument is not used in subsequent calls.
  *
  * @param numThreads The number of threads to create in the pool.
  *                   If 0 (the default), the number of processors reported
  *                   by the system is used, or 4 if that is unavailable.
  */
  static std::shared_ptr<ThreadPool> getThreadPool(unsigned int numThreads = 0) {
    static std::shared_ptr<ThreadPool> primary;
    static std::once_flag initFlag;

    std::call_once(initFlag, [numThreads]() {
        primary = std::make_shared<ThreadPool>("Primary", numThreads);
    });

    return primary;
  }

  /**
  * @brief Add a new task to the pool work queue.
  *
  * Enqueues a task to be processed by the thread pool.  Takes a
  * std::function<void()> argument, which will typically be a lambda.
  * This method is thread-safe.
  *
  * @param f        Function argument containing task code.
  * @param priority Priority of the task.  Tasks with larger values are
  *                 taken from the queue first.  Default is 0.
  */
  void enqueue(std::function<void()> f, int priority = 0) {
    {
      std::unique_lock<std::mutex> lock(queueMutex);
      tasks.push(Task{priority, f});
    }
    condition.notify_one();
  }

  /**
  * @brief Sets the name of the pool.
  *
  * Name of the pool should be set right after creation of the pool.
  *
  * @param name The string to use as the pool name.
  */
  void setPoolName(const std::string& name) { poolName = name; }

  /**
  * @brief Sets the name of the current thread.
  *
  * Name of the thread can only be set in the task function.  The thread
  * description will be set as a concatenation of the pool name and
  * the thread name.
  *
  * For example:
  * ThreadPool::getThreadPool()->setThreadName(threadName)
  *
  * @param name The string to use as the thread name.
  */
  void setThreadName(const std::string& name) {
    ThreadPool::setThreadDescription(poolName + ":" + name);
  }

  /**
  * @brief Sets the current thread description.
  *
  * This can be called in the task function but for consistency use
  * setThreadName() instead.  This is the function that sets the name of the
  * thread at the OS level.
  *
  * @param desc The string to use as the thread description.
  */
  static void setThreadDescription(const std::string & desc);

  /**
  * @brief Gets the current thread description.
  *
  * Retrieves the name of the thread from the OS.
  */
  static std::string getThreadDescription();

  /**
  * @brief Terminates the thread pool.
  *
  * This method sends a shutdown request to all worker threads.
  * The pool will shutdown when the task queue is empty and all
  * worker threads have quit.
  */
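  void shutdown() {
    {
      // Set the flag while holding the queue mutex so a worker cannot miss
      // the notification between checking the flag and going to sleep.
      std::unique_lock<std::mutex> lock(queueMutex);
      stop = true;
    }
    condition.notify_all();
    for (auto& worker : workers) {
      // A second call to shutdown() finds the threads already joined.
      if (worker.joinable()) worker.join();
    }
  }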

  /**
  * @brief Thread-safe print to console method.
  *
  * Prints the message to cout in a thread-safe manner.
  *
  * @param msg The message to print.
  */
  static void safePrint(const std::string& msg) {
    static std::mutex mtx;
    std::lock_guard<std::mutex> lock(mtx);
    std::cout << msg << std::endl;
  }

private:
  struct Task {
    int priority;
    std::function<void()> func;

    Task() : priority(0), func([] {}) {}
    // Constructor for convenience
    Task(int p, std::function<void()> f) : priority(p), func(f) {}
  };

  // Custom comparator for the priority queue
  struct CompareTask {
    bool operator()(const Task& t1, const Task& t2) {
      return t1.priority < t2.priority;
    }
  };

  static std::shared_ptr<ThreadPool> primary;       // singleton thread pool
  std::string poolName = "";                        // thread pool name default is blank

  // Worker threads
  std::vector<std::thread> workers;                 // vector of worker threads in the pool

  // Task queue
  std::priority_queue<Task, std::vector<Task>, CompareTask> tasks; // queue of tasks to complete

  // Synchronization
  std::mutex queueMutex;                            // mutex for coordinating changes to the queue
  std::condition_variable condition;                // condition variable for sending notifications
  bool stop;                                        // Signals shutdown in progress if true
};
#endif
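
The lazy initialization in getThreadPool() relies on std::call_once. Here is a stripped-down sketch of the same pattern, using a hypothetical Settings struct in place of the pool so it can be compiled on its own; get_settings() is likewise an invented name.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a pool; it only remembers its thread count.
struct Settings {
    explicit Settings(unsigned n) : numThreads(n) {}
    unsigned numThreads;
};

// Lazily create one shared instance. std::call_once guarantees the lambda
// runs exactly once, even if several threads make the first call together.
std::shared_ptr<Settings> get_settings(unsigned numThreads = 0) {
    static std::shared_ptr<Settings> instance;
    static std::once_flag initFlag;
    std::call_once(initFlag, [numThreads] {
        instance = std::make_shared<Settings>(numThreads);
    });
    assert(instance != nullptr);
    return instance;
}
```

Calling get_settings(4) and then get_settings(8) returns the same object both times, and its numThreads stays at 4. Only the argument of the first call has any effect, which matches the note in the getThreadPool() documentation.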
ThreadPool.cpp
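#include "../include/ThreadPool.hpp"

#include <chrono>

#if defined(__APPLE__) || defined(__linux__)
#include <pthread.h>    // pthread_setname_np(), pthread_getname_np()
#endif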

std::shared_ptr<ThreadPool> ThreadPool::primary = nullptr;

#ifdef _WIN32
#include <windows.h>

std::string ConvertPWSTRtoUTF8(PWSTR pwstr) {
  if (!pwstr) return {};

  int size_needed = WideCharToMultiByte(
      CP_UTF8,              // Convert to UTF-8
      0,                    // No special flags
      pwstr,                // Source wide string
      -1,                   // Null-terminated
      nullptr,              // No output buffer yet
      0,                    // Request buffer size
      nullptr, nullptr      // Default chars
  );

  if (size_needed <= 0) return {};

  std::string result(size_needed, 0);
  WideCharToMultiByte(CP_UTF8, 0, pwstr, -1, &result[0], size_needed, nullptr, nullptr);

  // Remove trailing null character added by WideCharToMultiByte
  result.pop_back();
  return result;
}

PWSTR ConvertUTF8ToPWSTR(const std::string& utf8)
{
  if (utf8.empty()) return nullptr;

  int wide_len = MultiByteToWideChar(
      CP_UTF8, 0,
      utf8.c_str(), -1,  // null-terminated input
      nullptr, 0
  );

  if (wide_len == 0) return nullptr;

  PWSTR pwstr = static_cast<PWSTR>(LocalAlloc(LMEM_FIXED, wide_len * sizeof(wchar_t)));
  if (!pwstr) return nullptr;

  int result = MultiByteToWideChar(
      CP_UTF8, 0,
      utf8.c_str(), -1,
      pwstr, wide_len
  );

  if (result == 0) {
    LocalFree(pwstr);
    return nullptr;
  }

  return pwstr;
}
#endif

ThreadPool::ThreadPool(std::string name, unsigned int numThreads) : poolName(name), stop(false) {
  if (numThreads == 0) {
    // Get the number of available processors
    auto hc = std::thread::hardware_concurrency();
    numThreads = hc ? hc : 4;
  }
  for (size_t threadId = 0; threadId < numThreads; ++threadId) {
    auto workerTask = ([this, threadId] {
      SET_THREAD_NAME(this, std::to_string(threadId) + "(Starting)");
      while (true) {
        ThreadPool::Task task;
        {
          std::unique_lock<std::mutex> lock(this->queueMutex);
          this->condition.wait(lock, [this] { return this->stop || !this->tasks.empty(); });
          if (this->stop && this->tasks.empty()) return;
          if (this->tasks.empty()) {
            lock.unlock();
            continue;
          }
          task = this->tasks.top();
          this->tasks.pop();
        }
        auto start = std::chrono::high_resolution_clock::now();
        task.func();
        auto end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> duration = end - start;
        SET_THREAD_NAME(this, std::to_string(threadId) + "(Parked)");
      }
    });
    workers.emplace_back(workerTask);
  }
}

void ThreadPool::setThreadDescription(const std::string& s) {
#ifdef _WIN32
  PWSTR pcwstr = ConvertUTF8ToPWSTR(s);
  if (pcwstr) {
    SetThreadDescription(GetCurrentThread(), pcwstr);
    LocalFree(pcwstr);
  }

#elif defined(__APPLE__) || defined(__linux__)
  // pthread_setname_np limits: Linux (16 chars), macOS (64 chars),
  // both counting the terminating null character.
#ifdef __APPLE__
  constexpr size_t MAX_NAME_LEN = 63;
#else
  constexpr size_t MAX_NAME_LEN = 15;
#endif

  std::string truncated = s.substr(0, MAX_NAME_LEN);

#ifdef __APPLE__
  // macOS requires calling pthread_setname_np from within the thread
  pthread_setname_np(truncated.c_str());
#else
  pthread_setname_np(pthread_self(), truncated.c_str());
#endif
#endif
}

std::string ThreadPool::getThreadDescription() {
  std::string s;

#ifdef _WIN32
  PWSTR pwstr = nullptr;
  if (SUCCEEDED(GetThreadDescription(GetCurrentThread(), &pwstr)) && pwstr) {
    s = ConvertPWSTRtoUTF8(pwstr);
    LocalFree(pwstr);
  }
#elif defined(__APPLE__) || defined(__linux__)
  char name_buf[64] = {0};
  int err = pthread_getname_np(pthread_self(), name_buf, sizeof(name_buf));
  if (err == 0) {
    s = std::string(name_buf);
  }
#endif

  return s;
}
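
To see the core of the worker loop without the naming and priority features, here is a minimal sketch of a pool built from a mutex, a condition variable, and a FIFO queue. MiniPool and count_with_pool() are hypothetical names for this illustration; this is a simplified cousin of the ThreadPool class above, not a replacement for it.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stripped-down pool: FIFO queue, no names, no priorities.
class MiniPool {
public:
    explicit MiniPool(unsigned numThreads) {
        assert(numThreads > 0);
        for (unsigned i = 0; i < numThreads; ++i) {
            workers.emplace_back([this] { run(); });
        }
    }
    ~MiniPool() { shutdown(); }

    void enqueue(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            tasks.push(std::move(f));
        }
        cv.notify_one();
    }

    // Let the workers drain the queue, then join them. Safe to call twice.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stop = true;
        }
        cv.notify_all();
        for (auto& w : workers) {
            if (w.joinable()) w.join();
        }
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [this] { return stop || !tasks.empty(); });
                if (tasks.empty()) return;  // only reachable once stop is set
                task = std::move(tasks.front());
                tasks.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mtx;
    std::condition_variable cv;
    std::queue<std::function<void()>> tasks;
    std::vector<std::thread> workers;
    bool stop = false;
};

// Run `numTasks` trivial tasks on `numThreads` workers and return how many ran.
int count_with_pool(unsigned numThreads, int numTasks) {
    std::atomic<int> counter{0};
    MiniPool pool(numThreads);
    for (int i = 0; i < numTasks; ++i) {
        pool.enqueue([&counter] { ++counter; });
    }
    pool.shutdown();  // returns only after every queued task has run
    return counter.load();
}
```

Two details are worth pointing out. The stop flag is changed only while the mutex is held, so a worker that has just evaluated the wait predicate cannot miss the wake-up. And each task runs after the lock is released, so a long task never blocks other workers from taking work off the queue.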

See the post on the OutputCollator class to help organize output from multiple threads.