Distributed Prime Number Computation

For computing lots of prime numbers, the Sieve of Eratosthenes is the best algorithm. Unfortunately the amount of memory increases linearly as the number of desired primes increases. For large number of primes this approach becomes infeasible. e.g. \(10^{12}\) or higher.

Instead we need to use a modified version of the Sieve of Eratosthenes call a Segmented Sieve that only works on smaller manageable chunks, computing each segment separately. Not only does this help us with memory considerations but also allows us to compute chunks in parallel on a compute cluster like a Raspberry Pi 5 Cluster.

The idea behind the Segmented Sieve is that you compute the primes in multiple smaller ranges \([L, R]\) that can fit into memory. So if we want to compute all the primes in a large range \([2, N]\) the steps are:

  1. Divide the range \([2, N]\) into smaller blocks, \([L, R]\), of size \(\sqrt{N}\) to search.
  2. Use Sieve Eratosthenes to find all primes up to  \(\sqrt{N}\). This is the solution for the first block to search.
  3. Allocate storage for prime/composite status on range \([L, R]\).
  4. Mark all multiples of primes 2 to \(\sqrt{R}\) found in step 2 on the range \([L, R]\).
  5. Store the primes found in the range \([L, R]\).
  6. Repeat steps 3 - 5 for each block until all blocks are processed.
segmented_sieve.cpp
/**
 * @file segmented_sieve.cpp
 * @brief Program to compute prime numbers in a range.
 * 
 * This program computes prime numbers using the Segmented Sieve 
 * of Eratosthenes method.  Requires a text file of seed primes that 
 * contains the primes 2 through at least sqrt(stop), one per line.
 * If output_file is not supplied, only the count of primes found in
 * the range is printed.
 *
 * Syntax: semented_sieve <prime_seed_file> <start> <stop> [output_file]
 * 
 * Compile with command: g++ --std=c++11 -O3 segmented_sieve.cpp -o segmented_sieve
 *
 * @author Richard Lesh
 * @date 2025-02-05
 * @version 1.0
 */

#include <algorithm>
#include <exception>
#include <fstream>
#include <iostream>
#include <vector>

using namespace std;

/**
 * @brief Function to compute primes using Segmented Sieve of Eratosthenes
 * 
 * Calculates all primes between start and stop.
 * 
 * @param prime_file Text file with seed primes, 2 - sqrt(stop)
 * @param start Start of prime search range.
 * @param stop End of prime search range.
 * @param output_file Pathspec of text file to store results.
 * @return size_t Number of primes found.
 */
 size_t compute_primes(string prime_file, long start, long stop, string output_file) {
    long original_start = start;
    if (start < 3) start = 3;
    if (start % 2 == 0) ++start;                        // Always start with an odd number
    if (stop % 2 == 1) ++stop;                          // Always end on an even number
    size_t sieveSize = (stop - start + 1);              // Number of candidates in range is even
    sieveSize >>= 1;                                    // We won't store even numbers in the sieve

    std::vector<bool> is_prime(sieveSize, true);        // Mark all candidates as prime

    std::ifstream file(prime_file);                     // Open the seed primes file
    
    if (!file) {
        throw runtime_error("Error: Could not open seed file " + prime_file);
    }
    
    std::string line;
    std::getline(file, line);                           // Skip the first prime 2
    while (std::getline(file, line)) { 
        long prime = atol(line.c_str());
        if (prime * prime > stop) break;                // Stop when the prime > sqrt(stop)
        long offset = start % prime;                    // Compute location of first multiple
        if (offset != 0) offset = prime - offset;       //  of prime greater than start
        long first = max(prime * prime, start + offset);
        for (size_t p = first; p < stop; p += prime) {
            if (p % 2 == 0) continue;                   // Its a multiple of 2 so skip
            size_t multiple = p - start;
            multiple = multiple >> 1;
            is_prime[multiple] = false;                 // Mark as composite
        }
    }
    if (file.eof())
        throw runtime_error("Error: Seed file doesn't have enough primes!");
    file.close();

    size_t count = 0;
    if (output_file.length() != 0) {                    // Save all primes if output file
        ofstream ofile(output_file);                    // Open the output file
        if (!ofile) {
            throw runtime_error("Error: Could not open output file " + output_file);
        }

        if (original_start <= 2) {                      // Output 2 if in range
            ofile << 2 << std::endl;
            ++count;
        }
        for (size_t i = 0; i < sieveSize; ++i) {        // Check all values in range
            if (is_prime[i]) {                          // Output if its prime
                ofile << ((i << 1) + start) << endl;
                ++count;
            }
        }
        ofile.close();
    } else {                                            // Otherwise just count primes
        if (original_start <= 2) {
            ++count;
        }
        for (size_t i = 0; i < sieveSize; ++i) {
            if (is_prime[i]) {
                ++count;
            }
        }
    }
    return count;
}

int main(int argc, char** argv) {
    if (argc != 4 && argc != 5) {
        cout << "Syntax: " << argv[0] << " <prime_seed_file> <start> <stop> [output_file]" << endl;
        exit(1);
    }
    string seed_file = argv[1];
    long start = stoul(argv[2]);
    long stop = stoul(argv[3]);
    string output_file = argc == 5 ? argv[4] : "";

    size_t primes = compute_primes(seed_file, start, stop, output_file);

    cout << primes << endl;

    return 0;
}

The next component we need to compute large number of prime numbers is an Open MPI program do distribute the work to a cluster.

mpi_sieve.cpp
/**
 * @file mpi_sieve.cpp
 * @brief Distributed program to compute prime numbers.
 * 
 * This program computes prime numbers using the Segmented Sieve 
 * of Eratosthenes method.  Requires a text file of seed primes that 
 * contains the primes, one per line, 2 through at least square root
 *  of maximum value to check.
 * If output_file is not supplied, only the count of primes found in
 * the range is printed.
 *
 * Syntax: mpi_primes <start_task_id> <task_count> <task_size> [file_prefix]
 * start_task_id    ID of the task to start with, minimum 1
 * task_count       Number of tasks to compute
 * task_size        Number of integers in the task to check.
 * file_prefix      Pathspec + file prefix to use for output files.  Task ID 
 *                      will be added to the file prefix.  If not supplied
 *                      the number of primes in the range will be computed
 *                      and printed without storing the primes found.
 *
 * The first number to check will be (start_task_id - 1) * task_size.
 * The last number to check will be (start_task_id + task_count - 1) * task_size.
 * 
 * Compile with command: mpic++ --std=c++20 -o mpi_sieve mpi_sieve.cpp
 *
 * @author Richard Lesh
 * @date 2025-02-05
 * @version 1.0
 */

#include <algorithm>
#include <array>
#include <cstring>
#include <iostream>
#include <filesystem>
#include <fstream>
#include <math.h>
#include <mpi.h>
#include <vector>
#include <map>
#include <unistd.h>

using namespace std;

const int MASTER = 0;
const int TAG_TASK = 1;
const int TAG_RESULT = 2;
const int TAG_TERMINATE = 3;

/**
 * @brief Function to get the host's name.
 * 
 * This function returns the hostname.
 * 
 * @return string Returns the hostname of the host if possble otherwise it returns
 *    the empty string.
 */
 string get_hostname() {
    char hostname[256]; // Buffer to store the hostname
    memset(hostname, 0, sizeof(hostname)); // Clear the buffer

    // Get the hostname
    if (gethostname(hostname, sizeof(hostname)) == 0) {
        return string(hostname);
    } else {
        std::cerr << "Error getting hostname" << std::endl;
        return string();
    }
}

/**
 * @brief Function to get execute a child task.
 * 
 * This function executes the command passed and returns the output of the command.
 * 
 * @param command The command to execute.
 * @return string Returns the output of the command.
 */
 std::string executeCommand(const std::string& command) {
    std::array<char, 128> buffer;
    std::string result;
    
    // Open a pipe to run the command
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) {
        throw std::runtime_error("popen() failed!");
    }

    // Read the output from the child process
    while (fgets(buffer.data(), buffer.size(), pipe) != nullptr) {
        result += buffer.data();
    }

    // Close the pipe
    pclose(pipe);
    return result;
}

// This is the structure that contains all the data to send to the worker
struct Task {
	long task_id;
	long task_size;
};

/**
 * @brief Task processing function.
 * 
 * This function implements the processing for each worker node.  The worker
 * will find all the primes in the range (task.task_id - 1) * task.task_size
 * to (task.task_id + task.task_size - 1) * task.task_size.
 * 
 * It will skip the computation of primes if the output file already exists and
 * will return "Skipping <filename>..." in this case.
 *
 * Primes are found by calling the OpenMPI/segmented_sieve program and uses the 
 * file OpenMPI/seed_primes_3G.txt as the prime seed file.
 *
 * @param task Task structure identifying the range of integers to check for primality.
 * @param file_prefix The pathspec + file prefix to use when writing out the primes.
 * @return double Returns the result of the call to segmented_sieve which on
 *                success is the number of primes found.
 */
 string process_task(Task &task, string file_prefix) {
    string output_file = file_prefix + "_" + to_string(task.task_id) + ".txt";
    if (file_prefix.length() != 0 && filesystem::exists(output_file)) return "Skipping " + output_file + "...";

	long base = (task.task_id - 1) * task.task_size;
	string cmd = string("~/OpenMPI/segmented_sieve ~/OpenMPI/seed_primes_3G.txt ") + to_string(base) + " " + 
        to_string(base + task.task_size) + " " + (file_prefix.length() == 0 ? "" : output_file);
	string result = executeCommand(cmd);
	return result;
}

/**
 * @brief Main function.
 * 
 * The entry point of the program.  It handles forking between the master
 * and worker functionality.
 * 
 * @return int Returns 0 on successful execution.
 */
 int main(int argc, char **argv) {
    if (argc !=4 && argc != 5) {
        cout << "Syntax: mpi_primes <start_task_id> <task_count> <task_size> [file_prefix]\n";
        return 0;
    }

    // Parse command-line arguments
    string filename = argc == 5 ? argv[4] : "";
    const long task_start = stol(argv[1]) - 1;
    const long task_max = stol(argv[2]);
	const long task_size = stol(argv[3]);
	MPI_Init(&argc, &argv);

	int rank, size;
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);
	MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == MASTER) {
        int num_workers = size - 1;
		long task_count = 0;
        int active_workers = 0;

        cout << "Master: Distributing tasks to " << num_workers << " workers..." << endl;

        // Distribute initial tasks to workers
        for (long i = 1; i <= min((long)num_workers, task_max); ++i) {
			++task_count;
            long task_id = task_count + task_start;
            MPI_Send(&task_id, 1, MPI_LONG, (int)i, TAG_TASK, MPI_COMM_WORLD);
			++active_workers;
        }
		// terminate unused workers
		if (num_workers > task_max) {
			for (long i = task_max + 1; i <= num_workers; ++i) {
				long dummy = 0;
				MPI_Send(&dummy, 0, MPI_LONG, (int)i, TAG_TERMINATE, MPI_COMM_WORLD);
			}
		}

        long count = 0;
        // Collect results and assign remaining tasks
        while (task_count < task_max || active_workers > 0) {
            MPI_Status status;
			long task_result;
            MPI_Recv(&task_result, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
            count += task_result;

            // Send a new task to the worker, or terminate if no tasks are left
            if (task_count < task_max) {
				++task_count;
                long task_id = task_count + task_start;
				MPI_Send(&task_id, 1, MPI_LONG, status.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
            } else {
				long dummy = 0;
                MPI_Send(&dummy, 0, MPI_LONG, status.MPI_SOURCE, TAG_TERMINATE, MPI_COMM_WORLD);
                active_workers--;
            }
        }
		cout << count << endl;
    } else {
        // Worker process
        string hostname = get_hostname();
        while (true) {
            Task task;
			long task_id;
            MPI_Status status;

            // Receive a task or termination signal from the master
            MPI_Recv(&task_id, 1, MPI_LONG, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

            if (status.MPI_TAG == TAG_TERMINATE) {
                break;
            }

            // Process the task
			task.task_id = task_id;
			task.task_size = task_size;
            cout << "Worker " << rank << "(" << hostname << "): Accepted task " << task.task_id << endl;
			string result = process_task(task, filename);
            cout << "Worker " << rank << "(" << hostname << "): Result " << result;
            long result_long = 0;
            try {
                if (!result.starts_with("Skipping"))
                    result_long = stol(result);
            } catch (const std::invalid_argument& e) {
                std::cerr << "Worker result invalid argument: " << result << " -> " << e.what() << std::endl;
            } catch (const std::out_of_range& e) {
                std::cerr << "Worker result out-of-range: " << result << " -> " << e.what() << std::endl;
            }
            // Send the result back to the master
            MPI_Send(&result_long, 1, MPI_LONG, MASTER, TAG_RESULT, MPI_COMM_WORLD);
        }
	}
    MPI_Finalize();
	return 0;
}

Here is a sample run that computes the number of primes less than \(10^{13}\)...

rich@balrog1:~ $ OpenMPI/run.sh mpi_sieve 1 10000 1000000000
Master: Distributing tasks to 31 workers...
Worker 1(balrog1): Accepted task 1
Worker 2(balrog1): Accepted task 2
Worker 3(balrog1): Accepted task 3
Worker 4(balrog2): Accepted task 4
Worker 5(balrog2): Accepted task 5
Worker 6(balrog2): Accepted task 6
Worker 7(balrog2): Accepted task 7
...
Worker 26(balrog6): Result 33404233
Worker 25(balrog6): Result 33404623
Worker 20(balrog5): Result 33405656
Worker 27(balrog6): Result 33408101
Worker 23(balrog5): Result 33404306
346065536839