Computing Prime Numbers

The fastest way to compute prime numbers between 2 and \(n\) where \(n < 10^{18}\), is to use an algorithm called the Sieve of Eratosthenes. Basically the way this algorithm works is to start by assuming all numbers in a range, say 2 to 20 are prime. If we list those numbers we have:

Then we start with the first number 2. It is the smallest number that doesn't have any prime factors and we will see can be the only even prime number. Circle it to identify it as a prime number.

Then proceed to cross off all the multiples of 2 starting with \(2^2\). Note that by definition an even number is evenly divisible by 2 meaning all even numbers larger than 2 are composite with a factor of 2. We just eliminated half the numbers as being composite!

Next find the lowest number greater than 2 that hasn't been crossed off. This will be the next prime number found. In our example this will be 3 so circle it.

Again cross off the multiples of 3 starting with \(3^2\), i.e. 9, 12, 15 and 18. Half of these will already be crossed off because they are also multiples of our first prime number 2, i.e. even numbers.

Now we scan up from 3 looking for the next prime number. We skip 4 because it is crossed off as a composite number, then we land on 5 as the next prime so we circle it.

We now cross off multiples of 5 starting with \(5^2\) which is 25. Since 25 is larger than the maximum number in the sieve we can stop looking for composites. All the remaining un-crossed numbers are prime so we circle them.

With this algorithm in our toolbox we can write a C++ program that can compute the prime numbers between 2 and any maximum value that we desire.

A couple interesting things about this program:

  1. The command line argument that we supply to the program is the number of primes, \(n\) that we want to find. To do this we need an estimate of the size of the sieve that will be needed to find that many primes (plus a few more). In other words we need an estimate of the \(nth\) prime number \(p_n\). A good estimate function for large \(n\) is provided by Gourdon's formula.
    \[p_n \approx n (\ln n + \ln \ln n - 1)\]
  2. We only need a list of booleans to represent the list of numbers that will be marked as primes (true) or composites (false). The index into the list will serve as the value that is either prime or composite. C++ has a container type vector<bool> that is ideal for this situation. Not only is it a well designed STL (Standard Template Library) class but it is specifically optimized for the bool case so that it stores the bools in individual bits of the vector instead of char or int types. This makes it very efficient in terms of memory needed to store the vector.
  3. The actual computation of the primes using the sieve technique is actually extremely fast. Printing out the list of primes found takes much more time.
  4. The number of primes you can find is limited by the amount of memory available on your computer.
  5. The computational complexity of this computation is \(O(n \log \log n)\). For sieves up to \(10^{21}\) (1 sextillion) the \(\log \log n\) factor is less than 4 which is negligible compared to the other factor \(n\) for large \(n\) making the complexity very close to \(O(n)\) or linear. So basically computing 10x more primes is going to take 10x times longer or 10x more CPUs. The following graph illustrates this relationship of compute time to the number of primes computed.
Computation Time

The best curve fit is a power function represented in the top left corner of the graph. As you can see the exponent on the x-axis \(n\) is nearly 1.0 making the function very close to a line.

To compute really large numbers of primes, you need a compute cluster and lots of time. I built a small Raspberry Pi 5 Compute Cluster to try out cluster computing techniques. See my Prime Number Project for the results.

sieve.cpp
/**
 * @file sieve.cpp
 * @brief Program to compute prime numbers.
 * 
 * This program computes prime numbers using the Sieve of Eratosthenes method.
 * 
 * Compile with command: g++ --std=c++11 -O3 sieve.cpp -o sieve
 *
 * @author Richard Lesh
 * @date 2025-02-05
 * @version 1.0
 */

#include <chrono>
#include <cmath>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <vector>

using namespace std;

/**
 * @brief Function to return an upper estimate of the nth prime.
 * 
 * Function returns an upper bound estimate of the nth prime.
 * 
 * @param n Which prime to estimate.
 * @return A value that is larger than the nth prime.
 */

 long estimate_nth_prime(int n) {
    if (n < 1) throw invalid_argument("estimate_nth_prime() argument less than 1");
    if (n <= 27) {                                                      // Small n correction
        return 103;
    }
    double estimate = n * (std::log(n) + std::log(std::log(n)) - 1);    // Gourdon's formula
    estimate *= 1.1;                        // Gourdon's formula underestimates so add some padding
    return static_cast<long>(std::round(estimate));
}

/**
 * @brief Function to compute prime numbers.
 * 
 * Function to compute primes using Sieve of Eratosthenes.
 * 
 * @param limit The number of primes to find.
 * @return vector<long> Returns a list of primes found.
 */
vector<long> compute_primes(size_t limit) {
    size_t sieve_size = estimate_nth_prime(limit) + 1;      // Estimate upper bound for nth prime
    vector<bool> is_prime(sieve_size, true);                // bool vector to hold prime status
    vector<long> primes;                                    // result list of primes
    is_prime[0] = is_prime[1] = false;                      // 0 and 1 are not prime by definition

    size_t max = sqrt(sieve_size);                          // Only need to look at primes < sqrt(sieve_size)
    for (long p = 2; p <= max; ++p) {                       // Start candidate prime with first prime 2 
        if (is_prime[p]) {                                  // if the candidate is not prime we skip
            primes.push_back(p);                            // add the candidate to result list
            // Now eliminate all multiples of the prime found as composite.
            // We start with the square of the prime as all composites less than p * p
            // with factor p would already been found because other factor would be < p.
            for (long multiple = p * p; multiple < sieve_size; multiple += p) {
                is_prime[multiple] = false;
            }
        }
    }

    for (long p = max + 1; p < sieve_size; ++p) {           // find the remaining primes
        if (is_prime[p]) {                                  // if the candidate is not prime we skip
            primes.push_back(p);                            // add the candidate to result list
            if (primes.size() >= limit) break;              // end if we have found the required number of primes
        }
    }
    return primes;
}

int main(int argc, char** argv) {
    size_t num_primes = atol(argv[1]);
    auto start = chrono::high_resolution_clock::now();
    vector<long> primes = compute_primes(num_primes);
    auto stop = chrono::high_resolution_clock::now();
    // Compute the duration in seconds
    auto duration = chrono::duration_cast<chrono::nanoseconds>(stop - start).count() / 1.e9;
    for (long p : primes) {
        cout << p << endl;
    }

    cout << "Computed " << primes.size() << " primes in " << setprecision(6) << duration << " seconds.\n";
    return 0;
}