Image credit: Cormullion, Julia code here.
π (Pi) is the mathematical constant that represents the ratio of a circle’s circumference to its diameter.
\[\pi = \frac{\text{circumference}}{\text{diameter}}\]
Its approximate value is: \(\pi \approx 3.14159265358979323846\ldots\)
Properties of π
- Irrational: π cannot be expressed as a fraction of two integers.
- Transcendental: π is not the root of any polynomial equation with rational coefficients.
- Infinite Digits: π never terminates or repeats in decimal form.
- Ubiquitous in Mathematics: Found in geometry, trigonometry, calculus, physics, and statistics.
One of the best ways to test the performance of a new computer is to compute π to a large number of digits. The most efficient way to compute digits of π is to use the Chudnovsky algorithm developed in 1987 by David and Gregory Chudnovsky.
\[\frac{1}{\pi} = 12 \sum_{k=0}^{\infty} \frac{(-1)^k (6k)! (545140134k + 13591409)} {(3k)! (k!)^3 (640320)^{3k + 3/2}}\]
Each term adds around 14 decimal digits of π, making it the fastest known series for computing π.
Note: From a practical perspective, what would be the maximum number of digits ever needed? To compute the circumference of the universe in the smallest possible units of Angstroms, we need to know π to about 39 digits.
Computing π
To compute π to a larger number of digits we need a multi-precision math library. The GNU C++ compiler has a multi-precision integer library called GMP (GNU Multiple Precision) and a multi-precision floating point library based on GMP called MPFR (Multiple Precision Floating-Point Reliable). Both of these use C-style function calls. GMP also has a C++ wrapper interface that is easier to use. MPFR doesn't come with a similar wrapper class for C++ but there is one available called mpreal.h that you can download and use with MPFR.
The following program uses MPFR and OpenMP to distribute the computation of terms across multiple threads. Each term is computed and added to a thread local sum that will later combined into a total sum when all threads are complete.
pi_chudnovsky.cpp
/** * @file pi_chudnovsky.cpp * @brief Program to compute π. * * This program can compute π using the Chudnovsky algorithm. * * Syntax: pi_chudnovsky_fast <num_digits> * * get mpreal.h at https://github.com/advanpix/mpreal.git * * Compile with command: * MacOS: Apple Clang doesn't support OpenMP but homebrew does * put mpreal.h in /opt/homebrew/include * brew install gcc * brew install gmp * brew install mpfr * g++-14 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky pi_chudnovsky.cpp -lmpfr -lgmp -fopenmp * * Linux * put mpreal.h in /urs/local/include * sudo apt install libgmp-dev * sudo apt install libmpfr-dev * g++ -o pi_chudnovsky pi_chudnovsky.cpp -lmpfr -lgmp -fopenmp * * @author Richard Lesh * @date 2025-03-16 * @version 1.0 */ #include <gmpxx.h> // GMP C++ Wrapper #include <iomanip> #include <iostream> #include <mpreal.h> // MPFR C++ Wrapper #include <omp.h> // OpenMP #include <thread> int DIGITS = 50; int PRECISION; int NUM_TERMS; int NUM_THREADS = 4; // Adjust based on CPU cores using namespace mpfr; using namespace std; /** * @brief Compute π. * * Function to compute π using the Chudnovsky algorithm. * * @return mpreal A multi-precision value for π. */ mpreal compute_pi_chudnovsky() { mpreal::set_default_prec(PRECISION); // Set default precision in bits // Thread local storage for the partial sums computed in that thread mpreal partial_sums[NUM_THREADS]; // Parallel computation of series sum #pragma omp parallel for num_threads(NUM_THREADS) schedule(dynamic) for (int k = 0; k < NUM_TERMS; k++) { // Set default precision in bits for each thread mpreal::set_default_prec(PRECISION); int tid = omp_get_thread_num(); // Local GMP variables for factorials mpz_class k_fact, three_k_fact, six_k_fact; // Compute factorials mpz_fac_ui(k_fact.get_mpz_t(), k); mpz_fac_ui(three_k_fact.get_mpz_t(), 3 * k); mpz_fac_ui(six_k_fact.get_mpz_t(), 6 * k); // Compute numerator: +- (6k)! * (545140134 * k + 13591409) mpz_class numerator = six_k_fact * (mpz_class(545140134) * k + mpz_class(13591409)); // Apply alternating sign (-1)^k if (k % 2 != 0) mpz_neg(numerator.get_mpz_t(), numerator.get_mpz_t()); ; // Compute denominator: (3k)! * (k!)^3 * (640320^(3k+3/2)) mpreal denominator = pow(mpreal(640320), 3.0 * k + 1.5); denominator = denominator * mpz_class(three_k_fact * k_fact * k_fact * k_fact).get_mpz_t(); // Ensure denominator is never zero if (denominator == 0) { std::cerr << "Error: Division by zero at k = " << k << std::endl; exit(1); } // Compute term: numerator / denominator mpreal term = numerator.get_mpz_t() / denominator; // Accumulate thread-local sum partial_sums[tid] += term; } mpreal sum(0); // Reduce all thread-local sums into a single sum for (int i = 0; i < NUM_THREADS; i++) { sum += partial_sums[i]; } // Compute final pi: pi = 1 / (sum * 12) mpreal pi = 1 / (sum * 12); return pi; } /** * @brief Main function. * * The entry point of the program. It handles command-line arguments * setting up global settings and timing the computation. * * @return int Returns 0 on successful execution. */ int main(int argc, char **argv) { if (argc > 1) DIGITS = atoi(argv[1]); PRECISION = (DIGITS * 4); // Precision in bits (~3.3 decimal digits per bit) NUM_TERMS = (DIGITS / 14 + 2); // each additional term contributes about 14.18 decimal digits to π if (std::thread::hardware_concurrency() != 0) NUM_THREADS = std::thread::hardware_concurrency(); double start_time = omp_get_wtime(); mpreal pi = compute_pi_chudnovsky(); double end_time = omp_get_wtime(); cout << setprecision(DIGITS) << pi << endl; printf("Computation Time: %f seconds\n", end_time - start_time); return 0; }
We can make this program significantly faster by avoiding the computation of all the factorials and powers from scratch for each term in the series. If we can figure out how one term in the series relates to a previous term we can reduce the number of multiplies and divides. So assume that there is a factor ƒ such that:
\[t_{k + n} = ƒ_{k,n} \cdot t_k\]
where \(t_k\) is the kth term in the series and \(t_{k + n}\) is a later term in the series n terms beyond \(t_k\).
So this factor \(ƒ_{k,n}\) is:
\[ƒ_{k,n} = \frac{t_{k + n}}{t_k}\]
if we substitute
\[t_{k + n} = \frac{(6k+n)! \cdot (545140134(k+n)+13591409)}{(3(k+n))! \cdot ((k+n)!)^3 \cdot (640320)^{3(k+n)+1.5}}\]
and
\[t_k = \frac{(6k)! \cdot (545140134k+13591409)}{(3k)! \cdot (k!)^3 \cdot (640320)^{3k+1.5}}\]
we get after simplifying
\[ƒ_{k,n} = \frac{\prod_{i=1}^{n} (6k+i)}{\prod_{i=1}^{n} (3k+i) \cdot \left( \prod_{i=1}^{n} (k+i) \right)^3} \cdot \left(1 + \frac{545140134n}{545140134k+13591409}\right) \cdot \frac{1}{(640320)^{3n}}\]
where \(\prod_{i=1}^{n}x\) represents the factorial ratio of \(\frac{(x+n)!}{x!}\).
\[\prod_{i=1}^{n}x = \frac{(x+n)!}{x!} = (x + 1)\cdot(x + 2) \dots (x +n)\]
Notice that by computing factorial ratios we only multiply the last few factors of a factorial instead of multiplying all the factors that it takes to compute individual factorials. This saves us a lot of time. So our strategy is to use the full Chudnovsky formula for the first term, then simply calculate the factor \(ƒ_{k,n}\) for successive terms so that we can multiply the factor by the previous term that we have already computed.
pi_chudnovsky_fast.cpp
/** * @file pi_chudnovsky_fast.cpp * @brief Program to compute π. * * This program can compute π using the Chudnovsky algorithm. * * Syntax: pi_chudnovsky_fast <num_digits> * * get mpreal.h at https://github.com/advanpix/mpreal.git * * Compile with command: * MacOS: Apple Clang doesn't support OpenMP but homebrew does * put mpreal.h in /opt/homebrew/include * brew install gcc * brew install gmp * brew install mpfr * g++-14 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky_fast pi_chudnovsky_fast.cpp -lmpfr -lgmp -fopenmp * * Linux * put mpreal.h in /urs/local/include * sudo apt install libgmp-dev * sudo apt install libmpfr-dev * g++ -o pi_chudnovsky_fast pi_chudnovsky_fast.cpp -lmpfr -lgmp -fopenmp * * @author Richard Lesh * @date 2025-03-16 * @version 1.0 */ #include <gmpxx.h> // GMP C++ Wrapper #include <iomanip> #include <iostream> #include <mpreal.h> // MPFR C++ Wrapper #include <omp.h> #include <thread> int DIGITS = 50; int PRECISION; int NUM_TERMS; int NUM_THREADS = 4; // Adjust based on CPU cores using namespace mpfr; using namespace std; /** * @brief Compute factorial ratio. * * Function to compute the ratio of two factorials. * Efficiently computes num_n! / denom_n! * * @param num_n Larger of the two factorial n values * @param denom_n Smaller of the two factorial n values * @return mpreal A multi-precision value for factorial ratio. */ mpz_class factorial_ratio(long num_n, long denom_n) { mpz_class result = 1; for (long i = denom_n + 1; i <= num_n; ++i) result *= i; return result; } /** * @brief Compute π. * * Function to compute π using the Chudnovsky algorithm. * Optimized to avoid computing terms from scratch each loop. * * @return mpreal A multi-precision value for π. */ mpreal compute_pi_chudnovsky() { mpreal::set_default_prec(PRECISION); // Set default precision in bits // thread local values mpreal partial_sums[NUM_THREADS]; mpreal term[NUM_THREADS]; long previous_k[NUM_THREADS]; // Initialzed to 0 in C++11 // Parallel computation of series sum #pragma omp parallel for num_threads(NUM_THREADS) schedule(dynamic) for (long k = 0; k < NUM_TERMS; k++) { // Set default precision in bits for each thread mpreal::set_default_prec(PRECISION); int tid = omp_get_thread_num(); // If no previous term has been computed use full formula if (previous_k[tid] == 0) { // Local GMP variables for factorials mpz_class k_fact, three_k_fact, six_k_fact; // Compute factorials mpz_fac_ui(k_fact.get_mpz_t(), k); mpz_fac_ui(three_k_fact.get_mpz_t(), 3 * k); mpz_fac_ui(six_k_fact.get_mpz_t(), 6 * k); // Compute numerator: +- (6k)! * (545140134 * k + 13591409) mpz_class numerator = six_k_fact * (mpz_class(545140134) * k + mpz_class(13591409)); // Apply alternating sign (-1)^k if (k % 2 != 0) mpz_neg(numerator.get_mpz_t(), numerator.get_mpz_t()); ; // Compute denominator: (3k)! * (k!)^3 * (640320^(3k+3/2)) mpreal denominator = pow(mpreal(640320), 3.0 * k + 1.5); denominator = denominator * mpz_class(three_k_fact * k_fact * k_fact * k_fact).get_mpz_t(); // Ensure denominator is never zero if (denominator == 0) { std::cerr << "Error: Division by zero at k = " << k << std::endl; exit(0); } // Compute term: numerator / denominator term[tid] = numerator.get_mpz_t() / denominator; } else { // otherwise use t(k+n) = f(k,n) * t(k) // n = k - previous_k long n = k - previous_k[tid]; // num_factor = ∏(6k+i) * (1 + (545140134 * n)/(545140134 * previous_k + 13591409)) mpreal num_factor = mpreal(1) + mpreal(545140134) * n / (mpreal(545140134) * previous_k[tid] + mpreal(13591409)); num_factor *= mpreal(factorial_ratio(6 * k, 6 * previous_k[tid]).get_mpz_t()); // denom_factor = ∏(3k+i) * (∏(k+i))^3 * 640320^(3 * n) mpz_class denom_factor = factorial_ratio(k, previous_k[tid]); denom_factor = denom_factor * denom_factor * denom_factor; denom_factor *= factorial_ratio(3 * k, 3 * previous_k[tid]); mpz_class temp = 640320; mpz_pow_ui(temp.get_mpz_t(), temp.get_mpz_t(), 3 * n); denom_factor *= temp; term[tid] *= num_factor / denom_factor.get_mpz_t(); // sign of term is (-1)^k term[tid].setSign(k % 2 != 0 ? -1: 1); } previous_k[tid] = k; // Accumulate thread-local sum partial_sums[tid] += term[tid]; } mpreal sum(0); // Reduce all thread-local sums into a single sum for (int i = 0; i < NUM_THREADS; i++) { sum += partial_sums[i]; } // Compute final pi: pi = 1 / (sum * 12) mpreal pi = 1 / (sum * 12); return pi; } /** * @brief Main function. * * The entry point of the program. It handles command-line arguments * setting up global settings and timing the computation. * * @return int Returns 0 on successful execution. */ int main(int argc, char **argv) { if (argc > 1) DIGITS = atoi(argv[1]); PRECISION = (DIGITS * 4); // Precision in bits (~3.3 decimal digits per bit) NUM_TERMS = (DIGITS / 14 + 2); // each additional term contributes about 14.18 decimal digits to π if (std::thread::hardware_concurrency() != 0) NUM_THREADS = std::thread::hardware_concurrency(); double start_time = omp_get_wtime(); mpreal pi = compute_pi_chudnovsky(); double end_time = omp_get_wtime(); cout << setprecision(DIGITS) << pi << endl; printf("Computation Time: %f seconds\n", end_time - start_time); return 0; }



Comparing run times for the unoptimized vs the optimized version of our program we see that the optimization has made a bit improvement. The first graph shows a nearly linear relationship in log-log space suggesting a power relationship. Even though our data is best fit by a power function in both cases, the optimized version has reduced the power slightly. We can see in the middle graph, that the unoptimized version grows much more rapidly than the optimized version. Unfortunately the optimized version is still \(O(n^2)\) which means that if we project out to even larger numbers of digits we see that the quadratic growth will make it difficult to compute π as the number of digits increase. If only there was an algorithm that was closer to \(O(n)\) that we could use.
Spigot Algorithms
Unlike series summation methods like the Chudnovsky formula, which require computing all preceding digits before reaching later ones, spigot algorithms generate digits sequentially without needing to sum the entire series. One such algorithm, developed by Stanley Rabinowitz and Stan Wagon, efficiently produces decimal digits of π. It operates using only integer arithmetic, eliminating the need for multi-precision math. A key advantage is that once a digit is generated, it is no longer required for subsequent computations. However, the algorithm does require an array of 32-bit integers approximately three times the number of desired digits. Despite this, its memory demands are significantly lower than those of the Chudnovsky formula.
spigot.cpp
/** * @file spicot.cpp * @brief Program to compute π. * * This program can compute π using the decimal spigot algorithm * of Rabinowitz and Wagon. * * http://stanleyrabinowitz.com/bibliography/spigot.pdf * * Syntax: spigot <num_digits> * * Compile with command: * g++ --std=c++11 spigot.cpp -o spigot * * @author Richard Lesh * @date 2025-03-16 * @version 1.0 */ #include <chrono> #include <iostream> #include <sstream> #include <string> #include <thread> #include <vector> using namespace std; /** * @brief Compute π. * * Function to compute π using the decimal spigot algorithm. * * @param n_digits Number of digits to produce. * @return string A string representing π. */ string spigot_pi(int n_digits) { char* s = new char[n_digits + 2]; int N = (10 * n_digits / 3) + 1; // Size estimate of the array vector<int> A(N, 2); // Initialize array with 2s int nines = 0, predigit = 0; int skipFirst = 2; s[0] = '3'; s[1] = '.'; char* sp = s + 2; for (int i = 0; i < n_digits; ++i) { int carry = 0; for (int j = N - 1; j >= 0; --j) { int num = A[j] * 10 + carry * (j + 1); A[j] = num % (2 * j + 1); carry = num / (2 * j + 1); } A[0] = carry % 10; carry = carry / 10; // Handle 9s and predigit corrections if (carry == 9) { ++nines; } else if (carry == 10) { *(sp++) = predigit + 1 + '0'; for (int k = 0; k < nines; ++k) *(sp++) = '0'; predigit = 0; nines = 0; } else { if (skipFirst) --skipFirst; else *(sp++) = predigit + '0'; predigit = carry; if (nines > 0) { for (int k = 0; k < nines; ++k) *(sp++) = '9'; nines = 0; } } } *(sp++) = predigit; *sp = '\0'; return string(s); } /** * @brief Main function. * * The entry point of the program. It handles command-line arguments * and timing the computation. * * @return int Returns 0 on successful execution. */ int main(int argc, char **argv) { int n = atoi(argv[1]); auto start = chrono::high_resolution_clock::now(); // Start timer string pi = spigot_pi(n); auto end = chrono::high_resolution_clock::now(); // End timer chrono::duration<double> duration = end - start; // Compute duration cout << pi << endl; cout << "Computation time: " << duration.count() << " seconds" << endl; return 0; }
One disadvantage of this spigot algorithm is that it cannot be parallelized like the Chudnovsky algorithm can. This means that it can be a good performance test for a singe CPU, it can't be used to test the performance of a multiple CPU system.
Another spigot algorithm is the BBP (Bailey–Borwein–Plouffe) algorithm. It is a spigot algorithm that can compute arbitrary hexadecimal (base-16) digits of π without requiring the computation of preceding digits. This makes it unique among traditional π algorithms.
The BBP algorithm was discovered in 1995 by Simon Plouffe. It is defined as:
\[\pi = \sum_{k=0}^{\infty} \frac{1}{16^k} \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)\]
How BBP Works
- The formula is used to compute the nth hexadecimal digit of π.
- The computation is done in modular arithmetic, which allows skipping previous digits.
- Since it works in base-16, the output is a sequence of hexadecimal digits.
- The modular exponentiation step ensures that intermediate values remain manageable.