Computing π (Revisited)

Image credit: Cormullion, Julia code here.

Simple Series Summation

π (Pi) is the mathematical constant that represents the ratio of a circle’s circumference to its diameter. In a previous post we explored some simple ways to compute π to many digits. The most efficient way to compute π begins with the amazing series attributed to David and Gregory Chudnovsky.

\[\frac{1}{\pi} = 12 \sum_{k=0}^{\infty} \frac{(-1)^k (6k)! (545140134k + 13591409)} {(3k)! (k!)^3 (640320)^{3k + 3/2}}\]

Each term adds around 14 decimal digits of π, making it the fastest known series for computing π. Even so, it doesn't take many digits of π before it begins to take some significant time to complete. On a MacBook Pro with an Apple M2 Max processor, it takes close to four minutes to compute 10,000 digits of π. If we want to compute more digits, we need more efficient algorithms.

chudnovsky.cpp

/**
* @file pi_chudnovsky.cpp
* @brief Program to compute π.
*
* This program can compute π using the Chudnovsky series.
* https://en.wikipedia.org/wiki/Chudnovsky_algorithm
*
* Syntax: pi_chudnovsky [num_digits]
*
* get mpreal.h at https://github.com/advanpix/mpreal.git
*
* Compile with command:
* macOS:
*   install homebrew https://brew.sh
*   put mpreal.h in /opt/homebrew/include
* brew install gcc
* brew install gmp
* brew install mpfr
* g++ -std=c++20 -O3 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky pi_chudnovsky.cpp -lmpfr -lgmp
*
* Linux
*   put mpreal.h in /urs/local/include
* sudo apt install libgmp-dev
* sudo apt install libmpfr-dev
* g++ -std=c++20 -O3 -o pi_chudnovsky pi_chudnovsky.cpp -lmpfr -lgmp
*
* @author Richard Lesh
* @date 2025-03-16
* @version 1.0
*/

#include <gmpxx.h>      // GMP C++ Wrapper
#include <iomanip>
#include <iostream>
#include <mpreal.h>     // MPFR C++ Wrapper

int DIGITS = 50;        // in decimal digits
int PRECISION;          // in bits
int NUM_TERMS;

using namespace mpfr;
using namespace std;

/**
 * @brief Compute π.
 *
 * Function to compute π using the Chudnovsky algorithm.
 *
\f[
\frac{1}{\pi} = 12 \sum_{k=0}^{\infty}
\frac{(-1)^k (6k)! (545140134k + 13591409)}
{(3k)! (k!)^3 (640320)^{3k + 3/2}}
\f]
* @return mpreal Multi-precision value for π.
 */
mpreal compute_pi_chudnovsky() {
    mpreal::set_default_prec(PRECISION);  // Set default precision in bits
    mpreal sum(0);
    for (int k = 0; k < NUM_TERMS; k++) {
        // Local GMP variables for factorials
        mpz_class k_fact, three_k_fact, six_k_fact;

        // Compute factorials
        mpz_fac_ui(k_fact.get_mpz_t(), k);
        mpz_fac_ui(three_k_fact.get_mpz_t(), 3 * k);
        mpz_fac_ui(six_k_fact.get_mpz_t(), 6 * k);

        // Compute numerator: +- (6k)! * (545140134 * k + 13591409)
        mpz_class numerator = six_k_fact * (mpz_class(545140134) * k + mpz_class(13591409));
        // Apply alternating sign (-1)^k
        if (k % 2 != 0) mpz_neg(numerator.get_mpz_t(), numerator.get_mpz_t()); ;

        // Compute denominator: (3k)! * (k!)^3 * (640320^(3k+3/2))
        mpreal denominator = pow(mpreal(640320), 3.0 * k + 1.5);
        denominator = denominator * mpz_class(three_k_fact * k_fact * k_fact * k_fact).get_mpz_t();

        // Compute term: numerator / denominator
        mpreal term = numerator.get_mpz_t() / denominator;

        // Accumulate thread-local sum
        sum += term;
    }

    // Compute final pi: pi = 1 / (sum * 12)
    mpreal pi = 1 / (sum * 12);
    return pi;
}

/**
 * @brief Main function.
 *
 * The entry point of the program.
 *
 * Syntax: pi_chudnovsky [num_digits]
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char **argv) {
    if (argc > 1) {
        char* endptr;
        DIGITS = strtol(argv[1], &endptr, 10);
        if (*endptr != '\0' || endptr == argv[1]) {
            cerr << "Error: Invalid number format\n";
            return 1;
        }
    }
    // Precision in bits (~3.3 bits per decimal digit)
    PRECISION = static_cast<long>(DIGITS * 3.32193 + 16);
    // each additional term contributes about 14.18 decimal digits to π
    NUM_TERMS = (DIGITS / 14 + 2);
    auto start = chrono::high_resolution_clock::now();
    mpreal pi = compute_pi_chudnovsky();
    auto stop = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = stop - start;

    cout << setprecision(DIGITS) << pi << endl;
    printf("Computation Time: %f seconds\n", elapsed.count());
    return 0;
}

Multiprocessing Enhancement

We can use multiple cores to compute the Chudnovsky series more quickly. Since we can compute partial sums using separate threads, then combine the partial sums, we can speed up the process of computing π. For example assume that we have four cores available to launch four threads. Then we have...

\[series = \sum_{i=0}^{n}t_i = \sum_{i=0}^{n/4} \left( t_{4i} + t_{4i+1} + t_{4i+2} + t_{4i+3} \right) = \sum_{i=0}^{n/4}t_{4i} + \sum_{i=0}^{n/4}t_{4i+1} + \sum_{i=0}^{n/4}t_{4i+2} +\sum_{i=0}^{n/4}t_{4i+3}\]

where each of the final four partial sums can be computed independently by a thread. This formulation should allow us to compute the Chudnovsky series four times faster. We can apply this idea for as many cores as are available obtaining a corresponding multiple increase in speed.

An alternate, but equivalent approach is to create a thread pool that contains as many threads as there are processors. Then assign the computation of each individual term in the series to the thread pool. This version of the program uses the ThreadPool class developed in another post.

pi_chudnovsky_threaded.cpp
/**
* @file pi_chudnovsky_threaded.cpp
* @brief Program to compute π.
*
* This program can compute π using the Chudnovsky series.
* https://en.wikipedia.org/wiki/Chudnovsky_algorithm
*
* Breaks the Chudnovsky series into partial sums to compute using multiple
* processors.  Requires ThreadPool class.
*
* Syntax: pi_chudnovsky_threaded [num_digits]
*
* get mpreal.h at https://github.com/advanpix/mpreal.git
*
* Compile with command:
* macOS:
*   install homebrew https://brew.sh
*   put mpreal.h in /opt/homebrew/include
* brew install gcc
* brew install gmp
* brew install mpfr
* g++ -std=c++20 -O3 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky_threaded pi_chudnovsky_threaded.cpp -lmpfr -lgmp
*
* Linux
*   put mpreal.h in /urs/local/include
* sudo apt install libgmp-dev
* sudo apt install libmpfr-dev
* g++ -std=c++20 -O3 -o pi_chudnovsky_threaded pi_chudnovsky_threaded.cpp -lmpfr -lgmp
*
* @author Richard Lesh
* @date 2025-03-16
* @version 1.0
*/

#include <gmpxx.h>      // GMP C++ Wrapper
#include <iomanip>
#include <iostream>
#include <mpreal.h>     // MPFR C++ Wrapper
#include <ThreadPool.hpp>

int DIGITS = 50;        // in decimal digits
int PRECISION;          // in bits
int NUM_TERMS;

using namespace mpfr;
using namespace std;

/**
 * @brief Compute π.
 *
 * Function to compute π using the Chudnovsky algorithm in a multithreaded
 * manner.
 *
\f[
\frac{1}{\pi} = 12 \sum_{k=0}^{\infty}
\frac{(-1)^k (6k)! (545140134k + 13591409)}
{(3k)! (k!)^3 (640320)^{3k + 3/2}}
\f]
* @return mpreal A multi-precision value for π.
 */
mpreal compute_pi_chudnovsky() {
    mpreal::set_default_prec(PRECISION);  // Set default precision in bits
    mpreal sum(0);
    mutex mtx;
    auto pool = ThreadPool::getThreadPool();
    for (int k = 0; k < NUM_TERMS; k++) {
        pool->enqueue([k, &mtx, &sum]() {
            mpreal::set_default_prec(PRECISION);  // Set default precision in bits
            // Local GMP variables for factorials
            mpz_class k_fact, three_k_fact, six_k_fact;

            // Compute factorials
            mpz_fac_ui(k_fact.get_mpz_t(), k);
            mpz_fac_ui(three_k_fact.get_mpz_t(), 3 * k);
            mpz_fac_ui(six_k_fact.get_mpz_t(), 6 * k);

            // Compute numerator: +- (6k)! * (545140134 * k + 13591409)
            mpz_class numerator = six_k_fact * (mpz_class(545140134) * k + mpz_class(13591409));
            // Apply alternating sign (-1)^k
            if (k % 2 != 0) mpz_neg(numerator.get_mpz_t(), numerator.get_mpz_t()); ;

            // Compute denominator: (3k)! * (k!)^3 * (640320^(3k+3/2))
            mpreal denominator = pow(mpreal(640320), 3.0 * k + 1.5);
            denominator = denominator * mpz_class(three_k_fact * k_fact * k_fact * k_fact).get_mpz_t();

            mpreal term = numerator.get_mpz_t() / denominator;

            lock_guard<mutex> lock(mtx);
            sum += term;
        });
    }
    pool->shutdown();
    // Compute final pi: pi = 1 / (sum * 12)
    mpreal pi = 1 / (sum * 12);
    return pi;
}

/**
 * @brief Main function.
 *
 * The entry point of the program.
 *
 * Syntax: pi_chudnovsky_threaded [num_digits]
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char **argv) {
    if (argc > 1) {
        char* endptr;
        DIGITS = strtol(argv[1], &endptr, 10);
        if (*endptr != '\0' || endptr == argv[1]) {
            cerr << "Error: Invalid number format\n";
            return 1;
        }
    }
    // Precision in bits (~3.3 bits per decimal digit)
    PRECISION = static_cast<long>(DIGITS * 3.32193 + 16);
    // each additional term contributes about 14.18 decimal digits to π
    NUM_TERMS = (DIGITS / 14 + 2);

    auto start = chrono::high_resolution_clock::now();
    mpreal pi = compute_pi_chudnovsky();
    auto stop = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = stop - start;

    cout << setprecision(DIGITS) << pi << endl;
    printf("Computation Time: %f seconds\n", elapsed.count());
    return 0;
}

Incremental Computation

Another technique to significantly improve the performance of our algorithm is to incrementally compute each term from the previous term. This will result in many fewer expensive multiplications compared to computing each term from scratch. For example, consider the factorial function \(x!\). To compute the factorial directly requires multiplying \(x\) factors using \(x-1\) multiplications. If, however we know the value of \((x-1)!\), we can efficiently compute \(x!\) using only a single multiplication, e.g. \(x! = x(x-1)!\).

So all we need to do is figure out what the ratio of one term of the Chudnovsky series compared to the previous term, \(t_k \over t_{k-1}\), might be. So assume that there is a factor ƒ such that:

\[t_{k} = ƒ_{k,1} \cdot t_{k-1}\]

where \(t_k\) is the kth term in the series and \(t_{k -1}\) is the previous term in the series before \(t_k\).

So this factor \(ƒ_{k,1}\) is:

\[ƒ_{k,1} = \frac{t_{k}}{t_{k-1}}\]

First let us factor out the constant \(640329^{1.5}\) factor in our Chudnovsky terms.

\[ \begin{aligned}
\frac{1}{\pi} &= 12 \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k + 3/2}} \\
&= \frac{12}{640320 \sqrt{640320}} \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k}} \\
&= \frac{1}{426880 \sqrt{10005}} \sum^\infty_{k=0} \frac{(-1)^k (6k)! (13591409 + 545140134k)}{(3k)!(k!)^3 640320^{3k}}
\end{aligned} \]

Now we break the summation into two separate summations using the following substitutions:

\[\begin{aligned}a &= \sum^\infty_{k=0} \frac{(-1)^k (6k)!}{(3k)!(k!)^3 640320^{3k}} \\
b &= \sum^\infty_{k=0} \frac{(-1)^k (6k)!k}{(3k)!(k!)^3 640320^{3k}} \end{aligned}\]

then that gives us the following formula for π.

\[ \begin{aligned}\frac{1}{\pi} &= \frac{13591409a + 545140134b}{426880 \sqrt{10005}} \\
\pi &= \frac{426880 \sqrt{10005}}{13591409a + 545140134b} \end{aligned}\]

So now we can compute the factor \(f_{k,1}\) for the terms in series \(a\) using...

\[a_{k-1} = \frac{(-1)^{k-1} (6k-6)!}{(3k-3)!((k-1)!)^3 640320^{3k-3}}\]

and

\[a_k = \frac{(-1)^k (6k)!}{(3k)!(k!)^3 640320^{3k}}\]

we get after simplifying

\[\begin{align}
f^a_{k,1} &= \frac{a_{k}}{a_{k-1}} \\
&= - \frac{(6k-5)(6k-4)(6k-3)(6k-2)(6k-1)6k}{3k(3k-1)(3k-2)k^3 640320^3} \\
&= - \frac{24(6k-5)(2k-1)(6k-1)}{k^3 640320^3} \end{align} \]

Notice the computing the terms in the \(b\) series are easily computed from the \(a\) series terms as

\[ b_k = k \cdot a_k \]

By computing factorial ratios we only multiply the last few factors of a factorial instead of multiplying all the factors that it takes to compute individual factorials. This saves us a lot of time. So our strategy is to use the full Chudnovsky formula for the first term, then simply calculate the factor \(ƒ_{k,1}\) for successive terms so that we can multiply the factor by the previous term that we have already computed.

pi_chudnovsky_incremental.cpp
/**
* @file pi_chudnovsky_incremental.cpp
* @brief Program to compute π.
*
* This program can compute π using the Chudnovsky series.
* Incrementally computes successive terms in the series from the previous
* term to significantly reduce the number of multiply operations.
*
* Syntax: pi_chudnovsky_incremental [num_digits]
*
* get mpreal.h at https://github.com/advanpix/mpreal.git
*
* Compile with command:
* macOS:
*   install homebrew https://brew.sh
*   put mpreal.h in /opt/homebrew/include
* brew install gcc
* brew install gmp
* brew install mpfr
* g++ -std=c++20 -O3 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky_incremental pi_chudnovsky_incremental.cpp -lmpfr -lgmp
*
* Linux
*   put mpreal.h in /urs/local/include
* sudo apt install libgmp-dev
* sudo apt install libmpfr-dev
* g++ -std=c++20 -O3 -o pi_chudnovsky_incremental pi_chudnovsky_incremental.cpp -lmpfr -lgmp
*
* @author Richard Lesh
* @date 2025-03-16
* @version 1.0
*/

#include <gmpxx.h>      // GMP C++ Wrapper
#include <iomanip>
#include <iostream>
#include <mpreal.h>     // MPFR C++ Wrapper

int DIGITS = 50;        // in decimal digits
int PRECISION;          // in bits
int NUM_TERMS;

using namespace mpfr;
using namespace std;

/**
 * @brief Compute π.
 *
 * Function to compute π using the Chudnovsky algorithm using fewer multiplies
 * by incrementally computing successive terms in the series from the previous
 * term.
 *
 * @return mpreal A multi-precision value for π.
 */
mpreal compute_pi_chudnovsky() {
    mpreal::set_default_prec(PRECISION);  // Set default precision in bits
    mpreal sum(0);
    mpreal a = mpreal(1);
    mpreal b = mpreal(0);
    mpreal sum_a = mpreal(1);
    mpreal sum_b = mpreal(0);
    for (int k = 1; k < NUM_TERMS; k++) {
        mpz_class numerator = -24 * mpz_class(6 * k - 5) * mpz_class(2 * k - 1) * mpz_class(6 * k - 1);
        mpz_class denominator = mpz_class("262537412640768000") * mpz_class(k) * mpz_class(k) * mpz_class(k);

        // Compute terms a & b
        a = a * mpreal(numerator.get_mpz_t()) / mpreal(denominator.get_mpz_t());
        b = k * a;
        // Accumulate thread-local sum
        sum_a += a;
        sum_b += b;
    }

    mpreal pi = 426880 * sqrt(mpreal(10005)) / (mpreal(13591409) * sum_a + mpreal(545140134) * sum_b);
    return pi;
}

/**
 * @brief Main function.
 *
 * The entry point of the program.
 *
 * Syntax: pi_chudnovsky_incremental [num_digits]
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char **argv) {
    if (argc > 1) {
        char* endptr;
        DIGITS = strtol(argv[1], &endptr, 10);
        if (*endptr != '\0' || endptr == argv[1]) {
            cerr << "Error: Invalid number format\n";
            return 1;
        }
    }
    // Precision in bits (~3.3 bits per decimal digit)
    PRECISION = static_cast<long>(DIGITS * 3.32193 + 16);
    // each additional term contributes about 14.18 decimal digits to π
    NUM_TERMS = (DIGITS / 14 + 2);

    auto start = chrono::high_resolution_clock::now();
    mpreal pi = compute_pi_chudnovsky();
    auto stop = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = stop - start;

    cout << setprecision(DIGITS) << pi << endl;
    printf("Computation Time: %f seconds\n", elapsed.count());
    return 0;
}

Incremental + Multiprocessor

We can now apply the partial sum technique that we have seen before to the incremental version of our program. We will need to create the partial sums a bit differently so as to benefit from the incremental computation of terms. The idea is to divide the total number of terms into groups of sequential terms and assign these groups to individual threads.

pi_chudnovsky_incr_thread.cpp
/**
* @file pi_chudnovsky_incr_thread.cpp
* @brief Program to compute π.
*
* This program can compute π using the Chudnovsky series.
* Incrementally computes successive terms in the series from the previous
* term to significantly reduce the number of multiply operations.  Takes
* advantage of all available processors to improve performance.
* Requires ThreadPool class.
*
* Syntax: pi_chudnovsky_incr_thread [num_digits]
*
* get mpreal.h at https://github.com/advanpix/mpreal.git
*
* Compile with command:
* macOS:
*   install homebrew https://brew.sh
*   put mpreal.h in /opt/homebrew/include
* brew install gcc
* brew install gmp
* brew install mpfr
* g++ -std=c++20 -O3 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky_incr_thread pi_chudnovsky_incr_thread.cpp ThreadPool.cpp -lmpfr -lgmp
*
* Linux
*   put mpreal.h in /urs/local/include
* sudo apt install libgmp-dev
* sudo apt install libmpfr-dev
* g++ -std=c++20 -O3 -o pi_chudnovsky_incr_thread pi_chudnovsky_incr_thread.cpp ThreadPool.cpp -lmpfr -lgmp
*
* @author Richard Lesh
* @date 2025-03-16
* @version 1.0
*/

#include <gmpxx.h>      // GMP C++ Wrapper
#include <iomanip>
#include <iostream>
#include <mpreal.h>     // MPFR C++ Wrapper
#include <mutex>
#include <ThreadPool.hpp>

int DIGITS = 50;        // in decimal digits
int PRECISION;          // in bits
int NUM_TERMS;

using namespace mpfr;
using namespace std;

/**
 * @brief Compute π.
 *
 * Function to compute π using the Chudnovsky algorithm.
 *
 * @return mpreal A multi-precision value for π.
 */
mpreal compute_pi_chudnovsky() {
    mpreal::set_default_prec(PRECISION);  // Set default precision in bits
    // Get the number of available processors
    int numThreads = std::thread::hardware_concurrency();
    int termsPerTask = NUM_TERMS / (numThreads * 2);
    termsPerTask = std::max(1, termsPerTask);
    cout << "NUM_TERMS: " << NUM_TERMS << endl;
    cout << "termsPerTask: " << termsPerTask << endl;
    auto pool = ThreadPool::getThreadPool();
    mutex mtx;          // mutex to synchronize writing to sum_a and sum_b
    mpreal sum_a(0);
    mpreal sum_b(0);
    for (int k_start = 0; k_start < NUM_TERMS; k_start += termsPerTask) {
        int k_last = std::min(k_start + termsPerTask, NUM_TERMS);
        pool->enqueue([k_start, k_last, &mtx, &sum_a, &sum_b]() {
            // Set the default precision in bits for each thread
            mpreal::set_default_prec(PRECISION);
            // Local GMP variables for factorials
            mpz_class k_fact, three_k_fact, six_k_fact;

            // Compute factorials
            mpz_fac_ui(k_fact.get_mpz_t(), k_start);
            mpz_fac_ui(three_k_fact.get_mpz_t(), 3 * k_start);
            mpz_fac_ui(six_k_fact.get_mpz_t(), 6 * k_start);

            // Compute numerator: +- (6k)!
            mpz_class numerator = six_k_fact;
            // Apply alternating sign (-1)^k
            if (k_start % 2 != 0) mpz_neg(numerator.get_mpz_t(), numerator.get_mpz_t()); ;

            // Compute denominator: (3k)! * (k!)^3 * (640320^(3k))
            mpz_class denominator;
            mpz_pow_ui(denominator.get_mpz_t(), mpz_class(640320).get_mpz_t(), 3L * k_start);
            denominator = denominator *  k_fact * k_fact * k_fact * three_k_fact;

            // Compute term: numerator / denominator
            mpreal a = mpreal(numerator.get_mpz_t()) / mpreal(denominator.get_mpz_t());
            mpreal b = a * k_start;
            mpreal sum_a_local = a;
            mpreal sum_b_local = b;
            for (int k = k_start + 1; k < k_last; k++) {
                mpz_class numerator = -24 * mpz_class(6 * k - 5) * mpz_class(2 * k - 1) * mpz_class(6 * k - 1);
                mpz_class denominator = mpz_class("262537412640768000") * mpz_class(k) * mpz_class(k) * mpz_class(k);

                // Compute terms a & b
                a = a * mpreal(numerator.get_mpz_t()) / mpreal(denominator.get_mpz_t());
                b = k * a;
                // Accumulate thread-local sum
                sum_a_local += a;
                sum_b_local += b;
            }
            lock_guard<mutex> lock(mtx);
            sum_a += sum_a_local;
            sum_b += sum_b_local;
        });
    }
    pool->shutdown();
    mpreal pi = 426880 * sqrt(mpreal(10005)) / (mpreal(13591409) * sum_a + mpreal(545140134) * sum_b);
    return pi;
}

/**
 * @brief Main function.
 *
 * The entry point of the program.
 *
 * Syntax: pi_chudnovsky_incr_thread [num_digits]
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char **argv) {
    if (argc > 1) {
        char* endptr;
        DIGITS = strtol(argv[1], &endptr, 10);
        if (*endptr != '\0' || endptr == argv[1]) {
            cerr << "Error: Invalid number format\n";
            return 1;
        }
    }
    // Precision in bits (~3.3 bits per decimal digit)
    PRECISION = static_cast<long>(DIGITS * 3.32193 + 16);
    // each additional term contributes about 14.18 decimal digits to π
    NUM_TERMS = (DIGITS / 14 + 2);

    auto start = chrono::high_resolution_clock::now();
    mpreal pi = compute_pi_chudnovsky();
    auto stop = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = stop - start;

    cout << setprecision(DIGITS) << pi << endl;
    printf("Computation Time: %f seconds\n", elapsed.count());
    return 0;
}

Binary Splitting

All of these improvements are great with each getting us about one order of magnitude speed improvement. But they all have the same growth pattern between \(O(n^2)\) in the best case and \(O(n^{2.5})\) in the worst case of the simple Chudnovsky. That computational complexity eventually makes these algorithms unusable for large numbers of digits say beyond 1,000,000 digits.

An elegant solution called binary splitting helps address this challenge. Since multiplying large numbers is computationally expensive, binary splitting works by breaking the problem into smaller pieces—splitting large numbers into smaller ones with fewer digits. These smaller numbers are multiplied individually, and their results are then combined to reconstruct the final product of the original large numbers.

The technique applies to all linearly convergent sums of the form...

\[ S = \sum_{n=0}^{\infty}\frac{a(n)}{b(n)}\frac{p(0)…p(n)}{q(0)…q(n)} \ \]

where \(a(n)\), \(b(n)\), \(p(n)\), \(q(n)\) are integers or are polynomials in \(n\) with integer coefficients. If we consider the partial sum of terms \(t_n\) where \(i <= n < j\) then we have...

\[S_{i,j} = \sum_{n=i}^{j-1}\frac{a(i)}{b(i)}\frac{p(i)}{q(i)} + \frac{a(i+1)}{b(i+1)}\frac{p(i)p(i+1)}{q(i)q(i+1)} + \frac{a(i+2)}{b(i+2)}\frac{p(i)p(i+1)p(i+2)}{q(i)q(i+1)q(i+2)} + … + \frac{a(j-1)}{b(j-1)}\frac{p(i)p(i+1)p(i+2)…p(j-1)}{q(i)q(i+1)q(i+2)…q(j-1)}\]

Notice how each additional term is the product of a new \(p/q\) ratio times the previous term's combined \(p/q\). This is the same pattern that we discovered for the Chudnovsky series in the section on incremental computation of terms.

To implement binary splitting we don't compute this series directly but instead compute the integer quantities...

\[ \begin{aligned}
P_{i,j} &= p(i)...p(j− 1) \\
Q_{i,j} & = q(i)...q(j− 1) \\
B_{i,j} &= b(i)...b(j− 1) \\
T_{i,j} &= B_{i,j}Q_{i,j}S_{i,j} \end{aligned} \]

If \( j - i = 1 \) then we will have...

\[\begin{align}
P_{i,j} &= p(i) \\
Q_{i,j} &= q(i) \\
B_{i,j} &= b(i) \\
T_{i,j} &= a(i)p(i)
\end{align}\]

If \( j - i = 2 \) then we will have...

\[\begin{align}
P_{i,j} &= p(i)p(j) \\
Q_{i,j} &= q(i)q(j) \\
B_{i,j} &= b(i)b(j) \\
T_{i,j} &= b(i)p(i)a(j)p(j) + b(j)q(j)a(i)p(i)
\end{align}\]

For partial sums where \(j - i > 2\) we split the partial sum into two sums...

\[\begin{align}
m &= \left\lfloor \frac{i + j}{2} \right\rfloor \\
P_{i,j} &= P_{i,m}P_{m,j} \\
Q_{i,j} &= Q_{i,m}Q_{m,j} \\
B_{i,j} &= B_{i,m}B_{m,j} \\
T_{i,j} &= B_{i,m}P_{i,m}T_{m,j} + B_{m,j} Q_{m,j} T_{i,m}
\end{align}\]

So we use recursion to compute \(S_{i,j}\) by splitting it down to a left and right side \(S\) computing the \(P\), \(Q\), \(B\) and \(T\) values for left and right splits then combining those results into a \(P\), \(Q\), \(B\) and \(T\) for \(S_{i,j}\). Once we have the \(P\), \(Q\), \(B\) and \(T\) for \(S_{i,j}\) we can then compute the partial sum as...

\[S_{i,j} = \frac{T_{i,j}}{B_{i,j}Q_{i,j}}\]

To compute π we set \(a(n) = A + nB\), \(b(n) = 1\), \(p(0) = 1\), \(q(0) = 1\), and \(p(n) =−(6n− 5)(2n− 1)(6n− 1)\), \(q(n) = n3C3/24\) for \(n > 0\) where \(A = 13591409\), \(B = 545140134\) and \(C = 640320\). Using these equations we can compute the first \(N\) terms of the Chudnovsky series summation \(S_{0,N}\). Then the formula for π becomes...

\[ π = \frac{426880 \sqrt{10005}}{S_{0,N}}\]

This gives us an algorithm that is close to \(O(n^{1.15})\). We can only do a little better by adding multithreading modifications to this algorithm to achieve \(O(n^{1.09})\). The reason is that the binary splitting technique is so fast that much of the time to compute π is in the computation of the \(\sqrt{10005}\) to N digits.

Compute Time Revisited
pi_chudnovsky_binary_split.cpp
/**
* @file pi_chudnovsky_binary_split.cpp
* @brief Program to compute π.
*
* This program can compute π using the Chudnovsky series and the binary
* splitting technique of https://www.ginac.de/CLN/binsplit.pdf
*
* Syntax: pi_chudnovsky_binary_split [num_digits]
*
* get mpreal.h at https://github.com/advanpix/mpreal.git
*
* Compile with command:
* macOS:
*   install homebrew https://brew.sh
*   put mpreal.h in /opt/homebrew/include
* brew install gcc
* brew install gmp
* brew install mpfr
* g++ -std=c++20 -O3 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky_binary_split pi_chudnovsky_binary_split.cpp -lmpfr -lgmp
*
* Linux
*   put mpreal.h in /urs/local/include
* sudo apt install libgmp-dev
* sudo apt install libmpfr-dev
* g++ -std=c++20 -O3 -o pi_chudnovsky_binary_split pi_chudnovsky_binary_split.cpp -lmpfr -lgmp
*
* @author Richard Lesh
* @date 2025-07-16
* @version 1.0
*/

#include <gmpxx.h>      // GMP C++ Wrapper
#include <iostream>
#include <mpreal.h>     // MPFR C++ Wrapper

int DIGITS = 50;        // in decimal digits
long PRECISION;          // in bits
long NUM_TERMS;

using namespace mpfr;
using namespace std;

/**
 * @brief Compute π.
 *
 * Function to compute π using the Chudnovsky algorithm.
 *
 * @return mpreal A multi-precision value for π.
 */

struct PQBT {
    mpz_class P;
    mpz_class Q;
    mpz_class B;
    mpz_class T;
};

// For π using Chudnovsky series
// A = 13591409, B = 545140134, C = 640320
// a(n) = A + nB, b(n) = 1, p(0) = 1, q(0) = 1, and
// p(n) =−(6n − 5)(2n − 1)(6n − 1)
// q(n) = n^3 x C^3 / 24 for n > 0.
PQBT compute_pi_chudnovsky_binary_split(int n, int m) {
    mpreal::set_default_prec(PRECISION); // Set default precision in bits
    PQBT pqbt;
    if (m - n == 1) {
        if (n == 0) {
            pqbt.P = 1;             // p(0)
            pqbt.Q = 1;             // q(0)
            pqbt.B = 1;             // b(0)
            pqbt.T = 13591409;      // a(0)p(0)
        } else {
            mpz_class mpz_n = mpz_class(n);
            // p(n)
            pqbt.P = -mpz_class(6 * n - 5) * mpz_class(2 * n - 1) * mpz_class(6 * n - 1);
            // q(n)
            pqbt.Q = mpz_class("10939058860032000") * mpz_n * mpz_n * mpz_n;
            // b(n)
            pqbt.B = mpz_class(1);
            // a(n)p(n)
            pqbt.T = (mpz_class(13591409) + mpz_class(545140134) * mpz_n) * pqbt.P;
        }
        return pqbt;
    } else if (m - n == 2) {
        PQBT left = compute_pi_chudnovsky_binary_split(n, m - 1);
        PQBT right = compute_pi_chudnovsky_binary_split(n + 1, m);
        pqbt.P = left.P * right.P;
        pqbt.Q = left.Q * right.Q;
        pqbt.B = left.B * right.B;
        pqbt.T = right.B * right.Q * left.T + left.B * left.P * right.T;
        return pqbt;
    } else {
        int new_m = (m + n) / 2;
        PQBT left = compute_pi_chudnovsky_binary_split(n, new_m);
        PQBT right = compute_pi_chudnovsky_binary_split(new_m, m);
        pqbt.P = left.P * right.P;
        pqbt.Q = left.Q * right.Q;
        pqbt.B = left.B * right.B;
        pqbt.T = right.B * right.Q * left.T + left.B * left.P * right.T;
        return pqbt;
    }
}

/**
 * @brief Main function.
 *
 * The entry point of the program.
 *
 * Syntax: pi_chudnovsky_binary_split [num_digits]
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char **argv) {
    if (argc > 1) {
        char* endptr;
        DIGITS = strtol(argv[1], &endptr, 10);
        if (*endptr != '\0' || endptr == argv[1]) {
            cerr << "Error: Invalid number format\n";
            return 1;
        }
    }
    // Precision in bits (~3.3 bits per decimal digit)
    PRECISION = static_cast<long>(DIGITS * 3.32193 + 16);
    // each additional term contributes about 14.18 decimal digits to π
    NUM_TERMS = (DIGITS / 14 + 2);

    auto start = chrono::high_resolution_clock::now();
    PQBT pi_sum = compute_pi_chudnovsky_binary_split(0, NUM_TERMS);
    mpreal::set_default_prec(PRECISION);  // Set default precision in bits
    mpz_class BQ = pi_sum.B * pi_sum.Q;
    mpreal pi = 426880 * sqrt(mpreal(10005)) * mpreal(BQ.get_mpz_t()) / mpreal(pi_sum.T.get_mpz_t());
    auto stop = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = stop - start;

    mp_exp_t exp;
    char* str = mpfr_get_str(nullptr, &exp, 10, DIGITS, pi.mpfr_ptr(), MPFR_RNDN);
    for (int i = 0; i < exp; ++i)
        std::cout << str[i];                         // Integer part
    std::cout << "." << (str + exp) << std::endl;    // Fractional part
    free(str);                                       // Must free str
    printf("Computation Time: %f seconds\n", elapsed.count());
    return 0;
}
pi_chudnovsky_binary_split_thread.cpp
/**
* @file pi_chudnovsky_binary_split_thread.cpp
* @brief Program to compute π.
*
* This program can compute π using the Chudnovsky series and the binary
* splitting technique of https://www.ginac.de/CLN/binsplit.pdf
* Requires ThreadPool class.
*
* Syntax: pi_chudnovsky_binary_split_thread [num_digits]
*
* get mpreal.h at https://github.com/advanpix/mpreal.git
*
* Compile with command:
* macOS:
*   install homebrew https://brew.sh
*   put mpreal.h in /opt/homebrew/include
* brew install gcc
* brew install gmp
* brew install mpfr
* g++ -std=c++20 -O3 -I/opt/homebrew/include -L/opt/homebrew/lib -o pi_chudnovsky_binary_split_thread pi_chudnovsky_binary_split_thread.cpp ThreadPool.cpp -lmpfr -lgmp
*
* Linux
*   put mpreal.h in /urs/local/include
* sudo apt install libgmp-dev
* sudo apt install libmpfr-dev
* g++ -std=c++20 -O3 -o pi_chudnovsky_binary_split_thread pi_chudnovsky_binary_split_thread.cpp ThreadPool.cpp -lmpfr -lgmp
*
* @author Richard Lesh
* @date 2025-07-16
* @version 1.0
*/

#include <cassert>
#include <cmath>
#include <gmpxx.h>      // GMP C++ Wrapper
#include <iostream>
#include <mpreal.h>     // MPFR C++ Wrapper
#include <ThreadPool.hpp>
#include <semaphore>

using namespace mpfr;
using namespace std;

#undef OUTPUT_HEX
long DIGITS = 50;        // in decimal/hexadecimal digits
long PRECISION;          // in bits
long NUM_TERMS;
int MAX_THREAD_DEPTH = static_cast<int>(round(log(thread::hardware_concurrency()) / log(2.0)));

/**
 * @brief Compute π.
 *
 * Function to compute π using the Chudnovsky algorithm.
 *

\f[
\frac{1}{\pi} = 12 \sum_{k=0}^{\infty}
\frac{(-1)^k (6k)! (545140134k + 13591409)}
{(3k)! (k!)^3 (640320)^{3k + 3/2}}
\f]

 * If we simplify the sum by pulling out the constant parts we get...

\f[
\frac{1}{\pi} = \frac{1}{426880\sqrt{10005}} \sum_{k=0}^{\infty}
\frac{(-1)^k (6k)! (545140134k + 13591409)}
{(3k)! (k!)^3 (640320)^{3k}}
\f]

 * Because there is an addition within the series term, we can break the
 * summation into two separate summations...

\f[
\frac{1}{\pi} = \frac{1}{426880\sqrt{10005}} \left(\sum_{k=0}^{\infty}
\frac{(-1)^k (6k)! (545140134k)}
{(3k)! (k!)^3 (640320)^{3k}} + \sum_{k=0}^{\infty}
\frac{(-1)^k (6k)! (13591409)}
{(3k)! (k!)^3 (640320)^{3k}}\right)
\f]

 * Again pulling out the constants from each series we can write our formula as...

\f[
\frac{1}{\pi} = \frac{13591409a + 545140134b}{426880\sqrt{10005}}
\f]

where

\f[
a = \sum_{k=0}^{\infty}
\frac{(-1)^k (6k)!}
{(3k)! (k!)^3 (640320)^{3k}}
\f]

and

\f[
b = \sum_{k=0}^{\infty}
\frac{(-1)^k (6k)!k}
{(3k)! (k!)^3 (640320)^{3k}}
\f]

 * This allows us to write the 'a' term using a recurrence relation.  We can
 * compute the b term from the 'a' term.

\f[
a_k =
\frac{(-1)^k (6k)!}
{(3k)! (k!)^3 (640320)^{3k}}
\f]

\f[
b_k = k \cdot a_k
\f]

and the ratio between the next 'a' over the previous 'a' is...

\f[
\frac{a_{k}}{a_{k-1}} = - \frac{(6k-5)(6k-4)(6k-3)(6k-2)(6k-1)6k}{3k(3k-1)(3k-2)k^3 640320^3} \\
= - \frac{24(6k-5)(2k-1)(6k-1)}{k^3 640320^3}
\f]

 * @return mpreal A multi-precision value for π.
 */

struct PQBT {
    mpz_class P;
    mpz_class Q;
    mpz_class B;
    mpz_class T;
};

// For π using Chudnovsky series
// A = 13591409, B = 545140134, C = 640320
// a(n) = A + nB, b(n) = 1, p(0) = 1, q(0) = 1, and
// p(n) =−(6n− 5)(2n− 1)(6n− 1)
// q(n) = n^3 x C^3 / 24 for n > 0.
// NOLINTNEXTLINE(misc-no-recursion)
PQBT compute_pi_chudnovsky_binary_split(long n, long m, int depth) {
    mpreal::set_default_prec(PRECISION); // Set default precision in bits
    mpfr::mpreal::set_emin(-20 * DIGITS);
    mpfr::mpreal::set_emax(+20 * DIGITS);
    PQBT pqbt;
    if (m - n == 1) {
        if (n == 0) {
            pqbt.P = 1;             // p(0)
            pqbt.Q = 1;             // q(0)
            pqbt.B = 1;             // b(0)
            pqbt.T = 13591409;      // a(0)p(0)
        } else {
            mpz_class mpz_n = mpz_class(n);
            // p(n)
            pqbt.P = -mpz_class(6 * n - 5) * mpz_class(2 * n - 1) * mpz_class(6 * n - 1);
            // q(n)
            pqbt.Q = mpz_class("10939058860032000") * mpz_n * mpz_n * mpz_n;
            // b(n)
            pqbt.B = mpz_class(1);
            // a(n)p(n)
            pqbt.T = (mpz_class(13591409) + mpz_class(545140134) * mpz_n) * pqbt.P;
        }
        return pqbt;
    } else if (m - n == 2) {
        PQBT left = compute_pi_chudnovsky_binary_split(n, m - 1, depth + 1);
        PQBT right = compute_pi_chudnovsky_binary_split(n + 1, m, depth + 1);
        pqbt.P = left.P * right.P;
        pqbt.Q = left.Q * right.Q;
        pqbt.B = left.B * right.B;
        pqbt.T = right.B * right.Q * left.T + left.B * left.P * right.T;
        return pqbt;
    } else {
        long new_m = (m + n) / 2;
        PQBT left, right;
        if (depth < MAX_THREAD_DEPTH) {
            shared_ptr<ThreadPool> pool = ThreadPool::getThreadPool();
            counting_semaphore<1> semaphore(0);
            pool->enqueue([&right, &pool, &semaphore, m, new_m, depth]() {
                pool->setThreadName(to_string(new_m) + "-" + to_string(m));
                right = compute_pi_chudnovsky_binary_split(new_m, m, depth + 1);
                semaphore.release();
            });
            left = compute_pi_chudnovsky_binary_split(n, new_m, depth + 1);
            semaphore.acquire();
        } else {
            right = compute_pi_chudnovsky_binary_split(new_m, m, depth + 1);
            left = compute_pi_chudnovsky_binary_split(n, new_m, depth + 1);
        }
        pqbt.P = left.P * right.P;
        pqbt.Q = left.Q * right.Q;
        pqbt.B = left.B * right.B;
        pqbt.T = right.B * right.Q * left.T + left.B * left.P * right.T;
        return pqbt;
    }
}

/**
 * @brief Main function.
 *
 * The entry point of the program.
 *
 * Syntax: pi_chudnovsky_binary_split_thread [num_digits]
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char **argv) {
    if (argc > 1) {
        char* endptr;
        DIGITS = strtol(argv[1], &endptr, 10);
        if (*endptr != '\0' || endptr == argv[1]) {
            cerr << "Error: Invalid number format\n";
            return 1;
        }
    }
#ifdef OUTPUT_HEX
    PRECISION = DIGITS * 4 + 32;
    DIGITS = ceil(DIGITS * 4 / 3.32193);
#else
    // Precision in bits (~3.3 bits per decimal digit)
    PRECISION = static_cast<long>(DIGITS * 3.32193 + 32);
#endif
    // each additional term contributes about 14.18 decimal digits to π
    NUM_TERMS = (DIGITS / 14 + 2);

    mpreal::set_default_prec(PRECISION);  // Set default precision in bits
    mpfr::mpreal::set_emin(-20 *DIGITS);
    mpfr::mpreal::set_emax(+20 * DIGITS);
    auto start = chrono::high_resolution_clock::now();
    shared_ptr<ThreadPool> pool = ThreadPool::getThreadPool(static_cast<unsigned int>(std::pow(2.0, MAX_THREAD_DEPTH)));
    counting_semaphore<2> semaphore(0);
    mpreal sqrt_10005;
    // get started computing sqrt(10005)
    pool->enqueue([&sqrt_10005, &semaphore] {
        mpreal::set_default_prec(PRECISION);  // Set default precision in bits
        mpfr::mpreal::set_emin(-20 * DIGITS);
        mpfr::mpreal::set_emax(+20 * DIGITS);
        sqrt_10005 = sqrt(mpreal(10005));
        semaphore.release();
    });
    // Compute the summation part of π
    PQBT pi_sum = compute_pi_chudnovsky_binary_split(0, NUM_TERMS, 0);
    semaphore.acquire();
    auto stop = chrono::high_resolution_clock::now();
    chrono::duration<double> elapsed = stop - start;
//    printf("Summation Time: %f seconds\n", elapsed.count());

    mpz_class BQ;
    pool->enqueue([&BQ, &pi_sum, &semaphore] {
        BQ = pi_sum.B * pi_sum.Q;
        semaphore.release();
    });
    mpreal pi;
    pool->enqueue([&pi, &sqrt_10005, &pi_sum, &semaphore] {
        mpreal::set_default_prec(PRECISION);  // Set default precision in bits
        mpfr::mpreal::set_emin(-20 * DIGITS);
        mpfr::mpreal::set_emax(+20 * DIGITS);
        pi = 426880 * sqrt_10005 / mpreal(pi_sum.T.get_mpz_t());
        semaphore.release();
    });
    semaphore.acquire();
    semaphore.acquire();
    assert(!isnan(pi));
    mpreal BQ_real = mpreal(BQ.get_mpz_t());
    assert(!isnan(BQ_real));
    pi = pi * BQ_real;
    assert(!isnan(pi));
    stop = chrono::high_resolution_clock::now();
    elapsed = stop - start;

    mp_exp_t exp;
#ifdef OUTPUT_HEX
    char* str = mpfr_get_str(nullptr, &exp, 16, (size_t)ceil(PRECISION / 4. - 3), pi.mpfr_ptr(), MPFR_RNDN);
#else
    char* str = mpfr_get_str(nullptr, &exp, 10, DIGITS, pi.mpfr_ptr(), MPFR_RNDN);
#endif
    for (int i = 0; i < exp; ++i)
        std::cout << str[i];                         // Integer part
    std::cout << "." << (str + exp) << std::endl;  // Fractional part
    free(str);
    printf("Computation Time: %f seconds\n", elapsed.count());
    return 0;
}

Even with the binary split algorithm, we can't compute digits much beyond 1B digits due to memory limitations. A 1B digit number requires approximately 3.3B bits or 415M bytes so it gets hard even with 64GB of memory to go much further.

See my HiDrive for files with digits of π.