The Open MPI library provides the facilities to write C++ programs that can be executed on a compute cluster. The basic structure of an Open MPI program is a single program that can run as either the master/coordinating node or as a client node. The first instance of the program assumes the master role and the remaining instances on the cluster assume the client roles. The Open MPI library facilitates this and also provides a message-passing mechanism for communication between the master and client nodes.
The basic outline of an Open MPI program is illustrated by pseudo-code as follows:
- initialize MPI
- if MASTER then
- create list of tasks to compute
- distribute initial tasks to workers
- terminate any unused workers
- while more tasks to process or workers still working
- receive results from a WORKER
- send new task to that WORKER or send terminate if no more tasks
- else WORKER then
- while true
- receive message from MASTER
- if TERMINATE message then break
- process data sent from MASTER
- send result back to MASTER
- end while
- finalize MPI
Initialize MPI
At the beginning of main() we must make a call to MPI_Init(), passing the standard argc and argv values representing the command-line arguments. Not only does this initialize the MPI environment, but it also allows arguments targeted specifically at MPI to be processed.
int main(int argc, char *argv[]) {
MPI_Init(&argc, &argv);
int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
Then we can call MPI_Comm_rank() and MPI_Comm_size() to get the rank of the current process and the total size of the MPI environment. The size is the number of processes that MPI creates on the cluster, which is based on the hostfile used when the program is run. The rank is the unique sequential identifier for each process: the MASTER process is rank 0 and the WORKER processes are ranks 1 to size - 1.
Then we branch into the code for the MASTER process and the code for the WORKER processes by comparing the rank to 0.
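In the complete listings below, the fork between the two roles, together with the message-type constants used throughout this page, looks like this:
const int MASTER = 0;        // Rank of the MASTER process
const int TAG_TASK = 1;      // Code for a TASK message
const int TAG_RESULT = 2;    // Code for a RESULT message
const int TAG_TERMINATE = 3; // Code for a TERMINATE message

if (rank == MASTER) {
    // MASTER code: build the task list, distribute it, collect the results
} else {
    // WORKER code: loop receiving tasks until a TERMINATE message arrives
}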
MASTER Code
The MASTER code is responsible for creating the work queue, sending tasks to the WORKER processes, sending TERMINATE messages to workers when there are no more tasks, and assembling the results sent back from the WORKER processes.
Messages are sent to a WORKER process using the MPI_Send() function. It takes a number of arguments, starting with a buffer holding the data to be sent. The second and third arguments specify the number of items in the buffer and the type of those items. The fourth argument is the rank of the WORKER process to send the message to. The fifth argument is a program-defined tag that identifies the type of the message; our programs will have TASK, RESULT and TERMINATE message types, so we will use 1, 2 and 3 respectively. The last argument is an MPI communicator that manages a group of processes; typically this is MPI_COMM_WORLD, which can communicate with all the processes on our cluster.
MPI_Send(&tasks[task_index], 1, MPI_INT, i, TAG_TASK, MPI_COMM_WORLD);
We will also use MPI_Send() to send TERMINATE messages when we want a WORKER process to end.
int dummy = 0;
MPI_Send(&dummy, 0, MPI_INT, i, TAG_TERMINATE, MPI_COMM_WORLD);
To wait for a message to come back from any WORKER, the MASTER process uses the MPI_Recv() function. It takes a receiving buffer, item count and item type as the first three arguments. The fourth argument should be set to MPI_ANY_SOURCE so that messages from any of the WORKER processes will be accepted. The fifth argument uses the tag we created for RESULT messages, and the sixth argument is MPI_COMM_WORLD, as it was with MPI_Send(). There is one final argument that gives us additional status information about the received message. The status structure has three ints, MPI_SOURCE, MPI_TAG and MPI_ERROR, which hold the rank of the sending process, the tag of the message and an error code that can be useful for debugging, respectively.
int result;
MPI_Status status;
MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
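The rank stored in status.MPI_SOURCE tells the MASTER which WORKER just finished, so it can reply to that same process with either the next task or a TERMINATE message, as in the complete listing below:
int worker_rank = status.MPI_SOURCE; // Rank of the worker that sent this result
if (task_index < num_tasks) {
    // More work available: send the next task to the worker that just finished
    MPI_Send(&tasks[task_index], 1, MPI_INT, worker_rank, TAG_TASK, MPI_COMM_WORLD);
    ++task_index;
} else {
    // No tasks left: tell this worker to shut down
    int dummy = 0;
    MPI_Send(&dummy, 0, MPI_INT, worker_rank, TAG_TERMINATE, MPI_COMM_WORLD);
    --active_workers;
}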
WORKER Code
The WORKER code is set up as an infinite loop. It receives messages from the MASTER process with MPI_Recv(), performs some action based on the data sent, and then returns a result to the MASTER process with MPI_Send().
The MPI_Recv() call accepts messages only from the MASTER, and it does not discriminate based on the tag of the message, since we will be expecting both TASK and TERMINATE message types.
MPI_Recv(&task_value, 1, MPI_INT, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
We can then compare status.MPI_TAG to the TERMINATE message type to determine whether we should break the loop and return, thus ending the process.
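Putting the WORKER pieces together, the loop from the complete listing below looks like this (process_task() is the worker's computation, defined in the listing):
while (true) { // Loop until a TERMINATE message arrives
    int task_value;
    MPI_Status status;
    MPI_Recv(&task_value, 1, MPI_INT, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    if (status.MPI_TAG == TAG_TERMINATE) {
        break; // The MASTER has no more work for us
    }
    int result = process_task(task_value); // Do the actual work
    MPI_Send(&result, 1, MPI_INT, MASTER, TAG_RESULT, MPI_COMM_WORLD);
}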
Finalize MPI
At the end of the program we finalize the MPI environment with MPI_Finalize(). Every process, the MASTER and all WORKERs, must call this cleanup function before exiting; it releases the resources used by the MPI environment.
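Both the MASTER and WORKER branches fall through to this same cleanup at the end of main(), as in the listings below:
MPI_Finalize(); // Finalize MPI; every process must call this before exiting
return 0;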
Simple Master/Worker Program that Passes Integers
The following complete program illustrates all the structural pieces of a master/worker program that passes integers back and forth. The MASTER process creates a list of integers as the work queue and feeds these integers to the WORKER processes. Each WORKER process accepts an integer, squares it and returns the result to the MASTER process, which prints it. There are additional print statements that help you visualize the message passing back and forth.
mpi_master_worker_int.cpp
/**
 * @file mpi_master_worker_int.cpp
 * @brief Demonstrates a simple Open MPI program that passes integers between the
 * master and workers.
 *
 * This demonstration contains an example of how to implement an Open MPI program
 * in C++. The master process creates a work queue of integers that it uses to
 * send the integers to the workers. The workers square the integer and send that
 * back to the master.
 *
 * Compile with command: mpic++ mpi_master_worker_int.cpp -o mpi_master_worker_int
 * Run with command: mpirun --hostfile OpenMPI/nodes.txt OpenMPI/mpi_master_worker_int
 *
 * @author Richard Lesh
 * @date 2025-03-04
 * @version 1.0
 */
#include <algorithm>
#include <iostream>
#include <mpi.h>
#include <vector>

using namespace std;

/**
 * @brief Task processing function.
 *
 * This function implements the processing for each worker node. In this
 * trivial example, the integer sent from the master is squared and returned.
 *
 * @param i The input value sent from the master.
 * @return int Returns the square of the input value.
 */
int process_task(int i) {
    return i * i;
}

/**
 * @brief Main function.
 *
 * The entry point of the program. It handles forking between the master
 * and worker functionality.
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv); // Initialize Open MPI

    // Get the rank and size of the process environment.
    // rank is an integer assigned to each process created where rank 0 is
    // the master process and ranks 1, 2, 3... are worker processes.
    // The size is the total number of processes running on the cluster.
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get the rank of the process
    MPI_Comm_size(MPI_COMM_WORLD, &size); // Get the total number of processes

    const int MASTER = 0;        // Rank of the MASTER process
    const int TAG_TASK = 1;      // Code for a TASK message
    const int TAG_RESULT = 2;    // Code for a RESULT message
    const int TAG_TERMINATE = 3; // Code for a TERMINATE message

    if (rank == MASTER) { // Code for the single MASTER process
        int num_tasks = 100; // Number of tasks to distribute
        vector<int> tasks;   // Setup the task list and fill it
        for (int i = 0; i < num_tasks; ++i) {
            tasks.push_back(i + 1);
        }

        int num_workers = size - 1;       // Number of worker processes
        int task_index = 0;               // Index into the task list
        int active_workers = num_workers; // Number of workers that are running

        cout << "Master distributing tasks to " << num_workers << " workers..." << endl;

        // Distribute initial tasks to workers
        for (int i = 1; i <= min(num_workers, num_tasks); ++i) {
            // MPI_Send() is used to send data to workers
            // int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
            // buf is the address of the data to send
            // count is the count of items in the buffer
            // datatype is the type of the items in the buffer, i.e. MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR
            // dest is the rank of the worker process to send the message to
            // tag is the code that identifies the type of the message. Used to distinguish different messages
            // comm is the communicator that defines the group of processes in which the message is sent,
            // typically MPI_COMM_WORLD
            cout << "Master sending task " << task_index << " to worker " << i << "..." << endl;
            MPI_Send(&tasks[task_index], 1, MPI_INT, i, TAG_TASK, MPI_COMM_WORLD);
            ++task_index;
        }

        // Terminate unused workers in the case the number of workers is more than available tasks
        if (num_workers > num_tasks) {
            for (int i = num_tasks + 1; i <= num_workers; ++i) {
                int dummy = 0;
                MPI_Send(&dummy, 0, MPI_INT, i, TAG_TERMINATE, MPI_COMM_WORLD);
                --active_workers;
            }
        }

        // Collect results and assign remaining tasks
        // Loop while there are tasks left to compute or if there are active workers
        while (task_index < num_tasks || active_workers > 0) {
            int result;        // Buffer for the result from the worker
            MPI_Status status; // Status from the worker
            // MPI_Recv() is used to receive data messages from the worker
            // int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm,
            //              MPI_Status *status);
            // buf is the address for the incoming data
            // count is the count of items the buffer holds
            // datatype is the type of the items in the buffer, i.e. MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR
            // source is the rank of the sending process. Use MPI_ANY_SOURCE to accept messages from any process.
            // comm is the communicator that defines the group of processes from which the message is received,
            // typically MPI_COMM_WORLD
            // status is a pointer to an MPI_Status structure that provides information about the received message
            // (e.g., sender rank, message size, and tag)
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
            int worker_rank = status.MPI_SOURCE; // Get the sender rank
            cout << "Master received result " << result << " from worker " << worker_rank << endl;

            // Send a new task to the worker, or terminate if no tasks are left
            if (task_index < num_tasks) {
                // Send a new task to the worker if there are tasks left
                cout << "Master sending task " << task_index << " to worker " << worker_rank << "..." << endl;
                MPI_Send(&tasks[task_index], 1, MPI_INT, worker_rank, TAG_TASK, MPI_COMM_WORLD);
                ++task_index;
            } else {
                // No more tasks; terminate the worker
                int dummy = 0;
                MPI_Send(&dummy, 0, MPI_INT, worker_rank, TAG_TERMINATE, MPI_COMM_WORLD);
                --active_workers;
            }
        }
    } else { // Code for the multiple WORKER processes
        while (true) { // Loop until we get the terminate message
            int task_value;
            MPI_Status status;
            // Receive a task or termination signal from the master
            // Note this MPI_Recv() uses MPI_ANY_TAG to accept any type of message from the master
            MPI_Recv(&task_value, 1, MPI_INT, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            // If it is a TERMINATE type message, then exit
            if (status.MPI_TAG == TAG_TERMINATE) {
                cout << "Worker " << rank << " received termination signal. Exiting." << endl;
                break;
            }
            // Otherwise it is a task message so process the task
            cout << "Worker " << rank << " processing task " << task_value << "..." << endl;
            int result = process_task(task_value);
            // Send the result back to the master
            MPI_Send(&result, 1, MPI_INT, MASTER, TAG_RESULT, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize(); // Finalize MPI
    return 0;
}
Place the mpi_master_worker_int.cpp in your OpenMPI folder on your file server that all the nodes share. Then log on to your master node and compile the program. You must compile with the MPI C++ compiler executable "mpic++" which is a wrapper around the GCC compiler with the necessary MPI libraries supplied.
rich@balrog1:~ $ pushd OpenMPI
rich@balrog1:~/OpenMPI $ mpic++ mpi_master_worker_int.cpp -o mpi_master_worker_int
rich@balrog1:~/OpenMPI $ popd
Then you can run the mpi_master_worker_int program on your cluster using the "mpirun" command.
rich@balrog1:~ $ mpirun --hostfile OpenMPI/nodes.txt OpenMPI/mpi_master_worker_int
Example Output
rich@balrog1:~ $ mpirun --hostfile OpenMPI/nodes.txt OpenMPI/mpi_master_worker_int
Master: Distributing tasks to 31 workers…
Master sending task 0 to worker 1…
Master sending task 1 to worker 2…
Master sending task 2 to worker 3…
Master sending task 3 to worker 4…
Worker 2 processing task 2…
Worker 1 processing task 1…
Worker 3 processing task 3…
Master sending task 4 to worker 5…
Worker 4 processing task 4…
Master sending task 5 to worker 6…
Worker 5 processing task 5…
Master sending task 6 to worker 7…
Worker 6 processing task 6…
Master sending task 7 to worker 8…
Worker 7 processing task 7…
Master sending task 8 to worker 9…
Worker 8 processing task 8…
Master sending task 9 to worker 10…
Worker 9 processing task 9…
Master sending task 10 to worker 11…
Worker 10 processing task 10…
Master sending task 11 to worker 12…
Worker 11 processing task 11…
Master sending task 12 to worker 13…
Worker 12 processing task 12…
Master sending task 13 to worker 14…
Worker 13 processing task 13…
Master sending task 14 to worker 15…
Worker 14 processing task 14…
Master sending task 15 to worker 16…
Worker 15 processing task 15…
Master sending task 16 to worker 17…
Worker 16 processing task 16…
Master sending task 17 to worker 18…
Worker 17 processing task 17…
Master sending task 18 to worker 19…
Worker 18 processing task 18…
Master sending task 19 to worker 20…
Worker 19 processing task 19…
Master sending task 20 to worker 21…
Worker 20 processing task 20…
Master sending task 21 to worker 22…
Worker 21 processing task 21…
Master sending task 22 to worker 23…
Worker 22 processing task 22…
Master sending task 23 to worker 24…
Worker 23 processing task 23…
Master sending task 24 to worker 25…
Worker 24 processing task 24…
Master sending task 25 to worker 26…
Worker 25 processing task 25…
Master sending task 26 to worker 27…
Worker 26 processing task 26…
Master sending task 27 to worker 28…
Worker 27 processing task 27…
Master sending task 28 to worker 29…
Worker 28 processing task 28…
Master sending task 29 to worker 30…
Worker 29 processing task 29…
Master sending task 30 to worker 31…
Worker 30 processing task 30…
Master received result 1 from worker 1
Master sending task 31 to worker 1…
Worker 31 processing task 31…
Master received result 4 from worker 2
Master sending task 32 to worker 2…
Master received result 9 from worker 3
Master sending task 33 to worker 3…
Master received result 16 from worker 4
Master sending task 34 to worker 4…
Master received result 25 from worker 5
Master sending task 35 to worker 5…
Master received result 36 from worker 6
Worker 5 processing task 36…
Worker 2 processing task 33…
Worker 4 processing task 35…
Master sending task 36 to worker 6…
Master received result 49 from worker 7
Master sending task 37 to worker 7…
Master received result 64 from worker 8
Master sending task 38 to worker 8…
Master received result 81 from worker 9
Worker 7 processing task 38…
Worker 6 processing task 37…
Worker 9 processing task 40…
Master sending task 39 to worker 9…
Master received result 100 from worker 10
Master sending task 40 to worker 10…
Master received result 121 from worker 11
Worker 10 processing task 41…
Worker 13 processing task 44…
Master sending task 41 to worker 11…
Master received result 144 from worker 12
Master sending task 42 to worker 12…
Master received result 169 from worker 13
Master sending task 43 to worker 13…
Master received result 196 from worker 14
Worker 12 processing task 43…
Master sending task 44 to worker 14…
Master received result 225 from worker 15
Master sending task 45 to worker 15…
Master received result 256 from worker 16
Master sending task 46 to worker 16…
Master received result 289 from worker 17
Master sending task 47 to worker 17…
Master received result 324 from worker 18
Worker 17 processing task 48…
Worker 16 processing task 47…
Worker 21 processing task 52…
Master sending task 48 to worker 18…
Master received result 361 from worker 19
Master sending task 49 to worker 19…
Master received result 400 from worker 20
Master sending task 50 to worker 20…
Master received result 441 from worker 21
Master sending task 51 to worker 21…
Master received result 484 from worker 22
Worker 20 processing task 51…
Worker 25 processing task 56…
Master sending task 52 to worker 22…
Master received result 529 from worker 23
Master sending task 53 to worker 23…
Master received result 576 from worker 24
Master sending task 54 to worker 24…
Master received result 625 from worker 25
Worker 24 processing task 55…
Master sending task 55 to worker 25…
Master received result 676 from worker 26
Master sending task 56 to worker 26…
Master received result 729 from worker 27
Worker 26 processing task 57…
Worker 27 processing task 58…
Worker 30 processing task 61…
Master sending task 57 to worker 27…
Master received result 784 from worker 28
Master sending task 58 to worker 28…
Master received result 841 from worker 29
Master sending task 59 to worker 29…
Worker 29 processing task 60…
Master received result 900 from worker 30
Master sending task 60 to worker 30…
Master received result 1024 from worker 1
Master sending task 61 to worker 1…
Master received result 961 from worker 31
Master sending task 62 to worker 31…
Master received result 1089 from worker 2
Worker 5 processing task 67…
Master sending task 63 to worker 2…
Master received result 1156 from worker 3
Worker 7 processing task 69…
Worker 2 processing task 64…
Worker 6 processing task 68…
Worker 4 processing task 66…
Worker 9 processing task 71…
Master sending task 64 to worker 3…
Master received result 1225 from worker 4
Master sending task 65 to worker 4…
Master received result 1296 from worker 5
Master sending task 66 to worker 5…
Master received result 1369 from worker 6
Master sending task 67 to worker 6…
Master received result 1444 from worker 7
Master sending task 68 to worker 7…
Master received result 1521 from worker 8
Worker 10 processing task 72…
Master sending task 69 to worker 8…
Master received result 1600 from worker 9
Master sending task 70 to worker 9…
Master received result 1681 from worker 10
Master sending task 71 to worker 10…
Master received result 1764 from worker 11
Worker 13 processing task 75…
Worker 16 processing task 78…
Master sending task 72 to worker 11…
Master received result 1849 from worker 12
Master sending task 73 to worker 12…
Worker 12 processing task 74…
Master received result 1936 from worker 13
Master sending task 74 to worker 13…
Master received result 2025 from worker 14
Master sending task 75 to worker 14…
Master received result 2116 from worker 15
Master sending task 76 to worker 15…
Worker 18 processing task 49…
Master received result 2209 from worker 16
Master sending task 77 to worker 16…
Master received result 2304 from worker 17
Worker 17 processing task 79…
Worker 18 processing task 80…
Worker 20 processing task 82…
Master sending task 78 to worker 17…
Master received result 2401 from worker 18
Master sending task 79 to worker 18…
Master received result 2500 from worker 19
Master sending task 80 to worker 19…
Master received result 2601 from worker 20
Master sending task 81 to worker 20…
Master received result 2704 from worker 21
Master sending task 82 to worker 21…
Worker 21 processing task 83…
Master received result 2809 from worker 22
Master sending task 83 to worker 22…
Master received result 2916 from worker 23
Master sending task 84 to worker 23…
Master received result 3025 from worker 24
Master sending task 85 to worker 24…
Master received result 3136 from worker 25
Worker 24 processing task 86…
Worker 25 processing task 87…
Master sending task 86 to worker 25…
Worker 27 processing task 90…
Master received result 3844 from worker 1
Master sending task 87 to worker 1…
Master received result 3249 from worker 26
Master sending task 88 to worker 26…
Master received result 3364 from worker 27
Master sending task 89 to worker 27…
Master received result 3481 from worker 28
Worker 26 processing task 89…
Master sending task 90 to worker 28…
Master received result 3600 from worker 29
Master sending task 91 to worker 29…
Master received result 3721 from worker 30
Master sending task 92 to worker 30…
Master received result 3969 from worker 31
Worker 29 processing task 92…
Worker 30 processing task 93…
Worker 15 processing task 46…
Worker 15 processing task 77…
Worker 5 processing task 98…
Master sending task 93 to worker 31…
Master received result 4096 from worker 2
Master sending task 94 to worker 2…
Master received result 4225 from worker 3
Master sending task 95 to worker 3…
Master received result 4356 from worker 4
Master sending task 96 to worker 4…
Master received result 4489 from worker 5
Worker 4 processing task 97…
Worker 2 processing task 95…
Worker 14 processing task 45…
Worker 14 processing task 76…
Worker 19 processing task 50…
Worker 19 processing task 81…
Master sending task 97 to worker 5…
Master received result 4624 from worker 6
Worker 6 processing task 99…
Worker 28 processing task 59…
Worker 28 processing task 91…
Master sending task 98 to worker 6…
Master received result 4761 from worker 7
Worker 7 processing task 100…
Worker 8 processing task 39…
Worker 8 processing task 70…
Worker 8 received termination signal. Exiting.
Master sending task 99 to worker 7…
Master received result 4900 from worker 8
Master received result 5041 from worker 9
Master received result 5184 from worker 10
Worker 9 received termination signal. Exiting.
Master received result 5329 from worker 11
Master received result 5476 from worker 12
Worker 11 processing task 42…
Worker 11 processing task 73…
Worker 11 received termination signal. Exiting.
Worker 10 received termination signal. Exiting.
Worker 13 received termination signal. Exiting.
Worker 14 received termination signal. Exiting.
Worker 17 received termination signal. Exiting.
Master received result 5625 from worker 13
Master received result 5776 from worker 14
Worker 15 received termination signal. Exiting.
Master received result 5929 from worker 15
Master received result 6084 from worker 16
Master received result 6241 from worker 17
Master received result 7744 from worker 1
Worker 12 received termination signal. Exiting.
Worker 18 received termination signal. Exiting.
Worker 16 received termination signal. Exiting.
Worker 19 received termination signal. Exiting.
Worker 21 received termination signal. Exiting.
Master received result 6400 from worker 18
Master received result 6561 from worker 19
Worker 20 received termination signal. Exiting.
Worker 1 processing task 32…
Worker 1 processing task 62…
Worker 1 processing task 88…
Worker 1 received termination signal. Exiting.
Worker 22 processing task 53…
Worker 22 processing task 84…
Worker 22 received termination signal. Exiting.
Worker 3 processing task 34…
Worker 3 processing task 65…
Worker 3 processing task 96…
Master received result 6724 from worker 20
Master received result 6889 from worker 21
Master received result 7056 from worker 22
Master received result 7225 from worker 23
Worker 23 processing task 54…
Worker 23 processing task 85…
Worker 23 received termination signal. Exiting.
Worker 25 received termination signal. Exiting.
Master received result 7396 from worker 24
Master received result 7569 from worker 25
Master received result 7921 from worker 26
Master received result 8100 from worker 27
Master received result 8281 from worker 28
Master received result 8464 from worker 29
Master received result 8649 from worker 30
Master received result 8836 from worker 31
Worker 29 received termination signal. Exiting.
Worker 27 received termination signal. Exiting.
Worker 24 received termination signal. Exiting.
Worker 26 received termination signal. Exiting.
Worker 30 received termination signal. Exiting.
Worker 28 received termination signal. Exiting.
Worker 31 processing task 63…
Worker 31 processing task 94…
Worker 31 received termination signal. Exiting.
Worker 5 received termination signal. Exiting.
Master received result 9025 from worker 2
Master received result 9216 from worker 3
Master received result 9604 from worker 5
Master received result 9409 from worker 4
Master received result 9801 from worker 6
Master received result 10000 from worker 7
Worker 7 received termination signal. Exiting.
Worker 2 received termination signal. Exiting.
Worker 6 received termination signal. Exiting.
Worker 3 received termination signal. Exiting.
Worker 4 received termination signal. Exiting.
Simple Master/Worker Program that Passes a Structure
The following modified program illustrates a slightly more complex situation where a structure containing multiple data items is passed from the MASTER to the WORKER processes. The key difference is that we must define a structure type to act as the buffer to send to the WORKER processes.
// This is the structure that contains all the data to send to the worker
struct TaskData {
int task_id;
double task_data;
};
Then we create an MPI type descriptor that describes our structure to the MPI environment and commit it so that it can be used in sends and receives.
MPI_Datatype MPI_Task_Descriptor;
int block_lengths[2] = {1, 1};                 // 1 int, 1 double
MPI_Aint offsets[2];                           // TaskData member offsets
offsets[0] = offsetof(TaskData, task_id);
offsets[1] = offsetof(TaskData, task_data);
MPI_Datatype types[2] = {MPI_INT, MPI_DOUBLE}; // MPI types of our members
// Fill in the MPI_Task_Descriptor and register it with the MPI environment
MPI_Type_create_struct(2, block_lengths, offsets, types, &MPI_Task_Descriptor);
MPI_Type_commit(&MPI_Task_Descriptor);
And finally we use these two components to send data of type TaskData to the WORKER processes.
MPI_Send(&tasks[task_index], 1, MPI_Task_Descriptor, i, TAG_TASK, MPI_COMM_WORLD);
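On the receiving side, the WORKER passes the same descriptor to MPI_Recv() so the whole structure arrives intact; the computed result travels back as a plain double, as in the complete listing below:
TaskData task;
MPI_Status status;
// Receive a whole TaskData structure (or a TERMINATE message) from the master
MPI_Recv(&task, 1, MPI_Task_Descriptor, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
// The result is a single double, so the built-in MPI_DOUBLE type is sufficient
double result = process_task(task);
MPI_Send(&result, 1, MPI_DOUBLE, MASTER, TAG_RESULT, MPI_COMM_WORLD);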
mpi_master_worker.cpp
/**
 * @file mpi_master_worker.cpp
 * @brief Demonstrates a simple Open MPI program that passes a structure between the
 * master and workers.
 *
 * This demonstration contains an example of how to implement an Open MPI program
 * in C++. The master process creates a work queue of structs that it uses to
 * send to the workers. The workers square the double passed in the struct and send
 * that back to the master.
 *
 * Compile with command: mpic++ mpi_master_worker.cpp -o mpi_master_worker
 * Run with command: mpirun --hostfile OpenMPI/nodes.txt OpenMPI/mpi_master_worker
 *
 * @author Richard Lesh
 * @date 2025-03-04
 * @version 1.0
 */
#include <algorithm>
#include <chrono>   // For chrono::milliseconds
#include <cstddef>  // For offsetof()
#include <cstring>  // For memset()
#include <iostream>
#include <math.h>   // For M_PI
#include <mpi.h>
#include <string>
#include <thread>   // For this_thread::sleep_for()
#include <vector>
#include <unistd.h> // For gethostname()

using namespace std;

// This is the structure that contains all the data to send to the worker
struct TaskData {
    int task_id;
    double task_data;
};

/**
 * @brief Task processing function.
 *
 * This function implements the processing for each worker node. In this
 * trivial example, the double sent from the master is squared and returned.
 * Will also pause for 500 milliseconds to simulate a complex processing task.
 *
 * @param task The task structure sent from the master.
 * @return double Returns the square of the double data in the task.
 */
double process_task(TaskData &task) {
    this_thread::sleep_for(chrono::milliseconds(500));
    return task.task_data * task.task_data;
}

/**
 * @brief Function to get the host's name.
 *
 * This function returns the hostname.
 *
 * @return string Returns the hostname of the host if possible, otherwise it returns
 * the empty string.
 */
string get_hostname() {
    char hostname[256];                    // Buffer to store the hostname
    memset(hostname, 0, sizeof(hostname)); // Clear the buffer
    // Get the hostname
    if (gethostname(hostname, sizeof(hostname)) == 0) {
        return string(hostname);
    } else {
        cerr << "Error getting hostname" << endl;
        return string();
    }
}

/**
 * @brief Main function.
 *
 * The entry point of the program. It handles forking between the master
 * and worker functionality.
 *
 * @return int Returns 0 on successful execution.
 */
int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv); // Initialize MPI

    // Get the rank and size of the process environment.
    // rank is an integer assigned to each process created where rank 0 is
    // the master process and ranks 1, 2, 3... are worker processes.
    // The size is the total number of processes running on the cluster.
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get the rank of the process
    MPI_Comm_size(MPI_COMM_WORLD, &size); // Get the total number of processes

    const int MASTER = 0;        // Rank of the MASTER process
    const int TAG_TASK = 1;      // Code for a TASK message
    const int TAG_RESULT = 2;    // Code for a RESULT message
    const int TAG_TERMINATE = 3; // Code for a TERMINATE message

    // Define the custom MPI datatype to pass to the workers
    // It will have two members: an int for the sequential task id and a double
    MPI_Datatype MPI_Task_Descriptor;
    int block_lengths[2] = {1, 1};                 // 1 int, 1 double
    MPI_Aint offsets[2];                           // TaskData member offsets
    offsets[0] = offsetof(TaskData, task_id);
    offsets[1] = offsetof(TaskData, task_data);
    MPI_Datatype types[2] = {MPI_INT, MPI_DOUBLE}; // MPI types of our members
    // Fill in the MPI_Task_Descriptor
    MPI_Type_create_struct(2, block_lengths, offsets, types, &MPI_Task_Descriptor);
    MPI_Type_commit(&MPI_Task_Descriptor); // Tell MPI about the task type

    if (rank == MASTER) { // Code for the single MASTER process
        int num_tasks = 100;    // Number of tasks to distribute
        vector<TaskData> tasks; // Setup the task list and fill it
        for (int i = 0; i < num_tasks; ++i) { // Generate tasks
            TaskData task;
            task.task_id = i;
            task.task_data = i * M_PI;
            tasks.push_back(task);
        }

        int num_workers = size - 1;       // Number of worker processes
        int task_index = 0;               // Index into the task list
        int active_workers = num_workers; // Number of workers that are running

        cout << "Master distributing tasks to " << num_workers << " workers..." << endl;

        // Distribute initial tasks to workers
        for (int i = 1; i <= min(num_workers, num_tasks); ++i) {
            // MPI_Send() is used to send data to workers
            // int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
            // buf is the address of the data to send
            // count is the count of items in the buffer
            // datatype is the type of the items in the buffer, in this case we use our MPI_Task_Descriptor
            // dest is the rank of the worker process to send the message to
            // tag is the code that identifies the type of the message. Used to distinguish different messages
            // comm is the communicator that defines the group of processes in which the message is sent,
            // typically MPI_COMM_WORLD
            cout << "Master sending task " << task_index << " to worker " << i << "..." << endl;
            MPI_Send(&tasks[task_index], 1, MPI_Task_Descriptor, i, TAG_TASK, MPI_COMM_WORLD);
            ++task_index;
        }

        // Terminate unused workers in the case the number of workers is more than available tasks
        if (num_workers > num_tasks) {
            for (int i = num_tasks + 1; i <= num_workers; ++i) {
                int dummy = 0;
                MPI_Send(&dummy, 0, MPI_INT, i, TAG_TERMINATE, MPI_COMM_WORLD);
                --active_workers;
            }
        }

        // Collect results and assign remaining tasks
        // Loop while there are tasks left to compute or if there are active workers
        while (task_index < num_tasks || active_workers > 0) {
            double result;     // Buffer for the result from the worker
            MPI_Status status; // Status from the worker
            // MPI_Recv() is used to receive data messages from the worker
            // int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm,
            //              MPI_Status *status);
            // buf is the address for the incoming data
            // count is the count of items the buffer holds
            // datatype is the type of the items in the buffer, i.e. MPI_INT, MPI_FLOAT, MPI_DOUBLE, MPI_CHAR
            // source is the rank of the sending process. Use MPI_ANY_SOURCE to accept messages from any process.
            // comm is the communicator that defines the group of processes from which the message is received,
            // typically MPI_COMM_WORLD
            // status is a pointer to an MPI_Status structure that provides information about the received message
            // (e.g., sender rank, message size, and tag)
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &status);
            int worker_rank = status.MPI_SOURCE; // Get the sender rank
            cout << "Master received result " << result << " from worker " << worker_rank << endl;

            // Send a new task to the worker, or terminate if no tasks are left
            if (task_index < num_tasks) {
                // Send a new task to the worker if there are tasks left
                cout << "Master sending task " << task_index << " to worker " << worker_rank << "..." << endl;
                MPI_Send(&tasks[task_index], 1, MPI_Task_Descriptor, worker_rank, TAG_TASK, MPI_COMM_WORLD);
                ++task_index;
            } else {
                // No more tasks; terminate the worker
                int dummy = 0;
                MPI_Send(&dummy, 1, MPI_INT, worker_rank, TAG_TERMINATE, MPI_COMM_WORLD);
                --active_workers;
            }
        }
    } else { // Code for the multiple WORKER processes
        string hostname = get_hostname();
        while (true) { // Loop until we get the terminate message
            TaskData task;
            MPI_Status status;
            // Receive a task or termination signal from the master
            // Note this MPI_Recv() uses MPI_ANY_TAG to accept any type of message from the master
            MPI_Recv(&task, 1, MPI_Task_Descriptor, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            // If it is a TERMINATE type message, then exit
            if (status.MPI_TAG == TAG_TERMINATE) {
                cout << "Worker " << rank << "(" << hostname << ") received termination signal. Exiting." << endl;
                break;
            }
            // Otherwise it is a task message so process the task
            cout << "Worker " << rank << "(" << hostname << ") processing task " << task.task_id << "..." << endl;
            double result = process_task(task);
            // Send the result back to the master
            MPI_Send(&result, 1, MPI_DOUBLE, MASTER, TAG_RESULT, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize(); // Finalize MPI
    return 0;
}
Place the mpi_master_worker.cpp in your OpenMPI folder on your file server that all the nodes share. Then log on to your master node and compile the program. You must compile with the MPI C++ compiler executable "mpic++" which is a wrapper around the GCC compiler with the necessary MPI libraries supplied.
rich@balrog1:~ $ pushd OpenMPI
rich@balrog1:~/OpenMPI $ mpic++ mpi_master_worker.cpp -o mpi_master_worker
rich@balrog1:~/OpenMPI $ popd
Then you can run the mpi_master_worker program on your cluster using the "mpirun" command.
rich@balrog1:~ $ mpirun --hostfile OpenMPI/nodes.txt OpenMPI/mpi_master_worker
Example Output
rich@balrog1:~ $ mpirun --hostfile OpenMPI/nodes.txt OpenMPI/mpi_master_worker
Master distributing tasks to 31 workers…
Master sending task 0 to worker 1…
Master sending task 1 to worker 2…
Master sending task 2 to worker 3…
Master sending task 3 to worker 4…
Worker 1(balrog1) processing task 0…
Worker 3(balrog1) processing task 2…
Worker 2(balrog1) processing task 1…
Master sending task 4 to worker 5…
Worker 4(balrog2) processing task 3…
Master sending task 5 to worker 6…
Worker 5(balrog2) processing task 4…
Master sending task 6 to worker 7…
Worker 6(balrog2) processing task 5…
Master sending task 7 to worker 8…
Worker 7(balrog2) processing task 6…
Master sending task 8 to worker 9…
Worker 8(balrog3) processing task 7…
Master sending task 9 to worker 10…
Worker 9(balrog3) processing task 8…
Master sending task 10 to worker 11…
Worker 10(balrog3) processing task 9…
Master sending task 11 to worker 12…
Worker 11(balrog3) processing task 10…
Master sending task 12 to worker 13…
Worker 12(balrog4) processing task 11…
Master sending task 13 to worker 14…
Worker 13(balrog4) processing task 12…
Master sending task 14 to worker 15…
Worker 14(balrog4) processing task 13…
Master sending task 15 to worker 16…
Worker 15(balrog4) processing task 14…
Master sending task 16 to worker 17…
Worker 16(balrog5) processing task 15…
Master sending task 17 to worker 18…
Worker 17(balrog5) processing task 16…
Master sending task 18 to worker 19…
Worker 18(balrog5) processing task 17…
Master sending task 19 to worker 20…
Worker 19(balrog5) processing task 18…
Master sending task 20 to worker 21…
Worker 20(balrog6) processing task 19…
Master sending task 21 to worker 22…
Worker 21(balrog6) processing task 20…
Master sending task 22 to worker 23…
Worker 22(balrog6) processing task 21…
Master sending task 23 to worker 24…
Worker 23(balrog6) processing task 22…
Master sending task 24 to worker 25…
Worker 24(balrog7) processing task 23…
Master sending task 25 to worker 26…
Worker 25(balrog7) processing task 24…
Master sending task 26 to worker 27…
Worker 26(balrog7) processing task 25…
Master sending task 27 to worker 28…
Worker 27(balrog7) processing task 26…
Master sending task 28 to worker 29…
Worker 28(balrog8) processing task 27…
Master sending task 29 to worker 30…
Worker 29(balrog8) processing task 28…
Master sending task 30 to worker 31…
Worker 30(balrog8) processing task 29…
Worker 31(balrog8) processing task 30…
Master received result 0 from worker 1
Worker 1(balrog1) processing task 31…
Worker 2(balrog1) processing task 32…
Worker 3(balrog1) processing task 33…
Master sending task 31 to worker 1…
Master received result 9.8696 from worker 2
Master sending task 32 to worker 2…
Master received result 39.4784 from worker 3
Master sending task 33 to worker 3…
Master received result 88.8264 from worker 4
Master sending task 34 to worker 4…
Worker 4(balrog2) processing task 34…
Master received result 157.914 from worker 5
Master sending task 35 to worker 5…
Worker 5(balrog2) processing task 35…
Master received result 246.74 from worker 6
Master sending task 36 to worker 6…
Worker 6(balrog2) processing task 36…
Master received result 355.306 from worker 7
Master sending task 37 to worker 7…
Worker 7(balrog2) processing task 37…
Master received result 483.611 from worker 8
Master sending task 38 to worker 8…
Worker 8(balrog3) processing task 38…
Master received result 631.655 from worker 9
Master sending task 39 to worker 9…
Worker 9(balrog3) processing task 39…
Master received result 799.438 from worker 10
Master sending task 40 to worker 10…
Worker 10(balrog3) processing task 40…
Master received result 986.96 from worker 11
Master sending task 41 to worker 11…
Worker 11(balrog3) processing task 41…
Master received result 1194.22 from worker 12
Master sending task 42 to worker 12…
Worker 12(balrog4) processing task 42…
Master received result 1421.22 from worker 13
Master sending task 43 to worker 13…
Worker 13(balrog4) processing task 43…
Master received result 1667.96 from worker 14
Master sending task 44 to worker 14…
Worker 14(balrog4) processing task 44…
Master received result 1934.44 from worker 15
Master sending task 45 to worker 15…
Worker 15(balrog4) processing task 45…
Master received result 2220.66 from worker 16
Master sending task 46 to worker 16…
Worker 16(balrog5) processing task 46…
Master received result 2526.62 from worker 17
Master sending task 47 to worker 17…
Worker 17(balrog5) processing task 47…
Master received result 2852.32 from worker 18
Master sending task 48 to worker 18…
Worker 18(balrog5) processing task 48…
Master received result 3197.75 from worker 19
Master sending task 49 to worker 19…
Worker 19(balrog5) processing task 49…
Master received result 3562.93 from worker 20
Master sending task 50 to worker 20…
Worker 20(balrog6) processing task 50…
Master received result 3947.84 from worker 21
Master sending task 51 to worker 21…
Worker 21(balrog6) processing task 51…
Worker 22(balrog6) processing task 52…
Master received result 4352.5 from worker 22
Master sending task 52 to worker 22…
Worker 23(balrog6) processing task 53…
Master received result 4776.89 from worker 23
Master sending task 53 to worker 23…
Master received result 5221.02 from worker 24
Worker 24(balrog7) processing task 54…
Master sending task 54 to worker 24…
Master received result 5684.89 from worker 25
Master sending task 55 to worker 25…
Worker 25(balrog7) processing task 55…
Master received result 6168.5 from worker 26
Master sending task 56 to worker 26…
Worker 26(balrog7) processing task 56…
Master received result 6671.85 from worker 27
Master sending task 57 to worker 27…
Worker 27(balrog7) processing task 57…
Master received result 7194.94 from worker 28
Master sending task 58 to worker 28…
Worker 28(balrog8) processing task 58…
Master received result 7737.77 from worker 29
Master sending task 59 to worker 29…
Worker 29(balrog8) processing task 59…
Master received result 8300.34 from worker 30
Master sending task 60 to worker 30…
Worker 30(balrog8) processing task 60…
Master received result 8882.64 from worker 31
Master sending task 61 to worker 31…
Worker 31(balrog8) processing task 61…
Master received result 9484.69 from worker 1
Master sending task 62 to worker 1…
Master received result 10106.5 from worker 2
Master sending task 63 to worker 2…
Worker 1(balrog1) processing task 62…
Worker 2(balrog1) processing task 63…
Master received result 10748 from worker 3
Master sending task 64 to worker 3…
Worker 3(balrog1) processing task 64…
Master received result 11409.3 from worker 4
Master sending task 65 to worker 4…
Worker 4(balrog2) processing task 65…
Master received result 12090.3 from worker 5
Master sending task 66 to worker 5…
Worker 5(balrog2) processing task 66…
Master received result 12791 from worker 6
Master sending task 67 to worker 6…
Worker 6(balrog2) processing task 67…
Master received result 13511.5 from worker 7
Master sending task 68 to worker 7…
Worker 7(balrog2) processing task 68…
Master received result 14251.7 from worker 8
Master sending task 69 to worker 8…
Worker 8(balrog3) processing task 69…
Master received result 15011.7 from worker 9
Master sending task 70 to worker 9…
Worker 9(balrog3) processing task 70…
Master received result 15791.4 from worker 10
Master sending task 71 to worker 10…
Worker 10(balrog3) processing task 71…
Master received result 16590.8 from worker 11
Master sending task 72 to worker 11…
Worker 11(balrog3) processing task 72…
Master received result 17410 from worker 12
Master sending task 73 to worker 12…
Worker 12(balrog4) processing task 73…
Master received result 18248.9 from worker 13
Master sending task 74 to worker 13…
Worker 13(balrog4) processing task 74…
Master received result 19107.6 from worker 14
Master sending task 75 to worker 14…
Worker 14(balrog4) processing task 75…
Master received result 19985.9 from worker 15
Master sending task 76 to worker 15…
Worker 15(balrog4) processing task 76…
Master received result 20884.1 from worker 16
Master sending task 77 to worker 16…
Worker 16(balrog5) processing task 77…
Master received result 21802 from worker 17
Master sending task 78 to worker 17…
Worker 17(balrog5) processing task 78…
Master received result 22739.6 from worker 18
Master sending task 79 to worker 18…
Worker 18(balrog5) processing task 79…
Master received result 23696.9 from worker 19
Master sending task 80 to worker 19…
Worker 19(balrog5) processing task 80…
Master received result 24674 from worker 20
Master sending task 81 to worker 20…
Worker 20(balrog6) processing task 81…
Master received result 25670.8 from worker 21
Master sending task 82 to worker 21…
Worker 21(balrog6) processing task 82…
Master received result 26687.4 from worker 22
Master sending task 83 to worker 22…
Worker 22(balrog6) processing task 83…
Master received result 27723.7 from worker 23
Master sending task 84 to worker 23…
Worker 23(balrog6) processing task 84…
Master received result 28779.8 from worker 24
Master sending task 85 to worker 24…
Worker 24(balrog7) processing task 85…
Master received result 29855.6 from worker 25
Master sending task 86 to worker 25…
Worker 25(balrog7) processing task 86…
Master received result 30951.1 from worker 26
Master sending task 87 to worker 26…
Worker 26(balrog7) processing task 87…
Master received result 32066.3 from worker 27
Master sending task 88 to worker 27…
Worker 27(balrog7) processing task 88…
Master received result 33201.3 from worker 28
Master sending task 89 to worker 28…
Worker 28(balrog8) processing task 89…
Master received result 34356.1 from worker 29
Master sending task 90 to worker 29…
Worker 29(balrog8) processing task 90…
Master received result 35530.6 from worker 30
Master sending task 91 to worker 30…
Worker 30(balrog8) processing task 91…
Master received result 36724.8 from worker 31
Master sending task 92 to worker 31…
Worker 31(balrog8) processing task 92…
Master received result 37938.8 from worker 1
Master sending task 93 to worker 1…
Worker 1(balrog1) processing task 93…
Master received result 39172.5 from worker 2
Master sending task 94 to worker 2…
Worker 2(balrog1) processing task 94…
Master received result 40425.9 from worker 3
Master sending task 95 to worker 3…
Worker 3(balrog1) processing task 95…
Master received result 41699.1 from worker 4
Master sending task 96 to worker 4…
Worker 4(balrog2) processing task 96…
Master received result 42992 from worker 5
Master sending task 97 to worker 5…
Worker 5(balrog2) processing task 97…
Master received result 44304.7 from worker 6
Master sending task 98 to worker 6…
Worker 6(balrog2) processing task 98…
Master received result 45637.1 from worker 7
Master sending task 99 to worker 7…
Worker 7(balrog2) processing task 99…
Master received result 46989.2 from worker 8
Worker 8(balrog3) received termination signal. Exiting.
Master received result 48361.1 from worker 9
Worker 9(balrog3) received termination signal. Exiting.
Master received result 49752.7 from worker 10
Worker 10(balrog3) received termination signal. Exiting.
Master received result 51164 from worker 11
Worker 11(balrog3) received termination signal. Exiting.
Master received result 52595.1 from worker 12
Worker 12(balrog4) received termination signal. Exiting.
Master received result 54046 from worker 13
Worker 13(balrog4) received termination signal. Exiting.
Master received result 55516.5 from worker 14
Worker 14(balrog4) received termination signal. Exiting.
Master received result 57006.8 from worker 15
Worker 15(balrog4) received termination signal. Exiting.
Master received result 58516.9 from worker 16
Worker 16(balrog5) received termination signal. Exiting.
Master received result 60046.7 from worker 17
Worker 17(balrog5) received termination signal. Exiting.
Master received result 61596.2 from worker 18
Worker 18(balrog5) received termination signal. Exiting.
Master received result 63165.5 from worker 19
Worker 19(balrog5) received termination signal. Exiting.
Master received result 64754.5 from worker 20
Worker 20(balrog6) received termination signal. Exiting.
Master received result 66363.2 from worker 21
Worker 21(balrog6) received termination signal. Exiting.
Master received result 67991.7 from worker 22
Worker 22(balrog6) received termination signal. Exiting.
Master received result 69639.9 from worker 23
Worker 23(balrog6) received termination signal. Exiting.
Master received result 71307.9 from worker 24
Worker 24(balrog7) received termination signal. Exiting.
Master received result 72995.6 from worker 25
Worker 25(balrog7) received termination signal. Exiting.
Master received result 74703 from worker 26
Worker 26(balrog7) received termination signal. Exiting.
Master received result 76430.2 from worker 27
Worker 27(balrog7) received termination signal. Exiting.
Master received result 78177.1 from worker 28
Worker 28(balrog8) received termination signal. Exiting.
Master received result 79943.8 from worker 29
Worker 29(balrog8) received termination signal. Exiting.
Master received result 81730.2 from worker 30
Worker 30(balrog8) received termination signal. Exiting.
Master received result 83536.3 from worker 31
Worker 31(balrog8) received termination signal. Exiting.
Master received result 85362.2 from worker 1
Master received result 87207.8 from worker 2
Worker 2(balrog1) received termination signal. Exiting.
Worker 1(balrog1) received termination signal. Exiting.
Master received result 89073.2 from worker 3
Worker 3(balrog1) received termination signal. Exiting.
Master received result 90958.3 from worker 4
Worker 4(balrog2) received termination signal. Exiting.
Master received result 92863.1 from worker 5
Worker 5(balrog2) received termination signal. Exiting.
Master received result 94787.7 from worker 6
Worker 6(balrog2) received termination signal. Exiting.
Master received result 96732 from worker 7
Worker 7(balrog2) received termination signal. Exiting.
Resources
Open MPI: Open Source High Performance Computing
Raspberry Pi 5 Cluster