The Message Passing Interface (MPI) is an Application Programming Interface that defines a model of parallel computing in which each parallel process has its own local memory, and data must be explicitly shared by passing messages between processes. Using MPI allows programs to scale beyond the processors and shared memory of a single compute server to the distributed memory and processors of multiple compute servers working together.
An MPI parallel code requires some changes from serial code: MPI function calls are added to communicate data, and the data must be divided across processes in some way.
Hello World Example
Here is an example MPI program called mpihello.c:
#include <mpi.h>   //line 1
#include <stdio.h> //line 2
int main( int argc, char **argv ) { //line 3
    int rank, size; //line 4
    MPI_Init( &argc, &argv ); //line 5
    MPI_Comm_rank( MPI_COMM_WORLD, &rank ); //line 6
    MPI_Comm_size( MPI_COMM_WORLD, &size ); //line 7
    printf( "Hello from process %d/%d\n", rank, size ); //line 8
    MPI_Finalize( ); //line 9
    return 0; //line 10
}
This hello world program has a considerable amount added to it compared to the standard serial C example. There are a number of things to point out:
- line 1: We include the MPI header here to have access to the various MPI functions.
- line 5: Here we initialize the MPI execution environment. This must be done at the start of the program.
- line 6: Each MPI process is assigned a unique integer ID, starting at 0, called its rank. This function retrieves the calling process's rank and stores it in the local variable rank.
- line 7: This function retrieves the total number of MPI processes running and stores it in the local variable size.
- line 8: Here we print hello world with the environment information we gathered in the previous lines.
- line 9: Here we finalize the MPI execution environment. This must be done before the end of the program.
Running an MPI Program
To compile and run the above mpihello.c program:
wget https://carleton.ca/rcs/wp-content/uploads/mpihello.c
mpicc -o mpihello mpihello.c
mpirun -np 4 ./mpihello
You should notice a few major differences between compiling and running this program and a standard hello world C program. First, we need to use mpicc to compile. Second, we must use mpirun to run our program. The most important parameter that mpirun needs is -np (the number of processes). In this case -np is 4, so four MPI processes will be run.
Message Passing Example
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if ( rank == 0 ) {
        /* Rank 0 sends an integer to each of the other process ranks */
        int i;
        int value = 0;
        for (i = 1; i < size; i++) {
            value = value + i;
            MPI_Send(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
    }
    else {
        /* All other process ranks receive one number from Rank 0 */
        int value;
        MPI_Status status;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank %d received value %d\n", rank, value);
    }

    MPI_Finalize();
    return 0;
}
To compile and run the above mpiexample.c program:
wget https://carleton.ca/rcs/wp-content/uploads/mpiexample.c
mpicc -o mpiexample mpiexample.c
mpirun -np 4 ./mpiexample
NOTE: this program will work for any number of processes greater than or equal to 2.
In this example the process with rank 0 sends an integer to each of the other processes, which then print the value they receive to the console. We use MPI_Comm_rank to determine each process's role in the program and MPI_Comm_size to determine how many messages need to be sent. This example uses what are called “blocking” sends and receives. Other communication operations are discussed in the following sections.
Measuring Running Time
It is often desirable to measure the running time of your program. To do this in an MPI program you can use the function MPI_Wtime(). This function returns a floating-point number of seconds, representing elapsed wall-clock time since some time in the past. Here is an example of how one would use this function to measure the running time of a given piece of code:
{
    double starttime, endtime;

    starttime = MPI_Wtime();
    .... code to be timed ...
    endtime = MPI_Wtime();

    printf("Total run time: %f seconds\n", endtime - starttime);
}
MPI Communication Functions
Blocking Communication Functions
Blocking communication routines are routines where the completion of the call depends on certain “events”: for sends, the data must be successfully sent or safely copied to system buffer space; for receives, the data must be safely stored in the receive buffer. Generally speaking, this should be the go-to option for direct communication (the Message Passing Example above uses blocking sends and receives). The functions are as follows:
- MPI_Send(void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
- MPI_Recv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
Non-blocking Communication Functions
A communication routine is non-blocking if the call returns without waiting for the communication to complete. It is the programmer’s responsibility to ensure that the buffer is free for reuse.
These functions are primarily used to increase performance by overlapping computation with communication. It is recommended to first get your program working using blocking communication before attempting to use non-blocking functions; a minimal sketch of a non-blocking exchange follows the function list below. The functions are as follows:
- MPI_Isend(void* buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
- MPI_Irecv(void* buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
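The following sketch (not part of the original examples; the busy-work loop is just a placeholder for real computation) shows how the functions above might be used: rank 0 starts a send to rank 1 with MPI_Isend, rank 1 posts a matching MPI_Irecv, both ranks do unrelated work while the message is in flight, and MPI_Wait completes the transfer.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    int value = 42;          /* buffer being sent (rank 0) or filled (rank 1) */
    double busywork = 0.0;   /* placeholder for computation overlapped with communication */
    MPI_Request request;
    MPI_Status  status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* This sketch assumes at least 2 processes; extra ranks do nothing. */
    if (size >= 2 && rank <= 1) {
        if (rank == 0) {
            /* Start the send; the call returns immediately. */
            MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
        }
        else {
            /* Start the receive; the call returns immediately. */
            MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
        }

        /* Do work that does not touch 'value' while the message is in flight. */
        for (i = 0; i < 1000000; i++)
            busywork += i * 0.5;

        /* Wait until the communication has completed; only now is it safe
           to reuse (rank 0) or read (rank 1) the buffer. */
        MPI_Wait(&request, &status);

        if (rank == 1)
            printf("Rank 1 received value %d (busywork=%f)\n", value, busywork);
    }

    MPI_Finalize();
    return 0;
}
Note that between MPI_Isend/MPI_Irecv and the matching MPI_Wait, the program must not reuse (for sends) or read (for receives) the buffer involved in the transfer.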
Collective Communication Operations
There are three categories of functions that fall under collective communication operations. They are:
- Synchronization – processes wait until all members of the group have reached the synchronization point. The function used to do this is:
- MPI_Barrier (comm)
This causes each process, when reaching the MPI_Barrier call, to block until all tasks in the group reach the same MPI_Barrier call.
- Data movement – The functions that fall into this category are as follows:
- MPI_Bcast: Broadcast sends a message from the process with rank “root” to all other processes in the group.
- MPI_Scatter: The scatter operation performs a one-to-all communication. It splits the message into n equal segments, with the ith segment sent to the ith process in the group.
- MPI_Gather: This function is the logical opposite of MPI_Scatter; it performs an all-to-one communication where a total of n data items are collected at a single process from all of the other processes.
- MPI_Alltoall: This function is an all-to-all version of MPI_Scatter, where every process sends and receives n data segments.
- MPI_Alltoallv: This is a generalization of MPI_Alltoall where each process sends/receives a customizable amount of data to/from each process.
- Collective computation (reductions) – one member of the group collects data from the other members and performs an operation (min, max, add, multiply, etc.) on that data. The function to do this is:
- MPI_Reduce: One of the parameters of this function is the operation to be performed. Examples of this are MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, etc. A short sketch combining a few of these collective operations appears after this list.
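To illustrate the collective operations above, here is a minimal sketch (not part of the original examples; the data values and the factor of 10 are arbitrary) that broadcasts a value with MPI_Bcast, hands each process one integer with MPI_Scatter, sums the local results onto rank 0 with MPI_Reduce, and uses MPI_Barrier before rank 0 prints the total.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    int factor = 0;        /* value broadcast from rank 0 to every process     */
    int myvalue = 0;       /* the single element this process receives         */
    int partial = 0;       /* local result computed from myvalue               */
    int total = 0;         /* sum of all partial results, collected on rank 0  */
    int *senddata = NULL;  /* scatter source array, allocated only on rank 0   */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Broadcast: rank 0 chooses a factor and MPI_Bcast copies it to all ranks. */
    if (rank == 0)
        factor = 10;
    MPI_Bcast(&factor, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Scatter: rank 0 builds one integer per process; MPI_Scatter hands
       each process (including rank 0) its own element. */
    if (rank == 0) {
        senddata = malloc(size * sizeof(int));
        for (i = 0; i < size; i++)
            senddata[i] = i;    /* element i goes to rank i */
    }
    MPI_Scatter(senddata, 1, MPI_INT, &myvalue, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each process does a little local computation on its element. */
    partial = myvalue * factor;

    /* Reduce: sum the partial results from every rank onto rank 0. */
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Barrier: make every process wait here before rank 0 reports the result. */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
        printf("Sum of scaled elements across %d processes: %d\n", size, total);
        free(senddata);
    }

    MPI_Finalize();
    return 0;
}
Run with mpirun -np 4, rank 0 should print 10 * (0 + 1 + 2 + 3) = 60.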
For more information on these functions (including the necessary parameters), other available MPI functions, and other general MPI questions, please refer to the official OpenMPI documentation or get in touch with someone in RCS.