Co-design at POP CoE project

MPI endpoint contention

MPI processes often have to communicate with a list of neighbours. Depending on the order of send and receive calls, many processes may become “synchronized”, all trying to send to the same destination at the same time. The limited incoming bandwidth at that destination then becomes the bottleneck for the overall communication performance.

The pattern arises in the code structure sketched below. This way of programming communication is typical of many codes.

rank_id_t neighbors[N]; // ordered list of neighbors of this rank

for (int i=0; i < N; i++) {
   send(neighbors[i]);
}

Here, neighbors is the list of neighbors of the process and N is the number of neighbors. The list is typically ordered from lower to higher rank, so every process sends its first message to its lowest-ranked neighbor. As a result, all neighbors of rank 0 send their first message to it at the same time, overloading its receive bandwidth.
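To make the effect concrete, the following is a minimal sketch that counts how many messages converge on one rank in a given send step. It is a model, not MPI code, and it rests on assumptions that are not in the original text: every rank has all other ranks as neighbors, each rank walks its list in ascending rank order, and all ranks advance one send per step in lockstep. The helper name max_inbound_at_step is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper (not an MPI call): models `nranks` ranks that each
 * send to every other rank in ascending rank order, one message per step.
 * Returns the largest number of messages that any single rank receives
 * in step `step`, or -1 if the arguments are invalid. */
int max_inbound_at_step(int nranks, int step)
{
    /* Each rank has nranks - 1 neighbors, so valid steps are 0..nranks-2. */
    if (nranks < 2 || step < 0 || step > nranks - 2)
        return -1;

    int *inbound = calloc((size_t)nranks, sizeof *inbound);
    if (inbound == NULL)
        return -1;

    for (int rank = 0; rank < nranks; rank++) {
        /* The sorted list of `rank` is 0..rank-1, rank+1..nranks-1,
         * so entry `step` skips over the rank itself. */
        int dest = (step < rank) ? step : step + 1;
        inbound[dest]++;
    }

    /* Find the most heavily targeted destination in this step. */
    int worst = 0;
    for (int r = 0; r < nranks; r++)
        if (inbound[r] > worst)
            worst = inbound[r];

    free(inbound);
    return worst;
}
```

Under these assumptions, max_inbound_at_step(8, 0) evaluates to 7: ranks 1 to 7 all target rank 0 in the first step, although the 8 messages sent in that step could in principle each go to a different rank.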

Recommended best practices: