The amount of communication imbalance exposed by the Communication Imbalance kernel depends on how the neighbor lists are initialized.
The following figure shows an example of a 24-processor run on MareNostrum IV in which every process communicates with approximately 8 others plus rank 0, as in the original version of the algorithm.
The trace shows how the imbalance in the number of MPI_Irecvs and MPI_Sends done by rank 0 propagates through the balanced compute phase and is paid by all other processes in the MPI_Allreduce.
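
To make the effect of the neighbor list initialization concrete, the following is a minimal sketch that builds neighbor lists for 24 ranks and counts the messages each rank has to handle. It is not the kernel's actual initialization code: the function names, the ring-shaped neighborhood, and the `window` parameter are assumptions chosen only to reproduce the "about 8 neighbors plus rank 0" pattern described above.

```python
def build_neighbor_lists(num_ranks=24, window=4):
    """Return one sorted neighbor list per rank.

    Assumed pattern (hypothetical, for illustration only): every rank
    exchanges messages with `window` ranks on each side of it in a ring,
    and additionally with rank 0. Rank 0 therefore talks to everyone.
    Requires num_ranks > 2 * window.
    """
    neighbors = []
    for rank in range(num_ranks):
        if rank == 0:
            # Rank 0 is a neighbor of every other rank.
            peers = set(range(1, num_ranks))
        else:
            # Ring neighbors on both sides, skipping the rank itself.
            peers = {(rank + d) % num_ranks
                     for d in range(-window, window + 1) if d != 0}
            peers.add(0)
        neighbors.append(sorted(peers))
    return neighbors


def messages_per_rank(neighbors):
    """Number of send/receive pairs each rank posts (lists are symmetric)."""
    return [len(peers) for peers in neighbors]


def imbalance(counts):
    """Ratio of the busiest rank's message count to the average count."""
    return max(counts) / (sum(counts) / len(counts))


if __name__ == "__main__":
    counts = messages_per_rank(build_neighbor_lists())
    for rank, n in enumerate(counts):
        print(f"rank {rank:2d}: {n} messages")
    print(f"max/avg message count: {imbalance(counts):.2f}")
```

Under these assumptions rank 0 posts 23 receives and sends while every other rank posts 8 or 9, which is the kind of skew that the other processes end up waiting for in the MPI_Allreduce. Changing how the lists are built (for example, dropping the extra link to rank 0) changes the ratio accordingly.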