Communication Imbalance

Program's name: Communication Imbalance
Available version(s): Programming language(s): C ·
Programming model(s): MPI ·

The Communication Imbalance kernel is a synthetic program which reproduces a communication pattern in between several MPI processes. Initially it computes a connectivity matrix which represents from/to which ranks will comunicate to one each other, and it also preassigns a given number of elements to each rank.

The kernel starts then the setup phase: it creates the local data structures according wiht the number of elements assigned to the current process, and it also creates the communication buffers (send, receive separately) according with the connectivity matrix. Rigth after the setup phase it starts the main loop of the algorithm.

The structure of the main algorithm loop is divided in different sub-phases. It start with an inter-process data exchange (in this phase receive operations are non-blocking services). After the communication phase, the program starts a data independent computational phase (overlapped with the previous receive operations); right after this local computation, the program waits for the finalization of the on-fly receive operations to start the a second computational phase and finally, execute the collective operation that will synchronize all processes at the end of each computational step.

The following pseudo-code summarizes this behavior:

   for (int step = 0; step < N_STEPS; step++) {

      for (int i = 0; i<n_neighs; i++)
           MPI_Irecv(&r_data[i][0], ...,  neighbors[i], ... , &requests[i]);

      for (int i = 0; i<n_neighs; i++) 
        MPI_Send(&s_data[i][0], ..., neighbors[i], ... );

      compute_local(n_elems, ... );

      MPI_Waitall(n_neighs, requests, status);  // waiting for all the receives

      work(); // Some additional code before collective. Might process data just arrived

      MPI_Allreduce(...);
   }

The main issue within that kernel is the work distribution among ranks (i.e., number of elements assigned to each rank), and the structure of the connectivity matrix (i.e., the number of neighbors per rank). These parameters could be manually changed in the source code.