Wait for non-blocking send operations preventing computational progress

Usual symptom(s):

Transfer Efficiency: The Transfer Efficiency (TE) measures inefficiencies caused by time spent in data transfer.
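For reference, the POP methodology usually defines this metric as the ratio below. This is stated as background rather than quoted from this page: T is the measured runtime and T_ideal is the runtime the same execution would have on an ideal network in which transfers are instantaneous.

```latex
\mathrm{TE} = \frac{T_{\text{ideal}}}{T}, \qquad 0 < \mathrm{TE} \le 1
```

Time spent waiting for transfers to complete, such as long waits on non-blocking sends, widens the gap between T and T_ideal and therefore lowers TE.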

MPI programmers often use non-blocking calls to overlap communication and computation. In such codes, each MPI process communicates with its neighbors through a sequence of stages: 1) an optional packing of data; 2) a set of non-blocking send/receive operations, which may overlap with one another; 3) a wait for the communications to complete (possibly split into separate waits for the send and the receive requests); and 4) the computational phase.

A common code structure can be represented as follows:

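for (it = 0; it < ITERS; it++) {
   pack(data, s_buffer);

   for (n = 0; n < n_neighbours; n++) {
      MPI_Irecv(r_buffer[n], count[n], MPI_DOUBLE, neighbour[n], tag, comm, &irecv_req[n]);
      MPI_Isend(s_buffer[n], count[n], MPI_DOUBLE, neighbour[n], tag, comm, &isend_req[n]);
   }

   MPI_Waitall(n_neighbours, irecv_req, MPI_STATUSES_IGNORE);  // Receives: needed by the computation
   MPI_Waitall(n_neighbours, isend_req, MPI_STATUSES_IGNORE);  // Sends: not needed by the computation

   unpack(r_buffer, data);

   Computation(data);  // Parallelized with OpenMP

} // End of the loop on ITERS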

The main problem with the pseudo-code above is that it treats the receive operations, whose data the computational phase needs, in the same way as the send operations, which in general are not required to start the computational phase. The waitall on the send requests may therefore unnecessarily delay the continuation of the program.

This pattern was observed, for example, in the analyses of the IFS weather code and the PySDC parallel-in-time solver.

The following Figure shows the details of one of the communication phases in the IFS application:

[Figure: ifs_send-waits (details of one communication phase in IFS)]

The light green MPI calls are the waitall operations for the non-blocking sends issued by the process. They are quite long (tens of milliseconds in a run with 420 MPI processes x 4 OpenMP threads). It may be worth exploring whether these waits can be postponed so that the computation that follows them can proceed first. Performing the waitalls just before the following alltoallv phase (shown in gold in the Figure) has the potential to reduce the time spent in these calls.

Recommended best-practice(s):

Postpone the execution of non-blocking send waits operations

Related program(s):

False communication-computation overlap (original)

Related report(s):

IFS-FVM

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 676553 (POP1) and 824080 (POP2).

Currently, the project receives funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101143931 (POP3). The JU receives support from the European Union's Horizon Europe research and innovation programme and Spain, Germany, France, Portugal and the Czech Republic.