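The postpone-wait version of the False communication-computation overlap kernel changes the point at which MPI_Waitall is called for the MPI_Isend requests. The call is postponed until the computation has finished, so that the outgoing communication can overlap with the computation. The receives are still completed before unpacking, since the computation depends on the received data, and the send buffer must not be modified until the postponed MPI_Waitall returns.
The following code snippet shows the change: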
pack(...);
for(n = 0; n < n_neighbours; n++){
    MPI_Irecv(r_buffer, rSIZE, ..., neighbours[n], ..., &irecv_req[n]);
    MPI_Isend(s_buffer, sSIZE, ..., neighbours[n], ..., &isend_req[n]);
}
if(n_neighbours){
    // The received data is needed by unpack(), so wait for the receives first.
    MPI_Waitall(n_neighbours, irecv_req, irecv_stat);
}
unpack(...);
computation(...);
// The wait for the sends is postponed to here, after the computation.
MPI_Waitall(n_neighbours, isend_req, isend_stat);