BEM4I miniApp - Serialized computation and communication

Another issue in the BEM4I kernel is a very long computation phase followed by a fairly long collective communication (a call to MPI_Allreduce). During this MPI communication, all threads are idle. Since this code appears multiple times within each iteration of the GMRES solver, we may expect it to be repeated around a thousand times.
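
To get a feel for what the idle threads cost, the pattern can be put into a small cost model. The following is a minimal sketch and not BEM4I code: the struct PhaseTimes, both function names, and any timings passed in are hypothetical, and it assumes that every occurrence of the pattern takes the same time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-occurrence timings in seconds (not measured BEM4I values).
struct PhaseTimes {
    double compute;    // threaded computation phase
    double allreduce;  // MPI_Allreduce phase, during which the threads are idle
};

// Wall time when each occurrence runs the computation and then the
// collective back to back, with no overlap between the two phases.
double serialized_total(const PhaseTimes& t, std::size_t repeats) {
    assert(t.compute >= 0.0 && t.allreduce >= 0.0);
    return static_cast<double>(repeats) * (t.compute + t.allreduce);
}

// Fraction of the wall time in which the threads do no useful work.
double idle_fraction(const PhaseTimes& t) {
    assert(t.compute + t.allreduce > 0.0);
    return t.allreduce / (t.compute + t.allreduce);
}
```

With an assumed 3 s of computation and 1 s spent in MPI_Allreduce per occurrence, idle_fraction gives 0.25, and a thousand repetitions take 4000 s in total. The point of the model is only that the idle share is paid again on every repetition.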

The figures below show useful computation and MPI communication during one matrix-vector multiplication.

OneIteration_longComputation_comm

OneIteration_comm