A common practice in hybrid programming is to parallelize only the main computational regions with OpenMP. The communication phases are left as in the original MPI program, so they execute sequentially on the main thread while the other threads sit idle. This can limit the scalability of hybrid programs and often makes the hybrid code slower than an equivalent pure MPI code using the same total number of cores.
This is one of several alternatives for parallelizing the packing and unpacking operations when using the Message Passing Interface (MPI). The main idea consists of encapsulating the send and receive calls within an unstructured task, which gives the opportunity to overlap that communication with computation or with other communication.