BEM4I miniApp - Very fine-grained chunks

Here we present a trace viewed with the parallel functions configuration of the Paraver tool, focused on the kernel (matrix-vector multiplication).

OneIteration

The first issue is clearly visible from a basic analysis of this trace. We can see four computational blocks (green) corresponding to the four individual matrices \(K\), \(K'\), \(V\), and \(D\), all of which perform well. At the end of the iteration, however, the last parallel loop has poor efficiency, illustrated by the black regions. The second image, which shows the states of the application, reveals that the inefficiency is caused by the Scheduling and Fork/Join state. In other words, the OpenMP runtime spends excessive time distributing work among threads.
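To make the effect concrete, the following is a minimal sketch of how a loop can end up dominated by scheduling overhead. It is not taken from the BEM4I source: the function names `scale_fine_grained` and `scale_coarse_grained`, the cheap loop body, and the choice of `schedule(dynamic, 1)` are assumptions used only for illustration. In the first variant every iteration is its own chunk, so the runtime hands out work once per element; in the second, each thread receives one contiguous block.

```cpp
#include <vector>

// Hypothetical stand-in for a final, cheap parallel loop (assumed, not BEM4I code).
// Every iteration is a separate chunk, so the OpenMP runtime must schedule
// y.size() tiny pieces of work, which can cost more than the work itself.
void scale_fine_grained(std::vector<double>& y, double alpha) {
    const long n = static_cast<long>(y.size());
    #pragma omp parallel for schedule(dynamic, 1)
    for (long i = 0; i < n; ++i) {
        y[i] *= alpha;
    }
}

// Same computation with coarse chunks: a static schedule without a chunk
// size gives each thread roughly one contiguous block of iterations.
void scale_coarse_grained(std::vector<double>& y, double alpha) {
    const long n = static_cast<long>(y.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        y[i] *= alpha;
    }
}
```

Both functions compute the same result; only the granularity of the work distribution differs. Whether a static schedule or a larger dynamic chunk size is the better choice depends on how evenly the cost is spread across iterations, so the right setting for the real loop should be confirmed by measurement.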

OneIteration_states