The following pseudo-code shows the proposed best practice, namely using the COLLAPSE
clause to fuse the outer loops:
!$OMP PARALLEL DO COLLAPSE(2) DEFAULT(NONE) SHARED(...)
DO K = 1, Nk
   DO J = 1, Nj
      DO I = 1, Ni
         !! work to do
      END DO
   END DO
END DO
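The COLLAPSE(2) clause parallelizes the first two loops, which must be perfectly nested: the compiler forms a
single loop of Nk*Nj iterations and then distributes it among the threads. Because this combined iteration space
is larger than that of the K loop alone, the work of the two outer loops is better balanced among the threads,
particularly when Nk is small compared with, or not a multiple of, the number of threads.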
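To make the fusion concrete, the sketch below compares a COLLAPSE(2) loop nest with a hand-fused version that runs a single loop over Nk*Nj iterations and recovers K and J from the fused index. It is only an illustration of the idea, not a description of what any particular compiler generates. The sizes Ni, Nj, Nk, the arrays a and b, the fused index jk and the loop body are all hypothetical names and values chosen for this example.

```fortran
PROGRAM collapse_sketch
   IMPLICIT NONE
   ! Hypothetical sizes, chosen only for illustration.
   INTEGER, PARAMETER :: Ni = 4, Nj = 3, Nk = 5
   INTEGER :: a(Ni, Nj, Nk), b(Ni, Nj, Nk)
   INTEGER :: i, j, k, jk

   ! Version 1: let OpenMP fuse the K and J loops.
   !$OMP PARALLEL DO COLLAPSE(2) DEFAULT(NONE) SHARED(a) PRIVATE(i, j, k)
   DO k = 1, Nk
      DO j = 1, Nj
         DO i = 1, Ni
            a(i, j, k) = i + 10*j + 100*k   ! placeholder for the real work
         END DO
      END DO
   END DO
   !$OMP END PARALLEL DO

   ! Version 2: the same fusion written by hand. One loop of Nk*Nj
   ! iterations is parallelized and (k, j) are recovered from jk.
   !$OMP PARALLEL DO DEFAULT(NONE) SHARED(b) PRIVATE(i, j, k, jk)
   DO jk = 0, Nk*Nj - 1
      k = jk / Nj + 1          ! slow index, 1..Nk
      j = MOD(jk, Nj) + 1      ! fast index, 1..Nj
      DO i = 1, Ni
         b(i, j, k) = i + 10*j + 100*k
      END DO
   END DO
   !$OMP END PARALLEL DO

   ! Both versions must fill the arrays identically.
   IF (ALL(a == b)) THEN
      PRINT *, 'collapsed and hand-fused loops agree'
   ELSE
      ERROR STOP 'collapsed and hand-fused loops differ'
   END IF
END PROGRAM collapse_sketch
```

With these sizes the K loop alone offers only 5 iterations to share among the threads, whereas the fused loop offers 15, which is why the collapsed form tends to balance better. In practice the COLLAPSE clause is preferable to the hand-fused form, since it keeps the loop nest readable and leaves the index arithmetic to the compiler.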
Specific conditions must be fulfilled in order to use the COLLAPSE
clause (see this link for an exhaustive list of rules). The most important ones are: