Dynamic loop scheduling

Pattern addressed: Load imbalance due to computational complexity (unknown a priori)

In some algorithms it is possible to divide the whole computation into smaller, independent subparts. These subparts may then be executed in parallel by different workers. Even though the data, which is worked on in each subpart, might be well balanced in terms of memory requirements there may be a load imbalance of the workloads. This imbalance may occur if the computational complexity of each subpart depends on the actual data and cannot be estimated prior to execution. (more...)

Required condition: When the number of iterations is greater number of workers

When parallelizing a loop which independent iterations may have different execution time, the number of iterations is large (compared to the number of cores), and the granularity of each iteration may be small. A parallelization like:

#pragma omp parallel for schedule(dynamic, nn)
 for(i=0; i<N; i++) {
      do_work(i); //actual amount of work depends on I but not known a priori
}

Will result in a balanced execution. The dynamic chunk nn may be tuned to avoid overhead while keeping the balancing properties of the dynamic schedule.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No 676553 (POP1) and 824080 (POP2).

Currently, the project receives funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101143931 (POP3). The JU receives support from the European Union's Horizon Europe research and innovation programme and Spain, Germany, France, Portugal and the Czech Republic.