Dynamic loop scheduling

Pattern addressed: Load imbalance due to computational complexity (unknown a priori)

In some algorithms it is possible to divide the whole computation into smaller, independent subparts. These subparts may then be executed in parallel by different workers. Even though the data, which is worked on in each subpart, might be well balanced in terms of memory requirements there may be a load imbalance of the workloads. This imbalance may occur if the computational complexity of each subpart depends on the actual data and cannot be estimated prior to execution. (more...)

Required condition: When the number of iterations is greater number of workers

When parallelizing a loop which independent iterations may have different execution time, the number of iterations is large (compared to the number of cores), and the granularity of each iteration may be small. A parallelization like:

#pragma omp parallel for schedule(dynamic, nn)
 for(i=0; i<N; i++) {
      do_work(i); //actual amount of work depends on I but not known a priori
}

Will result in a balanced execution. The dynamic chunk nn may be tuned to avoid overhead while keeping the balancing properties of the dynamic schedule.