Some OpenMP application developers manually distribute the iterations of loops in parallel regions instead of using OpenMP’s worksharing constructs. This can prevent the runtime from using optimized schedules.
An example of such a situation is shown in the following code snippet:
#pragma omp parallel default(shared) num_threads(nThreads)
{
    // identify the thread ID
    const int threadID = omp_get_thread_num();
    // compute the start index from the per-thread partition size
    int start = threadID * partSize;
    int end;
    // compute the end index; in most cases the very last thread
    // is less loaded
    end = (threadID < nThreads - 1) ?
          (threadID + 1) * partSize - 1 :
          (int) (nSize - 1);
    for (int p = start; p <= end; ++p)
    {
        // do work
    }
}
When the amount of work per iteration is unequal, this manual distribution can cause significant load imbalance and hence a significant performance impact.
This pattern usually appears when porting code from POSIX threads (pthreads) to OpenMP and is generally considered bad practice, as OpenMP provides a range of worksharing constructs to optimize the scheduling (though there may be corner cases where manual control over the loop iterations is beneficial or even necessary); a sketch of the worksharing equivalent is shown below.
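For comparison, the following is a minimal, self-contained sketch of how the same loop could be expressed with an OpenMP worksharing construct; the names nSize, nThreads, and do_work are placeholders introduced here for illustration, and schedule(dynamic) is just one possible scheduling choice for iterations of unequal cost:

#include <stdio.h>

/* Hypothetical stand-in for the loop body ("do work"). */
static void do_work(int p)
{
    (void)p;
}

int main(void)
{
    const int nSize = 1000;   /* assumed problem size */
    const int nThreads = 4;   /* assumed thread count */

    /* Let the worksharing construct distribute the iterations across
       the threads; schedule(dynamic) allows the runtime to rebalance
       iterations of unequal cost at run time. */
    #pragma omp parallel for default(shared) num_threads(nThreads) schedule(dynamic)
    for (int p = 0; p < nSize; ++p)
    {
        do_work(p);
    }

    printf("processed %d iterations with %d threads\n", nSize, nThreads);
    return 0;
}

With the worksharing construct, the start and end indices no longer have to be computed by hand, and the schedule clause (static, dynamic, guided, or runtime) can be tuned without changing the loop itself.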
Recommended best-practice(s):