The OpenMP standard provides a critical section construct, which allows only one thread at a time to execute the enclosed block of code. This protects the block from race conditions, for example when writing into a shared array or incrementing a shared counter. However, using this construct, especially inside parallel loops, can severely reduce performance. Execution of the block is serialised, so threads “queue” to enter the critical region, and managing the underlying lock adds further overhead.
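As a minimal sketch of the shared-counter case mentioned above, the function below protects an increment with a critical section. The function name `count_even` and the even-number test are hypothetical and chosen only for illustration; they are not taken from any real application.

```c
/* Hypothetical helper: counts the even values in [0, n) using a shared counter.
 * Build with OpenMP enabled (e.g. -fopenmp with GCC or Clang). Without it the
 * pragmas are ignored and the loop runs serially with the same result. */
long count_even(long n)
{
    long count = 0;  /* shared between all threads */

    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        if (i % 2 == 0) {
            /* Only one thread at a time may execute this block, so the
             * read-modify-write of the shared counter cannot race. */
            #pragma omp critical
            {
                count++;
            }
        }
    }
    return count;
}
```

Calling `count_even(1000)` returns 500 regardless of the number of threads, but every increment is serialised, which is exactly the cost discussed above.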
To illustrate, consider the following pseudo-code, which shows a critical section used inside a parallel loop:
#pragma omp parallel for
for (int i = 0; i < Ni; i++) {
    // work on arrays
    #pragma omp critical
    {
        // critical block: work that must be protected from race conditions
    }
}
The above kernel was found in a seismic code which solves the 3D wave equation. Empirical measurements show that, on average, the time spent on the computation outside the critical section is around 8 times the time spent inside it.
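The measured ratio gives a rough upper bound on the achievable speedup. As a back-of-the-envelope sketch, assume every iteration spends one time unit inside the critical section and r units outside it, and ignore lock overhead and load imbalance. The critical sections cannot overlap, so they alone take Ni units, while the serial run takes Ni(r + 1) units. The speedup on p threads therefore cannot exceed min(p, r + 1). The helper name `speedup_bound` is hypothetical, and the model is an idealisation rather than a measurement of the seismic code.

```c
/* Hypothetical helper: idealised upper bound on the speedup of a loop whose
 * iterations each spend 1 time unit inside a critical section and `ratio`
 * units outside it. Lock overhead and load imbalance are ignored, so real
 * speedups are expected to be lower. */
double speedup_bound(double ratio, int threads)
{
    /* The critical sections never overlap, so the speedup is capped at ratio + 1. */
    double serial_limit = ratio + 1.0;
    return threads < serial_limit ? (double)threads : serial_limit;
}
```

With the ratio of around 8 reported above, `speedup_bound(8.0, 4)` gives 4 and `speedup_bound(8.0, 48)` gives 9. Under these assumptions, adding threads beyond about 9 cannot improve the run time of this loop.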
Recommended best-practice(s):