Replacing critical section with reduction
OpenMP critical section:
The OpenMP standard provides a critical section construct, which allows only
one thread at a time to execute the block of code within the construct. This
protects the block from race conditions, for example when writing into a
shared array or incrementing a shared counter. However, using this construct,
especially inside parallel loops, can severely reduce performance. Execution
of the block is serialised, so threads “queue” to enter the critical region,
and every entry and exit adds lock-management overhead.
When the critical section corresponds to a reduction operation
This best practice recommends that, if the critical block performs a
reduction operation, the critical section be replaced by the OpenMP reduction
clause, which has a much lower overhead.
Consider the following pseudo-code:
sum = 0.0;
#pragma omp parallel for
for ( int i = 0; i < Ni; i++ ) {
    // work on array[:]
    #pragma omp critical
    sum += array[i];
}
This could be re-written to:
#pragma omp parallel for reduction(+:sum)
for ( int i = 0; i < Ni; i++ ) {
    // work on array[:]
    sum += array[i];
}
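The two variants above can be written as complete C functions. The following is a minimal sketch, not a definitive implementation: the function names sum_critical, sum_reduction and sum_manual are hypothetical and chosen only for illustration. The third function is a hand-written approximation of what the reduction clause does conceptually (a private partial sum per thread, combined once at the end). An actual OpenMP runtime may implement the clause differently.

```c
/* Baseline: every iteration enters the critical region, so the
   updates to sum are serialised across all threads. */
double sum_critical(const double *array, int n)
{
    double sum = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        sum += array[i];
    }
    return sum;
}

/* Recommended: each thread accumulates into its own private copy of
   sum, and the copies are combined when the loop finishes. */
double sum_reduction(const double *array, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += array[i];
    }
    return sum;
}

/* Illustration only (assumption: this mirrors the clause conceptually).
   Each thread keeps a local partial sum and enters the critical region
   once per thread instead of once per iteration. */
double sum_manual(const double *array, int n)
{
    double sum = 0.0;
    #pragma omp parallel
    {
        double local = 0.0;          /* private to each thread */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            local += array[i];
        }
        #pragma omp critical
        sum += local;                /* one protected update per thread */
    }
    return sum;
}
```

All three functions compute the same sum. With a compiler flag such as -fopenmp (GCC, Clang) the loops run in parallel. Without it the pragmas are ignored and the code runs serially. Note that a reduction may add the elements in a different order than the serial loop, so for general floating-point data the result can differ in the last few bits.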