In the early stages of scientific code development, developers (often scientists themselves) tend to write a naive, easy-to-read implementation of the required algorithm. Such an implementation is likely to be inefficient in one performance aspect or another.
Because this issue can occur in any type of algorithm, the inefficiency can affect any of the POP efficiency metrics, or even none of them. The most straightforward way to identify the problem is to recognize a well-known algorithm as the bottleneck - the most time-consuming part - of the application. Among the individual metrics, an inefficient implementation of a mathematical algorithm typically shows up as poor load balance, low IPC values, an unexpectedly high number of executed instructions, or under-utilization of cores.
Well-known algorithms of this kind come from many different areas of mathematics: linear algebra operations, FFT, mathematical functions (trigonometric, exponential, etc.), statistics, random number generation, image processing, machine learning, and so on.
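To illustrate the pattern, consider a textbook triple-loop matrix multiplication next to a call into an optimized library routine. This is a generic sketch in Python/NumPy, not code from any application discussed here; the point is that the naive version executes many more (and less efficient) instructions for the same mathematical result:

```python
import numpy as np

def naive_matmul(a, b):
    """Textbook triple-loop matrix multiplication.

    Easy to read, but typically far slower than an optimized
    BLAS-backed routine: poor cache reuse, no vectorization.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for l in range(k):
                c[i, j] += a[i, l] * b[l, j]
    return c

rng = np.random.default_rng(0)
a = rng.random((30, 30))
b = rng.random((30, 30))

c_naive = naive_matmul(a, b)
c_blas = a @ b  # delegates to an optimized BLAS implementation
print(np.allclose(c_naive, c_blas))
```

Both versions compute the same product; profiling would show the naive loop as a hotspot with a high instruction count, which is exactly the signature this pattern describes.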
One example of this pattern was identified in the finite element solver SIFEL, where the most time-consuming routine was a naive implementation of the local Schur complement method based on matrix factorization. In this case, the metrics affected by the inefficient implementation were low IPC values and a very high number of executed instructions.
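The Schur complement itself illustrates how much the choice of implementation matters. For a block system with blocks A, B, C, D, the Schur complement of A is S = D - C A^{-1} B; a naive version forms the explicit inverse of A, while a better one solves the linear system A X = B directly. The sketch below (a generic NumPy illustration, not the actual SIFEL routine) contrasts the two:

```python
import numpy as np

def schur_naive(A, B, C, D):
    # Naive: form the explicit inverse of A, then multiply.
    # Costs an extra O(n^3) inversion and is numerically less accurate.
    return D - C @ np.linalg.inv(A) @ B

def schur_solve(A, B, C, D):
    # Better: solve A X = B directly (internally via one LU
    # factorization of A) instead of forming A^{-1}.
    X = np.linalg.solve(A, B)
    return D - C @ X

rng = np.random.default_rng(1)
n, m = 40, 10
A = rng.random((n, n)) + n * np.eye(n)  # diagonally dominant, well conditioned
B = rng.random((n, m))
C = rng.random((m, n))
D = rng.random((m, m))

print(np.allclose(schur_naive(A, B, C, D), schur_solve(A, B, C, D)))
```

In a production code the same idea goes further: the factorization of A would be computed once and reused across all right-hand sides, typically via an optimized LAPACK routine rather than hand-written loops.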
Recommended best-practice(s):
Related program(s):