POP has defined a methodology for the analysis of parallel codes that provides a quantitative way of measuring the relative impact of the different factors inherent in parallelisation. The methodology uses a hierarchy of metrics, each one reflecting a common cause of inefficiency in parallel programs. These metrics allow comparison of parallel performance (e.g., over a range of thread/process counts, across different machines, or at different stages of optimisation and tuning) to identify which characteristics of the code contribute to inefficiency.
The metrics are calculated as efficiencies between 0 and 1, with higher numbers being better. In general, we regard efficiencies above 0.8 as acceptable, whereas lower values indicate performance issues that need to be explored in detail.
Global Efficiency: At the top level of the POP hierarchy is the Global Efficiency (GE). It measures the overall quality of the parallelisation. Programs with Global Efficiency issues: FFTXlib
Computation Efficiency: An important sub-metric within the Global Efficiency is the Computation Efficiency (CompE), which is computed from ratios of the total time in useful computation summed over all processes. Programs with Computation Efficiency issues: FFTXlib
Instruction Efficiency: Instruction Efficiency is the ratio of the total number of useful instructions for a reference case (e.g., 1 processor) to the number executed when the number of processes is increased.
IPC Efficiency: IPC Efficiency compares the IPC (instructions per cycle) to that of the reference case; lower values indicate that the rate of computation has slowed.
Frequency Efficiency: Frequency Efficiency compares processor frequencies to those of the reference case; lower values indicate that the frequencies have decreased. Typically this effect is produced by increasing the load within the socket, which implies a reduction in frequency to reduce power consumption.
Parallel Efficiency: Parallel Efficiency (PE) reveals the inefficiency in splitting computation over processes and then communicating data between processes. As with GE, PE is a compound metric whose components reflect two important factors in achieving good parallel performance in code.
Programs with Parallel Efficiency issues: CalculiX solver
Load Balance Efficiency: Load Balance (LB) is computed as the ratio between the average useful computation time (across all processes) and the maximum useful computation time (also across all processes). Programs with Load Balance Efficiency issues: CalculiX solver
Communication Efficiency: Communication Efficiency (CommE) is the maximum across all processes of the ratio between useful computation time and total runtime.
Serialisation Efficiency: Serialisation Efficiency (SerE) measures inefficiency due to idle time within communications (i.e. time where no data is transferred).
Transfer Efficiency: Transfer Efficiency (TE) measures inefficiencies due to time spent in data transfer.
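The ratio-based definitions above can be illustrated with a small Python sketch. It is only a sketch under stated assumptions, not part of any POP tool: the function names, dictionary keys, and input layout are hypothetical, the per-process useful computation times are assumed to have been measured already, and PE is combined as LB × CommE, a multiplicative composition that the text implies ("compound metric") but does not spell out.

```python
from statistics import mean

ACCEPTABLE = 0.8  # threshold quoted in the text for an "acceptable" efficiency


def parallel_metrics(useful_times, runtime):
    """Return LB, CommE and PE for a single run.

    useful_times: useful computation time of each process, in seconds
    runtime:      total runtime of the run, in seconds
    """
    avg_useful = mean(useful_times)
    max_useful = max(useful_times)
    lb = avg_useful / max_useful    # Load Balance: average / maximum useful time
    comm_e = max_useful / runtime   # CommE: maximum of (useful time / runtime)
    pe = lb * comm_e                # assumption: PE is the product LB * CommE
    return {"LB": lb, "CommE": comm_e, "PE": pe}


def scaling_metrics(ref, run):
    """Compare a run against a reference case (e.g. 1 process).

    Both arguments are dicts with the hypothetical keys
    'instructions' (total useful instructions), 'ipc' and 'frequency'.
    """
    return {
        # fewer instructions than the reference would give a value above 1
        "InstructionE": ref["instructions"] / run["instructions"],
        # lower IPC than the reference gives a value below 1
        "IPCE": run["ipc"] / ref["ipc"],
        # lower clock frequency than the reference gives a value below 1
        "FrequencyE": run["frequency"] / ref["frequency"],
    }


def needs_attention(metrics, threshold=ACCEPTABLE):
    """Return the names of metrics that fall below the threshold, sorted."""
    return sorted(name for name, value in metrics.items() if value < threshold)


# Example with made-up timings for four processes and a 10 s run.
example = parallel_metrics([9.0, 6.0, 7.0, 2.0], runtime=10.0)
print(example)
print(needs_attention(example))
```

In this made-up example one process does far less useful work than the others, so the load balance drops well below the 0.8 threshold while the communication efficiency stays above it. Reading the metrics this way is how the hierarchy is meant to point at the cause of an inefficiency.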