POP has defined a methodology for the analysis of parallel codes that provides a quantitative way of measuring the relative impact of the different factors inherent in parallelisation. The methodology uses a hierarchy of metrics, each reflecting a common cause of inefficiency in parallel programs. These metrics then allow comparison of parallel performance (e.g., over a range of thread/process counts, across different machines, or at different stages of optimisation and tuning) to identify which characteristics of the code contribute to inefficiency.
The metrics are calculated as efficiencies between 0 and 1, with higher numbers being better. In general, we regard efficiencies above 0.8 as acceptable, whereas lower values indicate performance issues that need to be explored in detail.

Global Efficiency: At the top level of the POP hierarchy is the Global Efficiency (GE). It measures the overall quality of the parallelisation.

Computation Efficiency: An important sub-metric within Global Efficiency is the Computation Efficiency (CompE), the ratio of total time in useful computation, summed over all processes, relative to a reference case.

Instruction Efficiency: Instruction Efficiency is the ratio of the total number of useful instructions for a reference case (e.g., 1 processor) to the values when increasing the number of processes.

IPC Efficiency: IPC Efficiency compares IPC to the reference, where lower values indicate that the rate of computation has slowed.

Frequency Efficiency: Frequency Efficiency compares processor frequencies to the reference, where lower values indicate that frequencies have decreased. Typically this effect is produced by increasing the load within the socket, which causes a reduction in frequency to limit power consumption.

Parallel Efficiency: Parallel Efficiency (PE) reveals the inefficiency in splitting computation over processes and then communicating data between them. As with GE, PE is a compound metric whose components reflect two important factors in achieving good parallel performance.

Load Balance Efficiency: Load Balance (LB) is computed as the ratio between the average useful computation time (across all processes) and the maximum useful computation time (also across all processes).

Communication Efficiency: Communication Efficiency (CommE) is the maximum, across all processes, of the ratio between useful computation time and total runtime.
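The definitions above can be sketched in a few lines of Python. This is a minimal illustration with entirely hypothetical timing and hardware-counter numbers, assuming the usual multiplicative POP relations (PE = LB × CommE, GE = PE × CompE, and CompE decomposed into instruction, IPC, and frequency factors); it is not the authoritative implementation used by POP tools.

```python
# Hypothetical measurements: useful computation time per process (seconds)
# and the total wall-clock runtime of the run.
useful = [9.0, 8.0, 10.0, 7.0]
runtime = 12.0

# Load Balance: average useful time divided by maximum useful time.
LB = (sum(useful) / len(useful)) / max(useful)

# Communication Efficiency: maximum ratio of useful time to total runtime.
CommE = max(u / runtime for u in useful)

# Parallel Efficiency is the product of its two components.
PE = LB * CommE

# Computation Efficiency relative to a reference run (e.g. 1 process),
# here built from hypothetical instruction, IPC, and frequency values.
ref_ins, ins = 1.0e9, 1.1e9      # total useful instructions
ref_ipc, ipc = 2.0, 1.8          # instructions per cycle
ref_freq, freq = 2.4e9, 2.3e9    # core frequency (Hz)
InsE = ref_ins / ins             # Instruction Efficiency
IPCE = ipc / ref_ipc             # IPC Efficiency
FreqE = freq / ref_freq          # Frequency Efficiency
CompE = InsE * IPCE * FreqE

# Global Efficiency combines the parallel and computation sides.
GE = PE * CompE
print(f"LB={LB:.2f} CommE={CommE:.2f} PE={PE:.2f} "
      f"CompE={CompE:.2f} GE={GE:.2f}")
```

Note that PE also equals the average useful time divided by runtime (8.5/12 here), which is a useful cross-check when validating traces.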
Serialisation Efficiency: Serialisation Efficiency (SerE) measures inefficiency due to idle time within communications, i.e., time where no data is transferred.

Transfer Efficiency: Transfer Efficiency (TE) measures inefficiency due to time spent in data transfer.
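The split of Communication Efficiency into these two factors can be sketched as follows. The runtime on an "ideal" (zero-transfer-cost) network is assumed here as a given input; in practice it is typically obtained by simulating the trace, and all numbers below are hypothetical.

```python
# Hypothetical inputs for decomposing Communication Efficiency.
max_useful = 10.0      # maximum useful computation time across processes (s)
runtime_ideal = 11.0   # total runtime on an ideal (zero-cost) network (s)
runtime = 12.0         # measured total runtime (s)

# Serialisation Efficiency: inefficiency that remains even with free
# data transfer, i.e. idle time while no data is in flight.
SerE = max_useful / runtime_ideal

# Transfer Efficiency: slowdown attributable to actual data transfer.
TE = runtime_ideal / runtime

# Their product recovers Communication Efficiency (max_useful / runtime).
CommE = SerE * TE
print(f"SerE={SerE:.3f} TE={TE:.3f} CommE={CommE:.3f}")
```

The decomposition is multiplicative by construction: the intermediate ideal-network runtime cancels, so SerE × TE always equals the CommE defined earlier.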