A Centre of Excellence in HPC

The arrays have dimensions 4x10x5481x4x48 and are traversed twice via nested for-loops, once in `compute_step_1` and once in `compute_step_2`. Since one element of the first dimension is skipped, the program ends up executing around 3x10x5481x4x48 iterations. All experiments were run on one core of an Intel i5-8365U CPU @ 1.60GHz.
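As a quick sanity check on the iteration count, the product can be evaluated directly. This is only a sketch: the shape comes from the text, while the names `SHAPE` and `count_iterations` are hypothetical and do not appear in the original kernel, and it assumes that exactly one index of the first dimension is not visited.

```python
from math import prod

# Array shape as stated in the text.
SHAPE = (4, 10, 5481, 4, 48)


def count_iterations(shape, skipped_in_first_dim=1):
    """Innermost-loop iterations for one traversal of the loop nest.

    Assumes `skipped_in_first_dim` indices of the first dimension
    are not visited (an assumption based on the 3x10x5481x4x48 figure).
    """
    first, *rest = shape
    return (first - skipped_in_first_dim) * prod(rest)


iterations = count_iterations(SHAPE)
print(iterations)  # 31570560
```

That is roughly 3.16e7 iterations, each of which pays the full cost of the Python interpreter in the naive version.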

The following Paraver image shows the execution structure of the pattern using 1 thread. 99% of the time is spent in the most expensive function, `compute_step_1`.
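The share of time per function can be cross-checked against the elapsed times reported in the metrics table. The snippet below is a minimal sketch; the dictionary `elapsed` and the helper `time_share` are illustrative names, and the only inputs are the three elapsed times from the table.

```python
# Elapsed times in seconds, copied from the metrics table.
elapsed = {
    "total": 1545.69,
    "compute_step_1": 1544.10,
    "compute_step_2": 0.56,
}


def time_share(name, times=elapsed):
    """Fraction of the total elapsed time spent in function `name`."""
    return times[name] / times["total"]


for name in ("compute_step_1", "compute_step_2"):
    print(f"{name}: {time_share(name):.2%}")
```

With these numbers `compute_step_1` accounts for about 99.9% of the run, and `compute_step_2` for well under 0.1%.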

The following table shows several performance metrics of the kernel.

Metric | Master |
---|---|
Total elapsed time [s] | 1545.69 |
compute_step_1 elapsed time [s] | 1544.10 |
compute_step_2 elapsed time [s] | 0.56 |
Total instructions | 9.62e12 |
compute_step_1 instructions | 9.62e12 |
compute_step_2 instructions | 4.87e9 |
Total average IPC | 2.02 |
compute_step_1 average IPC | 2.02 |
compute_step_2 average IPC | 2.63 |

Although the IPC is good, the total number of instructions is huge. The naive Python version of this kernel executes approximately 3.05e5 instructions per iteration in `compute_step_1`, a number that will decrease drastically in later versions.
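The instructions-per-iteration figure follows from dividing the instruction count of `compute_step_1` by the iteration count stated earlier. The sketch below reproduces that arithmetic; the variable and function names are illustrative, and the inputs are the rounded values from the table, so the result is approximate.

```python
from math import prod

# Instruction count for compute_step_1, from the metrics table.
instructions_step_1 = 9.62e12

# Iterations of one traversal: 3 x 10 x 5481 x 4 x 48, as stated in the text.
iterations = prod((3, 10, 5481, 4, 48))


def instructions_per_iteration(instructions, n_iterations):
    """Average number of instructions executed per loop iteration."""
    return instructions / n_iterations


ipi = instructions_per_iteration(instructions_step_1, iterations)
print(f"{ipi:.3g}")  # 3.05e+05
```

Hundreds of thousands of instructions for a single loop iteration is the kind of interpreter overhead that a compiled or vectorized version is expected to remove.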