One proposal in this best practice is to combine the Numba JIT compiler with NumPy routines. The table below shows the outcome of this optimization.
Metric | Master | Numba | Numpy | Numba+Numpy |
---|---|---|---|---|
Total elapsed time [s] | 1545.69 | 261.70 | 1.67 | 15.51 |
compute_step_1 elapsed time [s] | 1544.10 | 253.29 | 1.59 | 3.47 |
compute_step_2 elapsed time [s] | 0.56 | 2.9e-3 | 0.02 | 0.01 |
Total instructions | 9.62e12 | 1.24e12 | 1.34e10 | 9.84e10 |
compute_step_1 instructions | 9.62e12 | 1.19e12 | 1.33e10 | 2.53e10 |
compute_step_2 instructions | 4.87e9 | 5.25e6 | 9.19e7 | 9.19e7 |
Total average IPC | 2.02 | 1.72 | 2.60 | 2.02 |
compute_step_1 average IPC | 2.02 | 1.71 | 2.65 | 2.28 |
compute_step_2 average IPC | 2.63 | 1.53 | 1.88 | 2.38 |
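
The speedup of each version over Master follows directly from the total elapsed times in the table. The snippet below is only a small arithmetic sketch; the dictionary and variable names are made up for illustration, and the numbers are copied from the "Total elapsed time" row.

```python
# Total elapsed times in seconds, copied from the table above.
total_elapsed_s = {
    "Master": 1545.69,
    "Numba": 261.70,
    "Numpy": 1.67,
    "Numba+Numpy": 15.51,
}

baseline = total_elapsed_s["Master"]

# Speedup = baseline time / time of the optimized version.
speedups = {name: baseline / t for name, t in total_elapsed_s.items()}

for name, s in speedups.items():
    print(f"{name:12s} {s:8.2f}x")
```

The same ratio can be computed per region (for example `compute_step_1`) by swapping in the corresponding row.
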
Combining Numba and NumPy yields a speedup of 99.66x over Master (1545.69 s / 15.51 s). This version is still slower than the pure NumPy one for two reasons. First, Numba does not support every NumPy function, so the code cannot be compiled in its entirety and parts of it still run in the Python interpreter. Second, Numba adds JIT compilation overhead.
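
Both effects can be seen in a minimal sketch. This is not the code that was measured above: `pairwise_energy` and `run` are hypothetical names, the kernel is a stand-in, and the centering step is only assumed, for illustration, to be a NumPy call that has to stay outside the compiled function. The `try`/`except` fallback is an addition so the sketch still runs where Numba is not installed.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption for portability): run as plain Python without Numba.
    def njit(func):
        return func


@njit
def pairwise_energy(positions):
    # Hypothetical kernel: sum of inverse distances over all pairs.
    # Explicit loops plus simple NumPy calls that Numba can compile.
    n = positions.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = positions[i] - positions[j]
            total += 1.0 / np.sqrt(np.sum(diff * diff))
    return total


def run(positions):
    # Stays in the Python interpreter: stands in for a NumPy routine
    # that cannot be moved into the compiled function.
    centered = positions - positions.mean(axis=0)
    return pairwise_energy(centered)


rng = np.random.default_rng(0)
pts = rng.random((200, 3))

t0 = time.perf_counter()
first = run(pts)   # with Numba, this call includes JIT compilation
t1 = time.perf_counter()
second = run(pts)  # reuses the already compiled machine code
t2 = time.perf_counter()

print(f"first call:  {t1 - t0:.4f} s")
print(f"second call: {t2 - t1:.4f} s")
```

With Numba installed, the first call is typically much slower than the second because it pays for compilation once. In a run where the compiled kernel itself takes only a few seconds, that one-off cost is a visible share of the total.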