Python loops (numba+numpy)

One proposed best practice is to combine the Numba JIT compiler with NumPy routines. The table below compares this version against master and against the Numba-only and NumPy-only versions.

Compute Structure (master)

| Metric                              | Master  | Numba   | Numpy   | Numba+Numpy |
|-------------------------------------|---------|---------|---------|-------------|
| Total elapsed time [s]              | 1545.69 | 261.70  | 1.67    | 15.51       |
| compute_step_1 elapsed time [s]     | 1544.10 | 253.29  | 1.59    | 3.47        |
| compute_step_2 elapsed time [s]     | 0.56    | 2.9e-3  | 0.02    | 0.01        |
| Total instructions                  | 9.62e12 | 1.24e12 | 1.34e10 | 9.84e10     |
| compute_step_1 instructions         | 9.62e12 | 1.19e12 | 1.33e10 | 2.53e10     |
| compute_step_2 instructions         | 4.87e9  | 5.25e6  | 9.19e7  | 9.19e7      |
| Total average IPC                   | 2.02    | 1.72    | 2.60    | 2.02        |
| compute_step_1 average IPC          | 2.02    | 1.71    | 2.65    | 2.28        |
| compute_step_2 average IPC          | 2.63    | 1.53    | 1.88    | 2.38        |
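
The speedups over master implied by the total elapsed times can be checked with a few lines of Python. The values are copied from the table; the dictionary and variable names are illustrative only.

```python
# Total elapsed time [s] per version, copied from the table above.
total_elapsed_s = {
    "Master": 1545.69,
    "Numba": 261.70,
    "Numpy": 1.67,
    "Numba+Numpy": 15.51,
}

# Speedup of each version relative to master: t_master / t_version.
speedup = {
    name: total_elapsed_s["Master"] / t for name, t in total_elapsed_s.items()
}

for name, s in speedup.items():
    print(f"{name:12s} {s:8.2f}x")
```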

Combining Numba and NumPy yields a speedup of 99.66x over master in total elapsed time (1545.69 s vs. 15.51 s). This version is still slower than the pure NumPy one (1.67 s) for two reasons. First, the code cannot be compiled entirely with Numba, because Numba does not support all NumPy functions, so part of it still runs in the Python interpreter. Second, Numba adds JIT compilation overhead.
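
The two effects can be illustrated with a minimal sketch. It is not the code measured above: the kernel `pairwise_sq_dist_sum`, the wrapper `hypothetical_step`, and the input data are made up for illustration. The loop-heavy part is compiled with Numba, while one NumPy call is deliberately left outside the compiled function, standing in for a routine that Numba cannot compile (which routines are supported depends on the Numba version). Timing the first and second call separately exposes the one-off compilation cost. The `try`/`except` fallback is an assumption added so the sketch still runs where Numba is not installed.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption): run the kernel as plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def pairwise_sq_dist_sum(points):
    """Hypothetical loop kernel: for each point, sum of squared distances to all points."""
    n, dim = points.shape
    out = np.zeros(n)
    for i in range(n):
        acc = 0.0
        for j in range(n):
            for k in range(dim):
                diff = points[i, k] - points[j, k]
                acc += diff * diff
        out[i] = acc
    return out


def hypothetical_step(points):
    # Compiled part: explicit loops run as machine code under Numba.
    sums = pairwise_sq_dist_sum(points)
    # Interpreted part: a NumPy routine called from regular Python,
    # standing in for a function that is kept outside the jitted code.
    order = np.argsort(sums)
    return sums, order


rng = np.random.default_rng(0)
points = rng.random((50, 3))

# The first call includes JIT compilation; the second reuses the compiled kernel.
t0 = time.perf_counter()
sums, order = hypothetical_step(points)
t1 = time.perf_counter()
hypothetical_step(points)
t2 = time.perf_counter()

print(f"first call  (with compilation): {t1 - t0:.4f} s")
print(f"second call (compiled):         {t2 - t1:.4f} s")
```

For short runs the compilation time of the first call can dominate the total, which is consistent with the gap between the total and per-step elapsed times reported for the Numba+Numpy version.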