Python loops (numpy)

One of the best-practice proposals is to rely on NumPy's optimized routines for our operations. In this particular case, we use NumPy vectorization to remove all for-loops, so that NumPy computes every element of our arrays in a single call instead of one element per Python iteration. The outcome of this optimization is shown in the table below.
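The idea can be illustrated with a minimal sketch. The kernel below (`a * b + a`) and the function names are hypothetical and chosen only for illustration; they are not the actual body of `compute_step_1` or `compute_step_2`, which is not shown here.

```python
import numpy as np


def kernel_loop(a, b):
    """Hypothetical kernel written with an explicit Python for-loop."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        # Each iteration goes through the Python interpreter.
        out[i] = a[i] * b[i] + a[i]
    return out


def kernel_vectorized(a, b):
    """Same hypothetical kernel, expressed as whole-array NumPy operations."""
    # The element-wise loop now runs inside NumPy's compiled routines.
    return a * b + a


rng = np.random.default_rng(0)
a = rng.random(10_000)
b = rng.random(10_000)

loop_result = kernel_loop(a, b)
vec_result = kernel_vectorized(a, b)

# Both versions must produce the same values.
assert np.allclose(loop_result, vec_result)
```

Both functions return the same array; only the place where the iteration happens changes, from the interpreter to NumPy's compiled code.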

Compute Structure (numpy)

                                    Master     Numpy
Total elapsed time [s]             1545.69      1.67
compute_step_1 elapsed time [s]    1544.10      1.59
compute_step_2 elapsed time [s]       0.56      0.02
Total instructions                 9.62e12   1.34e10
compute_step_1 instructions        9.62e12   1.33e10
compute_step_2 instructions         4.87e9    9.19e7
Total average IPC                     2.02      2.60
compute_step_1 average IPC            2.02      2.65
compute_step_2 average IPC            2.63      1.88

By using NumPy vectorization we achieved a speedup of 925.56x: the total elapsed time falls from 1545.69 s to 1.67 s. Since this version replaces Python's for-loops with NumPy vectorization, the result shows how slow generic Python loops are.
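As a quick check, the speedup can be recomputed from the table above. The sketch below copies the Master and Numpy totals from that table; the variable names are illustrative only.

```python
# Totals copied from the results table (Master vs Numpy versions).
master_time_s = 1545.69
numpy_time_s = 1.67
master_instructions = 9.62e12
numpy_instructions = 1.34e10

# Speedup is the ratio of elapsed times (old / new).
speedup = master_time_s / numpy_time_s

# The same ratio for executed instructions shows how much work was removed.
instruction_ratio = master_instructions / numpy_instructions

print(f"speedup: {speedup:.2f}x")
print(f"instruction reduction: {instruction_ratio:.0f}x")
```

The elapsed-time ratio reproduces the reported speedup, and the instruction ratio indicates that most of the gain comes from executing far fewer instructions rather than from the higher IPC alone.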