One of the recommended best practices is to build our operations on NumPy’s optimized routines. In this particular case, we use NumPy’s vectorization to remove all for-loops, so that NumPy computes all elements of our arrays in a single call instead of one element per Python iteration. The outcome of this optimization is shown below.
Metric | Master | Numpy |
---|---|---|
Total elapsed time [s] | 1545.69 | 1.67 |
compute_step_1 elapsed time [s] | 1544.10 | 1.59 |
compute_step_2 elapsed time [s] | 0.56 | 0.02 |
Total instructions | 9.62e12 | 1.34e10 |
compute_step_1 instructions | 9.62e12 | 1.33e10 |
compute_step_2 instructions | 4.87e9 | 9.19e7 |
Total average IPC | 2.02 | 2.60 |
compute_step_1 average IPC | 2.02 | 2.65 |
compute_step_2 average IPC | 2.63 | 1.88 |
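By using NumPy vectorization, we achieved a speedup of 925.56x in total elapsed time (from 1545.69 s down to 1.67 s), and the total instruction count dropped from 9.62e12 to 1.34e10. Since this version replaces Python’s for-loops with NumPy vectorization, the result shows just how slow generic Python loops are.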
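
To make the idea concrete, here is a minimal sketch of the kind of transformation involved. It is not the actual code of `compute_step_1` or `compute_step_2`, which is not shown here; the functions `scale_and_shift_loop` and `scale_and_shift_vectorized` and the arithmetic they perform are hypothetical, chosen only to contrast a Python for-loop with the equivalent vectorized NumPy expression.

```python
import numpy as np


def scale_and_shift_loop(values, scale, shift):
    """Hypothetical kernel: one element per Python loop iteration."""
    result = np.empty_like(values)
    for i in range(values.shape[0]):
        # Each iteration pays the interpreter overhead for indexing and arithmetic.
        result[i] = values[i] * scale + shift
    return result


def scale_and_shift_vectorized(values, scale, shift):
    """Same computation as a single vectorized NumPy expression."""
    # The loop over elements runs inside NumPy's compiled routines.
    return values * scale + shift


# Illustrative input; the size is an arbitrary assumption.
values = np.arange(10_000, dtype=np.float64)

loop_result = scale_and_shift_loop(values, 2.0, 1.0)
vectorized_result = scale_and_shift_vectorized(values, 2.0, 1.0)

# Both versions should produce the same array.
assert np.allclose(loop_result, vectorized_result)
```

Timing both functions on the same input (for example with the standard `timeit` module) should show a gap in the same direction as the table above, although the exact ratio depends on the array size and on the work done per element.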