Inefficient Python loops

With Python it is very easy to unconsciously produce extremely inefficient code as it is an interpreted language. One need to put special attention on the data types and sentences used in order to mitigate interpreter’s overhead since generic Python objects are several orders of magnitude slower than other alternatives. Therefore, after the prototyping phases when developing Python software, users need to identify the heaviest compute functions and apply to them the most suitable optimization.

This pattern appears mostly in form of loops. Each loop increment does not translate to an extra instruction, but perhaps to several hundreds or thousands of instructions because the interpreter needs to check boundaries limits, exit condition, data types, etc. Take as an example the next code.

def traverse_and_compute(arr):
    for i in range(len(arr)):
        for j in range(len(arr[i])):
            if (arr[i][j] % 2) == 0:
                arr[i][j] = (arr[i][j] + 1) / 2
            else:
                arr[i][j] = 0

If this piece of code belonged to a compiled programming language like C or Fortran, the compiler would take care to analyse and squeeze the performance of it. Python, though, will do none of that and interpret each sentence as dictated.

For a numpy.array arr containing 1 million integers Python 3.8.5 executed 1,26e10 instructions in traverse_and_compute, meaning that each iteration took 12.600 instructions, a rather exaggerated amount of work if we compare it with an equivalent function compiled with GCC 10.2.1 and O0 optimizations, which took approximately 30 instructions per iteration.

Recommended best-practice(s): Related program(s):
  • Python loops (original)
  • Python loops (numba+numpy)
  • Python loops (numba)
  • Python loops (numpy)