The Python loops kernel is a synthetic program, based on a real-world HPC Python script, that reproduces an inefficient way of writing loop-based compute algorithms in Python.
The kernel first reads the input arrays from a file. It then executes two compute steps, each of which traverses the input arrays and performs some basic computation on each element. The second step allocates two output arrays that hold the final results. These two output arrays are finally written to a binary file, so that the results can be checked for correctness after the code is modified. The first compute step is much more expensive than the second: it has 5 nested for-loop levels and calls matrix operations such as transpose and ravel, whereas the second step has only 4 nested for-loop levels and performs only matrix multiplications and additions.
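The correctness check described above can be sketched as follows. This is only an illustration, assuming NumPy's .npz container as the binary format; the helper names save_outputs and outputs_match, the file names, and the tolerance are hypothetical and not taken from the kernel.

```python
import os
import tempfile

import numpy as np


def save_outputs(path, output1, output2):
    """Store both output arrays in one binary .npz file (assumed format)."""
    np.savez(path, output1=output1, output2=output2)


def outputs_match(reference_path, candidate_path, rtol=1e-9, atol=0.0):
    """Return True if both files hold numerically equivalent output arrays."""
    with np.load(reference_path) as ref, np.load(candidate_path) as cand:
        for name in ("output1", "output2"):
            # Different shapes can never be a match; check before comparing values.
            if ref[name].shape != cand[name].shape:
                return False
            if not np.allclose(ref[name], cand[name], rtol=rtol, atol=atol):
                return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file names: one from the original code, one from a modified version.
        reference = os.path.join(workdir, "reference.npz")
        modified = os.path.join(workdir, "modified.npz")
        save_outputs(reference, np.arange(6.0).reshape(2, 3), np.ones((2, 3)))
        save_outputs(modified, np.arange(6.0).reshape(2, 3), np.ones((2, 3)))
        print("outputs match:", outputs_match(reference, modified))
```

Comparing with a tolerance rather than bit-for-bit is a design choice: a modified version that reorders floating-point operations may differ in the last digits while still being correct.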
The operations of both compute steps are trivial. The only important things to know are:
The following pseudo-code summarizes what compute steps look like:
def compute_step_1:
    for i in N:
        #load_temporal_data
        for j in M:
            #load_temporal_data
            for k in P:
                for l in Z:
                    #load_temporal_data
                    for f in H:
                        #matrix_operations
                        m.transpose()
                        m.ravel()
                        m.sum()
                        m += a/b*c
    return m

def compute_step_2:
    for j in M:
        for k in P:
            for l in Z:
                #initializes_output_arrays
                for f in H:
                    if condition:
                        output1 += a*b
                        output2 += a*b
                    else:
                        output1 = 0
                        output2 = a
    return [output1, output2]
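To make the cost of the first step concrete, here is a small runnable toy version of its loop nest. It is a sketch under stated assumptions, not the kernel's actual code: the loop bounds, the operands a, b and c, and the 3x3 array shape are all hypothetical, and the temporary-data loads are omitted. It mainly shows that the innermost body, with its small matrix calls, runs N*M*P*Z*H times.

```python
import numpy as np


def compute_step_1(a, b, c, n, m_len, p, z, h):
    """Toy 5-level loop nest mirroring the shape of compute step 1."""
    m = np.zeros_like(a, dtype=float)
    inner_iterations = 0
    for i in range(n):
        for j in range(m_len):
            for k in range(p):
                for l in range(z):
                    for f in range(h):
                        # Each call below is a separate interpreter-level
                        # operation on a small array; results are discarded,
                        # as in the pseudo-code.
                        transposed = m.transpose()
                        flattened = m.ravel()
                        total = m.sum()
                        m += a / b * c
                        inner_iterations += 1
    return m, inner_iterations


if __name__ == "__main__":
    a = np.full((3, 3), 6.0)  # hypothetical operand values
    result, count = compute_step_1(a, 3.0, 0.5, n=2, m_len=2, p=2, z=2, h=2)
    print("inner iterations:", count)  # 2*2*2*2*2 = 32
    print("result[0, 0]:", result[0, 0])
```

Even with tiny bounds of 2 per level the body runs 32 times; with realistic bounds the product grows quickly, and every iteration pays the interpreter overhead of the loop and of each small array call.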
The main issue with this kernel is that it implements the algorithm in a very naive way. As written, it is executed entirely by Python’s interpreter, which is far slower than compiled code.
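As an illustration of the difference, the sketch below computes the same element-wise product twice: once with explicit Python loops, where every element goes through the interpreter, and once with a single NumPy expression, which runs in NumPy's compiled code. The function names and array sizes are hypothetical, and this is not presented as the kernel's intended fix; it only shows that the two forms give the same result.

```python
import numpy as np


def multiply_loop(a, b):
    """Element-wise product using explicit Python loops (interpreted per element)."""
    rows, cols = a.shape
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] += a[i, j] * b[i, j]
    return out


def multiply_vectorized(a, b):
    """Same computation as one whole-array NumPy expression."""
    return a * b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random((200, 200))
    b = rng.random((200, 200))
    same = np.allclose(multiply_loop(a, b), multiply_vectorized(a, b))
    print("results agree:", same)
```

Timing both functions, for example with the standard timeit module, is a simple way to measure the interpreter overhead on a given machine.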