Figure 1 shows the spatio-temporal structure of the whole run as a view of the duration of useful computations for the execution with 8 threads. The view reveals a long initialization phase in which the input data is read. This phase is not parallelized because its weight in a real execution is very small. The parallel region corresponds to the main computation, where loops of very different granularity, encoded by color, can already be identified. The overall structure of the execution is very similar across the traces obtained for the different runs.