These experiments were run with the branch chunksize_overlap,
which changes the last, originally badly scheduled loop from the chunksize
branch and also implements computation and communication on segments for the first four computationally large loops, those with the matrices K, K', V and D.
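
As a rough illustration of what working "on segments" could look like, the sketch below splits a loop's iteration range into near-equal contiguous segments in C. The names segment_t, segment_bounds and segmented_sum are hypothetical and are not taken from the chunksize_overlap branch. No MPI calls are made here; a comment only marks where a non-blocking reduction such as MPI_Iallreduce might be posted per segment, with a single MPI_Waitall after the loop, which is an assumption about the scheme rather than a description of the actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of loop iterations belonging to one segment. */
typedef struct {
    size_t begin;
    size_t end;
} segment_t;

/*
 * Split n iterations into nseg contiguous segments whose sizes differ by at
 * most one; the first (n % nseg) segments receive the extra iteration.
 */
static segment_t segment_bounds(size_t n, size_t nseg, size_t s)
{
    assert(nseg > 0 && s < nseg);
    size_t base = n / nseg;
    size_t rem = n % nseg;
    segment_t seg;
    seg.begin = s * base + (s < rem ? s : rem);
    seg.end = seg.begin + base + (s < rem ? 1 : 0);
    return seg;
}

/*
 * Sum buf[0..n) one segment at a time. The per-segment partial result stands
 * in for the data that a real code would hand to the communication layer.
 */
static double segmented_sum(const double *buf, size_t n, size_t nseg)
{
    double total = 0.0;
    for (size_t s = 0; s < nseg; ++s) {
        segment_t seg = segment_bounds(n, nseg, s);
        double partial = 0.0;
        for (size_t i = seg.begin; i < seg.end; ++i)
            partial += buf[i];
        /* Assumed placement: a non-blocking reduction for this segment could
           be posted here, so that it overlaps with computing segment s + 1. */
        total += partial;
    }
    return total;
}
```

With 1 segment this degenerates to the unsegmented loop; with 4, 8 or 16 segments each segment covers roughly n/4, n/8 or n/16 iterations.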
.. todo images .. Images of one iteration with 1, 4, 8 and 16 segments
Table with MPI data - timings for MPI_Waitall, MPI_Iallreduce, and Outside of MPI
Table with efficiencies, instructions and IPC for all different settings