This kernel demonstrates the importance of memory coalescing using a simple matrix multiplication: two 1024x1024 matrices of random integers are multiplied. The multiplication is implemented naively, i.e. the threads are distributed in a two-dimensional grid, and each thread computes one output element by iterating along a row of the first matrix and down a column of the second. No optimization techniques such as blocking or tiling are used.
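As a rough illustration of the naive scheme described above, the following is a minimal sketch. It assumes CUDA, row-major storage, and a 32x8 thread block. The names `matmul_coalesced`, `matmul_uncoalesced`, and `run_and_check` are hypothetical and are not taken from the actual kernel source. The two kernels perform identical arithmetic and differ only in which thread index is mapped to the column, which is what determines whether neighbouring threads touch neighbouring memory addresses.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// threadIdx.x selects the column: neighbouring threads read neighbouring
// elements of B and write neighbouring elements of C (coalesced accesses).
__global__ void matmul_coalesced(const int* A, const int* B, int* C, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= n || col >= n) return;
    int acc = 0;
    for (int k = 0; k < n; ++k)
        acc += A[row * n + k] * B[k * n + col];
    C[row * n + col] = acc;
}

// threadIdx.x selects the row: neighbouring threads are n elements apart
// when reading A and writing C (strided, uncoalesced accesses).
__global__ void matmul_uncoalesced(const int* A, const int* B, int* C, int n) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= n || col >= n) return;
    int acc = 0;
    for (int k = 0; k < n; ++k)
        acc += A[row * n + k] * B[k * n + col];
    C[row * n + col] = acc;
}

// Runs both kernels on random n x n inputs and compares them with a CPU
// reference. Error checking of the CUDA calls is omitted for brevity.
bool run_and_check(int n) {
    size_t count = size_t(n) * n, bytes = count * sizeof(int);
    std::vector<int> hA(count), hB(count), hC1(count), hC2(count);
    for (auto& v : hA) v = rand() % 10;  // small values, so no int overflow
    for (auto& v : hB) v = rand() % 10;

    int *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    // The matrices are square, so the same launch shape covers both mappings.
    dim3 block(32, 8);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);

    matmul_coalesced<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(hC1.data(), dC, bytes, cudaMemcpyDeviceToHost);
    matmul_uncoalesced<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(hC2.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    // CPU reference: both kernels must produce the same product.
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            int acc = 0;
            for (int k = 0; k < n; ++k)
                acc += hA[size_t(r) * n + k] * hB[size_t(k) * n + c];
            if (hC1[size_t(r) * n + c] != acc || hC2[size_t(r) * n + c] != acc)
                return false;
        }
    return true;
}

int main() {
    printf("results %s\n", run_and_check(1024) ? "match" : "differ");
    return 0;
}
```

The sketch only checks correctness. To see the effect of coalescing, each kernel launch would need to be timed separately, for example with CUDA events or a profiler.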
The following experiments have been registered: