This version of the matrix multiplication kernel is structurally identical to the original version, i.e. the matrix multiplication is still implemented in the naive way.
However, in this version we explicitly promise the compiler that the input arrays do not alias (that is, do not overlap in memory) by marking them with the __restrict keyword.
This makes it safe for the compiler to optimize the memory access pattern.
This simple change makes the kernel run nearly 3x faster.
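
As a rough illustration of the idea, here is a minimal sketch of a naive matrix multiplication with __restrict-qualified pointers. It is not the kernel's actual source: the function name matmul_restrict, the float element type, the square row-major layout, and the choice to qualify the output pointer as well are all assumptions made for this example. __restrict is a compiler extension (accepted by GCC, Clang and MSVC), not standard C++.

```cpp
#include <cstddef>

// Hypothetical example: naive n x n matrix multiplication, C = A * B,
// with all matrices stored in row-major order.
//
// The __restrict qualifiers are a promise made by the caller that a, b and c
// never overlap in memory. The compiler does not check this promise; calling
// the function with overlapping arrays is undefined behavior.
void matmul_restrict(const float* __restrict a,
                     const float* __restrict b,
                     float* __restrict c,
                     std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            c[i * n + j] = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                // Without __restrict the compiler has to assume that this
                // store to c might change a or b, which can force it to
                // reload and store c[i * n + j] on every iteration.
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}
```

The loop nest is the same as in an unqualified naive version; only the pointer declarations differ. How much the qualifier helps depends on the compiler, the optimization flags and the target hardware, so the speedup should be measured rather than assumed.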