This version of the GPU SAXPY kernel is launched with a plain srun
command, without any additional parameters:
srun ./kernel.exe 8000000000
Because no binding options are given, the default CPU binding for MPI tasks configured on the system is used. Depending on the system configuration, this default might not be optimal: the offloading call may be executed by a CPU core that is not in the NUMA domain to which the target GPU is attached. Data transfers between CPU and GPU then see higher latency and lower bandwidth. In the worst case, the core handling the transfer to the GPU is on a different socket, which makes the penalty even larger. Depending on the configuration, the processes might even be moved to other cores.
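To see which binding a task actually received, a small helper can print the cores the process is allowed to run on and the NUMA nodes those cores belong to. The following is only a minimal sketch, assuming a Linux node that exposes its NUMA topology under /sys/devices/system/node and a job step started by srun (which sets SLURM_PROCID). The function names and the script name where_am_i.py are hypothetical and not part of the benchmark. The script does not determine which NUMA node the GPU is attached to; that still has to be looked up for the system at hand.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only NUMA node
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_node_map(sysfs_root="/sys/devices/system/node"):
    """Return {NUMA node id: [cpu ids]} read from sysfs; empty if sysfs is unavailable."""
    node_map = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        # The directory is named "node<N>"; strip the prefix to get N.
        node_name = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            node_map[int(node_name[len("node"):])] = parse_cpulist(handle.read())
    return node_map


def numa_nodes_of(cpus, node_map):
    """Return the sorted NUMA node ids that contain at least one of the given cores."""
    wanted = set(cpus)
    return sorted(node for node, node_cpus in node_map.items() if wanted & set(node_cpus))


if __name__ == "__main__":
    # os.sched_getaffinity exists on Linux only; fall back to an empty list elsewhere.
    allowed = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else []
    rank = os.environ.get("SLURM_PROCID", "?")  # set by srun for each task
    nodes = numa_nodes_of(allowed, read_node_map())
    print(f"task {rank}: allowed cores {allowed}, NUMA nodes {nodes}")
```

Launched the same way as the kernel (for example srun python3 where_am_i.py), each task reports its own core set. If a task's cores span several NUMA nodes, or lie on a node other than the one its GPU is attached to, the default binding is a likely cause of the transfer penalty described above.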
The following experiments have been registered: