This version of the GPU SAXPY kernel is launched with an srun command with the additional --cpu-bind parameter. It binds the processes to cores in the NUMA domains that the GPUs are connected to. An example call for a system with 4 GPUs connected to NUMA domains 0, 2, 4 and 6 is given below:
srun --cpu-bind=map_ldom:0,2,4,6 ./kernel.exe 8000000000
This binding ensures that the GPU bandwidth and latency are not limited by NUMA affinity effects. To obtain the correct NUMA domains to use on a system, the GPU vendor tools 'nvidia-smi topo -m' (NVIDIA GPUs) or 'rocm-smi --showtoponuma' (AMD GPUs) can be used. Both report the topology of the system.
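Once the NUMA domains have been read from the topology output, the binding option can be assembled from them. The snippet below is a minimal sketch of that step, not part of the benchmark itself: the helper name build_cpu_bind and the variable NUMA_DOMAINS are hypothetical, and the domain list 0 2 4 6 is only the example value from this text, to be replaced with the values reported on the target system.

```shell
#!/bin/sh
# Hypothetical helper (not part of the benchmark): turn a list of NUMA
# domains into the --cpu-bind option expected by srun.
build_cpu_bind() {
    # Join all arguments with commas, e.g. "0 2 4 6" -> "0,2,4,6,".
    list=$(printf '%s,' "$@")
    # Strip the trailing comma and prepend the map_ldom binding keyword.
    printf '%s%s\n' '--cpu-bind=map_ldom:' "${list%,}"
}

# Assumed example values; replace with the NUMA domains reported by
# the vendor topology tool on the target system.
NUMA_DOMAINS="0 2 4 6"

# Print the command instead of running it, so it can be checked first.
# $NUMA_DOMAINS is left unquoted on purpose so it splits into arguments.
echo "srun $(build_cpu_bind $NUMA_DOMAINS) ./kernel.exe 8000000000"
```

The command is only printed here, so the resulting binding can be compared against the topology output before a job is submitted.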
The following experiments have been registered: