GPU SAXPY with optimal CPU binding

Version's name: GPU SAXPY with optimal CPU binding; a version of the GPU SAXPY program.
Patterns and behaviours: Implemented best practices: Appropriate process/thread mapping to GPUs

This version of the GPU SAXPY kernel is launched through srun with the additional --cpu-bind option, which binds each process to cores in the NUMA domain that its GPU is connected to. An example call for a system with 4 GPUs connected to NUMA domains 0, 2, 4, and 6 is given below:

srun --cpu-bind=map_ldom:0,2,4,6 ./kernel.exe 8000000000
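
To make the mapping concrete, the following sketch prints the NUMA domain that each node-local task would receive from a map_ldom list. It is only an illustration: the function name ldom_for_task is hypothetical and not part of Slurm, and it assumes that list entries are assigned to tasks in order and that the list wraps around when there are more tasks than entries.

```shell
# Hypothetical helper (not part of Slurm): print the NUMA domain that a
# node-local task would get from a comma-separated map_ldom list.
# Assumption: entries are assigned in task order and the list wraps around.
# The body runs in a subshell so the IFS change does not leak to the caller.
ldom_for_task() (
    list=$1
    task=$2
    set -f                          # no pathname expansion while splitting
    IFS=,
    set -- $list                    # positional parameters = domain IDs
    shift $(( $task % $# ))         # skip to the entry for this task
    printf '%s\n' "$1"
)

# Show the mapping for the four tasks of the example call.
for task in 0 1 2 3; do
    echo "local task ${task} -> NUMA domain $(ldom_for_task 0,2,4,6 "${task}")"
done
```

With the example list, local task 0 lands in domain 0 and local task 3 in domain 6, which matches the GPU placement only if task i also uses GPU i.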

This binding ensures that GPU bandwidth and latency are not limited by CPU-GPU affinity effects. To find the correct NUMA domains on a given system, use the GPU vendor tools ‘nvidia-smi topo -m’ (NVIDIA GPUs) or ‘rocm-smi --showtoponuma’ (AMD GPUs), which report the topology of the system.
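
Once the NUMA domain of each GPU has been read from the topology output, the binding option can be assembled from those values. The sketch below is one possible way to do this. The helper name cpu_bind_option is hypothetical, the domains are typed in by hand rather than parsed from the vendor tools, and they are assumed to be listed in GPU order (GPU 0 first) with task i using GPU i.

```shell
# Hypothetical helper: build the srun --cpu-bind option from the NUMA
# domains of the GPUs, given in GPU order (GPU 0 first).
# Assumption: the values were read manually from the topology output.
cpu_bind_option() (
    IFS=,                           # "$*" joins its arguments with commas
    printf '%s%s\n' '--cpu-bind=map_ldom:' "$*"
)

# Print the option for the example system with GPUs on domains 0, 2, 4, 6.
cpu_bind_option 0 2 4 6

# Possible use on a cluster (not executed here):
#   srun $(cpu_bind_option 0 2 4 6) ./kernel.exe 8000000000
```

Keeping the domain list in one place like this makes it easier to adapt the launch line when the same job script is moved to a node with a different topology.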

The following experiments have been registered: