CUDA

CUDA - Compute Unified Device Architecture is a general purpose parallel computing platform and scalable programming model for NVIDIA graphics processing units (GPUs). It allows C/C++ and Fortran developers to design specific device functions called kernels that are executed in parallel by groups of threads and thus efficiently utilize a large number of CUDA cores available on current GPUs. The CUDA programming model comprises a hierarchy of thread groups, a hierarchy of shared/private memories with separate memory space and synchronization mechanisms. These abstractions provide fine-grained data and thread parallelism, nested within coarse-grained data and task parallelism, e.g. a problem can be partitioned into coarse sub-problems that can be solved independently in parallel by blocks of threads, and each sub-problem can be further divided into finer pieces that can be solved cooperatively in parallel by all threads within the block.

Related program(s): GPU-Kernel
Related report(s):

QuantumTransport

k-Wave

Tsunami-HySEA

DARKNET (YoloV3)

AFiD

ABINIT

SPECFEM3D (MPI+CUDA) on Leonardo-B

Xshells

SPECFEM3D (MPI+CUDA) on Leonardo-B performance assessment report (v2)