CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing platform and scalable programming model for NVIDIA graphics processing units (GPUs). It allows C/C++ and Fortran developers to write special device functions, called kernels, that are executed in parallel by groups of threads and thus efficiently utilize the large number of CUDA cores available on current GPUs. The CUDA programming model comprises a hierarchy of thread groups, a hierarchy of memories (per-thread private, per-block shared, and global device memory, each a separate memory space), and synchronization mechanisms. These abstractions provide fine-grained data and thread parallelism nested within coarse-grained data and task parallelism: a problem can be partitioned into coarse sub-problems that are solved independently in parallel by blocks of threads, and each sub-problem can be further divided into finer pieces that are solved cooperatively in parallel by the threads within a block.
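The thread hierarchy described above can be illustrated with a minimal vector-addition example (a sketch, not part of the original text; the kernel name `vec_add` and the problem size are chosen for illustration). The grid of blocks decomposes the array into coarse chunks, and each thread within a block handles one element of its chunk:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
// blockIdx.x selects the coarse sub-problem (block), threadIdx.x the fine piece.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory, accessible on host and device
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);

    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch configuration: a grid of blocks, each with 256 threads.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(a, b, c, n);

    cudaDeviceSynchronize();        // wait for the kernel to finish

    printf("c[0] = %f\n", c[0]);    // expect 3.0

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Threads within one block can additionally cooperate through shared memory and `__syncthreads()`, whereas different blocks execute independently, which is what makes the model scale across GPUs with different core counts.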

Related program(s): GPU-Kernel
Related report(s):