List of Programming Models


CUDA - Compute Unified Device Architecture is a general-purpose parallel computing platform and scalable programming model for NVIDIA graphics processing units (GPUs). It allows C/C++ and Fortran developers to write device functions, called kernels, that are executed in parallel by groups of threads and thus efficiently utilize the large number of CUDA cores available on current GPUs. The CUDA programming model comprises a hierarchy of thread groups, a hierarchy of shared and private memories with separate memory spaces, and synchronization mechanisms. These abstractions provide fine-grained data and thread parallelism nested within coarse-grained data and task parallelism. For example, a problem can be partitioned into coarse sub-problems that are solved independently in parallel by blocks of threads, and each sub-problem can be further divided into finer pieces that are solved cooperatively in parallel by all threads within the block.

Programs: GPU-Kernel ·


MPI - Message Passing Interface is a standardized and portable message-passing specification for programming parallel computers. MPI provides communicators, point-to-point communication, collective communication, derived datatypes, and more recent features such as one-sided communication, dynamic process management, and parallel I/O. There are several well-tested and efficient implementations of MPI, such as MPICH and Open MPI. These implementations typically provide the compiler wrappers mpicc, mpic++ and mpif90 for C, C++ and Fortran, respectively.

Programs: BEM4I miniApp · BLAS Tuning · Communication computation trade-off · Communication Imbalance · DuMuX DUNE kernel · False communication-computation overlap · FFTXlib · Parallel File I/O · RankDLB · Sam(oa)² ·


OmpSs extends OpenMP with compiler directives for asynchronous parallelism and heterogeneous architectures (e.g., GPUs, FPGAs, and other accelerators). It can also be understood as an extension of accelerator-based APIs such as CUDA or OpenCL. A detailed description can be found at


oneAPI offers an open, unified programming model to simplify the development and deployment of data-centric workloads across CPUs, GPUs, FPGAs and other types of hardware architectures.


OpenACC - Open Accelerators - is a directive-based high-level programming model similar to OpenMP but intended for accelerators. Parallel regions are annotated with compiler directives, which makes the code portable to a wide range of accelerators. The OpenACC accelerator model abstracts the multiple levels of parallelism of the processors and the hierarchy of memories. It allows both data and computation to be offloaded from a host to an accelerator device, where the two may have different architectures or the same one, with separate or shared memory spaces.
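
The offloading of data and computation described above can be sketched in C as follows. This is a minimal illustration, not a tuned kernel: the function name saxpy_acc and the choice of data clauses are assumptions made for this example. A compiler without OpenACC support ignores the directive and runs the loop serially on the host, so the result is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for 0 <= i < n.
 * The directive asks an OpenACC compiler to offload the loop;
 * other compilers ignore the pragma and execute the loop on the host. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin: x is copied from host to device only.
     * copy:   y is copied to the device and back to the host afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling saxpy_acc(4, 2.0f, x, y) on host arrays updates y in place. On a system where host and accelerator share memory, an implementation may skip the data movement that the clauses request.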


OpenCL - Open Computing Language is a standard for programming heterogeneous systems, e.g. CPU and GPU, and supports data and task parallelism. It defines abstract platform, execution, memory and programming models that describe the features and behaviour of the target system. The heterogeneous system consists of a host system and one or more OpenCL devices containing processing elements. The host application controls communication with the devices and the in-order or out-of-order execution of compute kernel instances, which run as threads (work-items in OpenCL terminology). The kernels are written in the OpenCL C or C++ kernel languages, which extend C and C++.

Programs: CPU to GPU ·


OpenMP is an implementation of multithreading in which a master (primary) thread forks a specified number of worker threads. The system splits the master's computational task into smaller ones and distributes them among the threads, which then run concurrently. The runtime environment assigns the threads to the available processors, so a thread does not necessarily get a processor of its own. Parts of the code that should run in parallel (parallel regions) are marked by compiler directives inserted into the code.

Programs: Alya assembly · BEM4I miniApp · CalculiX solver · FFTXlib · juKKR kloop · JuPedSim · OMP Collapse · OpenMP Critical · Sam(oa)² ·
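
The OpenMP fork-join model and its directive-marked parallel regions can be sketched in C as follows. This is a minimal sketch: the function name sum_of_squares is hypothetical and chosen for this example only. When OpenMP is not enabled at compile time, the directive is ignored and the loop runs on a single thread with the same result.

```c
#include <assert.h>

/* Hypothetical example: returns 0^2 + 1^2 + ... + (n-1)^2. */
long long sum_of_squares(int n)
{
    assert(n >= 0);
    long long sum = 0;

    /* The directive forks a team of threads and splits the iterations
     * among them. reduction(+:sum) gives every thread a private partial
     * sum and adds the partial sums together when the threads join. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += (long long)i * i;
    }
    return sum;
}
```

Without the reduction clause, all threads would update the shared variable sum concurrently, which is a data race. The clause is the usual way to avoid a critical section in this pattern.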


PGAS - Partitioned Global Address Space assumes a global memory address space that is logically partitioned, with a portion of it local to each process, thread, or processing element. This can facilitate the development of productive programming languages that reduce the time to solution, i.e. both development time and execution time. Languages based on PGAS include Unified Parallel C, Co-Array Fortran, Titanium, X10, Chapel and others.

POSIX threads

The Portable Operating System Interface (POSIX) defines an Application Programming Interface (API) for thread programming. Implementations of this interface exist for a large number of UNIX-like operating systems (GNU/Linux, Solaris, FreeBSD, OpenBSD, NetBSD, macOS), as well as for Microsoft Windows and others. Libraries that implement this standard, and the functions it defines, are usually called Pthreads.

Programs: CalculiX solver ·
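
The POSIX thread API described above can be sketched in C as follows. This is a minimal sketch under stated assumptions: the names parallel_sum, sum_slice, struct slice and the fixed NUM_THREADS value are hypothetical and chosen for illustration. Only pthread_create and pthread_join come from the POSIX interface. Depending on the toolchain, the program may need to be built with the -pthread option.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4 /* illustrative choice, not a recommendation */

/* Work description for one thread: sum data[begin .. end-1] into sum. */
struct slice {
    const int *data;
    int begin;
    int end;
    long sum;
};

/* Thread entry point: each thread writes only to its own slice. */
static void *sum_slice(void *arg)
{
    struct slice *s = (struct slice *)arg;
    s->sum = 0;
    for (int i = s->begin; i < s->end; ++i) {
        s->sum += s->data[i];
    }
    return NULL;
}

/* Hypothetical helper: sums n integers using NUM_THREADS threads. */
long parallel_sum(const int *data, int n)
{
    assert(data != NULL && n >= 0);
    pthread_t threads[NUM_THREADS];
    struct slice slices[NUM_THREADS];

    for (int t = 0; t < NUM_THREADS; ++t) {
        /* Split [0, n) into NUM_THREADS contiguous ranges. */
        slices[t].data = data;
        slices[t].begin = (int)((long long)n * t / NUM_THREADS);
        slices[t].end = (int)((long long)n * (t + 1) / NUM_THREADS);
        slices[t].sum = 0;
        int rc = pthread_create(&threads[t], NULL, sum_slice, &slices[t]);
        assert(rc == 0);
        (void)rc;
    }

    long total = 0;
    for (int t = 0; t < NUM_THREADS; ++t) {
        /* Joining guarantees the thread's result is complete and visible. */
        int rc = pthread_join(threads[t], NULL);
        assert(rc == 0);
        (void)rc;
        total += slices[t].sum;
    }
    return total;
}
```

Each thread accumulates into its own struct slice, so no mutex is needed. The partial sums are combined only after pthread_join has returned for the corresponding thread.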