Python is an interpreted, high-level, general-purpose programming language. It is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python’s design philosophy emphasizes code readability, and aims to help programmers write clear, logical code for small and large-scale projects. CPython is the open-source reference implementation of the Python programming language.

Python provides built-in parallel computation capabilities in the form of the threading and multiprocessing modules, which are part of the standard library. The threading approach is, in general, limited by the Global Interpreter Lock (GIL), which in practice allows only one thread to execute Python bytecode at a time, so it is useful mainly for I/O-bound work. This can be overcome using the multiprocessing module, but that is inefficient due to serialization and memory duplication and works only within a single node, hence it also does not fit HPC use cases.
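The point about threads being useful mainly for I/O can be illustrated with a minimal sketch. It assumes that `time.sleep` is an acceptable stand-in for a blocking I/O call (like most blocking I/O, it releases the GIL); `fake_io_task`, `run_serial`, and `run_threaded` are hypothetical names introduced only for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id, delay=0.1):
    """Stand-in for a blocking I/O call (assumption: real code would
    read from a file, socket, or database here)."""
    time.sleep(delay)  # sleeping releases the GIL, so other threads can run
    return task_id * 2


def run_serial(n_tasks, delay=0.1):
    """Run the tasks one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io_task(i, delay) for i in range(n_tasks)]
    return results, time.perf_counter() - start


def run_threaded(n_tasks, delay=0.1):
    """Run the tasks in a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # map() preserves input order, so results line up with task ids
        results = list(pool.map(lambda i: fake_io_task(i, delay), range(n_tasks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, serial_time = run_serial(8)
    _, threaded_time = run_threaded(8)
    print(f"serial:   {serial_time:.2f} s")
    print(f"threaded: {threaded_time:.2f} s")
```

With eight waiting tasks, the threaded run should take roughly the time of a single task, because the waits overlap. Replacing the sleep with a pure-Python computation would be expected to remove most of that gain, since the threads would then compete for the GIL.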

On top of that, there are several third-party packages and frameworks that are commonly used and suitable for HPC workloads. The best known are the NumPy and SciPy packages for scientific computing. They can be linked with BLAS and LAPACK libraries (ATLAS, MKL, GotoBLAS, etc.) and call compiled, multithreaded routines (for example OpenMP-based ones) that are used automatically without user intervention. PyMP is an OpenMP-like multiprocessing framework based on the UNIX fork method. To significantly increase the speed of Python code and approach the speed of C or Fortran, there is the Numba module, which performs JIT compilation using the LLVM compiler and supports CPU parallelization, SIMD vectorization, and GPU acceleration.
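As a small illustration of how NumPy moves work out of the interpreter, the sketch below computes the same dot product with an explicit Python loop and with `np.dot`, which hands the computation to compiled code (and to the linked BLAS library where one is available). The function names `dot_loop` and `dot_numpy` and the array size are assumptions made for this example; actual speedups depend on the installation.

```python
import time

import numpy as np


def dot_loop(x, y):
    """Dot product computed element by element in the Python interpreter."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_numpy(x, y):
    """Dot product delegated to NumPy's compiled implementation."""
    return float(np.dot(x, y))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
    x = rng.random(1_000_000)
    y = rng.random(1_000_000)

    start = time.perf_counter()
    slow = dot_loop(x, y)
    loop_time = time.perf_counter() - start

    start = time.perf_counter()
    fast = dot_numpy(x, y)
    numpy_time = time.perf_counter() - start

    print(f"loop:  {loop_time:.4f} s")
    print(f"numpy: {numpy_time:.4f} s")
    print("results agree:", np.isclose(slow, fast))
```

When the underlying BLAS is OpenMP-based, the number of threads it uses can typically be limited through the `OMP_NUM_THREADS` environment variable; other BLAS builds may use their own variables.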

For distributed multi-node computations, there is mpi4py (MPI for Python), which provides low-level Python bindings for MPI. Parallel Python is a pure-Python module for job-based SMP and multi-node parallelization with dynamic load balancing and fault-tolerance capabilities based on inter-process communication (IPC). There are also two modern general-purpose scalable frameworks for running distributed and/or GPU-accelerated Python applications, Dask and Ray, with integrated inter-node and intra-node communication, efficient data transfers, fault tolerance, and task scheduling.
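The typical MPI pattern (every process computes a share of the work, then the partial results are combined on one rank) can be sketched as below. This is an illustration under stated assumptions, not a reference implementation: `local_pi_sum` and `estimate_pi` are hypothetical names, and the communicator is passed in as a parameter so that the numerical part can be checked without an MPI installation. The only communicator methods used are `Get_rank`, `Get_size`, and the lowercase `reduce`, which in mpi4py sums by default.

```python
def local_pi_sum(rank, size, n_intervals):
    """Midpoint-rule contribution of one rank to the integral of
    4 / (1 + x^2) over [0, 1], which equals pi.

    Intervals are assigned cyclically: rank r handles r, r + size, ...
    """
    h = 1.0 / n_intervals
    total = 0.0
    for i in range(rank, n_intervals, size):
        x = h * (i + 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return total * h


def estimate_pi(comm, n_intervals=100_000):
    """Estimate pi cooperatively across all ranks of `comm`.

    `comm` is expected to behave like an mpi4py communicator.
    Returns the estimate on rank 0 and None on every other rank.
    """
    rank = comm.Get_rank()
    size = comm.Get_size()
    partial = local_pi_sum(rank, size, n_intervals)
    # In mpi4py, reduce() defaults to a sum and delivers the result to `root`
    return comm.reduce(partial, root=0)
```

Under MPI, a driver script would import the communicator with `from mpi4py import MPI` and call `estimate_pi(MPI.COMM_WORLD)`, printing the result only where it is not `None`. Such a script is usually launched through the MPI launcher, for example `mpiexec -n 4 python script.py`, where the script name and process count are placeholders.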

To leverage accelerators, CUDA and OpenCL integration for Python is available in the PyCUDA and PyOpenCL packages, respectively. There is also a fairly exhaustive list of other Python parallel processing software packages:

Related program(s): Python loops
Related report(s):