BEM4I is a library of parallel boundary element method (BEM) solvers developed at the IT4Innovations National Supercomputing Center. It supports the solution of the Laplace, Helmholtz, Lamé, and wave equations. The library implements OpenMP and hybrid OpenMP/MPI parallelization, and its development focuses on efficient implementation for multi- and many-core architectures. System matrices assembled within the BEM are generally dense, so the library uses the Adaptive Cross Approximation (ACA) technique to approximate them. The resulting linear system is solved by an iterative solver chosen according to the properties of the system matrix: for the Helmholtz and wave equations the solver is the GMRES method, while for the Laplace and Lamé equations it can be the CG method.
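To illustrate the CG method mentioned above, here is a minimal sketch of the textbook conjugate gradient iteration for a small symmetric positive definite system. It is not BEM4I's implementation: the function name `conjugate_gradient`, the dense list-of-lists matrix, and the tolerances are assumptions made for illustration, whereas the library itself works with ACA-compressed matrices in parallel.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (list of lists)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 example system (not taken from BEM4I).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

CG requires a symmetric positive definite matrix, which is why a more general solver such as GMRES is needed when that property does not hold.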

CalculiX IO

CalculiX is a free three-dimensional structural finite element analysis program. It supports linear and non-linear calculations of static, dynamic and thermal problems. The code is written in C and Fortran. Parallelization is achieved using the pthread programming model.

CalculiX solver

CalculiX is a free three-dimensional structural finite element analysis program. It supports linear and non-linear calculations of static, dynamic and thermal problems. The code is written in C and Fortran. Parallelization is achieved using the pthread programming model.

Communication Imbalance

The *Communication Imbalance* kernel is a synthetic program that reproduces a
communication pattern between several MPI processes. Initially it computes a
connectivity matrix that records which ranks communicate with each other, and
it preassigns a given number of elements to each rank.
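The setup step described above can be sketched as follows. This is only an illustration under stated assumptions, not the kernel's code: the ring-like neighbour pattern, the linear element distribution, and the names `build_connectivity`, `assign_elements` and `send_volume` are all hypothetical.

```python
def build_connectivity(num_ranks, neighbours_per_rank):
    """Return C with C[src][dst] == 1 when rank src sends to rank dst.

    Assumed pattern: each rank sends to its next `neighbours_per_rank`
    ranks on a ring (requires neighbours_per_rank < num_ranks).
    """
    conn = [[0] * num_ranks for _ in range(num_ranks)]
    for src in range(num_ranks):
        for offset in range(1, neighbours_per_rank + 1):
            conn[src][(src + offset) % num_ranks] = 1
    return conn


def assign_elements(num_ranks, base, step):
    """Preassign base + rank * step elements, deliberately uneven."""
    return [base + rank * step for rank in range(num_ranks)]


def send_volume(conn, elements):
    """Total elements each rank sends: its element count per neighbour."""
    return [elements[src] * sum(row) for src, row in enumerate(conn)]


conn = build_connectivity(num_ranks=4, neighbours_per_rank=1)
elements = assign_elements(num_ranks=4, base=100, step=50)
volume = send_volume(conn, elements)
```

With an uneven element assignment, ranks that own more elements send more data, which is one simple way an imbalance in communication volume can arise.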

DuMuX DUNE is a free and open-source simulator for flow and transport processes in porous media, written in C++. This is the DuMuX DUNE kernel, which implements one of the communication and computation patterns found in DuMuX DUNE. The kernel implements a sparse alltoallv communication pattern in which computation is performed on the individual communicated buffers.
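The following sketch emulates a sparse alltoallv exchange in a single process, to show how counts and displacements describe the buffers and how work can then be done per received segment. It does not use MPI and is not the kernel's code: the helper names, the example counts, and the choice of a sum as the per-buffer computation are assumptions. The displacement layout mirrors the `sdispls`/`rdispls` arguments of `MPI_Alltoallv`.

```python
def displacements(counts):
    """Exclusive prefix sum of counts: the start offset of each segment."""
    displs, offset = [], 0
    for count in counts:
        displs.append(offset)
        offset += count
    return displs


def simulate_sparse_alltoallv(send_counts, send_data):
    """Emulate the exchange; send_counts[src][dst] is mostly zero (sparse).

    send_data[src] is the flat send buffer of rank src, ordered by dst.
    Returns (recv_counts, recv_bufs), both indexed by receiving rank.
    """
    n = len(send_counts)
    # What dst receives from src is what src sends to dst (transpose).
    recv_counts = [[send_counts[src][dst] for src in range(n)]
                   for dst in range(n)]
    recv_bufs = []
    for dst in range(n):
        buf = []
        for src in range(n):
            start = displacements(send_counts[src])[dst]
            buf.extend(send_data[src][start:start + send_counts[src][dst]])
        recv_bufs.append(buf)
    return recv_counts, recv_bufs


def process_segments(recv_counts_row, recv_buf):
    """Compute on each individual received segment (here: its sum)."""
    displs = displacements(recv_counts_row)
    return [sum(recv_buf[d:d + c]) for d, c in zip(displs, recv_counts_row)]


# Hypothetical 3-rank pattern: rank 0 -> 1, rank 1 -> 0, rank 2 -> 1.
send_counts = [[0, 2, 0], [1, 0, 0], [0, 3, 0]]
send_data = [[10, 11], [20], [30, 31, 32]]
recv_counts, recv_bufs = simulate_sparse_alltoallv(send_counts, send_data)
per_source_sums = process_segments(recv_counts[1], recv_bufs[1])
```

Because most entries of the count matrix are zero, each rank only handles a few non-empty segments, and the cost of the computation depends on how those segments are distributed across ranks.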

FFTXlib

FFTXlib is the stand-alone kernel that represents the *Fast Fourier
Transform* (FFT) algorithm used in the *Quantum ESPRESSO* application, one
of the most widely used plane-wave *Density Functional Theory* (DFT) codes in
the materials science community. The FFT kernel implements layered MPI
communication with FFT task groups, which split the cost of collective
communication operations in order to balance their impact on performance.

JuPedSim is an open source framework for simulating, analyzing and visualizing pedestrian dynamics in complex geometries, with the possibility for several exits and obstacles.

OpenMP Critical

An oil & gas code exhibited the openmp-critical-section pattern, and the
computational aspects of the original code are recreated here. This application
solves the 3D wave equation
$`\frac{\partial^{2}u}{\partial t^{2}} = c^{2}\nabla^{2}u`$
using the pseudospectral method.
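As a rough illustration of the pseudospectral idea, the sketch below evaluates the Laplacian in Fourier space and advances the wave equation with a leapfrog step. It is a simplification under stated assumptions, not the kernel's code: it is 1D and periodic rather than 3D, it uses a plain O(N²) discrete Fourier transform instead of an FFT library, the leapfrog time integrator is an assumed choice, and all function names are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain discrete Fourier transform; the inverse is scaled by 1/n."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out


def spectral_laplacian(u, length):
    """Second derivative of periodic samples u on a domain of given length."""
    n = len(u)
    u_hat = dft(u)
    lap_hat = []
    for idx in range(n):
        mode = idx if idx <= n // 2 else idx - n   # signed mode number
        k = 2.0 * math.pi * mode / length          # wavenumber
        lap_hat.append(-(k * k) * u_hat[idx])      # d2/dx2 -> -(k^2)
    return [z.real for z in dft(lap_hat, inverse=True)]


def leapfrog_step(u_prev, u_curr, c, dt, length):
    """u_next = 2 u_curr - u_prev + (c dt)^2 * laplacian(u_curr)."""
    lap = spectral_laplacian(u_curr, length)
    return [2.0 * uc - up + (c * dt) ** 2 * l
            for up, uc, l in zip(u_prev, u_curr, lap)]


# Standing wave u(x, t) = sin(x) cos(c t) on [0, 2*pi), sampled at 16 points.
n, length, c, dt = 16, 2.0 * math.pi, 1.0, 0.01
xs = [length * i / n for i in range(n)]
u_curr = [math.sin(x) for x in xs]                      # t = 0
u_prev = [math.sin(x) * math.cos(c * dt) for x in xs]   # t = -dt
u_next = leapfrog_step(u_prev, u_curr, c, dt, length)
```

For smooth periodic data the spectral derivative is accurate to rounding error, so in this sketch the remaining error comes from the time stepping.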

A naive approach to file I/O in parallel software is for one process to sequentially read ASCII data from, or write it to, a single file (e.g. using the C fscanf and fprintf functions), with point-to-point communications to share the data with all other processes.
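The naive pattern can be emulated in a single process as follows. This is a sketch only: it uses Python text parsing in place of the C `fscanf` call, the point-to-point sends are represented by handing out list slices one rank at a time, and the names `root_read_ascii` and `distribute_point_to_point` are hypothetical.

```python
import os
import tempfile


def root_read_ascii(path):
    """One process parses every value in the file sequentially."""
    values = []
    with open(path) as handle:
        for line in handle:
            values.extend(float(token) for token in line.split())
    return values


def distribute_point_to_point(values, num_ranks):
    """Emulate the root sending one contiguous chunk to each rank in turn."""
    chunk = (len(values) + num_ranks - 1) // num_ranks   # ceiling division
    return [values[rank * chunk:(rank + 1) * chunk]
            for rank in range(num_ranks)]


# Write a small ASCII file, then read and distribute it.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("0 1 2 3 4\n5 6 7 8 9\n")
chunks = distribute_point_to_point(root_read_ascii(path), num_ranks=4)
os.remove(path)
```

Both the parsing and the distribution are serialized through one process, so the other ranks sit idle until their chunk arrives.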

RankDLB

RankDLB demonstrates performance issues that arise in programs where the computational load per MPI rank evolves over time and therefore creates a load imbalance among the MPI ranks. The computational problem must contain a coupling between MPI ranks, with data exchanged between ranks after each iteration's computation has completed.
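A small model makes the effect concrete. The sketch below is not RankDLB's code: the linear load drift in `load_per_rank` is a hypothetical model, and the efficiency measure (average load divided by maximum load) is one common way to quantify load balance, chosen here as an assumption.

```python
def load_per_rank(num_ranks, iteration, base=100.0, drift=5.0):
    """Hypothetical load model: rank r gains drift * r work per iteration."""
    return [base + drift * rank * iteration for rank in range(num_ranks)]


def iteration_time(loads):
    """Ranks exchange data after each iteration, so all wait for the slowest."""
    return max(loads)


def load_balance_efficiency(loads):
    """Average load divided by maximum load; 1.0 means perfect balance."""
    return sum(loads) / (len(loads) * max(loads))


# Efficiency degrades as the per-rank loads drift apart over time.
history = [load_balance_efficiency(load_per_rank(4, it)) for it in range(11)]
```

Because the ranks are coupled by the exchange at the end of every iteration, the time per iteration follows the most heavily loaded rank, and the less loaded ranks spend the difference waiting.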