A Centre of Excellence in HPC

BEM4I is a library of parallel boundary element based solvers developed at IT4Innovations National Supercomputing Center. It supports solutions of the Laplace, Helmholtz, Lamé, and wave equations. The library implements OpenMP and hybrid OpenMP/MPI parallelization. Development focuses on an efficient implementation for multi- and many-core architectures. System matrices assembled within the BEM are generally dense, so the library uses the Adaptive Cross Approximation (ACA) technique to approximate them. The resulting linear system is solved by an iterative solver chosen according to the properties of the system matrix. For the Helmholtz and wave equations, the solver is the GMRES method; for Laplace and Lamé it can be the CG method.
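To make the last step concrete, here is a minimal sketch of the conjugate gradient (CG) iteration for a small symmetric positive definite system. It is not BEM4I code: the function name `conjugate_gradient`, the dense list-of-rows matrix, and the 2x2 example are assumptions chosen for illustration, and the sketch is serial and uses no ACA compression.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given as a list of rows.

    Illustrative, serial, dense sketch; names and defaults are hypothetical.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual r = b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small enough
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Small SPD example; the exact solution is x = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)
```

CG relies on the matrix being symmetric positive definite, which is why a different method such as GMRES is needed when that property does not hold.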

CalculiX IO: CalculiX is a free three-dimensional structural finite element analysis program. It supports linear and non-linear calculations of static, dynamic and thermal problems. The code is written in C and Fortran. Parallelization is achieved using the pthread programming model.

CalculiX solver: CalculiX is a free three-dimensional structural finite element analysis program. It supports linear and non-linear calculations of static, dynamic and thermal problems. The code is written in C and Fortran. Parallelization is achieved using the pthread programming model.

Communication Imbalance: The *Communication Imbalance* kernel is a synthetic program which reproduces a
communication pattern between several MPI processes. Initially it computes a
connectivity matrix which records which ranks communicate with each other,
and it also preassigns a given number of elements to each rank.
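The setup phase described above can be sketched without MPI as follows. This is an illustrative model, not the kernel's code: the function names, the random choice of partners, the linear skew in element counts, and the max-over-mean imbalance measure are all assumptions.

```python
import random


def build_connectivity(n_ranks, partners_per_rank, seed=0):
    """Return a symmetric 0/1 matrix; conn[i][j] == 1 means ranks i and j exchange data.

    Partners are picked at random here, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    conn = [[0] * n_ranks for _ in range(n_ranks)]
    for i in range(n_ranks):
        others = [j for j in range(n_ranks) if j != i]
        for j in rng.sample(others, partners_per_rank):
            conn[i][j] = conn[j][i] = 1
    return conn


def assign_elements(n_ranks, base, skew):
    """Preassign an element count to each rank (hypothetical linear skew)."""
    return [base + skew * r for r in range(n_ranks)]


def send_volume(conn, elements):
    """Elements each rank sends: its element count times its number of partners."""
    return [elements[i] * sum(conn[i]) for i in range(len(conn))]


def imbalance(values):
    """Maximum over mean; 1.0 means perfectly balanced."""
    return max(values) / (sum(values) / len(values))


conn = build_connectivity(n_ranks=4, partners_per_rank=1)
elements = assign_elements(n_ranks=4, base=100, skew=50)
volumes = send_volume(conn, elements)
print(volumes, imbalance(volumes))
```

Skewing the element counts or the number of partners per rank is what turns a balanced exchange into an imbalanced one in this model.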

DuMuX DUNE is a free and open-source simulator for flow and transport processes in porous media written in C++. This is the DuMuX DUNE kernel, which implements one of the communication and computation patterns found in DuMuX DUNE. The kernel implements a sparse alltoallv communication pattern in which computation is performed on the individual communicated buffers.
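The pattern can be illustrated with a small single-process simulation. This is a sketch rather than the kernel's implementation: the ranks are plain list indices, the "transfer" is a list copy, and the per-buffer computation (a sum) is a placeholder; all names are hypothetical.

```python
def sparse_alltoallv(send):
    """Simulate a sparse all-to-all exchange with variable buffer sizes.

    send[src] maps a destination rank to the list of values src sends to it.
    Returns recv, where recv[dest] maps a source rank to the received list.
    Ranks that do not appear in a mapping exchange nothing (the sparse part).
    """
    recv = [dict() for _ in range(len(send))]
    for src, outgoing in enumerate(send):
        for dest, buf in outgoing.items():
            recv[dest][src] = list(buf)   # the copy stands in for the message transfer
    return recv


def process_buffers(received):
    """Run a computation on each received buffer separately (placeholder: a sum)."""
    return {src: sum(buf) for src, buf in received.items()}


# Four simulated ranks; rank 3 takes no part in the exchange.
send = [
    {1: [1.0, 2.0]},
    {0: [3.0], 2: [4.0, 5.0, 6.0]},
    {1: [7.0]},
    {},
]
recv = sparse_alltoallv(send)
results = [process_buffers(r) for r in recv]
print(results)
```

Because each buffer is processed on its own, the work per rank depends on how many partners it has and how large their messages are.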

False communication-computation overlap: The *False communication-computation overlap* kernel is a synthetic program which reproduces a communication/computation
pattern between several MPI processes.

FFTXlib is the stand-alone kernel that represents the *Fast Fourier
Transformation* (FFT) algorithm used in the *Quantum ESPRESSO* application, one
of the most widely used plane-wave *Density Functional Theory* (DFT) codes in the
materials science community. The FFT kernel implements layered MPI
communication with FFT task groups, which split the cost of collective
communication operations to balance their impact on performance.

JuPedSim is an open source framework for simulating, analyzing and visualizing pedestrian dynamics in complex geometries, with the possibility for several exits and obstacles.

OpenMP Critical: An oil & gas code exhibited the openmp-critical-section pattern, and the computational aspects of the original code are recreated here. The application solves the 3D wave equation \(\frac{\partial^{2}u}{\partial t^{2}} = c^{2}\nabla^{2}u\) using the pseudospectral method. For the WP7 kernel, however, the finite difference method was selected for simplicity; it still re-creates the computational profile of the original code. The code contains a critical section within a parallel OpenMP loop, which greatly slows its performance.
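The pattern, and one common remedy, can be sketched as follows. This is not the kernel's code. It assumes a 1D finite-difference leapfrog step instead of the 3D problem, uses Python threads as a stand-in for OpenMP threads, and accumulates a sum of squares as a hypothetical example of the shared quantity guarded by the critical section. Timings of this sketch do not represent OpenMP behaviour; it only shows the structure.

```python
import math
import threading


def step(u_prev, u_curr, c, dt, dx, n_threads, use_critical):
    """One leapfrog step of u_tt = c^2 u_xx with fixed zero boundaries.

    Returns (u_next, total), where total is the sum of u_next[i]**2 over the
    interior points. With use_critical=True the shared total is updated under
    a lock on every loop iteration; otherwise each thread keeps a private
    partial sum and takes the lock once.
    """
    n = len(u_curr)
    u_next = [0.0] * n
    total = [0.0]
    lock = threading.Lock()
    coeff = (c * dt / dx) ** 2

    def worker(lo, hi):
        local = 0.0
        for i in range(lo, hi):
            lap = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
            u_next[i] = 2.0 * u_curr[i] - u_prev[i] + coeff * lap
            if use_critical:
                with lock:                  # critical section inside the loop
                    total[0] += u_next[i] ** 2
            else:
                local += u_next[i] ** 2     # thread-private accumulator
        if not use_critical:
            with lock:                      # a single update per thread
                total[0] += local

    chunk = (n - 2 + n_threads - 1) // n_threads
    threads = []
    for t in range(n_threads):
        lo = 1 + t * chunk
        hi = min(lo + chunk, n - 1)
        threads.append(threading.Thread(target=worker, args=(lo, hi)))
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return u_next, total[0]


n = 101
u0 = [math.sin(math.pi * i / (n - 1)) for i in range(n)]
slow_u, slow_total = step(u0, u0, c=1.0, dt=0.005, dx=0.01, n_threads=4, use_critical=True)
fast_u, fast_total = step(u0, u0, c=1.0, dt=0.005, dx=0.01, n_threads=4, use_critical=False)
print(slow_total, fast_total)
```

Both variants compute the same field and the same total up to floating-point summation order; the second takes the lock once per thread instead of once per grid point, which is the idea behind replacing a critical section with a reduction.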

Parallel File I/O: A naive approach to file I/O in parallel software is for one process to sequentially read/write ASCII data to/from a single file (e.g. using the C fscanf and fprintf commands) with point-to-point communications to share the data with all other processes.
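A minimal single-process model of this naive read path is shown below. It is an illustration under stated assumptions, not the kernel itself: the function name, the one-value-per-line file layout, and the dictionary of "mailboxes" standing in for point-to-point messages are all hypothetical.

```python
import os
import tempfile


def naive_read_and_distribute(path, n_ranks):
    """Model of the naive pattern: one rank parses the whole ASCII file,
    then hands one contiguous slice to every rank.

    The returned dict maps a destination rank to its slice and stands in
    for one point-to-point message per rank.
    """
    with open(path) as f:                     # sequential read by a single rank
        values = [float(line) for line in f if line.strip()]
    chunk = (len(values) + n_ranks - 1) // n_ranks
    mailboxes = {}
    for dest in range(n_ranks):               # one "send" per destination rank
        mailboxes[dest] = values[dest * chunk:(dest + 1) * chunk]
    return mailboxes


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")      # hypothetical input file
    with open(path, "w") as f:
        for i in range(10):
            f.write(f"{i * 0.5}\n")
    parts = naive_read_and_distribute(path, n_ranks=4)

print(parts)
```

In this model all parsing and all sends happen on one rank, so the other ranks wait while the file is read and the cost grows with both file size and rank count.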

RankDLB: RankDLB demonstrates performance issues arising in programs where the computational load per MPI rank evolves over time and therefore creates a load imbalance among MPI ranks. The computational problem must contain a coupling between MPI ranks, where data is exchanged between ranks after the computation of each iteration has completed.
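The effect can be modelled in a few lines. This sketch is not RankDLB code: the linear drift in per-rank work is a hypothetical load model, and load balance is measured as mean over maximum compute time per iteration, so that 1.0 means perfectly balanced.

```python
def load_balance(times):
    """Load balance efficiency: mean over max of per-rank compute time."""
    return (sum(times) / len(times)) / max(times)


def simulate(n_ranks, n_iters, drift):
    """Model an evolving load: rank r does 1 + drift * r * it units of work in
    iteration it (a hypothetical growth law).

    Because ranks exchange data after every iteration, each iteration lasts as
    long as its slowest rank. Returns a list of (iteration_time, load_balance).
    """
    history = []
    for it in range(n_iters):
        times = [1.0 + drift * r * it for r in range(n_ranks)]
        history.append((max(times), load_balance(times)))
    return history


history = simulate(n_ranks=4, n_iters=3, drift=0.5)
for it, (elapsed, lb) in enumerate(history):
    print(it, elapsed, lb)
```

In this model the first iteration is perfectly balanced and the efficiency drops as the loads drift apart, while the per-iteration coupling means the slowest rank sets the pace for all others.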