Many applications deal with unstructured data, which results in an unbalanced data distribution across processes. A simple example is a particle simulation in which particles move between processes over time. When the simulation writes the global state at the end - or in between, e.g., for a checkpoint - each process holds a different number of particles. This makes it hard to write the data efficiently to a file in a contiguous way. The same pattern also appears, e.g., in applications using unstructured meshes with a different number of elements per process.
Figure: Four processes with different amounts of data that are to be written to a contiguous file "my-file.dat".
A frequently seen but inefficient approach is to serialize the I/O across the processes, with each process writing in turn. Instead, offset-based parallel I/O should be used: each process first determines its own offset in the file, e.g., via an exclusive prefix sum over the per-process data sizes, and then all processes write their portions concurrently, as in the sketch below.
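The following is a minimal sketch of this pattern using MPI-IO (not taken from the original text; the per-rank particle counts are invented for illustration, and the file name "my-file.dat" is taken from the figure caption). MPI_Exscan computes each rank's starting element index, and MPI_File_write_at_all performs a collective write at the resulting byte offset:

/* Sketch: offset-based parallel I/O for unbalanced data (illustrative,
 * per-rank counts are made up). */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Hypothetical per-process data: a varying number of doubles. */
    long long n = 1000 + 100 * rank;          /* differs per rank */
    double *data = malloc(n * sizeof(double));
    for (long long i = 0; i < n; i++) data[i] = (double)rank;

    /* Exclusive prefix sum over the element counts yields this rank's
     * starting element index in the global file (0 on rank 0). */
    long long start = 0;
    MPI_Exscan(&n, &start, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "my-file.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write at the computed byte offset; all ranks write
     * concurrently, no serialization across processes. */
    MPI_Offset offset = (MPI_Offset)start * sizeof(double);
    MPI_File_write_at_all(fh, offset, data, (int)n, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(data);
    MPI_Finalize();
    return 0;
}

Using the collective MPI_File_write_at_all (rather than the independent MPI_File_write_at) lets the MPI library aggregate the requests, which typically improves throughput on parallel file systems.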
Recommended best-practice(s):
Related program(s):