Co-design at POP CoE project

Inefficient file I/O due to many unbuffered write operations

In a naive implementation, I/O operations are most likely performed serially: data is read from and written to disk on demand, whenever it is needed. However, this can degrade performance significantly if each operation transfers only a very small amount of data and many such operations occur.

Consider the following situation: After a certain number of timesteps, a simulation code has to write some results to a file. Typically some information stored on every discretization point needs to be written to this file. The following code skeleton illustrates this pattern:

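! MAX_POINTS = number of discretization points
INTEGER, PARAMETER :: MAX_POINTS = 1000000

TYPE point
    real*8 :: position(3)
END TYPE

TYPE(point) :: points(MAX_POINTS)
INTEGER :: i, j

! Open the output file for writing, positioned at its beginning
OPEN(42, file = 'output.dat', status = 'unknown', action = 'write', position = 'rewind')

! One write statement per discretization point (the inefficient pattern)
DO i = 1, MAX_POINTS
    write(42, *) (points(i)%position(j), j=1,3)
END DO

CLOSE(42)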

In this code example, the position coordinates in 3D space for 1 million discretization points are written to a file called output.dat. For each discretization point, a single write operation is issued, so the example performs a total of 1 million write operations, each covering only 24 bytes of data (three 8-byte reals).

A file system is organized in blocks of a certain size (e.g. 4 MB). If each write operation transfers only a small amount of data, the same file system block has to be accessed many times, and the metadata is updated each time, which causes significant overhead. Furthermore, if the file system is attached to the HPC cluster system via a network, every small write operation also pays the cost of network latency.
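To make the block argument concrete, the following minimal sketch works out the numbers for the example above. It assumes that each write deposits exactly 24 bytes in the file (as with binary output) and uses the 4 MB block size named in the text. The program and all names in it are hypothetical and serve only as an illustration.

```fortran
program block_revisits
    implicit none

    ! Assumed values: example block size from the text, record size and
    ! record count from the code skeleton above
    integer, parameter :: block_bytes  = 4 * 1024 * 1024
    integer, parameter :: record_bytes = 24
    integer, parameter :: n_records    = 1000000

    ! Number of complete 24-byte writes that fall into one block
    print *, 'writes per block:', block_bytes / record_bytes

    ! Number of blocks needed to hold all records (rounded up)
    print *, 'blocks touched:  ', &
        (n_records * record_bytes + block_bytes - 1) / block_bytes
end program block_revisits
```

Under these assumptions, each block receives roughly 175,000 separate writes, while the whole data set fits into only 6 blocks.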

This pattern can be found in the CalculiX application, for example. In a small test example, around 1.6 million write operations, each of size 20 bytes (2 double-precision floating-point numbers + 1 integer), take roughly 3 seconds when using unbuffered I/O operations.
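To observe the effect on a given system, one could time the per-point pattern above against a single write of the whole array. The following is a minimal sketch, not the CalculiX code: the program name, file names, and problem size are hypothetical, and it uses unformatted stream access so that both variants produce the same bytes. Many Fortran runtimes buffer writes internally, so the measured difference depends on the compiler, its runtime settings, and the file system.

```fortran
program compare_writes
    use, intrinsic :: iso_fortran_env, only: real64, int64
    implicit none

    ! Hypothetical problem size, chosen smaller than in the text
    integer, parameter :: n_points = 100000

    real(real64), allocatable :: positions(:,:)
    integer(int64) :: t0, t1, rate
    integer :: i, u

    ! Fill the coordinates with placeholder data
    allocate(positions(3, n_points))
    call random_number(positions)

    ! Variant 1: one write statement per discretization point
    call system_clock(t0, rate)
    open(newunit=u, file='per_point.bin', access='stream', &
         form='unformatted', status='replace', action='write')
    do i = 1, n_points
        write(u) positions(:, i)
    end do
    close(u)
    call system_clock(t1)
    print '(a, f10.4, a)', 'per-point writes: ', &
        real(t1 - t0, real64) / real(rate, real64), ' s'

    ! Variant 2: a single write statement for the whole array
    call system_clock(t0)
    open(newunit=u, file='single.bin', access='stream', &
         form='unformatted', status='replace', action='write')
    write(u) positions
    close(u)
    call system_clock(t1)
    print '(a, f10.4, a)', 'single write:     ', &
        real(t1 - t0, real64) / real(rate, real64), ' s'
end program compare_writes
```

Both files end up with the same size (24 bytes per point), so any difference in the reported times comes from how the data is handed to the I/O layer rather than from the amount of data written.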

Recommended best-practices: