Parallel File I/O experiments

The data below reports the time taken to write and read 4.8e8 double-precision values (3.84 GB in total), with the data distributed evenly over all processes, i.e. strong scaling. The times are an average of three measurements on MareNostrum 4, with 48 processes per 48-core compute node. MareNostrum 4 uses the IBM General Parallel File System (GPFS). Data is shown for all parallel-file-io versions, i.e.

  • serial-ascii a naive worst-case implementation where all file I/O is undertaken by process 0, with data for file I/O operations communicated between process 0 and the other processes via blocking point-to-point sends and receives, and with file I/O to a single ASCII file
  • parallel-ascii-multifile where every process writes ASCII data to, and reads it from, its own file
  • parallel-binary-multifile where every process writes binary data to, and reads it from, its own file (see the first sketch after this list)
  • parallel-library-netcdf-independent-access with parallel file I/O to a single file using NetCDF with independent access
  • parallel-library-netcdf-collective-access with parallel file I/O to a single file using NetCDF with collective access
  • parallel-library-mpiio-independent-access with parallel file I/O to a single file using MPI file I/O with independent access
  • parallel-library-mpiio-collective-access with parallel file I/O to a single file using MPI file I/O with collective access (see the second sketch after this list)
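
As an illustration of the multifile approach, here is a minimal sketch of per-process binary output. The file name pattern, buffer size, and variable names are illustrative assumptions, not the actual benchmark code:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    size_t count = 1000;                 /* illustrative per-process size */
    double *buf = malloc(count * sizeof(double));
    for (size_t i = 0; i < count; i++)
        buf[i] = (double)rank;           /* dummy data */

    /* Each rank writes its portion to its own binary file, so no
       coordination between processes is needed. */
    char fname[64];
    snprintf(fname, sizeof fname, "data_%04d.dat", rank);
    FILE *fp = fopen(fname, "wb");
    fwrite(buf, sizeof(double), count, fp);
    fclose(fp);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Because no two ranks ever touch the same file, this approach needs no synchronisation at all, but it leaves the user with as many files as processes to manage afterwards.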

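For comparison, a minimal sketch of the collective MPI file I/O approach, where all ranks write contiguous portions of the same shared file in one collective call. Again the file name, buffer size, and offsets are illustrative assumptions; the real benchmark may set file views or MPI-IO hints differently:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int count = 1000;                    /* illustrative per-process size */
    double *buf = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++)
        buf[i] = (double)rank;           /* dummy data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "data.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes a contiguous block at an offset determined by its
       rank; the _all variant makes the write collective, allowing the MPI
       library to aggregate and reorder requests across processes. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, count, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

The independent-access version differs only in using MPI_File_write_at in place of MPI_File_write_at_all, so each rank's request is handled separately rather than aggregated by the MPI library.
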
The first two plots compare writing and reading for all cases. All parallel implementations are significantly faster than the serial-ascii version, and for that version reading data is significantly faster than writing it.

[Plots: Writing data and Reading data]

The next two plots zoom in on the time axis to allow comparison of all parallel file I/O methods.

For writing data, parallel-binary-multifile and both parallel-library-mpiio versions are significantly faster than the parallel-library-netcdf versions. The parallel-ascii-multifile version is also faster than both parallel-library-netcdf versions on 4+ nodes, and is the only version which shows significant speed-up as the number of compute nodes increases.

Reading is faster than writing for parallel-ascii-multifile, which shows the same pattern of speed-up as the number of compute nodes increases. The large gap seen when writing between the parallel-library-netcdf versions and the other binary file I/O versions is not observed for reading.

[Plots: Writing data and Reading data]

The final two plots zoom further in on the time axis to allow a better comparison of the fastest methods; the same time scale is used for both plots to make this comparison easier.

For writing, parallel-binary-multifile is faster than both parallel-library-mpiio versions, although all versions take similar times as the number of compute nodes increases.

For reading, the picture is more complex. Both parallel-library-netcdf versions are faster than the parallel-library-mpiio versions, and the parallel-library-netcdf and parallel-binary-multifile versions take similar times on 8+ compute nodes.

[Plots: Writing data and Reading data]