A Centre of Excellence in HPC


Implemented best practices: Replicating computation to avoid communication (gemm)

The best-practice algorithm is:

```
while ( t < t_end ) {
    1. send/receive particle positions to/from all other processes
    2. determine which non-bonded forces need to be computed
    3. compute the forces for particles assigned to this process
    4. receive atoms from other processes and compute interactions:
           compute force: F[i,j];
           compute force: F[j,i] = -F[i,j];
}
```

Stage 4 of the algorithm guarantees that \({\bf F}_{i,j}\) is not calculated twice: since \({\bf F}_{j,i}=-{\bf F}_{i,j}\), the reaction force is obtained by replicating the computation locally rather than communicating it to or from another MPI process. Because the computations are trivial, this algorithm is faster overall than the corresponding pattern version. When the computation is time-consuming, however, this algorithm may be inefficient, and sharing the workload among MPI processes is to be preferred.
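The replication idea in stage 4 can be sketched in plain C, with the MPI calls left out. This is only one possible reading of the algorithm, not the code in `md_mpi_comm_avoid.c`. The names `pair_force` and `local_forces` are hypothetical, the 1-D linear force law is a stand-in for the real non-bonded force, and each process is assumed to own a contiguous index range `[first, last)` and to already hold the positions of all atoms (stage 1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-D pair force (linear spring), a stand-in for the real
 * non-bonded force law. Returns the force on atom i due to atom j.
 * It is antisymmetric: pair_force(a, b) == -pair_force(b, a). */
static double pair_force(double xi, double xj)
{
    return xj - xi;
}

/* Accumulate the total force on the atoms owned by this process.
 *   x      : positions of ALL atoms (assumed already exchanged, stage 1)
 *   natoms : total number of atoms
 *   first, last : this process owns atoms in [first, last) (assumption)
 *   f      : output, f[i - first] is the total force on owned atom i
 * No force value is ever sent or received: every force involving a
 * remote atom is recomputed locally from the replicated positions. */
void local_forces(const double *x, size_t natoms,
                  size_t first, size_t last, double *f)
{
    assert(first <= last && last <= natoms);

    for (size_t i = first; i < last; ++i)
        f[i - first] = 0.0;

    for (size_t i = first; i < last; ++i) {
        /* Both atoms owned here: compute F[i,j] once, reuse it for j. */
        for (size_t j = i + 1; j < last; ++j) {
            double fij = pair_force(x[i], x[j]);
            f[i - first] += fij;   /* F[i,j]            */
            f[j - first] -= fij;   /* F[j,i] = -F[i,j]  */
        }
        /* Partner owned by another process: the owner of j computes its
         * own copy of this interaction, so nothing is communicated. */
        for (size_t j = 0; j < natoms; ++j) {
            if (j >= first && j < last)
                continue;          /* owned pairs were handled above */
            f[i - first] += pair_force(x[i], x[j]);
        }
    }
}
```

In this sketch each cross-process pair is evaluated twice in total (once by each owner), which is the price paid for avoiding a force message; that trade only pays off while `pair_force` stays cheap.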

**Code purpose:**

`md_mpi_comm_avoid.c` can be used to demonstrate the good MPI Transfer efficiency for a large number of particles when using a simple `MPI_SEND/MPI_RECV` strategy to split the force calculation among processes.

**How to use:**

The Makefile command `make` generates an executable file named `md_mpi_comm_avoid.exe` using the GNU compiler. To run the code, first choose the number of time steps and atoms to be used, then launch the application on a specific number of MPI processes. For example, replace `NUMSTEPS`, `NUMATOMS` and `NUMPROC` in the following command, where these are respectively the number of time steps, atoms, and MPI processes:

`mpirun -n <NUMPROC> ./md_mpi_comm_avoid.exe NUMSTEPS NUMATOMS`

Default values are assigned if these arguments are not provided, or are less than `0`.
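The fallback-to-default behaviour can be sketched as a small helper in C. This is an illustration only: the function name `arg_or_default` is hypothetical, the actual default values used by `md_mpi_comm_avoid.c` are not stated here, and treating non-numeric input as "use the default" is an extra assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Return argv[idx] parsed as a non-negative integer, or `fallback` when
 * the argument is missing, is not a clean decimal number (assumption),
 * or is less than 0. `fallback` itself must be non-negative. */
long arg_or_default(int argc, char **argv, int idx, long fallback)
{
    assert(fallback >= 0);

    if (idx >= argc)
        return fallback;               /* argument not provided */

    char *end = NULL;
    long value = strtol(argv[idx], &end, 10);

    if (end == argv[idx] || *end != '\0')
        return fallback;               /* not a number (assumed policy) */
    if (value < 0)
        return fallback;               /* less than 0 */

    return value;
}
```

A caller would typically use `arg_or_default(argc, argv, 1, ...)` for `NUMSTEPS` and `arg_or_default(argc, argv, 2, ...)` for `NUMATOMS`, passing whatever defaults the program defines.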

Screen output will be generated, similar to the following one:

```
> POP WP7 best-practice
> Version of code: best-practice version without performance bottleneck
> Implements Best-practice: Replicating computation to avoid communication
> Problem size: NUMSTEPS = 10 TOTATOMS = 2000
> Best-practice wall time (integration) = 10.30
```