# Replicating computation to avoid communication

Version name: Replicating computation to avoid communication; a version of the Communication computation trade-off program.
Implemented best practices: Replicating computation to avoid communication (gemm)

The best-practice algorithm is:

    while ( t < t_end ) {
        1. send/receive particle positions to/from all other processes
        2. determine which non-bonded forces need to be computed
        3. compute the force for particles assigned to this process
        4. receive atoms from other processes and compute interactions:
               compute force: F[i,j];
               compute force: F[j,i] = -F[i,j];
    }


Stage 4 of the algorithm guarantees that $${\bf F}_{i,j}$$ is not calculated twice, since $${\bf F}_{i,j}=-{\bf F}_{j,i}$$, and it does so by replicating computation rather than communicating the result from/to another MPI process. Because the computation is trivial, this algorithm is faster overall than the pattern version. When the computation is time consuming, this algorithm may be inefficient, and sharing the workload among MPI processes is preferable.
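The trade-off can be sketched without MPI by emulating the ranks in a serial loop. The following is an illustration under stated assumptions, not the code in md_mpi_comm_avoid.c: the 1D force law `pair_force`, the block distribution in `owned_range`, and the function names are all hypothetical. It represents one possible reading of the algorithm, in which each rank reuses the antisymmetry for pairs of atoms it owns and recomputes interactions with atoms owned elsewhere instead of exchanging force contributions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 1D pair force (softened repulsion), chosen only because it is
   exactly antisymmetric: pair_force(a, b) == -pair_force(b, a). */
static double pair_force(double xi, double xj)
{
    double d = xi - xj;
    double ad = d < 0.0 ? -d : d;
    return d / (ad * d * d + 1.0e-3);
}

/* Assumed block distribution: emulated rank r owns atoms lo..hi-1. */
static void owned_range(int n, int nranks, int r, int *lo, int *hi)
{
    *lo = (int)((long)n * r / nranks);
    *hi = (int)((long)n * (r + 1) / nranks);
}

/* Communicating variant: every pair is evaluated exactly once, so the
   contribution to an atom owned by another rank would have to be sent there.
   Returns the number of such cross-rank contributions. */
static long forces_communicate(const double *x, int n, int nranks, double *f)
{
    long remote = 0;
    for (int i = 0; i < n; ++i) f[i] = 0.0;
    for (int r = 0; r < nranks; ++r) {
        int lo, hi;
        owned_range(n, nranks, r, &lo, &hi);
        for (int i = lo; i < hi; ++i)
            for (int j = i + 1; j < n; ++j) {
                double fij = pair_force(x[i], x[j]);
                f[i] += fij;
                f[j] -= fij;            /* F[j,i] = -F[i,j] */
                if (j >= hi) ++remote;  /* atom j lives on another rank */
            }
    }
    return remote;
}

/* Replicating variant: each rank writes only the forces of its own atoms.
   Pairs inside the rank reuse the antisymmetry; pairs that cross ranks are
   recomputed by both owners. Returns the number of pair evaluations. */
static long forces_replicate(const double *x, int n, int nranks, double *f)
{
    long evals = 0;
    for (int i = 0; i < n; ++i) f[i] = 0.0;
    for (int r = 0; r < nranks; ++r) {
        int lo, hi;
        owned_range(n, nranks, r, &lo, &hi);
        for (int i = lo; i < hi; ++i)
            for (int j = 0; j < n; ++j) {
                int local = (j >= lo && j < hi);
                if (j == i || (local && j < i)) continue;
                double fij = pair_force(x[i], x[j]);
                f[i] += fij;
                if (local) f[j] -= fij; /* symmetry reused within the rank */
                ++evals;
            }
    }
    return evals;
}

/* Runs both variants on a deterministic test configuration and returns the
   largest absolute difference between the two force arrays (-1.0 if the
   allocation fails). */
double compare_variants(int n, int nranks, long *remote, long *evals)
{
    assert(n > 0 && nranks > 0 && nranks <= n);
    double *x = malloc(3 * (size_t)n * sizeof *x);
    if (x == NULL) return -1.0;
    double *fa = x + n, *fb = x + 2 * n;
    for (int i = 0; i < n; ++i) x[i] = 0.5 * i + 0.01 * (i % 7);
    *remote = forces_communicate(x, n, nranks, fa);
    *evals = forces_replicate(x, n, nranks, fb);
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double diff = fa[i] - fb[i];
        if (diff < 0.0) diff = -diff;
        if (diff > worst) worst = diff;
    }
    free(x);
    return worst;
}
```

For 64 atoms on 4 emulated ranks, the communicating variant evaluates 2016 pairs and would ship 1536 force contributions, while the replicating variant evaluates 3552 pairs and ships none. Whether the extra evaluations are cheaper than the messages depends on the cost of a single force evaluation, which is the trade-off described above.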

Code purpose:

md_mpi_comm_avoid.c can be used to demonstrate the good MPI Transfer efficiency for a large number of particles when using a simple MPI_SEND/MPI_RECV strategy to split the force calculation among processes.

How to use:

Running make with the provided Makefile builds an executable named md_mpi_comm_avoid.exe using the GNU compiler. To run the code, choose the number of time steps and atoms, then launch the application on the desired number of MPI processes. For example, replace NUMSTEPS, NUMATOMS and NUMPROC in the following command with the number of time steps, atoms, and MPI processes, respectively.

mpirun -n <NUMPROC> ./md_mpi_comm_avoid.exe NUMSTEPS NUMATOMS

Default values are used if these arguments are not provided or are less than 0.
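A minimal sketch of how such defaulting could be handled in C follows. The names `parse_md_args`, `arg_or_default`, `DEFAULT_NUMSTEPS` and `DEFAULT_NUMATOMS` are hypothetical, and the fallback values are placeholders borrowed from the sample output in this document, not the program's documented defaults. Treating malformed input as missing is also an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_NUMSTEPS 10    /* placeholder value, not a documented default */
#define DEFAULT_NUMATOMS 2000  /* placeholder value, not a documented default */

/* Returns argv[index] as an int, or fallback when the argument is missing,
   malformed, negative, or too large for an int. */
static int arg_or_default(int argc, char **argv, int index, int fallback)
{
    if (index >= argc) return fallback;
    char *end = NULL;
    long v = strtol(argv[index], &end, 10);
    if (end == argv[index] || *end != '\0' || v < 0 || v > INT_MAX)
        return fallback;
    return (int)v;
}

/* Expected command line: ./md_mpi_comm_avoid.exe NUMSTEPS NUMATOMS */
void parse_md_args(int argc, char **argv, int *numsteps, int *numatoms)
{
    assert(numsteps != NULL && numatoms != NULL);
    *numsteps = arg_or_default(argc, argv, 1, DEFAULT_NUMSTEPS);
    *numatoms = arg_or_default(argc, argv, 2, DEFAULT_NUMATOMS);
}
```

In an MPI program, every rank receives the same command line from the launcher in typical implementations, so each rank could call such a routine independently after `MPI_Init` without broadcasting the parsed values.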

The program prints output similar to the following:

>   POP WP7 best-practice
>   Version of code: best-practice version without performance bottleneck
>   Implements Best-practice: Replicating computation to avoid communication
>   Problem size: NUMSTEPS = 10 TOTATOMS = 2000
>   Best-practice wall time (integration) = 10.30