This kernel code implements the solution of the 3D diffusion equation. There are currently three different implementations: cpu_diffusion, which uses a single CPU core; cpu_openmp_diffusion, which uses multiple CPU cores via OpenMP; and opencl_diffusion, where the iterations are computed on the GPU while the CPU launches the kernels and manages the data transfer between MPI ranks.
First, the initial state is set and stored in field u. The diffusion is then computed for the given number of iterations. The MPI ranks are connected along the x-direction, with rank 0 located at lower x-coordinates than rank 1. The cells at xmax of rank 0 are used as ghost cells for rank 1, while the cells at xmin of rank 1 are used as ghost cells for rank 0.
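The ghost-cell arrangement described above can be sketched in C as follows. This is a minimal illustration, not the kernel's actual code: it assumes that each rank stores nx interior x-planes plus one ghost plane on each side (x indices 0 and nx+1), that x is the slowest-varying index so every x-plane is contiguous, and it replaces the MPI messages with an in-process memcpy between two hypothetical LocalField structs. The names LocalField, idx, plane and exchange_ghosts are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: nx interior x-planes plus one ghost plane on each
   side (x index 0 and nx + 1). x is the slowest index, so each x-plane of
   ny * nz cells is contiguous in memory. */
typedef struct {
    int nx, ny, nz;
    double *u; /* (nx + 2) * ny * nz values */
} LocalField;

static size_t idx(const LocalField *f, int i, int j, int k) {
    return ((size_t)i * (size_t)f->ny + (size_t)j) * (size_t)f->nz + (size_t)k;
}

static double *plane(LocalField *f, int i) {
    return f->u + idx(f, i, 0, 0);
}

/* In-process stand-in for the ghost exchange between rank 0 (lower x) and
   rank 1 (higher x). With MPI, each copy would instead be a non-blocking
   send/receive pair completed before the Laplacian is computed. */
static void exchange_ghosts(LocalField *r0, LocalField *r1) {
    assert(r0->ny == r1->ny && r0->nz == r1->nz);
    size_t n = (size_t)r0->ny * (size_t)r0->nz;
    /* Cells at xmax of rank 0 -> low ghost plane of rank 1. */
    memcpy(plane(r1, 0), plane(r0, r0->nx), n * sizeof(double));
    /* Cells at xmin of rank 1 -> high ghost plane of rank 0. */
    memcpy(plane(r0, r0->nx + 1), plane(r1, 1), n * sizeof(double));
}
```

Because x is the outermost index in this assumed layout, each exchanged plane is a single contiguous buffer, so no packing step is needed before sending it. The outer boundaries (xmin of rank 0, xmax of rank 1) are deliberately left untouched here, since the boundary condition is not specified.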
Each iteration starts with the exchange of ghost cells via non-blocking MPI. After the field holding the Laplacian is initialized, the Laplacian of u is computed. Finally, u is updated with the explicit step u(t+dt) = u(t) + dt*Laplace(u(t)).
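The three steps of one iteration after the ghost exchange (initialize the Laplacian, compute it, update u) can be sketched in C as below. This sketch assumes a 7-point finite-difference stencil with unit grid spacing, the same hypothetical (nx+2) x ny x nz layout with ghost planes at x indices 0 and nx+1, and that cells on the outer y/z faces are simply not updated. The kernel's actual stencil, spacing and boundary handling may differ. The macro IDX and the function diffusion_step are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Index into a local block of (nx + 2) x ny x nz cells; x-planes 0 and
   nx + 1 are ghost planes (hypothetical layout). */
#define IDX(i, j, k, ny, nz) \
    ((((size_t)(i)) * (size_t)(ny) + (size_t)(j)) * (size_t)(nz) + (size_t)(k))

/* One step of u(t+dt) = u(t) + dt * Laplace(u(t)), assuming a 7-point
   stencil with unit grid spacing. Cells on the outer y/z faces and the
   ghost planes keep lap = 0 and are therefore left unchanged. */
static void diffusion_step(double *u, double *lap, int nx, int ny, int nz,
                           double dt) {
    assert(u != NULL && lap != NULL);
    size_t total = (size_t)(nx + 2) * (size_t)ny * (size_t)nz;

    /* 1. Initialize the field holding the Laplacian. */
    for (size_t n = 0; n < total; ++n) lap[n] = 0.0;

    /* 2. Laplacian of u on interior cells; the x ghost planes provide the
          neighbours at i = 0 and i = nx + 1. */
    for (int i = 1; i <= nx; ++i)
        for (int j = 1; j < ny - 1; ++j)
            for (int k = 1; k < nz - 1; ++k) {
                lap[IDX(i, j, k, ny, nz)] =
                    u[IDX(i - 1, j, k, ny, nz)] + u[IDX(i + 1, j, k, ny, nz)] +
                    u[IDX(i, j - 1, k, ny, nz)] + u[IDX(i, j + 1, k, ny, nz)] +
                    u[IDX(i, j, k - 1, ny, nz)] + u[IDX(i, j, k + 1, ny, nz)] -
                    6.0 * u[IDX(i, j, k, ny, nz)];
            }

    /* 3. Update u in place; safe because lap is fully computed first. */
    for (size_t n = 0; n < total; ++n) u[n] += dt * lap[n];
}
```

Keeping the Laplacian in a separate field is what allows the in-place update of u: every stencil read in step 2 sees only values from time t. In an OpenMP variant such as cpu_openmp_diffusion, the outer i-loop is a natural candidate for parallelization, although how the actual kernel does this is not stated here.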
To build and run this kernel you will need a
The kernel can be compiled with the provided Makefile via
make cpu_diffusion|cpu_openmp_diffusion|opencl_diffusion|clean
The resulting executables must be launched with mpirun using exactly two ranks. The program requires two arguments:
To run the kernel, execute
mpirun -np 2 ./cpu_diffusion <n> <cycles>
For example: mpirun -np 2 ./cpu_diffusion 40 1000
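A defensive way to read the two command-line arguments is sketched below. It assumes, for illustration only, that <n> is the grid size and <cycles> is the number of diffusion iterations, and that both must be positive integers. The helper names parse_positive and parse_args are hypothetical and are not taken from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a strictly positive int; returns 0 on success, -1 otherwise. */
static int parse_positive(const char *s, int *out) {
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0 || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

/* Hypothetical parser for "<n> <cycles>": n is assumed to be the grid size,
   cycles the number of iterations. Prints a usage line on failure. */
static int parse_args(int argc, char **argv, int *n, int *cycles) {
    assert(n != NULL && cycles != NULL);
    if (argc != 3 || parse_positive(argv[1], n) != 0 ||
        parse_positive(argv[2], cycles) != 0) {
        fprintf(stderr, "usage: %s <n> <cycles>\n",
                argc > 0 ? argv[0] : "diffusion");
        return -1;
    }
    return 0;
}
```

Using strtol with an end-pointer check rejects inputs such as "40x" that atoi would silently accept as 40. The separate requirement of exactly two ranks would be checked against MPI_Comm_size after MPI_Init, which this sketch does not cover.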
At the end of the computation, each rank reports the rate at which work items were processed. A higher value indicates better performance.
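How such a rate could be derived is sketched below. The definition of a work item is an assumption made purely for illustration: here one work item is one cell update, so a rank owning n^3 cells that runs `cycles` iterations processes n^3 * cycles items. The kernel may count work items differently. The function work_rate is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rate: one work item = one cell update per iteration, so a
   rank owning n^3 cells processes n^3 * cycles items in `seconds`. */
static double work_rate(int64_t n, int64_t cycles, double seconds) {
    assert(n > 0 && cycles > 0 && seconds > 0.0);
    double items = (double)(n * n * n) * (double)cycles;
    return items / seconds;
}
```

In an MPI program, the elapsed time would typically be taken as the difference of two MPI_Wtime() calls placed around the iteration loop, so that setup costs are excluded from the reported rate.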