# CPU to GPU

Programming language(s): C++
Programming model(s): OpenCL

This kernel code implements the solution of the 3D diffusion equation. There are currently three different implementations: cpu_diffusion, which uses a single CPU core; cpu_openmp_diffusion, which uses multiple CPU cores via OpenMP; and opencl_diffusion, where the iterations are computed on the GPU while the CPU launches kernels and manages the data transfer between MPI ranks.

First of all, the initial state is set and stored in field u. The diffusion is computed for the given number of iterations. The MPI ranks are connected in x-direction where rank 0 is located at lower x-coordinates than rank 1. The cells at xmax of rank 0 are used as ghost cells for rank 1, while the cells at xmin of rank 1 are used as ghost cells for rank 0.

Each iteration starts with the exchange of ghost cells via non-blocking MPI. After the field holding the Laplacian is initialized, the operator is computed for field u. Finally, u is updated using the equation u(t+dt) = u(t) + dt * Laplace(u(t)).

# Getting started

## Prerequisites

To build and run this kernel you will need:

• MPI library
• C++ compiler
• GPU with OpenCL runtime (opencl_diffusion only)

## Building and running the kernel

The kernel can be compiled with the provided Makefile via

```shell
make cpu_diffusion|cpu_openmp_diffusion|opencl_diffusion|clean
```


The resulting executables can be executed using mpirun; the number of ranks must be two. The program requires two arguments:

1. n: the number of elements in each direction, i.e. n×n×n elements will be used
2. the number of iterations to perform

To run the kernel execute

```shell
mpirun -np 2 ./cpu_diffusion <n> <cycles>
```

For example: `mpirun -np 2 ./cpu_diffusion 40 1000`

At the end of the computation, each rank reports the rate at which work items were processed. A higher value indicates better performance.