# CPU to GPU

Program's name: CPU to GPU

Available version(s):
Programming language(s): C++

Programming model(s): OpenCL

This kernel solves the 3D diffusion equation. There are currently three implementations: *cpu_diffusion*, which uses a single CPU core; *cpu_openmp_diffusion*, which uses multiple CPU cores via OpenMP; and *opencl_diffusion*, where the iterations are computed on the GPU while the CPU launches kernels and manages the data transfer between MPI ranks.

First, the initial state is set and stored in field u. The diffusion is then computed for the given number of iterations. The MPI ranks are connected in the x-direction, with rank 0 located at lower x-coordinates than rank 1. The cells at xmax of rank 0 are used as ghost cells for rank 1, while the cells at xmin of rank 1 are used as ghost cells for rank 0.
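
The ghost-cell mapping described above can be illustrated with a minimal single-process sketch. It is not the kernel's actual code: the `Subdomain` struct, the helpers `cell_index`, `pack_x_plane` and `exchange_ghost_planes`, and the memory layout (x as the fastest-varying index) are assumptions made for illustration, and a plain copy stands in for the MPI transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical storage: one n*n*n field per rank plus one received ghost plane.
struct Subdomain {
    std::vector<double> u;      // local cells, x is the fastest-varying index (assumed)
    std::vector<double> ghost;  // n*n values received from the neighbouring rank
};

// Flatten (x, y, z) into a 1D index under the assumed layout.
inline std::size_t cell_index(std::size_t x, std::size_t y, std::size_t z, std::size_t n) {
    return (z * n + y) * n + x;
}

// Copy the y-z plane at a fixed x into a contiguous buffer, as one would before sending it.
std::vector<double> pack_x_plane(const std::vector<double>& u, std::size_t n, std::size_t x) {
    assert(u.size() == n * n * n && x < n);
    std::vector<double> plane(n * n);
    for (std::size_t z = 0; z < n; ++z)
        for (std::size_t y = 0; y < n; ++y)
            plane[z * n + y] = u[cell_index(x, y, z, n)];
    return plane;
}

// Single-process stand-in for the exchange between rank 0 and rank 1.
void exchange_ghost_planes(Subdomain& rank0, Subdomain& rank1, std::size_t n) {
    assert(n > 0);
    rank1.ghost = pack_x_plane(rank0.u, n, n - 1);  // xmax of rank 0 -> ghost cells of rank 1
    rank0.ghost = pack_x_plane(rank1.u, n, 0);      // xmin of rank 1 -> ghost cells of rank 0
}
```

With MPI, each packed plane would typically be handed to a non-blocking send such as `MPI_Isend` and received with `MPI_Irecv` on the other rank instead of being copied directly.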

Each iteration starts with the exchange of ghost cells via non-blocking MPI. After the field holding the Laplacian is initialized, the Laplacian of field u is computed. Finally, u is updated using the equation `u(t+dt) = u(t) + dt * Laplace(u(t))`.
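
As a rough illustration of the three steps within one iteration (initialize the Laplacian field, compute it, update u), here is a minimal serial sketch. It assumes a 7-point stencil with unit grid spacing and leaves boundary cells unchanged; the kernel's actual discretization and boundary handling may differ, and the names `flat_index` and `diffusion_step` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flatten (x, y, z) into a 1D index for an n*n*n field.
inline std::size_t flat_index(std::size_t x, std::size_t y, std::size_t z, std::size_t n) {
    return (z * n + y) * n + x;
}

// One explicit step: u(t+dt) = u(t) + dt * Laplace(u(t)).
// Assumes unit grid spacing and a 7-point stencil; boundary cells are not updated.
void diffusion_step(std::vector<double>& u, std::vector<double>& lap,
                    std::size_t n, double dt) {
    assert(u.size() == n * n * n && lap.size() == u.size());

    // Step 1: initialize the field holding the Laplacian.
    std::fill(lap.begin(), lap.end(), 0.0);

    // Step 2: compute the Laplacian of u for all interior cells.
    for (std::size_t z = 1; z + 1 < n; ++z)
        for (std::size_t y = 1; y + 1 < n; ++y)
            for (std::size_t x = 1; x + 1 < n; ++x)
                lap[flat_index(x, y, z, n)] =
                    u[flat_index(x - 1, y, z, n)] + u[flat_index(x + 1, y, z, n)] +
                    u[flat_index(x, y - 1, z, n)] + u[flat_index(x, y + 1, z, n)] +
                    u[flat_index(x, y, z - 1, n)] + u[flat_index(x, y, z + 1, n)] -
                    6.0 * u[flat_index(x, y, z, n)];

    // Step 3: update u in place.
    for (std::size_t i = 0; i < u.size(); ++i)
        u[i] += dt * lap[i];
}
```

Keeping the Laplacian in a separate field means every cell is updated from values of the same time level, which is why the computation and the update are two distinct passes.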

# Getting started

## Prerequisites

To build and run this kernel you will need:

- MPI library
- C++ compiler
- GPU with OpenCL runtime (opencl_diffusion only)

## Building and running the kernel

The kernel can be compiled with the provided Makefile, where `<target>` is one of `cpu_diffusion`, `cpu_openmp_diffusion`, `opencl_diffusion` or `clean`:

```
make <target>
```

The resulting executables are launched with mpirun; the number of ranks must be two. The program requires two arguments:

- n: number of elements in each direction, i.e. `n*n*n` elements will be used
- cycles: number of iterations to perform

To run the kernel, execute

```
mpirun -np 2 ./cpu_diffusion <n> <cycles>
```

For example: `mpirun -np 2 ./cpu_diffusion 40 1000`

At the end of the computation, each rank reports the rate at which work items were processed. A higher value indicates better performance.
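
A rate of this kind could be computed as in the sketch below. It assumes that one work item corresponds to one cell update in one iteration, so that a run processes `n*n*n*cycles` work items; the kernel's exact definition may differ. The names `elapsed_seconds` and `work_items_per_second` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Measure the wall-clock time of a callable in seconds.
template <typename F>
double elapsed_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Assumed definition: one work item = one cell update in one iteration.
double work_items_per_second(std::size_t n, std::size_t cycles, double seconds) {
    assert(seconds > 0.0);
    const double work_items =
        static_cast<double>(n) * static_cast<double>(n) *
        static_cast<double>(n) * static_cast<double>(cycles);
    return work_items / seconds;
}
```

Under this assumption, a run with n = 40 and 1000 cycles that takes 2 seconds would report 32,000,000 work items per second.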