This version is a CUDA port of the baseline serial version. It implements
in-source versions of the MatMulTiled, transpose and softmax services.
The code also keeps the serial implementations for comparison
purposes.
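As an illustration of how a tiled matrix multiplication can be checked against a serial version, here is a minimal sketch. It is not the code shipped in this distribution: the names `matmul_tiled_sketch`, `matmul_serial_sketch`, `matmul_gpu_sketch` and `max_abs_diff`, the tile width of 16, and the row-major float layout are all assumptions made for the example. CUDA error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int TILE = 16;  // assumed tile width, not taken from the source

// C = A * B with A (n x k), B (k x m), C (n x m), all row-major (assumed layout).
__global__ void matmul_tiled_sketch(const float* A, const float* B, float* C,
                                    int n, int k, int m) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk over the shared dimension one tile at a time.
    for (int t = 0; t < (k + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as zero so they do not affect the sum.
        As[threadIdx.y][threadIdx.x] =
            (row < n && a_col < k) ? A[row * k + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (b_row < k && col < m) ? B[b_row * m + col] : 0.0f;
        __syncthreads();  // tile fully loaded before anyone reads it

        for (int i = 0; i < TILE; ++i)
            acc += As[threadIdx.y][i] * Bs[i][threadIdx.x];
        __syncthreads();  // everyone done reading before the next load
    }
    if (row < n && col < m) C[row * m + col] = acc;
}

// Serial reference kept alongside the kernel for comparison.
void matmul_serial_sketch(const std::vector<float>& A, const std::vector<float>& B,
                          std::vector<float>& C, int n, int k, int m) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) acc += A[i * k + p] * B[p * m + j];
            C[i * m + j] = acc;
        }
}

// Host wrapper: copies inputs to the device, launches the kernel, copies C back.
std::vector<float> matmul_gpu_sketch(const std::vector<float>& A,
                                     const std::vector<float>& B,
                                     int n, int k, int m) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, sizeof(float) * n * k);
    cudaMalloc(&dB, sizeof(float) * k * m);
    cudaMalloc(&dC, sizeof(float) * n * m);
    cudaMemcpy(dA, A.data(), sizeof(float) * n * k, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), sizeof(float) * k * m, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((m + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matmul_tiled_sketch<<<grid, block>>>(dA, dB, dC, n, k, m);

    std::vector<float> C(static_cast<size_t>(n) * m);
    // cudaMemcpy waits for the kernel on the default stream to finish.
    cudaMemcpy(C.data(), dC, sizeof(float) * n * m, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return C;
}

// Largest element-wise difference between two equally sized results.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    float worst = 0.0f;
    for (size_t i = 0; i < x.size(); ++i)
        worst = std::fmax(worst, std::fabs(x[i] - y[i]));
    return worst;
}
```

A comparison would then compute both results for the same inputs and check that `max_abs_diff` stays below a small tolerance. Exact equality should not be expected, since the GPU and the serial loop may round floating-point sums differently.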
Several configure files are included in the source code distribution.
To build the program, run make:
$ make [ENVIRONMENT]
Where the ENVIRONMENT options can be:
| WITH_EXTRAE={true | false} |
The generated binary will be suffixed with the options provided above.
To run the program, pass the context size and the number of dimensions on the command line. For instance:
$ ./attention-cuda <context_size> <dim>
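The two arguments could be validated on the host side along the following lines. This is only a sketch: the helper names `parse_positive` and `parse_args_sketch` are hypothetical, and the assumption that both values must be strictly positive integers is not stated in the source.

```cuda
#include <cerrno>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: parses one strictly positive base-10 integer.
static bool parse_positive(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    // Reject overflow, empty input, trailing characters and non-positive values.
    if (errno != 0 || end == text || *end != '\0' || value <= 0) return false;
    out = value;
    return true;
}

// Hypothetical helper: expects argv = { program, <context_size>, <dim> }.
// Prints a usage line to stderr and returns false on any invalid input.
bool parse_args_sketch(int argc, char** argv, long& context_size, long& dim) {
    if (argc != 3 || !parse_positive(argv[1], context_size) ||
        !parse_positive(argv[2], dim)) {
        std::fprintf(stderr, "usage: %s <context_size> <dim>\n",
                     argc > 0 ? argv[0] : "attention-cuda");
        return false;
    }
    return true;
}
```

With this sketch, an invocation with the illustrative values 1024 and 64 would set `context_size` to 1024 and `dim` to 64, while a malformed value such as `12x` would be rejected with the usage message.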