LLM-Attention (CUDA version)

Version name: LLM-Attention (CUDA version), a version of the LLM-Attention kernel program.
Repository: [home] and version downloads: [.zip] [.tar.gz] [.tar.bz2] [.tar]
Based on version(s): LLM-Attention (serial version) ·

This version is a CUDA port of the baseline serial version. It implements in-source CUDA versions of the MatMulTiled, transpose and softmax services. The code also keeps the serial implementations for comparison purposes.

Pre-requisites

  • The NVIDIA CUDA compiler (nvcc).
  • (optional) The Extrae library, to generate Paraver traces

Several configure files are included in the source code distribution.

Building the kernel

To build the program, run make:

$> make [ENVIRONMENT]

Where the ENVIRONMENT options can be:

  • VERSION={cuda}
  • WITH_EXTRAE={true false}

The generated binary will be suffixed with the options provided above.
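For example, assuming the suffixing rule above, the following invocations would build the plain CUDA binary and an Extrae-instrumented one (binary names are illustrative):

```shell
# Build the CUDA version
make VERSION=cuda

# Build the CUDA version with Extrae instrumentation for Paraver traces
make VERSION=cuda WITH_EXTRAE=true
```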

Executing the kernel

To run the program, pass the size of the context and the number of dimensions on the command line. For instance:

$ ./attention-cuda <context_size> <dim>
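A concrete invocation might look as follows; the values are illustrative, not recommended defaults:

```shell
# Run with a context of 1024 tokens and 64 dimensions (example values)
$ ./attention-cuda 1024 64
```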