Sparse Matrix Vector Multiplication (SPMxV)

Program's name: Sparse Matrix Vector Multiplication (SPMxV)
Available version(s): Sparse Matrix Vector Multiplication ·
Programming language(s): C++ ·
Programming model(s): OpenMP ·

The SPMXV kernel implements the multiplication \(y \leftarrow Ax\) of a sparse matrix \(A\) and a vector \(x\). The memory bandwidth is the limiting factor in the SpMXV kernel, as by using a sparse matrix data structure the memory accesses become more random. The sparse matrices used in this kernel are stored in the CSR (Compressed Sparse Row) format.

To compute the resulting value for matrix row \(r\), the corresponding matrix values have to be loaded. CSR is efficient in the aspect that for the sparse matrix-vector multiplication, the matrix values are stored consecutively in memory, i.e., the \(n\) values of an arbitrary row \(r\) are stored consecutively as \((a(i), \dots , a(i + n − 1))\). Hence, reading from the matrix has high spatial locality, but again every element is read only once. The elements of the right-hand side vector \(x\) also have to be loaded, but here the memory access pattern depends on the matrix structure. For every matrix element \(a(i)\), the column is stored in a separate vector named col at the same position \(i\), again resulting in consecutive access to col. However, access to \(x\) is random, as the position \(i\) of \(x(i)\) is determined by the value of col\((i)\). Consequently, the reuse of the elements in the vector col from the cache is only successful for blocks/bands in the matrix, resulting in a low spatial and temporal locality of this part of the operation. Finally, SPMXV is by far the most time-consuming operation in a GMRES (or similar iterative) method. The main parameters for the SPMXV kernel are the size of the matrix and the vector as well as the structure of the used matrix. While the size increases the outer loop, the number of non-zero elements in a row determines the iteration count of the inner loop. Depending on the structure of the matrix this may vary and therefore result in a load imbalance.