For loops auto-vectorization

Program's name: For loops auto-vectorization
Programming language(s): C

For loops auto-vectorization covers the essentials of using vector instructions efficiently to compute a given data-parallel workload. In this context, the compiler can report what prevents a loop from being vectorized and give hints on how to modify the code so that it vectorizes fully.

To this end, we provide two example kernels, vector addition and matrix multiplication, that highlight possible problems when trying to use vector instructions efficiently. The synthetic application is structured to show both the pattern of poorly vectorized code (for-loops-poor-auto-vec) and possible best practices to overcome this issue (for-loops-full-auto-vec).

The vector addition kernel is simply a for loop that iterates over the arrays and adds them element-wise:

void vadd(double *c, double *a, double *b, int n)
{
	for(int i=0; i<n; i++) c[i]=a[i]+b[i];
}
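
One obstacle a compiler may report for this kernel is possible aliasing: from the signature alone it cannot prove that c does not overlap a or b. The sketch below shows one common remedy, the C99 restrict qualifier. It is an assumption for illustration, not necessarily what for-loops-full-auto-vec does, and the name vadd_restrict is hypothetical.

```c
/* Hypothetical variant of vadd (not taken from the original sources).
 * The C99 `restrict` qualifier promises the compiler that c, a and b
 * do not overlap, so it does not have to assume that a store to c[i]
 * may change a later a[i] or b[i]. Calling this function with
 * overlapping arrays is undefined behaviour. */
void vadd_restrict(double *restrict c, const double *restrict a,
                   const double *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

A call such as vadd_restrict(c, a, b, n) is only valid when the three arrays are distinct; whether the loop is then vectorized still depends on the compiler and its flags.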

The matrix multiplication kernel is implemented as a set of nested for loops that iterate over the rows and columns of the matrices to perform the matrix multiplication:

void matmul(const double* A, const double* B, double* C, const int L, const int M, const int N)
{
	for(int i=0;i<L;i++)
		for(int j=0;j<N;j++)
			for(int k=0;k<M;k++)
				C[i*N+j]+=A[i*M+k]*B[k*N+j];
}
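
In the loop order above, the innermost k loop reads B with stride N and accumulates into a single element of C, which tends to be harder to vectorize efficiently. A minimal sketch of one common alternative, loop interchange to i-k-j order, is shown below. It is an illustration under stated assumptions, not necessarily the approach taken in for-loops-full-auto-vec; the name matmul_ikj and the use of restrict are hypothetical additions.

```c
/* Hypothetical i-k-j variant of matmul (loop interchange).
 * In the innermost j loop, C[i*N+j] and B[k*N+j] are accessed with
 * unit stride and A[i*M+k] is loop-invariant, so consecutive
 * iterations touch consecutive memory locations.
 * Like the original kernel, it accumulates into C, so the caller must
 * initialise C (for example to zero) beforehand.
 * `restrict` asserts that A, B and C do not overlap. */
void matmul_ikj(const double *restrict A, const double *restrict B,
                double *restrict C, const int L, const int M, const int N)
{
    for (int i = 0; i < L; i++)
        for (int k = 0; k < M; k++) {
            const double a_ik = A[i * M + k]; /* invariant in the j loop */
            for (int j = 0; j < N; j++)
                C[i * N + j] += a_ik * B[k * N + j];
        }
}
```

Each element of C still receives its contributions in increasing k order, so the interchange reorders memory accesses without changing the order of the additions for any single element.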

The compiler can generate a vectorization report if you provide additional compilation flags. For instance, for the Intel compiler we can add:

  • -qopt-report-phase=vec;
  • -qopt-report=5.
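
As an illustration of what such a report can help distinguish, the sketch below contrasts two hypothetical loops: one with independent iterations and one with a loop-carried dependence. The function names, the file name loops.c and the compile line in the comment are assumptions for this example; the exact wording of the report depends on the compiler version.

```c
/* Hypothetical examples, not part of the original sources.
 * Assumed compile line for the Intel compiler (file name is illustrative):
 *   icc -O2 -qopt-report-phase=vec -qopt-report=5 -c loops.c
 */

/* Independent iterations: each y[i] depends only on x[i], which makes
 * this loop a typical candidate for vectorization. */
void scale(double *restrict y, const double *restrict x, double s, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = s * x[i];
}

/* Loop-carried dependence: iteration i reads x[i-1], which was written
 * by iteration i-1, so the iterations cannot simply be executed side by
 * side in vector lanes. */
void prefix_sum(double *x, int n)
{
    for (int i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

Comparing the report entries for the two loops is one way to see which loops the compiler vectorized and which dependence, if any, it considered an obstacle.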

With this information, we can modify the code so that the compiler vectorizes it more effectively. For further details, please check the available versions.

How to build

For loops auto-vectorization has two versions: for-loops-poor-vec (pattern) and for-loops-full-auto-vec (best-practice). Both can be built using the Makefile provided with the source code.

A quick way to compile is: make all.

For a detailed description of how to compile this application, please follow the specific description for each version.

How to execute

In your console, type the name of the executable you want to run, e.g. ./pattern.all.

For a detailed description of how to run this application, please follow the specific description for each version.