This code times parallel matrix multiplication. All processes have access to the data in \(\bf {A}\) and \(\bf {B}\), and the parallelisation is achieved by splitting \(\bf {B}\) over its columns, i.e.:
\[\bf {C}_b = \bf {A} \times \bf {B}_b\]
where \(\bf {A}\) is a matrix with \(m\) rows and \(k\) columns (i.e. \(m \times k\)) and \(\bf {B}\) is a \(k \times n\) matrix, so that \(\bf {C}\) is an \(m \times n\) matrix. \(\bf {B}_b\) and \(\bf {C}_b\) are the matrix blocks obtained by splitting \(\bf {B}\) and \(\bf {C}\) over their columns. For \(p\) processes, the blocks \(\bf {B}_b\) are \(k \times w\), where \(w=n/p\), and the blocks \(\bf {C}_b\) are \(m \times w\). For simplicity, we assume that the number of columns splits evenly over the processes, i.e. that \(n\) is divisible by \(p\).
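The column splitting can be illustrated with a small serial sketch. This is not the application's Fortran/MPI code: it is a plain Python illustration in which the helper names `matmul`, `column_block` and `blocked_matmul` are hypothetical, and the loop over `rank` stands in for the \(p\) processes working independently.

```python
def matmul(a, b):
    """Naive product of two matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][l] * b[l][j] for l in range(inner)) for j in range(cols)]
            for i in range(rows)]


def column_block(b, rank, p):
    """Return the k x w block B_b that process `rank` (0-based) of `p` would own."""
    n = len(b[0])
    if n % p != 0:
        raise ValueError("the number of columns n must be divisible by p")
    w = n // p
    return [row[rank * w:(rank + 1) * w] for row in b]


def blocked_matmul(a, b, p):
    """Compute each C_b = A x B_b in turn, then join the blocks side by side."""
    blocks = [matmul(a, column_block(b, rank, p)) for rank in range(p)]
    # Each block is m x w; concatenating row i of every block gives row i of C.
    return [sum((blk[i] for blk in blocks), []) for i in range(len(a))]


A = [[1, 2, 3],
     [4, 5, 6]]                 # m = 2, k = 3
B = [[1, 0, 2, 1],
     [0, 1, 1, 0],
     [3, 1, 0, 2]]              # k = 3, n = 4
C = blocked_matmul(A, B, p=2)   # two blocks of width w = 2
print(C == matmul(A, B))        # the joined blocks match the full product
```

Because every block \(\bf {C}_b\) depends only on \(\bf {A}\) and its own \(\bf {B}_b\), the blocks can be computed without any communication between processes, which is why each process needs access to all of \(\bf {A}\).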
How to compile and run:
For each version, a makefile is provided to build the executable. The makefile uses the Intel Fortran compiler by default. To use a different version of BLAS (e.g. OpenBLAS), of PAPI, or a different compiler (e.g. the GNU compiler), the makefile must be edited.
To run a version of the application, the user must provide a list of input values; the number of values depends on the version. The number of MPI processes must also be given when launching the application:
mpirun -np <n_procs> <executable_name> <list_of_parameters>
For both versions, an input value less than 1 is replaced by a default value. If any input value is missing, the application terminates with an error message explaining how to run it. On successful completion, the application prints a summary of useful data to the screen.
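The input handling described above can be sketched as follows. This is only an illustration in Python, not the application's own code: the parameter names `m`, `k`, `n`, the default values, and the function `parse_parameters` are all assumptions made for the example, since the actual parameter list depends on the version being run.

```python
# Hypothetical parameter names and defaults, chosen only for illustration.
HYPOTHETICAL_DEFAULTS = {"m": 1000, "k": 1000, "n": 1000}


def parse_parameters(args, defaults=HYPOTHETICAL_DEFAULTS):
    """Map a list of command-line strings onto named integer parameters."""
    names = list(defaults)
    if len(args) < len(names):
        # A missing value is an error: report how the program should be run.
        usage = "usage: mpirun -np <n_procs> <executable_name> " + " ".join(
            f"<{name}>" for name in names)
        raise ValueError(usage)
    values = {}
    for name, text in zip(names, args):
        value = int(text)
        # Any value below 1 falls back to the default for that parameter.
        values[name] = value if value >= 1 else defaults[name]
    return values


print(parse_parameters(["200", "0", "-5"]))
```

In this sketch the error is raised as an exception carrying the usage text; a real program would print the message and exit instead.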
Related reports: BAND