Alya is a simulation code for high-performance computational mechanics. Alya solves coupled multiphysics problems using high-performance computing techniques for distributed- and shared-memory supercomputers, together with vectorization and optimization at the node level.
This kernel corresponds to the algebraic system assembly of a finite element (FE) code for solving partial differential equations (PDEs). The assembly consists of a loop over the elements that computes each element matrix and right-hand side and assembles them into the local system.
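The element loop can be sketched in Python on a deliberately small problem: a 1D Poisson equation -u'' = f discretized with linear two-node elements. The problem, the function names (element_system, assemble) and the dense global storage are illustrative assumptions; the actual kernel is written in Fortran, works on 3D meshes and uses its own data structures.

```python
def element_system(x_a, x_b, f=1.0):
    """Element matrix and right-hand side for -u'' = f on one linear element."""
    h = x_b - x_a
    ke = [[1.0 / h, -1.0 / h],
          [-1.0 / h, 1.0 / h]]
    fe = [f * h / 2.0, f * h / 2.0]
    return ke, fe


def assemble(coords, connectivity, f=1.0):
    """Loop over elements, compute element systems, scatter-add them globally."""
    npoin = len(coords)
    A = [[0.0] * npoin for _ in range(npoin)]  # dense only for readability
    b = [0.0] * npoin
    for nodes in connectivity:                 # loop over the elements
        ke, fe = element_system(coords[nodes[0]], coords[nodes[1]], f)
        for i, gi in enumerate(nodes):         # local -> global node number
            b[gi] += fe[i]
            for j, gj in enumerate(nodes):
                # Shared nodes receive contributions from several elements.
                A[gi][gj] += ke[i][j]
    return A, b


# Three unit-length elements on four nodes.
coords = [0.0, 1.0, 2.0, 3.0]
connectivity = [(0, 1), (1, 2), (2, 3)]
A, b = assemble(coords, connectivity)
```

Node 1 is shared by elements 0 and 1, so A[1][1] receives two contributions and ends up equal to 2.0. If two threads assembled those elements at the same time, these are exactly the updates that would race.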
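Elements that share nodes add contributions to the same matrix and right-hand-side entries, so the element loop cannot be parallelized naively. The master version protects these updates with atomic constructs; this version avoids them by using the multidependencies feature of the OmpSs programming model.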
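One way to picture the conflict that the multidependencies resolve: group consecutive elements into fixed-size chunks (subdomains), and call two subdomains neighbours when they share at least one node. Non-neighbouring subdomains can be assembled concurrently without atomics, while neighbouring ones must not run at the same time. The OmpSs version presumably expresses this by giving each subdomain task a dependency on each of its neighbours (a multidependency), so that the runtime never runs two neighbours simultaneously. The Python sketch below only builds the neighbour lists. The chunking rule (floor division, with the last chunk absorbing the remainder) and all names are hypothetical; the rule was chosen because, with the sample output further down, 318320 elements in chunks of 600 give 318320 // 600 = 530 subdomains.

```python
from collections import defaultdict


def make_subdomains(nelem, chunk):
    # Hypothetical partitioning rule: consecutive chunks of `chunk` elements,
    # with the last chunk absorbing the remainder (nelem // chunk subdomains).
    nsubd = max(1, nelem // chunk)
    bounds = [(s * chunk, (s + 1) * chunk) for s in range(nsubd)]
    bounds[-1] = (bounds[-1][0], nelem)  # half-open range [first, last)
    return bounds


def subdomain_neighbours(connectivity, bounds):
    # Map every node to the set of subdomains whose elements touch it.
    owners = defaultdict(set)
    for s, (first, last) in enumerate(bounds):
        for e in range(first, last):
            for node in connectivity[e]:
                owners[node].add(s)
    # Two subdomains are neighbours if at least one node is touched by both.
    neigh = [set() for _ in bounds]
    for subds in owners.values():
        for s in subds:
            neigh[s] |= subds - {s}
    return neigh


# Toy example: a chain of 6 two-node elements split into chunks of 2.
connectivity = [(i, i + 1) for i in range(6)]
bounds = make_subdomains(len(connectivity), 2)
neigh = subdomain_neighbours(connectivity, bounds)
max_neighbours = max(len(n) for n in neigh)
```

Here subdomains 0 and 2 share no node and may be assembled simultaneously, while subdomain 1 conflicts with both. The "Max neighbors" line in the sample output is presumably the analogous quantity for the real mesh.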
This version and the master version are built from the same source code: a preprocessor macro (ALYA_OMPSS, defined in this case) selects which of the two versions is compiled.
Building this version of the kernel requires OmpSs and gfortran. If OmpSs is not available on your system, a complete installation guide is available at:
https://pm.bsc.es/ftp/ompss/doc/user-guide/installation.html
To build and run the kernel, type:
#> ./compile_alya_ompss.sh
#> export OMP_NUM_THREADS=16    # or any other number of threads
#> ./miniapp_ALYA_OMPSS.x --implicit ./tests/cavtet04_600_MM1_ALYA_OMPSS_16.bin
The “--implicit” flag tells the program to run an implicit assembly (as opposed to an explicit one). Only implicit assembly is considered in both versions of the program (master and multidependencies).
The directory “tests/” contains the input files needed to run the program. They are in binary format, so they cannot be modified.
If everything goes well, you should see output similar to the following (the timings will vary from run to run):
start read
miniapp_read
miniapp_read: element integration
miniapp_read: mesh data
miniapp_read: parallel data
miniapp_read: ompss data
end read
----------------------------------
nelem= 318320
npoin= 57604
VECTOR_SIZE= 16
par_omp_nelem_chunk= 600
num_subd_par= 530
----------------------------------
Using OpenMP
NUM_THREADS= 16
MAX_THREADS= 16
Number of subdomains= 530
Chunk size (can be changed)= 600
Max neighbors= 22
IMPLICIT METHOD -------------->
time= 0.13966536056250334
time= 0.13289238139986992
time= 0.13748298678547144
time= 0.19586112350225449
time= 0.14218118786811829
time= 0.15607176814228296
time= 0.13923503085970879
time= 0.16393845435231924
time= 0.16770993731915951
time= 0.16430170554667711
time= 0.16796042304486036
loop finished correctly, time= 0.15676349988207222
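In the sample run above, the final "loop finished correctly" time equals the mean of iterations 2 to 11, so the first iteration appears to be excluded, presumably as a warm-up. The Python sketch below checks this on a saved log; the regular expressions and the function names (parse_times, mean_without_warmup) are assumptions based only on the output format shown here.

```python
import re

# Per-iteration lines look like "time= <seconds>".
TIME_RE = re.compile(r"^time=\s*([0-9.eE+-]+)\s*$")
# The summary line looks like "loop finished correctly, time= <seconds>".
FINAL_RE = re.compile(r"loop finished correctly, time=\s*([0-9.eE+-]+)")


def parse_times(log_text):
    """Return (list of per-iteration times, final reported time or None)."""
    times, final = [], None
    for line in log_text.splitlines():
        line = line.strip()
        m = TIME_RE.match(line)
        if m:
            times.append(float(m.group(1)))
            continue
        m = FINAL_RE.search(line)
        if m:
            final = float(m.group(1))
    return times, final


def mean_without_warmup(times, warmup=1):
    """Average the iteration times, skipping the first `warmup` iterations."""
    kept = times[warmup:]
    return sum(kept) / len(kept)


sample_log = """\
IMPLICIT METHOD -------------->
time= 0.13966536056250334
time= 0.13289238139986992
time= 0.13748298678547144
time= 0.19586112350225449
time= 0.14218118786811829
time= 0.15607176814228296
time= 0.13923503085970879
time= 0.16393845435231924
time= 0.16770993731915951
time= 0.16430170554667711
time= 0.16796042304486036
loop finished correctly, time= 0.15676349988207222
"""
times, final = parse_times(sample_log)
print(len(times), mean_without_warmup(times), final)
```

Averaging all eleven iterations instead gives about 0.1552 s, which does not match the reported value, so skipping the first iteration matters when comparing your own runs against this sample.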