Alya assembly (multidependencies)

Version name: Alya assembly (multidependencies), a version of the Alya assembly program.
Implemented best practices: Multidependencies in indirect reductions on large data structures

Alya is a simulation code for high performance computational mechanics. Alya solves coupled multiphysics problems using high performance computing techniques for distributed and shared memory supercomputers, together with vectorization and optimization at the node level.

This kernel corresponds to the algebraic system assembly of a finite element (FE) code for solving partial differential equations (PDEs). The matrix assembly consists of a loop over the elements that computes the element matrices and right-hand sides and assembles them into the local system.
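The assembly loop described above can be sketched as follows. This is a minimal illustration in C, not the Alya source (which is Fortran): it assumes 1D two-node elements of length h, a dense row-major global matrix, and hypothetical names (element_compute, assemble, lnods, amatr, rhsid) chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NNODE 2 /* nodes per element (1D linear elements, for illustration) */

/* Element matrix and right-hand side of a 1D Laplacian element of length h
 * with a unit source term. */
static void element_compute(double h, double elmat[NNODE][NNODE],
                            double elrhs[NNODE])
{
    elmat[0][0] =  1.0 / h;  elmat[0][1] = -1.0 / h;
    elmat[1][0] = -1.0 / h;  elmat[1][1] =  1.0 / h;
    elrhs[0] = 0.5 * h;
    elrhs[1] = 0.5 * h;
}

/* Loop over the elements and scatter-add each element contribution into the
 * global matrix amatr (npoin x npoin, row-major) and the vector rhsid.
 * lnods holds NNODE global node indices per element, stored contiguously. */
void assemble(size_t nelem, size_t npoin, const size_t *lnods, double h,
              double *amatr, double *rhsid)
{
    for (size_t ielem = 0; ielem < nelem; ++ielem) {
        double elmat[NNODE][NNODE], elrhs[NNODE];
        element_compute(h, elmat, elrhs);

        for (size_t a = 0; a < NNODE; ++a) {
            size_t i = lnods[ielem * NNODE + a];
            assert(i < npoin);
            rhsid[i] += elrhs[a]; /* indirect update through lnods */
            for (size_t b = 0; b < NNODE; ++b) {
                size_t j = lnods[ielem * NNODE + b];
                amatr[i * npoin + j] += elmat[a][b];
            }
        }
    }
}
```

Because neighboring elements share nodes, two elements can update the same entry of amatr or rhsid. This indirect reduction is what needs protection once the element loop runs in parallel.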

This version of the kernel uses the multidependencies feature of the OmpSs programming model to avoid atomic constructs.

The source code for this version and the master one is the same; a preprocessor directive (ALYA_OMPSS, defined in this case) selects which of the two versions is compiled.
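As a rough illustration of how a single source can carry both variants, the C sketch below switches on ALYA_OMPSS between an atomic-protected element loop and one task per element chunk. It is not the Alya code (which is Fortran). The subdomain_t layout, the function names, and the decision to list each chunk in its own neighbor list are assumptions made for this example. The commutative clause with a multidependence iterator follows our reading of the OmpSs notation and should be checked against the OmpSs user guide before use.

```c
#include <assert.h>
#include <stddef.h>

#define NNODE     2 /* nodes per element (illustrative) */
#define MAX_NEIGH 4 /* capacity of the neighbor list (illustrative) */

/* Hypothetical chunk of elements plus the chunks it shares nodes with. */
typedef struct {
    size_t first, last;      /* element range [first, last) */
    size_t nneigh;           /* used entries in neigh[], itself included */
    size_t neigh[MAX_NEIGH]; /* indices into the subdomain array */
} subdomain_t;

/* Add the right-hand side contribution of one element to the global vector. */
static void scatter_add(size_t ielem, size_t npoin, const size_t *lnods,
                        const double *elrhs, double *rhsid)
{
    for (size_t a = 0; a < NNODE; ++a) {
        size_t ipoin = lnods[ielem * NNODE + a];
        assert(ipoin < npoin);
#ifndef ALYA_OMPSS
        /* Master variant: elements run concurrently, so guard the update. */
        #pragma omp atomic
#endif
        rhsid[ipoin] += elrhs[ielem * NNODE + a];
    }
}

void assemble_rhs(size_t nelem, size_t npoin, size_t nsubd, subdomain_t *subd,
                  const size_t *lnods, const double *elrhs, double *rhsid)
{
#ifdef ALYA_OMPSS
    (void)nelem;
    for (size_t is = 0; is < nsubd; ++is) {
        subdomain_t *s = &subd[is];
        /* ASSUMED OmpSs syntax: one commutative dependence per neighbor, so
         * chunks that share nodes never run at the same time and no atomic
         * is needed inside the task. */
        #pragma omp task commutative({subd[s->neigh[j]], j=0;s->nneigh}) firstprivate(s)
        for (size_t ielem = s->first; ielem < s->last; ++ielem)
            scatter_add(ielem, npoin, lnods, elrhs, rhsid);
    }
    #pragma omp taskwait
#else
    (void)nsubd;
    (void)subd;
    #pragma omp parallel for
    for (size_t ielem = 0; ielem < nelem; ++ielem)
        scatter_add(ielem, npoin, lnods, elrhs, rhsid);
#endif
}
```

Compiled without ALYA_OMPSS, and even without OpenMP support, the sketch falls back to a plain element loop and produces the same result. The point of the task variant is that the number of neighbors differs per chunk and is only known at run time, which a fixed list of dependences cannot express.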

BUILDING INSTRUCTIONS

This version of the kernel requires OmpSs and gfortran to be built. If OmpSs is not available on your system, a complete installation guide can be found at:

https://pm.bsc.es/ftp/ompss/doc/user-guide/installation.html

To build the kernel, run:

#> ./compile_alya_ompss.sh

EXECUTION INSTRUCTIONS

#> export OMP_NUM_THREADS=16   # or any other thread count
#> ./miniapp_ALYA_OMPSS.x --implicit ./tests/cavtet04_600_MM1_ALYA_OMPSS_16.bin

The “--implicit” execution flag tells the program to run an implicit assembly (as opposed to an explicit assembly). Only implicit assembly is considered for both versions of the program (master and multidependencies).

The directory “tests/” contains the input files needed to run the program. They are in binary format, so they cannot be modified.

If everything goes well, the output should look similar to this:

 start read
 miniapp_read
 miniapp_read: element integration
 miniapp_read: mesh data
 miniapp_read: parallel data
 miniapp_read: ompss data
 end   read
 ----------------------------------
 nelem=                     318320
 npoin=                      57604

 VECTOR_SIZE=                   16
 par_omp_nelem_chunk=          600
 num_subd_par=                 530
 ----------------------------------
 Using OpenMP
 NUM_THREADS=           16
 MAX_THREADS=           16
 Number of subdomains=         530
 Chunk size (can be changed)=         600
 Max neighbors=          22
 IMPLICIT METHOD -------------->
 time=  0.13966536056250334
 time=  0.13289238139986992
 time=  0.13748298678547144
 time=  0.19586112350225449
 time=  0.14218118786811829
 time=  0.15607176814228296
 time=  0.13923503085970879
 time=  0.16393845435231924
 time=  0.16770993731915951
 time=  0.16430170554667711
 time=  0.16796042304486036
 loop finished correctly, time=  0.15676349988207222

The following experiments have been registered:

Comparison of the original code using atomics and the multidependencies version