DENISE Black Edition subnormals kernel


This kernel is the hotspot extracted from DENISE Black Edition, a 2D time-domain isotropic (visco)elastic finite-difference (FD) modeling and full-waveform inversion (FWI) code. It is taken from the part of the code that calculates the propagation of SH waves. It showcases the significant slowdown that subnormal floating-point calculations cause when they are not handled properly.

The kernel can be compiled with or without flush-to-zero (FTZ) enabled to showcase potential performance differences due to subnormal handling. It can be run by using the provided Makefile as described below.
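
To make the term "subnormal" concrete, the following minimal sketch (not taken from the kernel) halves a single-precision value until it underflows to zero and counts how many intermediate values fall into the subnormal range. The function name count_subnormal_halvings is hypothetical, and the sketch assumes IEEE 754 single precision with the default rounding mode and FTZ disabled.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper, not part of the DENISE kernel.
 * Repeatedly halves `start` until it reaches zero and returns how many
 * of the visited values were subnormal (too small to be stored with a
 * normalized mantissa). */
int count_subnormal_halvings(float start)
{
    assert(start > 0.0f && isfinite(start));

    /* volatile forces every intermediate result to be stored as a float,
     * so the loop is really evaluated at run time in single precision. */
    volatile float x = start;
    int subnormals = 0;

    while (x != 0.0f) {
        if (fpclassify(x) == FP_SUBNORMAL)
            subnormals++;
        x = x * 0.5f;   /* halving always terminates: the smallest subnormal rounds to 0 */
    }
    return subnormals;
}
```

Under the stated assumptions, count_subnormal_halvings(1.0f) visits 23 subnormal values (2^-127 down to 2^-149) before reaching zero. In a build with FTZ enabled, the same loop would be expected to report 0, because results in the subnormal range are flushed to zero instead of being stored.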

Prerequisites

  • A C compiler that supports the -mdaz-ftz flag: either GCC (the default) or Clang.
  • perf for measuring the hardware performance counters
  • Optionally, an Intel CPU for measuring floating-point assists (this measurement is not available on other CPUs)

Compiling the kernel

To compile the kernel, use the provided Makefile, which compiles with GCC by default. The Makefile includes the following compile targets:

  • kernel: Compiles the kernel with default settings (FTZ disabled).
  • kernel_ftz: Compiles the kernel with flush-to-zero (FTZ) mode enabled via the -mdaz-ftz flag.
  • kernel_nooutput: As kernel, but without verbose output during execution.
  • kernel_ftz_nooutput: As kernel_ftz, but without verbose output during execution.
  • all: Compiles all versions of the kernel.

Running the kernel

The following run targets are available in the Makefile:

  • run: Runs the kernel compiled with FTZ disabled and prints, for each iteration, the execution time and the number of subnormals occurring in update_s_elastic_PML_SH (for reference).
  • run_ftz: Runs the kernel with FTZ enabled.
  • perf_ipc: Runs the kernel without FTZ and measures instructions per cycle (IPC) using perf.
  • perf_ipc_ftz: Runs the kernel with FTZ enabled and measures IPC using perf.
  • perf_assists (only available for Intel CPUs): Runs the kernel without FTZ and measures floating-point assists using perf.
  • perf_assists_ftz (only available for Intel CPUs): Runs the kernel with FTZ enabled and measures floating-point assists using perf.
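
The per-iteration subnormal count reported by the run target can be pictured with a sketch like the one below. It is an illustration only, not the kernel's actual counting code: the function name count_subnormals is hypothetical, and it assumes the wavefield is stored as a flat array of single-precision values.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of the DENISE kernel.
 * Returns how many entries of `field` are subnormal, i.e. nonzero
 * values whose magnitude is below FLT_MIN. */
size_t count_subnormals(const float *field, size_t n)
{
    assert(field != NULL || n == 0);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (fpclassify(field[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

Zeros and normal values such as FLT_MIN are not counted; only values strictly between zero and FLT_MIN in magnitude are. A count that stays high across iterations would be consistent with a low IPC in perf_ipc and, on Intel CPUs, a large number of floating-point assists in perf_assists.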