...

-extend_source    Allow source lines beyond the 72-column limit of fixed-form Fortran 77
-free / -fixed    Free/Fixed form for Fortran
-ip               Enables interprocedural optimization for single-file compilation
-ipo              Enables interprocedural optimization between files (whole-program optimization)
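
As a rough illustration of the difference between -ip and -ipo, the sketch below assumes a hypothetical two-file C program (main.c and util.c, names that are not taken from this guide). -ipo is passed at both the compile and the link step, because the cross-file optimization is carried out when the objects are linked.

```shell
# Assumes 'module load intel' has already been run.
# main.c and util.c are hypothetical file names used only for illustration.
IPO_FLAGS="-O3 -ipo"

if command -v icc >/dev/null 2>&1; then
    # Compile step: the object files carry the information needed for IPO.
    icc $IPO_FLAGS -c main.c util.c
    # Link step: the optimization across the two files happens here.
    icc $IPO_FLAGS -o executable main.o util.o
else
    echo "icc not in PATH: run 'module load intel' first" >&2
fi
```

With -ip in place of -ipo, each file would be optimized on its own, with no inlining across the two files.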

 

Compiling for KNL and SKL

Since KNL and SKL nodes are binary compatible with the legacy x86 instruction set, any code compiled for the regular Marconi A1 BDW nodes will run on these nodes. However, a specific compiler option is needed to generate AVX-512 instructions and obtain better performance from these nodes.

Version 15.0 and newer of the Intel compilers can generate these instructions. For KNL nodes, specify either the "-xMIC-AVX512" flag (which generates KNL-specific AVX-512 instructions, hence the binary will not work on the Broadwell partition) or the "-axMIC-AVX512" flag (which adds an AVX-512 code path for KNL alongside a baseline code path, so the same executable also runs on the Broadwell partition):

module load intel
icc -axMIC-AVX512 -O3 -o executable source.c
icpc -axMIC-AVX512 -O3 -o executable source.cc
ifort -axMIC-AVX512 -O3 -o executable source.f

For SKL nodes, instead, you have to specify the "-xCORE-AVX512" flag. When using this option, the Intel compilers default to AVX512 "low", i.e., a 256-bit version of AVX512 through AVX512-VL (see also the compiler documentation for -qopt-zmm-usage=low).

This means that by default the compiler generates instructions that operate on only 256 bits of the 512-bit registers, but still benefit from features such as masking and the doubled register set size. The most obvious benefit of this approach is that the frequency drop is no larger than that of AVX2 code.
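
By analogy with the KNL commands, a minimal sketch of an SKL-only build could look as follows. The file names source.c, source.cc and source.f are the same placeholders used for KNL, and no -qopt-zmm-usage value is given, so the compiler default applies.

```shell
# Assumes 'module load intel' has already been run.
# Binaries built this way require AVX-512 and will not run on the Broadwell partition.
SKL_FLAGS="-xCORE-AVX512 -O3"

if command -v icc >/dev/null 2>&1; then
    icc   $SKL_FLAGS -o executable source.c
    icpc  $SKL_FLAGS -o executable source.cc
    ifort $SKL_FLAGS -o executable source.f
else
    echo "Intel compilers not in PATH: run 'module load intel' first" >&2
fi
```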

If your code vectorizes well, you can experiment with "-xCORE-AVX512 -qopt-zmm-usage=high" to make the compiler generate AVX-512 code that uses the full 512-bit vectors. Doing so causes the CPU clock frequency to drop rather significantly, which in turn often makes the code run slower than, or only as fast as, an AVX2 build, for instance.
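
Since the outcome depends on the code, one way to decide is to build both variants and time them on an SKL node. The sketch below is only an illustration: the output names exe_zmm_low and exe_zmm_high are hypothetical, and source.c stands for your own code, run with the same input in both cases.

```shell
# Assumes 'module load intel' has already been run.
# exe_zmm_low / exe_zmm_high are illustrative names, not part of this guide.
BASE_FLAGS="-xCORE-AVX512 -O3"

if command -v icc >/dev/null 2>&1; then
    # 256-bit vectors (the default behaviour described above)
    icc $BASE_FLAGS -qopt-zmm-usage=low  -o exe_zmm_low  source.c
    # full 512-bit vectors
    icc $BASE_FLAGS -qopt-zmm-usage=high -o exe_zmm_high source.c
    # Compare wall-clock times of the two builds on an SKL node.
    time ./exe_zmm_low
    time ./exe_zmm_high
else
    echo "icc not in PATH: run 'module load intel' first" >&2
fi
```

If the "high" build is not clearly faster, keeping the default setting is the simpler choice.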

There are certain considerations to be taken into account before running legacy codes on KNL and SKL nodes. Primarily, the effective use of vector instructions is critical to achieve good performance on these cores. For guidelines on how to get vectorization information and improve code vectorization, refer to

...