In August 2017 CINECA added the third partition of Marconi: 1512 nodes, each housing 2 Intel Xeon 8160 (SkyLake, or SKL) processors. The partition will be extended by the end of 2017 with about 800 additional similar nodes, completing the Marconi project and making available a system of about 20 PFlops. This document is a getting-started guide to using the SKL nodes. For further help please email

System Configuration

Each SKL node consists of:

  • 2 x 24-core Intel Xeon 8160 CPUs with a base clock speed of 2.10 GHz (32 double-precision FLOPs/cycle), providing a peak performance of about 3.23 TFLOPs per node
  • 192 GB of DDR4 RAM

  • 100 Gb/s Intel OmniPath 2:1 between SKL nodes
  • (240 GB SATA local disk)

The processors support vector instruction sets such as SSE and AVX, up to AVX-512.

The two sockets of a node are connected via UPI links running at 10.4 GT/s. There are up to 3 links within a single node, bringing the overall theoretical figure to 62.4 GT/s.

The three partitions (A1 - BDW, A2 - KNL, A3 - SKL) share the same login nodes and storage (based on GPFS). Each partition is served by its own PBS server, which must be selected in order to address the required resource. Activity on Marconi-A1 was stopped on September 26th, 2018.

Who can use the SKL partition

The Marconi-A3 partition, based on SKL chips, is expected to be completed before the end of 2017 and will consist of more than 30 racks and about 2,300 nodes. The first step is the availability of the first 21 racks (1512 nodes) from August 7th, 2017. This resource is for the exclusive use of EUROfusion users. The second part (11 racks, 792 nodes) will be opened to the wider scientific community.

Login nodes and storage

The SKL partition shares its environment (login nodes and storage) with the existing A2-KNL partition. There are three login nodes, assigned in round-robin fashion, that can be collectively reached with a single name


There are several storage areas on the system: $HOME and $CINECA_SCRATCH, defined on a per-user basis; $WORK, defined on a per-project basis; and a long-term archive (DRES). For more information, please refer to our UserGuide (UG2.4: Data storage and FileSystems).

Compiling Code for SKL

Since SKL is binary compatible with the legacy x86 instruction set, any code compiled for the normal A1/A2 Marconi nodes will run on these nodes. However, specific compiler options are needed to generate code optimized for the architecture and its supported features, such as the AVX-512 instruction sets, and to obtain better performance from these nodes.

Intel Compilers

The Intel compiler (version 17.0 and newer) is the best possible tool for optimizing codes on Marconi. The basic modules of the Intel suite should be loaded using the "module" commands:

module load intel      # compilers
module load intelmpi   # Intel MPI software stack
module load mkl        # math kernel libraries

The Intel compilers can generate instructions optimized for SKL if you specify CFLAGS="-xCORE-AVX512":

module load intel

icc -xCORE-AVX512 -O3 -o executable source.c
icpc -xCORE-AVX512 -O3 -o executable source.cpp
ifort -xCORE-AVX512 -O3 -o executable source.f

GNU compilers

The GCC provided by the system is version 4.8.5. For better support of new hardware features we recommend using the latest version, which can be loaded via the module command. Currently the latest version available is GCC 6.1.0:

    module load gnu

The corresponding flag for optimizing on SKL is CFLAGS="-march=skylake-avx512":

module load gnu

gcc -march=skylake-avx512 -O3 -o executable source.c
g++ -march=skylake-avx512 -O3 -o executable source.cpp
gfortran -march=skylake-avx512 -O3 -o executable source.f

Production Environment

Since MARCONI is a general-purpose system used by several users at the same time, long production jobs must be submitted through a queuing system. Please refer to the dedicated section of the MARCONI User Guide.

Optimizing Code for SKL - Vectorization 

There are certain considerations to be taken into account before running legacy codes on SKL nodes. Primarily, the effective use of vector instructions is critical to achieving good performance on SKL cores. For guidelines on how to obtain vectorization information and improve code vectorization, refer to

 How to Improve Code Vectorization
