

hostname: login.leonardo.cineca.it

early availability:      March, 2023

start of pre-production: May, 2023 (Booster)
                         last quarter of 2023 (Data Centric)


This system is the new pre-exascale Tier-0 EuroHPC supercomputer hosted by CINECA and currently being installed in the Bologna Technopole, Italy. It is supplied by ATOS and based on BullSequana XH2135 supercomputer nodes, each with four NVIDIA Tensor Core GPUs and a single Intel CPU. It also uses NVIDIA Mellanox HDR 200 Gb/s InfiniBand connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput, providing the highest AI and HPC application performance and scalability.

System Architecture

Architecture: Atos BullSequana XH21355 "Da Vinci" blade (Booster); Atos BullSequana X2610 compute blade (Data Centric, available in the last quarter of 2023)
Internal Network: Nvidia Mellanox HDR DragonFly+ 200 Gb/s
Storage: 106 PB (raw) large capacity storage, 620 GB/s
         5.4 PB high performance storage, 1.4 TB/s, based on 31 x DDN Exascaler ES400NVX2

Login nodes: 1 during the beta production phase (16 later): login14, accessible via IP 131.175.43.130 (Ice Lake, no GPU)




                     Booster                                        Data Centric

Model                Atos BullSequana XH21355 "Da Vinci" blade      Atos BullSequana X2610 compute blade

Racks                150

Nodes                3456                                           1536

Processors           32 cores Intel Ice Lake                        56 cores Intel Sapphire Rapids
                     Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz

Accelerators         4 x NVIDIA Ampere GPUs/node, 64GB HBM2         -

Cores                32 cores/node                                  56 cores/node

RAM                  512 (8x64) GB DDR4 3200 MHz                    (16 x 32) GB DDR5 4800 MHz

Peak Performance     about 309 Pflop/s                              9 Pflop/s

Internal Network     NVIDIA Mellanox HDR DragonFly+ 200 Gb/s
                     2 x NVIDIA HDR 2×100 Gb/s cards                1 x NVIDIA HDR100 100 Gb/s card

Disk Space           106 PB large capacity storage
                     5.4 PB high performance storage







The following guide already refers to the production configuration. The pre-production phase will begin in the next few days, during which access via 2FA will be mandatory. Please refer to the Access section below in the Leonardo User Guide.

Peak performance details

Node Performance                         Theoretical Peak Performance
CPU (nominal/peak freq.)                 1680 Gflops
GPU                                      75000 Gflops
Total                                    76680 Gflops
Memory Bandwidth (nominal/peak freq.)    24.4 GB/s

Access

All the login nodes have an identical environment and can be reached with SSH (Secure Shell) protocol using the "collective" hostname:

> login.leonardo.cineca.it

Access to Leonardo requires two-factor authentication (2FA). Please refer to this link of the User Guide to activate 2FA and connect with it. For information about data transfer from other computers, please follow the instructions and caveats in the dedicated section Data storage or in the document Data Management.
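A minimal connection example (with <username> as a placeholder for your actual account, not a real value):

   $ ssh <username>@login.leonardo.cineca.it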

Software environment

The available software environment is based on spack and modules, and needs to be activated. Some vendor installations are also available and presented in an lmod environment on the login node, but we warmly encourage the beta testers to use the spack environment in order to provide valuable feedback on the software stack provided by CINECA.

The temporary stack is available with:

   $ source /home/cinprod/spack_setup.sh
   $ module use /home/cinprod/spack/02/modules/BA/0.19/
   $ module load spack

Please note that we are still optimizing the Leonardo software stack, and more installations may be added or replaced. Always check with "module av" (the hash in the module name can change).
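As a sketch of the typical workflow after the activation above (the module name below is purely illustrative, since the actual names and hashes may change):

   $ module av                  # list the modules currently available in the stack
   $ module load <module-name>  # load one of the modules shown by "module av"
   $ module list                # verify what is loaded in the current session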

Beta-production environment

The production environment is based on the slurm scheduler, already in place on the cluster but in a very preliminary configuration.

  • The only available partition is "prod" (#SBATCH --partition=prod). Please refer to the general online guide to slurm and to task/thread bindings, and please pay attention to the setting of SRUN_CPUS_PER_TASK for hybrid applications dispatched with "srun". In this preliminary configuration, please explicitly request the correct pmix plugin when launching your parallel applications with "srun": srun --mpi=pmix_v3 <options> <exe>. No MPI settings are needed if you launch with "mpirun".
  • The GPUs are not yet defined as G(eneral)res(ources) (Gres), and all the 4 GPUs of a node will be available in a job. Do not ask for gres=gpu:X (or the analogous --gpus-per-node) in your script. Take the node in exclusive mode with the #SBATCH --exclusive directive.
  • The #SBATCH --exclusive directive is also recommended to avoid annoying drawbacks on the $TMPDIR of the job (see the example job script after this list).
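A minimal job script consistent with the indications above might look like the following sketch (account name, walltime, task/thread layout and executable are placeholders, not prescriptions):

   #!/bin/bash
   #SBATCH --partition=prod          # only partition available in this phase
   #SBATCH --exclusive               # whole node: the 4 GPUs are not defined as Gres yet
   #SBATCH --nodes=1
   #SBATCH --ntasks-per-node=4       # e.g. one MPI task per GPU (placeholder layout)
   #SBATCH --cpus-per-task=8         # e.g. 8 cores per task on the 32-core node
   #SBATCH --time=01:00:00           # placeholder walltime
   #SBATCH --account=<your_account>  # placeholder account name

   export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK   # needed for hybrid runs with srun
   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

   srun --mpi=pmix_v3 ./my_application               # pmix plugin must be requested with srun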

Pre-production environment

Storage:

  • the scratch areas are now available ($CINECA_SCRATCH or $SCRATCH)
  • home filesystem:
    • BETA-0 users: your old home is not accessible anymore (but your data are still there). We have already started syncing the contents of these homes to the corresponding user scratch areas. Due to the huge amount of data, the sync process is taking a very long time. We will inform you when the copy is finished (a stop will be required for the last rsync). After you have checked that everything is fine, we will proceed with removing the data from your old homes. You can however resume your activity now on your scratch area (if what has been synced so far is sufficient for you to submit new jobs).

    • BETA-1 users: you already started with the home at the correct path, /leonardo/home/userexternal/<username>. The 100 GB quota is enforced as well. Please copy the contents not supposed to be in your $HOME to your $SCRATCH (we will NOT move the contents of the new homes), and remove the transferred data from $HOME (see the sketch after this list). You can resume your activity now on your scratch area: just copy there the needed input files and scripts.
  • work filesystem: the $WORK areas are not available yet. Until they are configured and put in place, the automatic cleaning of the scratch area will NOT be active.
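A possible way to move data from $HOME to $SCRATCH (the directory name is purely illustrative):

   $ rsync -av $HOME/my_data/ $SCRATCH/my_data/   # copy the data to the scratch area
   $ diff -r $HOME/my_data $SCRATCH/my_data       # check that the copy is complete
   $ rm -r $HOME/my_data                          # then free the space in $HOME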

Slurm:

  • use the "boost_usr_prod" partition in the place of "prod". You can use at most 32 nodes on this partition (MaxTime=24:00:00). Please request the boost_qos_bprod QOS to go up to 512 nodes (MaxTime=10:00:00) This limit will be in place until May 25, when it will be reduced to 256 nodes with MaxTime=24:00:00 (production environment) before May 25.
  • you have to request the gpus with --gres=gpu:X or --gpus-per-node=X
  • the --mpi=pmix_v3 is not required anymore when launching with srun
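As a sketch, a pre-production job script following the points above could look like this (account, node count, walltime and executable are placeholders; the QOS line is only needed above 32 nodes):

   #!/bin/bash
   #SBATCH --partition=boost_usr_prod
   #SBATCH --qos=boost_qos_bprod     # only if you need more than 32 nodes
   #SBATCH --nodes=64                # placeholder; up to 512 nodes with boost_qos_bprod
   #SBATCH --ntasks-per-node=4
   #SBATCH --cpus-per-task=8
   #SBATCH --gres=gpu:4              # GPUs must now be requested explicitly
   #SBATCH --time=04:00:00           # placeholder walltime (MaxTime=10:00:00 with the QOS)
   #SBATCH --account=<your_account>  # placeholder account name

   export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

   srun ./my_application             # --mpi=pmix_v3 is no longer required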