
...

Access to Leonardo requires two-factor authentication (2FA). Please refer to this link of the User Guide to activate 2FA and connect with it. For information about data transfer from other computers, please follow the instructions and caveats in the dedicated section Data storage or in the document Data Management.

Accounting

Accounting is not yet active in this pre-production phase; it will be available soon.

For accounting information please consult our dedicated section.

The account_no (or project) is important for batch executions. To have a job charged to a project, indicate its account_no to the scheduler with the "-A" flag:

#SBATCH -A <account_no>

With the "saldo -b" command you can list all the account_no associated with your username.

> saldo -b   (reports projects defined on LEONARDO)

Please note that accounting is expressed in consumed core hours, but it also depends strongly on the requested memory and number of GPUs. Please refer to the dedicated section.

Budget Linearization policy

On LEONARDO, as on the other HPC clusters in Cineca, a linearization policy for the usage of project budgets has been defined and implemented. The goal is to improve the response time, giving users the opportunity to use the CPU hours assigned to their project in proportion to its actual size (total amount of core-hours).

Disks and Filesystems

The storage organization conforms to the CINECA infrastructure (see Section Data Storage and Filesystems).

In addition to the home directory $HOME, each user has a scratch area $CINECA_SCRATCH, a large disk for the storage of run time data and files.

A $WORK area is defined for each active project on the system, reserved for all the collaborators of the project. In this pre-production phase the $WORK area is not yet available. Until the $WORK areas are configured and in place, the automatic cleaning of the scratch area will NOT be active.

	Total size	Quota	Notes
$HOME	0.46 PiB	70 GB per user	permanent, backed up, user specific, local
$CINECA_SCRATCH	41.4 PiB	no quota	/leonardo_scratch/fast (confined to the flash OSTs), large (confined to the HDD OSTs); temporary, user specific, local; no backup; automatic cleaning of data older than 40 days (the time interval can be reduced if the usage ratio of the area becomes critical, in which case users will be notified via HPC-News)
$WORK	not yet available (10 PB)		permanent, project specific, local; no backup; extensions can be considered if needed (mailto: superc@cineca.it)

work filesystem: the $WORK areas are not available yet. Until they are configured and in place, the automatic cleaning of the scratch area will NOT be active.

A temporary storage area local to the compute nodes is also available. It is created when the job starts and is accessible via the environment variable $TMPDIR. For more details please see the dedicated section of UG2.5: Data storage and FileSystems. On LEONARDO the $TMPDIR local area has 1 TB of available space.
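As an illustration of how a job might use the node-local area, the sketch below stages an input file into $TMPDIR, works there, and copies the result back before the job ends. The function name run_in_tmpdir and the file names input.dat and output.dat are hypothetical, the "computation" is a placeholder, and the fallback to mktemp is only an assumption that lets the sketch run outside a job.

```shell
# Sketch only: use the node-local $TMPDIR as a working area inside a job.
# Hypothetical names: run_in_tmpdir, input.dat, output.dat.
run_in_tmpdir() {
    src_dir="$1"    # directory holding input.dat (e.g. under $CINECA_SCRATCH)
    dest_dir="$2"   # directory that receives output.dat

    # $TMPDIR is defined when the job starts; fall back to a fresh
    # temporary directory so the sketch also runs outside a job.
    work_dir="${TMPDIR:-$(mktemp -d)}"

    cp "$src_dir/input.dat" "$work_dir/" || return 1

    # Placeholder for the real computation: uppercase the input.
    tr '[:lower:]' '[:upper:]' < "$work_dir/input.dat" > "$work_dir/output.dat"

    # Assumption: node-local data does not outlive the job, so the
    # results are copied back to a persistent area before exiting.
    mkdir -p "$dest_dir" && cp "$work_dir/output.dat" "$dest_dir/"
}
```

Inside a batch script this could be called as run_in_tmpdir "$CINECA_SCRATCH/case1" "$CINECA_SCRATCH/case1/results", where case1 is again a hypothetical directory.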

$DRES environment variable points to the shared repository where Data RESources are maintained. This is a data archive area available only on-request, shared with all CINECA HPC systems and among different projects. $DRES is not mounted on the compute nodes of the production partitions and can be accessed only from login nodes and from the nodes of the serial partition. This means that you cannot access it within a standard batch job: all data needed during the batch execution has to be moved to $CINECA_SCRATCH before the run starts, either from the login nodes or via a job submitted to the serial partition.
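The staging step described above could be sketched as a small job for the serial partition. The partition name lrd_all_serial comes from the partition table of this guide; the function stage_in, the dataset directory my_dataset, the destination inputs, and the one-hour time limit are hypothetical choices, not a prescribed layout.

```shell
#!/bin/bash
#SBATCH -A <account_no>
#SBATCH -p lrd_all_serial
#SBATCH --time 01:00:00
#SBATCH --job-name=stage_in

# Sketch only: copy a dataset from $DRES to $CINECA_SCRATCH before the
# production run. "my_dataset" and "inputs" are hypothetical names.
stage_in() {
    src="$1"    # directory to copy (under $DRES)
    dest="$2"   # destination directory (under $CINECA_SCRATCH)
    mkdir -p "$dest" && cp -r "$src" "$dest/"
}

# Run the copy only where both areas are defined, i.e. on the login
# nodes or on the nodes of the serial partition.
if [ -n "${DRES:-}" ] && [ -n "${CINECA_SCRATCH:-}" ]; then
    stage_in "$DRES/my_dataset" "$CINECA_SCRATCH/inputs"
fi
```

The production job can then read its input from $CINECA_SCRATCH/inputs/my_dataset without needing access to $DRES.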

...

Software environment

Module environment

The software modules are collected in different profiles and organized by functional categories (compilers, libraries, tools, applications,...). The profiles are of two types: “programming” type (base and advanced) for compilation, debugging and profiling activities, and “domain” type (chem-phys, lifesc,..) for the production activity. They can be loaded together.

"Base" profile is the default. It is automatically loaded after login and it contains basic modules for the programming activities (ibm, gnu, pgi, cuda compilers, math libraries, profiling and debugging tools,..).

If you want to use a module placed under another profile, for example an application module, you first have to load the corresponding profile:

>module load profile/<profile name>
>module load autoload <module name>

To list all the profiles and modules you have loaded, use the following command:

>module list

To find all profiles, categories and modules available on LEONARDO, the command “modmap” will soon be available, as on the other clusters. With modmap you can see whether the desired module is available and which profile you have to load to use it.

>modmap -m <module_name>

Spack environment

If you cannot find the software you are interested in, you can install it yourself.
For this purpose, on Leonardo we also offer the possibility to use the “spack” environment by loading the corresponding module. Please refer to the dedicated section in UG2.6: Production Environment.

Please note that we are still optimizing the Leonardo software stack, and installations may be added or replaced. Always check with "module av" (the hash in the module name can change).

GPU and intra/inter connection environment

It will be described soon.

Production environment

Since LEONARDO is a general purpose system and is used by several users at the same time, long production jobs must be submitted using a queuing system (scheduler). The scheduler guarantees that access to the resources is as fair as possible.

The production environment on LEONARDO is based on the SLURM scheduler, which is already in place on the cluster but is still incomplete and in a pre-production configuration.

Leonardo is based on a policy of node sharing among different jobs: a job can request resources that amount to only part of a node, for example a few cores and 1 GPU. This means that, at a given time, one physical node can be allocated to multiple jobs of different users. Nevertheless, exclusivity at the level of the single core is guaranteed by low-level mechanisms.

Roughly speaking, there are two different modes to use an HPC system: Interactive and Batch. For a general discussion see the section Production Environment.

Interactive

A serial program can be executed in the standard UNIX way:

> ./program

This is allowed only for very short runs on the login nodes, and very soon the interactive environment will have a 10-minute CPU-time limit. Please do not execute parallel applications on the login nodes!

Batch

As usual on HPC systems, large production runs are executed in batch mode. This means that the user writes a list of commands into a file (for example script.x) and then submits it to a scheduler (SLURM for Leonardo) that will search for the required resources in the system. As soon as the resources are available, script.x is executed and the results are sent back to the user.

This is an example of a script file:

#!/bin/bash
#SBATCH -A <account_name>
#SBATCH -p boost_usr_prod
#SBATCH --time 00:10:00     # format: HH:MM:SS
#SBATCH -N 1                # 1 node
#SBATCH --ntasks-per-node=8 # 8 tasks out of 128
#SBATCH --gres=gpu:1        # 1 GPU per node out of 4
#SBATCH --mem=7100          # memory per node out of 494000MB 
#SBATCH --job-name=my_batch_job
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<user_email>
# Use ONE of the two launch lines below.
# If you compiled with spectrum-mpi:
# mpirun ./myexecutable
# In all the other cases:
srun ./myexecutable

Please refer to the general online guide to slurm and on task/thread bindings, and pay attention to the setting of SRUN_CPUS_PER_TASK for hybrid applications dispatched with "srun".
The #SBATCH --exclusive directive is also recommended to avoid problems with the job's $TMPDIR.
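For a hybrid MPI/OpenMP run, the SRUN_CPUS_PER_TASK setting mentioned above could look like the following sketch. The resource numbers (2 tasks with 4 cores each, 2 GPUs) and the executable name are illustrative assumptions; the fallback value of 1 and the guard around srun are there only so that the fragment also runs harmlessly outside a SLURM job.

```shell
#!/bin/bash
#SBATCH -A <account_no>
#SBATCH -p boost_usr_prod
#SBATCH --time 00:10:00
#SBATCH -N 1
#SBATCH --ntasks-per-node=2   # illustrative: 2 MPI tasks
#SBATCH --cpus-per-task=4     # illustrative: 4 cores (threads) per task
#SBATCH --gres=gpu:2          # illustrative: 2 GPUs

# sbatch exports SLURM_CPUS_PER_TASK when --cpus-per-task is given.
# Pass the same value to srun through SRUN_CPUS_PER_TASK so that each
# task is launched with all of its cores.
export SRUN_CPUS_PER_TASK="${SLURM_CPUS_PER_TASK:-1}"

# One OpenMP thread per core assigned to the task.
export OMP_NUM_THREADS="$SRUN_CPUS_PER_TASK"

# Launch only where srun and the (hypothetical) executable exist.
if command -v srun >/dev/null 2>&1 && [ -x ./myexecutable ]; then
    srun ./myexecutable
fi
```

Deriving both variables from SLURM_CPUS_PER_TASK keeps the thread count consistent with the --cpus-per-task request, so only the directive has to be edited when the job geometry changes.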

You can write your script file (for example script.x) using any editor, then you submit it using the command:

> sbatch script.x

The script file must contain both directives to SLURM and commands to be executed, as described in more detail in the section Batch Scheduler SLURM.

Using SLURM directives you indicate the account_no (-A: which project pays for this work), where to run the job (-p: partition), and the maximum duration of the run (--time: time limit). You also indicate the resources needed, in terms of cores, GPUs and memory.

One of the commands will probably be the launch of a parallel MPI application. In this case the right command is srun, as an alternative to the usual mpirun command. In this way you will get full support for process tracking, accounting, task affinity, suspend/resume and other features.

SLURM partitions

A list of partitions defined on the cluster, with access rights and resources definition, can be displayed with the command sinfo:

> sinfo -o "%10D %20F %P"

With these options the command returns a more readable output which shows, for each partition, the total number of nodes and the number of nodes by state in the format "Allocated/Idle/Other/Total".
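If the "Allocated/Idle/Other/Total" field has to be used in a script, it can be split on the "/" character. The sketch below reads lines of the form "partition A/I/O/T" and prints the idle node count per partition; the function name print_idle_by_partition is hypothetical, and the two-column format "%P %F" is an assumed variant of the command above chosen to simplify parsing.

```shell
# Sketch only: print "<partition> <idle nodes>" for each input line of
# the form "<partition> <Allocated/Idle/Other/Total>".
print_idle_by_partition() {
    while read -r partition states; do
        # The second "/"-separated field is the number of idle nodes.
        idle=$(printf '%s\n' "$states" | cut -d/ -f2)
        printf '%s %s\n' "$partition" "$idle"
    done
}

# On the cluster (-h suppresses the header line):
#   sinfo -h -o "%P %F" | print_idle_by_partition
```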

In the following table you can find the main features and limits imposed on the partitions of Leonardo.

SLURM partition	Job QOS	# cores / # GPUs per job	max walltime	max running jobs per user / max n. of cores/nodes/GPUs per user	priority	notes
lrd_all_serial (default)	normal	max = 1 core, 1 GPU	04:00:00	4 cpus / 1 GPU	40
lrd_all_serial (default)	qos_install	max = 16 cores	04:00:00	max = 16 cores, 1 job per user	40	request to superc@cineca.it
boost_usr_prod	normal	max = 32 nodes	24:00:00		40	runs on all nodes
	boost_qos_dbg	max = 2 nodes	02:00:00	2 nodes / 64 cores / 8 GPUs	80	runs on 24 nodes
	boost_qos_bprod	min = 33 nodes, max = 256 nodes	24:00:00	256 nodes	60	runs on 512 nodes; min is 33 FULL nodes
	lrd_qos_lprod	max = 2 nodes	100:00:00	4 nodes	40

The "boost_usr_prod" partition replaces "prod". You can use at most 32 nodes on this partition (MaxTime=24:00:00). Please request the boost_qos_bprod QOS to go up to 512 nodes (MaxTime=10:00:00). This limit will be in place until May 25, when it will be reduced to 256 nodes with MaxTime=24:00:00 (production environment).

Graphic session

It will be available soon.

Programming environment

Compilers

It will be available soon.

Debugger and Profilers

It will be available soon.

...

Beta-production environment

The production environment is based on the slurm scheduler, already in place on the cluster but in a very preliminary configuration.

The only available partition is "prod" (#SBATCH --partition=prod). Please refer to the general online guide to slurm and on task/thread bindings, and pay attention to the setting of SRUN_CPUS_PER_TASK for hybrid applications dispatched with "srun". In this preliminary configuration, please explicitly request the correct pmix plugin when launching your parallel applications with "srun": srun --mpi=pmix_v3 <options> <exe>. No MPI settings are needed if you launch with "mpirun".
The GPUs are not yet defined as generic resources (GRES), so all 4 GPUs of a node are available in a job. Do not ask for gres=gpu:X (or the analogous --gpus-per-node) in your script. Request the node exclusively with the #SBATCH --exclusive directive.
The #SBATCH --exclusive directive is also recommended to avoid problems with the job's $TMPDIR.

Pre-production environment

Storage:

the scratch areas are now available ($CINECA_SCRATCH or $SCRATCH)
home filesystem:
- BETA-0 users: your old home is no longer accessible (but your data are still there). We have already started syncing the contents of these homes to the corresponding user scratch areas. Due to the huge amount of data, the sync is taking a very long time. We will inform you when the copy is finished (a stop will be required for the last rsync). Once you have checked that everything is fine, we will remove the data from your old homes. You can, however, resume your activity now on your scratch area (if what has been synced so far is sufficient for you to submit new jobs).
- BETA-1 users: you already started with the home at the correct path, /leonardo/home/userexternal/<username>. The 100 GB quota is enforced as well. Please copy any content that does not belong in your $HOME to your $SCRATCH (we will NOT move the contents of the new homes), and remove the transferred data from $HOME. You can resume your activity now on your scratch area: just copy the needed input files and scripts there.
work filesystem: the $WORK areas are not available yet. Until they are configured and in place, the automatic cleaning of the scratch area will NOT be active.

Slurm:

...
