...

Since M100 is a general purpose system used by several users at the same time, long production jobs must be submitted through a queuing system. This guarantees that access to the resources is as fair as possible.
Roughly speaking, there are two different modes to use an HPC system: Interactive and Batch. For a general discussion see the section Production Environment and Tools.


Each node of Marconi100 consists of 2 Power9 sockets, each with 16 cores and 2 Volta GPUs. Multithreading is active, with 4 threads per physical core (128 logical cpus in total).

Due to how the hardware is detected on a Power9 architecture, the numbering of (logical) cores follows the order of threading:

$ ppc64_cpu --info

Core   0:    0*    1*    2*    3*
Core   1:    4*    5*    6*    7*
Core   2:    8*    9*   10*   11* 
Core   3:   12*   13*   14*   15*

.............. (Cores from 4 to 27)........................

Core  28:  112*  113*  114*  115*
Core  29:  116*  117*  118*  119*  
Core  30:  120*  121*  122*  123*
Core  31:  124*  125*  126*  127*


Since the nodes can be shared by users, Slurm has been configured to allocate one task per physical core by default. Without this option, one task would by default be allocated per thread on nodes with more than one ThreadsPerCore (as is the case on Marconi100).

As a result of this configuration, each requested task is allocated a physical core with all its 4 threads.

Since a physical core (4 HTs) is assigned to each task, a maximum of 32 tasks per node can be requested (--ntasks-per-node=32), each receiving, as mentioned, 4 logical cpus.
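For example (the executable name below is only a placeholder), requesting 8 tasks on one node would give each task its own physical core, i.e. 4 logical cpus:

srun -N 1 --ntasks-per-node=8 ./my_parallel_program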

Interactive

A serial program can be executed in the standard UNIX way:

> ./program

This is allowed only for very short runs on the login nodes, since the interactive environment has a 10-minute time limit.

...

which shows, for each partition, the total number of nodes and the number of nodes by state in the format "Allocated/Idle/Other/Total".
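For instance, a summarized per-partition view of this kind should be obtainable with:

> sinfo -s

where the NODES(A/I/O/T) column reports the Allocated/Idle/Other/Total node counts.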

IMPORTANT:


Please note that the recommended way to launch parallel tasks in Slurm jobs is with srun. By using srun instead of mpirun you will get full support for process tracking, accounting, task affinity, suspend/resume and other features.
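For instance, inside a job script a parallel executable (the name below is only a placeholder) would be launched as:

srun ./my_mpi_program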

For more information and examples of job scripts, see section Batch Scheduler SLURM.

...

The m100_all_serial partition is available with a maximum walltime of 4 hours, 6 tasks and 18000 MB of memory per job. It runs on two dedicated nodes, and it is designed for pre/post-processing serial analysis and for moving your data (via rsync, scp, etc.) in case more than 10 minutes are required to complete the data transfer. This is the default partition, which is assumed by SLURM if you do not explicitly request a partition with the flag "--partition" or "-p". You can however explicitly request it in your batch script with the directive:


#SBATCH -p m100_all_serial
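As an illustration, a minimal data-mover job on this partition could look as follows (account name, paths and remote host are placeholders):

#!/bin/bash
#SBATCH -p m100_all_serial
#SBATCH -A <account_name>
#SBATCH --ntasks=1
#SBATCH --time=04:00:00
#SBATCH --job-name=data_move

# copy results to a remote machine (source and destination are placeholders)
rsync -av /path/to/local/results/ <user>@<remote_host>:/path/to/destination/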

Submitting Batch jobs for production


sinfo lists all the partitions available on M100. Some of them are reserved to dedicated classes of users (for example the *_fua_* partitions are for EUROfusion users):

...

  • m100_fua_prod and m100_fua_dbg are reserved to EUROfusion users, respectively for production and debugging
  • m100_usr_prod and m100_usr_dbg are open to academic production.

...

Each node exposes itself to SLURM as having 32 cores, 4 GPUs and 230000 MB (230 GB) of memory. SLURM assigns nodes in a shared way, giving each job only the resources it requires and allowing multiple jobs to run on the same node(s). If you want the node(s) in exclusive mode, ask for all the resources of the node (either all the tasks with --ntasks-per-node=32, or all 4 GPUs, or all the memory with --mem=230000).
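For instance, a request for a whole node could combine directives like the following (account name omitted; as noted above, any one of the three resource requests alone is already enough to obtain the node exclusively):

#SBATCH -N 1
#SBATCH --ntasks-per-node=32
#SBATCH --gres=gpu:4
#SBATCH --mem=230000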

The maximum memory which can be requested is 230000 MB (with an average memory per physical core of ~7 GB); this value guarantees that no memory swapping will occur.
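This average follows directly from the node configuration: 230000 MB / 32 physical cores ≈ 7187 MB, i.e. roughly 7 GB per core.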



For example, to request one core and one GPU in a production queue the following SLURM job script can be used:

#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1        # number of requested gpus per node, can vary between 1 and 4
#SBATCH -A <account_name>
#SBATCH --mem=7100          # requested memory per node (here the share of one core), up to a maximum of 230000 MB
#SBATCH -p m100_usr_prod
#SBATCH --time 00:05:00
#SBATCH --job-name=my_batch_job
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<user_email>

...