...
hostname: login.m100.cineca.it
early availability: April 2020
start of production: to be defined (2020)
...
Login nodes: 8 IBM Power9 LC922 (similar to the compute nodes)
Model: IBM Power AC922 (Witherspoon)
Racks: 55 total (49 compute)
Access
All the login nodes have an identical environment and can be reached with SSH (Secure Shell) protocol using the "collective" hostname:
...
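For example, a minimal login command using the collective hostname reported above (replace <username> with your CINECA username):
ssh <username>@login.m100.cineca.it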
By default, the srun command uses PMI2 as the MPI type.
Please note that:
1) The recommended way to launch parallel tasks in SLURM jobs is with srun. By using srun instead of mpirun you get full support for process tracking, accounting, task affinity, suspend/resume and other features; a minimal example is given below.
...
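A minimal sketch of launching a parallel executable from within a batch job (myexecutable is a placeholder; the task count is taken from the job's resource request):
srun -n $SLURM_NTASKS ./myexecutable
# the MPI plugin can also be selected explicitly instead of relying on the PMI2 default:
srun --mpi=pmi2 -n $SLURM_NTASKS ./myexecutable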
The information reported here refers to the general user M100 partitions. The production environment of MARCONI_Fusion for EUROfusion users is discussed in a separate document.
...
For more information and examples of job scripts, see section Batch Scheduler SLURM.
Submitting serial Batch jobs
The m100_all_serial partition is available with a maximum walltime of 4 hours, 6 tasks and 18000 MB of memory per job. It runs on two dedicated nodes and is designed for serial pre/post-processing analysis and for moving your data (via rsync, scp, etc.) whenever more than 10 minutes are required to complete the transfer. In order to use this partition you have to specify the SLURM flag "-p":
#SBATCH -p m100_all_serial
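For instance, a minimal serial job script for this partition might look as follows (account name, job name and script are placeholders; the requested resources stay within the 6-task / 18000 MB / 4-hour limits):
#!/bin/bash
#SBATCH -p m100_all_serial
#SBATCH -A <account_name>
#SBATCH -n 1
#SBATCH --mem=18000
#SBATCH --time 01:00:00
#SBATCH --job-name=my_serial_job
./my_postprocessing_script.sh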
Submitting Batch jobs for production
The sinfo command lists all the partitions available on M100. Some of them are reserved for dedicated classes of users (for example, the *_fua_* partitions are for EUROfusion users):
- m100_fua_prod and m100_fua_dbg are reserved for EUROfusion users, for production and debugging respectively;
- m100_usr_prod and m100_usr_dbg are open to academic production.
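For example, a compact overview of the partitions and their limits can be printed with a format string such as the following (just one possible choice of output fields: partition, availability, time limit, node count, CPUs and memory per node):
sinfo -o "%20P %10a %15l %8D %5c %10m"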
Each node exposes itself to SLURM as having 32 cores, 4 GPUs and xx GB of memory. SLURM assigns nodes in a shared way, allocating to each job only the resources it requests and allowing multiple jobs to run on the same node(s). If you want a node in exclusive mode, request all of its resources (i.e. ncpus=32, or ngpus=4, or all the memory).
The maximum memory which can be requested is 182000MB and this value guarantees that no memory swapping will occur.
For example, to request a single node in a production queue the following SLURM job script can be used:
#!/bin/bash
#SBATCH -N 1
#SBATCH -A <account_name>
#SBATCH --mem=180000 <-- replace with the memory corresponding to 1 core
#SBATCH -p m100_usr_prod
#SBATCH --time 00:05:00
#SBATCH --job-name=my_batch_job
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<user_email>
srun ./myexecutable
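Since nodes are shared by default, GPUs and memory have to be requested explicitly. The following is a hedged sketch of a job asking for half a node and two of its GPUs, assuming the standard gres/gpu mechanism is used on M100 (account name and executable are placeholders):
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=16
#SBATCH --gres=gpu:2
#SBATCH --mem=90000
#SBATCH -A <account_name>
#SBATCH -p m100_usr_prod
#SBATCH --time 00:30:00
#SBATCH --job-name=my_gpu_job
srun ./myexecutable
Requesting all 32 cores, all 4 GPUs, or the full memory would instead give the node in exclusive mode, as explained above.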
Users with exhausted but still active projects are allowed to keep using the cluster resources, although at a very low priority, by adding the "qos_lowprio" QOS to their job.
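As reported in the summary table below, the low-priority QOS is selected with the following directive (shown here together with the production partition as an example):
#SBATCH -p m100_usr_prod
#SBATCH --qos=qos_lowprio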
Summary
In the following table you can find the main features and limits imposed on the queues/partitions of M100.
| SLURM partition | QOS | # cores/nodes per job | max walltime | max running jobs per user / max n. of cpus or nodes per user | max memory per node (MB) | priority | notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| m100_all_serial (default partition) | noQOS | max = 6 cores (max mem = 18000 MB) | 04:00:00 | 6 cpus | 18000 | 40 | |
| | qos_rcm | min = 1, max = 48 | 03:00:00 | 1/48 | 182000 | - | to be defined |
| m100_usr_dbg | no QOS | min = 1 node, max = 4 nodes | 00:30:00 | 4/4 | 182000 | 40 | runs on 24 dedicated nodes |
| m100_usr_prod | no QOS | min = 1 node, max = 64 nodes | 24:00:00 | 64 nodes | 182000 | 40 | |
| | skl_qos_bprod | min = 65 nodes, max = 256 nodes | 24:00:00 | 1/256 (1 job per account) | 182000 | 85 | #SBATCH -p skl_usr_prod #SBATCH --qos=skl_qos_bprod |
| | qos_special | > 256 nodes | > 24:00:00 (max = 64 nodes per user) | | 182000 | 40 | #SBATCH --qos=qos_special (request to superc@cineca.it) |
| | qos_lowprio | max = 64 nodes | 24:00:00 | 64 nodes | 182000 | 0 | #SBATCH --qos=qos_lowprio |
Graphic session
If a graphic session is desired, we recommend using the RCM (Remote Connection Manager) tool. For additional information, visit the Remote Visualization section of our User Guide.