...
SLURM partition | Job QOS | # cores/# GPU per job | max walltime | max running jobs per user / max n. of cores/nodes/GPUs per user | priority | notes |
m100_all_serial | normal | max = 1 core, max mem = 7600 MB | 04:00:00 | 4 cpus/1 GPU | 40 | |
m100_all_serial | qos_install | max = 16 cores | 04:00:00 | max = 16 cores, 1 job per user | 40 | request to superc@cineca.it |
m100_usr_prod | normal | max = 16 nodes | 24:00:00 | | 40 | runs on 880 nodes |
m100_usr_prod | m100_qos_dbg | max = 2 nodes | 02:00:00 | 2 nodes/64 cores/8 GPUs | 80 | runs on 12 nodes |
m100_usr_prod | m100_qos_bprod | min = 17 nodes, max = 256 nodes | 24:00:00 | 256 nodes | 60 | runs on 512 nodes; min is 17 FULL nodes (544 cores, 2176 cpus) |
m100_usr_preempt | normal | max = 16 nodes | 24:00:00 | | 1 | runs on 99 nodes |
m100_fua_prod (EUROFUSION) | normal | max = 16 nodes | 24:00:00 | | 40 | runs on 87 nodes |
m100_fua_prod (EUROFUSION) | m100_qos_fuadbg | max = 2 nodes | 02:00:00 | | 45 | runs on 12 nodes |
m100_fua_prod (EUROFUSION) | m100_qos_fuabprod | min = 17 nodes, max = 32 nodes | 24:00:00 | | 40 | runs on 64 nodes at the same time |
all partitions | qos_special | > 32 nodes | > 24:00:00 | | 40 | request to superc@cineca.it |
all partitions | qos_lowprio | max = 16 nodes | 24:00:00 | | 0 | active projects with exhausted budget; request to superc@cineca.it |
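As an illustration of how a row of the table maps onto a job script, the following minimal sketch requests the debug QOS. It assumes that m100_qos_dbg is attached to the m100_usr_prod partition, as the row order of the table suggests; the account name my_account is a placeholder, not a real project.

```shell
#!/bin/bash
#SBATCH --partition=m100_usr_prod   # SLURM partition (assumed to host m100_qos_dbg)
#SBATCH --qos=m100_qos_dbg          # debug QOS: max 2 nodes, max walltime 02:00:00
#SBATCH --nodes=2                   # must not exceed the per-job limit of the QOS
#SBATCH --time=02:00:00             # must not exceed the max walltime of the QOS
#SBATCH --account=my_account        # placeholder account name, replace with your own

# SLURM exports these variables inside a job; outside a job they fall back to "n/a".
MSG="Job ${SLURM_JOB_ID:-n/a} on partition ${SLURM_JOB_PARTITION:-n/a} (QOS ${SLURM_JOB_QOS:-n/a})"
echo "$MSG"
```

For the normal QOS the --qos line can usually be left out, since SLURM then applies the default QOS of the partition.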
...
This places the temporary output of nsys in your TMPDIR folder, which by default is /scratch_local/slurm_job.$SLURM_JOB_ID, where you have 1 TB of free space.
This workaround may cause conflicts between multiple jobs running the profiler on the same compute node at the same time, so we strongly suggest also requesting the compute node exclusively:
#SBATCH --exclusive
Nsight Systems can also collect kernel IP samples and backtraces. On Marconi100, however, this is prevented by the perf event paranoid level being set to 2. You can bypass this restriction by adding the SLURM directive:
#SBATCH --gres=sysfs
together with the --exclusive directive shown above.
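The two directives can be combined in a single job script, as in the minimal sketch below. The application name ./my_app, the report name, and the requested resources are hypothetical placeholders. The script only prints the paranoid level visible to the job and then launches the profiler if nsys is found in PATH.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:30:00
#SBATCH --exclusive     # no other job shares the node while profiling
#SBATCH --gres=sysfs    # needed to collect kernel IP samples and backtraces

# Hypothetical application and report names; replace with your own.
APP=./my_app
REPORT="${SLURM_SUBMIT_DIR:-$PWD}/my_app_profile"

# Show the perf event paranoid level seen by the job (standard Linux procfs path).
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    echo "perf_event_paranoid = $(cat /proc/sys/kernel/perf_event_paranoid)"
fi

# Run the profiler only if nsys is available, e.g. after loading the relevant module.
if command -v nsys >/dev/null 2>&1; then
    nsys profile -o "$REPORT" "$APP"
else
    echo "nsys not found in PATH; load the appropriate module first" >&2
fi
```

Writing the final report to the submission directory rather than to TMPDIR is a choice made for this sketch, on the assumption that the job-local scratch folder does not outlive the job.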
MPI environment
We offer two options for the MPI environment on Marconi100:
...