...
The following table lists the main features and limits of the M100 partitions.
Note: "core" refers to a physical core with its 4 hardware threads (HTs); "cpu" refers to a logical cpu (1 HT). Each node has 32 cores, i.e. 128 cpus.
SLURM partition | Job QOS | # cores/# GPUs per job | max walltime | max running jobs per user / max n. of cores/nodes/GPUs per user | priority | notes |
m100_all_serial (def. partition) | normal | max = 1 core, max mem = 7600 MB | 04:00:00 | 4 cpus/1 GPU | 40 | |
m100_usr_prod | normal | max = 16 nodes | 24:00:00 | | 40 | runs on 880 nodes |
m100_usr_prod | m100_qos_dbg | max = 2 nodes | 02:00:00 | 2 nodes/64 cores/8 GPUs | 80 | runs on 12 nodes |
m100_usr_prod | m100_qos_bprod | min = 17 nodes, max = 256 nodes | 24:00:00 | 256 nodes | 60 | runs on 512 nodes; min is 17 FULL nodes (544 cores, 2176 cpus) |
m100_usr_preempt | normal | max = 16 nodes | 24:00:00 | | 1 | runs on 99 nodes |
m100_fua_prod (EUROFUSION) | normal | max = 16 nodes | 24:00:00 | | 40 | runs on 87 nodes |
m100_fua_prod (EUROFUSION) | m100_qos_fuadbg | max = 2 nodes | 02:00:00 | | 45 | runs on 12 nodes |
m100_fua_prod (EUROFUSION) | m100_qos_fuabprod | max = 32 nodes | 24:00:00 | | 40 | runs on 64 nodes at the same time |
all partitions | qos_special | > 32 nodes | > 24:00:00 | | 40 | request to superc@cineca.it |
all partitions | qos_lowprio | max = 16 nodes | 24:00:00 | | 0 | active projects with exhausted budget |
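
As an illustration of how the node, core and cpu figures in the table relate, the following Python sketch converts a node count into cores and cpus and checks it against the per-job node limits. It is only a sketch under stated assumptions: the helper names (`node_resources`, `fits_qos`, `sbatch_header`) are hypothetical, the limits are copied by hand from the table, m100_qos_dbg and m100_qos_bprod are assumed to be requested on m100_usr_prod, and a minimum of 1 node is assumed where the table gives no minimum. The generated directives use the standard SLURM options `--partition`, `--qos`, `--nodes` and `--time`.

```python
# Node geometry stated in the note above the table.
CORES_PER_NODE = 32
CPUS_PER_CORE = 4  # hardware threads (HTs) per physical core

# Per-job node limits (min, max), copied by hand from the table.
# Assumption: these QOS are used on m100_usr_prod; min = 1 where none is given.
USR_PROD_NODE_LIMITS = {
    "normal": (1, 16),
    "m100_qos_dbg": (1, 2),
    "m100_qos_bprod": (17, 256),
}


def node_resources(nodes):
    """Return (cores, cpus) provided by a number of full nodes."""
    cores = nodes * CORES_PER_NODE
    return cores, cores * CPUS_PER_CORE


def fits_qos(qos, nodes):
    """Check a node count against the per-job node limits of a QOS."""
    low, high = USR_PROD_NODE_LIMITS[qos]
    return low <= nodes <= high


def sbatch_header(qos, nodes, walltime):
    """Build #SBATCH lines for m100_usr_prod (hypothetical helper)."""
    if not fits_qos(qos, nodes):
        raise ValueError(f"{nodes} nodes is outside the limits of {qos}")
    lines = [
        "#SBATCH --partition=m100_usr_prod",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --time={walltime}",
    ]
    if qos != "normal":  # assumption: the default QOS needs no explicit flag
        lines.append(f"#SBATCH --qos={qos}")
    return "\n".join(lines)


print(node_resources(17))  # (544, 2176), the figures in the m100_qos_bprod row
print(sbatch_header("m100_qos_dbg", 2, "02:00:00"))
```

The sketch checks node counts only; walltime, per-user limits and the account to charge still have to be set according to the table and your project.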
...