...
hostname: login.m100.cineca.it
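For example, once you have valid CINECA credentials you can reach the cluster via ssh (replace <username> with your own user name):

> ssh <username>@login.m100.cineca.it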
...
| SLURM partition | Job QOS | # cores / # GPUs per job | max walltime | max running jobs per user / max n. of cores/nodes/GPUs per user | priority | notes |
|---|---|---|---|---|---|---|
| m100_all_serial | normal | max = 1 core, max mem = 7600 MB | 04:00:00 | 4 cpus / 1 GPU | 40 | |
| m100_all_serial | qos_install | max = 16 cores | 04:00:00 | max = 16 cores, 1 job per user | 40 | request to superc@cineca.it |
| m100_usr_prod | normal | max = 16 nodes | 24:00:00 | | 40 | runs on 880 nodes |
| m100_usr_prod | m100_qos_dbg | max = 2 nodes | 02:00:00 | 2 nodes / 64 cores / 8 GPUs | 80 | runs on 12 nodes |
| m100_usr_prod | m100_qos_bprod | min = 17 nodes, max = 256 nodes | 24:00:00 | 256 nodes | 60 | runs on 512 nodes; min is 17 FULL nodes (544 cores, 2176 cpus) |
| m100_usr_preempt | normal | max = 16 nodes | 24:00:00 | | 1 | runs on 99 nodes |
| m100_fua_prod (EUROFUSION) | normal | max = 16 nodes | 24:00:00 | | 40 | runs on 87 nodes |
| m100_fua_prod (EUROFUSION) | m100_qos_fuadbg | max = 2 nodes | 02:00:00 | | 45 | runs on 12 nodes |
| m100_fua_prod (EUROFUSION) | m100_qos_fuabprod | min = 17 nodes, max = 32 nodes | 24:00:00 | | 40 | runs on 64 nodes at the same time |
| all partitions | qos_special | > 32 nodes | > 24:00:00 | | 40 | request to superc@cineca.it |
| all partitions | qos_lowprio | max = 16 nodes | 24:00:00 | | 0 | active projects with exhausted budget; request to superc@cineca.it |
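For illustration only, a two-node debug job on the production partition could be requested with directives along these lines (replace <account_name> with your own project account):

#SBATCH --partition=m100_usr_prod
#SBATCH --qos=m100_qos_dbg
#SBATCH --nodes=2
#SBATCH --time=02:00:00
#SBATCH --account=<account_name>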
...
Here 16 full cores and 2 GPUs are requested. The 16x4 (virtual) cpus are used for 4 MPI tasks with 16 OMP threads per task. The -m flag of the srun command specifies the desired process distribution across nodes/sockets/cores (the default is block:cyclic); please refer to the srun manual for more details on process distribution and binding. The --map-by socket:PE=4 option assigns and binds 4 consecutive physical cores to each process (see process mapping and binding in the official IBM Spectrum MPI manual).
> salloc -N1 --ntasks-per-node=4 --cpus-per-task=16 --gres=gpu:2 --partition=...
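As a minimal sketch of how such an allocation could then be used (the executable name is a placeholder), you can set the OpenMP threads and launch through Spectrum-MPI:

> export OMP_NUM_THREADS=16
> mpirun --map-by socket:PE=4 ./myprogram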
...
Here you can find other batch job examples on M100. More information on process mapping and binding is available in the official IBM Spectrum MPI manual.
Graphic session
If a graphic session is desired, we recommend using the RCM (Remote Connection Manager) tool. For additional information, visit the Remote Visualization section of our User Guide.
...
This will place the temporary outputs of nsys in your TMPDIR folder, which by default is /scratch_local/slurm_job.$SLURM_JOB_ID, where you have 1 TB of free space.
This workaround may cause conflicts between multiple jobs running this profiler on the same compute node at the same time, so we also strongly suggest requesting the compute node exclusively:
#SBATCH --exclusive
Nsight Systems can also collect kernel IP samples and backtraces; however, this is prevented by the perf event paranoid level being set to 2 on Marconi100. It is possible to bypass this restriction by adding the SLURM directive:
#SBATCH --gres=sysfs
together with the --exclusive directive described above.
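Putting the pieces together, a profiling job could therefore include something along these lines (the nsys options and executable name are purely illustrative; adapt them to your case):

#SBATCH --exclusive
#SBATCH --gres=sysfs

nsys profile -o my_report ./myprogram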
MPI environment
We offer two options for the MPI environment on Marconi100:
...
Here you can find some useful details on how to use them on Marconi100.
Warning: when you compile your code with the XL compiler and the Spectrum-MPI parallel library (our recommended software stack), you have to use mpirun (not srun) to execute your program.
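For instance, assuming the XL and Spectrum-MPI modules are loaded (the module names below are indicative; check the output of "module av" for the exact versions available on M100), a typical compile-and-run sequence is:

> module load xl spectrum_mpi
> mpicc -o myprogram myprogram.c
> mpirun ./myprogram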
Spectrum-MPI
Spectrum-MPI is the IBM implementation of MPI. Together with the XL compiler, it is the recommended environment on Marconi100.
On top of Open MPI, it adds unique features optimized for IBM systems, such as CPU affinity features, dynamic selection of interface libraries, workload manager integrations, and improved performance.
Spectrum-MPI supports both CUDA-aware and GPUDirect technologies.
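Note that, to the best of our understanding, CUDA-aware support has to be enabled explicitly when launching the application; a minimal sketch (please verify the flag against the Spectrum-MPI documentation for the installed version):

> mpirun -gpu ./my_gpu_program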
...