...
16 full cores and 2 GPUs are requested. The 16x4 (virtual) CPUs are used for 4 MPI tasks with 16 OpenMP threads per task. The -m flag of the srun command specifies the desired process distribution across nodes/sockets/cores (the default is block:cyclic). Please refer to the srun manual for more details on process distribution and binding.
> salloc -N1 --ntasks-per-node=32 --cpus-per-task=4 --gres=gpu:2 --partition=...
export OMP_NUM_THREADS=4
mpirun ./myprogram
32 full cores and 2 GPUs are requested. The 32x4 (virtual) CPUs are used for 32 MPI tasks with 4 OpenMP threads per task. In this way you are asking for the entire node, and you can ask for 2, 3, or 4 GPUs, because you can obtain only the cores related to the requested GPUs.
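The core counts quoted in these examples follow from simple arithmetic, which the sketch below spells out. It assumes 4 virtual CPUs per physical core, as the "16x4" and "32x4" figures suggest. The helper names virtual_cpus and physical_cores are hypothetical and not part of Slurm. SLURM_CPUS_PER_TASK is the variable Slurm sets inside a job when --cpus-per-task is given; the fallback value of 4 is only an assumption for running the sketch outside an allocation.

```shell
#!/bin/sh
# Sketch only: relate --ntasks-per-node and --cpus-per-task to physical cores.
# Assumption: 4 virtual cpus (hardware threads) per physical core.
SMT=4

# virtual_cpus NTASKS CPUS_PER_TASK -> total virtual cpus requested on the node
virtual_cpus() {
    echo $(( $1 * $2 ))
}

# physical_cores NTASKS CPUS_PER_TASK -> full cores covering those virtual cpus
# (rounded up, since a partially used core is still a full core)
physical_cores() {
    echo $(( ($1 * $2 + SMT - 1) / SMT ))
}

# Derive the OpenMP thread count from the allocation instead of hard-coding it.
# Falls back to 4 when SLURM_CPUS_PER_TASK is not set (assumed default).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-4}"

# Second example above: 32 tasks x 4 cpus per task
echo "virtual cpus:   $(virtual_cpus 32 4)"
echo "physical cores: $(physical_cores 32 4)"
```

For the first example (4 tasks, 16 cpus per task) the same helpers give 64 virtual CPUs on 16 full cores.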
Here you can find other batch job examples on M100.
Graphic session
If a graphic session is desired, we recommend using the RCM (Remote Connection Manager) tool. For additional information, see the Remote Visualization section of our User Guide.
...