MPI tasks

By default SLURM uses the auto-affinity module to bind MPI tasks to cpus. In some cases you may want to modify this default in order to ensure optimal performance on Marconi-A2 (KNL) and Marconi-A3 (SkyLake).

The guidelines below explain how to do it.

 

1) If, on a single KNL or SKL node (exclusive nodes), you request fewer tasks than the available physical cpus (< 48 cpus on SKL nodes, < 68 cpus on KNL nodes), you have to specify the following SLURM option as a parameter of the "srun" command line:

--cpu-bind=cores

 

Alternatively, you can request more cpus for each task, so that all the cpus of the node are used:

--cpus-per-task=<n° cpus per node / n° tasks per node>

 

If not specified, the default of --cpus-per-task is 1.

 

For SKL nodes:

--cpus-per-task=<48 / n° tasks per node>

 

For KNL nodes:

you have to remember that hyperthreading is enabled, so the "n° cpus per node" refers to logical cpus rather than physical ones. A single node therefore has a maximum of 272 logical cpus (68 physical cpus * 4):

--cpus-per-task=<272 / n° tasks per node>

 

If the "n° tasks per node" is not a divisor of "n° cpus per node", then "n° cpus per task" should be equal to:

--cpus-per-task=<floor (n° cpus per node/ n° tasks per node)>

 

On hyperthreading nodes (KNL) it should be equal to:

--cpus-per-task=<floor(68 / n° tasks per node) * 4>

 

If the number of tasks is not a divisor of the number of cpus per node, the alternative of covering all the cpus of the node with --cpus-per-task is not available, and you should add the srun option "--cpu-bind=cores" anyway.
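For instance, here is a minimal sketch of a batch script for this case (the executable name and the account/partition placeholders are illustrative and must be adapted to your job):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32        # fewer tasks than the 48 physical cpus of a SKL node
#SBATCH --partition=<partition_name>
#SBATCH --account=<account_name>

# 32 is not a divisor of 48, so bind each task to a core explicitly
srun --cpu-bind=cores ./my_mpi_program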

 

2) On hyperthreading nodes (KNL), if you request for a single node more tasks than the available physical cpus (> 68 tasks), you have to specify the following SLURM option:

--cpu-bind=threads

in order to ensure the binding between MPI tasks and logical cpus and to avoid overloading the physical cpus.

 


Alternatively, you can request more cpus for each task, so that all the logical cpus of the node (272) are used:

--cpus-per-task=<272 / n° tasks per node>
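As an illustrative sketch (executable name and placeholders to be adapted to your job), a KNL job with more tasks than physical cpus could look like:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=136       # more tasks than the 68 physical cpus
#SBATCH --partition=<partition_name>
#SBATCH --account=<account_name>

# bind each task to a logical cpu (hardware thread) to avoid overloading the physical cpus;
# alternatively, --cpus-per-task=2 (272 / 136) would cover all the logical cpus of the node
srun --cpu-bind=threads ./my_mpi_program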

 

3) If you request for a single node a number of tasks equal to the number of physical cpus (48 on SKL nodes, 68 on KNL nodes) or to the number of logical cpus (272 on KNL nodes), there is no need to add the --cpu-bind or --cpus-per-task SLURM options. Each task is assigned a cpu in sequential order.
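For example (an illustrative sketch; adapt the executable name and the placeholders to your job):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48        # one task per physical cpu on a SKL node
#SBATCH --partition=<partition_name>
#SBATCH --account=<account_name>

# no --cpu-bind or --cpus-per-task needed: each task gets a cpu in sequential order
srun ./my_mpi_program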

OpenMP threads

For OpenMP codes you have to specify explicitly the number of cpus to allocate to each task for use by its OpenMP threads. To do so, use the following SLURM option:

--cpus-per-task=<n° cpus>
 

On BDW and SKL nodes a cpu coincides with a physical cpu (core), so the n° of cpus per task can be up to 36 and 48 respectively. On KNL nodes a cpu coincides with a logical cpu (hardware thread), so the n° of cpus per task can be up to 272.

 

On nodes without hyperthreading (SKL and BDW) the OpenMP threads can bind only to physical cpus, whereas on hyperthreading nodes (KNL) they can bind to either physical or logical cpus.

To define whether the OpenMP threads have to bind to cores or to threads, use the following variable:

export OMP_PLACES=<cores|threads>

 

For example, if you request 68 logical cpus for a single task:

--cpus-per-task=68

you are allocating 17 physical cpus (68/4); then, if you want the OpenMP threads to bind to physical cpus, you have to set the variable to "cores":

export OMP_PLACES=cores

and you can specify up to 17 OpenMP threads per task:

export OMP_NUM_THREADS=<up to 17>

 

If you select more than 17 OpenMP threads, you can bind them only to logical cpus, because only 17 physical cpus are allocated to the task; in that case you have to set the OMP_PLACES variable to "threads":

export OMP_PLACES=threads

and you can specify up to 68 OpenMP threads per task:

export OMP_NUM_THREADS=<up to 68>
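Putting it together, here is a minimal sketch of a hybrid MPI/OpenMP batch script for the KNL example above (executable name and placeholders are illustrative and must be adapted to your job):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=68          # 68 logical cpus = 17 physical cpus per task
#SBATCH --partition=<partition_name>
#SBATCH --account=<account_name>

export OMP_PLACES=cores             # bind the OpenMP threads to physical cpus
export OMP_NUM_THREADS=17           # up to 17 threads per task with OMP_PLACES=cores

srun ./my_hybrid_program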

 

By default, with the intelmpi module on Marconi, the KMP_AFFINITY environment variable is set to "compact":

export KMP_AFFINITY=compact 

It binds threads to available cpus consecutively.

You can change this default to "scatter":

export KMP_AFFINITY=scatter


Alternatively, you can use the standard (non-Intel) OpenMP variable:

export OMP_PROC_BIND=<close|spread|true>

A safe default setting is

export OMP_PROC_BIND=true
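As an illustrative summary, you would set one of the two alternatives in your job script, for example (assuming "compact"/"scatter" roughly correspond to the standard "close"/"spread" policies):

# Intel runtime variable (default with the intelmpi module):
export KMP_AFFINITY=scatter         # spread the threads instead of the default "compact"

# or, with the standard OpenMP variable:
export OMP_PROC_BIND=spread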
  
For all the details, see the Intel web pages:

https://software.intel.com/en-us/node/522691

You can find some MPI and MPI/OpenMP job script examples at the following web page:

UG2.6.1: How to submit the job - Batch Scheduler SLURM
