...

This example requests 8 tasks on 2 SKL nodes for 1 hour of wallclock time and runs an MPI application (myprogram) compiled with the Intel compiler and the Intel MPI library. The input data are in the file "myinput", the output file is "myoutput", and the working directory is the one the job was submitted from. With the "--cpus-per-task=1" directive, each task is bound to 1 physical CPU (core). This is the default setting.

############# A3 Skylake ###############################################
#!/bin/bash
#SBATCH --time=01:00:00

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --cpus-per-task=1
#SBATCH --mem=<mem_per_node>
#SBATCH --partition=<partition_name>
#SBATCH --qos=<qos_name>
#SBATCH --job-name=jobMPI
#SBATCH --err=myJob.err
#SBATCH --out=myJob.out
#SBATCH --account=<account_no>

module load intel intelmpi
srun myprogram < myinput > myoutput
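
The resource request in this script can be checked with a little arithmetic: 2 nodes times 4 tasks per node gives the 8 tasks, and a limit of 2 tasks per socket means the 4 tasks on each node need at least 2 sockets. The following is only a minimal sketch; the variable names are hypothetical and simply mirror the directives.

```shell
#!/bin/bash
# Hypothetical variables mirroring the #SBATCH directives of the MPI example
nodes=2
ntasks_per_node=4
ntasks_per_socket=2
cpus_per_task=1

# Total MPI tasks started by srun
total_tasks=$(( nodes * ntasks_per_node ))
# Minimum number of sockets needed on each node to respect the per-socket limit
min_sockets=$(( (ntasks_per_node + ntasks_per_socket - 1) / ntasks_per_socket ))
# Physical cores used on each node
cores_per_node=$(( ntasks_per_node * cpus_per_task ))

echo "total MPI tasks: ${total_tasks}"
echo "minimum sockets per node: ${min_sockets}"
echo "cores used per node: ${cores_per_node}"
```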

...

For a typical OpenMP job you can take one of the following scripts as a template and modify it according to your needs.

Nodes without hyperthreading

...


Here we ask for a single node and a single task, thus allocating 48 physical CPUs on SKL or 36 on Galileo. Exporting OMP_NUM_THREADS sets 48 or 36 OpenMP threads for the single task.

######## A3 skl #######################################################
#!/bin/bash
#SBATCH --time=01:00:00

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Use 36 instead of 48 on Galileo
#SBATCH --cpus-per-task=48

#SBATCH --partition=<partition_name>
#SBATCH --qos=<qos_name>
#SBATCH --mem=<mem_per_node>
#SBATCH --out=myJob.out
#SBATCH --err=myJob.err
#SBATCH --account=<account_no>

module load intel
export OMP_NUM_THREADS=48   # 36 on Galileo
srun myprogram < myinput > myoutput
###################################
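
Rather than hard-coding the thread count for each machine, the value can be taken from the job itself. The sketch below relies on the SLURM_CPUS_PER_TASK environment variable, which Slurm sets when --cpus-per-task is specified. The fallback to 1 outside a job is an assumption made for illustration.

```shell
#!/bin/bash
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given.
# The fallback value of 1 is an assumption for runs outside a Slurm job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OpenMP threads per task: ${OMP_NUM_THREADS}"
```

With this line in place, the same script body works on either machine and only the --cpus-per-task directive needs to change.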

...

For a typical hybrid job you can take one of the following scripts as a template and modify it according to your needs.

Nodes without hyperthreading (skl) 

The script asks for 8 MPI tasks on 2 SKL nodes, with 4 OpenMP threads per task and 1 hour of wallclock time. The application (myprogram) was compiled with the Intel compiler and the OpenMPI library. The input data are in the file "myinput", the output file is "myoutput", and the working directory is the one the job was submitted from. The MPI tasks and OpenMP threads are bound to physical CPUs (cores).
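
The per-node figures implied by this request can be derived as follows. This is only a sketch with hypothetical variable names. The value of 48 physical cores per SKL node is the figure quoted in the OpenMP example, and the even split of tasks across nodes is an assumption.

```shell
#!/bin/bash
# Hypothetical variables taken from the totals stated in the text
total_tasks=8        # MPI tasks
nodes=2              # SKL nodes
threads_per_task=4   # OpenMP threads per MPI task
cores_per_node=48    # physical cores on one SKL node

# Assumes the tasks are spread evenly over the nodes
tasks_per_node=$(( total_tasks / nodes ))
# One physical core per OpenMP thread (no hyperthreading)
cores_needed=$(( tasks_per_node * threads_per_task ))

if [ "$cores_needed" -gt "$cores_per_node" ]; then
    echo "request does not fit on a node: ${cores_needed} > ${cores_per_node}" >&2
else
    echo "tasks per node: ${tasks_per_node}, cores used per node: ${cores_needed}"
fi
```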

...