1) OpenMP parallelization

For threaded applications (pure OpenMP, no MPI), you obtain a full node by requesting --ntasks-per-node=1 and --cpus-per-task=128. You can choose whether or not to exploit the SMT feature (it will depend on the value you assign to the OMP_NUM_THREADS variable: e.g., 128 uses all 4 HW threads of each physical core, while 32 uses one HW thread per physical core), but in any case switch the binding of the OMP threads on (this is the default for the XL compilers, while for the GNU and NVIDIA hpc-sdk (ex PGI) compilers it is off by default).

Different compilers abide by different default settings for the binding and placement of the threads:

  • XL compilers
    • OMP_PROC_BIND = false (default)/true, close/spread
    • OMP_PLACES = threads (default), cores
    • Note for SMT configurations: with both the default setting of OMP_PLACES (threads) and with OMP_PLACES=cores, the threads are ALWAYS placed on the first HW thread of each physical core. Setting OMP_PROC_BIND=close/spread makes no difference.
  • NVIDIA hpc-sdk compilers (ex PGI)
    • OMP_PROC_BIND = false (default)/true, close/spread
    • OMP_PLACES = threads (default), cores
  • GCC compilers
    • OMP_PROC_BIND = false (default)/true, close/spread
    • OMP_PLACES = threads (default), cores

For instance, if you want each OMP thread bound to a physical core, ask for the full node (--cpus-per-task=128) and set:

      #SBATCH --nodes=1
      #SBATCH --ntasks-per-node=1
      #SBATCH --cpus-per-task=128     # full node
      #SBATCH ........

      export OMP_PROC_BIND=true
      export OMP_PLACES=threads       # or cores; the effect depends on the compiler, see below
      # XL:      threads (default) and cores behave the same, pinned to the 1st HW thread of each physical core
      # hpc-sdk: threads (default): pinned to the 1st HW thread of each physical core; cores: placed on all 4 HW threads of the physical cores
      # gnu:     threads (default) with OMP_PROC_BIND=close: placed on consecutive HW threads (4 per physical core);
      #          set OMP_PROC_BIND=spread to place one thread per physical core; cores: placed on all 4 HW threads of the physical cores
      export OMP_NUM_THREADS=32       # the OMP threads will be bound to the 32 physical cores

      <your exe>
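
To check where the threads actually end up, you can ask the OpenMP runtime to print its effective settings at startup. OMP_DISPLAY_ENV is a standard OpenMP (4.0+) environment variable, so this minimal sketch should work with all three compiler families:

      export OMP_DISPLAY_ENV=verbose  # the runtime prints the values of OMP_PROC_BIND, OMP_PLACES, OMP_NUM_THREADS, etc. at program startup
      <your exe>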

2) MPI parallelization

For pure MPI applications (hence, no OpenMP threads as parallel elements), set the value of --ntasks-per-node to the number of MPI processes you want to run per node, and --cpus-per-task=4, so that each MPI task is assigned the 4 HW threads of one physical core.
For instance, if you want to run an MPI application with 2 processes (no threads), each of them using 1 GPU:

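A minimal sketch of such a job script is shown below; the GPU request via --gres and the srun launch line are illustrative assumptions, to be adapted to your site's partition and accounting directives:

      #SBATCH --nodes=1
      #SBATCH --ntasks-per-node=2     # 2 MPI processes on the node
      #SBATCH --cpus-per-task=4       # one physical core (4 HW threads) per MPI task
      #SBATCH --gres=gpu:2            # assumption: 2 GPUs requested on the node, one per MPI task
      #SBATCH ........

      srun <your exe>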