Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...


Total Dimension (TB)

Quota (GB)

Notes

$HOME20050
  • permanent/backed up, user specific, local
$CINECA_SCRATCH2.500200no quota
  • temporary, user specific, local
  • no backup
  • automatic cleaning procedure of data older than 40 days (time interval can be reduced in case of critical usage ratio of the area. In this case, users will be notified via HPC-News)
$WORK75.1009001.000
  • permanent, project specific, local
  • no backup
  • extensions can be considered if needed (mailto: superc@cineca.it)

...

> srun ./myprogram

or

> mpirun ./myprogram

The srun command will take by default PMI2 as MPI type.


Please note that

1) The recommended way to launch parallel tasks in slurm jobs is with srun. By using srun vs mpirun you will get full support for process tracking, accounting, task affinity, suspend/resume and other features.

...

Users with exhausted but still active projects are allowed to keep using the cluster resources, even if at a very low priority, by adding the  "qos_lowprio" flag to their job;:

#SBATCH --qos=qos_lowprio(inserire la richiesta di QOS)


Summary


In the following table you can find all the main features and limits imposed on the queues/Partitions of M100. 

...

SLURM

partition

QOS# cores per job
max walltime

max running jobs per user/

max n. of cpus/nodes per user

max memory per node

(MB)

prioritynotes

m100_all_serial

(default partition)

m100_all_serial

max = 6

(max mem= 18000 MB)

04:00:00

6 cpus


 700040

 qos_rcm

min = 1

max = 4832

03:00:001/4832

182000


- 

to be defined









m100_usr_dbgm100_qos_dbg

max = 2 nodes

02:00:0024 nodes18200045runs on 24 dedicated nodes
m100_usr_prodm100_usr_prod

min = 1 node

max = 16 nodes

1-00:00:0016 nodes18200040

m100_qos_bprod

min = 17 nodes

max = 256 nodes

1-00:00:00

1/256

1 jobs per account

18200050

#SBATCH -p skl_usr_prod

#SBATCH --qos=skl_qos_bprod


qos_special>256 nodes

>24:00:00

(max = 64 nodes for user)


                          182000        40

#SBATCH --qos=qos_special

request to superc@cineca.it

qos_lowpriomax = 64 nodes24:00:0064 nodes1820000#SBATCH --qos=qos_lowprio
m100_usr_preempt
max = 16 nodes08:00:00

10
m100_fua_prodm100_fua_prodmax = 16 nodes1-00:00:00

60

m100_qos_fuadbgmax = 2 nodes02:00:00

65

...

and checking the "compilers" section. The available compilers are:

  • CUDAXLXL
  • PGI
  • GNU
  • CUDAPGI

CUDA

Compute Unified Device Architecture is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs. 

In GPU-accelerated applications, the sequential part of the workload runs on the CPU – which is optimized for single-threaded performance – while the compute intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python and MATLAB and express parallelism through extensions in the form of a few basic keywords. We refer to the the NVIDIA CUDA Parallel Computing Platform documentation.

...

XL 

The XL compiler family offers C, C++, and Fortran compilers designed for optimization and improvement of code generation, exploiting the inherent opportunities in Power Architecture.

...

GNU compilers

The gnu compilers are always available but they are not the best optimizing compilers. You do not need to load the module for using them.

The name of the GNU compilers are:

  • g77: Fortran77 compiler
  • gfortran: Fortran95 compiler
  • gcc: C compiler
  • g++: C++ compiler

The documentation can be obtained with the man command:

> man gfortan
> man gcc

Some miscellanous flags are described in the following:

-ffixed-line-length-132       To extend over the 77 column F77's limit
-ffree-form / -ffixed-form    Free/Fixed form for Fortran

PORTLAND Group (PGI)

Initialize the environment with the module command:

> module load profile/advanced
> module load pgi

The name of the PGI compilers are:

  • pgf77: Fortran77 compiler
  • pgf90: Fortran90 compiler
  • pgf95: Fortran95 compiler
  • pghpf: High Performance Fortran compiler
  • pgcc: C compiler
  • pgCC: C++ compiler

The documentation can be obtained with the man command after loading the relevant module:

> man pgf95
> man pgcc

Some miscellanous flags are described in the following:

-Mextend            To extend over the 77 column F77's limit
-Mfree / -Mfixed    Free/Fixed form for Fortran
-fast               Chooses generally optimal flags for the target platform
-fastsse            Chooses generally optimal flags for a processor that supports SSE instructions

For users running a program first time 

For program optimization 

...




XL 

The XL compiler family offers C, C++, and Fortran compilers designed for optimization and improvement of code generation, exploiting the inherent opportunities in Power Architecture.

...


PORTLAND Group (PGI)


Initialize the environment with the module command:

> module load pgi


The name of the PGI compilers are:


  • pgf77: Fortran77 compiler
  • pgf90: Fortran90 compiler
  • pgf95: Fortran95 compiler
  • pghpf: High Performance Fortran compiler
  • pgcc: C compiler
  • pgCC: C++ compiler


The documentation can be obtained with the man command after loading the relevant module:


> man pgf95
> man pgcc


Some miscellanous flags are described in the following:


-Mextend            To extend over the 77 column F77's limit
-Mfree / -Mfixed    Free/Fixed form for Fortran
-fast               Chooses generally optimal flags for the target platform
-fastsse            Chooses generally optimal flags for a processor that supports SSE instructions


GNU compilers

The gnu compilers are always available but they are not the best optimizing compilers, it ensures the maximum portability. You do not need to load the module for using them.

Initialize the environment with the module command:

> module load gnu

The name of the GNU compilers are:

  • g77: Fortran77 compiler
  • gfortran: Fortran95 compiler
  • gcc: C compiler
  • g++: C++ compiler

The documentation can be obtained with the man command:

> man gfortan
> man gcc

Some miscellanous flags are described in the following:

-ffixed-line-length-132       To extend over the 77 column F77's limit
-ffree-form / -ffixed-form    Free/Fixed form for Fortran


CUDA

Compute Unified Device Architecture is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs. 

In GPU-accelerated applications, the sequential part of the workload runs on the CPU – which is optimized for single-threaded performance – while the compute intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python and MATLAB and express parallelism through extensions in the form of a few basic keywords. We refer to the the NVIDIA CUDA Parallel Computing Platform documentation.

...

Debugger and Profilers

If at runtime your code dies, then there is a problem. In order to solve it, you can decide to analyze the core file (core not available with PGI compilers) or to run your code using the debugger.


Compiler flags

Whatever your decision, in any case you need to enable compiler runtime checks, by putting specific flags during the compilation phase. In the following we describe those flags for the different Fortran compilers: if you are using the C or C++ compiler, please check before because the flags may differ.

The following flags are generally available for all compilers and are mandatory for an easier debugging session:

-O0     Lower level of optimization
-g      Produce debugging information

Other flags are compiler specific and are described in the following:

XL Fortran compiler

to be added...

PORTLAND Group (PGI) Compilers

The following flags are useful (in addition to "-O0 -g") for debugging your code:

-C                     Add array bounds checking
-Ktrap=ovf,divz,inv    Controls the behavior of the processor when exceptions occur: 
                       FP overflow, divide by zero, invalid operands
GNU Fortran compilers

The following flags are useful (in addition to "-O0 -g")for debugging your code:

-Wall             Enables warnings pertaining to usage that should be avoided
-fbounds-check    Checks for array subscripts.


Debuggers available

Totalview 

The TotalView debugger is a programmable tool that lets you debug, analyze, and tune the performance of complex serial, multiprocessor, and multithreaded programs.
TotalView has many features and it gives you a great number of tools for finding your program's problems.

Details on how to use totalview are in 

https://docs.roguewave.com/en/totalview/current/html/

Scalasca

Scalasca is a tool for profiling parallel scientific and engineering applications that make use of MPI and OpenMP.

Details how to use scalasca in
http://www.scalasca.org/software/scalasca-2.x/documentation.html

PGI: pgdbg (serial/parallel debugger)

pgdbg is the Portland Group Inc. symbolic source-level debugger for F77, F90, C, C++ and assembly language programs. It is capable of debugging applications that exhibit various levels of parallelism.

GNU: gdb (serial debugger)

GDB is the GNU Project debugger and allows you to see what is going on 'inside' your program while it executes -- or what the program was doing at the moment it crashed.

VALGRIND

Valgrind is a framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. The Valgrind distribution currently includes six production-quality tools: a memory error detector, two thread error detectors, a cache and branch-prediction profiler, a call-graph generating cache profiler, and a heap profiler.

Valgrind is Open Source / Free Software, and is freely available under the GNU General Public License, version 2.

Profilers (gprof)

In software engineering, profiling is the investigation of a program's behavior using information gathered as the program executes. The usual purpose of this analisys is to determine which sections of a program to optimize - to increase its overall speed, decrease its memory requirement or sometimes both.

A (code) profiler is a performance analisys tool that, most commonly, measures only the frequency and duration of function calls, but there are other specific types of profilers (e.g. memory profilers) in addition to more comprehensive profilers, capable of gathering extensive performance data.

gprof

The GNU profiler gprof is a useful tool for measuring the performance of a program. It records the number of calls to each function and the amount of time spent there, on a per-function basis. Functions which consume a large fraction of the run-time can be identified easily from the output of gprof. Efforts to speed up a program should concentrate first on those functions which dominate the total run-time.

gprof uses data collected by the -pg compiler flag to construct a text display of the functions within your application (call tree and CPU time spent in every subroutine). It also provides quick access to the profiled data, which let you identify the functions that are the most CPU-intensive. The text display also lets you manipulate the display in order to focus on the application's critical areas.

Usage:

>  gfortran -pg -O3 -o myexec myprog.f90
> ./myexec
> ls -ltr
   .......
   -rw-r--r-- 1 aer0 cineca-staff    506 Apr  6 15:33 gmon.out
> gprof myexec gmon.out

It is also possible to profile at code line-level (see "man gprof" for other options). In this case you must use also the “-g” flag at compilation time:

>  gfortran -pg -g -O3 -o myexec myprog.f90
> ./myexec
> ls -ltr
   .......
   -rw-r--r-- 1 aer0 cineca-staff    506 Apr  6 15:33 gmon.out
> gprof -annotated-source myexec gmon.out


It is possilbe to profile MPI programs. In this case the environment variable GMON_OUT_PREFIX must be defined in order to allow to each task to write a different statistical file. Setting

export GMON_OUT_PREFIX=<name>

 once the run is finished each task will create a file with its process ID (PID) extension

<name>.$PID

 If the environmental variable is not set every task will write the same gmon.out file.


Scientific libraries