...

Marconi100 login and compute nodes host four Tesla Volta (V100) GPUs per node (CUDA compute capability 7.0). The most recent versions of the NVIDIA CUDA toolkit and of the Community Edition PGI compilers (supporting CUDA Fortran) are available in the module environment, together with a set of GPU-enabled libraries, applications and tools.
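
A minimal sketch of picking up the GPU stack from the module environment (the module names and versions shown are assumptions; check "module avail" for what is actually installed):

$ module avail cuda pgi       # list the installed CUDA and PGI modules
$ module load cuda            # load the most recent CUDA toolkit
$ module load pgi             # load the PGI compilers (with CUDA Fortran support)
$ nvcc --version              # verify the CUDA compiler is on the PATH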

The topology of the nodes is reported below: the two Power9 CPUs (each with 16 physical cores) are connected by a 64 GB/s X Bus. Each CPU is connected to two GPUs via NVLink 2.0.

[Figure: node topology of a Marconi100 node]

The topology of the node devices can be visualized by running the command:

$ nvidia-smi topo -m

...

        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      NV3     SYS     SYS    0-63
GPU1    NV3      X      SYS     SYS    0-63
GPU2    SYS     SYS      X      NV3    64-127
GPU3    SYS     SYS     NV3      X     64-127
Legend:

...

From the output of the command it is possible to see that GPU0 and GPU1 are connected via NVLink, as are GPU2 and GPU3. The first pair has affinity with cpus 0-63, the second with cpus 64-127. The cpus are numbered from 0 to 127 because of four-way hyperthreading: 32 physical cores × 4 = 128 logical cpus.
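
Given this layout, it can pay off to pin a process to the cpus local to the GPU it uses. A minimal sketch with numactl and CUDA_VISIBLE_DEVICES (the cpu ranges follow the affinity shown above; verify them on the node with "numactl --hardware", and ./app is a placeholder executable):

$ numactl --hardware                                         # show NUMA nodes and their cpus
$ CUDA_VISIBLE_DEVICES=0 numactl --physcpubind=0-63 ./app    # run on GPU0 with its local cpus
$ CUDA_VISIBLE_DEVICES=2 numactl --physcpubind=64-127 ./app  # run on GPU2 with its local cpus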


Internode communication is based on a Mellanox InfiniBand EDR network, and the OpenMPI and IBM Spectrum MPI libraries are configured to exploit the Mellanox Fabric Collective Accelerators (also on CUDA memories) and Messaging Accelerators.
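
As a sketch of a CUDA-aware MPI launch (the module name is an assumption; consult the local documentation), Spectrum MPI enables GPU-buffer support through the -gpu option of mpirun:

$ module load spectrum_mpi       # module name assumed
$ mpirun -np 4 -gpu ./app        # Spectrum MPI: enable CUDA-aware transfers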

...

Details on how to use Scalasca are available at
http://www.scalasca.org/software/scalasca-2.x/documentation.html
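
As a quick orientation (a sketch; ./app is a placeholder and the experiment directory name is generated by the tool from the executable name and process count):

$ scalasca -analyze mpirun -np 4 ./app    # run the application and collect a runtime summary
$ scalasca -examine scorep_app_4_sum      # inspect the resulting experiment archive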

PGI: pgdbg (serial/parallel debugger)
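
A minimal invocation sketch (app.c and the executable name are placeholders; build with -g so the debugger sees the symbols):

$ pgcc -g -o app app.c     # compile with debug symbols
$ pgdbg ./app              # start the PGI debugger on the executable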

...