...
Marconi100 login and compute nodes host four Tesla Volta (V100) GPUs per node (CUDA compute capability 7.0). The most recent versions of the NVIDIA CUDA toolkit and of the PGI Community Edition compilers (supporting CUDA Fortran) are available in the module environment, together with a set of GPU-enabled libraries, applications and tools.
The topology of the nodes is reported below: the two Power9 CPUs (each with 16 physical cores) are connected by a 64 GB/s X bus. Each of them is connected to two GPUs via NVLink 2.0.
The topology of the node can be visualized by running the nvidia-smi command as follows:
$ nvidia-smi topo -m
...
GPU0 X NV3 SYS SYS 0-63
GPU1 NV3 X SYS SYS 0-63
GPU2 SYS SYS X NV3 64-127
GPU3 SYS SYS NV3 X 64-127
Legend:
...
From the output of the command it is possible to see that GPU0 and GPU1 are connected via NVLink, as are GPU2 and GPU3. The first pair has affinity with cpus 0-63, the second with cpus 64-127. The cpus are numbered from 0 to 127 because of four-way hyperthreading: 32 physical cores x 4 = 128 logical cpus, numbered 0-127.
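Because of this topology, it can pay off to bind each process to the cores on the same socket as the GPUs it uses. A minimal sketch using CUDA_VISIBLE_DEVICES and taskset (the binary name ./myapp is a placeholder; on a batch system the scheduler's own binding options may be preferable):

```shell
# GPU0/GPU1 are attached to the first socket (cpus 0-63),
# GPU2/GPU3 to the second (cpus 64-127): run one process per
# GPU pair, pinned to the cores local to those GPUs.
CUDA_VISIBLE_DEVICES=0,1 taskset -c 0-63   ./myapp &
CUDA_VISIBLE_DEVICES=2,3 taskset -c 64-127 ./myapp &
wait
```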
Internode communications are based on a Mellanox InfiniBand EDR network, and the OpenMPI and IBM Spectrum MPI libraries are configured so as to exploit the Mellanox Fabric Collective Accelerators (also on CUDA memories) and Messaging Accelerators.
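With a CUDA-aware MPI build, one common pattern on this node layout is to start one rank per GPU, two per socket, so that each rank sits next to its GPU. A hedged launch sketch using standard Open MPI mapping options (./myapp is a placeholder):

```shell
# 4 ranks per node, mapped 2 per socket and bound to the socket,
# so each rank stays on the cores local to one of its two GPUs.
mpirun -np 4 --map-by ppr:2:socket --bind-to socket ./myapp
```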
...
Details on how to use Scalasca can be found at:
http://www.scalasca.org/software/scalasca-2.x/documentation.html
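A typical Scalasca 2.x workflow consists of instrumenting at build time, running under measurement, and examining the resulting report; a sketch (source, rank count and experiment-directory name are illustrative, see the documentation linked above):

```shell
scalasca -instrument mpicc -o myapp myapp.c   # build with instrumentation
scalasca -analyze mpirun -np 4 ./myapp        # run with runtime measurement
scalasca -examine scorep_myapp_4_sum          # inspect the generated report
```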
PGI: pgdbg (serial/parallel debugger)
...