This document explains how to launch Totalview through an RCM session.

Cineca provides RCM, an easy tool for establishing a graphical session with our systems. Any software that comes with a graphical user interface (GUI) can be used within an RCM session. Totalview is no exception and can be used together with RCM to debug a parallel code. Unlike other GUIs that can be run on RCM, however, Totalview must be run directly on the nodes that execute the parallel code. The following sections detail how to set up a Totalview debugging session through RCM with a SLURM job.

Please refer to this page for instructions on how to use RCM; in most cases it is as simple as: 1) download the tool, 2) launch the executable.

Once you have established a connection through RCM with one of our systems, Marconi or Galileo, please follow the instructions below.

MARCONI

Once connected, you should have a desktop session open. Open a terminal from "Applications -> System Tools -> Terminal". A terminal window pops up, and you can use it as you normally would with an ssh connection. The steps below cover the operations required to launch a Totalview job.


1) Get the DISPLAY number

On a terminal session within RCM type the command:

> echo $DISPLAY

   :8

This returns the display number to use for connecting your Totalview job to the RCM session.
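
The batch script in the next step expects the display in the form hostname:number, as in r161c001s02:8. As a minimal sketch, assuming the RCM desktop runs on the same node as the terminal you are typing in, the value can be assembled from the node name and $DISPLAY. The helper name full_display is hypothetical, and the ":0" fallback is there only so the last line runs when DISPLAY is unset.

```shell
# Hypothetical helper: join a node name and an X display into "host:N".
# A domain suffix on the host and a host part on the display are dropped.
full_display() {
    host="${1%%.*}"      # "r161c001s02.some.domain" -> "r161c001s02"
    disp="${2##*:}"      # ":8" or "somehost:8" -> "8"
    printf '%s:%s\n' "$host" "$disp"
}

# Assumption: run inside the RCM terminal, where DISPLAY is already set.
full_display "$(uname -n)" "${DISPLAY:-:0}"
```

The printed string is the value to put in the export DISPLAY line of the batch script.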


2) Prepare a batch script (job.sh)

#!/bin/bash
#SBATCH -e totaljob.err
#SBATCH -o totaljob.out
#SBATCH -A <your_account>
#SBATCH -N 1
#SBATCH -t 00:10:00
#SBATCH -p skl_usr_dbg


module load autoload intelmpi
module load totalview

# Set DISPLAY to the display opened by the RCM session. This is just an example: use your own hostname and display number.

export DISPLAY="r161c001s02:8"

totalview srun ./my_executable


In the above example, the "export DISPLAY" line tells the Totalview user interface to open on the current VNC session (opened automatically by RCM). Please refer to step 1 for how to get the correct DISPLAY number.
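
A mistyped DISPLAY value is an easy mistake when copying it by hand. A small guard, sketched below, could be placed in job.sh before the totalview line. The function name check_display is hypothetical, and the check only looks at the shape of the value (hostname:number, as used on this page), not at whether the display is actually reachable.

```shell
# Hypothetical guard: succeed only if the value looks like "hostname:N".
check_display() {
    case "$1" in
        *:*) ;;                       # must contain a colon
        *) return 1 ;;
    esac
    host="${1%%:*}"                   # text before the first colon
    num="${1#*:}"                     # text after the first colon
    [ -n "$host" ] || return 1        # ":8" alone has no hostname
    case "$num" in
        ''|*[!0-9]*) return 1 ;;      # display number must be digits only
    esac
    return 0
}

# Example with the value used in the script above.
if check_display "r161c001s02:8"; then
    echo "DISPLAY looks valid"
else
    echo "DISPLAY should be hostname:number" >&2
fi
```

Failing early in this way avoids spending a job slot on a session whose window can never open.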

3) Submit the job

Now you can submit the above script to the SLURM scheduler. Once the job starts running, the Totalview user interface will pop up and you will be able to debug your code:

> sbatch job.sh
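
While waiting for the Totalview window, the job state can be followed with squeue. sbatch normally answers with a line of the form "Submitted batch job <id>"; the sketch below extracts the ID from that line. The helper name job_id_from_sbatch is hypothetical, and the commented lines assume that sbatch and squeue are available in the RCM terminal.

```shell
# Hypothetical helper: read sbatch output on stdin and print the job ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Self-contained demonstration with a sample line instead of a real submission.
echo "Submitted batch job 123456" | job_id_from_sbatch

# On the cluster (not run here):
#   jobid=$(sbatch job.sh | job_id_from_sbatch)
#   squeue -j "$jobid"      # state R means the job is running
```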

GALILEO100

As in the example above, once connected to GALILEO100 with RCM, open a terminal (start -> terminal). Then follow the instructions below.

1) Set up the .tvdrc file (first time only)

The first time you establish a Totalview session, a folder named .totalview is created in your $HOME (it is hidden, so the standard "ls" command does not show it; add the -a flag to list hidden files and directories). Inside it, create a text file named .tvdrc containing the following lines, which are also documented in the official Slurm manual:

dset -set_as_default TV::bulk_launch_enabled true
dset -set_as_default TV::bulk_launch_string {srun --mem-per-cpu=0 -N%N -n%N -w`awk -F. 'BEGIN {ORS=","} {if (NR==%N) ORS=""; print $1}' %t1` -l --input=none %B/tvdsvr%K -callback_host %H -callback_ports %L -set_pws %P -verbosity %V -working_directory %D %F}
dset -set_as_default TV::bulk_launch_tmpfile1_host_lines {%R}
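
The file can also be created from the terminal. The sketch below writes the three lines above into a .tvdrc file only if it does not exist yet, so an existing configuration is never overwritten. The function name write_tvdrc and its directory argument are hypothetical; on GALILEO100 the directory would be $HOME/.totalview, which the helper creates if it is still missing.

```shell
# Hypothetical helper: create <dir>/.tvdrc with the bulk-launch settings,
# unless the file is already there.
write_tvdrc() {
    dir="$1"
    mkdir -p "$dir" || return 1
    [ -e "$dir/.tvdrc" ] && return 0     # keep an existing file untouched
    # Quoted delimiter: the shell must not expand the backticks, $1 or % fields.
    cat > "$dir/.tvdrc" <<'EOF'
dset -set_as_default TV::bulk_launch_enabled true
dset -set_as_default TV::bulk_launch_string {srun --mem-per-cpu=0 -N%N -n%N -w`awk -F. 'BEGIN {ORS=","} {if (NR==%N) ORS=""; print $1}' %t1` -l --input=none %B/tvdsvr%K -callback_host %H -callback_ports %L -set_pws %P -verbosity %V -working_directory %D %F}
dset -set_as_default TV::bulk_launch_tmpfile1_host_lines {%R}
EOF
}

# Usage on the cluster (not run here):
#   write_tvdrc "$HOME/.totalview"
#   ls -a "$HOME/.totalview"
```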

2) Prepare the job (job.sh script)

#!/bin/bash

#SBATCH -t 30:00
#SBATCH -N 1
#SBATCH -o totaljob.out
#SBATCH -e totaljob.err
#SBATCH -A <your_account>
#SBATCH -p g100_usr_prod
 
module load totalview

tvconnect srun ./your_executable

3) Submit the job

> sbatch job.sh

4) Open a Totalview terminal

In the RCM terminal, load the Totalview module and run "totalview" to open the GUI. When the job starts, a prompt will ask you to connect to it, and you will see that the tool is debugging the "srun" command.

5) Launch the simulation

Press the green "Go" button to launch the simulation. Eventually, a prompt will ask you if you want to stop the parallel job: if you choose "Yes", you will finally see the main code of the executable you want to debug and you can start working on it.



PS: In a terminal opened inside RCM, the shortcut to paste text copied elsewhere is "Ctrl+Shift+Insert"

Note: To use RCM on Mac OS, please follow this link: RCM-CLIENT ON MAC OS
