...

Datamovers are dedicated nodes on each HPC cluster that are designed for transferring data FROM/TO a cluster.

Hostnames and IPs

Marconi:

  • alias: data.marconi.cineca.it
  • hostnames and IPs:
    • dmover1.marconi.cineca.it - 130.186.17.140
    • dmover2.marconi.cineca.it - 130.186.17.141

...

  • alias: data.leonardo.cineca.it
  • hostnames and IPs:
    • dmover1.leonardo.cineca.it - 131.175.44.50
    • dmover2.leonardo.cineca.it - 131.175.44.51
    • dmover3.leonardo.cineca.it - 131.175.44.53

Main features

This transfer service is containerized, so these nodes behave differently from the login nodes in several important ways.

...

  1. If you want to transfer files FROM/TO your $HOME directory, you have to specify the absolute path; the same holds for your $WORK and $CINECA_SCRATCH filesystems.
  2. You cannot use the SSH configuration stored in your remote ~/.ssh/ directory (~/.ssh/config).
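To illustrate point 1, a minimal sketch; `<username>`, `<cluster_name>`, and all paths below are placeholders, not real CINECA paths:

```shell
# Illustrative only: <username>, <cluster_name> and the paths are placeholders.
# First resolve the absolute path of $HOME (or $WORK, $CINECA_SCRATCH) once
# from a login node:
#   $ ssh <username>@login.<cluster_name>.cineca.it 'echo $HOME $WORK $CINECA_SCRATCH'
# Then always pass that absolute path to the datamover; "~" and relative
# paths will not resolve there:
#   $ scp report.pdf <username>@data.<cluster_name>.cineca.it:/absolute/path/of/home/reports/
```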

Listing Directory via sftp

If you need to list files on a cluster whose login nodes are offline, you can rely on the datamover service via the sftp command:

...

$ sftp fcola000@data.m100.cineca.it:/m100_scratch/userinternal/fcola000/ 
Connected to data.m100.cineca.it.
Changing to: /m100_scratch/userinternal/fcola000/
sftp> cd cuda
sftp> ls -l
drwxr-xr-x    3 fcola000 interactive     4096 Feb 11  2021 targets
sftp> pwd
Remote working directory: /m100_scratch/userinternal/fcola000/cuda
sftp> 

Available transfer tools

rsync

There are 2 possible ways to use rsync via datamovers:

  1. You need to upload or download data FROM/TO your local machine TO/FROM a CINECA HPC cluster
    $ rsync -PravzHS /absolute/path/from/file <username>@data.<cluster_name>.cineca.it:/absolute/path/to/
    $ rsync -PravzHS  <username>@data.<cluster_name>.cineca.it:/absolute/path/from/file /absolute/path/to/
  2. You need to transfer files between 2 CINECA HPC clusters
    $ ssh -xt <username>@data.<cluster_name_1>.cineca.it rsync -PravzHS /absolute/path/from/file <username>@data.<cluster_name_2>.cineca.it:/absolute/path/to/
    $ ssh -xt <username>@data.<cluster_name_1>.cineca.it rsync -PravzHS <username>@data.<cluster_name_2>.cineca.it:/absolute/path/from/file /absolute/path/to/

scp

There are 3 possible ways to use scp via datamovers:

  1. You need to upload or download data FROM/TO your local machine TO/FROM a CINECA HPC cluster
    $ scp /absolute/path/from/file <username>@data.<cluster_name>.cineca.it:/absolute/path/to/
    $ scp  <username>@data.<cluster_name>.cineca.it:/absolute/path/from/file /absolute/path/to/


  2. You need to transfer files between 2 CINECA HPC clusters
    $ ssh -xt <username>@data.<cluster_name_1>.cineca.it scp /absolute/path/from/file <username>@data.<cluster_name_2>.cineca.it:/absolute/path/to/
    $ ssh -xt <username>@data.<cluster_name_1>.cineca.it scp <username>@data.<cluster_name_2>.cineca.it:/absolute/path/from/file /absolute/path/to/
  3. You need to transfer files between 2 CINECA HPC clusters using your local machine as a bridge. We strongly suggest not using this option: it has very low transfer performance, because each file you move from one cluster to the other passes through your local machine
    $ scp -3 <username>@data.<cluster_name_1>.cineca.it:/absolute/path/from/file <username>@data.<cluster_name_2>.cineca.it:/absolute/path/to/file

sftp

There are 2 possible ways to use sftp via datamovers:

...

It is also possible to use the -b flag and execute sftp in batch mode (to be tested).
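A possible shape for batch mode, untested against the datamovers as noted above; `<username>`, `<cluster_name>`, and the remote paths are placeholders:

```shell
# Write the sftp commands to a batch file...
cat > transfer.batch <<'EOF'
cd /m100_scratch/userinternal/<username>
ls -l
get results.tar.gz
EOF
# ...then run them non-interactively with -b (placeholder host/user):
#   $ sftp -b transfer.batch <username>@data.<cluster_name>.cineca.it
```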

wget

Sometimes, the 10-minute CPU time limit or the 4-hour wall time limit on the serial queue is not enough to download a large dataset for ML. In this case, you can use wget from the datamover. Here you can find a simple example:

...

Please note that it is mandatory to use the -P flag with the absolute path of the destination folder, because of the fake /home directory.
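For instance, a hypothetical invocation run on the datamover over SSH; the URL and the destination path are placeholders:

```shell
# Hypothetical: the URL and the destination path below are placeholders.
#   $ ssh <username>@data.<cluster_name>.cineca.it \
#         "wget https://example.com/dataset.tar.gz -P /absolute/path/of/scratch/datasets/"
# Without -P and an absolute path, the file would land in the node's fake
# /home directory and be lost.
```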

curl

Sometimes, the 10-minute CPU time limit or the 4-hour wall time limit on the serial queue is not enough to download a large dataset for ML. In this case, you can use curl from the datamover. Here you can find a simple example:

...

Please note that it is mandatory to use the --output flag with the absolute path of the destination file, because of the fake /home directory.
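The --output behaviour can be tried locally with a file:// URL; a runnable sketch using throwaway /tmp paths (nothing CINECA-specific):

```shell
# Local demo of curl --output with an absolute destination path.
echo "dataset bytes" > /tmp/curl_src.bin
# --create-dirs makes the destination directory if it does not exist yet.
curl -s --create-dirs file:///tmp/curl_src.bin --output /tmp/curl_demo/dataset.bin
cat /tmp/curl_demo/dataset.bin   # prints "dataset bytes"
```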

GridFTP

The datamover nodes of each CINECA HPC cluster host the GridFTP servers. On the datamovers we have enabled authentication via SSH as an alternative to the x509 certificate. Keep in mind that SSH authentication is based on the SSH certificate obtained via 2-Factor authentication.
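A sketch of what an SSH-authenticated GridFTP transfer might look like with the globus-url-copy client (sshftp:// is the SSH-based Globus transport); the host, user, and paths are placeholders:

```shell
# Hypothetical sketch, assuming the Globus globus-url-copy client is installed
# locally; <username>, <cluster_name> and the paths are placeholders.
#   $ globus-url-copy -vb -p 4 \
#         sshftp://<username>@data.<cluster_name>.cineca.it/absolute/path/from/file \
#         file:///absolute/path/to/file
# -vb prints transfer performance; -p 4 uses 4 parallel data streams.
```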

...