Datamovers are dedicated nodes on each HPC cluster that are designed for transferring data FROM/TO a cluster.
Hostnames and IPs
Marconi:
- alias: data.marconi.cineca.it
- hostnames and IPs:
- dmover1.marconi.cineca.it - 130.186.17.140
- dmover2.marconi.cineca.it - 130.186.17.141
Galileo100:
- alias: data.g100.cineca.it
- hostnames and IPs:
- dmover1.g100.cineca.it - 130.186.16.212
- dmover2.g100.cineca.it - 130.186.16.213
Leonardo:
- alias: data.leonardo.cineca.it
- hostnames and IPs:
- dmover1.leonardo.cineca.it - 131.175.44.50
- dmover2.leonardo.cineca.it - 131.175.44.51
- dmover3.leonardo.cineca.it - 131.175.44.52
- dmover4.leonardo.cineca.it - 131.175.44.53
Main features
This transfer service is containerized, and there are many differences between these nodes and the login nodes.
First of all, on datamovers there is no CPU time limit, which allows long data transfers. By contrast, login nodes enforce a 10-minute CPU time limit that usually interrupts the transfer of large amounts of data.
By construction, no shell is available, so it is not possible to open interactive sessions: you cannot log directly into a datamover via SSH. The only available commands are scp, rsync, sftp, wget, and curl.
However, authentication still relies on the SSH protocol. There are only 2 possible authentication methods:
- publickey: only valid SSH certificates, obtained via 2-Factor Authentication, are accepted. Private/public keys generated in any other way are not accepted.
- hostbased: if you are already logged into a CINECA HPC cluster and use a datamover from a login node, the SSH daemon on the datamover recognizes that you are already authenticated on a CINECA HPC cluster, and that is enough.
Remark: host-based authentication is not enabled inside a batch job. If you want to use a datamover inside a batch job, you have to copy a valid 2FA SSH certificate into your ~/.ssh directory on the cluster where you submit the job, as in the sketch below.
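As a minimal sketch (assuming a SLURM scheduler; the job name, account, paths and cluster name are placeholders to adapt, and a valid 2FA SSH certificate is already present in ~/.ssh), a batch job could use the datamover like this:
#!/bin/bash
#SBATCH --job-name=dm_copy
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --account=<your_account>
# Host-based authentication is not available inside the job:
# the 2FA SSH certificate previously copied into ~/.ssh is used instead.
rsync -PravzHS /absolute/path/to/results/ <username>@data.<cluster_name>.cineca.it:/absolute/path/to/destination/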
Important: When you are authenticated on a datamover, the environment variables $HOME, $WORK and $CINECA_SCRATCH (as well as ~ or * ) are not defined.
This property has 2 side effects:
- if you want to transfer files FROM/TO your personal areas on a cluster, you have to specify their absolute paths (see the example below).
- you cannot make use of the SSH configuration files stored in your remote ~/.ssh/ directory (such as $HOME/.ssh/config).
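For example (placeholder username and paths), a destination written with ~ is not resolved on the datamover, so the first command below fails while the second one works:
$ scp results.tar.gz <username>@data.<cluster_name>.cineca.it:'~/results.tar.gz'
$ scp results.tar.gz <username>@data.<cluster_name>.cineca.it:/absolute/path/to/your/area/results.tar.gz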
Listing directories via sftp
If you need to list files on a cluster whose login nodes are offline, you can rely on the datamover service via the sftp command:
$ sftp <username>@data.<cluster_name>.cineca.it:/path/to/be/listed/
Connected to data.<cluster_name>.cineca.it
Changing to: /path/to/be/listed/
sftp>
Once inside the sftp session, the familiar pwd, cd /path/to/, and ls commands are available to explore the remote filesystem, together with their local counterparts lpwd, lcd /path/to/, and lls. You can also transfer data from within the sftp session; see the appropriate section below.
At present the M100 login nodes are offline, so you need to resort to the sftp command to explore the M100 filesystem. For instance:
$ sftp fcola000@data.m100.cineca.it:/m100_scratch/userinternal/fcola000/
Connected to data.m100.cineca.it.
Changing to: /m100_scratch/userinternal/fcola000/
sftp> cd cuda
sftp> ls -l
drwxr-xr-x 3 fcola000 interactive 4096 Feb 11 2021 targets
sftp> pwd
Remote working directory: /m100_scratch/userinternal/fcola000/cuda
sftp>
Available transfer tools
rsync
There are 2 possible ways to use rsync via datamovers:
- You need to upload or download data FROM/TO your local machine TO/FROM a CINECA HPC cluster
$ rsync -PravzHS /absolute/path/from/file <username>@data.<cluster_name>.cineca.it:/absolute/path/to/
$ rsync -PravzHS <username>@data.<cluster_name>.cineca.it:/absolute/path/from/file /absolute/path/to/
- You need to transfer files between 2 CINECA HPC clusters
$ ssh -xt <username>@data.<cluster_name_1>.cineca.it rsync -PravzHS /absolute/path/from/file <username>@data.<cluster_name_2>.cineca.it:/absolute/path/to/
$ ssh -xt <username>@data.<cluster_name_1>.cineca.it rsync -PravzHS <username>@data.<cluster_name_2>.cineca.it:/absolute/path/from/file /absolute/path/to/
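For instance, using the aliases listed above, transferring a directory from Marconi to Galileo100 (illustrative paths) looks like:
$ ssh -xt <username>@data.marconi.cineca.it rsync -PravzHS /absolute/path/on/marconi/dir/ <username>@data.g100.cineca.it:/absolute/path/on/g100/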
scp
There are 3 possible ways to use scp via datamovers:
- You need to upload or download data FROM/TO your local machine TO/FROM a CINECA HPC cluster
$ scp /absolute/path/from/file <username>@data.<cluster_name>.cineca.it:/absolute/path/to/
$ scp <username>@data.<cluster_name>.cineca.it:/absolute/path/from/file /absolute/path/to/
- You need to transfer files between 2 CINECA HPC clusters
$ ssh -xt <username>@data.<cluster_name_1>.cineca.it scp /absolute/path/from/file <username>@data.<cluster_name_2>.cineca.it:/absolute/path/to/
$ ssh -xt <username>@data.<cluster_name_1>.cineca.it scp <username>@data.<cluster_name_2>.cineca.it:/absolute/path/from/file /absolute/path/to/
- You need to transfer files between 2 CINECA HPC clusters using your local machine as a bridge. We strongly advise against this option because it has very low transfer performance: every file you move from one cluster to the other passes through your local machine
$ scp -3 <username>@data.<cluster_name_1>.cineca.it:/absolute/path/from/file <username>@data.<cluster_name_2>.cineca.it:/absolute/path/to/file
sftp
There are 2 possible ways to use sftp via datamovers:
- You need to upload or download data FROM/TO your local machine TO/FROM a CINECA HPC cluster
$ sftp <username>@data.<cluster_name>.cineca.it:/absolute/remote/path/to/
sftp> put relative/local/path/to/file
Uploading /absolute/local/path/to/file to /absolute/remote/path/to/file
file 100% 414 365.7KB/s 00:00
sftp> get relative/remote/path/to/file
Fetching /absolute/remote/path/to/file to file
file 100% 1455KB 19.0MB/s 00:00
sftp>
- You need to transfer files between 2 CINECA HPC clusters
$ ssh -xt <username>@data.<cluster_name_1>.cineca.it sftp <username>@data.<cluster_name_2>.cineca.it:/absolute/path/to/
It is also possible to use the flag -b to execute sftp in batch mode, as sketched below.
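A minimal sketch of batch mode follows (the file name transfer.batch and its contents are just an example): the batch file lists one sftp command per line, and sftp executes them non-interactively, authenticating with your SSH certificate.
$ cat transfer.batch
put /absolute/local/path/to/file /absolute/remote/path/to/
get /absolute/remote/path/to/otherfile /absolute/local/path/to/
$ sftp -b transfer.batch <username>@data.<cluster_name>.cineca.it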
wget
Sometimes, the 10-minute CPU time limit or the 4-hour wall time limit on the serial queue is not enough to download a large dataset for ML. In this case, you can use wget from the datamover. Here is a simple example:
$ ssh -xt <username>@data.<cluster_name>.cineca.it wget http://ftp.gnu.org/gnu/wget/wget2-2.0.0.tar.gz -P /absolute/path/to/
Please note that it is mandatory to use the flag -P with the absolute path of the destination folder, because of the fake /home directory.
curl
Sometimes, the 10-minute CPU time limit or the 4-hour wall time limit on the serial queue is not enough to download a large dataset for ML. In this case, you can use curl from the datamover. Here is a simple example:
$ ssh -xt <username>@data.<cluster_name>.cineca.it curl https://curl.se/download/curl-8.2.1.tar.gz --output /absolute/path/to/curl-8.2.1.tar.gz
Please note that it is mandatory to use the flag --output with the absolute path of the destination file, because of the fake /home directory.
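If you have several URLs to download, you can loop over them on your local machine (or on a login node) and invoke the datamover once per file; the URLs and destination path below are placeholders:
$ for url in https://example.org/data/part1.tar.gz https://example.org/data/part2.tar.gz; do ssh -xt <username>@data.<cluster_name>.cineca.it curl "$url" --output /absolute/path/to/"$(basename "$url")"; done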
GridFTP
The datamover nodes of each CINECA HPC cluster host the GridFTP servers. On the datamovers we have enabled authentication via SSH as an alternative to the x509 certificate. Of course, keep in mind that the SSH authentication is based on the SSH certificate obtained via 2-Factor Authentication.
Transfer data between 2 CINECA HPC clusters
$ globus-url-copy -vb -cd -r sshftp://<username>@gftp.<cluster_name_1>.cineca.it:22/absolute/path/from/directory/ sshftp://<username>@gftp.<cluster_name_2>.cineca.it:22/absolute/path/to/
$ globus-url-copy -vb -cd -r gsiftp://<username>@gftp.<cluster_name_1>.cineca.it:2811/absolute/path/from/directory/ gsiftp://<username>@gftp.<cluster_name_2>.cineca.it:2811/absolute/path/to/
CAVEAT: at present the gsiftp mode is not available on Leonardo.
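For instance, given the caveat above, a transfer from Leonardo to Galileo100 must use the sshftp protocol on the Leonardo side (the hostnames below simply follow the gftp.<cluster_name>.cineca.it pattern; paths are placeholders):
$ globus-url-copy -vb -cd -r sshftp://<username>@gftp.leonardo.cineca.it:22/absolute/path/from/directory/ sshftp://<username>@gftp.g100.cineca.it:22/absolute/path/to/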
Transfer data FROM/TO local machine TO/FROM a CINECA HPC cluster
$ globus-url-copy -vb -cd -r /absolute/path/from/directory/ sshftp://<username>@gftp.<cluster_name>.cineca.it:22/absolute/path/to/
$ globus-url-copy -vb -cd -r /absolute/path/from/directory/ gsiftp://<username>@gftp.<cluster_name>.cineca.it:2811/absolute/path/to/
$ globus-url-copy -vb -cd -r sshftp://<username>@gftp.<cluster_name>.cineca.it:22/absolute/path/from/directory/ /absolute/path/to/
$ globus-url-copy -vb -cd -r gsiftp://<username>@gftp.<cluster_name>.cineca.it:2811/absolute/path/from/directory/ /absolute/path/to/
It is also possible to use globus-url-copy to transfer data FROM/TO another HPC site TO/FROM a CINECA HPC cluster. Here you can find the complete list of all the possibilities. You can use either protocol, gsiftp or sshftp, depending on the cluster/site configuration. For example, it is not possible to use the gsiftp protocol on Leonardo.
$ globus-url-copy -vb -cd -r gsiftp://<username>@gftp.<cluster_name>.cineca.it:2811/absolute/path/from/directory/ gsiftp://<username>@<other cluster endpoint>:2811/absolute/path/to/directory/
$ globus-url-copy -vb -cd -r sshftp://<username>@gftp.<cluster_name>.cineca.it:22/absolute/path/from/directory/ gsiftp://<username>@<other cluster endpoint>:2811/absolute/path/to/directory/
$ globus-url-copy -vb -cd -r sshftp://<username>@gftp.<cluster_name>.cineca.it:22/absolute/path/from/directory/ sshftp://<username>@<other cluster endpoint>:22/absolute/path/to/directory/
$ globus-url-copy -vb -cd -r gsiftp://<username>@gftp.<cluster_name>.cineca.it:2811/absolute/path/from/directory/ sshftp://<username>@<other cluster endpoint>:22/absolute/path/to/directory/
$ globus-url-copy -vb -cd -r gsiftp://<username>@<other cluster endpoint>:2811/absolute/path/from/directory/ gsiftp://<username>@gftp.<cluster_name>.cineca.it:2811/absolute/path/to/directory/
$ globus-url-copy -vb -cd -r sshftp://<username>@<other cluster endpoint>:22/absolute/path/from/directory/ gsiftp://<username>@gftp.<cluster_name>.cineca.it:2811/absolute/path/to/directory/
$ globus-url-copy -vb -cd -r sshftp://<username>@<other cluster endpoint>:22/absolute/path/from/directory/ sshftp://<username>@gftp.<cluster_name>.cineca.it:22/absolute/path/to/directory/
$ globus-url-copy -vb -cd -r gsiftp://<username>@<other cluster endpoint>:2811/absolute/path/from/directory/ sshftp://<username>@gftp.<cluster_name>.cineca.it:22/absolute/path/to/directory/
Please note that authentication via SSH certificate is not available on Globus Online.
Additional transfer modes will be explored and reported in the near future.