
Overview

REPO is a Cineca service, implemented through iRODS (Integrated Rule-Oriented Data System), for the management of long-lasting data.

The service stores and maintains scientific data sets. It lets users back up data safely and, at the same time, manage them through a variety of clients: web browsers, graphical desktop applications and command-line interfaces.

It relies on plain filesystems to store data files and on databases to store the metadata. The architecture is designed to scale to millions of files and petabytes of data, combining robustness and versatility, and to offer scientific communities a complete set of features to manage the data life-cycle:


The links for the Data Repository interfaces are listed at the URL https://www.repo.cineca.it. 

Upload/Download: the system supports high-performance transfer protocols such as GridFTP, as well as the iRODS multi-threaded transfer mechanism.

Metadata management: each object can be associated with specific metadata represented as triplets (name, value, unit), or simply tagged and commented. This operation can be performed at any time, not just before the first upload.

Preservation: long-term accessibility is guaranteed by a seamless archiving process, which moves collections of data from the on-line storage space to a tape-based off-line space and back, according to general or per-project policies.

Stage-in/stage-out: the service can move data sets required as input for computations to the HPC machines' local storage space, commonly called “scratch”, and back again as soon as the results are available.

Sharing: single data objects or whole collections can be shared through a unix-like ownership model, which makes them accessible to individual users or groups. Moreover, a ticket-based approach provides temporary access tokens with limited rights.

Searching: the data are indexed, and searches can be based on the objects' location or on the associated metadata.
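
As a rough illustration of the metadata, searching and sharing features, the sketch below uses the standard iCommands imeta, ichmod and iticket. It assumes the iCommands are already installed and configured as explained in the rest of this page; the object name, the attribute triplet and the user name are hypothetical placeholders, not values defined by the REPO service.

```shell
# Hypothetical placeholders: replace with your own object, attribute and user.
OBJECT="results.dat"
ATTR_NAME="temperature"
ATTR_VALUE="300"
ATTR_UNIT="kelvin"
COLLABORATOR="another-username"

if command -v imeta >/dev/null 2>&1; then
  # Metadata: attach a (name, value, unit) triplet to a data object, then list it
  imeta add -d "$OBJECT" "$ATTR_NAME" "$ATTR_VALUE" "$ATTR_UNIT"
  imeta ls -d "$OBJECT"

  # Searching: find data objects whose attribute has the given value
  imeta qu -d "$ATTR_NAME" = "$ATTR_VALUE"

  # Sharing: grant read access on the object to another user
  ichmod read "$COLLABORATOR" "$OBJECT"

  # Sharing: create a ticket that grants read access to the object
  iticket create read "$OBJECT"
else
  echo "iCommands not found in PATH; install and configure them first"
fi
```

The commands operate on the current iRODS collection, so the object is expected to exist there already.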

How to request a REPO space

Archiving on the REPO area is managed through the DRES (Data RESource) space, as discussed in the "Data Storage Resource" document. You can request a DRES of REPO type by sending an email to superc@cineca.it.

How to access the REPO space

There are three different ways to access data in the REPO:

  • iRODS commands
  • GridFTP clients, such as globus-url-copy or Globus Online
  • WebDAV protocol

iRODS commands

Configuration

You can use the iCommands from the CINECA HPC machines (MARCONI, MARCONI100 and GALILEO100) or from your local Linux machine.

1) download iCommands

  • On MARCONI, PICO and GALILEO, the iCommands are available without loading any module.
  • On MARCONI100 and GALILEO100 the iCommands are available through a module. On the login node, type:

    $ module load icommands

    $ iinit 

    $ ils 

  • On your local Linux machine you have to install the iCommands by downloading them from http://irods.org/download/. The .deb and .rpm packages (CentOS5, CentOS6, CentOS7, Ubuntu16 and Ubuntu18) are provided.

    If you want to build the iCommands from source code with support for PAM authentication on your Linux machine, download the source from https://github.com/irods/irods.

2) download the file chain.pem  (click to download) 

3) create the .irods/irods_environment.json config file in the home directory of the system where you use the iCommands (MARCONI, MARCONI100, GALILEO100 or your local Linux machine):

{
"irods_host": "data.repo.cineca.it",
"irods_port": 1247,
"irods_default_resource": "cinecaRes1",
"irods_home": "/CINECA01/home/your-group/your-username",
"irods_cwd": "/CINECA01/home/your-group/your-username",
"irods_user_name": "your-username",
"irods_zone_name": "CINECA01",
"irods_client_server_negotiation": "request_server_negotiation",
"irods_client_server_policy": "CS_NEG_REFUSE",
"irods_encryption_key_size": 32,
"irods_encryption_salt_size": 8,
"irods_encryption_num_hash_rounds": 16,
"irods_encryption_algorithm": "AES-256-CBC",
"irods_default_hash_scheme": "MD5",
"irods_match_hash_policy": "compatible",
"irods_server_control_plane_port": 1248,
"irods_server_control_plane_key": "TEMPORARY__32byte_ctrl_plane_key",
"irods_server_control_plane_encryption_num_hash_rounds": 16,
"irods_server_control_plane_encryption_algorithm": "AES-256-CBC",
"irods_maximum_size_for_single_buffer_in_megabytes": 32,
"irods_default_number_of_transfer_threads": 4,
"irods_transfer_buffer_size_for_parallel_transfer_in_megabytes": 4,
"irods_authentication_scheme": "PAM",
"irods_ssl_certificate_chain_file": "/path/to/.irods/chain.pem",
"irods_ssl_ca_certificate_file": "/path/to/.irods/chain.pem",
"irods_ssl_verify_server": "cert"
}
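
Before the first login, it can help to check that the two files from steps 2 and 3 are in place and that the JSON is well formed. The following is a minimal sketch, assuming both files are kept in ~/.irods (the "/path/to/.irods/chain.pem" entries above are placeholders to be replaced with the real absolute path) and that python3 is available for the syntax check.

```shell
# Assumption: the configuration lives in ~/.irods, as in step 3.
IRODS_DIR="$HOME/.irods"
ENV_FILE="$IRODS_DIR/irods_environment.json"
CHAIN_FILE="$IRODS_DIR/chain.pem"

mkdir -p "$IRODS_DIR"
chmod 700 "$IRODS_DIR"    # keep the directory accessible only by you

# Report which of the two required files are in place
for f in "$ENV_FILE" "$CHAIN_FILE"; do
  if [ -f "$f" ]; then echo "found:   $f"; else echo "missing: $f"; fi
done

# Check the JSON syntax with Python's standard json.tool module, if available
if [ -f "$ENV_FILE" ] && command -v python3 >/dev/null 2>&1; then
  if python3 -m json.tool "$ENV_FILE" >/dev/null; then
    echo "valid JSON"
  else
    echo "malformed JSON (check commas and quotes)"
  fi
fi
```

The path reported for chain.pem is the value to use for "irods_ssl_certificate_chain_file" and "irods_ssl_ca_certificate_file".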

4) type the command iinit the first time you use iRODS. By default, PAM authentication is enabled (the "irods_authentication_scheme" parameter in the JSON configuration file), so you will be asked for the password of your HPC username.

Note that after some time (days...), you will need to run the iinit command again to use the iCommands.

5) operate in the REPO space using the iCommands. The documentation about the iRODS commands is available at this link.
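
A typical session might look like the sketch below. It uses only standard iCommands (ils, iinit, ipwd, imkdir, iput, iget); the file and collection names are hypothetical, and treating a failed ils as a sign of an expired session is a simplifying assumption, since the listing can also fail for other reasons such as network problems.

```shell
# Hypothetical names used only for illustration.
LOCAL_FILE="results.dat"
COLLECTION="run_001"
REMOTE_OBJECT="$COLLECTION/$LOCAL_FILE"

if command -v iput >/dev/null 2>&1; then
  # If a simple listing fails, the session may have expired: authenticate again
  ils >/dev/null 2>&1 || iinit

  ipwd                                            # show the current iRODS collection
  imkdir -p "$COLLECTION"                         # create the collection if needed
  iput -K "$LOCAL_FILE" "$REMOTE_OBJECT"          # upload and verify the checksum
  ils -l "$COLLECTION"                            # long listing of the collection
  iget -K "$REMOTE_OBJECT" "copy_of_$LOCAL_FILE"  # download and verify the checksum
else
  echo "iCommands not found in PATH"
fi
```

Relative paths such as run_001 are resolved against the current iRODS collection, which initially corresponds to "irods_cwd" in the configuration file.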

Authentication

The iRODS installation at CINECA supports two authentication mechanisms: username and password (PAM), described above, and GSI (e.g. X.509 certificate).

If you want to use GSI authentication instead of PAM authentication, replace the line "irods_authentication_scheme": "PAM" in your ".irods/irods_environment.json" file with:

  "irods_authentication_scheme": "GSI",   

In this case, GSI support must be enabled in your iCommands installation. The GridFTP server address is: gftp.repo.cineca.it:2811

GridFTP clients

To access your REPO space through a GridFTP client such as globus-url-copy or Globus Online, consult the dedicated web pages: globus-url-copy or Globus Online.
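
For orientation only, a globus-url-copy transfer to the GridFTP server address given above could be sketched as follows. The remote path is an assumption that simply mirrors the "irods_home" layout of the configuration file, the file name is hypothetical, and a valid GSI credential is assumed to be already available; the dedicated pages remain the reference for the actual procedure.

```shell
# Server address from the Authentication section of this page.
GFTP_ENDPOINT="gsiftp://gftp.repo.cineca.it:2811"
# Assumed remote layout (mirrors "irods_home"); file name is hypothetical.
REMOTE_PATH="/CINECA01/home/your-group/your-username/results.dat"
LOCAL_PATH="$PWD/results.dat"

SRC_URL="file://$LOCAL_PATH"            # absolute path gives file:///...
DEST_URL="${GFTP_ENDPOINT}${REMOTE_PATH}"

if command -v globus-url-copy >/dev/null 2>&1; then
  # -vb prints transfer performance, -p sets the number of parallel streams
  globus-url-copy -vb -p 4 "$SRC_URL" "$DEST_URL"
else
  echo "globus-url-copy not found in PATH"
  echo "would copy $SRC_URL to $DEST_URL"
fi
```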

WebDAV protocol

In addition to the iCommands and GridFTP, the data stored on the iRODS server can also be accessed through the WebDAV protocol, either with a WebDAV client (cadaver, Nautilus, Cyberduck, ...) or by mounting the resource with davfs2.

The following examples show how to access the data through the WebDAV protocol.

cadaver (linux)

Run the following command in a Linux shell:

cadaver https://www.repo.cineca.it/davrods

and provide your HPC-CINECA username and password for the DAV authentication on the server www.repo.cineca.it.

Finally, the following prompt will appear:

dav:/davrods/>

and you will be able to manage your data in the iRODS repository. You can list all available commands with the help command:

dav:/davrods/> help

For example, if you want to edit or create a file, you can use the command "edit". The file is edited or created in a temporary folder and then transferred to the iRODS repository.

nautilus  (linux)

Run the Nautilus file manager, click on "Connect to Server" and use the following configuration for the server:

Server = www.repo.cineca.it
Port = 443
Type = Secure WebDAV(HTTPS)
Folder = /davrods

Enter your CINECA-HPC username and password, click on "Connect" and browse your directories in the iRODS repository.

davfs2 (linux)

After installing the davfs2 package, mount the directory with the command:

mount -t davfs https://www.repo.cineca.it/davrods/ /mnt/

When asked to authenticate with the server https://www.repo.cineca.it/davrods/, provide your HPC-CINECA username and password.
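
A slightly fuller sketch of the same procedure is shown below. It assumes you have root privileges through sudo and uses a dedicated mount point, /mnt/repo, which is a hypothetical choice rather than a requirement of the service.

```shell
# WebDAV URL from this page; the mount point is a hypothetical choice.
DAV_URL="https://www.repo.cineca.it/davrods/"
MOUNT_POINT="/mnt/repo"

if command -v mount.davfs >/dev/null 2>&1; then
  sudo mkdir -p "$MOUNT_POINT"
  sudo mount -t davfs "$DAV_URL" "$MOUNT_POINT"   # asks for username and password
  ls "$MOUNT_POINT"                               # browse the repository like a local directory
  sudo umount "$MOUNT_POINT"                      # unmount cleanly when done
else
  echo "mount.davfs not found in PATH; install the davfs2 package first"
fi
```

Unmounting before disconnecting gives davfs2 the chance to finish any pending uploads.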

Cyberduck (windows, mac)

Install and run Cyberduck (https://cyberduck.io/), then click on "New Connection" and use the following configuration for the WebDAV server:

Server = www.repo.cineca.it
Port = 443
Path = /davrods
Protocol = WebDAV (HTTP/SSL)

Provide your CINECA-HPC username and password, click on "Connect" and browse your directories in the iRODS repository.