Link to the new User Guide https://docs.hpc.cineca.it/index.html
Data Storage architecture
All HPC systems share the same logical disk structure and file system definitions.
The available storage areas can have multiple definitions/purposes:
- temporary (data are accessible for a defined time window before being deleted);
- permanent (data are accessible up to six months after the end of the project);
or:
- user specific (each username has a different data area);
- project specific (accessible by all users linked to the same project).
or:
- local (specific for each system);
- shared (the same area can be accessed by all HPC systems).
The available data areas are defined, on all HPC clusters, through predefined environment variables. You can access these areas simply by using those names:
cd $HOME
cd $SCRATCH
cd $WORK
cd $DRES
cd $FAST      # Leonardo only
cd $PUBLIC    # Leonardo only
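As a quick sanity check, the following sketch reports which of these variables are defined in the current shell and whether each points to an existing directory. It assumes bash (for indirect expansion); the function name `check_areas` is illustrative and not a Cineca-provided tool.

```shell
#!/usr/bin/env bash
# Report which storage-area variables are set in this shell.
# check_areas is an illustrative helper, not a Cineca-provided command.
check_areas() {
    local name path
    for name in "$@"; do
        path="${!name:-}"    # indirect expansion: value of the variable named by $name
        if [ -z "$path" ]; then
            echo "$name: not set"
        elif [ -d "$path" ]; then
            echo "$name: $path"
        else
            echo "$name: $path (missing directory)"
        fi
    done
}

check_areas HOME SCRATCH WORK DRES FAST PUBLIC
```

On clusters other than Leonardo, $FAST and $PUBLIC would be expected to show up as not set.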
Overview of Available Data Areas
Name | Area Attributes | Quota | Backup | Note | Typical Usage |
---|---|---|---|---|---|
$HOME | local, permanent, user specific, backed up | 50 GB | daily | - | Data are critical, not so large, I want to be sure to preserve them. |
$WORK | local, permanent, project specific | 1 TB | - | - | Large data to be shared with collaborators of my project. |
$FAST | local, permanent, project specific | 1 TB | - | Available only on Leonardo | Faster I/O compared with other areas. |
$SCRATCH | local, temporary, user specific | - / 20 TB | Temporary | On Marconi the same variable is named $CINECA_SCRATCH | Large temporary data. |
$TMPDIR | local, temporary, user specific | - | - | Directory removed at job completion | - |
$PUBLIC | permanent, user specific, shared | 50 GB | - | Available only on Leonardo | Data to be shared with other users, not necessarily participating in common projects. |
$DRES | permanent, shared | defined by project | - | - | - |
All the filesystems are based on Lustre.
Ethical Use of the SCRATCH Area
Users are encouraged to respect the intended use of the various areas. The SCRATCH area is not subject to restrictions (quota), in order to facilitate the production of data, even in large amounts. However, the SCRATCH area is meant for temporary data only and should not be used as a long-term storage area. Users are warned against using “touch” commands or similar methods to extend the retention of files beyond the 40-day limit. The use of such “improper” procedures will be monitored, and users will be subject to various degrees of restrictions, up to a ban.
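To see which files are approaching the retention limit, so that they can be copied to a permanent area in time, a sketch like the following may help. It assumes GNU find and judges file age by modification time; this page does not state which timestamp the cleaning procedure uses, so the choice of mtime, the 30-day threshold, and the name `old_scratch_files` are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# List regular files under a directory that were last modified more than N days ago.
# old_scratch_files is an illustrative helper; using mtime is an assumption.
old_scratch_files() {
    local dir="$1" days="${2:-30}"
    find "$dir" -type f -mtime +"$days" -print
}

# Example: files in $SCRATCH older than 30 days (skipped if $SCRATCH is not defined).
if [ -n "${SCRATCH:-}" ] && [ -d "$SCRATCH" ]; then
    old_scratch_files "$SCRATCH" 30
fi
```

Files that still matter can then be copied to $WORK or $DRES rather than having their timestamps refreshed.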
Description of Data Areas
Backup Policies and Data Availability
- The $HOME filesystem is protected by daily backups. The daily backup procedure preserves a maximum of three different copies of the same file. Older versions of files are kept for 1 month. The last version of a deleted file is kept for 2 months and then permanently removed from the backup archive. Different agreements about backup policies are possible. For more information, contact HPC support (superc@cineca.it).
- Data, both backed up and non-backed up, are available for the entire duration of the project. After a project expires, users will still have full access to the data for an additional six months. Beyond this six-month period, data availability is not guaranteed. A scheme of data availability is reported in the figure below.
Important: users are responsible for backing up their important data!
Monitoring the occupancy
The occupancy status of all areas accessible to a user, along with the storage quota limits, can be monitored using simple commands available on the HPC clusters. There are two commands, named cindata and cinQuota (only for Galileo 100 and Leonardo). For both commands, the -h flag can be used to show the help. In the following, an example of the cindata and cinQuota outputs produced for a DRES user is shown.
File permissions
As explained above, $WORK and $DRES are environment variables automatically set in the user environment.
The $WORK variable points to a directory (fileset) specific to one of the user's projects:
/gpfs/work/<account_name>
The $DRES variable points to the space where all DRES areas are defined:
/gss/gss_work/
To use a specific DRES, use the following path:
$DRES/<dres_name>
The owner of the root directory is the "Principal Investigator" (PI) of the project or the "owner" of the DRES; the group corresponds to the name of the project or the name of the DRES. Default permissions are:
own: rwx group: rwx other: -
In this way, all project collaborators, sharing the same project group, can read/write into the project/DRES fileset, whereas other users cannot.
Users are advised to create a personal subdirectory under $WORK and $DRES. By default, files in the subdirectory are private, but the owner can easily share the files with the other collaborators by opening the subdirectory:
chmod 777 mydir    # open for reading and writing
chmod 755 mydir    # open for reading only
Since the $WORK/$DRES fileset is closed to non-collaborators, data sharing is active only among the project collaborators.
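The steps above can be sketched as a small helper that creates a personal subdirectory under a project area and opens it for reading only. The layout (a subdirectory named after the current username) and the helper name `make_shared_dir` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Create a personal subdirectory and make it readable (not writable) by others.
# make_shared_dir is an illustrative helper, not a Cineca-provided command.
make_shared_dir() {
    local parent="$1" name="$2"
    mkdir -p "$parent/$name" || return 1
    chmod 755 "$parent/$name"    # owner: rwx, group and others: r-x
    echo "$parent/$name"
}

# Example: $WORK/<username>, only if $WORK is defined in this shell.
if [ -n "${WORK:-}" ] && [ -d "$WORK" ]; then
    make_shared_dir "$WORK" "${USER:-$(id -un)}"
fi
```

Replacing 755 with 777 would also grant write access; because the parent fileset is closed to other users, either mode only reaches project collaborators.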
Pointing $WORK to a different project: the chprj command
The user can modify the project pointed to by the variable $WORK using the "change project" command.
To list all your accounts (both active and completed) and the default project:
chprj -l
To set $WORK to point to a different <account_no> project:
chprj -d <account_no>
More details are in the help page of the command:
chprj -h
chprj --help
On LEONARDO only: The command applies to the $FAST variable as well.
Endianness
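Endianness is the attribute of a system that indicates the order in which the bytes of a multi-byte value, such as an integer, are stored: most significant byte first (big-endian) or least significant byte first (little-endian). At present, all clusters in Cineca are "little-endian".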
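When moving binary data between systems, it can be useful to confirm the byte order of the host. The following is a minimal sketch, assuming bash and a standard `od` utility; the function name `host_endianness` is illustrative.

```shell
#!/usr/bin/env bash
# Print "little" or "big" depending on the host byte order.
# The bytes 01 00 00 00, read as a 4-byte integer, give 1 on a little-endian host.
host_endianness() {
    local value
    value="$(printf '\001\000\000\000' | od -An -td4 | tr -d ' \t\n')"
    if [ "$value" = "1" ]; then
        echo "little"
    else
        echo "big"
    fi
}

host_endianness
```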