...

To run a batch job, a user must log in to an HPC system with their username and password. The username must be associated with one or more active projects (Accounts) with available budget.

Usernames and Accounts

In CINECA, the words "username" and "account" have different meanings.

...

The mapping between users and Accounts is done by the CINECA staff, who are in charge of creating new projects and associating a PI with each of them. A PI, in turn, can add other users to a project as collaborators via the UserDB page of the project.

The "saldo" command

You can list all the Accounts attached to your username on the current cluster, together with the "budget" and the consumed resources, with the command:

...

For more information, run the "saldo" command without any option.

Billing policy

The billing procedures do not consider the time spent in interactive work, meaning it is free of charge.

...

Please note that every cluster usually has a "serial" queue, defined on the front-end nodes, that accepts serial jobs with a short time limit (maximum 4 hours). Accounting is not enabled on these queues, so you can use them without being charged. Consequently, serial queues can also be used when an account has expired or has exhausted its budget, which is useful for post-processing or data transfer.

"Non-exclusive" nodes: memory matters!

On some clusters (for example GALILEO100 or LEONARDO) you can allocate only part of a node for your job, rather than the whole node as on clusters running in exclusive mode (like MARCONI). In this case, the accounting procedure also takes into account the amount of memory you request. If the memory you ask for corresponds to a larger share of the node than the cores you requested, the job will be billed for more cores than you have reserved.
The billing still follows the basic idea illustrated above, but uses a generalized number of reserved cores that accounts for the memory request:

...

This rule applies to each cluster based on its amount of total memory and cores.

"Non-exclusive" nodes: TMPDIR matters (LEONARDO only)!

On LEONARDO, the job's TMPDIR local area is managed by the slurm job_container/tmpfs plugin and can be explicitly requested on the diskful nodes. In this case, the accounting procedure also takes into account the amount of space (gres/tmpfs) you request for your job. If you ask for an amount of local storage that is larger than the equivalent number of cores requested, the jobs will be billed for a larger number of cores than the ones you have reserved.
The billing always follows the basic idea illustrated above, but a generalized parameter for the number of reserved cores, accounting for the local storage request, is now used:

accounted hours = ElapsedTime x ReservedCoresEquiv

where

ReservedCoresEquiv = ReservedCores x LocalStorageFactor

  • LocalStorageFactor = 1
    If the local storage you ask for (in terms of the equivalent number of cores) is smaller than or equal to the number of reserved cores. In this case, the amount of cpu-hours billed depends only on the number of requested cores.
  • LocalStorageFactor = (ReservedLocalStorage / TotalLocalStorage) / (ReservedCores / TotalCores)
    If the local storage you ask for (in terms of the equivalent number of cores) is larger than the number of reserved cores. In this case, the amount of cpu-hours billed also depends on the amount of local storage requested (i.e. the actual percentage of the node allocated).

For example, on LEONARDO-DCGP partition, the TotalLocalStorage considered to calculate the LocalStorageFactor is 3 TB, and each compute node has 112 cores:

  • TotalLocalStorage = 3 TB
  • TotalCores = 112

If you ask for only one core and 1 TB of local storage (thus allocating one third of the node's local storage for yourself even though you are using a single core), the LocalStorageFactor is:

  • ReservedLocalStorage = 1 TB
  • ReservedCores = 1
  • → LocalStorageFactor = (1 TB / 3 TB) / (1 / 112) ≈ 37

Hence, with such a request, each hour of computation will be billed to your budget as 37 equivalent CPUs, i.e. as 37 hours.
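The rule above can be checked with a short calculation. The following Python snippet is only an illustrative sketch of the formula as written on this page, not the code used by the accounting system; the function names are hypothetical, and the way the real system rounds the result is an assumption left open here.

```python
def local_storage_factor(reserved_storage_tb, total_storage_tb,
                         reserved_cores, total_cores):
    """LocalStorageFactor as defined above (hypothetical helper name)."""
    storage_share = reserved_storage_tb / total_storage_tb
    core_share = reserved_cores / total_cores
    # The factor is 1 when the storage share fits within the core share.
    if storage_share <= core_share:
        return 1.0
    return storage_share / core_share


def accounted_hours(elapsed_hours, reserved_cores, factor):
    """accounted hours = ElapsedTime x ReservedCores x LocalStorageFactor."""
    return elapsed_hours * reserved_cores * factor


# LEONARDO-DCGP example from the text: 1 core, 1 TB out of 3 TB, 112 cores per node.
factor = local_storage_factor(1, 3, 1, 112)
print(f"LocalStorageFactor = {factor:.2f}")
print(f"Accounted hours for 1 hour = {accounted_hours(1, 1, factor):.2f}")
```

The unrounded factor is 112/3, which the example above reports as 37.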

At present the slurm job_container/tmpfs plugin is ONLY enabled on LEONARDO.

Accounting and accelerators

Recently, the accounting system has been extended to nodes equipped with accelerators. The principle is the same as for memory accounting: if the accelerators you request correspond to a larger portion of the node than the number of cores you requested, the consumption increases accordingly.

For LEONARDO, every GPU will be treated as 8 cores in terms of accounting. That is because GPU nodes have 32 CPUs and 4 GPUs each, so allocating 1 GPU is equivalent to allocating a quarter of the node (i.e. 8 CPUs). The same rule holds for MARCONI100, whose nodes have 32 CPUs and 4 GPUs each.

Some examples based on LEONARDO (1 node):

  • cpus=24, gpus=1 ==> the number of GPUs requested is equivalent to having requested 8 CPUs, but since 24 CPUs have been requested in the standard way, the GPU request does not add to the bill. Thus 24 CPUs will be billed;
  • cpus=6, gpus=1 ==> the number of GPUs requested is equivalent to having requested 8 CPUs, which is higher than the number of CPUs requested. Thus 8 CPUs will be billed;
  • cpus=24, gpus=4 ==> the number of GPUs requested is equivalent to having requested 32 CPUs, while only 24 have been requested in the standard way, which is not enough to cover the GPU request. Therefore 32 CPUs will be billed;
  • cpus=24, gpus=1, mem=500GB ==> the CPU and GPU requests are the same as in the first example (which alone would give 24 CPUs billed), but the memory request is higher than what is guaranteed by the allocation of the CPUs or GPUs, since it is equivalent to allocating the entire node. So 32 CPUs will be billed.
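The first three examples can be reproduced with a short Python sketch. It is only an illustration of the rule as described on this page, not the accounting system's code: the function name is hypothetical, the node figures are the LEONARDO values quoted above, and the memory contribution of the fourth example is deliberately left out because the total memory used in that calculation is not stated here.

```python
# LEONARDO GPU node figures quoted in the text above.
CORES_PER_NODE = 32
GPUS_PER_NODE = 4
CORES_PER_GPU = CORES_PER_NODE // GPUS_PER_NODE  # each GPU counts as 8 cores


def billed_cores(cpus, gpus):
    """Billed cores for a single-node request (hypothetical helper).

    The larger of the CPU request and the core-equivalent of the
    GPU request is billed; memory is not considered in this sketch.
    """
    return max(cpus, gpus * CORES_PER_GPU)


for cpus, gpus in [(24, 1), (6, 1), (24, 4)]:
    print(f"cpus={cpus}, gpus={gpus} -> {billed_cores(cpus, gpus)} CPUs billed")
```

A memory request would enter the same comparison as a third core-equivalent term.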

Low priority production jobs for active projects with exhausted budget

Non-expired projects with exhausted budgets may be allowed to keep using the computational resources at the cost of minimal priority. Write to superc@cineca.it motivating your request; in case of a positive evaluation, you will be enabled to use the qos_lowprio QOS:

...

  #SBATCH -A <account>               # your non expired, exhausted account 

Budget linearization

A linearization policy for the usage of project budgets is active on all clusters at CINECA. For each account, a monthly quota is defined as (total_budget / total_no_of_months). Starting from the first day of each month, the collaborators of an account can use the quota at full priority. As the quota is consumed, the jobs submitted from the account gradually lose priority, until the monthly quota is fully consumed. From that moment, their jobs are still considered for execution (so it is possible to consume more than the monthly quota), but with a lower priority than jobs from accounts that still have some quota left.
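As a rough illustration of the quota definition, the Python sketch below computes the monthly quota and checks whether an account has used it up. The function names and the figures (a 120000-hour budget over 12 months) are hypothetical, and the sketch does not model how priority decreases within the month, which is not specified here.

```python
def monthly_quota(total_budget_hours, total_months):
    """Monthly quota = total_budget / total_no_of_months."""
    if total_months <= 0:
        raise ValueError("total_months must be positive")
    return total_budget_hours / total_months


def quota_exhausted(consumed_this_month, quota):
    """True once the monthly quota is fully consumed.

    Jobs can still run after this point, only at lower priority.
    """
    return consumed_this_month >= quota


# Hypothetical project: 120000 core-hours over 12 months.
quota = monthly_quota(120_000, 12)
print(f"Monthly quota: {quota:.0f} hours")
print(f"Quota exhausted after 7500 hours: {quota_exhausted(7_500, quota)}")
print(f"Quota exhausted after 10000 hours: {quota_exhausted(10_000, quota)}")
```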

...