...

Currently, the cost is based on elapsed time and the number of cores reserved (not used!) by the batch jobs. In general, most tools and applications from our Software Catalog can be used free of charge, even if the program is covered by a licence. Only in a few cases do you need to register and pay an additional fee in order to access special applications. All the information is reported in the description of the specific application: see application-software-science.
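For example, a job that reserves 48 cores for 2 elapsed hours is billed 96 core-hours, no matter how many of those cores it actually uses.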

To run a batch job, a user must log in to an HPC system using their username and password. The username must be associated with one or more active projects (Accounts) with available budgets.

...

USERNAME: identifies the individual connecting to the system. It is the string used (along with the password) to access the system (through ssh, for example). It is 8 characters long and can be obtained, from your interactive Unix session, with the command:

> echo $USER 

...

ACCOUNT: indicates the grant or resource allocation which you can use for your batch jobs. Usually, a "budget" is associated with an Account and reports how many resources (computing hours) can be used within that Account. In UG2.2 Become a user, we describe the several ways your username can be associated with an account.

You can list all the Accounts attached to your username on the current cluster, together with the "budget" and the consumed resources, with the command "saldo" (see below).
A single username can use multiple accounts, and a single account can be used by multiple usernames, all competing for the same budget.

...

Nevertheless, users with no active Account will still be able to access the HPC platforms in order to perform some lightweight post-processing (interactive runs) and/or to retrieve their data. Usernames will be kept alive for a whole year after their last (most recent) account has been shut down.

The mapping between users and Accounts is done by the CINECA staff, who are in charge of creating new projects and associating a PI with each of them. A PI, in turn, can associate other users to a project as collaborators, via the UserDB page related to the project.

...

On systems like MARCONI, where different independent partitions were available (KNL and SKL), you should specify the host you are interested in:

> saldo -b       (by default returns accounts active on SKL)
> saldo -b --knl (returns accounts active on KNL)
> saldo -b --skl (returns accounts active on SKL)

...

that prints a daily resource usage report for selected usernames and/or Accounts on the local cluster.

For more information, run the "saldo" command without any option.

Billing policy

The billing procedures do not consider the time spent in interactive work, meaning it is free of charge.

...

Please note that every cluster usually has a "serial" queue, defined on front-end nodes, that allows serial jobs with a short time limit (maximum 4 hours). On these queues, accounting is not enabled, meaning that you can use them without being charged. Consequently, serial queues can also be used when an account is expired or has exhausted all of its budget: this is useful, for example, for post-processing or data transfer.

...

On some clusters (for example on GALILEO100 or LEONARDO) you can choose to allocate only part of the node for your job. You are not forced to allocate all of it, as happens on clusters (like MARCONI) running in exclusive mode. In this case, the accounting procedure also takes into account the amount of memory you request for your job. If you ask for an amount of memory larger than the share corresponding to the number of cores requested, the job will be billed for a larger number of cores than the ones you have reserved.
The billing always follows the basic idea illustrated above, but a generalized parameter for the number of reserved cores, accounting for the memory request, is now used:

...

  • MemFactor = (ReservedMemory / TotalMemory) / (ReservedCores / TotalCores)
    If the memory you ask for (in terms of the equivalent number of cores) is larger than the number of reserved cores. In this case, the amount of cpu-hours billed also depends on the amount of memory requested (i.e. the actual percentage of the node allocated).

For example, on GALILEO100 the TotalMemory considered to calculate the MemFactor is 375300 MB (around 366 GB), and each compute node has 48 cores:

  • TotalMemory = 366 GB
  • TotalCores = 48

If you ask for only one core and 58 GB of memory (thus reserving roughly one sixth of the node's memory even though you are using only one core), the MemFactor is:

  • ReservedMemory = 58 GB
  • ReservedCores = 1
  • → MemFactor = (58 GB / 366 GB) / (1 / 48) ≈ 8

Hence, with such a request, for each hour of computation your budget will be billed for 8 equivalent CPUs, i.e., for 8 hours.

This rule applies to each cluster, based on its total amount of memory and cores.
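As a sanity check, the arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch of the billing rule, not an official CINECA tool; the GALILEO100 figures (48 cores, about 366 GB per node) are taken from the example above:

    # Minimal sketch of the memory-aware billing rule (not an official CINECA tool).
    def billed_cores(reserved_cores, reserved_mem_gb, total_cores=48, total_mem_gb=366):
        # MemFactor = (ReservedMemory / TotalMemory) / (ReservedCores / TotalCores),
        # floored at 1 so that small memory requests never reduce the billing.
        mem_factor = (reserved_mem_gb / total_mem_gb) / (reserved_cores / total_cores)
        return reserved_cores * max(1.0, mem_factor)

    print(billed_cores(1, 58))   # ~7.6 -> billed as 8 equivalent CPUs per elapsed hour
    print(billed_cores(24, 58))  # 24.0 -> the memory fits within the reserved-core share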

"Non-exclusive" nodes: TMPDIR matters (LEONARDO only)!

On LEONARDO, the job's TMPDIR local area is managed by the Slurm job_container/tmpfs plugin and can be explicitly requested on the diskful nodes. In this case, the accounting procedure also takes into account the amount of space (gres/tmpfs) you request for your job. If you ask for an amount of local storage larger than the share corresponding to the number of cores requested, the job will be billed for a larger number of cores than the ones you have reserved.
The billing always follows the basic idea illustrated above, but a generalized parameter for the number of reserved cores, accounting for the local storage request, is now used:

accounted hours = ElapsedTime x ReservedCoresEquiv

where

ReservedCoresEquiv = ReservedCores x LocalStorageFactor

  • LocalStorageFactor = 1
    If the local storage you ask for (in terms of the equivalent number of cores) is smaller than or equal to the number of reserved cores. In this case, the amount of cpu-hours billed depends only on the number of requested cores.
  • LocalStorageFactor = (ReservedLocalStorage / TotalLocalStorage) / (ReservedCores / TotalCores)
    If the local storage you ask for (in terms of the equivalent number of cores) is larger than the number of reserved cores. In this case, the amount of cpu-hours billed also depends on the amount of local storage requested (i.e. the actual percentage of the node allocated).

For example, on LEONARDO-DCGP partition, the TotalLocalStorage considered to calculate the LocalStorageFactor is 3 TB, and each compute node has 112 cores:

  • TotalLocalStorage = 3 TB
  • TotalCores = 112

If you ask for only one core and 1 TB of local storage (thus allocating for yourself one third of the node's local storage even though you are using only one core), the LocalStorageFactor is:

  • ReservedLocalStorage = 1 TB
  • ReservedCores = 1
  • → LocalStorageFactor = (1 TB / 3 TB) / (1 / 112) ≈ 37

Hence, with such a request, for each hour of computation your budget will be billed for 37 equivalent CPUs, i.e., for 37 hours.
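The same check can be written for the local storage rule. Again, this is a minimal sketch under the LEONARDO-DCGP figures above (112 cores, 3 TB per node), not an official CINECA tool:

    # Minimal sketch of the local-storage-aware billing rule (not an official CINECA tool).
    def billed_cores_tmpfs(reserved_cores, reserved_tb, total_cores=112, total_tb=3):
        # Same structure as the MemFactor rule, with local storage in place of memory.
        factor = (reserved_tb / total_tb) / (reserved_cores / total_cores)
        return reserved_cores * max(1.0, factor)

    print(billed_cores_tmpfs(1, 1))  # ~37.3 -> billed as 37 equivalent CPUs per hour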

At present the slurm job_container/tmpfs plugin is ONLY enabled on LEONARDO.

Accounting and accelerators

Recently, the accounting system has been extended to nodes equipped with accelerators. The principle is the same as for memory accounting: asking for a number of accelerators that makes you allocate a bigger portion of the node than the plain number of cores requested would suggest increases your consumption accordingly.

For LEONARDO, every GPU is treated as 8 cores in terms of accounting. That is because GPU nodes have 32 CPUs and 4 GPUs each, so allocating 1 GPU is equivalent to allocating a quarter of the node (i.e. 8 CPUs). For MARCONI100, which has 32 CPUs and 2 GPUs per node, the rule holds similarly.

Some examples based on LEONARDO (1 node):

  • cpus=24, gpus=1 ==> the number of GPUs requested is equal to having requested 8 CPUs, but since 24 of them have been requested in the standard way, the GPU request is not taken into account. Thus 24 CPUs will be billed;
  • cpus=6, gpus=1 ==> the number of GPUs requested is equal to having requested 8 CPUs, which is higher than the number of CPUs requested. Thus 8 CPUs will be billed;
  • cpus=24, gpus=4 ==> the number of GPUs requested is equal to having requested 32 CPUs, while the 24 requested in the standard way are not enough to cover the GPU request. Therefore 32 CPUs will be billed;
  • cpus=24, gpus=1, mem=500GB ==> the situation is similar to the first example (so 24 CPUs billed), but the memory request is higher than what is guaranteed by the simple allocation of the CPUs or GPUs, since it is equivalent to allocating the entire node. So, 32 CPUs will be billed.
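The four cases above all follow one rule: each resource request is converted into an equivalent number of CPUs, and the job is billed for the largest equivalent. A minimal Python sketch (not CINECA's accounting code; the 500 GB accountable node memory is an assumption inferred from the last example, so check the cluster documentation for the exact figure):

    # Minimal sketch of GPU/memory-aware billing on LEONARDO (not official CINECA code).
    def billed_cpus(cpus, gpus, mem_gb=0, total_cpus=32, total_gpus=4, total_mem_gb=500):
        gpu_equiv = gpus * (total_cpus / total_gpus)      # 1 GPU counts as 8 CPUs
        mem_equiv = (mem_gb / total_mem_gb) * total_cpus  # memory as a CPU-equivalent share
        return max(cpus, gpu_equiv, mem_equiv)            # bill the largest share

    print(billed_cpus(24, 1))              # 24.0
    print(billed_cpus(6, 1))               # 8.0
    print(billed_cpus(24, 4))              # 32.0
    print(billed_cpus(24, 1, mem_gb=500))  # 32.0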

Low priority production jobs for active projects with exhausted budget

...

The linearization effect on the priority is finely graduated, as the linearization parameter depends on the percentage of the monthly quota consumed. The job sorting formula also depends on other aspects, like walltime, resources requested or time spent waiting in the queue, so a low-priority job can still have some chance of being executed quickly if well tuned (but not as quickly as jobs with the same tuning that are advantaged in terms of linearization priority). You can check the usage of your monthly quota with the "saldo -b" command: the last two columns report the quota defined for your account and the monthly consumption.

This policy is similar to those already applied by other important HPC centers in Europe and worldwide. The goal is to improve the response time, giving users the opportunity to use the cpu hours assigned to their project in relation to its actual size (total amount of core-hours). Please note that applying a sort of "linearization" of your project budget is recommended. Each month a given percentage of your budget is guaranteed, but non-linear usage is discouraged for the welfare of all the users simultaneously hosted by our HPC systems.

...