Access to our High-Performance Computing (HPC) resources follows a "pay-per-use" model. Each approved Project Account is assigned a budget in local core hours, which is spent when HPC resources are used. Interactive jobs are free of charge, as described in the documentation provided here. Batch jobs are charged according to the duration of the job and the amount of requested resources, regardless of how much of those resources is actually used.

Batch jobs may be submitted only by authenticated users, and their cost is measured in hours, which are deducted from the allocated budget. Once the budget of a Project Account is exhausted, batch job submission is suspended.

Most of the software in our Software Catalog can be used at no additional cost thanks to its usage licenses, subject to the relevant licensing terms. For a limited subset of software, however, special permission is required (write to superc@cineca.it).


Project Account and User Account

Each approved grant is linked to a budget expressed in standard core hours (STDH) and is uniquely identified by a Project Account name. Detailed instructions on how to prepare a grant application, and information on submission deadlines, are available in the guidelines here. A User Account may be linked to multiple Project Accounts and may operate on one or more HPC clusters (such as Marconi or Leonardo). User Account credentials must be treated as strictly confidential: sharing credentials, even among members of the same working group, is strictly prohibited.

Each individual user in possession of login credentials bears personal responsibility for any misuse that may occur!

How to check your budget (saldo command)

Once logged in to our HPC systems, you can check your Project Accounts, their current available budgets, and your User Account with the saldo command from the command line. Below are examples illustrating typical usage of the saldo command:

saldo -b

You can list all the Accounts attached to your username on the current cluster, together with the "budget" and the consumed resources, with the command:

> saldo -b
(an example of the command output is reported below)
-----------------------------------------------------------------------------------------------------------------------------------------
account                start         end         total        localCluster   totConsumed     totConsumed     monthTotal     monthConsumed
                                             (local h)   Consumed(local h)     (local h)               %      (local h)         (local h)
-----------------------------------------------------------------------------------------------------------------------------------------
Proj_A            20110323    20300323         50000               25000      55027726             50.0           600                600
Proj_B            20220427    20301231        100000               10000         27086             10.0           731                 731
Proj_C            20230524    20300323          6500                   0             0              0.0             0                   0

The account column refers to the Project Account, which corresponds to an approved grant. In the example above there are three approved grants, identified by the Project Names Proj_A, Proj_B and Proj_C.

The start and end columns specify the beginning and the end of each Project Account, whereas the total column gives the total hours assigned to the grant. The monthTotal column reports the monthly amount of hours available under the budget linearization policy (see the Budget Linearization section for more information). The remaining column titles are self-explanatory.

One single username can use multiple Accounts and one single Account can be used by multiple usernames (possibly on multiple platforms), all competing for the same budget.

On systems like LEONARDO, where independent partitions are available (Booster and DCGP), you should specify the host you are interested in:

> saldo -b  (by default returns accounts active on Booster)
> saldo -b --dcgp (returns accounts active on DCGP)

suggestions

Detailed information can be obtained by simply executing the command > saldo without any option.

saldo -r

> saldo -r -s 202403 -e 202405 -u user1 

(flags: -s 202403 [start time, yyyymm], -e 202405 [end time, yyyymm], -u <user account>)

------------------Resources used from 202403 to 202405------------------
date        username    account              localCluster       num.jobs
                                               Consumed/h
------------------------------------------------------------------------
20240301    user1       Proj_A                  0:01:00              1
20240301    user1       Proj_B                  0:05:00              1
20240303    user1       Proj_A                  1:05:36              4
20240504    user1       Proj_A                  0:02:40              6
20240506    user1       Proj_A                  0:00:56              1
------------------------------------------------------------------------


With the -r flag, the saldo command reports cluster usage on a daily basis: the date, the Project Account name (column "account"), the hours consumed on that day, and the number of jobs submitted on that day. This information is useful for monitoring and managing resource utilization.


Billing Policy

The Billing Policy defines how the usage of reserved resources on our HPC services is charged, measured in STDH. The number of STDH charged depends on several factors, including the cluster section where a job runs, the number of reserved cores, the amount of allocated memory, and other relevant parameters. Note that usage of the serial partition (which imposes a wall time limit of 4 hours) is not charged (free of charge). In addition, access to the serial partition is permitted for limited post-production data analysis, even for Project Accounts that have expired.

Billing applies exclusively to batch jobs and is based on the notions of "elapsed time" and "equivalent reserved cores". Note that the number of "equivalent reserved cores" may differ from the number of physical cores requested. The following expression is used to calculate the amount of billed hours (abh):

abh = ET x CE

where ET  is the elapsed time measured in hours, and CE is the number of equivalent reserved cores. Depending on the type of HPC services required and the specific cluster section utilized, the calculation of CE varies. Below are typical examples illustrating CE  calculations:


For specific CINECA HPC sections and/or clusters, the default minimum resource request is a single node or a multiple of it (to date, only on Marconi). Even if a user requests only part of a node, billing considers at least a full node. The billing procedure follows the principle:


abh = ET x CE

where CE is the total number of cores of the allocated node(s).

For example, consider a node composed of 48 cores. If a user reserves 16 cores, their budget will be billed for 48 equivalent cores.

In another scenario, suppose a user requests 49 cores, where each compute node has 48 cores. In this case, the user's budget will be billed for 96 cores (2 nodes x 48 cores/node).
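As a worked example with illustrative numbers, a 3-hour job requesting 49 cores on such a cluster would be billed abh = ET x CE = 3 x 96 = 288 local core hours.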


Here's how to set  exclusive  in your job script:

#SBATCH --exclusive  

Note

Billing and job execution are distinct concepts on HPC systems. The "--exclusive" option ensures that no other jobs run on the reserved node. However, if a user requests only a subset of the cores available on a node (e.g., 16 cores out of 48 cores/node), the job executes according to the specifications in the job script (in this example, using 16 cores).

Billing, however, is calculated on the entire node (48 cores), as explained above. This ensures that the reserved resources are fully accounted for, regardless of how much of the node is actually used.
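As an illustration, here is a minimal job-script sketch for this situation (the account name is a placeholder and the resource values are purely illustrative):

#!/bin/bash
#SBATCH --exclusive            # reserve the whole node: no other jobs share it
#SBATCH --ntasks=16            # the job itself runs on 16 cores
#SBATCH --time=02:00:00        # ET = 2 hours
#SBATCH -A <account>

# On a 48-core node, billing counts the full node: CE = 48, so
# abh = ET x CE = 2 x 48 = 96 local core hours, even though only 16 cores run.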

Users may also allocate only a portion of a node rather than the entire node. In this case, the accounting procedure also takes the requested memory into account: if the requested memory exceeds the equivalent number of reserved cores, the job is billed for more cores than those actually reserved.

The billing procedure adheres to the following principle:


abh = ET x CE

where: CE = ReservedCores x MemFactor

and the MemFactor is determined as follows:

  • MemFactor = 1 if the requested memory (in terms of equivalent cores) is smaller than or equal to the number of reserved cores. In this scenario, the billed cpu-hours depend solely on the number of requested cores.


  • MemFactor = (ReservedMemory / TotalMemory) / (ReservedCores / TotalCores) if the requested memory (in terms of equivalent cores) surpasses the number of reserved cores. In such cases, the billed cpu-hours also depend on the amount of memory requested (i.e., the actual percentage of the node allocated).


For example, on GALILEO100, where each compute node has 48 cores and a total memory of 366 GB, if a user requests only one core and 58 GB of memory (about 16% of the node's memory), the MemFactor is calculated as:

ReservedMemory = 58 GB

ReservedCores = 1

MemFactor = (58 GB / 366 GB) / (1 / 48) ≈ 7.6 (rounded to 8)

Consequently, for each hour of computation, the user's budget will be billed for 8 equivalent cores, i.e., 8 local core hours.
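A minimal job-script sketch for this case might look as follows (the account name is a placeholder and the resource values are purely illustrative); the comments restate the billing arithmetic:

#!/bin/bash
#SBATCH --ntasks=1             # ReservedCores = 1
#SBATCH --mem=58G              # ReservedMemory = 58 GB (out of 366 GB per node)
#SBATCH --time=01:00:00        # ET = 1 hour
#SBATCH -A <account>

# MemFactor = (58/366) / (1/48) ≈ 7.6, so CE ≈ 8 equivalent cores
# abh = ET x CE = 1 x 8 = 8 local core hours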

On LEONARDO, the job's TMPDIR local area is managed by the slurm job_container/tmpfs  plugin and can be explicitly requested on the diskful nodes. In this scenario, the billing procedure also considers the requested amount of space (gres/tmpfs)  for the job. If the requested local storage exceeds the equivalent number of cores reserved, the job will be billed for a greater number of cores than those initially reserved.

The billing procedure follows the same principle as previously described, but incorporates a generalized parameter for the number of reserved cores, accounting for the local storage request:

abh = ET x CE

where:

CE = ReservedCores x LocalStorageFactor

and the LocalStorageFactor is determined as follows:

  • LocalStorageFactor = 1 if the requested local storage (in terms of equivalent cores) is smaller than or equal to the number of reserved cores. In such cases, the billed cpu-hours depend solely on the number of requested cores.

  • LocalStorageFactor = (ReservedLocalStorage / TotalLocalStorage) / (ReservedCores / TotalCores)

    if the requested local storage (in terms of equivalent cores) surpasses the number of reserved cores. In such cases, the billed cpu-hours also depend on the amount of local storage requested (i.e., the actual percentage of the node allocated).

For example, on the LEONARDO-DCGP partition, where each compute node has 112 cores and a total local storage of 3 TB, if a user requests only one core and 1 TB of local storage (thus utilizing one third of the node's local storage), the LocalStorageFactor is calculated as:

ReservedLocalStorage = 1 TB
ReservedCores = 1
LocalStorageFactor = (1 TB / 3 TB) / (1 / 112) ≈ 37

Consequently, for each hour of computation, the user's budget will be billed for 37 equivalent cores, i.e., 37 local core hours.

It's important to note that the slurm job_container/tmpfs  plugin is currently enabled only on LEONARDO.
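As a hedged sketch of the DCGP example above (the account name is a placeholder, and the exact syntax for requesting gres/tmpfs is an assumption; refer to the LEONARDO documentation for the precise form):

#!/bin/bash
#SBATCH --ntasks=1             # ReservedCores = 1
#SBATCH --gres=tmpfs:1TB       # assumed syntax for 1 TB of local scratch (gres/tmpfs)
#SBATCH --time=01:00:00        # ET = 1 hour
#SBATCH -A <account>

# LocalStorageFactor = (1/3) / (1/112) ≈ 37, so CE ≈ 37 equivalent cores
# abh = ET x CE = 1 x 37 = 37 local core hours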

The accounting system has recently been expanded to encompass nodes equipped with accelerators, such as GPUs. Similar to memory accounting, requesting a number of accelerators that results in allocating a larger portion of the node than indicated by the simple number of cores requested will increase resource consumption accordingly.

For LEONARDO, each GPU is treated as equivalent to 8 cores in terms of accounting. This is because GPU nodes feature 32 CPUs and 4 GPUs each. Therefore, allocating 1 GPU is equivalent to allocating a quarter of the node, which translates to 8 CPUs.

Here are some examples based on LEONARDO (1 node):

  • cpus=24, gpus=1.

    • The requested number of GPUs is equivalent to having requested 8 CPUs.

    • Since 24 CPUs have been requested explicitly, which already exceeds the 8 equivalent CPUs, the GPU request does not add to the billing.

    • Thus, 24 CPUs will be billed.

  • cpus=6, gpus=1.

    • The requested number of GPUs is equivalent to having requested 8 CPUs, which exceeds the number of CPUs requested.

    • Therefore, 8 CPUs will be billed.

  • cpus=24, gpus=4.

    • The requested number of GPUs is equivalent to having requested 32 CPUs.

    • However, only 24 CPUs have been requested explicitly, which is fewer than the 32 equivalent CPUs of the GPU request.

    • Thus, 32 CPUs will be billed.

  • cpus=24, gpus=1, mem=500GB.

    • This scenario is analogous to the first example (which by itself would result in 24 billed CPUs), but the memory request exceeds what is covered by the CPU and GPU allocation, as it is equivalent to allocating the entire node.

    • Thus, 32 CPUs will be billed.
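As a minimal sketch (the account name is a placeholder and the resource values are purely illustrative), the second case above (cpus=6, gpus=1) could be requested as follows:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6      # 6 CPUs requested explicitly
#SBATCH --gres=gpu:1           # 1 GPU, accounted as 8 equivalent CPUs on LEONARDO
#SBATCH --time=01:00:00        # ET = 1 hour
#SBATCH -A <account>

# The GPU request dominates: CE = max(6, 8) = 8 equivalent cores,
# so abh = ET x CE = 1 x 8 = 8 local core hours.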

For active projects whose budget has been depleted, low-priority production jobs may still be run by using the qos_lowprio QOS. To request this option, write to superc@cineca.it, explaining why continued resource usage is needed despite the exhausted budget. Upon a favorable evaluation, access to qos_lowprio will be granted.

Here's how to include the qos_lowprio  in your job script:

#SBATCH --qos=qos_lowprio 
#SBATCH -A <account> 
# Specify your non-expired, depleted account 

With these directives in your job script, low-priority production jobs for active projects with exhausted budgets can still be scheduled and receive resources.


Use of the CINECA Resources is at the risk of the User. CINECA does not make any guarantee as to their availability or their suitability for purpose.

Budget Linearization

A linearization policy governs the priority of scheduled jobs across CINECA clusters. Each Project Account is assigned a monthly quota, calculated as:

total_budget/total_no_of_months
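For example (illustrative figures only), a project with a budget of 120,000 local core hours spread over 12 months has a monthly quota of 120,000 / 12 = 10,000 core hours.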

Starting on the first day of each month, the User Accounts belonging to a Project Account can use their quota at full priority. As the quota is consumed, submitted jobs progressively lose priority until the monthly quota is exhausted. After that, jobs are still considered for execution, but with reduced priority compared to accounts that still have quota available. This policy is in line with practice at other major HPC centers worldwide and aims to improve response times by keeping CPU-hour usage proportional to budget size.

A simple working scheme of budget linearization is shown in the figure below.

(Figure: budget linearization scheme)


We recommend keeping your budget usage linear, as non-linear consumption may affect all users sharing our HPC systems.