
...

Submitting Batch jobs for A2 partition

The Marconi-A2 production environment is based on the latest release of the PBS scheduler/resource manager.

After the first month of the testing/pre-production phase, we introduced a simplified and more robust way to manage job submission to the KNL (A2) partition. Our first configuration relied on routing queues defined on the A1 PBS server, which however suffered from two major drawbacks concerning chained and interactive jobs.

The new configuration takes advantage of a PBS variable which is automatically set when loading the "env-knl" module, available in the "base" profile. Loading the module also modifies the (user-defined or default) prompt on the login nodes by putting the (KNL) string in front of it:

[username@r000u07l02 ~]$ module load env-knl
(KNL) [username@r000u07l02 ~]$

to remind users that they are using the production environment serving the A2 partition. In fact, once the module is loaded, the PBS environment is set according to the PBS server defined on the A2 partition, and jobs are submitted directly to the A2 queues instead of passing through the routing queues defined on the A1 partition. With respect to the previous configuration, the submission process is simplified. You no longer need to load the specific "env-knl" module to submit jobs on partitions based on Knights Landing processors. Instead, you simply need to specify the correct partition using the "#SBATCH -p" directive, choosing from the following list (a minimal example follows the list):

  • knldebug
  • knlprod
  • knlbigprod
  • knlspecial
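
As an illustration, a minimal SLURM batch script for one of the partitions above could look as follows (a sketch: the partition name is taken from the list above, while the account placeholder, executable path and resource values are to be adapted to your case):

#!/bin/bash
#SBATCH -p knlprod              # one of the KNL partitions listed above
#SBATCH -N 1                    # one full KNL node
#SBATCH -n 68                   # 68 tasks, i.e. all physical cores of the node
#SBATCH -t 00:30:00             # walltime
#SBATCH -A <account_no>         # your project account

srun PATH_TO_EXECUTABLE > output_file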

With respect to the previous testing/pre-production phase, you simply need to specify the correct queue in the #PBS -q <queue> directive in the case of special queues, or no queue for the default one (replacing the A1 routing queues "knl" and "knltest"). Loading this environment directly exposes the login nodes to the KNL production partition; for instance, you will only see the queues defined on the KNL nodes with the "qstat -q" command, and the usual commands "qsub", "qstat" and "qdel" will act on the A2 PBS queues. Please note that, differently from the A1 partition (where the job number is sufficient to identify a job) and analogously to A3, on A2 the full job id is required (i.e. <job_number>.<PBS A2 server>, for instance 382113.r064u06s01).

Like in the A1 production environment, the A2 queues knldebug and knlprod are not directly accessible and are served by the default routing queue "knlroute". This queue does not need to be requested, being the default queue of the A2 PBS environment, and it routes jobs either to knldebug or to knlprod, depending on the number of requested nodes and the walltime. The knlroute queue accepts jobs for the "shared" A2 partition, while a specific (dedicated) queue, xfuaknlprod, needs to be used for the MARCONI-Fusion KNL nodes.
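
For instance (a sketch, with illustrative resource values), a job for the shared A2 partition simply omits the queue specification and lets knlroute dispatch it, while a MARCONI-Fusion KNL job requests the dedicated queue explicitly:

# shared A2 partition: no "#PBS -q" line, knlroute dispatches the job
#PBS -l select=1:ncpus=68:mpiprocs=68

# MARCONI-Fusion KNL nodes: dedicated queue
#PBS -q xfuaknlprod
#PBS -l select=1:ncpus=68:mpiprocs=68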

Each KNL node exposes itself to SLURM as having 68 cores (corresponding to the physical cores of the KNL processor). Jobs should request the entire node (hence, #SBATCH -n 68), and the KNL PBS server is configured to assign the KNL nodes in an exclusive way (even if fewer ncpus are requested). Hyper-threading is enabled, hence you can run up to 272 processes/threads on each assigned node.
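
As an illustration of the hyper-threading capacity (a sketch, not a recommendation: the 4-threads-per-core split and the generic mpirun launcher are assumptions to be adapted to your MPI stack):

#!/bin/bash
#PBS -l select=1:ncpus=68:mpiprocs=68
#PBS -l walltime=0:30:00
#PBS -A <account_no>

# 68 MPI ranks x 4 OpenMP threads = 272, i.e. the hyper-threaded capacity of one node
export OMP_NUM_THREADS=4
mpirun -np 68 PATH_TO_EXECUTABLE > output_file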

The preliminary configuration of the Marconi-A2 partition allowed us to explore different HBM modes (on-package high-bandwidth memory based on the multi-channel dynamic random access memory (MCDRAM) technology) and clustering modes of cache operations. Please refer to the official Intel documentation for a description of the different modes. Following the suggestions of the Intel experts, we finally adopted a single configuration for all the KNL racks serving the knldebug and knlprod (academic) queues, namely:

  • cache/quadrant

The queues serving the Marconi FUSION partition instead allow the use of nodes in flat/quadrant or cache/quadrant mode; please refer to the dedicated document.

The additional queue knltest (which needs to be explicitly requested via the "#PBS -q knltest" directive) is defined on racks which have been configured in the following modes (one rack, i.e. 72 KNL nodes, for each configuration):

  • cache/quadrant
  • flat/quadrant

The knltest queue has restricted access and is used to tune and optimize further configuration parameters and system settings. At the moment it is dedicated to internal tests; please contact superc@cineca.it for additional information.

Please note that the "mode" configuration is subject to change; we will inform users as soon as possible whenever the configuration changes.

Two "custom resources" have been defined at the chunk-level (mcdram and numa) to request nodes in a specific configuration. The resource mcdram can assume the value "flat" or "cache" (valid for Marconi FUSION queues and the knltest), the numa resource can be only quadrant  for all queues. The default values are mcdram=cache and numa=quadrant, hence for ordinary jobs there is no need to specify them.

The maximum memory which can be requested is 90 GB for cache nodes and 105 GB for flat nodes. However, to avoid memory swapping to disk, with the associated performance degradation, we strongly suggest requesting at most 86 GB for cache nodes and 101 GB for flat nodes.
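
For instance, a flat/quadrant node staying within the suggested limit can be requested as follows (a sketch, only meaningful on queues where flat nodes are available, i.e. the Marconi FUSION queues or knltest, the latter having restricted access):

#PBS -q knltest
#PBS -l select=1:ncpus=68:mpiprocs=68:mem=101GB:mcdram=flat:numa=quadrant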

For example, to request a single KNL node in a production queue the following PBS job script can be used:

 

#!/bin/bash
#PBS -l select=1:ncpus=68:mpiprocs=68:mem=86GB:mcdram=cache:numa=quadrant
#PBS -l walltime=0:30:00 
#PBS -A <account_no>
... # Other PBS resource requests
  
PATH_TO_EXECUTABLE > output_file

 

Please note that the following select directive will have the same effect (the default being cache/quadrant):
#PBS -l select=1:ncpus=68:mpiprocs=68:mem=86GB

This will enqueue the job on the knldebug queue. In the present configuration, all jobs requesting up to two KNL nodes and less than 30 minutes of walltime will be queued on knldebug (defined on a pool of reserved nodes). All other jobs will end up in the knlprod queue.
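
For example, a request fitting the knldebug limits (at most two nodes and less than 30 minutes of walltime) could look like this sketch:

#PBS -l select=2:ncpus=68:mpiprocs=68:mem=86GB
#PBS -l walltime=0:20:00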

As already mentioned, if the "env-knl" module is loaded, the command "qstat" will list all the jobs submitted to the A2 partition. Analogously, to obtain the list of jobs submitted by a specific user on the KNL nodes, the usual "-u" flag can be used:

(KNL) [username@r000u07l02 ~]$ qstat -u $USER -w

The "-w" (wide) option is important because it provides a wider and more complete visualization of the status of your job. The importancy is due to the fact that, with regular qstat, the jobid might appear truncated and therefore unusable when using the two commands below. "qstat -w" guarantees a full and useful display of the jobid, among with other details about your job.

To obtain the full status of a job, given a job_id reported by the qstat command, you can use the "-f" flag:

(KNL) [username@r000u07l02 ~]$ qstat -f <job_id>

and to delete a job, you need to type:

(KNL) [username@r000u07l02 ~]$ qdel <job_id>

Remember that, for the previous two commands to work properly, the full job id is required, as reported by the command "qstat -u $USER -w" (e.g., 382113.r064u06s01).
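
For example, using the job id quoted above:

(KNL) [username@r000u07l02 ~]$ qstat -f 382113.r064u06s01
(KNL) [username@r000u07l02 ~]$ qdel 382113.r064u06s01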

By unloading the env-knl module you will restore the default PBS configuration (and the default prompt); all PBS commands will hence refer to the server installed on the Broadwell (A1) partition.

 

Loading the module is the recommended way of submitting jobs to the KNL partition, but an alternative method is provided for those users who frequently use both partitions (even though we suggest starting two separate shells to deal with the two production environments). When logging in to Marconi's login nodes, the default PBS environment refers to the Broadwell partition (hence, all PBS commands will refer to queues and jobs submitted on the Broadwell nodes). If you don't load the env-knl module, you can still query the A2 PBS server (and submit jobs to the A2 nodes) by explicitly providing the server to be queried. Two aliases have been defined to identify the primary and secondary A2 PBS servers, namely knl1 and knl2. Hence, with the command:

[username@r000u07l02 ~] qstat -q

you will see the list of A1 queues, while the command:

[username@r000u07l02 ~] qstat -q @knl1                 

or

[username@r000u07l02 ~] qstat -q @knl2 

will report the list of queues defined on the A2 partition. The PBS service is offered in High-Availability mode, hence it relies on a primary (knl1) and a secondary (knl2) server. Depending on which one is active, the first or the second command needs to be issued. If you load the env-knl module, the configuration will automatically select the active server; if instead you explicitly query the A2 PBS servers, query the primary (knl1) server first and, if the connection is closed, query the secondary one (knl2). The same applies to all the other uses of the "qstat" command. For example, to see all the jobs submitted by a user, the command:

[username@r000u07l02 ~] qstat -u $USER

will provide the list of jobs submitted to the Broadwell partition, while the command:

[username@r000u07l02 ~] qstat -u $USER @knl1 

will provide the list of jobs submitted to the KNL partition.

Analogously, to submit a job to the A1 partition the standard command:

[username@r000u07l02 ~] qsub <submission_script>

will assign the job to Broadwell nodes, while the command:

[username@r000u07l02 ~] qsub -q knlroute@knl1 <submission_script>

will direct the job to the KNL partition (the submission_script must of course correctly refer to the different resources available on Broadwell and KNL nodes). If the secondary PBS server is the active one, all the above commands will need to refer to the @knl2 server.
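
A minimal shell sketch of this primary/secondary fallback, using the knl1 and knl2 aliases described above:

# try the primary A2 server first; on failure, fall back to the secondary one
qstat -q @knl1 2>/dev/null || qstat -q @knl2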

To delete an A2 job from A1:

[username@r000u07l02 ~] qdel <jobid>@knl1

Be sure that there are no spaces between the jobid and "@knl1". Also make sure that you use the full jobid that is returned from "qstat -u $USER -w", because without the -w option the jobid might be truncated.

 

Finally, the symmetric module env-bdw is also available in the "base" profile to load the production environment defined on the Broadwell partition (and to change the prompt accordingly, by putting (BDW) in front of the user-defined or default prompt). Loading this module (apart from the change of the prompt) is completely equivalent to unloading the "env-knl" module. Hence, if you loaded the env-knl module and want to go back to the A1 production environment, you can either load the env-bdw module (which will also change the prompt) or unload the env-knl module (which will restore the original user-defined or default prompt).
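
For example, starting from the KNL environment, both of the following restore the A1 environment:

(KNL) [username@r000u07l02 ~]$ module load env-bdw
(BDW) [username@r000u07l02 ~]$

(KNL) [username@r000u07l02 ~]$ module unload env-knl
[username@r000u07l02 ~]$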

Submitting Batch jobs for A3 partition

In this preliminary phase the A3 partition is exclusive to EUROfusion users. In a second phase it will be opened to the wider scientific community.

The new configuration takes advantage of a PBS variable which is automatically set when loading the "env-skl" module, available in the "base" profile. Loading the module also modifies the (user-defined or default) prompt on the login nodes by putting the (SKL) string in front of it:

[username@r000u07l02 ~]$ module load env-skl
(SKL) [username@r000u07l02 ~]$




Most of the A3-SkyLake nodes are reserved for EuroFusion users only.

sinfo -d lists the following partitions.

skl_fua_prod and skl_fua_dbg are, as the names suggest, reserved for EuroFusion users.

skl_usr_prod and skl_usr_dbg are, instead, open to academic production.

Once the env-skl module is loaded, the prompt reminds users that they are using the production environment serving the A3 partition. In fact, once the module is loaded, the PBS environment is set according to the PBS server defined on the A3 partition, and jobs are submitted directly to the A3 queues instead of passing through the routing queues defined on the A1 partition. In the first period the A3 partition is exclusive to the EUROfusion community, and only two queues are defined: sklsystem (a private queue for internal usage) and xfuasklprod (the production queue for EUROfusion users). The xfuasklprod queue can be specified in the #PBS -q <queue> directive.
Loading this environment directly exposes the login nodes to the SKL production partition; for instance, you will only see the queues defined on the SKL nodes with the "qstat -q" command, and the usual commands "qsub", "qstat" and "qdel" will act on the A3 PBS queues. In the second period the A3 partition will be opened to a wider scientific community and more queues will be added.

Each SKL node exposes itself to PBS as having 48 cores (corresponding to the 48 physical cores of the SKL processor). Jobs should request the entire node (hence, ncpus=48), and the SKL PBS server is configured to assign the SKL nodes in an exclusive way (even if fewer ncpus are requested).

As for the KNL partition, SLURM assigns a SkyLake node in an exclusive way, i.e. the user will be charged for the full node even if only one task per node is requested.

The maximum memory which can be requested is 180 GB and this value guarantees that no memory swapping will occur.
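
As an example, a PBS script requesting a single SKL node on the EUROfusion production queue could look like the following sketch (to be adapted; the walltime and memory values are illustrative):

#!/bin/bash
#PBS -q xfuasklprod
#PBS -l select=1:ncpus=48:mpiprocs=48:mem=180GB
#PBS -l walltime=0:30:00
#PBS -A <account_no>

PATH_TO_EXECUTABLE > output_file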

...