...

Part of this system (MARCONI_Fusion) is reserved for the activity of EUROfusion (https://www.euro-fusion.org/). Details on the MARCONI_Fusion environment are reported in a dedicated document.

 

Access

All the login nodes have an identical environment and can be reached with SSH (Secure Shell) protocol using the "collective" hostname:

...
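As a generic illustration only (both values below are placeholders: use the collective hostname reported above and your own username):

ssh <username>@<collective_hostname>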

The information reported here refers to the general MARCONI partition. The production environment of MARCONI_Fusion is discussed in a separate document.

As usual on systems using SLURM, you can submit a script script.x using the command:

...

The queues serving the Marconi FUSION partition, instead, allow the use of nodes in flat/quadrant or cache/quadrant mode; please refer to the dedicated document.

The maximum memory that can be requested is 90 GB for cache nodes. However, to avoid memory swapping to disk, with the associated performance degradation, we strongly suggest requesting at most 86 GB on cache nodes.
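As a sketch only, assuming the standard SLURM directive syntax, the corresponding memory request in a batch script could look like this (the value is expressed in MB, following the suggestion above):

#SBATCH --mem=86000        # i.e. 86 GB: stays safely below the 90 GB available on a cache node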

...

srun --mpi=pmi2 ./myexecutable

 

This will enqueue the job on the xfuasklprod queue.
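As an illustrative sketch only (partition/queue name, account and resources are placeholders to adapt to your case; only the sbatch/srun commands themselves are standard SLURM), a complete submission could look like:

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48           # placeholder: number of MPI tasks per node
#SBATCH --time=01:00:00                # requested walltime
#SBATCH --partition=<partition_name>   # placeholder: the queue/partition appropriate for your case
#SBATCH --account=<account_no>         # placeholder: your project account

srun --mpi=pmi2 ./myexecutable

Save the script as script.x and submit it with "sbatch script.x".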

As already mentioned, if the "env-skl" module is loaded, the command "qstat" will list all the jobs submitted to the A3 partition. Analogously, to obtain the list of jobs submitted by a specific user on the SKL nodes, the usual "-u" flag provides the information:

(SKL) [username@r000u07l02 ~]$ qstat -u $USER -w

The complete list of the user's jobs is displayed with the full job id. To obtain the full status of a job, given a job_id reported by the qstat command, you can use the "-f" flag:

(SKL) [username@r000u07l02 ~]$ qstat -f <job_id>

and to delete a job, you need to type:

(SKL) [username@r000u07l02 ~]$ qdel <job_id>

For the previous commands to work properly, please note that, unlike on the A1 partition (where the job number alone suffices to identify a job) and analogously to A2, on A3 the full job id is required, as reported by the command "qstat -u $USER -w" (i.e. <job_number>.<PBS A3 server>, for instance 37239.r000u26s04).
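For example, using the full job id format above (the value is illustrative only):

(SKL) [username@r000u07l02 ~]$ qstat -f 37239.r000u26s04

(SKL) [username@r000u07l02 ~]$ qdel 37239.r000u26s04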

By unloading the env-skl module you will restore the default PBS configuration (and the default prompt); all PBS commands will then refer to the server installed on the Broadwell (A1) partition.
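For instance (the prompt change mirrors the one described above):

(SKL) [username@r000u07l02 ~]$ module unload env-skl

[username@r000u07l02 ~]$ qstat -q          # now lists the queues of the Broadwell (A1) server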

Loading the module is the recommended way to submit jobs to the SKL (A3) partition, but an alternative method is provided for users who frequently work with more than one partition (even though we suggest starting two separate shells to deal with the two production environments). When logging in to Marconi's login nodes, the default PBS environment refers to the Broadwell partition (hence, all PBS commands will refer to queues and jobs submitted on the Broadwell nodes). If you don't load the env-skl module, you can still query the A3 PBS server (and submit jobs to the A3 nodes) by explicitly providing the server to be queried. Two aliases have been defined to identify the A3 PBS servers: the primary server skl1 and the secondary server skl2. Hence, with the command:

[username@r000u07l02 ~] qstat -q

you will see the list of A1 queues, while the command:

[username@r000u07l02 ~] qstat -q @skl1                 

or

[username@r000u07l02 ~] qstat -q @skl2 

will report the list of queues defined on the A3 partition.

If you load the env-skl module, the configuration file will automatically select the active server; if instead you want to query the A3 PBS servers explicitly, query the primary server (skl1) first and, if the connection is closed, query the secondary one (skl2). The same applies to all the other uses of the "qstat" command. For example, to see all the jobs submitted by a user, the command:

[username@r000u07l02 ~] qstat -u $USER

will provide the list of jobs submitted to the Broadwell partition, while the command:

[username@r000u07l02 ~] qstat -u $USER @skl1 

will provide the list of jobs submitted to the SKL partition.

To delete an A3 job from the A1 or A2 partition:

[username@r000u07l02 ~] qdel <jobid>@skl1

Be sure that there are no spaces between the jobid and "@skl1". Also make sure that you use the full jobid that is returned from "qstat -u $USER -w", because without the -w option the jobid might be truncated.
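For example, with the full job id mentioned earlier (the value is illustrative only):

[username@r000u07l02 ~] qdel 37239.r000u26s04@skl1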

Finally, you can change partition (from the A3 partition to the A1 or A2 partition) by loading the env-bdw or env-knl module, which also changes the prompt by prepending (BDW) or (KNL) to the user-defined or default prompt. Alternatively, you can unload the env-skl module and thereby restore the original prompt.
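For example (module names as described above; the prompts shown are illustrative):

(SKL) [username@r000u07l02 ~]$ module load env-bdw

(BDW) [username@r000u07l02 ~]$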

Summary

In the following table you can find all the main features and limits imposed on the queues of the shared A1 and A2 partitions. For Marconi-FUSION dedicated queues please refer to the dedicated document.

Queue name | Partition | # cores per job | max walltime | max running jobs per user / max n. of CPUs per user | max memory per job | priority | HBM/clustering mode | notes

debug | A1 | min = 1, max = 144 | 02:00:00 | 4 / 144 | 123 GB/node (suggested value: 118 GB/node) | 40 | - | managed by route; runs on 24 nodes shared with the visualrcm queue

prod | A1 | min = 1, max = 2304 | 24:00:00 | 20 / 2304 | 123 GB/node (suggested value: 118 GB/node) | 50 | - | managed by route

bigprod | A1 | min = 2305, max = 6000 | 24:00:00 | 1 / 6000 | 123 GB/node (suggested value: 118 GB/node) | 60 | - | managed by route

special | A1 | min = 1, max = 36 | 180:00:00 | - | 123 GB/node (suggested value: 118 GB/node) | 100 | - | ask superc@cineca.it; #PBS -q special

serial | A1 | 1 | 04:00:00 | max 12 jobs on this queue, max 4 jobs per user | 1 GB | 30 | - | #PBS -q serial

visualrcm | A1 | min = 1, max = 144 | 03:00:00 | 1 / 144 | 123 GB/node (suggested value: 118 GB/node) | 40 | - | runs on 24 nodes shared with the debug queue

knldebug | A2 | min = 1, max = 136 (2 nodes) | 00:30:00 | 5 / 340 | 90 GB/node (mcdram=cache; suggested value: 86 GB/node) | 40 | mcdram=cache, numa=quadrant | managed by knlroute; runs on 144 dedicated nodes

knlprod | A2 | min > 136, max = 68000 (1000 nodes) | 24:00:00 | 20 / 68000 | 90 GB/node (mcdram=cache; suggested value: 86 GB/node) | 50 | mcdram=cache, numa=quadrant | managed by knlroute

knltest | A2 | min = 1, max = 952 (14 nodes) | 24:00:00 | - | 90 GB/node (mcdram=cache; suggested value: 86 GB/node), 105 GB/node (mcdram=flat; suggested value: 101 GB/node) | 30 | mcdram=<cache/flat>, numa=quadrant | ask superc@cineca.it; #PBS -q knltest; #PBS -W group_list=<account_no>

...