ATTENTION: PLX is out of production as of Jan 21, 2015

 

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources. At present PBS is the scheduling system of GALILEO and PICO.

Running applications using PBS

With PBS you specify the tasks to be executed; the system takes care of running these tasks and returns the results to you. If the available computers are full, then PBS holds your work and runs it when the resources are available.

With PBS you create a batch job which you then submit to PBS. A batch job is a file (a shell script under UNIX) containing the set of commands you want to run. It also contains directives which specify the characteristics (attributes) of the job, and resource requirements (e.g. number of processors and CPU time) that your job needs.

Once you create your PBS job, you can reuse it if you wish. Or, you can modify it for subsequent runs.

For example, here is a simple PBS batch job that runs a user's application, setting a limit of one hour on the maximum wall clock time and requesting 1 node with 1 cpu:

#!/bin/bash
#PBS -A <account_no>               (only for account-based usernames)
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=1 
#
./my_application

PBS provides two user interfaces: a command line interface (CLI) and a graphical user interface (GUI). The CLI lets you type commands at the system prompt. Only the CLI is presented here; if you are interested in the GUI, please refer to the official documentation.

PBS has been configured differently on the various systems reflecting the different system features. Please refer to the specific system page for more detailed information.

 

PBS commands

The main PBS user commands are reported in the table below; please consult the man pages for more information.

qsub     Submit a job
qstat    Display the status of jobs, queues, or the server
qdel     Delete a job

Submit a job:

> qsub [opts] job_script
> qsub -I [opts] -- /bin/bash        (interactive job)

The second command submits a so-called "interactive job": it sets the job's interactive attribute to TRUE. The job is queued and scheduled like any PBS batch job, but when executed, the standard input, output, and error streams of the job are connected to the terminal session in which qsub is running. When the job begins execution, all input to the job is taken from the terminal session. Use CTRL-D to close the session.
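For example, a 30-minute interactive session on a single core could be requested as follows (the resource values here are just placeholders):

> qsub -I -l walltime=0:30:00 -l select=1:ncpus=1 -- /bin/bash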

Displaying Job Status:

> qstat                  (lists all jobs)
> qstat -u $USER         (lists only jobs submitted by you)
> qstat <job_id>         (only the specified job)
> qstat -f <job_id>      (full display of the specified job)

Displaying Queue Status:

> qstat -Q
> qstat -Qf <queuename>      (long format of the specified queue)

Delete a job:

> qdel  <jobID>.node351

More information about these commands is available via the man command

 

The User Environment

There are a number of environment variables provided to the PBS job. Some of them are taken from the user's environment and carried with the job. Others are created by PBS.

All PBS-provided environment variable names start with the characters PBS_. Some are then followed by a capital O (PBS_O_) indicating that the variable is from the job's originating environment (i.e. the user's).

The following short example lists some of the more useful variables, and typical values:

PBS_JOBNAME=jobb
PBS_ENVIRONMENT=PBS_BATCH
PBS_JOBID=6207.node351.plx.cineca.it
PBS_QUEUE=archive

PBS_O_WORKDIR=/gpfs/scratch/usercin/aer0
PBS_O_HOME=/plx/usercin/aer0
PBS_O_QUEUE=parallel
PBS_O_LOGNAME=aer0
PBS_O_SHELL=/bin/bash
PBS_O_HOST=node343.plx.cineca.it
PBS_O_MAIL=/var/spool/mail/aer0
PBS_O_PATH=/cineca/bin:/cineca/sysprod/pbs/default/bin: ...

There are a number of ways that you can use these environment variables to make more efficient use of PBS. For example, PBS_ENVIRONMENT can be used to test whether the script is running under PBS. Another commonly used variable is PBS_O_WORKDIR, which contains the name of the directory from which the user submitted the PBS job.

(NOTE: by default, PBS executes the job script in the $HOME directory!)
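For example, a job script can use these variables as in this minimal sketch:

# run the rest of the script from the submission directory,
# but only when actually executing under PBS
if [ "$PBS_ENVIRONMENT" = "PBS_BATCH" ]; then
   cd "$PBS_O_WORKDIR"
fi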

 

PBS Resources

A job requests resources through the PBS syntax; PBS matches the requested resources with the available resources, according to rules defined by the administrator; when the resources are allocated to the job, the job can be executed.

There are different types of resources, i.e. server-level resources like walltime, and chunk resources like the number of cpus or nodes. Other resources may be added, for example to manage access to software resources, particularly when these are limited and their unavailability would cause jobs to abort when scheduled for execution. Details may be found in the documentation (module help) of the applications concerned.

The syntax of the request depends on the type of resource concerned:

#PBS -l <resource>=<value>                (server level resources, e.g. walltime)
#PBS -l select=[N:]chunk[+[N:]chunk ...]  (chunk resources, e.g. cpus, gpus, mpiprocs)

For example:

#PBS -l walltime=10:00
#PBS -l select=1:ncpus=1

Moreover, resources can be requested in one of two ways, as shown in the example after this list:

1) using PBS directives in the job script

2) using options of the qsub command
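For example, the same walltime request can be expressed either way (job.sh is a placeholder script name):

#PBS -l walltime=10:00                (directive in the job script)

> qsub -l walltime=10:00 job.sh       (option on the command line)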

 

PBS job script

A PBS job script consists of:

  • An optional shell specification
  • PBS directives
  • Tasks -- programs or commands

Once ready, the job must be submitted to PBS:

> qsub [options] <name of script>

The shell to be used by PBS, if given, is defined in the first line of the job script:

#!/bin/bash

PBS directives are used to request resources or set attributes. A directive begins with the default string #PBS. One or more directives can follow the shell definition in the job script.

The tasks can be programs or commands. This is where the user specifies an application to be run.
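Putting the three parts together, a minimal skeleton looks like this (my_application is just a placeholder name):

#!/bin/bash                   (optional shell specification)
#PBS -l walltime=1:00:00      (PBS directives)
#PBS -l select=1:ncpus=1
./my_application              (task: the program or command to run)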

 

PBS directives: num. of processors

The number of cpus required for a serial or parallel MPI/OpenMP/mixed job must be requested with the "select" directive:

#PBS -l select=NN:ncpus=CC:ngpus=GG:mpiprocs=TT[+N1....]

where:

  • NN = number of nodes (max depending on the queue)
  • CC = number of physical cpus per node (max 16)
  • GG = number of physical gpus per node (max 2)
  • TT = number of MPI tasks per node

for example:

#PBS -l select=1:ncpus=1             --> serial job
#PBS -l select=2:ncpus=8:mpiprocs=8  --> MPI job (2 nodes and 8 proc per node)
#PBS -l select=2:ncpus=8:mpiprocs=1  --> mixed job (2 MPI tasks and 8 threads/task)
#PBS -l select=1:ncpus=16:mpiprocs=16:ngpus=2     
                                     --> MPI job with cuda (16 MPI tasks and 2 GPUs)
#PBS -l select=2:ncpus=8:mpiprocs=8+1:ncpus=5:mpiprocs=5 
                                     --> 21 MPI tasks on three nodes: 8+8+5

Please note that hyper-threading is disabled, so it is strongly recommended to set mpiprocs to a value less than or equal to ncpus. If you specify a higher number of mpiprocs, you will overload the physical cores and slow down your own execution.

PBS directives: processing time

Resources such as computing time must be requested in this form:

#PBS -l walltime=<value>

where <value> expresses the actual elapsed (wall-clock) time in the format hh:mm:ss

for example:

#PBS -l walltime=1:00:00 (one hour)
 

PBS directives: memory allocation

The default memory depends on the system you are working with. For better control you can specify the requested memory (up to the maximum memory available on the node; this depends on the platform: at present it is 122GB at most on both GALILEO and PICO) using the mem directive:

#PBS -l select=NN:ncpus=CC:mpiprocs=TT:mem=24GB

Please note: if you request more memory than the "main amount" on the system, the number of "effective cores" could increase. For more information, refer to the "accounting" section.

Other PBS directives

#PBS -N name        --> specify job name
#PBS -q destination --> specify the queue destination. For a list and description of available
                        queues, please refer to the specific cluster description in the guide
#PBS -o path        --> redirects output file
#PBS -e path        --> redirects error file
#PBS -j eo          --> merge std-err and std-out
#PBS -m mail_events --> specify email notification (a=aborted,b=begin,e=end,n=no_mail)
#PBS -M user_list   --> set email destination (email addr)
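As a sketch, several of these directives can be combined in a single job script (the job name and email address below are placeholders):

#!/bin/bash
#PBS -N my_test_job
#PBS -o job.out
#PBS -j eo
#PBS -m ae                    (mail on abort and end)
#PBS -M my.name@example.com
#PBS -l walltime=0:10:00
#PBS -l select=1:ncpus=1
./my_application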

qsub attributes

The attributes can also be set using the qsub command options:

> qsub [-N name]  [-q destination]  [-o path] [-e path] [-j eo]
       [-m mail_events] [-M user_list]                          <name of script>

The resources can also be required using the qsub command options.

> qsub [-l walltime=<value>] [-l select=<value>] [-A <account_no>] <name of script>

The qsub command options override directives.
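For example, if a script contains the directive "#PBS -l walltime=1:00:00", submitting it as follows runs it with a two-hour limit instead (job.sh is a placeholder name):

> qsub -l walltime=2:00:00 job.sh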

 

Examples

Serial job script

For a typical serial job you can take the following script as a template, modifying it depending on your needs.

The script asks for 10 minutes of wallclock time and runs a serial application (R). The input data are in the file "data", the output file is "out.txt"; job.out will contain the std-out and std-err of the script. The working directory is $CINECA_SCRATCH/test/.

The account number (PBS -A) is required only for users defined on a username/account basis. To find out your account number, please use the "saldo -b" command.

#!/bin/bash
#PBS -o job.out
#PBS -j eo
#PBS -l walltime=0:10:00
#PBS -l select=1:ncpus=1
#PBS -A <my_account>
# 
cd $CINECA_SCRATCH/test/ 
module load R 
R < data > out.txt


Serial job script with specific queue request

This script is similar to the previous one but asks for a specific queue. Since "archive" runs on the login nodes, a job running there can access filesystems mounted only on those nodes.

#!/bin/bash
#PBS -o job.out
#PBS -j eo
#PBS -l walltime=0:10:00
#PBS -l select=1:ncpus=1
#PBS -A <my_account>
#PBS -q archive
#
cd $CINECA_SCRATCH/test/
cp /gss/gsswork/DRES_mia/* .
 

OpenMP job script

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=8:mpiprocs=1
#PBS -o job.out
#PBS -e job.err
#PBS -A <my_account>

cd $PBS_O_WORKDIR

module load intel openmpi 
./myprogram
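Depending on the system configuration, the number of OpenMP threads may need to be set explicitly before the program is launched, e.g. by adding the following line to the script:

export OMP_NUM_THREADS=8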

 

MPI Job Scripts

For a typical MPI job you can take the following script as a template, modifying it depending on your needs.

The script asks for 20 tasks and 1 hour of wallclock time, and runs an MPI application (myprogram) compiled with the intel compiler and the openmpi library. The input data are in the file "myinput", the output file is "myoutput", and the working directory is where the job was submitted from.

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l select=2:ncpus=10:mpiprocs=10        # 2 nodes, 10 procs/node = 20 MPI tasks
#PBS -o job.out
#PBS -e job.err
#PBS -A <my_account>

cd $PBS_O_WORKDIR    # this is the dir where the job was submitted from

module load intel openmpi
mpirun ./myprogram < myinput > myoutput 

MPI+OpenMP job script

The script asks for 2 MPI tasks with 8 OpenMP threads each, and 1 hour of wallclock time. The application (myprogram) was compiled with the intel compiler and the openmpi library. The input data are in the file "myinput", the output file is "myoutput", and the working directory is where the job was submitted from.

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l select=2:ncpus=8:mpiprocs=1
#PBS -o job.out
#PBS -e job.err
#PBS -A <my_account>

cd $PBS_O_WORKDIR

module load intel openmpi 
mpirun ./myprogram

Serial job script using 1 GPU

For a typical serial job using 1 GPU you can take the following script as a template, modifying it depending on your needs.

The script asks for 30 minutes of wallclock time and runs a serial application on the GPU resource.

#!/bin/bash 
#PBS -l walltime=30:00 
#PBS -l select=1:ncpus=1:ngpus=1 
#PBS -o job.out 
#PBS -e job.err 
#PBS -A <my_account> 
cd $PBS_O_WORKDIR 

./myCUDAprogram

Chaining multiple jobs

In some cases, you may want to chain multiple jobs together, so that the output of one run can be used as input of the next run. This is typical when you perform Molecular Dynamics simulations and want to obtain a long trajectory from multiple simulation runs.

In order to exploit this feature you need to submit the job with the following syntax:

> qsub -W depend=afterok:JOBID.node351.plx.cineca.it job.sh

where JOBID is the job id (e.g. 204375) of the job you want to concatenate. There are multiple choices for the dependency; please refer to the PBS manual (PBS Professional User Guide 10.0) on page 164.
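Since qsub prints the full identifier of the job it submits, the dependency can also be set up without typing the id by hand; a minimal sketch (job1.sh and job2.sh are placeholder names):

> JOBID=$(qsub job1.sh)
> qsub -W depend=afterok:$JOBID job2.sh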

Array jobs

This is an efficient way to perform multiple similar runs, either serial or parallel, by submitting a single job. On PLX, the maximum allowed number of runs in an array job is set to 20.

In the following examples, 20 serial simulations that differ only in their input are submitted through the "job.cmd" script.

#!/bin/bash
#PBS -l select=1:ncpus=1:mpiprocs=1
#PBS -l walltime=24:00:00
#PBS -A <account_name>
#PBS -N job_array
#PBS -J 1-20
#PBS -r y

cd $PBS_O_WORKDIR
cd $PBS_ARRAY_INDEX
../exe input$PBS_ARRAY_INDEX.txt

In the previous example, 20 sub-jobs, each with 1 cpu and 1 GB of memory, are generated according to the total number requested through the -J PBS option. The $PBS_ARRAY_INDEX environment variable takes on each value in the range specified by the -J option. This variable can be used to identify the directories, created in advance for each input, and the input files needed by each sub-job.

#!/bin/bash
#PBS -l select=1:ncpus=1:mpiprocs=1
#PBS -l walltime=24:00:00
#PBS -N job_array
#PBS -A <account_name>
#PBS -J 2-21
#PBS -r y
# Comment: in this example the first value of the range is 2 and not 1
# because the first line of the list resulting from the "ls -1" command
# is a configuration file valid for all simulations. The elements of the
# range can take on any value, but there cannot be more than 20 of them.

cd $PBS_O_WORKDIR
filename=`ls -1 inputs/ | tail -n +${PBS_ARRAY_INDEX} | head -1`
mkdir outputs/$PBS_ARRAY_INDEX
cd outputs/$PBS_ARRAY_INDEX
cp ../../inputs/configuration_file .
cp ../../inputs/$filename .
../../exe $filename

The previous example shows how you can use arbitrary input file names: you can put them into an "inputs" directory and select the file to process in each simulation by using the "tail" and "head" commands together with the value of the environment variable $PBS_ARRAY_INDEX. Each run creates an output directory (identified by the value of the array variable) where both the output and a copy of the corresponding input are written.

Please note that you always need to add the PBS directive that declares the job to be re-runnable; this is mandatory in order to submit an array job:

#PBS -r y

If you don't add the previous directive, the default (#PBS -r n) will be assumed and the array job will not be accepted by the scheduler.

In order to submit a job array:

qsub job_array.sh

In order to display the status of your job array:

qstat <job_array_id>

In order to display the status of all sub-jobs of a job array:

qstat -t <job_array id>

Further documentation

Man pages are available on the PLX system:

> man pbs
> man qsub
> man qstat
> man qdel
> man ...

The "PBS Professional User Guide 10.0" is available in the "docs/documents/IBM-PLX" folder of the hpc.cineca.it portal.



 
