EURORA was taken out of production in August 2014.

In this page

  • Running applications using PBS
    • PBS commands
    • The User Environment
  • PBS Resources
    • PBS job script
    • qsub attributes
  • Examples
  • Chaining multiple jobs
  • Further documentation

Portable Batch System (or simply PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources.

Running applications using PBS

With PBS you specify the tasks to be executed; the system takes care of running these tasks and returns the results to you. If the available computers are busy, PBS holds your job and runs it when the resources become available.

With PBS you create a batch job which you then submit to PBS. A batch job is a file (a shell script under UNIX) containing the set of commands you want to run. It also contains directives which specify the characteristics (attributes) of the job, and resource requirements (e.g. number of processors and CPU time) that your job needs.

Once you create your PBS job, you can reuse it if you wish. Or, you can modify it for subsequent runs.

For example, here is a simple PBS batch job that runs a user's application, setting a limit of one hour on the maximum wall clock time, requesting 1 node with 1 CPU, and sending the job to the "parallel" queue:

#!/bin/bash
# the -A directive is required only for account-based usernames
#PBS -A <account_no>
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=1 
#PBS -q parallel
#
./my_application

PBS provides two user interfaces: a command line interface (CLI) and a graphical user interface (GUI). The CLI lets you type commands at the system prompt. Only the CLI is presented here; if you are interested in the GUI, please refer to the official documentation.

The available queues on Eurora depend on the time slot (primetime, from 10 am to 6 pm on weekdays; non-primetime, from 6 pm to 10 am on weekdays and from Friday 6 pm to Monday 10 am) and are defined as follows:

 job type     Max nodes   Max CPUs/      max wall time   time slot       Usage
                          GPUs or MICs
 ------------------------------------------------------------------------------------------
 debug             2         32/4           0:30:00      always          Debugging
 p_devel           2         32/4           1:00:00      primetime       Developing
 parallel         32        512/64          4:00:00      always          Parallel production
 np_longpar        9        144/18          8:00:00      non-primetime   Long parallel production

With the exception of the p_devel queue, you do not need to specify the job type: according to the resources (number of CPUs/GPUs/MICs) and the walltime requested, the scheduler automatically routes your job to the appropriate queue. By contrast, since the p_devel queue is defined on specific nodes with a different configuration (tuned for the optimal use of profiling and performance analysis tools), you need to request it explicitly via the -q flag of qsub or the directive #PBS -q <queue_name>, as shown below.
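
For example, either of the following requests the p_devel queue (job.sh is a hypothetical script name). In the job script:

#PBS -q p_devel

or, equivalently, on the command line:

> qsub -q p_devel job.sh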

PBS commands

The main PBS user commands are reported in the table below; please consult the man pages for more information.

qsub    Submit a job
qstat   Show the status of jobs, queues, and the server
qdel    Delete a job

Submit a job:

> qsub [opts] job_script
> qsub -I [opts] -- /bin/bash  (interactive job)

The second command submits a so-called "interactive job": it sets the job's interactive attribute to TRUE. The job is queued and scheduled as any PBS batch job, but when executed, the standard input, output, and error streams of the job are connected to the terminal session in which qsub is running. When the job begins execution, all input to the job is taken from the terminal session. Use CTRL-D to close the session.
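
For example, a sketch of a request for an interactive session on one core for 30 minutes (the resource values are illustrative):

> qsub -I -l select=1:ncpus=1 -l walltime=0:30:00 -- /bin/bash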

Displaying Job Status:

> qstat             (only jobs submitted by the user, same as qstat -u $USER)
> qstat    <job_id> (only the specified job)
> qstat -f <job_id> (full display of the specified job)

Displaying Queue Status:

> qstat -Q 
> qstat -Qf  <queuename>     (Long format of the specified  queue)

Delete a job:

> qdel  <jobID>.node351

More information about these commands is available with the man command.

The User Environment

There are a number of environment variables provided to the PBS job. Some of them are taken from the user's environment and carried with the job. Others are created by PBS.

All PBS-provided environment variable names start with the characters PBS_. Some are then followed by a capital O (PBS_O_) indicating that the variable is from the job's originating environment (i.e. the user's).

The following short example lists some of the more useful variables, and typical values:

PBS_JOBNAME=jobb
PBS_ENVIRONMENT=PBS_BATCH
PBS_JOBID=6207.node129
PBS_QUEUE=parallel

PBS_O_WORKDIR=/gpfs/scratch/usercin/aer0
PBS_O_HOME=/eurora/home/usercin/aer0
PBS_O_QUEUE=parallel
PBS_O_LOGNAME=aer0
PBS_O_SHELL=/bin/bash
PBS_O_HOST=node129
PBS_O_MAIL=/var/spool/mail/aer0
PBS_O_PATH=/cineca/bin:/cineca/sysprod/pbs/default/bin: ...

There are a number of ways that you can use these environment variables to make more efficient use of PBS. For example, PBS_ENVIRONMENT can be used to test whether a script is running under PBS. Another commonly used variable is PBS_O_WORKDIR, which contains the name of the directory from which the user submitted the PBS job.

(NOTE: PBS executes the job script in the $HOME directory by default!)
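
For example, a minimal sketch that combines the two variables above to detect batch execution and move from $HOME to the submission directory:

#!/bin/bash
# if running as a PBS batch job, leave $HOME and enter
# the directory the job was submitted from
if [ "$PBS_ENVIRONMENT" = "PBS_BATCH" ]; then
    cd "$PBS_O_WORKDIR"
fi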

PBS Resources

A job requests resources through the PBS syntax; PBS matches the requested resources with the available resources, according to rules defined by the administrator; when resources are allocated to the job, the job can be executed.

There are different types of resources, i.e. server-level resources like walltime, and chunk resources like the number of CPUs or nodes. Other resources may be added, for example to manage access to software resources, particularly when those resources are limited and their unavailability leads to jobs being aborted when they are scheduled for execution. Details may be found in the documentation (module help) of the applications concerned.

The syntax of the request depends on the resource type:

#PBS -l <resource>=<value>                (server-level resources, e.g. walltime)
#PBS -l select=[N:]chunk[+[N:]chunk ...]  (chunk resources, e.g. ncpus, ngpus, mpiprocs)

 

For example:

#PBS -l walltime=10:00
#PBS -l select=1:ncpus=1

Moreover, resources can be requested in one of two equivalent ways:

1) using PBS directives in the job script

2) using options of the qsub command
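
For example, the following two requests are equivalent (job.sh is a hypothetical script name). Inside the job script:

#PBS -l walltime=1:00:00

or, equivalently, on the command line:

> qsub -l walltime=1:00:00 job.sh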

PBS job script

A PBS job script consists of:

  • An optional shell specification
  • PBS directives
  • Tasks -- programs or commands

Once ready, the job must be submitted to PBS:

> qsub [options] <name of script>

The shell to be used by PBS, if given, is defined in the first line of the job script:

#!/bin/bash

PBS directives are used to request resources or set attributes. A directive begins with the default string #PBS. One or more directives can follow the shell definition in the job script.

The tasks can be programs or commands. This is where the user specifies an application to be run.

PBS directives: number and type of processors

The number and the type (at 2 GHz or 3 GHz) of CPUs required for a serial or parallel MPI/OpenMP/mixed job must be requested with the "select" directive:

#PBS -l select=NN:ncpus=CC:cpuspeed=HHGHz:ngpus=GG:mpiprocs=TT[+N1....]

where:

  • NN = number of nodes (max depending on the queue)
  • CC = number of physical cpus per node (max 16)
  • HH = CPU clock rate
  • GG = number of physical gpus per node (max 2)
  • TT = number of MPI tasks per node

for example:

#PBS -l select=1:ncpus=1             --> serial job
#PBS -l select=2:ncpus=8:mpiprocs=8  --> MPI job (2 nodes and 8 proc per node)
#PBS -l select=2:ncpus=8:mpiprocs=1  --> mixed job (2 MPI tasks and 8 threads/task)
#PBS -l select=1:ncpus=16:mpiprocs=16:ngpus=2     
                                     --> MPI job with cuda (16 MPI tasks and 2 GPUs)
#PBS -l select=2:ncpus=16:mpiprocs=16+1:ncpus=5:mpiprocs=5 
                                     --> 37 MPI tasks on three nodes: 16+16+5

Please note that hyper-threading is disabled, so it is strongly recommended to set a value for mpiprocs less than or equal to ncpus. If you specify a higher number of mpiprocs, you will overload the physical cores and slow down your own execution.

PBS directives: processing time

Resources such as the computing time must be requested in this form:

#PBS -l walltime=<value>

where <value> expresses the actual elapsed (wall-clock) time in the format hh:mm:ss,

for example:

#PBS -l walltime=1:00:00 (one hour)

PBS directives: memory allocation

The default memory is 1 GB per node (for the classes debug, p_devel, parallel and np_longpar).

The user can specify the requested memory up to 14 GB, on 59 nodes:

#PBS -l select=NN:ncpus=CC:mpiprocs=TT:mem=14GB

There are 5 nodes with 32 GB of memory, which you can request by specifying a value of mem larger than 16 GB, up to 30 GB.
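
For example, a request that would be routed to one of the 32 GB nodes (the resource values are illustrative):

#PBS -l select=1:ncpus=16:mpiprocs=16:mem=30GB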

Other PBS directives

#PBS -N name        --> specify job name
#PBS -q destination --> specify the queue destination. For a list and description of available
                        queues, please refer to the specific cluster description in the guide
#PBS -o path        --> redirects output file
#PBS -e path        --> redirects error file
#PBS -j eo          --> merge std-err and std-out
#PBS -m mail_events --> specify email notification (a=aborted,b=begin,e=end,n=no_mail)
#PBS -M user_list   --> set email destination (email addr)

qsub attributes

The attributes can also be set using the qsub command options:

> qsub [-N name]  [-q destination]  [-o path] [-e path] [-j eo]
       [-m mail_events] [-M user_list]                          <name of script>

The resources can also be requested using the qsub command options:

> qsub [-l walltime=<value>] [-l select=<value>] [-A <account_no>] <name of script>

The qsub command options override the corresponding script directives.
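
For example, if the script contains the directive #PBS -l walltime=1:00:00, the following submission runs it with a two-hour limit instead (job.sh is a hypothetical script name):

> qsub -l walltime=2:00:00 job.sh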

 

Examples

Serial job script

For a typical serial job you can take the following script as a template, modifying it depending on your needs.

The script asks for 10 minutes of wallclock time and runs a serial application (R). The input data are in the file "data", the output file is "out.txt"; job.out will contain the std-out and std-err of the script. The working directory is $CINECA_SCRATCH/test/.

The account number (PBS -A) is required only for users defined on a username/account basis. To find out your account number, please use the "saldo -b" command.

#!/bin/bash
#PBS -o job.out
#PBS -j eo
#PBS -l walltime=0:10:00
#PBS -l select=1:ncpus=1
#PBS -q debug
#PBS -A <my_account>
# 
cd $CINECA_SCRATCH/test/ 
module load R 
R < data > out.txt

OpenMP job script

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l select=1:ncpus=16:mpiprocs=1
#PBS -o job.out
#PBS -e job.err
#PBS -q parallel
#PBS -A <my_account>

cd $PBS_O_WORKDIR

module load intel openmpi 
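# illustrative addition (not part of the original example): set the
# OpenMP thread count to match the 16 cores requested above
export OMP_NUM_THREADS=16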
./myprogram

 

MPI Job Scripts

For a typical MPI job you can take the following script as a template, modifying it depending on your needs.

The script asks for 32 tasks and 1 hour of wallclock time, and runs an MPI application (myprogram) compiled with the intel compiler and the openmpi library. The input data are in the file "myinput", the output file is "myoutput", and the working directory is the one the job was submitted from.

#!/bin/bash
#PBS -l walltime=1:00:00
# 2 nodes, 16 procs/node = 32 MPI tasks
#PBS -l select=2:ncpus=16:mpiprocs=16
#PBS -o job.out
#PBS -e job.err
#PBS -q parallel
#PBS -A <my_account>

cd $PBS_O_WORKDIR   # this is the dir where the job was submitted from

module load intel openmpi
mpirun ./myprogram < myinput > myoutput 

MPI+OpenMP job script

The script asks for 2 MPI tasks and 16 OpenMP threads per node, and 1 hour of wallclock time. The application (myprogram) was compiled with the intel compiler and the openmpi library. The input data are in the file "myinput", the output file is "myoutput", and the working directory is the one the job was submitted from.

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l select=2:ncpus=16:mpiprocs=1
#PBS -o job.out
#PBS -e job.err
#PBS -q parallel
#PBS -A <my_account>

cd $PBS_O_WORKDIR

module load intel openmpi 
mpirun ./myprogram

Serial job script using 1 GPU

For a typical serial job using 1 GPU you can take the following script as a template, modifying it depending on your needs.

The script asks for 30 minutes wallclock time and runs a serial application on the GPU resource.

#!/bin/bash 
#PBS -l walltime=30:00 
#PBS -l select=1:ncpus=1:ngpus=1 
#PBS -o job.out 
#PBS -e job.err 
#PBS -q debug 
#PBS -A <my_account> 
cd $PBS_O_WORKDIR 

./myCUDAprogram

Chaining multiple jobs

In some cases, you may want to chain multiple jobs together, so that the output of one run can be used as the input of the next. This is typical when you perform Molecular Dynamics simulations and want to obtain a long trajectory from multiple simulation runs.

In order to exploit this feature you need to submit the job with the following syntax:

> qsub -W depend=afterok:JOBID.node129 job.sh

where JOBID is the job id (e.g. 2688) of the job you want to chain from. There are multiple choices for the dependency; please refer to the PBS manual (PBS Professional User Guide 10.0), page 164.
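
For example, a minimal sketch that submits a chain of three dependent runs (job.sh is a hypothetical script name; qsub prints the full job id, e.g. 2688.node129, on standard output):

#!/bin/bash
# submit the first job and capture its id
JOBID=$(qsub job.sh)
# each subsequent job starts only if the previous one ends successfully
for i in 2 3; do
    JOBID=$(qsub -W depend=afterok:$JOBID job.sh)
done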

Further documentation

Man pages are available on the system:

> man pbs
> man qsub
> man qstat
> man qdel
> man ...

The "PBS Professional User Guide 10.0" is available in the "Other Documents" folder of the hpc.cineca.it portal.
