The NAF uses SGE (Sun Grid Engine) as its batch system. It is similar to LSF or PBS but takes a different approach to defining resources: CPU time and memory, for example, are resources that a job requests.

For more details see NAF:Batch System.

Submitting Jobs

Jobs are submitted using the qsub command. Resources that are important for Athena jobs, such as CPU time (h_cpu), virtual memory (h_vmem) and file size (h_fsize), are set using the -l option.

Put the option either on the command line or into the job script using the special comment #$. For more information about resources see the general documentation or e.g. man complex. For a list of available resources per queue use qstat -F. For additional options see man qsub.
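As a minimal sketch of the command-line form, the helper below assembles a qsub call with one -l option per resource and prints it instead of submitting, so the line can be checked first. build_qsub_cmd is a hypothetical name and not part of SGE; the script name myjob.sh and the resource values are only examples.

```shell
#!/bin/sh
# build_qsub_cmd is a hypothetical helper, not an SGE command.
# It prints the qsub command line for a job script and a list of
# resource requests, without submitting anything.
build_qsub_cmd() {
    script="$1"
    shift
    cmd="qsub"
    for res in "$@"; do
        cmd="$cmd -l $res"   # one -l option per requested resource
    done
    printf '%s %s\n' "$cmd" "$script"
}

# Example: request 12 hours of CPU time and 1 GB of virtual memory.
build_qsub_cmd myjob.sh h_cpu=12:00:00 h_vmem=1G
```

The call prints qsub -l h_cpu=12:00:00 -l h_vmem=1G myjob.sh, which can then be run as is on a NAF login node.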

Memory consumption for standard ATLAS jobs:

Example script (for athena):

#! /bin/zsh

echo -e "\nJob started at "`date`" on "`hostname --fqdn`"\n"

# (only accept jobs with correct resources)
#$ -w e
#
# (stderr and stdout are merged together to stdout)
#$ -j y
#
# (send mail on job's end and abort)
#$ -m ae -M max.mustermann@desy.de
#
# (put log files into current working directory)
#$ -cwd
#
# (use ATLAS project)
#$ -P atlas
#
# (choose memory and disc usage)
#$ -l h_vmem=1G
#$ -l h_stack=10M
#$ -l h_fsize=10G
#
# (choose time)
#$ -l h_cpu=12:00:00

# change to scratch directory
cd $TMPDIR

# if you need input files, you might consider copying them to the local working directory
# cp /scratch/input/bigfile .

# setup athena
echo setting up athena
asetup 15.6.12 --testarea=$HOME/atlas/testarea/15.6.12
ini atlas
atlas_test_config.sh

# run athena
athena.py AthExHelloWorld/AthExHelloWorld_jobOptions.py

# save your output in your afs directory
# cp output /scratch/output/bigfile

echo -e "\nJob stopped at "`date`" on "`hostname --fqdn`"\n"

exit 0

Please note that the #$ syntax is used to specify options to the batch system. Lines starting with #$ are read at job submission time and interpreted as options to the qsub command.
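To check which options a job script hands to the batch system, the #$ lines can be extracted before submitting. The following is a minimal sketch: list_sge_directives is a hypothetical helper, not an SGE command, and the small demo script is made up for illustration.

```shell
#!/bin/sh
# list_sge_directives is a hypothetical helper, not an SGE command.
# It prints the option part of every line that starts with "#$",
# i.e. the lines the batch system reads as qsub options.
list_sge_directives() {
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Demonstration on a small throw-away script.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#! /bin/zsh
# (choose time)
#$ -l h_cpu=12:00:00
#$ -cwd
echo hello
EOF
list_sge_directives "$demo"
rm -f "$demo"
```

Ordinary comments such as "# (choose time)" and the interpreter line are skipped, because only lines beginning with the two characters #$ are treated as options.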

Monitoring Jobs

Job status can be monitored with the qstat or the qmon command. The job metadata (requested resources, used resources, job status) of all finished jobs are recorded in a database, which can be queried using a web interface.

Besides monitoring, qmon also offers actions such as suspending, holding and resubmitting jobs in a graphical user interface.
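For a quick overview on the command line, the qstat listing can be condensed into a count of jobs per state. This is only a sketch under stated assumptions: summarise_states is a hypothetical helper, and it assumes the default qstat listing with two header lines followed by one line per job with the job state in the fifth column.

```shell
#!/bin/sh
# summarise_states is a hypothetical helper, not an SGE command.
# Assumption: input is the default qstat listing, i.e. two header lines
# followed by one line per job with the job state in column 5.
summarise_states() {
    awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Only call qstat where it is actually installed.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$USER" | summarise_states
fi
```

Each output line gives a state code (for example r for running or qw for waiting in the queue) followed by the number of jobs in that state.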

Good Practice

Ganga

Ganga can submit jobs directly to the SGE batch system. Support for Athena and AthenaMC keeps improving over time, but there are still a few limitations. In practice you need to set an SGE() object as the backend:

j.backend = SGE()
j.backend.extraopts = '-l h_cpu=0:10:00 -l h_vmem=0.5G -l site=zn'

This example also shows how to change the requested resources for this job.

Documentation

The SGE man pages are installed at the NAF, e.g. man qsub. For further information see NAF: Batch System.

ATLAS: WorkBook/NAF/Batch System (last edited 2010-09-06 20:53:46 by WolfgangEhrenfeld)