#acl EditorsGroup:read,write All:read

<>

----

At the NAF, SGE is used as the batch system. It is similar to LSF or PBS but takes a different approach to defining resources: CPU time and memory are treated as resources. For more details see [[http://naf.desy.de/general_naf_docu/working_with_the_local_batch_system|NAF:Batch System]].

== Submitting Jobs ==

Jobs are submitted using the `qsub` command. Resources that are important for Athena jobs are set using the `-l` option:

 * time: `-l h_cpu=12:00:00` (hard limit for CPU time: 12 hours)
 * virtual memory: `-l h_vmem=1G` (hard limit for virtual memory: 1 GB)
 * stack for threads (needed for Athena): `-l h_stack=10M` (hard limit for stack size: 10 MB)
 * scratch disk space on the worker node: `-l h_fsize=10G` (hard limit for space: 10 GB)
 * hostname: `-l hostname=tcx040` (run the job only on the host with hostname tcx040)
 * site (hh|zn): `-l site=hh` (run jobs preferentially at hh, useful if you process data from /scratch)

Put the options either on the command line or into the job script using the special comment `#$`. For more information about resources see the [[http://naf.desy.de/general_naf_docu/working_with_the_local_batch_system|general documentation]] or e.g. `man complex`; for a list of available resources per queue use `qstat -F`. For additional options see `man qsub`.

Memory consumption for standard ATLAS jobs:

 * generation: 500M
 * simulation: 500M
 * reconstruction: 2G+
 * ntuple analysis: 500M

Example script (for Athena):

{{{
#! /bin/zsh

echo -e "\nJob started at "`date`" on "`hostname --fqdn`"\n"

# (only accept jobs with correct resources)
#$ -w e
#
# (stderr and stdout are merged together to stdout)
#$ -j y
#
# (send mail on job's end and abort)
#$ -m ae -M max.mustermann@desy.de
#
# (put log files into current working directory)
#$ -cwd
#
# (use ATLAS project)
#$ -P atlas
#
# (choose memory and disc usage)
#$ -l h_vmem=1G
#$ -l h_stack=10M
#$ -l h_fsize=10G
#
# (choose time)
#$ -l h_cpu=12:00:00

# change to scratch directory
cd $TMPDIR

# if you need input files, you might consider copying them to the local working directory
# cp /scratch/input/bigfile .

# setup athena
echo setting up athena
asetup 15.6.12 --testarea=$HOME/atlas/testarea/15.6.12
ini atlas
atlas_test_config.sh

# run athena
athena.py AthExHelloWorld/AthExHelloWorld_jobOptions.py

# save your output, e.g. to /scratch or your AFS directory
# cp output /scratch/output/bigfile

echo -e "\nJob stopped at "`date`" on "`hostname --fqdn`"\n"

exit 0
}}}

Please note that the `#$` syntax is used to specify options to the batch system. Lines starting with `#$` are read at job submission time and are interpreted as options to the `qsub` command.

== Monitoring Jobs ==

Job status can be monitored with the `qstat` or the `qmon` command. The job metadata (requested resources, used resources, job status) of all finished jobs is recorded in a database, which can be queried using a [[https://www-zeuthen.desy.de/dv-bin/batchssl/stat/naf/jobs/|web interface]]. Besides monitoring, `qmon` also offers a graphical user interface for things like suspending, holding and resubmitting jobs.

== Good Practice ==

 * If you write out big files, write them to the local disk first and copy them to long-term storage at the end of the job (see the sketch below).
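A minimal sketch of this pattern inside a job script, following the example script above; the output file name `ntuple.root` and the target directory `/scratch/$USER/output/` are placeholders for your own output and storage location:

{{{
# work on the local disk of the worker node
cd $TMPDIR

# ... run the job here, writing its (big) output locally, e.g. ntuple.root ...

# only at the end of the job, copy the output to long-term storage
cp ntuple.root /scratch/$USER/output/
}}}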
== Ganga ==

Ganga can submit jobs directly to the SGE batch system. Support for Athena and AthenaMC is improving over time, but there are still a few limitations.

In practice you need to add an `SGE()` object as backend:

{{{
j.backend = SGE()
j.backend.extraopts = '-l h_cpu=0:10:00 -l h_vmem=0.5G -l site=zn'
}}}

This example also shows how to change the requested resources for this job.

== Documentation ==

The SGE man pages are installed at the NAF, e.g. `man qsub`. For further information see [[http://naf.desy.de/general_naf_docu/working_with_the_local_batch_system|NAF: Batch System]].
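For example, to look up the options and resource definitions used on this page directly from a login node:

{{{
man qsub      # submission options, including -l and the #$ directives
man complex   # definitions of the resource attributes (h_cpu, h_vmem, ...)
qstat -F      # list the available resources per queue
}}}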