ATLAS-D Tutorial 2009: TAGs

The introductory talk can be found in Indico or in tags.pdf.

For this tutorial we will work with the egamma stream of the TopMix sample (version 5) produced by Richard Hawkings:

The objective is to make a mixed event sample (initially of about 200 pb^-1 at 10 TeV), representing the events selected (including background) for typical ttbar and single top selections (semileptonic and dileptonic selections). The mixing will be done at the AOD level, taking events from all the different signal and background samples, renumbering the events and removing the Monte Carlo truth. The resulting AOD events will then be written out to multiple AOD streams (electron, muon etc) as for real data. This sample should then be as similar to real data as possible, and provide a good basis for exercising data-driven analysis in the last months before real data is available.

For more details see the ATLAS TopMixing TWiki.

The egamma stream user.RichardHawkings.0108175.topmix_Egamma.AOD.v5 is replicated to many sites, especially DESY-ZN_PHYS-TOP, which is local to the NAF.

The TAG dataset is copied to /afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5.

Inspecting TAG files with ROOT

In this part of the exercise we will use plain ROOT to look at the content of one TAG file from the TopMix egamma stream. Start from a new shell and set up ROOT from the ATLAS kit:

source ~/cmthome/setup.sh -tag=15.3.1

We will create a few files and collect them in the $TestArea/tags directory:

mkdir -p $TestArea/tags
cd $TestArea/tags

ln -s /afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root .

Now, we can open the first TAG ROOT file and look at some variables using the TBrowser:

root user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root

Within ROOT do the following:

root [0] TBrowser b

Note that this TAG file was produced with release 14, so the TAG information is stored in the CollectionTree. Now inspect the TAG information:

Here is a selection of plots:
tagfile.png

The TAG file contains enough information to reconstruct the full four-vector of the candidates. For example, you can reconstruct the electron candidates and calculate the invariant mass of electron pairs. This could look like:
Zee_peak.png
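The invariant mass calculation can be sketched in plain Python. This is a minimal sketch, not the code used for the plot above: it assumes massless electrons and takes the pT, eta and phi values as they would be read from the LooseElectronPt/Eta/Phi attributes. The function name ee_invariant_mass is hypothetical. Since the TAG stores the electron pT multiplied by the charge, the absolute value is taken first.

```python
import math

def ee_invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles, in the same units as pT."""
    # The TAG pT is multiplied by the charge, so use the absolute value
    pt1, pt2 = abs(pt1), abs(pt2)
    # m^2 = 2 pT1 pT2 (cosh(delta eta) - cos(delta phi)) for massless particles
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

# Illustrative input: two back-to-back central electrons of 45 GeV each,
# with opposite charge (values in MeV, charge-signed pT)
mass = ee_invariant_mass(45000.0, 0.0, 0.0, -45000.0, 0.0, math.pi)
print("m_ee = %.1f GeV" % (mass / 1000.0))
```

An opposite-charge requirement can be added by checking that the product of the two signed pT values is negative.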

If you just want to check the names of the TAG variables within a file, you can either use ROOT

root [1] CollectionTree->Print()

or the CollListAttrib command from the ATLAS software:

CollListAttrib -src user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001 RootCollection

The argument after -src is the TAG ROOT filename without the .root extension.

The output looks like

--------------------------------------------------------------
Collection list:
NAME: user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001   TYPE: RootCollection   NFRAG: 1
--------------------------------------------------------------
Number of Tokens is: 3
Tokens are:
 NAME: StreamAOD_ref      TYPE: Token      INFO:
 NAME: Stream1_ref        TYPE: Token      INFO:
 NAME: StreamESD_ref      TYPE: Token      INFO:
--------------------------------------------------------------
Number of Attributes is: 261
Attributes are:
 NAME: EventNumber                  TYPE: unsigned int        INFO:
 NAME: LumiBlockN                   TYPE: unsigned int        INFO:
 NAME: Luminosity                   TYPE: float               INFO:
 NAME: NTrk                         TYPE: unsigned int        INFO:
 NAME: Nvx                          TYPE: unsigned int        INFO:
 NAME: RandomNumber                 TYPE: float               INFO:
 NAME: RunNumber                    TYPE: unsigned int        INFO:
 NAME: Stream                       TYPE: unsigned int        INFO:
 NAME: TimeStamp                    TYPE: unsigned int        INFO:
 NAME: VtxX                         TYPE: float               INFO:
 NAME: VtxY                         TYPE: float               INFO:
 NAME: VtxZ                         TYPE: float               INFO:
 NAME: isCalibration                TYPE: bool                INFO:
 NAME: isRealData                   TYPE: bool                INFO:
 NAME: isSimulation                 TYPE: bool                INFO:
 NAME: isTestBeam                   TYPE: bool                INFO:
 NAME: LooseElectronEta1            TYPE: float               INFO:
 NAME: LooseElectronEta2            TYPE: float               INFO:
 NAME: LooseElectronEta3            TYPE: float               INFO:
 NAME: LooseElectronEta4            TYPE: float               INFO:
 NAME: LooseElectronPhi1            TYPE: float               INFO:
 NAME: LooseElectronPhi2            TYPE: float               INFO:
 NAME: LooseElectronPhi3            TYPE: float               INFO:
 NAME: LooseElectronPhi4            TYPE: float               INFO:
 NAME: LooseElectronPt1             TYPE: float               INFO:
 NAME: LooseElectronPt2             TYPE: float               INFO:
 NAME: LooseElectronPt3             TYPE: float               INFO:
 NAME: LooseElectronPt4             TYPE: float               INFO:
 NAME: LooseElectronTightness1      TYPE: unsigned int        INFO:
 NAME: LooseElectronTightness2      TYPE: unsigned int        INFO:
 NAME: LooseElectronTightness3      TYPE: unsigned int        INFO:
 NAME: LooseElectronTightness4      TYPE: unsigned int        INFO:
 NAME: NLooseElectron               TYPE: unsigned int        INFO:

....

 NAME: SMWord                       TYPE: unsigned int        INFO:
 NAME: SUSYWord                     TYPE: unsigned int        INFO:
 NAME: TauIdWord                    TYPE: unsigned int        INFO:
 NAME: TopWord                      TYPE: unsigned int        INFO:
 NAME: DatasetID                    TYPE: int                 INFO:
 NAME: Fraction                     TYPE: float               INFO:
---------------------------------------------------------

Number of collections scanned is: 1
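If you want to search the attribute list, for example for all electron variables, the text output can be parsed with a few lines of Python. This is a minimal sketch that assumes the "NAME: ... TYPE: ... INFO:" line layout shown above; the helper name parse_attributes is hypothetical, and the sample text is a short excerpt of the output above.

```python
import re

# A few lines copied from the CollListAttrib output shown above
SAMPLE_OUTPUT = """\
NAME: user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001   TYPE: RootCollection   NFRAG: 1
 NAME: StreamAOD_ref      TYPE: Token      INFO:
 NAME: EventNumber                  TYPE: unsigned int        INFO:
 NAME: LooseElectronPt1             TYPE: float               INFO:
 NAME: NLooseElectron               TYPE: unsigned int        INFO:
"""

# Matches token and attribute lines; the collection header line has no INFO field
LINE_RE = re.compile(r"^\s*NAME:\s*(\S+)\s+TYPE:\s*(.*?)\s+INFO:")

def parse_attributes(text):
    """Return a dict mapping each token/attribute name to its type string."""
    attrs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs

attrs = parse_attributes(SAMPLE_OUTPUT)
electron_vars = sorted(name for name in attrs if "LooseElectron" in name)
print(electron_vars)
```

To use it on the real output, redirect the CollListAttrib output into a text file and pass the file content to parse_attributes.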

AOD Skimming using TAG files

First, we will create some simple job options to produce an AOD using TAG files. Afterwards, we will add more code to do further event selection using TAG variables.

If you start from a new shell, set up athena and go into the $TestArea/tags directory:

source $HOME/cmthome/setup.sh -tag=15.3.1
mkdir -p $TestArea/tags
cd $TestArea/tags

At the moment the FilePeeker does not like the $CMTHOME variable as it is. If you see problems like

Py:AthFile           INFO opening [user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root]...
[26607] Traceback (most recent call last):
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Decorators.py", line 94, in forking
[26607]     result = func(*args, **kwargs)
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 453, in fopen_impl
[26607]     infos = FilePeeker(fname, self)()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 441, in __call__
[26607]     f = self._process_call()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 340, in _process_call
[26607]     file_type, file_name = _ftype(self.fname)
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 179, in _ftype
[26607]     with H.restricted_ldenviron(projects=['AtlasCore']):
[26607]   File "/tmp/atlas/kits/15.3.1/sw/lcg/external/Python/2.5.4/slc4_ia32_gcc34/lib/python2.5/contextlib.py", line 15, in __enter__
[26607]     return self.gen.next()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Helpers.py", line 108, in restricted_ldenviron
[26607]     cmt = CmtWrapper()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Cmt.py", line 273, in __init__
[26607]     assert len(self.projects_dag())>0, "empty projects-DAG tree: corrupted CMT environment ?"
[26607]   File "<string>", line 2, in projects_dag
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Decorators.py", line 41, in memoize
[26607]     mem_dict[args] = result = func(*args)
[26607] KeyError: 'CMTHOME'
Py:Athena            INFO leaving with code 0: "successful run" 

use the following fix:

ini atlas
fix_cmthome

If you want to use a TAG file for AOD skimming, you basically need to tell athena which TAG file to read and where to find the corresponding AODs, and then produce either a new TAG or a new AOD file. The TAG file(s) are set in the standard way, using the PoolTAGInput property of athenaCommonFlags. The corresponding input AOD files are resolved at run time: the TAGs include the GUID of the corresponding AOD file, and the PoolSvc is used to resolve the GUID to a filename. Hence we need a PoolFileCatalog.xml file for the user.RichardHawkings.0108175.topmix_Egamma.AOD.v5 dataset. This can be produced with dq2-ls -f -p -P -G user.RichardHawkings.0108175.topmix_Egamma.AOD.v5, or the prepared file can be taken from /afs/naf.desy.de/group/atlas/ADT09/PoolFileCatalog_TopMix_egamma_v5.xml.
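To illustrate what the catalog provides, the following sketch builds the GUID to filename mapping from a catalog fragment in plain Python. It assumes the usual POOL XML layout, in which each File element carries the GUID in its ID attribute and the physical file name in a pfn element. The GUID and path in the sample are made up, and the helper name guid_to_pfn is hypothetical. Athena does this lookup internally through the PoolSvc, so this is only meant for inspecting a catalog by hand.

```python
import xml.etree.ElementTree as ET

# Illustrative catalog fragment; the GUID and the file name are made up
SAMPLE_CATALOG = """\
<POOLFILECATALOG>
  <File ID="00000000-0000-0000-0000-000000000001">
    <physical>
      <pfn filetype="ROOT_All" name="/some/path/example.AOD.pool.root"/>
    </physical>
    <logical/>
  </File>
</POOLFILECATALOG>
"""

def guid_to_pfn(xml_text):
    """Return a dict mapping each file GUID to its physical file name."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for file_node in root.findall("File"):
        pfn = file_node.find("physical/pfn")
        if pfn is not None:
            mapping[file_node.get("ID")] = pfn.get("name")
    return mapping

catalog = guid_to_pfn(SAMPLE_CATALOG)
for guid, name in catalog.items():
    print("%s -> %s" % (guid, name))
```

A GUID that is referenced by a TAG file but missing from the catalog means that athena cannot open the corresponding AOD file.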

The following job options are based on the RecExCommon/aodtotag.py job options. Copy and paste the following into your working shell to create the file mytagtoaodtag.py.

cat > mytagtoaodtag.py << EOF
# steering file for AOD->AOD step based on tags using RecExCommon
from RecExConfig.RecFlags import rec

# turn off unnecessary things
rec.doCBNT.set_Value_and_Lock(False)

# enable TAG/AOD reading and TAG/AOD writing if needed
rec.readTAG.set_Value_and_Lock(True)
rec.readAOD.set_Value_and_Lock(True)
rec.doWriteAOD.set_Value_and_Lock(True)
rec.doWriteTAG.set_Value_and_Lock(False) # change to True to also write a TAG file

# use the first tag file for processing
from AthenaCommon.AthenaCommonFlags import athenaCommonFlags
athenaCommonFlags.PoolTAGInput = ["user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root"]

# setup the PoolSvc to find the needed input AOD files
from AthenaCommon.AppMgr import ServiceMgr as svcMgr  
include( "AthenaPoolCnvSvc/ReadAthenaPool_jobOptions.py" )
from PoolSvc.PoolSvcConf import PoolSvc
svcMgr.PoolSvc.ReadCatalog +=["xmlcatalog_file:/afs/naf.desy.de/group/atlas/ADT09/PoolFileCatalog_TopMix_egamma_v5.xml"]

# define an AOD outputname if not done already
if 'PoolAODOutput' not in dir():
    PoolAODOutput="tagsel_AOD.pool.root"

# include main jobOption
include ("RecExCommon/RecExCommon_topOptions.py")

EOF

Note the leading xmlcatalog_file: in front of the filename of the PoolFileCatalog.xml file.

Now, let's check that the job options work:

athena.py -c 'EvtMax=10' mytagtoaodtag.py | tee log.txt

You should have produced an AOD called tagsel_AOD.pool.root. Run checkFile.py on it to see how many events are in the file. From the perfmon output in the log file log.txt you should see a processing rate of around 1 second per event. Still, most of the time is spent in the initialisation of the job.

Now you should be able to use a different TAG file or use a list of TAG files.
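A list of TAG files can be built with the Python glob module instead of typing every name. The following is a minimal sketch: the helper name tag_file_list and the default file pattern are assumptions, and the demonstration runs on a scratch directory with empty stand-in files.

```python
import glob
import os
import tempfile

def tag_file_list(directory, pattern="*.TAG.*.root"):
    """Return the sorted list of TAG files found in the given directory."""
    return sorted(glob.glob(os.path.join(directory, pattern)))

# Demonstration on a scratch directory with two empty stand-in files
scratch = tempfile.mkdtemp()
for index in (2, 1):
    open(os.path.join(scratch, "demo.TAG.v5._%05d.root" % index), "w").close()

demo_files = [os.path.basename(name) for name in tag_file_list(scratch)]
print(demo_files)

# In mytagtoaodtag.py the result would be assigned to the input property, e.g.
# athenaCommonFlags.PoolTAGInput = tag_file_list(
#     "/afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5")
```

Sorting keeps the processing order reproducible, which makes it easier to compare log files between runs.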

In a second step we will refine our TAG selection further. We will look for Z -> ee events in the TopMix sample. For this we require at least two loose electron candidates, with |pT| > 20 GeV for each of the first two.

Add the following line to the file mytagtoaodtag.py:

athenaCommonFlags.PoolInputQuery="NLooseElectron>1 && abs(LooseElectronPt1)>20000. && abs(LooseElectronPt2)>20000."

Note that the standard ATLAS energy unit (MeV) must be used, and that the pT of the electron is multiplied by its charge. Hence you need to use the abs() function.
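If you want to vary the cut values, the query string can be assembled in Python so that the GeV to MeV conversion and the abs() calls are not forgotten. This is a minimal sketch: the helper name zee_query is hypothetical, and it is assumed that the query parser treats 20000.0 the same as 20000.

```python
GEV = 1000.0  # ATLAS stores energies and momenta in MeV

def zee_query(pt_cut_gev=20.0, n_electrons=2):
    """Build a TAG query requiring n_electrons loose electrons above the pT cut."""
    cuts = ["NLooseElectron>%d" % (n_electrons - 1)]
    for index in range(1, n_electrons + 1):
        # abs() is needed because the TAG pT is multiplied by the charge
        cuts.append("abs(LooseElectronPt%d)>%.1f" % (index, pt_cut_gev * GEV))
    return " && ".join(cuts)

query = zee_query()
print(query)

# In mytagtoaodtag.py the string would be assigned to the query property, e.g.
# athenaCommonFlags.PoolInputQuery = zee_query(pt_cut_gev=20.0)
```

The TAG shown above stores up to four loose electron candidates, so n_electrons should not exceed 4.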

Also change the value of rec.doWriteTAG to True, so that we create a new TAG file containing only the selected events.

Now, let's have a second go:

athena.py -c 'EvtMax=10' mytagtoaodtag.py | tee log.sel.txt

The event processing rate should now be roughly 20 seconds per event. You can estimate the skimming efficiency from the event numbers of the processed events. Note that only selected events are counted. The skimming efficiency is around 1%.

How long do you think the proposed event selection would take for the 2.85 million events of the egamma TopMix stream? Too long for interactive work.
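A rough estimate can be worked out from the numbers quoted above. The sketch below assumes that the rate of about 20 seconds applies per selected event (only selected events are counted) and that the skimming efficiency of about 1% holds for the whole stream.

```python
n_events = 2.85e6             # events in the egamma TopMix stream
efficiency = 0.01             # skimming efficiency of around 1%
seconds_per_selected = 20.0   # processing time per selected event

n_selected = n_events * efficiency
total_seconds = n_selected * seconds_per_selected
total_days = total_seconds / 86400.0

print("%d selected events, about %.1f days" % (round(n_selected), total_days))
```

Under these assumptions, a single interactive job would run for several days.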

Now, you can send the job to the Grid using Ganga.

Using Ganga for TAG based AOD skimming

Now that you have a job running locally, you can use Ganga to do the full processing. See section 8.6 of the full Ganga tutorial for instructions. This is optional.

TAG file creation using ELSSI

For the rest of the tutorial we will work with the ELSSI web service. It is located at https://lxvm0341.cern.ch/tagservices/elssi_int_nightly/index.htm.

Again, we will work with the egamma TopMix stream (version 5). There are two Oracle database instances hosting this sample: CERN and DESY. Go to https://lxvm0341.cern.ch/tagservices/elssi_int_nightly/index.htm and select either CERN or DESY as data location and then TopMix as data source. Further, select the egamma stream and run number 108175. Add this selection by pressing the corresponding button and then start ELSSI by clicking on Continue to event selection.

This time we will do the full event selection:

Create your query, review and perform it. Do the following queries:

For more details on ELSSI check out another ELSSI tutorial.

Download the new TAG file, plot the number of events and estimate the skimming rate.

ATLAS: WorkBook/NAF/ADT09TAG (last edited 2009-09-29 12:38:22 by WolfgangEhrenfeld)