#acl EditorsGroup:read,write All:read
<<TableOfContents(2)>>

= ATLAS-D Tutorial 2009: TAGs =

The introductory talk can be found on [[http://indico.cern.ch/getFile.py/access?contribId=9&sessionId=23&resId=0&materialId=slides&confId=52623|Indico]] or as the attachment [[attachment:tags.pdf]]. <<BR>> <<BR>>

For this tutorial we will work with the egamma stream of the !TopMix sample (version 5) produced by Richard Hawkings: <<BR>> <<BR>>

The objective is to make a mixed event sample (initially of about 200 pb^-1^ at 10 TeV), representing the events selected (including background) for typical ttbar and single top selections (semileptonic and dileptonic selections). The mixing will be done at the AOD level, taking events from all the different signal and background samples, renumbering the events and removing the Monte Carlo truth. The resulting AOD events will then be written out to multiple AOD streams (electron, muon etc.) as for real data. This sample should then be as similar to real data as possible, and provide a good basis for exercising data-driven analysis in the last months before real data become available. <<BR>> <<BR>>

For more details see the [[https://twiki.cern.ch/twiki/bin/view/AtlasProtected/TopMixingExercise|ATLAS TopMixing TWiki]]. <<BR>> <<BR>>

The egamma stream `user.RichardHawkings.0108175.topmix_Egamma.AOD.v5` is replicated to many sites, including `DESY-ZN_PHYS-TOP`, which is local to the NAF.

The TAG dataset is copied to `/afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5`.

== Inspecting TAG files with ROOT ==

In this part of the exercise we will use plain ROOT to look at the content of one TAG file from the !TopMix egamma stream. Start from a new shell and set up ROOT from the ATLAS kit:
{{{
source ~/cmthome/setup.sh -tag=15.3.1
}}}

We will create a few files and collect them in the `$TestArea/tags` directory:
{{{
mkdir -p $TestArea/tags
cd $TestArea/tags

ln -s /afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root .
}}}

Now, we can open the first TAG ROOT file and look at some variables using the TBrowser:
{{{
root user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root
}}}
Within ROOT do the following:
{{{
root [0] TBrowser b
}}}

Note that this TAG file was produced with release 14, and therefore the TAG information is stored in the `CollectionTree`. Now, inspect the TAG information:
 * Plot at least the event number, the number of loose electrons and the pT of the first electron.
 * Check some of the detector status variables `StatusXXXX` to see what kind of status is stored in this stream. All of them should be GREEN (value 3).
 * Check some trigger information for L1 and EF. Can you tell whether the `EF_e20_loose` trigger fired?

Here is a selection of plots: <<BR>>
{{attachment:tagfile.png}} <<BR>>

The TAG file contains enough information to reconstruct the full four-vector of the candidates. For example, you can reconstruct the electron candidates and calculate the invariant mass of electron pairs. The result could look like this: <<BR>>
{{attachment:Zee_peak.png}} <<BR>>
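The pair mass can be computed directly from the TAG variables `LooseElectronPt1/2`, `LooseElectronEta1/2` and `LooseElectronPhi1/2`. Below is a minimal Python sketch, assuming the electrons can be treated as massless and that the stored pT is signed by the charge (as explained later in this tutorial), so its absolute value is taken. The function `pair_mass` is a hypothetical helper, not part of the ATLAS software.

{{{#!python
import math


def pair_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (MeV) of two massless candidates from pT (MeV), eta, phi.

    The TAG pT is multiplied by the charge, so abs() recovers the magnitude.
    For massless particles: m^2 = 2 pT1 pT2 (cosh(deta) - cos(dphi)).
    """
    pt1, pt2 = abs(pt1), abs(pt2)
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))
}}}

For instance, in a PyROOT loop over `CollectionTree` you could fill a histogram with `pair_mass(...)/1000.` (in GeV) for all events with `NLooseElectron > 1`.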

If you just want to check the names of the TAG variables within a file, you can either use ROOT
{{{
root [1] CollectionTree->Print()
}}}
or the `CollListAttrib` command from the ATLAS software:
{{{
CollListAttrib -src user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001 RootCollection
}}}
The argument after `-src` is the TAG ROOT filename without the `.root` extension.

The output looks like
{{{
--------------------------------------------------------------
Collection list:
NAME: user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001   TYPE: RootCollection   NFRAG: 1
--------------------------------------------------------------
Number of Tokens is: 3
Tokens are:
 NAME: StreamAOD_ref      TYPE: Token      INFO:
 NAME: Stream1_ref        TYPE: Token      INFO:
 NAME: StreamESD_ref      TYPE: Token      INFO:
--------------------------------------------------------------
Number of Attributes is: 261
Attributes are:
 NAME: EventNumber                  TYPE: unsigned int        INFO:
 NAME: LumiBlockN                   TYPE: unsigned int        INFO:
 NAME: Luminosity                   TYPE: float               INFO:
 NAME: NTrk                         TYPE: unsigned int        INFO:
 NAME: Nvx                          TYPE: unsigned int        INFO:
 NAME: RandomNumber                 TYPE: float               INFO:
 NAME: RunNumber                    TYPE: unsigned int        INFO:
 NAME: Stream                       TYPE: unsigned int        INFO:
 NAME: TimeStamp                    TYPE: unsigned int        INFO:
 NAME: VtxX                         TYPE: float               INFO:
 NAME: VtxY                         TYPE: float               INFO:
 NAME: VtxZ                         TYPE: float               INFO:
 NAME: isCalibration                TYPE: bool                INFO:
 NAME: isRealData                   TYPE: bool                INFO:
 NAME: isSimulation                 TYPE: bool                INFO:
 NAME: isTestBeam                   TYPE: bool                INFO:
 NAME: LooseElectronEta1            TYPE: float               INFO:
 NAME: LooseElectronEta2            TYPE: float               INFO:
 NAME: LooseElectronEta3            TYPE: float               INFO:
 NAME: LooseElectronEta4            TYPE: float               INFO:
 NAME: LooseElectronPhi1            TYPE: float               INFO:
 NAME: LooseElectronPhi2            TYPE: float               INFO:
 NAME: LooseElectronPhi3            TYPE: float               INFO:
 NAME: LooseElectronPhi4            TYPE: float               INFO:
 NAME: LooseElectronPt1             TYPE: float               INFO:
 NAME: LooseElectronPt2             TYPE: float               INFO:
 NAME: LooseElectronPt3             TYPE: float               INFO:
 NAME: LooseElectronPt4             TYPE: float               INFO:
 NAME: LooseElectronTightness1      TYPE: unsigned int        INFO:
 NAME: LooseElectronTightness2      TYPE: unsigned int        INFO:
 NAME: LooseElectronTightness3      TYPE: unsigned int        INFO:
 NAME: LooseElectronTightness4      TYPE: unsigned int        INFO:
 NAME: NLooseElectron               TYPE: unsigned int        INFO:

....

 NAME: SMWord                       TYPE: unsigned int        INFO:
 NAME: SUSYWord                     TYPE: unsigned int        INFO:
 NAME: TauIdWord                    TYPE: unsigned int        INFO:
 NAME: TopWord                      TYPE: unsigned int        INFO:
 NAME: DatasetID                    TYPE: int                 INFO:
 NAME: Fraction                     TYPE: float               INFO:
---------------------------------------------------------

Number of collections scanned is: 1
}}}
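If you want to search the attribute list programmatically, for example for all `LooseElectron*` variables, the `CollListAttrib` output can be parsed with a few lines of Python. This is a sketch that assumes the output format shown above; `parse_attributes` is a hypothetical helper, not an ATLAS tool.

{{{#!python
import re

# Matches attribute lines such as
#  " NAME: EventNumber      TYPE: unsigned int      INFO:"
_ATTR_RE = re.compile(r"^\s*NAME:\s*(\S+)\s+TYPE:\s*(.+?)\s+INFO:")


def parse_attributes(text):
    """Return {attribute name: type} from CollListAttrib output.

    Only the block after 'Attributes are:' is read (tokens are skipped);
    reading stops at the next dashed separator line.
    """
    attrs = {}
    in_attrs = False
    for line in text.splitlines():
        if line.startswith("Attributes are:"):
            in_attrs = True
            continue
        if in_attrs and line.startswith("-----"):
            break
        if in_attrs:
            match = _ATTR_RE.match(line)
            if match:
                attrs[match.group(1)] = match.group(2)
    return attrs
}}}

Redirect the `CollListAttrib` output to a file, e.g. `attribs.txt`, and then use `[n for n in parse_attributes(open("attribs.txt").read()) if n.startswith("LooseElectron")]` to list the electron variables.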


== AOD Skimming using TAG files ==

First, we will create some simple job options to produce an AOD using TAG files. Afterwards, we will add more code to do further event selection using TAG variables. <<BR>> <<BR>>

If you start from a new shell, set up athena and go into the `$TestArea/tags` directory:
{{{
source $HOME/cmthome/setup.sh -tag=15.3.1
mkdir -p $TestArea/tags
cd $TestArea/tags
}}}

At the moment, the `FilePeeker` does not cope with the current `$CMTHOME` setting. If you see problems like
{{{
Py:AthFile           INFO opening [user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root]...
[26607] Traceback (most recent call last):
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Decorators.py", line 94, in forking
[26607]     result = func(*args, **kwargs)
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 453, in fopen_impl
[26607]     infos = FilePeeker(fname, self)()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 441, in __call__
[26607]     f = self._process_call()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 340, in _process_call
[26607]     file_type, file_name = _ftype(self.fname)
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 179, in _ftype
[26607]     with H.restricted_ldenviron(projects=['AtlasCore']):
[26607]   File "/tmp/atlas/kits/15.3.1/sw/lcg/external/Python/2.5.4/slc4_ia32_gcc34/lib/python2.5/contextlib.py", line 15, in __enter__
[26607]     return self.gen.next()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Helpers.py", line 108, in restricted_ldenviron
[26607]     cmt = CmtWrapper()
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Cmt.py", line 273, in __init__
[26607]     assert len(self.projects_dag())>0, "empty projects-DAG tree: corrupted CMT environment ?"
[26607]   File "<string>", line 2, in projects_dag
[26607]   File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Decorators.py", line 41, in memoize
[26607]     mem_dict[args] = result = func(*args)
[26607] KeyError: 'CMTHOME'
Py:Athena            INFO leaving with code 0: "successful run" 
}}}
use the following fix:
{{{
ini atlas
fix_cmthome
}}}

To use a TAG file for AOD skimming, you need to tell athena which TAG file(s) to read, where to find the corresponding AODs, and whether to write a new TAG file, a new AOD file, or both.
The TAG file(s) are set in the standard way, using the `PoolTAGInput` property of `athenaCommonFlags`. The corresponding input AOD files are resolved at run time by the !PoolSvc: the TAGs include the GUID of the corresponding AOD file, and the !PoolSvc resolves this GUID to a filename. Hence we need a `PoolFileCatalog.xml` file for the `user.RichardHawkings.0108175.topmix_Egamma.AOD.v5` dataset. It can be produced with `dq2-ls -f -p -P -G user.RichardHawkings.0108175.topmix_Egamma.AOD.v5`, or the prepared file can be taken from `/afs/naf.desy.de/group/atlas/ADT09/PoolFileCatalog_TopMix_egamma_v5.xml`.
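To illustrate the GUID lookup, here is a minimal Python sketch that reads a catalog and maps each GUID to its physical file names. It assumes the usual POOL XML catalog layout (`<File ID="GUID">` entries containing `<pfn name="..."/>` elements). The helpers `load_catalog` and `guid_to_pfn` are hypothetical and only mimic the lookup; they are not how the !PoolSvc is implemented.

{{{#!python
import xml.etree.ElementTree as ET


def load_catalog(path):
    """Parse a PoolFileCatalog.xml file and return its root element."""
    return ET.parse(path).getroot()


def guid_to_pfn(catalog_root):
    """Map each File ID (GUID) in the catalog to its list of physical file names."""
    mapping = {}
    for file_entry in catalog_root.iter("File"):
        pfns = [pfn.get("name") for pfn in file_entry.iter("pfn")]
        mapping[file_entry.get("ID")] = pfns
    return mapping
}}}

For example, `guid_to_pfn(load_catalog("PoolFileCatalog_TopMix_egamma_v5.xml"))` lists every GUID known to the catalog, which is handy for checking that the catalog covers the AOD files you need.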

The following job options are based on the `RecExCommon/aodtotag.py` job options. Copy and paste them into your working shell to create the file `mytagtoaodtag.py`.
{{{
cat > mytagtoaodtag.py << EOF
# steering file for AOD->AOD step based on tags using RecExCommon
from RecExConfig.RecFlags import rec

# turn off unnecessary things
rec.doCBNT.set_Value_and_Lock(False)

# enable TAG/AOD reading and TAG/AOD writing if needed
rec.readTAG.set_Value_and_Lock(True)
rec.readAOD.set_Value_and_Lock(True)
rec.doWriteAOD.set_Value_and_Lock(True)
rec.doWriteTAG.set_Value_and_Lock(False) # change to True to also write a TAG file

# use the first tag file for processing
from AthenaCommon.AthenaCommonFlags import athenaCommonFlags
athenaCommonFlags.PoolTAGInput = ["user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root"]

# setup the PoolSvc to find the needed input AOD files
from AthenaCommon.AppMgr import ServiceMgr as svcMgr  
include( "AthenaPoolCnvSvc/ReadAthenaPool_jobOptions.py" )
from PoolSvc.PoolSvcConf import PoolSvc
svcMgr.PoolSvc.ReadCatalog +=["xmlcatalog_file:/afs/naf.desy.de/group/atlas/ADT09/PoolFileCatalog_TopMix_egamma_v5.xml"]

# define an AOD outputname if not done already
if not 'PoolAODOutput' in dir():
    PoolAODOutput="tagsel_AOD.pool.root"

# include main jobOption
include ("RecExCommon/RecExCommon_topOptions.py")

EOF
}}}

Note the leading `xmlcatalog_file:` in front of the filename of the `PoolFileCatalog.xml` file.

Now, let's check whether the job options are working:
{{{
athena.py -c 'EvtMax=10' mytagtoaodtag.py | tee log.txt
}}}

You should have produced an AOD called `tagsel_AOD.pool.root`. Run `checkFile.py` on it to see how many events are in the file. From the PerfMon output in the log file `log.txt` you should see a processing rate of around 1 second per event. Still, most of the time is spent in the initialisation of the job.

Now you should be able to use a different TAG file or a list of TAG files. <<BR>> <<BR>>
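One way to build such a list is to glob the TAG dataset directory. This is a sketch under the assumption that every TAG file in the directory ends in `.root`; `tag_file_list` is a hypothetical helper that you would paste into `mytagtoaodtag.py` before setting `PoolTAGInput`.

{{{#!python
import glob
import os

# TAG dataset directory used in this tutorial.
TAG_DIR = "/afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5"


def tag_file_list(directory, pattern="*.root"):
    """Return the files in `directory` matching `pattern`, sorted by name."""
    return sorted(glob.glob(os.path.join(directory, pattern)))

# In mytagtoaodtag.py (sketch):
#   athenaCommonFlags.PoolTAGInput = tag_file_list(TAG_DIR)
# or, for a subset of files:
#   athenaCommonFlags.PoolTAGInput = tag_file_list(TAG_DIR, "*._0000[1-3].root")
}}}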

In a second step we will refine our TAG selection further. We will look for Z -> ee events in the !TopMix sample. For this we require:
 * at least two loose electrons
 * pT of the first electron greater than 20 GeV
 * pT of the second electron greater than 20 GeV
 * events passing the trigger `EF_e20_loose` (skip this requirement for now; the trigger selection does not work at the moment)

Add the following line to the file `mytagtoaodtag.py`:
{{{
athenaCommonFlags.PoolInputQuery="NLooseElectron>1 && abs(LooseElectronPt1)>20000. && abs(LooseElectronPt2)>20000."
}}}
Note that energies must be given in the standard ATLAS units (MeV) and that the stored electron pT is multiplied by the electron charge. Hence you need to use the `abs()` function.
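To see exactly which events this query keeps, the same cuts can be written as a plain Python predicate on one event's TAG values, together with a small builder for the query string. This is only an illustration with hypothetical helpers (`passes_zee_cuts`, `query_string`); the actual selection is done by the collection query in athena, not by this code.

{{{#!python
GEV = 1000.0  # TAG energies and momenta are stored in MeV


def query_string(pt_min_gev=20.0):
    """Build the PoolInputQuery string for a pT threshold given in GeV."""
    pt_min = pt_min_gev * GEV
    return ("NLooseElectron>1 && abs(LooseElectronPt1)>%.0f. "
            "&& abs(LooseElectronPt2)>%.0f." % (pt_min, pt_min))


def passes_zee_cuts(tag, pt_min_gev=20.0):
    """Python equivalent of the query for one event.

    `tag` maps TAG attribute names to values. The pT is charge-signed,
    hence abs().
    """
    pt_min = pt_min_gev * GEV
    return (tag["NLooseElectron"] > 1
            and abs(tag["LooseElectronPt1"]) > pt_min
            and abs(tag["LooseElectronPt2"]) > pt_min)
}}}

`query_string(20.0)` reproduces the query used above, so changing the threshold in GeV cannot accidentally produce a value in the wrong units.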

Also set `rec.doWriteTAG` to `True`, so that a new TAG file containing only the selected events is created.

Now, let's have a second go:
{{{
athena.py -c 'EvtMax=10' mytagtoaodtag.py | tee log.sel.txt
}}}

The event processing rate should now be around 20 seconds per event. You can estimate the skimming efficiency from the event numbers of the processed events. Note that only selected events are counted. The skimming efficiency is around 1%.
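Because the !TopMix events were renumbered during the mixing, the event numbers of the selected events give a rough idea of how many input events were scanned. A minimal sketch, assuming the event numbers in the input TAG file are consecutive (check this for your file); `skim_efficiency` is a hypothetical helper:

{{{#!python
def skim_efficiency(selected_event_numbers, first_event_number):
    """Estimate selected/scanned events, assuming consecutive event numbering.

    selected_event_numbers: EventNumber of each event that passed the query.
    first_event_number: EventNumber of the first event in the input TAG file.
    """
    if not selected_event_numbers:
        return 0.0
    scanned = max(selected_event_numbers) - first_event_number + 1
    return len(selected_event_numbers) / float(scanned)
}}}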

How long do you think the proposed event selection would take for the 2.85 million events of the egamma !TopMix stream? Too long for interactive work.
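A back-of-the-envelope estimate, assuming the ~20 seconds per selected event and the ~1% skimming efficiency from the local test scale linearly to the full stream (job initialisation and Grid overheads are ignored):

{{{#!python
N_EVENTS = 2.85e6         # events in the egamma TopMix stream
SKIM_EFFICIENCY = 0.01    # ~1% of the events pass the TAG query
SEC_PER_SELECTED = 20.0   # ~20 s per selected event in the local test


def processing_time_hours(n_events, efficiency, sec_per_selected):
    """Wall time in hours if each selected event costs sec_per_selected seconds."""
    return n_events * efficiency * sec_per_selected / 3600.0


hours = processing_time_hours(N_EVENTS, SKIM_EFFICIENCY, SEC_PER_SELECTED)
print("%.0f hours (about %.1f days)" % (hours, hours / 24.0))
# prints: 158 hours (about 6.6 days)
}}}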

Now, you can send the job to the Grid using Ganga.

== Using Ganga for TAG based AOD skimming ==

Now that you have a job running locally, you can use Ganga to do the full processing. See section [[https://twiki.cern.ch/twiki/bin/view/Atlas/FullGangaAtlasTutorialNAF#8_6_Using_TAG_Files_with_Ganga|8.6]] of the [[https://twiki.cern.ch/twiki/bin/view/Atlas/FullGangaAtlasTutorialNAF|full Ganga tutorial]] for instructions. This part is optional.


== TAG file creation using ELSSI ==

For the rest of the tutorial we will work with the ELSSI web service. It is located at https://lxvm0341.cern.ch/tagservices/elssi_int_nightly/index.htm. 

Again, we will work with the egamma !TopMix stream (version 5). There are two Oracle database instances hosting this sample: CERN and DESY. Go to https://lxvm0341.cern.ch/tagservices/elssi_int_nightly/index.htm, select either CERN or DESY as data location and then !TopMix as data source. Further, select the egamma stream and run number 108175. Add this selection by pressing the corresponding button and then start ELSSI by clicking `Continue to event selection`.

This time we will do the full event selection:
 * events passing the trigger `EF_e20_loose`
 * at least two loose electrons
 * pT of the first electron greater than 20 GeV
 * pT of the second electron greater than 20 GeV

Create your query, review it and run it. Do the following queries:
 * Start with a simple event count. You should select 105091 events.
 * Display some TAG quantities, such as the event number, the number of loose electrons and the pT of the leading electron, in a table and a histogram.
 * Now, retrieve an event collection without skimming. This will create a new TAG file somewhere in CERN AFS space (/afs/cern.ch/atlas/maxidisk/d39) containing your selected events. This TAG file can be used to skim AODs locally or on the Grid.

For more details on ELSSI check out another [[https://twiki.cern.ch/twiki/bin/view/Atlas/EventTagTutorial2008OctoberBrowser|ELSSI tutorial]].

Download the new TAG file, plot the number of events and estimate the skimming rate.