#acl EditorsGroup:read,write All:read
<>
= ATLAS-D Tutorial 2009: TAGs =
The introductory talk can be found in [[http://indico.cern.ch/getFile.py/access?contribId=9&sessionId=23&resId=0&materialId=slides&confId=52623|Indico]] or in the [[attachment:tags.pdf]]. <
> <
> For this tutorial we will work with the egamma stream of the !TopMix sample (version 5) produced by Richard Hawkings: <
> <
> The objective is to make a mixed event sample (initially of about 200 pb^-1 at 10 TeV), representing the events selected (including background) for typical ttbar and single top selections (semileptonic and dileptonic selections). The mixing will be done at the AOD level, taking events from all the different signal and background samples, renumbering the events and removing the Monte Carlo truth. The resulting AOD events will then be written out to multiple AOD streams (electron, muon etc) as for real data. This sample should then be as similar to real data as possible, and provide a good basis for exercising data-driven analysis in the last months before real data is available. <
> <
> For more details see the [[https://twiki.cern.ch/twiki/bin/view/AtlasProtected/TopMixingExercise|ATLAS TopMixing TWiki]]. <
> <
> The egamma stream `user.RichardHawkings.0108175.topmix_Egamma.AOD.v5` is replicated to many sites, in particular to `DESY-ZN_PHYS-TOP`, which is local to the NAF. The TAG dataset is copied to `/afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5`.

== Inspecting TAG files with ROOT ==
In this part of the exercise we will use plain ROOT to look at the content of one TAG file from the !TopMix egamma stream. Start from a new shell and set up ROOT from the ATLAS kit:
{{{
source ~/cmthome/setup.sh -tag=15.3.1
}}}
We will create a few files and collect them in the `$TestArea/tags` directory:
{{{
mkdir -p $TestArea/tags
cd $TestArea/tags
ln -s /afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root .
}}}
Now we can open the first TAG ROOT file and look at some variables using the TBrowser:
{{{
root user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root
}}}
Within ROOT do the following:
{{{
root [0] TBrowser b
}}}
Note that this TAG file was produced with release 14 and therefore the TAG information is stored in the `CollectionTree`. Now inspect the TAG information:
 * Plot at least the event number, the number of loose electrons and the pT of the first electron.
 * Check some of the detector status variables `StatusXXXX` to see what kind of status is stored in this stream. Everything should be GREEN(3).
 * Check some trigger information for L1 and EF. Can you tell if the `EF_e20_loose` trigger fired?

Here is a selection of plots: <
> {{attachment:tagfile.png}} <
> The TAG file contains enough information to reconstruct the full four-vector of the candidates. For example, you can reconstruct the electron candidates and calculate the invariant mass of electron pairs. This could look like: <
> {{attachment:Zee_peak.png}} <
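The invariant mass calculation can be sketched in plain Python. This is only an illustration, not the code used to produce the plot: it assumes the electrons are treated as massless, takes pT, eta and phi as they would be read from the `LooseElectronPt`, `LooseElectronEta` and `LooseElectronPhi` attributes, and uses the magnitude of pT because the TAG stores pT multiplied by the charge. The function name `invariant_mass` is hypothetical.

```python
import math


def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles, in the same units as pT.

    Uses m^2 = 2 * pT1 * pT2 * (cosh(eta1 - eta2) - cos(phi1 - phi2)).
    """
    # TAG pT values carry the sign of the charge, so take the magnitude.
    pt1, pt2 = abs(pt1), abs(pt2)
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    # Guard against tiny negative values from floating point rounding.
    return math.sqrt(max(m2, 0.0))


# Two back-to-back electrons of opposite charge with |pT| = 45 GeV (in MeV).
mass = invariant_mass(45000.0, 0.0, 0.0, -45000.0, 0.0, math.pi)
print("m_ee = %.1f MeV" % mass)
```

For this back-to-back configuration the result is close to 90 GeV, which is where a Z -> ee peak would be expected.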
> If you just want to check the names of the TAG variables within a file, you can either use ROOT
{{{
root [1] CollectionTree->Print()
}}}
or the `CollListAttrib` command from the ATLAS software:
{{{
CollListAttrib -src user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001 RootCollection
}}}
The argument after `-src` is the TAG ROOT filename without the `.root` extension. The output looks like
{{{
--------------------------------------------------------------
Collection list:
NAME: user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001 TYPE: RootCollection NFRAG: 1
--------------------------------------------------------------
Number of Tokens is: 3
Tokens are:
NAME: StreamAOD_ref TYPE: Token INFO:
NAME: Stream1_ref TYPE: Token INFO:
NAME: StreamESD_ref TYPE: Token INFO:
--------------------------------------------------------------
Number of Attributes is: 261
Attributes are:
NAME: EventNumber TYPE: unsigned int INFO:
NAME: LumiBlockN TYPE: unsigned int INFO:
NAME: Luminosity TYPE: float INFO:
NAME: NTrk TYPE: unsigned int INFO:
NAME: Nvx TYPE: unsigned int INFO:
NAME: RandomNumber TYPE: float INFO:
NAME: RunNumber TYPE: unsigned int INFO:
NAME: Stream TYPE: unsigned int INFO:
NAME: TimeStamp TYPE: unsigned int INFO:
NAME: VtxX TYPE: float INFO:
NAME: VtxY TYPE: float INFO:
NAME: VtxZ TYPE: float INFO:
NAME: isCalibration TYPE: bool INFO:
NAME: isRealData TYPE: bool INFO:
NAME: isSimulation TYPE: bool INFO:
NAME: isTestBeam TYPE: bool INFO:
NAME: LooseElectronEta1 TYPE: float INFO:
NAME: LooseElectronEta2 TYPE: float INFO:
NAME: LooseElectronEta3 TYPE: float INFO:
NAME: LooseElectronEta4 TYPE: float INFO:
NAME: LooseElectronPhi1 TYPE: float INFO:
NAME: LooseElectronPhi2 TYPE: float INFO:
NAME: LooseElectronPhi3 TYPE: float INFO:
NAME: LooseElectronPhi4 TYPE: float INFO:
NAME: LooseElectronPt1 TYPE: float INFO:
NAME: LooseElectronPt2 TYPE: float INFO:
NAME: LooseElectronPt3 TYPE: float INFO:
NAME: LooseElectronPt4 TYPE: float INFO:
NAME: LooseElectronTightness1 TYPE: unsigned int INFO:
NAME: LooseElectronTightness2 TYPE: unsigned int INFO:
NAME: LooseElectronTightness3 TYPE: unsigned int INFO:
NAME: LooseElectronTightness4 TYPE: unsigned int INFO:
NAME: NLooseElectron TYPE: unsigned int INFO:
....
NAME: SMWord TYPE: unsigned int INFO:
NAME: SUSYWord TYPE: unsigned int INFO:
NAME: TauIdWord TYPE: unsigned int INFO:
NAME: TopWord TYPE: unsigned int INFO:
NAME: DatasetID TYPE: int INFO:
NAME: Fraction TYPE: float INFO:
---------------------------------------------------------
Number of collections scanned is: 1
}}}

== AOD Skimming using TAG files ==
First, we will create some simple job options to produce an AOD using TAG files. Afterwards, we will add more code, to do further event selection using TAG variables. <
> <
> If you start from a new shell, go into the `$TestArea/tags` directory and set up athena:
{{{
source $HOME/cmthome/setup.sh -tag=15.3.1
mkdir -p $TestArea/tags
cd $TestArea/tags
}}}
At the moment the `FilePeeker` does not like the $CMTHOME variable as it is. If you see problems like
{{{
Py:AthFile INFO opening [user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root]...
[26607] Traceback (most recent call last):
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Decorators.py", line 94, in forking
[26607] result = func(*args, **kwargs)
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 453, in fopen_impl
[26607] infos = FilePeeker(fname, self)()
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 441, in __call__
[26607] f = self._process_call()
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 340, in _process_call
[26607] file_type, file_name = _ftype(self.fname)
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/AthFile/__init__.py", line 179, in _ftype
[26607] with H.restricted_ldenviron(projects=['AtlasCore']):
[26607] File "/tmp/atlas/kits/15.3.1/sw/lcg/external/Python/2.5.4/slc4_ia32_gcc34/lib/python2.5/contextlib.py", line 15, in __enter__
[26607] return self.gen.next()
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Helpers.py", line 108, in restricted_ldenviron
[26607] cmt = CmtWrapper()
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Cmt.py", line 273, in __init__
[26607] assert len(self.projects_dag())>0, "empty projects-DAG tree: corrupted CMT environment ?"
[26607] File "", line 2, in projects_dag
[26607] File "/tmp/atlas/kits/15.3.1/AtlasCore/15.3.1/InstallArea/python/PyUtils/Decorators.py", line 41, in memoize
[26607] mem_dict[args] = result = func(*args)
[26607] KeyError: 'CMTHOME'
Py:Athena INFO leaving with code 0: "successful run"
}}}
use the following fix:
{{{
ini atlas
fix_cmthome
}}}
To use a TAG file for AOD skimming, you need to tell athena which TAG file to read and where to find the corresponding AODs, and then produce either a new TAG or a new AOD file. The TAG file(s) are set in the standard way, using the `PoolTAGInput` property from `athenaCommonFlags`. The corresponding input AOD files are resolved at run time using the !PoolSvc: the TAGs include the GUID of the corresponding AOD file, and the !PoolSvc resolves this GUID to a filename. Hence we need to produce a `PoolFileCatalog.xml` file for the `user.RichardHawkings.0108175.topmix_Egamma.AOD.v5` dataset. This can be done with `dq2-ls -f -p -P -G user.RichardHawkings.0108175.topmix_Egamma.AOD.v5` or the prepared file can be taken from `/afs/naf.desy.de/group/atlas/ADT09/PoolFileCatalog_TopMix_egamma_v5.xml`. The following job options are based on the `RecExCommon/aodtotag.py` job options. Copy and paste the following into your working shell to create the file `mytagtoaodtag.py`.
{{{
cat > mytagtoaodtag.py << EOF
# steering file for AOD->AOD step based on tags using RecExCommon
from RecExConfig.RecFlags import rec

# turn off unnecessary things
rec.doCBNT.set_Value_and_Lock(False)

# enable TAG/AOD reading and TAG/AOD writing if needed
rec.readTAG.set_Value_and_Lock(True)
rec.readAOD.set_Value_and_Lock(True)
rec.doWriteAOD.set_Value_and_Lock(True)
rec.doWriteTAG.set_Value_and_Lock(False) # change to True to write a TAG file

# use the first tag file for processing
from AthenaCommon.AthenaCommonFlags import athenaCommonFlags
athenaCommonFlags.PoolTAGInput = ["user.RichardHawkings.0108175.topmix_Egamma.TAG.v5._00001.root"]

# setup the PoolSvc to find the needed input AOD files
from AthenaCommon.AppMgr import ServiceMgr as svcMgr
include( "AthenaPoolCnvSvc/ReadAthenaPool_jobOptions.py" )
from PoolSvc.PoolSvcConf import PoolSvc
svcMgr.PoolSvc.ReadCatalog +=["xmlcatalog_file:/afs/naf.desy.de/group/atlas/ADT09/PoolFileCatalog_TopMix_egamma_v5.xml"]

# define an AOD outputname if not done already
if not 'PoolAODOutput' in dir():
    PoolAODOutput="tagsel_AOD.pool.root"

# include main jobOption
include ("RecExCommon/RecExCommon_topOptions.py")
EOF
}}}
Note the leading `xmlcatalog_file:` in front of the filename of the `PoolFileCatalog.xml` file. Now, let's check if the job options are working:
{{{
athena.py -c 'EvtMax=10' mytagtoaodtag.py | tee log.txt
}}}
You should have produced an AOD called `tagsel_AOD.pool.root`. Run `checkFile.py` on it to see how many events are in the file. From the perfmon output in the log file `log.txt` you should see a processing rate of around 1 second per event. Still, most of the time is spent in the initialisation of the job. Now you should be able to use a different TAG file or a list of TAG files. <
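One possible way to build such a list of TAG files is sketched below in plain Python. It is only a sketch: the helper name `list_tag_files` and the default file name pattern are assumptions made for this illustration and are not part of the ATLAS software.

```python
import glob
import os


def list_tag_files(directory, pattern="*.TAG.*.root"):
    """Return the sorted list of files in `directory` matching `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Possible use inside the job options (left as a comment here, because it
# needs the athena environment and access to the TAG dataset directory):
# athenaCommonFlags.PoolTAGInput = list_tag_files(
#     "/afs/naf.desy.de/group/atlas/ADT09/user.RichardHawkings.0108175.topmix_Egamma.TAG.v5")
```

Sorting the result keeps the order of the input files reproducible from one run to the next.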
> <
> In a second step we will refine our TAG selection further. We will look for Z -> ee events in the !TopMix sample. For this we require:
 * at least two loose electrons
 * pT of the first electron greater than 20 GeV
 * pT of the second electron greater than 20 GeV
 * events passing the trigger `EF_e20_loose` (forget about the trigger for now, this is not working at the moment)

Add the following line to the file `mytagtoaodtag.py`:
{{{
athenaCommonFlags.PoolInputQuery="NLooseElectron>1 && abs(LooseElectronPt1)>20000. && abs(LooseElectronPt2)>20000."
}}}
Note that the standard ATLAS units for energies (MeV) need to be used and that the pT of the electron is multiplied by its charge. Hence you need to use the `abs()` function. Also change the value of `rec.doWriteTAG` to True so that we create a new TAG file with only the selected events. Now, let's have a second go:
{{{
athena.py -c 'EvtMax=10' mytagtoaodtag.py | tee log.sel.txt
}}}
The event processing rate should now be around 20 seconds per event. You can estimate the skimming efficiency from the event numbers of the processed events. Note that only selected events are counted. The skimming efficiency is around 1%. How long do you think the proposed event selection would take for the 2.85 million events of the egamma !TopMix stream? Too long for interactive work. Now you can send the job to the Grid using Ganga.

== Using Ganga for TAG based AOD skimming ==
Now that you have a job running locally, you can use Ganga to do the full processing. See section [[https://twiki.cern.ch/twiki/bin/view/Atlas/FullGangaAtlasTutorialNAF#8_6_Using_TAG_Files_with_Ganga|8.6]] of the [[https://twiki.cern.ch/twiki/bin/view/Atlas/FullGangaAtlasTutorialNAF|full Ganga tutorial]] for instructions. This is optional.

== TAG file creation using ELSSI ==
For the rest of the tutorial we will work with the ELSSI web service. It is located at https://lxvm0341.cern.ch/tagservices/elssi_int_nightly/index.htm.
Again, we will work with the egamma !TopMix stream (version 5). There are two Oracle database instances hosting this sample: CERN and DESY. Go to https://lxvm0341.cern.ch/tagservices/elssi_int_nightly/index.htm and select either CERN or DESY as data location and then !TopMix as data source. Further, select the egamma stream and run number 108175. Add this selection by pressing the corresponding button and then start ELSSI by clicking on `Continue to event selection`. This time we will do the full event selection:
 * events passing the trigger `EF_e20_loose`
 * at least two loose electrons
 * pT of the first electron greater than 20 GeV
 * pT of the second electron greater than 20 GeV

Create your query, review and perform it. Do the following queries:
 * Start with a simple event count. You should select 105091 events.
 * Display some tag quantities such as the event number, the number of loose electrons and the pT of the leading electron in a table and a histogram.
 * Now, retrieve an event collection without skimming. This will create a new TAG file somewhere on CERN AFS space (/afs/cern.ch/atlas/maxidisk/d39) with your selected events. This TAG file can be used to skim AODs locally or on the Grid.

For more details on ELSSI check out another [[https://twiki.cern.ch/twiki/bin/view/Atlas/EventTagTutorial2008OctoberBrowser|ELSSI tutorial]]. Download the new TAG file, plot the number of events and estimate the skimming rate.
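
As a rough cross-check, the numbers quoted in this tutorial can be combined in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that the 2.85 million events are the full egamma stream, that the local rate of about 20 seconds per selected event and the local efficiency of about 1% hold for the whole sample, and that the ELSSI count of 105091 events refers to the same stream. The function names are hypothetical.

```python
TOTAL_EVENTS = 2.85e6              # events in the egamma TopMix stream
LOCAL_EFFICIENCY = 0.01            # approximate efficiency of the local TAG selection
SECONDS_PER_SELECTED_EVENT = 20.0  # approximate local processing rate
ELSSI_SELECTED = 105091            # event count expected from the ELSSI query


def skim_time_hours(total_events, efficiency, seconds_per_event):
    """Wall-clock estimate in hours for processing all selected events."""
    return total_events * efficiency * seconds_per_event / 3600.0


def skim_rate(selected, total_events):
    """Fraction of events kept by a selection."""
    return float(selected) / total_events


hours = skim_time_hours(TOTAL_EVENTS, LOCAL_EFFICIENCY, SECONDS_PER_SELECTED_EVENT)
rate = skim_rate(ELSSI_SELECTED, TOTAL_EVENTS)
print("local skim: about %.0f hours (%.1f days)" % (hours, hours / 24.0))
print("ELSSI skimming rate: %.1f%%" % (100.0 * rate))
```

Under these assumptions the local skim would take between six and seven days, and the ELSSI selection keeps a bit less than 4% of the events.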