Tutorial for newcomers

setup of framework

The central package of the analysis framework is SFrame.

Documentation wiki: http://sourceforge.net/apps/mediawiki/sframe/index.php?title=Main_Page

It is worth reading these pages, as they explain the idea and structure of the package.

Setting up SFrame

The SFrame svn repository shows which tagged version of SFrame is the latest. Currently it is SFrame-03-04-10.

To get started, create a new working directory and check out SFrame into it:

mkdir Analysis
cd Analysis
svn co https://sframe.svn.sourceforge.net/svnroot/sframe/SFrame/tags/SFrame-03-04-10 SFrame

In order to be able to run SFrame you need a current version of ROOT. Either set up your own version or use the one that's available at DESY:

ini atlasfw
cd SFrame
source setup.sh

You need to repeat these steps every time you want to work with SFrame. To check that everything is working, try to compile:

make

SFrame has some usage examples in the user directory. They also demonstrate the PROOF capabilities of SFrame, which we won't use in the following because PROOF proved to be unstable in connection with dCache.

Analysis with SFrame

SFrame is basically just an event loop, which you might know from your MakeClass exercises. SFrame, however, is much smarter and allows you to write a much cleaner and more modular analysis. As mentioned before, SFrame also has PROOF capabilities. You can find out more on the SFrame-PROOF page.

In general, your analysis can consist of several cycles. You might, for instance, run a basic selection that will not change for the rest of your analysis on all available datasets. Currently, we don't use this in our analyses, but the examples in SFrame/user will show you an example use case. Your analysis cycle consists of the following building blocks (more details):

If you paid attention while reading through the different functions, you might wonder about the luminosity calculation. We will come back to that later.

Creating your own package

Now let's start doing some work. Go back to your analysis directory and type the following:

sframe_new_package.sh MyTestPackage

This will create a new directory MyTestPackage and put some basic files such as a Makefile into it. The package already compiles at this point, but doesn't do anything yet.

Creating your own cycle

To have something runnable, you need to create a cycle.

cd MyTestPackage
sframe_create_cycle.py -n MyTestAnalysis

See if it compiles:

make

Fiddling around with xml settings

Unfortunately, your cycle doesn't run out of the box. Copy SFrame/user/config/JobConfig.dtd into the config directory of your package, and add the DOCTYPE line (the second line below) after the first line of MyTestAnalysis_config.xml so that the xml parser used by SFrame knows how to treat the tags (don't worry about the details). The beginning of the file should then look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE JobConfiguration PUBLIC "" "JobConfig.dtd" []>
<JobConfiguration JobName="MyTestAnalysisJob" OutputLevel="INFO">

We will now get our package running step by step, which will help you later with debugging when you're alone in your office. Let's try to run again:

cd config
sframe_main MyTestAnalysis_config.xml

SFrame now complains that the element In doesn't contain the attribute Lumi. This is because each input file in SFrame has an associated luminosity. How it is obtained will be shown later. For now, change the following line

                        <In FileName="YourInputFileComesHere"/>

to

                        <In FileName="/afs/ifh.de/group/atlas/scratch/topdata/16.0.3.3.3-Production-TauFix/user.clange.STop160333.mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e598_s933_s946_r1831_r1700.20110204.110204151127/user.clange.003195.SingleTop._00001.root" Lumi="1.0"/>

Now we have an input file defined; for now its luminosity is simply set to 1.0. Try to run again.

You get an error that the library isn't found. Every library your cycle needs, including external ones, has to be made known in the xml file, and the same applies to your own package. Adjust the xml file as follows:

        <Library Name="libMyTestPackage"/>

As we don't have any user configuration set up so far, comment out the UserConfig block:

<!--            <UserConfig>
                        <Item Name="NameOfUserProperty" Value="ValueOfUserProperty"/>
                </UserConfig> -->

Now we should be at a stage where a lot of things are already working. The next thing SFrame complains about is that it doesn't have an input TTree name. SFrame needs to know about every input tree you're using in your cycle. In the InputData section add the following line:

                        <InputTree Name="RecoTree" />

Now start running again and here we go: We have our first running Cycle!

Have a look at the output:

 ( INFO  )  SCycleController   : Initializing
 ( INFO  )  SCycleController   : Deleting all analysis cycle algorithms from memory
 ( INFO  )  SCycleController   : read xml file: 'MyTestAnalysis_config.xml'
 ( INFO  )  SCycleController   : Created cycle 'MyTestAnalysis'
 ( INFO  )  MyTestAnalysis     : Initializing from configuration
 ( INFO  )  MyTestAnalysis     : Reading SInputData: Data1 - Reco
 ( INFO  )  SCycleConfig       : ===========================================================
 ( INFO  )  SCycleConfig       :                     Cycle configuration
 ( INFO  )  SCycleConfig       :   - Running mode: LOCAL
 ( INFO  )  SCycleConfig       :   - Target luminosity: 1
 ( INFO  )  SCycleConfig       :   - Output directory: ./
 ( INFO  )  SCycleConfig       :   - Post-fix:
 ( INFO  )  SInputData         :  ---------------------------------------------------------
 ( INFO  )  SInputData         :  Type               : Data1
 ( INFO  )  SInputData         :  Version            : Reco
 ( INFO  )  SInputData         :  Total luminosity   : 1pb-1
 ( INFO  )  SInputData         :  NEventsMax         : -1
 ( INFO  )  SInputData         :  NEventsSkip        : 0
 ( INFO  )  SInputData         :  Cacheable          : No
 ( INFO  )  SInputData         :  Skip validation    : No
 ( INFO  )  SInputData         :  Input File         : '/Users/clange/Analyse/user.clange.STop160333.mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e598_s933_s946_r1831_r1700.20110204.110204151127/user.clange.003195.SingleTop._00001.root' (file) | '1' (lumi)
 ( INFO  )  SInputData         :  Tree               : 'RecoTree' (name) | 'Flat input tree' (type)
 ( INFO  )  SInputData         :  ---------------------------------------------------------
 ( INFO  )  SCycleConfig       : ===========================================================
 ( INFO  )  SCycleController   : Job 'MyTestAnalysisJob' configured
 ( INFO  )  SCycleController   : Time needed for initialisation:  0.019 s
 ( INFO  )  SCycleController   : Entering ExecuteAllCycles()
 ( INFO  )  SInputData         : Input type "Data1" version "Reco" : 4995 events
 ( INFO  )  SCycleController   : Executing Cycle #0 ('MyTestAnalysis') locally
 ( INFO  )  SCycleController   : Processing input data type: Data1 version: Reco
 ( INFO  )  MyTestAnalysis     : Initialised InputData "Data1" (Version:Reco) on worker node
 ( INFO  )  MyTestAnalysis     : Processing entry: 999 (999 / 4995 events processed so far)
 ( INFO  )  MyTestAnalysis     : Processing entry: 1999 (1999 / 4995 events processed so far)
 ( INFO  )  MyTestAnalysis     : Processing entry: 2999 (2999 / 4995 events processed so far)
 ( INFO  )  MyTestAnalysis     : Processing entry: 3999 (3999 / 4995 events processed so far)
 ( INFO  )  MyTestAnalysis     : Terminated InputData "Data1" (Version:Reco) on worker node
 ( INFO  )  SCycleController   : Writing output of "MyTestAnalysis" to: ./MyTestAnalysis.Data1.Reco.root
 ( INFO  )  SCycleController   : Overall cycle statistics:
 ( INFO  )  SCycleController   :       4995 Events - Real time   0.37 s  - 13643 Hz | CPU time   0.28 s  - 17839 Hz

You can see that SFrame runs over all 4995 events in the ntuple and spits out a file called MyTestAnalysis.Data1.Reco.root. As we didn't do anything, it is empty.

Here's a small exercise for you:

Code manipulation

Now that we have a running cycle, let's add some salt and pepper to it. First of all we need to connect a few branches. This is naturally done for each input file, i.e. in BeginInputFile. Open src/MyTestAnalysis.cxx and add the following lines:

   //
   // Connect the input variables:
   //
   ConnectVariable( m_recoTreeName.c_str(), (m_electronPrefix + "_N").c_str(), m_Electron_N );
   ConnectVariable( m_recoTreeName.c_str(), (m_electronPrefix + "_pt").c_str(), m_Electron_pt );
   ConnectVariable( m_recoTreeName.c_str(), (m_electronPrefix + "_eta").c_str(), m_Electron_eta );
   ConnectVariable( m_recoTreeName.c_str(), (m_electronPrefix + "_phi").c_str(), m_Electron_phi );
   ConnectVariable( m_recoTreeName.c_str(), (m_electronPrefix + "_e").c_str(), m_Electron_e );

This command works very similarly to ConnectBranch. All the member variables m_* need to be declared in the header file of MyTestAnalysis. In the ntuple we're using now they are of type std::vector<float>*, except for m_Electron_N, which is of type int. Always make sure you use the right data type when connecting branches. You can check the type of each variable by right-clicking the desired branch in the ROOT TBrowser and choosing Inspect.

While you're editing the header file, also add the strings m_recoTreeName and m_electronPrefix. Your header file should now have the following content:

private:

   //
   // Put all your private variables here
   //
   std::string m_recoTreeName;
   std::string m_electronPrefix;

   // branch names
   int m_Electron_N;
   std::vector<float>* m_Electron_pt;
   std::vector<float>* m_Electron_eta;
   std::vector<float>* m_Electron_phi;
   std::vector<float>* m_Electron_e;

In a minute we're going to do something smart about m_recoTreeName; for now we just set it by hand. As these are properties that need to be set for the whole cycle, add the following lines to the constructor (MyTestAnalysis::MyTestAnalysis()):

   m_recoTreeName = "RecoTree";
   m_electronPrefix = "Electron";

As we added a rather complicated data structure of std::vector<float> we would need to make it known to ROOT. This has, however, already been taken care of by one of the built-in SFrame classes. We just need to make it known to the Cycle. In the xml file add the following line:

        <Library Name="libGenVector" />

If you wanted to use more complicated data structures such as std::vector< std::vector< float > > you would need to define them in the src/*LinkDef.h file in the following way:

#pragma link C++ class std::vector< std::vector< float > >+;

If you now compile and run again, the output won't change. To see that something's happening, you need to change the OutputLevel to DEBUG (in the xml). You will then see that the branches are connected.

Output Level intermezzo

While we're changing the output level, let's try the different available output levels. In BeginCycle add the following lines:

   //
   // Test what the various printed lines look like:
   //
   m_logger << VERBOSE << "This is a VERBOSE line" << SLogger::endmsg;
   m_logger << DEBUG << "This is a DEBUG line" << SLogger::endmsg;
   m_logger << INFO << "This is an INFO line" << SLogger::endmsg;
   m_logger << WARNING << "This is a WARNING line" << SLogger::endmsg;
   m_logger << ERROR << "This is an ERROR line" << SLogger::endmsg;
   m_logger << FATAL << "This is a FATAL line" << SLogger::endmsg;
   m_logger << ALWAYS << "This is an ALWAYS line" << SLogger::endmsg;

As you can see, the SFrame logger lets you configure your output very conveniently. Try to use the logger instead of cout: the OutputLevel is set in the xml, so it will save you recompilation when you're debugging (and also in many other cases).
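
The idea behind level-based logging can be sketched in a few lines of standalone C++. This is only an illustration, not the implementation of SLogger: ToyLogger, ToyLevel and Print are hypothetical names, and the levels are assumed to be ordered as in the listing above, from VERBOSE (lowest) to ALWAYS (highest). A message is kept only if its level is at or above the configured minimum, so changing the minimum at run time changes what is printed without touching the code.

```cpp
#include <sstream>
#include <string>

// Hypothetical message levels, assumed to be ordered from least to most severe
enum ToyLevel { kVerbose = 1, kDebug, kInfo, kWarning, kError, kFatal, kAlways };

// Toy logger (not SLogger): collects only messages at or above the minimum level
class ToyLogger {
public:
   explicit ToyLogger( ToyLevel minimum ) : m_minimum( minimum ) {}

   // Store the message if its level passes the threshold, drop it otherwise
   void Print( ToyLevel level, const std::string& text ) {
      if( level >= m_minimum ) m_out << text << '\n';
   }

   // Everything that would have been printed so far
   std::string Output() const { return m_out.str(); }

private:
   ToyLevel m_minimum;
   std::ostringstream m_out;
};
```

With a minimum level of kInfo, a kDebug message is dropped while a kWarning message is kept; lowering the minimum to kDebug lets both through.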


Histogram booking and filling

Back to our analysis cycle. We now want to take the first step and fill a histogram. SFrame allows you to create and fill a histogram in one line without declaring it first. To fill something useful, let's loop over all electrons in each event and fill the pt into a histogram (in ExecuteEvent):

   for( Int_t i = 0; i < m_Electron_N; ++i ) {

      // Fill the example histogram:
      Book( TH1F( "El_pt_hist", "Electron p_{T} [MeV]", 100, 0.0, 150000.0 ) )->Fill( (*m_Electron_pt)[i] );

   }

If you fill a histogram in more than one place, it's better to book it once in BeginInputData:

   Book( TH1F( "El_eta_hist", "Electron #eta", 20, -5.0, 5.0 ) );

and then fill it in ExecuteEvent (in the loop):

      Hist( "El_eta_hist" )->Fill( (*m_Electron_eta)[i] );
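
The one-line form in ExecuteEvent presumably relies on booking being idempotent: asking for a name that was already booked returns the existing histogram instead of creating a new one for every event. The standalone sketch below illustrates that book-or-retrieve pattern under this assumption. It uses neither ROOT nor SFrame, and ToyHist and ToyRegistry are hypothetical stand-ins that only count entries.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Toy histogram: only counts how often it was filled
struct ToyHist {
   ToyHist() : entries( 0 ) {}
   void Fill( double /*value*/ ) { ++entries; }
   int entries;
};

// Toy registry illustrating the book-or-retrieve pattern (not SFrame code)
class ToyRegistry {
public:
   // Return the histogram stored under this name, creating it on first use
   ToyHist* Book( const std::string& name ) { return &m_hists[ name ]; }

   // Look up an already-booked histogram; returns a null pointer if unknown
   ToyHist* Hist( const std::string& name ) {
      std::map< std::string, ToyHist >::iterator it = m_hists.find( name );
      return it == m_hists.end() ? 0 : &it->second;
   }

   // Number of distinct histograms booked so far
   std::size_t Size() const { return m_hists.size(); }

private:
   std::map< std::string, ToyHist > m_hists;
};
```

Calling Book with the same name for every event leaves a single histogram in the registry, and Hist then finds it by name.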

Writing output trees

For your secondary cycle or for multivariate analyses you might need an output tree with output branches. In your header file you need to declare the variables:

   //
   // The output variables
   //
   std::vector< float >    m_o_El_pt;

In BeginInputData you define the link between variable and branch output:

   //
   // Declare the output variables:
   //
   DeclareVariable( m_o_El_pt, "El_pt" );

In ExecuteEvent you should first clear the vector, then fill it:

   m_o_El_pt.clear();
...
      m_o_El_pt.push_back( (*m_Electron_pt)[i] );
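
The reason for clearing first is that the output vector is a member of the cycle and therefore keeps its content from one event to the next. A minimal standalone sketch shows the difference; FillOutput is a hypothetical helper for illustration and not part of SFrame.

```cpp
#include <cstddef>
#include <vector>

// Copy the pt values of one event into the output vector.
// With clearFirst == false the entries of earlier events are kept.
void FillOutput( std::vector< float >& output,
                 const std::vector< float >& eventPt,
                 bool clearFirst ) {
   if( clearFirst ) output.clear();
   for( std::size_t i = 0; i < eventPt.size(); ++i ) {
      output.push_back( eventPt[ i ] );
   }
}
```

After an event with two electrons followed by an event with one, the cleared vector holds one entry, while the vector that was never cleared holds three, i.e. the electrons of the previous event would end up in the output again.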

The code will compile, but it won't run, because you need to define an output tree name in the xml:

                        <OutputTree Name="OutTree" />

Now run and have a look at the output ROOT file.

Making your code xml-configurable

One of the nice features of SFrame is its configurability via xml, which makes your code very flexible. In one of the previous exercises we set the tree name and the electron prefix by hand. Now we will make the electron prefix configurable (m_recoTreeName can be treated in the same way). Replace its string assignment by:

   //
   // Declare the properties of the cycle:
   //
   DeclareProperty( "ElectronPrefix", m_electronPrefix = "Electron" );

The default value is optional, but can be very useful. We can now use the UserConfig section in the xml and, for example, change the ElectronPrefix to Jet:

                <UserConfig>
                        <Item Name="ElectronPrefix" Value="Jet"/>
                </UserConfig>

Compile, run and you will see very different histograms/branches in the output file.
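
How a declared property picks up a value from the configuration can be sketched without SFrame. ToyProperties below is a hypothetical stand-in, not the SFrame implementation: it stores a pointer to each declared variable and overwrites that variable when a matching name appears in the configuration. Note that an expression such as m_electronPrefix = "Electron" is an ordinary C++ assignment whose result refers to the assigned string itself, which is why the default can be set and the variable registered in one go.

```cpp
#include <map>
#include <string>

// Toy property store (not SFrame code): maps property names to user variables
class ToyProperties {
public:
   // Register a variable under a name; the caller keeps ownership of the variable
   void DeclareProperty( const std::string& name, std::string& value ) {
      m_props[ name ] = &value;
   }

   // Apply one name/value pair from a configuration.
   // Returns false if no property with this name was declared.
   bool SetProperty( const std::string& name, const std::string& value ) {
      std::map< std::string, std::string* >::iterator it = m_props.find( name );
      if( it == m_props.end() ) return false;
      *( it->second ) = value;
      return true;
   }

private:
   std::map< std::string, std::string* > m_props;
};
```

If the configuration contains no entry for a declared name, the variable simply keeps its default; an entry with the same name replaces it.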

Now you know the most important basics of SFrame and we can continue to more specific stuff. SFrame has, of course, a lot more functionality such as:

but as we won't need that for this tutorial, you can have a look yourself some other time.

documentation

Besides the general SFrame documentation there are a couple of pages concerning documentation in this wiki on the AnalysisFramework. After having gone through this tutorial it is also YOUR responsibility to keep documentation up-to-date.

The most important page is the list of AvailablePackages. This page describes how to check out every available package and lists any additional commands that are needed.

In addition, we have set up an automatic code documentation page using doxygen with weekly updates. This is still under construction but mostly working and will be helpful when you write your analysis.

Take your time and browse through the documentation (atlas/insider).

functionality of central/core packages

The analysis framework consists of a few central packages that save you a lot of coding and make collaboration with other group members easy and comfortable.

D3PDVariables

The D3PDVariables package provides a wrapper around the ntuple variables and at the same time provides the analyser with Particle classes and some convenience functionality. Main purpose:

repository: https://svnweb.cern.ch/trac/desyatfw/browser/CommonAnalysis/Common/D3PDVariables

Go into your analysis directory and get the package. We will put it into the Common directory, because it's used by all analyses:

mkdir Common
cd Common
svn co svn+ssh://svn.cern.ch/reps/desyatfw/CommonAnalysis/Common/D3PDVariables/trunk D3PDVariables

The package contains a python script that automatically generates the D3PDVariables from a few tab-separated text files. To create the variables, run the following from the package directory:

python scripts/CodeIt.py

The configuration files and code skeletons are located in scripts/Meta. It currently works for Electron, Muon, Jet, TrackParticle and Vertex. The structure of each text file is as follows:

detaillevel \t type \t variable name
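
To make the file format concrete, here is a minimal standalone sketch that splits one such line into its three fields. It is only an illustration: the actual generator is the python script CodeIt.py, and VariableEntry, ParseVariableLine and the sample line mentioned below are hypothetical.

```cpp
#include <sstream>
#include <string>

// One line of a variable definition file: detail level, type, variable name
struct VariableEntry {
   int detailLevel;
   std::string type;
   std::string name;
};

// Split a tab-separated line into its three fields.
// Returns false if a field is missing or the detail level is not a number.
bool ParseVariableLine( const std::string& line, VariableEntry& entry ) {
   std::istringstream in( line );
   std::string level;
   if( !std::getline( in, level, '\t' ) ) return false;
   if( !std::getline( in, entry.type, '\t' ) ) return false;
   if( !std::getline( in, entry.name ) ) return false;
   std::istringstream levelStream( level );
   return !( levelStream >> entry.detailLevel ).fail();
}
```

For a hypothetical line consisting of 0, float and pt separated by tabs, this yields detail level 0, type float and name pt; a line with fewer than three fields is rejected.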

Let's now integrate this package in our test analysis. As we already have some electron properties implemented, let's give it a try with muons. First of all, compile the D3PDVariables package. This will copy the package shared libraries to the SFrame directory.

In MyTestAnalysis header file add the following includes:

// External include(s):
#include "../../Common/D3PDVariables/include/MuonD3PDObject.h"
#include "../../Common/D3PDVariables/include/Muon.h"

and make a forward declaration:

namespace DESY {
  class Muon;
}

In the private section add the following:

  //
  // Input variable objects:
  //
  D3PD::MuonD3PDObject  m_muon;       ///< muon container

Now we need to initialise the object correctly in the cxx file. Extend the constructor:

MyTestAnalysis::MyTestAnalysis()
   : SCycleBase(),
     m_muon( this ) {

In BeginInputFile we can now connect the variables. Instead of several lines as before for electrons you just need one line:

  //
  // Connect all the D3PDObjects
  //
  m_muon.ConnectVariables(         m_recoTreeName.c_str(), 0, "Muon_" );

The number 0 denotes the detail level. The fewer variables you connect, the faster your analysis runs. The detail levels are set in D3PDVariables.
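
The effect of the detail level can be pictured as a simple filter. The sketch below assumes that a variable is connected only if its detail level does not exceed the requested one. This is an assumption about D3PDVariables inferred from the description here, and ToyVariable and VariablesToConnect are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A variable together with the detail level it is assigned to
struct ToyVariable {
   int detailLevel;
   std::string name;
};

// Names of all variables whose detail level is at most the requested level
std::vector< std::string > VariablesToConnect( const std::vector< ToyVariable >& all,
                                               int requestedLevel ) {
   std::vector< std::string > connected;
   for( std::size_t i = 0; i < all.size(); ++i ) {
      if( all[ i ].detailLevel <= requestedLevel ) {
         connected.push_back( all[ i ].name );
      }
   }
   return connected;
}
```

Under this assumption, a variable defined at level 4 is not available in a cycle that requests level 0, so accessing it there cannot work until the requested level is raised.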

We can now loop over all muons in the D3PD (MuonD3PDObject), create a Muon object, get the TLorentzVector and fill histograms:

   for( Int_t i = 0; i < m_muon.N; ++i ) {
      // set muon object
      DESY::Muon mymu( &m_muon, i );
      Book( TH1F( "Mu_pt_hist", "Muon p_{T} [MeV]", 100, 0.0, 150000.0 ) )->Fill( mymu.pt() );
      TLorentzVector* mu_tlv = mymu.getTLV();
      Book( TH1F( "Mu_rap_hist", "Muon rapidity", 20, -5.0, 5.0 ) )->Fill( mu_tlv->Rapidity() );
   }

Now we need to make the D3PDVariables package known to our cycle:

<Library Name="libD3PDVariables" />

Compile and run. If your OutputLevel is still set to DEBUG, you can see which variables are connected.

Exercise: Open the ntuple, pick one of the muon variables that is not yet defined in D3PDVariables and add it at detail level 4 in the text file. Mind: You can use the cxx and h files for testing, but except for D3PDVariables/src/D3PDObjectsNames.cxx you should only edit the txt files! Run the python script.

Now there is one more thing you need to check. Open D3PDVariables/src/D3PDObjectsNames.cxx. Here you see the actual mapping of object names to ntuple variable names. If the variable you picked is not yet there (in the SingleTopDPDMaker section), add it.

Compile D3PDVariables. Add a histogram into which you fill the new variable. Compile MyTestPackage. Run MyTestAnalysis. It will crash, because you didn't adjust the detail level. Take note of the error message: this crash is common and you should be able to recognise it. Now adjust the detail level.

Note: In D3PDObjectsNames.cxx you will have seen variable mappings for other ntuple names. This is the place where the mapping for TopD3PDs would go. That is basically the only change you need. Even if you started your analysis on SingleTopD3PDs, you can easily switch with just this small change!

SelectionTools

This package provides object selection and overlap removal.

repository: https://svnweb.cern.ch/trac/desyatfw/browser/CommonAnalysis/Common/SelectionTools

This package should also go into Common.

svn co svn+ssh://svn.cern.ch/reps/desyatfw/CommonAnalysis/Common/SelectionTools/trunk SelectionTools
cd SelectionTools
make

Let's now add a muon selector to the analysis. In the header file add the include:

#include "../../Common/SelectionTools/include/MuonSelectorTool.h"

and in private section add an instance of MuonSelectorTool:

  //
  // The selector tools
  //
  MuonSelectorTool      m_muonSelector;      ///< selector for muon candidates

Extend the constructor as you did before with m_muon:

MyTestAnalysis::MyTestAnalysis()
   : SCycleBase(),
     m_muon( this ),
     m_muonSelector( this, "MuonSelector" ) {

The string "MuonSelector" is important in case you want to change one of the cuts in your xml. You can of course have several instances of MuonSelector (with different names) and implement different selections.

The selectors need to be initialised in BeginInputData, whose signature also needs to be extended with the parameter name "id":

void MyTestAnalysis::BeginInputData( const SInputData& id ) throw( SError ) {
   //
   // Initialize the tool(s):
   //
   m_muonSelector.BeginInputData( id );

This will also print out the selection. Also remember to add the library to the xml. Let's now have the MuonSelector do something. We can just pass the Muon object in our loop to it:

      if( m_muonSelector.IsPassed( mymu ) ) {
         mymu.flagAsGood();
         Book( TH1F( "Mu_pt_sel_hist", "Selected Muon p_{T} [MeV]", 100, 0.0, 150000.0 ) )->Fill( mymu.pt() );
      }

If you compile and run now, your code will crash. The reason for that is a little subtle. You will encounter errors like these quite often. One reason might be that you didn't compile all packages consistently. To solve this do

make distclean && make

in each package, starting from the most basic one. This is, however, not the problem here. SelectionTools uses SFrame's own slim histograms SH1, but our cycle doesn't. To make them known to our cycle, we need to add the following line to the xml:

<Library Name="libSFramePlugIns" />

Now everything should work. The selection tools will also create some validation hists. Have a look at them!

Exercise:

DesyUtilities

The DesyUtilities package is the Swiss Army knife of the analysis. Currently, it contains most of the tools needed for systematic variations as well as a lot of utility functions.

We're just going to focus on one of them: SCycleBaseDesy

SCycleBaseDesy extends SCycleBase with some extra functionality. It allows you to define your histograms in a text file. Instead of string look-ups it uses ints as identifiers, which is a lot faster. To make use of this, you have to make changes in several places in your code. Have a look at GoDesy (https://svnweb.cern.ch/trac/desyatfw/browser/CommonAnalysis/Top/GoDesy/trunk), which uses the new histogramming style. Basic changes:

some more hints

inclusion of external packages

We've seen that the inclusion of external packages can be difficult. Make sure you adjust the LinkDef.h file if needed. Also try to avoid mixing packages provided by other people with packages such as DesyUtilities. Check out those packages into a separate directory so that they can easily be updated.

writing your own package

When writing your own package, as we've done above, try to have as few dependencies as possible, because you can easily lose track of them. Sometimes it's worth starting a new package from scratch instead of copying an old one.

coding rules

As several new people are joining the group and not everyone talks to each other frequently, make sure that you only check in running code. If you need to change an interface (which you should avoid), make this known to everyone! I will add you to the atlas-analysis-fw@desy.de list after this meeting.

There is an SFrame users mailing list: atlas-sframe-users@cern.ch - subscribe to it via hypernews. Attila provides very good support, and in case of problems it often helps to search the archive.

Please make sure you comment your code properly. As we use doxygen, try to follow the standards.

things that were left out

ATLAS: Projects/TopPhysicsInternal/AnalysisFramework/Tutorial (last edited 2012-07-23 17:18:26 by StefanieTodt)