ATLAS-D Tutorial 2010: ROOT and PROOF

Inside D3PD

Start root on the NAF. You might need to call the ini command before the root command is available:

ini ROOT530
root -l

Inside the root session you can open a file directly from dCache with the .ls command you can see what is

in this file and confirm that opening the file was successful. Use the TFile::Get command to get a pointer to the "egamma"-tree.

TFile *f = TFile::Open("dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/data10_7TeV/NTUP_EGAMMA/f282_p207/data10_7TeV.00161379.physics_Egamma.merge.NTUP_EGAMMA.f282_p207_tid159706_00/NTUP_EGAMMA.159706._000372.root.1")
.ls
TTree *egamma = (TTree*) f->Get("egamma");

To see a list of variables in the tree do

 egamma->Print();

a text file with the printout can be found here.

Note the electron related varibles: el_n, el_pt, el_E, el_px, el_py, el_pz, el_tight

and the variables RunNumber, EventNumber

Use TTree::Scan to have a rough look variables EventNumber, el_n, el_pt, el_tight for the first few events.

egamma->Scan("EventNumber:el_n:el_pt:el_tight")

Remark: Note the columns row and instance. Each event has one row number but might have several instances in this table. The value of "plain" variables (just one value per event) is repeated in all instances. The entries of arrays are shown in the differnt instances.

Note that el_n is a (single) integer which gives the number of reconstructed electron (candidates) in the current event. el_pt and el_tight are arrays with el_n entries. Each entry in el_pt gives the transverse momentum for each of the electrons. The value of el_tight is either 0 or 1 and shows, if the corresponding electron meets tight selection criteria (1) or not (0).

Now, you might want to
- plot the pt spectrum of all electron candidates
- plot the distribution of number of "tight" electrons in each event ( using the Sum$() function )
- plot the pt spectrum of only the "tight" electrons.
- switch the view of the TCanvas with the plot to logarithmic y-axis view
- plot the sum of the electron pt of all "tight" electrons in a event

Solution

  egamma->Draw("el_pt");
  egamma->Draw("Sum$(el_tight)");

  egamma->Draw("el_pt","el_tight");
  gPad->SetLogy();
  gPad->SetLogy(0);

  egamma->Draw("Sum$(el_pt*el_tight)","Sum$(el_tight) != 0");
  //Note:
  egamma->Draw("Sum$(el_pt)","el_tight");
  // is different!. Sum$() always sums over all "instances" independent to the "selection" value

Setup Proofcluster

If you run proofcluster for the first time, or later proofcluster config, you are asked for configuration options. After running the configuration, please check the file /scratch/hh/current/atlas/users/<your username>/proofcluster/setup.conf .

Make sure the contents is the following:

master=localhost
workers=10
cmssw=nocmssw
protocolport=XXXX
port=XXXX
os=any
rootsys=/afs/naf.desy.de/products/root/amd64_rhel50/5.30.00
site=XX
queue=normal
autocopylog=n
ram=2048

protocolport should be a random number to avoid conflicts with other users
port should be another random number, usually it is protocolport+1
site must be either hh or zn, depending on the NAF site, you want to use.

Start the proofcluster:

ini atlas
change to a directory on Lustre or AFS (mandatory)
call proofcluster status to check, if you have already a proofcluster running
call proofcluster start to start your proofcluster
call proofcluster status to check the status of your proofcluster again
don't forget to call proofcluster stop after you are done with working with PROOF

Troubleshooting:

proofcluster status shows an "Master" entry, but no "Worker" entry
- the batch jobs are not running anymore, just call proofcluster stop and try again
proofcluster start hangs
- at "Waiting..."
  - there are not enough slots free, meeting your requirements. Consider lowering the number of worker nodes or decrease the amount of requested RAM
- at "Almost finished..."
  - the SGE (NAF) batch job to start your PROOF cluster started but does not finish.
    Check the status of your SGE batch job by running qstat -u <your username> and check, if your job is in the "Error" state (E). Most likely you called proofcluster start in a directory, that is not visible to the batch node. Currently proofcluster start must be called in a directory on Lustre or AFS. Delete your batch job by calling qdel <jobid> with the jobid seen by the qstat command. Change to a directory on Lustre or AFS and do proofcluster stop, proofcluster status, proofcluster start again.

PROOF Datasets

Check, that your proofcluster is configured correctly and running
First we connect manually to the PROOF cluster, you need your personal PROOF protocol port for this (protocolport from your setup.conf
.!hostname calls the UNIX-command hostname in the current proofsession
p->Exec(".!hostname") does the same on each PROOF worker in parallel

root -l
p = TProof::Open("localhost:<your protocol port>");
.!hostname
p->Exec(".!hostname");
cout << "Hello World" << endl;
p->Exec(" cout << \"Hello World\" << endl; ")

Instead of opening a PROOF session with TProof::Open manually, you can use the script, shown after proofcluster start

root -l /scratch/hh/current/atlas/users/<your user name>/proofcluster/connectproof.C

Use "gProof" instead of "p" in this case.

To work with datasets first choose a list of files. There are different file lists for this tutorial
- for proofcluster with "hh"-Site: /afs/naf.desy.de/group/atlas/ADT11/hh/XX/tutfilelist.txt ( 00 <= XX <= 09 )
To create a TFileCollection and after that a TDSet do the following in a root session:

TFileCollection *fc = new TFileCollection("tutfilelist","Tutorial file list");
fc->AddFromFile("/afs/naf.desy.de/group/atlas/ADT11/hh/XX/tutfilelist.txt");
fc->Print();

TDSet *ds = new TDSet("tutset","egamma");
ds->Add( fc->GetList() );
ds->Print();

for convenience in your root session, you can call

.x /afs/naf.desy.de/group/atlas/ADT11/hh/XX/createtutds.C

this will execute the commands above.
we can use ds->Draw() like TTree::Draw

now use the dataset and PROOF to
- draw the pt spectrum for all electron (cadidates)
- draw the pt spectrum for all only the "tight" electrons for events with exactly two "tight" electrons
  - note the excess at 40 GeV
- try to plot the invariant mass of the "tight" electrons for events with exactly two "tight" electrons
  - the selection stays the same like the previous polot
  - reminder: Minv = sqrt( Sum(E)² - Sum(px)² -Sum(py)² -Sum(pz)² )
  - reminder: in ROOT Sum$(el_px) sums the x-Momentum of _all_ electrons of selected events. To limit to the "tight" electrons use the Trick: Sum$(el_px*el_tight)
- redo the plot above. Only use events with two "tight" electrons and pt of the "tight" electrons > 60 GeV in total

Solution

root -l  /scratch/hh/current/atlas/users/<your user name>/proofcluster/connectproof.C
.x /afs/naf.desy.de/group/atlas/ADT11/hh/XX/createtutds.C

ds->Draw("el_pt");
ds->Draw("el_pt","Sum$(el_tight)==2");

ds->Draw("sqrt( Sum$(el_E*el_tight)^2-Sum$(el_px*el_tight)^2-Sum$(el_py*el_tight)^2-Sum$(el_pz*el_tight)^2 )>>masshisto(200,0,200000)", "Sum$(el_tight)==2");

ds->Draw("sqrt( Sum$(el_E*el_tight)^2-Sum$(el_px*el_tight)^2-Sum$(el_py*el_tight)^2-Sum$(el_pz*el_tight)^2)>>masshisto(200,0,200000)", "Sum$(el_tight)==2&&Sum$(el_pt*el_tight)>60000");

Remark: You can get the result histograms from the last query with gProof->GetOutputList()->Print() and then gProof->GetOutpoutList()->FindObject("htemp")->Draw()

PROOF with TSelector

For more complex computations, the Draw function reaches its limits. The TSelector class allows write complex with full access to all TTree variables.

A skeleton to process a given TTree can be generated automatically with the TTree::MakeSelector method. to create such a skeleton for the egamma-tree, do the following:

open an arbitrary file, containing a egamma-tree
get a pointer to this TTree
call egamma->MakeSelector()
- this will create the two files egamma.h and egamma.C
- you can give an string as argument to create a selector with a different name
  (e.g. egamma->MakeSelector("invMassSelector"))

f = TFile::Open("dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/data10_7TeV/NTUP_EGAMMA/f282_p207/data10_7TeV.00161379.physics_Egamma.merge.NTUP_EGAMMA.f282_p207_tid159706_00/NTUP_EGAMMA.159706._000372.root.1");
egamma = (TTree*) f->Get("egamma");
egamma->MakeSelector();

In this example the selector skeleton will be modified to plot the invariant mass of "tight" electrons in the egamma tree.

Remark: It is always possible to check whether the TSelector is formally correct by trying to compile the selector. This can be done by calling .L egamma.C+. With this, you can check after each of the following modifications, if you did some typos.

If you want to fill histograms (or use other variables) in your code, you first need to declare these the header file.

Modify the egamma.h file and add a TH1D* variable in the selector class. You also need include the header files for TH1D, because this header is not included in the skeleton. egamma.h

[...]
#include <TH1D.h>
[...]
public:
  TH1D * fMasshist;

The hook SlaveBegin is called before any event is processed. In contrast to the hook Begin, SlaveBegin is called on each worker node. When running with PROOF, each PROOF worker-node has an own instance of the TSelector in the memory. To be able to fill a histogram when looping over all events, on each PROOF node a histogram has to be created.

However, although we create an histogram on each node, we only want to have one result histogram. Luckily, PROOF merges all histograms into one, if we register the histogram to the fOutput list by calling fOutput->Add(fMasshist).

Find the SlaveBegin hook in the egamma.C file and modify it to contain the following.

void egamma::SlaveBegin(TTree * /*tree*/)
{
   // The SlaveBegin() function is called after the Begin() function.
   // When running with PROOF SlaveBegin() is called on each slave server.
   // The tree argument is deprecated (on PROOF 0 is passed).

   TString option = GetOption();
   fMasshist = new TH1D("masshist","Massen Histogramm; m_inv[GeV]",200,0,200000);
   fOutput->Add(fMasshist);

}

The hook Terminate is only called once on the client. In contrast TerminateSlave would be called on each worker node. This hook can be used to post-process the merged output from the PROOF-session. In our case the (parallely) created histogram will be plotted on the screen, but more sophisticated things, like fitting could be done here. Because we want to use the result of the PROOF session, we cannot just use the fMasshisto variable, but have to pick the histogram from the fOuput-list. Additionally, we have to include the TCanvas.h header, because it is not included in the automatically generated header.

Find the Terminate hook in the egamma.C file and modify it to contain the following code. egamma.C

[...]
#include <TCanvas.h>
[...]
void egamma::Terminate()
{
   // The Terminate() function is the last function to be called during
   // a query. It always runs on the client, it can be used to present
   // the results graphically or save the results to file.
  fMasshist = dynamic_cast<TH1D *>(fOutput->FindObject("masshist"));
  if ( fMasshist != NULL )
    {
      new TCanvas;
      fMasshist->Draw();
    }

}

Finally, we have to implement the code to calculate the invariant mass. The Process hook is called for each event in the dataset. Here, we can do complex calculations and "cut events away" by simply not filling them into the result-histogram. Within the Process function, we can have access to all the variables of the current event. However, although for each TTree-variable, a C++ variable exists, these variables are not filled by default.

In Process, we have to call fChain->GetTree()->GetEvent(entry) to read the full event, or b_<Tree variable name>->GetEntry(entry);. After this the events tree-variables are available in the corresponding C++-variable. Because reading the whole event into the memory might be a huge overhead, the second method is prefered for large trees.

Find the Process hook in the egamma.C file and modify it to contain the following code. egamma.C

[...]
#include <TLorentzVector.h>
[...]
Bool_t egamma::Process(Long64_t entry)
{
   // The Process() function is called for each entry in the tree (or possibly
   // keyed object in the case of PROOF) to be processed. The entry argument
   // specifies which entry in the currently loaded tree is to be processed.
   // It can be passed to either egamma::GetEntry() or TBranch::GetEntry()
   // to read either all or the required parts of the data. When processing
   // keyed objects with PROOF, the object is already loaded and is available
   // via the fObject pointer.
   //
   // This function should contain the "body" of the analysis. It can contain
   // simple or elaborate selection criteria, run algorithms on the data
   // of the event and typically fill histograms.
   //
   // The processing can be stopped by calling Abort().
   //
   // Use fStatus to set the return value of TTree::Process().
   //
   // The return value is currently not used.

  //fChain->GetTree()->GetEntry(entry);

  b_el_n->GetEntry(entry);
  b_el_tight->GetEntry(entry);

  int ntight = 0;
  for ( int i = 0; i != el_tight->size(); ++i  )
    if ( (*el_tight)[i] == 1 ) ++ntight;
  if (ntight != 2) return kFALSE;


  b_el_pt->GetEntry(entry);
  b_el_px->GetEntry(entry);
  b_el_py->GetEntry(entry);
  b_el_pz->GetEntry(entry);

  TLorentzVector sumP4(0,0,0,0);

  for ( int i = 0; i != el_n; ++i )
    {
      if ( (*el_tight)[i] == 1 )
        {
          TLorentzVector elTightP4;
          elTightP4.SetXYZM( (*el_px)[i], (*el_py)[i], (*el_pz)[i], .511 );
          sumP4 += elTightP4;
        }
      masshist->Fill( sumP4.M() );

    }


   return kTRUE;
}

Remark: Instead of modifying the automatically generated class, one can create a derived class, to implement the actual code of a TSelector. This allows just to generate a new TSelector, if the structure of the tree is changed. The following template might help you to create such a class. However, to use this derived selector, the base selector has to be loaded on each PROOF slave. Because of this, you have to call gProof->Load("egamma.C+") before you can use the myegamma selector.

#include "egamma.h"

#ifndef _myegamma_h_
#define _myegamma_h_

#include "TH1D.h"

class myegamma : public egamma
{
 protected:
  TH1D *fMasshist;

  virtual void    SlaveBegin(TTree *tree);
  virtual Bool_t  Process(Long64_t entry);
  virtual void    Terminate();
};

#endif //_myegamma_h_

#include "myegamma.h"


void myegamma::SlaveBegin(TTree * /*tree*/)
{
   // The SlaveBegin() function is called after the Begin() function.
   // When running with PROOF SlaveBegin() is called on each slave server.
   // The tree argument is deprecated (on PROOF 0 is passed).

   TString option = GetOption();

}


Bool_t myegamma::Process(Long64_t entry)
{
   // The Process() function is called for each entry in the tree (or possibly
   // keyed object in the case of PROOF) to be processed. The entry argument
   // specifies which entry in the currently loaded tree is to be processed.
   // It can be passed to either egamma::GetEntry() or TBranch::GetEntry()
   // to read either all or the required parts of the data. When processing
   // keyed objects with PROOF, the object is already loaded and is available
   // via the fObject pointer.
   //
   // This function should contain the "body" of the analysis. It can contain
   // simple or elaborate selection criteria, run algorithms on the data
   // of the event and typically fill histograms.
   //
   // The processing can be stopped by calling Abort().
   //
   // Use fStatus to set the return value of TTree::Process().
   //
   // The return value is currently not used.


   return kTRUE;
}

void myegamma::Terminate()
{
   // The Terminate() function is the last function to be called during
   // a query. It always runs on the client, it can be used to present
   // the results graphically or save the results to file.

}

ATLAS: WorkBook/NAF/ADT11PROOF (last edited 2011-09-21 07:27:47 by WolfgangEhrenfeld)