#acl EditorsGroup:read,write NAFEditorsGroup:read,write All:read
= ATLAS-D Tutorial 2010: ROOT and PROOF =
== Inside D3PD ==
* Start root on the NAF. You might need to call the ini command before the {{{root}}} command is available:
{{{
ini ROOT530
root -l
}}}
* Inside the root session you can open a file directly from dCache with the {{{.ls}}} command you can see what is
in this file and confirm that opening the file was successful. Use the {{{TFile::Get}}} command to get a pointer to the "egamma"-tree.
{{{
TFile *f = TFile::Open("dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/data10_7TeV/NTUP_EGAMMA/f282_p207/data10_7TeV.00161379.physics_Egamma.merge.NTUP_EGAMMA.f282_p207_tid159706_00/NTUP_EGAMMA.159706._000372.root.1")
.ls
TTree *egamma = (TTree*) f->Get("egamma");
}}}
* To see a list of variables in the tree do
{{{
egamma->Print();
}}} a text file with the printout can be found [[attachment:egammatree.txt|here]].
* Note the electron related varibles: {{{ el_n, el_pt, el_E, el_px, el_py, el_pz, el_tight }}}
and the variables {{{ RunNumber, EventNumber }}}
* Use {{{TTree::Scan}}} to have a rough look variables {{{EventNumber, el_n, el_pt, el_tight}}} for the first few events.
{{{
egamma->Scan("EventNumber:el_n:el_pt:el_tight")
}}}
<
> Remark: Note the columns ''row'' and ''instance''. Each event has one ''row number'' but might have several ''instances'' in this table. The value of "plain" variables (just one value per event) is repeated in all ''instances''. The entries of arrays are shown in the differnt ''instances''.
<
> Note that {{{el_n}}} is a (single) integer which gives the number of reconstructed electron (candidates) in the current event. {{{el_pt}}} and {{{el_tight}}} are arrays with {{{el_n}}} entries. Each entry in {{{el_pt}}} gives the transverse momentum for each of the electrons. The value of {{{el_tight}}} is either 0 or 1 and shows, if the corresponding electron meets tight selection criteria (1) or not (0).
* Now, you might want to
* plot the pt spectrum of all electron candidates
* plot the distribution of number of "tight" electrons in each event ( using the {{{Sum$()}}} function )
* plot the pt spectrum of only the "tight" electrons.
* switch the view of the {{{TCanvas}}} with the plot to logarithmic y-axis view
* plot the sum of the electron pt of all "tight" electrons in a event
<
> Solution
{{{
egamma->Draw("el_pt");
egamma->Draw("Sum$(el_tight)");
egamma->Draw("el_pt","el_tight");
gPad->SetLogy();
gPad->SetLogy(0);
egamma->Draw("Sum$(el_pt*el_tight)","Sum$(el_tight) != 0");
//Note:
egamma->Draw("Sum$(el_pt)","el_tight");
// is different!. Sum$() always sums over all "instances" independent to the "selection" value
}}}
== Setup Proofcluster ==
If you run {{{proofcluster}}} for the first time, or later {{{proofcluster config}}}, you are asked for configuration options. After running the configuration, please check the file {{{ /scratch/hh/current/atlas/users//proofcluster/setup.conf }}}.
Make sure the contents is the following:
{{{
master=localhost
workers=10
cmssw=nocmssw
protocolport=XXXX
port=XXXX
os=any
rootsys=/afs/naf.desy.de/products/root/amd64_rhel50/5.30.00
site=XX
queue=normal
autocopylog=n
ram=2048
}}}
* {{{protocolport}}} should be a random number to avoid conflicts with other users
* {{{port}}} should be another random number, usually it is {{{protocolport}}}+1
* {{{site}}} must be either hh or zn, depending on the NAF site, you want to use.
<
>
Start the proofcluster:
* {{{ini atlas}}}
* change to a directory on Lustre or AFS (mandatory)
* call {{{proofcluster status}}} to check, if you have already a proofcluster running
* call {{{proofcluster start}}} to start your proofcluster
* call {{{proofcluster status}}} to check the status of your proofcluster again
* don't forget to call {{{proofcluster stop}}} after you are done with working with PROOF
Troubleshooting:
* {{{proofcluster status}}} shows an "Master" entry, but no "Worker" entry
* the batch jobs are not running anymore, just call {{{proofcluster stop}}} and try again
* {{{proofcluster start}}} hangs
* at "{{{Waiting...}}}"
* there are not enough slots free, meeting your requirements. Consider lowering the number of worker nodes or decrease the amount of requested RAM
* at "{{{Almost finished...}}}"
* the SGE (NAF) batch job to start your PROOF cluster started but does not finish. <
> Check the status of your SGE batch job by running {{{qstat -u }}} and check, if your job is in the "Error" state (E). Most likely you called {{{proofcluster start}}} in a directory, that is not visible to the batch node. Currently {{{proofcluster start}}} must be called in a directory on Lustre or AFS. Delete your batch job by calling {{{qdel }}} with the jobid seen by the qstat command. Change to a directory on Lustre or AFS and do {{{proofcluster stop}}}, {{{proofcluster status}}}, {{{proofcluster start}}} again.
== PROOF Datasets ==
* Check, that your proofcluster is configured correctly and running
* First we connect manually to the PROOF cluster, you need your personal PROOF protocol port for this ({{{protocolport}}} from your setup.conf
* {{{.!hostname}}} calls the UNIX-command {{{hostname}}} in the current proofsession
* {{{p->Exec(".!hostname")}}} does the same on each PROOF worker in parallel
{{{
root -l
p = TProof::Open("localhost:");
.!hostname
p->Exec(".!hostname");
cout << "Hello World" << endl;
p->Exec(" cout << \"Hello World\" << endl; ")
}}}
* Instead of opening a PROOF session with {{{TProof::Open}}} manually, you can use the script, shown after {{{proofcluster start}}}
{{{
root -l /scratch/hh/current/atlas/users//proofcluster/connectproof.C
}}}
Use "gProof" instead of "p" in this case.
<
>
* To work with datasets first choose a list of files. There are different file lists for this tutorial
* for proofcluster with "hh"-Site: /afs/naf.desy.de/group/atlas/ADT11/hh/XX/tutfilelist.txt ( 00 <= XX <= 09 )
* To create a TFileCollection and after that a TDSet do the following in a root session:
{{{
TFileCollection *fc = new TFileCollection("tutfilelist","Tutorial file list");
fc->AddFromFile("/afs/naf.desy.de/group/atlas/ADT11/hh/XX/tutfilelist.txt");
fc->Print();
TDSet *ds = new TDSet("tutset","egamma");
ds->Add( fc->GetList() );
ds->Print();
}}}
* for convenience in your root session, you can call
{{{
.x /afs/naf.desy.de/group/atlas/ADT11/hh/XX/createtutds.C
}}}
. this will execute the commands above.
* we can use {{{ds->Draw()}}} like {{{TTree::Draw}}}
<
>
* now use the dataset and PROOF to
* draw the pt spectrum for all electron (cadidates)
* draw the pt spectrum for all only the "tight" electrons for events with exactly two "tight" electrons
* note the excess at 40 GeV
* try to plot the invariant mass of the "tight" electrons for events with exactly two "tight" electrons
* the selection stays the same like the previous polot
* reminder: Minv = sqrt( Sum(E)^2^ - Sum(px)^2^ -Sum(py)^2^ -Sum(pz)^2^ )
* reminder: in ROOT Sum$(el_px) sums the x-Momentum of _all_ electrons of selected events. To limit to the "tight" electrons use the Trick: {{{Sum$(el_px*el_tight)}}}
* redo the plot above. Only use events with two "tight" electrons and pt of the "tight" electrons > 60 GeV in total
Solution
{{{
root -l /scratch/hh/current/atlas/users//proofcluster/connectproof.C
.x /afs/naf.desy.de/group/atlas/ADT11/hh/XX/createtutds.C
ds->Draw("el_pt");
ds->Draw("el_pt","Sum$(el_tight)==2");
ds->Draw("sqrt( Sum$(el_E*el_tight)^2-Sum$(el_px*el_tight)^2-Sum$(el_py*el_tight)^2-Sum$(el_pz*el_tight)^2 )>>masshisto(200,0,200000)", "Sum$(el_tight)==2");
ds->Draw("sqrt( Sum$(el_E*el_tight)^2-Sum$(el_px*el_tight)^2-Sum$(el_py*el_tight)^2-Sum$(el_pz*el_tight)^2)>>masshisto(200,0,200000)", "Sum$(el_tight)==2&&Sum$(el_pt*el_tight)>60000");
}}}
Remark: You can get the result histograms from the last query with {{{gProof->GetOutputList()->Print()}}} and then {{{gProof->GetOutpoutList()->FindObject("htemp")->Draw()}}}
== PROOF with TSelector ==
For more complex computations, the {{{Draw}}} function reaches its limits. The {{{TSelector}}} class allows write complex with full access to all {{{TTree}}} variables.
A skeleton to process a given {{{TTree}}} can be generated automatically with the {{{TTree::MakeSelector}}} method. to create such a skeleton for the ''egamma''-tree, do the following:
* open an arbitrary file, containing a ''egamma''-tree
* get a pointer to this {{{TTree}}}
* call {{{egamma->MakeSelector()}}}
* this will create the two files egamma.h and egamma.C
* you can give an string as argument to create a selector with a different name <
> (e.g. {{{ egamma->MakeSelector("invMassSelector")}}})
{{{
f = TFile::Open("dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/data10_7TeV/NTUP_EGAMMA/f282_p207/data10_7TeV.00161379.physics_Egamma.merge.NTUP_EGAMMA.f282_p207_tid159706_00/NTUP_EGAMMA.159706._000372.root.1");
egamma = (TTree*) f->Get("egamma");
egamma->MakeSelector();
}}}
In this example the selector skeleton will be modified to plot the invariant mass of "tight" electrons in the egamma tree.
<
> ''Remark: It is always possible to check whether the {{{TSelector}}} is formally correct by trying to compile the selector. This can be done by calling {{{.L egamma.C+}}}. With this, you can check after each of the following modifications, if you did some typos.'' <
>
If you want to fill histograms (or use other variables) in your code, you first need to declare these the header file.
Modify the {{{egamma.h}}} file and add a {{{TH1D*}}} variable in the selector class. You also need include the header files for TH1D, because this header is not included in the skeleton. {{{egamma.h}}}
{{{
[...]
#include
[...]
public:
TH1D * fMasshist;
}}}
The hook {{{SlaveBegin}}} is called before any event is processed. In contrast to the hook {{{Begin}}}, {{{SlaveBegin}}} is called on each worker node. When running with PROOF, each PROOF worker-node has an own instance of the TSelector in the memory. To be able to fill a histogram when looping over all events, on each PROOF node a histogram has to be created.
However, although we create an histogram on each node, we only want to have one result histogram. Luckily, PROOF merges all histograms into one, if we register the histogram to the {{{fOutput}}} list by calling {{{fOutput->Add(fMasshist)}}}.
Find the {{{SlaveBegin}}} hook in the {{{egamma.C}}} file and modify it to contain the following.
{{{
void egamma::SlaveBegin(TTree * /*tree*/)
{
// The SlaveBegin() function is called after the Begin() function.
// When running with PROOF SlaveBegin() is called on each slave server.
// The tree argument is deprecated (on PROOF 0 is passed).
TString option = GetOption();
fMasshist = new TH1D("masshist","Massen Histogramm; m_inv[GeV]",200,0,200000);
fOutput->Add(fMasshist);
}
}}}
The hook {{{Terminate}}} is only called once on the client. In contrast {{{TerminateSlave}}} would be called on each worker node. This hook can be used to post-process the merged output from the PROOF-session. In our case the (parallely) created histogram will be plotted on the screen, but more sophisticated things, like fitting could be done here. Because we want to use the result of the PROOF session, we cannot just use the {{{fMasshisto}}} variable, but have to pick the histogram from the {{{fOuput}}}-list. Additionally, we have to include the {{{TCanvas.h}}} header, because it is not included in the automatically generated header.
Find the {{{Terminate}}} hook in the {{{egamma.C}}} file and modify it to contain the following code. {{{egamma.C}}}
{{{
[...]
#include
[...]
void egamma::Terminate()
{
// The Terminate() function is the last function to be called during
// a query. It always runs on the client, it can be used to present
// the results graphically or save the results to file.
fMasshist = dynamic_cast(fOutput->FindObject("masshist"));
if ( fMasshist != NULL )
{
new TCanvas;
fMasshist->Draw();
}
}
}}}
Finally, we have to implement the code to calculate the invariant mass. The {{{Process}}} hook is called for each event in the dataset. Here, we can do complex calculations and "cut events away" by simply not filling them into the result-histogram. Within the {{{Process}}} function, we can have access to all the variables of the current event. However, although for each {{{TTree}}}-variable, a {{{C++}}} variable exists, these variables are not filled by default.
In {{{Process}}}, we have to call {{{fChain->GetTree()->GetEvent(entry)}}} to read the full event, or {{{b_->GetEntry(entry);}}}. After this the events tree-variables are available in the corresponding C++-variable. Because reading the whole event into the memory might be a huge overhead, the second method is prefered for large trees.
Find the {{{Process}}} hook in the {{{egamma.C}}} file and modify it to contain the following code. {{{egamma.C}}}
{{{
[...]
#include
[...]
Bool_t egamma::Process(Long64_t entry)
{
// The Process() function is called for each entry in the tree (or possibly
// keyed object in the case of PROOF) to be processed. The entry argument
// specifies which entry in the currently loaded tree is to be processed.
// It can be passed to either egamma::GetEntry() or TBranch::GetEntry()
// to read either all or the required parts of the data. When processing
// keyed objects with PROOF, the object is already loaded and is available
// via the fObject pointer.
//
// This function should contain the "body" of the analysis. It can contain
// simple or elaborate selection criteria, run algorithms on the data
// of the event and typically fill histograms.
//
// The processing can be stopped by calling Abort().
//
// Use fStatus to set the return value of TTree::Process().
//
// The return value is currently not used.
//fChain->GetTree()->GetEntry(entry);
b_el_n->GetEntry(entry);
b_el_tight->GetEntry(entry);
int ntight = 0;
for ( int i = 0; i != el_tight->size(); ++i )
if ( (*el_tight)[i] == 1 ) ++ntight;
if (ntight != 2) return kFALSE;
b_el_pt->GetEntry(entry);
b_el_px->GetEntry(entry);
b_el_py->GetEntry(entry);
b_el_pz->GetEntry(entry);
TLorentzVector sumP4(0,0,0,0);
for ( int i = 0; i != el_n; ++i )
{
if ( (*el_tight)[i] == 1 )
{
TLorentzVector elTightP4;
elTightP4.SetXYZM( (*el_px)[i], (*el_py)[i], (*el_pz)[i], .511 );
sumP4 += elTightP4;
}
masshist->Fill( sumP4.M() );
}
return kTRUE;
}
}}}
Remark: Instead of modifying the automatically generated class, one can create a derived class, to implement the actual code of a TSelector. This allows just to generate a new {{{TSelector}}}, if the structure of the tree is changed. The following template might help you to create such a class. However, to use this derived selector, the base selector has to be loaded on each PROOF slave. Because of this, you have to call {{{gProof->Load("egamma.C+")}}} before you can use the {{{myegamma}}} selector.
{{{
#include "egamma.h"
#ifndef _myegamma_h_
#define _myegamma_h_
#include "TH1D.h"
class myegamma : public egamma
{
protected:
TH1D *fMasshist;
virtual void SlaveBegin(TTree *tree);
virtual Bool_t Process(Long64_t entry);
virtual void Terminate();
};
#endif //_myegamma_h_
}}}
{{{
#include "myegamma.h"
void myegamma::SlaveBegin(TTree * /*tree*/)
{
// The SlaveBegin() function is called after the Begin() function.
// When running with PROOF SlaveBegin() is called on each slave server.
// The tree argument is deprecated (on PROOF 0 is passed).
TString option = GetOption();
}
Bool_t myegamma::Process(Long64_t entry)
{
// The Process() function is called for each entry in the tree (or possibly
// keyed object in the case of PROOF) to be processed. The entry argument
// specifies which entry in the currently loaded tree is to be processed.
// It can be passed to either egamma::GetEntry() or TBranch::GetEntry()
// to read either all or the required parts of the data. When processing
// keyed objects with PROOF, the object is already loaded and is available
// via the fObject pointer.
//
// This function should contain the "body" of the analysis. It can contain
// simple or elaborate selection criteria, run algorithms on the data
// of the event and typically fill histograms.
//
// The processing can be stopped by calling Abort().
//
// Use fStatus to set the return value of TTree::Process().
//
// The return value is currently not used.
return kTRUE;
}
void myegamma::Terminate()
{
// The Terminate() function is the last function to be called during
// a query. It always runs on the client, it can be used to present
// the results graphically or save the results to file.
}
}}}