#acl EditorsGroup:read,write NAFEditorsGroup:read,write All:read

= ATLAS-D Tutorial 2009: DQ2 =

== Links ==

 * https://twiki.cern.ch/twiki/bin/view/Atlas/DistributedDataManagement
 * https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2Clients
 * https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2Tutorial
 * https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo

== Exercises ==

=== DQ2 setup ===

Log into the NAF again and set up dq2:
{{{
ini dq2
}}}
This also sets up the Grid UI automatically. Usually, you do not want to mix this with your athena setup.

For some dq2 commands and operations you need a valid Grid proxy. Either use the autoproxy service:
{{{
ini autoproxy
}}}
or check your current Grid proxy and create a new one if the old one has expired:
{{{
voms-proxy-info --all
voms-proxy-init --voms atlas:/atlas/de --valid 96:00
}}}
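
The proxy check above can also be scripted. The following is a minimal sketch, assuming a Python 3 interpreter is available and that `voms-proxy-info --timeleft` prints the remaining lifetime in seconds. The function names and the one-hour threshold are illustrative choices, not part of the DQ2 or VOMS tools.

```python
import subprocess


def proxy_seconds_left():
    """Return the remaining proxy lifetime in seconds, or 0 if it cannot be determined."""
    try:
        result = subprocess.run(
            ["voms-proxy-info", "--timeleft"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The Grid UI is not set up in this shell.
        return 0
    output = result.stdout.strip()
    if result.returncode != 0 or not output.isdigit():
        # No proxy found, or the output is not a plain number of seconds.
        return 0
    return int(output)


def needs_renewal(seconds_left, minimum_seconds=3600):
    """Return True if less than minimum_seconds of proxy lifetime remain."""
    return seconds_left < minimum_seconds


if __name__ == "__main__":
    left = proxy_seconds_left()
    if needs_renewal(left):
        print("Proxy missing or about to expire: run voms-proxy-init")
    else:
        print("Proxy valid for another %d seconds" % left)
```

The threshold is a parameter because a long download or job needs more remaining lifetime than a quick `dq2-ls` query.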

=== dq2-ls ===

`dq2-ls` is the primary dq2 tool for users to get information about datasets. It can tell you whether a dataset exists:
{{{
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060444
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060443
}}}
`dq2-ls` prints the dataset name if the dataset exists. The same holds for container datasets, whose names end with a trailing `/`:
{{{
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t52/
}}}

`dq2-ls` understands wildcards (only `*`), while most other DQ2 tools do not.
{{{
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53*
}}}
This lists the dataset container and two tid datasets.
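
To see why a trailing `*` picks up both the container and its tid datasets, the matching can be imitated locally. This is only a sketch of the string matching, using Python's `fnmatch` module as a stand-in for whatever DQ2 does internally. It does not query DQ2, and the list of names is taken from the commands above without checking which of them exist.

```python
from fnmatch import fnmatchcase

PREFIX = "mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_"

# Names used in the exercises above; their existence is not checked here.
names = [
    PREFIX + "t53/",
    PREFIX + "t53_tid060444",
    PREFIX + "t52/",
]


def matching(pattern, candidates):
    """Return the candidates matched by a shell-style pattern.

    fnmatch also understands ? and [...], but only * is used here,
    mirroring the wildcard support described for dq2-ls.
    """
    return [name for name in candidates if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for name in matching(PREFIX + "t53*", names):
        print(name)
```

The container name (ending in `/`) and the tid dataset share the prefix up to `t53`, so both match, while the `t52` container does not.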

See `dq2-ls -h` for all available options. Here are some useful ones for peeking into a dataset:
 * `-f`: list the files in the dataset with their LFNs
 * `-f -p`: list the files in the dataset with their PFNs (needs the correct DQ2 site via `-L`)
 * `-f -L DQ2SITE`: list the files with their LFNs and check whether each file is available at DQ2SITE

Try out the following commands and compare their output:
{{{
dq2-ls -f mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls -f -L DESY-ZN_MCDISK mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls -f -p mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls -f -p -L DESY-ZN_MCDISK mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
}}}
The last command prints the SRM path (PFN) of each file, which can be used to access these files locally at the NAF.

=== replicas ===

Each tid dataset is stored at one or more Grid sites. To find out where, use the `-r` option of `dq2-ls` or the dedicated dq2 commands
`dq2-list-dataset-replicas` and `dq2-list-dataset-replicas-container`:

{{{
dq2-ls -r mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060444
dq2-list-dataset-replicas mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060444
dq2-ls -r mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-list-dataset-replicas-container mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/ 
}}}
As you can see, you can do a lot with `dq2-ls`.

=== dq2-get ===

But you cannot download files from a dataset with `dq2-ls`. For that you need `dq2-get`, which usually tries to guess the best parameters for the download, e.g. the site and the access protocol. See `dq2-get -h` for all available options. Plain `dq2-get` downloads the complete dataset or container dataset, which can be a lot of files, so be careful. With the `-f` option you can select a list of files to download.

For the tutorial we will use an example dataset with only two small files to avoid overloading the Grid storage element hosting these files.

Search for a user dataset from Wolfgang Ehrenfeld whose name contains `tutorial` and `test`:
{{{
dq2-ls user09.WolfgangEhrenfeld*tutorial*test*
}}}

List the files in the dataset and download either the first or second file:
{{{
dq2-ls -f user09.WolfgangEhrenfeld.test_for_tutorial.test.TEST.v0
dq2-get -f test.v0.txt.1 user09.WolfgangEhrenfeld.test_for_tutorial.test.TEST.v0
}}}
`dq2-get` creates a directory with the name of the dataset and downloads the files into it.

Again, be careful about how much you download, and use the `-f` option whenever possible. `dq2-get` checks the current directory for existing files and does not download them again.
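
One way to keep downloads small is to build the `dq2-get` calls from an explicit, short list of file names. The sketch below only uses the `dq2-get -f FILE DATASET` form shown above, one call per file. The helper name `build_dq2_get_commands` and the `max_files` limit are hypothetical, and the commands are printed rather than executed.

```python
import shlex


def build_dq2_get_commands(dataset, files, max_files=2):
    """Return one dq2-get argument list per requested file.

    Raises ValueError if more than max_files files are requested,
    as a guard against accidentally downloading a whole dataset.
    """
    if not files:
        raise ValueError("no files selected; plain dq2-get would fetch everything")
    if len(files) > max_files:
        raise ValueError("refusing to request %d files (limit %d)" % (len(files), max_files))
    return [["dq2-get", "-f", name, dataset] for name in files]


if __name__ == "__main__":
    dataset = "user09.WolfgangEhrenfeld.test_for_tutorial.test.TEST.v0"
    for command in build_dq2_get_commands(dataset, ["test.v0.txt.1"]):
        # Print a copy-and-paste ready command line instead of running it.
        print(" ".join(shlex.quote(part) for part in command))
```

Because `dq2-get` does not download files that already exist in the current directory, running the printed commands a second time should not transfer the data again.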


=== processing files locally from a dataset ===

If you want to develop code, you usually need one input file for testing. For best performance you should download it using `dq2-get`. In some cases you want to increase your statistics but do not want to, or cannot, download all the necessary files, and you do not want to run on the Grid either. In that case you can access the files at the `DESY-HH` and `DESY-ZN` sites directly from the NAF using either `ROOT` or `athena`.

Start from a new shell and create a `PoolFileCatalog` for your dataset:
{{{
dq2-ls -f -p -P -G -L DESY-ZN_MCDISK mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060445
}}}

This creates a file `PoolFileCatalog.xml` in the current directory. In it, each LFN is mapped to its PFN in the local access protocol, which should be dcap. Have a look at the file.
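
Instead of reading the XML by eye, the LFN to PFN mapping can be printed with a few lines of Python. This is a minimal sketch, assuming a Python 3 interpreter and the usual POOL catalog layout, in which each `File` element holds `pfn` and `lfn` elements with a `name` attribute. The function name `read_catalog` is an illustrative choice.

```python
import os
import xml.etree.ElementTree as ET


def read_catalog(source):
    """Return a dict mapping each LFN to the list of PFNs of the same File entry.

    source may be a file name or an open file object.
    """
    mapping = {}
    root = ET.parse(source).getroot()
    for file_entry in root.iter("File"):
        pfns = [pfn.get("name") for pfn in file_entry.iter("pfn")]
        for lfn in file_entry.iter("lfn"):
            mapping[lfn.get("name")] = pfns
    return mapping


if __name__ == "__main__":
    catalog = "PoolFileCatalog.xml"
    if os.path.exists(catalog):
        for lfn, pfns in sorted(read_catalog(catalog).items()):
            print(lfn, "->", ", ".join(pfns))
    else:
        print("No %s in the current directory" % catalog)
```

If the PFNs do not start with the dcap protocol, the catalog was probably created for a different site or without the local protocol options.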

Now, we can do a simple test and look at the MC information of the first two events using `athena`. First, create the job options to read 2 events from file `AOD.060445._00002.pool.root.1` and call the `PrintMC` algorithm:
{{{
cat > mcdump.py << EOF
#############################
## get handles to the service manager and the application manager
from AthenaCommon.AppMgr import ServiceMgr as svcMgr, theApp

## load pool support
import AthenaPoolCnvSvc.ReadAthenaPool

## input
svcMgr.EventSelector.InputCollections = [ 'LFN:AOD.060445._00002.pool.root.1' ]
theApp.EvtMax = 2

### get top sequence
from AthenaCommon.AlgSequence import AlgSequence
topSequence = AlgSequence()

### include print MC
from TruthExamples.TruthExamplesConf import PrintMC
printMC = PrintMC()
printMC.McEventKey = 'GEN_AOD'
topSequence += printMC

########## EOF ###############
EOF

}}}
Please note the leading `LFN:` prefix before the LFN, which tells athena to look up the PFN for this file in the `PoolFileCatalog`.

And now run athena with these job options:
{{{
source ~/cmthome/setup.sh -tag=15.3.1
athena.py mcdump.py
}}}
The output is quite long, so you may want to pipe it into `less` or redirect it into a file.