ATLAS-D Tutorial 2009: DQ2
Exercises
DQ2 setup
Log into the NAF again and set up dq2:
ini dq2
This will automatically set up the Grid UI. Usually, you don't want to mix this with your athena setup.
For some dq2 commands and operations you need a valid Grid proxy. Either use the autoproxy service:
ini autoproxy
or create a new Grid proxy, if your old one expired:
voms-proxy-info --all
voms-proxy-init --voms atlas:/atlas/de --valid 96:00
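Before a longer session it can help to check how much lifetime the current proxy has left. The following is a minimal sketch, not part of the original tutorial: the helper name proxy_needs_renewal and the one-hour threshold are illustrative assumptions, and the commented usage assumes that voms-proxy-info --timeleft prints the remaining lifetime in seconds.

```shell
# Hypothetical helper (not a dq2 or VOMS command): succeeds if the remaining
# proxy lifetime in seconds ($1) is below the required minimum in seconds ($2).
proxy_needs_renewal() {
    seconds_left=$1
    min_seconds=$2
    # Treat an empty or non-numeric value (e.g. no proxy found) as 0 seconds left.
    case $seconds_left in
        ''|*[!0-9]*) seconds_left=0 ;;
    esac
    [ "$seconds_left" -lt "$min_seconds" ]
}

# Possible usage on the NAF (assumes --timeleft prints seconds):
#   if proxy_needs_renewal "$(voms-proxy-info --timeleft 2>/dev/null)" 3600; then
#       voms-proxy-init --voms atlas:/atlas/de --valid 96:00
#   fi
```

Keeping the comparison in a small function separates the decision from the Grid tools, so the logic can be tried out without a proxy.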
dq2-ls
dq2-ls is the primary dq2 tool for users to get information about datasets. It can tell you whether a dataset exists:
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060444
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060443
dq2-ls lists the dataset name if it exists. The same is also true for container datasets:
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t52/
dq2-ls understands wildcards (only *), while most other DQ2 tools don't.
dq2-ls mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53*
will list the dataset container and two tid datasets.
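To get a feeling for what such a pattern selects, the * wildcard can be compared with a shell glob. The sketch below uses plain shell pattern matching on three names taken from this page. It only illustrates the wildcard semantics under the assumption that dq2-ls treats * like a shell glob; it does not query DQ2, and the helper name matches_pattern is made up for this example.

```shell
# Hypothetical helper: succeeds if dataset name $1 matches glob pattern $2.
# This only illustrates "*" semantics; it does not query the DQ2 catalogue.
matches_pattern() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

pattern='mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53*'
for name in \
    'mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/' \
    'mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060444' \
    'mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t52/'
do
    if matches_pattern "$name" "$pattern"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

The first two names match the pattern, while the t52 container does not.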
See dq2-ls -h for all available options. Here are some useful ones for peeking into a dataset:
- -f: list files with LFN in dataset
- -f -p: list files with PFN in dataset (needs correct DQ2 site via -L)
- -f -L DQ2SITE: list files with LFN and check whether each file is available at DQ2SITE
Try out the following commands and try to spot the difference in the output.
dq2-ls -f mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls -f -L DESY-ZN_MCDISK mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls -f -p mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-ls -f -p -L DESY-ZN_MCDISK mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
The last command gives you the SRM of each file, which can be used to access these files locally at the NAF.
replicas
tid datasets are located at one or more Grid sites. To get this information, use the -r option of dq2-ls or the dedicated dq2 commands dq2-list-dataset-replicas and dq2-list-dataset-replicas-container:
dq2-ls -r mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060444
dq2-list-dataset-replicas mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060444
dq2-ls -r mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
dq2-list-dataset-replicas-container mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53/
As you can see, you can do a lot with dq2-ls.
dq2-get
What you cannot do with dq2-ls is download files from a dataset. For this you need dq2-get, which usually tries to guess the best parameters for the download, e.g. site and access protocol. See dq2-get -h for all available options. Plain dq2-get will download the complete dataset or container dataset, which can be a lot of files, so be careful. With the option -f you can select a list of files for download.
For the tutorial we will use an example dataset with only two small files to avoid overloading the Grid storage element hosting these files.
Search for a user dataset from Wolfgang Ehrenfeld whose name includes tutorial and test:
dq2-ls user09.WolfgangEhrenfeld*tutorial*test*
List the files in the dataset and download either the first or second file:
dq2-ls -f user09.WolfgangEhrenfeld.test_for_tutorial.test.TEST.v0
dq2-get -f test.v0.txt.1 user09.WolfgangEhrenfeld.test_for_tutorial.test.TEST.v0
dq2-get creates a directory with the name of the dataset and downloads the files into it.
Again, be as careful as possible about how much you download, and use the -f option whenever you can. dq2-get checks the current directory for existing files and will not download them again.
processing files locally from a dataset
If you want to develop code, you usually need one input file for testing. For best performance you should download it using dq2-get. In some cases you want to increase your statistics, but do not want to or cannot download all the necessary files, and you do not want to run on the Grid either. In that case you can access the files from the DESY-HH and DESY-ZN sites directly from within the NAF using either ROOT or athena.
Start from a new shell and create a PoolFileCatalog for your dataset:
dq2-ls -f -p -P -G -L DESY-ZN_MCDISK mc08.105410.GMSB1_jimmy_susy.merge.AOD.e352_s462_r635_t53_tid060445
This will create a file PoolFileCatalog.xml in the current directory. There, the LFN is mapped to the PFN in the local access protocol, which should be dcap. Have a look into the file.
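For a quick overview of the mapping, the LFN and PFN of each entry can be printed side by side. The following is a minimal sketch under stated assumptions: the helper name list_catalog is made up, the sample file only mimics the usual POOL catalogue layout (a pfn element inside physical, followed by an lfn element inside logical, one element per line), and the GUID and dcap URL in it are placeholders, not real locations.

```shell
# Hypothetical helper: print "LFN PFN" pairs from a POOL file catalogue.
# Assumes one <pfn .../> and one <lfn .../> element per line, with the
# <physical> block before the <logical> block inside each <File> entry.
list_catalog() {
    awk -F'"' '
        /<pfn / { for (i = 1; i < NF; i++) if ($i ~ /name=$/) pfn = $(i + 1) }
        /<lfn / { for (i = 1; i < NF; i++) if ($i ~ /name=$/) print $(i + 1), pfn }
    ' "$1"
}

# Made-up sample with the assumed layout; GUID and dcap URL are placeholders.
cat > sample_catalog.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<POOLFILECATALOG>
  <File ID="00000000-0000-0000-0000-000000000000">
    <physical>
      <pfn filetype="ROOT_All" name="dcap://dcache.example.invalid/placeholder/AOD.060445._00002.pool.root.1"/>
    </physical>
    <logical>
      <lfn name="AOD.060445._00002.pool.root.1"/>
    </logical>
  </File>
</POOLFILECATALOG>
EOF

list_catalog sample_catalog.xml
```

On the NAF the same helper could be pointed at the real file with list_catalog PoolFileCatalog.xml, provided the catalogue follows the assumed layout.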
Now, we can do a simple test and look at the MC information of the first two events using athena. First, create the job options to read 2 events from file AOD.060445._00002.pool.root.1 and call the PrintMC algorithm:
cat > mcdump.py << EOF
#############################
## get a handle to the application manager
from AthenaCommon.AppMgr import ServiceMgr as svcMgr
## load pool support
import AthenaPoolCnvSvc.ReadAthenaPool
## input
svcMgr.EventSelector.InputCollections = [ 'LFN:AOD.060445._00002.pool.root.1' ]
theApp.EvtMax = 2
### get top sequence
from AthenaCommon.AlgSequence import AlgSequence
topSequence = AlgSequence()
### include print MC
from TruthExamples.TruthExamplesConf import PrintMC
printMC = PrintMC()
printMC.McEventKey = 'GEN_AOD'
topSequence += printMC
########## EOF ###############
EOF
Please note the LFN: prefix in front of the file name. It tells athena to look up the PFN for this file in the PoolFileCatalog.
And now run athena with these job options:
source ~/cmthome/setup.sh -tag=15.3.1
athena.py mcdump.py
The output is quite long. You can either pipe the output into less or into a file.
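Both variants can be written down as follows. This is only a sketch: the helper name run_logged and the log file name mcdump.log are arbitrary choices for illustration, and the athena calls are shown as comments because they need the athena setup from above.

```shell
# Hypothetical helper: run a command and write its stdout and stderr
# to the log file given as the first argument.
run_logged() {
    log_file=$1
    shift
    "$@" > "$log_file" 2>&1
}

# Possible usage with the job options from above:
#   run_logged mcdump.log athena.py mcdump.py
#   less mcdump.log
# or page through the output directly:
#   athena.py mcdump.py 2>&1 | less
```

Redirecting stderr together with stdout keeps warnings and error messages in the same place as the event dump.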