ATLAS-D Tutorial 2011: Distributed Data Management

DQ2 Setup

DQ2 Setup at the NAF

Log into the NAF and set up DQ2:

ini dq2

This will automatically set up the Grid UI. Usually, you don't want to mix this with your athena setup.

For some dq2 commands and operations you need a valid Grid proxy. Either use the autoproxy service:

ini autoproxy

or create a new Grid proxy, if your old one expired:

voms-proxy-info --all 
voms-proxy-init --voms atlas:/atlas/de --valid 96:00
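Whether a new proxy is needed can also be checked in a script. The following is only a minimal sketch: the helper name proxy_needs_renewal and the one-hour threshold MIN_SECONDS are assumptions made for illustration, and the script only queries the proxy if voms-proxy-info is installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the remaining
# lifetime $1 (in seconds) is below the threshold $2 (in seconds).
proxy_needs_renewal() {
    [ "$1" -lt "$2" ]
}

MIN_SECONDS=3600   # assumed threshold: one hour

# Only query the proxy when the VOMS client tools are available.
if command -v voms-proxy-info >/dev/null 2>&1; then
    # --timeleft prints the remaining proxy lifetime in seconds.
    left=$(voms-proxy-info --timeleft 2>/dev/null)
    # Treat empty or non-numeric output (e.g. no proxy) as 0 seconds.
    case "$left" in ''|*[!0-9]*) left=0 ;; esac
    if proxy_needs_renewal "$left" "$MIN_SECONDS"; then
        echo "Proxy has ${left}s left; renew it with:"
        echo "  voms-proxy-init --voms atlas:/atlas/de --valid 96:00"
    else
        echo "Proxy still valid for ${left}s"
    fi
fi
```

The script only prints a hint instead of calling voms-proxy-init itself, because creating a proxy asks for your certificate passphrase.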

DQ2 Setup from anywhere else

This assumes that you are working on a machine with /afs access:

source /afs/cern.ch/atlas/offline/external/GRID/ddm/DQ2Clients/setup.sh

This will automatically set up the Grid UI and DQ2. Then create a Grid proxy and check it:

voms-proxy-init --voms atlas:/atlas/de --valid 96:00
voms-proxy-info --all 

Voilà! You are ready to use DQ2.

Basic commands

You can list all DQ2 commands by typing dq2- followed by Tab. The list is long, but don't worry: you only need a few of them.

dq2-whoami/dq2-finger

Let's first try dq2-whoami and dq2-finger:

dq2-whoami  

efeld

dq2-finger  

nickname   : efeld
dn         : /O=GermanGrid/OU=DESY/CN=Wolfgang Ehrenfeld
email      : wolfgang.ehrenfeld@cern.ch

dq2-finger returns some information about you, as stored in the DQ2 database. All of this information is taken from VOMS (https://lcg-voms.cern.ch:8443/vo/atlas/vomrs). Please check that you have a nickname; if you don't, set one up on the VOMS page. The nickname is important because the distributed analysis tools use it to create user datasets (these datasets follow the pattern user.nickname.xxxx). Please also check that your email address is valid: DQ2 will use it to send you notifications (e.g. if some of your files have been lost).
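The naming pattern for user datasets can be illustrated with a small shell sketch. The helper names user_dataset_name and is_own_dataset are hypothetical and not part of DQ2; the placeholder nickname mynickname is used whenever dq2-whoami is not available.

```shell
#!/bin/sh
# Hypothetical helper: build a user dataset name from a nickname ($1)
# and a free-form suffix ($2), following the pattern user.nickname.xxxx.
user_dataset_name() {
    printf 'user.%s.%s\n' "$1" "$2"
}

# Hypothetical helper: succeed when dataset name $1 starts with
# "user.<nickname>.", where <nickname> is given as $2.
is_own_dataset() {
    case "$1" in
        "user.$2."*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Use the real nickname when DQ2 is set up, a placeholder otherwise.
if command -v dq2-whoami >/dev/null 2>&1; then
    nick=$(dq2-whoami)
else
    nick=mynickname
fi

user_dataset_name "$nick" test.dummyDS.20110920
```

A prefix check like is_own_dataset is only a naming convention test; it does not replace the ownership information stored in DQ2.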

dq2-ls

This is probably the DQ2 tool you will use most often. It lists datasets and containers, the files in a dataset, and more. It has plenty of options, which can be listed with:

dq2-ls -h

All DQ2 tools have such a help option. Do not hesitate to use it.

Let's try to list some datasets:

dq2-ls data10_7TeV.00161272.physics_CosmicCalo.merge.AOD* 

dq2-ls data10_7TeV.00161272.physics_CosmicCalo.merge.AOD.f282_m573 
dq2-ls data10_7TeV.00161272.physics_CosmicCalo.merge.AOD.x36_m573

As you can see, dq2-ls supports wildcards. In this case, DQ2 returns two datasets. You can use AMI to understand the difference between them. If no dataset matches the specified pattern, dq2-ls returns nothing, for example:

dq2-ls data10_7TeV.01161272.physics_CosmicCalo.merge.AOD*

Let's try another pattern:

dq2-ls mc10_7TeV.105200.T1_McAtNlo_Jimmy.*AOD.e844_s*

mc10_7TeV.105200.T1_McAtNlo_Jimmy.recon.AOD.e844_s933_s946_r2302/
mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300_tid457301_00
mc10_7TeV.105200.T1_McAtNlo_Jimmy.recon.AOD.e844_s933_s946_r2302_tid457248_00
mc10_7TeV.105200.T1_McAtNlo_Jimmy.recon.AOD.e844_s933_s946_r2302_tid457285_00
mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/
mc10_7TeV.105200.T1_McAtNlo_Jimmy.recon.AOD.e844_s933_s946_r2302_tid457247_00
mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300_tid457303_00
mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300_tid457302_00

Here, dq2-ls returns many datasets. The names ending with a slash (/) are containers, and these are the ones you should use. They contain the datasets that have _tid in their name. Do you know the difference between recon.AOD and merge.AOD?
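The naming convention for containers and tid datasets can be applied mechanically to dq2-ls output. The following is a minimal sketch; the function name classify_ddm_name and its three labels are made up for this illustration and only look at the name, not at the DQ2 catalogue.

```shell
#!/bin/sh
# Hypothetical helper: classify a DQ2 name purely by its spelling.
#   trailing slash -> container
#   contains _tid  -> tid dataset (a member of a container)
#   anything else  -> plain dataset
classify_ddm_name() {
    case "$1" in
        */)     echo container ;;
        *_tid*) echo tid-dataset ;;
        *)      echo dataset ;;
    esac
}

# Two names taken from the listing above.
for name in \
    "mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/" \
    "mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300_tid457301_00"
do
    printf '%-12s %s\n' "$(classify_ddm_name "$name")" "$name"
done
```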

You can get more information about a dataset (for example its owner or state) from its metadata:

dq2-get-metadata mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300_tid457301_00

Once a (tid) dataset is frozen, it is added to the dataset container mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/ (notice the trailing /). To list all the datasets in a given container, you can do:

dq2-list-datasets-container mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/

In ATLAS, most datasets and containers are not stored at a single site but have multiple replicas. To list the replicas of a dataset or container, you can do:

dq2-ls -r  mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/
dq2-list-dataset-replicas  mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300_tid457303_00 --all
dq2-list-dataset-replicas-container mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/

Now, you might be interested to see the list of files in the dataset. This can be done with the following options:

 * -f: list files with LFN in dataset
 * -f -p: list files with PFN in dataset (needs correct DQ2 site via -L)
 * -f -L DQ2SITE: list files with LFN and checks if file is available at DQ2SITE

Try out the following commands and try to spot the difference in the output.

dq2-ls -f  mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/
dq2-ls -f -L FZK-LCG2_DATADISK  mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/
dq2-ls -f -p mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/
dq2-ls -f -p -L FZK-LCG2_DATADISK mc10_7TeV.105200.T1_McAtNlo_Jimmy.merge.AOD.e844_s933_s946_r2302_r2300/

So now, you know how to list dataset/container replicas and also how to list files in a dataset/container. You might be interested to download some of these datasets. This can be done via dq2-get.

dq2-get

Once again, use dq2-get -h to get all possible options. Let's try to download a dataset. For the tutorial we will use an example dataset with only 6 small files to avoid overloading the Grid storage element hosting these files.

dq2-ls user.efeld.test.dummyDS.*

List the files in the dataset and download just one of them, for example:

dq2-ls -f user.efeld.test.dummyDS.20110919
dq2-get -f efeld.dummyfile.20110919.1MB.4 user.efeld.test.dummyDS.20110919

dq2-get creates a directory with the name of the dataset and downloads the files into it.

Again, be careful about how much data you download, and use the -f option whenever possible. dq2-get checks the current directory for existing files and does not download them again.

dq2-put

dq2-put is a tool to upload files to a Grid storage area so that they become visible to other users.

mkdir dummy 
cd dummy
for i in $(seq 0 5); do echo "$i"; dd if=/dev/urandom of="mynickname.dummyfile.20110920.1MB.$i" bs=100000 count=10; done
cd - 
dq2-put -d -L DESY-HH_SCRATCHDISK -s dummy user.mynickname.test.dummyDS.20110920
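The dd call above writes 10 blocks of 100000 bytes, so each dummy file should be exactly 1000000 bytes (1 MB). Before uploading, it can be useful to verify this. The following is only a sketch; the helper name check_file_sizes and the variable EXPECTED_BYTES are assumptions made for illustration.

```shell
#!/bin/sh
# Expected size in bytes: dd block size times block count.
EXPECTED_BYTES=$((100000 * 10))

# Hypothetical helper: report every regular file in directory $1 whose
# size differs from $2 bytes; succeed only when all files match.
check_file_sizes() {
    status=0
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips any padding that wc may print.
        size=$(($(wc -c < "$f")))
        if [ "$size" -ne "$2" ]; then
            echo "unexpected size $size: $f"
            status=1
        fi
    done
    return "$status"
}

# Only run the check when the dummy directory from above exists.
if [ -d dummy ]; then
    check_file_sizes dummy "$EXPECTED_BYTES" \
        && echo "all files are $EXPECTED_BYTES bytes"
fi
```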

Check that the dataset is visible in DQ2 using the commands we used before. If you don't plan to add new files to the dataset, you can freeze it:

dq2-freeze-dataset user.mynickname.test.dummyDS.20110920

Freezing a dataset is highly recommended before you subscribe it to another site. If you don't freeze it, it will be frozen automatically after 10 days. If you no longer need your dataset, you can delete the replica you created:

dq2-delete-replicas user.mynickname.test.dummyDS.20110920 DESY-HH_SCRATCHDISK

This will delete the dataset replica. If the replica was the last one on the Grid, the dataset definition will be erased as well. Note that you should only be able to delete dataset replicas that belong to you. If you try:

dq2-delete-replicas user.efeld.test.dummyDS.20110919 DESY-HH_SCRATCHDISK

you should receive an error message.

Warning: remember that SCRATCHDISK sites are scratch areas, which means that datasets stay there for no longer than 30 days (possibly even less if the site fills up)! If you want to keep your dataset on the Grid, you need to move it to permanent storage (LOCALGROUPDISK).
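To keep track of the 30-day limit, you can compute the latest date by which a scratch replica should have been moved. This sketch assumes GNU date (its -d option does not exist in this form on BSD or macOS); the function name scratch_deadline is hypothetical, and the result is only an upper bound, since a full site may clean up earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the date (YYYY-MM-DD) that lies $2 days
# after the date $1. Requires GNU date; -u avoids daylight-saving shifts.
scratch_deadline() {
    date -u -d "$1 +$2 days" +%Y-%m-%d
}

# Upload date of the example dataset and the 30-day scratch lifetime.
scratch_deadline 2011-09-20 30
```

For the example dataset uploaded on 2011-09-20, this gives 2011-10-20 as the latest date to request a replica on a LOCALGROUPDISK.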

DATRI

DATRI is the Data Transfer Request Interface. It allows you to request the replication of a dataset from a site A to a site B. The first thing to do is to register in DATRI. The registration then needs to be approved, which can take some time. Once you have been approved, you can request the replication of the datasets you need. Every German user is allowed to request replication to German LOCALGROUPDISK sites. All requests below 500 GB are approved automatically. Requests above 500 GB need approval by a DDM responsible.

A dataset can also be replicated manually with DQ2 commands, but this is strongly NOT RECOMMENDED, since it requires advanced knowledge of DQ2 and the Grid. If you try this and your replication is not processed correctly, don't expect any support from the DDM team.

ATLAS: WorkBook/NAF/ADT11DDM (last edited 2011-09-19 19:02:23 by WolfgangEhrenfeld)