Published by Kory May. Modified over 8 years ago.
1
Distributed Analysis Tutorial Dietrich Liko
2
Overview. Three grid flavors in ATLAS: EGEE, OSG, Nordugrid. Distributed Analysis activities: GANGA/LCG, PANDA/OSG, other tools. How to access the grid? Certificate, VOMS. How to find your data? Where is the data stored? Which data is really available?
3
Three grids … Grids have different middleware: different software to submit jobs, different catalogs to store the data. We have to aim to hide these differences from the ATLAS user.
4
EGEE. Job submission via the LCG Resource Broker; the new gLite RB is on its way. LFC file catalog. CondorG submission is also possible, but it requires some expertise and has no support from the service provider.
5
Resource Broker Model (diagram: RB, CE)
6
OSG/Panda. PANDA is an integrated production and distributed analysis system. It is pilot-job based, similar to DIRAC and AliEn. Simple file catalogs at sites. Again, CondorG submission is possible.
7
Panda Model (diagram: task queue, CE)
8
Nordugrid. ARC middleware for job submission. Powerful and simple. RLS file catalog. At this time it is mainly used for production and is not yet a place for general ATLAS Distributed Analysis.
9
ARC Model (diagram: CE)
10
How can we live with that? A data management layer to hide these differences: Don Quijote 2 (DQ2). Tools that aim to hide the difficulties of submitting jobs: pathena/PANDA on OSG, GANGA on LCG. In the future, better interoperability at the level of the ATLAS tools and at the level of the middleware.
11
Distributed Analysis. Data analysis (AOD & ESD analysis, TAG-based analysis): pathena/PANDA, GANGA/LCG. User production: Prodsys, LJSF, GANGA (DQ2 integration).
12
pathena/PANDA. Lightweight client, integrated into the Athena release. Very nice work. A lot of work has been done to better support user jobs: short queues, multitasking pilots, etc. A large set of data is available. Available for some time. Tadashi will tell you more about it.
13
GANGA/LCG. Text UI & GUI. A pathena-like interface is available. Multiple backends: LCG/EGEE; LSF (also works with CAT queues); PBS; and others.
14
Progress on LCG. Many datasets are available at CERN and LYON. Job priorities and short queues are being implemented. Short queues: CERN, LYON, NIKHEF, FZK, RAL and some Tier-2s. Priorities: NIKHEF, CERN, IFIC (PPS). As of today one can perform distributed analysis at CERN and LYON. We hope that within this year all the other Tier-1 centers and some Tier-2s will follow. See later this week in the Tier-1/Tier-2 coordination.
15
GANGA Status. Significant developments over the summer: data available at CERN and LYON (GANGA would work on most sites); short queues/priorities; full DQ2 integration; transparent access to local resources (e.g. CAT queues). Still in the pipeline: move data and priorities to all Tier-1s; get the gLite Resource Broker into production; start iterations with users.
16
Tools for simulation: GANGA (see later today), LJSF, Prodsys Executor, Condor-based submission systems.
17
Dashboard Monitoring. We are setting up a framework to monitor distributed analysis jobs: MonALISA based (OSG, LCG), RGMA, Imperial College DB, production system. We plan to instrument the submission systems to be able to understand their usage.
18
Since September 1st …
19
Login to the grid. grid-proxy-init: basic access as of today. voms-proxy-init -voms atlas: can give access to special rights. Today: job priorities on LCG to separate production from analysis.
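The choice between the two login commands can be sketched as a small shell helper. This is only an illustration: the function name `grid_login` and the `DRY_RUN` switch are hypothetical and not part of any grid client; the two commands themselves are the ones named on the slide.

```shell
# Hypothetical helper: build the proxy command for an optional VO name.
# With a VO argument it uses voms-proxy-init (VO-specific rights);
# without one it falls back to plain grid-proxy-init (basic access).
# Setting DRY_RUN=1 prints the command instead of running it.
grid_login() {
    vo="${1:-}"
    if [ -n "$vo" ]; then
        cmd="voms-proxy-init -voms $vo"
    else
        cmd="grid-proxy-init"
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}
```

Calling `DRY_RUN=1 grid_login atlas` prints the VOMS command without contacting any server, which is handy for checking the setup on a machine without grid clients installed.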
20
How to find out which data exists. AMI metadata: http://ami3.in2p3.fr:8080/AMI/ Prodsys database: http://cern.ch/atlas-php/DbAdmin/Ora/php-4.3.4/proddb/monitor/Datasets.php Dataset browser: http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?overview=dslist
21
How to access data? Download with dq2_get and analyze locally: works now, but is not scalable. Data is distributed to sites and jobs are sent to the sites to analyze the data: DA wants to promote this way of working.
22
Dataset distribution. In principle data should be everywhere: AOD & ESD during this year, ~30 TB max. Three steps: not all data can be consolidated (other grids, Tier-2); distribution between Tier-1s is not yet perfect; distribution to Tier-2s can only be the next step.
23
CSC11 AOD Data at Tier-1
Datasets Complete Files Size
ASGC 96145634520
BNL 226131170531736
CERN 253106166101712
CNAF 1021739
FZK 1631510172
LYON 72136518786
RAL 10158993
SARA 2025138
PIC 73916105
TRIUMF 7351077
24
CSC11 ESD Data at Tier-1
Datasets Complete Files Size
ASGC 62633442721
BNL 14196114949507
CERN 99743724569
CNAF 21163213
FZK 0 0 0 0
LYON 1 0 1 1
RAL 80403428
SARA 1 0 1 1
PIC 10147193
TRIUMF 5255518
25
Monitoring of transfers
26
Dataset conclusion. AOD analysis at BNL, CERN, LYON. ESD analysis only at BNL. We still have to work hard to complete the “collection” of data. We have to push hard to achieve equal distribution between sites. Nevertheless: it is big progress compared to some months ago!
27
Dataset details. BNL: http://www.usatlas.bnl.gov/~dial/atprod/validation/html/bnl_datasets.html CERN: http://lapp.in2p3.fr/atlas/Informatique/Offline/CERNCAF_csc11/AOD/list_CC.html LYON: http://lapp.in2p3.fr/atlas/Informatique/Offline/LYONDISK_csc11/AOD/list_CC.html
28
DQ2 end user tools. dq2_ls: list datasets and files. dq2_get: download a dataset. dq2_put: create a dataset. dq2_poolFCjob0: create a PoolFileCatalog to access data locally. Details: https://uimon.cern.ch/twiki/bin/view/Atlas/UsingDQ2
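As a rough illustration of how the download tool might be used for several datasets in a row, the sketch below loops over dataset names and calls dq2_get once per name. The function `fetch_datasets`, the `DQ2_GET` override, and the dataset names in the usage note are hypothetical; the sketch assumes dq2_get accepts a dataset name as its argument, so check the UsingDQ2 page for the options that actually apply.

```shell
# Hypothetical helper: download each dataset given as an argument.
# Assumes the basic form "dq2_get <dataset>".
# DQ2_GET can be overridden (e.g. DQ2_GET="echo dq2_get") to preview
# the calls without touching the grid.
fetch_datasets() {
    for ds in "$@"; do
        ${DQ2_GET:-dq2_get} "$ds" || echo "failed: $ds" >&2
    done
}
```

For a preview, `DQ2_GET="echo dq2_get" fetch_datasets example.dataset.A example.dataset.B` prints one dq2_get line per dataset. Local download works for small samples, but as the slides note it does not scale.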
29
Let's try out the dq2 end user tools. Login on lxplus, then:
source /afs/cern.ch/project/gd/LCG-share/sl3/etc/profile.d/grid_env.sh
alias dq2=/afs/cern.ch/atlas/offline/external/GRID/ddm/pro02/dq2
source /afs/usatlas.bnl.gov/Grid/Don-Quijote/dq2_user_client/setup.sh.CERN
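The setup scripts live on AFS, so sourcing them fails on a machine where AFS is not mounted. A minimal guard, assuming a POSIX-style shell, could look like the following; the function name `source_if_readable` is hypothetical and not part of the DQ2 client.

```shell
# Hypothetical helper: source a setup script only if it is readable.
# Returns 0 if the script was sourced, 1 if it was skipped.
source_if_readable() {
    script="$1"
    if [ -r "$script" ]; then
        # "." is the portable spelling of "source"
        . "$script"
        echo "sourced: $script"
    else
        echo "skipped (not readable): $script" >&2
        return 1
    fi
}
```

On lxplus one would call it once per setup script, passing the script's full path as the argument; anywhere else it reports which script could not be found instead of aborting the session.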
30
Summary. Several tools are available to perform Distributed Analysis, integrated with DQ2. Data is being collected and also distributed. There is still a lot of work in front of us. We are learning how to handle user jobs: job priorities on LCG, multitasking pilots in PANDA.
31
Next steps. Increase the number of sites: we have to push to get the data to all Tier-1s, since they are the backbone of the ATLAS data distribution. Interoperability: this will for sure be an issue for the next software week.