
1 28 Nov 2007 Alessandro Di Girolamo 1 A “Hands On” overview of the ATLAS Distributed Data Management
Disclaimer & Special Thanks:
– Things are changing fast (of course always improving! :-) ), so some of this information may already be out of date
– Many thanks to all the previous presentations and tutorials on the same topic, from which many slides were taken

2 28 Nov 2007 Alessandro Di Girolamo 2 Outline
– The ATLAS Computing Model
– ATLAS data movement policy
– The DDM system
– The ATLAS dashboard
– dq2 end-user tools
– Exercises
– Conclusions

3 28 Nov 2007 Alessandro Di Girolamo 3 ATLAS Trigger & DAQ

4 28 Nov 2007 Alessandro Di Girolamo 4 Data Processing stages The experiment and the simulation

5 28 Nov 2007 Alessandro Di Girolamo 5 Data replication and distribution
Redundancy is foreseen to optimize data access for reconstruction (Tier1) and analysis (Tier2/3). Data replication is centralized and automatic (summarized in the sketch below).
RAW
– Original data at Tier0
– 1 complete replica distributed over the Tier1s
ESD
– ESD from first reconstruction at Tier0, 2 full copies over the Tier1s
– Reprocessed ESD treated like the “first” ESD (2 copies over the Tier1s)
AOD
– Full replica on each Tier1
– Replicated to Tier2s: at least one full AOD set on the T2s of each cloud; each Tier2 can specify the datasets most relevant for its users
TAG
– TAG databases replicated on each Tier1 (Oracle)
– TAG databases partially replicated on Tier2s (ROOT files): each Tier2 holds all TAG ROOT files for the AOD hosted at the site
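
The placement rules above can be condensed into a small lookup table. A minimal sketch in Python (the data structure and names are illustrative, not part of DDM):

    # Illustrative encoding of the replication policy on this slide.
    # The counts come from the slide; the dictionary layout is hypothetical.
    REPLICATION_POLICY = {
        "RAW": {"Tier0":  "original data",
                "Tier1s": "1 complete replica distributed over all Tier1s"},
        "ESD": {"Tier0":  "first-pass reconstruction output",
                "Tier1s": "2 full copies (also for reprocessed ESD)"},
        "AOD": {"Tier1s": "full replica on each Tier1",
                "Tier2s": "at least one full AOD set per cloud"},
        "TAG": {"Tier1s": "full database replica (Oracle)",
                "Tier2s": "partial replica (ROOT files for locally hosted AOD)"},
    }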

6 28 Nov 2007 Alessandro Di Girolamo 6 ATLAS Tiers Structure
Event Filter Farm at CERN
– Located near the experiment; assembles data into a stream to the Tier 0 Center
Tier 0 Center at CERN
– Prompt first-pass processing on express/calibration/physics streams with old calibrations
– Raw data -> mass storage at CERN and to Tier 1 centers
– 24-48 h later, process full physics data streams with new calibration constants
– Ship ESD, AOD to Tier 1 centers -> mass storage at CERN
CERN Analysis Facility
– Analysis, with limited access to ESD and RAW/calibration data on demand
Tier 1 Centers distributed worldwide (10 centers)
– Re-reconstruction of raw data, producing new ESD, AOD
– Reprocess (every 1-2 months) all resident RAW with better calibration and software
– Scheduled group access to full ESD and AOD
Tier 2 Centers distributed worldwide (30+ centers)
– Monte Carlo simulation, producing ESD, AOD; move ESD, AOD -> Tier 1
– On-demand user physics analysis of shared datasets
Tier 3 Centers distributed worldwide
– Physics analysis
– Private and local data, summary datasets

7 28 Nov 2007 Alessandro Di Girolamo 7 The Cloud Model: TiersOfATLAS
ATLAS Tier1s and Tier2s are logically organized in CLOUDS:
– Mostly this reflects geography and the EGEE ROC organization, but not always
– Mostly driven by network topology
A cloud includes one Tier1 and several Tier2s
– Every Tier1 and Tier2 provides both CPU and storage capacity for ATLAS
The Tier1s run central services for the cloud, like the DDM Site Services (described later) for dataset subscription handling. The topology is described in TiersOfATLAS (inspected in the sketch below):
http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/ddm/releases/TiersOfATLASCache.py
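
Since TiersOfATLASCache.py is itself a Python file, one quick way to explore the topology is to fetch it and execute it in a scratch namespace. A minimal sketch, assuming only that the cache is plain Python assignments; no specific variable names are assumed (Python 2, matching the era's tools):

    #!/usr/bin/python
    # Fetch the TiersOfATLAS cache and list its top-level dictionaries.
    import urllib2

    URL = ("http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/"
           "ddm/releases/TiersOfATLASCache.py")

    namespace = {}
    exec urllib2.urlopen(URL).read() in namespace  # tutorial shortcut: trusts the source

    for name in sorted(namespace):
        if not name.startswith('_') and isinstance(namespace[name], dict):
            print "%-25s dict with %4d entries" % (name, len(namespace[name]))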

8 28 Nov 2007 Alessandro Di Girolamo 8 An example: the Italian Cloud (IT)
[Diagram: the CNAF Tier1 (CE, SE, local LFC, FTS, VOBOX) serving the Tier2s MILANO, ROMA1, NAPOLI and LNF, each with its own CE and SE]

9 28 Nov 2007 Alessandro Di Girolamo 9 Data Movement Policy
An efficient system requires:
– An efficient placement of data
– Multiple replicas of the data
– Jobs going to the data: tools (e.g. Ganga) steer jobs to the closest CEs
Each Tier1 should serve “its” Tier2s' data needs
– Full AOD set and group Derived Physics Datasets (DPD) in each cloud
This requires that the tools place data quickly and efficiently: if that fails, data movement by individuals makes the situation worse for everyone

10 28 Nov 2007 Alessandro Di Girolamo 10 User Data Movement Policy
Users need to access the files they produce
– They need the ATLAS data tools on Tier2/3s
Risk: some users may attempt to move large data volumes to a T2 or T3:
– SE overload
– Network congestion
Order of 10 GB/day/user: who cares? 50 GB/day/user: rate throttled. 1-10 TB/day/user: user throttled (see the toy rule below)
– Planned large movements are possible if negotiated
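
The three thresholds amount to a simple escalation rule. A toy illustration (the anchor points come from the slide; the exact boundaries and any enforcement mechanism are hypothetical):

    def user_transfer_action(gb_per_day):
        """Map a user's daily transfer volume to the policy response above."""
        if gb_per_day < 50:
            return "tolerated"        # "order of 10 GB/day/user: who cares?"
        elif gb_per_day < 1000:
            return "rate throttled"   # around 50 GB/day/user
        else:
            return "user throttled"   # TB/day scale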

11 28 Nov 2007 Alessandro Di Girolamo 11 Distributed Data Management
Managing ATLAS data on a petabyte scale
Responsibilities and objectives of ATLAS DDM (dq2):
– Bookkeeping of all ATLAS file-based experiment and user data (datasets)
– Managing movement of data across sites: automatic data transfer mechanism using distributed site services
– Enforcing access control and quotas
Principal features:
– Scalable global data discovery and access via a catalog hierarchy: datasets as collections of logical files
– Data movement based on subscriptions to datasets

12 28 Nov 2007 Alessandro Di Girolamo 12 DDM components
Simplified DDM schema:
– Part of DQ2: dataset catalogs, “Queued Transfers”, subscription agents
– Not part of DQ2: Local File Catalog (LFC), File Transfer Service (FTS)

13 28 Nov 2007 Alessandro Di Girolamo 13 DDM catalogs
One central and multiple regional catalog instances (see the toy model below):
– CENTRAL CATALOGS: the Dataset Content Catalog (dataset name -> content, i.e. the list of files) and the Dataset Location Catalog (dataset name -> hosting sites, e.g. CNAF, LYON)
– LOCAL CATALOGS: one per region, e.g. the CNAF LFC, holding the per-file entries (logical file name -> physical replicas)
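
The division of labor can be modeled with two lookups: the central catalogs answer “what is in dataset X and which sites host it”, while each region's LFC resolves individual logical file names to physical replicas. A toy model (all names and entries hypothetical):

    # Central catalogs (single instance)
    content_catalog  = {"some.dataset": ["file1.pool.root", "file2.pool.root"]}
    location_catalog = {"some.dataset": ["CNAF", "LYON"]}

    # Local catalogs (e.g. one LFC per Tier1): LFN -> physical replica
    local_lfc = {"CNAF": {"file1.pool.root": "srm://<cnaf-se>/path/file1",
                          "file2.pool.root": "srm://<cnaf-se>/path/file2"}}

    def replicas(dataset, site):
        """Resolve a dataset's files to physical replicas at one site."""
        if site not in location_catalog.get(dataset, []):
            return {}
        site_catalog = local_lfc.get(site, {})
        return dict((lfn, site_catalog[lfn])
                    for lfn in content_catalog[dataset] if lfn in site_catalog)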

14 28 Nov 2007 Alessandro Di Girolamo 14 Dataset
An aggregation of data (one or more physical files) which are processed together and serve collectively as input or output of a computation or data acquisition process
– Flexible definition (e.g. grouping related data, data movement purposes, …)
– Expected O(10^3) new datasets per day
Dataset versions
– Very useful to track changes in data (keeping the same dataset name)
– Files can be removed or added between versions
Dataset states (sketched in code below):
– Open: the latest version is open, so new files may be added and existing files removed
– Closed: the latest version is closed; no changes can be made, but a new version may be created
– Frozen: the latest version is closed, and no new versions may be added
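
The three states form a one-way ladder: an open latest version can be closed, a closed dataset can grow a new (open) version, and freezing is terminal. A minimal sketch of those rules (class and method names are illustrative, not the dq2 API):

    class Dataset(object):
        """Toy model of the Open/Closed/Frozen rules above."""
        def __init__(self, name):
            self.name, self.state, self.versions = name, "open", [set()]

        def add_file(self, lfn):
            if self.state != "open":
                raise RuntimeError("latest version is closed: no file changes")
            self.versions[-1].add(lfn)

        def close(self):
            self.state = "closed"      # latest version can no longer change

        def new_version(self):
            if self.state == "frozen":
                raise RuntimeError("frozen dataset: no new versions")
            self.versions.append(set(self.versions[-1]))  # start from previous content
            self.state = "open"

        def freeze(self):
            self.state = "frozen"      # terminal: no changes, no new versions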

15 28 Nov 2007 Alessandro Di Girolamo 15 Dataset Subscription
A site can subscribe to data:
– Dataset A is present at site Y but not at site X
– X subscribes to dataset A
– A is transferred to site X and registered properly in the catalogs
[Diagram: site Y contains dataset A (evgen 1, evgen 2); site X does not; the subscription “Dataset 'A' | Site 'X'” triggers the transfer]

16 28 Nov 2007 Alessandro Di Girolamo 16 Subscription of Open/Closed/Frozen datasets
If you subscribe a Closed/Frozen version of dataset A to site X:
– Files will be transferred to the site
– The subscription will be honored once and then disregarded
– The destination dataset version will be COMPLETE at the site
If you subscribe an Open dataset A to site X:
– Files will be transferred to the site
– The subscription will remain ACTIVE: if new files are added to the dataset and stored at Y, they will be streamed to X
– The destination dataset version will be INCOMPLETE at the site even if all current files are there
Subscriptions have an expiration time (the behavior is sketched below)
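
Both cases reduce to: transfer what is missing, then either retire the subscription (closed/frozen dataset) or keep it active and keep streaming new files (open dataset). A sketch of one pass of such an agent (function name and return values are illustrative):

    def process_subscription(source_files, dest_files, dataset_state):
        """One pass of a toy subscription agent implementing the rules above."""
        missing = set(source_files) - set(dest_files)
        if dataset_state in ("closed", "frozen"):
            # Honored once and then disregarded: after `missing` arrives,
            # the destination copy is marked COMPLETE.
            return missing, "retire when complete"
        # Open dataset: new files at the source keep streaming to the
        # destination, which stays INCOMPLETE even with all current files.
        return missing, "keep active"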

17 28 Nov 2007 Alessandro Di Girolamo 17 The Dashboard
The Dashboard project:
– Started inside the ARDA group of the EGEE/LCG project in 2005
– Evolved into a Python framework providing a set of flexible tools covering other Grid application areas
– The framework: data access layer, service configuration (agents), web application, command line tools, APIs
– Easy access to the information: HTTP query interface; output in HTML, XML, CSV for integration with external tools
– The Dashboard currently covers: job monitoring (for all LHC experiments + VLEMED/Biomed), data management, site efficiency/reliability, and others
– The ATLAS Dashboard: http://dashboard.cern.ch/

18 28 Nov 2007 Alessandro Di Girolamo 18 The ATLAS DDM Dashboard
Main focus on ATLAS-specific services (the dq2 system), receiving information from the different agents via HTTP callbacks:
– Transfer state changes
– Dataset complete
– Transfer complete
– Transfer/registration errors
But also on Grid fabric services:
– Data management related services up and running
– Storage space availability
Data is put together in a structured way:
– Oracle database at CERN
– Different tools (agents) responsible for generating statistics and metrics
Two different instances of the DDM Dashboard: Production & Tier0

19 28 Nov 2007 Alessandro Di Girolamo 19 The ATLAS DDM Dashboard
Serves different sets of use cases, coming from different types of users.
Site/system operators:
– How is the overall system doing?
– How is site X doing?
– What is the most common error, and what is triggering it?
End users / production coordinators:
How much data? A lot!
– Millions of file transfers, each reporting different steps: an average week means 2 million hits
– Especially critical when systems misbehave
– A lot of work on partitioning the data and optimizing the database and the web server setup (Apache)

20 28 Nov 2007 Alessandro Di Girolamo 20 ATLAS DDM Dashboard example 1

21 28 Nov 2007 Alessandro Di Girolamo 21 ATLAS DDM Dashboard example 2

22 28 Nov 2007 Alessandro Di Girolamo 22 ATLAS DDM Dashboard example 3

23 28 Nov 2007 Alessandro Di Girolamo 23 ATLAS DDM Dashboard example 4

24 28 Nov 2007 Alessandro Di Girolamo 24 ATLAS DDM Dashboard example 5

25 28 Nov 2007 Alessandro Di Girolamo 25 ATLAS DDM Dashboard example 6

26 28 Nov 2007 Alessandro Di Girolamo 26 ATLAS DDM Dashboard example 7

27 28 Nov 2007 Alessandro Di Girolamo 27 ATLAS DDM Dashboard example 8

28 28 Nov 2007 Alessandro Di Girolamo 28 ATLAS DDM Dashboard example 9

29 28 Nov 2007 Alessandro Di Girolamo 29 ATLAS DDM Dashboard example 10

30 28 Nov 2007 Alessandro Di Girolamo 30 ATLAS DDM Dashboard example 11

31 28 Nov 2007 Alessandro Di Girolamo 31 ATLAS DDM Dashboard example 12

32 28 Nov 2007 Alessandro Di Girolamo 32 ATLAS DDM Dashboard example 13

33 28 Nov 2007 Alessandro Di Girolamo 33 ATLAS DDM Dashboard example 14

34 28 Nov 2007 Alessandro Di Girolamo 34 ATLAS DDM Dashboard example 15

35 28 Nov 2007 Alessandro Di Girolamo 35 ATLAS DDM Dashboard API example
A lot of examples can be found in the Dashboard User Guide. Here is one used often for debugging purposes:
/afs/cern.ch/user/d/digirola/public/tutorial_infn/dq2_check/SURL_errors.py

    #!/usr/bin/python
    from datetime import datetime, timedelta
    # ... (construction of dataQuery via the dashboard API is elided in the original) ...
    transferErrors = dataQuery.listFileEvents(['CNAFTAPE'],
                                              states=['ATTEMPT_DONE'],
                                              startDate=datetime.now() - timedelta(hours=24),
                                              endDate=datetime.now(),
                                              limit=100)
    # ...

Source the dashboard environment:
    source /afs/cern.ch/sw/arda/dashboard/atlas/setenv.sh
and simply run:
    python SURL_errors.py

36 28 Nov 2007 Alessandro Di Girolamo 36 Users view on DDM
Central issue: how do I analyze “my” data?
– N.B. users should not move data!!
End-user tools (by Tadashi Maeno) provide a subset of DDM functionality:
– Quick access to dq2 datasets for end users, but not for production activities
Data access:
– Users are strongly encouraged to register their output data(sets) in dq2
– Users can upload data(sets) to an SE
Be careful: scripts to perform consistency checks and cleanup are not always present. A lot of work has been done, but there are still many important things to do

37 28 Nov 2007 Alessandro Di Girolamo 37 Exercises
– Searching for your dataset
– Checking the contents of a dataset
– Navigating the dataset browser
– Accessing local data
– Copying over remote data
– Putting your data on the Grid

38 28 Nov 2007 Alessandro Di Girolamo 38 On the twiki https://twiki.cnaf.infn.it/cgi-bin/twiki/view/Sandbox/DDMEndUserTutorialCNAF

39 28 Nov 2007 Alessandro Di Girolamo 39 Set up your environment
Log on to a UI:
– ssh ui01-lcg.cr.cnaf.infn.it and/or ssh lxplus.cern.ch
Set up the Grid environment:
– source /afs/cern.ch/project/gd/LCG-share/sl4/etc/profile.d/grid_env.sh
Get a valid proxy:
– voms-proxy-init -voms=atlas
Set up the dq2 environment:
– source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.sh.CERN

40 28 Nov 2007 Alessandro Di Girolamo 40 Check your LFC catalog
Which LFC are you using? Remember you can check in ToA!
– LFC @ CNAF for local users: lfc.cr.cnaf.infn.it
    echo $LFC_HOST
    export LFC_HOST=lfc.cr.cnaf.infn.it
– LFC @ CERN for lxplus users: prod-lfc-atlas-local.cern.ch
    echo $LFC_HOST
    export LFC_HOST=prod-lfc-atlas-local.cern.ch
Test the LFC catalog:
– lfc-ls -l /grid/atlas/

41 28 Nov 2007 Alessandro Di Girolamo 41 Searching datasets
dq2_ls
– -f: list files in the LRC
– -g: list files in the global file catalog
– -r: list replica sites

42 28 Nov 2007 Alessandro Di Girolamo 42 Browsing dataset
The dataset browser: http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?overview=dslist
– Choose a site
– Select a project
– …
Contacts at CERN and at CNAF: use the contacts!

43 28 Nov 2007 Alessandro Di Girolamo 43 Browsing dataset

44 28 Nov 2007 Alessandro Di Girolamo 44 Browsing dataset

45 28 Nov 2007 Alessandro Di Girolamo 45 Browsing dataset

46 28 Nov 2007 Alessandro Di Girolamo 46 Request Subscription
Subscriptions are centralized
– Jobs go to the data, not data to the jobs!!
– You can create your own subscriptions, which will be analyzed and eventually scheduled for approval
http://gridui02.usatlas.bnl.gov:25880/server/pandamon/query?mode=reqsubs0

47 28 Nov 2007 Alessandro Di Girolamo 47 Request Subscription

48 28 Nov 2007 Alessandro Di Girolamo 48 Getting files from datasets
If you really cannot wait!! To get all files from a dataset, dq2_get copies the locally stored files to the local working directory:
– mkdir local
– cd local
– dq2_ls -f …
– dq2_get …

49 28 Nov 2007 Alessandro Di Girolamo 49 Getting only some files
dq2_get -h
– arguments: <dataset name> lfn1 lfn2 …

50 28 Nov 2007 Alessandro Di Girolamo 50 Create your own dataset
– Copy a file into your local Grid storage
– Register these files into dq2
– Take care with the names you've given to your files!

51 28 Nov 2007 Alessandro Di Girolamo 51 Some internal dq2 commands
source /afs/cern.ch/atlas/offline/external/GRID/ddm/pro03/dq2.sh
– dq2-list-dataset: also accepts wild cards
– dq2-register-dataset (lfn1 guid1 [[size1] [checksum1]] lfn2 guid2 [[size2] [checksum2]] ...): allows you to create a dataset and register files into it
– dq2-register-evgens (lfn1 guid1 …): same as dq2-register-dataset, but the dataset must already exist
– dq2-register-location
– dq2-close-dataset
– dq2-get-metadata: provides various information about a dataset (how many versions, creation date, …)
– dq2-list-files
– dq2-register-subscription [options]: options allow you to specify the SOURCE input site, SHARE, and VERSION of the subscribed dataset

52 28 Nov 2007 Alessandro Di Girolamo 52 DDM Work in Progress
New dq2 release 0.5 (mid December):
– DDM Location Catalog Browser (link)
– DDM Site Index catalog (link): allows users to split jobs following the location of the files
GUID (row index), bit map with the location of the replicas on the sites (see the toy layout below) …
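
A bit-map site index of that kind is compact: one integer per GUID, one bit per site. A toy version of the layout (the site list and the API are illustrative, not the dq2 0.5 schema):

    SITES = ["CERN", "CNAF", "LYON", "FZK"]           # hypothetical site ordering

    class SiteIndex(object):
        """GUID -> integer bit map, one bit per site in SITES."""
        def __init__(self):
            self.index = {}                            # guid -> int

        def add_replica(self, guid, site):
            self.index[guid] = self.index.get(guid, 0) | (1 << SITES.index(site))

        def sites_with(self, guid):
            bits = self.index.get(guid, 0)
            return [s for i, s in enumerate(SITES) if bits & (1 << i)]

A user tool could then split a job's input files by grouping GUIDs that share the same bit map, i.e. the same set of hosting sites.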

53 28 Nov 2007 Alessandro Di Girolamo 53 DDM Location Catalog Browser http://atlddmtrack.cern.ch/mysite/

54 28 Nov 2007 Alessandro Di Girolamo 54 DDM Location Catalog Browser http://atlddmtrack.cern.ch/mysite/

55 28 Nov 2007 Alessandro Di Girolamo 55 DDM Site Index http://atlddmtrack.cern.ch/mysite/details/

56 28 Nov 2007 Alessandro Di Girolamo 56 DDM Dashboard Work in Progress
– More statistics on files/datasets and timing information
– View of single dataset status on all sites
– Dataset completeness information
– Dataset categorization

57 28 Nov 2007 Alessandro Di Girolamo 57 Conclusions
Things are improving quickly:
– More than one hand is needed from everyone
– Put your hands IN the system: users' suggestions and ideas are always useful
[Plots: T0-T1 export throughput; left: end of July, below: mid October]
If you have problems, ask Claudia Ciocca, the IT ATLAS Tier1 contact

58 28 Nov 2007 Alessandro Di Girolamo 58 Backup …

59 28 Nov 2007 Alessandro Di Girolamo 59 Event Data Model: Data Flow
SFO output rate ~ 200 Hz, ~ 300 MB/s
RAW events are divided in streams and written to files of 2 GB (1 file / stream / SFO / lumi_block)
– Streams now include: express stream, calibration stream, debugging stream, physics streams (Muons, Electromagnetic, B-physics, MinBias, etc.)
Files are organized in “datasets”
Studies of (contrasted in the sketch below):
– Skimming (selecting events)
– Thinning (selecting containers, or objects from a container)
– Slimming (selecting properties of an object)
The selection and direct access to individual events is via the TAG db
– A TAG is a keyed list of variables and/or events
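
The three reduction operations cut along different axes of the event data: skimming drops events, thinning drops containers/objects within events, slimming drops properties within objects. A schematic contrast, with events modeled as plain dicts (pure toy structures, not the ATLAS EDM):

    # event = {container_name: [object, ...]}; object = {property: value}

    def skim(events, keep_event):
        """Skimming: select whole events."""
        return [e for e in events if keep_event(e)]

    def thin(events, keep_object):
        """Thinning: select objects inside each event's containers."""
        return [dict((c, [o for o in objs if keep_object(c, o)])
                     for c, objs in e.items()) for e in events]

    def slim(events, keep_property):
        """Slimming: select properties of each object."""
        return [dict((c, [dict((k, v) for k, v in o.items() if keep_property(k))
                          for o in objs])
                     for c, objs in e.items()) for e in events]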

60 28 Nov 2007 Alessandro Di Girolamo 60 ATLAS computing requirements (n.b.: numbers may require adjustment)

                             CPU (MSi2k)      Disk (PB)      Tape (PB)
                             2008    2010     2008    2010   2008   2010
    Tier-0                    3.7     6.1     0.15     0.5    2.4   11.4
    CERN Analysis Facility    2.1     4.6     1.0      2.8    0.4    1.0
    Sum of Tier-1s           18.1    50      10       40      7.7   28.7
    Sum of Tier-2s           17.5    51.5     7.7     22.1     -      -
    Total                    41.4   112.2    18.9     65.4   10.5   41.1

61 28 Nov 2007 Alessandro Di Girolamo 61 ATLAS naming convention
Dataset and file names follow a certain convention; details can be found on the File Naming Convention Twiki. Here is an example:
trig0_calib0_csc11.005200.T1_McAtNlo_Jimmy.recon.ESD.v12000402_tid004515._00002.pool.root
This is an ESD from a reconstruction run with the trigger on, using geometry CSC-01-00-00 (trig0), from an input file with the same geometry and with the calibration hits from the dead material on (calib0), originating from the csc11 event generation using the python script DC3.005200.T1_McAtNlo_Jimmy.py. It was produced by production task 4515.
v12000402 means that release 12.0.4 (and cache 12.0.4.2) was used in the reconstruction.
Information about which script was used for event generation can be found by tracing the provenance of the dataset. (A toy parser for this layout follows below.)
https://twiki.cern.ch/twiki/bin/view/Atlas/FileNamingConvention
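
Names of this form can be pulled apart by splitting on dots. A toy parser for exactly the layout of the example above (the field labels are mine; the authoritative rules are on the Twiki linked above):

    def parse_dataset_filename(fname):
        """Split a file name of the form shown above into labeled fields."""
        parts = fname.split(".")
        fields = dict(zip(["project_tags", "dataset_number", "physics_short",
                           "prod_step", "data_type", "version_tid"], parts[:6]))
        fields["file_number"] = parts[6].lstrip("_")   # e.g. '_00002' -> '00002'
        return fields

    example = ("trig0_calib0_csc11.005200.T1_McAtNlo_Jimmy.recon.ESD."
               "v12000402_tid004515._00002.pool.root")
    print parse_dataset_filename(example)
    # {'project_tags': 'trig0_calib0_csc11', 'dataset_number': '005200', ...}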

