Distributed Physics Analysis: Past, Present, and Future
Kaushik De, University of Texas at Arlington (ATLAS & D0 Collaborations)
ICHEP'06, Moscow, July 29, 2006
Introduction
- Computing needs for HENP experiments keep growing
- Computing models have evolved to meet these needs
- We have seen many important paradigm shifts in the past: farm computing, distributed information systems (the World Wide Web), distributed production, and now distributed analysis (DA)
- Many lessons from the past, from FNAL, SLAC, RHIC, and the LHC
- I will talk about some general ideas, with examples from the ATLAS and D0 experiments; see other talks here for additional LHC-specific details
Distributed Analysis Goals
- Mission statement: remote physics analysis on globally distributed computing systems
- Scale: set by experimental needs
- Let's look at an LHC example, from ATLAS in 2008 (a back-of-envelope sketch of these numbers follows the list):
  - 10,000-20,000 CPUs distributed at ~40 sites
  - 100 TB transferred from CERN per day (100k files per day)
  - 20-40 PB of data stored worldwide from the first year at the LHC
  - Simultaneous access to data for distributed production and DA
- Physicists (users) will need access to both large-scale storage and CPU from thousands of desktops worldwide
- DA systems are being designed to meet these challenges at the LHC, while learning from current and past experiments
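As a rough sanity check of the scale quoted above, here is a back-of-envelope sketch using only the numbers on this slide; everything derived from them is approximate.

```python
# Back-of-envelope sketch of the ATLAS 2008 scale quoted on this slide.
# Inputs are the slide's own numbers; the derived quantities are approximate.

cpus_total = (10_000, 20_000)   # CPUs distributed worldwide (low, high)
n_sites = 40                    # approximate number of sites
tb_per_day = 100                # TB exported from CERN per day
files_per_day = 100_000         # files exported from CERN per day

avg_file_size_gb = tb_per_day * 1000 / files_per_day        # ~1 GB per file
avg_rate_gbps = tb_per_day * 1000 * 8 / (24 * 3600)         # sustained Gb/s
cpus_per_site = tuple(c // n_sites for c in cpus_total)     # ~250-500 per site

print(f"average file size      : ~{avg_file_size_gb:.1f} GB")
print(f"sustained export rate  : ~{avg_rate_gbps:.1f} Gb/s from CERN")
print(f"CPUs per site (average): ~{cpus_per_site[0]}-{cpus_per_site[1]}")
```

The implied ~9 Gb/s sustained export rate is consistent with the 10 Gb/s network backbone mentioned later in the computing-model slide.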
Distributed Analysis Challenges
- Distributed production is now routinely done in HENP
  - For MC production and reprocessing of data; not yet at LHC scale
  - Scale: a few TB of data generated/processed daily in ATLAS
  - Scope: an organized activity, managed by experts
- Lessons learned from production:
  - Robust software systems to automatically recover from grid failures (a retry sketch follows this list)
  - Robust site services; with hundreds of sites, there are daily failures
  - Robust data management: pre-location of data, cataloguing, transfers
- Distributed analysis is in the early stages of testing
  - Moving from the Regional Analysis Center model (e.g. D0) to a fully distributed analysis model: computing on demand
  - Presents new challenges, in addition to those faced in production
  - Chaotic by nature: hundreds of users, random fluctuations in demand
  - Robustness becomes even more critical: software, sites, services
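As one illustration of the "automatically recover from grid failures" lesson, here is a minimal sketch of retry-with-backoff resubmission. The submit_job function, site names, and error class are hypothetical placeholders, not any experiment's real API.

```python
import random
import time

# Minimal sketch of automatic recovery from transient grid failures.
# submit_job() and the site list are stand-ins, not real middleware calls.

SITES = ["site_a", "site_b", "site_c"]

class TransientGridError(Exception):
    """Stand-in for a recoverable failure (timeout, busy gatekeeper, ...)."""

def submit_job(job, site):
    # Placeholder: a real system would call its grid middleware here.
    if random.random() < 0.3:
        raise TransientGridError(f"submission to {site} failed")
    return f"{site}:{job['name']}"

def submit_with_retries(job, max_attempts=5, base_delay=1.0):
    """Retry on transient errors, backing off and rotating through sites."""
    for attempt in range(1, max_attempts + 1):
        site = SITES[(attempt - 1) % len(SITES)]
        try:
            return submit_job(job, site)
        except TransientGridError as err:
            if attempt == max_attempts:
                raise RuntimeError(f"{job['name']} failed after {attempt} attempts")
            wait = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt}: {err}; retrying in {wait:.0f}s")
            time.sleep(wait)

if __name__ == "__main__":
    print("job ran as", submit_with_retries({"name": "analysis_0001"}))
```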
Role of Grid Middleware
- Basic grid middleware for distributed analysis:
  - Most HEP experiments use VDT (which includes Globus)
  - Security and accounting: GSI authentication, Virtual Organizations
  - Tools for secure file transfer and job submission to remote systems
  - Data location catalogues (RLS, LFC)
- Higher-level middleware through international grid projects:
  - Resource brokers (e.g. LCG, gLite, Condor-G, ...)
  - Tools for reliable file transfer (FTS, ...)
  - User and group account management (VOMS)
- Experiments build application layers on top of the middleware (sketched below):
  - To manage experiment-specific workflow
  - Data (storage) management tools and database applications
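To make the layering concrete, here is a small sketch of an experiment application layer sitting on top of generic middleware services. All class and method names are illustrative assumptions; they do not correspond to any real middleware interface.

```python
from abc import ABC, abstractmethod
from typing import List

# Sketch of the layering described above: experiments build an application
# layer on top of generic grid middleware services. Names are illustrative.

class JobSubmitter(ABC):
    """Generic middleware service: send a job to a remote compute resource."""
    @abstractmethod
    def submit(self, executable: str, site: str) -> str:
        ...

class ReplicaCatalog(ABC):
    """Generic middleware service: map logical file names to physical replicas."""
    @abstractmethod
    def locate(self, lfn: str) -> List[str]:
        ...

class ExperimentWorkflow:
    """Experiment-specific application layer built on the middleware services."""
    def __init__(self, submitter: JobSubmitter, catalog: ReplicaCatalog):
        self.submitter = submitter
        self.catalog = catalog

    def run_analysis(self, lfns: List[str], transform: str, site: str) -> str:
        # Resolve input replicas, then hand a fully specified job to the grid.
        replicas = [self.catalog.locate(lfn)[0] for lfn in lfns]
        print(f"resolved {len(replicas)} input replicas for {transform} at {site}")
        return self.submitter.submit(transform, site)
```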
Divide and Conquer
- Experiments optimize/factorize both data and resources
- Data factorization (a numerical sketch follows this list):
  - Successive processing steps lead to compressed physics objects
  - The end user does physics analysis using physics objects only
  - Limited access to detailed data for code development and calibration
  - Periodic centralized reprocessing to improve analysis objects
- Resource factorization:
  - Tiered model of data location and processors
  - Higher tiers hold archival data and perform centralized processing
  - Middle tiers for MC generation and some (re)processing
  - Middle and lower tiers play an important role in distributed analysis
  - Regional centers are often used to aggregate nearby resources
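To illustrate data factorization numerically, the sketch below applies some often-quoted nominal ATLAS per-event sizes to a year of data taking. The sizes and event count are assumptions for illustration, not figures taken from this talk.

```python
# Numerical sketch of data factorization: successive processing steps compress
# the event record into ever-smaller physics objects. The per-event sizes are
# nominal planning numbers assumed here for illustration only.

EVENTS_PER_YEAR = 2e9          # assumed: ~200 Hz over ~10^7 s of data taking

FORMATS = {                    # nominal size per event, in kB (assumed)
    "RAW (detector data)": 1600,
    "ESD (event summary)": 500,
    "AOD (analysis objects)": 100,
    "TAG (event-level metadata)": 1,
}

for name, kb_per_event in FORMATS.items():
    total_pb = EVENTS_PER_YEAR * kb_per_event / 1e12   # kB -> PB
    print(f"{name:28s} {kb_per_event:6d} kB/event -> ~{total_pb:6.3f} PB/year")
```

The end user working only with the compact analysis objects therefore touches a data volume one to three orders of magnitude smaller than the raw data.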
Example of Data Factorization in ATLAS
- Warning: such projections are often underestimated for DA
Example from D0 (from A. Boehnlein)
Resource Factorization Example: the D0 Computing Model (from A. Boehnlein)
- [Diagram: data handling services (SAM, DB servers) connect central storage, central farms, remote farms, central and remote analysis systems, and the ClueD0 desktop cluster; the data flows include raw data, reconstructed data and MC, fix/skim, and user data]
ATLAS Computing Model
- Expected resources (totals are summed in the sketch below):
  - 10 Tier-1s, each with 500-1000 CPUs, ~1 PB disk, ~1 PB tape
  - 30 Tier-2s, each with 100-500 CPUs, 100-500 TB disk
  - Satellite Tier-3 sites: small clusters, user facilities
  - 10 Gb/s network backbone
- Tier 0: repository for raw data, first-pass processing
- Tier 1: repository of a full set of processed data, reprocessing capability, repository for MC data generated at Tier-2s
- Tier 2: MC production, repository of data summaries
- Distributed analysis uses resources at all tiers
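A quick aggregation of the per-site figures above, using only the ranges on this slide; the totals are simple sums for orientation.

```python
# Aggregate the expected per-site ATLAS resources listed on this slide.

def span(lo, hi, unit):
    """Format a (lo, hi) range, collapsing it when the two ends coincide."""
    return f"~{lo:g} {unit}" if lo == hi else f"{lo:g}-{hi:g} {unit}"

tiers = [
    # name, number of sites, CPUs per site (lo, hi), disk per site in TB (lo, hi)
    ("Tier-1", 10, (500, 1000), (1000, 1000)),   # ~1 PB of disk each
    ("Tier-2", 30, (100, 500), (100, 500)),
]

for name, n, (c_lo, c_hi), (d_lo, d_hi) in tiers:
    cpus = span(n * c_lo, n * c_hi, "CPUs")
    disk = span(n * d_lo / 1000, n * d_hi / 1000, "PB disk")
    print(f"{name}: {n} sites, {cpus} total, {disk} total")
```

The summed CPU count (roughly 8,000-25,000) is in line with the 10,000-20,000 CPUs quoted in the distributed analysis goals.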
ATLAS CM Resource Requirements
- Projected resources needed in 2008, assuming 20% MC
Data Management Systems
- DA needs robust distributed data management systems
- Example from D0: SAM
  - 10 years of development and experience
  - Has evolved from a data/metadata catalogue into a grid-enabled workflow system for central production and user analysis (in progress)
- Example from ATLAS: DQ2
  - 3 years of development and experience
  - Has evolved from a data catalogue API into a data management system
  - Central catalogue for data collection information (datasets)
  - Distributed catalogues for dataset content (file-level information)
  - Asynchronous site services for data movement by subscription
  - Client-server architecture with REST-style HTTP calls (a client sketch follows this list)
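As an illustration of the client-server, subscription-based design described for DQ2, here is a sketch of a dataset-catalogue client speaking REST-style HTTP. The base URL, endpoint paths, and payload fields are purely hypothetical; this is not DQ2's real interface.

```python
import json
import urllib.request

# Sketch of a DQ2-style client: a central catalogue answers dataset-level
# queries over REST-style HTTP, and site services replicate data
# asynchronously in response to "subscriptions". Endpoints are hypothetical.

CATALOG = "https://datacatalog.example.org/api"   # hypothetical service

def _get(path):
    with urllib.request.urlopen(f"{CATALOG}/{path}") as resp:
        return json.loads(resp.read())

def _post(path, payload):
    req = urllib.request.Request(
        f"{CATALOG}/{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def list_files(dataset):
    """Ask the content catalogue for the file-level listing of a dataset."""
    return _get(f"datasets/{dataset}/files")

def subscribe(dataset, site):
    """Request asynchronous replication of a dataset to a site."""
    return _post("subscriptions", {"dataset": dataset, "site": site})

# Example usage (would only work against a real service):
#   files = list_files("user.analysis.aod.v1")
#   subscribe("user.analysis.aod.v1", "SITE_X")
```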
The Panda Example
- Production and Distributed Analysis system in ATLAS
  - Similar to batch systems, but for the grid (a central job queue; sketched below)
- A marriage of three ideas:
  - A common system for distributed production and analysis
    - Distributed production jobs are submitted through a web interface
    - Distributed analysis jobs are submitted through a command-line interface
    - Jobs are processed through the same workflow system (with a common API)
  - The production operations group maintains Panda as a reliable service for users, working closely with site administrators
  - Local analysis jobs and distributed analysis jobs use the same interface
    - Use case: a physicist develops and tests code on local data, then submits to the grid for dataset processing (thousands of files) using the same interface
    - The ATLAS software framework Athena becomes 'pathena' in Panda
- Highly optimized for, and coupled to, the ATLAS DDM system DQ2
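A minimal sketch of the central-job-queue idea: production and analysis jobs enter one queue and move through a common set of states. The states, fields, and class below are illustrative only, not Panda's actual schema or API.

```python
import itertools
from collections import deque

# Sketch of a Panda-like central job queue: jobs from both the production web
# interface and the analysis command line share one queue and one workflow.

_ids = itertools.count(1)

class JobQueue:
    def __init__(self):
        self.pending = deque()
        self.jobs = {}

    def submit(self, transform, dataset, kind):
        """kind is 'production' or 'analysis'; both land in the same queue."""
        job = {"id": next(_ids), "transform": transform, "dataset": dataset,
               "kind": kind, "state": "defined"}
        self.jobs[job["id"]] = job
        self.pending.append(job["id"])
        return job["id"]

    def fetch(self):
        """Hand the next waiting job to a worker and mark it running."""
        if not self.pending:
            return None
        job = self.jobs[self.pending.popleft()]
        job["state"] = "running"
        return job

    def finish(self, job_id, ok=True):
        self.jobs[job_id]["state"] = "finished" if ok else "failed"

queue = JobQueue()
queue.submit("reco_trf", "mc.prod.sample.v1", "production")
queue.submit("pathena_userjob", "user.analysis.aod.v1", "analysis")
job = queue.fetch()
print(job["id"], job["kind"], job["state"])   # both kinds share one workflow
```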
Some ATLAS DA User Examples
- Use case 1: a user wants to run analysis on 1000 AOD files (1M events); a job-splitting sketch follows this list
  - The user copies a few data files using DQ2
  - The user develops and tests analysis code (Athena) on these local files
  - The user runs pathena over the 1000 files on the grid to create ntuples
  - The user retrieves the ntuples for final analysis and to make plots
- Use case 2: a user needs to process 20,000 ESD files (1M events), or wants to generate a large signal MC sample
  - The user requests centralized production through the web interface
- Use case 3: a user needs a small MC sample or to process a few files on the grid
  - The user runs GUI or command-line tools (Ganga, AtCom, LJSF, pathena, ...)
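Behind use case 1 is the kind of job splitting sketched below: the 1000 input files are grouped into sub-jobs and the per-job outputs are merged afterwards. The file-name pattern and chunk size are made-up illustrations.

```python
# Sketch of the job splitting behind use case 1: 1000 AOD files are grouped
# into sub-jobs, each processing a manageable chunk of the input dataset.

def split_into_jobs(files, files_per_job=50):
    """Group the input files into sub-jobs of files_per_job each."""
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

input_files = [f"AOD.dataset._{i:05d}.pool.root" for i in range(1, 1001)]
jobs = split_into_jobs(input_files)

print(f"{len(input_files)} input files -> {len(jobs)} grid sub-jobs")
for n, chunk in enumerate(jobs[:3]):
    print(f"  job {n}: {len(chunk)} files, first = {chunk[0]}")

# Each sub-job would run the same Athena analysis code the user already
# tested locally; the per-job ntuples are then retrieved and merged for plots.
```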
Panda (pathena) DA Status
Panda: User Accounting Example
Conclusion
- Distributed production works well, but still needs to scale up
- Distributed analysis is the new challenge, for both current and future experiments in HENP
  - The scale of resources and users is unprecedented at the LHC
  - Many systems are being tested; I showed only one example
  - Robustness of services and data management is critically important
- Looking to the future:
  - Self-organizing systems
  - Agent-based systems