1
ATLAS Distributed Computing
Stephen Burke, RAL
RAL PPD Computing Christmas Lectures, December 17th 2008
2
Outline
Introduction
Computing model
 – File types, data flows etc.
Production system
 – Monitoring
 – Performance this year
Physics analysis
Outlook for 2009
Some slides "borrowed" from Kors Bos
 – Mistakes are mine
3
Introduction
Not much time to cover the whole of ATLAS computing!
Focus on Distributed Computing (~ the Grid)
 – Ignore the detector, trigger etc.
 – Ignore offline software (athena, sim, reco, …)
Just the big picture
 – Not RAL- or UK-specific
 – Not going to explain the Grid
 – Still very complex
 – Some parts still subject to change
Successes and problems
4
Tiers of ATLAS
The tier structure is common to the LHC experiments, but some of the usage is ATLAS-specific.
Tier-0 @ CERN: does the initial processing of raw data
Tier-1, e.g. RAL: reprocessing, simulation, group analysis (no individual users!)
 – A typical Tier-1 is ~10% of the total
Tier-2, e.g. Southgrid: simulation, ATLAS-wide user analysis
Tier-3, e.g. RAL PPD: local user analysis
Tiers are logical concepts: physical sites may merge functions
 – The RAL Tier-1 has no Tier-2 component, but that's unusual
A Tier-1 plus its associated Tier-2s form a "cloud"
 – the logical unit for task and data assignment (see the sketch below)
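As a purely illustrative sketch of the cloud idea (not any real ATLAS tool), the following toy Python snippet treats a cloud as just a Tier-1 plus its Tier-2s; apart from RAL and Southgrid, the site names are made-up placeholders.

```python
# Toy model of a "cloud": a Tier-1 plus its associated Tier-2s acts as one
# logical unit for task and data assignment. Site names other than RAL and
# Southgrid are placeholders, not a statement of the real UK cloud layout.
UK_CLOUD = {
    "tier1": "RAL",
    "tier2s": ["Southgrid", "ExampleTier2A", "ExampleTier2B"],
}

def candidate_sites(cloud):
    """A task assigned to a cloud may run at its Tier-1 or any of its Tier-2s."""
    return [cloud["tier1"]] + cloud["tier2s"]

if __name__ == "__main__":
    print(candidate_sites(UK_CLOUD))
```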
5
Data types
HITS – simulated data from GEANT
 – ~4 MB/event
RDO (Raw Data Object) – raw data from the detector or simulation
 – ~2 MB/event
ESD (Event Summary Data) – output from reconstruction
 – ~1 MB/event
AOD (Analysis Object Data) – reduced format used for most analysis (= DST)
 – ~200 kB/event
DPD (Derived Physics Data) – ROOT ntuple format for specific purposes (several types)
 – ~10 kB/event
For guidance, expect ~10 million events/day in normal data-taking, so e.g. ~10 TB/day for ESD; the arithmetic is worked out below.
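To make that arithmetic explicit, here is a small back-of-envelope sketch using only the per-event sizes above and the ~10 million events/day guideline; the numbers are illustrative, not an official data-volume estimate.

```python
# Back-of-envelope daily data volumes from the per-event sizes quoted above and
# the guideline rate of ~10 million events/day. Purely illustrative arithmetic.
EVENT_SIZE_MB = {"HITS": 4.0, "RDO": 2.0, "ESD": 1.0, "AOD": 0.2, "DPD": 0.01}
EVENTS_PER_DAY = 10_000_000

for fmt, size_mb in EVENT_SIZE_MB.items():
    tb_per_day = size_mb * EVENTS_PER_DAY / 1_000_000   # MB -> TB
    print(f"{fmt}: ~{tb_per_day:g} TB/day")
# ESD works out at ~10 TB/day, AOD at ~2 TB/day, DPD at ~0.1 TB/day.
```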
6
Dataflow for ATLAS data
[Diagram: RDO, ESD and AOD flow from the Tier-0 (t0atlas) to Tier-1 tape and disk (ATLASDATATAPE, ATLASDATADISK); reprocessing at the Tier-1 produces ESD/AOD, which are exchanged with the other Tier-1s; AODs are copied to Tier-2 ATLASDATADISK for group analysis (ATLASGROUP), and DPDs flow to Tier-2/Tier-3 areas (ATLASUSERDISK, ATLASLOCALGROUPDISK) for end-user analysis.]
7
Data flow for simulation production
[Diagram: simulation (HITS) runs at the Tier-2s and the output is gathered on Tier-1 ATLASPRODDISK; pile-up, mixing and reconstruction at the Tier-1 produce RDO, ESD and AOD, which are stored on ATLASMCDISK and tape and replicated to the Tier-0 and the other Tier-1s.]
8
Production system
ATLAS has recently moved to a pilot-job system similar to LHCb's (PanDA – Production ANd Distributed Analysis)
 – PanDA originated in the US, but has recently moved to CERN
 – Tasks are split into jobs
 – Pilot jobs are sent to each site; when they start they pull real jobs from a central repository (see the sketch below)
Data management by DQ2 (DQ = Don Quixote!)
 – Files -> datasets -> containers
 – Data is moved to sites according to the computing model, then jobs are sent to where the data sits
 – Job output is stored on the local Storage Element, then moved with DQ2
 – Dataset movement can be requested by anyone, but can only be triggered by authorised people
Metadata is stored in AMI (ATLAS Metadata Interface)
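To illustrate the pull model, here is a minimal, self-contained Python sketch; it is not the real PanDA pilot, and every name in it (the queue, the payload, the site) is a hypothetical stand-in.

```python
# Toy illustration of the pilot-job "pull" model: the site batch system runs a
# generic pilot, and only once the pilot is running does it fetch a real job
# from the central repository. Not the real PanDA code; all names are stand-ins.
import queue

central_queue = queue.Queue()   # stand-in for the central job repository
central_queue.put({"task": "simul", "dataset": "mc08.example", "events": 1000})

def run_payload(job):
    # Stand-in for running the actual transformation on the worker node.
    print(f"running {job['task']} on {job['dataset']} ({job['events']} events)")

def pilot(site):
    """A pilot lands on a worker node and only then pulls work (pull, not push)."""
    try:
        job = central_queue.get_nowait()
    except queue.Empty:
        print(f"{site}: no work available, pilot exits")
        return
    run_payload(job)
    print(f"{site}: output written to the local SE, to be moved on by DQ2")

pilot("RAL")
```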
9
Production dashboard
[Screenshot of the production monitoring dashboard.]
10
DQ2 dashboard
[Screenshot of the DQ2 data-transfer dashboard.]
11
Experience in 2008
Many tests of different aspects of the production system:
 – CCRC (Common Computing Readiness Challenge) in May: all experiments testing at once
 – FDR (Full/Final Dress Rehearsal)
 – Reprocessing tests
 – Functional tests (regular low-priority system tests)
 – Simulation production
General results are good:
 – The system works!
 – Many detailed problems at sites
 – Lots of babysitting
12
CCRC: results
[Plots of transfer throughput during CCRC, annotated with the nominal rate, the peak rate and the error rate.]
13
CCRC: all experiments
[Plot of combined transfer activity for all the LHC experiments during CCRC.]
14
Transfers over one month
[Plot of ATLAS data transfers over one month.]
15
Efficiencies over one month
[Plot of transfer efficiencies over one month.]
16
Simulation production over one month
[Plot of simulation production over one month.]
17
User analysis
Grid-based analysis frameworks and procedures are still in development
 – No real data yet
 – Many people use lxplus@CERN
 – Some Grid pioneers
 – The GANGA tool is popular (shared with LHCb, developed in the UK)
"Traditional" Grid job submission vs. pilot jobs not yet decided
Run anywhere vs. run locally?
 – The Grid concept is that all users can run at all sites, but "Tier-3" resources can be local (how local?)
 – Pilot jobs make it hard for sites to control whose jobs run
User data storage prototype
 – No storage quotas on Grid storage: may need a big stick!
 – GROUPDISK – managed by physics groups
 – LOCALGROUPDISK – for local (= country) users
 – USERDISK – scratch storage; anyone can write, files are cleaned after ~1 month (see the sketch below)
Little experience so far, but tests are now starting
 – It seems that bandwidth to storage may be a bottleneck
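As a purely illustrative sketch of the USERDISK scratch policy described above (files cleaned after roughly a month), here is a toy dry-run cleaner; a real cleaner would work through the DQ2 catalogue rather than a local filesystem, and the mount point shown is made up.

```python
# Toy dry-run cleaner for a USERDISK-style scratch area: anyone may write, but
# files older than ~1 month are candidates for deletion. A real ATLAS cleaner
# would consult the DQ2 catalogue; this just walks a (hypothetical) local path.
import os
import time

MAX_AGE_DAYS = 30

def clean_scratch(root, dry_run=True):
    cutoff = time.time() - MAX_AGE_DAYS * 24 * 3600
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                if dry_run:
                    print(f"would delete {path}")
                else:
                    os.remove(path)

clean_scratch("/atlas/userdisk")   # hypothetical mount point; walks nothing if absent
```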
18
Outlook for 2009
Many ongoing activities:
 – Simulation production
 – Cosmics, once the detector is back together
 – Functional tests
Specific tests:
 – "10 million files": testing Tier-1 to Tier-1 transfers
 – Reprocessing
 – CCRC09?
 – FDR?
Analysis challenges
 – Analysis is the big challenge!
Real data …
19
Are we ready? Yes, but …
The production system works:
 – Tested well above nominal rates
 – Bulk production of simulated data is now standard operation
 – Computing and storage resources are ~adequate, at least for now
Constant barrage of problems; many people on shift and lots of manual intervention
 – At one point recently, 7 Tier-1s were down simultaneously!
 – 24×7 cover now at the Tier-1
 – Some critical people are leaving
Analysis on the Grid is still largely untested
 – Real data will bring a lot of new, inexperienced users
 – Will they be able to cope with the typical failure rate on the Grid?