Published by Douglas Walker. Modified over 8 years ago.
1
ATLAS Computing Model
Ghita Rahal, CC-IN2P3
Tutorial Atlas CC, Lyon, 5-3-2007
2
5/3/2007, Atlas Tutorial
Guidelines
Atlas Computing Model
Atlas Data management
Atlas tests
MC Production
3
ATLAS Tier-1 Data Flow (2008)
[Diagram: data flow between the Tier-0, the Tier-1 (disk buffer, disk storage, mass storage, processor farm), the other Tier-1s and the Tier-2 cloud. Reprocessing(s) take place a month later and beyond.]
Data formats and rates shown on the diagram:
– RAW: 1.6 MB/event, 320 MB/s
– ESD: 1 MB/event, 200 MB/s
– AOD: 0.1 MB/event, 20 MB/s
Other rates labelled on the diagram links: 43 MB/s, 51+20 MB/s, 20 MB/s.
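The per-event sizes and throughputs quoted on this slide are consistent with one common event rate: dividing each throughput by the corresponding event size gives 200 events per second for RAW, ESD and AOD alike. The short Python sketch below only reproduces that arithmetic from the slide's numbers; the names `FORMATS` and `implied_event_rate` are illustrative and do not come from the presentation.

```python
# Event sizes (MB/event) and throughputs (MB/s) as quoted on the slide.
FORMATS = {
    "RAW": (1.6, 320.0),
    "ESD": (1.0, 200.0),
    "AOD": (0.1, 20.0),
}


def implied_event_rate(size_mb, throughput_mb_s):
    """Return the event rate (events/s) implied by a size and a throughput."""
    return throughput_mb_s / size_mb


# Compute the implied rate for every data format.
rates = {name: implied_event_rate(size, rate) for name, (size, rate) in FORMATS.items()}

for name, events_per_s in rates.items():
    print(f"{name}: {events_per_s:.0f} events/s")
```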
4
Data Flow: Monte Carlo Production and User’s Analysis
[Diagram: data flow between the Tier-2 cloud, the Tier-1 (disk buffer, disk storage, mass storage, processors) and the other Tier-1s. Labels on the diagram: User’s Analysis, MC, HITS, RDO, ESD, AOD.]
5
Situation in 2006-2007
Previous slides: how Atlas is expected to run once LHC data start flowing.
Before LHC running, i.e. before Nov 2007:
– Monte-Carlo is processed at the Tier-1s and Tier-2s
– All Data Production and Distribution tests are exercised with MC
6
Guidelines
How to apply the ATLAS Computing Model? Atlas Data management
7
ATLAS Data Management
Atlas uses 3 grids, LCG, OSG and NorduGrid, each with its own services.
This requires an ATLAS layer over the Grid middleware.
Atlas model of computing and data distribution:
– Storage capacity is spread across the T1 sites, with different storage systems and different access technologies.
– Computing power is distributed over all Tiers (1, 2, 3) to produce MC and process data.
The tool that distributes the data must:
– Allow high-performance and reliable data movement
– Include information about data location and replication
– Support multiple grid flavours.
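As an illustration of the second and third requirements (tracking data location and replication across several grid flavours), here is a minimal Python sketch of a replica catalog. It is not the DDM tool or its API: the classes `Replica` and `ReplicaCatalog`, their methods, and the dataset name are all hypothetical, and the site-to-grid assignments in the usage lines are only examples.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three grids named on the slide.
GRIDS = {"LCG", "OSG", "NorduGrid"}


@dataclass(frozen=True)
class Replica:
    """One copy of a dataset: where it is and on which grid (hypothetical model)."""
    site: str
    grid: str


class ReplicaCatalog:
    """Hypothetical in-memory catalog mapping dataset names to their replicas."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, dataset, site, grid):
        # Reject grid flavours the catalog does not know about.
        if grid not in GRIDS:
            raise ValueError(f"unknown grid flavour: {grid}")
        self._replicas[dataset].add(Replica(site, grid))

    def locations(self, dataset):
        # Sorted list of sites holding a copy of the dataset.
        return sorted(r.site for r in self._replicas.get(dataset, ()))

    def replica_count(self, dataset):
        return len(self._replicas.get(dataset, ()))


# Usage with an illustrative dataset name and example site assignments.
catalog = ReplicaCatalog()
catalog.register("mc.example.AOD", "LYON", "LCG")
catalog.register("mc.example.AOD", "BNL", "OSG")
print(catalog.locations("mc.example.AOD"))
```

Keeping the grid flavour on each replica record, rather than on the dataset, lets a single dataset have copies on several grids at once, which is the situation the slide describes.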
8
ATLAS Distribution tool: DDM
See Stephane’s talk.
9
Guidelines
Exercising the Model and preparing for real Data: Atlas Tests
10
Tests TIER-1 Lyon
CSC (Computer System Commissioning): setup of tests and milestones.
Performance and functional tests of data transfers, T0 to T1 and T1 to T2:
– June-July 2006, September-October 2006
– Going on in 2007 (March 2007…)
Goal: get a stable and efficient system of data distribution.
New in 2007: CDR (Computing Dress Rehearsal) to exercise the full Atlas Data Model.
11
Performance tests T0=>T1 (July 2006)
The goal was almost reached for a few hours.
Problems came from various sides (availability of the sites and of the services, access to the catalogs, …).
12
Atlas Performance tests T1=>T2 (July 2006)
ATLAS: continuous transfers from the T1 to the T2 sites, initiated by the Tier-1.
13
Atlas Performance tests T1=>T2 (July 2006)
Simultaneous transfers to 7 sites, T2 and non-T2.
Bandwidth limitations caused some problems for simultaneous transfers.
14
Performance tests T0=>T1 (October 2006)
Overall weaker throughput, due to simultaneous multi-VO tests.
Some drops are understood (Castor), but most are not.
15
Multi-VO tests
Two-day tests involving multiple VOs.
Data are generated at the Tier-0 according to the transfer rate of each experiment.
Transfer to all sites.
16
Multi-VO tests
Nominal transfer rates were reached after a few improvements…
Alice, Atlas and CMS transfers to the LYON Tier-1.
17
[Figure labels: BAD, OK]
18
Problems: identified or not
Many improvements during 2006, with an increase in the magnitude of the overall tests and of the fixes in the last quarter of 2006.
But stable running has not yet been achieved, for very different reasons:
– Persistent and transient site failures.
– Frequent failures of FTS transfers: a big problem when multiple VOs run.
– LFC server hangs and failures: solved.
– Hardware upgrade of the VO BOX on some sites: fixed.
– Memory leaks and other overflow conditions in the DDM tool when running for long periods of time: fixed.
– Throughput per stream per site seems to vary heavily (and some streams are very slow): not understood.
19
Problems but also successes
Large file sizes always lead to much more stable running: still not totally understood.
Unstable data generation (Castor configuration…): significant downtimes and problems maintaining a constant stream for Tier-1 export.
Monitoring:
– Missing automated alarms
– Missing clear view of errors, per site
– Missing overall success metrics per dataset
Lack of manpower!
BUT despite this list, there were many successes at the end of 2006, and the Lyon T1 and cloud were very reactive and concerned.
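The three monitoring gaps listed above (alarms, per-site error view, per-dataset success metric) can be made concrete with a small Python sketch. Everything in it is hypothetical: the record layout, the dataset and site names, the error labels and the alarm threshold are invented for illustration and do not describe the monitoring actually used by Atlas.

```python
from collections import Counter, defaultdict

# Hypothetical transfer records: (dataset, destination site, succeeded, error label).
transfers = [
    ("dataset_A", "SITE_1", True, None),
    ("dataset_A", "SITE_2", False, "srm timeout"),
    ("dataset_B", "SITE_1", True, None),
    ("dataset_B", "SITE_2", False, "srm timeout"),
    ("dataset_B", "SITE_2", False, "catalog registration"),
]


def errors_per_site(records):
    """Count failed transfers per site, broken down by error label."""
    errors = defaultdict(Counter)
    for _dataset, site, ok, error in records:
        if not ok:
            errors[site][error] += 1
    return errors


def success_rate_per_dataset(records):
    """Fraction of successful transfers for each dataset."""
    totals, successes = Counter(), Counter()
    for dataset, _site, ok, _error in records:
        totals[dataset] += 1
        successes[dataset] += ok  # True counts as 1, False as 0
    return {d: successes[d] / totals[d] for d in totals}


def sites_to_alarm(records, max_errors=1):
    """Sites whose total error count exceeds an (arbitrary) threshold."""
    per_site = errors_per_site(records)
    return sorted(s for s, counts in per_site.items() if sum(counts.values()) > max_errors)


print(dict(errors_per_site(transfers)))
print(success_rate_per_dataset(transfers))
print(sites_to_alarm(transfers))
```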
20
Guidelines
Exercising the Model and preparing for real Data: Monte Carlo production
21
Monte-Carlo Production in Lyon
Running: 291 (max: 955), queued: 443 (max: 1101), production rate: 81% (max: 100%)
Result: impressive increase in the efficiency of data production in the Lyon Cloud.
Autumn 2006: executor installed in Lyon to distribute the production jobs within the Lyon Cloud.
Production shift organization.
Setup of priorities to boost production jobs, based on the role in the certificate.
22
AOD Replication: pre-testing
[Figure: FROM/TO matrix of AOD transfers between sites. Site labels: ASGC, BNL, CERN, CNAF, FZK, LYON, NG, NDGF, PIC, RAL, SARA, TRIUMF. Legend: data transfer tested, data transfer failed, data transfer not tested, in progress.]
23
Monte-Carlo Production in Lyon Cloud
16% of LCG for 2006
22% for October-November
24
Monte-Carlo Production in Lyon Cloud
25
Monte-Carlo Production in Lyon
Still a lot of room for improvement in performance:
– Failure rate too high at or before the start of jobs, or due to site/middleware issues (no loss of CPU)
– Failures at output: registration problem, srm, etc.
26
Summary
Main baselines of the Atlas Computing Model are established, but work on improvements continues.
2006: decisive transition to operation mode, with continuous production of high-statistics MC samples.
Successful tests of Data Distribution, in agreement with CSC (Computer System Commissioning).
Bottlenecks and problems still lie ahead, but most are identified and work on solutions is under way.
Improvements expected in reliability, stability and monitoring.
The Lyon site is very actively progressing towards full readiness for first data at the end of 2007.