1 The Worldwide LHC Computing Grid WLCG Service Ramp-Up LHCC Referees’ Meeting, January 2007

2 Ramp-Up Outline
The clear goal for 2007 is to be ready for first data taking ahead of the machine itself. This translates to:
– Dress Rehearsals in the 2nd half of the year
– Preparation for these in the 1st half
– Continuous service operation and hardening
– Continual (quasi-continuous) experiment production
Different views: experiment, site, Grid-specific, WLCG…
Will focus on the first and (mainly) the last of these; other views, in particular site views, will come shortly.

3 WLCG Commissioning Schedule
Still an ambitious programme ahead.
Timely testing of the full data chain from DAQ to Tier-2 was a major item from the last CR.
DAQ → Tier-0 still largely untested.

4 Service Ramp-Up
As discussed at last week's WLCG Collaboration Workshop, much work has already been done on service hardening:
– Reliable hardware, improved monitoring & logging, middleware enhancements
Much still remains to be done – this will be an on-going activity during the rest of 2007 and probably beyond.
The need to provide as much robustness as possible in the services themselves – as opposed to constant baby-sitting – is well understood.
There are still new / updated services to deploy in full production (see previous slide).
It is unrealistic to expect that all of these will be ready prior to the start of the Dress Rehearsals.
Foresee a 'staged approach' – focussing on maintaining and improving both service stability and functionality ('residual services').
Must remain in close contact with both experiments and sites on schedule and service requirements – these will inevitably change with time.
– A draft of the experiment schedule (from December 2006) is attached to the agenda.
– Updated schedules were presented last Friday during the WLCG workshop (pointer).

5 ATLAS 2007 Timeline
Running continuously throughout the year (increasing rate):
– Simulation production
– Cosmic ray data-taking (detector commissioning)
January to June: data streaming tests
February and May: intensive Tier-0 tests
From February onwards: data distribution tests
From March onwards: distributed analysis (intensive tests)
May to July: Calibration Data Challenge
June to October: Full Dress Rehearsal
November: GO!

6 CMS Timeline (Stefano Belforte, INFN Trieste)
February:
– Deploy PhEDEx 2.5
– T0-T1, T1-T1, T1-T2 independent transfers
– Restart job robot
– Start work on SAM
– FTS full deployment
March:
– SRM v2.2 tests start
– T0-T1(tape)-T2 coupled transfers (same data)
– Measure data serving at sites (esp. T1)
– Production/analysis share at sites verified
April:
– Repeat transfer tests with SRM v2.2, FTS v2
– Scale up job load
– gLite WMS test completed (synchronised with ATLAS)
May:
– Start ramping up to CSA07
June:

7 WLCG Milestones
These high-level milestones are complementary to the experiment-specific milestones and to the more detailed goals and objectives listed in the WLCG Draft Service Plan (see attachment to agenda):
– Similar to the plan prepared and maintained in previous years
– Regularly reviewed and updated through the LCG ECM
– Regular reports on status and updates to the WLCG MB / GDB
The focus is on real production scenarios and (moving rapidly to) end-to-end testing:
– The time for component testing is over – we learnt a lot, but not enough!
– The time before data taking is very short – let alone before the dress rehearsals
All data rates refer to the Megatable and to pp running.
Any 'factors', such as accelerator and/or service efficiency, are mentioned explicitly.
– N.B. 'catch-up' is a proven feature of the end-to-end FTS service (see the sketch below).
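The backlog arithmetic behind that 'catch-up' remark can be made concrete. The following is a minimal sketch under assumed numbers: the 100 MB/s nominal rate, the 12 h outage and the 150 MB/s peak are illustrative values, not figures from the slides.

```python
# Hedged illustration of the FTS "catch-up" remark above: how long it takes to
# clear the export backlog built up during an outage. All numbers are assumed
# for illustration and are not taken from the slides.

def catch_up_hours(nominal_mb_s: float, outage_hours: float, peak_mb_s: float) -> float:
    """Hours needed to drain the backlog accumulated during an outage,
    assuming transfers later run at peak_mb_s while new data still arrives
    at nominal_mb_s."""
    backlog_mb = nominal_mb_s * outage_hours * 3600      # data not exported during the outage
    headroom_mb_s = peak_mb_s - nominal_mb_s             # spare bandwidth usable for catch-up
    if headroom_mb_s <= 0:
        raise ValueError("no headroom above nominal rate: backlog never clears")
    return backlog_mb / headroom_mb_s / 3600

# Example: 100 MB/s nominal export, a 12 h outage, 150 MB/s sustainable peak.
print(f"catch-up takes {catch_up_hours(100, 12, 150):.1f} h")   # -> 24.0 h
```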

8 Q1 2007 – Tier-0 / Tier-1s
1. Demonstrate Tier-0 to Tier-1 data export at 65% of full nominal rates per site using experiment-driven transfers
– Mixture of disk / tape endpoints as defined by the experiment computing models, i.e. 40% tape for ATLAS; transfers driven by the experiments
– Period of at least one week; daily VO-averages may vary (~normal)
2. Demonstrate Tier-0 to Tier-1 data export at 50% of full nominal rates (as above) in conjunction with T1-T1 / T1-T2 transfers
– Inter-Tier transfer targets taken from the ATLAS DDM tests / CSA06 targets
3. Demonstrate Tier-0 to Tier-1 data export at 35% of full nominal rates (as above) in conjunction with T1-T1 / T1-T2 transfers and Grid production at the Tier-1s
– Each file transferred is read at least once by a Grid job
– Some explicit targets for the WMS at each Tier-1 need to be derived from the above
4. Provide SRM v2.2 endpoint(s) that implement all methods defined in the SRM v2.2 MoU, with all critical methods passing tests
– See attached list; levels of success: threshold, pass, success, (cum laude)
– This is a requirement if production deployment is to start in Q2!
(Per-site target rates for milestones 1–3 are illustrated in the sketch below.)
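To make the percentages in milestones 1–3 concrete, the following minimal sketch derives per-site target rates from the nominal rates quoted in the Megatable extract on the final slide. The selection of sites, the assumption that the figures are in MB/s, and the code itself are illustrative additions, not part of the original slides.

```python
# Illustrative sketch (not from the slides): per-site Q1 2007 target rates as
# 65% / 50% / 35% of the nominal rates quoted in the Megatable extract on the
# final slide. Units assumed to be MB/s; only three sites are shown.

NOMINAL_MB_S = {
    "IN2P3, Lyon": 157.2,
    "RAL, UK": 137.2,
    "PIC, Spain": 63.7,
}

MILESTONE_FRACTION = {"milestone 1": 0.65, "milestone 2": 0.50, "milestone 3": 0.35}

for site, nominal in NOMINAL_MB_S.items():
    targets = ", ".join(f"{name}: {nominal * frac:5.1f}" for name, frac in MILESTONE_FRACTION.items())
    print(f"{site:<12} nominal {nominal:6.1f} MB/s -> {targets}")
```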

9 Q2 2007 – Tier-0 / Tier-1s
As Q1, but using SRM v2.2 services at the Tier-0 and Tier-1s, gLite 3.x-based services and SL(C)4 as appropriate (higher rates? (T1 T1/2)).
Provide the services required for the Q3 dress rehearsals:
– Includes, for example, production Distributed Database Services at the required sites and scale
Full detail to be provided in coming weeks…

10 Measuring Our Level of Success
Existing tools and metrics, such as the CMS PhEDEx quality plots and the ATLAS DDM transfer status, provide clear and intuitive views.
– These plots are well known to the sites and provide a good measure of current status as well as showing evolution with time.
Metrics are still needed for the WMS work related to milestone 3 (the sketch below illustrates the kind of quantity involved):
– The CMS CSA06 metrics are a good model.
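The quality number shown in such plots can be thought of as successful transfers over attempted transfers, per destination site and per day. The sketch below is a hypothetical illustration of that idea only; it is not the actual PhEDEx or ATLAS DDM code, and the sample records are invented.

```python
# Hypothetical illustration of a per-site, per-day transfer "quality" number
# (successful / attempted transfers), in the spirit of the PhEDEx quality plots.
# This is not the actual PhEDEx or ATLAS DDM code; the records below are invented.
from collections import defaultdict

transfers = [            # (day, destination site, succeeded?)
    ("2007-02-01", "CNAF", True),
    ("2007-02-01", "CNAF", False),
    ("2007-02-01", "RAL", True),
    ("2007-02-02", "CNAF", True),
]

attempts = defaultdict(int)
successes = defaultdict(int)
for day, site, ok in transfers:
    attempts[(day, site)] += 1
    successes[(day, site)] += int(ok)

for day, site in sorted(attempts):
    quality = successes[(day, site)] / attempts[(day, site)]
    print(f"{day}  {site:<5} quality = {quality:.2f}")
```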

11

12 DDM Functional Test 2006 (9 Tier-1s, 40 Tier-2s)
– ASGC (IPAS, Uni Melbourne): Sept 06 – Failed within the cloud; Oct 06 – Failed for Melbourne; Nov 06 – T1-T1 not tested
– BNL (GLT2, NET2, MWT2, SET2, WT2): done; 2+ GB & DPM
– CNAF (LNF, Milano, Napoli, Roma1): 65% failure rate; done
– FZK (CSCS, CYF, DESY-ZN, DESY-HH, FZU, WUP): Sept 06 – Failed from T2 to FZK; Oct 06 – dCache problem; Nov 06 – T1-T1 not tested
– LYON (BEIJING, CPPM, LAPP, LPC, LPNHE, SACLAY, TOKYO): done; done, FTS conn =< 6
– NG: not tested
– PIC (IFAE, IFIC, UAM): Failed within the cloud; done
– RAL (CAM, EDINBURGH, GLASGOW, LANCS, MANC, QMUL): Sept 06 – Failed within the cloud; Oct 06 – Failed for Edinburgh; Nov 06 – done
– SARA (IHEP, ITEP, SINP): Sept 06 – Failed; Oct 06 – IHEP not tested; Nov 06 – IHEP in progress
– TRIUMF (ALBERTA, TORONTO, UniMontreal, SFU, UVIC): Sept 06 – Failed within the cloud; Oct 06 – Failed; Nov 06 – T1-T1 not tested
Notes: new DQ2 release (0.2.12); after SC4 test.

13 Summary
2007 will be an extremely busy and challenging year!
For those of us who have been working on LHC Computing for 15+ years (and others too…) it will nonetheless be extremely rewarding.
Is there a more important Computing Challenge on the planet this year?
The ultimate goal – to enable the exploitation of the LHC's physics discovery potential – is beyond measure.

14 Megatable Extract – nominal Tier-0 to Tier-1 data rates (MB/s)

Tier-1 Centre              ALICE (x4)   ATLAS    CMS     LHCb    Target
IN2P3, Lyon                6            109.2    31.5    10.5    157.2
GridKA, Germany            11.9         88.2     26.3    6.3     132.7
CNAF, Italy                5.2          88.2     36.8    6       136.2
FNAL, USA                  -            -        105     -
BNL, USA                   -            287.2    -       -
RAL, UK                    2.4          102.2    26.3    6.3     137.2
NIKHEF, NL                 3.4          109.2    -       9.1     121.7
ASGC, Taipei               -            65.1     26.3    -       91.4
PIC, Spain                 -            49.7     10.5    3.5     63.7
Nordic Data Grid Facility  4.7          49.7     -       -       54.4
TRIUMF, Canada             -            48.3     -       -
US ALICE                   8.2          -        -       -
TOTALS                     41.8         997      262.7   41.7    1343.2
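As a quick consistency check on the extract above, the Target column is simply the sum of the four experiment columns (the ALICE figure already carries its x4 factor). The minimal sketch below verifies that for the rows where every column is quoted on the slide, treating '-' entries as zero; it is an illustrative addition, not part of the original slide.

```python
# Quick consistency check on the Megatable extract above: the Target column is
# the sum of the four experiment columns (the ALICE value already includes its
# x4 factor). Rows with a '-' entry are treated as zero; only rows where all
# columns are quoted on the slide are checked.

ROWS = {
    # site: (ALICE x4, ATLAS, CMS, LHCb, Target) -- values as printed on the slide
    "IN2P3, Lyon":     (6.0, 109.2, 31.5, 10.5, 157.2),
    "GridKA, Germany": (11.9, 88.2, 26.3, 6.3, 132.7),
    "RAL, UK":         (2.4, 102.2, 26.3, 6.3, 137.2),
    "PIC, Spain":      (0.0, 49.7, 10.5, 3.5, 63.7),
}

for site, (alice, atlas, cms, lhcb, target) in ROWS.items():
    total = alice + atlas + cms + lhcb
    assert abs(total - target) < 0.05, f"{site}: {total} != {target}"
    print(f"{site:<16} sum = {total:6.1f}  target = {target:6.1f}")
```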

