CHEP '04: LHC Computing Models and Data Challenges
David Stickland, Princeton University
For the LHC experiments: ALICE, ATLAS, CMS and LHCb (though with entirely my own bias and errors)

Page 2/32: Outline
– Elements of a Computing Model
– Recap of LHC Computing Scales
– Recent/Current LHC Data Challenges
– Critical Issues

Page 3/32: The Real Challenge…
Quoting from David Williams' opening talk:
– Computing hardware isn't the biggest challenge
– Enabling the collaboration to bring its combined intellect to bear on the problems is the real challenge
Empowering the intellectual capabilities of very large collaborations to contribute to the analysis of the new energy frontier:
– Not (just) from a sociological/political perspective
– You never know where the critical ideas or work will come from
– The simplistic vision of central control is elitist and illusory

Page 4/32: Goals of an (Offline) Computing Model
– Worldwide collaborator access and the ability to contribute to the analysis of the experiment
– Safe data storage
– Feedback to the running experiment
– Reconstruction according to a priority scheme, with graceful fall-back solutions (introduce no deadtime)
– Optimum (or at least acceptable) usage of computing resources
– Efficient data management

Page 5/32: Where are the LHC Computing Models today?
In a state of flux!
– The next 12 months are the last chance to influence initial purchasing planning
Basic principles:
– Were contained in MONARC and went into the first assessment of LHC computing requirements
– Aka the "Hoffmann Report"; instrumental in the establishment of LCG
But now we have two critical milestones to meet:
– Computing Model papers (Dec 2004)
– Experiment and LCG TDRs (July 2005)
And we more or less know what will, or won't, be ready in time:
– Restrict enthusiasm to attainable goals
A process is now under way to review the models in all four experiments:
– Very interested in running-experiment experiences (this conference!)
– Need the maximum expertise and experience included in these final computing models

Page 6/32: LHC Computing Scale (circa 2008)
CERN T0/T1:
– Disk space [PB]: 5
– Mass storage space [PB]: 20
– Processing power [MSI2K]: 20
– WAN [10 Gb/s]: ~5?
Tier-1s (sum of ~10):
– Disk space [PB]: 20
– Mass storage space [PB]: 20
– Processing power [MSI2K]: 45
– WAN [10 Gb/s per Tier-1]: ~1?
Tier-2s (sum of ~40):
– Disk space [PB]: 12
– Mass storage space [PB]: 5
– Processing power [MSI2K]: 40
– WAN [10 Gb/s per Tier-2]: ~0.2?
Cost sharing: ~30% at CERN, 40% at Tier-1s, 30% at Tier-2s
[Pie charts of the CERN T0/T1, Tier-1 and Tier-2 cost sharing, not reproduced here.]
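
To make the scale concrete, here is a minimal Python sketch (not from the talk) that just sums the figures quoted above across tiers and records the stated cost split; all numbers come from the slide, everything else is illustrative.

```python
# Rough totals for the circa-2008 LHC computing scale, using the
# approximate figures quoted on this slide.
resources = {
    # tier:                (disk PB, MSS PB, CPU MSI2K)
    "CERN T0/T1":          (5,  20, 20),
    "Tier-1s (~10 sites)": (20, 20, 45),
    "Tier-2s (~40 sites)": (12, 5,  40),
}

total_disk = sum(d for d, m, c in resources.values())   # 37 PB
total_mss  = sum(m for d, m, c in resources.values())   # 45 PB
total_cpu  = sum(c for d, m, c in resources.values())   # 105 MSI2K

print(f"Total disk: {total_disk} PB, MSS: {total_mss} PB, CPU: {total_cpu} MSI2K")

# Quoted cost sharing: ~30% CERN, ~40% Tier-1s, ~30% Tier-2s
cost_share = {"CERN": 0.30, "Tier-1s": 0.40, "Tier-2s": 0.30}
assert abs(sum(cost_share.values()) - 1.0) < 1e-9
```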

Page 7/32: Elements of a Computing Model (I)
Data model:
– Event data sizes, formats, streaming
– Data "tiers" (DST/ESD/AOD etc.): roles, accessibility, distribution, …
– Calibration/conditions data: flow, latencies, update frequency
– Simulation: sizes, distribution
– File size
Analysis model:
– Canonical group needs in terms of data, streams, re-processing, calibrations
– Data movement, job movement, priority management
– Interactive analysis
Computing strategy and deployment:
– Roles of the computing tiers
– Data distribution between tiers
– Data management architecture
– Databases: masters, updates, hierarchy
– Active/passive experiment policy
Computing specifications:
– Profiles (per tier and time): processors, storage, network (wide/local area), database services, specialized servers, middleware requirements

Page 8/32: Common Themes
Move a copy of the raw data away from CERN in "real time":
– Second secure copy: one copy at CERN, one copy spread over N sites
– Flexibility: serve raw data even if the Tier-0 is saturated with DAQ
– Ability to run even primary reconstruction offsite
Streaming online and offline:
– (Maybe not a common theme yet)
– Tier-1 centres in-line to the online system and Tier-0
Simulation at Tier-2 centres:
– Except LHCb: if the simulation load remains high, use Tier-1s
ESD: n copies distributed over N Tier-1 sites:
– Tier-2 centres run complex selections at a Tier-1 and download skims
AOD distributed to all (?) Tier-2 centres:
– Maybe not a common theme; how useful is the AOD, and how early in LHC running? Some Run II experience indicates long-term usage of "raw" data
Horizontal streaming: RAW, ESD, AOD, TAG
Vertical streaming: trigger streams, physics streams, analysis skims
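
As an illustration (not from the talk) of the "n copies over N Tier-1 sites" theme, a minimal sketch of round-robin replica placement; the function and site names are hypothetical stand-ins, not any experiment's actual placement policy.

```python
from itertools import cycle

def assign_replicas(datasets, tier1_sites, n_copies=2):
    """Assign n_copies of each dataset to Tier-1 sites, round-robin.

    Purely illustrative: real placement also weighs site capacity,
    network topology and experiment policy."""
    assert n_copies <= len(set(tier1_sites))
    site_ring = cycle(tier1_sites)
    placement = {}
    for ds in datasets:
        copies = []
        while len(copies) < n_copies:
            site = next(site_ring)
            if site not in copies:          # never two copies at one site
                copies.append(site)
        placement[ds] = copies
    return placement

# Hypothetical example using the Tier-1 names that appear later in the talk
sites = ["PIC", "FZK", "CNAF", "RAL", "IN2P3", "FNAL"]
print(assign_replicas(["ESD_run001", "ESD_run002", "ESD_run003"], sites))
```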

Page 9/32: Purpose and structure of ALICE PDC04
Test and validate the ALICE offline computing model:
– Produce and analyse ~10% of the data sample collected in a standard data-taking year
– Use the entire ALICE offline framework: AliEn, AliRoot, LCG, PROOF, …
– Experiment with Grid-enabled distributed computing
– Triple purpose: test of the middleware, test of the software, and physics analysis of the produced data for the ALICE PPR
Three phases:
– Phase I: distributed production of underlying Pb+Pb events with different centralities (impact parameters) and of p+p events
– Phase II: distributed production mixing different signal events into the underlying Pb+Pb events (reused several times)
– Phase III: distributed analysis
Principles:
– True Grid data production and analysis: all jobs are run on the Grid, using only AliEn for access and control of native computing resources and, through an interface, the LCG resources
– In Phase III: gLite + ARDA

Page 10/32: Job structure and production (Phase I)
[Workflow diagram: central AliEn servers provide master job submission, the Job Optimizer (splitting into sub-jobs), the RB, the file catalogue, process monitoring and control, and the SE; sub-jobs are processed on AliEn CEs and, through the AliEn-LCG interface and the LCG RB, on LCG CEs (LCG appears as one AliEn CE); output files are stored in CERN CASTOR (disk servers, tape) via the AIOD file transfer system.]
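
The Phase I workflow centres on a Job Optimizer that splits a master production job into sub-jobs before they are dispatched to the CEs. Below is a minimal, hypothetical sketch of that splitting step; the class names and the event-range granularity are assumptions, not the actual AliEn implementation.

```python
from dataclasses import dataclass

@dataclass
class MasterJob:
    name: str
    n_events: int
    events_per_subjob: int = 1000      # illustrative granularity

@dataclass
class SubJob:
    master: str
    index: int
    first_event: int
    n_events: int

def split_master_job(job: MasterJob) -> list[SubJob]:
    """Split a master production job into event-range sub-jobs,
    roughly what a Job Optimizer does before handing work to the RB/CEs."""
    subjobs, first, index = [], 0, 0
    while first < job.n_events:
        n = min(job.events_per_subjob, job.n_events - first)
        subjobs.append(SubJob(job.name, index, first, n))
        first += n
        index += 1
    return subjobs

# e.g. a 10k-event Pb+Pb underlying-event production becomes 10 sub-jobs
print(len(split_master_job(MasterJob("PbPb_central", 10_000))))
```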

Page 11/32: Phase I CPU contributions
[Chart of per-site CPU contributions, not reproduced here.]
CEs: 15 controlled directly through AliEn, plus CERN-LCG and Torino-LCG (Grid.it).

Page 12/32: Issues
Too many files in the MSS stager (also seen by CMS):
– Solved by splitting the data over two stagers (see the sketch below)
Persistent problems with local configurations reducing the availability of Grid sites:
– Frequent black holes
– Problems often come back (e.g. NFS mounts!)
– Local disk space on the WNs
Quality of the information in the Information Index (II)
The Workload Management System does not ensure an even distribution of jobs across the different centres
Lack of support for bulk operations makes the WMS response time critical
The keyhole approach and the lack of appropriate monitoring and reporting tools make debugging difficult
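
The "too many files in the MSS stager" problem was worked around by splitting the data over two stagers. A minimal sketch of one deterministic way to do such a split (hashing the file path); the host names and the function are an illustration, not the actual CASTOR configuration.

```python
import hashlib

STAGERS = ["stager01", "stager02"]     # hypothetical stager host names

def stager_for(lfn: str) -> str:
    """Pick a stager deterministically from the file name, so that
    writes and lookups always hit the same stager and the load is
    split roughly evenly between the two."""
    digest = hashlib.md5(lfn.encode()).hexdigest()
    return STAGERS[int(digest, 16) % len(STAGERS)]

print(stager_for("/castor/cern.ch/alice/pdc04/phase1/run001/galice.root"))
```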

Page 13/32: Phase II (started 1 July) – statistics
In addition to Phase I:
– Distributed production of signal events and merging with Phase I events
– Stress of the network and file transfer tools
– Storage at remote SEs and its stability (crucial for Phase III)
Conditions, jobs, …:
– 110 conditions in total
– 1 million jobs
– 10 TB of produced data
– 200 TB transferred from CERN
– 500 MSI2k hours of CPU
To end by 30 September
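
A quick back-of-the-envelope check on those Phase II numbers (my own sketch; the averages are only indicative, since job lengths and output sizes vary widely):

```python
n_jobs      = 1_000_000      # ~1 million jobs
output_tb   = 10             # ~10 TB of produced data
transfer_tb = 200            # ~200 TB transferred from CERN (underlying events, reused)
cpu_msi2k_h = 500            # ~500 MSI2k hours of CPU

avg_cpu_si2k_h = cpu_msi2k_h * 1e6 / n_jobs   # = 500 SI2k·h, i.e. ~0.5 kSI2k·h per job
avg_output_mb  = output_tb * 1e6 / n_jobs     # = 10 MB of output per job on average
in_out_ratio   = transfer_tb / output_tb      # = 20x more data shipped in than produced,
                                              #   consistent with reusing underlying events

print(avg_cpu_si2k_h, avg_output_mb, in_out_ratio)
```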

Page 14/32: Structure of event production in Phase II
[Workflow diagram: central servers provide master job submission, the Job Optimizer (N sub-jobs), the RB, the file catalogue, process monitoring and control, and the SE; sub-jobs run on AliEn CEs and, via the AliEn-LCG interface and the LCG RB, on LCG CEs; underlying-event input files are read from CERN CASTOR; the output files of each job are zipped into an archive, stored with a primary copy on a local SE and a backup copy in CERN CASTOR, and registered in the AliEn file catalogue (for LCG SEs with LCG LFN = AliEn PFN, via edg/lcg copy&register).]
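
The "copy & register" step above can be sketched as a thin wrapper around the LCG data-management command line; treat the option names below as illustrative from memory of the lcg_util tools rather than authoritative, and the paths and SE name as hypothetical.

```python
import subprocess

def copy_and_register(local_path: str, lfn: str, dest_se: str, vo: str = "alice") -> str:
    """Copy a zipped output archive to an LCG SE and register it in the
    catalogue, mirroring the 'edg(lcg) copy&register' step in the diagram.

    Illustrative only: option names follow the lcg_util 'lcg-cr' tool as
    recalled, and error handling is minimal."""
    cmd = [
        "lcg-cr",
        "--vo", vo,
        "-d", dest_se,                 # destination storage element
        "-l", f"lfn:{lfn}",            # logical file name (here the AliEn PFN convention)
        f"file://{local_path}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()       # the tool prints the GUID of the new replica

# Hypothetical usage for one sub-job's zipped outputs:
# guid = copy_and_register("/tmp/job123_output.zip",
#                          "/alice/sim/PDC04/phase2/job123_output.zip",
#                          "castorgrid.cern.ch")
```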

Page 15/32: Structure of analysis in Phase III
[Workflow diagram: a user query is resolved by the metadata/file catalogue into the matching LFNs (lfn1 … lfn8); the job splitter groups the LFNs into sub-jobs; sub-jobs run on AliEn CEs and, via the AliEn-LCG interface and the LCG RB, on LCG CEs; each job obtains the PFNs from the file catalogue (PFN = LCG SE: LCG LFN, or PFN = AliEn PFN) and reads its input files from the local SEs holding the primary copies.]
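
The Phase III flow (metadata query returning LFNs, a job splitter grouping them into sub-jobs, PFN resolution at the execution site) can be sketched roughly as below; all names are hypothetical stand-ins for the AliEn/gLite services, not their real APIs.

```python
def run_distributed_analysis(catalogue, submit, metadata_query, lfns_per_job=50):
    """Rough Phase III flow: user query -> LFNs -> sub-jobs -> submission.

    'catalogue' and 'submit' are hypothetical stand-ins for the AliEn
    file/metadata catalogue and the job submission interface."""
    # 1. The user's metadata query is resolved into a list of LFNs.
    lfns = catalogue.query(metadata_query)

    # 2. The job splitter groups LFNs into sub-jobs of manageable size.
    subjobs = [lfns[i:i + lfns_per_job] for i in range(0, len(lfns), lfns_per_job)]

    # 3. Each sub-job is submitted to a CE (AliEn or, via the interface, LCG);
    #    at the site it resolves PFNs from the catalogue and reads its inputs
    #    from the local SE holding the primary copy.
    for job_lfns in subjobs:
        submit(job_lfns)
    return len(subjobs)
```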

Page 16/32: ALICE DC04 Conclusions
– The ALICE DC04 started out with (almost unrealistically) ambitious objectives
– They are coming very close to reaching these objectives, and LCG has played an important role
– They are ready and willing to move to gLite as soon as possible and to contribute to its evolution with their feedback

Page 17/32: ATLAS DC2 operation
Consider DC2 as a three-part operation:
– Part I: production of simulated data (July-September 2004), running on the "Grid" worldwide
– Part II: test of Tier-0 operation (November 2004); do in 10 days what "should" be done in 1 day when real data taking starts; the input is "raw data"-like, and the output (ESD+AOD) will be distributed to the Tier-1s in real time for analysis
– Part III: test of distributed analysis on the Grid; access to event and non-event data from anywhere in the world, in both organized and chaotic ways
Requests:
– ~30 physics channels (10 million events)
– Several million events for calibration (single particles and physics samples)

Page 18/32: ATLAS Production System
[Architecture diagram: the Windmill supervisor takes job definitions from the production database (prodDB) and communicates, via jabber and SOAP, with one executor per grid flavour – Lexor (LCG), Dulcinea (NorduGrid), Capone (Grid3) – and a legacy LSF executor; the Don Quijote data management system (dms) sits on top of the RLS catalogues, and AMI provides metadata.]
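
The production system separates a single supervisor (Windmill) from per-grid executors (Lexor for LCG, Dulcinea for NorduGrid, Capone for Grid3, plus legacy batch). A minimal sketch of that supervisor/executor split follows; the interface shown is a hypothetical simplification, not the real Windmill protocol, and communication is reduced to direct method calls instead of jabber/SOAP.

```python
from abc import ABC, abstractmethod

class Executor(ABC):
    """One executor per grid flavour (cf. Lexor/LCG, Dulcinea/NG, Capone/Grid3)."""
    @abstractmethod
    def submit(self, job_definition: dict) -> str: ...
    @abstractmethod
    def status(self, job_id: str) -> str: ...

class Supervisor:
    """Takes jobs from the production DB and farms them out to executors.
    In the real system the supervisor and executors exchange messages over
    jabber/SOAP; here they are plain method calls."""
    def __init__(self, executors: dict[str, Executor]):
        self.executors = executors

    def dispatch(self, job_definition: dict) -> str:
        flavour = job_definition.get("grid", "LCG")     # e.g. "LCG", "NG", "Grid3", "LSF"
        return self.executors[flavour].submit(job_definition)
```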

Page 19/32: CPU usage & Jobs
[Charts of CPU usage and numbers of jobs, not reproduced here.]

Page 20/32: ATLAS DC2 Status
Major efforts in the past few months:
– Redesign of the ATLAS Event Data Model and Detector Description
– Integration of the LCG components (G4, POOL, …)
– Introduction of the Production System, interfaced with 3 Grid flavours (and "legacy" systems)
Delays in all activities have affected the schedule of DC2:
– Note that the Combined Test Beam is ATLAS's first priority
– The DC2 schedule was revisited to wait for the readiness of the software and of the Production System
DC2:
– About 80% of the Geant4 simulation foreseen for Phase I has been completed, using only the Grid and using the 3 flavours coherently
– The 3 Grids have been proven usable for a real production, and this is a major achievement
BUT:
– Phase I is progressing more slowly than expected, and it is clear that all the involved elements (Grid middleware, Production System, deployment and monitoring tools at the sites) need improvements
– It is a key goal of the Data Challenges to identify these problems as early as possible

Page 21/32: Testing the CMS Computing Model in DC04
Focused on organized (CMS-managed) data flow/access.
Functional DST with streams for physics and calibration:
– DST size OK, almost usable by "all" analyses (new version ready now)
Tier-0 farm reconstruction:
– 500 CPUs, ran at 25 Hz; reconstruction time within estimates
Tier-0 buffer management and distribution to the Tier-1s:
– TMDB: a CMS-built agent system communicating via a central database (see the sketch below)
– Manages dynamic dataset "state", not a file catalog
Tier-1 managed import of selected data from Tier-0:
– The TMDB system worked
Tier-2 managed import of selected data from Tier-1:
– Metadata-based selection OK; local Tier-1 TMDB OK
Real-time analysis access at Tier-1 and Tier-2:
– Achieved 20-minute latency from Tier-0 reconstruction to job launch at Tier-1 and Tier-2
Catalog services, replica management:
– Significant performance problems found and being addressed
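
TMDB is described as a set of agents communicating through a central database that tracks dataset state rather than individual files. A minimal sketch of such a state-driven import agent loop; the schema, table and state names are invented for illustration and are not the actual TMDB design.

```python
import sqlite3
import time

# Invented schema standing in for the central TMDB: one row per dataset
# replica request, with a state column that the agents advance.
STATES = ["requested", "assigned", "transferring", "done"]

def agent_loop(db_path: str, site: str, poll_seconds: int = 60):
    """A Tier-1 import agent: claim datasets assigned to this site and
    advance their state in the central database as the transfer proceeds."""
    db = sqlite3.connect(db_path)
    while True:
        rows = db.execute(
            "SELECT dataset FROM transfer_state "
            "WHERE destination = ? AND state = 'assigned'", (site,)).fetchall()
        for (dataset,) in rows:
            db.execute("UPDATE transfer_state SET state = 'transferring' "
                       "WHERE dataset = ? AND destination = ?", (dataset, site))
            db.commit()
            # ... perform the actual data transfer here, then mark it done ...
            db.execute("UPDATE transfer_state SET state = 'done' "
                       "WHERE dataset = ? AND destination = ?", (dataset, site))
            db.commit()
        time.sleep(poll_seconds)
```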

Page 22/32: DC04 Data Challenge
[Map: T0 at CERN; T1 sites at PIC Barcelona, FZK Karlsruhe, CNAF Bologna, RAL Oxford, IN2P3 Lyon and FNAL Chicago; T2 sites at Legnaro, CIEMAT Madrid, Florida, IC London and Caltech.]
Focused on organized (CMS-managed) data flow/access.
T0 at CERN in DC04:
– 25 Hz reconstruction
– Events filtered into streams
– Record raw data and DST
– Distribute raw data and DST to the T1s
T1 centres in DC04:
– Pull data from T0 to T1 and store it
– Make data available to the PRS
– Demonstrate quasi-real-time analysis of DSTs
T2 centres in DC04:
– Pre-challenge production at >30 sites
– Modest tests of DST analysis

Page 23/32: DC04 layout
[Architecture diagram: at Tier-0, a fake on-line process feeds Castor and the input buffer (IB); the RefDB, the POOL RLS catalogue and the TMDB drive the ORCA RECO jobs; the Tier-0 data distribution agents, the GDB and EB, and the LCG-2 services move the data out; at each Tier-1, a Tier-1 agent fills T1 storage and MSS, where ORCA analysis jobs and ORCA Grid jobs run; at the Tier-2s, physicists run local ORCA jobs against T2 storage.]

Page 24/32: Next Steps
The Physics TDR requires physicist access to DC04 data:
– Re-reconstruction passes
– Alignment studies
– Luminosity effects
Estimate: 10M events/month throughput required
CMS summer timeout to focus new effort on:
– DST format/contents
– Data Management "RTAG"
– Workload Management deployment for physicist data access now
– A cross-project coordination group focused on end-user analysis
Use the requirements of the Physics TDR to build an understanding of the analysis model, while doing the analysis:
– Make it work for the Physics TDR
Component data challenges in 2005:
– Not a big bang where everything has to work at the same time
Readiness challenge:
– 100% of the startup scale
– Concurrent production, distribution, ordered and chaotic analysis
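
The quoted 10M events/month for the Physics TDR corresponds to a fairly modest sustained rate; a quick check of my own, assuming ~30 days of wall-clock time per month:

```python
events_per_month  = 10_000_000
seconds_per_month = 30 * 24 * 3600                 # ~2.6 million seconds

sustained_rate_hz = events_per_month / seconds_per_month
print(f"Sustained rate: {sustained_rate_hz:.1f} Hz")        # ~3.9 Hz

# At the DC04 Tier-0 reconstruction rate of 25 Hz this is equivalent to
# roughly 4.6 days of continuous running per month.
print(f"Days at 25 Hz: {events_per_month / 25 / 86400:.1f}")
```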

Page 25/32: LHCb DC'04 aims
Gather information for the LHCb Computing TDR.
Physics goals:
– HLT studies, consolidating efficiencies
– B/S studies, consolidating background estimates and background properties
Requires a quantitative increase in the number of signal and background events:
– signal events (~80 physics channels)
– specific backgrounds
– background (B inclusive + min. bias, 1:1.8)
Split DC'04 into 3 phases:
– Production: MC simulation (done)
– Stripping: event pre-selection (to start soon)
– Analysis (in preparation)

Page 26/32: DIRAC Services & Resources
[Architecture diagram: the production manager and users (GANGA UI, user CLI) submit to the DIRAC Job Management Service; DIRAC services include the JobMonitorSvc, the JobAccountingSvc with its AccountingDB, the InformationSvc, the FileCatalogSvc, the MonitoringSvc and the BookkeepingSvc, with a BK query web page and a FileCatalog browser as user interfaces; DIRAC resources are DIRAC sites running Agents in front of CEs, the LCG Resource Broker feeding LCG CEs, and DIRAC Storage (disk files accessed via gridftp, bbftp, rfio).]

Page 27/32: Phase 1 Completed
[Production-rate plot (events/day) showing periods with DIRAC alone, LCG in action, LCG paused and LCG restarted; 186 M events produced in total.]

Page 28/32: LCG Performance (I)
[Job-flow chart: 211 k jobs submitted to LCG; some were cancelled or aborted before running; of the jobs that ran, 113 k finished Done (successful) and 34 k aborted after running.]

Page 29/32: LCG Performance (II)
[LCG job submission summary table, not reproduced here.]
LCG efficiency: 61%.

Page 30/32: LHCb DC'04 Status
LHCb DC'04 Phase 1 is over. The production target has been achieved:
– 186 M events in 424 CPU years
– ~50% on LCG resources (75-80% in the last weeks)
The LHCb strategy was right:
– Submitting "empty" DIRAC Agents to LCG has proven very flexible, allowing a good success rate (see the sketch below)
There is plenty of room for improvement, in both DIRAC and LCG:
– DIRAC needs to improve the reliability of its servers: a big step was already made during the DC
– LCG needs to improve the single-job efficiency: ~40% of jobs aborted
– In both cases, extra protection against external failures (network, unexpected shutdowns, …) must be built in
Congratulations and warm thanks to the complete LCG team for their support and dedication
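
The "empty DIRAC Agent" strategy means the job submitted to LCG carries no payload: only when it starts on a worker node does it pull a matching task from the central queue, which shields the production from many grid-level failures. A minimal sketch of that pull model; the service names and calls are hypothetical, not the real DIRAC API.

```python
import socket
import time

def pilot_agent(task_queue, run_payload, idle_sleep=300, max_idle_cycles=3):
    """An 'empty' agent: submitted to the grid with no work attached, it
    inspects the worker node and then pulls matching jobs from the central
    task queue. If the node is broken, no production job is lost - the
    agent simply exits without ever claiming work."""
    site_info = {"hostname": socket.gethostname()}
    idle = 0
    while idle < max_idle_cycles:
        job = task_queue.request_job(site_info)      # hypothetical central service call
        if job is None:                              # nothing matching this node right now
            idle += 1
            time.sleep(idle_sleep)
            continue
        idle = 0
        ok = run_payload(job)                        # run the simulation/reconstruction payload
        task_queue.report(job["id"], "Done" if ok else "Failed")
```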

Page 31/32: Personal Observations on Data Challenge Results
Tier-0 operations demonstrated at 25% scale:
– The job couplings from the Objectivity era are gone
Directed data flow/management T0 > T1 > T2: worked (intermittently)
Massive simulation on LCG, Grid3, NorduGrid: worked
Beginning to get experience with input-data-intensive jobs
Not many users out there yet stressing the chaotic side:
– The next 6 months are critical; we have to see broad and growing adoption, and not having a personal grid user certificate will have to seem odd
Many problems are classical computer-centre ones:
– Full disks, reboots, software installation, dead disks, …
– Actually this is bad news: there is no middleware silver bullet; it is hard work getting so many centres up to the required performance

Page 32/32: Critical Issues for early 2005
Data management:
– Building experiment data management solutions
Demonstrating end-user access to remote resources:
– Data and processing
Managing conditions and calibration databases:
– And their global distribution
Managing network expectations:
– Analysis can place (currently) impossible loads on the network and data management components
Planning for the future, while maintaining priority controls
Determining the pragmatic mix of Grid responsibilities and experiment responsibilities:
– Recall the "Data" in DataGrid: the LHC is data intensive
– Configuring the experiment and Grid software to use generic resources is wise
– But (I think) data location will require a more ordered approach in practice