GridPP: Experiment Status & User Feedback
Dan Tovey, University of Sheffield
Introduction
This talk will be in two parts:
1. The good news: details of Grid use by the experiments
2. The less good news: feedback from the experiments regarding their experiences
ATLAS Grid Use
Almost all resources for ATLAS are Grid-based:
– Three Grid flavours to work with: LCG-2, NorduGrid, Grid3
– Considerable issues of interoperability/federation
Next large exercise is the Rome Physics Workshop in June:
– Generation: a mixture of Grid and non-Grid, but much non-Grid for convenience
– Simulation/Digitisation/Reconstruction: all on the Grid
– Analysis: some Grid-based analysis already; distributed analysis being rolled out in Spring. Rome will use a mixture of Grid and non-Grid analysis.
ATLAS Issues
Interoperability:
– Currently the production system has to layer job scheduling over the native system of each deployment
– Absolute need for a unified file catalogue system: currently an additional catalogue is layered over the others (a sketch of this layering follows below)
Information system/policy:
– Inaccurate advertisement of sites
– SE saturation:
  - Internal: the production system needs better clean-up and more robust back-up
  - An SE should advertise if it is really for storage! SCR$MONTH class required?
– Lines of reporting need to be improved/clarified
LCG issues:
– LCG-Castor failures
– RLS corruption
Resource issues:
– Still trying to ensure the required resources for 2007/2008
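To illustrate the "layered catalogue" point above: a minimal sketch of a unified file-catalogue facade sitting over the per-deployment catalogues. All class and method names here are hypothetical illustrations, not real middleware APIs.

```python
# Hypothetical sketch of layering a unified file catalogue over the
# per-deployment catalogues (LCG-2, NorduGrid, Grid3). Every name here
# is illustrative; none of these classes is a real middleware API.
class UnifiedCatalogue:
    """Facade that resolves a logical file name across several Grids."""

    def __init__(self, backends):
        # backends: per-Grid catalogue clients, each assumed to expose
        # a lookup(lfn) method returning physical replica locations
        self.backends = backends

    def replicas(self, lfn):
        """Return every known physical replica of a logical file name."""
        found = []
        for backend in self.backends:
            found.extend(backend.lookup(lfn))
        return found
```

The cost of this approach is that each deployment's own catalogue must be kept consistent with the layer above it, which is why a single unified catalogue is flagged above as an absolute need.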
General CMS Outlook
Tool development:
– Going very well: leading contributions from the UK in the most important areas
– Integration between tools is starting
– Moving away from LCG-style data management, for now
– Our modular approach can re-integrate LCG tools later on if needed
Collaboration status/plans:
– Computing Model now blessed and publicly available
– Computing TDR well under way
– UK making a strong contribution
– Use of Tier-1/Tier-2 resources in the UK will start to grow rapidly as Grid-enabled DST analysis begins
LHCb DC04 Phase 1
– The Production Desktop provides control and decomposes complex workflows into steps and modules.
– 186M events produced: 424 CPU years.
– ~50% run on LCG resources; LCG efficiency = 61%.
– Share of production: DIRAC 89% / LCG 11% in May 2004; DIRAC 27% / LCG 73% by Aug 2004.
– Rates: 1.8M events/day with DIRAC alone, rising to 3-5M events/day with LCG in action.
– UK the second largest producer (25%) after CERN.
– UK breakdown: LCG(UK) – Tier 1 7.7%, Tier 2 London 3.9%, Tier 2 South 2.3%, Tier 2 North 1.4%; DIRAC(UK) – Imperial 2.0%, Liverpool 3.1%, Oxford 0.1%, ScotGrid 5.1%.
[Plot: cumulative produced events vs time, annotated "DIRAC alone", "LCG in action", "LCG paused", "LCG restarted".]
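A quick back-of-envelope check of the DC04 Phase 1 figures quoted above (186M events against 424 CPU years), sketched in Python:

```python
# Back-of-envelope check of the DC04 Phase 1 numbers:
# 186M produced events against 424 CPU years of processing.
events = 186e6
cpu_seconds = 424 * 365.25 * 24 * 3600   # CPU years -> CPU seconds
print(cpu_seconds / events)              # ~72 CPU-seconds per event
```

That is roughly 70 CPU-seconds per simulated event on average; sustaining the quoted 3-5M events/day therefore implies of order a few thousand concurrently busy CPUs across the whole system.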
LHCb 2005
DC05 – Real Time Data Challenge:
– Mimic LHC data taking; test HLT, reconstruction and streaming software on pre-simulated data.
– Production phase (150M events) April-June 05.
DC04 Phase 2 – Stripping (ongoing):
– Using the Production Desktop developed in the UK (G. Kuznetsov).
– Data reduction on 65TB distributed over the Tier 1 sites.
– Using DIRAC with input data and LCG for data access.
– SRM was on the critical path: available at CERN, PIC and CNAF, but the production version was unavailable at RAL, so the UK is not participating.
DC04 Phase 3 – End User Analysis:
– Uses the GANGA Grid interface (a UK project; see the sketch below).
– Improvements Jan-March 05 (A. Soroko at CERN).
– User training started (e.g. Cambridge event funded by GridPP in December 04).
– Distributed analysis from March 2005, with datasets replicated to RAL.
[Screenshot: Production Desktop.]
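For context on the GANGA point above: GANGA exposes Grid jobs as Python objects in an interactive session. A minimal sketch, assuming the standard Job/Executable/LCG objects of the GANGA interface; the executable and its arguments are illustrative placeholders.

```python
# Minimal sketch of a GANGA job, typed inside a GANGA session where the
# Job/Executable/LCG objects are predefined. The executable and its
# arguments below are illustrative placeholders.
j = Job()
j.application = Executable(exe='/bin/echo', args=['hello from the Grid'])
j.backend = LCG()          # route submission through the LCG middleware
j.submit()
print(j.status)            # e.g. 'submitted', then 'running', 'completed'
```

The same job definition can be pointed at a different backend object without changing the application, which is what makes the interface attractive for end-user analysis.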
QCDgrid Infrastructure
– QCDgrid is primarily a data-grid, aimed at providing a storage and processing infrastructure for QCDOC (a 5 Tflop-sustained QCD simulation facility).
– QCDOC is now installed and being shaken down in Edinburgh, along with a 'Tier 1' 50 TByte store.
– 'Tier 2' storage nodes have been installed in Edinburgh, Liverpool, Swansea and Southampton (4 × 12.5 TByte).
– Additional storage/access nodes are operating at RAL and Columbia.
– Processing clusters at Edinburgh (QCDOC front-end), Liverpool, Southampton, ….
QCDgrid Usage and Uptake
– Run by a Grid administrator + local sysadmins + users; mainly Edinburgh + Liverpool.
– All UKQCD primary data is already stored, and all secondary data is produced by grid retrieval and (currently non-grid) processing. Secondary data is also stored back on QCDgrid (metadata markup not yet automated).
– All QCDOC data is to be archived on QCDgrid with NO tape copies.
– The job submission software allows submission to any grid-enabled system; it only requires Globus (a sketch follows below).
– The number of actual users (~8) is quite low at the moment because production data from QCDOC has not (quite) started to flow.
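On the job-submission point above: a minimal sketch of what "only requires Globus" implies in practice, wrapping the standard globus-job-run command from the Globus Toolkit. The host name below is a hypothetical placeholder.

```python
# Minimal sketch: running a job on any Globus-enabled host by shelling
# out to the standard globus-job-run command (Globus Toolkit).
# The gatekeeper host used in the example is a hypothetical placeholder.
import subprocess

def run_on_grid(contact, executable, *args):
    """Run an executable on a remote Globus gatekeeper, return its stdout."""
    cmd = ["globus-job-run", contact, executable] + list(args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# e.g. check which worker node the job lands on (hypothetical host name):
# print(run_on_grid("qcdgrid.example.ac.uk", "/bin/hostname"))
```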
ZEUS MC Grid
– …% of ZEUS MC is being produced via the Grid.
– RAL, UCL and ScotGrid-Glasgow accept the ZEUS VO.
– 27% of ZEUS Grid MC comes from the UK.
ZEUS MC Grid (continued)
– Grid production is integrated with the previous MC production system.
– Now with the Grid, on target for 458 million events in …
– Monte Carlo data from the Grid is being used in ZEUS physics analysis.
[Chart: ZEUS MC production, million events.]
User Feedback
– The degree of engagement between GridPP and the experiments was questioned by the OsC.
– A questionnaire was distributed to all experiments asking for their views.
– Put simply (and bluntly): the results suggest strong barriers to successful take-up of the Grid in general, and LCG in particular, by most experiments.
– Dissatisfaction especially with:
  - stability,
  - support,
  - site configuration,
  - data management and movement.
– More work is needed by LCG and GridPP to address these issues; there was encouraging discussion yesterday of some of them.
Usage
Statistics collected for Grid use:
– Overall
– GridPP-supported overall
– GridPP-supported in the UK
Some reason for optimism:
– Some experiments are using the Grid significantly
– Still large spikes at ~0…
General User Feedback
– Perception that Grid techniques are being forced upon experiments through e.g. the switch to Grid-only access to the Tier-1.
– Problem of conflict between UK Grid strategy and the priorities of the wider international collaborations:
  - This could potentially harm the UK physics return.
– Concern that some experiments are having to integrate complex existing software infrastructure with the Grid with little or no available effort or ear-marked financial support:
  - It is clear that the Portal project is going to be key.
– A shift in emphasis is needed in GridPP2 towards a more pro-active approach aimed at helping experiments achieve their 'real-world' data processing goals.