Ian Bird LCG Project Leader WLCG Status Report CERN-RRB-2008-038 15th April 2008 Computing Resource Review Board.

Ian Bird LCG Project Leader WLCG Status Report CERN-RRB-2008-038 15th April 2008 Computing Resource Review Board

2 Recent grid use
- Across all grid infrastructures
- Preparation for, and execution of, CCRC'08 phase 1
- Move of simulations to Tier 2s
- Distribution of grid use: Tier 2s 54%, Tier 1s 35%, CERN 11%
- Federations not yet reporting: Finland, India (IN-INDIACMS-TIFR), Norway, Sweden, Ukraine

3 Recent grid activity
- These workloads (reported across all WLCG centres) are at the level anticipated for 2008 data taking: ~230k jobs/day

4 Combined Computing Readiness Challenge – CCRC'08
- Objective was to show that we can run together (4 experiments, all sites) at 2008 production scale: all functions, from DAQ → Tier 0 → Tier 1s → Tier 2s
- Two challenge phases were foreseen:
  1. Feb: not all 2008 resources in place – still adapting to new versions of some services (e.g. SRM) & experiment s/w
  2. May: all 2008 resources in place – full 2008 workload, all aspects of the experiments' production chains
- Agreed on specific targets and metrics – helped integrate different aspects of the service (a sketch of such a check follows this slide):
  - Explicit "scaling factors" set by the experiments for each functional block (e.g. data rates, # jobs, etc.)
  - Targets for "critical services" defined by the experiments – essential for production, with analysis of the impact of service degradation / interruption
  - WLCG "MoU targets" – services to be provided by sites, target availability, time to intervene / resolve problems …
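To make the idea of per-block scaling factors and metric targets concrete, here is a minimal illustrative sketch (not the actual CCRC'08 tooling): measured values are compared against nominal 2008 targets multiplied by an experiment-defined scaling factor. The metric names and the scaling factors are hypothetical placeholders; only the 1.3 GB/s export rate and the ~230k jobs/day figure appear elsewhere in this report.

```python
# Illustrative sketch only (not the CCRC'08 tooling): compares measured
# metrics against nominal 2008 targets scaled by experiment-defined
# "scaling factors". All numbers except the 1.3 GB/s export rate and the
# ~230k jobs/day figure quoted in this report are hypothetical.

NOMINAL_2008 = {                  # nominal full-2008 values per functional block
    "t0_t1_export_GBps": 1.3,     # aggregate Tier 0 to Tier 1 export rate
    "jobs_per_day": 230_000,      # grid-wide job throughput
}

SCALING_FACTORS = {               # fraction of nominal required in a given phase
    "t0_t1_export_GBps": 1.0,     # hypothetical: full rate demanded
    "jobs_per_day": 0.5,          # hypothetical: half the nominal job load
}

def check_metrics(measured: dict[str, float]) -> list[str]:
    """Return human-readable pass/fail lines, one per metric."""
    lines = []
    for name, nominal in NOMINAL_2008.items():
        target = nominal * SCALING_FACTORS[name]
        value = measured.get(name, 0.0)
        status = "OK  " if value >= target else "FAIL"
        lines.append(f"{status} {name}: measured {value:g}, target {target:g}")
    return lines

if __name__ == "__main__":
    # Example values taken from figures quoted later in this report.
    print("\n".join(check_metrics({"t0_t1_export_GBps": 2.1,
                                   "jobs_per_day": 230_000})))
```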

5 SRM v2.2 Deployment
- Deployment plan was defined and agreed last September, but the schedule was very tight
- Deployment of dCache 1.8.x and CASTOR with SRM v2.2 was achieved at all Tier 0/Tier 1 sites by December
- Today 174 SRM v2 endpoints are in production
- During the February phase of CCRC'08 relatively few problems were found:
  - Short list of SRM v2 issues highlighted, 2 are high priority
  - Will be addressed with fixes or workarounds for May
  - Effort in testing was vital
- Still effort needed in site configurations of MSS – an iterative process with experience in Feb & May

6 CASTOR performance – Tier 0
- CMS:
  - Sustained rate to tape of 1.3 GB/s, with peaks > 2 GB/s
  - Aggregate rates in/out of CASTOR of 3–4 GB/s
- May: need to see this with all experiments
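For a rough feel for what these rates mean in volume terms, the short sketch below converts the quoted figures into daily totals. It is a back-of-the-envelope illustration based only on the rates above, not a measurement from CCRC'08.

```python
# Back-of-the-envelope conversion of the sustained rates quoted above
# into daily data volumes (decimal units, 1 TB = 1000 GB).

SECONDS_PER_DAY = 86_400

def daily_volume_tb(rate_gb_per_s: float) -> float:
    """Data volume in TB accumulated over one day at a constant rate."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000.0

print(f"1.3 GB/s to tape -> {daily_volume_tb(1.3):.0f} TB/day")   # ~112 TB/day
print(f"3-4 GB/s in/out  -> {daily_volume_tb(3):.0f}-{daily_volume_tb(4):.0f} TB/day")
```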

7 Data transfer
- Each experiment sustained in excess of the target rates (1.3 GB/s) for extended periods
- Peak aggregate rates over 2.1 GB/s – no bottlenecks
- All Tier 1 sites were included

8 24x7 Support
- Status as of 27 Mar 08 (milestone table colour coding: done / late < 1 month / late > 1 month)
- Only 6 → 9 sites have tested their 24x7 support, and only 5 → 7 have put the support into operation
- Must be in place for May; understood by the Tier 1s now after the February experience

9 CCRC'08 (Feb) – results
- Preparation:
  - Focus on understanding missing and/or weak aspects of the service and on identifying pragmatic solutions
  - Main outstanding problems in the middleware were fixed (just) in time, and many sites upgraded to these versions
- The deployment, configuration and usage of SRM v2.2 went better than had been predicted, with a noticeable improvement during the month
- Despite the high workload, we also demonstrated (most importantly) that we can support this work with the available manpower, although essentially no effort remains for longer-term work
- If we can do the same in May – when the bar is placed much higher – we will be in a good position for this year's data taking
- However, there are certainly significant concerns about the available manpower at all sites – not only today, but also in the longer term, when funding is unclear

10 Tier 0/Tier 1 site reliability
- Target: sites 91%, and 93% from December; 8 best: 93%, and 95% from December
- See the quarterly report (QR) for the full status
- Monthly reliability, all Tier 0/Tier 1 sites:
    Sep 07: 89%  Oct 07: 86%  Nov 07: 92%  Dec 07: 87%  Jan 08: 89%  Feb 08: 84%
- 8 best sites: 93%, 95%, 96% over the same period
- (Colour coding in the original table: above target; above the 90% target)
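For readers unfamiliar with how these percentages are produced, the sketch below shows the kind of calculation involved: availability is the fraction of the month a site passes the monitoring tests, and reliability discounts scheduled downtime. This is an illustrative approximation of the WLCG definitions with hypothetical example numbers, not the official SAM/GridView reporting code.

```python
# Illustrative approximation of monthly site availability and reliability
# figures derived from test results; the real WLCG reports are produced
# by the SAM/GridView infrastructure, not by this sketch.

def availability(hours_up: float, hours_total: float) -> float:
    """Fraction of the whole period during which the site passed the tests."""
    return hours_up / hours_total

def reliability(hours_up: float, hours_total: float,
                hours_scheduled_down: float) -> float:
    """Like availability, but scheduled downtime is not counted against the site."""
    return hours_up / (hours_total - hours_scheduled_down)

# Hypothetical example month (720 h): 640 h passing tests, 24 h scheduled downtime.
print(f"availability = {availability(640, 720):.0%}")        # 89%
print(f"reliability  = {reliability(640, 720, 24):.0%}")     # 92%
```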

11 Tier 2 reliabilities
- Reliabilities published regularly since October
- In February 47 sites had > 90% reliability
- For the Tier 2 sites reporting (Feb 08): overall 76%; top 50% of sites 95%; top 20% of sites 99%; sites reporting 89 → 100
- For Tier 2 sites not reporting, 12 are in the top 20 for CPU delivered
- Fraction of Tier 2 CPU delivered (Jan 08): top 50% of sites 72%; top 20% of sites 40%; sites with > 90% reliability 70%

12 Reliability reporting
- Currently (Feb 08) all Tier 1 and 100 Tier 2 sites report reliabilities
- Recent progress – the MB set up a group to work on:
  - Agreement on equivalence of NDGF tests with those used at EGEE and all other Tier 1 sites – now in production at NDGF
    - Should also be used for the Nordic Tier 2 sites
  - Similar process with OSG (for US Tier 2 sites): tests only for the CE so far; agreement on equivalence, tests are in production, publication to SAM in progress
    - Missing: SE/SRM testing
    - Expect full production May 2008 (new milestone introduced)
- Important that we have all Tier 2s regularly tested and reporting
- Important that we have the correct Tier 2 federation contacts to follow up these issues

13 Applications Area
- Recent focus has been on major releases to be used for 2008 data taking:
  - QA process and nightly build system to improve the release process
  - Geant4 9.1 released in December
  - ROOT 5.18 released in January
- Two data analysis, simulation and computing projects in the PH R&D proposal (July 2007 whitepaper):
  - WP8-1 – Parallelization of software frameworks to exploit multi-core processors
    - Adaptation of experiment software to new generations of multi-core processors – essential for efficient utilisation of resources (an illustrative sketch follows this slide)
  - WP9-1 – Portable analysis environment using virtualization technology
    - Study how to simplify the deployment of complex software environments to distributed (grid) resources
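To illustrate the kind of change WP8-1 is about, without implying this is the project's actual code, here is a minimal sketch of event-level parallelism: independent events are processed by a pool of worker processes, one per core. The process_event function and the event list are hypothetical placeholders standing in for an experiment framework.

```python
# Minimal illustration of event-level parallelism on a multi-core node.
# This is NOT the WP8-1 code; process_event() and the event list are
# hypothetical placeholders standing in for an experiment framework.

from multiprocessing import Pool, cpu_count

def process_event(event_id: int) -> tuple[int, float]:
    """Stand-in for reconstructing or analysing one independent event."""
    # A real framework would run reconstruction or analysis here.
    return event_id, float(event_id) ** 0.5   # dummy "result"

if __name__ == "__main__":
    events = range(10_000)                     # hypothetical event IDs
    with Pool(processes=cpu_count()) as pool:  # one worker per core
        results = pool.map(process_event, events, chunksize=256)
    print(f"processed {len(results)} events on {cpu_count()} cores")
```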

14 Progress in EGEE-III
- EGEE-III now approved
  - Starts 1st May, 24 months duration (EGEE-II extended by 1 month)
- Objectives:
  - Support and expansion of the production infrastructure
  - Preparation and planning for the transition to EGI/NGI
- Many WLCG partners benefit from EGEE funding, especially for grid operations: the effective staffing level is 20–25% less
- Many tools (accounting, reliability, operations management) are funded via EGEE
  - Important to plan the long-term evolution of this
- Funding for middleware development significantly reduced
- Funding for specific application support (incl. HEP) reduced
- Important for WLCG that we are able to rely on EGEE priorities on operations, management, scalability, reliability

15 Comments on the EGI design study
- Goal is to have a fairly complete blueprint in June
  - Main functions presented to the NGIs at the Rome workshop in March
- Essential for WLCG that EGI/NGIs continue to provide support for the production infrastructure after EGEE-III
  - We need to see a clear transition and assurance of appropriate levels of support; the transition will come at exactly the time that LHC services should not be disrupted
- Concerns:
  - NGIs agreed that a large European production-quality infrastructure is a goal, but it is not clear that there is agreement on the scope
  - Reluctance to accept the level of functionality required
  - Tier 1 sites (and existing EGEE expertise) are not well represented by many NGIs
- WLCG representatives must approach their NGI representatives and ensure that EGI/NGIs provide the support we need

16 Power and infrastructure
- Expect power requirements to grow with CPU capacity
  - This is not a smooth process: it depends on new approaches and market-driven strategies (hard to predict), e.g. the improvement in cores/chip is slowing, and power supplies etc. are already > 90% efficient
  - No expectation of returning to the earlier capacity/power growth rate seen with the introduction of multi-core chips
- e.g. the existing CERN Computer Centre will run out of power in 2010
  - Current usable capacity is 2.5 MW
  - Given the present situation, Tier 0 capacity will stagnate in 2010 (a toy projection of this is sketched below)
- Major investments are needed for new Computer Centre infrastructure at CERN and at major Tier 1 centres
  - IN2P3, RAL, FNAL, BNL, SLAC already have plans
  - IHEPCCC report to ICFA at DESY in Feb '08
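The statement that the Tier 0 hits its power ceiling around 2010 is essentially a projection of installed load against the 2.5 MW usable capacity. The sketch below shows the shape of such a projection only; the 2008 starting load and the yearly growth factor are hypothetical placeholders, not figures from this report.

```python
# Toy projection of computer-centre power demand against the 2.5 MW
# usable capacity quoted above. The 2008 starting load and the yearly
# growth factor are hypothetical placeholders for illustration only.

USABLE_CAPACITY_MW = 2.5     # from the report
start_load_mw = 1.7          # hypothetical installed load in 2008
growth_per_year = 1.25       # hypothetical net growth after efficiency gains

load = start_load_mw
for year in range(2008, 2013):
    flag = "  <-- exceeds usable capacity" if load > USABLE_CAPACITY_MW else ""
    print(f"{year}: {load:.2f} MW{flag}")
    load *= growth_per_year
```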

17 Summary
- CCRC'08 first phase was seen as a success
  - SRM and MSS deployment was achieved
  - Project and experiment targets were achieved
- Preparations for CCRC'08 phase 2 are under way
  - Will be a full test of the entire system – all experiments together
  - Tuning of tape access with real use patterns – may require experiments to reconsider analysis patterns
- Resource ramp-up: based on the experiences and problems with the 2008 procurements
  - Must ensure in future years that allowance is made for delays and problems
  - Important that the yearly April schedules are met – to be ready for accelerator start-up

18 Summary
- Remaining Tier 2 federations must now ensure that they regularly report (and verify) accounting and reliability data
  - Important that we have the correct contact people for the Tier 2 federations
- WLCG – especially the Tier 1s – should influence the direction of the EGI design study
  - Must ensure that we see a clear and appropriate strategy emerging that is fully supported by the NGIs
  - Must engage the NGI representatives in this
- CERN and the Tier 1s must ensure that their computer centre infrastructure can accommodate the pledged capacity beyond 2009