Project Status & Overview
Ian Bird, LCG Project Leader
LHCC Referee Meeting, 22nd September 2008

2 Since July...
- Experiments and WLCG have continued in "production mode"
  - Daily operations meetings, with written summaries
  - Main problems trigger post-mortems (see later); MB follow-up of specific points as needed
- Workloads and data transfers have remained at high levels
  - Some data files seen from cosmics, injection tests, Sep 10, ...
- Reasonably sustainable
- Probably too many significant changes are still going on... especially for storage

3 Ongoing work [charts]: jobs and transfers have continued since CCRC'08 in May, with jobs sustained at ~400k/day.

4 Storage Services Summary
Recall the 2 main points:
- Outstanding functionality: agreed short-term solutions to be provided by end of 2008
  - In progress
- Operational issues and considerations arising from use for data
  - Addressed in the September GDB
- Configuration problems for dCache at 2 sites, now understood and being addressed
- Clarified support model: sites should really attend the weekly phone conference
- Operational instabilities: probably the largest outstanding problem
  - See the post-mortem summary in the Tier 1 report
- Some concern over the general inability to test dCache releases in advance
  - But each site is so different that they really have to do pre-production testing before deployment

5 [Charts: installed capacity vs. MoU commitment (both including an efficiency factor) for ALICE, ATLAS, CMS and LHCb]

6 Resources
- Awaiting the first RSG reports
  - The RSG will formally report to the November C-RRB
- Already received a request from ATLAS to double 2009 Tier 0 + CAF resources
- BUT:
  - The CERN (Tier 0/CAF) budget is fixed
  - LCG cannot decide/prioritise between allocations to experiments
  - The LHCC and CERN scientific management will need to define the priorities: given the budget envelope, what fraction should be given to each experiment?
- For the other sites (Tier 1 and Tier 2)
  - The RRB (funding agencies) will note the RSG recommendations and perhaps adjust their pledges; this is still very much unknown...

7 Site reliability: CERN + Tier 1s [chart, as presented at the time of the mini-review]

8 Site Reliability: CERN + Tier 1s [chart, including CCRC'08]
- But we know that these standard tests hide problems (see the note below): e.g. RAL Castor was unavailable to ATLAS for 2 weeks in August...
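To put numbers on this: as I understand the WLCG/GridView conventions, the monthly figures are computed from the SAM test results as

    $$ \text{availability} = \frac{T_\text{up}}{T_\text{total}}, \qquad
       \text{reliability} = \frac{T_\text{up}}{T_\text{total} - T_\text{scheduled downtime}} $$

where T_up is the time during which the standard tests pass. Scheduled downtime is forgiven in the reliability figure but not in the availability; and since T_up is defined by generic tests, a service can count as "up" while being unusable for one experiment, which is exactly the RAL Castor/ATLAS case above and the motivation for the VO-specific testing on the next slide.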

9 VO-specific testing... [charts]

10 Site Reliability: Tier 2s [chart, including CCRC'08]
- OSG Tier 2s are now reporting regularly; still a few remaining issues to resolve over the next couple of months

11 Middleware: Baseline Services
The basic baseline services, from the TDR (2005):
- Storage Element
  - Castor, dCache, DPM (StoRM added in 2007)
  - SRM 2.2: deployed in production, Dec 2007
- Basic transfer tools: GridFTP, ...
- File Transfer Service (FTS)
- LCG File Catalog (LFC)
- LCG data management tools: lcg-utils
- POSIX I/O: Grid File Access Library (GFAL) (see the sketch after this list)
- Synchronised databases T0 → T1s: 3D project
- Information System: scalability improvements
- Compute Elements
  - Globus/Condor-C: improvements to the LCG-CE for scale/reliability
  - Web services (CREAM)
  - Support for multi-user pilot jobs (glexec, SCAS)
- gLite Workload Management: in production
- VO Management System (VOMS)
- VO Boxes
- Application software installation
- Job monitoring tools
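To make the "POSIX I/O" item concrete: GFAL exposes open/read/close-style calls over the grid storage protocols, so application code can read a replica without caring which storage implementation sits behind it. A minimal sketch, assuming the GFAL 1.x C API; the header name and SURL here are illustrative assumptions, not taken from the slides:

    /* POSIX-style grid I/O via GFAL (assumed GFAL 1.x C API). */
    #include <stdio.h>
    #include <fcntl.h>
    #include "gfal_api.h"

    int main(void)
    {
        char buf[4096];
        /* Hypothetical SURL; GFAL resolves it via SRM to whichever
         * back-end (Castor, dCache, DPM) the site runs. */
        int fd = gfal_open("srm://se.example.org/dpm/example.org/home/myvo/data.root",
                           O_RDONLY, 0);
        if (fd < 0) {
            perror("gfal_open");
            return 1;
        }
        ssize_t n = gfal_read(fd, buf, sizeof(buf));
        if (n < 0)
            perror("gfal_read");
        else
            printf("read %zd bytes\n", n);
        gfal_close(fd);
        return 0;
    }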

12 Middleware... coming
- Since July:
  - Continuing deployment of bug fixes, patches, etc. for problems found in production
  - Clients on new 'platforms':
    - SL5 WN 32/64-bit, SL5 UI 32-bit
    - SL4/SL5 Python 2.5
    - Debian 4 32-bit WN
    - SL4/SL5 + new compiler version
- Imminent or available updates:
  - FTS/SL4 (available)
  - Globus bug fixes (available)
  - lcg-CE: further performance improvements
  - Results of the WN working group: cluster publishing

13 New services
- glexec/SCAS (see the sketch after this slide)
  - glexec is being verified for use with experiment frameworks in the PPS (setuid mode, without SCAS)
  - SCAS is still in 'developer testing'
- CREAM
  - First release to production imminent, but NOT as a replacement for the LCG-CE
  - Issues with proxy renewal
- WMS/ICE
  - Patch under construction
- GLUE 2
  - The OGF public comment period is now over; the GLUE WG will incorporate the comments during October
  - Will be deployed in parallel, as it is not backward compatible
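Since glexec in setuid mode is central to the pilot-job milestones, a minimal sketch of what it means in practice may help. This is my illustration, not project code: the GLEXEC_CLIENT_CERT variable follows the glexec documentation as I recall it, and all paths are hypothetical.

    /* Sketch: a pilot framework handing a payload to glexec (setuid mode). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Proxy certificate of the payload's real owner, delegated to
         * the pilot. glexec maps it to a local account via LCAS/LCMAPS
         * (or, once deployed, a central SCAS service). */
        setenv("GLEXEC_CLIENT_CERT", "/tmp/payload_proxy.pem", 1);

        /* Replace this process with the payload, now running under the
         * mapped identity rather than the pilot's. Paths illustrative. */
        execl("/opt/glite/sbin/glexec", "glexec",
              "/bin/sh", "/path/to/payload.sh", (char *)NULL);

        perror("execl(glexec)");   /* reached only if exec fails */
        return 1;
    }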

14 SL5
- The schedule announced by CERN envisaged SLC5 lxbatch nodes in October
  - "Formal certification not yet started"
  - Middleware can progress anyway with SL5 or CentOS 5
- We are on course, barring surprises in runtime testing...
- Based on VDT 1.8; hope to upgrade to 1.10 very soon and base the rest of the release on this
- Contemplating an approach to the build which would no longer allow co-location of services, apart from explicit exceptions
- Full schedule:
  - Clients
  - VOBOX
  - Storage 32/64-bit
  - CE (NOT the LCG-CE)
- Target: whenever ready, but before June 2009

15 Milestone Summary [table]

16 [Table of service targets and maximum response times, not reproduced]
The response times in the table refer only to the maximum delay before action is taken to repair the problem. The mean time to repair is also a very important factor, covered in the table only indirectly through the availability targets (illustrated below). All of these parameters will require an adequate level of staffing of the services, including on-call coverage outside of prime shift. These parameters define the minimum levels of service and will be reviewed by the operational boards of the WLCG Collaboration.
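A rough illustration (my arithmetic, not from the slides) of why the availability targets are what actually bound the mean time to repair: with mean time between failures MTBF and mean time to repair MTTR,

    $$ A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
       \quad\Rightarrow\quad
       \mathrm{MTTR} \le \mathrm{MTBF}\,\frac{1 - A}{A} $$

so a 99% availability target with roughly one failure per month (MTBF ≈ 720 h) allows an MTTR of at most about 7 hours, regardless of how quickly repair work is merely started.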

17 Milestone Summary
- VO Box SLAs: in use, generally being finalised and signed off; in test by ATLAS, waiting for confirmation to deploy more widely
- Agreed specific functionality: end of 2008
- CAF: defined during CCRC
- OSG reliability reporting: in place
- Tape metrics: ASGC will duplicate the CERN metrics; IN2P3 have some metrics from HPSS, but reporting is not yet automated
- Procurements: 2008, see status in the Tier 1 talk; 2009, reporting status this month
- glexec (for pilot jobs): glexec and experiment frameworks OK; waiting for SCAS to be available
- VO-specific availability testing: validation in progress

18 Concerns etc.
- Operational instabilities
  - Especially of storage systems
  - Is the staffing of MSS installations at sites adequate?
- Resources:
  - Sharing of allocations: guidelines needed
  - Extended delays for 2008 resources; hope 2009 is better (but still vulnerable to delays/problems)
  - The overall planning/pledging process is probably unrealistic and we need to adjust it (a C-RRB topic):
    - 5-year outlook → 3 years?
    - The RSG should be looking +1 year in advance
    - Confirmed pledges/RRB need to come earlier in the cycle
- User analysis at CERN
  - But NB the strategy is that Tier 2s are for analysis
  - Setting this up is late (no clear idea of what was needed; previous delays in Castor)