Tier 1 status: a summary based upon an internal review. Volker Gülzow, DESY. LCG, LHCC comprehensive review 2006, Sep. 26th.

Presentation transcript:

Slide 1: Tier 1 status, a summary based upon an internal review. Volker Gülzow, DESY. (LCG, LHCC comprehensive review 2006, Sep. 26th)

Slide 2: Information sources
Input: Review of Tier 1 readiness, June 8th, CERN
Reviewers: John Gordon (RAL), Volker Gülzow (DESY), Chair, Alessandro de Salvo (INFN Rome), Jeff Templon (NIKHEF), Frank Würthwein (UCSD)
- From a questionnaire to the Tier 1's and from questions to the experiments (Tier 1's, middleware, interoperability)
- From documents from the MB and CRRB
- CTDRs + supplement
- Tier 1 milestone plans
- LCG wikis
- From other sources

Slide 3: Review Process I
Mandate (discussed in MB): "… review pays specific attention to the following topics:
- state of readiness of CERN and the Tier-1 centres, including operational procedures and expertise, 24 x 7 support, resource planning to provide the required capacity and performance, site test and validation programme;
- the essential components and services missing in SC4 and the plans to make these available in time for the initial LHC service;
- the EGEE-middleware deployment and maintenance process, including the relationship between the development and deployment teams, and the steps being taken to reduce the time taken to deploy a new release;
- the plans for testing the functionality, reliability and performance of the overall service;
- interoperability between the LCG sites in EGEE, OSG and NDGF;"

Slide 5: Tier1/2 Summary Table
40 Tier 2 centres have their data included in the above table. 9 more centres plan to join as soon as possible.
Source: Chris Eck, CRRB April 2006

Slide 6: Overall Comments to Tier 1's
- The Tier 1 requirements are currently changing due to the accelerator time schedule; new resource planning from the experiments will show up in October.
- There is a lot of diversity among the Tier 1's, i.e. in:
  - Background
  - Technology
  - Funding
  - Staffing
  - Number of experiments, size

Slide 7: Overall Comments to Tier 1's (June 06)
- Not all the Tier 1's have reached the level of readiness that is required for LHC start-up. Key factors are organisational gaps in implementing off-hour service, funding problems, and communication with the experiments (a two-sided problem).
- There are severe risks with the scalability of the resources.
- The manpower situation at the Tier 1's was not always transparent during the review.

Slide 8: [Figure] Source: Les Robertson

Slide 9: Overall Comments to Tier 1's
- The overall monitoring of the Tier 0/1/2 complex is of very great importance.
- The Tier 2 associations are not completely clear. This needs immediate clarification.
- The support concept for Tier 2/Tier 3 centres by the Tier 1's is not well determined. This is partly because of unclear requirements from the experiments.
- At this stage, one should no longer make a distinction between production and SC4 infrastructure (the experiments complain).

Slide 10: Milestone plans

Slide 11: "Communication"
- Clear (and redundant) contact persons (e.g. liaison officers) have to be nominated on both sides.
- Clear, precise and well-structured information is needed from the experiments.
- Web-based monitoring pages for operational issues should be made available by the experiments.

Slide 12: "Communication"
- Operations meetings (OPS/SCM/RSM) are important -> mandate etc. reviewed by MB.
- GGUS is a well-accepted tool and should be used as the main tracking tool. Further improvements are needed (e.g. GUI, amount of mails, support for the full set of problem categories, "when can a case be declared closed?").

Slide 13: "24x7"
- A full 24x7 service, in the sense of live monitoring and alarming and, for a certain class of problems, "immediate" reaction, is required.
- An "on call" service still has to be set up at many sites. This requires:
  1. Having the right tools, which are often not sufficient. For the setup of tools, an initiative (e.g. via HEPiX) should be started to sharpen the tool set, which would be helpful for Tier 2's and Tier 3's as well.
  2. Having adequate staff available -> management. This is in the focus of the MB.

Slide 14: "Management issues"
- The funding situation is not clear at every centre. A revised ramp-up plan may help. This has to be followed carefully.
- Clear, up-to-date and realistic requirements from the experiments would help the Tier 1's to acquire resources on time.
- At some centres critical work is carried out by temporary staff; depending on the country, this can cause severe problems.

Slide 15: "Middleware"
- The introduction of gLite 3 was a bit "bumpy"; people were somewhat confused.
- Many emotions were expressed prior to real experience, which was not helpful.
- There were lots of complaints but very little error reporting.
- The "post mortem" analysis of the process was very much appreciated.

Slide 16: "Middleware"
Sites were not able to meet the tight time constraints. Reasons were (and are):
- lack of manpower,
- lack of understanding,
- site localization,
- coordination with the needs of non-LHC experiments.

Slide 17: "Middleware"
- Stable production environments have to be the no. 1 goal today. There is worry about effort diverted to side projects.
- The software was not mature enough; we need to find ways to guarantee the readiness of software when it is released.
- The representation of operational issues in the TCG is not adequate; the Tier 1's should be better represented and their input has to be taken into account.
- The TCG should include operational issues in the priority list and allow sites to influence the ranking.
- Full VOMS needed!
- The error reporting from the users has to improve.
- The middleware urgently needs proper operational interfaces:
  - Logging
  - Diagnostics
  - Service operation interfaces

Slide 18: "Interoperability"
- The experiments should make the importance of the problem clear.
- If required, interoperability of the grids needs more attention and manpower than there is today.
- Can we expect uniform testing (SFTs), monitoring, accounting, and metrics for ALL WLCG sites?

Slide 19: Conclusion
- Excellent work was done at the Tier 1's on many tasks.
- The cultural gap has to be bridged.
- The 24x7 case is still largely open.
- Monitoring of sites is strongly recommended.
- The funding and staffing situation needs careful attention.
- Middleware robustness and operational hooks are needed.
- More binding action is required in certain areas (on all Tier levels).
- The new ramp-up leaves no room to lean back.