CERN LCG Deployment Overview Ian Bird CERN IT/GD LCG Internal Review 17-19 November 2003.

Presentation transcript:

CERN LCG Deployment Overview Ian Bird CERN IT/GD LCG Internal Review November 2003

LCG Grid Deployment Area
• Goal: deploy & operate a prototype LHC computing environment
• Scope:
  – Integrate a set of middleware and coordinate and support its deployment to the regional centres
  – Provide operational services to enable running as a production-quality service
  – Provide assistance to the experiments in integrating their software and deploying in LCG; provide direct user support
• Deployment goals for LCG-1:
  – Production service for Data Challenges in 2004
    – Initially focused on batch production work
  – Experience in close collaboration between the Regional Centres
  – Learn how to maintain and operate a global grid
  – Focus on building a production-quality service
    – Robustness, fault-tolerance, predictability, and supportability take precedence
  – Understand how LCG can be integrated into the sites’ physics computing services
    – Move away from dedicated testbeds

Deployment Activities
• Three main areas of deployment activity:
• Middleware:
  – Testing and certification
  – Packaging, configuration, distribution and site validation
  – Support – problem determination and resolution; feedback to middleware developers
• Operations:
  – Grid infrastructure services
  – Site fabrics run as production services
  – Operations centres – trouble and performance monitoring, problem resolution – 24x7 globally
• Support:
  – Experiment integration – ensure optimal use of system
  – User support – call centres/helpdesk – global coverage; documentation; training

Implementation – 1
• A core team at CERN – Grid Deployment group
• Collaboration of the regional centres – through the Grid Deployment Board
• Partners take responsibility for specific tasks (e.g. GOC)
• Focussed task forces as needed
• Collaborative joint projects – via JTB, grid projects, etc.
• CERN deployment group (includes LCG-funded staff, fellows, etc.):
  – Core preparation, deployment, and support activities
  – Integration, packaging, debugging, development of missing tools
  – Deployment coordination & support, security & VO management
  – Experiment integration and support
• GDB: country representatives for regional centres
  – Address policy, operational issues that require general agreement
  – Brokered agreements on:
    – Initial shape of LCG-1 via 5 working groups
    – Security
    – What is deployed

Implementation – 2
Several long-term groups, set up by the GDB:
• Security group
  – Members: site security contacts, experiments
  – Proposes policies, usage rules, registration, etc.; operational issues – audit, incident response
• Grid Operations Centre – RAL
  – Includes a GOC steering group
• Call Centre – FZK
  – Works together with GOC groups
• These will be discussed in later talks

Task forces
Limited-time task forces set up to address specific issues:
• Mass storage access
  – Working to agree and implement a common strategy to provide access to mass storage (tape and disk) at LCG sites
    – For implementation details see discussion under GTA
• Packaging, installation, configuration tools
  – To address the problems of simplifying m/w installation and config
• …

Collaborative activities
• Via the HICB-JTB
  – Members from US and EU grid projects
  – Address common issues of interoperability
    – GLUE schema, testing
  – Several other issues of interoperability to be addressed under its umbrella:
    – Replica location services
    – Installation tools
    – Monitoring
    – Workload management
  – Demonstrate interoperability …
• Through GGF
  – PNPA research area – bring experiences and requirements to GGF
  – Several relevant areas: production grid management, security, user support, SRM, etc.

Collaborative Activities – 2
Other collaborative activities include:
  – Russian participation in testing group
    – Team of 3 with one person (3-month rotation) at CERN
  – Collaboration on monitoring with INFN
    – GridIce tool adapted to LCG needs
  – Collaboration with Indian group
    – Monitoring, problem tracking tools
  – US-LHC VO management tools
    – Workshop at CERN in December

2003 Milestones
Project Level 1 deployment milestones for 2003:
  – July: Introduce the initial publicly available LCG-1 global grid service
    – With 10 Tier 1 centres in 3 continents
  – November: Expanded LCG-1 service with resources and functionality sufficient for the 2004 Computing Data Challenges
    – Additional Tier 1 centres, several Tier 2 centres – more countries
    – Expanded resources at Tier 1s (e.g. at CERN make the LXBatch service grid-accessible)
    – Agreed performance and reliability targets
    – This is what is now called LCG-2

Progress in 2003
• LCG-0
  – During March – May deployed first set of m/w based on VDT and EDG 1.4
  – To put procedures in place as far as possible
  – Was adapted by CMS (and LCG) for production work
• LCG-1 – 1st milestone in July
  – Was ~3 months late – middleware was not available until June
  – Process to prepare for deployment and deploy took expected time
  – Now deployed to 21 sites
    – But resources limited until experiments have validated it
• LCG-2 – 2nd milestone
  – Needed updates for DCs, but problems were introduced
  – Goal is deployment in December – as an upgrade to LCG-1
  – On track to achieve this
• Milestones back on track – BUT
  – Experiments will have had little experience with the LCG service
  – Operational issues can only now start to be addressed

Achievements – 2003
• Put in place the Integration and Certification process:
  – Essential to prepare m/w for deployment – the key tool in trying to build a stable environment
  – Used seriously since January for LCG-0,1,2 – also provided crucial input to EDG
  – LCG is more stable than earlier systems
• Set up the deployment process:
  – Tiered deployment and support system is working
  – Currently support load on small team is high, must devolve to GOC
• Support experiment deployment on LCG-0,1
  – User support load high – must move into support infrastructure (FZK)
  – CMS use of LCG-0 in production
• Produced a comprehensive User Guide
• Put in place security policies and agreements
  – Particularly important agreements on registration requirements
• Basic Operations Centre and Call Centre frameworks in place
• Expect to be ready for the 2004 DCs
  – Essential infrastructures are ready, but have not yet been production tested
  – And improvements will happen in parallel with operating the system

Problems
• Middleware is not yet production quality
  – Although a lot has been improved, still unstable, unreliable
  – Some essential functionality was not delivered – LCG had to address
• Deployment tools not adequate for many sites
  – Hard to integrate into existing computing infrastructures
  – Too complex, hard to maintain and use
• Middleware limits a site’s ability to participate in multiple grids
  – Something that is now required for many large sites supporting many experiments, and other applications
• We are only now beginning to try and run LCG as a service
  – Find many configuration issues that have no tools to resolve
• Delays have meant that we could not yet address these fundamental issues that we had hoped to this year

Changing landscape
• The view of grid environments has changed in the past year
• From
  – A view where all LHC sites would run a consistent and identical set of middleware
• To
  – A view where large sites must support many experiments, each of which has grid requirements
  – National grid infrastructures are coming – catering to many applications, and not necessarily driven by HEP requirements
• We have to focus on interoperating between potentially diverse infrastructures (“grid federations”)
  – At the moment these have the same underlying m/w
  – But modes of use and policies are different
• Need to have agreed services, interfaces, protocols
• The situation is now more complex than anticipated

Expected Developments in 2004
• General:
  – LCG-2 will be the service run in 2004 – aim to evolve incrementally
  – Goal is to run a stable service
• Some functional improvements:
  – Extend access to MSS – tape systems, and managed disk pools
  – Distributed replica catalogs – with Oracle back-ends
    – To avoid reliance on single service instances
• Operational improvements:
  – Monitoring systems – move towards proactive problem finding, ability to take sites on/offline
  – Continual effort to improve reliability and robustness
  – Develop accounting and reporting
• Address integration issues:
  – With large clusters, with storage systems
  – Ensure that large clusters can be accessed via LCG
  – Issue of integrating with other experiments and apps
• Move to EGEE
  – EGEE will operate the service in Europe
  – LCG deployment and operations teams will be part of the core activity
  – Understanding how to peer with other Grid infrastructures will be essential

Deployment Activities Human Resources

Activity                            | CERN/LCG   | External
Integration & Certification         | 5          | VDT testers group
Debugging/development/mw support    | 3          |
Testing                             | 3          | 2
Experiment Integration & Support    | 4 (+1 Dec) |
Deployment & Infrastructure Support | 6.5        | RC system managers
Security/VO Management              | 2          | Security Task Force
Operations Centres                  |            | RAL + GOC Task Force
Grid User Support                   |            | FZK + US Task Force
Management                          | 1          |
Totals                              | 24.5 (+1)  |

Slide annotations:
  – Team of 3 Russians have 1 at CERN at a given time (3 months)
  – Refer to Security talk
  – Refer to Operations Centre talk
  – In collaboration
• The GDA team has been very understaffed – only now has this improved with 6 new fellows in October
• There are many opportunities for more collaborative involvement in operational and infrastructure activities

Summary
• Initial milestone was late
  – Middleware came late, functionality less than hoped for
• Many issues with deployment …
  – Packaging, dependencies, incompatibility of m/w installs with others, requirement to control full machine environment, etc. …
• … and with lack of functionality for experiments …
• … were due to the legacies LCG inherited
  – Working hard to get away from them – but it is a complex problem
  – Lack of time to adapt research products to a production service environment
• Very little time to run this as a service – we are still resolving basic operational issues
  – This will continue during 2004
• BUT – we will have a service adequate for the DCs in 2004, with appropriate functionality
  – And the infrastructure is there to provide all aspects of support
• There is still much work to be done!

Agenda
• Certification and Testing activities – Marco Serra
• Deployment and Experiment Support – Markus Schulz
• Regional Centre experiences in LCG-1 – Federico Ruggieri
• Security – Dave Kelsey
• Operations Centres and User Support – Trevor Daniels
• Experiment reports – experiments