INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Status of EGEE Operations Ian Bird, CERN SA1 Activity Leader EGEE 3rd Conference Athens,


INFSO-RI Enabling Grids for E-sciencE
Status of EGEE Operations
Ian Bird, CERN, SA1 Activity Leader
EGEE 3rd Conference, Athens, 18th April 2005

Overview
- Overall activity status
  - Service & Operations
- Planning for remainder of project
  - Main focus of activities
  - gLite migration
- Summary
  - Tomorrow's plenary session for technical details

Operations Status

Computing Resources: April 2005
[Map legend: country providing resources; country anticipating joining]
In LCG-2:
- 131 sites, 30 countries
- >12,000 CPUs
- ~5 PB storage
Includes non-EGEE sites:
- 9 countries
- 20 sites

Infrastructure metrics
Countries, sites, and CPU available in EGEE production service
[Table; numeric values not recoverable from the transcript]
- Columns: Region, countries, sites, CPU M6 (TA), CPU M15 (TA), CPU actual
- EGEE partner regions: CERN, UK/Ireland, France, Italy, South East, South West, Central Europe, Northern Europe, Germany/Switzerland, Russia; EGEE total
- Other collaborating sites: USA, Canada, Asia-Pacific, Hewlett-Packard; Total other
- Grand Total

Service Usage
VOs and users on the production service
- Active HEP experiments: 4 LHC, D0, CDF, Zeus, Babar
- Other active VOs: Biomed, ESR (Earth Sciences), Compchem, Magic (Astronomy), EGEODE (Geo-Physics)
- 6 disciplines
- Registered users in these VOs: 600
- In addition, there are many VOs that are local to a region and supported by their ROCs, but not yet visible across EGEE
Scale of work performed
- LHC Data Challenges 2004:
  - >1 M SI2K years of CPU time (~1000 CPU years)
  - 400 TB of data generated, moved and stored
  - 1 VO achieved ~4000 simultaneous jobs (~4 times CERN grid capacity)
[Chart: number of jobs processed per month]
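The CPU-time figure on this slide is given in two units. As a rough check, and assuming a nominal rating of about 1,000 SI2K (SPECint2000) per CPU, which is what the slide's own ">1 M SI2K years is about 1000 CPU years" implies rather than a measured value, the conversion can be sketched as follows. The function name and the default rating are illustrative assumptions.

```python
def si2k_years_to_cpu_years(si2k_years: float, si2k_per_cpu: float = 1000.0) -> float:
    """Convert SI2K-years of work into CPU-years.

    si2k_per_cpu is an assumed nominal per-CPU rating (not from a benchmark run).
    """
    if si2k_per_cpu <= 0:
        raise ValueError("si2k_per_cpu must be positive")
    return si2k_years / si2k_per_cpu


# 1 M SI2K-years at the assumed 1,000 SI2K per CPU
cpu_years = si2k_years_to_cpu_years(1_000_000)
print(cpu_years)
```

A different per-CPU rating changes the result proportionally, so the "~1000 CPU years" figure should be read as an order-of-magnitude estimate.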

SA1: Operations Structure
- Operations Management Centre (OMC)
- Core Infrastructure Centres (CIC)
  - Manage daily grid operations: oversight, troubleshooting
  - Run essential infrastructure services
  - Provide 2nd-level support to ROCs
  - UK/I, Fr, It, CERN, + Russia (M12)
  - Weekly rotation in place since October
  - Taipei also runs a CIC
- Regional Operations Centres (ROC)
  - Act as front-line support for user and operations issues
  - Provide local knowledge and adaptations
  - One in each region; many are distributed
- User Support Centre (GGUS)
  - At FZK: manages the PTS and provides a single point of contact (service desk)
  - Not foreseen as such in the TA, but the need is clear

Operations Procedures
Driven by experience during the 2004 Data Challenges, and reflecting the outcome of the November Operations Workshop.
Operations procedures cover:
- Roles of CICs, ROCs, and RCs
- Weekly rotation of operations centre duties (CIC-on-duty)
  - Process in place since October
- Daily tasks of the operations shift
  - Monitoring (tools, frequency)
  - Problem reporting: problem tracking system; communication with ROCs & RCs
  - Escalation of unresolved problems
  - Handing over the service to the next CIC
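The weekly CIC-on-duty rotation can be pictured as a simple round-robin. The sketch below is only an illustration: the CIC names come from the operations-structure slide, but their order, the start date, and the function name are assumptions, and the real rota may have been arranged differently.

```python
from datetime import date, timedelta
from itertools import cycle, islice

# CICs named in the talk; the rotation order here is an assumption.
CICS = ["UK/I", "France", "Italy", "CERN", "Russia"]


def cic_on_duty_schedule(start: date, weeks: int, cics=CICS):
    """Return [(week_start, cic), ...] for a plain weekly round-robin."""
    if not cics:
        raise ValueError("at least one CIC is required")
    rota = islice(cycle(cics), weeks)
    return [(start + timedelta(weeks=i), cic) for i, cic in enumerate(rota)]


# Hypothetical start date, chosen only for illustration.
schedule = cic_on_duty_schedule(date(2005, 4, 18), weeks=6)
for week_start, cic in schedule:
    print(week_start.isoformat(), cic)
```

With five centres in the list, each one is on duty once every five weeks, and the hand-over step in the procedures corresponds to the boundary between consecutive entries.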

New Release Process (simplified)
[Flow diagram with numbered steps 1 to 7; labels in order of appearance]
- Inputs: C&T, EIS, GIS, GDB, Applications, RC, CICs
- Bugs/Patches/Tasks (Savannah)
- Head of Deployment, Developers, Applications: prioritization & selection
- List for next release (can be empty)
- Integration & first tests (C&T)
- Internal Releases
- User-level install of client tools (EIS)
- Full deployment on test clusters (6); functional/stress tests, ~1 week (C&T)
- Assign and update cost (Bugs/Patches/Tasks, Savannah)
- Components ready at cutoff
- Internal Client Release
- Client Release, Service Release, Updates Release, Core Service Release (C&T)

Deployment process
[Flow diagram; labels in order of appearance]
- Release(s); certification is run daily
- Update User Guides (EIS); Update Release Notes (GIS)
- Release Notes, Installation Guides, User Guides
- Re-certify (CIC)
- Release Client Release
- Deploy Client Releases (user space) (GIS)
- Deploy Service Releases (optional) (CICs, RCs)
- Deploy Major Releases (mandatory) (ROCs, RCs)
- YAIM
- Timing labels: every month; every 3 months on fixed dates (!); at own pace

Planning for next year

Future work: comments from review
- Review: "Testing and software packaging will be critical to success. Reinforce these also intellectually very demanding activities even further."
  - Yes, this is agreed!
- Review: "Work hard on event-based monitoring techniques, triggering preventive maintenance actions, to improve the stability of the Grid infrastructure. Implement a strong mechanism to quickly isolate unstable sites in the production Grid."
  - These are both part of the ongoing program of work
  - Use R-GMA as the monitoring framework; build triggers and alarms on top
  - Better mechanism to remove sites: web interface to allow a VO to select
- Review: "Improve the middleware deployment process (technical, organisational) even further to increase the stability of the infrastructure and consequently improve the job success rate and reduce the load on the support team."
  - Already updated and streamlined the deployment and release process and improved configuration mechanisms

month plan
No major changes to goals or work.
Areas of work focus:
- Migration to gLite
  - See next slides
- Improving operational and grid reliability
  - Follow the recommendations of the review discussed above
  - Improve monitoring systems: build reactive alarms
  - Site isolation: need a simple mechanism (CIC tool) to remove sites (bad sites, security problems, etc.)
- Improving user support
  - In progress; need a recognised, usable service by mid-year
- 24x7 service availability
  - Availability of the service rather than of components
  - Identify critical services
  - Issues: on-call support, hot stand-by machines, etc. (might need work on middleware to support this!)
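The "reactive alarms" and "site isolation" items can be illustrated together. The sketch below is purely hypothetical: it does not use R-GMA or any EGEE tool, and the class name, the threshold of three consecutive failures, and the site names are all assumptions made for the example.

```python
from collections import defaultdict


class SiteMonitor:
    """Track consecutive test failures per site and isolate unstable sites."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures   # hypothetical alarm threshold
        self._failures = defaultdict(int)  # site -> consecutive failures
        self.isolated = set()              # sites currently removed from use

    def record(self, site: str, passed: bool) -> bool:
        """Record one test result; return True if it raised a new alarm."""
        if passed:
            self._failures[site] = 0
            return False
        self._failures[site] += 1
        if self._failures[site] >= self.max_failures and site not in self.isolated:
            self.isolated.add(site)
            return True
        return False

    def reinstate(self, site: str) -> None:
        """Put a site back into use, e.g. after it has been re-certified."""
        self.isolated.discard(site)
        self._failures[site] = 0

    def usable(self, sites):
        """Filter a list of sites down to those not isolated."""
        return [s for s in sites if s not in self.isolated]


monitor = SiteMonitor(max_failures=3)
results = [("site-a", False), ("site-b", True), ("site-a", False), ("site-a", False)]
alarms = [site for site, ok in results if monitor.record(site, ok)]
print(alarms)
print(monitor.usable(["site-a", "site-b"]))
```

Counting consecutive failures rather than any single failure is one way to avoid isolating a site on a transient error; a real tool would also need an explicit route for security-driven removals, which the slide lists separately.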

Review recommendations to SA1
Recommendation: "The migration path to gLite needs to be better planned, as it is inherently difficult to support two different grid software stacks indefinitely. More specifically, establishing a fixed time-line for migration as well as deprecation deadlines for LCG-2 services, plus possibly identifying who would be the earliest adopters from the application side and the time-line for their possible early committal, would be essential; otherwise, existing users may not be motivated to migrate."
Response:
- The migration plan is being worked out in detail, but will be driven by experience in the certification and pre-production deployment
- There must be a migration plan, not a switch from old to new
- Early adopters include LCG; others should be identified via NA4

Migration to gLite
- Migration strategy
  - Needs to be incremental rather than big-bang, as has been stated for a year
- 2 activities in parallel:
  - Deploy components into the LCG-2 certification test-bed and then to pre-production
  - Deploy pre-production sites in parallel
- PPS and Production
  - Are evolutionary: LCG-2 to gLite components
- Cannot provide LCG-2 end-of-life estimate/deadlines
  - LCG-2 is the fallback solution
- Applications must test services and decide which ones they need
[Diagram labels: LCG-2 (=EGEE-0); prototyping; product; LCG-3 (=EGEE-x?); product]

Review recommendations to SA1
Recommendation: "Consider the current gLite as a stepping stone towards a more robust standards-based infrastructure, rather than a final deployment solution. Select additional components for integration and deployment through collaborations with other international middleware R&D initiatives."
Response:
- Work with Globus, VDT, OSG, etc. on common solutions/interfaces, but this has to be driven by the applications and by experience from operations
- Should be in a position to deploy components needed by the applications
- The integration and certification process is the mechanism for selecting other components

Review recommendations to SA1
Recommendation: "Continue to conduct application-driven investigation that may result in complex usage scenarios and consider how the advanced middleware and infrastructure would support them in a viable manner. As such, keep a keen eye on new generations of production-level Grid middleware from various international groups that go beyond gLite features."
Response:
- For HEP, data challenges and service challenges bring specific goals and targets (and timescales); this will continue
- Other applications might consider similar exercises: define some goals

Milestones for rest of project
- M14: full production grid in production
  - 9 ROCs, 5 CICs (including Russia at M12), 20 sites
  - Should be based on EGEE re-engineered middleware
  - This is dependent on the quality and robustness of gLite components
  - Experience: it takes 6 months to put new software into production
  - Will not deploy new components unless they improve upon existing components or add new required functionality
- M21: expanded production infrastructure in place
  - As above, but expanded to 50 sites
  - Now decoupled from a specific gLite release

Deliverables for rest of project
- Release notes corresponding to milestones
  - Updated relative to the first set of release notes; snapshots corresponding to milestones
  - NB: ALL releases are accompanied by a full set of release notes
- EGEE "Cookbook"
  - Foreseen as planning guides to help new participants join or build components of the infrastructure
  - Audience: resource centres and their administrators; ROCs, CICs, and VOs
  - Templates and checklists to help administrators design a facility, determine what resources to acquire, how to configure them, etc.
  - Detailed enough to allow admins to understand what the limitations of the system are and how to address them (e.g. what services can run on 1 machine, how to configure, etc.)
  - Make use of the expertise of CICs, ROCs and staff in RCs ("and use technical writers in NA3")
- M24: Assessment of infrastructure operation throughout the project
  - Remove suggestions on long-term sustainability; put these into EGEE-2 planning

Summary
- Production grid is operational and in use
  - Larger scale than foreseen; use in 2004 was probably the first time such a set of large-scale grid productions has been done
  - Modest growth in resources foreseen over the next year
- Operational infrastructure in place and working
  - Need to continue to improve reliability of service
  - Need to continue to improve user support
- Support for applications and VOs
  - VO deployment should become still simpler and more routine
  - Application support needs more resources than foreseen
- Deployment and migration to gLite is now a major focus