EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Operations Automation in EGEE-III What does.

Presentation transcript:

EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks
Operations Automation in EGEE-III
What does the OAT mean to you?
James Casey, CERN
EGEE’08, Istanbul, Turkey

What is the Operations Automation Team (OAT)
Defined in EGEE MSA1.1
  – ‘Operations Automation Strategy’
  – Initial focus on multi-level monitoring
  – Delivered mid-June – comments still welcome
Abstract: In EGEE-III, within the SA1 activity, a group called the ‘Operations Automation Team’ was formed with the task of coordinating operational tools and their development, with the specific goal of advising on the strategic directions to take in terms of automating the operations effort. This will entail replacing manual processes with automated ones in order that the overall staffing level of operations can be significantly reduced in a long-term, sustainable infrastructure. This document outlines a strategy for achieving this automation using an integration architecture based on messaging. It describes how current tools and processes, such as operational alarming and ticketing will evolve during the lifetime of EGEE-III and lays out a roadmap for this evolution.

Questions
What’s this got to do with EGI?
I’ve heard Nagios replaces SAM, what does this mean?
  – It uses messaging, what does that mean?
  – I’m in a VO, does this affect me?
Will you help me manage my site better?
When will this all happen?
How can I help?

OAT and EGI
OAT is an EGEE-III body
  – Using EGEE effort to automate (improve ???) operations during the project
  – Oversee all operational tool development within EGEE SA1
Following EGI visions on upcoming strategy
  – EGI “subsidiarity principle”
  – This is a big driver for us – moving processes and tools to regional models
      • Where possible !!!
Provide input to EGI on operational tool development and deployment that is on the roadmap beyond the end of EGEE-III

Operational Tools in EGEE-III

Current Operational Model
Several teams involved
  – Operations Management (OCC)
  – Monitoring system operators (SAM)
  – Grid operators (COD)
  – Regional Operations Centres (ROC)
  – First line support teams (ROC)
  – Resource Centres/sites (RC)
  – User support team (GGUS)

Improving reliability and availability

Current operational model(s)

Future operational model

Multi-level monitoring
Based on existing work in CE ROC
  – Replaces central SAM execution framework with Nagios at ROC and site
  – Interacts with existing SAM components
      • Visualization, availability calculation, historical result store
  – Tied together via a reliable messaging infrastructure
  – Regional operations dashboard and alarms DB
  – Link into regional ticketing, e.g. via GGUS
Follow new operational model
  – Raise alarms immediately at the site
  – 1st level support sees them and can respond if needed
  – Central COD only involved after 2-3 weeks, e.g. for site banning
Tutorial yesterday with much more detail
  – Full install done of all components at a site in 1.5 hours...
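
The escalation path of the new operational model can be sketched in a few lines of Python. This is only an illustration: the function name, the level labels and the 14-day threshold (the lower bound of the "2-3 weeks" quoted on the slide) are assumptions, not part of any OAT tool.

```python
from datetime import datetime, timedelta

# Assumed threshold: the slide says central COD steps in "after 2-3 weeks";
# two weeks is used here purely for illustration.
COD_ESCALATION_AFTER = timedelta(weeks=2)


def responsible_levels(raised_at: datetime, now: datetime) -> list[str]:
    """Return who should be looking at an open alarm (hypothetical helper)."""
    if now < raised_at:
        raise ValueError("alarm cannot be raised in the future")
    # Alarms are raised at the site immediately; first-line support sees them too.
    levels = ["site", "first-line support"]
    # Central COD only becomes involved once the alarm has stayed open long enough.
    if now - raised_at >= COD_ESCALATION_AFTER:
        levels.append("central COD")
    return levels


raised = datetime(2008, 9, 1)
print(responsible_levels(raised, datetime(2008, 9, 3)))
print(responsible_levels(raised, datetime(2008, 9, 22)))
```

The point of the sketch is that the site and the regional first line always see an alarm first, and the central team is added only when the alarm has aged past the threshold.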

Monitoring is multi-level
Source       # checks / service   Type
Central      1-2                  Network monitoring, Service ‘Ping’
Regional     5-10                 User-oriented actions (e.g. existing SAM tests)
Site local   10-30                Detailed functional tests

Messaging Systems
Flexible architecture:
  – Deliver messages either in point-to-point mode (queues)…
  – … or in multicast mode (topics)
  – Support synchronous or asynchronous communication
Reliable delivery of messages:
  – Provide reliability to the senders if required
  – Configurable persistency / Master-Slave
Highly scalable:
  – Network of brokers
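
The difference between the two delivery modes can be shown with a toy, in-memory broker. This is a sketch for illustration only and not a real messaging client: the class `ToyBroker`, the round-robin choice of queue consumer and the destination `/queue/jobs` are assumptions; the topic name is the one used in the Perl example on the ActiveMQ slide.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a message broker (hypothetical, for illustration)."""

    def __init__(self):
        self._queue_consumers = defaultdict(list)
        self._topic_subscribers = defaultdict(list)
        self._next = defaultdict(int)  # round-robin position per queue

    def subscribe(self, destination, inbox):
        """Register a list that will receive messages for the destination."""
        if destination.startswith("/queue/"):
            self._queue_consumers[destination].append(inbox)
        elif destination.startswith("/topic/"):
            self._topic_subscribers[destination].append(inbox)
        else:
            raise ValueError(f"unknown destination type: {destination}")

    def send(self, destination, message):
        if destination.startswith("/queue/"):
            # Point to point: exactly one consumer receives each message.
            consumers = self._queue_consumers[destination]
            if not consumers:
                return  # a real broker would hold the message; the toy drops it
            index = self._next[destination] % len(consumers)
            consumers[index].append(message)
            self._next[destination] += 1
        elif destination.startswith("/topic/"):
            # Multicast: every current subscriber receives a copy.
            for inbox in self._topic_subscribers[destination]:
                inbox.append(message)
        else:
            raise ValueError(f"unknown destination type: {destination}")


broker = ToyBroker()
q1, q2, t1, t2 = [], [], [], []
broker.subscribe("/queue/jobs", q1)
broker.subscribe("/queue/jobs", q2)
broker.subscribe("/topic/grid.probe.metricOutput", t1)
broker.subscribe("/topic/grid.probe.metricOutput", t2)
for n in range(4):
    broker.send("/queue/jobs", f"job-{n}")
broker.send("/topic/grid.probe.metricOutput", "result-0")
print(q1, q2)
print(t1, t2)
```

Queue messages are shared out between the two consumers, while the single topic message reaches both subscribers.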

ActiveMQ
Mature open-source implementation of these ideas
  – Top-level Apache project
  – Commercial support available from IONA
  – Widely-used commodity software
Easy to integrate into your code
  – Multiple language + transport protocol support
Good performance characteristics
  – See later …
Work done to integrate into our environment
  – RPMs, YAIM configuration, monitoring and alarms

    use strict;
    use warnings;
    use Net::Stomp;

    # Connect to the broker over STOMP
    my $stomp = Net::Stomp->new({ hostname => 'gridmsg102.cern.ch', port => '6163' });
    $stomp->connect();

    # Subscribe to the probe results topic; each message is acknowledged explicitly
    $stomp->subscribe({
        'destination'           => '/topic/grid.probe.metricOutput',
        'ack'                   => 'client',
        'activemq.prefetchSize' => 1,
    });

    # Print every frame received, then acknowledge it
    while (1) {
        my $frame = $stomp->receive_frame;
        warn $frame->body;
        print $frame->as_string;
        $stomp->ack({ frame => $frame });
    }
    $stomp->disconnect;    # not reached while the loop above runs forever
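
The Perl snippet on this slide talks to the broker over STOMP, a simple text protocol, which is part of why integration is easy from many languages. The sketch below shows roughly what the SUBSCRIBE frame looks like on the wire, following the STOMP 1.0 frame layout (command line, header lines, blank line, body, NUL byte). The helper names `encode_frame` and `decode_frame` are hypothetical, and the sketch ignores `content-length` handling and header escaping.

```python
def encode_frame(command: str, headers: dict[str, str], body: str = "") -> bytes:
    """Serialise a frame: command, one header per line, blank line, body, NUL."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


def decode_frame(raw: bytes) -> tuple[str, dict[str, str], str]:
    """Parse a single NUL-terminated frame back into its three parts."""
    if not raw.endswith(b"\x00"):
        raise ValueError("frame is not NUL-terminated")
    head, _, body = raw[:-1].decode("utf-8").partition("\n\n")
    command, *header_lines = head.split("\n")
    # Split on the first colon only, so header values may contain colons.
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


# The same subscription the Perl example on the slide sets up.
subscribe = encode_frame(
    "SUBSCRIBE",
    {
        "destination": "/topic/grid.probe.metricOutput",
        "ack": "client",
        "activemq.prefetchSize": "1",
    },
)
print(subscribe)
print(decode_frame(subscribe))
```

In practice a client library such as Net::Stomp builds and parses these frames itself; the sketch is only meant to show how little there is to the wire format.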

ActiveMQ

Results: Throughput
[Plot; labels: Consumers, Throughput]

Vendor tests
From “Optimizing FUSE Message Broker” -

Usages of Messaging
We use it as an ‘integration bus’
  – Use when systems want to share information
      • E.g. VO transfer systems publishing data rates to WLCG
It’s another string to our bow
      • When the application model fits well, then use it
        E.g. async communications, broadcast messages
Don’t force applications to use it
  – Have other solutions too
      • E.g. “RESTful” web services à la SAM Programmatic Interface

‘Standard’ Integration Patterns
The same patterns are repeated in many of the following examples:
  – Gather results at many points
  – Collect the raw results and store in a database
  – Perform some operation on the raw results
      • Summarisation, availability calculation, …
  – Publish the summarised results to many clients
      • E.g. site monitoring, dashboards, …
  – Store historical data in a database and visualize via web client
We provide ‘standard’ components to make this plug and play for many workflows
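
The gather, store, summarise and publish steps listed on this slide can be sketched as a tiny pipeline. Everything here is hypothetical: the site and probe names, the in-memory list standing in for the database, and the "fraction of OK results" summary, which is a simple stand-in and not the actual SAM availability calculation.

```python
from collections import defaultdict

# Hypothetical raw probe results gathered at many points: (site, metric, status)
raw_results = [
    ("SITE-A", "CE-probe", "OK"),
    ("SITE-A", "SE-probe", "CRITICAL"),
    ("SITE-B", "CE-probe", "OK"),
    ("SITE-B", "SE-probe", "OK"),
]


def store(results, database):
    """Collect the raw results; a plain list plays the role of the database."""
    database.extend(results)


def summarise(database):
    """Reduce raw results to one number per site: the fraction of OK results."""
    ok = defaultdict(int)
    total = defaultdict(int)
    for site, _metric, status in database:
        total[site] += 1
        ok[site] += status == "OK"
    return {site: ok[site] / total[site] for site in total}


def publish(summary, clients):
    """Hand the summarised results to every interested client."""
    for client in clients:
        client(summary)


database = []
store(raw_results, database)
summary = summarise(database)

received = []  # stands in for a dashboard or site monitoring client
publish(summary, [received.append, print])
```

In the architecture described in the talk, the hand-offs between these steps would go over the messaging system instead of direct function calls.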

VO, ROC, Project & Local monitoring

Another application - Usage Reporting
Other main part of monitoring – Usage statistics
  – Gridftp transfers, FTS transfers, job records, …
Used to calculate throughput and reliability
Currently handled in GridView, Dashboards
  – Use messaging system to unite these efforts
Delegate parsing/routing of specific information back to experts
  – L&B, FTS, …
Other integration examples include
  – Accounting
  – GOCDB synchronization
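
As a rough illustration of how throughput and reliability can be derived from usage records, the sketch below works on a handful of made-up transfer records. The record fields and both definitions are assumptions chosen for simplicity (throughput as bytes moved by successful transfers divided by their summed durations, reliability as successful transfers over attempted ones). They are not the formulas used by GridView or the Dashboards.

```python
from dataclasses import dataclass


@dataclass
class TransferRecord:
    """Hypothetical usage record for one file transfer."""
    bytes_moved: int
    seconds: float
    succeeded: bool


def throughput_mb_per_s(records):
    """Bytes moved by successful transfers per second of transfer time, in MB/s."""
    done = [r for r in records if r.succeeded]
    total_time = sum(r.seconds for r in done)
    if total_time == 0:
        return 0.0
    return sum(r.bytes_moved for r in done) / total_time / 1e6


def reliability(records):
    """Fraction of attempted transfers that succeeded."""
    if not records:
        return 0.0
    return sum(r.succeeded for r in records) / len(records)


records = [
    TransferRecord(200_000_000, 10, True),
    TransferRecord(100_000_000, 10, True),
    TransferRecord(0, 5, False),
    TransferRecord(100_000_000, 20, True),
]
print(throughput_mb_per_s(records))
print(reliability(records))
```

Note that summing durations ignores transfers running in parallel, so a production calculation would need a time-window based definition.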

Site Management
gLite often doesn’t provide enough management tools
  – Direct feedback from site and service managers
Site managers often write tools themselves
Strategy defined to get these tools to a wide audience
  – Lightning talks
      • 5 minute presentations on tools people have developed
      • Stay for the rest of the session !
  – Publicity of tool development
      • e.g. via iSGTW
      • Doubling of visitors to gridmap.cern.ch after publishing an article

Deployment support
EGEE-SA1 tools project started
Policy being defined now
  – With some ‘early adopter’ projects
      • Some of the tools you’ll see in the lightning talks session
Facilities
  – Support in using ETICS
  – Support in writing YAIM
  – Yum repository
  – Documentation repository
Contact us if you’re interested in contributing here

Global Roadmap
Covers multi-level monitoring
Roadmaps for other areas (e.g. accounting) in the process of being defined by individual teams
  – And co-ordinated by the OAT

Roadmap for tools
Milestone ‘Messaging 1’: August 2008
  – Production-level messaging broker in production. This should have internal failover capabilities, but will not have the WAN failover capabilities of a network of brokers
Milestone ‘Messaging 2’: December 2008
  – A scalable and reliable network of brokers, deployed over at least 3 sites, is in place
Milestone ‘Site Monitoring 1’: September 2008
  – A release of the site components for the multi-level monitoring, including packaging and configuration as part of an EGEE middleware release, exists and is ready for deployment to the sites
Milestone ‘ROC Monitoring 1’: December 2008
  – The ROC components for the multi-site monitoring are ready for deployment to sites
Milestone ‘ROC Monitoring 2’: February 2009
  – The alarm component has been integrated with the regionalized dashboard
Milestone ‘ROC Monitoring 3’: July 2009
  – The regional dashboard is now available to be deployed at the ROCs

Roadmap for distributed COD
Milestone ‘rCOD 1’: September 2008
  – 4 ROCs carry out r-COD and 1st line support roles directly. This will be done with a ‘regionalized’ version of the current operations dashboard, and with SAM as the alarm generation system
Milestone ‘rCOD 2’: April 2009
  – 4 additional ROCs carry out r-COD and 1st line support roles using the regionalized dashboard
Milestone ‘rCOD 3’: April 2009
  – 2 additional ROCs carry out r-COD and 1st line support roles directly using the new multi-level monitoring framework
Milestone ‘rCOD 4’: September 2009
  – All 11 ROCs carry out r-COD and 1st line support roles directly. The c-COD is fully established
Milestone ‘rCOD 5’: December 2009
  – All 11 ROCs carry out r-COD and 1st line support roles using the new multi-level monitoring framework

OAT and new tools
OAT is the body to oversee new tool development
  – In response to needs of sites, ROC, OCC
New projects under investigation
  – SLA Portal
      • In response to MSA 1.5 SLA
  – Metrics portal
      • In response to MSA 1.3 – Activity QA plan
Also new development for multi-level monitoring
  – Improvement of Nagios probes for services
  – Re-engineering of existing SAM probes
  – Re-engineer other existing tools for regional models
      • SAMAP, Gridview,...
  – ‘Probe description database’
      • metadata store for probes

Take home messages
The OAT is trying to provide tools to improve operations
  – Reduce effort
The OAT is a process
  – We’ve started now
  – There’s still lots to do
Site administrators are needed to contribute
  – With deploying the tools and giving feedback
  – With contributing best of breed system management tools
  – Working on design and development of operational tools
Get in touch !
  – Talk to an OAT member
  – Send us mail, join the discussion list
  – Read the strategy document

Contacts
Strategy Document :
Contact the team :
Discuss Mailing List : egee3-operations-automation- – please join !
Documentation Site : share/oat/default.aspx (in development)
List of OAT Members
  AP – Joanna Huang
  CE – Emir Imamagic
  CE – Marcin Radecki
  CERN – James Casey
  CERN – John Shade
  DECH – Angela Poschlad
  FR – Cyril L'Orphelin
  FR – Guillaume Cessieux
  IT – Giuseppe Misurelli
  NE – Ronald Starink
  SEE – Antun Balaz
  SWE – Javier Lopez Cacheiro
  UKI – Gilles Mathieu