EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Service Availability Monitoring – Status.

Slides:



Advertisements
Similar presentations
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROD model assessment ROC SEE By E. Atanassov,
Advertisements

EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Regional Operations Dashboard Workplan Cyril.
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks From ROCs to NGIs The pole1 and pole 2 people.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Simply monitor a grid site with Nagios J.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The network monitoring in grid context Operations.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks GStat 2.0 Joanna Huang (ASGC) Laurence Field.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks WMSMonitor: a tool to monitor gLite WMS/LB.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Monitoring and enforcement of Service Level Agreements John Shade EGEE-II / EGEE.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios for Grid Services E. Imamagic, SRCE.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Grid Monitoring Tools Alexandre Duarte CERN.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation Team James Casey EGEE’08.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Multi-level monitoring - an overview James.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Site Monitoring with Nagios E. Imamagic,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE-EGI Grid Operations Transition Maite.
CERN IT Department CH-1211 Geneva 23 Switzerland t GDB CERN, 4 th March 2008 James Casey A Strategy for WLCG Monitoring.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks SA1: Grid Operations Maite Barroso (CERN)
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Wojciech Lapka SAM Team CERN EGEE’09 Conference,
EGEE-III INFSO-RI Enabling Grids for E-sciencE Operations Automation Team KoM, May ROC VIEW (SWE)‏ Javier Lopez Cacheiro/
EGEE-III INFSO-RI Enabling Grids for E-sciencE Antonio Retico CERN, Geneva 19 Jan 2009 PPS in EGEEIII: Some Points.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Ricardo Rocha CERN (IT/GS) EGEE’08, September 2008, Istanbul, TURKEY Experiment.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation in EGEE-III What does.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks DSA1.4 – Objectives and Status Ioannis Liabotis.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Regional Dashboard Cyril L’Orphelin - CNRS/IN2P3.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Using GStat 2.0 for Information Validation.
EGEE-II INFSO-RI Enabling Grids for E-sciencE GStat Work Plans for EGEE-III Joanna Huang, ASGC/OPS EGEE SA1 F2F Meetings, Abingdon.
SAM Sensors & Tests Judit Novak CERN IT/GD SAM Review I. 21. May 2007, CERN.
Julia Andreeva on behalf of the MND section MND review.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI How to integrate portals with the EGI monitoring system Dusan Vudragovic.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks APEL CPU Accounting in the EGEE/WLCG infrastructure.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Communication tools between Grid Virtual.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Ops Portal New Requirements.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Monitoring Tools E. Imamagic, SRCE CE.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Deliverable DSA1.4 Jules Wolfrat ARM-9 –
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Xavier Jeannin (CNRS/UREC Paris, FR) 24.
GridView - A Monitoring & Visualization tool for LCG Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav B.A.R.C. BARC-CERN/LCG Meeting.
SAM Database and relation with GridView Piotr Nyczyk SAM Review CERN, 2007.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Regional Nagios Emir Imamagic /SRCE EGEE’09,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation Team Kickoff Meeting.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks SA1 & SA2-ENOC Interactions status and plans.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks LHCOPN Operations WS: Introduction & Objectives.
INFSO-RI Enabling Grids for E-sciencE Operations Parallel Session Summary Markus Schulz CERN IT/GD Joint OSG and EGEE Operations.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Pole 2 : Restructuration of the OPS Manual.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Configuration Data or “What should be.
CERN - IT Department CH-1211 Genève 23 Switzerland t IT-GD-OPS attendance to EGEE’09 IT/GD Group Meeting, 09 October 2009.
SAM Status Update Piotr Nyczyk LCG Management Board CERN, 5 June 2007.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks What all NGIs need to do: Helpdesk / User.
INFSO-RI Enabling Grids for E-sciencE File Transfer Software and Service SC3 Gavin McCance – JRA1 Data Management Cluster Service.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROC model assessment AP ROC ShuTing Liao.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The Dashboard for Operations Cyril L’Orphelin.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks COD-16 (Transition to EGEE-III) Report to.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations automation team presentazione.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Update on Service Availability Monitoring (SAM) Marian Babik, David Collados,
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Regional tools use cases overview Peter Solagna – EGI.eu On behalf of the.
Site notifications with SAM and Dashboards Marian Babik SDC/MI Team IT/SDC/MI 12 th June 2013 GDB.
INFSO-RI Enabling Grids for E-sciencE GOCDB2 Matt Thorpe / Philippa Strange RAL, UK.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks GOCDB4 Gilles Mathieu, RAL-STFC, UK An introduction.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks IT ROC: Vision for EGEE III Tiziana Ferrari.
Enabling Grids for E-sciencE EGEE-II INFSO-RI ROC managers meeting at EGEE 2007 conference, Budapest, October 1, 2007 Admin Matters Vera Hanser.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operational Tools M2 Update James Casey.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks An insight into GOCDB for ROD Operators.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Status of the SAM/Nagios/GSTAT Components.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MyEGEE David Horat (
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios Grid Monitor E. Imamagic, SRCE OAT.
Introduction to OAT presentations
Evolution of SAM in an enhanced model for monitoring the WLCG grid
March Availability Report for EGEE Sites based on Nagios
Maite Barroso, SA1 activity leader CERN 27th January 2009
Kashif Mohammad Deputy Technical Co-ordinator (South Grid) Oxford
Site availability Dec. 19 th 2006
Presentation transcript:

EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Service Availability Monitoring – Status & Plans John Shade, CERN SA1 coordination meeting, Abingdon

Enabling Grids for E-sciencE EGEE-III INFSO-RI Service Availability Monitoring - Status and Plans 2 Main Objectives Increase grid availability and reliability Provide Site managers with tools that pin-point malfunctioning services – and provide enough information for trouble-shooting Collect statistics on service availability for management purposes (WLCG/EGEE) Continue to provide a flexible framework that allows VOs to integrate their own tests Encourage sites/regions to become more autonomous Don’t break anything along the way!

Enabling Grids for E-sciencE EGEE-III INFSO-RI What needs to be improved? Allow more flexible availability calculations –VO-specific; allow comparisons & consistent recalculations (i.e. maintain topology snapshots) If SAM web-service is down, test results go to the big bit-bucket in the sky –Use of Message Bus buffering will solve this particular problem Sites are often “blind” during central outages –Nagios at the site will address this issue Need to expand coverage of tests –Develop GFAL tests, refine LFC tests, etc. Service Availability Monitoring - Status and Plans 3

Enabling Grids for E-sciencE EGEE-III INFSO-RI Architecture – Regional Level Service Availability Monitoring - Status and Plans 4

Enabling Grids for E-sciencE EGEE-III INFSO-RI Architecture – phased approach Service Availability Monitoring - Status and Plans 5 Site Nagios runs more or less stand-alone Use NDO Utils to populate regional MySQL DB Use Nagios Checker Firefox plug-in Provide central DB for collecting availability metrics Run a large Nagios instance(s) that emulates SAM Send VM images of above to regions

Enabling Grids for E-sciencE EGEE-III INFSO-RI Nagios Checker Service Availability Monitoring - Status and Plans 6

Enabling Grids for E-sciencE EGEE-III INFSO-RI Architecture of the regional solution Use Nagios to occasionally probe sites from ROC Have a standard set of components inside the region for: –Storing topology of regional grid –Storing metric results from probes –Raising alarms –Raising tickets –Viewing metric history and details for debugging Central data stores and components for project-level systems –Project-level metric store –Topology Database with history Service Availability Monitoring - Status and Plans 7

Enabling Grids for E-sciencE EGEE-III INFSO-RI SAM Work Plan December 2008 –SAM ready to use Message Bus  SAM client on UI and WNs uses msg-publisher  Msg-consume2oracle used to retrieve messages from a topic and insert in SAM DB  Everything tested in Validation –Have Nagios equivalents of current SAM sensors used for WLCG/EGEE Availability calculations  SRMv2; CE & CREAM CE; sBDII  NCG templates available Service Availability Monitoring - Status and Plans 8

Enabling Grids for E-sciencE EGEE-III INFSO-RI January 2009 Service Availability Monitoring - Status and Plans 9

Enabling Grids for E-sciencE EGEE-III INFSO-RI Future plans Now we need a schema for the new metric store –Possibly an extension of the NDO schema Need to publish Nagios test results to new store and compare contents with current SAM DB –Use availability calculations to do this, rather than a portal Need tools to view test results and generate alarms –GridView to view metrics collected at project level –Use NDO add-ons to visualize contents of regional MySQL DB, or possibly build a custom portal? –Use built-in escalation functionality of Nagios for sending alarms to MsgBus, from where “alarm-DB” intelligence can feed GGUS –Use Firefox plug-in to view regional alerts from ROC Nagios Service Availability Monitoring - Status and Plans

Enabling Grids for E-sciencE EGEE-III INFSO-RI …and Timelines April 2009: – 1 st implementation of new metric store, new topology database and Probe Description Database –Design of new SAM portal completed June 2009: –switch Regional Nagios metric publication to new metric store –Adapt submission framework to use new topology database –SAM portal layered on new metric store –3 rd party tool selected/customized for viewing NDO database This is ambitious! –VOs will need some hand-holding to migrate their tests –We rely on a few key individuals who have other engagements –We have an existing production service to maintain! Service Availability Monitoring - Status and Plans 11