Monitoring Working Group Update. Grid Deployment Board, 5th December 2007, CERN. Ian Neilson.

Presentation transcript:

Monitoring Working Group Update
Grid Deployment Board, 5th December 2007, CERN
Ian Neilson

WLCG Monitoring Working Groups
3 groups created by Ian Bird
– “… to help improve the reliability of the grid infrastructure …”
– “… provide stakeholders with views of the infrastructure allowing them to understand the current and historical status of the service …”
Grid Services: Grid sensors, Transport, Metric Repositories, Views, …
System Management: Fabric management, Best Practices, Security, …
System Analysis: Application monitoring, …

Monitoring Working Group
WLCG Grid Services Monitoring WG
– Principles:
    Integrate, don’t re-invent
    Connect the islands
    Simple specifications
    Build on existing fabric monitoring
    Bring diagnosis close to solution
– Initial focus on the site grid services
    “Make the site admins happy” – Ian Bird

Monitoring Working Group
Activity & Dissemination
– Initial BOF enthusiasm led to active participation:
    SAM CERN – central monitoring
    EGEE CE Region – Nagios expertise and implementation
    OSG GOC – interoperable probes
    LEMON team – architecture sanity checks
    EDS/Openlab – high-level visualisation & messaging
– Many presentations
    WLCG, CHEP, EGEE’07, GridCamp, …
Overall, 25+ WG telecons/meetings scheduled in Indico
Sys. Analysis WG: series of experiment-focused meetings

Site Grid Service Monitoring
Nagios (widely used open-source fabric monitoring package)
– Based on existing CE EGEE ROC Nagios
– Integrates (optionally)
    latest centrally-run SAM status
    locally-run service checks
    external network service ‘ping’ from ENOC
– Alerts directly into LOCAL fabric monitoring system
All the Nagios features
– flexible alarms (email, SMS, …)
– dependencies and groupings
– plus a grid security model based around MyProxy
– Integrated ‘publisher’ exposes ‘local’ status
Credit: Emir SAM
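The way several check results might be folded into a single Nagios service state can be sketched as follows. The exit codes 0 to 3 are the standard Nagios plugin convention; the status strings in `SAM_TO_NAGIOS`, the "worst state wins" policy and all function names are assumptions made for illustration, not the working group's implementation.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical mapping from a SAM-style status string to a Nagios state.
SAM_TO_NAGIOS = {"ok": OK, "warn": WARNING, "error": CRITICAL}


def to_nagios_state(status):
    """Map a status string to a Nagios state; unrecognised values become UNKNOWN."""
    return SAM_TO_NAGIOS.get(status.lower(), UNKNOWN)


def combine(states):
    """Combine several states into one (assumed policy: the worst state wins)."""
    severity = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}
    if not states:
        return UNKNOWN
    return max(states, key=lambda s: severity[s])


def plugin_line(service, state):
    """Format the one-line summary a plugin would print before exiting."""
    return f"{service} {STATE_NAMES[state]}"
```

A real plugin would print the summary line and then call `sys.exit(state)`, so that Nagios can route the alert into the local fabric monitoring system.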

Nagios grid services monitoring

Prototype deployed/tested at sites
– CERN PPS, NIKHEF, FZK, LIP, SARA, …
– Packaged as RPMs for easy installation
– Repository hosted by Sys. Management WG
Positive, constructive feedback
    Not difficult to set up
    Very useful!
    Jeff: “For me it's already worth the investment”
    Not yet “production quality”, but close
Feedback issues addressed in latest release
– Modular configuration
– Dependency on gLite-UI
– Documentation
Ronald Starink, EGEE’07

Nagios grid services monitoring
Near-term activity
– with 2nd release, move out of “prototype”
– increase scope of the local checks (on-box)
– ‘standardize’ metric publication
Need to
– ‘encourage’ deployment – how?
    components should be reusable
– clarify role of central, regional and site monitoring

GridMap Visualization
Visualize the Grid by using Treemaps
– (Grid + Treemap = GridMap)
Colour of rectangle is e.g.
- SAM status of site / service
- Availability of site / service
- ...
Legend: ok, degraded, down
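The mapping from a site's state to a rectangle colour can be sketched as below. The three states (ok, degraded, down) come from the slide; the colour names, the availability thresholds and the function names are assumptions chosen for illustration.

```python
# Assumed colours for the three states named in the slide legend.
STATUS_COLOURS = {"ok": "green", "degraded": "orange", "down": "red"}


def status_from_availability(availability, ok_threshold=0.9, degraded_threshold=0.5):
    """Classify an availability fraction (0.0 to 1.0) using hypothetical thresholds."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0.0 and 1.0")
    if availability >= ok_threshold:
        return "ok"
    if availability >= degraded_threshold:
        return "degraded"
    return "down"


def colour_for(status):
    """Return the rectangle colour for a status; unknown statuses are drawn grey."""
    return STATUS_COLOURS.get(status, "grey")
```

For example, `colour_for(status_from_availability(0.95))` yields the colour for an "ok" site under these assumed thresholds.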

Trends
Trends can be understood by looking at a sequence of GridMaps
[Figure: sequence of GridMaps showing site availability over time, September 2007]

GridMap Architecture
[Diagram: Grid sites' existing monitoring system(s) → GridMap Server → Web Browser (title; view1, view2, view3; GridMap View)]
GridMap View
- Browser-based Web 2.0 type client component
- single interactive and responsive web page (no page reloads required, data is retrieved in the background)
- fast switching between views possible
- details of the site/service statuses are shown as a context-sensitive tooltip
- POC implementation is based on HTML, lightweight JavaScript libraries, AJAX type communication pattern
GridMap Server
- provides client-side code and client supporting services
- implements GridMap Layout Algorithm
- retrieves and caches data from existing monitoring systems
- POC implementation is based on Apache / Python
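The slides do not describe the GridMap Layout Algorithm itself. As a stand-in, the sketch below uses the simplest treemap technique, a single-level "slice" layout, in which each site receives a strip of the bounding rectangle proportional to its size. The `Rect` type, the `slice_layout` function and the use of a numeric size per site are all assumptions for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def slice_layout(sizes, bounds, horizontal=True):
    """Split `bounds` into strips whose areas are proportional to `sizes`.

    `sizes` maps a site name to a positive number (what the number measures
    is an assumption, e.g. a capacity figure). Strips are laid out left to
    right when `horizontal` is True, otherwise top to bottom.
    """
    total = sum(sizes.values())
    if total <= 0:
        raise ValueError("total size must be positive")
    rects = {}
    offset = 0.0
    for name, size in sizes.items():
        fraction = size / total
        if horizontal:
            width = bounds.w * fraction
            rects[name] = Rect(bounds.x + offset, bounds.y, width, bounds.h)
            offset += width
        else:
            height = bounds.h * fraction
            rects[name] = Rect(bounds.x, bounds.y + offset, bounds.w, height)
            offset += height
    return rects
```

A nested treemap can be built by calling `slice_layout` again inside each strip with the opposite orientation, for example regions first and then the sites within each region.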

Grid Service Monitoring
Summary:
– WGs have focused monitoring activity
– Useful deliverables close to release
    Site-based grid service monitoring
    High-level visualisation tool
– Many activity threads not mentioned here
Now:
– WLCG Service Reliability Workshop and GDB
– Gaps, architecture and plan for coming year
