EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks New WLCG Grid Service Monitoring Displays.

Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks New WLCG Grid Service Monitoring Displays James Casey, CERN IT-GD HEPIX, November 2007

Enabling Grids for E-sciencE EGEE-II INFSO-RI Nov 8th 2007/ HEPIX / New WLCG Grid Service Monitoring Displays 2
Overview
Service Monitoring in WLCG
Site Service Monitoring
  – Nagios
Central Monitoring
  – GridMap
Future work

WLCG Monitoring Working Groups
3 groups created by Ian Bird, Oct’06
  – “… to help improve the reliability of the grid infrastructure …”
  – “… provide stakeholders with views of the infrastructure allowing them to understand the current and historical status of the service …”
  – “… stakeholders are site administrators, grid service managers and operations, VOs, Grid Project management”
The three groups and their topics:
  – Grid Services: Grid sensors, Transport, Metric Repositories, Views, …
  – System Management: Fabric management, Best Practices, Security, …
  – System Analysis: Application monitoring, …

Monitoring
[Diagram: Grid Monitoring, Control, Presentation. Labels recovered from the figure:]
  – measurement instrumentation: active, passive, collection intervals, alarms
  – appropriate metrics: directly relevant to user experience; clearly defined and understood
  – Sensors/Agents → Transport → Repositories
  – manual decision making; automated decision making
  – real-time → historical
  – accuracy and credibility
  – data collection points: system element → service
  – Views
You can’t manage what you don’t measure...
Slide by Max Böhm, EDS

WLCG Grid Monitoring Landscape
Domain                                              Monitoring                  Tools in use
local resources (site)                              Local monitoring            Lemon/SLS, Nagios, Ganglia, ...
Grid Middleware (central services, site services)   Grid Services monitoring    GStat, SAM/GridView, GridICE, GridPP Real Time Monitor, ...
Grid Applications                                   Application monitoring      Experiment Dashboards, ...
Slide by Max Böhm, EDS

Grid Monitoring Landscape View
[Diagram: monitoring tools, the services they observe, and the protocols between them. Labels recovered from the figure:]
  – Layers: App Layer, Central Services, Grid Services, Site Services, Fabric Resources
  – Services: BDII (Info System), CE, SE, RB (Resource Broker), LFC (File Catalog), LB, FTS, GOCDB (site registry), batch
  – Fabric: CPUs, TBs
  – Experiment/VO (one per experiment, e.g. ATLAS): Ganga/Panda apps, AtlasProdDB
  – Tools: GStat, GridICE, SAM, GridView, Exp. Dashb., RTM, LEMON, (other monitoring tools)
  – Views: html site status + graphs; real time 3D job view; VO jobs, data, site reliability; data transfer, job status, service availability
  – Data sources and transports: GOCDB, BDII, extBDII, BDII + fabric/job infos, fabric infos, job state, FTS results, submit test jobs, agents, R-GMA, MonALISA, LDAP, DB access, HTTP/XML pull, HTTP/XML push, HTTP/SOAP push

High-level Model
See https://twiki.cern.ch/twiki/pub/LCG/GridServiceMonitoringInfo/0702-WLCG_Monitoring_for_Managers.pdf for details
[Diagram labels: LEMON, Nagios, SAM, R-GMA, SAME, GridView, Experiment Dashboard, GridICE, HTTP, LDAP, GOCDB, Dashboard, GridView, GridMap]

Grid Site Monitoring Principles
Provide an easily extensible site monitoring system
  – Or be able to plug grid features into existing site monitoring
Should be able to provide (or augment) alarms at the site for the grid services
Don’t force a solution on the site administrators
  – Should work with any fabric monitoring system that provides basic functionality
Provide the specific plugins to deal with the Grid
  – Probes that work for Grid Services
Enable export of the data from the site into standard grid monitoring systems, e.g. SAM, GridView, GridICE, …
  – Avoid duplicate running of probes

Purpose
Bring in data from existing monitoring systems inside the site monitoring tools
  – Service Availability Monitoring (SAM)
  – Network performance monitoring (NPM)
  – Experiment site blacklists (FCR tool)
  – Experiment dashboards, …
Decided to create a prototype based on Nagios
  – Due to existing take-up of Nagios in the community
Second stage will be to integrate with LEMON
  – As next most common solution
  – Based on questionnaire to community

Nagios
Open source monitoring system
Widely used & actively developed
Host and service problem detection and recovery
Provides a set of basic plugins (sensors)
  – easy to develop custom sensors
No components required on monitored entities
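To illustrate why custom sensors are easy to write: a Nagios plugin is just a program that prints one status line and returns exit code 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below is a minimal, hypothetical TCP reachability probe; the function names, thresholds and script name are assumptions for illustration and are not taken from the probes described in this talk.

```python
import socket
import time

# Exit codes defined by the standard Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed, warn_after, crit_after):
    """Map a connection time in seconds to a Nagios status code."""
    if elapsed >= crit_after:
        return CRITICAL
    if elapsed >= warn_after:
        return WARNING
    return OK


def check_tcp(host, port, warn_after=1.0, crit_after=5.0):
    """Open a TCP connection and return (status_code, status_line).

    The thresholds are hypothetical defaults, not values from the talk.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=crit_after):
            elapsed = time.monotonic() - start
    except OSError as exc:  # refused, unreachable, timed out, ...
        return CRITICAL, f"CRITICAL - cannot connect to {host}:{port} ({exc})"
    code = classify(elapsed, warn_after, crit_after)
    return code, f"{LABELS[code]} - connected to {host}:{port} in {elapsed:.3f}s"


def main(argv):
    """Parse HOST PORT, print the status line, return the exit code."""
    if len(argv) != 2 or not argv[1].isdigit():
        print("UNKNOWN - usage: check_grid_tcp.py HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[0], int(argv[1]))
    print(line)
    return code
```

A wrapper script would end with `sys.exit(main(sys.argv[1:]))` so that Nagios sees the exit code; the status line on standard output is what appears in the Nagios display.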

Architecture
[Diagram. Labels recovered from the figure:]
  – Components: Monitoring server, Site nodes, Site BDII, CE, SE, LFC, MyProxy, Site admins, …
  – Actions: Refresh proxy; Get VOMS proxy; Service checks; Live node checks; Get remote results; Get Nagios results; Probe descriptions; Get site’s & nodes information; Get nodes information; Get site status; Issue alarms

Grid Extensions
Standard probes
  – provided by SRCE, CERN, OSG
  – Security facilities & services
    ▪ CA distribution, Certificate lifetime, MyProxy
  – Monitoring & information services
    ▪ R-GMA, BDII, MDS, GridICE
  – Job management services
    ▪ Globus Gatekeeper, RB, WMS, WMProxy, Job matching
  – Data management services
    ▪ GridFTP, SRM, DPNS, LFC, FTS
Remote gatherers
  – SAM & NPM
Nagios Config Generator (NCG), Publisher, Credential management

Standard Components
Probe wrapper
  – enables integration of standardized probes
    ▪ One probe can run in Nagios, LEMON, SAM, …
  – Grid Monitoring Probes Specification
    ▪ https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringProbeSpecification
Publisher & remote gatherers
  – integration with other tools
    ▪ Existing tools can just consume the data, e.g. SAM, GridView, Dashboards, …
  – Grid Monitoring Data Exchange Standard
    ▪ https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringDataExchangeStandard
Comments, contributions & probes welcome!


[Figure labels: SAM, Standard probes, NPM]


Current Status
Three sets of standard probes integrated
  – SRCE, CERN, OSG
RPMs in apt and yum repository
Installation documentation on twiki
Mailing list for community support of sites
Will appear in upcoming gLite releases as packaged software
Will be bundled with “follow-up” documentation to help site admins understand what went wrong on probe failure
New (early-access) volunteers welcome!

New visualizations for the Grid?
Grid monitoring data is complex!
  – And there are many sites…
Current tools visualize data by sorted tables, bar charts, etc.
Difficult to present an easy-to-understand top-level view which provides
  – quick, action-oriented oversight and insight
  – help in understanding job failures and availability patterns
Can new visualizations help?

GridMap Visualization
Idea
  – visualize the Grid by using Treemaps (Grid + Treemap = GridMap)
[Figure: Example GridMap; labels: site, regions]
Size of rectangle is e.g.
  – size of site (#CPUs)
  – #running jobs
  – ...

GridMap Visualization
Idea
  – visualize the Grid by using Treemaps (Grid + Treemap = GridMap)
[Figure: Example GridMap; colour key: ok, degraded, down]
Colour of rectangle is e.g.
  – SAM status of site / service
  – Availability of site / service
  – ...
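The size and colour mapping described on these two slides can be sketched with a simple two-level treemap. The talk does not describe the actual GridMap Layout Algorithm, so the code below substitutes the basic "slice-and-dice" treemap layout: regions split the canvas horizontally in proportion to their CPU count, and sites split each region vertically. The site names, CPU counts and colour names are hypothetical.

```python
# Colour key from the slides (ok / degraded / down); the colour names are assumed.
STATUS_COLOURS = {"ok": "green", "degraded": "orange", "down": "red"}


def slice_and_dice(items, x, y, w, h, horizontal):
    """Split rectangle (x, y, w, h) among (name, size) items by relative size."""
    total = sum(size for _, size in items)
    rects, offset = [], 0.0
    for name, size in items:
        share = size / total if total else 0.0
        if horizontal:
            rects.append((name, (x + offset * w, y, share * w, h)))
        else:
            rects.append((name, (x, y + offset * h, w, share * h)))
        offset += share
    return rects


def gridmap(regions, width, height):
    """regions: {region: [(site, cpus, status), ...]} -> list of cell dicts."""
    region_sizes = [(region, sum(cpus for _, cpus, _ in sites))
                    for region, sites in regions.items()]
    cells = []
    for region, (rx, ry, rw, rh) in slice_and_dice(
            region_sizes, 0.0, 0.0, width, height, True):
        sites = regions[region]
        status_of = {name: status for name, _, status in sites}
        site_sizes = [(name, cpus) for name, cpus, _ in sites]
        for name, rect in slice_and_dice(site_sizes, rx, ry, rw, rh, False):
            cells.append({
                "region": region,
                "site": name,
                "rect": rect,  # (x, y, width, height)
                # Sites with no known status stay grey (uncoloured).
                "colour": STATUS_COLOURS.get(status_of[name], "grey"),
            })
    return cells


# Hypothetical input: two regions, three sites.
EXAMPLE = {
    "RegionA": [("SiteA1", 300, "ok"), ("SiteA2", 100, "down")],
    "RegionB": [("SiteB1", 400, "degraded")],
}
CELLS = gridmap(EXAMPLE, 800.0, 400.0)
```

Each cell's area is proportional to the site's CPU count, so a large site that is down shows up as a large red rectangle. Production treemaps usually prefer layouts that keep rectangles closer to square; slice-and-dice is used here only because it is the shortest correct instance of the idea.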

Multiple Views
GridMaps can be used for top-level, geographical and VO views
  – Top-level View
  – Geographical Views: Federation, Partner, Site, etc.
  – VO Views: cross-location
[Diagram labels: Large-scale Federated Grid Services Infrastructure; Global GridMap; Application Domain GridMap; Local GridMap; Next level of GridMaps; Alert; Corrective action; effect]

Trends
Trends can be understood by looking at a sequence of GridMaps
[Figure: Site Availability over time, a sequence of GridMaps for dates in late September 2007]

More Views
Correlations of metrics can be discovered by switching between different views
  – Site Availability from different VO perspectives: LHCb, CMS, Atlas, Alice, OPS
  – Site Status of different Site Services: site BDII, SRM, SE, CE, Overall
sites without colour do not support the VO

GridMap Prototype Architecture
[Diagram components: Grid sites, existing monitoring system(s), GridMap Server, Web Browser (Title, view1, view2, view3)]
GridMap View
  – Browser based Web 2.0 type client component
  – single interactive and responsive web page (no page reloads required, data is retrieved in the background)
  – fast switching between views possible
  – details of the site/service statuses are shown as a context sensitive Tooltip
  – POC implementation is based on HTML, lightweight JavaScript libraries, AJAX type communication pattern
GridMap Server
  – provides client side code and client supporting services
  – implements GridMap Layout Algorithm
  – retrieves and caches data from existing monitoring systems
  – POC implementation is based on Apache / Python
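The server-side step "retrieves and caches data from existing monitoring systems" can be sketched as a small time-to-live cache, so that many browser requests trigger at most one upstream query per interval. This is only an illustration under stated assumptions: the class name, the 300-second default and the fetch callback are hypothetical, and the talk does not say how the prototype's cache actually works.

```python
import time


class CachedSource:
    """Wrap a fetch function and reuse its result for ttl seconds."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable that queries the monitoring system
        self._ttl = ttl          # hypothetical refresh interval in seconds
        self._clock = clock      # injectable so the cache can be tested
        self._value = None
        self._expires = None     # None means nothing has been cached yet

    def get(self):
        """Return cached data, refreshing it once the ttl has elapsed."""
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value
```

A server would create one `CachedSource` per upstream system (for example one for SAM status and one for GridView availability) and call `get()` from each request handler.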

GridMap Prototype View Component
[Annotated screenshot of the GridMap view. Callouts:]
  – Metric selection for colour of rectangles
  – Show SAM status
  – Show GridView availability data
  – Grid topology view (grouping)
  – Metric selection for size of rectangles
  – VO selection
  – Overall Site or Site Service selection
  – Link: Drilldown into region by clicking on the title
  – Context sensitive information
  – Colour Key
  – Description of current view

GridMap Prototype: Link to Existing Tools
Clicking on a site opens a page with details in GridView/SAM
[Screenshot labels: Site Detail Availability; SAM Test Results]

Conclusions
To improve reliability we need to:
1. Provide more information to site administrators
  – That relates to what users actually see when using their site
    ▪ A lot of data is already gathered, so if possible don’t gather it again
  – Need to get it into the fabric monitoring system already used at a site
  – Nagios-based prototype validating the approach
    ▪ Good feedback from early adopters
2. Improve the visualization
  – Too much data, especially for central monitoring (~250 sites)
  – New techniques help to compress information and bring useful information into view