HPDC 2007 / Grid Infrastructure Monitoring System Based on Nagios. E. Imamagic, D. Dobrenic, SRCE.

Presentation transcript:

Grid Infrastructure Monitoring System Based on Nagios
E. Imamagic, D. Dobrenic, SRCE
HPDC 2007, Workshop on Grid Monitoring

Overview
- Motivation
- Nagios framework
- Nagios-based grid monitoring
- Architecture
- Grid extensions
- Statistics
- Demo
- Contributions to WLCG Grid Service Monitoring WG
- Future work
- Conclusions

Motivation
- Provide site admin-centric monitoring
  - simplify grid resource operations
- Enable better resource availability
  - issue notifications as soon as a problem appears
- Support complex sensor dependencies
  - enables problem isolation
  - only relevant notifications are issued
- Visualization & management interface
  - grid resources status
- Report generation
  - availability, problem history

Nagios Framework
- Open source monitoring framework
  - widely used & actively developed
- Host and service problem detection and recovery
- Provides a wide set of basic sensors
  - easy to develop custom sensors
- Centralized vs. distributed deployment
- High configurability
  - service dependencies, fine-grained notification options
- Web interface
  - status view, administration
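
The "easy to develop custom sensors" point rests on the Nagios plugin convention: a sensor is any executable that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of that convention, not one of the sensors from the talk; the `QUEUE` service name, the measured value and the thresholds are hypothetical.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a status code; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(service, status, detail):
    """Build the single status line that the plugin prints to stdout."""
    return f"{service} {STATUS_NAMES[status]} - {detail}"


def run_sensor(queue_length, warn=10, crit=50):
    """Hypothetical sensor body: returns (exit_code, status_line).

    A real plugin would measure queue_length itself, print the line,
    and pass the code to sys.exit() so Nagios can read it.
    """
    status = evaluate(queue_length, warn, crit)
    return status, format_result("QUEUE", status, f"{queue_length} jobs waiting")
```

Because the contract is only "exit code plus one line of text", the same sensor could be written in shell, Perl or C without Nagios noticing the difference.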

Nagios-based Grid Monitoring
- Monitoring CRO-GRID Infrastructure ( )
  - Globus Toolkit Pre-WS & WS, UNICORE, other services
  - active recovery of services
  - still in production within CRO NGI
- Monitoring EGEE resources in Central Europe (CE)
  - core services since mid 2006
  - all CE sites for 1st line support since September 2006
  - centralized deployment - single SRCE

Architecture

Grid Extensions
- Grid sensors
  - Security facilities & services: CA distribution, Certificate lifetime, MyProxy, VOMS, VOMS Admin
  - Monitoring & information services: R-GMA, BDII, MDS, GridICE
  - Job management services: Globus Gatekeeper, RB, WMS, WMProxy, Job matching
  - File management services: GridFTP, SRM, DPNS, LFC
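
As an illustration of what one of these sensors might do, here is a minimal sketch of the decision logic behind a certificate lifetime check. It is an assumption about how such a sensor could work, not the implementation described in the talk: the function name and the 14-day and 3-day thresholds are hypothetical, and reading the expiry date from an actual certificate is left out.

```python
from datetime import datetime, timedelta, timezone

# Nagios plugin exit codes used by this sketch.
OK, WARNING, CRITICAL = 0, 1, 2


def check_cert_lifetime(not_after, now, warn_days=14, crit_days=3):
    """Classify a certificate by the time left until it expires.

    not_after and now are timezone-aware datetimes; the thresholds are
    illustrative defaults, not values taken from the presentation.
    """
    days_left = (not_after - now).total_seconds() / 86400
    if days_left <= 0:
        return CRITICAL, "CRITICAL - certificate expired"
    if days_left < crit_days:
        return CRITICAL, f"CRITICAL - certificate expires in {days_left:.1f} days"
    if days_left < warn_days:
        return WARNING, f"WARNING - certificate expires in {days_left:.1f} days"
    return OK, f"OK - certificate valid for {days_left:.1f} days"


# Usage with fixed dates so the result is reproducible.
now = datetime(2007, 6, 1, tzinfo=timezone.utc)
code, line = check_cert_lifetime(now + timedelta(days=10), now)
```

Warning well before expiry gives the site admin time to renew the certificate before dependent services start failing.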

Grid Extensions
- Sensor hierarchy
- Automatic recovery
  - both local and remote services
  - security handled with sudo
- Certificate based authentication for the web interface
- NCG, SAM gatherer, Credential mgmt.
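
The idea behind a sensor hierarchy is that a failure is reported only at the point where it originates, so that admins are not flooded with alerts for everything downstream. The sketch below illustrates that idea in isolation; it is not Nagios's own dependency logic, and the service names and the `parents` mapping are hypothetical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def services_to_notify(status, parents):
    """Return failing services that have no failing ancestor.

    status maps a service name to its current state; parents maps a
    service to the services it depends on. Services missing from
    status are treated as OK.
    """
    def has_failing_ancestor(name, seen=()):
        for parent in parents.get(name, ()):
            if parent in seen:  # guard against dependency cycles
                continue
            if status.get(parent, OK) != OK:
                return True
            if has_failing_ancestor(parent, seen + (name,)):
                return True
        return False

    return sorted(
        name for name, state in status.items()
        if state != OK and not has_failing_ancestor(name)
    )


# Hypothetical site: SRM depends on GridFTP, which depends on the host.
status = {"host-alive": CRITICAL, "gridftp": CRITICAL, "srm": CRITICAL, "bdii": OK}
parents = {"gridftp": ["host-alive"], "srm": ["gridftp"]}
to_notify = services_to_notify(status, parents)
```

In this example only the host failure is reported; the GridFTP and SRM failures are treated as consequences of it.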

Statistics
- EGEE implementation statistics
  - 69 hosts
  - 570 services actively monitored
  - 1029 service results imported from SAM
- Nagios server statistics (last month)

Demo
- EGEE implementation web interface

Contributions to WLCG Grid Service Monitoring WG
- All sensors rewritten to be compliant with Probe specification
- Developed interface to Nagios data compliant with Data exchange format
- Nagios-based prototype
  - several grid extensions used (NCG, credential management, SAM gatherer)

Future Work
- Utilizing our extensions on site level
- Distributing monitoring deployment
  - hierarchy of Nagios servers
- Migration of credential management to robot certificates
- Further sensor development
- Service check execution optimization
  - active vs. passive checks
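
For context on the active vs. passive distinction: an active check is executed by Nagios itself, while a passive check is a result computed elsewhere and handed to Nagios through its external command file as a `PROCESS_SERVICE_CHECK_RESULT` line. The sketch below only formats such a line. Whether the system in the talk submits results this way is an assumption, and the host and service names in the usage example are hypothetical.

```python
def passive_result_line(timestamp, host, service, return_code, output):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command.

    The layout is
    [time] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    with semicolon-separated fields, terminated by a newline.
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if ";" in host or ";" in service:
        raise ValueError("host and service must not contain ';'")
    # The command must stay on one line, so fold any newlines in the output.
    clean_output = output.replace("\n", " ")
    return (
        f"[{int(timestamp)}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{clean_output}\n"
    )


# Usage: a result obtained elsewhere (for example imported from another
# monitoring system) turned into a line for the Nagios command file.
line = passive_result_line(1180000000, "ce.example.org", "job-submit", 0, "OK - job finished")
```

A gatherer process would append such lines to the command file configured for the Nagios server; the location of that file depends on the installation.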

Conclusions
- Nagios
  - highly configurable monitoring framework with notifications, service dependencies, …
  - simple, programming language-agnostic sensor API
- Grid extensions
  - integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
  - sensors for key grid services
- grid
  - enables better site availability
  - admins get only relevant notifications

Thank You! Questions?