Published by Annabella Ferguson. Modified over 8 years ago.
1
EGEE-III INFSO-RI-222667
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
The Dashboard for Operations
Cyril L’Orphelin - CNRS/IN2P3
Amsterdam, ROD team Workshop
2
Enabling Grids for E-sciencE EGEE-III INFSO-RI-222667 COD-16 Transition meeting
Context
The dashboard was developed during EGEE I, II and III at IN2P3-CC as part of the SA1 activity. The aim of the project was to gather on a single interface as much information as possible that is useful for daily operations, and to make it easier for operators to follow the procedures.
Initially the dashboard was hosted on the CIC Portal – http://cic.gridops.org
This portal is being migrated step by step to the Operations Portal – https://operations-portal.in2p3.fr
The dashboard is the first module we have migrated.
Since the beginning of May we have been involved in the JRA1 activity of EGI.
Our objectives for the coming year:
– Migrate other features
– Improve the dashboard module (regional helpdesk, ergonomics, ...)
– Distribute these features in a package
– Provide programmatic interfaces (XML, JSON)
3
The dashboard is a tool designed to follow and track problems on sites. It is an integration platform that offers a synoptic view of different data sources:
– Gstat, a tool monitoring the information published by sites
– Nagios, a system and network monitoring application
– SAM, a job submission framework (used for VO-specific tests)
– GOC DB, the database of sites
– GGUS, the global ticketing system
– BDII, an LDAP repository with dynamic information published by sites
In summary, you track problems using the results from the monitoring tools (Nagios, Gstat), and you can open and update trouble tickets in GGUS directly from the dashboard. We also use GOC DB and BDII to consolidate the monitoring information with downtime information and dynamic statuses.
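To make the consolidation step concrete, here is a minimal sketch of how monitoring alarms could be combined with downtime information. It is not the dashboard's actual code: the `Alarm` and `Downtime` records, their fields, and the `consolidate` function are hypothetical names chosen for illustration, and real data would come from Nagios and GOC DB rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Alarm:
    # Hypothetical shape of a monitoring alarm (e.g. derived from Nagios).
    site: str
    host: str
    test: str
    raised_at: datetime


@dataclass(frozen=True)
class Downtime:
    # Hypothetical shape of a declared downtime (e.g. taken from GOC DB).
    site: str
    start: datetime
    end: datetime


def in_downtime(alarm, downtimes):
    """Return True if the alarm was raised while its site had a declared downtime."""
    return any(
        d.site == alarm.site and d.start <= alarm.raised_at <= d.end
        for d in downtimes
    )


def consolidate(alarms, downtimes):
    """Group alarms per site and flag those raised during a downtime."""
    view = {}
    for alarm in alarms:
        view.setdefault(alarm.site, []).append(
            {
                "host": alarm.host,
                "test": alarm.test,
                "in_downtime": in_downtime(alarm, downtimes),
            }
        )
    return view
```

An operator-facing view could then display alarms flagged `in_downtime` differently from the others, since a failure during a declared downtime is expected.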
4
Central Instance: Architecture
5
The main page
6
1st Level: the synoptic view
Site Name + info
Alarms
Tickets
Downtimes
Global Information
Actions:
– Open a ticket without “alarms”
– Send a notepad to the site
– See the graph of alarms or downtimes
– Refresh information
7
2nd Level: Access to details
8
Other Pages
C-COD view (restricted access)
A synoptic view of information related to problematic sites (alarms older than 72 h, expired tickets, tickets open for more than one month)
https://operations-portal.in2p3.fr/index.php/dashboard/ccodView
Handover
A tool to report or share problems between regional teams or with the C-COD team
https://operations-portal.in2p3.fr/index.php/dashboard/handover
User List
Set up your own lists of sites to use in the dashboard.
https://operations-portal.in2p3.fr/index.php/dashboard/userPreferences
Regional List
View regional information (contacts, responsible persons)
https://operations-portal.in2p3.fr/index.php/dashboard/regionalPreferences
GridMap
Visualizing the state of your grid with GridMaps
How-to // User documentation
https://edms.cern.ch/file/1015741/3/dashboardHOWTO.pdf
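The selection criteria of the C-COD view can be sketched as a simple filter. This is an illustration only, not the portal's implementation: the tuple layouts, the `problematic_sites` function, and the reading of "one month" as 30 days are assumptions made for the example.

```python
from datetime import datetime, timedelta

ALARM_MAX_AGE = timedelta(hours=72)
TICKET_MAX_AGE = timedelta(days=30)  # assumption: "one month" taken as 30 days


def problematic_sites(alarms, tickets, now):
    """Return the set of site names matching at least one C-COD criterion.

    alarms:  iterable of (site, raised_at) tuples (hypothetical layout)
    tickets: iterable of (site, opened_at, expires_at) tuples (hypothetical layout)
    """
    sites = set()
    for site, raised_at in alarms:
        # Criterion 1: alarm older than 72 hours.
        if now - raised_at > ALARM_MAX_AGE:
            sites.add(site)
    for site, opened_at, expires_at in tickets:
        # Criterion 2: ticket expired. Criterion 3: ticket open for over a month.
        if expires_at < now or now - opened_at > TICKET_MAX_AGE:
            sites.add(site)
    return sites
```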
9
Nagios Integration
10
The “notifications” work-flow
The global work-flow is based on the exchange of notifications between Nagios and the Lavoisier WS.
The decision to send out notifications is made in the service check and host check logic:
– When a hard state change occurs. More information on state types and hard state changes can be found here:
  http://nagios.sourceforge.net/docs/2_0/statetypes.html
  http://nagios.sourceforge.net/docs/2_0/notifications.html
– When a host or service remains in a hard non-OK state and the time specified by the option in the host or service definition has passed since the last notification was sent out.
At this point Lavoisier is connected to the broker and to the topic corresponding to all notifications.
We apply a filter on these notifications:
– on the role, the name of the ROC/NGI and the hostname, to distinguish the Nagios box at CERN from the Nagios boxes in the regions
– on the test name and the status, to keep only the tests defined as critical
If a notification passes the filter, we send an acknowledgment notification to a specific broker and we register the notification in the DB.
If a notification is already registered in the DB for the given host and service, we just update its status.
The acknowledgment mechanism makes it possible to send the notifications again in case of a problem (in the notification system or in our Web Service).
These different steps may explain some differences between what you see in the Nagios interface and in the dashboard interface.
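The filter, acknowledgment and registration steps described above can be sketched as follows. This is a simplified in-memory illustration, not the Lavoisier implementation: the `Notification` fields, the `NotificationStore` class and its methods are hypothetical, the broker is replaced by a plain list of acknowledgments, and the DB by a dictionary keyed on (host, service). The exact status filter is not specified in the slides, so the sketch simply accepts the four standard Nagios service states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    role: str         # role of the emitting Nagios box
    region: str       # name of the ROC/NGI
    nagios_host: str  # hostname of the Nagios box that sent the notification
    host: str         # monitored host
    service: str      # test name
    status: str       # e.g. "OK", "WARNING", "CRITICAL", "UNKNOWN"


class NotificationStore:
    """In-memory stand-in for the filter + acknowledgment + DB steps."""

    def __init__(self, allowed_sources, critical_tests,
                 accepted_statuses=("OK", "WARNING", "CRITICAL", "UNKNOWN")):
        self.allowed_sources = set(allowed_sources)  # (role, region, nagios_host)
        self.critical_tests = set(critical_tests)
        self.accepted_statuses = set(accepted_statuses)
        self.records = {}  # (host, service) -> latest status
        self.acks = []     # acknowledgments that would go to the broker

    def accepts(self, n):
        """Apply the two filters: notification source, then test name and status."""
        source = (n.role, n.region, n.nagios_host)
        return (
            source in self.allowed_sources
            and n.service in self.critical_tests
            and n.status in self.accepted_statuses
        )

    def handle(self, n):
        """Acknowledge and register a notification; return False if filtered out."""
        if not self.accepts(n):
            return False
        self.acks.append((n.host, n.service, n.status))
        # One record per (host, service): insert it, or update its status.
        self.records[(n.host, n.service)] = n.status
        return True
```

Keying the records on (host, service) means a later notification for the same pair overwrites the earlier status instead of creating a second record, which is one reason the dashboard can show fewer entries than the Nagios interface.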
11
Next steps
Nagios filtering improvements:
– A dynamic filter is in place, based on an on-line configuration file provided by the Nagios team
– Add to this filter the list of critical tests per ROC
– The Problem ID is not used as a primary key for the notifications. This means that only one record will be active at a time for a given host and a given test.
– An acknowledgment mechanism is in place. It could be used in case of a problem between the notification system and Lavoisier.
Other improvements:
– On the main site view, your default site list will be loaded directly.
– Access will no longer be limited to people registered in GOC DB (a certificate is enough)
– A new alarm might be masked by an assigned one
– Optimize the DB to increase performance
12
Regional package
The application has been modified to cope completely with a regional context.
A synchronization system is in place to exchange information with the future regional instances.
The package will be provided as 2 modules:
– The Lavoisier Web Service => an RPM file (downloadable from SVN)
– The PHP part => direct checkout from SVN
The package will be released on June 8th:
– The Czech NGI and the Portuguese NGI will evaluate the package first
– After this, the package will be distributed more widely
13
Regional Package - Synchronization
14
Links
Lavoisier Web Service: http://grid.in2p3.fr/lavoisier/
Operations Portal
– Documentation, paper, posters: http://cic.gridops.org/index.php?section=home&page=generaldoc
– Tracking system (you can also use GGUS): https://forge.in2p3.fr/projects/show/opsportaluser
Dashboard
– URL: https://operations-portal.in2p3.fr
– User documentation: https://edms.cern.ch/file/1015741/3/dashboardHOWTO.pdf
To contact us: cic-information@in2p3.fr