Published by Gerard French. Modified over 9 years ago.
1
May 12, 2008 Overview on monitoring tools for Grid Systems - Antonio Pierro (INFN-BARI)
Overview of monitoring tools for Grid Systems
Varenna, 12 May 2008
Antonio Pierro, INFN-BARI (Italy)
Antonio.pierro ba.infn.it
2
Outline
Overview of EGEE monitoring tools:
- SAM (Service Availability Monitoring)
- GridMap
- GStat (Global Grid Information Monitoring System)
- GridView
- GridICE (infrastructure and application monitoring)
3
Why do we need monitoring?
- Resource utilization and performance evaluation: resource observability is needed for optimized Grid utilization.
- Management decisions: to reduce the time spent waiting for resource availability, and to always be aware of what is happening.
- Debugging: to help the operations team locate and troubleshoot problems, since Grid resources and services are subject to failures.
4
Requirements for a Grid monitoring tool
- Scalable
- Dynamic
- Robust
- Should be integrated with other Grid technologies and middleware (security infrastructure, resource brokers, schedulers, ...)
5
SAM (introduction)
Service Availability Monitoring framework (SAM):
- Monitors all grid services and nodes, not only the CE
- Used in the validation process of sites and services
- SAM wiki: http://goc.grid.sinica.edu.tw/gocwiki/SAM
- SAM portal: https://lcg-sam.cern.ch:8443/sam/sam.py
- Service and site status are recorded (several snapshots per day)
- Daily, weekly and monthly availability is calculated by integration (averaging) over the given period
- Official evaluation of T0, T1 and T2 sites
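The slides do not spell out how SAM integrates the recorded snapshots, so the following is only a minimal sketch of one plausible reading: each snapshot's status is assumed to hold until the next snapshot (a step function), and availability is the fraction of the period spent in the "up" state. The function name `availability` and the sample data are illustrative, not part of SAM.

```python
from datetime import datetime, timedelta


def availability(snapshots, start, end):
    """Fraction of [start, end) during which the service was up.

    snapshots: list of (timestamp, is_up) sorted by time. Assumption: a
    status holds until the next snapshot is taken.
    """
    total = (end - start).total_seconds()
    if total <= 0:
        raise ValueError("empty period")
    up = 0.0
    for i, (ts, is_up) in enumerate(snapshots):
        nxt = snapshots[i + 1][0] if i + 1 < len(snapshots) else end
        lo, hi = max(ts, start), min(nxt, end)  # clip the interval to the period
        if is_up and hi > lo:
            up += (hi - lo).total_seconds()
    return up / total


day = datetime(2008, 5, 12)
snaps = [
    (day, True),                          # up from 00:00
    (day + timedelta(hours=6), False),    # down from 06:00
    (day + timedelta(hours=12), True),    # up again from 12:00
]
daily = availability(snaps, day, day + timedelta(days=1))
print(daily)
```

The same function gives weekly or monthly figures by widening `start` and `end`.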
6
SAM (performed tests) 1/2
CE
- job submission: UI->RB->CE->WN chain
- version of CA certificates installed (on WN!) and of middleware software (on WN!)
- replica management tests: using lcg-utils, the default SE defined on the WN and a selected "central" SE
- accessibility of the experiment software directory: environment variable, directory existence
- accessibility of VO tag management tools
- other tests: R-GMA client check, APEL accounting records
SE, SRM
- storing a file from the UI: using the lcg-cr command with LFC registration
- getting the file back to the UI: using the lcg-cp command
- removing the file: using the lcg-del command with LFC de-registration
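The SE/SRM tests form a dependent sequence: a file must be stored before it can be fetched back or removed. A minimal sketch of such a chain runner follows. The `run_chain` helper and the "skipped" status are assumptions made for illustration, and the step functions are in-memory stand-ins rather than real lcg-cr, lcg-cp or lcg-del invocations.

```python
def run_chain(steps):
    """Run (name, callable) steps in order; stop testing after the first failure.

    Each callable returns True on success. Steps that are never reached are
    reported as "skipped" because they depend on the earlier ones.
    """
    results = {}
    failed = False
    for name, step in steps:
        if failed:
            results[name] = "skipped"
            continue
        try:
            ok = bool(step())
        except Exception:
            ok = False
        results[name] = "ok" if ok else "error"
        failed = not ok
    return results


# Hypothetical stand-ins for the store / get / remove commands.
storage = set()


def store():
    storage.add("testfile")
    return True


def fetch():
    return "testfile" in storage


def remove():
    storage.discard("testfile")
    return True


se_results = run_chain([("store", store), ("get", fetch), ("remove", remove)])
broken = run_chain([("store", lambda: False), ("get", fetch), ("remove", remove)])
print(se_results)
print(broken)
```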
7
SAM (performed tests) 2/2
LFC
- directory listing: using the lfc-ls command on /grid
- creating a file entry in the /grid/ area
FTS
- checking whether FTS is published correctly in the BDII
- channel listing: using the glite-transfer-channel-list command with the ChannelManagement service
- transfer test (in development)
Standalone tests
- GSTAT, RB
VO-specific tests as well
8
SAM - CE sensor tests: France region, VO OPS
9
- OK: normal status
- Error: subject has failed and the problem is localized
- subject may fail soon

*** Running R-GMA client test on alifarm57.ct.infn.it ***
Inserting tuple:
ERROR: Could not contact R-GMA server at grid005.ct.infn.it:8443 – (104, 'Connection reset by peer')
ERROR: Could not contact R-GMA server at grid005.ct.infn.it:8443 – (104, 'Connection reset by peer')
Failed
Timeout when executing test CE-sft-rgma after 600 seconds!
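The sample output shows a test being abandoned after a 600-second timeout. How SAM enforces this is not described in the slides; the sketch below shows one common way to do it with Python's standard `subprocess` module. The `run_test` helper and the mapping of a timeout to the "error" status are assumptions.

```python
import subprocess
import sys


def run_test(cmd, timeout_s):
    """Run a test command and return (status, message)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return "error", f"Timeout when executing test after {timeout_s} seconds!"
    if proc.returncode == 0:
        return "ok", proc.stdout.strip()
    return "error", proc.stderr.strip() or proc.stdout.strip()


# Two toy "tests": one that hangs and one that finishes immediately.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('done')"]
slow_result = run_test(slow, 0.5)
fast_result = run_test(fast, 10)
print(slow_result)
print(fast_result)
```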
10
GridMap
- It publishes the same data as SAM in a different way.
- It is a simple, interactive and user-friendly interface for viewing the state of the Grid.
- Sites or services of the Grid are represented by rectangles of different size and colour, allowing two dimensions of data to be visualized simultaneously.
- This representation of monitoring data requires much less space than conventional sorted tables or bar charts.
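To illustrate how size and colour can carry two independent metrics, here is a minimal sketch that slices a single strip into rectangles. It is not GridMap's actual layout algorithm (a full treemap is more involved). The choice of metrics (a generic size value and an availability fraction), the colour thresholds and the site names are all assumptions.

```python
def colour_for(availability):
    """Map availability in [0, 1] to a traffic-light colour (illustrative thresholds)."""
    if availability >= 0.9:
        return "green"
    if availability >= 0.5:
        return "orange"
    return "red"


def layout_row(sites, width=100.0, height=10.0):
    """Slice a width x height strip into one rectangle per site.

    sites: list of (name, size_metric, availability). Rectangle area is
    proportional to size_metric; colour encodes availability.
    """
    total = sum(size for _, size, _ in sites)
    x = 0.0
    rects = []
    for name, size, avail in sites:
        w = width * size / total
        rects.append({"site": name, "x": x, "w": w, "h": height,
                      "colour": colour_for(avail)})
        x += w
    return rects


rects = layout_row([("SITE-A", 300, 0.98), ("SITE-B", 100, 0.40)])
for r in rects:
    print(r)
```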
11
GridMap prototype: visualizing the state of the grid (screenshot: SAM test, daily availability)
12
GridView 1/2
- It is a visualization system for viewing monitoring information.
- Approach:
  - Collects monitoring information from different sources, e.g. SAM, GridFTP monitor, RB logs
  - The monitoring records are stored in a central Oracle database at CERN
  - Visualization of summary data through a Web interface
- Target: Grid operators, site administrators, VO managers
13
GridView (web page) 2/2
Statistics on:
- data transfers
- running jobs
- service availability
14
GStat 1/2
- GStat is built using Python scripts that generate web-based reports, used by Grid site administrators to troubleshoot Information System issues or to access usage information.
- GStat scripts are executed periodically to query and collect the information published by each site in the Grid infrastructure.
- The published information is then processed by an extensible analysis framework that checks for IS failures and errors.
- Target: Grid operators, site administrators
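The slides do not show GStat's code, so the following is only a sketch of what an "extensible analysis framework" can look like in Python: checks are plain functions added to a registry, and the runner applies every registered check to each site's collected record. The attribute names (`entries`, `total_cpus`, `free_cpus`) and the two checks are hypothetical.

```python
CHECKS = []


def check(fn):
    """Register an analysis function; new checks plug in without touching the runner."""
    CHECKS.append(fn)
    return fn


@check
def has_entries(site):
    if not site.get("entries"):
        return "no information published"


@check
def sane_cpu_counts(site):
    if site.get("free_cpus", 0) > site.get("total_cpus", 0):
        return "free CPUs exceed total CPUs"


def analyse(sites):
    """Return {site_name: [problem, ...]} for every collected site record."""
    report = {}
    for name, record in sites.items():
        problems = []
        for fn in CHECKS:
            msg = fn(record)
            if msg:
                problems.append(msg)
        report[name] = problems
    return report


# Stand-in for the data a periodic query would have collected.
collected = {
    "SITE-A": {"entries": 42, "total_cpus": 100, "free_cpus": 20},
    "SITE-B": {"entries": 0, "total_cpus": 10, "free_cpus": 50},
}
report = analyse(collected)
print(report)
```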
15
GStat 2/2
The main page of GStat shows the overall status and usage statistics for each site.
Screenshots: GStat site detailed report; GStat site resource status.
16
GridICE: Overview
- It is a distributed monitoring tool for Grid systems.
- It is evolving in the context of EU-EGEE and many other EU Grid projects.
- It is fully integrated with the gLite-3.x middleware.
- Self-configurable collection and presentation: just give the URL of the root Grid Information Service (GIS).
- Installed servers are monitoring Grid resources in the scope of: EGEE, EGEE-SWE, RDIG, EGEE-SEE, Grid.it, GILDA, CMS, ATLAS, EUMedGrid, EUChinaGrid, EUIndiaGrid, BalticGrid, LIBI, BioinfoGRID, EELA, OMII, BeGrid.
17
Recent evolution of GridICE
Lightweight sensor + VOMS information
Attributes measured by the Job Monitoring sensor
To reduce its intrusiveness in terms of resource consumption:
- Two daemons run continuously and a probe is executed periodically
- The daemons listen to a set of log files and collect the relevant information
- Only a few LRMS commands are issued to retrieve job status
- The status of all jobs is stored in a cache (stateful behaviour)
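The stateful design can be sketched as a cache that daemons update from log events, so that the periodic probe reads the cache instead of querying the LRMS for every job. This is an illustration only: the log line format `<job_id> <STATE>`, the state names and the `JobCache` class are hypothetical, not the GridICE sensor's real interface.

```python
class JobCache:
    """Keep the last known state of every job, updated from log events."""

    def __init__(self):
        self.jobs = {}

    def feed(self, line):
        """Parse one log line of the hypothetical form '<job_id> <STATE>'."""
        parts = line.split()
        if len(parts) != 2:
            return  # ignore lines that do not describe a job transition
        job_id, state = parts
        self.jobs[job_id] = state

    def counts(self):
        """Probe-side summary: number of jobs per state, read from the cache only."""
        summary = {}
        for state in self.jobs.values():
            summary[state] = summary.get(state, 0) + 1
        return summary


cache = JobCache()
log_lines = ["101 QUEUED", "102 QUEUED", "101 RUNNING",
             "garbage line here", "102 RUNNING", "101 DONE"]
for line in log_lines:
    cache.feed(line)
print(cache.counts())
```

Because each event only overwrites one dictionary entry, the cost of the probe does not grow with the number of LRMS queries, which is the point of the lightweight design.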
18
Integration with local monitoring systems (LEMON)
Grid monitoring integrated with local monitoring
- The latest server version is very simple to install
- The client installation can be turned on in the standard LCG middleware installation (no additional operations are needed)
- The LEMON monitoring system and alarm management are integrated in the new version of the GridICE server
- The local sensors currently used for farm monitoring can be interfaced with GridICE to collect all the available data
- The back-end is implemented with LEMON
- Local farm monitoring setups that use LEMON can be integrated with GridICE
19
LRMSinfo
The LRMS Info sensor provides aggregated information about the Local Resource Management System.
20
We focus on the following categories of users:
- VO manager: the actual set of resources accessible to VO members ("How many jobs submitted by my users are running or queued?", with details of the VOMS groups and/or single users)
- Grid operator: all resources under the responsibility of a Grid Operator Center ("How many resources are available?")
- Site administrator: the site resources offered to a Grid ("Is there any service down?")
- Grid users: the status of their jobs on a grid
21
How do we identify the user/role?
- Users are identified by the digital certificate installed in their browser:
  - a valid CA certificate
  - a server based on the https protocol
- The new sensors are able to retrieve the VOMS information:
  - VOMS information: groups and roles of the users submitting the jobs
- The related role (e.g. site manager, VO manager) can be retrieved from the GridICE database.
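Group and role information from VOMS is conventionally written as an FQAN string such as `/cms/italy/Role=production/Capability=NULL`. The sketch below parses that form and looks up a portal role by certificate DN. The `ROLE_DB` dictionary is a hypothetical stand-in for the GridICE database, whose schema the slides do not describe, and the sample DN is invented.

```python
def parse_fqan(fqan):
    """Split a VOMS FQAN into VO, group path and role (None if Role=NULL or absent)."""
    groups = []
    role = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capability is not used in this sketch
        else:
            groups.append(part)
    return {"vo": groups[0], "group": "/" + "/".join(groups), "role": role}


# Hypothetical stand-in for the role table kept in the GridICE database.
ROLE_DB = {"/C=IT/O=INFN/CN=Example Manager": "VO manager"}


def portal_role(cert_dn):
    """Users with no registered role fall back to 'standard user'."""
    return ROLE_DB.get(cert_dn, "standard user")


info = parse_fqan("/cms/italy/Role=production/Capability=NULL")
plain = parse_fqan("/cms/Role=NULL/Capability=NULL")
print(info, plain)
```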
22
"Standard user" monitoring (1)
A user who has no jobs submitted and no role registered
23
"Standard user" monitoring (2)
An authenticated user sees only his/her own jobs
24
"Standard user" monitoring (3)
An authenticated user sees only his/her own jobs
- exit status = 0 => successful jobs
- exit status <> 0 => failed jobs
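The two rules on this slide (users see only their own jobs, and the exit status decides success or failure) can be sketched as follows. The job record layout (`owner`, `exit_status`), the sample DNs and the "not finished" label for jobs without an exit status are assumptions.

```python
def visible_jobs(jobs, user_dn):
    """An authenticated user sees only the jobs submitted under their own DN."""
    return [job for job in jobs if job["owner"] == user_dn]


def classify(job):
    """exit status = 0 => successful; non-zero => failed; None => not finished yet."""
    status = job["exit_status"]
    if status is None:
        return "not finished"
    return "successful" if status == 0 else "failed"


jobs = [
    {"id": 1, "owner": "/CN=Alice", "exit_status": 0},
    {"id": 2, "owner": "/CN=Alice", "exit_status": 137},
    {"id": 3, "owner": "/CN=Bob", "exit_status": 0},
    {"id": 4, "owner": "/CN=Alice", "exit_status": None},
]
mine = visible_jobs(jobs, "/CN=Alice")
summary = [(job["id"], classify(job)) for job in mine]
print(summary)
```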
25
Grid monitoring from the VO Manager perspective
26
Grid monitoring from the Site Manager perspective
27
Acronyms and Abbreviations (1):
ACL - Access Control List
APEL - Accounting Processor for Event Logs
API - Application Programming Interface
BDII - Berkeley Database Information Index
CA - Certificate Authority
CE - Computing Element: a Grid-enabled computing resource
CERN - European Organisation for Nuclear Research
GIIS - Grid Index Information Service. MDS index node. Aggregates information
dCache - (disk pool management system)
DN - Distinguished Name (X.500, LDAP)
EGEE - Enabling Grids for E-sciencE
FTS - File Transfer Service (EGEE)
GARR - Gruppo per l'Armonizzazione delle Reti della Ricerca
GGUS - Global Grid User Support
GIIS - Grid Information Index Server
GILDA - Grid Infn Laboratory for Dissemination Activities
GRIS - Grid Resource Information Service. Collects information for MDS.
IN2P3 - Institut National de Physique Nucléaire et de Physique des Particules
INFN - Istituto Nazionale di Fisica Nucleare (in Italy)
ISO - International Organization for Standardization
JDL - Job Description Language
LB - Logging and Bookkeeping service
LEMON - LHC Era Monitoring
28
Acronyms and Abbreviations (2):
LCG - LHC Computing Grid
LDAP - Lightweight Directory Access Protocol
LDIF - LDAP Data Interchange Format
LDN - Logical Dataset Name
LFC - LCG File Catalog
LFN - Logical File Name
LHC - Large Hadron Collider. Under construction. Hosts CMS, ATLAS, and other experiments.
LRMS - Local Resource Management System
MDS - Meta Directory Service, or Monitoring and Discovery Service (Globus)
MPI - Message Passing Interface (Globus)
PhEDEx - Physics Experiment Data Export (CMS)
RFIO - Remote File I/O
R-GMA - Relational Grid Monitoring Architecture (EGEE). A monitoring system similar to MDS
ROC - Regional Operations Centre
RLS - Replica Location Service
SE - Storage Element
SOAP - Simple Object Access Protocol
SRM - Storage Resource Manager
VO - Virtual Organization, e.g., an experiment
VOBOX - VO box
VOMRS - Virtual Organization Management Registration Service
VOMS - Virtual Organization Membership Service
X.509 - (ITU-T standard for Public-key and attribute certificate frameworks)
29
References
SAM
- http://goc.grid.sinica.edu.tw/gocwiki/SAME_Planning
- https://lcg-sam.cern.ch:8443/sam/sam.py?sensors=CE&regions=
GridMap
- http://gridmap.cern.ch/gm/
- http://cerncourier.com/cws/article/cnl/31986
GStat
- http://goc.grid.sinica.edu.tw/gstat/
GridView
- Portal: http://gridview.cern.ch/
- TWiki: https://twiki.cern.ch/twiki/bin/view/LCG/GridView
GridICE
- http://gridice.forge.cnaf.infn.it/
30
Conclusions
- There are several monitoring tools available for the Grid system.
- Which tool should you use? It depends on your role in the Grid.
- Sometimes you may need several tools at the same time to satisfy your needs.
31
Thank You