Grid Monitoring from a GOC perspective. John Hicks, HPCC Engineer, Indiana University. October 27, 2002, Internet2 Fall Members.


Grid Monitoring from a GOC perspective
John Hicks, HPCC Engineer, Indiana University
October 27, 2002
Internet2 Fall Members meeting, HENP Working Group, Los Angeles

Overview
This presentation is concerned with work being done for the iVDGL/iGOC demonstration at SuperComputing.
- Identifying the issues
- NOC vs. iGOC
- Getting information
- GOC tools
Web site:

Identifying the issues
What role should the GOC play in grid monitoring?
- Should the GOC just collect and publish general information about the grid status?
- Should the GOC collect information for troubleshooting problems?
- Should the GOC try to direct traffic and identify potential problems, analogous to an air traffic controller (suggested by Saul Youssef, Boston University)?

Identifying the issues (cont.)
What are some of the potential problems the GOC can help solve?
- Resource status and availability.
  - Computational node
  - Storage node
  - Network
  - Services (MDS)
- Resource availability can be determined with something as simple as a ping.
- Resource status depends on the measurement criteria.
  - What is the machine's current load?
  - How much disk space is available?
  - What is the measured network throughput between nodes?
  - Are LDAP services available on this machine?
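
The status questions on this slide can be sketched in a few lines of Python. This is only an illustration, not the GOC's actual tooling: the function names are hypothetical, and a plain TCP connection attempt stands in for a ping, because a real ICMP ping normally needs raw-socket privileges.

```python
import os
import shutil
import socket


def load_average_1min():
    """Return the 1-minute load average of the local machine (Unix only)."""
    return os.getloadavg()[0]


def free_disk_fraction(path="/"):
    """Return the fraction of space still free on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def tcp_service_available(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a stand-in for "is the service reachable?"; it does not speak
    the service's protocol (for example LDAP), it only opens a connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_service_available("gatekeeper.example.org", 389)` would test whether anything is listening on the standard LDAP port of a hypothetical host; an MDS server may well listen on a different port.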

Identifying the issues (cont.)
What information does the GOC need to help solve problems?
- What data needs to be gathered?
  - Grid centric (MDS).
  - OS centric (Ganglia, Nagios).
  - Network centric (SNMP, other network monitoring tools).
- What is the data acquisition frequency?
  - Static (total number of nodes in a cluster).
  - Dynamic but infrequent (number of available nodes).
  - Dynamic and frequent (jobs running on a cluster).
  - Realtime (available network bandwidth).
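
One way to picture the four acquisition-frequency classes is as a polling table. The sketch below is hypothetical: the metric names mirror the slide's examples, but the interval values are illustrative assumptions, not numbers from the presentation.

```python
# Illustrative polling intervals in seconds. These numbers are assumptions
# chosen for the example, not values given in the presentation.
POLL_INTERVAL_SECONDS = {
    "static": 24 * 3600,         # e.g. total number of nodes in a cluster
    "dynamic_infrequent": 3600,  # e.g. number of available nodes
    "dynamic_frequent": 300,     # e.g. jobs running on a cluster
    "realtime": 10,              # e.g. available network bandwidth
}

# Hypothetical metric names mapped to the frequency class they belong to.
METRIC_CLASS = {
    "total_nodes": "static",
    "available_nodes": "dynamic_infrequent",
    "running_jobs": "dynamic_frequent",
    "available_bandwidth": "realtime",
}


def is_due(metric, last_polled, now):
    """Return True if the metric should be collected again at time now.

    Both last_polled and now are timestamps in seconds.
    """
    interval = POLL_INTERVAL_SECONDS[METRIC_CLASS[metric]]
    return now - last_polled >= interval


def due_metrics(last_polled_by_metric, now):
    """Return the sorted names of all metrics whose interval has elapsed."""
    return sorted(
        name
        for name, last in last_polled_by_metric.items()
        if is_due(name, last, now)
    )
```

Separating the classes this way keeps rarely changing data from being collected as often as fast-changing data.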

NOC vs. iGOC
- The Global NOC provides first-level support for network-related problems, typically over networks within its domain of control.
- The iGOC should provide first-level support for network, facility, and infrastructure-related problems, not necessarily within its domain of control.
- The Global NOC has network engineers on staff. As far as I know, there is no such thing as a grid engineer.
- NOC performance monitoring usually has a demarcation point (e.g. wall jack, edge device, etc.) within a homogeneous network.
- GOC performance monitoring must measure end-to-end performance in a heterogeneous network and end-node environment.
- The GOC must use the NOC as a resource for solving problems.

Getting information
A key component of a successful GOC is accurate contact information. In order to solve problems or monitor resources, you have to know who to talk to.
We are currently collecting the following contact information from each site on the grid.
- High Performance Computing (HPC) contact.
- Principal Investigator (PI).
- Network person or local NOC contact.
- Security.
- Storage.
- System administrator.

GOC tools
We are using and developing the following tools to meet the GOC monitoring requirements.
- Nagios
- Ganglia
- LDAP tools
- GOC and other tools

Nagios
Nagios® is a host and service monitor designed to inform you of network problems and end system problems.
- Nagios provides simple ping availability of resources on the network.
- Nagios works with a set of "plugins" to provide local and remote host service status.
- Custom "plugins" are relatively easy to develop.
- Different methods are provided for remote resource discovery.
Nagios is freely available from

Nagios
Currently using the following built-in Nagios plugins:
- check_users
- check_load
- check_disk
- check_procs
- check_mem
Current Nagios plugin development:
- check_nagios (see if a remote Nagios is running).
- check_aggregate (summarize and collect the status of a group of services).
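
To make the plugin idea concrete, here is a minimal sketch of threshold classification and of summarizing a group of service states, in the spirit of check_load and check_aggregate. It is not the GOC's plugin code. The numeric codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the severity ordering used when aggregating, and all function names, are assumptions made for this example.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed ordering from least to most severe when combining several states.
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]


def classify(value, warn, crit):
    """Map a measurement to a plugin status using two thresholds."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def aggregate(statuses):
    """Return the most severe status in the group (UNKNOWN if it is empty)."""
    statuses = list(statuses)
    if not statuses:
        return UNKNOWN
    return max(statuses, key=SEVERITY.index)


def summary_line(name, statuses):
    """Return (status, one-line text) summarizing a group of services."""
    statuses = list(statuses)
    worst = aggregate(statuses)
    counts = ", ".join(
        f"{NAMES[s]}={statuses.count(s)}"
        for s in (OK, WARNING, CRITICAL, UNKNOWN)
    )
    return worst, f"{name} {NAMES[worst]} - {counts}"
```

A command-line wrapper around this would print the text line and exit with the status code, which is how a monitoring server reads a plugin's result.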

Nagios
There are different ways Nagios can get information from plugins.
- nrpep (perl version of nrpe).
- check_by_ssh (passive).
- check_by_ssh (active).
Nagios remote plugin execution (perl).
- Easy to use once set up.
- Uses MD5 and TripleDES.
- Scales reasonably well for a large number of hosts.
- Must have remote root access to set up.

Nagios
check_by_ssh (passive).
- Easy to use once set up.
- sshd is already running in most places.
- Requires a crontab entry to push data to the server.
- Scales reasonably well for a large number of hosts.
check_by_ssh (active).
- Easy to use once set up.
- sshd is already running in most places.
- Does not scale well for a large number of hosts.
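
In the passive case, the monitored host produces the result and the server only has to accept it. The sketch below formats one result as a `PROCESS_SERVICE_CHECK_RESULT` line, the external-command form Nagios uses for passive service checks. How the line travels to the server (for example a cron job writing over ssh) is not shown, and the host and service names are hypothetical.

```python
import time


def passive_result_line(host, service, return_code, output, timestamp=None):
    """Format one passive service check result for Nagios.

    The line has the form:
      [epoch] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
    Newlines in the plugin output are flattened so the command stays on
    a single line.
    """
    if timestamp is None:
        timestamp = int(time.time())
    clean_output = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{clean_output}"
    )


# Hypothetical example: node01 reports an OK load check.
example = passive_result_line(
    "node01", "check_load", 0, "LOAD OK", timestamp=1035700000
)
```

A crontab entry on each monitored host would run the checks on a schedule and deliver lines like this to the server, which is why the passive approach keeps the per-host work off the central machine.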

Nagios
Current iVDGL Nagios implementation for the SuperComputing demo consists of a star topology.
- One Nagios server.
- Using check_by_ssh (passive).
- Does not scale well.
- Quick and dirty demo installation.
http://datatag-nagios.pi.infn.it
Proposed persistent GOC Nagios infrastructure.
- Run a Nagios server at the gatekeeper of each cluster.
- Gatekeeper Nagios only responsible for local site.
- Aggregate summary information and send to regional Nagios server.
- GOC maintains Meta Nagios with grid health status.
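
The proposed hierarchy (gatekeeper, regional server, GOC meta server) can be sketched as a roll-up of per-state counts. This is a hypothetical illustration of the data flow only; the site names, the summary layout, and the severity ordering are assumptions, and no Nagios API is involved.

```python
from collections import Counter

# Assumed ordering from least to most severe.
SEVERITY = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}


def site_summary(service_states):
    """Gatekeeper level: count local services per state.

    service_states maps a service name to one of the SEVERITY keys.
    """
    counts = Counter(service_states.values())
    return {state: counts.get(state, 0) for state in SEVERITY}


def merge_summaries(summaries):
    """Regional or GOC level: add up the summaries received from below."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return {state: total.get(state, 0) for state in SEVERITY}


def worst_state(summary):
    """Return the most severe state that has a non-zero count."""
    present = [state for state, count in summary.items() if count > 0]
    if not present:
        return "UNKNOWN"
    return max(present, key=SEVERITY.get)


# Hypothetical sites reporting to one regional server.
site_a = site_summary({"load": "OK", "disk": "WARNING"})
site_b = site_summary({"load": "OK", "disk": "OK", "mds": "CRITICAL"})
region = merge_summaries([site_a, site_b])
grid = merge_summaries([region])
```

Because only small summaries move upward, each level handles a number of inputs proportional to its direct children rather than to every host on the grid.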

Ganglia
Ganglia provides a complete pseudo real-time monitoring and execution environment.
- Ganglia provides a mechanism to link not only the nodes of a cluster but also entire clusters to one another.
- The Ganglia Monitoring Daemon (gmond) is a multithreaded daemon that runs on each node that you want to monitor.
- The Ganglia Meta Daemon (gmetad) allows you to monitor clusters.
- The Ganglia web front end uses PHP and RRDTool.
Ganglia is freely available at
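
Ganglia daemons exchange cluster state as XML, so a consumer such as a GOC tool mostly needs to parse it. The sketch below works on a hand-written, simplified sample modeled on that XML. The element and attribute names (`CLUSTER`, `HOST`, `METRIC`, `NAME`, `VAL`) and the sample values are assumptions to verify against the installed Ganglia version.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample; not captured from a real gmond.
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
  <CLUSTER NAME="demo-cluster">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="disk_free" VAL="12.5" TYPE="double" UNITS="GB"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="3.10" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def parse_metrics(xml_text):
    """Return {(cluster, host): {metric_name: value_string}}."""
    root = ET.fromstring(xml_text)
    result = {}
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
            result[(cluster.get("NAME"), host.get("NAME"))] = metrics
    return result


def hosts_over_load(xml_text, threshold):
    """Return sorted host names whose load_one value exceeds threshold."""
    busy = []
    for (_cluster, host), metrics in parse_metrics(xml_text).items():
        if "load_one" in metrics and float(metrics["load_one"]) > threshold:
            busy.append(host)
    return sorted(busy)
```

In a live setup the XML text would be read from a daemon's TCP port (commonly 8649 for gmond, which is also an assumption here) instead of from the embedded sample.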

Ganglia
Ganglia has been modified to provide VO-centric reporting. Standard Ganglia does not provide layered reporting.
VO-centric Ganglia has the following features:
- Monitoring of host resources (processor load, memory load, disk load, etc.)
- Simple plugin design that allows users to easily develop their own service checks (included from the standard version)
- Grid and VO related sensors
- Publishing/retrieving summary information to third parties
- Optional SSL-enabled communication (meta-daemons and web interface)
- MDS interface for collecting list of reporting nodes
- Optional web interface for viewing current network status, notification and problem history, log file, etc.
- Interface with Nagios(TM)
Developed by Catalin Lucian, University of Chicago

LDAP tools
Grid centric information can be obtained from the MDS.
- There are a couple of good LDAP web interfaces.
  - LDAPExplorer,
  - John's LDAP Web interface,
- There are a number of Perl modules for LDAP,
- The key to extracting information is understanding the schema.
- Find out who is responsible for the schema and take an active role in its development.
- Always build dynamic search query tools.
- Learn to use ldapsearch and grid-info-search.
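
Building search queries dynamically mostly means assembling an LDAP filter safely from values that are not known in advance. The sketch below escapes filter values following RFC 4515 and assembles an `ldapsearch` command line without running it. The host, port, base DN, and attribute names in the example are hypothetical placeholders that depend on the local MDS setup and schema.

```python
def escape_filter_value(value):
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_filter(conditions):
    """AND together attribute=value tests from a dict of conditions."""
    parts = [
        f"({attr}={escape_filter_value(val)})"
        for attr, val in sorted(conditions.items())
    ]
    if not parts:
        return "(objectclass=*)"
    if len(parts) == 1:
        return parts[0]
    return "(&" + "".join(parts) + ")"


def ldapsearch_args(uri, base, ldap_filter, attributes=()):
    """Return an argument list for ldapsearch (simple bind, not executed)."""
    return ["ldapsearch", "-x", "-H", uri, "-b", base, ldap_filter, *attributes]


# Hypothetical query: placeholder server, base DN, and attribute names.
args = ldapsearch_args(
    "ldap://mds.example.org:2135",
    "mds-vo-name=local,o=grid",
    build_filter({"objectclass": "MdsHost"}),
    ["Mds-Host-hn"],
)
```

Escaping matters as soon as a value comes from user input or from another query, since an unescaped `*` or parenthesis changes the meaning of the filter.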

GOC and other tools
- GOC staff are being presented with a new set of challenges.
- New tools are being developed to meet these challenges.
- A combination of new and old tools is required to monitor and troubleshoot grid issues.
- Future GOC staff and "Grid Engineers" will need a broad skill set in order to be effective.
- There are many other grid and cluster monitoring packages: MonALISA, GOSSIP, Gridview, etc.
- There are many network monitoring packages.
  - MRTG.
  - SNAPP and other RRDTool collectors.
  - Netflow tools.
  - Weather Map software.
  - OCxMON.
  - Pinger.

Questions and discussion
John Hicks, Indiana University