Monitoring Your Data Center Using Apache and Ganglia Brad Nicholes Sr. Software Engineer, Novell Member Apache Software Foundation

Presentation transcript:

Monitoring Your Data Center Using Apache and Ganglia Brad Nicholes Sr. Software Engineer, Novell Member Apache Software Foundation

© Novell Inc. All rights reserved

Agenda

Ganglia Monitoring
Introduction and Overview
Ganglia Architecture
Gmond
Gmetad
Web Frontend
Deployment
Module Development
Conclusion

Introduction and Overview

Scalable Distributed Monitoring System
Targeted at monitoring clusters and grids
Multicast-based Listen/Announce protocol
Depends on open standards
  – XML
  – XDR compact portable data transport
  – RRDTool – Round Robin Database
  – APR – Apache Portable Runtime
  – Apache HTTPD Server
  – PHP based web interface

Ganglia Architecture

Gmond – Metric gathering agent installed on individual servers
Gmetad – Metric aggregation agent installed on one or more specific task-oriented servers
Apache Web Frontend – Metric presentation and analysis server
Attributes
  – Multicast – All gmond nodes are capable of listening to and reporting on the status of the entire cluster
  – Failover – Gmetad has the ability to switch which cluster node it polls for metric data
  – Lightweight and low overhead metric gathering and transport
Ported to various different platforms (Linux, FreeBSD, Solaris, others)

Ganglia Architecture (diagram)

Gmond – Metric Gathering Agent

Built-in metrics
  – Various CPU, Network I/O, Disk I/O and Memory
Extensible
  – Gmetric – Out-of-process utility capable of invoking command line based metric gathering scripts
  – Loadable modules capable of gathering multiple metrics or using advanced metric gathering APIs
Built on the Apache Portable Runtime
  – Supports Linux, FreeBSD, Solaris and more…

Gmond – Metric Gathering Agent

Automatic discovery of nodes
  – Adding a node does not require configuration file changes
  – Each node is configured independently
  – Each node has the ability to listen to and/or talk on the multicast channel
  – Can be configured for unicast connections if desired
  – Heartbeat metric determines the up/down status
Thread pools
  – Collection threads – Capable of running specialized functions for gathering metric data
  – Multicast listeners – Listen for metric data from other nodes in the same cluster
  – Data export listeners – Listen for client requests for cluster metric data

Gmond – Global Configuration

daemonize - When “yes”, gmond will daemonize
setuid - When “yes”, gmond will set its effective UID to the uid of the user specified by the user attribute
debug_level - When set to zero (0), gmond will run normally. When greater than zero, gmond runs in the foreground and outputs debugging information
mute - When “yes”, gmond will not send data
deaf - When “yes”, gmond will not receive data
host_dmax - When set to zero (0), gmond will not delete a host from its list. If set to a positive number, gmond will flush a host after it has not heard from it for N seconds
cleanup_threshold - Minimum amount of time before gmond will clean up expired data
gexec - Specify whether gmond will announce the host's availability to run gexec jobs
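
The host_dmax rule described above can be read as a simple expiry check. The following is a minimal Python sketch under that reading, not gmond's actual implementation; `expire_hosts`, `last_heard`, and the host names are hypothetical and exist only for illustration.

```python
def expire_hosts(last_heard, now, host_dmax):
    """Return the hosts that survive a host_dmax check.

    last_heard maps host name -> timestamp (seconds) of the last message.
    host_dmax == 0 means hosts are never deleted.
    """
    if host_dmax == 0:
        return dict(last_heard)
    # Keep only hosts heard from within the last host_dmax seconds.
    return {host: seen for host, seen in last_heard.items()
            if now - seen <= host_dmax}


# Hypothetical cluster state: node-b has been silent for 500 seconds.
hosts = {"node-a": 990, "node-b": 500}
print(expire_hosts(hosts, now=1000, host_dmax=0))
print(expire_hosts(hosts, now=1000, host_dmax=300))
```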

Gmond – Cluster Configuration

name - Specifies the name of the cluster of machines
owner - Specifies the administrators of the cluster
latlong - Latitude and longitude GPS coordinates of this cluster on earth
url - Additional information about the cluster

Gmond – Network Configuration

udp_send_channel
  – mcast_join, mcast_if – Multicast address and interface
  – host – Unicast host
  – port – Multicast or Unicast port
udp_recv_channel
  – mcast_join, mcast_if, port – Multicast address, interface and port
  – bind – Bind a particular local address
  – family – Protocol family
tcp_accept_channel
  – bind, port, interface – Bind a particular local address, listen port and interface
  – family – Protocol family
  – timeout – Request timeout

Gmond – Configuration Example

globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
}

cluster {
  name = "My Cluster"
  owner = "Administrator"
  latlong = "N37.37 W122.23"
  url = "
}

udp_send_channel {
  mcast_join =
  port = 8649
  ttl = 1
}

udp_recv_channel {
  mcast_join =
  port = 8649
  bind =
}

tcp_accept_channel {
  port = 8649
}
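
The tcp_accept_channel in this example is the port a client connects to in order to read the cluster state, which gmond writes as XML before closing the connection. A minimal Python sketch of such a client follows. It assumes gmond is reachable on localhost port 8649 and that the dump uses HOST and METRIC elements with NAME and VAL attributes; the function names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the full XML dump gmond writes on its TCP accept channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the peer closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_bytes):
    """Return {host_name: {metric_name: value_string}} from an XML dump."""
    root = ET.fromstring(xml_bytes)
    hosts = {}
    for host in root.iter("HOST"):
        hosts[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return hosts


if __name__ == "__main__":
    # Requires a running gmond; host and port are assumptions.
    for name, metrics in parse_metrics(fetch_gmond_xml()).items():
        print(name, metrics.get("load_one"))
```

Keeping the parsing separate from the socket code makes it possible to test the parser against a saved dump without a live gmond.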

Gmond – Access Control

Configured in udp_recv_channel or tcp_accept_channel sections
Examples:
  – “Deny all” with exceptions

    acl {
      default = "deny"
      access {
        ip =
        mask = 32
        action = "allow"
      }
    }

  – “Allow all” with IPv4 & IPv6 exceptions

    acl {
      default = "allow"
      access {
        ip =
        mask = 24
        action = "deny"
      }
      access {
        ip = ::ff:
        mask = 120
        action = "deny"
      }
    }
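
The two ACL patterns above can be modelled in a few lines of Python. This is a simplified sketch, not gmond's implementation: it assumes that the first matching access entry decides and that the default applies otherwise. The function name is hypothetical, and the addresses are documentation-range placeholders rather than the values from the slide.

```python
import ipaddress


def acl_allows(client_ip, default, rules):
    """Evaluate a simplified gmond-style ACL.

    rules is a list of (ip, mask, action) tuples. The first rule whose
    network contains client_ip decides; otherwise the default applies.
    """
    addr = ipaddress.ip_address(client_ip)
    for ip, mask, action in rules:
        network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
        # Only compare addresses of the same family (IPv4 vs IPv6).
        if addr.version == network.version and addr in network:
            return action == "allow"
    return default == "allow"


# "Deny all" with one allowed host (placeholder address).
deny_all = [("192.0.2.7", 32, "allow")]
# "Allow all" with IPv4 and IPv6 exceptions (placeholder networks).
allow_all = [("198.51.100.0", 24, "deny"), ("2001:db8::", 120, "deny")]

print(acl_allows("192.0.2.7", "deny", deny_all))
print(acl_allows("198.51.100.20", "allow", allow_all))
```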

Gmond – Metric Collection Groups

Specify as many collection groups as you like
Each collection group must contain at least one metric section
List available metrics by invoking “gmond -m”
collection_group section:
  – collect_once – Specifies that the group contains static metrics, which are collected only once
  – collect_every – Collection interval (only valid for non-static metrics)
  – time_threshold – Max data send interval
metric section:
  – name – Metric name (see “gmond -m”)
  – value_threshold – Metric variance threshold (send if exceeded)
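
The interplay of time_threshold and value_threshold can be illustrated with a small decision function. This is a simplified Python model of the rules listed above, assuming value_threshold is compared against the absolute change since the last value sent; `should_send` and its parameters are hypothetical names, not gmond internals.

```python
def should_send(now, value, last_sent_time, last_sent_value,
                time_threshold, value_threshold=None):
    """Decide whether a freshly collected metric value should be sent."""
    if last_sent_time is None:
        return True  # nothing has been sent yet
    if now - last_sent_time >= time_threshold:
        return True  # max send interval reached, send regardless of change
    if value_threshold is not None:
        # Send early when the value moved by more than the threshold.
        return abs(value - last_sent_value) > value_threshold
    return False


# load_one collected every 20 s with time_threshold = 90 and
# value_threshold = 1.0, as in a typical collection group.
print(should_send(20, 1.5, 0, 1.0, 90, 1.0))  # small change, inside interval
print(should_send(20, 2.5, 0, 1.0, 90, 1.0))  # large change
print(should_send(90, 1.0, 0, 1.0, 90, 1.0))  # interval elapsed
```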

Gmond – Configuration Example

collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "heartbeat"
  }
}

collection_group {
  collect_once = yes
  time_threshold = 1200
  metric {
    name = "cpu_num"
  }
  metric {
    name = "cpu_speed"
  }
  metric {
    name = "mem_total"
  }
  metric {
    name = "swap_total"
  }
  …
}

collection_group {
  collect_every = 20
  time_threshold = 90
  metric {
    name = "load_one"
    value_threshold = "1.0"
  }
  metric {
    name = "load_five"
    value_threshold = "1.0"
  }
  …
}

collection_group {
  collect_every = 80
  time_threshold = 950
  metric {
    name = "proc_run"
    value_threshold = "1.0"
  }
  metric {
    name = "proc_total"
    value_threshold = "1.0"
  }
}

Gmetad – Metric Aggregation Agent

Polls a designated cluster node for entire cluster status
  – Data collection thread per cluster
  – Ability to poll gmond or another gmetad for metric data
Failover capability
RRDTool – Storage and trend graphing tool
  – Defines fixed size databases that hold data of various granularity
  – Capable of rendering trending graphs from the smallest granularity to the largest (e.g. last hour vs. last year)
  – Never grows larger than the predetermined fixed size
  – Database granularity is configurable through gmetad.conf

Gmetad – Configuration

Data source and failover designations
  – data_source "my cluster" [polling interval] address1:port address2:port ...
RRD database storage definition
  – RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
Access control
  – trusted_hosts address1 address2 … DN1 DN2 …
  – all_trusted OFF/on
RRD files location
  – rrd_rootdir "/var/lib/ganglia/rrds"
Network
  – xml_port 8651
  – interactive_port 8652
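
Each RRA entry above has the form RRA:CF:xff:steps:rows, so the time span an archive covers is steps × rows × the base step. The Python sketch below works this out for the listed RRAs. The 15-second base step is an assumption (the slide does not state it); with that value the five archives cover roughly an hour, a day, a week, four weeks, and a year. The function name is hypothetical.

```python
def rra_coverage_seconds(rra, step_seconds=15):
    """Return the time span covered by one RRA definition string.

    step_seconds is the base RRD step; 15 is an assumed value.
    """
    kind, _cf, _xff, steps, rows = rra.split(":")
    if kind != "RRA":
        raise ValueError(f"not an RRA definition: {rra!r}")
    # Each row consolidates `steps` base intervals; there are `rows` rows.
    return int(steps) * int(rows) * step_seconds


RRAS = [
    "RRA:AVERAGE:0.5:1:244",
    "RRA:AVERAGE:0.5:24:244",
    "RRA:AVERAGE:0.5:168:244",
    "RRA:AVERAGE:0.5:672:244",
    "RRA:AVERAGE:0.5:5760:374",
]

for rra in RRAS:
    seconds = rra_coverage_seconds(rra)
    print(f"{rra}: {seconds} s = {seconds / 3600:.1f} h")
```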

Gmetad – Configuration Example

data_source "my cluster" 10 localhost my.machine.edu: :8655
data_source "my grid" :8655 grid.org:8651 grid-backup.org:8651
data_source "another source" :
trusted_hosts my.gmetad.org
xml_port 8651
interactive_port 8652
rrd_rootdir "/var/lib/ganglia/rrds"
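
Listing several addresses on one data_source line is what gives gmetad its failover behaviour: if one node cannot be polled, another can be tried. A minimal Python sketch of that idea, assuming the addresses are tried in the order listed; `poll_with_failover`, the `fetch` callable, and the host names are hypothetical and do not reflect gmetad's internals.

```python
def poll_with_failover(addresses, fetch):
    """Try each (host, port) in order and return the first successful poll.

    fetch is any callable taking an address and returning the polled data;
    it is expected to raise OSError when the node cannot be reached.
    """
    errors = []
    for address in addresses:
        try:
            return address, fetch(address)
        except OSError as exc:
            errors.append((address, exc))  # remember why this node failed
    raise ConnectionError(f"no data source reachable: {errors}")


def fake_fetch(address):
    """Stand-in for a real network poll, for demonstration only."""
    if address[0] == "node1.example.org":
        raise OSError("connection refused")
    return "<GANGLIA_XML/>"


sources = [("node1.example.org", 8649), ("node2.example.org", 8649)]
print(poll_with_failover(sources, fake_fetch))
```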

Ganglia Web Frontend

Built around Apache HTTPD server using mod_php
Uses presentation templates so that the web site “look and feel” can be easily customized
Presents an overview of all nodes within a grid vs all nodes in a cluster
Ability to drill down into individual nodes
Presents both textual and graphical views

Ganglia Customized Web Front-end (screenshot)

Gmetric Service Level Metrics Utility

Extends the available metrics that can be produced through gmond
Ability to run specialized metric gathering scripts
Pushes metric data back through gmond
Must be scheduled through cron rather than gmond
Gmetric repository on Ganglia project site

Gmetric Command Line

gmetric --conf=./custom.conf -n "wow" -v "it works" -t "string"

Usage: gmetric [OPTIONS]...
  -h, --help          Print help and exit
  -V, --version       Print version and exit
  -c, --conf=STRING   The configuration file to use for finding send channels (default=`/etc/gmond.conf')
  -n, --name=STRING   Name of the metric
  -v, --value=STRING  Value of the metric
  -t, --type=STRING   Either string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING  Unit of measure for the value e.g. Kilobytes, Celcius (default=`')
  -s, --slope=STRING  Either zero|positive|negative|both (default=`both')
  -x, --tmax=INT      The maximum time in seconds between gmetric calls (default=`60')
  -d, --dmax=INT      The lifetime in seconds of this metric (default=`0')
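
A metric gathering script run from cron typically measures something and then shells out to gmetric. The Python sketch below builds such a command line from the long options shown in the usage text above. It assumes gmetric is on the PATH; the helper names and the example metric are hypothetical.

```python
import subprocess


def build_gmetric_command(name, value, value_type, units="",
                          conf=None, tmax=60, dmax=0):
    """Build a gmetric argument list using the options from the usage text."""
    cmd = ["gmetric"]
    if conf is not None:
        cmd.append(f"--conf={conf}")
    cmd += [f"--name={name}", f"--value={value}", f"--type={value_type}"]
    if units:
        cmd.append(f"--units={units}")
    cmd += [f"--tmax={tmax}", f"--dmax={dmax}"]
    return cmd


def send_metric(*args, **kwargs):
    """Run gmetric; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gmetric_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical metric; requires gmetric and a reachable send channel.
    send_metric("queue_depth", 42, "uint32", units="jobs")
```

Passing the arguments as a list avoids shell quoting problems with values such as "it works".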

Gmond Pluggable Metric Modules

Extends the available metrics that can be gathered by gmond
Provided as dynamically loadable modules
Configured through the gmond.conf
Scheduled through gmond rather than an external scheduler
Module development is similar to an Apache module
Able to produce multiple metrics from a single module

Gmond Module Development

Three callback interfaces
  – Init
      int (*ex_metric_init)(apr_pool_t *p);
  – Clean up
      void (*ex_metric_cleanup)(void);
  – Metric gathering handler
      g_val_t (*ex_metric_handler)(int metric_index);
Metric definition structure

  mmodule example_module =
  {
      STD_MMODULE_STUFF,    // Internal module definition
      ex_metric_init,       // Metric init callback function
      ex_metric_cleanup,    // Metric cleanup callback function
      ex_metric_info,       // Metric info data structure
      ex_metric_handler,    // Metric handler
  };

Gmond Example Module

  #include <gm_metric.h>   /* Ganglia metric module API */
  #include <stdlib.h>
  #include <time.h>

  mmodule example_module;

  /* Called once when the module is loaded: seed the random generator */
  static int ex_metric_init(apr_pool_t *p)
  {
      srand(time(NULL)%99);
      return 0;
  }

  static void ex_metric_cleanup ( void )
  {
  }

  /* metric_index is the position of the metric in ex_metric_info[] */
  static g_val_t ex_metric_handler ( int metric_index )
  {
      g_val_t val;

      switch (metric_index) {
      case 0:
          val.int32 = rand()%99;
          return val;
      case 1:
          val.int32 = 50;
          return val;
      }

      /* default case */
      val.int32 = 0;
      return val;
  }

  static const Ganglia_25metric ex_metric_info[] =
  {
      {0, "Random_Numbers", 90, GANGLIA_VALUE_UNSIGNED_INT, "s", "both", "%u",
       UDP_HEADER_SIZE+8, "Example module metric (random numbers)"},
      {0, "Constant_Number", 90, GANGLIA_VALUE_UNSIGNED_INT, "Num", "zero", "%hu",
       UDP_HEADER_SIZE+8, "Example module metric (constant number)"},
      {0, NULL}
  };

  mmodule example_module =
  {
      STD_MMODULE_STUFF,
      ex_metric_init,
      ex_metric_cleanup,
      ex_metric_info,
      ex_metric_handler,
  };

Gmond Example Module Configuration

modules {
  module {
    name = "example_module"
    path = "/usr/lib/ganglia/modexample.so"
  }
}

/* Define Collection Groups */
collection_group {
  collect_every = 10
  time_threshold = 50
  metric {
    name = "Random_Numbers"
    value_threshold = 30.0
  }
}

collection_group {
  collect_once = yes
  time_threshold = 20
  metric {
    name = "Constant_Number"
  }
}

Gmond Python Module Development

Extends the available metrics that can be gathered by gmond
Configured through the gmond configuration file
Python module interface is similar to the C module interface
Ability to save state within the script vs. a persistent data store
Larger footprint but easier to implement new metrics

Gmond Python Module Development

Three mandatory functions
  – metric_init()
      > Called once at module initialization time
      > Must return a metric description dictionary or list of dictionaries
      > Any other module initialization can also take place here
  – metric_handler() – may have multiple handlers
      > Metric gathering handler
      > Must return a single data value of the same type as specified in the metric_init() function
  – metric_cleanup()
      > Called once at module termination time
      > Does not return a value

Gmond Python Module Development

Metric definition data dictionary
  – Must be returned from the metric_init() function

  d = {'name': ' ',
       'call_back': ,
       'time_max': int( ),
       'value_type': ' ',
       'units': ' ',
       'slope': ' ',
       'format': ' ',
       'description': ' '}
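
Since metric_init() may return a list of dictionaries, one Python module can expose several metrics through a shared handler. The sketch below shows that pattern using the dictionary keys listed above. The metric names, the watched directory, and the optional `params` argument are assumptions made for illustration; the final block only imitates how gmond would call the module so that it can be tried from the command line.

```python
import os
import tempfile

WATCH_DIR = tempfile.gettempdir()  # hypothetical directory to watch


def dir_handler(name):
    """Shared handler: gmond passes the metric name, we return its value."""
    files = dirs = 0
    with os.scandir(WATCH_DIR) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
            elif entry.is_file(follow_symlinks=False):
                files += 1
    if name == "Tmp_File_Count":
        return int(files)
    if name == "Tmp_Dir_Count":
        return int(dirs)
    return 0


def metric_init(params=None):
    """Return one descriptor per metric; both share the same call_back."""
    def describe(name, description):
        return {'name': name,
                'call_back': dir_handler,
                'time_max': int(60),
                'value_type': 'uint',
                'units': 'entries',
                'slope': 'both',
                'format': '%u',
                'description': description}
    return [describe('Tmp_File_Count', 'Files in the watched directory'),
            describe('Tmp_Dir_Count', 'Subdirectories in the watched directory')]


def metric_cleanup():
    pass


if __name__ == "__main__":
    # Stand-alone check: call each handler once, as gmond would.
    for d in metric_init():
        print(d['name'], d['format'] % d['call_back'](d['name']))
```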

Gmond Python Module Development

  def metric_init():
      d = {'name': 'Curve_Metric',
           'call_back': curve_handler,
           'time_max': int(60),
           'value_type': 'uint',
           'units': 'Seconds',
           'slope': 'both',
           'format': '%u',
           'description': 'Shows a uniform curve'}
      return d

  v = int(1)
  inc = int(1)
  count = 0

  def curve_handler(name):
      global v, count, inc
      v += inc
      count += 1
      if count > 15:
          count = 0
          inc = -inc
      return int(v)

  def metric_cleanup():
      pass

Gmond Python Module Deployment

Copy the .py file to a specific directory
  – The python modules directory is defined in the gmond.conf file
Start Gmond using the -m parameter
  – Shows a list of all available metrics known to Gmond
  – The python based metric should be in the list
Add the new python metric to a collection group just like any other metric
Restart Gmond

Configuring Gmond for Python

Must load the modpython.so pluggable module
Must specify a python module path
  – The ‘params’ directive specifies the python modules path
  – The Python support module will automatically load any .py module found in the specified path
Recommended to add the collection groups of python based metrics in the same .conf file that loads the python support module

modules {
  module {
    name = "python_module"
    path = "/usr/lib/ganglia/modpython.so"
    params = "/usr/lib/ganglia/python_modules"
  }
}

Deploying Ganglia Monitoring

Install Gmond on all monitored nodes
  – Edit the configuration file
      > Add cluster and host information
      > Configure network udp_send_channel, udp_recv_channel, tcp_accept_channel
      > Start gmond
Install Gmetad on an aggregation node
  – Edit the configuration file
      > Add data and failover sources
      > Add grid name
      > Start gmetad
Install the web frontend
  – Install Apache httpd server with mod_php
  – Copy Ganglia web pages and PHP code to appropriate location
  – Add appropriate authentication configuration for access control

Demo & Questions