Lemon Web Monitoring
Miroslav Šiket, CERN IT/FIO
HEPiX, Edinburgh, May 24-28 (presented 26/05/2004)

Presentation transcript:

Outline
- Concepts
- Design and architecture
- Web visualization
- Deployment
- Current development

Concepts of Monitoring
- Monitoring information in computer centers
- CERN: ~2000 computers and ~70 clusters
- Huge amount of data: ~150 metrics per host
- High demand for organizing the information so that it is easy to access and easy to parse
- Variety of views for different groups of users: sysadmins, users, managers
- Lemon tries to do the job by incorporating many relatively new technologies

Monitoring information
We generally have three types of data:
- Performance metrics: CPU usage, load averages, memory use, disk use/performance, sockets, network, …
- Exceptions: high load, swap use over 90%, service down, …
- Status information: uptime, boot time, kernel version, …
- Heartbeat
Everything is gathered at different frequencies, from every 60 s to once a day or on boot.
About 1 GB of data a day.
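
As a rough illustration of the data types and sampling frequencies listed above, the following Python sketch models a metric definition and counts how many samples a host would produce per day. The class names, metric names, and the 300 s interval are hypothetical and are not taken from Lemon itself; only the 60 s and once-a-day frequencies come from the slide.

```python
from dataclasses import dataclass
from enum import Enum

SECONDS_PER_DAY = 24 * 60 * 60


class Kind(Enum):
    PERFORMANCE = "performance"
    EXCEPTION = "exception"
    STATUS = "status"


@dataclass(frozen=True)
class MetricDef:
    name: str          # hypothetical metric name
    kind: Kind         # one of the three data types from the slide
    interval_s: int    # sampling period in seconds


def samples_per_day(metrics):
    """Total number of samples one host produces per day for these metrics."""
    return sum(SECONDS_PER_DAY // m.interval_s for m in metrics)


# Illustrative definitions only; intervals other than 60 s and 1 day are assumed.
EXAMPLE_METRICS = [
    MetricDef("load_average", Kind.PERFORMANCE, 60),
    MetricDef("swap_usage_high", Kind.EXCEPTION, 300),
    MetricDef("kernel_version", Kind.STATUS, SECONDS_PER_DAY),
]

if __name__ == "__main__":
    print(samples_per_day(EXAMPLE_METRICS))
```

Multiplying such a per-host count by the number of hosts gives a feel for how the daily volume scales with the sampling intervals chosen.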

Lemon Architecture

Components (I)
- MSA (Monitoring Sensor Agent) and MS (Monitoring Sensor): the MS measures data and the MSA transports it to the MR
- MR (Monitoring Repository) with backends for Oracle, MySQL, flat file, …
- Correlation Engine: framework for creating metric correlations
- Alarm Broker (prototype): daemon for handling exceptions and communication between the alarm GUI and the MR
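
The sensor, agent, and repository roles described above can be sketched in a few lines of Python. This is a minimal in-process illustration, not Lemon's actual code: the class names `Agent` and `InMemoryRepository`, the `register`/`collect`/`store` methods, and the tuple layout are all assumptions, and the real MSA sends data to the MR over the network rather than through a direct call.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# (host, metric name, timestamp, value); layout assumed for illustration
Sample = Tuple[str, str, float, float]


class InMemoryRepository:
    """Stand-in for the MR; a real backend would be Oracle, MySQL or a flat file."""

    def __init__(self) -> None:
        self.rows: List[Sample] = []

    def store(self, sample: Sample) -> None:
        self.rows.append(sample)


class Agent:
    """Plays the MSA role: runs registered sensors and forwards their readings."""

    def __init__(self, host: str, repository: InMemoryRepository) -> None:
        self.host = host
        self.repository = repository
        self.sensors: Dict[str, Callable[[], float]] = {}

    def register(self, metric: str, sensor: Callable[[], float]) -> None:
        # A sensor (the MS role) is any callable that returns a measurement.
        self.sensors[metric] = sensor

    def collect(self, now: Optional[float] = None) -> None:
        # Take one reading from every sensor and hand it to the repository.
        timestamp = time.time() if now is None else now
        for metric, sensor in self.sensors.items():
            self.repository.store((self.host, metric, timestamp, sensor()))


if __name__ == "__main__":
    repo = InMemoryRepository()
    agent = Agent("node01", repo)
    agent.register("load_average", lambda: 0.5)
    agent.collect()
    print(repo.rows)
```

Keeping the repository behind a single `store` call is what lets the backend be swapped without touching the sensors.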

Components (II)
- Anamon (Analysis of MONitoring information): Java-based GUI for real-time visualization of metrics
- SOAP/WSDL: the MR provides a Web services extension for any additional users
- RRD/Apache/PHP framework for easy access to the pre-processed information
- CDB (Configuration Database): many components access this information, which is part of the Quattor framework at CERN

RRD Tool Framework
- RRD (Round Robin Database)
- Data is organized in time-series files of aging information
- Supported types: Gauge, Counter, Derive, Absolute
- Framework for storing measurement averages, min, max, derivatives, …
- Provides graphing capabilities
- Provides simple mathematical operations on stored data
- Data does not expand in size with time
- Provides export to XML and flat file formats
- Widely used by many applications: MRTG, Ganglia, CDF Farm Control, FBSNG WWW
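
The round-robin idea behind these points, a fixed number of slots where the newest sample overwrites the oldest, can be sketched in Python as follows. This is a toy model and not the RRDtool file format or API: the names `RoundRobinArchive` and `counter_rate` are made up for illustration, and the rate helper ignores the counter wrap-around handling that RRDtool performs.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store: once full, each new sample evicts the oldest one."""

    def __init__(self, slots: int) -> None:
        self._slots = deque(maxlen=slots)

    def update(self, timestamp: float, value: float) -> None:
        self._slots.append((timestamp, value))

    def values(self):
        return [value for _, value in self._slots]

    def average(self) -> float:
        vals = self.values()
        return sum(vals) / len(vals)

    def minimum(self) -> float:
        return min(self.values())

    def maximum(self) -> float:
        return max(self.values())


def counter_rate(previous, current) -> float:
    """Per-second rate between two (timestamp, counter) readings.

    Mirrors the idea of a Counter data source (store the rate, not the raw
    counter); wrap-around is deliberately not handled in this sketch.
    """
    (t0, c0), (t1, c1) = previous, current
    return (c1 - c0) / (t1 - t0)


if __name__ == "__main__":
    archive = RoundRobinArchive(slots=3)
    for i in range(5):
        archive.update(timestamp=i * 300, value=float(i))
    print(archive.values(), archive.average())
```

Because the number of slots is fixed at creation time, the storage footprint stays constant however long the archive is fed, which is the property the slide refers to.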

Framework Architecture
- RRD Tool framework is used to store and manipulate data
- Data is retrieved from the Monitoring Repository by a daemon at 5-minute intervals
- Data is pre-processed and RRD files are updated
- Apache/PHP and RRD tools access these files and create statistics per host and per cluster
- In connection with CDB, configuration information is also provided
- JpGraph (PHP) is used to present, in graphical form, information from the MR that is not available through the RRD framework
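
One pre-processing step of the kind described above, turning per-host samples into per-cluster statistics, might look like the Python sketch below. It is an assumption-laden illustration: the function name `cluster_averages`, the sample tuple layout, and the host-to-cluster dictionary (standing in for configuration that would come from CDB) are hypothetical, and the real daemon's processing is not documented in these slides.

```python
from collections import defaultdict
from statistics import mean


def cluster_averages(samples, host_to_cluster):
    """Average each metric over the hosts of each cluster.

    samples: iterable of (host, metric, value) tuples fetched for one interval.
    host_to_cluster: mapping of host name to cluster name (assumed to come
    from the configuration database).
    Returns a dict keyed by (cluster, metric).
    """
    grouped = defaultdict(list)
    for host, metric, value in samples:
        cluster = host_to_cluster.get(host)
        if cluster is None:
            continue  # host not present in the configuration; skip it
        grouped[(cluster, metric)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    samples = [
        ("node01", "load_average", 1.0),
        ("node02", "load_average", 3.0),
        ("disk01", "load_average", 5.0),
    ]
    clusters = {"node01": "batch", "node02": "batch", "disk01": "diskserver"}
    print(cluster_averages(samples, clusters))
```

A daemon would run a step like this once per 5-minute interval and write each result into the corresponding per-cluster RRD file.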

Cluster information

Host information

JpGraph and host reboots

Scalability
- Scalability is usually an issue with large-scale monitoring frameworks
- Our framework currently encompasses ~2000 computers at CERN and is scalable to more than […] computers
- RRD Tool reduces the need to access the MR (Oracle) directly and provides cached information
- Our framework provides support for RRD framework clusters and is expandable; it currently uses about 40 of the most common performance metrics

Issues and future work
- RRD Tool framework does not contain certain features that we have added to it: support for uploading historical data, easy removal and addition of metrics, …
Current development:
- Dynamic configuration of stored data in connection with CDB (configuration DB)
- Packaging and providing a site-independent structure
- Expanding the framework for Web displays: on-demand correlations, manipulation of cluster configuration, …
- Summary displays for exception metrics

Conclusion
- The framework is currently in deployment at CERN
- It already helps sysadmins, developers, and experiments in data challenges
- The framework provides an easy overview of the computing capabilities at our computing center
- It is alive and is currently being improved to suit user needs, to provide centralized information, and to provide more functionality