Monitoring Review. Luigi: review of mandate, plans for July, info about CNIC. Felix: 10-minute presentation on DIAMON. Joel: 10-minute presentation on CMW.

Monitoring Review

Plans for TODAY
- Luigi: review of mandate, plans for July, info about CNIC
- Felix: 10-minute presentation on DIAMON
- Joel: 10-minute presentation on CMW monitoring
- Frank: 10-minute presentation on FESA
- Brice/Fernando: 10-minute presentation on MOON
- Luigi: 10-minute presentation on Lemon, Xymon, Kibana, Meter, as sysadmin
- Discussion and conclusions: the good and the bad of the current monitoring solutions

Plans for July: overview of monitoring
- TODAY: our current monitoring experience and solutions
- 7/7/2016 at 15:00: Monitoring system of the future + OP
- 14/7/2016 at 15:00: Commercial solutions?
- 21/7/2016 at 15:00: Summary, conclusions/mandate
- 28/7/2016 at 14:00 in 104 R A10: (CNIC on monitoring at CERN experiments, IT, BE, Access)

Monitoring/tracing tools we use as sysadmin + some considerations
Luigi Gallerani
Monitoring review meeting

Monitoring for a sysadmin
- Classical sysadmin parameters: machine status, ping, disk, CPU, memory, network, open files, locks, filesystem mounts, running processes, syslog errors, dmesg, versions!…
- Overview of the status of all the machines (consoles, servers, blades…)
- Configuration across systems; grouping and comparing hosts with the same issues
- Who and which processes are on the machine, doing what
Not really monitored today:
- Graphics on the screens, and the real CONSOLE of the machines
- Machine dependencies, network connections
- History, fluctuations, configuration
- Usability of the system, performance and availability
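As an illustration of how one of the classical parameters above (memory) can be read on a Linux host, here is a minimal sketch. It assumes the usual `/proc/meminfo` text layout; the function names and the sample text are hypothetical and are not taken from any of the tools discussed in this review.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_free_percent(meminfo):
    """Percentage of memory available, preferring MemAvailable over MemFree."""
    total = meminfo["MemTotal"]
    free = meminfo.get("MemAvailable", meminfo["MemFree"])
    return 100.0 * free / total


# Hypothetical sample; on a real host the text would come from /proc/meminfo.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         1200000 kB
MemAvailable:    4000000 kB
"""
sample_percent = memory_free_percent(parse_meminfo(SAMPLE))
```

On a real machine the same functions could be fed with `open("/proc/meminfo").read()`.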

List of tools in use today…
- Diamon
- Lemon
- Lemon View (A. Bland)
- Xymon
- Kibana, Elasticsearch
- IT Meter
- Spectrum & MIB & SNMP
- HP tools
- Atop, Rsyslog, grep
- Others (A. Bland)

Diamon
- Monitors almost all the machines in the accelerator sector
- Designed for OP
- Easiest way to get an ok/not ok overview of the whole infrastructure
- Ping agent is great
- Windows, Linux and FE!
- Very hard to configure (CCDB) and to tune
- History playback, but quite slow
- Show all hosts with memory free < 10%…?
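The kind of cross-host query asked for in the last point ("show all hosts with memory free < 10%") can be sketched as follows. This is only an illustration over already-collected samples: the `HostSample` record, the host names and the numbers are hypothetical, and nothing here reflects how Diamon stores or exposes its data.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """One memory sample for one host (hypothetical record layout)."""
    host: str
    mem_total_kb: int
    mem_free_kb: int


def hosts_low_on_memory(samples, threshold_percent=10.0):
    """Return (host, free %) pairs below the threshold, lowest first."""
    low = []
    for s in samples:
        if s.mem_total_kb <= 0:
            continue  # ignore broken samples rather than divide by zero
        pct = 100.0 * s.mem_free_kb / s.mem_total_kb
        if pct < threshold_percent:
            low.append((s.host, round(pct, 1)))
    return sorted(low, key=lambda item: item[1])


samples = [
    HostSample("host-a", 16000000, 800000),
    HostSample("host-b", 16000000, 8000000),
    HostSample("host-c", 32000000, 2560000),
]
low_hosts = hosts_low_on_memory(samples)
```

The point of the sketch is that the query itself is trivial once all metrics sit in one place; the difficulty described in this review is getting them there.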

Lemon
- Monitors almost all the BE-CO Linux servers and virtual machines
- Designed by IT, and we still run it at BE
- Great for showing live data and statistics over time; superfast: 10 years in 2 seconds!
- Immediate graphs of the main parameters, also grouped by cluster!
- Does not easily show the full picture as Diamon does (Alastair has made an image-map based workaround)
- No Windows machine monitoring
- Very easy to use
- Future of this? IT has abandoned it…

Lemon overview
- By abl: it takes data and graphs from Lemon, then with some ImageMagick scripts it shows the status of all our machines
- Simple but very effective

Xymon
- Small tool
- Monitors almost all the BE-CO Linux servers and virtual machines
- Designed for sysadmins, configured by BE-CO
- Shows many graphs and history, host based, with a cluster/grouping concept
- Used mainly by ACC-adm to monitor NFS and other critical servers
- No Windows monitoring

Kibana/ESearch
- Collects the logs from our machines and… (copy-paste from wiki):
- Kibana displays data from the Elasticsearch backend, which is currently receiving around 2.5 million messages per hour.
- The core features of Kibana are 1) fast and easy searches, 2) flat, 2-dimensional visualisations and 3) dashboards.
- Not usable for real-time monitoring
- No Windows machines there
- As a sysadmin, I have still not seen a use case where it is faster than | grep

Meter (IT)
- ugin/kibana/#/dashboard/temp/ AVMYA58vK-VFzBoVlHqg
- Monitors the OpenStack servers that host all BE virtual machines and servers
- We have no control over it; we just read the data and execute some queries
- Based on Kibana, run in IT
- The association between servers and virtual machines comes from lanDB, but finding the data requires a manual query

Spectrum MIB
- Used to monitor the network switches, based on SNMP: see traffic, read packets, ports, MAC addresses and do advanced diagnostics.
- Gives information about the network that no other tool gives
- Some tools were also developed directly on SNMP to look at the HP ProCurve switches

HP tools
- Expert proprietary HP tools, mainly used to monitor the hardware: the blades, CPU, network
- Concept of rack topology and hardware view; gives status and metrics not available from the OS
- Not integrated, not designed to be integrated… but…

Atop, Rsyslog, grep
- Atop: most metrics and history are collected by Atop; low level but extremely powerful. Some tools implemented by abl
- Rsyslog: collects all the logs in cs-ccr-tracing
- grep: the fastest way to find problems on multiple machines when we know what to search for
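The "grep over centrally collected logs" approach can be mimicked in a few lines when the result needs to be grouped per machine. This is a minimal sketch, assuming the traditional syslog line layout (`Mon DD HH:MM:SS host message`); the sample lines, host names and search pattern are hypothetical, and the actual log format on cs-ccr-tracing may differ.

```python
import re
from collections import Counter

# Assumed traditional syslog layout: "Jul  7 15:00:01 host-a program: message"
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+(?P<host>\S+)\s+(?P<msg>.*)$"
)


def count_matches_by_host(lines, pattern):
    """Count, per host, the log lines whose message matches the pattern."""
    needle = re.compile(pattern)
    counts = Counter()
    for line in lines:
        m = SYSLOG_LINE.match(line)
        if m and needle.search(m.group("msg")):
            counts[m.group("host")] += 1
    return counts


sample_lines = [
    "Jul  7 15:00:01 host-a kernel: nfs: server fs1 not responding, still trying",
    "Jul  7 15:00:02 host-b sshd[123]: Accepted publickey for operator",
    "Jul  7 15:00:05 host-a kernel: nfs: server fs1 not responding, still trying",
    "Jul  7 15:00:09 host-c kernel: nfs: server fs1 not responding, still trying",
]
nfs_problems = count_matches_by_host(sample_lines, r"nfs: server \S+ not responding")
```

As with grep, this only helps when we already know what to search for.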

Other tools, A. Bland
- Generate then display stored RRD network statistics from Blade switches (top right)
- Display any day of the last month’s atop metrics (right)
- Gives a map of CCR routers, network services, Blade enclosures (below)

Conclusion: Why so many tools?
- No single existing tool provides all the functionality, or covers all the OSes, domains and systems, mainly because of different designs
- A huge effort is required for learning, configuring and tuning each tool
- No integration between tools; we understand it is almost impossible to get
- No coherent view across the different monitoring systems
- CUSTOM homemade scripts are needed to easily monitor some parameters
- Using all the tools together + offline analysis + sysadmin knowledge, we can monitor the infrastructure...
Tools: Diamon, Lemon, Lemon View (abl), Xymon, Kibana, Elasticsearch, IT Meter, Spectrum & MIB, HP tools, Atop, Rsyslog, grep… Others

Conclusion: What we do not have at all
- User-side experience monitoring… (no way to detect issues like the “I can’t connect to” situation)
- Monitoring of system dependency relations and chains; today there is only grouping
- Human monitoring feedback: humans are excluded completely from all the monitoring tools and cannot even acknowledge errors
- Easy tuning and configuration, auto-discovery of new systems, multiview, system aggregation, performance analysis, fluctuation detection, abnormal error rate detection, artificial intelligence to detect that something is wrong locally or globally.
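To make "abnormal error rate detection" concrete, here is a minimal sketch that flags an hourly error count when it deviates strongly from recent history. The z-score rule, the threshold of 3 and the sample numbers are assumptions chosen for illustration; none of the tools reviewed here is claimed to work this way.

```python
from statistics import mean, pstdev


def is_abnormal(history, latest, z_threshold=3.0):
    """Flag `latest` if it lies more than z_threshold std devs from the mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is a deviation.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical syslog error counts per hour for one host.
errors_per_hour = [4, 6, 5, 5, 4, 6, 5, 5]
quiet_hour = is_abnormal(errors_per_hour, 6)
noisy_hour = is_abnormal(errors_per_hour, 40)
```

A fixed z-score is crude (it ignores daily patterns, for example), but it shows the kind of check that could run per host and per metric.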

Bonus Slide: What I dream of as a sysadmin for monitoring
- A CERN common integrated solution for monitoring that satisfies all the needs of sysadmins (IT, EN, BE, TE…).
- A system that automatically records and displays all the metrics available per host (syslog, SNMP, atop, network, Diamon, Lemon…) and per time, and can return all the metrics needed very fast
- A system that tells us where the problems are and has knowledge of dependencies, relations, history
- A system that interacts with our clever expert colleagues and operators, as humans can be part of the monitoring system.
- A coherent system that does not show false alarms or bad values, and is capable of tracking all modifications…
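A system with "knowledge of dependency" could, for instance, narrow a set of failing hosts down to the ones that are probably at the origin of the problem. The sketch below is one simple way to express that idea, assuming a hand-written dependency map; the host names and the `root_causes` helper are hypothetical, and only direct dependencies are considered.

```python
def root_causes(depends_on, failing):
    """Return failing nodes none of whose direct dependencies are failing."""
    failing = set(failing)
    roots = set()
    for node in failing:
        deps = depends_on.get(node, ())
        if not any(dep in failing for dep in deps):
            roots.add(node)
    return sorted(roots)


# Hypothetical dependency map: each key depends on the listed systems.
depends_on = {
    "console-1": ["nfs-server", "switch-1"],
    "console-2": ["nfs-server", "switch-2"],
    "nfs-server": ["switch-1"],
}
probable = root_causes(depends_on, ["console-1", "console-2", "nfs-server"])
```

Here the two consoles are reported as failing only because the NFS server they depend on is failing, so the sketch points at the server alone.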