Monitoring Review. Luigi, Review of mandate, plans for July, info about CNINC Felix: 10 minutes presentation: DIAMON Joel: 10 minutes presentation: CMW.

1 Monitoring Review

2 Luigi: review of the mandate, plans for July, information about CNIC
Felix: 10-minute presentation on DIAMON
Joel: 10-minute presentation on CMW monitoring
Frank: 10-minute presentation on FESA
Brice/Fernando: 10-minute presentation on MOON
Luigi: 10-minute presentation on Lemon, Xymon, Kibana and Meter, from the sysadmin point of view
Discussion and conclusions: the good and bad points of the current monitoring solutions
Plans for TODAY

3 Plans for July – Overview of monitoring
TODAY: our current monitoring experience and solutions, 7/7/2016 at 15:00 in 774-2-014
Monitoring system of the future + OP, 14/7/2016 at 15:00 in 774-2-014
Commercial solutions?, 21/7/2016 at 15:00 in 774-2-014
Summary, conclusions/mandate, 28/7/2016 at 14:00 in 104 R A10 (CNIC on monitoring at CERN experiments, IT, BE, Access)
https://indico.cern.ch/rooms/booking/CERN/278767/#

4

5 Monitoring/tracing tools we use as sysadmins, plus some considerations
Luigi Gallerani, Monitoring review meeting

6 Monitoring for a sysadmin
Classical sysadmin parameters: machine status, ping, disk, CPU, memory, network, open files, locks, filesystem mounts, running processes, syslog errors, dmesg, versions!…
Overview of the status of all machines (consoles, servers, blades…)
Configuration across systems; grouping and comparing hosts with the same issues
Who is on the machine and which processes are running, doing what
Not really monitored today:
Graphics on the screens, and the real CONSOLE of the machines
Machine dependencies, network connections
History, fluctuations, configuration
Usability of the system, performance and availability
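
As an illustration of the first item, here is a minimal sketch that gathers a handful of those classical parameters on one host using only the Python standard library. It does not reflect how any of the tools below work internally. The HostSnapshot class and the function names are hypothetical, and reading /proc/meminfo assumes a Linux host.

```python
import os
import shutil
import socket
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostSnapshot:
    """A few of the classical per-host parameters listed on the slide."""
    hostname: str
    load_1min: float
    cpu_count: int
    disk_used_pct: float
    mem_available_pct: Optional[float]  # None if /proc/meminfo is unreadable


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style lines such as 'MemTotal:  16384 kB' (values in kB)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def collect_snapshot(mount_point: str = "/") -> HostSnapshot:
    """Take one snapshot of the local host (Linux/Unix only)."""
    disk = shutil.disk_usage(mount_point)
    mem_pct: Optional[float] = None
    try:
        with open("/proc/meminfo") as fh:
            mem = parse_meminfo(fh.read())
        mem_pct = 100.0 * mem["MemAvailable"] / mem["MemTotal"]
    except (OSError, KeyError):
        pass  # not Linux, or a kernel without MemAvailable
    return HostSnapshot(
        hostname=socket.gethostname(),
        load_1min=os.getloadavg()[0],  # raises OSError where unsupported
        cpu_count=os.cpu_count() or 1,
        disk_used_pct=100.0 * disk.used / disk.total,
        mem_available_pct=mem_pct,
    )


if __name__ == "__main__":
    print(collect_snapshot())
```

A real monitoring agent would run this periodically and ship the snapshots to a central store. That central, per-host history is exactly what the later slides say is spread across several tools.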

7 List of tools in use today…
Diamon
Lemon
Lemon View (A. Bland)
Xymon
Kibana/ElasticSearch
IT Meter
Spectrum & MIB & SNMP
HP tools
Atop, Rsyslog, grep
Others (A. Bland)

8 Diamon
Monitors almost all the machines in the accelerator sector
Designed for OP
Easiest way to get an ok/not-ok overview of the whole infrastructure
The ping agent is great
Covers Windows, Linux and front-ends (FE)!
Very hard to configure (CCDB) and to tune
History playback exists, but is quite slow
Show all hosts with free memory < 10%… ?
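
The last line of the slide asks for a query that Diamon cannot easily answer. Suppose free and total memory per host were available from some collector. The dictionary format below is a hypothetical input, not a Diamon or CCDB API, and the host names are made up. Under that assumption, the query is just a filter:

```python
from typing import Dict, List, Tuple


def hosts_low_on_memory(
    mem_kb: Dict[str, Tuple[int, int]], threshold_pct: float = 10.0
) -> List[str]:
    """Return hosts whose free memory is below threshold_pct of their total.

    mem_kb maps hostname -> (free_kb, total_kb); this input format is an
    assumption for the example, not something Diamon exposes.
    """
    return sorted(
        host
        for host, (free, total) in mem_kb.items()
        if total > 0 and 100.0 * free / total < threshold_pct
    )


# Hypothetical hosts: 3.1 %, 50 % and 12.5 % free respectively.
sample = {"host-a": (512, 16384), "host-b": (8192, 16384), "host-c": (1024, 8192)}
print(hosts_low_on_memory(sample))  # ['host-a']
```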

9 Lemon http://cs-ccr-inf1.cern.ch/lemon-web/
Monitors almost all the BE-CO Linux servers and virtual machines
Designed by IT, and we still run it in BE
Great for live views and statistics over time, and very fast: 10 years of data in 2 seconds!
Immediate graphs of the main parameters, also grouped by cluster!
Unlike Diamon, it does not easily show the full picture (Alastair has made an image-map-based workaround)
No Windows machine monitoring
Very easy to use
Future of this? IT has abandoned it…
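
Lemon provides per-cluster grouping out of the box. At its core, this is a group-by over per-host metrics. The minimal sketch below illustrates the idea with hypothetical host and cluster names. It makes no claim about how Lemon itself stores or aggregates its data.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def cluster_averages(
    samples: Dict[str, float], cluster_of: Dict[str, str]
) -> Dict[str, float]:
    """Average one per-host metric (e.g. 1-minute load) over each cluster.

    Hosts missing from cluster_of are collected under 'unassigned'.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for host, value in samples.items():
        groups[cluster_of.get(host, "unassigned")].append(value)
    return {cluster: mean(values) for cluster, values in groups.items()}


load = {"host-a": 1.0, "host-b": 3.0, "host-c": 5.0}
clusters = {"host-a": "nfs", "host-b": "nfs"}
print(cluster_averages(load, clusters))  # {'nfs': 2.0, 'unassigned': 5.0}
```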

10 Lemon overview http://cs-ccr-inf1.cern.ch/animation.gif
By abl: it takes data and graphs from Lemon, then some ImageMagick scripts display the status of all our machines
Simple but very effective

11 Xymon http://abinf1.cern.ch/xymon/
Small tool
Monitors almost all the BE-CO Linux servers and virtual machines
Designed for sysadmins, configured by BE-CO
Shows many graphs and history; host-based, with a cluster/grouping concept
Used mainly by ACC-adm to monitor NFS and other critical servers
No Windows monitoring

12 Kibana/ESearch http://abinf1.cern.ch/xymon/
Collects the logs from our machines and… (copied from the wiki): Kibana displays data from the Elasticsearch backend, which currently receives around 2.5 million messages per hour. The core features of Kibana are 1) fast and easy searches, 2) flat, two-dimensional visualisations and 3) dashboards.
Not usable for real-time monitoring
No Windows machines there
From the sysadmin side, we have not yet seen a use case where it is faster than | grep

13 Meter (IT) https://meter.cern.ch/public/_plugin/kibana/#/dashboard/temp/AVMYA58vK-VFzBoVlHqg
Monitors the OpenStack servers that host all BE virtual machines and servers
We have no control over it; we only read the data and run some queries
Based on Kibana, runs in IT
The association between servers and virtual machines comes from LanDB, but finding the data requires a manual query
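
The manual step is a join between the hypervisor-to-VM mapping and the per-hypervisor metrics. The sketch below performs that join, assuming both have already been exported as plain dictionaries. The export format, the names and the 90 % CPU threshold are all assumptions. No LanDB or Meter API is used here.

```python
from typing import Dict, List


def vms_on_busy_hypervisors(
    vm_to_hypervisor: Dict[str, str],
    hypervisor_cpu_pct: Dict[str, float],
    limit_pct: float = 90.0,
) -> Dict[str, List[str]]:
    """Map each hypervisor at or above limit_pct CPU to the VMs it hosts."""
    busy = {hv for hv, cpu in hypervisor_cpu_pct.items() if cpu >= limit_pct}
    result: Dict[str, List[str]] = {}
    for vm, hv in sorted(vm_to_hypervisor.items()):
        if hv in busy:
            result.setdefault(hv, []).append(vm)
    return result


placement = {"vm-1": "hv-1", "vm-2": "hv-2", "vm-3": "hv-1"}
cpu = {"hv-1": 95.0, "hv-2": 40.0}
print(vms_on_busy_hypervisors(placement, cpu))  # {'hv-1': ['vm-1', 'vm-3']}
```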

14 Spectrum MIB
Used to monitor the network switches, based on SNMP: see traffic, read packets, ports and MAC addresses, and do advanced diagnostics
Gives information about the network that no other tool provides
Some tools were also developed directly on SNMP to look at the HP ProCurve switches
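
Traffic figures from SNMP are usually derived from interface counters such as IF-MIB ifInOctets, a 32-bit Counter32 that wraps around at 2^32. The slide does not say which counters Spectrum reads, so treat this as a generic sketch. Two readings taken interval_s seconds apart give an average rate, provided the counter wrapped at most once in between.

```python
COUNTER32_MODULO = 2 ** 32


def counter_delta(prev: int, curr: int, modulo: int = COUNTER32_MODULO) -> int:
    """Increase between two counter readings, tolerating a single wrap-around."""
    return (curr - prev) % modulo


def throughput_bps(prev: int, curr: int, interval_s: float,
                   modulo: int = COUNTER32_MODULO) -> float:
    """Average bits per second between two octet-counter readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev, curr, modulo) * 8 / interval_s


print(throughput_bps(1_000, 2_000, 10.0))       # 800.0
print(throughput_bps(2 ** 32 - 100, 50, 1.0))   # 1200.0 (counter wrapped)
```

On a fully loaded 10 Gb/s link, a Counter32 octet counter wraps in about 3.4 seconds. This is why the 64-bit ifHCInOctets counter (modulo 2**64) is preferable for fast interfaces.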

15 HP tools
Expert proprietary HP tools, mainly used to monitor the hardware: blades, CPU, network
Concept of rack topology and hardware view; shows status and gives metrics not available from the OS
Not integrated, not designed to be integrated… but…

16 Atop, Rsyslog, grep
Atop: most metrics and history are collected by atop; low level but extremely powerful. Some tools implemented by abl
Rsyslog: collects all the logs on cs-ccr-tracing
grep: the fastest way to find problems on multiple machines when we know what to search for
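
Searching a central log store for one pattern to see which machines are affected is essentially grep PATTERN | awk '{print $4}' | sort | uniq -c. Below is the same idea in Python as a sketch. The log lines are made up, and the traditional syslog line layout is an assumption, since rsyslog templates can differ. In practice, the lines would be read from the centrally collected files, whose paths are site-specific.

```python
import re
from collections import Counter
from typing import Iterable


def matches_per_host(lines: Iterable[str], pattern: str) -> Counter:
    """Count the lines matching `pattern`, keyed by hostname.

    Assumes the traditional syslog layout 'Mon DD HH:MM:SS host tag: message',
    in which the hostname is the fourth whitespace-separated field.
    """
    regex = re.compile(pattern)
    counts: Counter = Counter()
    for line in lines:
        if regex.search(line):
            fields = line.split()
            if len(fields) >= 4:
                counts[fields[3]] += 1
    return counts


log = [
    "Jul  7 15:00:01 host-a kernel: nfs: server not responding",
    "Jul  7 15:00:02 host-b sshd[123]: Accepted publickey",
    "Jul  7 15:00:03 host-a kernel: nfs: server not responding",
]
print(matches_per_host(log, r"not responding"))  # Counter({'host-a': 2})
```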

17 Other tools, A. Bland
Generates and then displays stored RRD network statistics from the blade switches (top right)
Displays the atop metrics for any day of the last month (right)
Gives a map of the CCR routers, network services and blade enclosures (below)
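
RRD stands for round-robin database, the storage format popularised by RRDtool. Each archive holds a fixed number of slots, so storage never grows and the oldest data is overwritten. The sketch below shows only that fixed-size ring idea, in memory. Real RRD files also consolidate samples into several resolutions, which is not reproduced here, and the class name is hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class RoundRobinArchive:
    """Fixed-capacity time series: once full, each new sample evicts the oldest."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._samples: Deque[Tuple[int, float]] = deque(maxlen=capacity)

    def add(self, timestamp: int, value: float) -> None:
        self._samples.append((timestamp, value))

    def oldest_timestamp(self) -> Optional[int]:
        return self._samples[0][0] if self._samples else None

    def average(self) -> Optional[float]:
        """Mean of the retained samples, or None if the archive is empty."""
        if not self._samples:
            return None
        return sum(v for _, v in self._samples) / len(self._samples)

    def __len__(self) -> int:
        return len(self._samples)


archive = RoundRobinArchive(capacity=3)
for ts, octets_per_s in [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)]:
    archive.add(ts, octets_per_s)
print(len(archive), archive.oldest_timestamp(), archive.average())  # 3 2 3.0
```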

18 Conclusion: why so many tools?
None of the existing tools provides all the functionality or covers all the OSes, domains and systems, mainly because of their different designs
A huge effort is required to learn, configure and tune each tool
No integration between the tools, and we understand it is almost impossible to achieve
No coherent view across the different monitoring systems
CUSTOM home-made scripts are needed to monitor some parameters easily
Using all the tools together + offline analysis + sysadmin knowledge, we can monitor the infrastructure...
Diamon, Lemon, Lemon View (abl), Xymon, Kibana/ElasticSearch, IT Meter, Spectrum & MIB, HP tools, Atop, Rsyslog, grep… Others

19 Conclusion: what we do not have at all
Monitoring of the user-side experience… (no way to detect issues like "I can't connect to…" situations)
Monitoring of system dependency relations and chains; we only have grouping
Human monitoring feedback: humans are completely excluded from all the monitoring tools and cannot even acknowledge errors
Easy tuning and configuration, auto-discovery of new systems, multi-view, system aggregation, performance analysis, fluctuation detection, abnormal error-rate detection, artificial intelligence to detect that something is wrong locally or globally
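
Fluctuation and abnormal error-rate detection does not need artificial intelligence to get started. One simple, well-known technique is a rolling z-score, chosen here for illustration and not taken from any of the tools above. It flags a sample that sits far above the recent average. The window size, the z_limit of 3 and the min_sigma floor are assumed defaults that would need tuning per metric.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List


def flag_abnormal_rates(
    rates: Iterable[float],
    window: int = 10,
    z_limit: float = 3.0,
    min_sigma: float = 1.0,
) -> List[int]:
    """Return the indexes of rates that jump more than z_limit standard
    deviations above the mean of the previous `window` values.

    min_sigma stops a perfectly flat history from flagging every tiny increase.
    """
    history: Deque[float] = deque(maxlen=window)
    flagged: List[int] = []
    for i, rate in enumerate(rates):
        if len(history) == window:
            sigma = max(pstdev(history), min_sigma)
            if (rate - mean(history)) / sigma > z_limit:
                flagged.append(i)
        history.append(rate)
    return flagged


errors_per_minute = [5, 6, 5, 4, 5, 6, 5, 4, 5, 6, 40, 5]
print(flag_abnormal_rates(errors_per_minute))  # [10]
```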

20 Bonus slide: what I dream of as a sysadmin for monitoring
A common, integrated CERN solution for monitoring that satisfies all the needs of sysadmins (IT, EN, BE, TE…)
A system that automatically records and displays all the available metrics per host (syslog, SNMP, atop, network, Diamon, Lemon…) and over time, and can return any needed metric very fast
A system that tells us where the problems are and has knowledge of dependencies, relations and history
A system that interacts with our clever expert colleagues and operators, so that humans can be part of the monitoring system
A coherent system that does not show false alarms or bad values, and is capable of tracking all modifications…
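
Knowledge of dependencies is what turns a list of red hosts into a diagnosis. It also removes a large class of false alarms. Here is a minimal sketch, assuming a hypothetical hand-written dependency map. A failing component is reported as a probable root cause only when none of its direct dependencies is also failing, so failures that merely propagate are suppressed. Checking only direct dependencies, not the full transitive chain, is a simplifying assumption.

```python
from typing import Dict, List, Set


def probable_root_causes(
    depends_on: Dict[str, List[str]], failing: Set[str]
) -> Set[str]:
    """Failing components none of whose direct dependencies are failing."""
    return {
        node
        for node in failing
        if not any(dep in failing for dep in depends_on.get(node, []))
    }


# Hypothetical topology: a console and an application server mount an NFS
# server, which in turn sits behind a switch.
deps = {
    "console-1": ["nfs-server", "switch-a"],
    "app-server": ["nfs-server"],
    "nfs-server": ["switch-a"],
}
print(probable_root_causes(deps, {"console-1", "app-server", "nfs-server"}))
# {'nfs-server'}
print(probable_root_causes(deps, {"console-1", "nfs-server", "switch-a"}))
# {'switch-a'}
```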