DIAMON. What is DIAMON? Technology stack. Current Situation & Plans.

Presentation transcript:

DIAMON

What is DIAMON? Technology stack. Current Situation & Plans.

What is DIAMON? A service that lets you check what IS and what WAS going on.

The User Interface. Informs on: health of computers / applications / equipment; computer/application problem logs; host events (installations, restarts, reboots); who is responsible for each computer. Allows you to: restart software or a computer; subscribe to mail/SMS notifications; remotely introspect processes via JMX/CMX. History: created in 2006 based on LASER-1 technology; DIAMON V.2 based on C2MON since 2012. Targeted at developers and operators (plus service managers).

The User Interface. Shows general information on an entity: responsible, operational, location. Shows metrics and their health: computers, FECs, workstations, PLCs, module driver status, timing board status, process up/down. Gives access to diagnostics: remotely introspect processes via JMX/CMX; check FIP board status; check loaded drivers for modules; check timing board state; check host events (installations, restarts, reboots); check problem logs; check running processes. Actions: restart software or a computer; ping a computer; SSH to a machine; subscribe to mail/SMS notifications. Targeted at developers and operators (plus service managers).

What do we monitor? Applications: metrics via JMX/RDA/CMX plus up/down state. Computers (FECs, workstations, servers, VMs): CPU, disk, uptime, network. SNMP devices, e.g. WhiteRabbit. Dedicated checks, e.g. JMS service performance. Timing board status. Module driver status (correct/incorrect). FIP module status. Scale: 3400 computers; 6880 processes; metrics.
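
As an illustration of the host metrics listed above (disk usage here), the following is a minimal Python sketch of collecting one value and mapping it to a health state. It is not DIAMON code: the function names, the metric key `disk.used_percent` and the 80 % / 95 % thresholds are assumptions made for the example.

```python
import os
import shutil
import time


def classify(value, warn, error):
    """Map a numeric metric to OK / WARN / ERROR using two thresholds."""
    if value >= error:
        return "ERROR"
    if value >= warn:
        return "WARN"
    return "OK"


def disk_usage_percent(path=os.sep):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # guard against pseudo filesystems
        return 0.0
    return 100.0 * usage.used / usage.total


def host_snapshot(path=os.sep):
    """Collect one host metric; thresholds are illustrative assumptions."""
    disk = disk_usage_percent(path)
    return {
        "timestamp": time.time(),
        "disk.used_percent": {
            "value": disk,
            "health": classify(disk, warn=80.0, error=95.0),
        },
    }
```

Calling `host_snapshot()` returns a dictionary with the measured value and its derived health, which is the kind of pair a monitoring GUI could display per host.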

Entities are organized in a tree structure. View areas: details; tools to filter; groups / hosts.
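
One way to picture a tree of monitored entities is sketched below in Python. The slides do not say how DIAMON computes the health of a group, so the "worst child wins" rule, the `Entity` class and the example names are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Ordering of health states, lowest to highest severity (assumed).
SEVERITY = {"OK": 0, "WARN": 1, "ERROR": 2}


@dataclass
class Entity:
    """A node in the tree: a group, a host, a process or a metric."""
    name: str
    health: str = "OK"
    children: List["Entity"] = field(default_factory=list)


def aggregate_health(entity):
    """Return the worst health found in this entity or any descendant."""
    worst = entity.health
    for child in entity.children:
        child_health = aggregate_health(child)
        if SEVERITY[child_health] > SEVERITY[worst]:
            worst = child_health
    return worst


def find_problems(entity, path=()):
    """Yield (path, health) for every node whose own health is not OK."""
    here = path + (entity.name,)
    if entity.health != "OK":
        yield "/".join(here), entity.health
    for child in entity.children:
        yield from find_problems(child, here)
```

With such a structure, a group node can be coloured by `aggregate_health`, while `find_problems` gives the filtered list of entities that need attention.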

Find computer/process/metrics quickly: Hosts, Processes, Metrics

Problem Overview for Host

Metric Details for Host

Host events: restarts, reboots, installations, ...

Shows logs from Tracing

Custom Extensions: Module Driver Status, FIP board state

Remote “live” introspection of Java/C/C++ processes: C/C++ services, Java services

Get Responsible quickly

Enable your notifications

Notification example. Alert: IO Wait on computer too high. Recovery: IO Wait back to OK.

Manage your notifications: if in WARN/ERROR, this sends a value change notification; filter items; EGroup; edit mode; notification level; enable regular problem report.
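
The alert / back-to-OK pattern and the notification level setting can be sketched as follows in Python. This is a hypothetical model, not the DIAMON notification service: the `Subscription` class, the message wording and the rule "notify only on state transitions at or above the chosen level" are assumptions.

```python
# Ordering of health states, lowest to highest severity (assumed).
SEVERITY = {"OK": 0, "WARN": 1, "ERROR": 2}


class Subscription:
    """One user's subscription with a minimum notification level."""

    def __init__(self, user, level="WARN"):
        self.user = user
        self.level = level
        self.last_state = {}  # metric name -> last seen health

    def on_update(self, metric, health):
        """Return a message if this update should notify the user, else None."""
        previous = self.last_state.get(metric, "OK")
        self.last_state[metric] = health
        if health == previous:
            return None  # no state change, nothing to send
        threshold = SEVERITY[self.level]
        if SEVERITY[health] >= threshold:
            return f"Alert: {metric} is {health}"
        if SEVERITY[previous] >= threshold:
            return f"{metric} back to {health}"
        return None
```

A subscriber at level `ERROR` would not be told about a WARN state, which mirrors the idea of choosing a notification level per user.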

DIAMON GUI as portal to more information: JAPC Toolbox, CCDB, Tracing, CMWAdmin, CCDB DIAMON Configuration (only for JMX & CLIC), …

How is DIAMON used? GUI: operators, equipment specialists, service managers; user-specific configurations = “views” (CCDB). Backend: notifies on problems (> 90 messages/day, 29 users); applications read data from DIAMON via RDA (i.e. MOON); host alarms are expected by OP; FEC host metrics in Lemon; driver feedback; FIP problems.

Technology Stack

The C2MON Project: CERN Control and Monitoring Platform. Acquisition and filtering of metrics; evaluation of rules and alarms; provides historical data and replay; extensible and modular. Diagram labels: C2MON Server, DAQ API, Client API, myApp, myDAQ, myMod; layers: Acquisition, Filtering, Business, Client Apps.
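
The slide mentions filtering at the acquisition layer without saying which filters are applied. A common technique for this purpose is a value deadband, sketched below in Python as an assumption: an update is forwarded only when it differs from the last forwarded value by at least a configured amount. The class and function names are hypothetical.

```python
class DeadbandFilter:
    """Forward a value only if it moved far enough from the last forwarded one."""

    def __init__(self, deadband):
        self.deadband = deadband
        self.last_sent = None  # nothing forwarded yet

    def accept(self, value):
        """Return True if the value should be forwarded to the server."""
        if self.last_sent is None or abs(value - self.last_sent) >= self.deadband:
            self.last_sent = value
            return True
        return False


def filter_stream(values, deadband):
    """Apply a fresh deadband filter to a sequence of readings."""
    flt = DeadbandFilter(deadband)
    return [v for v in values if flt.accept(v)]
```

For example, `filter_stream([10.0, 10.2, 10.4, 11.1, 11.0, 12.5], 1.0)` keeps `[10.0, 11.1, 12.5]`, so small fluctuations never reach the server and the update rate drops.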

DIAMON (C2MON) architecture diagram. Labels: CLIC, RDA, LASER, JMX, PING, SNMP, Lemon, MOON, History, DIAMON GUI, TRACING; CCDB configures C2MON. Figures: 1 server; 24 DAQs for acquisition & filtering; ~ datapoints; ~8271 equipments; 4 outputs. Acquisition & Filtering: > 15 equipment types; > 26 million updates/day. Business Logic Layer: > 150k data points; > 300k alarms; > 20k commands; > 50k business rules; > 28 million updates/day; > 70 configurations/day.

The CLIC Agent: C++ process on every machine; compatible with Windows/Linux down to PPC4; accessible via JMS Java API or the /mcr/bin/dmnsh command; sends metrics every x seconds. Extensions: access to timing board state; access to CMX-enabled C/C++ processes; access to FIP bus. Only client-side acquisition of metrics. CLIC communication via STOMP.
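
To make "communication via STOMP" concrete, here is a minimal Python sketch that builds a STOMP 1.2 SEND frame carrying one metric. The real CLIC agent is a C++ process and its message layout is not shown in the slides, so the destination `/topic/example.metrics`, the JSON body and its field names are assumptions. Only the frame layout itself (command line, headers, blank line, body, NUL terminator) follows the STOMP specification.

```python
import json


def escape_header(value):
    """Escape characters reserved in STOMP 1.2 header names and values."""
    return (value.replace("\\", "\\\\")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n")
                 .replace(":", "\\c"))


def build_send_frame(destination, body, content_type="application/json"):
    """Return the bytes of a STOMP SEND frame for the given text body."""
    payload = body.encode("utf-8")
    headers = {
        "destination": destination,
        "content-type": content_type,
        "content-length": str(len(payload)),
    }
    head = "SEND\n" + "".join(
        f"{escape_header(k)}:{escape_header(v)}\n" for k, v in headers.items()
    )
    # Blank line ends the headers; a NUL byte ends the frame.
    return head.encode("utf-8") + b"\n" + payload + b"\x00"


def metric_frame(host, name, value, destination="/topic/example.metrics"):
    """Wrap one metric reading (hypothetical JSON layout) in a SEND frame."""
    body = json.dumps({"host": host, "metric": name, "value": value},
                      sort_keys=True)
    return build_send_frame(destination, body)
```

An agent that "sends metrics every x seconds" would call something like `metric_frame` in a timed loop and write the resulting bytes to an open broker connection; connection handling is left out here.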

What is good: the system is used by a large user group on a daily basis; it acquires and standardizes information from various sources (RDA, JMX, SNMP, host metrics, …); the C2MON platform is quite reliable and evolving in the right direction; online reconfiguration is VERY useful; mail/SMS notification is very useful for pre-failure detection. What is not so good: CCDB-driven (re)configuration; SQL-based history is limited to 3 months and slow (C2MON); limited rule evaluation (C2MON); notification messages are sometimes not really clear; some information is still missing compared to e.g. Lemon; notifications or triggering rules cannot easily be trained.

General Experience: it is sometimes not clear who is responsible for a certain problem; people have most of the experience, not the system; we can show the responsible for a host, but what about a module or timing card problem?; no information on the responsible for a deployed software process; no concept of acknowledging a problem; currently rather host oriented, less “service” oriented; notifications cannot be trained.

Plans & Dreams: move the configuration part away from CCDB; make information easily accessible (why only TN?); replace SQL history storage; enable sending/receiving data via REST. C2MON: replace the JCache implementation (Terracotta -> Ignite?); find a better solution for the rule engine.

Appendix

C2MON in Action. Diagram labels: DAQ, C2MON Server (several instances), Terracotta, Terracotta Standby, Client, Access, Dashboard, Data Analysis, JMX, SNMP, Alarm, Web Interface. Callouts: removes database dependency; allows scaling horizontally.

Technologies used for C2MON. C2MON server: no J2EE server and only open source: Java 7, Spring 4.2; persistence framework: MyBatis (server), Hibernate (client). Dependency management through Maven. Database: Oracle, but no major dependencies; also works with HSQL, MySQL. Middleware: JMS ActiveMQ. Message transport format: XML and JSON. Remote caching solution for the C2MON server cluster: Terracotta/Ehcache (Apache Ignite foreseen): horizontally scalable, proven technology, open source, support contract possible.