Published by Malcolm Allison. Modified over 8 years ago.
1
DIAMON
2
- What is DIAMON?
- Technology stack
- Current situation & plans
3
What is DIAMON? A service that lets you check what IS and what WAS going on.
4
The User Interface
Informs on:
- Health of computers / applications / equipment
- Computer / application problem logs
- Host events: installations, restarts, reboots
- Persons responsible for computers
Allows you to:
- Restart software or a computer
- Subscribe to mail/SMS notifications
- Remotely introspect processes (JMX/CMX)
History:
- Created in 2006, based on LASER-1 technology
- DIAMON V.2 based on C2MON since 2012
Targeted at developers + operators (+ service managers)
5
The User Interface
- Shows general information on an entity: responsible, operational, location
- Shows metrics and their health: computer, FECs, workstations, PLCs, module driver status, timing board status, process up/down
Allows access for diagnostics:
- Remotely introspect processes (JMX/CMX)
- Check FIP board status
- Check loaded drivers for modules
- Check timing board state
- Check host events: installations, restarts, reboots
- Check problem logs
- Check running processes
Actions:
- Restart software or a computer
- Ping computer, SSH to machine
- Subscribe to mail/SMS notifications
Targeted at developers + operators (+ service managers)
6
What do we monitor?
- Applications: metrics via JMX/RDA/CMX + up/down state
- Computers: FECs, workstations, servers, VMs (CPU/disk/uptime/network)
- SNMP devices, e.g. WhiteRabbit
- Dedicated checks, e.g. JMS service performance
- Timing board status
- Module driver status (correct/incorrect)
- FIP module status
3400 computers, 6880 processes, 152,000 metrics
7
Entities are organized in a tree structure (screenshot callouts: details, tools to filter, groups / hosts)
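The tree-plus-filter idea on this slide can be sketched in a few lines. This is only an illustration under assumptions: the `Node` class, the `find` helper and all entity names are hypothetical and do not come from DIAMON itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One entity in the tree: a group, a host or a process (hypothetical model)."""
    name: str
    kind: str  # "group", "host" or "process"
    children: List["Node"] = field(default_factory=list)


def find(node: Node, text: str, kind: Optional[str] = None) -> List[Node]:
    """Depth-first search for nodes whose name contains `text` (case-insensitive)."""
    hits = []
    if text.lower() in node.name.lower() and (kind is None or node.kind == kind):
        hits.append(node)
    for child in node.children:
        hits.extend(find(child, text, kind))
    return hits


# Hypothetical example tree: groups contain hosts, hosts contain processes.
root = Node("all", "group", [
    Node("frontends", "group", [
        Node("fec-01", "host", [Node("timing-daemon", "process")]),
        Node("fec-02", "host"),
    ]),
    Node("consoles", "group", [Node("ws-01", "host")]),
])

for hit in find(root, "fec", kind="host"):
    print(hit.kind, hit.name)
```

Filtering by `kind` mirrors the groups / hosts distinction in the screenshot; a real GUI would likely filter on more attributes (status, responsible, location).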
8
Find computers/processes/metrics quickly (screenshot tabs: Hosts, Processes, Metrics)
9
Problem Overview for Host
10
Metric Details for Host
11
Host events: restarts, reboots, installations, ...
12
Shows logs from Tracing
13
Custom extensions: module driver status, FIP board state
14
Remote “live” introspection of Java/C/C++ processes (screenshot panels: C/C++ services, Java services)
15
Find the responsible person quickly
16
Enable your notifications
17
Notification example
- Alert: IO wait on computer too high
- IO wait back to OK
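The alert / back-to-OK pair in this example suggests that messages are sent on status transitions rather than on every sample. A minimal sketch of that behaviour, assuming hypothetical thresholds, host name and message wording (none of them taken from DIAMON):

```python
def classify(io_wait_percent, warn_at=20.0, error_at=40.0):
    """Map a metric value to a status; the thresholds are illustrative assumptions."""
    if io_wait_percent >= error_at:
        return "ERROR"
    if io_wait_percent >= warn_at:
        return "WARN"
    return "OK"


def notifications(samples, metric="IO wait", host="host-a"):
    """Return one message per status change, not one per sample."""
    previous = "OK"
    messages = []
    for value in samples:
        status = classify(value)
        if status != previous:
            if status == "OK":
                messages.append(f"{metric} on {host} back to OK")
            else:
                messages.append(
                    f"Alert ({status}): {metric} on {host} too high ({value:.0f}%)"
                )
            previous = status
    return messages


for line in notifications([5, 25, 30, 50, 10]):
    print(line)
```

Sending only on transitions keeps the mail/SMS volume low while a problem persists.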
18
Manage your notifications (screenshot callouts)
- If in WARN/ERROR, this sends a value-change notification
- Filter items
- EGroup
- Edit mode
- Notification level
- Enable regular problem report
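How a notification level and the value-change option might interact can be sketched as follows. The `Subscription` class, its fields and the decision logic are assumptions made for illustration; the slide only names the options.

```python
from dataclasses import dataclass

# Ordering of statuses, lowest to highest severity.
SEVERITY = {"OK": 0, "WARN": 1, "ERROR": 2}


@dataclass
class Subscription:
    """Hypothetical per-user notification settings."""
    user: str
    level: str = "WARN"                # minimum status that triggers a message
    notify_value_change: bool = False  # also report value changes while in WARN/ERROR


def should_notify(sub, old_status, new_status, old_value, new_value):
    """Decide whether an update produces a notification for this subscription."""
    threshold = SEVERITY[sub.level]
    if new_status != old_status:
        # Report entering or leaving any status at or above the chosen level.
        return max(SEVERITY[old_status], SEVERITY[new_status]) >= threshold
    if sub.notify_value_change and SEVERITY[new_status] >= threshold:
        # Status unchanged but still bad: report only if the value moved.
        return new_value != old_value
    return False


errors_only = Subscription("alice", level="ERROR")
print(should_notify(errors_only, "OK", "WARN", 10, 25))     # WARN is below the level
print(should_notify(errors_only, "WARN", "ERROR", 25, 50))  # entering ERROR
```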
19
DIAMON GUI as portal to more information
- JAPC Toolbox
- CCDB
- Tracing
- CMWAdmin
- CCDB DIAMON Configuration
- Only for JMX & CLIC …
20
How is DIAMON used
GUI:
- Operators, equipment specialists, service managers
- User-specific configurations = “views” (CCDB)
Backend:
- Notify on problems: > 90 messages/day, 29 users
- Applications read data from DIAMON via RDA (i.e. MOON)
- Host alarms are expected by OP
- FECs host metrics in Lemon
- Driver feedback
- FIP problems
21
Technology Stack
22
The C2MON Project: CERN Control and Monitoring Platform
- Acquisition and filtering of metrics
- Evaluation of rules and alarms
- Provides historical data and replay
- Extensible and modular
Diagram labels: myDAQ, DAQ API, C2MON Server, myMod, Client API, myApp; layers: Acquisition, Filtering, Business, Client Apps
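To make "acquisition and filtering" concrete, here is a sketch of one common filtering technique, a value deadband, applied on the acquisition side before updates reach the server. It is an assumption that a filter of this kind is representative; the class name and the deadband value are hypothetical and this is not the C2MON DAQ API.

```python
class DeadbandFilter:
    """Forward a value only if it moved at least `deadband` since the last forwarded one."""

    def __init__(self, deadband):
        self.deadband = deadband
        self.last_sent = None

    def accept(self, value):
        """Return True if the value should be forwarded to the server."""
        if self.last_sent is None or abs(value - self.last_sent) >= self.deadband:
            self.last_sent = value
            return True
        return False


# Raw samples as they might arrive from a device.
samples = [10.0, 10.4, 10.9, 11.2, 11.3, 9.9]
cpu_filter = DeadbandFilter(deadband=1.0)
forwarded = [v for v in samples if cpu_filter.accept(v)]
print(forwarded)
```

Comparing against the last forwarded value (not the last raw sample) prevents slow drifts from being filtered out forever.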
23
DIAMON (C2MON) architecture (diagram labels)
- Components: CLIC, RDA, LASER, JMX, PING, SNMP, Lemon, MOON, TRACING, CCDB, History, DIAMON GUI
- "configures C2MON"
- ~150,000 data points, ~8271 equipments
- 24 DAQs for acquisition & filtering, 1 server, 4 outputs
- Acquisition & filtering: > 15 equipment types, > 26 million updates/day
- Business logic layer: > 150k data points, > 300k alarms, > 20k commands, > 50k business rules, > 28 million updates/day, > 70 configurations/day
24
The CLIC Agent
- C++ process on every machine
- Compatible with Windows/Linux down to PPC4
- Accessible via JMS Java API or the /mcr/bin/dmnsh command
- Sends metrics every x seconds
- Extensions:
  - Allows access to timing board state
  - Allows access to CMX-enabled C/C++ processes
  - Allows access to FIP bus
- Only client-side acquisition of metrics
- CLIC communication via STOMP
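Since the agent publishes metrics over STOMP, a sketch of what a single STOMP SEND frame looks like on the wire may help. The frame layout (command line, headers, blank line, body, NUL terminator) follows the STOMP specification; the destination name, the JSON payload and its field names are assumptions and not the actual CLIC message format.

```python
import json


def metrics_message(host, metrics):
    """Encode a metrics snapshot as JSON (hypothetical payload layout)."""
    return json.dumps({"host": host, "metrics": metrics}, sort_keys=True)


def stomp_send_frame(destination, body, content_type="application/json"):
    """Build a STOMP SEND frame: command, headers, blank line, body, NUL byte.

    Header values containing ':' or newlines would need escaping in STOMP 1.2;
    this sketch assumes plain values.
    """
    payload = body.encode("utf-8")
    headers = [
        f"destination:{destination}",
        f"content-type:{content_type}",
        f"content-length:{len(payload)}",
    ]
    head = "SEND\n" + "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + payload + b"\x00"


# Hypothetical destination and metric names.
frame = stomp_send_frame(
    "/topic/example.metrics",
    metrics_message("host-a", {"cpu.load": 0.42, "disk.free.percent": 71}),
)
print(frame)
```

An agent would write such a frame to an open broker connection every x seconds; connection handling (CONNECT/CONNECTED, heart-beats) is left out here.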
25
What is good:
- System is used by a large user group on a daily basis
- Acquisition and standardization of information from various sources (RDA, JMX, SNMP, host metrics, …)
- C2MON platform is quite reliable & evolving in the right direction
- Online reconfiguration is VERY useful
- Mail/SMS notification is very useful for pre-failure detection
What is not so good:
- CCDB-driven (re)configuration
- SQL-based history limited to 3 months & slow (C2MON)
- Limited rule evaluation (C2MON)
- Notification messages sometimes not really clear
- Some information is still missing compared to e.g. Lemon
- Notifications or triggering rules cannot be easily trained
26
General Experience
- Sometimes not clear who is responsible for a certain problem
- People have most experience, not the system
- We can show the responsible for a host, but what about a module or timing card problem?
- No information on the responsible for a deployed software process
- No concept of acknowledging a problem
- Currently rather host-oriented, less “service”-oriented
- Notifications cannot be trained
27
Plans & Dreams
- Move configuration part away from CCDB
- Make information easily accessible: why only TN?
- Replace SQL history storage
- Enable sending/receiving data via REST
- C2MON:
  - Replace JCache implementation (Terracotta -> Ignite?)
  - Find better solution for rule engine
- DIAMON@grafana
28
Appendix
29
The C2MON Project: CERN Control and Monitoring Platform
- Acquisition and filtering of metrics
- Evaluation of rules and alarms
- Provides historical data and replay
- Extensible and modular
Diagram labels: myDAQ, DAQ API, C2MON Server, myMod, Client API, myApp; layers: Acquisition, Filtering, Business, Client Apps
30
C2MON in Action
- Removes database dependency
- Allows scaling horizontally
Diagram labels: DAQ, C2MON Server, Terracotta, Terracotta Standby, Client, JMX, SNMP, Alarm, Web Interface, Access, Dashboard, Data Analysis
31
Technologies used for C2MON
- C2MON server: no J2EE server and only open source!
- Java 7, Spring 4.2
- Persistence framework: MyBatis (server), Hibernate (client)
- Dependency management through Maven
- Database: Oracle, but no major dependencies; also works with HSQL, MySQL
- Middleware: JMS ActiveMQ
- Message transport format: XML and JSON
- Remote caching solution for C2MON server cluster: Terracotta/Ehcache (Apache Ignite foreseen)
  - horizontally scalable
  - proven technology
  - open source
  - support contract possible
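As an illustration of carrying the same update in both transport formats, the sketch below serializes one value update to JSON and to XML with the Python standard library. The element and field names are hypothetical; this is not the C2MON wire format.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical value update; the field names are assumptions for illustration.
update = {"id": 1001, "name": "host-a/cpu.load", "value": 0.42, "quality": "OK"}


def to_json(u):
    """Serialize the update as a JSON object."""
    return json.dumps(u, sort_keys=True)


def to_xml(u):
    """Serialize the same update as a small XML document."""
    root = ET.Element("tag-update", {"id": str(u["id"])})
    for key in ("name", "value", "quality"):
        ET.SubElement(root, key).text = str(u[key])
    return ET.tostring(root, encoding="unicode")


print(to_json(update))
print(to_xml(update))
```

JSON keeps the numeric type of `value`, whereas the XML variant carries everything as text and leaves typing to the receiver.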