Published by Annabel McKenzie. Modified over 9 years ago.
1
ACET: Accelerator Controls Exploitation Tools
Progress and plans, December 2012
2
Outline
- Controls system overview
- Motivation and purpose
- Focus points
- 2013
- Conclusions
ACET - TC on 06 December 2012
3
Controls system overview
Diagram labels: Knobs, Services, “Core”, Diagnostics, Applications, Middletier, Front Ends, Sequencer, Orbit, InCA/LSA, Proxies, JMS, SIS, CMW/FESA, Timing, Drivers, DB, Boot, NFS, cmwDir, RBAC, DiaMon, cmwAdmin, FESA Navigator, Video, Syslog, Hardware, Tune, RT
Scale: 425 consoles, 400 GUIs, 300 servers, 200 Java servers, 1300 FECs, 600 module types, 85,000 devices
5
ACET
Motivation:
- Distributed and complex controls system
- Knowledge distributed over many experts
- Move towards uniform (LHC) exploitation model across machines
Purpose:
- Allow (non-)experts to carry out more efficient diagnostics
- ACET collaborates with CO projects to improve diagnostic facilities of the control system
7
Focus points
- Diagnostic tools – aggregation and training
- Process metrics – JMX & CMX
  - DiaMon – GUI and CLIC agent
- Documentation
  - Wiki/site structure, Portal and Useful links
- Dynamic/runtime dependencies
- Feedback – Tracing & Config
  - Message format, transport, analysis
  - Trace analysis using Splunk
  - Config analysis in CCDB
8
Diagnostic tools
- Tools evaluated for criticality
- Aggregation into CCM diagnostic menu
- Training given during shutdown lectures
10
Process metrics – JMX architecture
http://wikis/display/ACET/JMX+client+instrumentation
Diagram labels: C2Mon, SRV, JMX-DAQ, DiaMon GUI, Metrics, RMI, JMX, mBeans, JMX viewer, JmxDirectory, jConsole, jar1, jar2, mgt, JVM, jmx-dir-client, jVisualVM
11
Process metrics – CMX architecture
http://wikis/display/MW/CMX
Diagram labels: C2Mon, CLIC-DAQ, DiaMon GUI, lib1, lib2, cmx-lib-c, C process p1, lib3, lib4, cmx-lib-c++, C++ process p2, shared memory segments, cmx-lib registry, CLIC agent, CMX viewer, Command line tool, FEC, DB, Metrics
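The CMX slide shows processes publishing metrics into shared memory segments that an agent or viewer then reads. The following is only a minimal sketch of that idea, not the CMX implementation: the two-field layout, the function names, and the use of an anonymous memory map as a stand-in for a real shared memory segment are all assumptions made for illustration.

```python
import mmap
import struct

# Hypothetical segment layout (not the CMX format): one signed 64-bit
# request counter followed by one 64-bit float load value.
LAYOUT = struct.Struct("<qd")


def publish(segment, requests, load):
    """Writer side: the instrumented process stores its current metrics."""
    LAYOUT.pack_into(segment, 0, requests, load)


def snapshot(segment):
    """Reader side: an agent or viewer decodes the same bytes."""
    requests, load = LAYOUT.unpack_from(segment, 0)
    return {"requests": requests, "load": load}


def demo():
    # An anonymous memory map stands in for the shared memory segment.
    segment = mmap.mmap(-1, LAYOUT.size)
    try:
        publish(segment, 42, 0.75)
        return snapshot(segment)
    finally:
        segment.close()
```

The point of the design is that the writer only updates memory it owns, so a reader can take a snapshot without calling into the monitored process.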
12
Process metrics – DiaMon JMX integration
13
Process metrics - jConsole
14
Process metrics - Viewers
15
Process metrics – JMX lookup
17
Documentation - Structure
18
Documentation – Portal
19
Documentation – Useful links
21
Dependencies - architecture
http://wikis/display/MW/Statistics
Data collection before LS1
Diagram labels: FEC, cmwadmin-scanner, Visualization, client connections, cmwAdmin, CMW/FESA, Dependency analysis, cmwDirectory, “dot” files, log files
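The architecture above collects client connections and turns them into “dot” files for visualization. As a rough sketch of that last step, assuming the collected data can be reduced to (client, server) pairs, the following renders Graphviz DOT text and counts distinct clients per server. The function names and the input shape are hypothetical and are not taken from cmwadmin-scanner.

```python
def connections_to_dot(connections, graph_name="dependencies"):
    """Render (client, server) pairs as Graphviz DOT text."""
    edges = sorted(set(connections))  # repeated connections become one edge
    lines = [f"digraph {graph_name} {{"]
    for client, server in edges:
        lines.append(f'    "{client}" -> "{server}";')
    lines.append("}")
    return "\n".join(lines)


def fan_in(connections):
    """Count the distinct clients that connect to each server."""
    clients = {}
    for client, server in set(connections):
        clients.setdefault(server, set()).add(client)
    return {server: len(names) for server, names in clients.items()}
```

The resulting text can be written to a file and passed to any Graphviz layout tool; the fan-in count is one simple way to spot heavily used servers.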
22
Dependencies – a view
23
Dependencies – a view
http://wikis/display/MW/Statistics
Face FecBook
25
Feedback – architecture
http://wikis/display/MW/Log+and+Tracing
Diagram labels: JMS@cs-ccr-tracing, cmw-fb-c, C process, cmw, FESA3, cmw-log, CCDB, cmw-log4j, Java process, jar1, jar2, Listeners, GUIs, syslog@cs-ccr-feop, syslog@cs-ccr-tracing, /var/log/messages, FEC/SRV, JMS@cs-ccr-cmw, Syslog tracing, APEX GUIs, Splunk, syslog converters, Java tracing, Tracing & Config libs, logfiles, Impl, make, Scripts, cmmnbld, deploy, wreboot
26
Feedback – CCDB tracing GUI
27
Feedback – Hardware config CCDB GUI
28
Splunk - architecture
- Central instance running on dedicated machine
- Project accounts set up
- Training given to projects
- Project-specific searches created
- Contact Steen for Splunk access
Diagram labels: FEC, SRV, logfiles, cmw-log, cmw-log4j, filters, filter&throttle, /var/log/messages, syslog@cs-ccr-feop, syslog@cs-ccr-tracing, JMS@cs-ccr-cmw, JMS@cs-ccr-tracing, Splunk@cs-ccr-tracing
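The diagram includes a "filter&throttle" stage between the FECs and the central tracing host. The slides do not say how it works, so the following is only one plausible sketch: a severity filter followed by a fixed-window throttle keyed on source and message text. The level table, class name, and tuple layout are assumptions for illustration.

```python
# Hypothetical severity ranking; the real level names are not given here.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}


class Throttle:
    """Pass at most `limit` messages per key within each `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._state = {}  # key -> (window_start, count)

    def allow(self, key, now):
        start, count = self._state.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new window begins
        count += 1
        self._state[key] = (start, count)
        return count <= self.limit


def forward(messages, min_level, throttle):
    """Keep messages at or above min_level that the throttle lets through.

    messages: iterable of (timestamp, level, source, text) tuples.
    """
    kept = []
    for timestamp, level, source, text in messages:
        if LEVELS[level] < LEVELS[min_level]:
            continue  # filtered out by severity
        if throttle.allow((source, text), timestamp):
            kept.append((timestamp, level, source, text))
    return kept
```

Throttling per (source, text) pair keeps one noisy device from flooding the central instance while still letting the first occurrences of each distinct message through.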
29
Splunk – Message filter GUI
30
Splunk – saved searches
31
Splunk - visualization
32
Splunk – dashboard
33
Splunk – Use case: japc-ext-dir
- Queue overflow messages from CMW proxy
- Hosts and PIDs reported
- Client application identified
- japc-ext-dir suspected – and verified
- Subscriptions made to “constant” properties
- Data never consumed => Queue overflow in proxy
- Problem fixed by Eric
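The failure mode in this use case, a subscriber that never consumes its data until the proxy queue overflows, can be illustrated with a small bounded queue. This is a generic sketch, not the CMW proxy's code: the class name, the capacity, and the drop-oldest policy are assumptions.

```python
from collections import deque


class SubscriptionQueue:
    """Bounded per-client queue that counts the updates it has to drop."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.overflows = 0
        self._items = deque()

    def publish(self, update):
        if len(self._items) >= self.capacity:
            self._items.popleft()  # assumed policy: drop the oldest update
            self.overflows += 1    # this is what would be logged as overflow
        self._items.append(update)

    def consume(self):
        """Return the oldest pending update, or None if nothing is queued."""
        return self._items.popleft() if self._items else None
```

A client that calls `consume()` as often as updates arrive never triggers the overflow counter; a client that subscribes and then ignores the data triggers it on every update once the queue is full, which matches the pattern of repeated overflow messages described above.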
34
Splunk – Use cases
- Leap second
- RBAC tokens missing/malformed/expired
- CMW slow clients
- Telegram layout and configuration
- JAPC applying wrong token in certain cases
- FESA handling of Timlib error
- Separating test environment from operational
35
Splunk – Comments (1)
- “Proper usage requires very good configuration”
- “We need to rework our way to log information…”
- “Log files are a bit of a mess now, and only contain a sub-set of necessary data…it is necessary to clean up and extend logging…”
- “…it must be possible for others to access the data…”
36
Splunk – Comments (2)
Positive comments
- “Powerful tool for detecting and reporting anomalies”
- “Very useful for proactive actions”
- “Powerful tool to make statistics”
- “It avoids spending time creating tools for decoding traces”
- “It is an agile way to gather analytics, to inform design decisions”
- “It is a very powerful auditing tool”
- “Trends over time allow spotting new types of problems”
- “It was useful for me several times for seeing if a problem is on one or multiple machines”
- “It gives an easy, reusable way of looking at logfiles”
- “It could become a valuable tool to spot errors, where currently we feel blind whenever there is a problem”
37
Splunk – vision
- Active, daily use by component providers - Dashboards
- Exploit tracing for
  - Pro-active operation
  - Informed evolution
  - Preventive maintenance
- 10 user-friendly message types per project
  - ERROR or WARNING
  - Contact information
  - Link to documentation
  - Message body meaningful to non-expert
  - No Java stack trace
- Continuous improvement of messages
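The message guidelines above (ERROR or WARNING level, contact information, documentation link, meaningful body, no Java stack trace) lend themselves to an automated check. The sketch below is one hypothetical way to express them; the field names, the `problems` helper, and the stack-trace heuristic are assumptions and do not describe the actual feedback API.

```python
import re
from dataclasses import dataclass

# Heuristic for a Java stack frame such as "    at a.b.C.method(C.java:42)".
STACK_FRAME = re.compile(r"^\s*at\s+[\w.$]+\(", re.MULTILINE)


@dataclass(frozen=True)
class FeedbackMessage:
    level: str
    body: str
    contact: str
    doc_link: str


def problems(message):
    """Return the guideline violations found in one message."""
    found = []
    if message.level not in ("ERROR", "WARNING"):
        found.append("level must be ERROR or WARNING")
    if not message.contact.strip():
        found.append("contact information missing")
    if not message.doc_link.strip():
        found.append("documentation link missing")
    if not message.body.strip():
        found.append("message body empty")
    if STACK_FRAME.search(message.body):
        found.append("body looks like a Java stack trace")
    return found
```

A check like this cannot judge whether a body is meaningful to a non-expert, so it would complement, not replace, the continuous review of messages that the slide calls for.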
39
Plans for 2013 (a)
DiaMon
- Interactive service-oriented dependency view
- Declare and monitor process metrics
- Integrate metrics viewers
- Launching of external tools
- Make contact information accessible
Splunk
- Improve current setup and configurations
- Increase support and project uptake
- Investigate integration of ITAT
40
Plans for 2013 (b)
Documentation
- Agree/implement CO-wide website/wiki structure
- Agree on maintenance responsibilities
- Portal – review, add and extend pages
- Content – all projects provide ½-page description
Databases
- Finalize Hardware Configuration Feedback mechanisms
- Capturing version information, detecting time bombs
- Update contact information
41
Plans for 2013 (c)
Feedback (Tracing and Configuration)
- Improve message quality (structure, content, level)
- Increase project usage of feedback API
- All projects review configuration/version feedback
Process metrics
- Work with projects to expose metrics
- Extend CMX (commands, …)?
- MW team take over jmxDirectory
42
Plans for 2013 (d)
Runtime dependency data
- Analysis and visualization of CMW data
- Collecting network connection information
Drivers
- Finalize hardware configuration feedback
- Version feedback implementation
44
Conclusions
Done
- Means for provision/transport of tracing, configuration and metrics
- Centralized tracing and analysis
Todo
- Data generation by projects
- Documentation
- Analysis and presentation
Good support from projects in 2012, but…
- Too many other priorities for developers – and for me…
2013 is for bringing the pieces together
ACET needs time from all projects in 2013