ACET (Accelerator Controls Exploitation Tools): Progress and plans, December 2012

Outline
• Controls system overview
• Motivation and purpose
• Focus points
• Plans for 2013
• Conclusions
ACET - TC on 06 December

Controls system overview
[Diagram. Labels: Knobs, Services, "Core", Diagnostics; Applications, Middletier, Front Ends, Hardware; Sequencer, Orbit, InCA/LSA, Proxies, JMS, SIS, CMW/FESA, Timing, Drivers, DB, Boot, NFS, cmwDir, RBAC, DiaMon, cmwAdmin, FESA Navigator, Video, Syslog, Tune, RT]
• Scale figures: 425 consoles, 400 GUIs, 300 servers, 200 Java servers, 1300 FECs, 600 module types, devices [count not legible]

ACET
• Motivation
  • Distributed and complex controls system
  • Knowledge distributed over many experts
  • Move towards uniform (LHC) exploitation model across machines
• Purpose: allow (non-)experts to carry out more efficient diagnostics
• ACET collaborates with CO projects to improve diagnostic facilities of the control system

Focus points
• Diagnostic tools – aggregation and training
• Process metrics – JMX & CMX
• DiaMon – GUI and CLIC agent
• Documentation
  • Wiki/site structure, Portal and Useful links
• Dynamic/runtime dependencies
• Feedback – Tracing & Config
  • Message format, transport, analysis
• Trace analysis using Splunk
• Config analysis in CCDB

Diagnostic tools
• Tools evaluated for criticality
• Aggregation into CCM diagnostic menu
• Training given during shutdown lectures

Process metrics – JMX architecture
[Diagram. Labels: C2Mon, SRV, JMX-DAQ, DiaMon GUI, Metrics, RMI, JMX mBeans, JMX viewer, JmxDirectory, jConsole, jar1, jar2, mgt, JVM, jmx-dir-client, jVisualVM]

Process metrics – CMX architecture
[Diagram. Labels: C2Mon, CLIC-DAQ, DiaMon GUI, Metrics, DB; FEC; C process p1 (lib1, lib2, cmx-lib-c); C++ process p2 (lib3, lib4, cmx-lib-c++); shared memory segments, cmx-lib registry; CLIC agent, CMX viewer, Command line tool]
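The diagram labels suggest that each C or C++ process publishes its metrics into shared memory segments, which a reader such as the CLIC agent, the CMX viewer, or a command line tool can then pick up. The sketch below only illustrates that general pattern with Python's standard `multiprocessing.shared_memory` module. The segment layout and the names `publish_metric`, `read_metric`, and `demo` are assumptions made for illustration; they are not the cmx-lib API.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: one signed 64-bit counter followed by one double.
LAYOUT = struct.Struct("<qd")


def publish_metric(segment, counter, value):
    """Producer side: write both metric fields into the shared segment."""
    LAYOUT.pack_into(segment.buf, 0, counter, value)


def read_metric(segment_name):
    """Reader side: attach to an existing segment by name and read the metrics."""
    viewer = shared_memory.SharedMemory(name=segment_name)
    try:
        return LAYOUT.unpack_from(viewer.buf, 0)
    finally:
        viewer.close()


def demo():
    # The producing process creates the segment, sized to hold the layout.
    segment = shared_memory.SharedMemory(create=True, size=LAYOUT.size)
    try:
        publish_metric(segment, 42, 0.5)
        return read_metric(segment.name)
    finally:
        segment.close()
        segment.unlink()  # remove the segment once it is no longer needed
```

In this pattern the reader needs only the segment name and the agreed layout, so the producing process is not interrupted when its metrics are inspected.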

Process metrics – DiaMon JMX integration

Process metrics – jConsole

Process metrics – Viewers

Process metrics – JMX lookup

Documentation – Structure

Documentation – Portal

Documentation – Useful links

Dependencies – architecture
[Diagram. Labels: FEC, CMW/FESA, cmwAdmin, cmwadmin-scanner, cmwDirectory, client connections, log files, Dependency analysis, "dot" files, Visualization]
• Data collection before LS1
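The architecture above mentions client connections being turned into "dot" files for visualization. As a minimal sketch of that last step, assuming the connections are available as (client, server) pairs, the following renders them as Graphviz DOT text. The function name `connections_to_dot` and the sample host names are hypothetical and are not taken from the actual scanner.

```python
def connections_to_dot(connections, graph_name="dependencies"):
    """Render (client, server) pairs as Graphviz DOT text.

    Duplicate pairs are collapsed so each dependency appears once.
    """
    lines = [f"digraph {graph_name} {{"]
    for client, server in sorted(set(connections)):
        # Quote node names because host and device names may contain dashes.
        lines.append(f'    "{client}" -> "{server}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical client connections, as a scanner might report them.
sample = [
    ("console-1", "fec-a"),
    ("server-1", "fec-a"),
    ("console-1", "fec-a"),  # repeated connection, emitted only once
]
dot_text = connections_to_dot(sample)
```

The resulting text can be passed to any Graphviz layout tool to draw the dependency graph.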

Dependencies – a view

Dependencies – a view: Face FecBook

Feedback – architecture
[Diagram. Labels: FEC/SRV; C process, cmw-fb-c, cmw, FESA3, cmw-log; Java process, jar1, jar2, cmw-log4j; Tracing & Config libs; /var/log/messages, logfiles, Syslog tracing, Java tracing, syslog converters; CCDB, APEX GUIs, Splunk, Listeners, GUIs; Impl, make, Scripts, cmmnbld, deploy, wreboot]

Feedback – CCDB tracing GUI

Feedback – Hardware config CCDB GUI

Splunk – architecture
• Central instance running on dedicated machine
• Project accounts set up
• Training given to projects
• Project-specific searches created
• Contact Steen for Splunk access
[Diagram. Labels: FEC, /var/log/messages, filter & throttle, cmw-log; SRV, logfiles, cmw-log4j, filters]
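The "filter & throttle" stage in the diagram presumably keeps a misbehaving process from flooding the central instance with identical messages. A minimal sketch of such a throttle, written as a standard `logging.Filter`, is shown below. The class name `ThrottleFilter`, its parameters, and the injected clock are assumptions for illustration, not the behaviour of cmw-log or cmw-log4j.

```python
import logging


class ThrottleFilter(logging.Filter):
    """Let at most `limit` identical messages through per `window` seconds."""

    def __init__(self, limit, window, clock):
        super().__init__()
        self.limit = limit
        self.window = window
        self.clock = clock  # callable returning the current time in seconds
        self._seen = {}     # message text -> (window start, count)

    def filter(self, record):
        now = self.clock()
        key = record.getMessage()
        start, count = self._seen.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired, start a new one
        count += 1
        self._seen[key] = (start, count)
        return count <= self.limit


def demo():
    """Feed five identical records at t=0 and one at t=10; return what passed."""
    fake_time = [0.0]
    throttle = ThrottleFilter(limit=2, window=5.0, clock=lambda: fake_time[0])

    def record(msg):
        return logging.LogRecord("fec", logging.WARNING, "demo.py", 0, msg, None, None)

    passed = [throttle.filter(record("queue overflow")) for _ in range(5)]
    fake_time[0] = 10.0
    passed.append(throttle.filter(record("queue overflow")))
    return passed
```

Injecting the clock keeps the sketch deterministic; in real use `time.monotonic` could be passed instead.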

Splunk – Message filter GUI

Splunk – Saved searches

Splunk – Visualization

Splunk – Dashboard

Splunk – Use case: japc-ext-dir
• Queue overflow messages from CMW proxy
• Hosts and PIDs reported
• Client application identified
• japc-ext-dir suspected – and verified
• Subscriptions made to "constant" properties
• Data never consumed => Queue overflow in proxy
• Problem fixed by Eric
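The failure mode in this use case, a client that subscribes but never consumes its data, can be reproduced with a bounded queue. The sketch below is a simplified model under stated assumptions: the function `deliver`, the queue capacity, and the update count are hypothetical and do not describe the CMW proxy implementation.

```python
import queue


def deliver(updates, capacity, consume):
    """Push updates into a bounded per-client queue and count the overflows.

    `consume` says whether the client drains its queue after each update.
    """
    client_queue = queue.Queue(maxsize=capacity)
    overflows = 0
    for update in updates:
        try:
            client_queue.put_nowait(update)
        except queue.Full:
            overflows += 1  # where a proxy would log a "queue overflow" message
        if consume:
            while not client_queue.empty():
                client_queue.get_nowait()
    return overflows


updates = list(range(10))
slow_client = deliver(updates, capacity=3, consume=False)    # never reads its data
healthy_client = deliver(updates, capacity=3, consume=True)  # drains after each update
```

With ten updates and room for three, the non-consuming client overflows on every update after the third, which is the kind of repeated message that a central log search makes visible.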

Splunk – Use cases
• Leap second
• RBAC tokens missing/malformed/expired
• CMW slow clients
• Telegram layout and configuration
• JAPC applying wrong token in certain cases
• FESA handling of Timlib error
• Separating test environment from operational

Splunk – Comments (1)
• "Proper usage requires very good configuration"
• "We need to rework our way to log information…"
• "Log files are a bit of a mess now, and only contain a sub-set of necessary data…it is necessary to clean up and extend logging…"
• "…it must be possible for others to access the data…"

Splunk – Comments (2)
• Positive comments
  • "Powerful tool for detecting and reporting anomalies"
  • "Very useful for proactive actions"
  • "Powerful tool to make statistics"
  • "It avoids spending time creating tools for decoding traces"
  • "It is an agile way to gather analytics, to inform design decisions"
  • "It is a very powerful auditing tool"
  • "Trends over time allow spotting new types of problems"
  • "It was useful for me several times for seeing if a problem is on one or multiple machines"
  • "It gives an easy, reusable way of looking at logfiles"
  • "It could become a valuable tool to spot errors, where currently we feel blind whenever there is a problem"

Splunk – vision
• Active, daily use by component providers – Dashboards
• Exploit tracing for
  • Pro-active operation
  • Informed evolution
  • Preventive maintenance
• 10 user-friendly message types per project
  • ERROR or WARNING
  • Contact information
  • Link to documentation
  • Message body meaningful to non-expert
  • No Java stack trace
• Continuous improvement of messages
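The message requirements listed above (ERROR or WARNING level, contact information, link to documentation, readable body, no stack trace) can be expressed as a small validated record. This is only a sketch of one possible shape. The class `TraceMessage`, its field names, the rendered layout, and the sample values are all hypothetical; the slides do not define a concrete format.

```python
from dataclasses import dataclass

ALLOWED_LEVELS = ("ERROR", "WARNING")


@dataclass(frozen=True)
class TraceMessage:
    level: str
    body: str
    contact: str
    doc_link: str

    def __post_init__(self):
        # Enforce the two rules that can be checked mechanically.
        if self.level not in ALLOWED_LEVELS:
            raise ValueError(f"level must be one of {ALLOWED_LEVELS}")
        if "Traceback" in self.body or "\tat " in self.body:
            raise ValueError("body must not contain a stack trace")

    def render(self):
        """Return the message as a single log line."""
        return f"{self.level} {self.body} [contact: {self.contact}] [doc: {self.doc_link}]"


# Hypothetical values; the contact and link are placeholders.
message = TraceMessage(
    level="WARNING",
    body="Timing telegram not received for 5 s",
    contact="timing-support",
    doc_link="https://example.org/timing/telegram",
)
line = message.render()
```

Whether a body is meaningful to a non-expert cannot be checked by code, so that rule still depends on review by the project.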

Plans for 2013 (a)
• DiaMon
  • Interactive service-oriented dependency view
  • Declare and monitor process metrics
  • Integrate metrics viewers
  • Launching of external tools
  • Make contact information accessible
• Splunk
  • Improve current setup and configurations
  • Increase support and project uptake
  • Investigate integration of ITAT

Plans for 2013 (b)
• Documentation
  • Agree/implement CO-wide website/wiki structure
  • Agree on maintenance responsibilities
  • Portal – review, add and extend pages
  • Content – all projects provide ½-page description
• Databases
  • Finalize Hardware Configuration Feedback mechanisms
  • Capturing version information, detecting time bombs
  • Update contact information

Plans for 2013 (c)
• Feedback (Tracing and Configuration)
  • Improve message quality (structure, content, level)
  • Increase project usage of feedback API
  • All projects review configuration/version feedback
• Process metrics
  • Work with projects to expose metrics
  • Extend CMX (commands,…) ?
  • MW team take over jmxDirectory

Plans for 2013 (d)
• Runtime dependency data
  • Analysis and visualization of CMW data
  • Collecting network connection information
• Drivers
  • Finalize hardware configuration feedback
  • Version feedback implementation

Conclusions
• Done
  • Means for provision/transport of tracing, configuration and metrics
  • Centralized tracing and analysis
• Todo
  • Data generation by projects
  • Documentation
  • Analysis and presentation
• Good support from projects in 2012, but…
  • Too many other priorities for developers – and for me…
• 2013 is for bringing the pieces together
• ACET needs time from all projects in 2013