BE/ICS Monitoring Systems Survey Results


BE/ICS Monitoring Systems Survey Results
BE/ICS Technical Committee, 3.11.2017
Timo & Ben

Goal of the survey
- Find out the current status of monitoring systems at BE/ICS: What systems are in use? Who uses them? How? What is the level of satisfaction?
- What future requirements / wishes are there?
- By monitoring we mean "system monitoring" to determine system health, as opposed to "process monitoring/control"

Survey participation
- Survey sent to the be-dep-ics e-group
- 16 responses out of 88 -> 18% response rate

Q: Do you have monitoring requirements? Which monitoring system(s)?
- 14 answered the whole survey
- 2 didn't have use for monitoring

What is monitored?
- WinCC OA:
  - Communication between WinCC OA SCADA and PLCs, FECs, equipment, databases
  - Systems, processes, WinCC OA manager status, system integrity
  - Subsystems such as LoggingDB, distributed systems connectivity, etc.
- License Generation Service
- Access systems:
  - PC clients, servers: main components (CPU, disk, temperature, memory...)
  - UPS information (voltage IN/OUT, autonomy, temperature, alarms)
  - Access to DB read/write
  - Intercoms
  - Video cameras
- Control infrastructure components (e.g. SCADA machines, PLCs, Fieldbus nodes...)
- Control process data, alarms, logs to detect faults/anomalies, discover patterns of alarms, evaluate the performance of the control systems
- Hardware, network, processes to ensure the correct running of the systems
- All the applications covered by the Piquet
- Critical / sensitive services and equipment

Users of monitoring data
- All control system users see system integrity alarms; experts inspect details.
- Experts for the WinCC OA Service should be alerted upon incidents.
- Piquet service (multiple times per day), depending on the analyzed control system and use case.
- Other users, such as TE-MPE.
- Mainly used for maintenance purposes. Very few outside users, but some users need access (they already have it).
- Different operator groups (TI, CCC, LHC OP...) and administrators

Immediacy of monitoring

System metrics: quantity and connectivity

Hardware: types and metrics

Software: metrics
- Software processes currently running
- Process CPU usage
- Process memory usage
- Process file handle count
- Process socket count
- Process live/connectivity (heartbeat)
- Process version level
- Log entries
- Database connectivity
- Webservice connectivity
- Access to AFS
- Application configuration mismatch
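As an illustration only, the per-process metrics listed above could be held in a small record, with the heartbeat used as a liveness check. This is a minimal sketch: the names `ProcessSample` and `is_alive` and the 30-second timeout are assumptions made for the example, not part of any BE/ICS tool.

```python
from dataclasses import dataclass


@dataclass
class ProcessSample:
    """One snapshot of the per-process metrics named in the survey (hypothetical layout)."""
    name: str
    cpu_percent: float     # process CPU usage
    memory_mb: float       # process memory usage
    open_files: int        # process file handle count
    open_sockets: int      # process socket count
    last_heartbeat: float  # Unix timestamp of the last heartbeat received
    version: str = "unknown"  # process version level


def is_alive(sample: ProcessSample, now: float, timeout_s: float = 30.0) -> bool:
    """Treat a process as alive if its last heartbeat is no older than timeout_s seconds."""
    return (now - sample.last_heartbeat) <= timeout_s


if __name__ == "__main__":
    sample = ProcessSample("example_manager", 12.5, 256.0, 40, 8, last_heartbeat=1000.0)
    print(is_alive(sample, now=1020.0))  # heartbeat is 20 s old
    print(is_alive(sample, now=1031.0))  # heartbeat is 31 s old
```

The timeout would in practice depend on how often each process is expected to report.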

Tools used and level of satisfaction (scale from Perfect to Poor)

Future: is there need for improvement and to what extent

Future: comments and requests
- In the scope of the frameworks redesign, we need to streamline the code and architecture to make the system easier to develop as well as to use and maintain. In particular, dealing with large distributed configurations should be addressed.
- The system needs to be migrated from AFS (due to the AFS phase-out); at the same time consolidation work is planned. Monitoring/logging improvements would be part of the package.
- The Piquet service would like higher reliability from MOON.
- Integrate the current monitoring systems into a complete analytical framework where everyone can access the data and perform their own analysis.
- More metrics to be monitored.
- More maintenance on MOON; make it more intuitive to use, faster, and more functional.
- A system that can be adapted to the user's needs by the user. Devices to be monitored that can be easily removed/inserted by the user. Configurable alarm conditions.
- A monitoring system with an alarm screen. Link to GIS metadata.
- A reliable and robust web interface for the monitoring tool.
- To be able to add/edit/delete myself and use the monitoring tool autonomously. For example, if new equipment is to be monitored with the same parameters as existing ones (a new camera, computer, UPS...), the tool should be modular and easily configured.
- Better configuration of new instances; include monitoring for the new hardware (PLCs) and associated fieldbus nodes (e.g. Profinet) and communication backbone (e.g. switches, routers).
- More reactive, new functionalities, decoupling process alarms from infrastructure alarms, etc.
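Several of the requests above (devices that users can add or remove themselves, configurable alarm conditions, new equipment reusing the parameters of existing ones) could be sketched roughly as follows. This is a hypothetical illustration, not a description of MOON, WinCC OA or Zabbix; the class names, metric names and limits are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmCondition:
    """A user-configurable limit on one metric (hypothetical)."""
    metric: str
    limit: float
    above: bool = True  # True: alarm when value > limit; False: alarm when value < limit

    def triggered(self, value: float) -> bool:
        return value > self.limit if self.above else value < self.limit


@dataclass
class MonitoringConfig:
    """Device types share a template of alarm conditions; devices only reference a template."""
    templates: dict = field(default_factory=dict)  # template name -> list of AlarmCondition
    devices: dict = field(default_factory=dict)    # device name -> template name

    def add_device(self, name: str, template: str) -> None:
        if template not in self.templates:
            raise KeyError(f"unknown template: {template}")
        self.devices[name] = template

    def remove_device(self, name: str) -> None:
        self.devices.pop(name, None)

    def check(self, name: str, readings: dict) -> list:
        """Return the metrics of this device whose alarm condition is met."""
        conditions = self.templates[self.devices[name]]
        return [c.metric for c in conditions
                if c.metric in readings and c.triggered(readings[c.metric])]


if __name__ == "__main__":
    config = MonitoringConfig(templates={
        "ups": [AlarmCondition("temperature_c", 40.0),
                AlarmCondition("autonomy_min", 10.0, above=False)],
    })
    config.add_device("ups-2", "ups")  # a new UPS reuses the existing parameters
    print(config.check("ups-2", {"temperature_c": 45.0, "autonomy_min": 25.0}))
```

Keeping the alarm conditions in templates means a new camera, computer or UPS needs only one `add_device` call rather than a fresh set of parameters.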

Future: size and connectivity Current:

Future: hardware types and metrics
Essentially unchanged from the current systems

Future: software metrics
Essentially unchanged from the current systems

Future: what monitoring tool
- Omission in the survey: MOON not among the given choices -> assume WinCC OA to cover it
- Generally users wish to stick with what they have -> no need for radical change

Future: interest for new features: reminder of goals
Online goals include:
- Identification of anomalous situations, i.e. alarm conditions (to alert a piquet service or project responsible about a system in error)
- Fault prediction, to alert before a system is likely to enter an error state
Offline goals include:
- Fault diagnosis, to aid experts in root cause analysis
- Fault classification, to assist experts in identifying common problems
- Knowledge discovery, e.g. to find inter-dependencies in the underlying process being monitored
Expert rule-based systems: System experts use their experience to define triggers (for alarms, for example). A simple case is to define a threshold value for some measured metric: if the measured value exceeds the threshold, an alarm is raised.
Machine learning systems: An algorithm, trained using data sets, generates a stochastic model which is then used to identify/extract information from measured system metrics.
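The two approaches can be contrasted in a short sketch. The threshold rule follows the simple case described above. The "learned" model is deliberately the simplest possible stand-in (a mean and standard deviation estimated from training data, flagging values more than k standard deviations away); it is an assumption chosen for illustration, not the technique proposed in the survey. All names and numbers are hypothetical.

```python
from statistics import mean, stdev


def threshold_rule(value: float, threshold: float) -> bool:
    """Expert rule: raise an alarm if the measured value exceeds the threshold."""
    return value > threshold


class GaussianModel:
    """Toy stochastic model: fit mean and standard deviation on training data."""

    def __init__(self, training_values):
        self.mu = mean(training_values)
        self.sigma = stdev(training_values)  # needs at least two training values

    def is_anomalous(self, value: float, k: float = 3.0) -> bool:
        """Flag values further than k standard deviations from the training mean."""
        if self.sigma == 0:
            return value != self.mu
        return abs(value - self.mu) > k * self.sigma


if __name__ == "__main__":
    cpu_history = [50.0, 52.0, 48.0, 51.0, 49.0]  # made-up CPU usage samples (%)
    model = GaussianModel(cpu_history)
    for value in (51.0, 70.0, 95.0):
        print(value, threshold_rule(value, threshold=90.0), model.is_anomalous(value))
```

With these made-up samples, a reading of 70 stays below the expert's 90% threshold but is far outside what the training data contained, which is the kind of case where the two approaches disagree.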

Future: interest for new features

Future: techniques for online and offline monitoring
- Expert rule-based systems
- Machine learning systems
- None
- Combination of both rule-based and machine learning

Conclusions
- Statistics not great, but the survey may have covered most users with a deeper interest in system monitoring
- Two poles: PLC sections (MOON, WinCC OA) and Access/Safety sections (SSM/Zabbix)
- Fairly good level of satisfaction with the status quo
- Users tend to be conservative: if it ain't (badly) broke, don't change it
- Nonetheless, improvements are welcome:
  - Speed, reliability
  - Ease of use
  - User configurability / adaptability
- Future directions: interest in new techniques (better automatic reasoning, statistical analysis)