BE/ICS Monitoring Systems Survey Results


1 BE/ICS Monitoring Systems Survey Results
BE/ICS Technical Committee Timo & Ben

2 Goal of the survey
Find out the current status of monitoring systems at BE/ICS:
  What systems? Who uses? How?
  What level of satisfaction?
  What future requirements / wishes are there?
By monitoring we mean “system monitoring” to determine system health, as opposed to “process monitoring/control”

3 Survey participation
Survey sent to the be-dep-ics e-group
16 responses out of 88 -> 18% response rate

4 Q: Do you have monitoring requirements? Which monitoring system(s)?
14 answered the whole survey; 2 had no use for monitoring

5 What is monitored?
WinCC OA:
  Communication between WinCC OA SCADA and PLCs, FECs, equipment, databases
  Systems, processes, WinCC OA manager status, system integrity
  Subsystems such as LoggingDB, distributed systems connectivity, etc.
  License Generation Service
Access systems:
  PC clients, servers: main components (CPU, disk, temperature, memory...)
  UPS information (voltage IN/OUT, autonomy, temperature, alarms)
  Access to DB read/write
  Intercoms
  Video cameras
Control infrastructure components (e.g. SCADA machines, PLCs, Fieldbus nodes,...)
Control process data, alarms, logs to detect faults/anomalies, discover patterns of alarms, evaluate the performance of the control systems
Hardware, network, processes to ensure the correct running of the systems
All the applications covered by the Piquet
Critical / sensitive services and equipment

6 Users of monitoring data
All control system users see system integrity alarms; experts inspect details.
Experts for the WinCC OA Service should be alerted upon incidents.
Piquet service (multiple times per day), depending on the analyzed control system and use case.
Other users, such as TE-MPE.
Mainly used for maintenance purposes.
Very few outside users, but some users need access (and already have it).
Different operator groups (TI, CCC, LHC OP...) and administrators.

7 Immediacy of monitoring

8 System metrics: quantity and connectivity

9 Hardware: types and metrics

10 Software: metrics
Software processes currently running
Process CPU usage
Process memory usage
Process file handle count
Process socket count
Process live/connectivity (heartbeat)
Process version level
Log entries
Database connectivity
Webservice connectivity
Access to AFS
Application configuration mismatch
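
As an illustration of the "Process live/connectivity (heartbeat)" metric, a minimal sketch in Python follows. It is not taken from any of the surveyed tools: the function names, the process names and the 30-second timeout are all hypothetical, and it assumes each monitored process reports the timestamp of its last heartbeat.

```python
from typing import Dict, List


def is_alive(last_heartbeat: float, now: float, timeout: float = 30.0) -> bool:
    """Return True if the last heartbeat is at most `timeout` seconds old."""
    return (now - last_heartbeat) <= timeout


def stale_processes(heartbeats: Dict[str, float], now: float,
                    timeout: float = 30.0) -> List[str]:
    """Return the names of processes whose heartbeat is overdue, sorted by name."""
    return sorted(
        name for name, last in heartbeats.items()
        if not is_alive(last, now, timeout)
    )


# Hypothetical example: timestamps in seconds, as reported by each process.
heartbeats = {"archiver": 100.0, "driver": 125.0, "ui": 80.0}
overdue = stale_processes(heartbeats, now=131.0)
```

In a real deployment the timestamps would typically come from a monotonic clock or from the monitoring tool itself; here they are plain numbers so that the logic can be checked in isolation.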

11 Tools used and level of satisfaction
(Chart; satisfaction scale from Perfect to Poor)

12 Future: is there need for improvement and to what extent

13 Future: comments and requests
In the scope of the frameworks redesign, we need to streamline the code and architecture to make the system easier to develop as well as to use/maintain. In particular, dealing with large distributed configurations should be addressed.
The system needs to be migrated away from AFS (due to the AFS phase-out); consolidation work is planned at the same time. Monitoring/logging improvements would be part of the package.
The Piquet service would like higher reliability from MOON.
Integrate the current monitoring systems into a complete analytical framework where everyone can access the data and perform their own analysis.
More metrics to be monitored.
More maintenance on MOON; make it more intuitive to use, faster, and more functional.
A system that users can adapt to their own needs. Monitored devices that the user can easily add/remove. Configurable alarm conditions. A monitoring system with an alarm screen. Link to GIS metadata.
A reliable and robust web interface for the monitoring tool. To be able to add/edit/delete items myself and use the monitoring tool autonomously. For example, when new equipment is to be monitored with the same parameters as existing equipment (a new camera, computer, UPS...), the tool should be modular and easily configured.
Better configuration of new instances; include monitoring for the new hardware (PLCs), the associated fieldbus nodes (e.g. Profinet) and the communication backbone (e.g. switches, routers).
More reactive, new functionalities, decoupling process alarms from infrastructure alarms, etc.

14 Future: size and connectivity
Current:

15 Future: hardware types and metrics
Essentially unchanged from the current systems

16 Future: software metrics
Essentially unchanged from the current systems

17 Future: what monitoring tool
Omission in the survey: MOON not among given choices -> assume WinCC OA to cover it
Generally users wish to stick with what they have -> no need for radical change

18 Future: interest for new features: reminder of goals
Online goals include:
  Identification of anomalous situations, i.e. alarm conditions (to alert a piquet service or the person responsible for the project when a system is in error)
  Fault prediction, to alert before a system is likely to enter an error state
Offline goals include:
  Fault diagnosis, to aid experts in root cause analysis
  Fault classification, to assist experts in identifying common problems
  Knowledge discovery, e.g. to find inter-dependencies in the underlying process being monitored
Expert rule-based systems: System experts use their experience to define triggers (for alarms, for example). A simple case is a threshold value for some measured metric: if the measured value exceeds the threshold, an alarm is raised.
Machine learning systems: An algorithm, trained using data sets, generates a stochastic model which is then used to identify/extract information from measured system metrics.
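
The two approaches can be contrasted with a small Python sketch. This is only an illustration under stated assumptions: the class and function names, the metric name and all numbers are hypothetical. The "trained" variant is a deliberately simple stand-in (a threshold set at mean plus k standard deviations of past measurements), not a real machine learning model and not the method of any tool named in the survey.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Sequence


@dataclass(frozen=True)
class ThresholdRule:
    """Alarm rule: raise an alarm when a metric exceeds a threshold."""
    metric: str
    threshold: float

    def is_alarm(self, value: float) -> bool:
        # Strictly greater than: a value equal to the threshold is still OK.
        return value > self.threshold


def fit_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Derive a threshold from past data as mean + k * sample std deviation."""
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate the spread")
    return mean(history) + k * stdev(history)


# Expert rule: the threshold (90 %) is chosen by a person (hypothetical value).
expert_rule = ThresholdRule(metric="cpu_percent", threshold=90.0)

# Data-driven rule: the threshold is estimated from hypothetical past readings.
history = [40.0, 42.0, 38.0, 41.0, 39.0]
learned_rule = ThresholdRule(metric="cpu_percent",
                             threshold=fit_threshold(history))
```

With these sample readings the expert rule only fires above 90, while the data-driven rule fires on any value well outside what was observed before. Keeping both rules in the same shape also makes it straightforward to combine them.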

19 Future: interest for new features

20 Future: techniques for online and offline monitoring
Expert rule-based systems
Machine learning systems
None
Combination of both rule-based and machine learning

21 Conclusions
Statistics not great, but maybe covered most users with a deeper interest in system monitoring
Two poles: PLC sections (MOON, WinCC OA) and Access/Safety sections (SSM/Zabbix)
Fairly good level of satisfaction with the status quo
Users tend to be conservative
  If it ain’t (badly) broke, don’t change it
Nonetheless, improvements welcome:
  Speed, reliability
  Ease of use
  User configurability / adaptability
Future directions:
  Interest in new techniques (better automatic reasoning, statistical analysis)

