Optimising CERN systems through ML & DA using controls data

Presentation transcript:

Optimising CERN systems through ML & DA using controls data
BE ML and DA Forum Workshop, CERN, 28th May 2019
Filippo Tilaro, M. Bengulescu, Fernando Varela (BE-ICS), Manuel Gonzalez (BE-BI), in collaboration with Siemens AG CT (Munich, St. Petersburg, Brasov)
This presentation introduces the work carried out by BE-ICS in the domain of Data Analytics, in the context of the openlab project and together with different groups at CERN.

CERN Industrial Control Systems

BE-ICS interest in Big Data
Exploit the Big Data volume produced by our systems to:
Extend the monitoring capabilities of the control systems, by detecting symptomatic effects in the data which do not trigger alarms.
Reduce operational and maintenance costs.
Increase availability, stability and performance of the processes.
Detect anomalous behaviour and over-usage of sensors, actuators and industrial devices.
Enable predictive maintenance: anticipate when sensors and actuators have to be replaced.
Render Industrial Control Systems smarter: guide engineers and operators in taking corrective actions.
The work of BE-ICS in Big Data (Control System Data Analytics) spans well beyond the Department boundaries.

BE-ICS Collaborations
CERN has collaborated with Siemens since 2011 in areas of common interest:
Evolution of SCADA systems: 1 Fellow entirely paid by Siemens.
Big Data: 1 LD post, which will now be temporarily converted into 2 Fellow positions.
Access to a network of researchers at Siemens specialized in ML techniques.
Collaboration with the University of Valladolid, Spain, on PID performance monitoring.
Potential collaboration with the University of Marburg, Germany, on Distributed Complex Event Processing.

Selected work done so far
Some of the work done so far in the area of Big Data:
Identification of CERN use-cases (>40) for Big Data: offline; stream-data; model trained on archived data but working on stream-data.
Implementation of algorithms to tackle some selected cases: anomaly detection based on ML; process optimization based on ML; Distributed Complex Event Processing; root-cause analysis.
Joint development of a platform for Distributed Complex Event Processing.
Evaluation of Siemens solutions: Mindsphere, IoT and edge devices, …
Need for Edge Computing.

Our vision
Combining cloud and edge computing into a single analytical framework:
Stream analysis.
Central rules deployment.
Distributed computational load across multiple nodes.
Support for multiple data ingestion protocols.
Users: advanced users work with Jupyter & Python; non-expert users get simple web UIs to select the type of analysis, the time-windows and the sets of signals for the analysis.
[Figure: architecture diagram. Labels: Cloud computing, UI, Rules, Analytics master, Broker, VM, Driver, WinCC OA Archiving, NXCALS, Cloud & Edge Link, Edge computing, Broker, Analytics worker, Middleware, Fieldbus]

Identified Use-Cases: Partial list
Use-cases are classified per control system as Online Monitoring, Faults diagnosis or Engineering design.
Cryogenics: detection of valve oscillation; anomaly detection for sensors and actuators; compensation of heat excess in LHC magnets due to e- cloud; vibration analysis for cryo compressors; PID performance evaluation.
Cooling & Ventilation: detection of tank leaks; detection of PLC anomalies; assessment of dynamic vs fixed thresholds.
Machine protection: LHC circuit monitoring; identification of causes for QPS data loss; data loss detection.
CERN Power Grid: forecast of the control system behaviour; electrical power quality of service; analysis of electrical power cuts; recommendation system for WinCC OA users.
Vacuum: vacuum leaks; understanding the degradation of the vacuum system due to leaks; anomalies in process regulation.
LHC Experiments (Gas Control Systems): alarms flood management; root-cause fault analysis in Gas Control Systems; analysis of OPC-CAN middleware.

UC1: Oscillation analysis for cryogenics valves
Goal: detect anomalous oscillation of valves. Lifespan of valves is measured in km!
Impact on: process system stability and safety; communication load; maintenance (overuse of valves); performance (physics time).
Why data analytics? A general algorithm can detect different oscillations, and several thousands of signals have to be monitored (not manually!).
Scale of the cryogenics control system: over 34000 physical instruments and channels (12136 AI, 4856 AO, 4536 DI, 1568 DO, 8000 spare and virtual channels); ~4000 analog control loops; more than 120 PLCs (Siemens S7-416-2DP); 30000 conceptual objects/parameters.
[Figure: examples of abnormal behaviour of valves]

UC1: Oscillation detection to minimize operational and maintenance cost
Results: ~10% of the CRYO valves showed abnormal oscillations; multiple anomalies per valve, up to 20 hours/month; wide range of anomalous periods, from 2 hours down to seconds.
Actions: improve the tuning of control loops to deal with external disturbances and unexpected interactions.
Achievements: reduced maintenance cost by extending the valves' operational life; more stable system.
Status: in production, ~5000 signals analysed every 24h.
[Figure: continuous anomaly detection analysis. Labels: CALS, Hadoop cluster, Web report, System expert]
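One simple way to flag an oscillating valve from its archived position signal is to count direction reversals that exceed a deadband. The sketch below is illustrative only: the slides do not describe the algorithm used in production, and the function names, the 0.5 % deadband and the reversal limit are assumptions.

```python
import math

def count_reversals(positions, deadband=0.5):
    """Count direction changes, ignoring moves smaller than the deadband (% of stroke)."""
    reversals = 0
    direction = 0            # +1 rising, -1 falling, 0 not yet known
    anchor = positions[0]    # last position that confirmed a significant move
    for p in positions[1:]:
        delta = p - anchor
        if abs(delta) < deadband:
            continue         # too small to count: treat as noise
        new_direction = 1 if delta > 0 else -1
        if direction != 0 and new_direction != direction:
            reversals += 1
        direction = new_direction
        anchor = p
    return reversals

def total_travel(positions):
    """Accumulated stem travel over the window, in the same unit as the positions."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))

def is_oscillating(positions, max_reversals=20, deadband=0.5):
    """Hypothetical decision rule: too many significant reversals in one window."""
    return count_reversals(positions, deadband) > max_reversals

# Synthetic demo signals (not CERN data): 5000 samples, 50 periods.
oscillating = [50 + 10 * math.sin(2 * math.pi * 50 * i / 5000) for i in range(5000)]
steady = [50 + 0.1 * math.sin(2 * math.pi * 50 * i / 5000) for i in range(5000)]
print(is_oscillating(oscillating), is_oscillating(steady))
```

The accumulated travel is included because the slides relate valve lifespan to distance travelled; a real analysis would also need to cover the wide range of oscillation periods reported above.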

UC2: Anomaly detection in CRYO signals
Different anomalies are present that are not detected by the control systems!
Possible causes: hardware failures/degradations; wrong tuning/structure; false measurements…
Impact: process stability and safety; maintenance (overuse of valves); performance and downtime. Direct impact on the operational cost!!!
Why data analytics? The calculations are too complex to embed into the control systems; the groups of signals with similar behaviour can be learnt from historical data.
Example: large excursions of 17L2 and 19L2 to compensate the temperature increase, confirming an additional deposit of heat load. Beam dump!
[Figure: valves CV910 positions in L2 (26th June 2017)]

UC2: Anomaly detection in CRYO signals
Signal correlation and K-NN in action!
Multi-purpose algorithms: avoid lots of specialized analyses that are difficult to maintain; ~5000 signals analysed continuously; able to detect faults not foreseen by experts.
[Figure: detection examples. Panel titles: Flipping fault detection, Signal offset detection, Oscillation detection, Faulty amplitude detection]
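To illustrate how a K-NN approach can flag faults that were never explicitly modelled, the sketch below scores a new feature vector by its mean distance to the k nearest vectors seen during normal operation. It is a minimal example under stated assumptions: the features, the value of k, the margin and all function names are hypothetical, and the slides do not say how K-NN is actually applied.

```python
import math

def knn_score(sample, history, k=3):
    """Mean Euclidean distance from sample to its k nearest historical vectors."""
    distances = sorted(math.dist(sample, h) for h in history)
    return sum(distances[:k]) / k

def fit_threshold(history, k=3, margin=1.5):
    """Leave-one-out scores of the normal data, scaled by a safety margin (assumed)."""
    scores = []
    for i, h in enumerate(history):
        others = history[:i] + history[i + 1:]
        scores.append(knn_score(h, others, k))
    return margin * max(scores)

def is_anomalous(sample, history, threshold, k=3):
    return knn_score(sample, history, k) > threshold

# Synthetic "normal" feature vectors, e.g. (window mean, window spread); not CERN data.
normal_history = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
threshold = fit_threshold(normal_history)
print(is_anomalous((0.2, 0.25), normal_history, threshold))
print(is_anomalous((2.0, 2.0), normal_history, threshold))
```

Because the score only measures distance from known-normal behaviour, the same code reacts to offsets, amplitude changes or oscillations without a dedicated detector for each, which is the point made on the slide about multi-purpose algorithms.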

UC2: Anomaly detection in CRYO signals
Model building:
Learn the groups of sensors/actuators which behave similarly (physical and logical relations).
Exploit historical data (~4GB/day for Cryo).
Combine Machine Learning techniques with experts' knowledge.
Build a model to detect abnormal system behaviours.
Challenges:
Model not specific to a domain/system.
Different types of anomalies, duration and noise.
No precise boundaries between normal and anomalous.
Mostly unsupervised training: no database of faults!
Dynamic system => dynamic model.
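The first step listed above, learning which signals behave similarly, can be sketched with a plain Pearson correlation and a greedy grouping pass. This is an assumption about one possible realisation, not the method used by BE-ICS; the signal names, the 0.9 threshold and the greedy strategy are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0           # a constant signal correlates with nothing
    return cov / math.sqrt(var_a * var_b)

def group_signals(signals, min_corr=0.9):
    """Greedy grouping: a signal joins the first group whose first member it tracks."""
    groups = []
    for name, values in signals.items():
        for group in groups:
            reference = signals[group[0]]
            if abs(pearson(values, reference)) >= min_corr:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

# Synthetic signals with made-up names; not CERN data.
t = [0.1 * i for i in range(200)]
signals = {
    "VALVE_A": [math.sin(x) for x in t],
    "VALVE_B": [2 * math.sin(x) + 1 for x in t],   # same shape, different scale/offset
    "TEMP_1": [math.cos(x) for x in t],            # different behaviour
}
print(group_signals(signals))
```

Once groups exist, a member that stops correlating with its group is a candidate anomaly; since the slides stress that the system is dynamic, the grouping would have to be re-learnt periodically.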

UC4: Root-cause analysis in Experiments Gas Control Systems
28 gas systems deployed around the LHC.
4 data servers, 51 PLCs (29 for process control, 22 for flow-cells handling).
~18 000 physical sensors / actuators.
Essential for particle detection: reliability and stability are critical, since any variation in the gas composition can affect the accuracy of the acquired data.
[Figure labels: 9 Apps, 6 Apps, 7 Apps, 6 Apps]

UC4: Root-cause analysis in Experiments Gas Control Systems
Diagnosing an alarm flood (domino effect):
Diagnosing a fault is complex: it may take weeks!
Alarms flood: a single fault can generate up to thousands of events.
The 1st alarm is not necessarily the most relevant for the diagnosis.
The same fault generates different event sequences depending on the system status.
A single fault can stop the whole control process.
Example of misleading feedback: a fault in the distribution system floods the operators with alarms; the actual problem is in the distribution and not in the pump.

UC4: Root-cause analysis in Experiments Gas Control Systems
Event stream analysis: Data, Analyze, Learn, Diagnose.
Identify and detect fault / abnormal patterns for diagnosis and prognostics.
Provide experts with root-cause and gap analysis using rules and patterns mining.
Forecasts, trends and early warnings to increase operating hours.
Achievements: identification of the root of the problem; the algorithm learns patterns and uses them to forecast possible faults; early warning to operators to intervene.
[Figure: event lists generated by the same fault, with the recurring alarm pattern highlighted]
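The idea that different event lists produced by the same fault share a recurring alarm pattern can be illustrated by mining contiguous event n-grams that appear in every list. This is a deliberately simple sketch: the actual pattern-mining algorithm is not given on the slides, and the event codes, the n-gram length and the function names are invented for the example.

```python
from collections import Counter

def ngrams(events, n):
    """Set of contiguous event sub-sequences of length n."""
    return {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}

def common_patterns(event_lists, n=3, min_support=1.0):
    """Patterns present in at least min_support (fraction) of the event lists."""
    counts = Counter()
    for events in event_lists:
        counts.update(ngrams(events, n))     # a set, so each list counts once
    needed = min_support * len(event_lists)
    return sorted(p for p, c in counts.items() if c >= needed)

# Three hypothetical alarm sequences recorded for the same underlying fault.
event_lists = [
    ["X", "T", "C", "D", "F", "A", "A"],
    ["E", "D", "F", "A", "A", "B"],
    ["K", "D", "F", "A", "A", "B"],
]
print(common_patterns(event_lists))
```

Lowering `min_support` tolerates faults whose event sequence varies with the system status; a production system would more likely need gapped or time-windowed patterns, since unrelated alarms are interleaved during a flood.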

UC5: Evaluation of PID performance
Assist system engineering. BE-ICS in collaboration with the University of Valladolid (not an openlab activity with Siemens).
Based on: “Performance monitoring of industrial controllers based on the predictability of controller behaviour”, R. Ghraizi, E. Martinez, C. de Prada.
Impact on the regulation of the entire control systems.
Too many PIDs to check manually!
A general method to assess different PID structures.
Many sources of faults/malfunctions: system status dependency; external disturbances/factors; bad tuning, wrong controller type or structure; slow degradation.
[Figure: control loop block diagram with Controller and Process. Signal labels: SP, MV, CV, u, w, y, v]

UC5: Evaluation of PID performance
PID anomaly detection: assess each PID model based on the historical data.
Simple performance index.
Efficiency of the control process: time, actions taken and energy consumed to reach steady points.
Stability of the controlled variable.
Improvement of ~10% of the analysed control loops.
[Figure: examples of bad and good loop behaviour]
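As an illustration of what a simple performance index can look like, the sketch below computes two textbook measures from archived setpoint and measurement data: the integral of the absolute error and the settling time. These are stand-ins chosen for brevity, not the predictability-based index of Ghraizi et al. used in the collaboration; the function names and the 2 % band are assumptions.

```python
import math

def integral_absolute_error(setpoint, measured, dt=1.0):
    """IAE: accumulated |setpoint - measured| over the window."""
    return sum(abs(sp - y) for sp, y in zip(setpoint, measured)) * dt

def settling_time(setpoint, measured, dt=1.0, band=0.02):
    """Time after which the measurement stays within +/- band of the final setpoint."""
    target = setpoint[-1]
    tolerance = band * abs(target) if target != 0 else band
    last_outside = -1
    for i, y in enumerate(measured):
        if abs(y - target) > tolerance:
            last_outside = i
    return (last_outside + 1) * dt

# Synthetic step responses (not CERN data): a well-damped and an oscillatory loop.
setpoint = [1.0] * 200
good_loop = [1 - math.exp(-t / 5) for t in range(200)]
bad_loop = [1 - math.exp(-t / 50) * math.cos(t) for t in range(200)]

for name, response in (("good", good_loop), ("bad", bad_loop)):
    print(name, integral_absolute_error(setpoint, response), settling_time(setpoint, response))
```

Indices of this kind can be computed for every loop from historical data and ranked, so that engineers only inspect the worst performers instead of checking each PID manually.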

UC7: LHC circuit monitoring
Condition monitoring analysis (in collaboration with TE-MPE).
Main goal: evaluation of the health of the superconducting circuits; degradation after 20 years of operation.
Monitoring conditions: anomalous changes of current flows, impedance, circuit functioning…
What to monitor? Electrical circuits: magnets, power converters, switches…
Control system: 16 WinCC OA servers; 44 industrial FECs; 2800 radiation-hard devices; ~500M signals; readout from 10 kHz to 1 Hz.

UC7: LHC circuit monitoring
Distributed complex event processing.
The current flow of analysis is inefficient: manual data extraction, transformation and load; many independent scripts; time consuming.
New expert system as a common framework:
Translate experts' knowledge into formulation sets / rules.
Central knowledge database.
Rule templates to be reused, parametrized and validated.
Domain-specific language for simple formulation: time reasoning and temporal expressions; mathematical and logical functions.
Rule definition: Truth(sma(I_Meas, 1m30s) > I_Threshold): duration(>=1h)
Status: under development [lab testing]!
[Figure labels: Rules, List of similar assets]
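The example rule can be read as: the simple moving average of I_Meas over 1 min 30 s must stay above I_Threshold for at least 1 hour. A minimal Python sketch of that reading follows; it is not the rule engine under development, and the fixed 10 s sampling period, the function names and the handling of the first incomplete window are assumptions.

```python
from collections import deque

def sma(values, window):
    """Trailing simple moving average; early outputs average the samples available so far."""
    out, buffer, total = [], deque(), 0.0
    for v in values:
        buffer.append(v)
        total += v
        if len(buffer) > window:
            total -= buffer.popleft()
        out.append(total / len(buffer))
    return out

def rule_fires(i_meas, i_threshold, sample_period_s=10,
               sma_window_s=90, min_duration_s=3600):
    """True if sma(I_Meas) > I_Threshold holds for min_duration_s without interruption."""
    window = max(1, sma_window_s // sample_period_s)
    needed = min_duration_s // sample_period_s      # consecutive samples, approximate
    run = 0
    for average in sma(i_meas, window):
        run = run + 1 if average > i_threshold else 0
        if run >= needed:
            return True
    return False

# Synthetic current readings sampled every 10 s (not CERN data).
long_excursion = [100.0] * 100 + [120.0] * 400              # about 4000 s above threshold
short_excursion = [100.0] * 100 + [120.0] * 200 + [100.0] * 100
print(rule_fires(long_excursion, 110.0), rule_fires(short_excursion, 110.0))
```

Keeping the window and duration as parameters mirrors the slide's goal of rule templates that can be reused and parametrized across lists of similar assets.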

UC3: Feed analytical results into the control system
Present the results of the analysis to the operators so that they can take the proper actions!
Status: working prototype.

Fault detection applied to industrial processes
Collaboration with the University of Valladolid, Spain.
Application of PCA (Principal Component Analysis) to detect faults or degradation as early as possible. Early detection allows either preventive maintenance or an optimal corrective action by the operators, and so increases the uptime of an industrial plant.
PCA: an unsupervised, non-parametric statistical technique used in machine learning to reduce the dimensionality of a dataset to its most relevant components.
Applications: CV, CRYO.
Contact: Enrique Blanco
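A common way to use PCA for fault detection is to fit the principal directions on data from normal operation and then monitor the squared prediction error (SPE), i.e. how far a new sample lies from the learnt subspace. The sketch below keeps a single component and uses power iteration to stay dependency-free. It is an assumption about how such a detector might look, not the Valladolid implementation; all names and the synthetic data are hypothetical.

```python
import math

def fit_first_component(rows, iterations=200):
    """Return (mean, unit vector of the first principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def squared_prediction_error(row, mean, direction):
    """Squared distance between a sample and its projection on the principal direction."""
    centered = [x - m for x, m in zip(row, mean)]
    score = sum(c * v for c, v in zip(centered, direction))
    return sum((c - score * v) ** 2 for c, v in zip(centered, direction))

# Synthetic normal operation: three sensors driven by one variable t, plus small noise.
normal_rows = []
for i in range(50):
    t = 0.2 * i
    noise_1 = 0.01 * ((i * 7) % 5 - 2)
    noise_2 = 0.01 * ((i * 3) % 5 - 2)
    normal_rows.append((t, 2 * t + noise_1, -t + noise_2))

mean, direction = fit_first_component(normal_rows)
print(squared_prediction_error((5.0, 10.0, -5.0), mean, direction))   # consistent sample
print(squared_prediction_error((5.0, 10.0, 5.0), mean, direction))    # third sensor off
```

In practice more components would be retained and the alarm limit on the SPE would be derived from the normal data; a library such as NumPy would replace the hand-written linear algebra.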

Pattern-based KPI discovery via CEP
Collaboration: Marburg University, Germany.
Extract and identify relevant KPIs from alarms and data via online exploration, thanks to the pre-emptive data indexing and CEP pattern matching techniques being researched at Marburg University (ChronicleDB).
Compare online query performance between Spark + HBase / Kudu / Impala and Marburg's ChronicleDB, and apply online KPI extraction techniques for predictive improvements of alarms.
Applications: EL distribution, Access Control.
Contact: Matthias Braegger, Brice Copy

Next use-cases
LINAC3 ion beam source optimization (in collaboration with BE-ABP):
Find the optimal settings of the control inputs to optimize the ion current in the beam transformer of LINAC3 and to minimize the variance of the ion current.
Learn from ~10 years of LINAC3 operation data.
Assist the operators in choosing the best settings for operation.
Vacuum leak detection (in collaboration with TE-VSC):
Critical for the proper operation of the accelerators and the LHC machine.
Initially for the SPS, then for all the other vacuum systems.
Historical analysis of pressure sensors (Pirani and Penning gauges) combined with beam energy, beam mode and temperature sensors.
Inform operators so that they can take the proper actions.

Conclusions
Data Analytics already brings important added value today for understanding the behaviour of complex systems and optimizing them.
Big impact on operation, and on running and maintenance costs.
BE-ICS has been working on Data Analytics with Siemens for the last 5 years (openlab collaboration).
Growing community of users in different Groups and Departments.
Very distinct use-cases, not only related to controls.
General approach for multi-domain applications: reusability of the developed analyses.

Use-cases: a partial list
Online monitoring: control system health; electrical power quality of service; looking for heat in superconducting magnets; oscillation in cryogenics valves; discharge of superconducting magnets heaters; trending and forecast of the control process behaviour; vacuum leak detection.
Faults diagnosis: anomalies in the process regulation; PLC anomalies; data loss detection; root-cause analysis for complex WinCC OA installations; analysis of sensors functioning and data quality; analysis of OPC-CAN middleware; analysis of electrical power cuts; cryogenic system breakdowns.
Engineering design: electrical consumption forecast; efficiency of the electric network; predictive maintenance of control system elements; LINAC3 ion beam source optimization; vibration analysis; efficiency of the control process; …
Thank you! CERN BE-ICS https://be-dep-ics.web.cern.ch/

UC1: Compensation of the e- cloud thermal effect
In collaboration with TE-CRG (Benjamin Bradu).
LHC vacuum chamber: cold bore at 1.9 K; beam screen (5-20 K) to intercept the heat load.
Interference with the Cryo control system.
Thermal resistance, R = 6 K/W. Thermal capacitance, C = 1200 J/K.
Main issue: temperature increase close to the quench level trigger!
[Figure labels: Ideal measurement cycle, Heat load of the screen]
Electrons released into the vacuum chamber, amplified via secondary emission from the chamber wall through a beam-induced multipacting process, give rise to an electron cloud. The incident cloud electrons heat the beam screen, for which only a limited cooling capacity is available. If the beam-screen heat load exceeds the available cooling, the cold superconducting magnets of the LHC arcs, surrounding the beam pipe, will quench; i.e., they lose their superconducting state. The electron cloud may thereby limit the maximum permissible beam current of the LHC. The expected heat load depends not only on the beam current, but also on the bunch spacing, the bunch intensity and the time-dependent surface properties of the beam screen. The principal sources of primary electrons are the ionization of the residual gas at injection energy and photoemission at higher energies.
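The thermal resistance and capacitance quoted on this slide suggest a lumped thermal description of the beam screen. Assuming a first-order model (an assumption, since the slide does not state the model), with T_0 a reference coolant temperature and Q(t) the deposited heat load, both symbols introduced here for illustration, and reading the quoted resistance as 6 K/W, the dynamics and time constant would be:

```latex
% Assumed first-order lumped thermal model of the beam screen
C \,\frac{dT}{dt} = Q(t) - \frac{T(t) - T_0}{R},
\qquad
\tau = R\,C = 6\,\mathrm{K/W} \times 1200\,\mathrm{J/K} = 7200\,\mathrm{s} = 2\,\mathrm{h}

% Steady-state temperature rise for a constant heat load Q
\Delta T_{\infty} = T_{\infty} - T_0 = R\,Q
```

Under this reading, every additional watt of heat load raises the steady-state temperature by 6 K, and the response to a change in heat load takes hours, which is consistent with the concern about slowly approaching the quench level trigger.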

UC1: Compensation of e- cloud thermal effect
Feed-forward loops compensate the electron cloud heat load.
Currently used in production for Cryo: keeps the temperature away from the quench level trigger.
Data analytics techniques reduce the computing time from weeks to hours!
Cloud computing is used to parallelize and distribute the computation.
Notation: Qdbs = heat load on the beam screen; Qsr = synchrotron radiation; Qic = image current; Qec = electron cloud.
[Figure: compensation due to feed-forward loops]
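The four heat-load symbols defined on this slide are presumably related additively. The slide does not state the relation, so the balance below is an assumption; it shows how the electron cloud contribution could be isolated once the other terms are known.

```latex
% Assumed additive heat-load balance on the beam screen
Q_{dbs} = Q_{sr} + Q_{ic} + Q_{ec}
\quad\Longrightarrow\quad
Q_{ec} = Q_{dbs} - Q_{sr} - Q_{ic}
```

A feed-forward loop can then act on an estimate of the total heat load before the temperature deviates, instead of waiting for the feedback controller to react to the temperature rise.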

UC6: Leak detection in Cooling and Ventilation systems
Problem: alarm thresholds are set manually; filling conditions change.
Approach: anomaly detection based on historical data.
Detection of “large” leaks: anomalous valve opening time.
Detection of “small” leaks: anomalous frequency of valve openings.
Achievements: identification of anomalous behaviours; improved threshold setting.
[Figure: distribution of valve openings, FSED_001_VMA400]
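Replacing manually set thresholds with limits learnt from history can be sketched as follows: take a high percentile of past filling-valve opening durations and of daily opening counts, then compare new observations against them. This is an illustrative sketch only; the 99th percentile, the nearest-rank method, the labels and the function names are assumptions, and the data are synthetic.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile, with q an integer between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def learn_thresholds(durations_s, openings_per_day, q=99):
    """Limits derived from historical behaviour instead of manual settings."""
    return percentile(durations_s, q), percentile(openings_per_day, q)

def classify(duration_s, openings_today, thresholds):
    max_duration, max_openings = thresholds
    if duration_s > max_duration:
        return "large leak suspected"     # valve stays open unusually long
    if openings_today > max_openings:
        return "small leak suspected"     # valve opens unusually often
    return "normal"

# Synthetic history for one filling valve (not CERN data).
historical_durations_s = list(range(30, 130))
historical_openings_per_day = [3, 4, 5, 4, 3, 5, 4, 4, 3, 5]
thresholds = learn_thresholds(historical_durations_s, historical_openings_per_day)

print(classify(600, 4, thresholds))
print(classify(60, 12, thresholds))
print(classify(60, 4, thresholds))
```

Because filling conditions change, the thresholds would need to be re-learnt on a sliding window of recent history rather than computed once.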