Data quality, or how to keep afloat in the growing data flood

Presentation transcript:

Data quality, or how to keep afloat in the growing data flood. P. Sollander, CERN. 16/4/2013, ARW2013.

Outline: Control system architecture and data flows/floods. Data quality problems and consequences: false negatives, unknown state, false positives. System and software strategies. Processes and procedures. Summary.

Control system architecture [diagram labels: DAQ (100+), middleware (~1M / day), application server, logging]. What could possibly go wrong?

Control system architecture [same diagram labels: middleware (~1M / day), logging, 100+; several elements marked with ✗]. What could possibly go wrong???

False negatives: No alarm ≠ no problem. 11/1/11: big power cut at LHC. No network → no alarms → no problem? Broken PLC-SCADA connection: monitoring OK → operator confident → hours spent looking elsewhere. April 3 2013: inundation alarm on LHC P5. Pumps stopped, but no alarm. The PLC to SCADA connection was not monitored. Must be minimized, zero is impossible?
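The lesson of the unmonitored PLC to SCADA connection can be sketched as a staleness watchdog: a source that has gone silent is reported as "unknown" instead of "no alarm". This is only an illustration, not the CERN implementation; the class name, the heartbeat method and the 30 s timeout are assumptions.

```python
import time


class ConnectionWatchdog:
    """Tracks when each source last reported (hypothetical helper)."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed limit, not taken from the talk
        self._clock = clock
        self._last_seen = {}

    def heartbeat(self, source):
        """Record that `source` has just been heard from."""
        self._last_seen[source] = self._clock()

    def state(self, source, alarm_active):
        """Return 'ALARM', 'OK' or 'UNKNOWN' for a source."""
        last = self._last_seen.get(source)
        if last is None or self._clock() - last > self.timeout_s:
            return "UNKNOWN"  # silence is not evidence of health
        return "ALARM" if alarm_active else "OK"
```

The clock is injectable so the timeout can be exercised without waiting; a display would show the "UNKNOWN" state differently from a plain "OK".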

Monitoring the system: each data tag carries a value, a timestamp and a quality. [Diagram labels: middleware (~1M / day), 100+.] What could possibly go wrong?
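The data tag named on this slide (value, timestamp, quality) might be modelled roughly as follows. This is a sketch under assumptions: the `name` field, the two-level `Quality` enum and the `is_trustworthy` helper are illustrative and do not come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Quality(Enum):
    OK = "OK"
    NOK = "NOK"


@dataclass(frozen=True)
class DataTag:
    name: str            # hypothetical identifier, not from the slides
    value: object
    timestamp: datetime
    quality: Quality

    def is_trustworthy(self) -> bool:
        """True only when the source flagged the reading as good."""
        return self.quality is Quality.OK


tag = DataTag("valve.state", "closed",
              datetime(2013, 4, 16, tzinfo=timezone.utc), Quality.NOK)
print(tag.is_trustworthy())
```

Carrying the quality alongside the value lets every consumer decide for itself how to treat doubtful data.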

Indicating quality on alarms: Active alarms get a [?] prefix. New alarm on faulty controls component. Help Alarm.
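The [?] prefix could be applied with a helper like the one below. The function name and its arguments are assumptions made for illustration; the slide only states that active alarms receive the prefix.

```python
def alarm_label(text, active, quality_ok):
    """Prefix an active alarm with '[?]' when its source data is doubtful."""
    if active and not quality_ok:
        return "[?] " + text
    return text


print(alarm_label("Pump stopped", active=True, quality_ok=False))
```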

Indicating quality on synoptics

Indicating quality on applications

Acting on bad quality data: Indicate to operator. What about other applications using the data? Software interlocks, for example?

Panicky software interlocks [diagram labels: data tag (Value: closed, Timestamp, Quality: OK); data tag (Value: closed?, Timestamp, Quality: NOK); Software Interlock System; LHC beam dump; reboot of an element; middleware (~1M / day); 100+]. Software Interlock System tolerance for doubtful data: reduce false positives by waiting a reasonable amount of time before taking action.
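The tolerance described above can be sketched as a grace period: good-quality data is acted on at once, while doubtful data triggers only if the doubt persists. This is not the actual Software Interlock System logic; the class name, the 10 s grace period and the boolean inputs are assumptions.

```python
class TolerantInterlock:
    """Hypothetical interlock that tolerates briefly doubtful data."""

    def __init__(self, grace_s=10.0):
        self.grace_s = grace_s      # assumed "reasonable amount of time"
        self._doubt_since = None    # when quality first went NOK

    def evaluate(self, value_ok, quality_ok, now):
        """Return True if the interlock should trigger at time `now`."""
        if quality_ok:
            self._doubt_since = None
            return not value_ok     # good data: act on the value at once
        if self._doubt_since is None:
            self._doubt_since = now
        # doubtful data: act only once the doubt has lasted long enough
        return now - self._doubt_since >= self.grace_s
```

A short reboot of an element then produces a few NOK readings that are ignored, while a lasting loss of quality still ends in action.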

False positives: false alarms. Only 1% of technical infrastructure alarms are real! Easy to miss an important one. 24/1/2007: constant false alarms mask one real alarm → 400 kV breaker trips, 7 hours to switch everything back.

Software strategies: Software Interlock System tolerance for doubtful data: reduce false positives by waiting a reasonable amount of time before taking action. Add indications of bad quality: [?] and color.

Operator strategies: Wait to see if the alarm stays? Check the trend: a poor reading gives a brief 0 reading. Diagnose with good tools. Worth investing in good tools: 1% real alarms for CERN's technical infrastructure.
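Checking the trend for brief 0 readings can be partly automated. The sketch below flags short runs of zeros that have non-zero readings on both sides; the function name and the `max_run` threshold are assumptions, and a real tool would also need to consider timestamps.

```python
def brief_zero_dropouts(readings, max_run=2):
    """Return indices of short zero runs bounded by non-zero readings."""
    suspects = []
    i = 0
    n = len(readings)
    while i < n:
        if readings[i] != 0:
            i += 1
            continue
        start = i
        while i < n and readings[i] == 0:
            i += 1
        run = i - start
        bounded = start > 0 and i < n  # non-zero neighbours on both sides
        if run <= max_run and bounded:
            suspects.extend(range(start, i))
    return suspects


print(brief_zero_dropouts([5.1, 5.0, 0, 5.2, 5.1]))
```

Longer runs of zeros, or zeros at the edge of the window, are left alone because they may reflect a real stop.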

Processes to improve quality: Alarm and data configuration process: every alarm checked by operation; long and tedious; cannot work without it. Test procedures. Correction procedures. Operating instructions. HelpAlarm. Diagnostic tools in system.

Data integration process: create request (equipment group); system check (computerized); data check (operators); data validation; tests (equipment group and operators).

Summary: CERN technical infrastructure system is huge, a million alarms per year! Control system is event based. False negatives: reduced by thorough monitoring of the system itself and by diagnostic tools. False positives: reduced mainly by procedure: strict integration rules, testing, correction, etc.
