Reliability Analysis of the LHC Beam Dumping System Taking Into Account the Operational Experience during LHC Run 1 Roberto Filippini CERN ATS Seminar,

Slides:



Advertisements
Similar presentations
EECE499 Computers and Nuclear Energy Electrical and Computer Eng Howard University Dr. Charles Kim Fall 2013 Webpage:
Advertisements

Etienne CARLIER AB/BT/EC
PURPOSE OF DFMEA (DESIGN FAILURE MODE EFFECTS ANALYSIS)
Chapter 9 Audit Sampling: An Application to Substantive Tests of Account Balances McGraw-Hill/Irwin ©2008 The McGraw-Hill Companies, All Rights Reserved.
1 MKBH Erratic Jan Uythoven On behalf of the ABT team (Viliam, Nicolas M., Etienne, Laurent, Francesco etc…..) MPP meeting 8/5/2015.
Total Quality Management BUS 3 – 142 Statistics for Variables Week of Mar 14, 2011.
Failure mode impact studies and LV system commissioning tests
Beam Dumping System – Failure Scenarios Brennan Goddard, CERN AB/BT How the dump system can fail Catalogue of primary failures Failure classes and protection.
Technical review on UPS power distribution of the LHC Beam Dumping System (LBDS) Anastasia PATSOULI TE-ABT-EC Proposals for LBDS Powering Improvement 1.
LBDS Kicker Electronic and Slow Control Etienne CARLIER AB/BT/EC.
LHC Beam Dump System Technical Audit Trigger Synchronisation Unit.
BIW May 2004 LHCSILSystemsBLMSSoftwareResults Reliability of BLMS for the LHC. G.Guaglio, B Dehning, C. Santoni 1/15 Reliability of Beam Loss Monitors.
An Example Use Case Scenario
1 LBDS Testing Before Operation Jan Uythoven (AB/BT) Based on the work of many people in the KSL, EC and TL sections.
SPS policy – Information Presentation Presentation to ROS June 16, 2004.
Introduction to availability modelling in ELMAS Arto Niemi.
J1879 Robustness Validation Hand Book A Joint SAE, ZVEI, JSAE, AEC Automotive Electronics Robustness Validation Plan The current qualification and verification.
LBDS overview on system analysis and design upgrades during LS1 Roberto Filippini, Etienne Carlier, Nicolas Magnin, Jan Uythoven CERN Workshop Machine.
MKI Erratics: Hardware and Electronics Related Aspects M.J. Barnes Acknowledgements: A.Antoine, R.A. Barlow, P. Burkel, E. Carlier, N. Magnin, V. Mertens.
LHC Collimation Project Integration into the control system Michel Jonker External Review of the LHC Collimation Project 1 July 2004.
Etienne CARLIER, LBDS Audit, 28/01/2008 LBDS Environmental Aspects EMC, radiation, UPS… Etienne CARLIER AB/BT/EC.
PLC Workshop at ITER, 4-5 th of December 2014 A. Nordt, ESS, Lund/Sweden.
Idaho RISE System Reliability and Designing to Reduce Failure ENGR Sept 2005.
Nov 28, 2013 Power Converters Availability for post-LS1 LHC TE-EPC-CCE.
1 Beam Dumping System MPP review 12/06/2015 Jan Uythoven for the ABT team.
The LBDS trigger and re-trigger schemes Technical Review on UPS power distribution of the LHC Beam Dumping System (LBDS) A. Antoine.
1 Reliability and Availability of the Large Hadron Collider (LHC) MachineProtection System Jan Uythoven CERN, Geneva, Switzerland Thanks to R. Schmidt,
1 Will We Ever Get The Green Light For Beam Operation? J. Uythoven & R. Filippini For the Reliability Working Group Sub Working Group of the MPWG.
1 CC & MP - CC10 - CERN Crab LHC J. Wenninger CERN Beams Department for the LHC Machine Protection Panel.
The HiLumi LHC Design Study (a sub-system of HL-LHC) is co-funded by the European Commission within the Framework Programme 7 Capacities Specific Programme,
BEAM INSTRUMENTATION GROUP DEPENDABILITY APPROACH CERN, Chamonix 26th January 2016 William Viganò
LHC machine protection close-out 1 Close-out. LHC machine protection close-out 2 Introduction The problem is obvious: –Magnetic field increase only a.
Organization and Implementation of a National Regulatory Program for the Control of Radiation Sources Need for a Regulatory program.
Step 1: Specify a null hypothesis
RF acceleration and transverse damper systems
Outcome of BI.DIS Fast Interlocks Peer Review
Data providers Volume & Type of Analysis Kickers
The TV Beam Observation system - BTV
LBDS TSU & AS-I failure report (Sept. 2016)
2007 IEEE Nuclear Science Symposium (NSS)
Chapter 9 Audit Sampling: An Application to Substantive Tests of Account Balances McGraw-Hill/Irwin ©2008 The McGraw-Hill Companies, All Rights Reserved.
Performance monitoring framework for the technical infrastructure
S. Roesler (on behalf of DGS-RP)
Hardware & Software Reliability
Dependability Requirements of the LBDS and their Design Implications
The LHC Beam Dumping System
RELIABILITY OF 600 A ENERGY EXTRACTION SYSTEMS
Reliability targets in functional specifications
LHC Beam Dumping System Reliability Run Summary
MKD/MKB Review Meeting Scope and Definition
Software Reliability PPT BY:Dr. R. Mall 7/5/2018.
FMEA of a CLIQ-based protection of D1
LHC Risk Review: Kicker Magnet Reliability
FCC hh dump kicker - switch topologies and impact on beam
Alabede, Collura, Walden, Zimmerman
trigger and other general LBDS electronics
Collimator Control (SEUs & R2E Outlook)
J1879 Robustness Validation Hand Book A Joint SAE, ZVEI, JSAE, AEC Automotive Electronics Robustness Validation Plan Robustness Diagram Trends and Challenges.
LHCCWG Meeting R. Alemany, M. Lamont, S. Page
Techniques for Data Analysis Event Study
BEAM LOSS MONITORS DEPENDABILITY
YOU HAVE REACHED THE FINAL OBJECTIVE OF THE COURSE
Elements of a statistical test Statistical null hypotheses
Machine Protection System Commissioning plans
Will We Ever Get The Green Light For Beam Operation?
Statistical Thinking and Applications
DESIGN OF EXPERIMENTS by R. C. Baker
Interlocking strategy
Operation of Target Safety System (TSS)
Close-out.
Presentation transcript:

Reliability Analysis of the LHC Beam Dumping System Taking Into Account the Operational Experience during LHC Run 1 Roberto Filippini CERN ATS Seminar, 4 th July 2013

Reliability Analysis of the LHC Beam Dumping System Taking Into Account the Operational Experience during LHC Run 1  Outline  Machine protection and LBDS  Background to present state  LBDS statistics  Validation of reliability predictions  Safety  Final recommendations and conclusions 4 July 2013R. Filippini CERN ATS Seminar2

The LHC Beam Dumping System  The LBDS is the final element of the protection chain, it performs the extraction of the beams on demand (dump requests) either at the end of machine fills or because of safety reasons. Two LBDS exist, one per beam. LBDS physical layout – point 6 Internal ILK Control References 10 MKB Interlocks and references Control and supervision Actuation 15 MSD 15 MKD External DR TDE Beam Line Functional layout 350 MJ destructive power 4 July 2013R. Filippini CERN ATS Seminar3

Machine Protection and LBDS  The LHC machine protection system MPS allows operation with the beams only if the LHC is cleared from faults/errors, and it supervises its functioning in order to prevent that a failure may develop into a critical accident. LHC State ILK Beam Operation LBDS SupervisionSafety logic Actuation Is it reliable, safe? 4 July 2013R. Filippini CERN ATS Seminar4

Machine Protection System  The reliability sub-working group of the machine protection system working group was charged to perform the analysis of safety and availability of the most critical systems of the MPS  The scope  All active devices, supervision and interlocking elements including the Beam Loss Monitors, Quench Protection System, Beam Interlocking Systems, Power Interlock System, LBDS. B. Todd, MP Workshop Annecy 2013 Most results confirmed, with a few exceptions 4 July 2013R. Filippini CERN ATS Seminar5 Reliability w.g. 2006

LBDS Reliability Analysis  The scope  Magnets, control and supervision, including the triggering system TSDS, the beam energy tracking BETS, the septa MSD, extraction kickers MKD and dilution kickers MKB. Passive protection elements not in the scope.  The assumptions for the analysis  Operation profile of 10 hours, 400 machine fills, 200 days of operation  Only random faults in system components  Post mortem diagnostics returns the system to an “as good as new” state  The goal of the analysis  Predict the probability of the LBDS to fail unsafely  Predict the number of false beam dumps 4 July 2013R. Filippini CERN ATS Seminar6

LBDS Reliability Analysis (2)  Safety =1.8E-07 per year of operation = largely SIL4  Most critical component was the MKD with a 74% contribution  Number of false beam dumps was 8 +/- 2  The biggest contribution was from the MKD with 5 false beam dumps in total (61%) Magnets 4 July 2013R. Filippini CERN ATS Seminar7 Predictions from theoretical models! %

… 7 years later  3 years of LHC operation ( ) with beam and a significant number of data collected among which system faults and repair/diagnostics interventions  The sources are LHC-OP logbook, and LHC-TE/ABT expert logbook  The scope  MKD, MKB and the related control and supervision electronics plus post mortem diagnostics and auxiliary systems on which the LBDS depends  The goals  Produce time series and general statistics  Estimate safety and number of false beam dumps  Validate the reliability prediction models  Discover unforeseen failure mechanisms and scenarios… 4 July 2013R. Filippini CERN ATS Seminar8

Time series Jan 2010Dec 2012 Jan 2010 Dec July 2013R. Filippini CERN ATS Seminar9 Anomalies 1 Vacuum and BEM Anybus® 2 Vacuum and diagnostics 3 SCSS Asibus® Statistics per month

LBDS failure distribution vs. functions  139 failure events recorded of which 90 in the LBDS  Actuation (MKD, MKB) is the largest contributor (60%) July 2013R. Filippini CERN ATS Seminar10

LBDS false dumps vs. machine phase  A total of 97 events during triggered a false dump (with or without the beam) of which 66 from the LBDS, i.e. 73% of the total  The most important contributor is the actuation (MKD, MKB) 66 4 July 2013R. Filippini CERN ATS Seminar11

Actual availability  Assumptions  Only LBDS false beam dumps in the phases injection and stable beam are considered  No repetition of the same internal dump request, i.e. occurrence of the same event (e.g. inaccurate diagnostics) after a short interval => 5 false dumps not considered.  Results  The LBDS counted 29 false beam dumps, against the 24 (on average) foreseen.  The most important contributor is the actuation (15) then surveillance (12) and control (2) Decreasing trend 4 July 2013R. Filippini CERN ATS Seminar12 False beam dumps

The statistical analysis framework Raw data IN PHASE 1 – Censoring data Predictions Reliability models OUT PHASE 2 – Statistics and validation 4 July 2013R. Filippini CERN ATS Seminar13

Reliability prediction: Actuation function Failure mode and identifier # components Time to Failure Hypothesis test 4 July 2013R. Filippini CERN ATS Seminar14 Validation most conservative value is kept Time to recovery

Control and surveillance functions 4 July 2013R. Filippini CERN ATS Seminar15 Not validated

Failure on demand Average failure rate 4 July 2013R. Filippini CERN ATS Seminar16

Failure Dependency 4 July 2013R. Filippini CERN ATS Seminar17

Hypothesis test 4 July 2013R. Filippini CERN ATS Seminar18

Results at a glance  2518 LBDS components exposed to failures during resulted in 90 failure events, distributed in 29 different failure modes…  …but 70 failure modes never occurred Hypothesis test always true with the exception of the PTM power supply that was expected to fail The most conservative TTF was taken 4 July 2013R. Filippini CERN ATS Seminar19

What about safety?  All beams were safely dumped at every beam dump request for LBDS during => this is a necessary but not sufficient condition to state that the system is safe, e.g. SIL3 at least  How to get a sufficient condition for safety without a piece of evidence? 1. Deductive reasoning => as availability and safety were estimated from the same model, and availability agreed with observations, then also safety will. Based on intuition, no statistics proof 2. Inductive reasoning => the failures may have moved the system very close to the accident…  …How close? a metric for the safety distance is needed 4 July 2013R. Filippini CERN ATS Seminar20

UNSAFE Degraded NON acceptable Degraded acceptable Safety margins  From the initial nominal state the LBDS may tolerate a number of failures, some of which are detected and cause the anticipated beam dump  The safety principle => never operate at a single point of failure Nominal state Zero safety margin near miss/single point of failure 1 safety margin Min safety requirement 2 safety margin Min safety requirement 4 July 2013R. Filippini CERN ATS Seminar21

LBDS and safety by design  The behavior at failure of the LBDS is conceived in order to …  Tolerate faults by redundancy => fault masking  Prevent faults by surveillance => failsafe Dump request Primary Synchronized trigger TSU-a TSU-b TFO-a TFO-b RT-b RT-a Back-up a Asynchronous re-trigger TSDS (simplified) Back-up b Asynchronous re-trigger Beam dump 4 July 2013R. Filippini CERN ATS Seminar22

Example: TSDS and safety distance  Simplified state transition diagram of the TSDS  Some failure events may be detected and trigger a false dump UNSAFE TSU-b fails TSU-a fails Safety distance = number of failure events left Initial state 4 July 2013R. Filippini CERN ATS Seminar23 TFO-a False dump TFO-b False dump TSU-a fails RT-b False dump

The safety gauge The nominal dump has to occur with the LBDS as good as new The false beam dump is justified, when the tolerance margins are exceeded  LBDS and the safety policy  The trade-off between safety and availability can be translated into an ideal safety distance at a dump request for the LBDS 4 July 2013R. Filippini CERN ATS Seminar24

Safety distance versus functions Control function (TSDS) is the closest to the safety margins Surveillance function is unbalanced against availability (unnecessary over-protection) 4 July 2013R. Filippini CERN ATS Seminar25 Remark: Every failure event is assigned a safety distance

LBDS safety gauge Ideal behaviour Poor safety margins Over-protected at detriment of availability 4 July 2013R. Filippini CERN ATS Seminar26 Ideal behaviour

LBDS safety gauge (2)  The trend unveils a loss of safety margins in 2011, and a recover in 2012, almost back to the initial level of July 2013R. Filippini CERN ATS Seminar27 Safety distance Most failures at safety margin 0

Actual safety (1)  Problem statement  Given the average safety distance at failures for each LBDS function, over the period , the objective is to calculate the maximum component failure rate below which LBDS is SIL3, for 300 days per year (total = h) with an average machine fill of 10 hours  Data…  P E = probability of the initiating events (90/21600)  N = 1674 number of components at failures in the LBDS  s = safety distance  d = detection rate; 0.73 for LBDS, 0.6 (actuation), 0.87 (control) 0.96(surveillance) safety distance s Initiating event P E 4 July 2013R. Filippini CERN ATS Seminar28 d Unsafe Failsafe d

Actual safety (2) residual safety margin Initiating event SIL3 bounds Continuous Poisson CDF SIL3 = 1xE-07/h 4 July 2013R. Filippini CERN ATS Seminar29 Probability of exceeding the safety margin s The failure rate threshold

Actual safety (3)  Actuation, control, and surveillance functions meet the safety requirements individually and together as LBDS  Example: LBDS SIL3 bound is 2.5 E-05/h - the highest rate is from the TSDS VME crate power supply failure =1.9 E-05/h with all other components being more reliable. SIL3 bound highest failure rate 1.9 E-05/h close to SIL4 bound 4 July 2013R. Filippini CERN ATS Seminar30 SIL4 bound

Actual safety (3)  Extreme outcomes and singularities  failure events that moved the LBDS to a potentially unsafe state, or close to it (near miss) before this was discovered.  1 erratic trigger of 2 MKDs over three years, from 30 independent TFO outputs  The maximum failure rate threshold in order to be SIL3 at least is 7.2 E- 05/h which is met.  2 failure at zero safety margin (detected) in the actuation and control functions, in 3 years  The maximum failure rate threshold for the control is 7.8 E-05. and the one for the actuation is 1.1 E-03, which are both SIL3 at least. 4 July 2013R. Filippini CERN ATS Seminar31

Wrap up and conclusions  This presentation provided a complete and very diverse overview on the behavior of the LBDS at failures during years  The added values are…  The LBDS statistics  Design insights  The methodologies of analysis, validation of reliability prediction models and statistical inference of availability and safety  Summary of results and recommendations follow… 4 July 2013R. Filippini CERN ATS Seminar32

Results to retain  Component statistics  139 failure events in 3 years of which 90 in the LBDS, distributed in 29 different failure modes (3 out of scope) of which 19 confirmed and 7 new failure modes  Availability and safety  29 false beam dumps in 3 years, in agreement with the 24 predicted in 2006  Safety is SIL3 at least, in agreement with 2006 predictions  Remarks…  A positive trend along the years is discovered in the number of false beam dumps, and margin of safety. 4 July 2013R. Filippini CERN ATS Seminar33

Recommendations (1)  Further investigations on failure mechanisms  Common Cause Failure suspected in a few components such as the failure of three High Voltage power supplies in the MKD generators, two Triggering Units not responding, and the spurious firing of two Trigger Fan Out units. Further analysis on CCF and consequences on reliability is recommended.  Availability concerns  7 false beam dumps are from the vacuum  12 failures from post mortem and diagnostics => cause of delays in re-arming  Diagnostics was not always accurate, faults fixed after several interventions  Some functions might be over-protected, e.g. LBDS surveillance  Safety concerns  SIL3 is largely met for LBDS, SIL4 possible but further analysis is recommended  The control functions of the LBDS (TSDS) is estimated to have the smallest safety margin.  HW changes during LS1 in TSDS (controls) and powering.  A reliability analysis of the TSDS with the applied changes is under preparation. 4 July 2013R. Filippini CERN ATS Seminar34

Recommendations (2)  Data quality  Good and large quantity, but inconsistencies existed as well as non- homogeneities in the data reporting, time stamps, consequences from diagnostics and intervention  Improvements during the years should be consolidated by the definition of standard procedures of data reporting and tools for the automatic information retrieval  Product assurance  Several components did not meet the reliability specification because of design flaws, and were returned to the manufacturer (e.g. Asibus®, Power trigger power supply).  Other issues  Maintenance, and diagnostics had a relevant impact on operation  A number of faults/errors are procedural (human factor) and should be taken into account for a more detailed analysis 4 July 2013R. Filippini CERN ATS Seminar35

Conclusions  Where we were…  Reliability and safety prediction from theoretical models => good guesses but lack of evidence  Where we are now  The statistical analysis of the LBDS based on the LHC operations proved that the reliability predictions are in good agreement with the observations.  Where we may go…  The positive trends over the years show that an infant mortality period is over, and statistics are improving => more stable system, safer and better exploitable  and the lesson learnt from this study…  A periodical review of reliability of machine protection elements is recommended and should be integrated in the system life-cycle  Tools must be developed in order to collect data and infer statistical trends  An excerpt from this work will be presented at the conference ICALEPCS 2013  R. Filippini, E. Carlier, N. Magnin, J. Uythoven “Reliability Analysis of the LHC Beam Dumping System Taking Into Account the Operational Experience During LHC Run 1”, 5-11/ July 2013R. Filippini CERN ATS Seminar36

Acknowledgements My special thanks to Jan Uythoven, Etienne Carlier and Nicolas Magnin for their support during the preparation of this work … and all friends and colleagues with whom I had chats and discussions, and who made my second experience at CERN particularly enjoyable and gratifying. 4 July 2013R. Filippini CERN ATS Seminar37

…question time Contact Roberto Filippini 4 July 2013R. Filippini CERN ATS Seminar38

Safety: SIL3,SIL4 graphical tests 4 July 2013R. Filippini CERN ATS Seminar39 Failure rate/h Probability T=10 SIL4 bound SIL3 bound surveillance control actuation LBDS