MAX (MYRRHA Accelerator eXperiment) – MAX School - Reliability Basics with Focus on the MYRRHA Linac Case Adrian Pitigoi – EA (Spain)


A. Reliability Basics – Concepts & Definitions
B. Common techniques in Reliability Analysis
C. Modeling High-power Accelerators Reliability
- SNS Linac case (SNS-ORNL)
- MYRRHA Linac (MAX project)

A. Reliability Basics – Concepts & Definitions

Reliability / Unreliability

Reliability analysis objectives:
- Evaluate the failure rate of components and the overall system reliability.
- Evaluate design feasibility and compare design alternatives.
- Identify potential failure areas and track reliability improvement.

Definitions:
- Failure: the change from a functioning state to a failed state.
- Repair: the change from a failed state to a functioning state. Repairing brings the component or system back to an "as good as new" condition. For a repairable system, the repair-to-failure and failure-to-repair processes alternate repeatedly.
- Reliability, R(t): probability that the component or system experiences no failures during the time interval 0 to t, given that it was new or functioning at time zero.
- Unreliability, F(t): probability that the component or system experiences its first failure, or has failed one or more times, during the interval 0 to t, given that it was operating or repaired to a like-new condition at time zero.

Both reliability and unreliability are probabilities between 0 and 1, and
  R(t) + F(t) = 1, so F(t) = 1 - R(t)

Availability / Unavailability
- Availability, A(t): probability that the component or system is operating at time t, given that it was operating at time zero.
- Unavailability, Q(t): probability that the component or system is not operating at time t, given that it was operating at time zero.

Since a component or system must be either operating or not operating at any time:
  A(t) + Q(t) = 1

For repairable items, unavailability does not exceed unreliability: Q(t) ≤ F(t).

A. Reliability Basics – Concepts & Definitions

Failure Rates
- Conditional failure rate or failure intensity, λ(t): anticipated number of times an item will fail in a specified time period, given that it was as good as new at time zero and is functioning at time t.
- It is a calculated value that provides a measure of reliability for a product. It is normally expressed in failures per million hours (fpmh, i.e. failures per 10^6 hours).

Basic categories of failure rates:
- Mean time between failures (MTBF): basic measure of reliability for repairable items. It is the expected time between two consecutive failures of a repairable component, assembly, or system, under the condition of a constant failure rate. It is commonly used in reliability and maintainability analyses. For constant λ, MTBF = 1/λ. Example: a component with a failure rate of 2 failures per 10^6 h is expected to fail 2 times in a million-hour period.
- Mean time to repair (MTTR): total amount of time spent performing all corrective or preventive maintenance repairs, divided by the total number of those repairs. It is the expected span of time from a failure (or shutdown) to the completion of repair or maintenance. This term is typically used only with repairable systems.
- Mean time to failure (MTTF): used for non-repairable systems.
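The fpmh unit and the MTBF = 1/λ relation can be illustrated with a short Python sketch. The function names are illustrative only and are not taken from the presentation; the constant failure rate is the slide's own assumption.

```python
def fpmh_to_per_hour(fpmh: float) -> float:
    """Convert a failure rate in failures per million hours to failures per hour."""
    return fpmh / 1.0e6


def mtbf_hours(failure_rate_per_hour: float) -> float:
    """MTBF = 1 / lambda; only meaningful for a constant failure rate."""
    if failure_rate_per_hour <= 0.0:
        raise ValueError("failure rate must be positive")
    return 1.0 / failure_rate_per_hour


def expected_failures(failure_rate_per_hour: float, hours: float) -> float:
    """Expected number of failures over `hours` at a constant failure rate."""
    return failure_rate_per_hour * hours


if __name__ == "__main__":
    lam = fpmh_to_per_hour(2.0)  # the slide's example: 2 failures per 10^6 h
    print(f"lambda = {lam:.1e} per hour")
    print(f"MTBF = {mtbf_hours(lam):,.0f} h")
    print(f"expected failures in 10^6 h = {expected_failures(lam, 1.0e6):.1f}")
```

For the slide's example of 2 failures per 10^6 h, this gives an MTBF of about 500,000 h.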

A. Reliability Basics – Concepts & Definitions

Failure Frequencies
- Failure density, f(t): probability per unit time that the component or system experiences its first failure at time t, given that it was operating at time zero.
- Failure rate, r(t): probability per unit time that the component or system experiences a failure at time t, given that it was operating at time zero and has survived to time t.
- Conditional failure intensity (CFI, or conditional failure rate), λ(t): probability per unit time that the component or system experiences a failure at time t, given that it was operating, or was repaired to be as good as new, at time zero and is operating at time t.
- Unconditional failure intensity or failure frequency, ω(t): probability per unit time that the component or system experiences a failure at time t, given that it was operating at time zero.

Relationships between failure parameters:
- r(t) vs. λ(t): the failure rate definition addresses the first failure of the component or system rather than any failure.
- λ(t) vs. ω(t): the CFI has the additional condition that the component or system has survived to time t.

For most reliability and availability studies the unavailability Q(t) of components and systems is very much less than 1. In such cases ω(t) ≈ λ(t).
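The remark about small Q(t) can be checked numerically. The sketch below assumes the textbook relation ω(t) = λ(t)·[1 - Q(t)], which is not written on the slide; the numbers are hypothetical.

```python
def unconditional_failure_intensity(cfi: float, unavailability: float) -> float:
    """omega(t) = lambda(t) * (1 - Q(t)), evaluated at a single time point.

    Assumed textbook relation between the conditional failure intensity
    (cfi) and the unconditional failure intensity; not taken from the slides.
    """
    if not 0.0 <= unavailability <= 1.0:
        raise ValueError("unavailability must lie in [0, 1]")
    return cfi * (1.0 - unavailability)


if __name__ == "__main__":
    cfi = 1.0e-4  # hypothetical lambda, failures per hour
    for q in (1.0e-3, 1.0e-2, 1.0e-1):
        omega = unconditional_failure_intensity(cfi, q)
        gap = (cfi - omega) / cfi  # relative gap between lambda and omega
        print(f"Q = {q:.0e}: omega = {omega:.3e}, relative gap = {gap:.1%}")
```

Under this relation the relative gap between λ and ω equals Q itself, so the two are practically interchangeable whenever Q is small.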

A. Reliability Basics – Concepts & Definitions

Constant failure rates
If the failure rate is constant, r(t) = λ, the following expressions apply:
  R(t) = exp(-λt)
  F(t) = 1 - exp(-λt)
  f(t) = λ exp(-λt)
A constant failure rate results in an exponential failure density distribution.

Repairable and Non-repairable Items

Non-repairable items
- Components or systems such as a light bulb, transistor, rocket motor, etc.
- Reliability is the survival probability over the item's expected life, or over a specific period of time during its life, when only one failure can occur.
- The instantaneous probability of the first and only failure is called the hazard rate or failure rate, r(t).
- Life values such as MTTF are used to characterize non-repairable items.

Repairable items
- Reliability is the probability that failure will not occur in the time period of interest; when more than one failure can occur, reliability can be expressed as the failure rate, λ.
- Reliability can be characterized by MTBF, but only under the condition of a constant failure rate.
- Availability, A(t), is the probability that an item is in an operable state at any time. It is affected by the rate of occurrence of failures (failure rate λ, or MTBF) and by maintenance time.
- Maintenance can be corrective (repair) or preventive (reducing the likelihood of failure).
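A minimal Python sketch of the constant-failure-rate case follows. It uses the standard exponential expressions and the usual steady-state availability MTBF / (MTBF + MTTR). Function names and input values are hypothetical.

```python
import math


def reliability(lam: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-lam * t)


def unreliability(lam: float, t: float) -> float:
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(lam, t)


def failure_density(lam: float, t: float) -> float:
    """f(t) = lambda * exp(-lambda * t), the exponential density."""
    return lam * math.exp(-lam * t)


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Long-run availability of a repairable item: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    lam = 1.0e-3  # hypothetical failure rate, per hour
    for t in (100.0, 1000.0, 5000.0):
        print(f"t = {t:6.0f} h  R = {reliability(lam, t):.3f}  "
              f"F = {unreliability(lam, t):.3f}")
    # Hypothetical repairable item: MTBF 1000 h, MTTR 10 h
    print(f"A = {steady_state_availability(1000.0, 10.0):.4f}")
```

At t = MTBF = 1/λ the reliability is exp(-1), roughly 0.37, so an item is more likely than not to have failed before reaching its MTBF.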

A. Reliability Basics – Concepts & Definitions

Redundancy
Existence of two or more means, not necessarily identical, for accomplishing a given single function. Redundant components can be fully activated (active), partially activated (standby) or switched off completely (passive). A mix of these activity levels is also possible.

- Active redundancy (active standby / hot standby): all items operate simultaneously in parallel. The failure rate of the surviving item does not change after the failure of a companion item.
- Standby redundancy: alternate items are activated upon failure of the first item. Only one item operates at a time to accomplish the function.
  - Warm standby: normally active or operational, but not under load. The failure rate is lower due to lower stress.
  - Cold standby (passive): normally not operating. Failure of an item forces the standby item to start operating.
- k-out-of-n systems: redundant system of n items in which k of the n items must function for the system to function (voting decision).

Certain failure modes of one component (short-circuit, major leakage, etc.) could still lead to system failure.
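The k-out-of-n definition can be made concrete with a short sketch. It assumes n identical, independent items in active redundancy, each with the same reliability r, so the binomial formula applies. These simplifications are assumptions of the sketch and are not stated on the slide.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical, independent items work.

    r is the reliability of one item over the mission time of interest.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    r = 0.9  # hypothetical item reliability
    print(f"1-out-of-2 (parallel): {k_out_of_n_reliability(1, 2, r):.4f}")
    print(f"2-out-of-2 (series):   {k_out_of_n_reliability(2, 2, r):.4f}")
    print(f"2-out-of-3 (voting):   {k_out_of_n_reliability(2, 3, r):.4f}")
```

The two limiting cases recover the familiar structures: k = 1 is a pure parallel arrangement and k = n is a pure series arrangement.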

B. Common techniques in Reliability Analysis

Reliability block diagrams
- Advantage: ease of reliability expression and evaluation. It is a common system reliability analysis tool and is mission-success oriented.
- A reliability block diagram shows the system reliability structure. It is made up of individual blocks, and each block corresponds to a system module or function.
- Blocks in either a series or a parallel structure can be merged into a new block whose reliability is given by equations a) and b).
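The merging of series and parallel blocks can be sketched in Python with the standard formulas for independent blocks: the product of reliabilities for a series structure, and one minus the product of unreliabilities for a parallel structure. These are assumed to correspond to the slide's equations a) and b), which are not reproduced in the transcript. The three-block layout in the example is hypothetical.

```python
from math import prod
from typing import Iterable


def series(reliabilities: Iterable[float]) -> float:
    """Series structure: every block must work."""
    return prod(reliabilities)


def parallel(reliabilities: Iterable[float]) -> float:
    """Parallel structure: at least one block must work."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    # Hypothetical layout: blocks 1 and 2 in parallel, merged block in series with block 3
    r1, r2, r3 = 0.90, 0.90, 0.95
    merged = parallel([r1, r2])
    system = series([merged, r3])
    print(f"merged parallel block: {merged:.4f}")
    print(f"system reliability:    {system:.4f}")
```

Repeated merging of this kind reduces any pure series-parallel diagram to a single block.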

B. Common techniques in Reliability Analysis

Reliability block diagrams
[Figures: five parallel-series connected modules; the merged blocks; k-out-of-n configuration]

B. Common techniques in Reliability Analysis

Fault Tree Analysis
- Common tool in system safety analysis. It has been adapted to a range of reliability applications and is mission-failure oriented.
- A fault tree diagram is the underlying graphical model in fault tree analysis.
- The fault tree shows which combinations of component failures will result in a system failure; it represents the logical 'AND' and 'OR' relationships among the failure events.
- The status of the output (top) event can be derived from the status of the input events and the connections of the logical gates.
- A fault tree diagram can describe the fault propagation in a system.
[Figure: fault tree for five modules]
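A small Python sketch shows how the top-event probability follows from the gate logic. It assumes independent basic events, each appearing only once in the tree, so that gate probabilities can simply be multiplied. The tree, the event names and the probabilities are hypothetical.

```python
from math import prod

# A node is either a basic-event name (str) or a tuple (gate, [children]).
# The gate is "AND" or "OR". This representation is an illustrative choice.


def top_event_probability(node, q):
    """Evaluate a fault tree bottom-up.

    q maps basic-event names to failure probabilities. The result is exact
    only if basic events are independent and no event is repeated.
    """
    if isinstance(node, str):
        return q[node]
    gate, children = node
    probs = [top_event_probability(child, q) for child in children]
    if gate == "AND":  # output fails only if all inputs fail
        return prod(probs)
    if gate == "OR":  # output fails if any input fails
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate type: {gate}")


if __name__ == "__main__":
    # Hypothetical system: fails if A fails, or if both B and C fail
    tree = ("OR", ["A", ("AND", ["B", "C"])])
    q = {"A": 0.01, "B": 0.10, "C": 0.20}
    print(f"top event probability: {top_event_probability(tree, q):.4f}")
```

Trees with repeated basic events need a minimal cut-set or similar treatment instead of this direct evaluation.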

C. Modeling High-power Accelerators Reliability
- SNS Linac case (SNS-ORNL)
- MYRRHA Linac (MAX project)

1. SNS Linac Modeling
- Objective: feedback on actual SNS reliability performance, in order to develop a reliability modeling tool for the MAX project.
- Activities:
  - Selection of the accelerator to be used for modeling (SNS)
  - SNS design and reliability data collection
  - Development of the SNS Linac Risk Spectrum (RS) reliability model
  - Reliability analysis of the SNS Linac systems
- Targets:
  - Evaluate the SNS Linac model (model results vs. SNS operational data)
  - Conclusions and recommendations on optimization and on increasing reliability
[Figure: layout of the SNS Linac]

2. SNS Model - INPUT DATA
- SNS Design Data
  - SNS main/auxiliary systems
  - Number of components (by type)
  Data sources: SNS RAMI Static Model; SNS BlockSim model (ReliaSoft)
- SNS Systems and Functions
  - SNS parameters
  - Systems and components
  - System functions & interfaces
  Data sources: SNS website; SNS Parameters (doc no. SNS PL001R13); SNS Design Control Documents (DCD); SNS BlockSim model
- SNS Reliability Data
  - Number of components (by type)
  - Degree of redundancy
  - Failure data: λ = 1/MTTF; MTTR (λ: failure rate; MTTF: Mean Time To Failure; MTTR: Mean Time To Repair)
  Data sources: RAMI Static Model; SNS BlockSim model
- SNS Operating Status
  - Component failures: cause, type of component, time to repair, etc.
  - Availability data (component failures causing accelerator trips: cause, component and system concerned, duration of trip)
  Data sources: SNS Operation Data collection

3. Modeling Methodology
General assumptions:
- SNS systems/components not modeled: Ring, RTBT, stripper foil, etc. (considered not relevant for MAX project purposes).
- The Risk Spectrum Type 1 reliability model (repairable, continuously monitored components) is used for all SNS Linac components.
  - Failure and repair processes follow exponential distributions, i.e. the failure and repair rates are constant.
  - It is assumed that q = 0.
  - λ = 1/MTTF (failure rate); µ = 1/MTTR (repair rate). The MTTF and MTTR values are taken from the BlockSim model data.
- The "Mean Unavailability" type of calculation is used to obtain the unavailability values of the basic events: Q = λ/(λ+µ), i.e. the long-term average unavailability Q was calculated for each basic event.
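The basic-event quantification can be sketched as follows. The mean value Q = λ/(λ+µ) is the slide's formula. The time-dependent expression is the standard two-state Markov result for a component that starts in the working state, and it is an assumption of this sketch rather than a statement about the tool's internals. The MTTF and MTTR values are hypothetical.

```python
import math


def mean_unavailability(mttf: float, mttr: float) -> float:
    """Long-term average unavailability Q = lambda / (lambda + mu)."""
    lam = 1.0 / mttf  # failure rate
    mu = 1.0 / mttr   # repair rate
    return lam / (lam + mu)


def unavailability_at(t: float, mttf: float, mttr: float) -> float:
    """Q(t) of a two-state (up/down) Markov component that is up at t = 0.

    Standard textbook expression, assumed here for illustration.
    """
    lam = 1.0 / mttf
    mu = 1.0 / mttr
    return lam / (lam + mu) * (1.0 - math.exp(-(lam + mu) * t))


if __name__ == "__main__":
    mttf, mttr = 990.0, 10.0  # hypothetical component data, in hours
    print(f"mean Q = {mean_unavailability(mttf, mttr):.4f}")
    for t in (1.0, 10.0, 100.0):
        print(f"Q({t:5.0f} h) = {unavailability_at(t, mttf, mttr):.5f}")
```

Q(t) approaches the mean value on a time scale of about 1/(λ+µ), which is close to the MTTR when repairs are much faster than failures. This is why the long-term average is a reasonable basic-event value for long operating periods.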

4. SNS Reliability Model - Fault Tree Model
- SNS Module 1, the first modeling step: RFQ + MEBT + DTL
- Gradual development of the SNS Linac model
- In-depth understanding of the SNS design and functioning, needed for an accurate model

4. SNS Reliability Model - Fault Tree Model
- SNS Fault Tree (complete model): graphical representation of the functional structure of the SNS systems, describing undesired events ("system failures") and their causes.
- The fault tree consists of logical gates and basic events.
- A fault tree is subdivided into several fault tree pages, bound together using transfer gates.

4. Modeling the SNS Linac
- SNS Linac fault tree structure: the main levels of the fault trees correspond to the major parts of the SNS accelerator (Ion Source, LEBT, RFQ, MEBT, DTL-CCL-SCL, HEBT, and CONV, the auxiliary systems).

4. Modeling the SNS Linac
[Figure: DTL RF fault tree structure]

4. Modeling the SNS Linac
[Figure: CCL transmitter fault tree structure]

5. SNS Systems - Reliability Analysis Results
- Analysis case results: Q = 2.60E-01 = 0.26 (26 %); A = 1 - Q, quoted as 73 % (with Q rounded to 0.26, 1 - Q evaluates to 74 %). This is the limit availability, i.e. the mean availability.
[Figures: minimal cut-sets (MCS); MCS contribution]
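How a minimal cut-set list leads to a top-event unavailability and to per-cut-set contributions can be sketched in Python. The sketch uses the common min-cut upper bound, 1 minus the product of (1 - Q_MCS), with independent basic events. The cut sets, event names and values are hypothetical and are not taken from the SNS model.

```python
from math import prod


def cut_set_unavailability(cut_set, q):
    """A minimal cut set fails only if all of its basic events have failed."""
    return prod(q[event] for event in cut_set)


def top_unavailability(cut_sets, q):
    """Min-cut upper bound on the top-event unavailability."""
    return 1.0 - prod(1.0 - cut_set_unavailability(cs, q) for cs in cut_sets)


def contributions(cut_sets, q):
    """Share of each cut set relative to the top-event unavailability."""
    q_top = top_unavailability(cut_sets, q)
    return [(cs, cut_set_unavailability(cs, q) / q_top) for cs in cut_sets]


if __name__ == "__main__":
    # Hypothetical basic events and cut sets
    q = {"PS": 0.02, "KLY": 0.01, "PUMP_A": 0.10, "PUMP_B": 0.10}
    cut_sets = [["PS"], ["KLY"], ["PUMP_A", "PUMP_B"]]
    print(f"Q_top = {top_unavailability(cut_sets, q):.4f}")
    ranked = sorted(contributions(cut_sets, q), key=lambda item: -item[1])
    for cut_set, share in ranked:
        print(f"{' & '.join(cut_set):16s} {share:.1%}")
```

Ranking cut sets by contribution is what points to the components most worth improving or duplicating.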

5. SNS Systems - Reliability Analysis Results
MCS analysis has been performed for the complete SNS Linac model (SNS ACC DOWN) and for different parts of the accelerator (SCL, etc.), with the following conclusions:
- The results show a wide range of failure modes for components and systems (wide dispersion of failures).
- The Linac (DTL-CCL-SCL) is the most affected part (Q = 1.25E-01; A = 87.5 %).
- The highest unavailability values:
  - SCL: Q = 9.85E-02; A = 90 %
  - DGN&C: Q = 7.15E-02; A = 93 %
  - Front-End: Q = 6.93E-02; A = 93 %
- The most affected part of the SCL is the SCL RF system: Q = 6.33E-02; A = 94 % (primarily due to power supply and klystron failures, but also to cooling and vacuum malfunctions).
- The most affected parts of the Front-End are the LEBT (Q = 2.83E-02; A = 97 %) and the MEBT (Q = 2.82E-02; A = 97 %), more specifically the magnets and the vacuum systems.
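The availability figures quoted for the individual parts follow directly from A = 1 - Q. A short Python check, using only the Q values listed above (the dictionary layout is an illustrative choice):

```python
# Unavailability values quoted on the slide for parts of the SNS Linac model
QUOTED_Q = {
    "Linac (DTL-CCL-SCL)": 1.25e-01,
    "SCL": 9.85e-02,
    "DGN&C": 7.15e-02,
    "Front-End": 6.93e-02,
    "SCL RF": 6.33e-02,
    "LEBT": 2.83e-02,
    "MEBT": 2.82e-02,
}


def availability_percent(q: float) -> float:
    """A = 1 - Q, expressed in percent."""
    return 100.0 * (1.0 - q)


if __name__ == "__main__":
    for part, q in QUOTED_Q.items():
        print(f"{part:20s} Q = {q:.2e}  A = {availability_percent(q):.1f} %")
```

Rounded to the nearest percent, these reproduce the 90 %, 93 %, 94 % and 97 % figures on the slide.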

5. SNS Logbook Data – Accelerator trip failures

SNS Reliability graphics (Logbook availability and failure data)
[Figures: SNS outages (Jan-Feb, June 2012); accelerator trip failure frequency (by system); accelerator downtime contribution (by system); availability (Oct - June 2012)]
- RF system and electrical system failures are the most frequent.
- Electrical system failures make the largest contribution to total accelerator downtime (consistent with the conclusions from the SNS RS model runs).

5. SNS Logbook Data – Accelerator trip failures
The most affected subsystems of the SNS Linac (failures leading to accelerator trips):
- SCL-HPRF (Superconducting Linac High Power Radiofrequency): frequency of short failures
- HVCM (High Voltage Converter Modulator): duration of trips (in accordance with the SCL RS analysis)
[Figures: electrical subsystems contribution to the accelerator downtime; RF system failures (number and duration in hours)]

5. SNS Reliability modeling – Model evaluation
- SNS reliability considerations (from past operating experience):
  - The reliability of the input data mix used (RAMI static model, BlockSim model) depends on its sources: data from staff engineers, manufacturers (e.g. Titan, Varian, Maxwel), design reviews, etc.
  - A reliability program has been implemented at SNS and has significantly increased the reliability of the SNS installations in the past few years.
- SNS RS model limitations:
  - The SNS reliability data (MTTF, MTTR) are a mix of SNS data sources.
  - The reliability improvement program is not quantified or represented in the RS model.
  - The LEBT and DGN&C modules are less developed than the others (lack of detailed information).
- Given the reliability database used for quantification, and the fact that the reliability improvements of recent years are not included in the model, the overall SNS Linac availability resulting from the RS model (A = 73 %) is consistent with the SNS availability figures from the first years of operation (G. Dodson's talk at the Accelerator Reliability Workshop, Cape Town, South Africa, April 2011).
- The availability results obtained by MCS analysis run separately for the different SNS Linac parts (IS, RFQ, MEBT, DTL, CCL, SCL, HEBT) match the SNS Logbook availability records very well, although the global result is only A = 73 %. The low global figure is attributed to MTTF and MTTR values that may be too conservative, and to the other limitations listed above.

6. Conclusions
- The reliability results show that the most affected SNS Linac parts/systems are:
  - SCL, Front-End systems (IS, LEBT, MEBT), Diagnostics & Controls
  - RF systems (especially the SCL RF system)
  - Power supplies and PS controllers
  These results are in line with the records in the SNS Logbook.
- The reliability measure that most needs to be reinforced in the linac design is redundancy of the systems, subsystems and components most affected by failures.
- Intelligent fail-over redundancy needs to be implemented in controllers, for compensation purposes.
- Enough diagnostics have to be implemented to allow reliable functioning of the redundant solutions and to ensure the compensation function.

7. MAX Task 4.4 – MYRRHA linac Reliability model
- Overall approach
  - Fault tree, based on the SNS model + MAX design
  - Basic events: component / function failures
  - Undeveloped events/systems: reliability targets
  - Reliability model: availability / failure frequency (linac shutdown)
  - Reliability analysis: design optimization
- Design & reliability database
  - Data sources: SNS, MAX team, suppliers, conservative assumptions / reliability targets
  - Support systems: general level

7. MAX Task 4.4 – MYRRHA linac Reliability (MTBF > 250 h)
Reliability challenges:
- Injector switch reliability and duration
  Conditions: high reliability of the injectors, reduced MTTR, and the possibility to perform maintenance without stopping the beam.
  Injector switch sequence:
  - fault detection and first action of the MPS
  - a few beam restart attempts (with short pulses) by the MPS, and fault confirmation
  - full fault diagnostic and acknowledgement by the control system
  - dipole magnet switch
  - fast beam commissioning before reaching nominal beam
  Reliability analysis objective: to determine the MTTR-MTBF relation for a configuration of 2 injectors, 1 operational and 1 in hot standby.
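One simple way to explore the MTTR-MTBF relation for two injectors is the textbook Markov result for two identical units in active (hot standby) redundancy with one repair at a time: the mean time to loss of both units is (3λ + µ) / (2λ²). The sketch below rests on strong assumptions that are not from the presentation: exponential failure and repair times, equal failure rates for both injectors, and a perfect, instantaneous switch. All numbers are hypothetical.

```python
import math


def pair_mean_time_to_failure(unit_mtbf: float, unit_mttr: float) -> float:
    """Mean time until both units are down, starting with both up.

    Two identical units, hot standby, single repair in progress at a time:
    MTTF_pair = (3*lambda + mu) / (2*lambda**2).
    """
    lam = 1.0 / unit_mtbf
    mu = 1.0 / unit_mttr
    return (3.0 * lam + mu) / (2.0 * lam**2)


def max_unit_mttr(unit_mtbf: float, target_pair_mttf: float) -> float:
    """Largest unit MTTR that still meets the target for the pair.

    Returns math.inf when the target is met even without any repair.
    """
    lam = 1.0 / unit_mtbf
    mu_required = 2.0 * lam**2 * target_pair_mttf - 3.0 * lam
    if mu_required <= 0.0:
        return math.inf
    return 1.0 / mu_required


if __name__ == "__main__":
    unit_mtbf = 100.0  # hypothetical single-injector MTBF, in hours
    for mttr in (2.0, 10.0, 50.0):
        pair = pair_mean_time_to_failure(unit_mtbf, mttr)
        print(f"unit MTTR = {mttr:5.1f} h -> pair MTTF = {pair:8.0f} h")
    print(f"max MTTR for a 650 h target: {max_unit_mttr(unit_mtbf, 650.0):.1f} h")
```

In this idealized model, shortening the repair time of a failed injector raises the mean time between losses of the injector function almost in proportion. This is consistent with the slide's emphasis on reduced MTTR and on maintenance without stopping the beam. A real analysis would also need the switching sequence duration and the switch failure probability.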

7. MAX Task 4.4 – MYRRHA linac Reliability (MTBF > 250 h)
Reliability challenges:
- Fault tolerance / compensation function (linac fault-recovery system)
  Fault compensation imposes special conditions on the detuning system (CTS piezo detuning of the failed cavities), so a higher failure rate (lower MTBF) should be considered.
  Fault detection + compensation sequence:
  - recovery data processing: the linac control system defines new set-points (loaded or calculated)
  - updating of the RF fields in the corrective cavities (by the CCSs)
  - CCS (LLRF loop + CTS)
  - fast beam commissioning before reaching nominal beam

7. MAX Task 4.4 – Next steps
- Development of the MYRRHA Linac reliability model, based on the SNS RS model and taking into account the SNS reliability analysis results and conclusions.
- Iterative process: the MYRRHA Linac model is to be updated during the design work.
- The MYRRHA linac Risk Spectrum fault tree is currently under development.
- Reliability analysis to be performed, with due consideration of the reliability challenges.
- Special attention to the design of the (advanced) Diagnostics and Control systems.

Thank you