Balancing machine availability and preventive maintenance for HL-LHC

Slides:



Advertisements
Similar presentations
Total Productive Maintenance
Advertisements

Unit III Module 6 - Developing Age Exploration Tasks
K. Potter RADWG & RADMON Workshop 1 Dec WELCOME TO THE 4th RADWG & RADMON WORKSHOP 1 December 2004.
Preventive Maintenance
Module 3 UNIT I " Copyright 2002, Information Spectrum, Inc. All Rights Reserved." INTRODUCTION TO RCM RCM TERMINOLOGY AND CONCEPTS.
LHC UPS Systems and Configurations: Changes during the LS1 V. Chareyre / EN-EL LHC Beam Operation Committee 11 February 2014 EDMS No /02/2014.
LHC Beam Screen Heaters: System overview, consolidation needs and project planning 4 February TE-CRG/JCC (Beam Screen Heater Review) 1.
Total Productive Maintenance (TPM)
WAO, Trieste, 24th - 28th September 2007 An Overview of the Reliability and the Fault Trends of the SRS Cheryl Hodgkinson Daresbury Laboratory Synchrotron.
CV activities on LHC complex during the long shutdown Serge Deleval Thanks to M. Nonis, Y. Body, G. Peon, S. Moccia, M. Obrecht Chamonix 2011.
1v1 Availability Tracking as a Means to Increase LHC Physics Production B. Todd 1, A. Apollonio 1 and L. Ponce 1 1 CERN – European Organisation for Nuclear.
Presentation on Preventive Maintenance
Chamonix Risks due to UPS malfunctioning Impact on the Superconducting Circuit Protection System Hugues Thiesen Acknowledgments:K. Dahlerup-Petersen,
Introduction to availability modelling in ELMAS Arto Niemi.
How can we reduce the “no beam„ time? Walter Venturini Delsolaro D. Arnoult, S. Claudet, G. Cumer, K. Dahlerup Petersen, R. Denz, F. Duval, V. Montabonnet,
Thickness of the Kamaboko Tunnel Shield Wall under Different Assumptions Ewan Paterson Technical Board June 23,
TRACKING OF FAULTS AND FOLLOW-UP Accelerator Fault Tracking project Jakub Janczyk (TE-MPE-PE / BE-CO-DS) with input from: Andrea Apollonio, Chris Roderick,
Review of the operation scenarios and required manning of the activities P. Schnizer and L. Serio.
Long shutdown 1 LHC Machine Status Report K. Foraz June 12 th, 2013.
Hardware Commissioning  Preparation Documentation MTF Programme  Status The Review The commissioning activity in Resources  Outlook The new.
LHC-CC Validity Requirements & Tests LHC Crab Cavity Mini Workshop at CERN; 21. August Remarks on using the LHC as a test bed for R&D equipment.
Maintenance Management [14]
Modern Maintenance. Management
CERN Availability Working Group & Accelerator Fault Tracker Availability Working Group & Accelerator Fault Tracker - Where do we.
Quench Detection System R. Denz TE-MPE-EP on behalf of the QPS team.
1v2 LHC Availability Tracking: Past and Future B. Todd, A. Apollonio, L. Ponce on behalf of the LHC AWG.
Conclusions on UPS powering test and procedure I. Romera Acknowledgements: V. Chareyre, M. Zerlauth 86 th MPP meeting –
ATC / ABOC 23 January 2008SESSION 6 / MTTR and Spare Parts AB / RF GROUP MTTR, SPARE PARTS AND STAND-BY POLICY FOR RF EQUIPMENTS C. Rossi on behalf of.
TE-CRG Activities D. Delikaris, TE-CRG.
BEAM INSTRUMENTATION GROUP DEPENDABILITY APPROACH CERN, Chamonix 26th January 2016 William Viganò
LHC machine protection close-out 1 Close-out. LHC machine protection close-out 2 Introduction The problem is obvious: –Magnetic field increase only a.
Department of Defense Voluntary Protection Programs Center of Excellence Development, Validation, Implementation and Enhancement for a Voluntary Protection.
Department of Defense Voluntary Protection Programs Center of Excellence Development, Validation, Implementation and Enhancement for a Voluntary Protection.
Reactive to Proactive Maintenance through LEAN. What is Reactive, Proactive and Lean?
Machine Protection Review, Markus Zerlauth, 12 th April Magnet powering system and beam dump requests Markus Zerlauth, AB-CO-IN.
R2E/Availability Workshop Report - RadWG October 22 nd 2014 R2E/Availability Workshop 2014 October th 2014 R2E/Availability Workshop RadWG - Brief.
Reliability and Performance of the SNS Machine Protection System Doug Curry 2013.
A Predictive Maintenance Strategy based on Real-Time Systems
Maintenance strategies
CONS and HL-LHC day Analysis of needs from TE-EPC
Situation of the Static Var Compensators at CERN.
Technical Services: Unavailability Root Causes, Strategy and Limitations Data and presentation in collaboration with Ronan LEDRU and Luigi SERIO.
Performance monitoring framework for the technical infrastructure
Minimum Hardware Commissioning – Disclaimer
RELIABILITY OF 600 A ENERGY EXTRACTION SYSTEMS
Machine Coordinators: G. Arduini, J. Wenninger
The LHC - Status Is COLD Is almost fully commissioned
Potential failure scenarios that can lead to very fast orbit changes and machine protection requirements for HL-LHC operation Daniel Wollmann with input.
Machine Availability and Reliability Panel (MARP)
Reliability and Maintenance Management as a Part of Asset Management
Update on Linac4 Status, Reliability Run and Modelling
Increasing Asset Lifecycle through the application of Innovative Technology
Total Productive Maintenance and Quick Changeover
ABB SACE Maintenance Preventive Maintenance Program
Vibration Measurement, Analysis, Control and Condition Based Maintenance 14 Predictive maintenance Dr. Michiel Heyns Pr.Eng. T: C: +27.
Monday 11/07: Recovery Plenty of accesses.
PREVENTIVE MAINTENANCE By: Mary Solomon Inyang. DEFINITIONS Preventive maintenance refers to regular, routine maintenance to help keep equipment up and.
Operations Management
TEST PLANS for HL LHC IT STRING
Collimator Control (SEUs & R2E Outlook)
TPM Definitions Goals and Benefits Components GEOP 4316.
Operations Management
RADIATION induced failures in LHC 28th June 2011
Unit I Module 3 - RCM Terminology and Concepts
Operations Management
Monday 11th June 08:10 Beam dumped in the ramp by OP. OFB did not start its function and kept the injection orbit. 08:50 Switch LHCb polarity to negative.
Operation of Target Safety System (TSS)
Thu 04:00 Fill #2729 Stable Beams. Initial lumi 6.5e33 cm-2 s-1.
Coordination: Eva Barbara Holzer, Mike Lamont
Review of hardware commissioning
Presentation transcript:

Balancing machine availability and preventive maintenance for HL-LHC A. Apollonio, L. Ponce, B. Todd, J. Uythoven, M.Zerlauth Many thanks to the AWG 7th HL-LHC Collaboration Meeting, 13.-16.11. 2017, CIEMAT Madrid

Outline ‘Charge’ questions: Availability: what do we gain (or loose) if we modify the allocation of preventive maintenance in the year? The access in UR for the LSS does it help ?  Maximising the physics output of the (HL-)LHC What is preventive maintenance? Do we do preventive maintenance and how does it impact availability? Conceiving the ideal maintenance plan Can or should we change the time allocated to preventive maintenance? Should we consequently adapt our machine schedules? Can we increase the (HL-)LHC availability by allowing access to the URs? HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Context of Availability integrated luminosity is a function of… time colliding physics beams turnaround between successive physics beams time to clear faults physics performance during colliding beams Availability Maintainability Scheduling machine understanding & operator skill These are not independent variables HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

A typical (good) year of physics production (2016) Difficult to correlate the physics output with maintenance activities performed during a TS… Hardware Beam Commissioning TS1 TS2 TS3 HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Types of Maintenance Breakdown/reactive Maintenance: Waiting until equipment fails before repairing or servicing it (most of todays LHC failures) Preventive Maintenance (PM): (Time-based or run-based) Periodically inspecting, servicing, cleaning, or replacing parts to prevent sudden failure (Cryogenics, Cooling & Ventilation..) (Predictive) On-line monitoring of equipment in order to use important/expensive parts to the limit of their serviceable life (RF components, klystrons,…) Corrective Maintenance: Improving equipment and its components so that preventive maintenance can be carried out reliably -> Long-term feed forward from fault tracking into consolidation Low initial cost Faults piling up Fault rates kept ~ constant High initial cost HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Examples of successful corrective maintenance Water failures Over current Failure Mode 3 LHC600A-10V Converters Consolidation of the Auxiliary Power Supply against SEUs Exploiting AFT data to establish Availability Matrix Replacing Thyristor PC for D1 and D34 to decrease effect of network perturbations Cryogenic availability – reduction of power plants, exchange of PLCs,… HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Benefits of Preventive Maintenance “...the (long-term) cost of breakdown maintenance is usually much greater than preventive maintenance.” Preventive maintenance... Keeps equipment in good condition to prevent large problems Extends the useful life of equipment Finds small problems before they become big ones Is an excellent training tool for technicians Helps eliminate rework/scrap and reduces process variability Keeps equipment safer Parts stocking levels can be optimized Greatly reduces unplanned downtime http://www.prenhall.com/divisions/bp/app/russellcd/PROTECT/CHAPTERS/CHAP15/HEAD01.HTM HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

When does PM make sense? PM makes sense if Cost of DoingPM < Cost of NotDoingPM CDoingPM = f(hours of not running equipment, loss in staff motivation when doing PM instead of “real work”, materials and man-hours consumed in PM, potential for making things worse, etc.) CNotDoingPM = f(cost of losing/reworking a failed batch (unless PM makes no difference in preventing the failure), materials and man-hours spent repairing equipment, loss of equipment lifetime, loss in staff motivation from NOT doing PM, reduced staff familiarity with equipment, etc.) In a 24x7 manufacturing operation, it is typically better to perform the hours of activities in several smaller periods of time Duration and variability in preventive maintenance are key factors in whether equipment will be able to maintain a steady flow of output Performing PMs inconsistently is functionally equivalent to consistently having much longer downtime durations HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Background theory: unscheduled downtime should be a random phenomenon Waddington effect First observed by C.H. Waddington during the 2nd world war studying British aircraft maintenance Background theory: unscheduled downtime should be a random phenomenon If all unscheduled downtime events are plotted with respect to the last PM, there should not be any pattern evident Conrad Hal (C.H.) Waddington  HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Waddington effect A pattern of increased un-scheduled downtime immediately following PM’s is a “Waddington Effect” Increase the time interval between scheduled maintenance cycles, and eliminate all preventive maintenance tasks that couldn’t be demonstrably proven to be beneficial.  -> effective flying hours of fleet increased by 60 percent! Maintenance isn’t an inherently good thing, but it’s a necessary evil (like surgery). We have to do it from time to time, but we sure don’t want to do more than absolutely necessary to keep our aircraft safe and reliable. Doing more maintenance than necessary actually degrades safety and reliability HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Machine Mode by week for 2016 End Beam Commissioning TS1 MD1 TS2 MD4 TS3 HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Recovery from TS in 2017 Maintenance is not all that bad….! We also piggy-back on the TS to do a number of other changes…. Xing angle levelling valildation after TS1 Faults: COLL interlocks, Handshake, RF, LBDS, … 30cm beta * validation after TS2 Faults: Injectors, PM, PC Interlock HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Machine Schedules today and tomorrow? Today a technical stop comes at a loss of physic's time of 5 days (intervention) + 1-2 days (recovery and revalidation) Time for planned TS vs physics time ~ 14% More frequent maintenance but shorter periods of 1-2 days could allow to better accommodate for (true) preventive maintenance actions Shorter stops will naturally limit the extent of applied changes + reduce the extend of re-validation Major interventions (unless for urgencies) only during long (E)YETS For a sc machine like the LHC, the overhead of (re-)commissioning is considerable, should one foresee 2 x 1.5 years (or 2 x 2 years) of beam operation (see FCC scenarios)? Typical allocation of beam time during an operational year HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Accessing the LHC during operation Due to cryogenic, electrical and radiation risks, different scenarios are being applied for the LHC when accessing underground areas Experimental caverns, surface areas Access possible just after dump of beams, no RP piquet, no pre-cycle (a priori) UA galleries Up to 4 adjacent sectors have to be powered down due to cryogenic risk Tunnel, RRs, UJs Radiation survey required for intervention during operation (and at start of TS, YETS…) Access to inner triplet regions only possible with special derogation and well defined interventions Following access, a pre-cycle has to be done (since 2017 to 3.5 TeV only) HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

New HL caverns, will they help? New caverns to hold most of powering and protection equipment (power converters, quench protection, cryogencis, interlocks,…), however most systems are not redundant / hot-swappable, hence primary gain is with early diagnostics and intervention time -> What will be the cryogenic rules for the new HL caverns? HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

2016 Downtime Distribution [h] HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Statistics of powering system failures in IR1/5 Cryogenics Power Converters Quench Protection Power Converters: 32h of fault time out of 302h in high-lumi IRs (3-4 x FGC/R2E + communication loss, over-voltage, collateral trips due to EMC, water leaks,) Quench Protection: 6h of fault time out of 127h in high-lumi IRs (R2E for controller resets, heater firings due to EMC and thunderstorms, trips due to OP parameters) Cryogenics: 53h of fault time out of 570h in high-lumi IRs 23h due to PM18 PLC communication issue (2016), 2x7 due to QURCB-18- CC1 Frequency drive communication card (2017), respectively temperature probe of RQ4.R5B2. HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

How to profit best from access possibilities? For most (recurring) faults interventions are well optimized Only minor availability gain expected from relocation (R2E) To truly profit from access we should increase our capability of doing preventive maintenance Monitoring and diagnostics of arising issues !!! Fault tolerant designs / hot swaps Turnaround time for converter faults 0-2 2-4 4-6 6-8 8-10 10-12 12-14 …. HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Conclusions Preventive maintenance is important to assure the long-term reliability of systems, but… An ideal preventive maintenance plan of equipment depends on its nature and availability figures There cannot be a unique plan for a complex system of systems Based on the AFT data set this could be modelled (for the main systems) to derive an optimized overall plan -> Reliability Centered Maintenance plans Arbitration of PM actions should be increased to reduce the interventions to those considered beneficial for availability If integrated luminosity is highest priority, one should quantify the cost and benefits of a different operational scenario (e.g. 2x2 years of operation between LS) Pushing further the availability of very reliable systems (and profiting from accessibility to equipment) will require incorporating early fault detection and fault tolerance into the designs of future systems and upgrades HL Annual Meeting Madrid 15 Nov 2017 Markus Zerlauth

Thanks a lot for your attention! Questions?