Availability and Reliability Issues for the XFEL Tom Himel SLAC/DESY


Availability and Reliability Issues for the XFEL
Tom Himel, SLAC/DESY
Presented at the XFEL/ILC joint meeting, 14 Nov 2007

Contents
- Introduction and purpose of studies
- The availability simulation
- Wake up ILC people
- What was modeled (important assumptions)
- Some results
- Conclusions

Introduction
- Availsim is a Monte Carlo simulation developed over several years for linear collider studies.
- Given a component list with MTBFs (mean time between failures), MTTRs (mean time to repair), and degradations, it simulates the running and repairing of an accelerator.
- It can be used as a tool to compare designs and to set requirements on redundancies and MTBFs.
- It has now been applied to the XFEL.
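To make the idea concrete, here is a minimal sketch of a Monte Carlo availability estimate. It is not Availsim's code: the function name, the (MTBF, MTTR) values, and the simplifications (exponential lifetimes, fixed repair times, a pure series system with no redundancy, access, or recovery modeling) are all assumptions made for illustration.

```python
import random

def simulate_availability(components, hours, seed=0):
    """Return the fraction of time with beam for a series system.

    components: list of (mtbf_hours, mttr_hours) pairs (hypothetical values).
    Any single failure stops the beam until that component is repaired.
    """
    rng = random.Random(seed)
    t = 0.0      # simulated clock, hours
    down = 0.0   # accumulated downtime, hours
    while True:
        # Draw an exponential time-to-failure for every component and take
        # the earliest; exponential lifetimes are memoryless, so redrawing
        # after each repair is statistically valid.
        ttf, mttr = min((rng.expovariate(1.0 / mtbf), mttr)
                        for mtbf, mttr in components)
        t += ttf
        if t >= hours:
            break
        repair = min(mttr, hours - t)  # clip the last repair at the horizon
        down += repair
        t += repair
    return 1.0 - down / hours

parts = [(1000.0, 10.0), (500.0, 5.0)]   # illustrative (MTBF, MTTR) pairs
twenty_years = 20 * 365 * 24
print(f"availability = {simulate_availability(parts, twenty_years):.4f}")
```

For this toy case the result can be checked against the analytic steady-state value 1 / (1 + sum of MTTR/MTBF), which is the kind of back-of-the-envelope check that stops working once tuning, access, and recovery effects are added.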

Why a simulation?
We chose a simulation instead of a spreadsheet calculation for the following reasons:
- Including tuning and recovery times in a spreadsheet calculation is difficult.
- Fixing many things at once (during an access) is also difficult to put in a simple spreadsheet formula.
- If one later wants to model luminosity degradation on recovery from downtimes more carefully, a simulation is simpler.
A disadvantage of a simulation is its use of random numbers, so one needs high enough statistics to get a meaningful answer.
- This is particularly a concern if one wants to compare two slightly different cases.
- Random number seeds are handled in a way that allows meaningful comparisons of similar cases.
- A 20-year simulation, which gives good enough statistics, takes 90 seconds on my laptop.
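The talk does not say how the seeds are handled. One common technique that fits the description is "common random numbers": both cases are run with the same seed, so most of the statistical noise cancels in the difference. The sketch below illustrates that technique only; the single-component model and all numbers are hypothetical.

```python
import random
import statistics

HOURS = 20 * 365 * 24  # a 20-year run

def downtime_fraction(mtbf, mttr, hours, seed):
    """Downtime fraction of one component with exponential failures."""
    rng = random.Random(seed)
    t = down = 0.0
    while True:
        t += rng.expovariate(1.0 / mtbf)
        if t >= hours:
            return down / hours
        repair = min(mttr, hours - t)
        down += repair
        t += repair

def compare(seeds, paired):
    """Mean and spread of (case A - case B) over several runs.

    paired=True reuses the same seed for both cases (common random numbers);
    paired=False gives case B an unrelated seed.
    """
    diffs = []
    for s in seeds:
        a = downtime_fraction(1000.0, 10.0, HOURS, seed=s)        # case A
        b = downtime_fraction(1100.0, 10.0, HOURS,                # case B: 10% better MTBF
                              seed=s if paired else s + 10_000)
        diffs.append(a - b)
    return statistics.mean(diffs), statistics.stdev(diffs)

paired_mean, paired_std = compare(range(30), paired=True)
unpaired_mean, unpaired_std = compare(range(30), paired=False)
print(f"paired:   diff = {paired_mean:.5f} +/- {paired_std:.5f}")
print(f"unpaired: diff = {unpaired_mean:.5f} +/- {unpaired_std:.5f}")
```

With shared seeds the run-to-run spread of the difference should be noticeably smaller than with independent seeds, which is what makes a small design change visible without very long runs.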

The simulation includes:
- Effects of redundancy, such as 21 DR kickers where only 20 are needed, or the large energy overhead in the main linac.
- Some repairs require accelerator tunnel access, others can’t be made without killing the beam, and others can be done hot.
- Time for radiation to cool down before accessing the tunnel.
- Time to lock up the tunnel and to turn on and standardize power supplies.
- Recovery time after a downtime is proportional to the length of time a part of the accelerator has had no beam. Recovery starts at the injectors and proceeds downstream.
- Manpower to make repairs can be limited.
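The value of the 21-for-20 kicker redundancy can be seen with a simple k-of-n calculation. This is a static sketch, assuming independent kickers and a made-up single-kicker availability; the simulation itself handles redundancy dynamically, together with repairs.

```python
from math import comb

def k_of_n_availability(k, n, a):
    """Probability that at least k of n independent units are up,
    each unit being up with probability a."""
    return sum(comb(n, i) * a**i * (1 - a)**(n - i) for i in range(k, n + 1))

a_kicker = 0.99  # hypothetical single-kicker availability, not from the talk
print("20 of 20:", k_of_n_availability(20, 20, a_kicker))
print("20 of 21:", k_of_n_availability(20, 21, a_kicker))
```

One spare removes the case where a single failed kicker stops the machine, so the 20-of-21 figure is much closer to 1 than the 20-of-20 figure.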

The simulation includes:
- Opportunistic Machine Development (MD) is done when part of the accelerator is down but beam is available elsewhere for more than 2 hours.
- MD is scheduled to reach a goal of 1 - 2% in each region of the accelerator.
- All regions are modeled in detail, down to the level of magnets, power supplies, power supply controllers, vacuum valves, BPMs …
- The cryoplants and AC power distribution are not modeled in detail.
- Non-hot maintenance is only done when the accelerator is broken. Extra non-essential repairs are done at that time, though.
- Repairs that give the most bang for the buck are done first.

The simulation includes:
- PPS zones are handled properly, e.g. one can access the linac when beam is in the injector. It assumes there is a tune-up dump at the end of each region.
- Kludge repairs can be done to ameliorate a problem that otherwise would take too long to repair. Examples: tune around a bad quad in the cold linac, or disconnect the input to a cold power coupler that is breaking down.
- During the long (1 month) shutdown, all devices with long MTTRs get repaired.

Wake Up ILC people

ILC – XFEL – SyncLight differences

Scheduled Maintenance Days
- New feature added for XFEL.
- Regular shutdown for repairs and maintenance.
- Counts as scheduled downtime, not unscheduled. Better for short-term users.
- All repairs which can be completed with the available manpower in the available time are done, most bang for the buck first.
- Maintenance items (e.g. cleaning filters) which don’t break things are not simulated.
- If recovery takes longer than scheduled, the extra time is unscheduled downtime.
- If recovery finishes more than 2 hours early, opportunistic MD is done.
- If we were down at the beginning of a maintenance day, the remaining downtime is changed to scheduled.
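A rough sketch of how repairs might be chosen for a maintenance day follows. The talk does not define "bang for the buck"; here it is assumed to mean expected downtime saved per person-hour, and the crew is modeled as a simple person-hour budget. All names and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Repair:
    name: str
    person_hours: float    # effort needed for the repair
    downtime_saved: float  # hypothetical benefit, in hours of future downtime avoided

def pick_repairs(repairs, crew_size, window_hours):
    """Greedy choice: best benefit per person-hour first, while effort remains."""
    budget = crew_size * window_hours  # total person-hours available
    chosen = []
    ranked = sorted(repairs,
                    key=lambda r: r.downtime_saved / r.person_hours,
                    reverse=True)
    for r in ranked:
        if r.person_hours <= budget:
            chosen.append(r.name)
            budget -= r.person_hours
    return chosen

todo = [
    Repair("klystron", 16, 8.0),
    Repair("power supply", 4, 4.0),
    Repair("BPM electronics", 2, 0.5),
    Repair("vacuum valve", 12, 2.0),
]
print(pick_repairs(todo, crew_size=2, window_hours=12))
```

A greedy ranking like this is not guaranteed to be optimal, but it is cheap to evaluate at every maintenance day inside a long simulation run.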

Mined data from old accelerators
- MTBF data for accelerator components is scarce and varies widely.

Recovery Time for PEP-II

List of sub-decks

Full list of Components

Full list of Components

Starting Modeling Assumptions (XFEL/ILC)
- The full complement of 117 cryomodules is installed.
- (Only 3 upstream/all) klystrons+modulators are accessible.
- Accelerator can run with any other klystron off.
- (Few/Most) electronics modules can be hot swapped.
- Tune-up dump and shielding between each part of the accelerator.
- Global controls cause (0.2/0.2)% downtime.
- Global water system causes (0.2/0.0)% downtime.
- Some cryoplant is down 1% of the time, including outages due to its incoming utilities. There are (1/6) cryoplants.
- Major power circuits cause (0.5/0.5)% downtime.

Starting Modeling Assumptions (XFEL/ILC)
- Power coupler interlock electronics and sensors (do not have/have) an MTBF of 1E6 hours due to redundancy.
- Cavity tuner motors have an MTBF of 1E6 hours, 2 times better than SLAC warm experience and MUCH better than TTF experience.
- Failed linac quads can be tuned around in 2 hours.
- Most failed correctors can be tuned around in 0.5 hours.
- LLRF, klystron support electronics, and SC quad supplies (are/are not) in the accelerator tunnel. Other power supplies are accessible.
- Hot spare klystron/modulator with waveguide switches in (no/low energy) linac regions.
- Magnet power supply MTBF of (50,000 hours for most magnets and 100,000 hours for SC magnets whose supplies are in the tunnel/200,000 hours). ILC assumed redundant regulators.
- It takes (6/8) hours to replace a klystron.

Initial downtime causes
- 24 electronic cards + 6*24 sensors per RF unit; any failure trips it off.
- Lumped systems
- Total XFEL downtime = 11.6% where budget = 7%.
- Total ILC downtime = 31% where budget = 15%, but ILC started with some components assumed better than now.

Needed XFEL MTBF Improvements
- 5 = work hard
- 10-20 = may need 2 iterations, with lifetime testing in between, to find the 2nd (next) level of failures
- 50 = significant risk of failing

Comments on MTBF improvements
- There are 6 coupler sensors and interlock electronics channels per coupler, with no redundancy. Not surprising that it is a problem.
- However, it is not a problem at FLASH, so maybe an MTBF of 100k hours per 6-channel electronics card is too short?
- Can one require 2 problems before tripping (ILC assumes this)?
- The kicker problem is easily fixed with an extra kicker and pulser. It was already OK for 17.5 GeV energy.
- Power supplies are single points of failure, commonly a problem in accelerators.
- APS gets an MTBF of 560,000 hrs, 10 times greater than SLAC/FNAL/DESY. They stress the supplies before each run and monitor internal temperatures and voltages so degrading components can be replaced on scheduled maintenance days.
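The reason the coupler interlocks dominate follows from adding failure rates in a series system. The sketch below combines the 24 cards and 6*24 sensors per RF unit into one effective MTBF. The 100k-hour card MTBF is the figure questioned above; the sensor MTBF is a placeholder assumption, since the talk does not quote one.

```python
def series_mtbf(groups):
    """Combined MTBF (hours) when any single failure trips the unit.

    groups: iterable of (count, mtbf_hours); failure rates simply add.
    """
    total_rate = sum(count / mtbf for count, mtbf in groups)
    return 1.0 / total_rate

CARD_MTBF = 100_000.0      # hours, the per-card figure questioned in the talk
SENSOR_MTBF = 1_000_000.0  # hours, hypothetical placeholder

rf_unit = [(24, CARD_MTBF), (6 * 24, SENSOR_MTBF)]
mtbf = series_mtbf(rf_unit)
print(f"interlock MTBF per RF unit: {mtbf:.0f} hours")
```

Even with individually reliable parts, 168 items in series give each RF unit an effective MTBF far below that of any single card or sensor, which is why requiring two coincident problems before tripping is attractive.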

Needed ILC MTBF Improvements
- 5 = work hard
- 10-20 = may need 2 iterations, with lifetime testing in between, to find the 2nd (next) level of failures
- 50 = significant risk of failing

XFEL Simulation Results
- Xfel2: mainly better because of coupler sensor interlocks.
- Xfel3: MTBF improvements chosen to get 7.0% downtime.
- XFEL4 (without scheduled downtime): downtime is 0.3% worse, but lose an extra 3.2% to scheduled downtime.
- XFEL6: 0.2% improvement from recovery not over-running, 0.1% from overlap of scheduled downtime with downtime.

For ILC, availSim was used as input for many design decisions
- Putting both DRs in a single tunnel only decreased integrated luminosity by 1%: OK.
- Is a hot spare e+ target line needed? Not if the e+ target can be replaced in the specified 8 hours.
- Confirm that 3% energy overhead is adequate in the linac.
- Showed that hot spare klystrons and modulators are needed where a single failure would prevent running.
- Showed a 7% availability loss for a single tunnel.

For XFEL, points out potential problems
- I’ve chosen particular items to improve. The project needs to look for the optimum (cheapest, lowest-risk) solution.
- Could the cryoplant or site power be made more reliable than assumed?
- Can coupler interlocks require 2 problems before tripping?

Benchmarking the Simulation
- A limited benchmark was done with HERA data. Using MTBFs and component counts taken from HERA as input, it correctly calculated the number of failures.
- Fancier features like repair time scheduling and recovery time have not been benchmarked.
- Getting together a list of components is real work.
- MTBFs and MTTRs should be taken from the accelerator under study. 50% errors easily happen. Real work.
- Recovery time is usually accounted as “tuning” instead of downtime.
- Often repairs are accounted as “scheduled downtime”.
- Simulation results seem reasonable. Back-of-the-envelope checks are OK.
- The most important results are comparisons of two slightly different accelerators. Systematic errors cancel.

Conclusions
- Component availability for ILC is a major challenge. Must do R&D, plan, and budget for it up-front.
- For XFEL, it is not past the state of the art, but more attention must be paid to it than for a typical HEP accelerator.
- X-ray beam-lines have not been modeled. They have few parts, but with unknown reliabilities.
- This simulation is a useful design tool for the ILC, the XFEL, and other accelerators.
- Code is available.