Engineering in High Availability

Slides:



Advertisements
Similar presentations
Tom Powers Practical Aspects of SRF Cavity Testing and Operations SRF Workshop 2011 Tutorial Session.
Advertisements

XTCAV X-Band Transverse Deflecting Cavity Project Overview Patrick Krejcik Yuantao Ding, Joe Frisch.
Operations and Availability GG3. Key decisions Summary of Key Decisions for the Baseline Design The linac will have two parallel tunnels so that the support.
Hamid Shoaee LCLS FAC Controls June 17, LCLS Control System Personnel Linac & BC2 Controls progress LTU, Dump Controls.
SLAC ILC Accelerator: Luminosity Production Peter Tenenbaum HEP Program Review June 15, 2005.
ILCTA_NML Progress at Fermilab Jerry Leibfritz September 10, 2007.
Accelerators for ADS March 2014 CERN Approach for a reliable cryogenic system T. Junquera (ACS) *Work supported by the EU, FP7 MAX contract number.
1 Status of EMMA Shinji Machida CCLRC/RAL/ASTeC 23 April, ffag/machida_ ppt & pdf.
1 Availsim DRFS and klyClus setup, assumptions, questions Tom Himel August 11, 2009.
ASTA roadmap to EPICS and other upgrades Remote monitoring and control DAQ Machine protection Dark current energy spectrometer.
GDE meeting September 29, Availability Task Force Progress Report Tom Himel for the Availability Task Force Putting the Linac in a single tunnel.
HINS 6-Cavity Test 3 Month Plan Jan 12, Current Schedule 2-3 day shutdown scheduled to start Jan 18 (Wednesday) to upgrade BPM system and reinstall.
HLRF DRAFT Global Design Effort 1 Defining EDR* Work Packages [Engineering Design Report] Ray Larsen SLAC ILC Division for HLRF Team DRAFT April.
Operations, Test facilities, CF&S Tom Himel SLAC.
1 Availability and Controls Tom Himel SLAC Controls GG meeting January 20, 2006.
Reliability and availability considerations for CLIC modulators Daniel Siemaszko OUTLINE : Give a specification on the availability of the powering.
RHIC Run14 Time Meeting. Run14 Status – Apr. 15 Operations running well, 12 stores over the past 7 days with 1 lost to another “super quench”. Testing.
Machine Protection at the 1MW CEBAF Electron Accelerator and Free Electron Laser Facility Kelly Mahoney Presented at the Workshop for.
Aug 23, 2006 Half Current Option: Impact on Linac Cost Chris Adolphsen With input from Mike Neubauer, Chris Nantista and Tom Peterson.
The recent history and current state of the linac control system Tom Himel Dec 1,
Managed by UT-Battelle for the Department of Energy SCL Vacuum Control System Upgrade Derrick Williams
GG3 Operations & Reliability (Availability) Eckhard Elsen Tom Himel
Updated Overview of Run II Upgrade Plan Beam Instrumentation Bob Webber Run II Luminosity Upgrade Review February 2004.
John Carwardine, Ewan Paterson, Marc Ross 14/15 July 2009 Availability Task Force: Subgroup #2.
General remarks: I am impressed with the quantity and quality of the work presented here and the functioning of the organization. I thank ILC and FNAL.
Simulations - Beam dynamics in low emittance transport (LET: From the exit of Damping Ring) K. Kubo
Timing System R+D for the NLC Josef Frisch. NLC and PEPII Phase and Timing Requirements (approximate)
ATC / ABOC 23 January 2008SESSION 6 / MTTR and Spare Parts AB / RF GROUP MTTR, SPARE PARTS AND STAND-BY POLICY FOR RF EQUIPMENTS C. Rossi on behalf of.
DRAFT: What have been done and what to do in ILC-LET beam dynamics Beam dynamics/Simulations Group Beijing.
PAC Meeting, December 12, Prebys 1 The Problem.
LC availability Simulation done for the LC comparison task force Tom Himel SLAC.
RF Commissioning D. Jacquet and M. Gruwé. November 8 th 2007D. Jacquet and M. Gruwé2 RF systems in a few words (I) A transverse dampers system ACCELERATING.
ILCTA_NML Progress at Fermilab Jerry Leibfritz August 16, 2007.
S2 Progress Report for EC H. Padamsee and Tom Himel For the S2 Task Force.
Operation Status of the RF Systems and Taiwan Photon Source
Cost Optimization Models for SRF Linacs
RF acceleration and transverse damper systems
Multi-bunch Operation for LCLS, LCLS_II, LCLS_2025
For Discussion Possible Beam Dynamics Issues in ILC downstream of Damping Ring LCWS2015 K. Kubo.
Availability Task Force Progress Report
Second SPL Collaboration Meeting, Vancouver May 2009
BDS Cryogenic System RDR Status and EDR Plans
Availsim runs and questions
WP02 PRR: Master Oscillator and RF Reference Distribution
ILC Cryogenic Systems Draft EDR Plan
RF cables calibration Matteo Volpi Thomas Geoffry Lucas
Accelerator Layout and Parameters
LLRF and Beam-based Longitudinal Feedback Readiness
Cost Optimization Models for SRF Linacs
A Portion of the SCP RF Control System LCLS Related
CEPC RF Power Sources System
Mathew C. Wright January 26, 2009
Interlocking of CNGS (and other high intensity beams) at the SPS
Low Level RF Status Outline LLRF controls system overview
Availability and Reliability Issues for the XFEL Tom Himel SLAC/DESY
Electron Source Configuration
Extract from today’s talk given to DCB
Commissioning, Operations, and Reliability Status
Low Level RF Status Outline LLRF controls system overview
ATF project meeting, Feb KEK, Junji Urakawa Contents :
J. Seeman Assistant Director for PPA/LCLS
Report on ATF2 Third Project Meeting ATF2 Magnet Movers ATF2 Q-BPM Electronics Is SLAC ILC Instrumentation Group a good name?
SNS Commissioning Mike Plum
Operational Experience with LCLS RF systems
UITF Conduct of Operations Review
ERL Director’s Review Main Linac
Thu 04:00 Fill #2729 Stable Beams. Initial lumi 6.5e33 cm-2 s-1.
Breakout Session SC3 – Undulator
ILC Beam Switchyard: Issues and Plans
Reliability, MPS, Operations and Tuning Work Packages
Presentation transcript:

Engineering in High Availability Tom Himel Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Contents Why High Availability is important General HA issues Controls specific HA issues Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Why High Availability is important The ILC will be an order of magnitude more complex than most present accelerators. If it is built like present HEP accelerators, it will be down an order of magnitude more. That is, it will always be down. The integrated luminosity will be zero. Not good. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Availability Design Philosophy Design it in up front. Budget 15% downtime total. Keep an extra 10% as contingency. Try to get the high availability for the minimum cost. First stab was done for the RDR. Will need to iterate during the ED phase. Quantities are not final Engineering studies may show that the cost minimum would be attained by moving some of the unavailability budget from one item to another. This means some MTBFs may be allowed to go down, but others will have to go up. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Will Need Improvement Program Must design to meet the budget on the first pass. Assume we are only partly successful and unavailability will be too high when we turn on. Will need operations budget and engineering to make the necessary improvements. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Top level approach Use Availsim a Monte Carlo simulation developed over several years. Given a component list and MTBFs and MTTRs and degradations it simulates the running and repairing of an accelerator. It can be used as a tool to compare designs and set requirements on redundancies and MTBFs. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

MTBF goals Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Global availability work Will need to improve unavailability budget. Follow design changes as they occur (e.g. device counts and located in tunnel or not) Re-apportion the allowed unavailability as we find cheaper ways to attain the budgeted 15% downtime. The present HA budget is just a guess at the cost optimum. Provide a framework (FMEA package) for groups to use to evaluate their system’s availability. Push groups to do HA work that is being ignored (e.g. water instrumentation, collimators, and coupler interlocks). Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Criteria for doing HA design during ED phase rather than waiting Decisions which couple multiple systems Designs which need a large availability enhancement compared to present designs (e.g. power supplies and water cooled magnets) These need to be prototyped and tested Devices which will be produced in very large quantities (e.g. LLRF electronics, controls crates) or are difficult to retrofit (e.g. cryomodules, klystrons). Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Example trade studies during ED Phase Redundancy of cavity sensors and interlock electronics vs. extremely high availability or extra energy overhead Design of vacuum manifolds, valves, pumps, pump supplies for couplers to be slightly redundant vs. extra energy overhead Use of waveguide switches and hot spare klystrons and modulators in large fractional energy gain regions vs. some other way of saving that ~1% downtime Locating extremely HA magnet power supplies in the beam tunnel vs. HA supplies in alcoves with long cables and the consequent heat load Do we have to do 1 vs. 2 tunnel yet again? Large redundant water pumps vs. small segmented pumps that don’t kill the accelerator when one fails (e.g. for a single klystron). Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Example trade studies during ED Phase Placing some LLRF (maybe just down mixers) in tunnel to reduce the cable plant while making them extremely HA or increasing the energy overhead Fast e+ target replacement vs. extra e+ target region beam-line. What temperature should the electronics be cooled to? Is the 0.5% downtime allocated to AC power source and distribution enough? Is there some way we can prevent 0.25 second power dips from causing 8 hours of downtime? Should we get power from two places on the grid? Use of Uninterruptible Power Supplies vs. more extra redundancy in the AC power distribution. Is the allocation of 1% downtime for cryo problems enough (10 times better than CERN does)? How will it be achieved? Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Example trade studies during ED Phase What are the right operating margins for cryo, magnet supplies, RF, utilities to optimize cost and availability? Is it better to add RF units or make each more reliable? Should there be redundant beam instrumentation for critical feedback loops? How much electronics should be put in the tunnel? Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Example Trade-off studies that can probably wait Should individual channels of electronics (e.g. timing generator, RF amplitude and phase detector) be hot swappable or is OK to put many of them on a single board? Detailed designs of items that do not need major availability improvements e.g. beam loss monitor readouts, laser wire electronics, ADC boards, magnet supports Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Two aspects to controls and availability Controls itself should not go down often. (Covered in talk by Krause) Controls needs to provide tools to help discover what is wrong in other systems. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Tools to help other systems Network access for laptops and diagnostic equipment near all hardware Readout and recording of diagnostic information built into other systems (e.g. a power supply may record its voltage and current at a megahertz) Either record everything very often or allow flexibly triggered readout of everything or both. Provide analysis tools of the data that is recorded. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Example of need for sync readout Based on SLC “flyer pulses” Infrequently a single bunch causes very high backgrounds. Need to figure out why. Only know few seconds after the fact that a bunch was bad. Could be caused by bad kicker pulse. Need to know kicker strength on each bunch. Could be caused by DR phase instability (saw tooth). Need to know orbit and phase of that bunch on many turns prior to extraction Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Help Solve Subtle Problems Phase drifts – compare redundant readouts Lying BPMs – chisquared? Redundancy? Drifting BPMs (both mechanical and electrical) Difficult to localize problems (normal module swaps don’t fix it). E.g. noise coupling in on a long cable or a flakey connector. Vacuum bursts in DRs (present PEP problem) – read 1/sec, provide good analysis package Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Conclusions Designing for High Availability is vital for all systems. A good fraction of it must be done during the ED phase. Controls has the added responsibility of providing the tools to help other systems diagnose their problems quickly. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

Backup slides Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

The Simulation includes: Effects of redundancy such as 21 DR kickers where only 20 are needed or the 3% energy overhead in the main linac Some repairs require accelerator tunnel access, others can’t be made without killing the beam and others can be done hot. Time for radiation to cool down before accessing the tunnel Time to lock up the tunnel and turn on and standardize power supplies Recovery time after a down time is proportional to the length of time a part of the accelerator has had no beam. Recovery starts at the injectors and proceeds downstream. Manpower to make repairs can be limited. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

The Simulation includes: Opportunistic Machine Development (MD) is done when part of the LC is down but beam is available elsewhere for more than 2 hours. MD is scheduled to reach a goal of 1 - 2% in each region of the LC. All regions are modeled in detail down to the level of magnets, power supplies, power supply controllers, vacuum valves, BPMs … The cryoplants and AC power distribution are not modelled in detail. Non-hot maintenance is only done when the LC is broken. Extra non-essential repairs are done at that time though. Repairs that give the most bang for the buck are done first. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007

The Simulation includes: PPS zones are handled properly e.g. can access linac when beam is in the DR. It assumes there is a tuneup dump at the end of each region. Kludge repairs can be done to ameliorate a problem that otherwise would take too long to repair. Examples: Tune around a bad quad in the cold linac or a bad quad trim in either damping ring or disconnect the input to a cold power coupler that is breaking down. During the long (3 month) shutdown, all devices with long MTTR’s get repaired. Controls & LLRF EDR Kick-off Meeting, Aug 20-22, 2007