Published by Quentin Johns. Modified over 8 years ago.
1
Principles and Experience in the Design & Operation of Dependable Systems
B. Todd, A. Apollonio, S. Gabourin, S. Uznanski
1v2
2
CERN benjamin.todd@cern.ch
1. CERN and the LHC
2. Dependable Design Principles
3. Experiences to date
Dependable systems are the result of good engineering practices
Good engineers = good systems
System specifications need dependability requirements
failure modes are just as important as rates
watch out for the dependencies
3
CERN
European Centre for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
Founded in 1954
Funded by the European Union
20 Member States …most of the EU…
8 Observer States and Organisations …Japan, Russia, USA…
35 Non-Member States …Australia, Canada, New Zealand…
580 Institutes World Wide
2500 Staff
8000 Visiting Scientists
Pure Science – Particle Physics
1. Pushing the boundaries of research, physics beyond the standard model.
2. Advancing frontiers of technology.
3. Forming collaborations through science.
4. Educating the scientists and engineers of tomorrow.
4
Particle accelerators and detectors are used to study the basic constituents of matter. Accelerators boost beams of particles to high energies before they are made to collide with each other or with stationary targets. Detectors observe and record the results of these collisions. Our flagship project is the Large Hadron Collider…
5
CERN Accelerator Complex
[Map labels: Lake Geneva, Geneva Airport, CERN LAB 1 (Switzerland), CERN LAB 2 (France)]
6
CERN Accelerator Complex
[Map labels: Lake Geneva, Geneva Airport, CERN LAB 1 (Switzerland), CERN LAB 2 (France), Proton Synchrotron (PS), Super Proton Synchrotron (SPS), Large Hadron Collider (LHC)]
LHC: 27 km long, 150 m underground
7
CERN Accelerator Complex
[Map labels: Lake Geneva, Geneva Airport, CERN LAB 1 (Switzerland), CERN LAB 2 (France), Proton Synchrotron (PS), Super Proton Synchrotron (SPS), Large Hadron Collider (LHC)]
8
CERN Accelerator Complex
[Diagram labels: Large Hadron Collider (LHC), Beam-1 Transfer Line (TI2), Beam-2 Transfer Line (TI8), Beam Dumping Systems, Super Proton Synchrotron (SPS)]
100 µs for one turn
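The "100 µs for one turn" figure can be sanity-checked from the 27 km ring length quoted earlier. This is only a sketch: it assumes the protons travel at essentially the speed of light, and the function name is made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by SI definition


def revolution_time_us(circumference_m: float,
                       speed_m_per_s: float = SPEED_OF_LIGHT_M_PER_S) -> float:
    """Time for one lap of the ring, in microseconds."""
    return circumference_m / speed_m_per_s * 1e6


# 27 km is the rounded ring length from the slides.
turn_us = revolution_time_us(27_000)
print(f"one turn takes about {turn_us:.0f} us")
```

The result comes out at roughly 90 µs, so the slide's 100 µs reads as an order-of-magnitude figure.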
9
CERN Accelerator Complex
[Experiment labels: CMS, ALICE, ATLAS, LHC-b]
10
ATLAS – A Toroidal LHC ApparatuS
11
ATLAS – A Toroidal LHC ApparatuS
12
LHC Parameters
…to see the rarest events… LHC needs high luminosity of 10^34 [cm^-2 s^-1]: 3 x 10^14 p per beam at 7 TeV
… to get 7 TeV operation… 8.3 Tesla dipole fields with circumference of 27 km (16.5 miles)
… to get 8.3 Tesla … LHC needs super-conducting magnets <2 K (-271°C) with an operational current of ≈13 kA, cooled in superfluid helium, maintained in a vacuum [11]
Stored energy per beam is 360 MJ … two orders of magnitude higher than others
Stored energy in the magnet circuits is 9 GJ
A magnet will QUENCH with milliJoule deposited energy

Year     Peak Energy [TeV]   Peak Intensity [p]   Peak Luminosity [cm^-2 s^-1]   Total Physics [yr^-1]
2010     3.5                 4 x 10^13            2.0 x 10^32                    45 pb^-1
2011     3.5                 2.0 x 10^14          3.6 x 10^33                    5.3 fb^-1
2012     4                   2.2 x 10^14          7.7 x 10^33                    21.7 fb^-1
LS 1-2   ≈6.5                ≈3 x 10^14           ≈1 x 10^34                     >20 fb^-1
[1,2,3,4]
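The stored beam energy follows from the intensity and the energy per proton. A rough sketch, using the rounded slide values (3 x 10^14 protons at 7 TeV); the function name is hypothetical:

```python
EV_TO_JOULE = 1.602176634e-19  # joules per electronvolt (exact SI value)


def stored_beam_energy_mj(protons: float, energy_tev: float) -> float:
    """Total energy carried by one beam, in megajoules."""
    joules = protons * energy_tev * 1e12 * EV_TO_JOULE
    return joules / 1e6


beam_mj = stored_beam_energy_mj(3e14, 7.0)
print(f"about {beam_mj:.0f} MJ per beam")
```

With the rounded intensity this gives a little under 340 MJ, the same size as the 360 MJ quoted; the gap presumably comes from rounding the intensity on the slide.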
13
Dependable Design Principles - a design flow
14
Systems…
- Beam Interlock System (BIS): a non-complex system… with many components… <1k lines
- Safe Machine Parameters (SMP): a complex system … with few components … >80k lines
- Function Generator Controller Lite (FGClite): a complex system … with many components … >>80k lines
(lines = critical code)
15
Power Converter Types [4,5]
16
Power Converter Types [5,6]
Function Generator Controller (FGC): ≈1000 replaced with FGClite
17
Power Converter
18
Reliability Requirements
For > 1000 units… acceptable failure rate < 40 per year…
Mean Time Between Failures > 200000 hours
- electrical: > 300000 hours
- SEE radiation: cross-section < 1 x 10^-12
equipment lifetime > 25 years…
- electrical: design for 25 years
- DD / TID radiation: > 200 Grays
Techniques such as application of MIL-217 = predict electrical reliability
Scientific testing and analysis = predict radiation cross-section and lifetime
working on a model to integrate radiation effects with electrical in ISOGRAPH
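The step from "fewer than 40 failures per year across 1000 units" to "MTBF > 200000 hours" is plain arithmetic. A minimal sketch, assuming every unit runs continuously all year with a constant failure rate; the function name is illustrative only:

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years and planned stops


def required_unit_mtbf_hours(units: int, max_failures_per_year: float) -> float:
    """Per-unit MTBF needed to keep the whole fleet under the failure budget."""
    fleet_hours_per_year = units * HOURS_PER_YEAR
    return fleet_hours_per_year / max_failures_per_year


mtbf_hours = required_unit_mtbf_hours(units=1000, max_failures_per_year=40)
print(f"required MTBF per unit: {mtbf_hours:.0f} hours")
```

This gives 219000 hours per unit, which the slide states as the rounder requirement of more than 200000 hours.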
19
FGClite Design Flow
20
FGClite Design Flow
- Class 0 (C0): components known to be resistant, or easily replaced; conceptual design not influenced by these components. Resistors, capacitors, diodes, transistors…
- Class 1 (C1): components potentially susceptible to radiation, in less-critical parts of the system. Substitution of parts or mitigation of issues is possible with a re-design. Regulators, memory, level translators…
- Class 2 (C2): components potentially susceptible to radiation, in more-critical parts of the system. The conceptual design is compromised if these components do not perform well. Substitution of parts or mitigation of issues would be difficult. Precision ADC, FPGA…
21
FGClite Design Flow
22
FGClite Design Flow
23
FGClite Design Flow
24
FGClite Design Flow [7]
25
FGClite Design Flow [7]
26
Example HW Reliability Optimisation
27
Experiences Running LHC to Date
Availability Working Group
28
Physics Fill Abort Root Cause 2012 [9]
585 physics fills
29
Lost Physics and Fault Time 2012 [9]
812 hours = 34 days = lost physics
1524 hours = 64 days = fault time
30
Machine Protection Faults 2012 [10]
7 systems, >250 faults, ≈36 failure modes, >360 h repair time
[Chart labels: BLM, QPS]
Failure modes are very important for fault evolution
Unrealistic to draw proper conclusions: raw data is not recorded consistently
31
2005 Predictions… [9]
false dumps: failure of system which leads to “fail-safe” premature abort

System   Predicted 2005   Observed 2010   Observed 2011   Observed 2012
LBDS     6.8 ± 3.6        9               11              4
BIS      0.5 ± 0.5        2               1               0
BLM      17.0 ± 4.0       0               4               15
PIC      1.5 ± 1.2        2               5               0
QPS      15.8 ± 3.9       24              48              56
SIS      -                4               2               4

reliability in line with expectations… (!!) despite the almost-witchcraft used to create the numbers…
But the failure modes are not the same.
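One way to read the prediction table is to ask, per system and per year, whether the observed count falls inside the predicted band. The sketch below does only that. The digit grouping of the observed counts is an assumption (the figures are run together in this transcript), SIS is left out because it has no prediction, and the names are invented for illustration.

```python
from typing import NamedTuple


class FalseDumpRow(NamedTuple):
    system: str
    predicted: float    # predicted false dumps per year (2005 study)
    uncertainty: float  # quoted +/- on the prediction
    observed: dict      # year -> observed false dumps (assumed digit split)


ROWS = [
    FalseDumpRow("LBDS", 6.8, 3.6, {2010: 9, 2011: 11, 2012: 4}),
    FalseDumpRow("BIS", 0.5, 0.5, {2010: 2, 2011: 1, 2012: 0}),
    FalseDumpRow("BLM", 17.0, 4.0, {2010: 0, 2011: 4, 2012: 15}),
    FalseDumpRow("PIC", 1.5, 1.2, {2010: 2, 2011: 5, 2012: 0}),
    FalseDumpRow("QPS", 15.8, 3.9, {2010: 24, 2011: 48, 2012: 56}),
]


def within_prediction(row: FalseDumpRow, year: int) -> bool:
    """True if the observed count lies inside predicted +/- uncertainty."""
    low = row.predicted - row.uncertainty
    high = row.predicted + row.uncertainty
    return low <= row.observed[year] <= high


for row in ROWS:
    flags = {year: within_prediction(row, year) for year in row.observed}
    print(row.system, flags)
```

A band check like this says nothing about failure modes, which is the point the slide makes: similar counts can hide different causes.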
33
Proposal - An LHC Fault Tracker
Visualisation of Events of 15th – 16th August 2012
34
Proposal - An LHC Fault Tracker
Visualisation of Events of 15th – 16th August 2012
35
Proposal - An LHC Fault Tracker
Visualisation of Events of 15th – 16th August 2012
36
Proposal - An LHC Fault Tracker
Visualisation of Events of 15th – 16th August 2012
37
Proposal - An LHC Fault Tracker
Visualisation of Events of 15th – 16th August 2012
LHC “e-logbook” + TE-EPC Log + TE-MPE-COMS
OP view, TE-EPC view, TE-MPE view
Impact on machine easier to infer
38
Personal experience with the Beam Interlock System…
39
Blurred Lines at System Boundaries
Identify and account for dependencies:
- Services
- Infrastructure
- Controls
Not part of analysis… …failures attributed to?
40
Blurred Lines at System Boundaries
CERN Controls Standard Power PC: 8 out of 33 failed to date
Outside the analysis scope
41
Blurred Lines at System Boundaries
Redundancy is more effective when it goes beyond the system boundary
42
Blurred Lines at System Boundaries
Consider dependability during installation:
- Connections between systems influence reliability
- Maintenance directly influences availability
- Reliability-Centred-Maintenance? Preventive Maintenance?
43
Blurred Lines at System Boundaries
[Diagram labels: Beam Interlock Controller, A.N. Other User System…]
Where do we start debugging?
44
Blurred Lines at System Boundaries
open racks… mystery of the missing 220V cable…
45
Blurred Lines at System Boundaries
100kg of batteries in front of the spares cupboard… and no pallet lifter in sight…
47
1. CERN and the LHC
2. Dependable Design Principles
3. Experiences to date
Dependable systems are the result of good engineering practices
Good engineers = good systems
System specifications need dependability requirements
failure modes are just as important as rates
watch out for the dependencies
48
Fin
Thank you!
49
References
[1] From the Chamonix Performance Workshop 2011 http://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=103957
[2] Extracted from http://lhc-statistics.web.cern.ch/LHC-Statistics/index.php
[3] Extrapolated from W. Herr’s talk: “Luminosity Performance Reach After LS1”
[4] Total Physics is from ATLAS https://twiki.cern.ch/twiki/bin/view/AtlasPublic/LuminosityPublicResults
[5] Derived from http://cdsweb.cern.ch/record/1123729/files/LHC-PROJECT-REPORT-1133.pdf?version=1
[6] Photographs courtesy Y. Thurel et al, from: “LHC Power Converters the Proposed Approach”
[7] Figures and flow derived from work by Y. Thurel and S. Uznanski
[8] From M. Kwiatkowski’s talk during SMP review at MPP http://lhc-mpwg.web.cern.ch/lhc-mpwg/MPP-Meetings/No40-04-03-2011/SMP_at_MPP_1v1.pptx
[9] B. Todd et al, “Review 2012 – Operational Availability & Efficiency” https://indico.cern.ch/internalPage.py?pageId=0&confId=211614
[10] B. Todd et al, “Performance & Availability of MPS 2008 – 2012” https://indico.cern.ch/conferenceOtherViews.py?confId=227895
50
Hidden Faults
A worked example of potential dormant failure…
51
Hidden Faults
275 hardware inputs, 4 software inputs
52
Hidden Faults
275 hardware inputs, 4 software inputs
136 (48%) never triggered
53 (19%) triggered once
564 (>50%) beam aborts from 12 inputs
564 (>50%) beam aborts from 7 systems:
- 165 x Operator Buttons
- 148 x Programmable Dump
- 93 x BPM (IR6)
- 49 x SIS
- 45 x BLM (SR7)
- 43 x RF
- 21 x PIC (US15)
testing & maintenance plan needed - periodically ensure function.
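The hidden-fault figures can be cross-checked with a few lines. This sketch assumes the percentages are taken over all 279 inputs (275 hardware plus 4 software), which the slide does not state explicitly; the variable names are illustrative.

```python
HARDWARE_INPUTS = 275
SOFTWARE_INPUTS = 4
NEVER_TRIGGERED = 136
TRIGGERED_ONCE = 53

# Beam aborts attributed to the seven most frequent sources.
ABORTS_BY_SOURCE = {
    "Operator Buttons": 165,
    "Programmable Dump": 148,
    "BPM (IR6)": 93,
    "SIS": 49,
    "BLM (SR7)": 45,
    "RF": 43,
    "PIC (US15)": 21,
}

total_inputs = HARDWARE_INPUTS + SOFTWARE_INPUTS
aborts_from_top_sources = sum(ABORTS_BY_SOURCE.values())
never_pct = 100 * NEVER_TRIGGERED / total_inputs
once_pct = 100 * TRIGGERED_ONCE / total_inputs

print(f"inputs: {total_inputs}")
print(f"aborts from top sources: {aborts_from_top_sources}")
print(f"never triggered: {never_pct:.1f}%  triggered once: {once_pct:.1f}%")
```

The seven counts add up to the 564 aborts quoted, and roughly two thirds of all inputs have fired at most once, which is why a periodic test plan is needed to show they still work.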
53
Software versus Programmable Logic
54
2006 – 2012 BIS reliability: not enough data yet