Beam Loss Monitor System – Preliminary Feedback November 11 th, 2010.

Slides:



Advertisements
Similar presentations
Verification and Validation
Advertisements

SOFTWARE TESTING. INTRODUCTION  Software Testing is the process of executing a program or system with the intent of finding errors.  It involves any.
EECE499 Computers and Nuclear Energy Electrical and Computer Eng Howard University Dr. Charles Kim Fall 2013 Webpage:
Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)
1 / 24 CS 425/625 Software Engineering Software Evolution Based on Chapter 21 of the textbook [SE-8] Ian Sommerville, Software Engineering, 8 th Ed., Addison-Wesley,
Testing an individual module
Principle of Functional Verification Chapter 1~3 Presenter : Fu-Ching Yang.
Tony Gould Quality Risk Management. 2 | PQ Workshop, Abu Dhabi | October 2010 Introduction Risk management is not new – we do it informally all the time.
Purpose of the Standards
©Ian Sommerville 2006Critical Systems Slide 1 Critical Systems Engineering l Processes and techniques for developing critical systems.
Vegard Joa Moseng BI - BL Student meeting Reliability analysis summary for the BLEDP.
Lecture 1.
LECC 2006 Ewald Effinger AB-BI-BL The LHC beam loss monitoring system’s data acquisition card Ewald Effinger AB-BI-BL.
Issues on Software Testing for Safety-Critical Real-Time Automation Systems Shahdat Hossain Troy Mockenhaupt.
1 Software Testing Techniques CIS 375 Bruce R. Maxim UM-Dearborn.
The Research Problem and Objectives Lecture 6 1. Organization of this lecture Research Problem & Objectives: Research and Decision/Action Problems Importance.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 11 Slide 1 Architectural Design.
Airbus flight control system  The organisation of the Airbus A330/340 flight control system 1Airbus FCS Overview.
Airbus flight control system
Software Dependability CIS 376 Bruce R. Maxim UM-Dearborn.
CS 4310: Software Engineering
An Introduction to AlarmInsight
S/W Project Management
University of Palestine software engineering department Testing of Software Systems Fundamentals of testing instructor: Tasneem Darwish.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 22 Slide 1 Verification and Validation.
TESTING.
No: 1 CEMSIS 1 WP3 - Use of pre-developed products Key issues N. Thuy EDF R&D.
1 Software testing. 2 Testing Objectives Testing is a process of executing a program with the intent of finding an error. A good test case is in that.
Response to TeV BLM Review Report (Beams Doc 1147) 1 Recommendation # 1: …the BLM upgrade be done Response to #1: Great - thankyou for this endorsement.
Testing Vs. Inspection Research Paper Diala T. Gammoh, Ph.D. Student Dr. Damla Turgut, Ph.D. University of Central Florida, Orlando Florida
Radiation Tolerant Electronics New Policy? Ph. Farthouat, CERN.
B. Todd et al. 25 th August 2009 Observations Since v1.
Defining and applying mitigating and aggravating circumstances. Relevant changes to the amount of fine. Defining and applying mitigating and aggravating.
©Ian Sommerville 2000Software Engineering, 6th edition. Chapter 19Slide 1 Chapter 19 Verification and Validation.
CLIC Implementation Studies Ph. Lebrun & J. Osborne CERN CLIC Collaboration Meeting addressing the Work Packages CERN, 3-4 November 2011.
PLC Workshop at ITER, 4-5 th of December 2014 A. Nordt, ESS, Lund/Sweden.
LESSON 3. Properties of Well-Engineered Software The attributes or properties of a software product are characteristics displayed by the product once.
Input Design Lecture 11 1 BTEC HNC Systems Support Castle College 2007/8.
Chapter 6 CASE Tools Software Engineering Chapter 6-- CASE TOOLS
Acquisition Crate Design BI Technical Board 26 August 2011 Beam Loss Monitoring Section William Vigano’ 26 August
Radiation Tolerant Electronics Expected changes Ph. Farthouat, CERN.
Software Engineering 2004 Jyrki Nummenmaa 1 BACKGROUND There is no way to generally test programs exhaustively (that is, going through all execution.
Chapter 1: Fundamental of Testing Systems Testing & Evaluation (MNN1063)
© 2006 Pearson Addison-Wesley. All rights reserved 2-1 Chapter 2 Principles of Programming & Software Engineering.
IAEA International Atomic Energy Agency Methodology and Responsibilities for Periodic Safety Review for Research Reactors William Kennedy Research Reactor.
Vegard Joa Moseng BI - BL Student meeting Reliability analysis of the Input Monitor in the BLEDP.
Smart Home Technologies
SUMMARY: EXTERNAL REVIEW RESULTS FOR THE LHC BLM SYSTEM Christos Zamantzas for the BLM team. Machine Protection Panel 24/06/2011.
SAFEWARE System Safety and Computers Chap18:Verification of Safety Author : Nancy G. Leveson University of Washington 1995 by Addison-Wesley Publishing.
The Research Problem and Objectives Lecture 6 1. Organization of this lecture Research Problem & Objectives: Research and Decision/Action Problems Importance.
1 Commissioning and Early Operation – View from Machine Protection Jan Uythoven (AB/BT) Thanks to the members of the MPWG.
BEAM INSTRUMENTATION GROUP DEPENDABILITY APPROACH CERN, Chamonix 26th January 2016 William Viganò
Failure Modes and Effects Analysis (FMEA)
LHC machine protection close-out 1 Close-out. LHC machine protection close-out 2 Introduction The problem is obvious: –Magnetic field increase only a.
Eva Barbara Holzer MPP, CERN July 31, Eva Barbara Holzer, CERN MPP CERN, July 31, 2009 BLM System Audit Sequel.
Software Maintenance1 Software Maintenance.
Organizations of all types and sizes face a range of risks that can affect the achievement of their objectives. Organization's activities Strategic initiatives.
2007 IEEE Nuclear Science Symposium (NSS)
Software Testing.
Dependability Requirements of the LBDS and their Design Implications
The Systems Engineering Context
What is performance management?
CS 425/625 Software Engineering Software Evolution
Remote setting of LHC BLM thresholds?
BEAM LOSS MONITORS DEPENDABILITY
Chapter 13 Quality Management
Software Testing “If you can’t test it, you can’t design it”
Configuration of BLETC Module System Parameters Validation
Close-out.
Presentation transcript:

Beam Loss Monitor System – Preliminary Feedback November 11 th, 2010

2 Purpose and scope This review will seek to:  assess the adequacy of the overall BLM system design with a focus on the programmable parts  identify possible weaknesses in the programmable parts of the mission-critical BLM  suggest activities that could increase the level of confidence that the programmable parts of BLM system performs as intended  suggest potential improvements of the BLM  provide a general comparison of the BLM with approaches in industrial systems. The scope of this review is also limited to a consideration of:  Potential sources of unsafety within the BLM, where the detection of an amount of particle losses that has the potential to quench the magnet is not relayed to the Beam Interlock System, resulting in a ‘missed generation of beam dump trigger’ and potentially machine damage.  Potential sources of unavailability, where failure of the BLM leads to a request to the Beam Interlock System to dump the beam, resulting in a ‘false dump trigger’and some machine downtime.

3 Novel solutions in the design Dynamic range of detectors is broad and required novel approach to capture and monitor this information Has introduced some novel solutions to solve problems, e.g., use of ADC to increase dynamic range. Compared to BIS which is very much all just proven technology. Novelty is usually avoided in critical systems but in this case it is a necessary novelty – not just novelty for the sake of being novel

4 Fault Tolerance is substantial Fault tolerance of critical data path  Sensors  Communication links  Computation and decision making components Many strong and impressive aspects to the fault tolerance, eg.,:  Using CRC check + 8b/10b protocol for optical links, redundant of optical links  reading out threshold values from FPGA every minute and checking CRC  ~4000 sensors, infrequent cases of high loss detected by only one sensors  Dozens of other examples…

5 Critical data path only partially redundant Substantial, but not “end to end” as per other finding Some computation and decision making components (such as threshold comparison in the surface FPGA) is not redundant

6 BLMS has mixed purposes This system seems to have started its life as a measurement system; it has evolved towards being a critical system Possibly this affected some early decisions about budget and design Going forward, does management regard the BLMS to primarily be a measurement system, with a secondary role as a contribution to machine protection?  Seems to be some uncertainty or mixed messages in this regard

7 Critical and non-critical function mixed Mix of critical and non-critical functionality in design  50% (?) of the logic on the surface card is not related to machine protection A common practice in industry is to separate the critical and non-critical functionalities  To guarantee high availability of resources required for critical functions  To avoid the possibility of negative effects on the critical functionality by non-critical functionality

8 % Logic Used on FGPAs is Very High At first glance, the % logic used is a very big concern. Non-technical constraints appear to have forced this threshold to be exceeded, which may be regrettable in the long term for various reasons The designers used ingenuity to find a solution. Workarounds (i.e., squeezing more logic than desired into the FPGA) appears to be done carefully with attention to risks But could be a problem in the long run, with respect to maintainability, e.g., changes to the design, re-use and porting to new hardware. Now “boxed in” to a very small corner. Hard to imagine going beyond 95% and yet they seems to what to add more

9 No (Current) Functional Specification There is no one document that specifies the functional behaviour for the current system, as now deployed Various documents (e.g., published articles) capture the intent for the functional behaviour but this is fragmented and scattered What is the basis of design testing in the absence of a functional specification? (It depends too much on knowledge stored in people’s head) Even a modest effort could produce a comprehensive and up-to-date functional specification that would provide linkage between intent and design

10 Design testing is impressive Very impressive effort to test the design using complementary levels of testing, e.g.,:  Radiation testing of Tunnel cards  Surface Board uses Simulation, Hardware based- testing, Software-Based testing  Combiner card

11 Design Testing is not sustainable While the testing is impressive, it is not always documented and therefore not sustainable There is no master test plan The group’s philosophy seems to be automation over documentation but will anyone know what to do when there is a change to the design

12 “Proof Testing” is impressive Very impressed with the approach taken in test the proper operation of the detectors each 12 hours* For example, basic connectivity test has evolved into a much more comprehensive test for proper function that has already been beneficial Importance of this testing is amplified by fact that detectors are a single point of failure (in the worst case) and some failures will mask a dangerous loss, i.e., a possible failure mode is indistinguishable from low count Will proof testing with CS source in tunnel be performed on a regular basis?? * To be precise, a new fill is not allowed if this proof testing has not been performed within the previous 12 hours

13 Some known limitations Limit of dynamic range close to the Injection Points Others?

14 Implications of Operating at Higher Energy Levels Noise  The current dynamic range for ADC may result in detecting noise when operating in nominal energy  One possible solution is using ASIC instead of ADC  The group is aware of these problems and is looking at ways to address them

15 Human Error Typically there would be limits on how much a critical value can change in one interaction For the BLMS this is not true for threshold values in the Master table

16 Maintainability Aspects General Principles of Software Validation; Final Guidance for Industry and FDA Staff – January 11, 2002 Many safety problems are introduced by maintenance actions For example, US FDA analysis of 3140 medical device problems between 1992 and 1998 showed that 7.7% were due to software failures and of these, 192 (79%) were caused by defects introduced after initial production and distribution

17 Maintainability Aspects Over the long term (the next 20 years?), it could be difficult to maintain the system Currently, depends far too much on knowledge in people’s head such as:  smart optimizations (in VHDL) not easily understandable,  various levels of testing BLMS team shares this concern:  seems to favor automation over documentation as a means of addressing this concern  However some documents produced such as “Management Procedures of the BLM System Settings”

18 Maintainability Aspects BLMS already close to the physical limits of some components (Surface FPGA). In this context, any change to the design carries some significant risk and will be challenging for the maintainers

19

20 Percentage of Logic used on FPGAs Approx. 85% for the Tunnel card (BL…) Approx 95% for the Surface card (BLETC)

21 Percentage of Logic used on FPGAs Using too much of the logic on an FPGA is not recommended because:  Heat related issues  Placing and routing become more difficult and performance  ‘Performance’ degradation: the output may become unreliable at the intended clock rate The suggested upper limit we have heard is 70%