Aerospace Mishaps and Lessons Learned. 2004 MAPLD International Conference, Washington, D.C., September 7, 2004.

Presentation transcript:

"... most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices."
Nancy Leveson in Safeware, 1995.

Seminar Program

Time   Speaker              Affiliation                  Mishap Title
9:00   Richard Katz         NASA Office of Logic Design  Introduction
9:15   Faith Chandler       NASA HQ                      Using Root-Cause Analysis to Understand Failures
10:00  Jonathan F Binkley   Aerospace Corp.              The Space System Engineering Database (SSED)
10:45  BREAK
11:00  Owen Brown           DARPA                        Apollo 13 Mishap
12:00  Kathryn Anne Weiss   MIT                          An Analysis of Causation in Aerospace Accidents
12:45  LUNCH
1:30   Susan C. Lee         JHU/APL                      The Near Earth Asteroid Rendezvous (NEAR) Rendezvous Burn Anomaly
2:45   Rick Obenschain      NASA GSFC                    SEASAT: Lessons Learned and Not Learned
3:30   BREAK
3:45   Keith E. Van Tassel  NASA JSC                     STS-86/SAFER
4:30   Paul Cheng           Aerospace Corp.              Aerospace 100 Questions That Should Be Asked During Technical Reviews
5:15   Keith Avery          Mission Research Corp.       STRV-1c/1d Mishap
6:00   SESSION ENDS

Training vs. Education
The NASA Office of Logic Design works to educate design engineers, not train them.
  – Training promotes rote responses.
  – Education promotes thinking and the ability to adapt to and cope with new situations.
Hence, MAPLD hosts seminars and not training sessions.

Design Seminars
These case studies are real and are not contrived examples.
Many of the leaders have firsthand knowledge of these mishaps.
Contribute: discuss the topics presented, disagree with them, and present interesting cases you wish to share, additional lessons, or alternative viewpoints.
Do not sit there quietly and expect to be treated like a cocker spaniel being trained and drilled to emit Pavlovian responses to stimuli (bell for dogs, donuts for engineers).

Material
Material will be made available on:
  – CD-ROM
  – Hardcopy
  – klabs.org
All material is in the public domain; you may use it as you wish.

I Was Reading AW&ST …
Aviation Week & Space Technology, August 23/30, 2004, pp

Barto's Law: Every circuit is considered guilty until proven innocent.

A Recent Mishap (that gave me the idea for this seminar)

Background
Popular single-board computer
Everything was working fine
Ran vibration test
  – Unpowered and unmonitored
Subsequently failed to boot intermittently
  – Testing at the manufacturer also showed intermittent failures, although at a lower rate than observed at the contractor.

Project’s Corrective Action
Unit (S/N 031) pulled from the flight instrument
New unit (S/N 034) installed in the flight instrument
Repeated testing with the new unit was successful
Signed off, ready for launch

Risk Reduction Effort
Reviewed problem/failure report
  – No root cause or failure mechanism identified
  – Conclusion of the Verification and Analysis Section stated:
    "… Each time there was a failure to boot, the power was cycled and the computer subsequently rebooted. The result of the testing at XXXXXX was that the most probable cause of the boot failure was a workmanship issue specific to SN034 and is not endemic to the XXXXXXXX computer and therefore does not affect SN031."
  – No direct or indirect evidence given in the “Verification and Analysis” section to support a workmanship issue.
  – No analysis given to show that the workmanship problem was not systemic to all units. Since the unit is clearly marginal and is difficult to make fail, it has not been shown that other units have sufficient margin to support operation in all operating environments over the design life of the unit.

Risk Reduction Effort
  – Note: the “analyst” consistently remarks that after a failed boot the next power cycle results in correct operation of the board. Yet the board fails multiple times. This is evidence of the “PC mentality” seen in many Projects where, when there is a problem, the solution is to switch the power off and back on to “correct it.”
  – Contractor and Project claimed repeatedly that the unit was troubleshot and nothing more could be done.

Let’s Take a Closer Look
Examination of failures at manufacturer
  – The failures reported were a result of test equipment; there were zero failures detected at the manufacturer
  – The claim of intermittent operation of the computer could not be supported.
Suspicion of the electrical environment grows
  – “What if” analysis results in a large number of possible failure mechanisms

Let’s Take a Closer Look
Examination of troubleshooting at contractor
  – Previously claimed fully troubleshot
  – Examination shows that no oscilloscope probe ever touched the board
    Examined at interface points only
  – Throughout the organization, “failures to boot” were routine
    Many failure reports written over many units.
  – Contractor did not use the available diagnostic signals and port to ascertain the status of the CPU and computer

Troubleshooting Again
Contractor fought hard to prevent further troubleshooting
  – Stalled effort for many months
Initial examination showed that the protection signals for the EEPROM memories did not behave as predicted by the analysis
  – Contractor would not show the analysis
Examination of diagnostic signals quickly showed that the CPU had halted

Troubleshooting Results
Cause of failure determined
  – Known issue with pipeline timing
  – Software service routines not installed to handle all conditions
  – Project previously had assured the independent review that software was installed to handle all conditions
Did not fail at the manufacturer, since the test software installed there properly handled the interrupt from the pipelining issue
No support for “a workmanship issue specific to SN034 …”
Flight software rewritten
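
The finding above, that service routines were not installed for every condition, can be illustrated with a toy model. This is only a sketch under stated assumptions, not the flight or test software from this mishap: the processor, its vector count, and every name below (Cpu, NUM_VECTORS, install_all, default_handler, timer_handler) are hypothetical. It shows one common defensive practice, namely populating every vector with at least a default routine so an unexpected interrupt is recorded instead of halting the CPU.

```python
from typing import Callable, Dict, List

NUM_VECTORS = 8  # hypothetical vector count, chosen only for illustration

Handler = Callable[["Cpu", int], None]


class Cpu:
    """Toy model of a processor that halts on an unserviced interrupt."""

    def __init__(self) -> None:
        self.halted = False
        self.log: List[str] = []
        self.handlers: Dict[int, Handler] = {}

    def install(self, vector: int, handler: Handler) -> None:
        """Register a service routine for one vector."""
        self.handlers[vector] = handler

    def raise_interrupt(self, vector: int) -> None:
        """Dispatch an interrupt; with no routine installed, the CPU halts."""
        handler = self.handlers.get(vector)
        if handler is None:
            # Assumption of this model: an unserviced vector halts the CPU.
            self.halted = True
            self.log.append(f"vector {vector}: no handler, halted")
            return
        handler(self, vector)


def timer_handler(cpu: Cpu, vector: int) -> None:
    """Example of an expected, application-specific service routine."""
    cpu.log.append(f"vector {vector}: timer serviced")


def default_handler(cpu: Cpu, vector: int) -> None:
    """Catch-all routine: record the unexpected event and resume."""
    cpu.log.append(f"vector {vector}: unexpected, recorded and resumed")


def install_all(cpu: Cpu, specific: Dict[int, Handler]) -> None:
    """Fill every vector, falling back to the default routine."""
    for vector in range(NUM_VECTORS):
        cpu.install(vector, specific.get(vector, default_handler))
```

In this model, a Cpu with only timer_handler installed halts when vector 5 is raised, while one set up through install_all logs the event and keeps running. The model also suggests why software that services every vector would pass on the same hardware where software with gaps fails.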

Lessons and Suggestions
Problem/Failure Reports
  – Examine original documents.
  – Request and examine all related P/FRs from all units
Provide direct evidence (at a minimum!) for determination of the cause of failure
  – Intermittent failures after the vibration test led to the conclusion of a workmanship error; the “bad solder joint” was never identified
  – “Failures” at the manufacturer reinforced the false conclusion, as those “failures” were not examined in detail and were the result of a testing error.
Do not conduct reviews in a board room with PowerPoint slides
  – Pack up your oscilloscope and go into the lab

Enjoy your seminar!