Published by Myra Kelley. Modified over 9 years ago.
1
2004 MAPLD: Aerospace Mishaps and Lessons Learned
2004 MAPLD International Conference
Washington, D.C.
September 7, 2004
2
"... most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices."
Nancy Leveson in Safeware, 1995.
3
Seminar Program

Time   Speaker               Affiliation                  Mishap Title
9:00   Richard Katz          NASA Office of Logic Design  Introduction
9:15   Faith Chandler        NASA HQ                      Using Root-Cause Analysis to Understand Failures
10:00  Jonathan F Binkley    Aerospace Corp.              The Space System Engineering Database (SSED)
10:45  BREAK
11:00  Owen Brown            DARPA                        Apollo 13 Mishap
12:00  Kathryn Anne Weiss    MIT                          An Analysis of Causation in Aerospace Accidents
12:45  LUNCH
1:30   Susan C. Lee          JHU/APL                      The Near Earth Asteroid Rendezvous (NEAR) Rendezvous Burn Anomaly
2:45   Rick Obenschain       NASA GSFC                    SEASAT: Lessons Learned and Not Learned
3:30   BREAK
3:45   Keith E. Van Tassel   NASA JSC                     STS-86/SAFER
4:30   Paul Cheng            Aerospace Corp               Aerospace 100 Questions That Should Be Asked During Technical Reviews
5:15   Keith Avery           Mission Research Corp.       STRV-1c/1d Mishap
6:00   SESSION ENDS
4
Training vs. Education
The NASA Office of Logic Design works to educate design engineers, not train them.
–Training promotes rote responses.
–Education promotes thinking and the ability to adapt to and cope with new situations.
Hence, MAPLD hosts seminars and not training sessions.
5
Design Seminars
These case studies are real and are not contrived examples.
Many of the leaders have first-hand knowledge of these mishaps.
Contribute: discuss the topics presented, disagree with them, or present interesting cases you wish to share, additional lessons, or alternative viewpoints.
Do not sit there quietly and expect to be treated like a cocker spaniel being trained and drilled to emit Pavlovian responses to stimuli (bell for dogs, donuts for engineers).
6
Material
Material will be made available on
–CD-ROM
–Hardcopy
–klabs.org
All material is in the public domain; you may use it as you wish.
7
I Was Reading AW&ST …
Aviation Week & Space Technology, August 23/30, 2004, pp. 29-30
8
Barto's Law: Every circuit is considered guilty until proven innocent.
9
A Recent Mishap (that gave me the idea for this seminar)
10
Background
Popular single board computer
Everything was working fine
Ran vibration test
–Unpowered and unmonitored
Subsequently failed to boot intermittently
–Testing at the manufacturer also showed intermittent failures, although at a lower rate than observed at the contractor.
11
Project’s Corrective Action
Unit (S/N 031) pulled from the flight instrument
New unit (S/N 034) installed in the flight instrument
Repeated testing with the new unit was successful
Signed off, ready for launch
12
Risk Reduction Effort
Reviewed problem/failure report
–No root cause or failure mechanism identified
–Conclusion of the Verification and Analysis Section stated:
  "… Each time there was a failure to boot, the power was cycled and the computer subsequently rebooted. The result of the testing at XXXXXX was that the most probable cause of the boot failure was a workmanship issue specific to SN034 and is not endemic to the XXXXXXXX computer and therefore does not affect SN031."
–No direct or indirect evidence given in the “Verification and Analysis” section to support a workmanship issue.
–No analysis given to show that the workmanship problem was not systemic to all units. Since the unit is clearly marginal and difficult to make fail, it has not been shown that other units have sufficient margin to support operation in all operating environments over the design life of the unit.
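The point about a unit that is "difficult to make fail" can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes each boot fails independently with a fixed probability, and the 2% rate is a hypothetical number, not a figure from this mishap. It shows how likely a run of clean boots is even when the defect is still present, and how many boots would be needed before a clean run means much.

```python
import math


def prob_all_boots_succeed(p_fail: float, n_boots: int) -> float:
    """Probability that n_boots independent boots all succeed
    when each boot fails with probability p_fail (assumed constant)."""
    return (1.0 - p_fail) ** n_boots


def boots_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of boots for which at least one failure would be
    observed with the given confidence, if the defect is really present."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    p = 0.02  # hypothetical intermittent failure rate per boot
    for n in (10, 50, 100):
        print(f"{n} clean boots occur with probability "
              f"{prob_all_boots_succeed(p, n):.3f} even with the defect present")
    print("boots needed for 95% confidence:", boots_needed(p, 0.95))
```

Under these assumptions, a few dozen successful boots on a replacement unit say little about whether the same mechanism is present in it.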
13
Risk Reduction Effort
–Note: the “analyst” consistently remarks that after a failed boot the next power cycle results in correct operation of the board. Yet the board fails multiple times. This is evidence of the “PC mentality” seen in many Projects where, when there is a problem, the solution is to switch the power off and back on to “correct it.”
–Contractor and Project claimed repeatedly that the unit was troubleshot and nothing more could be done.
14
Let’s Take a Closer Look
Examination of failures at manufacturer
–The failures reported were a result of test equipment; there were zero failures detected at the manufacturer
–The claim of intermittent operation of the computer could not be supported.
Electrical environment suspicion grows
–“What if” analysis results in a large number of possible failure mechanisms
15
Let’s Take a Closer Look
Examination of troubleshooting at contractor
–Previously claimed fully troubleshot
–Examination shows that no oscilloscope probe ever touched the board
  Examined at interface points only
–Throughout the organization “failures to boot” were routine
  Many failure reports written over many units.
–Contractor did not use the available diagnostic signals and port to ascertain the status of the CPU and computer
16
Troubleshooting Again
Contractor fought hard to prevent it
–Stalled effort for many months
Initial examination showed that the protection signals for the EEPROM memories did not behave as predicted by the analysis
–Contractor would not show the analysis
Examination of diagnostic signals quickly showed that the CPU had halted
17
Troubleshooting Results
Cause of failure determined
–Known issue with pipeline timing
–Software service routines not installed to handle all conditions
–Project previously had assured the independent review that software was installed to handle all conditions
Did not fail at the manufacturer, since the test software installed there properly handled the interrupt from the pipelining issue
No support for “a workmanship issue specific to SN034 …”
Flight software rewritten
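The difference between the manufacturer's test software and the flight software can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual board or its software: the ToyCpu class, the vector numbers, and the boot sequence are all hypothetical. It shows only the general mechanism, that a condition with no installed service routine halts the processor, while the same condition is harmless when a routine (or a default catch-all) is installed.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical vector numbers, chosen only for this illustration.
VEC_TIMER = 1
VEC_PIPELINE = 7

Handler = Callable[[int], None]


class ToyCpu:
    """Toy model of exception dispatch; not a model of any real processor."""

    def __init__(self, default_handler: Optional[Handler] = None) -> None:
        self.handlers: Dict[int, Handler] = {}
        self.default_handler = default_handler
        self.halted = False
        self.log: List[str] = []

    def install(self, vector: int, handler: Handler) -> None:
        self.handlers[vector] = handler

    def raise_exception(self, vector: int) -> None:
        if self.halted:
            return  # a halted CPU services nothing further
        handler = self.handlers.get(vector, self.default_handler)
        if handler is None:
            self.halted = True
            self.log.append(f"vector {vector}: no service routine, CPU halted")
            return
        handler(vector)
        self.log.append(f"vector {vector}: serviced")


def noop(vector: int) -> None:
    """Placeholder service routine: acknowledge the condition and return."""


def boot(cpu: ToyCpu) -> bool:
    """Run a hypothetical boot sequence; return True if the CPU is still running."""
    for vector in (VEC_TIMER, VEC_PIPELINE, VEC_TIMER):
        cpu.raise_exception(vector)
    return not cpu.halted


def make_partial_cpu() -> ToyCpu:
    """Software that handles only the expected condition."""
    cpu = ToyCpu()
    cpu.install(VEC_TIMER, noop)
    return cpu


def make_complete_cpu() -> ToyCpu:
    """Software that also handles the pipeline-timing condition."""
    cpu = make_partial_cpu()
    cpu.install(VEC_PIPELINE, noop)
    return cpu


if __name__ == "__main__":
    print("partial handlers boot ok: ", boot(make_partial_cpu()))
    print("complete handlers boot ok:", boot(make_complete_cpu()))
    print("default handler boot ok:  ", boot(ToyCpu(default_handler=noop)))
```

In a real system the unhandled condition arises only under particular timing, which is why the failure looks intermittent and why a power cycle appears to "fix" it.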
18
Lessons and Suggestions
Problem/Failure Reports
–Examine original documents.
–Request and examine all related P/FRs from all units
Provide direct evidence (at a minimum!) for determination of the cause of failure
–Intermittents after the vibration test led to the conclusion of a workmanship error; the “bad solder joint” was never identified
–“Failures” at the manufacturer reinforced the false conclusion, as those “failures” were not examined in detail and were a result of a testing error.
Do not conduct reviews in a board room with PowerPoint slides
–Pack up your oscilloscope and go into the lab
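The suggestion to examine all related P/FRs from all units can be sketched as a simple tally. The record layout, report IDs, serial numbers, and symptom strings below are all hypothetical, invented for illustration. The idea is only that a symptom reported against several different units is evidence against a cause "specific to" one serial number.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class FailureReport:
    report_id: str  # hypothetical P/FR identifier
    unit_sn: str    # hypothetical unit serial number
    symptom: str    # short symptom description


def units_per_symptom(reports: Iterable[FailureReport]) -> Dict[str, Set[str]]:
    """Map each symptom to the set of distinct units it was reported on."""
    units: Dict[str, Set[str]] = defaultdict(set)
    for report in reports:
        units[report.symptom].add(report.unit_sn)
    return dict(units)


def possibly_systemic(reports: Iterable[FailureReport],
                      min_units: int = 2) -> List[str]:
    """Symptoms seen on at least min_units different units."""
    by_symptom = units_per_symptom(reports)
    return sorted(s for s, sns in by_symptom.items() if len(sns) >= min_units)


if __name__ == "__main__":
    # Entirely made-up records, for illustration only.
    reports = [
        FailureReport("PFR-1", "SN-A", "failure to boot"),
        FailureReport("PFR-2", "SN-B", "failure to boot"),
        FailureReport("PFR-3", "SN-A", "connector damage"),
        FailureReport("PFR-4", "SN-C", "failure to boot"),
    ]
    print(possibly_systemic(reports))
```

A tally like this does not identify a cause; it only flags where a unit-specific explanation needs direct evidence before it is accepted.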
19
Enjoy your seminar!