Why Do Airplanes Crash? An Open Source Air Data Inertial Reference Unit Investigation *** 2012 PSU/Galois Capstone Project Chris Andrews, Trang Nguyen,


Why Do Airplanes Crash? An Open Source Air Data Inertial Reference Unit Investigation *** 2012 PSU/Galois Capstone Project Chris Andrews, Trang Nguyen, Mark Craig, Kayla Seliner

Presentation: Air Data Inertial Reference Unit
- Our Project: building an open source ADIRU
- Overview: what is an ADIRU?
- Motivation: why are they important?
- Fault Tolerance: types of faults
- Approach: voting methods
- Design: hardware and software architecture
- Results
- Conclusion

Project Goals
- Construct a small, low-power ADIRU system to deploy on an RC aircraft.
- Implement a Byzantine fault tolerant algorithm on a system of multiple microprocessors (voters) and sensors.
- Use input from multiple sensors, including gyroscopes, accelerometers, GPS, and airspeed sensors.
- Use open source hardware and software when possible.

Air Data Inertial Reference Units are an essential component in modern avionics systems
- The ADIRU system collects and processes sensor values from accelerometers, gyroscopes, altimeters, and airspeed indicators, and functions as the single source of sensor data aboard the aircraft.
- Many commercial aircraft, including the Airbus A330 and Boeing 777, use ADIRUs as part of their avionics suite.
- The ADIRU may itself be triple redundant.
- The ADIRU system replaces earlier fault tolerant triple modular redundant systems.
- Autopilot and unstable flight regimes depend on valid, uninterrupted sensor data for safe flight.

TMR vs. ADIRU

Benefits of ADIRU Systems
- Redundancy: redundant sensors make the system less vulnerable to a single sensor failure.
- Modularization and fault containment: the ADIRU system is the single source of sensor data for all the cockpit instruments and avionics software on the aircraft.
- Deferred maintenance: some systems preserve enough margin of safety to keep operating with a small number of faulty components, avoiding expensive emergency repairs.

ADIRU Vulnerabilities
- System complexity
- Closed source, proprietary system

Triple Modular Redundant System
- Votes on the outputs of three redundant sensors.
- System can tolerate a single sensor fault.
- Relatively simple to implement and diagnose.
Byzantine Fault Tolerant System
- At least 4 different voters, each with a sensor.
- Tolerates a fault in a sensor or in a voter.
- F faults require 3F+1 voters with sensors.
- Requires a complex voting algorithm.
- Can survive a class of faults not dealt with by TMR.
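The TMR voting step can be sketched in a few lines. This is only an illustration, assuming a mid-value-select (median) voter, which is a common choice for numeric sensor readings; the slides do not say which voter the project used, and the name `tmr_vote` is hypothetical.

```python
def tmr_vote(a, b, c):
    """Return the median of three redundant sensor readings (assumed voter)."""
    return sorted((a, b, c))[1]


# One sensor stuck at a wild value is outvoted by the two healthy ones.
print(tmr_vote(101.2, 100.9, 9999.0))  # 101.2
```

A median voter masks one arbitrary bad reading out of three, but it assumes the voter itself is healthy and that every consumer sees the same three inputs.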

ADIRU failures are critical events, with serious consequences if the aircraft is not in a visual flight mode. [1] Retrieved May 24, 2012 from uv2Uml0vQjTVrW6z0zXHMhr6MdlZkyQJhHD5D5h_2vwZA

Air France Flight 447
Air France Flight 447 left Rio de Janeiro for Paris on May 31, 2009, and crashed into the Atlantic Ocean early on June 1, killing all 228 people on board.
Retrieved June 1, 2012 from: cdn.blogs.sheknows.com/ thewire.sheknows.com/2011/05/airfrance447.jpg

Sequence of Events Leading To Crash
- Corrupted Sensor Data: pitot tubes blocked with ice transmit Byzantine faults to the ADIRU.
- Loss of Control: the autopilot disengages. The flight crew receives erratic and inconsistent airspeed data and stalls the aircraft.
- No Recovery: the flight crew fails to recover from the stall because it cannot determine the actual airspeed. The flight computer does not restart. The aircraft free falls into the Atlantic.

Qantas Flight 72
On October 7, 2008, Qantas Flight 72, en route from Singapore to Perth, suffered a malfunction in the ADIRU and flight computer that caused a series of rapid descents, throwing passengers and crew about the cabin.

Sequence of Events Leading To Incident
- ADIRU failure: a "spiky" series of measurements from the angle of attack sensors, which measure aircraft pitch in relation to airflow, exposed a vulnerability in the ADIRU software. One of the ADIRU units outputs bad data to the flight controller.
- Flight computer software failure: the flight computer, under autopilot, fails to filter the bad data and executes an abrupt dive of -0.8G.
- The flight crew disengages the autopilot and makes an emergency landing at Learmonth, Western Australia.

How ADIRU Systems Fail
- Failure of the ADIRU may be intermittent and cause cockpit instrumentation to send contradictory warnings (stall and high speed).
- The ADIRU is the root of all sensor data for flight avionics.
- A failure in the ADIRU can instantly propagate throughout the flight control system.
- Failures of the ADIRU system affect both autopilot and manual flight modes.

Multiple Sources of Failure
- Human Causes: deferred maintenance can cause errors to accumulate until the ADIRU system fails.
- Environmental Causes: ADIRU systems interface with physical sensors outside the cabin that can be affected by ice and other environmental conditions.
- Software: software may hide bugs that appear only under anomalous conditions.
- Most accidents have multiple causes.

Types of Faults
- Fail Silent: the system fails to send data. This fault is masked by a redundant system.
- Byzantine Failure: the system sends arbitrary data, including different data to different controllers. This fault cannot be masked by simple redundancy.
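The difference between the two fault types can be illustrated with a toy setup. The sketch below assumes three redundant sources feeding two controllers, each of which takes the median of whatever arrived; the values and the `controller_view` helper are hypothetical and are not taken from the project.

```python
from statistics import median


def controller_view(readings):
    """Median of the readings that actually arrived (None = nothing sent)."""
    arrived = [r for r in readings if r is not None]
    return median(arrived)


# Fail silent: source 3 sends nothing to either controller.
silent_a = controller_view([100.0, 102.0, None])
silent_b = controller_view([100.0, 102.0, None])

# Byzantine: source 3 sends a different value to each controller.
byz_a = controller_view([100.0, 102.0, 0.0])
byz_b = controller_view([100.0, 102.0, 1000.0])

print(silent_a == silent_b)  # True: both controllers see 101.0
print(byz_a, byz_b)          # 100.0 102.0: the controllers now disagree
```

With the silent fault both controllers still reach the same value. With the Byzantine fault each controller runs the same voter yet ends up acting on a different value, which is the situation simple redundancy cannot mask.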

Project Requirements
- Exhibit Byzantine and fail silent fault tolerance
- Include fault injection
- Must be able to mask faults
- System must be expandable
- Must follow open source guidelines

Build a four-way redundant network of Arduino microcontrollers polling gyroscopes and accelerometers. Connect the modules over an I2C bus.

Features:
- I2C and power bus
- Environmental enclosure
- Separate board for power supply


Reasons For Choosing Arduino
- Open source hardware and software
- Large community of developers
- Libraries for I2C communication already exist
- Lowest hardware entry cost to develop a multi-module fault tolerant system
- Quickest start time (no hardware development necessary)

Arduino ArduIMU+V3
Features:
- ATmega328 microprocessor
- 3D accelerometer and 3D gyroscope
- 3D magnetometer


Software Algorithm
- Clock synchronization
- Multi-master I2C bus
- Byzantine algorithm
- Fault injection


- Safety-critical systems should be able to handle failures of one or more of their components and continue to operate correctly.
- Byzantine faults consist of one or more components or subsystems sending inconsistent data to other components and subsystems.
- Handling this type of failure is known as the Byzantine Generals Problem.

- A solution to the Byzantine Generals Problem guarantees fault tolerant behavior under the following premises:
  - All loyal generals decide upon the same plan of action.
  - A small number of traitors cannot cause the loyal generals to adopt a bad plan.
  - More than 2/3 of the generals must be loyal.
  - There must be 3N + 1 generals to handle N traitors.
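The 3N + 1 bound is easy to work through numerically. The helper names below are hypothetical; the arithmetic is just the bound stated above, applied in both directions.

```python
def min_generals(traitors):
    """Smallest group that can tolerate the given number of traitors."""
    return 3 * traitors + 1


def max_traitors(generals):
    """Largest number of traitors a group of this size can tolerate."""
    return (generals - 1) // 3


for n in (3, 4, 7):
    print(n, "generals tolerate", max_traitors(n), "traitor(s)")
# 3 generals tolerate 0 traitor(s)
# 4 generals tolerate 1 traitor(s)
# 7 generals tolerate 2 traitor(s)
```

This is why a four-module network is the smallest configuration that can tolerate a single Byzantine fault, while three modules tolerate none.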

- The general sends a command to the N-1 lieutenants.
- All loyal lieutenants obey the same command.
- If the general is loyal, then every loyal lieutenant obeys the command he sends.
- Each lieutenant relays the command it received from the general to every other lieutenant.
- Each lieutenant reaches a decision based on a majority vote of the commands received from the general and all other lieutenants.
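The steps above correspond to a single round of the oral-messages algorithm, and can be simulated as follows. This is a sketch under stated assumptions, not the project's firmware: the traitor behaviour (a traitorous general alternates orders, a traitorous lieutenant flips what it relays), the tie-break default of "RETREAT", and the function names are all hypothetical.

```python
from collections import Counter


def majority(values, default="RETREAT"):
    """Most common value; fall back to a default on a tie (assumed rule)."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return default
    return counts[0][0]


def flip(order):
    return "RETREAT" if order == "ATTACK" else "ATTACK"


def om1(order, lieutenants, traitor=None):
    """One round of oral messages.

    `traitor` is "G" (the general), a lieutenant index, or None.
    Returns the decision of each loyal lieutenant.
    """
    # Step 1: the general sends an order to every lieutenant.
    # A traitorous general sends conflicting orders.
    received = {
        i: flip(order) if (traitor == "G" and i % 2) else order
        for i in range(lieutenants)
    }
    decisions = {}
    for i in range(lieutenants):
        if i == traitor:
            continue  # only the loyal lieutenants' decisions matter
        votes = [received[i]]
        # Step 2: collect what every other lieutenant relays.
        for j in range(lieutenants):
            if j == i:
                continue
            relayed = received[j]
            if j == traitor:
                relayed = flip(relayed)  # a traitorous lieutenant lies
            votes.append(relayed)
        # Step 3: decide by majority vote.
        decisions[i] = majority(votes)
    return decisions


print(om1("ATTACK", 3, traitor=2))    # {0: 'ATTACK', 1: 'ATTACK'}
print(om1("ATTACK", 3, traitor="G"))  # {0: 'ATTACK', 1: 'ATTACK', 2: 'ATTACK'}
print(om1("ATTACK", 2, traitor=1))    # {0: 'RETREAT'}
```

With four participants (a general and three lieutenants) the loyal lieutenants agree in both traitor scenarios. With only three participants, the single loyal lieutenant cannot tell who is lying and ends up disobeying a loyal general.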


- Sensor reads are interrupt driven.
- Clocks must be synchronized across all modules to ensure an "apples-to-apples" comparison of sensor values.
- The variable used to synchronize all modules is the Timer/Compare interrupt counter.
- By ensuring the counter is the same on all modules, we can ensure that the interrupt that drives sensor reads occurs at the same time in all modules.

- One module is dedicated to synchronizing the clocks of all other modules.
- Accuracy of clock synchronization is determined by the timer interrupt clock speed and is approximately:
- Timing of clock synchronization cycles is set so that each device is synchronized to the master every few data cycles.
- This helps to ensure tight synchronization and lessens interference with data processing.
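The effect of resynchronizing every few data cycles can be illustrated with a small simulation. It is only a sketch: the drift values, the 100 ticks per data cycle, and the `simulate` function are hypothetical, and on the real hardware the counter would be updated in a timer interrupt and exchanged over the I2C bus.

```python
def simulate(drifts, cycles, sync_every, ticks_per_cycle=100):
    """Return the worst counter disagreement between the master and any module.

    `drifts` gives each module's error in ticks per data cycle (assumed values).
    """
    master = 0
    modules = [0] * len(drifts)
    worst = 0
    for cycle in range(1, cycles + 1):
        master += ticks_per_cycle
        modules = [m + ticks_per_cycle + d for m, d in zip(modules, drifts)]
        worst = max(worst, max(abs(m - master) for m in modules))
        if cycle % sync_every == 0:
            modules = [master] * len(modules)  # adopt the master's counter
    return worst


print(simulate([1, -2, 0], cycles=20, sync_every=4))     # 8
print(simulate([1, -2, 0], cycles=20, sync_every=1000))  # 40 (never resynced)
```

Resynchronizing every four cycles caps the disagreement at four cycles' worth of drift, whereas without resynchronization the error keeps growing for the whole run.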


- The output displays the original clock value, the clock value from the master, the offset, and the new clock value.
- The offset is "0" because a delay of "1" was calculated.
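One way the four displayed values could relate is sketched below. The formula is an assumption, not taken from the project's code: it supposes the module adds an estimated transmission delay to the master's value before comparing it with its own counter. The function name and the sample numbers are hypothetical.

```python
def resync(local_clock, master_clock, delay):
    """Return (offset, new_clock) under the assumed delay-compensated scheme."""
    expected = master_clock + delay  # master's counter once the message arrives
    offset = expected - local_clock
    return offset, local_clock + offset


offset, new_clock = resync(local_clock=501, master_clock=500, delay=1)
print(501, 500, offset, new_clock)  # 501 500 0 501
```

Under this reading, a module that is one tick ahead of the transmitted master value is already in step once the one-tick delay is accounted for, so the offset comes out as 0.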

Results
- The system exhibits Byzantine fault tolerance. A system that is BFT requires 3F+1 voters.
- The system masks fail silent faults.

Budget

Conclusion
- Our system exhibits basic fault tolerant functionality.
- It demonstrates the feasibility of an open source fault tolerant project.

Further Work
- Integrate GPS, magnetometer, altimeter, and other sensors into the system.
- Implement Kalman filters in the software to smooth out sensor noise.
- Gather real data by launching the system aboard a vehicle.

Lessons Learned
- Interrupt routines on microcontrollers
- Debugging methods
- Code development: algorithm > Python > C
- How to organize a large project involving hardware and software
- Documentation

Acknowledgements
We would like to thank our sponsors: Dr. Lee Pike and Galois Inc. We also acknowledge the help of our advisor: Dr. Christof Teuscher, Portland State University.
