ARIANE 5 FLIGHT 501 FAILURE
Garimella Sudheer
EE 585: Fault Tolerant Computing
INTRODUCTION
Ariane 5 is a launch vehicle developed by the European Space Agency (ESA). The maiden flight of the Ariane 5 launcher took place on 4 June 1996 from Kourou, French Guiana, and was the first flight of the Ariane 5 class. About 37 seconds into its ascent, at an altitude of about 3700 m, the launcher exploded. An enquiry board later submitted a report on the causes of the failure.
GENERAL DESCRIPTION
Weather conditions at launch time were acceptable. The launch was initiated at 09h 33mn 59s (local time), and the launcher behaved nominally for the first 36 seconds of flight. Failure of the backup inertial reference system, followed by failure of the active inertial reference system, led to the destruction of the launcher. Self-destruction was correctly triggered by the rupture of the links between the solid boosters and the core stage. Destruction occurred near the launch pad at an altitude of 4000 m, and debris was scattered over an area of 12 square kilometers east of the launch pad.
ANOMALIES OBSERVED
One major anomaly was the gradual development of variations in the hydraulic pressure of the actuators of the main engine nozzle. This anomaly was observed at H0 + 22 seconds, and the variations had a frequency of approximately 10 Hz.
[Image: the launch]
[Image: debris of the launcher after the explosion]
ANALYSIS OF FAILURE
In general, the flight control system of Ariane 5 is of a standard design. The attitude of the launcher and its movements in space are measured by an Inertial Reference System (SRI). The SRI has its own internal computer, in which angles and velocities are calculated on the basis of information from a “strap-down” inertial platform with laser gyros and accelerometers.
ANALYSIS OF FAILURE (CONT)
The data from the SRI are transmitted through a data bus to the on-board computer (OBC). The OBC executes the flight program and controls the nozzles of the solid boosters and the cryogenic Vulcain engine via servovalves and hydraulic actuators. The design of the SRI used in Ariane 5 is almost identical to that of Ariane 4.
At 36.7 seconds after H0 (approx. 30 seconds after lift-off), the computer within the backup inertial reference system, which was working on stand-by for guidance and attitude control, became inoperative. This was caused by an internal variable related to the horizontal velocity of the launcher exceeding a limit that existed in the software of this computer.
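Why the standby channel could not help can be sketched with a small Python model of a dual-channel (active plus hot-standby) reference system in which both channels run the same software. This is a hypothetical illustration, not flight code: the class and function names, the per-cycle structure, and the input values are all assumptions; only the 16-bit signed integer range comes from the failure analysis.

```python
INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1  # -32768 .. 32767


class OperandError(ArithmeticError):
    """Stand-in for the unhandled conversion error that halted an SRI computer."""


def shared_sri_software(bh: float) -> int:
    """Hypothetical software loaded identically into BOTH channels."""
    if not INT16_MIN <= bh <= INT16_MAX:
        raise OperandError(f"BH={bh} does not fit in a 16-bit signed integer")
    return int(bh)


class Channel:
    """One inertial reference channel; it stops for good after an error."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def step(self, bh: float) -> None:
        if not self.alive:
            return
        try:
            shared_sri_software(bh)
        except OperandError:
            # No local recovery: the channel simply stops working.
            self.alive = False


def simulate(bh_trace: list[float]) -> list[tuple[int, str]]:
    """Feed the same input to the standby and active channels every cycle.

    Returns (cycle, event) pairs recording when each channel stopped.
    """
    standby, active = Channel("standby"), Channel("active")
    events: list[tuple[int, str]] = []
    for cycle, bh in enumerate(bh_trace):
        for ch in (standby, active):
            was_alive = ch.alive
            ch.step(bh)
            if was_alive and not ch.alive:
                events.append((cycle, f"{ch.name} stopped"))
        if not (standby.alive or active.alive):
            events.append((cycle, "no valid channel left"))
            break
    return events


# Hypothetical BH trace: small values at first, then one that exceeds 32,767.
print(simulate([100.0, 20000.0, 40000.0]))
# [(2, 'standby stopped'), (2, 'active stopped'), (2, 'no valid channel left')]
```

Duplicating identical hardware and software protects against random hardware faults, but a design fault in the shared software trips both channels on the same input, so the redundancy gives no protection against it.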
ANALYSIS OF FAILURE (CONT)
Software specifications from Ariane 4 were reused in Ariane 5, but the Ariane 5 flight path was different and lay beyond the range for which the reused code had been designed. The disintegration was traced to a software exception inside the SRI computers, not to an exception in the on-board computer. This exception occurred during the execution of a data conversion from a 64-bit floating-point number to a 16-bit signed integer value. The value of the floating-point number was greater than what could be represented by a 16-bit signed integer, and the result was an operand error. The data conversion instructions (in Ada code) were not protected from causing operand errors, although other conversions of comparable variables in the same place in the code were protected.
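The difference between an unprotected and a protected conversion can be sketched in Python, whose float is a 64-bit IEEE 754 double. This is not the original Ada code: the function names, the OperandError class, the use of rounding, and the choice of saturation as the form of protection are all assumptions made for illustration.

```python
import math

INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1  # -32768 .. 32767


class OperandError(ArithmeticError):
    """Stand-in for the operand error raised by an out-of-range conversion."""


def to_int16_unprotected(x: float) -> int:
    """Convert a 64-bit float (a Python float) to a 16-bit signed integer.

    Nothing here handles an out-of-range value: the error is raised and
    propagates to whoever called this function.
    """
    if not math.isfinite(x):
        raise OperandError(f"{x} cannot be represented as int16")
    n = round(x)  # nearest integer; ties go to the even neighbour
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{x} cannot be represented as int16")
    return n


def to_int16_protected(x: float) -> int:
    """Protected variant: saturate at the int16 limits instead of failing."""
    if math.isnan(x):
        return 0  # assumption: map NaN to 0; a real system might flag it instead
    if x >= INT16_MAX:
        return INT16_MAX
    if x <= INT16_MIN:
        return INT16_MIN
    return round(x)


print(to_int16_protected(40000.0))   # 32767
print(to_int16_protected(-1e9))      # -32768
try:
    to_int16_unprotected(40000.0)
except OperandError as exc:
    print("unprotected:", exc)       # unprotected: 40000.0 cannot be represented as int16
```

Saturation keeps the computer running, but the clamped value is wrong; whether that is acceptable depends on how the value is used, so a real design would usually also report the out-of-range condition.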
COMMENTS
The primary cause of the failure was an operand error when converting the horizontal bias variable BH, combined with the lack of protection on this conversion, which caused the SRI computer to stop. Specifically, a 64-bit floating-point number relating to the horizontal velocity of the rocket with respect to the platform was converted to a 16-bit signed integer. The number was larger than 32,767, the largest value storable in a 16-bit signed integer, so the conversion failed. The operand error occurred because of an unexpectedly high value of an internal alignment function result, called BH (horizontal bias), which is related to the horizontal velocity sensed by the platform. This value is calculated as an indicator of alignment precision over time. BH was much higher than expected because the early part of the Ariane 5 trajectory differs from that of Ariane 4 and results in considerably higher horizontal velocity values.
OTHER WEAKNESSES
The enquiry board's review of other possible weaknesses in the design covered the following areas:
- The design of the electrical system
- Embedded on-board software in subsystems other than the inertial reference system
- The on-board computer and the flight program software
LESSONS FOR SOFTWARE ENGINEERING
- Test!
- Try to write code so that it cannot fail.
- Don't allow errors or exceptions to propagate in an uncontrolled manner.
- Reused code still needs to be tested.
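The third lesson can be illustrated with a minimal Python sketch of local exception containment: a wrapper catches the expected numeric errors from a non-critical computation, returns the last good value, and marks the result as unhealthy instead of letting the error stop the whole computer. ContainedComputation, Result, and to_int16 are hypothetical names, and returning the last good value is only one possible recovery policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional

INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1  # -32768 .. 32767


def to_int16(x: float) -> int:
    """Hypothetical conversion that raises on values outside the int16 range."""
    n = round(x)  # raises ValueError for NaN, OverflowError for +/-inf
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in int16")
    return n


@dataclass
class Result:
    value: Optional[int]  # last good value if the computation failed
    healthy: bool
    error: str = ""


class ContainedComputation:
    """Runs a computation and contains its errors locally."""

    def __init__(self, fn: Callable[[float], int]) -> None:
        self.fn = fn
        self.last_good: Optional[int] = None

    def __call__(self, x: float) -> Result:
        try:
            value = self.fn(x)
        except (ArithmeticError, ValueError) as exc:
            # Contain the error here and report it, rather than letting it
            # propagate and halt the caller.
            return Result(self.last_good, healthy=False, error=str(exc))
        self.last_good = value
        return Result(value, healthy=True)


bh = ContainedComputation(to_int16)
print(bh(100.0))     # Result(value=100, healthy=True, error='')
print(bh(40000.0))   # Result(value=100, healthy=False, error='40000.0 does not fit in int16')
```

The caller still sees that something went wrong through the healthy flag, so it can decide how to react, for example by ignoring a value that is no longer needed.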
REFERENCES
- ARIANE 5 Flight 501 Failure: Report by the Enquiry Board, chairman Prof. J. L. Lions.
- http://www.arianespace.com
- http://www.vuw.ac.nz/staff/stephen_marshall/SE/Failures/SE_Ariane.html
- http://www-aix.gsi.de/~giese/swr/ariane5.html