1
Failure in the PATHFINDER Mission
Chandan Kumar EE 585: Fault Tolerant Computing
2
Outline
Background; simplified view of the H/W architecture; S/W architecture; the failure; the cause; the correction.
This is an outline of today's talk. I will cover the basic details of the Pathfinder mission and its objectives, then discuss in detail the failure that occurred and how it was corrected.
CHANDAN EE 585: Case Study Chandan
3
Background
Launched Dec 4, 1996. Landed July 4, 1997.
Mission objectives:
To prove that the development of "faster, better and cheaper" spacecraft is possible (with three years for development and a cost under US$ 150 million).
To show that it is possible to send a load of scientific instruments to another planet with a simple system and at one fifth the cost of a Viking mission.
Some important dates and the mission objectives: Pathfinder was launched aboard a Delta II rocket. After a seven-month journey it landed in Ares Vallis, in a region of Mars called Chryse Planitia, on 4 July 1997.
4
Background Contd.
To demonstrate NASA's commitment to low-cost planetary exploration by finishing the mission with a total expenditure of US$ 280 million, including the launch vehicle and mission operations.
To demonstrate the mobility and usefulness of a micro rover on the surface of Mars.
It carried a number of scientific instruments. Mars Pathfinder lander: Imager for Mars Pathfinder (IMP), which includes a magnetometer and an anemometer; atmospheric and meteorological sensors (ASI/MET).
This continues the objectives of the mission. The spacecraft carried a number of scientific instruments for performing various tests.
5
Background Contd.
Rover Sojourner: imaging system (three cameras: front B&W stereo, 1 rear color); laser striper hazard detection system; Alpha Proton X-ray Spectrometer (APXS); Wheel Abrasion Experiment; Material Adherence Experiment; accelerometers; potentiometers.
Final transmission in September 1997. 16,500 images were sent from the lander and 550 from the rover, along with 15 analyses of rocks.
This is an overview of the instruments on the rover.
6
Simplified view of Hardware Architecture
Single CPU: controls the spacecraft and resides on a VME bus.
Interface cards for the radio and the camera.
Interface to the 1553 bus. The 1553 bus connects to the 'cruise' and 'lander' stages.
H/W on the cruise stage: controls thrusters, etc.
H/W on the lander: interface to instruments such as the accelerometer, radar altimeter and ASI/MET.
This is a basic overview of the hardware architecture of the Pathfinder. It can be viewed as a CPU residing on a VME bus, which is a type of bus architecture. The CPU is interfaced to the other parts of the spacecraft as listed above.
7
The Software Architecture
The S/W controlling the 1553 bus was implemented in 2 tasks. The first task controlled the setup of transactions on the 1553 bus (called the bus scheduler or bc_sched task), and the second task handled the collection of the transaction results, i.e. the data. The second task is referred to as the bc_dist (for distribution) task. A typical timeline of the bus activity for a single cycle is shown below. It is not to scale. This cycle was constantly repeated.

    |<------------------ 0.125 seconds ------------------->|
    |<***************|                    |********|   |**>|
                     |<- bc_dist active ->|
                                                   |<->|  bc_sched active
    |<- bus active ->|
    |----------------|--------------------|--------|---|---|
    t1               t2                   t3       t4  t5  t1

The *** are periods when tasks other than the ones listed are executing. There is some idle time.
t1 - bus hardware starts via hardware control on the 8 Hz boundary. The transactions for this cycle had been set up by the previous execution of the bc_sched task.
t2 - 1553 bus traffic is complete and the bc_dist task is awakened.
t3 - bc_dist task has completed all of the data distribution.
t4 - bc_sched task is awakened to set up transactions for the next cycle.
t5 - bc_sched activity is complete.
The bc_sched task is the highest priority task in the system (except for the VxWorks "tExec" task). The bc_dist task is third highest (a task controlling the entry and landing is second). All of the tasks which perform other spacecraft functions are lower. Science functions, such as imaging, image compression, and the ASI/MET task, are lower still.
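The cycle timing described above can be sketched in a few lines of Python. This is only an illustration: the function name cycle_is_valid and all the timestamps are made up for the example, and only the 8 Hz rate and the ordering of t1 to t5 come from the slide.

```python
# Sketch of one 1553 bus cycle. Timestamps are invented example values.
CYCLE_HZ = 8                     # bus hardware starts on the 8 Hz boundary
CYCLE_PERIOD_S = 1 / CYCLE_HZ    # 0.125 s per cycle


def cycle_is_valid(t1, t2, t3, t4, t5):
    """Return True if the five events occur in order within one cycle.

    t1: bus starts, t2: bus traffic done and bc_dist wakes,
    t3: bc_dist done, t4: bc_sched wakes, t5: bc_sched done.
    """
    in_order = t1 <= t2 <= t3 <= t4 <= t5
    fits_in_cycle = t5 < t1 + CYCLE_PERIOD_S
    return in_order and fits_in_cycle


# A healthy cycle: bc_dist finishes (t3) before bc_sched wakes (t4).
healthy = cycle_is_valid(0.000, 0.040, 0.090, 0.110, 0.120)
# A failing cycle: bc_dist is still running when bc_sched wakes.
late = cycle_is_valid(0.000, 0.040, 0.115, 0.110, 0.120)
print(CYCLE_PERIOD_S, healthy, late)
```

The second call models the situation that matters for the rest of the talk: t3 slipping past t4.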
8
The Failure: The spacecraft began experiencing total system resets.
This reset reinitializes all of the hardware and software. It also terminates the execution of the current ground-commanded activities. The remainder of the activities for that day were not accomplished until the next day.
This is what actually happened to the system, i.e. the effect of the failure.
9
The Cause
The failure: a case of priority inversion.
In scheduling, priority inversion is the scenario where a low priority task holds a shared resource that is required by a high priority task. This causes the execution of the high priority task to be blocked until the low priority task has released the resource, effectively "inverting" the relative priorities of the two tasks. If some other medium priority task becomes ready to run in the interim, it preempts the low priority task and so takes precedence over both the low priority task and the blocked high priority task.
Priority inversion was the root cause of the failure, so the basic idea is introduced here.
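A small worked calculation may help show why the medium priority task matters. The sketch below assumes a simplified model in which the high priority task waits for the rest of the low priority task's critical section plus all the CPU time taken by medium priority tasks that preempt the holder. The function name and the numbers (in arbitrary time units) are hypothetical.

```python
def high_task_wait(critical_section_left, medium_bursts):
    """Time the high priority task waits for the shared resource.

    critical_section_left: CPU time the low priority holder still needs
        before it releases the resource.
    medium_bursts: CPU times of medium priority tasks that preempt the
        low priority holder while the high priority task is blocked.
    """
    return critical_section_left + sum(medium_bursts)


# No medium priority interference: the wait is bounded by the critical section.
print(high_task_wait(2, []))            # 2
# Medium priority tasks preempt the holder: the wait grows with their work.
print(high_task_wait(2, [10, 15, 5]))   # 32
```

Without interference the delay is short and predictable. With it, the delay depends on tasks that have nothing to do with the shared resource, which is what makes the inversion dangerous for a hard deadline.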
10
The Cause Contd.
The failure was identified by the spacecraft as a failure of the bc_dist task to complete its execution before the bc_sched task started.
Information is delivered to the ASI/MET task via an interprocess communication (IPC) mechanism based on pipes.
The higher priority bc_dist task was blocked by the much lower priority ASI/MET task, which was holding a shared resource.
This slide covers how the failure arose from the IPC used by the Atmospheric Structure Instrument/Meteorology (ASI/MET) package.
11
The Cause contd.
The resource that caused this problem was a mutual exclusion semaphore used within the select() mechanism. The ASI/MET task had acquired this resource and had then been preempted by several of the medium priority tasks. The bc_dist task attempted to send the newest ASI/MET data via the IPC mechanism, which wrote to the pipe. The pipe write blocked while trying to take the semaphore.
In short, the IPC used by the ASI/MET task took a mutual exclusion semaphore, and the task was then preempted by several medium priority tasks (higher in priority than ASI/MET, lower than bc_dist).
12
The Cause contd.
The medium priority tasks ran, still not allowing the ASI/MET task to run, until the bc_sched task was awakened. At that point, the bc_sched task determined that the bc_dist task had not completed its cycle (a hard deadline in the system) and declared the error that initiated the reset.
Because ASI/MET could not run and release the semaphore, the bc_dist task had not completed by the time the scheduling for the next cycle had to be done. As a result an error was declared and the system was reset.
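The check that bc_sched performs when it wakes can be sketched as below. This is not the flight code: CycleOverrun, bc_sched_wakeup and run_cycles are hypothetical names, and the per-cycle flags stand in for whatever state the real system inspected.

```python
class CycleOverrun(Exception):
    """Raised when bc_dist has not finished by the time bc_sched wakes."""


def bc_sched_wakeup(bc_dist_done):
    """Hypothetical check run at the start of each bc_sched activation."""
    if not bc_dist_done:
        # Hard deadline missed: declare the error that triggers the reset.
        raise CycleOverrun("bc_dist did not complete its cycle")
    return "transactions set up for next cycle"


def run_cycles(bc_dist_done_flags):
    """Run cycles until one overruns; return (cycles completed, reset?)."""
    completed = 0
    for done in bc_dist_done_flags:
        try:
            bc_sched_wakeup(done)
        except CycleOverrun:
            return completed, True   # total system reset
        completed += 1
    return completed, False


print(run_cycles([True, True, False, True]))   # (2, True)
```

The fourth cycle in the example never runs, which mirrors how the reset ended the ground-commanded activities for the day.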
13
Correction
Changing the creation flags for the semaphore to enable priority inheritance.
Modifying the semaphore associated with the pipe used for communication from the bc_dist task to the ASI/MET task corrected the problem.
This slide covers how the correction of the code was carried out by the Wind River people.
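The effect of priority inheritance can be illustrated with a toy tick-based scheduler. This is a simulation sketch, not VxWorks code: the task names mirror the slides, but the priorities, release times, run lengths and the 8-tick deadline are all invented for the example.

```python
def is_blocked(task, owner):
    """True if the task's next op needs the mutex while another task holds it."""
    needs_mutex = task["prog"][task["pc"]] == "c"
    return needs_mutex and owner is not None and owner is not task


def simulate(inherit):
    """Tick-based fixed-priority scheduler with one mutex.

    Returns a dict mapping task name to the tick at which it finished.
    """
    # (name, base priority, release tick, program). Higher number = higher
    # priority; 'c' = one tick in the critical section, 'w' = ordinary work.
    # Every number here is illustrative.
    specs = [
        ("asi_met", 1, 0, "cccc"),
        ("medium", 2, 2, "w" * 10),
        ("bc_dist", 3, 1, "cw"),
    ]
    tasks = [dict(name=n, base=p, release=r, prog=g, pc=0)
             for n, p, r, g in specs]
    owner = None   # task currently holding the mutex
    finish = {}
    tick = 0
    while len(finish) < len(tasks):
        ready = [t for t in tasks
                 if t["release"] <= tick and t["name"] not in finish]
        waiting = [t for t in ready if is_blocked(t, owner)]
        runnable = [t for t in ready if not is_blocked(t, owner)]
        # With inheritance, the owner runs at the priority of its
        # highest priority waiter.
        boost = max((w["base"] for w in waiting), default=0) if inherit else 0

        def priority(t):
            return max(t["base"], boost) if t is owner else t["base"]

        if runnable:
            t = max(runnable, key=priority)
            if t["prog"][t["pc"]] == "c":
                owner = t   # take (or keep) the mutex
            t["pc"] += 1
            done = t["pc"] == len(t["prog"])
            # Release the mutex on leaving the critical section.
            if owner is t and (done or t["prog"][t["pc"]] != "c"):
                owner = None
            if done:
                finish[t["name"]] = tick + 1
        tick += 1
    return finish


DEADLINE = 8   # hypothetical hard deadline for bc_dist, in ticks
for inherit in (False, True):
    end = simulate(inherit)["bc_dist"]
    print(f"inherit={inherit}: bc_dist done at tick {end},",
          "deadline met" if end <= DEADLINE else "deadline missed")
```

With inheritance off, the medium task runs ahead of asi_met while bc_dist waits, so bc_dist finishes late. With inheritance on, asi_met temporarily runs at bc_dist's priority, leaves its critical section quickly, and bc_dist meets the deadline.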
14
S/W modification on the spacecraft
Patching is a specialised process. Send the difference between what you have onboard and what you want on the spacecraft. S/W on the spacecraft then modifies the onboard copy.
Once the code has been corrected, the S/W onboard must be updated as well, and the patching process is used for this. We send the difference between the two versions, i.e. the onboard code and the corrected code, and the S/W onboard applies the change.
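The idea of uplinking only the difference can be sketched with Python's standard difflib module. This is a loose analogy rather than the actual patch format: make_patch and apply_patch are hypothetical helpers, and the "software images" are short lists of made-up lines instead of binary flight software.

```python
from difflib import SequenceMatcher


def make_patch(onboard, corrected):
    """Ground side: list only the edits that turn onboard into corrected."""
    ops = SequenceMatcher(a=onboard, b=corrected, autojunk=False).get_opcodes()
    # Keep just the changed regions, with the replacement content for each.
    return [(i1, i2, corrected[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_patch(onboard, patch):
    """Spacecraft side: rewrite the onboard copy using the uplinked patch."""
    result = list(onboard)
    # Apply from the end so that earlier indices stay valid.
    for i1, i2, replacement in reversed(patch):
        result[i1:i2] = replacement
    return result


# Hypothetical images as lists of lines.
onboard = ["init()", "sem = make_mutex()", "run()"]
corrected = ["init()", "sem = make_mutex(inherit=True)", "run()"]

patch = make_patch(onboard, corrected)
print(patch)
print(apply_patch(onboard, patch) == corrected)
```

Only the changed region travels in the patch, which is the point of the approach when the uplink is slow and the onboard copy is large.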
15
Questions?
16
References http://mars.jpl.nasa.gov/missions/past/pathfinder.html