Failure in the PATHFINDER Mission


Failure in the PATHFINDER Mission
Chandan Kumar
EE 585: Fault Tolerant Computing

Outline
Background
Simplified view of H/W architecture
S/W architecture
Failure
Cause
Correction
This is an outline of today's talk. I will cover the basic details of the Pathfinder mission and its objectives, and then discuss in detail the failure that occurred and how it was corrected.
CHANDAN EE 585: Case Study Chandan

Background
Launched Dec 4 1996. Landed July 4 1997.
Mission objectives:
To prove that the development of "faster, better and cheaper" spacecraft is possible (with three years for development and a cost under US$ 150 million).
To show that it is possible to send a load of scientific instruments to another planet with a simple system and at one fifth the cost of a Viking mission.
These are some important dates in the mission, along with the mission objectives. Pathfinder was launched aboard a Delta II rocket. After a 7 month journey it landed at Ares Vallis, in a region called Chryse Planitia on Mars, on 4 July 1997.

Background Contd.
To demonstrate NASA's commitment to low-cost planetary exploration by finishing the mission with a total expenditure of US$ 280 million, including the launch vehicle and mission operations.
To demonstrate the mobility and usefulness of a micro rover on the surface of Mars.
It carried a number of scientific instruments.
Mars Pathfinder Lander:
Imager for Mars Pathfinder (IMP) (includes magnetometer and anemometer)
Atmospheric and meteorological sensors (ASI/MET)
This continues the objectives of the mission. The spacecraft carried a number of scientific instruments for performing various tests.

Background Contd.
Rover Sojourner:
Imaging system (three cameras: front B&W stereo, 1 rear color)
Laser striper hazard detection system
Alpha Proton X-ray Spectrometer (APXS)
Wheel Abrasion Experiment
Material Adherence Experiment
Accelerometers
Potentiometers
Final transmission Sept 27 1997.
16,500 images sent from the lander and 550 from the rover; 15 analyses of rocks.
This is an overview of the instruments on the rover.

Simplified view of Hardware Architecture
Single CPU: controls the spacecraft. Resides on a VME bus.
Interface cards for the radio and the camera.
Interface to a 1553 bus.
The 1553 bus connects to the 'cruise' and 'lander' stages.
H/W on the cruise stage: controls thrusters, etc.
H/W on the lander: interface to instruments such as the accelerometer, radar altimeter and ASI/MET.
This is a basic overview of the hardware architecture of Pathfinder. It can be viewed as a CPU residing on a VME bus, which is a type of bus architecture. The CPU is interfaced to the other parts of the spacecraft as listed above.

The Software Architecture
|<---------------------- .125 seconds ---------------------->|
|<****************|                     |*********|     |***>|
|<- bus active -->|
                  |<-- bc_dist active ->|
                                  bc_sched active |<--->|
|-----------------|---------------------|---------|-----|----|
t1                t2                    t3        t4    t5   t1
The *** are periods when tasks other than the ones listed are executing. There is some idle time.
t1 - bus hardware starts via hardware control on the 8 Hz boundary. The transactions for this cycle had been set up by the previous execution of the bc_sched task.
t2 - 1553 traffic is complete and the bc_dist task is awakened.
t3 - bc_dist task has completed all of the data distribution.
t4 - bc_sched task is awakened to set up transactions for the next cycle.
t5 - bc_sched activity is complete.
The software that controls the 1553 bus was implemented in 2 tasks. The first task controlled the setup of transactions on the 1553 bus (called the bus scheduler or bc_sched task) and the second task handled the collection of the transaction results, i.e. the data. The second task is referred to as the bc_dist (for distribution) task. The diagram on this slide is a typical timeline of the bus activity for a single cycle. It is not to scale. This cycle was constantly repeated. The bc_sched task is the highest priority task in the system (except for the vxWorks "tExec" task). The bc_dist task is third highest (a task controlling the entry and landing is second). All of the tasks which perform other spacecraft functions are lower. Science functions, such as imaging, image compression, and the ASI/MET task, are lower still.
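The timeline above can be modelled in a few lines. The following is only a minimal sketch: the function names and all durations are hypothetical values chosen for illustration, not figures from the mission. It computes t1 to t5 for one 8 Hz cycle and checks the two conditions the timeline implies, namely that bc_dist finishes (t3) before bc_sched is awakened (t4) and that bc_sched finishes (t5) before the next cycle starts.

```python
CYCLE_MS = 125  # 8 Hz boundary: 1000 ms / 8 = 125 ms per cycle


def cycle_events(bus_ms, dist_ms, sched_wake_ms, sched_ms):
    """Return the event times t1..t5 in ms from the start of the cycle.

    All arguments are hypothetical durations/offsets in milliseconds.
    """
    t1 = 0                      # bus hardware starts on the 8 Hz boundary
    t2 = t1 + bus_ms            # 1553 traffic complete, bc_dist awakened
    t3 = t2 + dist_ms           # bc_dist has distributed all the data
    t4 = sched_wake_ms          # bc_sched awakened for the next cycle
    t5 = t4 + sched_ms          # bc_sched activity complete
    return {"t1": t1, "t2": t2, "t3": t3, "t4": t4, "t5": t5}


def cycle_is_valid(events):
    """bc_dist must be done before bc_sched wakes; bc_sched must fit in the cycle."""
    return events["t3"] <= events["t4"] and events["t5"] <= CYCLE_MS


# Made-up numbers: a nominal cycle and one where bc_dist is delayed.
nominal = cycle_events(bus_ms=40, dist_ms=45, sched_wake_ms=100, sched_ms=10)
delayed = cycle_events(bus_ms=40, dist_ms=70, sched_wake_ms=100, sched_ms=10)

print(cycle_is_valid(nominal))
print(cycle_is_valid(delayed))
```

With these made-up numbers the nominal cycle passes the check, while the delayed cycle fails it because bc_dist is still active when bc_sched wakes up.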

The Failure
The spacecraft began experiencing total system resets.
A reset reinitializes all of the hardware and software.
It also terminates the execution of the current ground-commanded activities.
The remaining activities for that day were not accomplished until the next day.
This is what actually happened to the system, i.e. the effect of the failure.

The Cause
The failure: a case of priority inversion.
In scheduling, priority inversion is the scenario where a low priority task holds a shared resource that is required by a high priority task. This causes the execution of the high priority task to be blocked until the low priority task has released the resource, effectively "inverting" the relative priorities of the two tasks.
If some medium priority task becomes ready in the interim, it preempts the low priority task. In effect it takes precedence over both the low priority task and the blocked high priority task for as long as it runs.
Priority inversion was the root cause of the failure; the idea is as described above.

The Cause Contd.
The failure was identified by the spacecraft as a failure of the bc_dist task to complete its execution before the bc_sched task started.
The ASI/MET task is delivered its information via an interprocess communication (IPC) mechanism.
The IPC mechanism is based on pipes.
The higher priority bc_dist task was blocked by the much lower priority ASI/MET task, which was holding a shared resource.
Here I discuss how the failure arose from the IPC mechanism used by the Atmospheric Structure Instrument / Meteorology (ASI/MET) package.

The Cause Contd.
The resource that caused this problem was a mutual exclusion semaphore used within the select() mechanism.
The ASI/MET task had acquired this resource and then been preempted by several of the medium priority tasks.
The bc_dist task attempted to send the newest ASI/MET data via the IPC mechanism, which wrote to a pipe. The pipe write blocked while trying to take the semaphore.
The ASI/MET task took a mutual exclusion semaphore as part of its IPC and was then preempted by several medium priority tasks.

The Cause Contd.
The medium priority tasks ran, still not allowing the ASI/MET task to run, until the bc_sched task was awakened.
At that point, the bc_sched task determined that the bc_dist task had not completed its cycle (a hard deadline in the system) and declared the error that initiated the reset.
Because of this, the bc_dist task had still not completed when the scheduling decision for the next cycle had to be taken. As a result an error was declared and the system was reset.
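The sequence of events described in the last few slides can be reproduced with a toy simulation. This is a minimal sketch under stated assumptions: a discrete-time, fixed-priority preemptive scheduler with a single mutex, three tasks, and made-up priorities, release times, work amounts and deadline. The Task class, the simulate function and all numbers are hypothetical and are not taken from the flight software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int                      # larger number = higher priority
    release: int                       # tick at which the task becomes ready
    critical: int                      # ticks that need the shared mutex (done first)
    plain: int                         # ticks of work that need no mutex
    finished_at: Optional[int] = None  # tick at which the task completed


def simulate(tasks, horizon):
    """Run a fixed-priority preemptive schedule with no priority inheritance."""
    owner = None   # task currently holding the mutex
    trace = []     # name of the task that ran in each tick
    for tick in range(horizon):
        ready = [t for t in tasks if t.release <= tick and t.finished_at is None]
        # A task that needs the mutex while another task owns it is blocked.
        runnable = [t for t in ready
                    if not (t.critical > 0 and owner is not None and owner is not t)]
        if not runnable:
            trace.append("idle")
            continue
        current = max(runnable, key=lambda t: t.priority)
        if current.critical > 0:
            owner = current            # take (or keep) the mutex
            current.critical -= 1
            if current.critical == 0:
                owner = None           # give the mutex back
        else:
            current.plain -= 1
        if current.critical == 0 and current.plain == 0:
            current.finished_at = tick + 1
        trace.append(current.name)
    return trace


# Hypothetical workload: ASI/MET takes the mutex, a medium task preempts it,
# then bc_dist becomes ready and needs the same mutex.
asi_met = Task("ASI/MET", priority=1, release=0, critical=2, plain=0)
medium = Task("medium", priority=2, release=1, critical=0, plain=5)
bc_dist = Task("bc_dist", priority=3, release=2, critical=1, plain=1)

DEADLINE = 6  # hypothetical tick at which bc_sched checks on bc_dist
trace = simulate([asi_met, medium, bc_dist], horizon=10)
reset = bc_dist.finished_at is None or bc_dist.finished_at > DEADLINE

print(trace)
print("reset declared:", reset)
```

In this run the medium priority task occupies ticks 1 to 5 while bc_dist, the highest priority task of the three, waits for a mutex held by the lowest priority one. bc_dist only finishes after the assumed deadline, which is the condition bc_sched detected before declaring the error.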

Correction
Changing the creation flags for the semaphore so as to enable priority inheritance.
Modifying the semaphore associated with the pipe used for bc_dist task to ASI/MET task communication corrected the problem.
Here I talk about how the correction of the code was carried out by the Wind River people.
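The effect of priority inheritance can be sketched as a rule for the priority at which the semaphore holder is scheduled. This is an illustration only: the function names and the numeric priorities are hypothetical, and the code models the idea, not the vxWorks implementation.

```python
def effective_priority(base_priority, waiter_priorities, inheritance):
    """Priority at which the mutex holder is scheduled.

    With inheritance enabled, the holder temporarily runs at the highest
    priority among itself and the tasks blocked on the mutex.
    """
    if inheritance and waiter_priorities:
        return max(base_priority, max(waiter_priorities))
    return base_priority


# Hypothetical priorities; larger number = higher priority.
ASI_MET, MEDIUM, BC_DIST = 1, 2, 3


def next_to_run(inheritance):
    """ASI/MET holds the mutex, bc_dist is blocked on it, a medium task is ready."""
    holder = effective_priority(ASI_MET, [BC_DIST], inheritance)
    return "ASI/MET" if holder > MEDIUM else "medium"


print(next_to_run(inheritance=False))
print(next_to_run(inheritance=True))
```

Without inheritance the medium priority task is chosen and the inversion persists. With inheritance the ASI/MET task runs at the priority of bc_dist, so it can release the semaphore before the medium priority tasks get the CPU. To my knowledge, the vxWorks option that requests this behaviour when a mutual exclusion semaphore is created is SEM_INVERSION_SAFE.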

S/W modification on the spacecraft
Patching is a specialised process.
Send the difference between what you have onboard and what you want on the spacecraft.
S/W on the spacecraft modifies the onboard copy.
Once the code has been corrected on the ground, the software onboard must be updated as well, and this is done by patching. We send the difference between the onboard code and the corrected code, and software on the spacecraft applies that difference to the onboard copy.
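The idea of sending only the difference can be sketched as follows. This is a hypothetical illustration, not the actual Pathfinder patch format: it assumes both images have the same length and encodes the patch as a list of (offset, new bytes) runs.

```python
def make_patch(onboard: bytes, target: bytes):
    """Ground side: list the (offset, new_bytes) runs where target differs."""
    if len(onboard) != len(target):
        raise ValueError("this sketch only handles equal-length images")
    patch, start = [], None
    for i, (old, new) in enumerate(zip(onboard, target)):
        if old != new and start is None:
            start = i                              # a differing run begins
        elif old == new and start is not None:
            patch.append((start, target[start:i]))  # the run ended at i
            start = None
    if start is not None:                          # run reaches the end
        patch.append((start, target[start:]))
    return patch


def apply_patch(onboard: bytes, patch):
    """Spacecraft side: overwrite the onboard copy with each patched run."""
    image = bytearray(onboard)
    for offset, data in patch:
        image[offset:offset + len(data)] = data
    return bytes(image)


# Made-up six-byte "images" that differ in three bytes.
onboard = bytes([0x10, 0x20, 0x30, 0x40, 0x50, 0x60])
target = bytes([0x10, 0x21, 0x31, 0x40, 0x50, 0x61])

patch = make_patch(onboard, target)
print(patch)
print(apply_patch(onboard, patch) == target)
```

Only the two differing runs are transmitted, and applying them to the onboard copy reproduces the corrected image.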

Questions?

References
http://mars.jpl.nasa.gov/missions/past/pathfinder.html
http://research.microsoft.com/%7embj/Mars_Pathfinder/Authoritative_Account.html
http://en.wikipedia.org/wiki/Mars_Pathfinder