Published by Esmond Spencer Barber. Modified over 9 years ago.
1
Deterministic Replay for Real-time Software Systems
Alice Lee, Safety, Reliability & Quality Assurance Office, JSC, NASA (alice.t.lee1@jsc.nasa.gov)
Yann-Hang Lee, Computer Science & Eng, Arizona State University, Tempe, AZ (yhlee@asu.edu)
2
lee_IV&V-1 Background
Major difficulties of building real-time embedded applications:
  - handling concurrent events (real-world events occur in parallel)
  - timing control and temporal dependence in program behavior
  - asynchronous operations
Non-deterministic operation, time-dependent behavior, and race conditions:
  - difficult to model, analyze, test, and reproduce
Example: NASA Pathfinder spacecraft
  - Total system resets in Mars Pathfinder
  - An overrun of the data collection task -> a priority inversion on a mutex semaphore -> failure of the communication task -> a system reset
  - Took 18 hours to reproduce the failure in a lab replica; the problem then became obvious and a fix was installed
3
lee_IV&V-2 Background (Cont’d)
Other examples:
  - select(2)/accept(2) race condition in TCP servers of NetBSD: the bug depends on a specific event and is sometimes difficult to reproduce, particularly if the server is very fast and the network is relatively slow.
  - The Delphi Bug Report 459: the bug is difficult to reproduce since the timing of the two threads (one being destroyed and one being created) has to be "right" for it to occur.
It is easy to identify the faults and fix them once the failing sequences are reproduced (or observed).
The failures are rooted in the interaction of multiple concurrent operations/threads and are based on timing dependencies.
4
lee_IV&V-3 Deterministic Replay
Can we reproduce the exact execution behavior with additional delays in a controlled environment?
  - the delays may be caused by instrumentation and breakpoints
For multiple purposes:
  - Test analysis
  - Debugging
  - Recovery
[Figure: execution flows for the three purposes. Box labels, in order: Execution/Instrumentation; D. replay/Instrumentation; Execution; Execution/Observation/Assertion; D. replay/Observation/Assertion; Execution; Execution/Checkpointing/Msg logging; Rollback/D. replay.]
5
lee_IV&V-4 Deterministic Replay (Cont’d)
Programs read in the same input values (timer, DAQ, status, etc.)
Interrupts occur at the same program execution instances
Need to log external events during real-time execution and resubmit the events during replay
  - recording and replaying stages
[Figure: timelines for real-time execution and deterministic replay. In both, interrupt_1 is at PC=1000 and interrupt_2 is at PC=2000. Other labels: time, intrusions.]
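The recording and replaying stages can be illustrated with a minimal sketch. It is not the authors' tool: the names `Recorder`, `Replayer`, `live_source`, and `control_step` are hypothetical, and it covers only the "same input values" requirement, not interrupt timing. The assumption is that every external input reaches the program through a single `read` interface, so the log alone is enough to repeat the run.

```python
import random


class Recorder:
    """Recording stage: forward each input read to the live source and log it."""

    def __init__(self, source):
        self.source = source
        self.log = []  # ordered list of (input name, value)

    def read(self, name):
        value = self.source(name)
        self.log.append((name, value))
        return value


class Replayer:
    """Replaying stage: resubmit the logged values in the recorded order."""

    def __init__(self, log):
        self._events = iter(log)

    def read(self, name):
        logged_name, value = next(self._events)
        if logged_name != name:
            raise RuntimeError(f"replay diverged: expected {logged_name}, got {name}")
        return value


def live_source(name):
    """Hypothetical stand-in for real devices; values differ from run to run."""
    if name == "status":
        return random.choice(["ok", "fault"])
    return random.randint(0, 1000)


def control_step(io):
    """Hypothetical program under test; all external inputs go through io.read."""
    timer = io.read("timer")
    sample = io.read("daq")
    status = io.read("status")
    return sample * 2 + timer if status == "ok" else -1


recorder = Recorder(live_source)
recorded_result = control_step(recorder)                 # real-time execution
replayed_result = control_step(Replayer(recorder.log))   # deterministic replay
```

Because the replayed run sees exactly the logged values, extra delays from instrumentation or breakpoints during replay do not change its result.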
6
lee_IV&V-5 Testing Analysis and Timing Intrusion
Software quality analysis and test coverage
  - instrumentation at source programs
  - program behavior may be changed due to timing intrusion
      - test a robotic controller in the target system: hardware and human-in-the-loop operations
  - some solutions:
      - hardware-based trace collection (Applied Microsystems)
      - special data logging, monitoring, and test facility (SVF for NASA ISS)
Apply instrumentation during deterministic replay
  - if the overhead of logging external events can be minimized
7
lee_IV&V-6 Our Approach -- A Two-stage Instrumentation
Instrumentation based on RTOS -- for context switches, interrupts, events, and task communication
Annotation for device drivers
Synchronize program execution with external events
  - cannot rely on the program counter alone
      - an interrupt during a loop (need loop count and program counter)
  - simulated time must be adjusted to match the real execution time
      - determine when an event occurs
      - if there is no data dependence, it can occur at any instance during a block execution
      - else, need to know the corresponding statement
8
lee_IV&V-7 Software Instruction Counter
Exact instance in program execution
  - specified by program counter (PC) and software instruction counter (SIC)
Software instruction counter (SIC)
  - incremented on a backward jump or procedure call
  - software or hardware implemented
  - has been applied to recovery and debugging
[Figure: polling loop. Labels: read I/O; check value; I/O status changed; read I/O; check value.]
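A polling loop shows why the PC alone cannot identify an execution instance: the "read I/O" statement is reached at the same PC on every iteration. The sketch below is an illustration under stated assumptions, not the authors' instrumentation. The address `PC_READ_IO`, the functions `polling_loop`, `record_run`, and `replay_run`, and the random arrival standing in for real-time behavior are all hypothetical.

```python
import random

PC_READ_IO = 1000  # hypothetical address of the "read I/O" statement


def polling_loop(interrupt_arrives, max_polls=16):
    """Poll until the I/O status changes; return (polls, (PC, SIC) stamp)."""
    sic = 0       # software instruction counter
    polls = 0
    stamp = None  # (PC, SIC) at which the status change was observed
    while polls < max_polls:
        polls += 1
        # The same PC is reached on every iteration, so PC alone is ambiguous.
        if interrupt_arrives(PC_READ_IO, sic):
            stamp = (PC_READ_IO, sic)
            break  # "check value" sees the changed status and exits
        sic += 1   # instrumentation: count the backward jump
    return polls, stamp


def record_run(seed):
    """Recording stage: the arrival point stands in for real-time behavior."""
    arrival = random.Random(seed).randrange(8)
    polls_seen = 0

    def live(pc, sic):
        nonlocal polls_seen
        hit = polls_seen == arrival
        polls_seen += 1
        return hit

    return polling_loop(live)


def replay_run(stamp):
    """Replaying stage: deliver the event exactly at the recorded (PC, SIC)."""
    return polling_loop(lambda pc, sic: (pc, sic) == stamp)


recorded = record_run(seed=7)
replayed = replay_run(recorded[1])
```

The replayed loop exits after the same number of polls as the recorded one, regardless of how much wall-clock time each iteration takes during replay.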
9
lee_IV&V-8 Current Status
[Figure: tool flow. Labels, in order: source program; code analyzer; ESIC, system, and event instrumentation; instrumented program_1; target - record environment; code instrumentation; ESIC and replay instrumentation; instrumented program_2; event trace_1; event trace_2; PC stamp converter; target - replay environment; execution trace.]
10
lee_IV&V-9 Current Status (Cont’d)
Works for a single execution thread in the whole system (VxWorks + MPC860)
There are kernel and non-instrumented threads
  - test analysis of one program in a multitasking environment
  - debug a program which calls library routines
  - system calls to RTOS
Can we still reach deterministic replay if the execution of the instrumented thread is interleaved with other threads?
If interrupts (input) -> thread_1 -> thread_2, then both threads must be instrumented
[Figure: interaction between the instrumented program, the RTOS, and the other thread. Labels: semTake(); interrupt; semGive(); ISR.]
11
lee_IV&V-10 Current Status (Cont’d)
If interrupts (input) -> thread_2 and thread_1 -> thread_2:
  - thread_1 doesn’t need to be instrumented
  - however, interrupts can occur while thread_1 is running (i.e. execution is not in the instrumentation region due to a blocked system call or library call)
Solution:
  - check the thread id when an interrupt occurs
  - if the interrupted instruction is in the instrumentation region, use PC+SIC for replay
  - else, replay the interrupt just before the call (RTOS or library)
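The decision rule in the solution can be sketched as a small function. This is an assumption-laden illustration, not the authors' implementation: the record layout `InterruptRecord`, the field `last_call_stamp` (the PC+SIC of the most recent RTOS or library call made by the instrumented thread), and the address-range test for the instrumentation region are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InterruptRecord:
    """Hypothetical log entry written when an interrupt occurs during recording."""
    thread_id: int   # thread that was running when the interrupt occurred
    pc: int          # program counter of the interrupted instruction
    sic: int         # software instruction counter at that point
    last_call_stamp: Optional[Tuple[int, int]]  # (PC, SIC) of the last RTOS/library call


def replay_point(rec, instrumented_tid, region):
    """Choose where the interrupt is re-submitted during replay."""
    start, end = region  # assumed address range of the instrumented code
    in_region = rec.thread_id == instrumented_tid and start <= rec.pc < end
    if in_region:
        # Interrupted inside the instrumentation region: PC+SIC is exact.
        return ("at", rec.pc, rec.sic)
    if rec.last_call_stamp is None:
        raise ValueError("no RTOS or library call recorded for this interrupt")
    # Otherwise replay the interrupt just before the call that left the region.
    call_pc, call_sic = rec.last_call_stamp
    return ("before_call", call_pc, call_sic)


REGION = (0x1000, 0x2000)
inside = InterruptRecord(thread_id=1, pc=0x1200, sic=5, last_call_stamp=(0x1100, 3))
other_thread = InterruptRecord(thread_id=2, pc=0x9000, sic=0, last_call_stamp=(0x1100, 3))
in_library = InterruptRecord(thread_id=1, pc=0x8000, sic=5, last_call_stamp=(0x1300, 5))
```

For example, `replay_point(inside, 1, REGION)` selects the exact PC+SIC stamp, while the other two records fall back to the stamp of the preceding call.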
12
lee_IV&V-11 Current Tasks
Tool integration and GUI
Experiments
  - joystick program with input and timer
  - DC motor controller with a LabVIEW-based simulator
Applications in JSC
  - X38
  - AERCam
Porting
  - VxWorks and Suds on MBX860 embedded controller
  - porting to RT-linux and other platforms
Documentation and dissemination