Timing and Race Condition Verification of Real-time Systems
Yann-Hang Lee, Gerald Gannod, and Karam Chatha, Dept. of Computer Science and Eng., Arizona State University, Tempe, AZ
W. Eric Wong, Dept. of Computer Science, University of Texas at Dallas, Richardson, TX
1 Background: RT Embedded Systems
- Real-time embedded systems: software and hardware components that form an essential part of an application system and interact tightly with the external environment
- Major difficulties of building real-time embedded applications:
  - Handling concurrent events (events that occur in parallel)
  - Timing control and temporal dependence in program behavior
  - Asynchronous operations
- Non-deterministic operations, time-dependent behavior, and race conditions
  - Difficult to model, analyze, test, and reproduce
[Figure: feedback control loop. Labels: reference input, controller (control-law computation), D/A, actuator, plant, sensor, A/D]
2 Background (Cont’d)
- Example: NASA Pathfinder spacecraft
  - Total system resets in Mars Pathfinder
  - Failure chain: an overrun of a data collection task -> a priority inversion on a mutex semaphore -> a failure of a communication task -> a system reset
  - It took 18 hours to reproduce the failure in a lab replica; the problem then became obvious and a fix was installed
- Such errors are rooted in the interaction of multiple concurrent operations/threads and depend on timing.
- The errors are easy to identify and fix once the failing sequences are reproduced (or observed).
3 Temporal Dependence
- Predicting and controlling timing and responses is based on event occurrences
- Timing relationship (can you guarantee it?)
  - Predictable actions in response to external stimuli: if event E1 occurs at time t1, will an action A1 be triggered at time t2?
  - Deadline (absolute or relative) and jitter
- Program execution
  - If event E1 occurs at time t1 + Δ, will the same action A1 be triggered at time t2 + Δ?
  - Will the program execution be identical?
  - Should this case be tested?
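The timing questions on this slide can be made concrete with a small check over recorded runs. The following is a minimal sketch, not the authors' tool: the `Observation` record, the microsecond timestamps, and the sample values are all assumptions made for illustration. It computes the response time t2 - t1 for each run, tests a relative deadline, and reports jitter as the spread of the response times.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One recorded run (hypothetical format): times in microseconds."""
    event_time_us: int   # t1: when event E1 occurred
    action_time_us: int  # t2: when action A1 was triggered


def response_times(observations):
    """Response time t2 - t1 for every recorded run."""
    return [o.action_time_us - o.event_time_us for o in observations]


def meets_deadline(observations, relative_deadline_us):
    """True if every response finished within the relative deadline."""
    return all(r <= relative_deadline_us for r in response_times(observations))


def jitter(observations):
    """Jitter taken here as the spread between slowest and fastest response."""
    r = response_times(observations)
    return max(r) - min(r)


# Made-up sample data: the same event injected at three different offsets.
runs = [
    Observation(0, 4000),
    Observation(10000, 15000),
    Observation(20000, 23000),
]

print(response_times(runs))        # [4000, 5000, 3000]
print(meets_deadline(runs, 5000))  # True
print(jitter(runs))                # 2000
```

If shifting the event by Δ always shifted the action by the same Δ, the jitter would be zero; a non-zero value indicates that the response depends on when the event arrives.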
4 Race Conditions
- Necessary conditions:
  - Concurrent operations
  - At least one is an “update”
  - No mechanism to guarantee any specific order
- A race condition occurs when two threads manipulate a shared data structure simultaneously without synchronization.
- Race conditions are common errors in multi-threaded programs. Because they are timing-dependent, they are notoriously hard to catch during testing.
- Possible consequences:
  - Inconsistent data
  - Unexpected (non-deterministic) execution sequence (order of actions)
[Figure: 3 operations (A, B, and C) executing in different orders, e.g. ACB, CAB, BCA]
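The effect of an unconstrained order among three operations can be shown by enumerating every ordering. This is an illustrative sketch only: the concrete operations assigned to A, B, and C and the initial value are assumptions, and each operation is modeled as atomic, so only the ordering varies.

```python
from itertools import permutations

# Hypothetical updates to one shared integer; none of these come from the slides.
OPS = {
    "A": lambda x: x + 10,
    "B": lambda x: x * 2,
    "C": lambda x: x - 3,
}


def outcomes(initial):
    """Map every ordering of A, B, C to the final value of the shared data."""
    results = {}
    for order in permutations("ABC"):
        value = initial
        for name in order:
            value = OPS[name](value)
        results["".join(order)] = value
    return results


results = outcomes(1)
for order, value in sorted(results.items()):
    print(order, value)
print(sorted(set(results.values())))  # [6, 9, 16, 19]
```

Six orderings produce four different final values from the same inputs, which is the "inconsistent data" consequence named on the slide. A test run observes only one of these orderings, which is why such errors are easy to miss.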
5 Research Goals
- Establish a model to represent the interdependence of program execution behavior and external events
  - External events (interrupts or sensor value changes) recognized by the executing program
  - Processing and state transitions
  - Actions
- Adopt interval logic as a formal representation of system behavior for timing verification
- Identify the minimal set of timed test stimuli
6 Race Condition Approaches: Existing Techniques
- Ahead-of-time: static analysis and compile-time heuristics for race condition detection (e.g., rccjava)
  - Advantage: low overhead
  - Limitation: false detection
- On-the-fly: dynamic analysis to detect race conditions during program execution (e.g., Eraser)
  - Advantage: overcomes false detection
  - Limitation: larger run-time overheads; spurious thread interactions
- Post-mortem: combination of run-time event capture and static post-execution analysis (e.g., RecPlay, DejaVu)
  - Advantage: best of both ahead-of-time and on-the-fly techniques
  - Limitation: run-time overhead; spurious thread interactions
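As an illustration of the on-the-fly category, the core idea behind lockset-based detectors such as Eraser can be sketched in a few lines. This is a simplified sketch, not Eraser itself: the trace format and the function name `lockset_check` are assumptions, and the state handling that Eraser uses for initialization and read-shared data is omitted. For each shared variable, the candidate set of protecting locks is intersected with the locks held at every access; an empty set means no single lock consistently protects the variable.

```python
def lockset_check(trace):
    """Return variables whose candidate lockset becomes empty.

    trace: iterable of (thread, op, target) tuples (hypothetical format),
    where op is "acquire", "release", or "access".
    """
    held = {}        # thread -> set of locks currently held
    candidates = {}  # variable -> locks that protected every access so far
    warnings = []
    for thread, op, target in trace:
        locks = held.setdefault(thread, set())
        if op == "acquire":
            locks.add(target)
        elif op == "release":
            locks.discard(target)
        elif op == "access":
            if target not in candidates:
                candidates[target] = set(locks)  # first access seeds the set
            else:
                candidates[target] &= locks      # refine by intersection
            if not candidates[target] and target not in warnings:
                warnings.append(target)
    return warnings


trace = [
    ("T1", "acquire", "m"), ("T1", "access", "x"), ("T1", "release", "m"),
    ("T2", "acquire", "m"), ("T2", "access", "x"), ("T2", "release", "m"),
    ("T1", "acquire", "m"), ("T1", "access", "y"), ("T1", "release", "m"),
    ("T2", "access", "y"),  # y is touched without holding m
]
print(lockset_check(trace))  # ['y']
```

Because the check runs on every access, it adds work to the monitored program, which is the run-time overhead listed as the limitation of this category.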
7 Proposed Approach: Post-mortem with Temporal Analysis
- Static analysis (control flow and data dependence)
- Dynamic analysis (execution flow, timing, synchronization, and I/O operations)
- Run test cases in target environment
- Formal model of events and program execution
- Model deduction from multiple test runs
- Timing and race condition verification
- Create new event occurrences from uncovered intervals
[Figure labels: Formal Analysis, Instrumentation]
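One possible reading of "create new event occurrences from uncovered intervals" is sketched below. Everything in it is an assumption made for illustration: that each test run can be summarized as an interval of event arrival times whose behavior is already known, that time is measured in integer ticks within a fixed window, and that the midpoint of each gap is a reasonable next stimulus. The slides do not say how the authors select new events.

```python
def uncovered_intervals(covered, window):
    """Return the gaps inside window that no covered interval touches.

    covered: list of (start, end) half-open intervals already exercised.
    window:  (lo, hi) range of event arrival times of interest.
    """
    lo, hi = window
    gaps = []
    cursor = lo  # everything before cursor is covered or outside the window
    for start, end in sorted(covered):
        if start > cursor:
            gaps.append((cursor, min(start, hi)))
        cursor = max(cursor, end)
        if cursor >= hi:
            break
    if cursor < hi:
        gaps.append((cursor, hi))
    return gaps


def new_stimuli(covered, window):
    """Propose one new event time per gap (the midpoint, by assumption)."""
    return [(a + b) // 2 for a, b in uncovered_intervals(covered, window)]


covered = [(10, 30), (25, 50), (70, 80)]  # made-up coverage from earlier runs
print(uncovered_intervals(covered, (0, 100)))  # [(0, 10), (50, 70), (80, 100)]
print(new_stimuli(covered, (0, 100)))          # [5, 60, 90]
```

Each proposed time would become a new test run, and the loop would repeat until no gaps remain or the remaining gaps are shown to be equivalent to intervals already covered.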
8 Tools
- Analysis tools
  - Considered a set of 34 tools
  - Performed detailed analysis of 17 tools
  - In-depth case study of 5 tools using Fischer’s mutual exclusion problem: Kronos, UPPAAL, HyTech, Spin, and Spin variants
- Timing measurement and instrumentation tools
  - Based on software instrumentation and a high-precision hardware timer (available in most high-end embedded processors)
  - Reduction of intrusion
    - Dominator analysis and super-block structure
    - Measurement during replay phase
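Dominator analysis, mentioned above as a way to reduce intrusion, can be illustrated with the standard iterative computation on a control-flow graph. This is a textbook sketch rather than the authors' instrumentation tool; the dictionary-based graph format and the diamond-shaped example are assumptions. A block D dominates block N if every path from the entry to N passes through D, so observing N at run time implies D was executed and D needs no probe of its own.

```python
def dominators(cfg, entry):
    """Compute the dominator set of every block by iterating to a fixed point.

    cfg: dict mapping each block name to a list of successor block names
         (every block must appear as a key).
    """
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]             # common dominators of all predecessors
            new |= {n}                    # every block dominates itself
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical diamond-shaped control flow: entry branches to a or b, both reach exit.
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
dom = dominators(cfg, "entry")
print(sorted(dom["exit"]))  # ['entry', 'exit']
print(sorted(dom["a"]))     # ['a', 'entry']
```

In this example a timestamp probe in `exit` already proves that `entry` ran, while `a` and `b` still need their own probes because neither dominates `exit`.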
9 Research Plan
- Existing technologies: static analysis, program instrumentation, formal methods, scheduling, and real-time operating systems
- Year 1: Development of analysis techniques, timing measurement, interval logic representation and deduction, and proof-of-concept demonstration
- Year 2: Optimization of analysis techniques, tool development, and interface with NASA’s target environment
- Year 3: Demonstration with NASA’s applications and tool verification
[Figure label: "time intrusions"]