Published by Henry Hoover. Modified over 9 years ago.
1
Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging
Sudarshan M. Srinivasan, Srikanth Kandula, Christopher R. Andrews, and Yuanyuan Zhou
Review by M. Kozuch
2
Motivation
Better debugging. Reproducing errors is hard because:
  1. Bug setup may require significant time
  2. Exact inputs may be difficult to reproduce
  3. Some are Heisenbugs
Solution: Provide a mechanism for deterministic replay after a software error is detected.
3
Related Work
  - Compile-time static checking
  - Run-time dynamic checking
  - Hardware support
4
Overview
  State = checkpoint()
  discard(State)
  replay(State)
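The three calls above can be illustrated with a toy model. This is only a sketch of the interface's semantics, not Flashback's kernel implementation: `ToyProcess` and its `memory` dict are hypothetical stand-ins for a process and its address space, and the checkpoint is a plain deep copy rather than a shadow process.

```python
import copy


class ToyProcess:
    """Hypothetical stand-in for a process; 'memory' is just a dict."""

    def __init__(self):
        self.memory = {}
        self._checkpoints = {}   # handle -> saved copy of memory
        self._next_handle = 0

    def checkpoint(self):
        """Save the current state and return a handle to it."""
        handle = self._next_handle
        self._next_handle += 1
        self._checkpoints[handle] = copy.deepcopy(self.memory)
        return handle

    def discard(self, handle):
        """Drop a checkpoint that is no longer needed."""
        del self._checkpoints[handle]

    def replay(self, handle):
        """Roll the state back to the checkpoint so execution can be repeated."""
        self.memory = copy.deepcopy(self._checkpoints[handle])


proc = ToyProcess()
proc.memory["counter"] = 1
state = proc.checkpoint()      # State = checkpoint()
proc.memory["counter"] = 99    # buggy execution changes the state
proc.replay(state)             # replay(State): counter is 1 again
proc.discard(state)            # discard(State): checkpoint is released
```

The deep copy keeps the sketch short; the slides that follow describe how the real mechanism avoids copying the whole memory image.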
5
Main Idea Part I
Use a fork()-like mechanism to create a shadow copy of a process:
  - Copy-on-write memory image
  - Register values
  - Some process state
Not exactly fork:
  - Shadow not runnable
  - Not all reference counts are incremented
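Ordinary `fork()` already shows the snapshot effect this slide relies on. The sketch below is a user-level illustration, assuming a POSIX system. Unlike the shadow described above, the child here is an ordinary runnable process; it simply waits until asked to report what it sees. The function name and the pipe protocol are made up for the example.

```python
import os


def shadow_value_after_parent_write():
    """Fork a 'shadow' child, change state in the parent, then ask the
    child what it still sees. POSIX-only because it uses os.fork()."""
    state = {"counter": 1}
    go_r, go_w = os.pipe()     # parent -> child: "report now"
    out_r, out_w = os.pipe()   # child -> parent: the value the child sees

    pid = os.fork()
    if pid == 0:
        # Child: plays the role of the shadow and stays idle until signalled.
        os.close(go_w)
        os.close(out_r)
        os.read(go_r, 1)
        os.write(out_w, str(state["counter"]).encode())
        os._exit(0)

    # Parent: keeps running and modifies its own copy of the state.
    os.close(go_r)
    os.close(out_w)
    state["counter"] = 99
    os.write(go_w, b"x")
    seen_by_shadow = int(os.read(out_r, 16))
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(out_r)
    return seen_by_shadow, state["counter"]


print(shadow_value_after_parent_write())   # (1, 99)
```

The child still sees the value from the moment of the fork, which is the property a checkpoint needs; the kernel shares pages between the two processes until one of them writes.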
6
Memory Image
Copy-on-write semantics:
  - Reduces cost of checkpoint()
  - Reduces memory footprint
  - Reduces impact of multiple checkpoints
  - Reduces cost of replay()
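A toy paged memory makes these savings concrete. This is a sketch under simplifying assumptions, not the kernel's page-table code: `CowMemory`, its page size, and the `pages_copied` counter are all invented for illustration. A checkpoint copies only the list of page references, and a page's contents are copied the first time it is written after a checkpoint.

```python
class CowMemory:
    """Toy paged memory with copy-on-write checkpoints (illustrative only)."""

    def __init__(self, num_pages, page_size=4096):
        self.table = [bytearray(page_size) for _ in range(num_pages)]
        self.private = set(range(num_pages))  # pages not shared with a snapshot
        self.pages_copied = 0

    def checkpoint(self):
        snapshot = list(self.table)   # copies page references, not contents
        self.private.clear()          # every page is now shared
        return snapshot

    def write(self, page, offset, data):
        if page not in self.private:
            # First write since the checkpoint: copy this one page.
            self.table[page] = bytearray(self.table[page])
            self.private.add(page)
            self.pages_copied += 1
        self.table[page][offset:offset + len(data)] = data

    def read(self, page, offset, length):
        return bytes(self.table[page][offset:offset + length])

    def replay(self, snapshot):
        # Point the table back at the saved pages; nothing is copied here.
        self.table = list(snapshot)
        self.private.clear()


mem = CowMemory(num_pages=8, page_size=16)
mem.write(0, 0, b"old")
snap = mem.checkpoint()
mem.write(0, 0, b"new")
mem.write(0, 3, b"!")          # same page again: no second copy
print(mem.pages_copied)        # 1
mem.replay(snap)
print(mem.read(0, 0, 3))       # b'old'
```

Only one of the eight pages is ever duplicated, so the cost of both the checkpoint and the rollback tracks the amount of memory written, not the total size of the image.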
7
Multithreaded Processes
Option 1: Roll back the process
  - “Trivially” ensures consistency
Option 2: Roll back the thread
  - Requires ordering log for memory and files
  - Roll back thread set with inter-dependencies
  - Problems:
      - Logic adds overhead and is error-prone
      - Data races may require watching all threads
      - Overhead paid even when no errors
Multithreaded state capture is still hard if some of the threads can be in a different context (e.g. in the kernel).
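For Option 2, the set of threads to roll back is the failed thread plus everything that transitively depends on it. The sketch below assumes a hypothetical bookkeeping structure, `affects`, that maps each thread to the threads that consumed its data since the checkpoint; how such a map would be collected is exactly the ordering-log overhead listed above.

```python
def rollback_set(failed_thread, affects):
    """Return every thread that must roll back together with failed_thread.

    affects: dict mapping a thread to the threads that consumed its data
    since the checkpoint (hypothetical bookkeeping, for illustration).
    """
    to_visit = [failed_thread]
    result = set()
    while to_visit:
        thread = to_visit.pop()
        if thread in result:
            continue
        result.add(thread)
        to_visit.extend(affects.get(thread, ()))
    return result


affects = {"T1": ["T2"], "T2": ["T3"], "T4": ["T1"]}
print(sorted(rollback_set("T1", affects)))   # ['T1', 'T2', 'T3']
```

T4 is left alone because it only fed data into T1 and consumed nothing from the rolled-back threads. With frequent sharing the set quickly grows to the whole process, which is why Option 1 is the simpler choice.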
8
Main Idea Part II
  - Can re-execute code
  - Must replay inputs (e.g. from read())
  - After checkpoint(), log syscall return values*
  - After replay(), replay the log
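The record-then-replay idea can be sketched at user level. This is an illustration only: the real mechanism intercepts system calls inside the kernel, whereas here a hypothetical `SyscallLog` wrapper is called explicitly. While recording it runs the real call and logs the result; while replaying it returns the logged result and never touches the outside world.

```python
import os


class SyscallLog:
    """Records the results of input calls, then feeds them back in order."""

    def __init__(self):
        self.entries = []    # (call name, result) in the order they happened
        self.cursor = None   # None while recording; an index while replaying

    def start_replay(self):
        self.cursor = 0

    def call(self, name, real_call, *args):
        if self.cursor is None:
            result = real_call(*args)            # recording: run the real call
            self.entries.append((name, result))
            return result
        logged_name, result = self.entries[self.cursor]   # replaying
        if logged_name != name:
            raise RuntimeError(
                f"replay diverged: expected {logged_name}, got {name}")
        self.cursor += 1
        return result


def read_twice(log, fd):
    """Code under debugging: its only inputs are two read() calls."""
    first = log.call("read", os.read, fd, 5)
    second = log.call("read", os.read, fd, 5)
    return first + second


r, w = os.pipe()
os.write(w, b"helloworld")
log = SyscallLog()
original = read_twice(log, r)     # recorded run reads from the pipe
os.close(r)
os.close(w)

log.start_replay()
replayed = read_twice(log, r)     # fd is closed; values come from the log
print(original == replayed)       # True
```

The replayed run sees the same inputs even though the pipe no longer exists, which is what makes re-execution deterministic for inputs that arrive through logged calls.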
9
Log Weaknesses
Shared memory
  - Regions must be identified, and all accesses set to generate #PF (page fault)
  - Not currently handled
Signals
  - Replay of asynchronous events is challenging
  - Not currently handled
10
µBenchmark Performance I
  - checkpoint() is 25-1600 µs
  - discard()/replay() is 28-2800/7500 µs
11
µBenchmark Performance II
  - read()/write() syscalls
  - Used cold caches “for consistency”
12
Application Performance
Experiments:
  - One application
  - Network bound
  - Logging disabled
Conclusion: the overhead of checkpointing is low.