SSGRR
A Taxonomy of Execution Replay Systems
Frank Cornelis, Andy Georges, Mark Christiaens, Michiel Ronsse, Tom Ghesquiere, Koen De Bosschere
Dept. ELIS, Ghent University

July 30, 2003, SSGRR

The Debugging Problem
- The debugging process is hard to automate
- Current tools are inadequate for debugging large-scale, interactive, multi-threaded, and event-driven applications
- Hard-to-find bugs:
  - Synchronization errors
  - Memory leaks
  - Data races
  - Dangling pointers

Inadequate Tools
- Most common debugging technique: cyclic debugging
- Problem: many applications are non-deterministic, so there is no guarantee that the same behavior is observed during subsequent runs
- Ideal situation: reverse execution…

Solution: Execution Replay
[Figure: Execution 1 is recorded to a trace file, from which Execution 2 is replayed]

Requirements
- Record must have low intrusion
- Replay must be accurate
- Record phase must be space efficient
- Replay phase must be time efficient

Tornado, RecPlay, JaRec, jRapture, Interrupt Replay, Scheduling Replay, Compressed differences, Instant Replay, Input Replay, Output Replay, RSA, DejaVu, Igor, Recap

Outline
- Introduction
- Content-based vs. ordering-based
- Dealing with input
- Dealing with timing
- Dealing with other processors
- Conclusion

Content-based
Record input for every instruction

  instruction            recorded input
  …
  add r1,1 → r1          r1 = 10
  load 8(r1) → r2        r1 = 11
  store r2 → 12(r1)      r2 = 401, r1 = 11
  …

+ Instruction can be executed in isolation
– Huge trace files
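
A minimal Python sketch of the content-based idea, using the three instructions from the slide. The trace layout and the `replay_one` helper are illustrative assumptions, not taken from any of the systems named in this talk. In particular, the `"mem"` field (the value the load reads from memory) is an assumed addition, so that the load really can run without any other state.

```python
# Each trace entry holds everything one instruction reads (assumed layout).
trace = [
    ("add",   {"r1": 10}),              # add r1,1 -> r1
    ("load",  {"r1": 11, "mem": 401}),  # load 8(r1) -> r2; "mem" is an assumed field
    ("store", {"r2": 401, "r1": 11}),   # store r2 -> 12(r1)
]

def replay_one(op, inputs):
    """Re-execute one instruction using only its own recorded inputs."""
    if op == "add":
        return {"r1": inputs["r1"] + 1}
    if op == "load":
        return {"r2": inputs["mem"]}
    if op == "store":
        # Result is keyed by the memory address that gets written.
        return {("mem", inputs["r1"] + 12): inputs["r2"]}
    raise ValueError(f"unknown instruction: {op}")

# No entry depends on an earlier one, so any replay order works,
# including backwards.
results = [replay_one(op, inputs) for op, inputs in reversed(trace)]
```

The price is visible in the trace itself: every instruction carries its own copy of its operands, which is why the trace files become huge.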

Ordering-based
Record control flow of program from a given initial state
[Figure: C1; C2]
+ Smaller trace files
– Reexecution required
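
A minimal Python sketch of the ordering-based idea, assuming that C1 and C2 stand for two code fragments whose relative order is the only non-deterministic choice. The fragments, the shared variable `x`, and the `record`/`replay` helpers are hypothetical. The shuffle merely stands in for whatever makes the order vary between runs.

```python
import random

def c1(state):
    state["x"] = state["x"] + 1

def c2(state):
    state["x"] = state["x"] * 2

CODE = {"C1": c1, "C2": c2}

def record(initial, rng):
    """Run once and log only the order in which the fragments executed."""
    order = ["C1", "C2"]
    rng.shuffle(order)          # stand-in for a non-deterministic ordering
    state = dict(initial)
    for name in order:
        CODE[name](state)
    return order, state

def replay(initial, order):
    """Re-execute from the same initial state, forcing the logged order."""
    state = dict(initial)
    for name in order:
        CODE[name](state)
    return state

order, final = record({"x": 3}, random.Random())
assert replay({"x": 3}, order) == final
```

The trace holds two names instead of any data values, but the final state can only be recovered by running the code again from the initial state.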

Sources of non-determinism
- Input (e.g. a database, time, pixel coordinates)
- Timing (e.g. interrupts, scheduler actions)
- Interaction with other processors (processor, DMA, coprocessor)

Input
[Figure labels: instructions, application, kernel; IO-instructions, system calls; content-based, ordering-based, content-based; Tornado, jRapture]
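
A minimal Python sketch of content-based input replay at the call boundary: the value returned by each input call is logged during recording and handed back during replay. The `InputLog` class and the `app` function are hypothetical, and `time.time` simply stands in for any system call that returns non-deterministic input. This is not how Tornado or jRapture are implemented.

```python
import time

class InputLog:
    """Logs the results of input calls, or feeds a previous log back."""

    def __init__(self, trace=None):
        self.replaying = trace is not None
        self.trace = list(trace) if self.replaying else []
        self.pos = 0

    def call(self, fn, *args):
        if self.replaying:
            # Replay: do not perform the call, return the logged value.
            value = self.trace[self.pos]
            self.pos += 1
            return value
        # Record: perform the call and log what it returned.
        value = fn(*args)
        self.trace.append(value)
        return value

def app(log):
    """Toy application whose result depends on two clock readings."""
    t0 = log.call(time.time)
    t1 = log.call(time.time)
    return (t0, t1)

recorder = InputLog()
original = app(recorder)
replayed = app(InputLog(recorder.trace))
assert replayed == original
```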

Dealing with timing
- Interrupts
- Input/output (timing aspect; not input aspect)
- Scheduling
[Figure labels: application, other code; ordering-based]

How to determine the ordering
- PC is not enough
- Need extra counter: SIC (Software Instruction Counter)
  - Instructions executed
  - Number of backward jumps
Interrupt replay, Repeatable scheduling, DejaVu
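
A minimal Python sketch of why the PC alone is ambiguous inside a loop, and how a counter of backward jumps removes the ambiguity. The two-instruction toy machine, the `run` interpreter, and the `jlt` opcode are all hypothetical, made up for this illustration.

```python
def run(program, interrupt_at=None, on_interrupt=None):
    """Interpret `program`; return every (sic, pc) point that was visited."""
    pc, sic, acc = 0, 0, 0
    points = []
    while pc < len(program):
        points.append((sic, pc))
        # Replay side: deliver the interrupt at exactly the recorded point.
        if on_interrupt is not None and interrupt_at == (sic, pc):
            on_interrupt(acc)
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jlt":               # jump to `target` while acc < limit
            limit, target = arg
            if acc < limit:
                if target <= pc:
                    sic += 1            # count taken backward jumps only
                pc = target
            else:
                pc += 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    return points

loop = [("add", 1), ("jlt", (3, 0))]
points = run(loop)

# The PC repeats on every iteration, the (SIC, PC) pair never does.
assert len({pc for _, pc in points}) < len(points)
assert len(set(points)) == len(points)
```

Recording an interrupt then amounts to logging the pair at which it arrived, for example `(1, 0)`, and passing it as `interrupt_at` during replay.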

Dealing with other processors
- Multi-threading (multiple threads in one address space)
- Multi-processing (multiple processes sharing a common block of memory)
- A coprocessor (video, DMA, …)
[Figure: Code 1 and Code 2 sharing data, with accesses labeled c1 and c2]

Many systems
- RecPlay: ordering-based, up to the first data race
- JaRec: ordering-based, up to the first data race
- IGOR: content-based, checkpointing
- Recap: content-based, reverse execution
- Instant Replay: ordering-based, version numbers
- Netzer's approach: ordering-based, also replays data races
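
A minimal Python sketch of ordering-based replay with version numbers, in the spirit of the Instant Replay entry above. It is a simplification under stated assumptions: every access is treated as an exclusive update, with no distinction between readers and writers, and the `VersionedObject` class and `run` helper are hypothetical. During recording, each thread logs the version it found; during replay, it waits until the object reaches that version again.

```python
import threading

class VersionedObject:
    def __init__(self, value, replay_log=None):
        self.value = value
        self.version = 0
        self.cond = threading.Condition()
        self.replay_log = replay_log    # {thread name: [versions]}, None when recording
        self.record_log = {}

    def update(self, name, fn):
        with self.cond:
            if self.replay_log is not None:
                # Replay: block until the object is at the version this
                # thread saw for this access in the recorded run.
                want = self.replay_log[name].pop(0)
                self.cond.wait_for(lambda: self.version == want)
            self.record_log.setdefault(name, []).append(self.version)
            self.value = fn(self.value)
            self.version += 1
            self.cond.notify_all()

def run(replay_log=None):
    """Two threads update one shared object; returns (final value, log)."""
    log = {k: list(v) for k, v in replay_log.items()} if replay_log else None
    obj = VersionedObject(1, log)

    def worker(name, fn):
        for _ in range(3):
            obj.update(name, fn)

    threads = [
        threading.Thread(target=worker, args=("A", lambda v: v + 1)),
        threading.Thread(target=worker, args=("B", lambda v: v * 2)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.value, obj.record_log

value, log = run()                  # record phase
assert run(log) == (value, log)     # replay reproduces the same interleaving
```

Only version numbers are logged, never the data, so the trace stays small while the interleaving of the two threads is still pinned down.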

Overview
[Table comparing the systems DejaVu 1, DejaVu 2, IGOR, Instant Replay, Interrupt Replay, JaRec, jRapture, Recap, RecPlay, RSA, and Tornado on: Input, System Calls, Interrupts, SM (content-based), SM (ordering-based)]

Conclusion
- No execution replay system deals with all forms of non-determinism
- The more accurate a system gets, the more resources (time, space) it needs, and hence the less useful it becomes
- There is a need for stable and platform-independent tools to further support debugging