PRES: Probabilistic Replay with Execution Sketching on Multiprocessors Soyeon Park and Yuanyuan Zhou (UCSD) Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu.

PRES: Probabilistic Replay with Execution Sketching on Multiprocessors
Soyeon Park and Yuanyuan Zhou (UCSD); Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee and Shan Lu (UIUC)
SOSP 2009
LBA Reading Group, 9/15/09
Presented by: Michelle Goodstein

Outline
– Motivation
– PRES Architecture
– Capturing Sketches
– Replaying Intelligently
– Evaluation
– Conclusion

Motivation
– Concurrency bugs are hard…
– Deterministic replay can help, but…
– Deterministic replay can be expensive
– What if we record only partial information?
  – Good enough to reproduce the bug, rather than the exact original execution
  – Reproduce the bug in a small number (5-50) of replays rather than on the first attempt?

PRES: Probabilistic Replay via Execution Sketching
– Records a partial ordering of events during the production run
– Intelligently explores the space of partial orderings
  – Uses feedback from failed attempts to guide subsequent explorations toward reproducing the bug

PRES Architecture
– Sketch Recorders
– Partial Information based Replayer (PI-Replayer)
– Replay Recorder
– Monitor
– Feedback Generator

PRES Architecture
Sketch Recorders
– Used during the production run
– Capture a partial ordering of events
– Balance recording efficiency against usefulness in replay

PRES Architecture
Partial Information based Replayer (PI-Replayer)
– Used during the bug reproduction phase
– Consults the sketch and the feedback from earlier attempts to reproduce the bug
  – Sketch specifies an ordering → do what the sketch prescribes
  – Feedback says an ordering did not produce the bug → do something else
  – No information available → execute in any order
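The three PI-Replayer rules can be read as a per-step scheduling decision. The following is a minimal Python sketch under stated assumptions, not PRES's implementation. The names `pick_thread`, `sketch_events`, `sketch_next` and `avoid` are hypothetical. Sketch events are assumed to carry labels that are unique per occurrence. Feedback is simplified to a set of (thread, event) choices that should be avoided.

```python
import random
from typing import Dict, Optional, Set, Tuple

# (thread_id, event label); a hypothetical representation for this sketch.
Choice = Tuple[int, str]


def pick_thread(
    next_events: Dict[int, str],
    sketch_events: Set[str],
    sketch_next: Optional[str],
    avoid: Set[Choice],
    rng: random.Random,
) -> int:
    """Choose which runnable thread executes its next event.

    next_events:   runnable thread id -> label of the event it would run next
    sketch_events: labels of all events whose global order the sketch records
                   (assumed unique per occurrence, e.g. "lock#1", "lock#2")
    sketch_next:   the sketch event that must happen next (None if exhausted)
    avoid:         choices that feedback marked as not reproducing the bug
    """
    # Rule 1: do what the sketch prescribes. A thread whose next event is a
    # sketch event may run only if it is the event the sketch expects next;
    # threads about to run non-sketch events are unconstrained.
    allowed = [
        tid for tid, ev in next_events.items()
        if ev not in sketch_events or ev == sketch_next
    ]
    if not allowed:
        raise RuntimeError("no runnable thread can follow the sketch")
    # Rule 2: prefer choices that feedback has not ruled out.
    preferred = [tid for tid in allowed if (tid, next_events[tid]) not in avoid]
    # Rule 3: with no further information, any remaining choice is fine.
    return rng.choice(sorted(preferred or allowed))
```

If feedback rules out every allowed choice, the sketch falls back to the allowed set. The sketch constraint therefore always takes priority over feedback.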

PRES Architecture
Replay Recorder
– Deterministic replay recorder
– Necessary to produce feedback
– When the bug is reproduced, provides a deterministic record for repeating it with 100% probability

PRES Architecture
Monitor
– Tracks replays and detects:
  – Deviations from the sketch (a new replay is necessary)
  – Bug reproduced (success!)

PRES Architecture
Feedback Generator
– Uses information from the replay recorder to provide feedback for future replay attempts
– Tries to figure out why the bug was not reproduced
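The five components form a loop: replay, record, monitor, generate feedback, and try again. Below is a toy Python sketch of that loop, not PRES itself. The names `pres_loop`, `candidates` and `reproduces_bug` are hypothetical. As a simplification, "feedback" rules out a whole failed ordering rather than flipping one race. The point it illustrates is that feedback stops the search from repeating failed attempts.

```python
import random
from typing import Callable, List, Optional, Set, Tuple

Order = Tuple[str, ...]  # one ordering of the events the sketch left unconstrained


def pres_loop(
    candidates: List[Order],
    reproduces_bug: Callable[[Order], bool],
    max_attempts: int,
    seed: int = 0,
) -> Optional[Tuple[int, Order]]:
    """Return (attempt number, recorded ordering) once the bug reproduces."""
    rng = random.Random(seed)
    failed: Set[Order] = set()  # feedback accumulated from failed attempts
    for attempt in range(1, max_attempts + 1):
        untried = [o for o in candidates if o not in failed]
        if not untried:
            return None  # every candidate ordering has been ruled out
        order = rng.choice(untried)   # PI-Replayer picks an ordering
        recorded = tuple(order)       # replay recorder logs what actually ran
        if reproduces_bug(recorded):  # monitor detects the failure
            return attempt, recorded  # deterministic record: replays every time
        failed.add(recorded)          # feedback generator rules this one out
    return None
```

With `k` candidate orderings, this loop needs at most `k` attempts. A purely random replayer without feedback could keep retrying orderings that already failed.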

Sketch Recorders
– Baseline (Base)
  – Everything necessary for deterministic replay on a uniprocessor
– Synchronization recorder (Sync)
  – Above + global order of high-level synchronization operations
– System call recorder (Sys)
  – Above + global order of system calls
– Function call recorder (Func)
  – Global order of all function calls (Michelle: does this also include the above?)
– Nth basic block recorder (BB-n)
  – Records every nth basic block executed (the count is global)
– Basic block recorder (BB)
  – Global order of all basic blocks
– Shared reads/writes (RW)
  – Standard deterministic replay: global order of all shared-memory accesses
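Each sketch level keeps the global order of only some event kinds. A minimal Python sketch can show this by filtering one toy trace at several granularities. Assumptions: the trace format `(thread_id, kind, detail)` and the kind names are hypothetical. Base and Func are omitted, since Base records no global order and the slide leaves Func's scope open. Reads and writes appear as separate events here only for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical trace format: (thread_id, kind, detail), listed in the global
# order in which the events happened during the production run.
Event = Tuple[int, str, str]


def sketch(trace: Iterable[Event], kinds: Set[str]) -> List[Event]:
    """Keep the global order of only the event kinds a recorder captures."""
    return [e for e in trace if e[1] in kinds]


def bb_n_sketch(trace: Iterable[Event], n: int) -> List[Event]:
    """BB-n: one global basic-block counter shared by all threads; keep
    every n-th basic block executed."""
    kept: List[Event] = []
    count = 0
    for e in trace:
        if e[1] == "bb":
            count += 1
            if count % n == 0:
                kept.append(e)
    return kept


trace: List[Event] = [
    (1, "bb", "b0"), (1, "sync", "lock(m)"), (2, "bb", "b0"),
    (1, "write", "x"), (2, "syscall", "write(fd)"), (1, "bb", "b1"),
    (2, "read", "x"), (2, "bb", "b1"),
]
SYNC = {"sync"}
SYS = SYNC | {"syscall"}   # "Above + global order of syscalls"
RW = {"read", "write"}     # every shared access, as in full deterministic replay
```

Here `sketch(trace, SYNC)` keeps only the lock. `SYS` adds the system call. `bb_n_sketch(trace, 2)` keeps the 2nd and 4th basic blocks by global count. Only `RW` preserves the order of the racing write and read of `x`.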

Sketch Recorders

Replaying Intelligently
– Monitor observes the current replay
– Compares the current replay to the sketch to notice when to abort:
  – Inconsistent or off-sketch
  – Bug reproduced
– Operates only on visible events
  – Exceptions, timer signals, outputs
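The monitor's check can be sketched as a comparison of visible events against what the production run produced. This is a minimal Python sketch, not PRES's actual check. Assumptions: visible events are plain strings, the original run's visible events are available in order, and `failure` names the visible symptom of the bug. The function name and return values are hypothetical.

```python
from typing import Sequence


def monitor(replay: Sequence[str], recorded: Sequence[str], failure: str) -> str:
    """Classify a replay attempt from its visible events so far.

    replay:   visible events (exceptions, timer signals, outputs) observed so
              far in the current attempt, in order
    recorded: visible events recorded during the production run
    failure:  the visible event that marks the bug, e.g. a crash signal
    """
    for i, ev in enumerate(replay):
        if ev == failure:
            return "reproduced"   # success: stop and keep this attempt
        if i >= len(recorded) or ev != recorded[i]:
            return "abort"        # inconsistent with the original run
    return "continue"             # consistent so far; keep replaying
```

Checking only visible events keeps the monitor cheap. The trade-off is that a replay can diverge internally and be caught only when that divergence reaches an output, signal or exception.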

Replaying Intelligently
Unsuccessful replays
– Sketches other than RW miss the ordering of some shared-memory data races
– If a race occurs in certain orders, the bug may not manifest
– Idea: use information (feedback) from prior runs to guide the choice of ordering in the next replay attempt
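A small example shows why an unrecorded race order matters. The toy Python program below is illustrative and not taken from the paper. T1 checks a shared pointer and then uses it, while T2 clears the pointer. There are no synchronization operations, so a Sync-level sketch records nothing about this race. The script enumerates every interleaving that respects T1's program order. Only one of the three interleavings exposes the check-then-use atomicity violation.

```python
from itertools import permutations
from typing import List, Tuple

Schedule = Tuple[str, ...]


def run(schedule: Schedule) -> bool:
    """Execute one interleaving; return True if the bug manifests."""
    ptr = "buf"        # shared pointer, initially valid
    checked = False    # T1's local result of its null check
    for step in schedule:
        if step == "T1:check":
            checked = ptr is not None
        elif step == "T1:use":
            if checked and ptr is None:
                return True   # T1 uses a pointer T2 cleared after the check
        elif step == "T2:clear":
            ptr = None
    return False


def interleavings() -> List[Schedule]:
    """All schedules that respect T1's program order (check before use)."""
    steps = ("T1:check", "T1:use", "T2:clear")
    return [p for p in permutations(steps) if p.index("T1:check") < p.index("T1:use")]


for s in interleavings():
    print(s, "bug" if run(s) else "ok")
```

A replayer that picks the race order at random can easily miss the buggy schedule. Feedback from failed attempts is meant to steer it toward the untried order.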

Replaying Intelligently: Generating Feedback
– Record each replay attempt fully at RW granularity
– Using the failed replay recordings, identify data races
– Filter out data races whose ordering the sketch already implies
– Select a data race whose ordering to invert
  – Heuristic: chooses a replay recording, then the race closest to the fault
– On the next replay, execute deterministically until the data race is encountered, then flip its order
– Then the default PI-Replayer behavior takes over
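The feedback steps above can be sketched over the RW log of one failed attempt, in Python. Several simplifications apply:
- Races are any cross-thread conflicting accesses, ignoring lock-based ordering.
- Sketch-implied orders are given as a set of index pairs.
- "Closest to the fault" is measured as the distance in log positions.

`Access`, `find_races`, `pick_race_to_flip` and `flipped_prefix` are hypothetical names, not PRES's code.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Access(NamedTuple):
    idx: int    # position in the failed attempt's global RW log
    tid: int    # thread id
    op: str     # "R" or "W"
    addr: str   # shared location


Race = Tuple[Access, Access]  # (earlier access, later access)


def find_races(log: List[Access]) -> List[Race]:
    """Conflicting pairs: different threads, same address, at least one write.
    Simplification: ignores ordering established by locks or other sync."""
    races: List[Race] = []
    for i, a in enumerate(log):
        for b in log[i + 1:]:
            if a.tid != b.tid and a.addr == b.addr and "W" in (a.op, b.op):
                races.append((a, b))
    return races


def pick_race_to_flip(
    log: List[Access], sketch_ordered: Set[Tuple[int, int]], fault_idx: int
) -> Optional[Race]:
    """Drop races whose order the sketch implies; pick the one closest to the fault."""
    candidates = [r for r in find_races(log) if (r[0].idx, r[1].idx) not in sketch_ordered]
    if not candidates:
        return None
    return min(candidates, key=lambda r: abs(fault_idx - r[1].idx))


def flipped_prefix(log: List[Access], race: Race) -> List[Access]:
    """Deterministic prefix for the next attempt: replay the log up to the race,
    let the later thread run up to and including its racing access, then run
    the earlier access. After this prefix the default replayer takes over."""
    a, b = race
    prefix = log[: a.idx]
    prefix += [e for e in log[a.idx + 1 : b.idx + 1] if e.tid == b.tid]
    prefix.append(a)
    return prefix
```

In `flipped_prefix`, only the later thread's own accesses are moved ahead. Accesses by other threads between the two racing accesses are left unconstrained for the PI-Replayer.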

Evaluation

Conclusion
– Interesting use of partial orders as a compromise between recording efficiency and replay
– Partial information is often sufficient to recover the buggy ordering
– Similarities to the CHESS paper presented earlier