Debugging Techniques For Concurrent Programs Divjyot Sethi with Stephen Beard, Raman Arun, Arnab Sinha, Yun Zhang, Soumyadeep Ghosh.


The Northeastern blackout, August 2003
Toronto went dark. The investigation revealed "… a race condition, triggered on August 14th by a perfect storm of events …" (SecurityFocus, April 7th, 2004)

Concurrent Programming is HARD
o Easy to write racy code
o Deadlocks

Concurrent Programming is HARD
o Nontrivial specification: how does one know whether the parallel implementation of an algorithm still implements the algorithm?

Overview
o Issues in concurrent programs
o Simulation based verification
o Runtime Verification
o Summary and ongoing research

Concurrent Code: Some Important Bugs
o Deadlock and starvation
o Livelock
o Race conditions

Deadlock and Starvation
Deadlock: a situation where no further progress can be made by the program because all processes are blocked. Let S and Q be two locks:

    P0              P1
    S.Lock();       Q.Lock();
    Q.Lock();       S.Lock();
    ...             ...
    S.Unlock();     Q.Unlock();
    Q.Unlock();     S.Unlock();

If P0 acquires S and P1 acquires Q before either takes its second lock, each waits forever for the lock the other holds.
Starvation: a process is ready to run or to use a resource, but is never given a chance by the scheduler.
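
The P0/P1 lock ordering above can be reproduced with a small Python sketch. It is an illustration only: the names `worker` and `demo`, the barrier that forces the bad interleaving, and the 0.2 s timeout that stands in for "blocked forever" are all assumptions added here, not part of the original slides.

```python
import threading


def worker(name, first, second, barrier, results):
    """Take `first`, then try `second` while the peer holds it."""
    with first:
        barrier.wait()                      # both threads now hold their first lock
        got = second.acquire(timeout=0.2)   # without a timeout this would block forever
        results[name] = got
        barrier.wait()                      # keep holding `first` until both attempts ended
        if got:
            second.release()


def demo():
    S, Q = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}
    p0 = threading.Thread(target=worker, args=("P0", S, Q, barrier, results))
    p1 = threading.Thread(target=worker, args=("P1", Q, S, barrier, results))
    p0.start()
    p1.start()
    p0.join()
    p1.join()
    return results


print(demo())  # {'P0': False, 'P1': False} (key order may vary)
```

Neither thread obtains its second lock, which is the deadlock condition. Acquiring locks in one global order (for example always S before Q) would remove the cycle.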

Livelock
Threads are doing something, yet they are not doing anything useful!
Analogy: two polite people meet in a hallway and keep stepping aside for each other.

Where is the Livelock?

    // thread 1
    getLocks12(lock1, lock2) {
        lock1.lock();
        while (lock2.locked()) {
            lock1.unlock();
            wait();
            lock1.lock();
        }
        lock2.lock();
    }

    // thread 2
    getLocks21(lock1, lock2) {
        lock2.lock();
        while (lock1.locked()) {
            lock2.unlock();
            wait();
            lock2.lock();
        }
        lock1.lock();
    }
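
One way to see the livelock is to step both threads in strict alternation. The sketch below models each `getLocks` routine as a Python generator that pauses after every atomic step; the names `get_locks` and `run`, and the `owner` dictionary standing in for the two locks, are hypothetical. The model assumes a thread's first lock is free whenever it re-locks it, which holds for the schedules used here.

```python
def get_locks(me, first, second, owner):
    """Generator model of getLocks12/getLocks21; yields after every atomic step."""
    owner[first] = me                      # first.lock()
    yield
    while True:
        busy = owner[second] is not None   # second.locked()
        yield
        if not busy:
            break
        owner[first] = None                # first.unlock()
        yield
        owner[first] = me                  # wait(); first.lock()
        yield
    owner[second] = me                     # second.lock()


def run(schedule):
    """Run the threads in the given order of thread names.
    Returns the first thread that ends up holding both locks, else None."""
    owner = {"lock1": None, "lock2": None}
    threads = {
        "T1": get_locks("T1", "lock1", "lock2", owner),
        "T2": get_locks("T2", "lock2", "lock1", owner),
    }
    for name in schedule:
        try:
            next(threads[name])
        except StopIteration:
            return name
    return None


print(run(["T1", "T2"] * 500))   # None: 1000 steps executed, no progress
print(run(["T1"] * 10))          # T1: running alone, thread 1 gets both locks
```

Under the lockstep schedule both threads always find the other lock taken, back off, and retry together, so work is done but nobody advances.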

Data Race
Two concurrent accesses to a memory location where at least one of them is a write.
Example:

    int x = 1;
    Parallel.Invoke(
        () => { x = 2; },                           // o1: writes x
        () => { System.Console.WriteLine(x); }      // o2: reads x
    );

Possible outcomes?
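
The possible outcomes can be listed by trying both orders of the two operations. This is a minimal sketch that treats the write and the read as single atomic steps (an assumption that ignores memory-model effects); the function name `outcomes` is made up for the illustration.

```python
from itertools import permutations


def outcomes():
    """Return every value the reader could print for the example above."""
    printed = set()
    for order in permutations(["write", "read"]):
        x = 1
        for op in order:
            if op == "write":
                x = 2            # o1
            else:
                printed.add(x)   # o2 observes the current value of x
    return printed


print(sorted(outcomes()))  # [1, 2]
```

Either value can be printed, depending on which access the scheduler runs first.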

Concurrent Code: What is defined as correct?
[Figure: a parallel queue shared by producers P1, P2, P3, P4 and consumer C1]
Producers: Enq(v1), Enq(v2), Enq(v3), Enq(v4)
Consumer: Deq -> v2, Deq -> v4, Deq -> v1
Is this correct? Depends on what you define as correct!

Correctness Condition
Data type implementations must appear sequentially consistent to the client program: the observed argument and return values must be consistent with some global sequence of the operations.

Observation:

    Thread 1            Thread 2
    enqueue(1)          enqueue(2)
    dequeue() -> 2      dequeue() -> 1

Witness interleaving (serialization):

    enqueue(1)
    enqueue(2)
    dequeue() -> 1
    dequeue() -> 2

Courtesy: Rajeev Alur

Serializability
A concurrent execution is serializable if it is equivalent (~) to some global sequential execution [Papadimitriou '79].

Correctness Condition (continued)
Here, equivalence (~) means that the values seen by each dequeue in the serialization match the originally observed values.
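
For small histories, the existence of a witness interleaving can be checked by brute force. The sketch below searches all interleavings that preserve each thread's program order and replays them against a FIFO queue. The function name `find_witness` and the tuple encoding of operations are assumptions made for this illustration; the search is exponential and meant only for tiny examples.

```python
def find_witness(threads):
    """threads: one list per thread of ("enq", v) or ("deq", observed_v).
    Returns a serialization as [(thread_index, op), ...] or None."""

    def search(pos, queue, trace):
        if all(p == len(t) for p, t in zip(pos, threads)):
            return trace
        for i, ops in enumerate(threads):
            if pos[i] == len(ops):
                continue
            kind, v = ops[pos[i]]
            if kind == "enq":
                nxt = queue + (v,)
            elif queue and queue[0] == v:   # dequeue must return the FIFO head
                nxt = queue[1:]
            else:
                continue                    # this op cannot come next
            new_pos = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
            found = search(new_pos, nxt, trace + [(i, (kind, v))])
            if found is not None:
                return found
        return None

    return search((0,) * len(threads), (), [])


observed = [
    [("enq", 1), ("deq", 2)],   # Thread 1
    [("enq", 2), ("deq", 1)],   # Thread 2
]
print(find_witness(observed))
# [(0, ('enq', 1)), (1, ('enq', 2)), (1, ('deq', 1)), (0, ('deq', 2))]

bad = [[("enq", 1), ("enq", 2)], [("deq", 2), ("deq", 1)]]
print(find_witness(bad))        # None: no FIFO order explains these results
```

The first result is the witness interleaving from the slide; the second history has no witness, so it is not sequentially consistent for a queue.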

Challenges in Parallel Programming: Summary
What could go wrong?
o Deadlocks
o Livelocks
o Data races
What is defined as correct?
o Correctness notions: serializability

Simulation based verification: Challenges
o Concurrent executions are highly non-deterministic
o Rare thread interleavings result in Heisenbugs
  o Difficult to find, reproduce, and debug
o Observing the bug can "fix" it
  o Likelihood of hitting the same bug can change, say, when you add printfs
o A huge productivity problem
  o Developers and testers can spend weeks chasing a single Heisenbug
Courtesy Madan Musuvathi

Execution Replay: Bug reproduction
o Run the program
o On finding a bug, have a mechanism to reproduce it
o Challenging due to non-determinism at various levels:
  o Input
  o Timing
  o Multiple processors

Content-based
Record the input of every instruction.

    Instruction             Recorded input
    …
    add r1,1 → r1           r1 = 10
    load 8(r1) → r2         r1 = 11
    store r2 → 12(r1)       r2 = 401, r1 = 11
    …

+ Instructions can be executed in isolation
- Huge trace files
Courtesy Frank Cornelis et al.

Ordering-based
Record the ordering of important events in the program from a given initial state, to aid replay.
+ Smaller trace files
- Re-execution required
Courtesy Frank Cornelis et al.
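
Ordering-based replay can be sketched in a few lines: log which thread ran at each scheduling point, then re-execute from the same initial state following the log. The generator "threads", the random scheduler, and the names `execute`, `record`, and `replay` are assumptions for this illustration; they do not describe any of the replay systems surveyed in these slides.

```python
import random


def thread_inc(state):
    tmp = state["x"]        # read
    yield                   # possible context switch between read and write
    state["x"] = tmp + 1    # write


def thread_double(state):
    tmp = state["x"]
    yield
    state["x"] = tmp * 2


def execute(choose):
    """Run both threads from x = 1; `choose` picks who runs at each step."""
    state = {"x": 1}
    live = {"A": thread_inc(state), "B": thread_double(state)}
    log = []
    while live:
        name = choose(sorted(live), len(log))
        log.append(name)                 # only the ordering is recorded
        try:
            next(live[name])
        except StopIteration:
            del live[name]
    return state["x"], log


def record(seed):
    rng = random.Random(seed)
    return execute(lambda ready, step: rng.choice(ready))


def replay(log):
    return execute(lambda ready, step: log[step])


final_x, log = record(seed=7)
assert replay(log) == (final_x, log)     # same order, same result
```

The log holds four thread names instead of every value read, which is the "smaller trace files" advantage; the price is that the program has to be re-executed to recover the values.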

Execution replay systems: Summary
Systems: DejaVu 1, DejaVu 2, IGOR, Instant Replay, Interrupt Replay, JaRec, jRapture, Recap, RecPlay, RSA, Tornado
Sources of non-determinism: Input, System Calls, Interrupts, Shared-Memory (content-based), Shared-Memory (ordering-based)
[Table: systems versus the sources of non-determinism each one handles]
Courtesy Frank Cornelis et al.

CHESS: Smart bug hunting

    Thread 1        Thread 2
    x = 1;          x = 2;
    y = 1;          y = 2;

[Figure: tree of (x, y) states reachable from (0, 0) under all interleavings of the two threads]
o Probability of finding a bug is low due to the number of interleavings
o Probability of hitting a buggy branch is low!
Courtesy Madan Musuvathi et al.
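
The state tree for the two threads above can be regenerated by enumerating every interleaving. This is a minimal sketch, assuming each assignment is one atomic step and that x and y start at 0; the helper names `interleavings` and `final_states` are made up for the illustration.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that keeps each list's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def final_states():
    thread1 = [("x", 1), ("y", 1)]
    thread2 = [("x", 2), ("y", 2)]
    results = []
    for schedule in interleavings(thread1, thread2):
        state = {"x": 0, "y": 0}
        for var, value in schedule:
            state[var] = value
        results.append((state["x"], state["y"]))
    return results


states = final_states()
print(len(states))            # 6 interleavings
print(sorted(set(states)))    # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Only two of the six schedules end in (1, 2) or (2, 1), so a bug that depends on a mixed state is hit only by a minority of schedules even in this tiny program.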

CHESS
CHESS is a user-mode scheduler
o Controls all scheduling non-determinism
Guarantees:
o Every program run takes a different thread interleaving
o The interleaving of every run can be reproduced
Courtesy Madan Musuvathi et al.

CHESS: Basic Idea
o Synchronization wrappers: all important calls such as enter_critical and create_thread are wrapped by CHESS-defined functions, and programs call these wrappers instead.
o The wrappers inform the CHESS scheduler about process status and also make the actual system calls.

CHESS Synchronization Wrappers
o Expose nondeterministic choices
o Provide enabled information

    CHESS_EnterCS {
        while (true) {
            canBlock = !TryEnterCS(&cs);    // a failed try means the real call would block
            if (!canBlock) return;          // critical section entered
            Sched.Disable(currThread);      // report this thread as disabled
        }
    }

The CHESS scheduler finds disabled threads this way.

CHESS Scheduler
o Introduce an event per thread; every thread blocks on its event
o The scheduler wakes one thread at a time by enabling the corresponding event
o The scheduler does not wake up a disabled thread
  o It needs to know when a thread can make progress
  o Wrappers for synchronization provide this information
o The scheduler has to pick one of the enabled threads
  o The exploration engine decides for the scheduler
Courtesy Madan Musuvathi et al.
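
The scheduling loop described above can be imitated in miniature. The sketch below is not CHESS: it swaps real threads blocked on events for Python generators, and every name in it (`Lock`, `acquire`, `worker`, `run`, `round_robin`) is hypothetical. It keeps the three ingredients from the slide: one thread runs at a time, wrappers tell the scheduler when a thread is disabled, and a pluggable `choose` function plays the exploration engine.

```python
import itertools


class Lock:
    def __init__(self):
        self.owner = None            # None means the lock is free


def acquire(lock, me):
    """Wrapper: never block for real; hand the lock to the scheduler instead."""
    while lock.owner is not None:
        yield lock                   # "I would block on this lock"
    lock.owner = me


def worker(me, first, second):
    yield from acquire(first, me)
    yield None                       # scheduling point between the two lock calls
    yield from acquire(second, me)
    yield None
    second.owner = None              # unlock both
    first.owner = None


def run(threads, choose):
    """threads: name -> generator; choose: picks one name among the enabled ones."""
    blocked_on = {}                  # name -> lock the thread last failed to take
    while threads:
        enabled = sorted(n for n in threads
                         if n not in blocked_on or blocked_on[n].owner is None)
        if not enabled:
            return "deadlock"        # threads remain, none can make progress
        name = choose(enabled)
        blocked_on.pop(name, None)
        try:
            wanted = next(threads[name])
        except StopIteration:
            del threads[name]
            continue
        if wanted is not None:
            blocked_on[name] = wanted
    return "done"


def program():
    S, Q = Lock(), Lock()
    return {"P0": worker("P0", S, Q), "P1": worker("P1", Q, S)}


def round_robin():
    turn = itertools.count()
    return lambda enabled: enabled[next(turn) % len(enabled)]


print(run(program(), lambda enabled: enabled[0]))   # done
print(run(program(), round_robin()))                # deadlock
```

Because the choice function fully determines the run, the same schedule always reproduces the same outcome, and swapping in a different chooser explores a different interleaving of the same program.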

CHESS: State space explosion
n threads, each executing k steps (x = 1; … y = k;)
o Number of executions = $O(n^{nk})$
o Exponential in both n and k
  o Typically: n < 10, k > 100
o Limits scalability to large programs
Huge state space!

CHESS: State space explosion
Bounding preemptions: choose c preemption points, then permute the resulting n + c atomic blocks.
o Terminating program with fixed inputs and deterministic threads
o n threads, k steps each, c preemptions
o Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O((n^2 k)^c \cdot n!)$
o Exponential in n and c, but not in k
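
A quick calculation shows how much the preemption bound shrinks the search space. The sketch below evaluates the slide's bound directly; for comparison it also computes the exact number of unrestricted interleavings, (nk)! / (k!)^n, a standard counting result that is not stated on the slide. Function names are illustrative.

```python
from math import comb, factorial


def all_interleavings(n, k):
    """Exact number of interleavings of n threads with k atomic steps each."""
    return factorial(n * k) // factorial(k) ** n


def preemption_bounded(n, k, c):
    """Upper bound from the slide: C(nk, c) * (n + c)!"""
    return comb(n * k, c) * factorial(n + c)


print(all_interleavings(2, 2))            # 6
print(preemption_bounded(2, 100, 2))      # 477600
print(all_interleavings(2, 100) > 10**58) # True
```

With two threads of 100 steps each, allowing at most two preemptions leaves under half a million schedules, while the unrestricted count exceeds 10^58.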

CHESS: Summary
CHESS is a tool for
o Systematically enumerating thread interleavings
  o Algorithms for systematic scheduling
o Reliably reproducing concurrent executions
It limits the state space by limiting preemption points.

Challenges in Simulation
o Too many traces
o Poor absolute coverage
o Difficult to derive useful traces
o Difficult to characterize true coverage
Courtesy Valeria Bertacco

Runtime Verification
o On-the-fly checking
o Focus on current trace
o Complete coverage
Courtesy Valeria Bertacco

Runtime Verification Case Study: Transactional Memory

What is Transactional Memory?
A transaction takes the system from state S1 to state S2 between its transaction boundaries:

    begin_transaction
        read a;
        write b;
    end_transaction

The modified system state (S2) is visible only if the transaction completes execution (commit); otherwise (abort) the earlier state (S1) is preserved.
Courtesy Arnab Sinha et al.

Serializability in Transactional Memory
Serializability: a global order of operations (transaction execution) exists.

Trace 1. init: a = b = 0;

    // Transaction T1        // Transaction T2
    begin_transaction        begin_transaction
    read a;                  write b := 1;
    read b;                  end_transaction
    end_transaction

Transaction T1 reads (a, b) = (0, 1). Transaction T2 writes b, leaving (a, b) = (0, 1). Hence, in effect, T1 follows T2. Therefore, this execution trace is serializable!

Trace 2. init: a = b = 0;

    // Transaction T1        // Transaction T2
    begin_transaction        begin_transaction
    read a;                  write a := 1;
    read b;                  write b := 1;
    end_transaction          end_transaction

Transaction T1 reads (a, b) = (0, 1). Transaction T2 writes (a, b) = (1, 1). Assume that this trace is serializable.
o Case 1: T1 < T2. T1 should have read (0, 0).
o Case 2: T2 < T1. T1 should have read (1, 1).
Contradiction! Therefore, this execution trace is not serializable!
Courtesy Arnab Sinha et al.

Serializability Checking through Graph Checking
o Vertices are the transactions.
o Edges represent the conflicting shared data accesses (e.g. RAW, WAR and WAW).
o Edge A → B if transaction A accesses the shared data before transaction B.

When T2 writes only b (T1 reads a and b, and sees b = 1): the only edge is T2 → T1 (RAW). No cycle, so the execution is serializable.
When T2 writes both a and b (T1 reads a = 0 and b = 1): the edges are T1 → T2 (WAR) and T2 → T1 (RAW). A cycle exists, therefore this execution trace is not serializable!
Courtesy Arnab Sinha et al.
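
The graph check can be sketched as follows. The event log format `(transaction, "r" or "w", variable)` in time order, and the function names `conflict_graph` and `has_cycle`, are assumptions made for this illustration; the sketch compares every pair of events and does no graph compaction.

```python
def conflict_graph(log):
    """Return the set of edges (earlier_txn, later_txn, kind) for conflicting accesses."""
    kinds = {"wr": "RAW", "rw": "WAR", "ww": "WAW"}
    edges = set()
    for i, (t1, op1, var1) in enumerate(log):
        for t2, op2, var2 in log[i + 1:]:
            if t1 != t2 and var1 == var2 and "w" in (op1, op2):
                edges.add((t1, t2, kinds[op1 + op2]))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by `edges`."""
    graph = {}
    for a, b, _ in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


# T2 writes only b; T1 reads a, then (after T2's write) b.
trace1 = [("T1", "r", "a"), ("T2", "w", "b"), ("T1", "r", "b")]
# T2 writes a and b between T1's two reads.
trace2 = [("T1", "r", "a"), ("T2", "w", "a"), ("T2", "w", "b"), ("T1", "r", "b")]

print(has_cycle(conflict_graph(trace1)))   # False: serializable
print(has_cycle(conflict_graph(trace2)))   # True: not serializable
```

The two logs correspond to the two example traces: the first yields a single RAW edge, the second a WAR edge and a RAW edge in opposite directions.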

Top-Level Algorithm
Transaction threads → logging the "interesting" events → graph construction (DSR graph) → graph compaction (DSR graph)
o Challenge 1: Minimize logging overhead on transactions
o Challenge 2: Efficient graph construction
o Challenge 3: Efficient graph compaction
Courtesy Arnab Sinha et al.

Runtime Verification Case Study: ERASER

ERASER
Eraser checks that all shared memory accesses follow a consistent locking discipline.
o A locking discipline is a programming policy that ensures the absence of data races.
o E.g. every variable shared between threads is protected by a mutual exclusion lock.
Courtesy: Stefan Savage et al.

Lockset Algorithm
o The basic lockset algorithm enforces the locking discipline that every shared variable is protected by some lock.
o Eraser checks whether the program respects this discipline by monitoring all reads and writes as the program executes.
o Eraser infers the protection relation from the execution history.
Courtesy: Stefan Savage et al.

Lockset Algorithm
Locks: mu1, mu2. Shared variable: v. C(v): candidate locks for variable v.

    Program           locks_held       C(v)
    (start)           { }              {mu1, mu2}
    lock(mu1);        {mu1}
    lock(mu2);        {mu1, mu2}
    v := v+1;
    unlock(mu2);      {mu1}
    v := v+2;                          {mu1}
    unlock(mu1);      { }
    lock(mu2);        {mu2}
    v := v+1;                          { }   issues an alarm!!

Courtesy: Stefan Savage et al.

Lockset Algorithm
If some lock l consistently protects v, it will remain in C(v) as C(v) is refined. If C(v) becomes empty, this indicates that there is no lock that consistently protects v.
In summary, the first lockset algorithm is:
o Let locks_held(t) be the set of locks held by thread t.
o For each v, initialize C(v) to the set of all locks.
o On each access to v by thread t, set C(v) := C(v) ∩ locks_held(t); if C(v) = { }, then issue a warning.
Courtesy: Stefan Savage et al.
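
The three steps of the first lockset algorithm translate almost line for line into code. The sketch below covers only that basic algorithm; the trace format `(thread, action, target)` and the name `lockset_check` are assumptions for this illustration, not Eraser's actual instrumentation interface.

```python
def lockset_check(trace, all_locks):
    """Run the basic lockset algorithm over a recorded trace.
    trace: list of (thread, action, target), action in {"lock", "unlock", "access"}.
    Returns (C, warnings), where warnings lists (step_index, variable)."""
    locks_held = {}     # thread -> set of locks currently held
    C = {}              # variable -> candidate lock set C(v)
    warnings = []
    for step, (thread, action, target) in enumerate(trace):
        held = locks_held.setdefault(thread, set())
        if action == "lock":
            held.add(target)
        elif action == "unlock":
            held.discard(target)
        else:
            candidates = C.setdefault(target, set(all_locks))
            candidates &= held              # C(v) := C(v) ∩ locks_held(t)
            if not candidates:
                warnings.append((step, target))
    return C, warnings


# The example program with locks mu1, mu2 and shared variable v, run by one thread.
trace = [
    ("t", "lock", "mu1"),
    ("t", "lock", "mu2"),
    ("t", "access", "v"),     # C(v) stays {mu1, mu2}
    ("t", "unlock", "mu2"),
    ("t", "access", "v"),     # C(v) becomes {mu1}
    ("t", "unlock", "mu1"),
    ("t", "lock", "mu2"),
    ("t", "access", "v"),     # C(v) becomes {} and a warning is issued
]
C, warnings = lockset_check(trace, {"mu1", "mu2"})
print(warnings)   # [(7, 'v')]
```

The warning fires on the last access because no single lock was held across all three accesses to v, even though each access was made with some lock held.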

Summary
o Challenges in parallel programming
o Simulation based verification
  o Execution Replay
  o Systematic bug hunting: CHESS
o Runtime Verification
  o Transactional Memory
  o Eraser (Lockset Algorithm)

Acknowledgements
Yun Zhang, Arnab Sinha, Arun Raman, Stephen Beard, Soumyadeep Ghosh, Daniel Schwartz-Narbonne, Prof. David Walker, Prof. Sharad Malik, Prof. David August

THANK YOU