Dynamic Data Race Detection

Sources

Eraser: A Dynamic Data Race Detector for Multithreaded Programs – Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas Anderson, ACM Transactions on Computer Systems, Vol. 15, No. 4, November 1997

RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking – Yuan Yu, Tom Rodeheffer, Wei Chen, Proceedings of SOSP ’05, ACM, 2005

The Shared Problem

Problem: data race detection in multithreaded programs (which implies shared memory).
Solution: a tool that automates the detection of potential data races – each paper describes a different technique.
Basic idea: look for “unprotected” accesses to shared variables.
Why it matters: synchronization errors caused by data races are timing dependent and hard to find.

Data Race

“A data race occurs when two concurrent threads access a shared variable and …
– at least one access is a write, and
– the threads use no explicit mechanism to prevent the accesses from being simultaneous.”
In other words, a data race is a potential violation of mutual exclusion.

Data Race Example: Threads with unsynchronized access to a shared array

Thread 1:
  int i;
  …
  for (i = 1; i < MAX; i++) {
    cin >> x;
    A[i] = 2*x;
  }
  …

Thread 2:
  int i;
  …
  for (i = 1; i < MAX; i++) {
    if (A[i] < B[i])
      B[i] = A[i];
  }
  …
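A runnable sketch of the same race using std::thread (MAX, A, and B follow the slide; the cin >> x input is replaced by a computed value so the example is self-contained):

// race_demo.cpp -- compile with: g++ -std=c++17 -pthread race_demo.cpp
#include <thread>
#include <iostream>

const int MAX = 1000;
int A[MAX];
int B[MAX];

void writer() {
    for (int i = 1; i < MAX; i++)
        A[i] = 2 * i;              // unsynchronized write to A[i]
}

void reader() {
    for (int i = 1; i < MAX; i++)
        if (A[i] < B[i])           // unsynchronized read of A[i]
            B[i] = A[i];
}

int main() {
    std::thread t1(writer);        // both threads touch A with no lock:
    std::thread t2(reader);        // a data race on every element of A
    t1.join();
    t2.join();
    std::cout << "done\n";
}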

Static Data Race Detection

Static data race detection is done at compile time.
– Type-based methods: a language-level approach
– Path analysis of the code: a compile-time approach
Hard to apply to dynamically allocated data.
Doesn’t scale well to large programs.
Many false positives – it is hard to reason statically about execution behavior.

Dynamic Data Race Detection

Dynamic detection is done by code that monitors the software during execution.
– The program may be “instrumented” with additional instructions.
– The additions don’t change program functionality; they monitor conditions of interest – in this case, accesses to shared variables and synchronization operations.

Dynamic Detection

Post-mortem or on-the-fly analysis of execution traces.
Problems:
– Can only check paths that are actually executed
– Adds overhead at runtime
Techniques:
– Happens-before (the earliest dynamic technique)
– Lockset analysis (Eraser)
– Various hybrids (RaceTrack)

Dynamic Race Detection Using Happens-Before

For data race detection, the happens-before relation is defined in terms of accesses to synchronization objects (locks) that order separate threads.
– Compare the use of messages to synchronize separate processes in earlier applications.
Within a single thread, happens-before reflects the temporal order of events (as always).

Happens-Before Relation

Between threads, events are causally connected when a lock is accessed in one thread (A) and the next access to that lock occurs in a different thread (B) – lock accesses play the role of message exchanges.
Accesses must obey the semantics of locks:
– only the owner of a lock can unlock it,
– two threads can’t hold the same lock simultaneously.

Happens-Before Relation

Let event a be in thread A and event b be in thread B.
– If a = unlock(mu) and b is the next lock(mu), then a → b (a happens-before b).
Data races between threads are possible when accesses to shared variables are not ordered by happens-before.
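A rough sketch of how a detector might realize this edge with vector clocks (on_lock, on_unlock, and LockState are invented names; all clocks are assumed to be pre-sized to the number of threads):

#include <algorithm>
#include <cstddef>
#include <vector>

using VectorClock = std::vector<unsigned>;        // one slot per thread

struct LockState { VectorClock release_clock; };  // clock published at the last unlock

void on_unlock(int tid, VectorClock& thread_clock, LockState& mu) {
    thread_clock[tid]++;                 // the unlock is an event in this thread
    mu.release_clock = thread_clock;     // publish this thread's knowledge on the lock
}

void on_lock(int tid, VectorClock& thread_clock, const LockState& mu) {
    // The acquire "receives" the clock published at the previous unlock,
    // which is exactly the unlock(mu) -> lock(mu) happens-before edge.
    for (std::size_t i = 0; i < mu.release_clock.size(); i++)
        thread_clock[i] = std::max(thread_clock[i], mu.release_clock[i]);
    thread_clock[tid]++;                 // the lock itself is also an event
}

Two accesses are then ordered by happens-before exactly when the earlier access is already reflected in the later thread's clock.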

EXAMPLE: Fig. 1

Thread 1:
  lock(mu);
  v = v + 1;
  unlock(mu);

Thread 2:
  lock(mu);
  v = v + 1;
  unlock(mu);

The arrows in the figure represent happens-before; the events represent an actual execution of the two threads. Instead of a logical clock, each thread might maintain a “most recent event” variable. In Thread 1, the most recent event is unlock(mu); when Thread 2 executes lock(mu), the system can establish the happens-before relation.

EXAMPLE: Fig. 2

Thread 1:
  y = y + 1;
  lock(mu);
  v = v + 1;
  unlock(mu);

Thread 2:
  lock(mu);
  v = v + 1;
  unlock(mu);
  y = y + 1;

In this execution, accesses to both y and v are ordered by happens-before, so no data race is observed. But a different execution ordering could give different results: happens-before analysis only detects a data race if the dangerous order actually shows up in the execution trace.

EXAMPLE: Fig. 2 (continued)

Thread 1:
  y = y + 1;
  lock(mu);
  v = v + 1;
  unlock(mu);

Thread 2:
  lock(mu);
  v = v + 1;
  unlock(mu);
  y = y + 1;

If Thread 2 executes before Thread 1, happens-before no longer orders the two accesses to y, so a possible data race exists and should be reported to the programmer. The accesses to y are “concurrent,” since neither a → b nor b → a holds.

Problems with Happens-Before

Eraser would find this error for any test case that exercised both code paths, regardless of order; happens-before analysis only works if the dangerous schedule is actually executed. Since there are many possible interleavings, you can’t test them all, so you might miss a potential error. Eraser can also miss some data races, but it catches more than tools based only on happens-before.

Lockset Analysis: Background

Lock: a synchronization object that is either available or owned by a thread.
– Operations: lock(mu) and unlock(mu).
– No explicit initialize operation.
Compare to a binary semaphore:
– lock() ~ P(); unlock() ~ V()
– The lock() operation blocks if the lock is owned by another thread.

Background

Simple mutex locks are not the only kind; some systems provide others:
– Read/write locks permit multiple concurrent readers but only one writer.
Some shared-memory accesses don’t need locks at all:
– Read-only data: initialized once and then never written again.
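For instance, a read/write lock in modern C++ can be sketched like this (table_mu and table are made-up names; std::shared_mutex requires C++17):

#include <shared_mutex>
#include <map>
#include <string>

std::shared_mutex table_mu;           // hypothetical read/write lock
std::map<std::string, int> table;     // shared data it protects

int lookup(const std::string& key) {
    std::shared_lock<std::shared_mutex> rd(table_mu);   // many readers may hold this at once
    auto it = table.find(key);
    return it == table.end() ? -1 : it->second;
}

void update(const std::string& key, int value) {
    std::unique_lock<std::shared_mutex> wr(table_mu);   // only one writer at a time
    table[key] = value;
}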

Basic Premise of Eraser

Observe every access to a shared variable by any thread. If there is any chance a data race can occur, check that the shared variable is protected by a lock.
– Simple algorithm: basic locks
– Refined algorithm: reader/writer locks
If a variable isn’t protected, issue a warning.

How Eraser Works

Eraser requires each shared variable to be protected by a lock (the same lock for all threads). It monitors all reads and writes (loads and stores) of each variable as the program runs and must deduce which locks protect each shared variable. Eraser assumes it knows the full set of locks in advance (they must be declared in the code). Checking is done at the word level; that is, each word of memory is treated as a variable.

How It Works (see Section 2)

For each variable v, build a set C(v) of candidate locks (locks that may be protecting v).
– A lock l is in C(v) if every thread that has accessed v so far was holding l at the time of the access.
Lockset refinement: C(v) is adjusted every time v is accessed. If C(v) becomes empty, the variable is assumed to be unprotected.

The First Lockset Algorithm

Let locks_held(t) be the set of locks held by thread t (a per-thread structure).
For each variable v, initialize C(v) to the set of all locks (a per-variable structure). Lock sets change over time.
Each time a thread t accesses variable v:
– set C(v) = C(v) ∩ locks_held(t)
– if C(v) = { } (the empty set), issue a warning
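A minimal sketch of this refinement loop (all names are invented; real Eraser instruments binaries and works at word granularity rather than on addresses passed to a function):

#include <set>
#include <map>
#include <cstdio>

using LockSet = std::set<int>;                       // locks identified by small ids

std::map<int, LockSet> locks_held;                   // locks_held(t), per thread
std::map<void*, LockSet> C;                          // candidate set C(v), per variable
LockSet all_locks;                                   // assumed known in advance

void on_access(int tid, void* v) {
    if (C.find(v) == C.end())
        C[v] = all_locks;                            // C(v) starts as the set of all locks
    LockSet refined;
    for (int l : C[v])                               // C(v) := C(v) ∩ locks_held(t)
        if (locks_held[tid].count(l))
            refined.insert(l);
    C[v] = refined;
    if (C[v].empty())                                // no lock consistently protects v
        std::printf("warning: possible data race on %p\n", v);
}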

Example (Fig. 3)

If a program has two locks, mu1 and mu2, then C(v) is initially {mu1, mu2}. If the first access to v is made by a thread holding only mu1, then C(v) becomes C(v) ∩ locks_held(t) = {mu1}. If the second access to v is made by a thread holding only mu2, then C(v) becomes {mu1} ∩ {mu2} = { }, and a warning is issued.

Refining the Lockset Algorithm

The previous algorithm misses no unprotected shared variable, but it flags some situations as potential races when in fact they aren’t – false alarms:
– Variable initialization (restricted to one thread)
– Shared variables that are read-only
Sections 2.2 and 2.3 discuss refinements that avoid some of these false alarms and handle read/write locks as well as simple locks.

Refinements

Until a variable is accessed by a second thread, there is no danger of a data race, so there is no need to report warnings for it.

[Figure 4: state transitions for a memory location – states virgin, exclusive, shared, and shared-modified; a write moves virgin to exclusive, a read by a new thread moves exclusive to shared, a write by a new thread moves exclusive to shared-modified, and a write moves shared to shared-modified.]
The state is based on whether the location has been accessed at all, accessed by more than one thread, accessed in read mode only, etc. Race conditions are reported only for locations in the shared-modified state.
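A sketch of the transition function implied by Figure 4 (the enum and function names are invented; accesses by the original thread are assumed to leave the state unchanged):

enum class State { Virgin, Exclusive, Shared, SharedModified };

State on_access(State s, bool is_write, bool new_thread) {
    switch (s) {
    case State::Virgin:
        return is_write ? State::Exclusive : s;      // first write moves to exclusive
    case State::Exclusive:                           // one thread so far: no checking yet
        if (new_thread)
            return is_write ? State::SharedModified : State::Shared;
        return s;
    case State::Shared:                              // read-shared: refine C(v), but no reports
        return is_write ? State::SharedModified : s;
    case State::SharedModified:                      // refine C(v) and report if it empties
        return s;
    }
    return s;
}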

Implementing Eraser Eraser instruments the program binary by inserting calls to the Eraser runtime functions. Each load and store is instrumented if it accesses global or heap data. Stack data is assumed not to be shared. The storage allocator is also instrumented to initialize C(v) for dynamic data.

Implementing Eraser

Each lock and unlock operation is instrumented to keep locks_held(t) up to date. When a race is suspected (a reference to a shared variable that isn’t consistently protected by a lock), Eraser reports the file name and line number, plus other information that helps the programmer locate the problem.
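A rough idea of the runtime interface the instrumented binary might call (all names are invented; the race check shown is a deliberately simplified stand-in for the lockset refinement above):

#include <set>
#include <map>
#include <cstdio>

static std::map<int, std::set<void*>> locks_held;     // per-thread held-lock sets

void EraserLock(int tid, void* mu)   { locks_held[tid].insert(mu); }   // inserted after lock(mu)
void EraserUnlock(int tid, void* mu) { locks_held[tid].erase(mu); }    // inserted before unlock(mu)

void EraserStore(int tid, void* addr, const char* file, int line) {
    // Simplified stand-in: warn if the storing thread holds no lock at all.
    // Real Eraser instead intersects locks_held[tid] with C(addr) and warns
    // only when that candidate set becomes empty.
    if (locks_held[tid].empty())
        std::printf("possible race: %s:%d, address %p, thread %d\n",
                    file, line, addr, tid);
}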

Conclusions A number of systems (AltaVista, the Petal distributed file system) were used as testbeds. Undergraduate programs were also tested. Eraser found a number of potential race conditions and had a few false alarms. Experienced programmers did better than undergraduates!

Summary/Review

Data race detection can be done statically or dynamically.
– Static: compile-time analysis; examine all paths, or extend the language’s type system to capture synchronization relationships.
– Dynamic: run-time analysis; can only catch errors in code that is actually executed – it can’t examine all paths.
Eraser does a better job than happens-before methods: it reports potential races in the monitored code regardless of which interleaving was observed.

EXAMPLE: Fig. 2 (revisited)

Thread 1:
  y = y + 1;
  lock(mu);
  v = v + 1;
  unlock(mu);

Thread 2:
  lock(mu);
  v = v + 1;
  unlock(mu);
  y = y + 1;

Eraser would notice that y is not protected by any lock and would report the data race, even though happens-before analysis of this particular execution would not.

Summary/Review

Two earlier techniques:
– Lockset analysis (Eraser): enforces the discipline that every shared variable is protected by a lock. Possible false positives, slow, not “sound” in the sense that it can report false alarms, but relatively unaffected by execution order. It may still miss races if a dangerous code path is never exercised.
– Happens-before analysis: based on Lamport’s relation; establishes a partial order over statements using synchronization events. No false positives, but possible false negatives, because it only sees the observed interleaving.

RaceTrack

Claim: improves on lockset analysis by looking for data races only when shared data is actually being accessed concurrently.
– Eraser does this too, but only in a limited fashion.
RaceTrack handles locks as well as fork-join parallelism, monitors library code as well as application code, and is sensitive to execution traces.

Fork-Join

A way to achieve parallelism:
– The parent thread creates (forks) several sub-threads.
– The parent thread pauses.
– The forked threads report their results; the parent thread resumes execution (the join) and combines the child results.
Similar to the UNIX approach with processes, but at a finer granularity.
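In C++ terms, a minimal fork-join sketch might look like this (the data, partial_sum, and sums names are illustrative, not from the paper); the fork and join points are exactly where RaceTrack updates its vector clocks:

#include <thread>
#include <vector>
#include <numeric>
#include <iostream>

int main() {
    std::vector<int> data(1000, 1);
    long sums[2] = {0, 0};

    auto partial_sum = [&](int half) {               // work done by each forked child
        int begin = half * (int)data.size() / 2;
        int end   = (half + 1) * (int)data.size() / 2;
        sums[half] = std::accumulate(data.begin() + begin, data.begin() + end, 0L);
    };

    std::thread child0(partial_sum, 0);              // fork
    std::thread child1(partial_sum, 1);
    child0.join();                                   // join: parent waits for children
    child1.join();
    std::cout << "total = " << sums[0] + sums[1] << "\n";   // combine child results
}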

RaceTrack

RaceTrack does not claim to detect all concurrent accesses (i.e., there may be false negatives). Why: to detect every instance of concurrency, the tool would have to keep a complete access history for each shared variable.
– Instead, RaceTrack uses estimation techniques to prune the threadset (the set of recent accesses) and the lockset.

Tool Environment

Target: large multithreaded object-oriented programs running on the .NET platform.
– All code is translated into an intermediate language (IL), which is later compiled into platform-specific code by the JIT compiler in the Common Language Runtime (CLR) (Fig. 1).
The CLR manages all runtime activities: object allocation, thread creation, garbage collection, and exception handling.

Tool Environment

RaceTrack instruments at the virtual machine level (the CLR is analogous to the JVM).
– The JIT compiler in the CLR inserts calls to the RaceTrack runtime as it generates native code.
RaceTrack is language independent, since applications run directly on the modified runtime environment.

RaceTrack versus Lockset

Lockset-based detection does not take fork/join operations into account (if only one thread exists, no data race is possible), nor asynchronous (non-blocking) calls.
– Result: false alarms.
Observation: a data race can occur only if several threads are accessing the variable concurrently.

RaceTrack Approach

RaceTrack maintains a lockset Cx for each shared variable x, but it also maintains a current threadset Sx.
Threadset = a set of concurrent accesses, where “concurrent” is defined in terms of vector clocks.
– A thread’s virtual clock ticks at certain synchronization operations.
– Synchronization operations transfer clock values to other threads, which use them to update their own vector clocks (just as messages are used in the earlier examples).

Threadsets

Whenever a thread Tj accesses a shared variable, it adds an entry (a label) to that variable’s threadset.
– Label = (thread id, timestamp of the access)
Tj then uses happens-before analysis, based on the vector clocks, to “prune” the threadset.
– Any label Li in the threadset that happens-before the current access made by Tj is removed.
– Any remaining accesses are considered “concurrent.”
A race is not considered a threat if the threadset is a singleton.

Basic Algorithm – Threads

Each thread t has a lockset Lt (locks held) and a vector clock Bt.
– Lockset: the locks currently held by t.
– Vector clock: the most recent information about the logical clocks of t and of all other threads.
Lock and unlock operations update Lt; fork/join operations also update the vector clocks. At thread creation, the local clock is set to 1 and the lockset is set to the empty set.

Basic Algorithm – Variables

Each variable x has a lockset Cx and a threadset Sx, where Cx is the set of locks that are (potentially) protecting x and Sx is the current set of concurrent accesses to x.
– Initially, Sx is the empty set { } and Cx is initialized to the set of all possible locks.
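A condensed sketch of this bookkeeping (ThreadState, VarState, on_access and the warning policy are invented names; the read/write distinction and the adaptive granularity of the real algorithm are omitted, and clocks are assumed to be pre-sized to the number of threads):

#include <set>
#include <map>
#include <vector>
#include <cstdio>

using VectorClock = std::vector<unsigned>;

struct ThreadState {
    std::set<int> L;        // Lt: locks currently held
    VectorClock   B;        // Bt: this thread's vector clock
};

struct Label { int tid; unsigned timestamp; };       // (thread id, clock at access)

struct VarState {
    std::set<int>      C;   // Cx: candidate locks, initially all locks
    std::vector<Label> S;   // Sx: concurrent accesses seen so far, initially empty
};

// a happens-before the current access iff a's timestamp is visible in the accessor's clock
bool happens_before(const Label& a, const VectorClock& b_clock) {
    return a.tid < (int)b_clock.size() && a.timestamp <= b_clock[a.tid];
}

void on_access(int tid, ThreadState& t, VarState& x) {
    // prune labels that are ordered before the current access
    std::vector<Label> pruned;
    for (const Label& l : x.S)
        if (!happens_before(l, t.B))
            pruned.push_back(l);                     // still concurrent with this access
    pruned.push_back({tid, t.B[tid]});               // record the current access
    x.S = pruned;

    // refine the lockset as in Eraser
    std::set<int> refined;
    for (int l : x.C)
        if (t.L.count(l)) refined.insert(l);
    x.C = refined;

    if (x.S.size() > 1 && x.C.empty())               // concurrent accesses with no common lock
        std::printf("possible race on this variable\n");
}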

RaceTrack Approach

RaceTrack adjusts its monitoring granularity from the object level to the field level based on program behavior. It issues warnings on the fly and then performs a more careful analysis post-mortem.

RaceTrack Benefits

Coverage: instrumenting in the JIT compiler lets any code be instrumented and monitored.
Accuracy: monitoring at a fine granularity (field, individual array element) improves detection accuracy, and happens-before analysis filters out some false positives that lockset analysis alone would flag.
Performance: monitoring is adaptive – the level of detail is reduced when races are unlikely.
Scalability: good, due to low overhead and ease of instrumentation.

Future Work (RaceTrack) Add deadlock detection mechanisms to flag lock acquisitions that are ordered incorrectly.

Example of Potential Deadlock

Global variables x, y; semaphores sx = sy = 1.

Thread 1:
  P(sx);
  P(sy);
  x = f1(x,y);
  y = f2(x,y);
  V(sy);
  V(sx);

Thread 2:
  P(sy);
  P(sx);
  x = p1(x,y);
  y = p2(x,y);
  V(sx);
  V(sy);

Thread 1 acquires sx then sy, while Thread 2 acquires sy then sx – if each acquires its first semaphore before the other’s second P( ), both block forever.
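The usual repair such a detector would point to is acquiring the locks in one agreed order in every thread. A sketch in C++ (mutexes stand in for the semaphores, and the f1/f2/p1/p2 bodies are replaced by placeholder arithmetic):

#include <mutex>

std::mutex sx, sy;          // standing in for the semaphores in the slide
int x = 0, y = 0;

void thread1_body() {
    std::scoped_lock both(sx, sy);   // acquires both locks without deadlock
    x = x + y;                       // placeholder for f1/f2
    y = x - y;
}

void thread2_body() {
    std::scoped_lock both(sx, sy);   // same order/mechanism in every thread
    x = x + y;                       // placeholder for p1/p2
    y = x - y;
}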