FastTrack: Efficient and Precise Dynamic Race Detection [FlFr09] Cormac Flanagan and Stephen N. Freund GNU OS Lab. 23-Jun-16 Ok-kyoon Ha.


[FlFr09] PLDI’09

Contents
- Introduction
- Background
- The FastTrack Algorithm
- Implementation
- Evaluation
- Conclusions

Introduction
- Motivation
  - vector clocks (VCs) are expensive
    - a VC requires O(n) storage space, and each VC operation requires O(n) time
    - FastTrack is motivated in part by these performance limitations of vector clocks
  - limitations of existing detectors
    - imprecise race detectors and static race detectors can report false alarms
    - precise race detectors never produce false alarms, but they are limited by the performance overhead of VCs
  - the full generality of vector clocks is not actually necessary in most cases
    - the vast majority of data in multithreaded programs is either thread-local, lock-protected, or read-shared
    - constant-time fast paths can handle these common cases without any loss of precision or correctness in the general case

- FastTrack Overview
  - uses epochs
    - an epoch is a pair of a clock and a thread identifier
  - for write accesses: records information only about the very last write to x
    - all writes to x are totally ordered by the happens-before relation (as long as no race has been detected)
  - for read accesses: records only the epoch of the last read of x
    - read operations on thread-local and lock-protected data are totally ordered
  - reduces the overhead of almost all monitored operations
    - analysis time: from O(n) to O(1), where n is the number of threads in the program
    - space: from O(n) to O(1)
      - holds only for thread-local and lock-protected data

Background
- Multithreaded Program Traces
  - a thread t can perform the following operations
    - rd(t, x) and wr(t, x): read a value from x and write a value to x
    - acq(t, m) and rel(t, m): acquire and release a lock m
    - fork(t, u): fork a new thread u
    - join(t, u): block until thread u terminates
  - happens-before relation <_α
    - the smallest transitively closed relation over the operations in a trace α
    - a <_α b whenever a occurs before b in α and one of the following holds: program order, locking, or fork-join
  - race condition
    - two operations in a trace are concurrent if they are not related by the happens-before relation
    - a trace has a race condition if it has two concurrent conflicting accesses (accesses to the same variable, at least one of which is a write)
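The definitions above can be made concrete with a brute-force sketch that builds the happens-before relation of a small trace and lists its concurrent conflicting accesses. This is only an illustration of the definition, not how FastTrack or DJIT+ work. The tuple encoding `(op, thread, target)` and the function names are assumptions made for this example, and the locking rule is simplified to "a release followed by a later acquire of the same lock".

```python
from itertools import combinations

def happens_before(trace):
    """Return the set of index pairs (i, j) such that trace[i] <_α trace[j]."""
    n = len(trace)
    hb = set()
    for i, j in combinations(range(n), 2):          # i occurs before j in the trace
        (op_i, t_i, a_i), (op_j, t_j, a_j) = trace[i], trace[j]
        program_order = t_i == t_j
        locking = op_i == "rel" and op_j == "acq" and a_i == a_j
        forked = op_i == "fork" and a_i == t_j      # fork(t, u) before steps of u
        joined = op_j == "join" and a_j == t_i      # steps of u before join(t, u)
        if program_order or locking or forked or joined:
            hb.add((i, j))
    # Transitive closure (Floyd-Warshall style), fine for tiny traces only.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if (i, k) in hb and (k, j) in hb:
                    hb.add((i, j))
    return hb

def races(trace):
    """Pairs of conflicting accesses that are not ordered by happens-before."""
    hb = happens_before(trace)
    found = []
    for i, j in combinations(range(len(trace)), 2):
        (op_i, _, x_i), (op_j, _, x_j) = trace[i], trace[j]
        accesses = op_i in ("rd", "wr") and op_j in ("rd", "wr")
        conflicting = accesses and x_i == x_j and "wr" in (op_i, op_j)
        if conflicting and (i, j) not in hb:
            found.append((i, j))
    return found

locked = [("wr", 0, "x"), ("rel", 0, "m"), ("acq", 1, "m"), ("wr", 1, "x")]
unlocked = [("wr", 0, "x"), ("wr", 1, "x")]
print(races(locked), races(unlocked))
```

The cubic closure is exactly the cost that vector clocks, and then epochs, are meant to avoid in an online detector.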

- Review: the DJIT+ Algorithm
  - based on vector clocks
    - maintains a vector clock Ct for each thread t and an additional vector clock Lm for each lock m
    - to identify conflicting accesses, keeps two vector clocks per variable, one for reads and one for writes
  [Figure: evolution of C0, C1, Lm, and Wx over the trace wr(0, x), rel(0, m), acq(1, m), wr(1, x)]
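A minimal sketch of the vector-clock bookkeeping described on this slide, restricted to writes and lock operations, is given below. The function names, the tuple encoding of trace operations, and the convention that each thread's own clock entry starts at 1 are assumptions for illustration; read handling (the per-variable read clock) is omitted to keep the example short.

```python
def join(a, b):
    """Pointwise maximum of two vector clocks: O(n)."""
    return [max(p, q) for p, q in zip(a, b)]

def leq(a, b):
    """a is ordered before (or equal to) b iff every entry of a is <= b: O(n)."""
    return all(p <= q for p, q in zip(a, b))

def run(trace, n_threads=2):
    """Replay (op, thread, target) tuples and return the racy writes found."""
    # C[t] is thread t's vector clock; its own entry starts at 1.
    C = [[1 if i == t else 0 for i in range(n_threads)] for t in range(n_threads)]
    L = {}   # lock -> vector clock of its last release
    W = {}   # variable -> vector clock of last writes, one entry per thread
    reported = []
    for op, t, target in trace:
        if op == "wr":
            w = W.setdefault(target, [0] * n_threads)
            if not leq(w, C[t]):            # some earlier write is concurrent
                reported.append((op, t, target))
            w[t] = C[t][t]
        elif op == "rel":
            L[target] = list(C[t])          # publish the releasing thread's clock
            C[t][t] += 1
        elif op == "acq":
            C[t] = join(C[t], L.get(target, [0] * n_threads))
    return reported

with_lock = [("wr", 0, "x"), ("rel", 0, "m"), ("acq", 1, "m"), ("wr", 1, "x")]
without_lock = [("wr", 0, "x"), ("wr", 1, "x")]
print(run(with_lock), run(without_lock))
```

Every write pays for an O(n) comparison here, even though the trace involves only one earlier write; that is the cost the epoch representation targets.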

The FastTrack Algorithm
- Insights from empirical data gathered during race detection
  - a full VC is not necessary for almost all read and write operations
    - a lightweight representation of the happens-before relation can be used instead
    - only a small fraction of operations need full vector clock operations
- How to catch each type of race condition?
  - each race condition is one of the following
    - a read-write race: a read concurrent with a later write to the same variable
    - a write-read race: a write concurrent with a later read
    - a write-write race: two concurrent writes
  [Figure: one example of each kind: rd(0, x) and wr(1, x), a read-write race; wr(0, x) and rd(1, x), a write-read race; wr(0, x) and wr(1, x), a write-write race]

- Detecting write-write races
  - all writes to x are totally ordered (as long as no races have been detected)
  - an epoch is a pair of a clock c and a thread t
  - epochs reduce the space and analysis overhead of the write-write check to O(1)
  [Figure: evolution of C0, C1, Lm, and Wx (an epoch, initially ⊥e) over the trace wr(0, x), rel(0, m), acq(1, m), wr(1, x)]

- Detecting write-read races
  - uses the epoch Wx and the current vector clock Ct
  - checks that the read happens after the last write
  - needs only O(1) time for the comparison Wx ≤ Ct
  [Figure: evolution of C0, C1, Lm, and Wx (an epoch, initially ⊥e) over the trace wr(0, x), rel(0, m), acq(1, m), wr(1, x), rd(0, x)]
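The epoch comparison used for both the write-write and the write-read check can be sketched in a few lines. The `Epoch` tuple, the encoding of the minimal epoch as clock 0, and the concrete clock values are assumptions chosen for illustration; only the comparison rule (an epoch with clock c and thread t is ordered before a vector clock V when c ≤ V[t]) reflects the check described above.

```python
from collections import namedtuple

# An epoch pairs a clock with the thread that owns it.
Epoch = namedtuple("Epoch", ["clock", "tid"])
BOTTOM = Epoch(0, 0)   # assumed encoding of the minimal epoch

def epoch_leq(e, vc):
    """True iff epoch e is ordered before vector clock vc: one comparison."""
    return e.clock <= vc[e.tid]

def last_write_ordered(w_x, c_t):
    """Shared by the write-write and write-read checks: is Wx <= Ct?"""
    return epoch_leq(w_x, c_t)

def record_write(t, c_t):
    """After a write by thread t, Wx becomes t's current epoch."""
    return Epoch(c_t[t], t)

# Trace: wr(0, x); rel(0, m); acq(1, m); then an access to x by thread 1.
c0 = [4, 0]
w_x = BOTTOM
first_ok = last_write_ordered(w_x, c0)            # no earlier write
w_x = record_write(0, c0)                         # Wx is now clock 4 of thread 0
c1_synced = [4, 8]      # thread 1 after acquiring the lock released by thread 0
c1_unsynced = [0, 8]    # thread 1 if it never synchronized with thread 0
synced_ok = last_write_ordered(w_x, c1_synced)
unsynced_ok = last_write_ordered(w_x, c1_unsynced)
print(first_ok, synced_ok, unsynced_ok)
```

The check touches a single vector clock entry regardless of the number of threads, which is where the O(1) bound comes from.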

- Detecting read-write races
  - the read-write race condition is more difficult
    - a write could potentially conflict with the last read performed by any other thread
    - in general, this requires recording an entire VC that holds, for each thread t, the clock of the last read from x by t
  - common situations in which an epoch suffices (reads are totally ordered in practice)
    - Thread-local data: only one thread accesses a variable, and hence these accesses are totally ordered (program order)
    - Lock-protected data: a protecting lock is held on each access to a variable, and hence all accesses are totally ordered (program order or synchronization order)
  - reads are typically unordered only when data is read-shared
  - uses an adaptive representation for tracking the read history

- Analysis Details
  - an online algorithm that maintains an analysis state σ
    - σ = (Ct, Lm, Rx, Wx)
    - Rx: either the epoch of the last read of x (when all other reads are ordered before it) or a vector clock that is the join of all reads of x
  - reads: 82.3% of all operations
    - O(n) time for shared reads: 0.1% of reads
    - O(1) time for all other reads
  - writes: 14.5% of all operations
    - O(n) time for shared writes: 0.1% of writes
    - O(1) time for all other writes
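The adaptive read history can be sketched as a per-variable state in which Rx is an epoch until two concurrent reads are seen, at which point it is inflated to a vector clock. The class and method names, the encoding of epochs as `(clock, tid)` tuples, and the hand-written vector clocks in the usage are assumptions for illustration; fork, join, and lock handling are left to the caller, which passes in the current vector clock of the accessing thread.

```python
BOTTOM = (0, 0)   # assumed minimal epoch, encoded as (clock, tid)

def epoch_leq(e, vc):
    """Epoch (c, t) is ordered before vector clock vc iff c <= vc[t]."""
    return e[0] <= vc[e[1]]

class VarState:
    def __init__(self):
        self.W = BOTTOM   # epoch of the last write
        self.R = BOTTOM   # epoch of the last read, or a list once read-shared

    def read(self, t, C):
        """Process rd(t, x) with thread t's clock C; return False on a race."""
        e = (C[t], t)
        if self.R == e:                       # same epoch: nothing to do
            return True
        ok = epoch_leq(self.W, C)             # write-read check, O(1)
        if isinstance(self.R, list):          # already read-shared
            self.R[t] = C[t]
        elif epoch_leq(self.R, C):            # reads still totally ordered
            self.R = e
        else:                                 # concurrent reads: inflate to a VC
            vc = [0] * len(C)
            vc[self.R[1]] = self.R[0]
            vc[t] = C[t]
            self.R = vc
        return ok

    def write(self, t, C):
        """Process wr(t, x) with thread t's clock C; return False on a race."""
        e = (C[t], t)
        if self.W == e:                       # same epoch: nothing to do
            return True
        ok = epoch_leq(self.W, C)             # write-write check, O(1)
        if isinstance(self.R, list):          # shared write: O(n) read-write check
            ok = ok and all(r <= c for r, c in zip(self.R, C))
            self.R = BOTTOM                   # deflate back to an epoch
        else:
            ok = ok and epoch_leq(self.R, C)  # read-write check, O(1)
        self.W = e
        return ok

x = VarState()
results = [
    x.write(0, [7, 0]),   # wr(0, x)
    x.read(1, [7, 1]),    # rd(1, x) after fork(0, 1)
    x.read(0, [8, 0]),    # rd(0, x), concurrent with thread 1's read
    x.write(0, [8, 1]),   # wr(0, x) after join(0, 1)
    x.read(0, [8, 1]),    # rd(0, x)
]
print(results, x.R, x.W)
```

Only the branch that meets a list-valued Rx does O(n) work; every other path is a constant number of comparisons.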

- An Example of FT
  [Figure: evolution of C0, C1, Wx, and Rx (Wx and Rx start at ⊥e) over a trace in which thread 0 performs wr(0, x), fork(0, 1), rd(0, x), join(0, 1), wr(0, x), rd(0, x) and thread 1 performs rd(1, x)]

Implementation
- FT Instrumentation State and Code
  - represents an epoch as a 32-bit integer
    - the top 8 bits store the thread identifier t
    - the bottom 24 bits store the clock c
  - associates a ThreadState object with each thread
    - it contains a unique thread identifier tid and a vector clock C
    - instrumentation code reads the thread's own entry as t.C[t.tid]
- Granularity
  - supports two levels of granularity for analyzing memory locations
    - fine-grain analysis (default) and coarse-grain analysis
    - coarse-grain analysis reduces the memory footprint
    - but it may produce false alarms if two fields of an object are protected by different locks
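The 32-bit epoch layout described above (thread identifier in the top 8 bits, clock in the bottom 24 bits) can be sketched with plain bit operations. The helper names are hypothetical and not taken from the FastTrack sources; the range check is an added safeguard for the example.

```python
TID_BITS = 8
CLOCK_BITS = 24
CLOCK_MASK = (1 << CLOCK_BITS) - 1

def make_epoch(tid, clock):
    """Pack a thread identifier and a clock into one 32-bit integer."""
    if not (0 <= tid < (1 << TID_BITS) and 0 <= clock <= CLOCK_MASK):
        raise ValueError("tid or clock does not fit the 8/24-bit layout")
    return (tid << CLOCK_BITS) | clock

def epoch_tid(epoch):
    """Extract the thread identifier from the top 8 bits."""
    return epoch >> CLOCK_BITS

def epoch_clock(epoch):
    """Extract the clock from the bottom 24 bits."""
    return epoch & CLOCK_MASK

e = make_epoch(3, 5)
print(hex(e), epoch_tid(e), epoch_clock(e))
```

With this layout, testing whether two epochs are identical is a single integer comparison, and two epochs of the same thread compare by clock when compared as integers.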

- Extensions
  - supports additional synchronization primitives
    - wait and notify, volatile variables, and barriers
  - wait and notify
    - models a wait operation on lock m as a release and subsequent re-acquisition of m, so it does not need additional analysis rules
    - a notify operation can be ignored
  - volatile variables
    - guarantees that a write of a volatile variable vx happens before every subsequent read of vx
    - extends the L component to map volatile variables to the VC of the last write
    - volatile reads and writes update the state in the same way as lock acquire and release, respectively
  - barriers
    - considers a release operation barrier_rel(T) for the set of threads T at a barrier
    - the first post-barrier step of each thread happens after all pre-barrier steps
    - and is unordered with respect to the next steps taken by the other threads
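One way to read the volatile and barrier rules is as vector clock updates, sketched below. This is an interpretation under stated assumptions, not the tool's code: the function names are hypothetical, `C` is a list of per-thread vector clocks, `L` is a dictionary from volatile variables to vector clocks, and the barrier rule is taken to join the clocks of all threads in T and then advance each thread's own entry.

```python
def join(a, b):
    """Pointwise maximum of two vector clocks."""
    return [max(p, q) for p, q in zip(a, b)]

def inc(vc, t):
    """Copy of vc with thread t's entry advanced by one."""
    out = list(vc)
    out[t] += 1
    return out

def volatile_write(C, L, t, vx):
    """Handled like a lock release: publish t's clock on vx, then tick t."""
    L[vx] = join(L.get(vx, [0] * len(C[t])), C[t])
    C[t] = inc(C[t], t)

def volatile_read(C, L, t, vx):
    """Handled like a lock acquire: t learns everything published on vx."""
    C[t] = join(C[t], L.get(vx, [0] * len(C[t])))

def barrier_release(C, T):
    """All threads in T leave the barrier knowing every pre-barrier step."""
    joined = [0] * len(C[0])
    for u in T:
        joined = join(joined, C[u])
    for t in T:
        C[t] = inc(joined, t)   # distinct ticks keep post-barrier steps unordered

clocks = [[1, 0], [0, 1]]
published = {}
volatile_write(clocks, published, 0, "v")
volatile_read(clocks, published, 1, "v")

barrier_clocks = [[2, 0, 0], [0, 3, 0], [0, 0, 1]]
barrier_release(barrier_clocks, [0, 1])
print(clocks, barrier_clocks)
```

After the barrier, neither participating thread's clock is pointwise below the other's, which matches the statement that their next steps are unordered.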

Evaluation
- Precision and Performance
  - compares the precision and performance of 7 dynamic analyses
    - Empty, FastTrack, Eraser, DJIT+, MultiRace, Goldilocks, and BasicVC
    - all tools were implemented on top of RoadRunner
- Benchmark Configuration
  - performed experiments on 16 benchmarks
  - reports at most one race for each field of each class and each array access
- Summary of Results
  - FT outperforms the other tools
    - provides almost a 10x speedup over BasicVC and a 2.3x speedup even over the DJIT+ algorithm
    - provides a substantial increase in precision over Eraser without loss in performance

Conclusion
- FastTrack is a new precise race detection algorithm
  - uses an adaptive lightweight representation for the happens-before relation that reduces both space and time overheads
  - despite its efficiency, it is a comparatively simple algorithm that is straightforward to implement
  - contains optimized constant-time fast paths that handle upwards of 96% of the operations in benchmarks
  - provides a 2.3x performance improvement over the DJIT+ algorithm, and incurs less than half the memory overhead of DJIT+