An efficient data race detector for DIOTA

Presentation transcript:

An efficient data race detector for DIOTA
Michiel Ronsse, Bastiaan Stougie, Jonas Maebe, Frank Cornelis, Koen De Bosschere
Department of Electronics and Information Systems, Ghent University, Belgium
Computer Engineering Lab, Delft University of Technology, The Netherlands
ParCo2003, September 2-5, Dresden

2 Contents
- Introduction
- Non-determinism & data races
- DIOTA
- On-the-fly data race detection using DIOTA
  - Method
  - Implementation
- Data Race Detection Example
- Experimental Evaluation
- Conclusions

3 Introduction
- Developing parallel programs for multiprocessors with shared memory is considered difficult:
  - a number of threads running simultaneously
  - co-operation & synchronisation through shared memory
- Data races occur when:
  - two threads access the same shared variable (memory location) in an unsynchronised way
  - and at least one thread modifies the variable

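4 Example code

    #include <pthread.h>
    #include <stdio.h>

    unsigned global = 5;

    /* Both threads update global without synchronisation: a data race. */
    void *thread2(void *arg) { global = global + 6; return NULL; }
    void *thread3(void *arg) { global = global + 7; return NULL; }

    int main(void)
    {
        pthread_t t2, t3;
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_create(&t3, NULL, thread3, NULL);
        pthread_join(t2, NULL);
        pthread_join(t3, NULL);
        printf("global=%u\n", global);
        return 0;
    }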

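5 Possible executions
[Figure: three interleavings of the two threads' loads (L) and stores (S) on global, ending in global=12, global=18 and global=11. The operations shown are L(5), L(11), S(11), S(12) and S(18).]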
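The three outcomes in the figure can be reproduced by brute force. The following is a minimal sketch, not part of the original slides: it assumes each thread performs exactly one load followed by one store, and the names `run_interleaving` and `count_lost_updates` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Simulate one interleaving of two threads that each execute
 * "load global; add increment; store global".
 * order[] lists four thread ids (0 or 1): the first time an id appears
 * that thread performs its load, the second time its store. */
unsigned run_interleaving(const int order[4])
{
    const unsigned increment[2] = {6, 7}; /* +6 and +7, as in the example code */
    unsigned global = 5;                  /* initial value in the example code */
    unsigned loaded[2] = {0, 0};          /* value each thread read */
    int done[2] = {0, 0};                 /* operations each thread has finished */

    for (int i = 0; i < 4; i++) {
        int t = order[i];
        assert((t == 0 || t == 1) && done[t] < 2);
        if (done[t] == 0)
            loaded[t] = global;                 /* L(...) */
        else
            global = loaded[t] + increment[t];  /* S(...) */
        done[t]++;
    }
    return global;
}

/* Try all six interleavings and count those that lose an update
 * (final value different from 5 + 6 + 7 = 18). */
int count_lost_updates(void)
{
    static const int orders[6][4] = {
        {0, 0, 1, 1}, {0, 1, 0, 1}, {0, 1, 1, 0},
        {1, 0, 0, 1}, {1, 0, 1, 0}, {1, 1, 0, 0},
    };
    int lost = 0;
    for (int i = 0; i < 6; i++) {
        unsigned result = run_interleaving(orders[i]);
        printf("interleaving %d: global=%u\n", i, result);
        if (result != 18)
            lost++;
    }
    return lost;
}
```

Only the two interleavings in which one thread finishes before the other starts give 18. In the other four, both threads load 5 and the last store wins, giving 11 or 12.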

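6 Example code II

    #include <pthread.h>
    #include <stdio.h>

    unsigned global = 5;

    /* lock()/unlock() are not defined on the slide; a pthread mutex is assumed here. */
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    void lock(void)   { pthread_mutex_lock(&mutex); }
    void unlock(void) { pthread_mutex_unlock(&mutex); }

    void *thread2(void *arg) { lock(); global = global + 6; unlock(); return NULL; }
    void *thread3(void *arg) { lock(); global = global + 7; unlock(); return NULL; }

    int main(void)
    {
        pthread_t t2, t3;
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_create(&t3, NULL, thread3, NULL);
        pthread_join(t2, NULL);
        pthread_join(t3, NULL);
        printf("global=%u\n", global);
        return 0;
    }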

7 Detecting Data Races
- Automatic data race detection is possible:
  - collect all memory references
  - check parallel references
- Static methods: check the source code for all possible executions with all possible input values
  - NP-complete, so not feasible
- Dynamic methods: detect data races during one particular execution
  - post mortem (not feasible)
  - on-the-fly

8 Dynamic data race detection
- The piece of code between two consecutive synchronisation operations is called a segment.
- We collect two sets for every segment a of every thread: L(a) and S(a), containing the addresses of all load and store operations.
- For all parallel segments a and b, the following expression gives the list of conflicting addresses:
  $(L(a) \cap S(b)) \cup (S(a) \cap L(b)) \cup (S(a) \cap S(b))$
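The conflict expression maps directly onto bitwise operations when the sets are stored as bit vectors. The sketch below is illustrative only: it assumes a toy address space of 64 locations so that each set fits in one word, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy address space of 64 locations: bit i stands for address i. */
typedef struct {
    uint64_t loads;   /* L(a): addresses loaded in the segment */
    uint64_t stores;  /* S(a): addresses stored in the segment */
} segment_sets;

void record_load(segment_sets *s, unsigned addr)
{
    assert(addr < 64);
    s->loads |= UINT64_C(1) << addr;
}

void record_store(segment_sets *s, unsigned addr)
{
    assert(addr < 64);
    s->stores |= UINT64_C(1) << addr;
}

/* (L(a) n S(b)) u (S(a) n L(b)) u (S(a) n S(b)) for two parallel segments.
 * Two loads of the same address never conflict. */
uint64_t conflicting_addresses(const segment_sets *a, const segment_sets *b)
{
    return (a->loads & b->stores)
         | (a->stores & b->loads)
         | (a->stores & b->stores);
}
```

A non-zero result means the two segments race on at least one address. The caller is responsible for only comparing segments that are actually parallel.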

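9 Logical Clocks
- A logical clock C( ) attaches a timestamp C(a) to an event a
- Used for tracing the causal order of events ($a \rightarrow b$: a happened before b)
- Clock condition: $a \rightarrow b \Rightarrow C(a) < C(b)$
- Clocks are strongly consistent if $a \rightarrow b \Leftrightarrow C(a) < C(b)$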

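10 Scalar Clocks
- Lamport Clocks
- Simple and fast update algorithm: $C_{\text{new}} = \max(C_{\text{local}}, C_{\text{received}}) + 1$
- Provides only limited information: $a \rightarrow b \Rightarrow C(a) < C(b)$, but $C(a) < C(b) \not\Rightarrow a \rightarrow b$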
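As an illustration of how cheap the scalar update is, here is a minimal sketch of the textbook Lamport clock rules. It is not taken from the slides; the names `lamport_tick` and `lamport_receive` are hypothetical.

```c
#include <assert.h>

typedef struct {
    unsigned long time;  /* current scalar timestamp of one process */
} lamport_clock;

/* Local event or send: advance the clock and return the new timestamp. */
unsigned long lamport_tick(lamport_clock *c)
{
    assert(c != 0);
    return ++c->time;
}

/* Receive (or synchronise with) an event stamped `received`:
 * take the maximum of both timestamps, then advance by one. */
unsigned long lamport_receive(lamport_clock *c, unsigned long received)
{
    assert(c != 0);
    if (received > c->time)
        c->time = received;
    return ++c->time;
}
```

The receiver always ends up with a larger timestamp than the sender, which gives the clock condition. Two unrelated events on different processes can still carry ordered timestamps, which is why a scalar clock cannot tell whether two events are parallel.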

11 Scalar Clocks: example

12 Vector Clocks
- A vector clock for a program using N processes consists of N scalar values
- Such a clock is strongly consistent

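13 Vector Clocks: example
[Figure: events of three processes annotated with vector timestamps (10,2,4), (2,4,6), (3,7,5), (11,2,4), (10,8,5), (12,9,5), (10,9,5), (10,8,7) and (10,10,5).]

14 Vector Clocks: example
[Figure: the same diagram and timestamps as on slide 13.]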
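The vector clock rules can be sketched as follows. This is a minimal illustration with hypothetical names, using the standard update (component-wise maximum, then increment of the own component). Which events in the figure exchange messages is an assumption, although the timestamps are consistent with it: merging (3,7,5) with (10,2,4) on the second process yields (10,8,5).

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC 3  /* number of processes; 3 matches the example figure */

typedef struct {
    unsigned v[NPROC];
} vclock;

/* Local event on process `self`: advance only its own component. */
void vc_tick(vclock *c, int self)
{
    assert(self >= 0 && self < NPROC);
    c->v[self]++;
}

/* Receive on process `self`: component-wise maximum with the sender's
 * timestamp, then advance the own component. */
void vc_receive(vclock *c, const vclock *sender, int self)
{
    for (int i = 0; i < NPROC; i++)
        if (sender->v[i] > c->v[i])
            c->v[i] = sender->v[i];
    vc_tick(c, self);
}

/* True if every component of a is <= the matching component of b. */
bool vc_leq(const vclock *a, const vclock *b)
{
    for (int i = 0; i < NPROC; i++)
        if (a->v[i] > b->v[i])
            return false;
    return true;
}

/* a happened before b. */
bool vc_before(const vclock *a, const vclock *b)
{
    return vc_leq(a, b) && !vc_leq(b, a);
}

/* Neither timestamp dominates the other: the events are parallel. */
bool vc_parallel(const vclock *a, const vclock *b)
{
    return !vc_leq(a, b) && !vc_leq(b, a);
}
```

Because the clock is strongly consistent, `vc_parallel` answers exactly the question a race detector needs: for example (11,2,4) and (10,8,5) are parallel, while (10,2,4) precedes (10,8,5).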

15 DIOTA
- DIOTA (Dynamic Instrumentation, Optimization and Transformation of Applications) is a generic instrumentation tool
- Backends use DIOTA to:
  - instrument memory operations
  - intercept synchronisation functions
  - …
- Deals correctly with data in code, code in data, and self-modifying code
- Clones processes: the original process is used for the data and the instrumented clone is used for the code
- No need for recompilation, relinking or instrumentation of files

16 Execution replay
- ROLT (Reconstruction of Lamport Timestamps) is used for tracing and replaying the synchronisation operations
- It attaches a scalar Lamport timestamp to each synchronisation operation
- Delaying each synchronisation operation until all operations with a smaller timestamp have been executed suffices for a correct replay
- We only need to log a small subset of all operations

17 Collecting memory operations
- We need two lists of addresses per segment a: L(a) and S(a)
- A multilevel bitmap is used:
  - it exploits the spatial locality of memory accesses
  - low memory consumption
  - comparing two bitmaps is easy
- We lose information: two accesses to the same variable are counted once. This is, however, no problem for data race detection.

18 Multilevel Memory bitmap
[Figure: the multilevel bitmap for S(a); the address is split into index fields, labelled 9 bit and 14 bit.]
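A multilevel bitmap of this kind can be sketched as a lazily allocated three-level table. This is an illustration, not the DIOTA implementation: the 9-bit and 14-bit index widths come from the slide, while the final 9-bit leaf index (the rest of a 32-bit address), the one-bit-per-address granularity and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOP_BITS  9   /* from the slide */
#define MID_BITS  14  /* from the slide */
#define LEAF_BITS 9   /* assumption: remaining bits of a 32-bit address */
#define TOP_SIZE  (1u << TOP_BITS)
#define MID_SIZE  (1u << MID_BITS)
#define LEAF_SIZE (1u << LEAF_BITS)

typedef struct { uint8_t bits[LEAF_SIZE / 8]; } leaf_t;  /* one bit per address */
typedef struct { leaf_t *leaf[MID_SIZE]; } mid_t;
typedef struct { mid_t *mid[TOP_SIZE]; } addr_bitmap;

/* calloc zero-fills, which yields NULL pointers on common platforms. */
addr_bitmap *bitmap_new(void) { return calloc(1, sizeof(addr_bitmap)); }

/* Mark addr as accessed, allocating tables on demand.
 * Returns false if memory runs out. */
bool bitmap_set(addr_bitmap *bm, uint32_t addr)
{
    uint32_t t = addr >> (MID_BITS + LEAF_BITS);
    uint32_t m = (addr >> LEAF_BITS) & (MID_SIZE - 1);
    uint32_t b = addr & (LEAF_SIZE - 1);

    assert(bm != NULL);
    if (!bm->mid[t] && !(bm->mid[t] = calloc(1, sizeof(mid_t))))
        return false;
    if (!bm->mid[t]->leaf[m] && !(bm->mid[t]->leaf[m] = calloc(1, sizeof(leaf_t))))
        return false;
    bm->mid[t]->leaf[m]->bits[b / 8] |= (uint8_t)(1u << (b % 8));
    return true;
}

bool bitmap_test(const addr_bitmap *bm, uint32_t addr)
{
    uint32_t t = addr >> (MID_BITS + LEAF_BITS);
    uint32_t m = (addr >> LEAF_BITS) & (MID_SIZE - 1);
    uint32_t b = addr & (LEAF_SIZE - 1);
    const mid_t *mid = bm->mid[t];
    const leaf_t *leaf = mid ? mid->leaf[m] : NULL;
    return leaf != NULL && ((leaf->bits[b / 8] >> (b % 8)) & 1u) != 0;
}

/* True if both bitmaps contain a common address. Subtrees missing from
 * either bitmap are skipped without looking at individual bits. */
bool bitmap_intersects(const addr_bitmap *x, const addr_bitmap *y)
{
    for (size_t t = 0; t < TOP_SIZE; t++) {
        if (!x->mid[t] || !y->mid[t])
            continue;
        for (size_t m = 0; m < MID_SIZE; m++) {
            const leaf_t *lx = x->mid[t]->leaf[m];
            const leaf_t *ly = y->mid[t]->leaf[m];
            if (!lx || !ly)
                continue;
            for (size_t i = 0; i < sizeof lx->bits; i++)
                if (lx->bits[i] & ly->bits[i])
                    return true;
        }
    }
    return false;
}

void bitmap_free(addr_bitmap *bm)
{
    if (!bm)
        return;
    for (size_t t = 0; t < TOP_SIZE; t++) {
        if (!bm->mid[t])
            continue;
        for (size_t m = 0; m < MID_SIZE; m++)
            free(bm->mid[t]->leaf[m]);
        free(bm->mid[t]);
    }
    free(bm);
}
```

Since programs touch clustered addresses, most table entries stay NULL, which keeps memory use low and lets a comparison skip whole address ranges at once. Setting the same address twice leaves the bitmap unchanged, matching the remark that repeated accesses are counted once.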

19 Detecting parallel segments
- A vector timestamp is attached to each segment.
- All segment information (two bitmaps + vector timestamp) is kept on a list L.
- Each new segment is compared against the segments on list L.

20 Detecting obsolete segments
- Obsolete segments should be removed from list L as soon as possible.
- An obsolete segment is a segment that can no longer be parallel with new segments.
- We use a snooped matrix clock to detect these segments.
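One way to express the obsolescence test is sketched below. It is an assumption rather than the method from the slides: the matrix clock is modelled as the array of every thread's current vector clock, and a segment of thread t counts as obsolete once every other thread's clock has caught up with the segment's own component t, so that all future segments are ordered after it. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 3  /* fixed thread count, chosen for the example */

typedef struct {
    unsigned v[NTHREADS];
} vclock;

typedef struct {
    int owner;     /* thread that executed the segment */
    vclock stamp;  /* vector timestamp attached to the segment */
} segment;

/* current[j] is the latest vector clock of thread j, read ("snooped")
 * directly from shared state. The segment is obsolete when every other
 * thread has already observed it, so that no segment started from now
 * on can be parallel with it. */
bool segment_is_obsolete(const segment *s, const vclock current[NTHREADS])
{
    assert(s->owner >= 0 && s->owner < NTHREADS);
    for (int j = 0; j < NTHREADS; j++) {
        if (j == s->owner)
            continue;  /* later segments of the owner are ordered anyway */
        if (current[j].v[s->owner] < s->stamp.v[s->owner])
            return false;
    }
    return true;
}
```

Segments for which this returns true can be dropped from list L together with their bitmaps, which bounds the memory used by the detector.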

21 Detecting obsolete segments
[Figure: timeline showing the segments on list L, the segments in execution, the point of execution and the future.]

22 Detecting obsolete segments
[Figure: the same timeline, with the obsolete segments marked among the segments on list L.]

23 Comparing parallel segments
[Figure: the same timeline, showing segments on list L, obsolete segments, segments in execution, the point of execution and the future.]

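24 Overview
[Figure: flowchart of the debugging cycle with the boxes Choose input, Record, Replay + detect, Replay + ident., Replay + debug, Choose new input and The end. One edge is labelled "race", and a legend distinguishes automatic steps from steps that require user intervention.]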

25 Experimental Evaluation
- Implementation for Linux running on Intel multiprocessors.
- Tested on a dual 500 MHz Celeron PC.
- SPLASH-2 was used as a benchmark: a number of multithreaded numeric applications, such as a fast Fourier transform, a raytracer, ...
- Several data races were found, including in SPLASH-2.

26 Performance of RecPlay
- Slowdown:
- Memory consumption: <3.4x

27 Conclusions
- DIOTA is a practical and efficient tool for detecting and removing data races.
- Three types of clocks (scalar, vector and matrix) are used to enable a fast and memory-efficient implementation.
- Data races have been found.