

1 An efficient data race detector for DIOTA
Michiel Ronsse, Bastiaan Stougie, Jonas Maebe, Frank Cornelis, Koen De Bosschere
Department of Electronics and Information Systems, Ghent University, Belgium
Computer Engineering Lab, Delft University of Technology, The Netherlands
Parco2003, September 2-5, Dresden

2 Contents
• Introduction
• Non-determinism & data races
• DIOTA
• On-the-fly data race detection using DIOTA
  - Method
  - Implementation
• Data Race Detection Example
• Experimental Evaluation
• Conclusions

3 Introduction
• Developing parallel programs for shared-memory multiprocessors is considered difficult:
  - a number of threads run simultaneously
  - they co-operate and synchronise through shared memory
• Data races occur when:
  - two threads access the same shared variable (memory location) in an unsynchronised way, and
  - at least one of the threads modifies the variable

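4 Example code
#include <pthread.h>
#include <stdio.h>

unsigned global = 5;

/* Unsynchronised read-modify-write of global: a data race */
void *thread2(void *arg) { global = global + 6; return NULL; }
void *thread3(void *arg) { global = global + 7; return NULL; }

int main(void) {
    pthread_t t2, t3;
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    printf("global=%u\n", global);
    return 0;
}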

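5 Possible executions
• Both threads execute L(5); thread 2 executes S(11) (+6), then thread 3 executes S(12) (+7): global=12
• Thread 2 executes L(5) and S(11), then thread 3 executes L(11) and S(18): global=18
• Both threads execute L(5); thread 3 executes S(12), then thread 2 executes S(11): global=11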
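To check that these are the only possible outcomes, the following sketch models each thread's `global = global + n` as a separate load and store and enumerates every interleaving of the two threads. It is a deterministic simulation rather than real threads, and the names `run_interleaving` and `enumerate_outcomes` are hypothetical.

```c
#include <assert.h>

/* Each thread performs global = global + inc in two steps:
   a load into a private register, then a store of register + inc. */
enum { N_STEPS = 4 };

/* Run one interleaving: order[k] is the thread (0 or 1) that executes
   its next step at position k; each thread appears exactly twice. */
static unsigned run_interleaving(const int order[N_STEPS]) {
    const unsigned inc[2] = {6, 7};  /* thread 2 adds 6, thread 3 adds 7 */
    unsigned global = 5, reg[2] = {0, 0};
    int step[2] = {0, 0};            /* 0 = load next, 1 = store next */
    for (int k = 0; k < N_STEPS; k++) {
        int t = order[k];
        if (step[t] == 0)
            reg[t] = global;             /* L(global) */
        else
            global = reg[t] + inc[t];    /* S(global) */
        step[t]++;
    }
    return global;
}

/* Try all 6 interleavings; seen[v] is set to 1 if final value v occurs. */
static void enumerate_outcomes(int seen[32]) {
    for (int v = 0; v < 32; v++) seen[v] = 0;
    for (int mask = 0; mask < (1 << N_STEPS); mask++) {
        int order[N_STEPS], ones = 0;
        for (int k = 0; k < N_STEPS; k++) {
            order[k] = (mask >> k) & 1;
            ones += order[k];
        }
        if (ones != 2) continue;  /* each thread runs exactly 2 steps */
        seen[run_interleaving(order)] = 1;
    }
}
```

The enumeration yields exactly 11, 12 and 18: the two serialised orders give 18, and every overlapping order loses one of the updates.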

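6 Example code II
#include <pthread.h>
#include <stdio.h>

unsigned global = 5;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;  /* protects global */

void *thread2(void *arg) { pthread_mutex_lock(&m); global = global + 6; pthread_mutex_unlock(&m); return NULL; }
void *thread3(void *arg) { pthread_mutex_lock(&m); global = global + 7; pthread_mutex_unlock(&m); return NULL; }

int main(void) {
    pthread_t t2, t3;
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    printf("global=%u\n", global);
    return 0;
}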

7 Detecting Data Races
• Automatic data race detection is possible:
  - collect all memory references
  - check parallel references
• Static methods: check the source code for all possible executions with all possible input values
  - NP-complete → not feasible
• Dynamic methods: detect data races during one particular execution
  - post mortem (not feasible)
  - on-the-fly

8 Dynamic data race detection
• The piece of code between two consecutive synchronisation operations is called a segment
• For every segment a of every thread we collect two sets, L(a) and S(a), containing the addresses of all load and store operations
• For every pair of parallel segments a and b, the set of conflicting addresses is
  $(L(a) \cap S(b)) \cup (S(a) \cap L(b)) \cup (S(a) \cap S(b))$
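The conflict formula maps directly onto bitwise operations if the address sets are bitsets. The sketch below is a toy model, assuming a 64-address memory so that one `uint64_t` holds a whole set; `segment_sets` and `conflicts` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: bit i set means address i was accessed in the segment. */
typedef struct {
    uint64_t loads;   /* L(a) */
    uint64_t stores;  /* S(a) */
} segment_sets;

/* Conflicting addresses of two parallel segments a and b:
   (L(a) ∩ S(b)) ∪ (S(a) ∩ L(b)) ∪ (S(a) ∩ S(b)).
   L(a) ∩ L(b) is deliberately absent: two loads never race. */
static uint64_t conflicts(segment_sets a, segment_sets b) {
    return (a.loads & b.stores) | (a.stores & b.loads) | (a.stores & b.stores);
}
```

For example, if a loads addresses 1 and 3 and stores 5, while b loads 1 and 5 and stores 7, only address 5 is reported; the shared load of address 1 is not a conflict.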

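9 Logical Clocks
• A logical clock C() attaches a timestamp C(a) to an event a
• Used for tracing the causal order of events ($a \rightarrow b$: a happened before b)
• Clock condition: $a \rightarrow b \Rightarrow C(a) < C(b)$
• Clocks are strongly consistent if $a \rightarrow b \Leftrightarrow C(a) < C(b)$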

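10 Scalar Clocks
• Lamport clocks
• Simple and fast update algorithm: $C := C + 1$ on every event; on synchronisation, $C := \max(C, C_{\text{received}}) + 1$
• Provide only limited information: $C(a) < C(b) \not\Rightarrow a \rightarrow b$ (a and b may be concurrent)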
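A minimal sketch of the standard Lamport update rules, assuming each synchronisation operation can carry the clock value of the thread it synchronises with (for instance a value stored with a lock when it is released). `lamport_clock`, `lc_tick` and `lc_receive` are hypothetical names.

```c
#include <assert.h>

/* Scalar (Lamport) clock of one thread. */
typedef struct { unsigned long c; } lamport_clock;

/* Local event: advance the clock and return the event's timestamp. */
static unsigned long lc_tick(lamport_clock *lc) {
    return ++lc->c;
}

/* Synchronisation that receives a timestamp from another thread
   (assumed: e.g. acquiring a lock released at time 'received'):
   C := max(C, received) + 1. */
static unsigned long lc_receive(lamport_clock *lc, unsigned long received) {
    if (received > lc->c) lc->c = received;
    return ++lc->c;
}
```

In the test below, a third, unrelated thread ends up with a larger timestamp than the sending event, which shows the limitation on the slide: a smaller timestamp does not imply causality.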

11 Scalar Clocks: example
[Figure: example execution annotated with Lamport timestamps]

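12 Vector Clocks
• A vector clock for a program using N processes consists of N scalar values
• Such a clock is strongly consistent: $a \rightarrow b \Leftrightarrow V(a) < V(b)$, where $V(a) < V(b)$ means $V(a)[i] \le V(b)[i]$ for all i and $V(a) \neq V(b)$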
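The following sketch shows the usual vector clock update and comparison rules, assuming a fixed thread count of three and that a clock is passed along at each synchronisation. `vclock`, `vc_tick`, `vc_receive`, `vc_before` and `vc_concurrent` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 3  /* assumed number of threads */

typedef struct { unsigned v[NTHREADS]; } vclock;

/* Local event on thread i: increment its own component. */
static void vc_tick(vclock *c, int i) { c->v[i]++; }

/* Thread i receives clock m (e.g. on acquiring a lock):
   component-wise maximum, then a local tick. */
static void vc_receive(vclock *c, const vclock *m, int i) {
    for (int k = 0; k < NTHREADS; k++)
        if (m->v[k] > c->v[k]) c->v[k] = m->v[k];
    vc_tick(c, i);
}

/* a -> b  iff  a[k] <= b[k] for all k, and a != b. */
static bool vc_before(const vclock *a, const vclock *b) {
    bool strictly = false;
    for (int k = 0; k < NTHREADS; k++) {
        if (a->v[k] > b->v[k]) return false;
        if (a->v[k] < b->v[k]) strictly = true;
    }
    return strictly;
}

/* Neither event happened before the other: they are concurrent. */
static bool vc_concurrent(const vclock *a, const vclock *b) {
    return !vc_before(a, b) && !vc_before(b, a);
}
```

Unlike Lamport clocks, the comparison is exact in both directions, which is what makes concurrent (parallel) events detectable.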

13 Vector Clocks: example
[Figure: example execution of three processes annotated with vector timestamps]


15 DIOTA
• DIOTA (Dynamic Instrumentation, Optimization and Transformation of Applications) is a generic instrumentation tool
• Backends use DIOTA to:
  - instrument memory accesses
  - intercept synchronisation functions
  - …
• Deals correctly with data in code, code in data and self-modifying code
• Clones processes: the original process is used for the data and the instrumented clone is used for the code
• No need for recompilation, relinking or instrumentation of files

16 Execution replay
• ROLT (Reconstruction of Lamport Timestamps) is used for tracing/replaying the synchronisation operations
• It attaches a scalar Lamport timestamp to each synchronisation operation
• During replay, delaying each synchronisation operation until all operations with a smaller timestamp have been executed suffices for a correct replay
• Only a small subset of all operations needs to be logged
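The replay rule can be sketched with a mutex and a condition variable. This is an illustration under simplifying assumptions, not ROLT itself: it assumes the full Lamport timestamp of every synchronisation operation is available (ROLT logs only a subset and reconstructs the rest), timestamps stay below a fixed bound, and the intercepted operation is replaced by appending to a log. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_TS 16  /* assumed upper bound on timestamps */

/* From the recorded trace: number of synchronisation operations per
   timestamp, and how many of them have been replayed so far. */
static int recorded[MAX_TS];
static int completed[MAX_TS];
static pthread_mutex_t replay_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t replay_cv = PTHREAD_COND_INITIALIZER;

/* Log of the replayed order, used only to observe the result. */
static int replay_order[MAX_TS];
static int n_replayed;

static int smaller_all_done(int ts) {
    for (int t = 0; t < ts; t++)
        if (completed[t] < recorded[t]) return 0;
    return 1;
}

/* Delay the operation with timestamp ts until every operation with a
   smaller timestamp has been replayed, then "execute" it. */
static void replay_sync_op(int ts) {
    pthread_mutex_lock(&replay_lock);
    while (!smaller_all_done(ts))
        pthread_cond_wait(&replay_cv, &replay_lock);
    replay_order[n_replayed++] = ts;  /* stand-in for the real operation */
    completed[ts]++;
    pthread_cond_broadcast(&replay_cv);
    pthread_mutex_unlock(&replay_lock);
}

typedef struct { const int *ts; int n; } thread_trace;

static void *replay_thread(void *arg) {
    const thread_trace *tr = arg;
    for (int i = 0; i < tr->n; i++) replay_sync_op(tr->ts[i]);
    return NULL;
}

/* Replay two threads' recorded timestamps concurrently. */
static void replay_two(const thread_trace *a, const thread_trace *b) {
    for (int i = 0; i < a->n; i++) recorded[a->ts[i]]++;
    for (int i = 0; i < b->n; i++) recorded[b->ts[i]]++;
    pthread_t ta, tb;
    pthread_create(&ta, NULL, replay_thread, (void *)a);
    pthread_create(&tb, NULL, replay_thread, (void *)b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
}
```

Operations with equal timestamps are never gated against each other; by the clock condition they cannot be causally related, so any order among them is a valid replay.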

17 Collecting memory operations
• We need two lists of addresses per segment a: L(a) and S(a)
• A multilevel bitmap is used:
  - it exploits spatial locality
  - low memory consumption
  - comparing two bitmaps is easy
• We lose information: multiple accesses to the same variable are recorded only once. This is no problem for data race detection, which only needs to know whether an address was accessed in a segment.

18 Multilevel memory bitmap
[Figure: multilevel bitmap for S(a), indexed by address fields of 9 bits and 14 bits]
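A possible multilevel bitmap is sketched below. The slide only shows 9-bit and 14-bit index fields, so the layout here is an assumption: 32-bit addresses split as 9 | 14 | 9 bits, with the low 9 bits selecting one bit (one byte address) in a 512-bit leaf, and levels allocated only when first touched. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed address split: 9 + 14 + 9 = 32 bits. */
#define L1_BITS    9
#define L2_BITS    14
#define LEAF_BITS  9
#define L1_SIZE    (1u << L1_BITS)
#define L2_SIZE    (1u << L2_BITS)
#define LEAF_WORDS ((1u << LEAF_BITS) / 64)  /* 512 bits = 8 words */

typedef struct { uint64_t w[LEAF_WORDS]; } leaf;
typedef struct { leaf *leaves[L2_SIZE]; } level2;
typedef struct { level2 *dirs[L1_SIZE]; } mbitmap;

static void *xcalloc(size_t size) {
    void *p = calloc(1, size);
    if (!p) abort();  /* no recovery in this sketch */
    return p;
}

/* Record an access to addr; untouched regions cost no memory. */
static void mb_set(mbitmap *bm, uint32_t addr) {
    uint32_t i1 = addr >> (L2_BITS + LEAF_BITS);
    uint32_t i2 = (addr >> LEAF_BITS) & (L2_SIZE - 1);
    uint32_t off = addr & ((1u << LEAF_BITS) - 1);
    if (!bm->dirs[i1]) bm->dirs[i1] = xcalloc(sizeof(level2));
    level2 *d = bm->dirs[i1];
    if (!d->leaves[i2]) d->leaves[i2] = xcalloc(sizeof(leaf));
    d->leaves[i2]->w[off / 64] |= UINT64_C(1) << (off % 64);
}

/* Does the intersection (e.g. S(a) ∩ L(b)) contain any address?
   Only sub-trees present in both bitmaps are visited. */
static int mb_intersects(const mbitmap *a, const mbitmap *b) {
    for (uint32_t i1 = 0; i1 < L1_SIZE; i1++) {
        const level2 *da = a->dirs[i1], *db = b->dirs[i1];
        if (!da || !db) continue;
        for (uint32_t i2 = 0; i2 < L2_SIZE; i2++) {
            const leaf *la = da->leaves[i2], *lb = db->leaves[i2];
            if (!la || !lb) continue;
            for (uint32_t k = 0; k < LEAF_WORDS; k++)
                if (la->w[k] & lb->w[k]) return 1;
        }
    }
    return 0;
}

/* Free a bitmap allocated with calloc, including all its levels. */
static void mb_free(mbitmap *bm) {
    for (uint32_t i1 = 0; i1 < L1_SIZE; i1++) {
        if (!bm->dirs[i1]) continue;
        for (uint32_t i2 = 0; i2 < L2_SIZE; i2++)
            free(bm->dirs[i1]->leaves[i2]);
        free(bm->dirs[i1]);
    }
    free(bm);
}
```

Because accesses tend to cluster, most first- and second-level entries stay empty, which is where the low memory consumption and cheap comparison come from.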

19 Detecting parallel segments
• A vector timestamp is attached to each segment
• All segment information (two bitmaps + vector timestamp) is kept on a list L
• Each new segment is compared against the segments on list L; only segments whose vector timestamps show them to be parallel need their bitmaps compared
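A sketch of the list check, assuming two threads, a fixed-capacity array for list L, and the 64-address toy bitsets used earlier instead of multilevel bitmaps. `segment`, `list_L`, `parallel` and `check_and_add` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NTHREADS 2   /* assumed thread count */
#define MAX_SEGS 64  /* assumed capacity of list L */

/* One finished segment: vector timestamp plus load/store sets. */
typedef struct {
    unsigned vc[NTHREADS];
    uint64_t loads, stores;
} segment;

static segment list_L[MAX_SEGS];
static int list_len;

/* a -> b iff a <= b component-wise and a != b. */
static bool vc_before(const unsigned *a, const unsigned *b) {
    bool strictly = false;
    for (int k = 0; k < NTHREADS; k++) {
        if (a[k] > b[k]) return false;
        if (a[k] < b[k]) strictly = true;
    }
    return strictly;
}

static bool parallel(const segment *a, const segment *b) {
    return !vc_before(a->vc, b->vc) && !vc_before(b->vc, a->vc);
}

/* Compare a new segment with every segment on L, return the racing
   addresses found with the parallel ones, then append it to L. */
static uint64_t check_and_add(const segment *s) {
    uint64_t races = 0;
    for (int i = 0; i < list_len; i++) {
        const segment *o = &list_L[i];
        if (parallel(s, o))
            races |= (s->loads & o->stores) | (s->stores & o->loads)
                   | (s->stores & o->stores);
    }
    if (list_len < MAX_SEGS) list_L[list_len++] = *s;
    return races;
}
```

In the test, a segment of thread 1 that starts after synchronising with thread 0 (vector timestamp {1, 2}) is ordered after both earlier segments, so its write to address 0 is not reported.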

20 Detecting obsolete segments
• Obsolete segments should be removed from list L as soon as possible
• An obsolete segment is a segment that can no longer be parallel with new segments
• We use snooped matrix clocks to detect these segments
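The following sketch gives one possible obsolescence test. It assumes that "snooped" means the detector can read the current vector clock of every thread directly from shared memory, so that together these clocks form an N x N matrix; this is our interpretation, not the authors' published code. It also assumes that component `owner` of a segment's vector timestamp counts the segments of its own thread. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NTHREADS 2  /* assumed thread count */

/* current[k] is the vector clock of the segment thread k executes now.
   A finished segment of thread 'owner' with vector timestamp seg_vc is
   obsolete once every thread has seen it: every current (and hence every
   future) segment then has component [owner] >= seg_vc[owner], so it
   happens after the finished segment and can never be parallel with it. */
static bool segment_obsolete(const unsigned seg_vc[NTHREADS], int owner,
                             unsigned current[NTHREADS][NTHREADS]) {
    for (int k = 0; k < NTHREADS; k++)
        if (current[k][owner] < seg_vc[owner]) return false;
    return true;
}
```

Until thread 1 synchronises with thread 0, a finished segment of thread 0 must stay on list L; after the synchronisation it can be dropped.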

21 Detecting obsolete segments
[Figure: segments on list L, segments in execution, the point of execution, and the future]

22 Detecting obsolete segments
[Figure: segments on list L, obsolete segments, segments in execution, the point of execution, and the future]

23 Comparing parallel segments
[Figure: segments on list L, obsolete segments, segments in execution, the point of execution, and the future]

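24 Overview
[Figure: flowchart with the steps Choose input, Record, Replay + detect, Replay + identify, Replay + debug, Choose new input and The end; a detected race leads to the identify and debug steps, and each step is marked as automatic or as requiring user intervention]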

25 Experimental Evaluation
• Implementation for Linux running on Intel multiprocessors
• Tested on a dual 500 MHz Celeron PC
• SPLASH-2 was used as a benchmark: a number of multithreaded numeric applications, such as a fast Fourier transform, a raytracer, ...
• Several data races were found, including some in SPLASH-2 itself

26 Performance of RecPlay
• Slowdown:
• Memory consumption: <3.4x

27 Conclusions
• DIOTA is a practical and efficient tool for detecting and removing data races
• Three types of clocks (scalar, vector and matrix) are used to enable a fast and memory-efficient implementation
• Data races have been found

