An efficient data race detector for DIOTA Michiel Ronsse, Bastiaan Stougie, Jonas Maebe, Frank Cornelis, Koen De Bosschere Department of Electronics and Information Systems, Ghent University, Belgium Computer Engineering Lab, Delft University of Technology, The Netherlands Parco2003, September 2-5, Dresden
2 Contents Introduction Non-determinism & data races DIOTA On-the-fly data race detection using DIOTA Method Implementation Data Race Detection Example Experimental Evaluation Conclusions
3 Introduction Developing parallel programs for multiprocessors with shared memory is considered difficult: a number of threads run simultaneously; co-operation & synchronisation happen through shared memory. Data races occur when two threads access the same shared variable (memory location) in an unsynchronised way and at least one thread modifies the variable.
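4 Example code
#include <pthread.h>
#include <stdio.h>
unsigned global = 5;
/* Both threads update global without synchronisation: a data race. */
void *thread2(void *arg) { global = global + 6; return NULL; }
void *thread3(void *arg) { global = global + 7; return NULL; }
int main(void) {
  pthread_t t2, t3;
  pthread_create(&t2, NULL, thread2, NULL);
  pthread_create(&t3, NULL, thread3, NULL);
  pthread_join(t2, NULL);
  pthread_join(t3, NULL);
  printf("global=%u\n", global);
  return 0;
}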
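5 Possible executions
global=18: L(5) S(11) L(11) S(18) (one thread finishes before the other starts)
global=12: L(5) L(5) S(11) S(12) (both threads load 5; the store of 12 comes last)
global=11: L(5) L(5) S(12) S(11) (both threads load 5; the store of 11 comes last)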
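6 Example code II
#include <pthread.h>
#include <stdio.h>
unsigned global = 5;
/* lock()/unlock() are assumed here to wrap a single pthread mutex. */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
void lock(void) { pthread_mutex_lock(&m); }
void unlock(void) { pthread_mutex_unlock(&m); }
void *thread2(void *arg) { lock(); global = global + 6; unlock(); return NULL; }
void *thread3(void *arg) { lock(); global = global + 7; unlock(); return NULL; }
int main(void) {
  pthread_t t2, t3;
  pthread_create(&t2, NULL, thread2, NULL);
  pthread_create(&t3, NULL, thread3, NULL);
  pthread_join(t2, NULL);
  pthread_join(t3, NULL);
  printf("global=%u\n", global); /* always 18: the updates are serialised */
  return 0;
}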
7 Detecting Data Races Automatic data race detection is possible: collect all memory references, then check parallel references. Static methods: check the source code for all possible executions with all possible input values; NP-complete, not feasible. Dynamic methods: detect data races during one particular execution; post mortem (not feasible) or on-the-fly.
8 Dynamic data race detection A segment is the piece of code between two consecutive synchronisation operations. We collect two sets for every segment a of every thread: L(a) and S(a), with the addresses of all load and store operations. For all parallel segments a and b, $(L(a) \cap S(b)) \cup (S(a) \cap L(b)) \cup (S(a) \cap S(b))$ gives the list of conflicting addresses.
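The conflict check for two parallel segments can be sketched in C as follows. This is only a minimal illustration: it assumes a toy address space of 64 words so that each set fits in one machine word, and the names `segment_t`, `conflicts` and `has_race` are invented for the example rather than taken from DIOTA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical segment record for a toy 64-word address space:
   bit i is set when word address i was accessed in the segment. */
typedef struct {
    uint64_t loads;   /* L(a): addresses that were loaded */
    uint64_t stores;  /* S(a): addresses that were stored */
} segment_t;

/* Conflicting addresses of two parallel segments, as a bitmap:
   (L(a) & S(b)) | (S(a) & L(b)) | (S(a) & S(b)). */
static uint64_t conflicts(const segment_t *a, const segment_t *b)
{
    return (a->loads & b->stores)
         | (a->stores & b->loads)
         | (a->stores & b->stores);
}

/* Two parallel segments race when at least one address conflicts. */
static bool has_race(const segment_t *a, const segment_t *b)
{
    return conflicts(a, b) != 0;
}
```

Two segments that only load the same address produce an empty conflict set, which matches the definition of a data race: at least one of the accesses must be a store.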
9 Logical Clocks A logical clock C( ) attaches a timestamp C(a) to an event a. Used for tracing the causal order of events. Clock condition: $a \rightarrow b \Rightarrow C(a) < C(b)$. Clocks are strongly consistent if $a \rightarrow b \Leftrightarrow C(a) < C(b)$.
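10 Scalar Clocks Lamport clocks. Simple and fast update algorithm: on a local event, $C \leftarrow C + 1$; on receiving a timestamp $C_{msg}$, $C \leftarrow \max(C, C_{msg}) + 1$. Provides only limited information: $a \rightarrow b \Rightarrow C(a) < C(b)$, but $C(a) < C(b)$ does not imply $a \rightarrow b$.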
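A scalar Lamport clock can be sketched in C as below. The sketch uses the standard textbook update rule (increment on a local event, take the maximum plus one on a receive); the type and function names are hypothetical and do not come from the DIOTA sources.

```c
#include <stdint.h>

/* One scalar logical clock per process or thread. */
typedef struct {
    uint64_t time;
} lamport_clock_t;

/* Local event: advance the clock by one and return the new timestamp. */
static uint64_t lamport_tick(lamport_clock_t *c)
{
    return ++c->time;
}

/* Receive event carrying the sender's timestamp:
   move to max(own, sender) and then advance by one. */
static uint64_t lamport_receive(lamport_clock_t *c, uint64_t sender_ts)
{
    if (sender_ts > c->time)
        c->time = sender_ts;
    return ++c->time;
}
```

Two unrelated events on different processes can end up with equal or ordered timestamps, so comparing scalar timestamps alone cannot tell whether two events were concurrent.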
11 Scalar Clocks: example
12 Vector Clocks A vector clock for a program using N processes consists of N scalar values. Such a clock is strongly consistent.
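Because vector clocks are strongly consistent, comparing two timestamps tells whether the events are ordered or parallel. The following C sketch shows the usual component-wise comparison and merge; `VC_THREADS` and all function names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define VC_THREADS 3  /* assumed number of threads for this example */

typedef struct {
    unsigned v[VC_THREADS];
} vclock_t;

/* a happened before b iff a <= b in every component and a != b. */
static bool vc_before(const vclock_t *a, const vclock_t *b)
{
    bool strictly_less = false;
    for (size_t i = 0; i < VC_THREADS; i++) {
        if (a->v[i] > b->v[i])
            return false;
        if (a->v[i] < b->v[i])
            strictly_less = true;
    }
    return strictly_less;
}

/* Parallel: neither timestamp is ordered before the other.
   Assumes the timestamps belong to two distinct events. */
static bool vc_parallel(const vclock_t *a, const vclock_t *b)
{
    return !vc_before(a, b) && !vc_before(b, a);
}

/* Receive-style update for thread `self`: component-wise maximum
   with the incoming clock, then advance the thread's own entry. */
static void vc_merge(vclock_t *dst, const vclock_t *src, size_t self)
{
    for (size_t i = 0; i < VC_THREADS; i++)
        if (src->v[i] > dst->v[i])
            dst->v[i] = src->v[i];
    dst->v[self]++;
}
```

For instance, (10,2,4) is ordered before (10,8,5), while (11,2,4) and (10,8,5) are parallel because each is larger in some component.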
13 Vector Clocks: example [Figure: space-time diagram with vector timestamps (10,2,4), (2,4,6), (3,7,5), (11,2,4), (10,8,5), (12,9,5), (10,9,5), (10,8,7), (10,10,5)]
14 Vector Clocks: example [Figure: space-time diagram with vector timestamps (10,2,4), (2,4,6), (3,7,5), (11,2,4), (10,8,5), (12,9,5), (10,9,5), (10,8,7), (10,10,5)]
15 DIOTA DIOTA (Dynamic Instrumentation, Optimization and Transformation of Applications) is a generic instrumentation tool. Backends use DIOTA to instrument memory operations, intercept synchronisation functions, … Deals correctly with data in code, code in data and self-modifying code. Clones processes: the original process is used for the data and the instrumented clone is used for the code. No need for recompilation, relinking or instrumentation of files.
16 Execution replay ROLT (Reconstruction of Lamport Timestamps) is used for tracing/replaying the synchronisation operations. It attaches a scalar Lamport timestamp to each synchronisation operation. During replay, delaying each synchronisation operation until the operations with a smaller timestamp have been executed suffices for a correct replay. We only need to log a small subset of all operations.
17 Collecting memory operations We need two lists of addresses per segment a: L(a) and S(a). A multilevel bitmap is used: it exploits spatial locality, has low memory consumption, and makes comparing two bitmaps easy. We lose information: multiple accesses to the same variable are recorded only once. This is, however, no problem for data race detection.
18 Multilevel Memory bitmap [Figure: multilevel bitmap for S(a), with address fields of 9 bit and 14 bit]
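A multilevel bitmap of this kind can be sketched in C as follows. The 9-bit and 14-bit field widths are taken from the slide, but how the remaining address bits are handled is not recoverable from it, so this sketch simply assumes a 23-bit index (9 bits select a lazily allocated leaf, 14 bits select a bit inside it). All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOP_BITS   9
#define LEAF_BITS  14
#define TOP_SIZE   (1u << TOP_BITS)          /* 512 leaf slots      */
#define LEAF_BYTES ((1u << LEAF_BITS) / 8)   /* 2048 bytes per leaf */

typedef struct {
    uint8_t *leaf[TOP_SIZE];  /* NULL until an index in that range is set */
} mlbitmap_t;

/* Mark index idx as accessed. Returns false if allocation fails. */
static bool mlb_set(mlbitmap_t *m, uint32_t idx)
{
    uint32_t top = (idx >> LEAF_BITS) & (TOP_SIZE - 1);
    uint32_t bit = idx & ((1u << LEAF_BITS) - 1);
    if (!m->leaf[top]) {
        m->leaf[top] = calloc(LEAF_BYTES, 1);
        if (!m->leaf[top])
            return false;
    }
    m->leaf[top][bit / 8] |= (uint8_t)(1u << (bit % 8));
    return true;
}

static bool mlb_test(const mlbitmap_t *m, uint32_t idx)
{
    uint32_t top = (idx >> LEAF_BITS) & (TOP_SIZE - 1);
    uint32_t bit = idx & ((1u << LEAF_BITS) - 1);
    return m->leaf[top] && (m->leaf[top][bit / 8] & (1u << (bit % 8)));
}

/* True when both bitmaps share a set bit. Ranges that either side
   never touched are skipped without inspecting any leaf. */
static bool mlb_intersects(const mlbitmap_t *a, const mlbitmap_t *b)
{
    for (uint32_t t = 0; t < TOP_SIZE; t++) {
        if (!a->leaf[t] || !b->leaf[t])
            continue;
        for (uint32_t i = 0; i < LEAF_BYTES; i++)
            if (a->leaf[t][i] & b->leaf[t][i])
                return true;
    }
    return false;
}

static void mlb_free(mlbitmap_t *m)
{
    for (uint32_t t = 0; t < TOP_SIZE; t++) {
        free(m->leaf[t]);
        m->leaf[t] = NULL;
    }
}
```

Setting the same index twice leaves the bitmap unchanged, which is the information loss mentioned on the previous slide; untouched regions cost only a NULL pointer, which keeps memory consumption low when accesses cluster.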
19 Detecting parallel segments A vector timestamp is attached to each segment. All segment information (two bitmaps+vector timestamps) is kept on a list L. Each new segment is compared against the segments on list L.
20 Detecting obsolete segments Obsolete segments should be removed from list L as soon as possible. An obsolete segment is a segment that can no longer be parallel with new segments. We use a snooped matrix clock to detect these segments.
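The slides do not spell out the snooped matrix clock algorithm, so the following C sketch only illustrates the general idea under a stated assumption: a matrix entry `known[j][t]` holds the latest clock value of thread `t` that thread `j` is known to have seen, and a segment is treated as obsolete once every thread has seen its owner's timestamp. The types and function names are invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define NTHREADS 3  /* assumed number of threads for this example */

/* Hypothetical per-segment record (bitmaps omitted for brevity). */
typedef struct {
    unsigned owner;         /* thread that executed the segment */
    unsigned vc[NTHREADS];  /* vector timestamp of the segment  */
} seg_info_t;

/* Assumed matrix clock: known[j][t] is the latest clock value of
   thread t that thread j is known to have observed. */
typedef struct {
    unsigned known[NTHREADS][NTHREADS];
} matrix_clock_t;

/* A segment is obsolete when every thread has already observed the
   owner's clock value, so all future segments are ordered after it. */
static bool seg_obsolete(const seg_info_t *s, const matrix_clock_t *mc)
{
    for (size_t j = 0; j < NTHREADS; j++)
        if (mc->known[j][s->owner] < s->vc[s->owner])
            return false;
    return true;
}

/* Remove obsolete segments from the list in place; returns new length. */
static size_t prune_obsolete(seg_info_t *list, size_t n,
                             const matrix_clock_t *mc)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (!seg_obsolete(&list[i], mc))
            list[kept++] = list[i];
    return kept;
}
```

Pruning after each synchronisation operation would keep list L short, so each new segment is compared against fewer candidates.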
21 Detecting obsolete segments [Figure: timeline of segments showing the segments on list L, the segments in execution, the point of execution and the future]
22 Detecting obsolete segments [Figure: timeline of segments showing the segments on list L, the obsolete segments, the segments in execution, the point of execution and the future]
23 Comparing parallel segments [Figure: timeline of segments showing the segments on list L, the obsolete segments, the segments in execution, the point of execution and the future]
24 Overview [Flowchart with the steps: Choose input, Record, Replay + detect, Replay + ident., Replay + debug, Choose new input, The end. Edge label: race. Legend: Automatic / Requires user intervention.]
25 Experimental Evaluation Implementation for Linux running on Intel multiprocessors. Tested on a dual 500 MHz Celeron PC. SPLASH-2 was used as a benchmark: a number of multithreaded numeric applications, such as a fast Fourier transform, a raytracer, ... Several data races were found, including in SPLASH-2.
26 Performance of RecPlay Slowdown: Memory consumption: <3.4x
27 Conclusions DIOTA is a practical and efficient tool for detecting and removing data races. Three types of clocks (scalar, vector and matrix) are used to enable a fast and memory-efficient implementation. Data races have been found.