1
AADEBUG 2000 - MUNCHEN
Non-intrusive on-the-fly data race detection using execution replay
Michiel Ronsse - Koen De Bosschere
Ghent University - Belgium
2
Contents
- Introduction
- Non-determinism & data races
- RecPlay
- Method
- Implementation
- Example
- Experimental Evaluation
- Conclusions
3
Introduction
- Developing parallel programs for shared-memory multiprocessors is considered difficult:
  - a number of threads run simultaneously
  - threads co-operate and synchronise through shared memory:
    - too much synchronisation: deadlock
    - too little synchronisation: race conditions
- Cyclic debugging is impossible because of the non-deterministic nature of most parallel programs: program execution is not repeatable
4
Causes of non-determinism
- Sequential programs: input (keyboard, disk, network), signals, interrupts, certain system calls (gettimeofday(), ...)
- Parallel programs: race conditions, i.e. two threads access the same shared variable (memory location) in an unsynchronised way and at least one of them modifies the variable
5
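Example code
#include <pthread.h>
#include <stdio.h>

unsigned global = 5;   /* shared variable, accessed without synchronisation */

void *thread1(void *arg) { (void)arg; global = global + 6; return NULL; }
void *thread2(void *arg) { (void)arg; global = global + 7; return NULL; }

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("global=%u\n", global);   /* 11, 12 or 18, depending on the interleaving */
    return 0;
}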
6
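Possible executions (L(x): load of value x, S(x): store of value x)
- both threads L(5), then thread1 S(11), then thread2 S(12): global=12
- both threads L(5), then thread2 S(12), then thread1 S(11): global=11
- thread1 L(5) and S(11), then thread2 L(11) and S(18): global=18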
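The set of possible outcomes can be checked mechanically. The sketch below is a hypothetical helper, not part of RecPlay: it models each thread as a load of the shared variable into a private register followed by a store of that register plus the thread's increment, enumerates every interleaving of the four steps that respects program order, and collects the distinct final values. The names sim_thread, run_schedule and possible_results are illustrative assumptions.

```c
/* Hypothetical model: each thread runs two steps on a shared variable,
   LOAD (copy it into a private register) and STORE (write register + increment). */
typedef struct { unsigned increment, reg; int pc; } sim_thread;

/* Runs one interleaving of the 4 steps. Bit k of 'mask' selects the thread
   that performs step k (0 = thread1, 1 = thread2). Returns 0 if the mask
   does not give each thread exactly two steps. */
static int run_schedule(unsigned initial, unsigned inc1, unsigned inc2,
                        unsigned mask, unsigned *result) {
    sim_thread t[2] = { { inc1, 0, 0 }, { inc2, 0, 0 } };
    unsigned shared = initial;
    for (int step = 0; step < 4; step++) {
        sim_thread *th = &t[(mask >> step) & 1u];
        if (th->pc == 0)
            th->reg = shared;                     /* L(shared) */
        else if (th->pc == 1)
            shared = th->reg + th->increment;     /* S(reg + increment) */
        else
            return 0;                             /* thread already finished */
        th->pc++;
    }
    *result = shared;
    return 1;
}

/* Stores the distinct final values of all valid interleavings in 'out'
   (at most 'max' of them) and returns how many were stored. */
int possible_results(unsigned initial, unsigned inc1, unsigned inc2,
                     unsigned out[], int max) {
    int n = 0;
    for (unsigned mask = 0; mask < 16; mask++) {
        unsigned r;
        if (!run_schedule(initial, inc1, inc2, mask, &r))
            continue;
        int seen = 0;
        for (int i = 0; i < n; i++)
            if (out[i] == r) seen = 1;
        if (!seen && n < max)
            out[n++] = r;
    }
    return n;
}
```

With the slide's values, possible_results(5, 6, 7, out, 8) returns 3 and fills out with 18, 11 and 12 (in enumeration order). Note that the serial order with thread2 first also ends in 18, so only three distinct results exist.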
7
Race conditions
- Two types:
  - synchronisation races: prevent cyclic debugging, but are not a bug; this is desired non-determinism
  - data races: prevent cyclic debugging and are a bug; this is undesired non-determinism
- The distinction is a matter of abstraction
- Automatic detection of data races is possible:
  - collect all memory references
  - check parallel references for conflicts
8
Detecting data races
- Static methods: check the source code for all possible executions with all possible inputs; this is NP-complete and therefore not feasible
- Dynamic methods: check during an actual execution, so they only detect the data races that occur in that execution
- Removing a detected data race requires cyclic debugging
9
Dynamic data race detection
- A segment is the piece of code between two consecutive synchronisation operations
- For every segment i of every thread we collect two sets, L(i) and S(i), holding the addresses of all load and store operations
- For every pair of parallel segments i and j, the conflicting addresses are (L(i) ∪ S(i)) ∩ S(j) ∪ S(i) ∩ L(j)
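The conflict check on L(i) and S(i) can be sketched with plain bit sets. This is a minimal illustration, assuming a tiny flat address space of 256 addresses; RecPlay itself uses multilevel bitmaps, and the names segment_sets, seg_load, seg_store and seg_conflicts are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizing: 4 x 64 bits, i.e. addresses 0..255 only. */
#define SEG_WORDS 4

typedef struct {
    uint64_t load[SEG_WORDS];   /* L(i): addresses loaded in segment i */
    uint64_t store[SEG_WORDS];  /* S(i): addresses stored in segment i */
} segment_sets;

void seg_init(segment_sets *s) { memset(s, 0, sizeof *s); }

void seg_load(segment_sets *s, unsigned addr) {
    s->load[addr / 64] |= UINT64_C(1) << (addr % 64);
}

void seg_store(segment_sets *s, unsigned addr) {
    s->store[addr / 64] |= UINT64_C(1) << (addr % 64);
}

/* For two parallel segments i and j, an address conflicts when at least one
   of the two accesses is a store: ((L(i) | S(i)) & S(j)) | (S(i) & L(j)).
   Writes the conflicting addresses as a bit set into 'out' and returns 1 if
   there is at least one conflict. */
int seg_conflicts(const segment_sets *i, const segment_sets *j,
                  uint64_t out[SEG_WORDS]) {
    int any = 0;
    for (int w = 0; w < SEG_WORDS; w++) {
        out[w] = ((i->load[w] | i->store[w]) & j->store[w])
               | (i->store[w] & j->load[w]);
        if (out[w] != 0)
            any = 1;
    }
    return any;
}
```

In the example program, the segments of thread1 and thread2 both load and store global, so seg_conflicts reports that address; two parallel segments that only read the same address do not conflict.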
10
Existing race detection methods
- Huge overhead, causing a probe effect and Heisenbugs
- They only detect the existence of a data race (and the variable involved), not the instructions involved
- A data race is a bug, so we need cyclic debugging to remove it
11
RecPlay
- Synchronisation races: handled by execution replay
- Data races: detected, which also enables cyclic debugging
- Allows you to detect and remove the first data race
- Three phases:
  1. record the order of the synchronisation operations
  2. replay the synchronisation operations and check for data races
  3. normal replay, without checking for data races
12
Overview
Choose input → Record → Replay + detect → Replay + identify → Replay + debug → Replay + debug → Choose new input, or the end
(Legend: each step is marked as either automatic or requiring user intervention.)
13
Instrumentation
- JiTI (Just in Time Instrumentation) was developed especially for RecPlay, but it is a generic instrumentation tool
- It instruments memory and synchronisation operations
- It deals correctly with data in code, code in data, and self-modifying code
- It clones processes: the original process is used for the data and the instrumented clone for the code
- No recompilation, relinking or instrumentation of files is needed
14
Execution replay
- ROLT (Reconstruction of Lamport Timestamps) is used for tracing and replaying the synchronisation operations
- A scalar Lamport timestamp is attached to each synchronisation operation
- For a correct replay, it suffices to delay each synchronisation operation until all operations with a smaller timestamp have executed
- Only a small subset of all operations needs to be logged
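A minimal sketch of the record side follows, under stated assumptions. The timestamp update is the standard Lamport rule (new timestamp = max(thread clock, object clock) + 1). The logging condition, recording a timestamp only when it is not the thread's previous timestamp plus one, is our reading of "only a small subset needs to be logged" and should be treated as an assumption; the fixed-size log and all names (rolt_thread, sync_object, rolt_sync, rolt_reconstruct) are hypothetical. The replay-side delay is not shown.

```c
#include <stddef.h>

#define ROLT_LOG_MAX 64  /* hypothetical fixed log capacity */

typedef struct { unsigned op; unsigned ts; } rolt_entry;

typedef struct {
    unsigned clock;                  /* scalar Lamport clock of the thread */
    unsigned nops;                   /* synchronisation operations executed */
    rolt_entry log[ROLT_LOG_MAX];    /* assumed: only timestamps that "jump" */
    size_t nlog;
} rolt_thread;

typedef struct { unsigned clock; } sync_object;  /* last timestamp on a mutex/semaphore */

/* Record phase: timestamp one synchronisation operation of thread t on
   object o. Returns the new timestamp, or 0 if the log is full. */
unsigned rolt_sync(rolt_thread *t, sync_object *o) {
    unsigned ts = (t->clock > o->clock ? t->clock : o->clock) + 1;
    if (ts != t->clock + 1) {                /* not "previous + 1": log it */
        if (t->nlog == ROLT_LOG_MAX)
            return 0;
        t->log[t->nlog].op = t->nops;
        t->log[t->nlog].ts = ts;
        t->nlog++;
    }
    t->clock = ts;
    o->clock = ts;
    t->nops++;
    return ts;
}

/* Before replay: rebuild every timestamp of thread t from its sparse log.
   Operations that were not logged simply follow "previous + 1". */
void rolt_reconstruct(const rolt_thread *t, unsigned out[]) {
    unsigned prev = 0;
    size_t k = 0;
    for (unsigned op = 0; op < t->nops; op++) {
        if (k < t->nlog && t->log[k].op == op)
            prev = t->log[k++].ts;
        else
            prev = prev + 1;
        out[op] = prev;
    }
}
```

For example, if one thread performs two operations on a semaphore (timestamps 1 and 2) and a second thread then uses the same semaphore (timestamp 3) and another object (timestamp 4), only the second thread's timestamp 3 is logged; timestamp 4 is reconstructed as 3 + 1.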
15
Collecting memory operations
- We need two lists of addresses per segment i: L(i) and S(i)
- A multilevel bitmap is used:
  - low memory consumption
  - comparing two bitmaps is easy
- We lose information: two accesses to the same variable are counted once. This is, however, no problem for data race detection
16
Memory bitmap [figure: multilevel bitmap; only the field-width labels "9 bit" and "14 bit" survive]
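A multilevel bitmap of this kind might look as follows. The slides do not give the exact layout; the figure only shows field widths of 9 and 14 bits. This sketch therefore assumes 32-bit addresses tracked per 4-byte word, with the resulting 30-bit word index split into 7 + 9 + 14 bits; the 7-bit top level, the word granularity and all names (mlb, mlb_set, mlb_test, mlb_intersects, mlb_free) are assumptions, not RecPlay's actual design.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout (not taken from RecPlay): 32-bit byte addresses tracked
   per 4-byte word, so a 30-bit word index split into 7 + 9 + 14 bits. */
#define TOP_BITS   7
#define MID_BITS   9
#define LEAF_BITS  14
#define LEAF_WORDS ((1u << LEAF_BITS) / 64)   /* 256 x 64-bit words per leaf */

typedef struct { uint64_t bits[LEAF_WORDS]; } mlb_leaf;
typedef struct { mlb_leaf *leaf[1u << MID_BITS]; } mlb_mid;
typedef struct { mlb_mid *mid[1u << TOP_BITS]; } mlb;   /* zero-initialise */

static void mlb_split(uint32_t addr, unsigned *top, unsigned *mid, unsigned *bit) {
    uint32_t word = addr >> 2;                           /* word index */
    *bit = word & ((1u << LEAF_BITS) - 1);
    *mid = (word >> LEAF_BITS) & ((1u << MID_BITS) - 1);
    *top = word >> (LEAF_BITS + MID_BITS);
}

/* Records an access; levels are allocated lazily, so untouched memory
   costs nothing. Returns 0 if an allocation fails. */
int mlb_set(mlb *m, uint32_t addr) {
    unsigned t, d, b;
    mlb_split(addr, &t, &d, &b);
    if (m->mid[t] == NULL && (m->mid[t] = calloc(1, sizeof(mlb_mid))) == NULL)
        return 0;
    mlb_leaf **leaf = &m->mid[t]->leaf[d];
    if (*leaf == NULL && (*leaf = calloc(1, sizeof(mlb_leaf))) == NULL)
        return 0;
    (*leaf)->bits[b / 64] |= UINT64_C(1) << (b % 64);
    return 1;
}

int mlb_test(const mlb *m, uint32_t addr) {
    unsigned t, d, b;
    mlb_split(addr, &t, &d, &b);
    if (m->mid[t] == NULL || m->mid[t]->leaf[d] == NULL)
        return 0;
    return (int)((m->mid[t]->leaf[d]->bits[b / 64] >> (b % 64)) & 1u);
}

/* Two bitmaps can only share an address where both allocated the same
   leaf, so untouched regions are skipped without inspecting any bits. */
int mlb_intersects(const mlb *a, const mlb *b) {
    for (unsigned t = 0; t < (1u << TOP_BITS); t++) {
        if (a->mid[t] == NULL || b->mid[t] == NULL)
            continue;
        for (unsigned d = 0; d < (1u << MID_BITS); d++) {
            const mlb_leaf *la = a->mid[t]->leaf[d], *lb = b->mid[t]->leaf[d];
            if (la == NULL || lb == NULL)
                continue;
            for (unsigned w = 0; w < LEAF_WORDS; w++)
                if (la->bits[w] & lb->bits[w])
                    return 1;
        }
    }
    return 0;
}

void mlb_free(mlb *m) {
    for (unsigned t = 0; t < (1u << TOP_BITS); t++) {
        if (m->mid[t] == NULL)
            continue;
        for (unsigned d = 0; d < (1u << MID_BITS); d++)
            free(m->mid[t]->leaf[d]);
        free(m->mid[t]);
        m->mid[t] = NULL;
    }
}
```

Setting the same address twice leaves a single bit, which is the information loss mentioned on the previous slide; for race detection only whether an address was touched in a segment matters.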
17
Detecting parallel segments
- A vector clock is attached to each segment
- All segment information (two bitmaps + vector timestamp) is kept on a list L
- Each new segment is compared against the segments on list L
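The parallelism test on vector clocks can be sketched as follows: segment a happened before segment b when every component of a is less than or equal to the corresponding component of b and at least one is strictly smaller; two segments are parallel when neither happened before the other. The fixed thread count and the names vclock, vc_parallel and count_parallel are hypothetical.

```c
#include <stddef.h>

#define NTHREADS 3  /* hypothetical number of threads */

typedef struct { unsigned c[NTHREADS]; } vclock;

/* a happened before b iff a <= b component-wise and a != b. */
static int vc_before(const vclock *a, const vclock *b) {
    int strictly = 0;
    for (size_t k = 0; k < NTHREADS; k++) {
        if (a->c[k] > b->c[k])
            return 0;
        if (a->c[k] < b->c[k])
            strictly = 1;
    }
    return strictly;
}

/* Two segments are parallel iff neither happened before the other. */
int vc_parallel(const vclock *a, const vclock *b) {
    return !vc_before(a, b) && !vc_before(b, a);
}

/* Compares a new segment against every segment on list L and returns how
   many are parallel with it; only those need their bitmaps intersected. */
size_t count_parallel(const vclock list[], size_t n, const vclock *fresh) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (vc_parallel(&list[i], fresh))
            count++;
    return count;
}
```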
18
Detecting obsolete segments
- Obsolete segments should be removed from list L
- We use a snooped matrix clock to detect these segments
19
Detecting obsolete segments [figure; legend: segment on list L, obsolete segment, segment in execution, point of execution, the future]
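One way to express the obsolescence test is sketched below. We read "snooped matrix clock" as the matrix whose row j is the current vector clock of thread j, and we assume a segment of thread t is identified by its own entry k in its vector clock, with that entry propagated to threads that later synchronise with t. Under these assumptions, the segment can be dropped once every other thread has observed thread t at entry k or later, because every future segment of those threads then follows it. This reading, the rule and the names are ours, not necessarily RecPlay's.

```c
#define NTHREADS 3  /* hypothetical number of threads */

/* m[j][t] is what thread j currently knows of thread t (row j is thread
   j's current vector clock). A segment of thread t with own entry k is
   obsolete once every other thread has observed t at entry k or later;
   thread t's own future segments follow it by program order. */
int segment_obsolete(unsigned m[NTHREADS][NTHREADS], unsigned t, unsigned k) {
    for (unsigned j = 0; j < NTHREADS; j++)
        if (j != t && m[j][t] < k)
            return 0;
    return 1;
}
```

An obsolete segment can never again be parallel with a new segment, so removing it from list L bounds both the memory used and the number of bitmap comparisons.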
20
Identification phase
- If a data race is detected, we know:
  - the address involved
  - the type of operations involved (load or store)
  - the threads involved
  - the segments containing the racing instructions
- We need another replayed execution to find the racing instructions themselves (plus call stack, ...)
- This replay executes at full speed until the racing segments start executing
21-40
An Example [animated figure spanning slides 21-40; surviving labels: B2, A1, C4, A3, C A+B and the semaphore operations P(S1), V(S1), V(S2), P(S2), V(S3), P(S3)]
41
Experimental Evaluation
- RecPlay has been implemented for Solaris running on SPARC multiprocessors
- Tested on a SUN SparcServer 1000 with 4 processors
- SPLASH-2 was used as a benchmark: a number of multithreaded numeric applications, such as a fast Fourier transform, a raytracer, ...
- Several data races were found, including some in SPLASH-2
42
Basic performance of RecPlay
43
Segments with memory accesses
44
Efficiency of the ROLT mechanism
45
Conclusions
- RecPlay is a practical and efficient tool for detecting and removing data races
- RecPlay also makes cyclic debugging possible
- Three types of clocks (scalar, vector and matrix) are used to enable a fast and memory-efficient implementation
- Data races have been found, including in SPLASH-2