AADEBUG 2000 - MUNCHEN Non-intrusive on-the-fly data race detection using execution replay Michiel Ronsse - Koen De Bosschere Ghent University - Belgium.


1 AADEBUG 2000 - MUNCHEN Non-intrusive on-the-fly data race detection using execution replay Michiel Ronsse - Koen De Bosschere Ghent University - Belgium

2 Contents
- Introduction
- Non-determinism & data races
- RecPlay
  - Method
  - Implementation
- Example
- Experimental Evaluation
- Conclusions

3 Introduction
- Developing parallel programs for multiprocessors with shared memory is considered difficult:
  - a number of threads run simultaneously
  - co-operation and synchronisation happen through shared memory:
    - too much synchronisation: deadlock
    - too little synchronisation: race condition
- Cyclic debugging is impossible due to the non-deterministic nature of most parallel programs
- Program execution is not repeatable

4 Causes of non-determinism
- Sequential programs: input (keyboard, disk, network), signals, interrupts, certain system calls (gettimeofday(), ...)
- Parallel programs: race conditions: two threads access the same shared variable (memory location) in an unsynchronised way and at least one thread modifies the variable

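5 Example code

    #include <pthread.h>
    #include <stdio.h>

    unsigned global = 5;

    /* Both threads update global without synchronisation: a data race. */
    void *thread1(void *arg) { global = global + 6; return NULL; }
    void *thread2(void *arg) { global = global + 7; return NULL; }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("global=%u\n", global);
        return 0;
    }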

6 Possible executions (figure): three interleavings of the loads (L) and stores (S) of the two threads, ending with global=12, global=18 and global=11. Surviving labels: L(5), L(11), S(11), S(12), S(18).
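
The three outcomes can be reproduced deterministically with a small simulation. The sketch below is an illustration only: it assumes each statement `global=global+k` splits into one load step and one store step, and the function name `simulate` and the schedule-string encoding are hypothetical, not part of the presentation.

```c
#include <assert.h>
#include <string.h>

/* Simulate global=global+6 (thread 1) and global=global+7 (thread 2),
   each split into a load step followed by a store step.
   schedule is a string such as "1212": the k-th character names the
   thread ('1' or '2') that executes its next step. */
unsigned simulate(const char *schedule) {
    unsigned global = 5;              /* initial value from the example code */
    unsigned reg[2] = {0, 0};         /* value each thread loaded */
    int step[2] = {0, 0};             /* 0 = load next, 1 = store next, 2 = done */
    const unsigned inc[2] = {6, 7};   /* increment applied by each thread */
    size_t n = strlen(schedule);

    for (size_t k = 0; k < n; k++) {
        int t = schedule[k] - '1';
        assert(t == 0 || t == 1);     /* only threads 1 and 2 exist */
        assert(step[t] < 2);          /* a thread has exactly two steps */
        if (step[t] == 0)
            reg[t] = global;          /* L: load the shared variable */
        else
            global = reg[t] + inc[t]; /* S: store the updated value */
        step[t]++;
    }
    return global;
}
```

With this encoding, the schedule "1122" (no overlap) yields 18, "1212" yields 12 and "1221" yields 11: whenever both loads happen before the first store, one update is lost.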

7 Race conditions
- Two types:
  - synchronisation races:
    - do not allow us to use cyclic debugging
    - are not a bug: desired non-determinism
  - data races:
    - do not allow us to use cyclic debugging
    - are a bug: undesired non-determinism
  - the distinction is a matter of abstraction
- Automatic detection of data races is possible:
  - collect all memory references
  - check parallel references

8 Detecting data races
- Static methods:
  - check the source code for all possible executions with all possible inputs
  - NP-complete, hence not feasible
- Dynamic methods:
  - check during an actual execution, so only the data races occurring in that execution are detected
- Removal requires cyclic debugging

9 Dynamic data race detection
- The piece of code between two consecutive synchronisation operations is a segment
- We collect two sets for every segment i of every thread: L(i) and S(i), holding the addresses of all load and store operations
- For all parallel segments, comparing these sets gives the list of conflicting addresses
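
As a hedged illustration of this check, the C sketch below represents L(i) and S(i) as small bitsets and reports an address as conflicting when one segment stores to it and the other segment loads or stores it, which follows the definition of a race given earlier (at least one access modifies the variable). The type `segment_sets`, the function names and the 64-address universe are assumptions made for the example, not RecPlay's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation: one bit per address, 64 addresses in total. */
typedef struct {
    uint64_t loads;   /* L(i): addresses read in the segment */
    uint64_t stores;  /* S(i): addresses written in the segment */
} segment_sets;

/* Record a load of address addr (0..63) in segment s. */
void record_load(segment_sets *s, unsigned addr) {
    assert(addr < 64);
    s->loads |= UINT64_C(1) << addr;
}

/* Record a store to address addr (0..63) in segment s. */
void record_store(segment_sets *s, unsigned addr) {
    assert(addr < 64);
    s->stores |= UINT64_C(1) << addr;
}

/* Conflicting addresses of two parallel segments a and b:
   ((L(a) | S(a)) & S(b)) | ((L(b) | S(b)) & S(a)).
   Two loads of the same address never conflict. */
uint64_t conflicts(const segment_sets *a, const segment_sets *b) {
    return ((a->loads | a->stores) & b->stores) |
           ((b->loads | b->stores) & a->stores);
}
```

A non-zero result means the two segments race on every address whose bit is set. The caller is assumed to have established separately that the segments are parallel.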

10 Existing race detection methods
- Huge overhead, causing the probe effect and Heisenbugs
- They only detect the existence of a data race (and the variable), not the instructions involved
- A data race is a bug, so we need cyclic debugging!

11 RecPlay
- Synchronisation races: execution replay
- Data races: detect
- Also enables cyclic debugging
- Allows you to detect/remove the first data race
- Three phases:
  - record the order of the synchronisation operations
  - replay the synchronisation operations and check for data races
  - normal replay, without checking for data races

12 Overview (flow chart; box labels: Choose input, Record, Replay + detect, Replay + ident., Replay + debug, Choose new input, The end; legend: Automatic, Requires user intervention)

13 Instrumentation
- JiTI (Just in Time Instrumentation) was developed especially for RecPlay, but it is a generic instrumentation tool
- Instruments memory and synchronisation operations
- Deals correctly with data in code, code in data, and self-modifying code
- Clones processes: the original process is used for the data and the instrumented clone is used for the code
- No need for recompilation, relinking or instrumentation of files

14 Execution replay
- ROLT (Reconstruction of Lamport Timestamps) is used for tracing/replaying the synchronisation operations
- Attaches a scalar Lamport timestamp to each synchronisation operation
- For a correct replay, it suffices to delay each synchronisation operation until the operations with a smaller timestamp have been executed
- We only need to log a small subset of all operations
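
To make the timestamping concrete, here is a minimal sketch of a generic scalar Lamport clock applied to synchronisation operations. It is not the ROLT implementation and it leaves out ROLT's log reduction. The types `lamport_thread` and `lamport_object` and both function names are hypothetical.

```c
#include <assert.h>

typedef struct { unsigned long clock; } lamport_thread;  /* per-thread scalar clock */
typedef struct { unsigned long clock; } lamport_object;  /* stamp of the last operation on a sync object */

/* Stamp one synchronisation operation of thread t on object o:
   the new timestamp exceeds both the thread's previous operation and
   the previous operation on the same object. */
unsigned long lamport_stamp(lamport_thread *t, lamport_object *o) {
    assert(t != 0 && o != 0);
    unsigned long ts = (t->clock > o->clock ? t->clock : o->clock) + 1;
    t->clock = ts;
    o->clock = ts;
    return ts;
}

/* Replay rule sketched from the slide: an operation with timestamp ts may
   proceed once no operation that is still pending in another thread has a
   smaller timestamp. smallest_pending is the minimum of those timestamps. */
int may_proceed(unsigned long ts, unsigned long smallest_pending) {
    return ts <= smallest_pending;
}
```

Operations with equal timestamps are treated as unordered in this sketch, so either may run first during replay.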

15 Collecting memory operations
- We need two lists of addresses per segment i: L(i) and S(i)
- A multilevel bitmap is used:
  - low memory consumption
  - comparing two bitmaps is easy
- We lose information: two accesses to the same variable are counted once. This is, however, no problem for data race detection
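
A two-level bitmap illustrates why memory consumption stays low and comparison stays cheap: leaves are allocated only for address ranges that are actually touched, and two bitmaps are compared leaf by leaf with bitwise AND. The sketch below is an assumption-laden illustration, not RecPlay's layout: the field widths, the type `ml_bitmap` and the `ml_*` functions are hypothetical, and only the low 23 address bits are used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ROOT_BITS 9u                          /* assumed width of the first-level index */
#define LEAF_BITS 14u                         /* assumed width of the second-level index */
#define ROOT_SIZE (1u << ROOT_BITS)
#define LEAF_WORDS ((1u << LEAF_BITS) / 32u)  /* 32-bit words per leaf */

typedef struct {
    uint32_t *leaf[ROOT_SIZE];                /* NULL until an address in that range is seen */
} ml_bitmap;

/* Mark addr as accessed. Marking twice has no further effect.
   Returns 0 on success, -1 if a leaf could not be allocated. */
int ml_set(ml_bitmap *m, uint32_t addr) {
    assert(m != NULL);
    uint32_t root = (addr >> LEAF_BITS) & (ROOT_SIZE - 1u);
    uint32_t bit  = addr & ((1u << LEAF_BITS) - 1u);
    if (m->leaf[root] == NULL) {
        m->leaf[root] = calloc(LEAF_WORDS, sizeof(uint32_t));
        if (m->leaf[root] == NULL)
            return -1;
    }
    m->leaf[root][bit / 32u] |= 1u << (bit % 32u);
    return 0;
}

/* Return 1 if the two bitmaps share at least one address, else 0. */
int ml_intersects(const ml_bitmap *a, const ml_bitmap *b) {
    for (uint32_t r = 0; r < ROOT_SIZE; r++) {
        if (a->leaf[r] == NULL || b->leaf[r] == NULL)
            continue;                         /* untouched range: nothing to compare */
        for (uint32_t w = 0; w < LEAF_WORDS; w++)
            if (a->leaf[r][w] & b->leaf[r][w])
                return 1;
    }
    return 0;
}

/* Release all leaves and reset the bitmap to empty. */
void ml_free(ml_bitmap *m) {
    for (uint32_t r = 0; r < ROOT_SIZE; r++) {
        free(m->leaf[r]);
        m->leaf[r] = NULL;
    }
}
```

A bitmap must start with all leaf pointers set to NULL, for example via `ml_bitmap a = {{NULL}};`.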

16 Memory bitmap (figure; surviving labels: 9 bit, 14 bit)

17 Detecting parallel segments
- A vector clock is attached to each segment
- All segment information (two bitmaps + vector timestamp) is kept on a list L
- Each new segment is compared against the segments on list L
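
The comparison of two segments can be sketched with the standard vector clock ordering: a segment is ordered before another when its timestamp is componentwise less than or equal and differs in at least one component, and two segments are parallel when neither is ordered before the other. The fixed thread count, the type `vclock` and the function names below are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREADS 3  /* assumed fixed number of threads */

typedef struct { unsigned v[NTHREADS]; } vclock;

/* Return 1 if a is ordered before b: a <= b componentwise and a != b. */
int vc_before(const vclock *a, const vclock *b) {
    assert(a != NULL && b != NULL);
    int strictly_less = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (a->v[i] > b->v[i])
            return 0;
        if (a->v[i] < b->v[i])
            strictly_less = 1;
    }
    return strictly_less;
}

/* Two segments are parallel when neither is ordered before the other.
   Identical timestamps are treated as parallel in this sketch. */
int vc_parallel(const vclock *a, const vclock *b) {
    return !vc_before(a, b) && !vc_before(b, a);
}
```

Only pairs for which `vc_parallel` returns 1 need their load and store sets compared.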

18 Detecting obsolete segments
- Obsolete segments should be removed from list L
- We use a snooped matrix clock to detect these segments
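
One way to read this, offered as an assumption rather than as RecPlay's actual rule: a segment on list L is obsolete once the current vector clock of every thread is ordered after the segment's timestamp, because no running or future segment can then be parallel with it. The sketch below models the matrix clock as one vector clock per thread. The types `vclock` and `mclock` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREADS 3  /* assumed fixed number of threads */

typedef struct { unsigned v[NTHREADS]; } vclock;

/* Hypothetical matrix clock: row p is the latest vector clock observed for thread p. */
typedef struct { vclock row[NTHREADS]; } mclock;

/* Return 1 if a is ordered before b: a <= b componentwise and a != b. */
static int vc_before(const vclock *a, const vclock *b) {
    int strictly_less = 0;
    for (int i = 0; i < NTHREADS; i++) {
        if (a->v[i] > b->v[i])
            return 0;
        if (a->v[i] < b->v[i])
            strictly_less = 1;
    }
    return strictly_less;
}

/* Return 1 if the segment with timestamp seg is ordered before the current
   clock of every thread, so nothing still to come can run in parallel with it. */
int segment_obsolete(const vclock *seg, const mclock *m) {
    assert(seg != NULL && m != NULL);
    for (int p = 0; p < NTHREADS; p++)
        if (!vc_before(seg, &m->row[p]))
            return 0;
    return 1;
}
```

Segments for which `segment_obsolete` returns 1 can be dropped from list L, together with their two bitmaps.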

19 Detecting obsolete segments (figure; legend: segment on list L, obsolete segment, segment in execution, point of execution, the future)

20 Identification phase
- If a data race is detected, we know:
  - the address involved
  - the type of operations involved (load or store)
  - the threads involved
  - the segments containing the racing instructions
- We need another replayed execution to find the racing instructions themselves (+ call stack, ...)
- This replay executes at full speed until the racing segments start executing

21-40 An Example (animated figure, one frame per slide). Surviving labels: the statements B←2, A←1, C←4, C←A+B and A←3, and the semaphore operations P(S1), V(S1), P(S2), V(S2), P(S3), V(S3).

41 Experimental Evaluation
- RecPlay has been implemented for Solaris running on SPARC multiprocessors
- Tested on a SUN SparcServer 1000 with 4 processors
- SPLASH-2 was used as a benchmark: a number of multithreaded numeric applications, such as a fast Fourier transform, a raytracer, ...
- Several data races were found, including in SPLASH-2

42 Basic performance of RecPlay

43 Segments with memory accesses

44 Efficiency of the ROLT mechanism

45 Conclusions
- RecPlay is a practical and efficient tool for detecting and removing data races
- RecPlay also makes cyclic debugging possible
- Three types of clocks (scalar, vector and matrix) are used to enable a fast and memory-efficient implementation
- Data races have been found

