
S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, B. Calder UCSD and Microsoft PLDI 2007.


1 S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, B. Calder UCSD and Microsoft PLDI 2007

2
• Data races are hard to debug
  ◦ Difficult to detect
  ◦ Even more difficult to reproduce
• Data race detectors help with detection
  ◦ LockSet, Happens-Before, and Atomicity Violation
• But they tend to overdo it
  ◦ Up to 90% false alarms
    ▪ Especially with LockSet
We need a tool that detects and reliably classifies all harmful data races

3
• Offline Dynamic Happens-Before Data Race Detection
  ◦ Step 1: Trace Capturing
  ◦ Step 2: Offline Happens-Before Analysis
  ◦ Step 3: Replay Critical Segments
  ◦ Step 4: Auto Classify harmful vs. benign races

4
• iDNA captures the execution of an application
• It simply records the initial state,
  ◦ Registers and PC
• load values,
  ◦ Only those absolutely needed
  ◦ 1st load after a store, DMA, etc.
• and a global clock (sequencers)
  ◦ Inserted in the thread's replay log for
    ▪ Synchronization events
    ▪ System calls
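The "only those absolutely needed" idea on this slide can be illustrated with a small sketch. It is not iDNA's implementation: the `LoadLogger` class, its shadow map, and the rule "log a load only when the value differs from what the thread last saw or wrote" are assumptions chosen to show why most loads need no log entry.

```python
_MISSING = object()  # sentinel for "this thread has never touched the address"


class LoadLogger:
    """Hypothetical recorder that logs a load only when replay could not predict it."""

    def __init__(self):
        self.shadow = {}  # address -> value this thread last loaded or stored
        self.log = []     # (address, value) pairs that must go into the trace

    def store(self, addr, value):
        # The replayer re-executes the thread's own stores, so nothing is logged.
        self.shadow[addr] = value

    def load(self, addr, observed):
        # Log only when the observed value is not what the thread itself would
        # predict, e.g. a first access, or a write by another thread or by DMA.
        if self.shadow.get(addr, _MISSING) != observed:
            self.log.append((addr, observed))
        self.shadow[addr] = observed
        return observed


logger = LoadLogger()
logger.load(0x10, 5)   # first access: logged
logger.load(0x10, 5)   # predictable: not logged
logger.store(0x10, 7)
logger.load(0x10, 7)   # own store: not logged
logger.load(0x10, 9)   # changed by someone else: logged
```

In this toy run only two of the four loads end up in `logger.log`, which is the effect the slide describes.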

5
• Good old Happens-Before
  ◦ Two conflicting accesses
    ▪ At least one write
    ▪ Not ordered
• Detects only the data races that happened
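The three conditions on this slide (conflicting accesses, at least one write, not ordered) can be sketched as a brute-force check. The sketch assumes that ordering is derived from the global sequencers mentioned on slide 4: each access is tagged with the sequencer its thread logged before it and the one logged after it, and two accesses are ordered only if those intervals do not overlap. The `Access` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    thread: int
    addr: int
    is_write: bool
    seq_before: int  # assumed: last sequencer the thread logged before the access
    seq_after: int   # assumed: next sequencer the thread logged after the access


def ordered(a, b):
    """True if one access's region ends before the other's begins."""
    return a.seq_after < b.seq_before or b.seq_after < a.seq_before


def find_races(accesses):
    """Return pairs of conflicting accesses that happens-before does not order."""
    races = []
    for a, b in combinations(accesses, 2):
        conflicting = (
            a.thread != b.thread
            and a.addr == b.addr
            and (a.is_write or b.is_write)
        )
        if conflicting and not ordered(a, b):
            races.append((a, b))
    return races


write = Access(thread=1, addr=0x40, is_write=True, seq_before=10, seq_after=20)
racy_read = Access(thread=2, addr=0x40, is_write=False, seq_before=15, seq_after=25)
later_read = Access(thread=2, addr=0x40, is_write=False, seq_before=21, seq_after=30)
races = find_races([write, racy_read, later_read])
```

Here only `(write, racy_read)` is reported, because `later_read` starts after the write's region has ended. The pairwise loop is quadratic and only meant to make the definition concrete.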

6
• When a data race is detected, replay the affected segments twice
  ◦ 1st with the actual order
    ▪ Given by the load values
  ◦ 2nd with the racing accesses reversed
• Store the replay result
  ◦ No-State-Change: if all live-outs are the same
  ◦ State-Change: if at least 1 live-out changed
  ◦ Replay Failure: if the replay cannot continue
    ▪ Load from null or from an address not encountered in the recording
    ▪ Branch to a different location
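A toy model of this step, under stated assumptions: a "segment" is reduced to two racing operations on a dictionary of live-ins, and the final dictionary stands in for the live-outs. The names `toy_replay`, `compare_orders`, `ReplayFailure`, and the sample operations are invented for illustration and say nothing about how the real replayer works.

```python
from enum import Enum


class Outcome(Enum):
    NO_STATE_CHANGE = "No-State-Change"
    STATE_CHANGE = "State-Change"
    REPLAY_FAILURE = "Replay Failure"


class ReplayFailure(Exception):
    """Raised by the toy replayer when it cannot continue."""


def toy_replay(live_ins, ops, swapped):
    """Run the two racing operations in the recorded or the reversed order."""
    first, second = (ops[1], ops[0]) if swapped else ops
    state = dict(live_ins)
    first(state)
    second(state)
    return state  # treated as the live-outs of the segment


def compare_orders(live_ins, ops):
    """Replay both orders and report how the alternative order behaved."""
    original = toy_replay(live_ins, ops, swapped=False)
    try:
        alternative = toy_replay(live_ins, ops, swapped=True)
    except ReplayFailure:
        return Outcome.REPLAY_FAILURE
    if alternative == original:
        return Outcome.NO_STATE_CHANGE
    return Outcome.STATE_CHANGE


def set_flag(state):
    state["flag"] = 1


def reset_counter(state):
    state["count"] = 1


def double_counter(state):
    state["count"] *= 2


def publish_pointer(state):
    state["ptr"] = "buffer"


def use_pointer(state):
    if state["ptr"] is None:
        raise ReplayFailure("load through a null pointer")
    state["used"] = state["ptr"]
```

With these helpers, two identical `set_flag` writes give No-State-Change, `reset_counter` racing with `double_counter` gives State-Change, and `publish_pointer` racing with `use_pointer` gives Replay Failure when the use is replayed first.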

7
[Diagram labels: Replay Failure; Potentially Harmful Data Race]

8
• Repeat step 3 for each instance of a data race
• Potentially Benign Data Race
  ◦ Every replay results in No-State-Change
• Potentially Harmful Data Race
  ◦ ≥1 replay results in State-Change or Replay Failure
    ▪ State-Change shows that something would be different if execution took the other order
    ▪ Replay Failure indicates that the program changed so much that the other state cannot be simulated
  ◦ Concrete proof that something definitely changed
    ▪ Easier for programmers to accept
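The aggregation rule on this slide is small enough to write down directly. The function name `classify_race` and the use of plain strings for outcomes are assumptions made for the sketch.

```python
def classify_race(instance_outcomes):
    """Combine per-instance replay outcomes into one verdict for the race.

    A race is potentially benign only if every replayed instance ended in
    No-State-Change; a single State-Change or Replay Failure is enough to
    mark it potentially harmful.
    """
    outcomes = list(instance_outcomes)
    if not outcomes:
        raise ValueError("at least one replayed instance is required")
    if all(outcome == "No-State-Change" for outcome in outcomes):
        return "Potentially Benign"
    return "Potentially Harmful"


benign = classify_race(["No-State-Change"] * 5)
harmful = classify_race(["No-State-Change", "State-Change", "No-State-Change"])
failed = classify_race(["Replay Failure"])
```

The asymmetry is deliberate: one differing instance outweighs any number of matching ones, which is why the approach errs toward reporting a race as harmful.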

9
• 18 different executions of various services in Windows Vista and Internet Explorer
• Happens-Before returns 16,642 data races
  ◦ 68 unique
• Trace capture
  ◦ 0.8 bits per instruction
    ▪ 96 MB per 1,000,000,000 instructions
    ▪ Only 1st loads and synchronizers captured
  ◦ 0.3 bits per instruction if compressed with zip
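The trace-size figures can be sanity-checked with a few lines of arithmetic. The helper name `trace_size_bytes` is made up for this check, and the conversion to MiB (2^20 bytes) is an assumption about which unit the slide means.

```python
def trace_size_bytes(bits_per_instruction, instructions):
    """Trace size in bytes for a given logging rate and instruction count."""
    return bits_per_instruction * instructions / 8


INSTRUCTIONS = 1_000_000_000

raw_bytes = trace_size_bytes(0.8, INSTRUCTIONS)     # uncompressed log
zipped_bytes = trace_size_bytes(0.3, INSTRUCTIONS)  # after zip compression

raw_mib = raw_bytes / 2**20
zipped_mib = zipped_bytes / 2**20
```

At 0.8 bits per instruction, one billion instructions come to 100 million bytes, or a little over 95 MiB, which is in line with the 96 MB quoted on the slide. The compressed rate of 0.3 bits per instruction works out to roughly 36 MiB.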

10
• Results for Internet Explorer
  ◦ P4 Xeon 2.2 GHz, 1 GB of RAM
• Start adding…
  ◦ 6x for capturing
  ◦ 10x for replaying (unnecessary)
  ◦ 45x for offline Happens-Before data race detection
  ◦ 280x for replay analysis
• 2,196 dynamic data races

11
                   Potentially Benign             Potentially Harmful
                   Real Benign   Real Harmful     Real Benign   Real Harmful     Total
No-State-Change    32            0                                               32
State-Change                                      15            2                17
Replay Failure                                    14            5                19
Total              32            0                29            7                68

Slide callouts: Impossible State; Automatically Classified; Manually Classified
• All harmful races identified correctly: 0 false negatives
• Half of the benign races identified correctly; the other half still persist

12
• 32 Real Benign races classified as such
  ◦ Every instance must return No-State-Change
  ◦ The more instances, the more confidence in the classification

13
• 7 real bugs, correctly identified
• At least 1 State-Change or Replay Failure required
[Figure label: Dangerous Zone]

14
• 29 benign races incorrectly classified as harmful
  ◦ Approximate Computation (23/29)
    ▪ Statistics, etc.
  ◦ Replayer Limitation (6/29)
    ▪ At least 1 instance caused replay failure
    ▪ The final outcome is the same

15
• User Constructed Synchronization
  ◦ Garbage collector does not use locks
• Double Checks
  ◦ if (a) {lock(…); if (a) {…}}
• Both Values Valid
  ◦ Use cache? High perf?
• Redundant Writes
  ◦ Rewrite the same value
• Disjoint bit manipulation
  ◦ Modify different bits in the same variable

Category                            # Races
User Constructed Synchronization    8
Double Checks                       3
Both Values Valid                   5
Redundant Writes                    13
Disjoint bit manipulation           9
Approx. Computation                 23

23 false positives that were not caused by replay failure
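The "Double Checks" shape from this slide, an unlocked test followed by the same test under a lock, can be sketched as lazy initialization. The `LazyResource` class and its `factory_calls` counter are hypothetical and exist only to make the pattern observable; the sketch shows the shape of the code, not a claim about which memory models make it safe.

```python
import threading


class LazyResource:
    """Hypothetical lazily created value guarded by a double check."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = None
        self.factory_calls = 0  # counts how often the value was really built

    def get(self):
        # First check, without the lock: this read races with the write below
        # and is what a happens-before detector would report.
        if self._value is None:
            with self._lock:
                # Second check, under the lock: only one thread builds the value.
                if self._value is None:
                    self.factory_calls += 1
                    self._value = self._factory()
        return self._value
```

Whichever way the unlocked read and the locked write interleave, every caller ends up with the same object, so swapping the two racing accesses leaves the live-outs unchanged. That is the sense in which such a race is classified as benign.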

16
• Interesting approach to identify benign races
• It would be interesting to apply it to LockSet
  ◦ LockSet has far more false positives
  ◦ But it can detect bugs that did not happen in production runs
• A grand total overhead figure is missing

17 Thank You!!!

