Dynamic Race Prediction in Linear Time

1 Dynamic Race Prediction in Linear Time
Dileep Kini, Umang Mathur, Mahesh Viswanathan
University of Illinois at Urbana-Champaign

2 Debugging concurrent programs is a nightmare
Fundamental challenge: reasoning about all possible interleavings.
Typical bugs: data races, deadlocks, and more.
Focus here: data races.

6 Data Races and How to Find Them
A data race occurs in an execution when two concurrent threads consecutively access a shared memory location and at least one of the accesses is a write.
[Example trace over threads t1 and t2: 1 acq(l), 2 r(x), 3 w(x), 4 rel(l)]
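The definition above can be checked mechanically on a recorded trace. The sketch below is only an illustration: the `Event` tuple, the function names, and the assignment of events to threads in the example (the slide's thread columns did not survive) are assumptions, not part of the talk.

```python
from typing import List, NamedTuple, Tuple

class Event(NamedTuple):
    thread: str
    op: str      # "r", "w", "acq" or "rel"
    target: str  # variable name for r/w, lock name for acq/rel

def conflicting(a: Event, b: Event) -> bool:
    """Two accesses to the same location by different threads, at least one a write."""
    return (a.thread != b.thread
            and a.op in ("r", "w") and b.op in ("r", "w")
            and a.target == b.target
            and "w" in (a.op, b.op))

def exhibited_races(trace: List[Event]) -> List[Tuple[int, int]]:
    """Index pairs of consecutive conflicting events in the trace."""
    return [(i, i + 1) for i in range(len(trace) - 1)
            if conflicting(trace[i], trace[i + 1])]

# Assumed thread assignment: t1 holds lock l while t2 writes x unprotected.
trace = [Event("t1", "acq", "l"), Event("t1", "r", "x"),
         Event("t2", "w", "x"), Event("t1", "rel", "l")]
races = exhibited_races(trace)
```

This only finds races that the given execution exhibits directly; predicting races in other executions is the subject of the following slides.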

7 Data Races and How to Find Them
Static race detection: undecidable in general, false positives
Dynamic race detection:
  Lockset based (Eraser): false alarms
  Sound predictive analysis:
    Happens Before (Lamport)
    Causally Precedes (Smaragdakis et al.)
    Maximal Causal Models (Rosu et al.)
Our technique detects more races and scales too!

9 Predictive Analysis
A given execution may not have a race, but it can provide insight into other possible executions of the same program that do exhibit a race.
[Observed trace over threads t1 and t2: 1 w(x), 2 acq(l), 3 r(x), 4 rel(l)] No data race.
PREDICT
[Predicted trace over threads t1 and t2: 1 acq(l), 2 w(x), 3 r(x), 4 rel(l)] Predictable data race: data race uncovered.

10 Predictive Analysis
Any program that generates σ can also generate all its correct reorderings.
Trace σ’ is a correct reordering of a trace σ if:
  σ’|t is a prefix of σ|t for every thread t
  Critical sections on the same lock do not overlap
  All reads in σ’ see the same values as in σ: the last w(x) before any r(x) is the same in both σ and σ’
[Two example traces over threads t1 and t2, shown with ❌ marks: 1 acq(l), 2 r(x), 3 rel(l), 5 w(x); and 1 w(x), 2 acq(l), 3 r(x), 4 rel(l)]
Predictive analysis techniques tend to check whether some correct reordering of a given trace σ exhibits a concurrency error, typically using partial orders over the events in σ.
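The three conditions for a correct reordering can be checked directly on small traces. This is a minimal sketch under assumptions: the `Event` tuple, the helper names, and the thread assignment in the example are hypothetical, and "same value" is approximated, as on the slide, by "same last write".

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

class Event(NamedTuple):
    thread: str
    op: str      # "r", "w", "acq" or "rel"
    target: str  # variable name for r/w, lock name for acq/rel

EventId = Tuple[str, int]  # (thread, position within that thread)

def reads_from(trace: List[Event]) -> Dict[EventId, Optional[EventId]]:
    """Map every read to the last write on the same variable before it."""
    position: Dict[str, int] = {}
    last_write: Dict[str, EventId] = {}
    result: Dict[EventId, Optional[EventId]] = {}
    for e in trace:
        eid = (e.thread, position.get(e.thread, 0))
        position[e.thread] = eid[1] + 1
        if e.op == "w":
            last_write[e.target] = eid
        elif e.op == "r":
            result[eid] = last_write.get(e.target)
    return result

def is_correct_reordering(sigma: List[Event], candidate: List[Event]) -> bool:
    # 1. Each thread's events in the candidate form a prefix of its events in sigma.
    for t in {e.thread for e in candidate}:
        original = [e for e in sigma if e.thread == t]
        projected = [e for e in candidate if e.thread == t]
        if projected != original[:len(projected)]:
            return False
    # 2. Critical sections on the same lock do not overlap.
    holder: Dict[str, str] = {}
    for e in candidate:
        if e.op == "acq":
            if e.target in holder:
                return False
            holder[e.target] = e.thread
        elif e.op == "rel":
            if holder.pop(e.target, None) != e.thread:
                return False
    # 3. Every read in the candidate has the same last write as in sigma.
    rf_sigma = reads_from(sigma)
    return all(rf_sigma[r] == w for r, w in reads_from(candidate).items())

# Assumed thread assignment: t1 writes x, t2 reads x inside a critical section on l.
sigma = [Event("t1", "w", "x"), Event("t2", "acq", "l"),
         Event("t2", "r", "x"), Event("t2", "rel", "l")]
racy = [Event("t2", "acq", "l"), Event("t1", "w", "x"), Event("t2", "r", "x")]
stale = [Event("t2", "acq", "l"), Event("t2", "r", "x"), Event("t1", "w", "x")]
```

Here `racy` is accepted (it places w(x) and r(x) next to each other), while `stale` is rejected because the read no longer sees the write it saw in `sigma`.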

13 Predictive Analysis - HB
Conflicting events: a pair of read/write events on the same memory location performed by different threads, of which at least one is a write.
Order events in a trace (≤HB):
  Events inside a thread are ordered as seen in the trace.
  Critical sections on the same lock are ordered as they appear in the trace.
Declare a race when two conflicting accesses are unordered.
Admits an online linear time algorithm.
[Example trace over threads t1 and t2, labelled "Data Race": 1 acq(l), 2 rel(l), 4 r(x), 6 w(x)]
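The online algorithm mentioned above is usually realised with vector clocks. The sketch below is a simplified, textbook-style HB detector written for illustration; the `Event` tuple, the function name, and both example traces are assumptions rather than material from the talk.

```python
from typing import Dict, List, NamedTuple

class Event(NamedTuple):
    thread: str
    op: str      # "r", "w", "acq" or "rel"
    target: str  # variable name for r/w, lock name for acq/rel

def hb_races(trace: List[Event]) -> List[int]:
    """Indices of accesses that are HB-unordered with an earlier conflicting access."""
    threads = sorted({e.thread for e in trace})
    index = {t: i for i, t in enumerate(threads)}
    zero = [0] * len(threads)
    leq = lambda a, b: all(x <= y for x, y in zip(a, b))
    # Each thread starts with its own component set to 1.
    clock = {t: [int(i == index[t]) for i in range(len(threads))] for t in threads}
    released: Dict[str, List[int]] = {}  # lock -> clock at its last release
    reads: Dict[str, List[int]] = {}     # variable -> clock of earlier reads
    writes: Dict[str, List[int]] = {}    # variable -> clock of earlier writes
    racy = []
    for i, e in enumerate(trace):
        c, t = clock[e.thread], index[e.thread]
        if e.op == "acq":
            # Inherit everything that happened before the last release of this lock.
            prev = released.get(e.target, zero)
            c[:] = [max(x, y) for x, y in zip(c, prev)]
        elif e.op == "rel":
            released[e.target] = list(c)
            c[t] += 1
        else:
            w = writes.setdefault(e.target, list(zero))
            r = reads.setdefault(e.target, list(zero))
            # A read must follow all earlier writes; a write must follow all earlier accesses.
            if not leq(w, c) or (e.op == "w" and not leq(r, c)):
                racy.append(i)
            (w if e.op == "w" else r)[t] = c[t]
    return racy

# Both accesses protected by lock l: ordered by HB.
protected = [Event("t1", "acq", "l"), Event("t1", "w", "x"), Event("t1", "rel", "l"),
             Event("t2", "acq", "l"), Event("t2", "r", "x"), Event("t2", "rel", "l")]
# The write happens outside any critical section: unordered with the read.
unprotected = [Event("t1", "w", "x"), Event("t2", "acq", "l"),
               Event("t2", "r", "x"), Event("t2", "rel", "l")]
```

Each event is processed once, which is what makes the approach online; the cost of each step depends on the number of threads, not on the trace length.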

14 Predictive Analysis - HB
[Trace (a) over threads t1 and t2: 1 acq(l), 2 r(x), 3 w(x), 4 rel(l)]
No HB race. No predictable race.
[Trace (b) over threads t1 and t2: 1 w(y), 2 acq(l), 3 r(x), 4 rel(l), 8 r(y)]
No HB race. Predictable race.
HB is too conservative!

17 Predictive Analysis - CP
<CP is the smallest transitive relation that orders e1 and e2 if:
  1. e1 = rel(l), e2 = acq(l), and their critical sections have conflicting events
  2. e1 = rel(l), e2 = acq(l), and their critical sections have events ordered by <CP
  3. ∃e3 such that e1 ≤HB e3 <CP e2 or e1 <CP e3 ≤HB e2
Partial order ≤CP = <CP ∪ Thread-order
Declare a race when two conflicting accesses are unordered.
[Figures illustrating rules 1 to 3]
† Sound Predictive Race Detection in Polynomial Time, Y. Smaragdakis et al., POPL 2012

18 Predictive Analysis - CP
[Trace (a) over threads t1 and t2: 1 w(y), 2 acq(l), 3 w(x), 4 rel(l), 6 r(x), 7 r(y)]
No CP race. No predictable race.
[Trace (b) over threads t1 and t2: 1 w(y), 2 acq(l), 3 w(x), 4 rel(l), 6 r(y), 7 r(x)]
No CP race. Predictable race missed by CP.

22 Predictive Analysis - CP
CP misses races.
CP does not have a linear time algorithm: it fails to scale to traces having millions of events.
Resorting to windowing can miss even HB races!
Remedy? WCP to the rescue:
  Detects races missed by CP
  Linear running time

23 Weak Causal Precedence
<WCP is the smallest transitive relation that orders e1 and e2 if:
  1. e1 = rel(l), e2 = r(x)/w(x) inside a critical section of lock l, and e2 conflicts with an event in the critical section of e1
  2. e1 = rel(l), e2 = rel(l), and their critical sections have events ordered by <WCP
  3. ∃e3 such that e1 ≤HB e3 <WCP e2 or e1 <WCP e3 ≤HB e2
Partial order ≤WCP = <WCP ∪ Thread-order
Declare a race when two conflicting accesses are unordered.
[Figures illustrating rules 1 to 3]
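The three WCP rules can be evaluated literally on small traces by iterating them to a fixed point. The sketch below does exactly that and is deliberately naive (cubic work per iteration); it is not the linear time vector clock algorithm of the talk. The `Event` tuple, all function names, and the thread assignment in the example traces are assumptions, and locks are assumed to be well nested and non-reentrant.

```python
from itertools import product
from typing import List, NamedTuple, Tuple

class Event(NamedTuple):
    thread: str
    op: str      # "r", "w", "acq" or "rel"
    target: str  # variable name for r/w, lock name for acq/rel

def conflict(a: Event, b: Event) -> bool:
    return (a.thread != b.thread and a.op in "rw" and b.op in "rw"
            and a.target == b.target and "w" in (a.op, b.op))

def critical_sections(trace):
    """Return (release index -> (lock, member indices), locks held at each event)."""
    open_acq, sections, held = {}, {}, []
    for i, e in enumerate(trace):
        if e.op == "acq":
            open_acq[(e.thread, e.target)] = i
        held.append({l for (t, l) in open_acq if t == e.thread})
        if e.op == "rel":
            start = open_acq.pop((e.thread, e.target))
            sections[i] = (e.target, [k for k in range(start, i + 1)
                                      if trace[k].thread == e.thread])
    return sections, held

def hb_order(trace):
    """Reflexive transitive closure of thread order and rel(l) -> later acq(l)."""
    n = len(trace)
    hb = [[i == j for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = trace[i], trace[j]
            if a.thread == b.thread or (a.op, b.op, a.target) == ("rel", "acq", b.target):
                hb[i][j] = True
    for k, i, j in product(range(n), repeat=3):  # Warshall: k is the outer loop
        if hb[i][k] and hb[k][j]:
            hb[i][j] = True
    return hb

def wcp_order(trace):
    n, hb = len(trace), hb_order(trace)
    sections, held = critical_sections(trace)
    wcp = [[False] * n for _ in range(n)]
    for r, (lock, body) in sections.items():     # rule 1
        for j in range(r + 1, n):
            if lock in held[j] and any(conflict(trace[k], trace[j]) for k in body):
                wcp[r][j] = True
    changed = True
    while changed:
        changed = False
        for (r1, (l1, b1)), (r2, (l2, b2)) in product(sections.items(), repeat=2):
            if (l1 == l2 and r1 < r2 and not wcp[r1][r2]
                    and any(wcp[a][b] for a in b1 for b in b2)):   # rule 2
                wcp[r1][r2] = changed = True
        for i, k, j in product(range(n), repeat=3):                # rule 3
            if not wcp[i][j] and ((hb[i][k] and wcp[k][j]) or (wcp[i][k] and hb[k][j])):
                wcp[i][j] = changed = True
    return wcp

def wcp_races(trace: List[Event]) -> List[Tuple[int, int]]:
    wcp = wcp_order(trace)
    return [(i, j) for i in range(len(trace)) for j in range(i + 1, len(trace))
            if conflict(trace[i], trace[j]) and not wcp[i][j]]

t1 = [Event("t1", "w", "y"), Event("t1", "acq", "l"),
      Event("t1", "w", "x"), Event("t1", "rel", "l")]
# t2 reads y before x inside its critical section: w(y) and r(y) stay unordered.
y_first = t1 + [Event("t2", "acq", "l"), Event("t2", "r", "y"),
                Event("t2", "r", "x"), Event("t2", "rel", "l")]
# t2 reads x first: rule 1 orders rel(l) before r(x), and hence before r(y).
x_first = t1 + [Event("t2", "acq", "l"), Event("t2", "r", "x"),
                Event("t2", "r", "y"), Event("t2", "rel", "l")]
```

On `y_first` the only reported pair is w(y)/r(y); on `x_first` no race is reported, since every conflicting pair is ordered through rule 1 and composition with HB.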

24 WCP v/s CP
WCP adds fewer orderings than CP, thus allowing for more correct reorderings.
[Figure: Rule 1 (acq(l) r(x) rel(l) against w(x)) and Rule 2 (acq(l) e rel(l) against e’), each drawn with its CP ordering and its WCP ordering]

25 WCP v/s CP
[Trace over threads t1 and t2, rows 1 to 8: w(y), acq(l), w(x), rel(l), r(y), r(x)]
Predictable race missed by CP.
Predictable race caught by WCP.

28 WCP Soundness
Assumption: nested locking paradigm.
WCP is weakly sound: given any trace σ, if σ exhibits a WCP-race (conflicting events unordered by ≤WCP), then σ exhibits a predictable race or a predictable deadlock.
Equivalently, any program generating σ can generate an execution σ’ (a correct reordering) exhibiting a race or a deadlock.
The proof of soundness is non-trivial. The soundness proof for CP was incorrect.

29 Vector Clock Algorithm
Assigns a timestamp Ce to each event e, similar to the HB vector clock algorithm.
Timestamps are vector times (clocks): thread-indexed vectors supporting comparison (⊑), join (⊔), update, etc.
Ce ⊑ Ce’ iff e ≤WCP e’, so conflicting events with unordered timestamps imply a WCP race.
One-pass online algorithm: detects races as they occur.
Processes events as they are generated and updates internal state (comprising vector clocks and FIFO queues).
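The vector time operations named above are small enough to spell out. The class below is a generic sketch of thread-indexed vector times; the name `VectorTime` and its methods are hypothetical and do not reproduce the authors' data structures.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class VectorTime:
    entries: Tuple[int, ...]  # one component per thread

    def leq(self, other: "VectorTime") -> bool:
        """Comparison (⊑): pointwise less-than-or-equal."""
        return all(a <= b for a, b in zip(self.entries, other.entries))

    def join(self, other: "VectorTime") -> "VectorTime":
        """Join (⊔): pointwise maximum."""
        return VectorTime(tuple(max(a, b) for a, b in zip(self.entries, other.entries)))

    def updated(self, thread_index: int, value: int) -> "VectorTime":
        """Update: replace the component of one thread."""
        items = list(self.entries)
        items[thread_index] = value
        return VectorTime(tuple(items))

def unordered(a: VectorTime, b: VectorTime) -> bool:
    """Neither timestamp is below the other."""
    return not a.leq(b) and not b.leq(a)

a = VectorTime((2, 0))
b = VectorTime((1, 3))
both = a.join(b)
```

Two conflicting events whose timestamps are `unordered` would be reported as a race; `a` and `b` above are such a pair, while each of them is below `both`.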

30 Vector Clock Algorithm
Linear running time: O(n), where n is the number of events.
Worst-case space requirement: O(n). Empirically, the space overhead was observed to be small.
Optimal space usage: any single-pass algorithm for WCP takes Ω(n) space.
Optimal time/space tradeoff: for any algorithm computing WCP in time T(n) and space S(n), it must be the case that T(n)∙S(n) ∈ Ω(n²).
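As a worked consequence of the tradeoff, using only the bounds stated on this slide (the constants c and d are introduced here for illustration): a linear time algorithm cannot use sublinear space.

```latex
\begin{align*}
T(n)\cdot S(n) \in \Omega(n^2)
  &\;\Longrightarrow\; \exists\, c > 0:\ T(n)\cdot S(n) \ge c\,n^2 \quad \text{for all large } n,\\
T(n) \in O(n)
  &\;\Longrightarrow\; \exists\, d > 0:\ T(n) \le d\,n \quad \text{for all large } n,\\
\text{hence}\quad S(n) &\ge \frac{c\,n^2}{T(n)} \ge \frac{c}{d}\,n,
  \quad\text{i.e. } S(n) \in \Omega(n).
\end{align*}
```

This matches the O(n) worst-case space of the algorithm: with linear running time, linear space is the best the tradeoff allows.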

31 Weak Causal Precedence
WCP detects all races (and deadlocks) detected by CP or HB, and even more.
WCP admits a linear time algorithm.
Races detected: HB < CP < WCP
Scalability: CP < HB ≈ WCP

32 Experimental Evaluation
[Table, columns 1 to 15. Header labels: Program, LOC, #Events, #Thrds, #Locks, #Races, WCP Queue Length (%), Time; sub-column labels: WCP, HB, RV, Max; RV parameters: w=1K, s=60s and w=10K, s=240s. Surviving cells per program, in column order:]
account: 87 130 0.2s 0.3s 1s
airline: 83 128 0.8s 2s
array: 36 47 4.3 1.1s
boundedbuffer: 334 333
bubblesort: 274 4K 2.4 0.7s 0.5s 3.6s 7m3s
bufwriter: 199 11.7M 47s 22.4s 4.1s 4.5s
critical: 63 55 1.7s 0.9s
mergesort: 298 3K 1.3 0.4s 1.4s
pingpong: 124 146 1.2s 1.3s
moldyn: 2.9K 164K 44 7.1s 2.4s 17.4s
montecarlo: 7.2M 23.4s 16.2s 5.7s
raytracer: 16K 14.7s
derby: 302K 1.3M 1112 23 - 0.6 7s 31.2s TO
eclipse: 560K 87M 8263 66 64 0.4 6m51s 4m18s 26.2s 15m10s
ftpserver: 32K 49K 304 2.2 2.1s 3.8s 3m
jigsaw: 101K 3M 280 18s 11.8s 2.8s
lusearch: 410K 216M 118 160 10m13s 6m48s 57.3s 46.7s
xalan: 180K 122M 2494 18 0.1 7m22s 4m46s 43.1s 7m11s

33 Experimental Evaluation
[Same table, column 6 highlighted (#Races, WCP): 4 2 8 3 7 44 5 23 66 36 14 160 18]
Detects more races than other techniques

34 Experimental Evaluation
[Same table, column 12 highlighted (Time, WCP): 0.2s 0.3s 0.7s 47s 0.4s 0.5s 7.1s 23.4s 2.4s 16.2s 6m51s 5.7s 18s 10m13s 7m22s]
Fast

35 Experimental Evaluation
[Same table, column 3 highlighted (#Events): account 130, airline 128, array 47, boundedbuffer 333, bubblesort 4K, bufwriter 11.7M, critical 55, mergesort 3K, pingpong 146, moldyn 164K, montecarlo 7.2M, raytracer 16K, derby 1.3M, eclipse 87M, ftpserver 49K, jigsaw 3M, lusearch 216M, xalan 122M]
Scales to large traces

36 Experimental Evaluation
[Same table, column 11 highlighted (WCP Queue Length (%)): 4.3 2.4 10 1.3 0.6 0.4 2.2 0.1]
Low memory overhead

37 Conclusions
WCP:
  Generalizes the CP relation
  Linear time algorithm
  Detects more races in practice
  Scales to large traces
Future work:
  Further weakening
  Control flow information
  Epoch based optimizations

38 Thank You !

40 WCP Deadlock
[Trace over threads t1, t2 and t3, rows 1 to 18; surviving events: 1 acq(l), 2 acq(m), 3 rel(m), 4 r(z), 5 rel(l), 7 acq(n), 8 sync(x), 9 rel(n), 14 w(z), 16 sync(y); WCP edges drawn between events]
sync(x) stands for acq(xlock) r(xvar) w(xvar) rel(xlock).
WCP race, but no predictable race.
Predictable deadlock [highlighted acquisitions: acq(l), acq(m), acq(n)].

