Published by Angel Cook. Modified over 9 years ago.
1
Michael Bond, Katherine Coons, Kathryn McKinley (University of Texas at Austin)
2
Detecting data races in production
3
Overhead: 80x for earlier happens-before detectors, 8x for FastTrack [Flanagan & Freund ’09]
4-8
Overhead
FastTrack [Flanagan & Freund ’09]: $c_{\text{reads\&writes}} + c_{\text{sync}}\,n$, where $n$ is the number of threads. The reads-and-writes term is the problem today; the synchronization term, which grows with $n$, is the problem in the future.
Pacer: $(c_{\text{reads\&writes}} + c_{\text{sync}}\,n)\,r + c_{\text{non-sampling}}\,(1 - r)$, where $r$ is the sampling rate. The first term is the cost of sampling periods; the second is the cost of non-sampling periods.
Probability (detecting any race): FastTrack 1, Pacer $r$
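The overhead expressions on these slides can be evaluated directly. The following is a minimal sketch, assuming made-up constants; the class name `OverheadModel`, the method names, and the numeric values are illustrative and do not come from the talk.

```java
/** Hypothetical cost model for the overhead formulas; all constants are illustrative. */
final class OverheadModel {
    /** FastTrack overhead: cRw + cSync * n, with n threads. */
    static double fastTrack(double cRw, double cSync, int n) {
        return cRw + cSync * n;
    }

    /** Pacer overhead: (cRw + cSync * n) * r + cNonSampling * (1 - r), with sampling rate r. */
    static double pacer(double cRw, double cSync, double cNonSampling, int n, double r) {
        return fastTrack(cRw, cSync, n) * r + cNonSampling * (1.0 - r);
    }

    public static void main(String[] args) {
        // Assumed constants, chosen only to show how the model scales with r.
        double cRw = 4.0, cSync = 0.5, cNonSampling = 0.25;
        int threads = 8;
        for (double r : new double[] {0.0, 0.01, 0.1, 1.0}) {
            System.out.println("r=" + r + " pacer=" + pacer(cRw, cSync, cNonSampling, threads, r));
        }
    }
}
```

At a sampling rate of 1 the model reduces to the FastTrack expression, and at 0 only the non-sampling cost remains, which is the scaling behaviour the slides describe.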
9
Detect race if first access sampled
10
Thread A and Thread B timeline: sampling period, non-sampling period, sampling period, non-sampling period
11
Thread A and Thread B accesses: write x, read x, read y, write y
12
Insight #1: Stop tracking variable after non-sampled access
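One way to picture this insight is a per-variable metadata table that is filled only by sampled accesses and emptied by non-sampled ones. The sketch below is a simplification under stated assumptions: it tracks writes only, the class `SampledWriteTracker` and its methods are hypothetical names, and the caller supplies the current thread's vector clock as an `int[]` indexed by thread id.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: write metadata exists only for variables last written in a sampling period. */
final class SampledWriteTracker {
    /** Epoch "clock@thread" of the last tracked write. */
    private static final class Epoch {
        final int clock;
        final int thread;

        Epoch(int clock, int thread) {
            this.clock = clock;
            this.thread = thread;
        }
    }

    private final Map<String, Epoch> lastWrite = new HashMap<>();

    /**
     * Handles a write by {@code thread}, whose current vector clock is {@code vc}.
     * Returns true if the write races with a tracked earlier write.
     */
    boolean onWrite(String variable, int thread, int[] vc, boolean sampling) {
        Epoch prev = lastWrite.get(variable);
        if (prev == null && !sampling) {
            return false; // untracked variable, non-sampling period: no work performed
        }
        // The earlier write happens before this one iff its clock is covered by vc.
        boolean race = prev != null && prev.clock > vc[prev.thread];
        if (sampling) {
            lastWrite.put(variable, new Epoch(vc[thread], thread)); // start or keep tracking
        } else {
            lastWrite.remove(variable); // stop tracking after a non-sampled access
        }
        return race;
    }

    int trackedVariables() {
        return lastWrite.size();
    }
}
```

A sampled write followed by an unordered non-sampled write is still reported, because the first access was sampled; after that the variable carries no metadata and later non-sampled accesses cost a single map lookup.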
13-18
Thread A: write x; unlock m; read x
Thread B: lock m; write x
Race!
19-21
Vector clocks, written [A’s entry, B’s entry]: Thread A [5, 2], Thread B [3, 4]
Thread A: write x; unlock m; read x
Thread B: lock m; write x
22-23
Thread A: write x. Record write epoch 5@A.
24-25
Thread A: unlock m. Lock m receives [5, 2]; increment clock: Thread A becomes [6, 2].
26
Thread B: lock m. Join clocks: [3, 4] joined with [5, 2] gives [5, 4].
27
Thread B: write x. Happens before? Yes: 5@A is covered by Thread B’s [5, 4].
28-31
No work performed. Race uncaught.
32-33
Write epoch 4@B. Thread A: read x. Happens before? No: 4@B is not covered by Thread A’s [6, 2]. Race!
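The walkthrough on these slides can be replayed in a few lines. This is a sketch of the epoch and vector-clock checks using the slide's numbers (Thread A at [5, 2], Thread B at [3, 4]); the class and method names are assumptions for illustration, and only the last write epoch is tracked.

```java
import java.util.Arrays;

/** Hypothetical replay of the slide example with two threads, A = 0 and B = 1. */
final class VectorClockWalkthrough {
    static final int A = 0;
    static final int B = 1;

    /** Join clocks: element-wise maximum, performed when a lock is acquired. */
    static int[] join(int[] x, int[] y) {
        int[] out = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.max(x[i], y[i]);
        }
        return out;
    }

    /** Epoch clock@thread happens before the point with vector clock vc iff clock <= vc[thread]. */
    static boolean happensBefore(int clock, int thread, int[] vc) {
        return clock <= vc[thread];
    }

    /** Returns true if the two writes are ordered and the final read races, as on the slides. */
    static boolean replay() {
        int[] a = {5, 2};
        int[] b = {3, 4};

        int writeClock = a[A];           // Thread A: write x, record epoch 5@A
        int writeThread = A;

        int[] m = a.clone();             // Thread A: unlock m, lock m receives [5, 2]
        a[A]++;                          // increment clock: A becomes [6, 2]

        b = join(b, m);                  // Thread B: lock m, join clocks: [5, 4]

        boolean writesOrdered = happensBefore(writeClock, writeThread, b);
        writeClock = b[B];               // Thread B: write x, record epoch 4@B
        writeThread = B;

        boolean readOrdered = happensBefore(writeClock, writeThread, a); // Thread A: read x
        return writesOrdered && !readOrdered;
    }

    public static void main(String[] args) {
        System.out.println("join = " + Arrays.toString(join(new int[] {3, 4}, new int[] {5, 2})));
        System.out.println("read races with write: " + replay());
    }
}
```

The happens-before test is a single comparison because the last write is stored as an epoch rather than a full vector clock; the join at lock acquisition is the step that costs time linear in the number of threads.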
34
Insight #2: We only care whether “A happens before B” if A is sampled
35
Thread A and Thread B timeline. Do these events happen before other events? We don’t care!
36
Sampling periods: increment clocks. Non-sampling periods: don’t increment clocks. Do these events happen before other events? We don’t care!
37-41
Thread A: unlock m1 … unlock m2
Thread B: lock m1 … lock m2
Vector clocks, written [A’s entry, B’s entry]: Thread A [5, 2], Thread B [3, 4]
No clock increment: Thread A stays at [5, 2], so m1 and m2 both receive [5, 2]
Thread B: lock m1 joins to [5, 4]; lock m2 joins with [5, 2] again, an unnecessary join
Cost of the join: O(n) reduced to O(1)
42
http://jikesrvm.org/Research+Archive
44
Qualitative improvement in time & space
45
Probability (detecting any race) = r ?
47
LiteRace [Marino et al. ’09]: cold-region hypothesis [Chilimbi & Hauswirth ’04]; full analysis at synchronization operations
48-51
Accuracy, time, space ∝ sampling rate
Detect race if first access sampled
Qualitative improvement
Help developers fix difficult-to-reproduce bugs
Thank you
53-55
Vector clock versions
Thread A: unlock m1 … unlock m2
Thread B: lock m1 … lock m2
Vector clocks, written [A’s entry, B’s entry]: Thread A [5, 2], Thread B [3, 4]
Thread B: lock m1 joins with [5, 2], giving [5, 4]; version v6
Thread B: lock m2 sees [5, 2] with the same version (v6) again. Join unnecessary.
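A version check of this kind can be sketched as follows. This is an illustration under assumptions, not the data structures from the talk: each thread's clock carries a version number that changes whenever the clock changes, a release publishes a tagged snapshot without incrementing, and an acquiring thread remembers the newest version it has received from each other thread. All names are hypothetical.

```java
/** Hypothetical sketch of version-checked joins. */
final class VersionedClocks {
    /** Clock contents published at a lock release, tagged with owner and version. */
    static final class Snapshot {
        final int owner;
        final int version;
        final int[] clock;

        Snapshot(int owner, int version, int[] clock) {
            this.owner = owner;
            this.version = version;
            this.clock = clock;
        }
    }

    static final class ThreadState {
        final int id;
        final int[] clock;
        int version;                  // bumped whenever this thread's clock changes
        final int[] lastSeenVersion;  // newest version received from each thread (0 = none)
        int joinsPerformed;           // counts O(n) joins, for illustration only

        ThreadState(int id, int[] clock, int version) {
            this.id = id;
            this.clock = clock.clone();
            this.version = version;
            this.lastSeenVersion = new int[clock.length];
        }

        /** Release in a non-sampling period: publish the clock, do not increment it. */
        Snapshot release() {
            return new Snapshot(id, version, clock.clone());
        }

        /** Acquire: skip the join when this version of the owner's clock was already received. */
        void acquire(Snapshot s) {
            if (lastSeenVersion[s.owner] >= s.version) {
                return; // O(1): join unnecessary
            }
            boolean changed = false;
            for (int i = 0; i < clock.length; i++) { // O(n) join
                if (s.clock[i] > clock[i]) {
                    clock[i] = s.clock[i];
                    changed = true;
                }
            }
            if (changed) {
                version++;
            }
            lastSeenVersion[s.owner] = s.version;
            joinsPerformed++;
        }
    }
}
```

With Thread A at [5, 2] and version 6, two releases publish the same version, so Thread B joins once at the first acquire and skips the second. The snapshot copy at release is kept for simplicity; only the acquire side is constant-time in this sketch.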
60
Qualitative improvement
61
Core 2 Quad (4 cores)
Multithreaded benchmarks (DaCapo & SPECjbb2000)
Evaluating sampling-based race detection: need 100s of trials to evaluate; some races are rare; evaluate only frequent races
62
Two accesses to same variable (one is a write)
One access doesn’t happen before the other
Program order
Synchronization order
▪ Acquire-release
▪ Wait-notify
▪ Fork-join
▪ Volatile read-write
63-66
Thread A: write x; unlock m; read x
Thread B: lock m; write x
Race!
67-69
Races indicate: atomicity violations, order violations
Races lead to: sequential consistency violations
No races ⇒ sequential consistency (Java/C++)
Races ⇒ writes observed out of order
Most races potentially harmful [Flanagan & Freund ’10]
70-74
class ProducerConsumer {
    boolean ready;
    int x;

    // T1
    void produce() {
        x = … ;
        ready = true;
    }

    // T2
    void consume() {
        while (!ready) { }
        … = x;   // Can read old value
    }
}
75
Legal reordering by compiler or hardware:
    // T2
    void consume() {
        … = x;
        while (!ready) { }
    }
76-77
class ProducerConsumer {
    volatile boolean ready;   // volatile write then read: happens-before edge
    int x;

    // T1
    void produce() {
        x = … ;
        ready = true;
    }

    // T2
    void consume() {
        while (!ready) { }
        … = x;
    }
}
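The volatile version can be turned into a small runnable program. The sketch below fills the slide's elided values with an explicit parameter and return value, which is an assumption made for illustration; the class name `VolatileHandoff` and the helper `runOnce` are likewise hypothetical.

```java
/** Hypothetical runnable variant of the ProducerConsumer slide with a volatile flag. */
final class VolatileHandoff {
    private volatile boolean ready; // volatile write in produce, volatile read in consume
    private int x;                  // plain field, published through the volatile flag

    void produce(int value) {
        x = value;
        ready = true; // everything written before this store is visible after the flag is seen
    }

    int consume() {
        while (!ready) {
            Thread.yield(); // spin until the producer's volatile write becomes visible
        }
        return x; // ordered after the producer's write to x by the happens-before edge
    }

    /** Runs one producer and one consumer thread and returns what the consumer read. */
    static int runOnce(int value) {
        VolatileHandoff handoff = new VolatileHandoff();
        int[] result = new int[1];
        Thread consumer = new Thread(() -> result[0] = handoff.consume());
        Thread producer = new Thread(() -> handoff.produce(value));
        consumer.start();
        producer.start();
        try {
            producer.join();
            consumer.join(); // join makes the consumer's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("consumer read " + runOnce(42));
    }
}
```

Without `volatile` on `ready`, the same program contains the data race from the earlier slides: the consumer could spin forever or read a stale `x`.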
78-79
class LibraryBook {
    Set borrowers;

    void addBorrower(Person p) {
        if (borrowers == null) {
            borrowers = new HashSet();
        }
        borrowers.add(p);
    }
}
80
    void addBorrower(Person p) {
        synchronized (this) {
            if (borrowers == null) {
                borrowers = new HashSet();
            }
            borrowers.add(p);
        }
    }
81-83
    void addBorrower(Person p) {
        if (borrowers == null) {
            synchronized (this) {
                if (borrowers == null) {
                    borrowers = new HashSet();
                }
            }
        }
        ...
        borrowers.add(p);
    }
A second thread runs the same method concurrently:
    void addBorrower(Person p) {
        if (borrowers == null) { ... }
        borrowers.add(p);
    }
84
The allocation inside the synchronized block expands to:
    HashSet obj = alloc HashSet;
    obj.<init>();
    borrowers = obj;
85-86
With the last two steps reordered:
    HashSet obj = alloc HashSet;
    borrowers = obj;
    obj.<init>();
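One common repair for this pattern, which the slides do not show, is to make the field `volatile` so that the reference cannot be observed before the constructor's writes. The sketch below is an assumption-laden illustration: it uses `String` in place of the slide's `Person`, swaps in a concurrent set so that `add` itself is thread-safe, and the class name `SafeLibraryBook` is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical repaired variant of the LibraryBook slide; String stands in for Person. */
final class SafeLibraryBook {
    // volatile: a thread that sees a non-null reference also sees the initialized set
    private volatile Set<String> borrowers;

    void addBorrower(String person) {
        Set<String> local = borrowers; // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = borrowers;     // re-check under the lock
                if (local == null) {
                    local = ConcurrentHashMap.newKeySet();
                    borrowers = local; // volatile write publishes the fully built set
                }
            }
        }
        local.add(person);             // safe without the lock: the set is concurrent
    }

    int borrowerCount() {
        Set<String> local = borrowers;
        return local == null ? 0 : local.size();
    }
}
```

The volatile write to `borrowers` plays the same role as the volatile flag in the ProducerConsumer slides: it creates the happens-before edge that the unsynchronized first check otherwise lacks.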
88
33% base overhead ~50% overhead
89
Detection rate: program alone 0; FastTrack occurrence rate; Pacer occurrence rate × r
Running time: program alone $t$; FastTrack $t(c_1 + c_2 n)$; Pacer $t[(c_1 + c_2 n)\,r + c_3]$
Evaluate only frequent races
Evaluate scaling with r
Don’t evaluate scaling with n
91
50 million people
92-94
Energy Management System
Alarm and Event Processing Routine (1 MLOC)
Post-mortem analysis: 8 weeks
“This fault was so deeply embedded, it took them weeks of poring through millions of lines of code and data to find it.” –Ralph DiNicola, FirstEnergy
Race condition
Two threads writing to data structure simultaneously
Usually occurs without error
Small window for causing data corruption
http://www.securityfocus.com/news/8412
95-98
Tracks happens-before: sound & precise
80X slowdown
Each analysis step: O(n) time (n = # of threads)
FastTrack [Flanagan & Freund ’09]
Reads & writes (97%): O(1) time (problem today)
Synchronization (3%): O(n) time (problem in future)
8X slowdown
99-101
Vector clocks, written [A’s entry, B’s entry]: Thread A [5, 2], Thread B [3, 4]
Thread A’s logical time is 5; Thread B’s logical time is 4
Last logical time “received” from B is 2 (in Thread A’s clock); last logical time “received” from A is 3 (in Thread B’s clock)
102-104
Thread A: unlock m. Lock m receives [5, 2]; increment clock: Thread A becomes [6, 2].
Thread B: lock m. Join clocks: [3, 4] joined with [5, 2] gives [5, 4].
Each join takes O(n) time (n = # of threads)
105-111
Thread A: write x; unlock m; read x
Thread B: lock m; write x
Thread A: write x records write epoch 5@A
Thread B: write x. Happens before? Yes: 5@A is covered by Thread B’s [5, 4].
112-114
Thread B: write x records write epoch 4@B
Thread A: read x. Happens before? No: 4@B is not covered by Thread A’s [6, 2]. Race!
115-120
Detection rate: FastTrack [Flanagan & Freund ’09] occurrence rate; Pacer occurrence rate × r, where r is the sampling rate
Running time: FastTrack $t(c_1 + c_2 n)$, where $n$ is the no. of threads; Pacer $t[(c_1 + c_2 n)\,r + c_3]$
$c_1$: reads & writes (problem today). $c_2 n$: synchronization (problem in future).
$(c_1 + c_2 n)\,r$: overhead in sampling periods. $c_3$: overhead in non-sampling periods (small).
123
Pre-deployment: data race occurs extremely rarely. Deployed: data race occurs periodically.
124
“We test exhaustively … we had in excess of three million online operational hours [342 years] in which nothing had ever exercised that bug.” –Mike Unum, manager of commercial solutions, GE Energy http://www.securityfocus.com/news/8412
125
Data race buggy execution