Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation
Guoliang Jin, Aditya Thakur, Ben Liblit, Shan Lu
University of Wisconsin–Madison

Concurrency Bugs
They are synchronization mistakes in multi-threaded programs.
Several types:
– Atomicity violation
– Data race
– Deadlock, etc.
[Figure: two interleavings of read(x) and write(x) between thread 1 and thread 2.]

Concurrency bugs are common in the field
– Developers are poor at parallel programming
– Interleaving testing is inefficient
– Applications with concurrency bugs are shipped to users

Concurrency bugs lead to failures in the field
Disasters in the past
– Therac-25, Northeast blackout of 2003
More threats in the multi-core era

Failure diagnosis is critical

Concurrency Bug Failure Example
Concurrency bug from Apache HTTP Server

Thread 1 and thread 2 both run log_writer() on the shared variable idx:

    log_writer() {
      …
      memcpy(&buf[idx], s, strlen(s));
      …
      temp = idx;
      idx = temp + strlen(s);
      …
      return SUCCESS;
    }

[Figure: a correct interleaving of the two threads, followed by a failing interleaving in which both memcpy calls run before either thread updates idx.]
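The failing interleaving can be replayed deterministically in a single thread by splitting log_writer() into its two halves. This is only an illustrative sketch: the helper names copy_entry and advance_idx, the buffer size, and the log strings are assumptions, not part of the Apache code.

```c
#include <assert.h>
#include <string.h>

static char buf[64];   /* shared log buffer (size chosen for this example) */
static int idx;        /* shared write offset into buf */

/* First half of log_writer(): copy the entry at the current offset. */
void copy_entry(const char *s) {
    assert((size_t)idx + strlen(s) <= sizeof buf);
    memcpy(&buf[idx], s, strlen(s));
}

/* Second half of log_writer(): advance the shared offset. */
void advance_idx(const char *s) {
    int temp = idx;
    idx = temp + (int)strlen(s);
}
```

Calling copy_entry("AAAA") for thread 1, then copy_entry("BB") and advance_idx("BB") for thread 2, then advance_idx("AAAA") for thread 1 leaves idx at 6 but the buffer starting with "BBAA": thread 1's entry is partly overwritten, and the last two bytes of the claimed region are never written.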

Diagnosing Concurrency Bug Failures Is Challenging
The failure is non-deterministic and rare
– Programmers have trouble reproducing the failure
The root cause involves more than one thread

Existing work and its limitations
Failure replay
– High runtime overhead
– Developers need to manually locate faults
Run-time bug detection
– (mostly) High runtime overhead
– Not guided by the failure
– Many false positives
How to achieve low-overhead and accurate failure diagnosis?

Our work: CCI
Goal: diagnosing production-run concurrency bug failures
Major components:
– predicate instrumentor
– sampler
– statistical debugging
[Figure: CCI pipeline. Program source → compiler (predicates, sampler) → counts and run outcomes → statistical debugging → predictors.]
Predictors: predicates that are true in most failure runs and false in most correct runs.

CCI Overview
Three different types of predicates: Prev, Havoc, and FunRe.
Each predicate type has its own supporting sampling strategy.
Same statistical debugging as in CBI.
Experiments show CCI is effective in diagnosing concurrency bug failures.
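A good predictor is true in most failure runs and false in most correct runs. The sketch below shows one way such a predicate could be scored, assuming the Failure, Context, and Increase metrics from the CBI statistical debugging work. The PredCounts layout and function names are hypothetical, and the full CBI ranking involves more than this single score.

```c
#include <assert.h>

/* Per-predicate run counts gathered over many runs (hypothetical layout). */
typedef struct {
    int true_fail;  /* failing runs in which the predicate was observed true */
    int true_ok;    /* correct runs in which the predicate was observed true */
    int obs_fail;   /* failing runs in which the predicate's site was observed */
    int obs_ok;     /* correct runs in which the predicate's site was observed */
} PredCounts;

/* Failure(P): fraction of the runs with P true that failed. */
double failure_score(const PredCounts *c) {
    int total = c->true_fail + c->true_ok;
    return total == 0 ? 0.0 : (double)c->true_fail / total;
}

/* Context(P): fraction of the runs that merely reached P's site that failed. */
double context_score(const PredCounts *c) {
    int total = c->obs_fail + c->obs_ok;
    return total == 0 ? 0.0 : (double)c->obs_fail / total;
}

/* Increase(P): how much P being true raises the failure rate over its context. */
double increase_score(const PredCounts *c) {
    assert(c->true_fail <= c->obs_fail && c->true_ok <= c->obs_ok);
    return failure_score(c) - context_score(c);
}
```

With one correct and one failing run, a predicate that is true only in the failing run scores 0.5, while one that is true in both runs scores 0.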

Outline
– Motivation
– CCI Overview
– CCI Predicates and Sampling Strategies
  – CCI-Prev and its sampling strategy
  – CCI-Havoc and its sampling strategy
  – CCI-FunRe and its sampling strategy
– Evaluation
– Conclusion

CCI-Prev Intuition
[Figure: interleavings of read(x) and write(x) between thread 1 and thread 2, illustrating an atomicity violation and a data race.]
Just record which thread accessed the location last.

CCI-Prev Predicate
It tracks whether two successive accesses to a shared memory location were by two distinct threads or by the same thread.

CCI-Prev Predicate on the Correct Run
Concurrency bug from Apache HTTP Server
[Figure: thread 1 and thread 2 run log_writer() in a correct interleaving; I marks the instrumented access to idx.]
Predicate counts as the run proceeds (first counter: correct runs; second counter: failure runs):

    Predicate   initially   then   finally
    remote I    0 0         0 0    0 0
    local I     0 0         1 0    2 0

CCI-Prev Predicate on the Failure Run
Concurrency bug from Apache HTTP Server
[Figure: thread 1 and thread 2 run log_writer() in the failing interleaving; I marks the instrumented access to idx.]
Predicate counts as the run proceeds (first counter: correct runs; second counter: failure runs):

    Predicate   initially   then   finally
    remote I    0 0         0 0    0 1
    local I     2 0         2 1    2 1

CCI-Prev Predicate Instrumentation
Concurrency bug from Apache HTTP Server
Instrumentation around the access at I (temp = idx):

    lock(glock);
    remote = test_and_insert(&idx, curTid);
    record(I, remote);
    temp = idx;
    unlock(glock);

A global hash table maps each address to the ThreadID of its last access. On the slide, the entry for &idx holds thread 2 before thread 1 executes I and thread 1 afterwards.
Predicate counts: remote I goes from 0 0 to 0 1; local I stays at 2 1.
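The slide's lock, test_and_insert, and record sequence might look as follows in plain C. This is a minimal sketch under assumptions: the fixed-size table, its hash function, the eviction-on-collision policy, and the two counters are made up for illustration, and a first-ever access is counted as local.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64

typedef struct { const void *addr; int tid; } PrevEntry;

static PrevEntry prev_table[TABLE_SIZE];  /* addr == NULL means empty slot */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;
static unsigned remote_count, local_count;

/* Returns true if the last recorded access to addr came from a different
 * thread, then records cur_tid as the last accessor. Caller holds glock. */
static bool test_and_insert(const void *addr, int cur_tid) {
    assert(addr != NULL);
    size_t slot = ((uintptr_t)addr >> 3) % TABLE_SIZE;
    PrevEntry *e = &prev_table[slot];
    bool remote = (e->addr == addr) && (e->tid != cur_tid);
    e->addr = addr;   /* a colliding address simply evicts the old entry */
    e->tid = cur_tid;
    return remote;
}

/* Bump the counter for the predicate that was just observed. */
static void record(bool remote) {
    if (remote) remote_count++; else local_count++;
}

/* Instrumented form of "temp = idx": the predicate check and the access
 * happen atomically with respect to other instrumented accesses. */
int instrumented_read(const int *shared, int cur_tid) {
    pthread_mutex_lock(&glock);
    bool remote = test_and_insert(shared, cur_tid);
    record(remote);
    int temp = *shared;
    pthread_mutex_unlock(&glock);
    return temp;
}
```

Holding one global lock across the table update and the access is what makes the recorded "last accessor" trustworthy, and it is also the main source of this predicate's overhead.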

CCI-Prev Sampling Strategy
[Figure: the log_writer() interleaving from the Apache HTTP Server bug, with I marking the instrumented accesses.]
Does traditional sampling work? No. A Prev predicate needs two successive accesses to the same location, possibly from different threads, to both be observed.
CCI-Prev sampling is therefore:
– Thread-coordinated
– Bursty

Outline
– Motivation
– CCI Overview
– CCI Predicates and Sampling Strategies
  – CCI-Prev and its sampling strategy
  – CCI-Havoc and its sampling strategy
  – CCI-FunRe and its sampling strategy
– Evaluation
– Conclusion

CCI-Havoc Intuition
[Figure: the failing log_writer() interleaving of thread 1 and thread 2, with I marking the instrumented accesses to idx.]
Just record what value was observed during the last access.

CCI-Havoc Predicate
It tracks whether the value of a given shared location changes between two consecutive accesses by one thread.
It only uses thread-local information.

CCI-Havoc Predicate on the Correct Run
Concurrency bug from Apache HTTP Server
[Figure: thread 1 and thread 2 run log_writer() in a correct interleaving; I marks the instrumented access to idx.]
Predicate counts as the run proceeds (first counter: correct runs; second counter: failure runs):

    Predicate     initially   then   finally
    unchanged I   0 0         1 0    2 0
    changed I     0 0         0 0    0 0

CCI-Havoc Predicate on the Failure Run
Concurrency bug from Apache HTTP Server
[Figure: thread 1 and thread 2 run log_writer() in the failing interleaving; I marks the instrumented access to idx.]
Predicate counts as the run proceeds (first counter: correct runs; second counter: failure runs):

    Predicate     initially   then   finally
    unchanged I   2 0         2 1    2 1
    changed I     0 0         0 0    0 1

CCI-Havoc Predicate Instrumentation
Concurrency bug from Apache HTTP Server
Instrumentation around the access at I (temp = idx):

    temp = idx;
    changed = test(&idx, temp);
    record(I, changed);
    insert(&idx, temp);

Each thread has its own hash table mapping an address to the value last observed there. On the slide, thread 1's entry for &idx holds idx before I and idx+len2 afterwards.
Predicate counts: unchanged I stays at 2 1; changed I goes from 0 0 to 0 1.
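The test and insert operations on a per-thread table could be sketched as below. Assumptions are flagged in the comments: the table is passed explicitly so that two "threads" can be simulated in one test (a real implementation would more likely keep it in thread-local storage), and the table size and hash function are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HAVOC_SLOTS 64

typedef struct { const int *addr; int value; } HavocEntry;

/* One table per thread; no other thread ever touches it, so no lock. */
typedef struct { HavocEntry slots[HAVOC_SLOTS]; } HavocTable;

static HavocEntry *havoc_slot(HavocTable *t, const int *addr) {
    assert(t != NULL && addr != NULL);
    return &t->slots[((uintptr_t)addr >> 2) % HAVOC_SLOTS];
}

/* insert: remember the value this thread last read or wrote at addr. */
void havoc_insert(HavocTable *t, const int *addr, int value) {
    HavocEntry *e = havoc_slot(t, addr);
    e->addr = addr;
    e->value = value;
}

/* test: true if this thread has a remembered value for addr and the
 * value it sees now differs, i.e. another thread changed it in between. */
bool havoc_test(HavocTable *t, const int *addr, int current) {
    HavocEntry *e = havoc_slot(t, addr);
    return e->addr == addr && e->value != current;
}
```

Because only values are compared, a remote write that stores the same value goes unnoticed; that is the price of using thread-local information only.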

CCI-Havoc Sampling Strategy
[Figure: the log_writer() interleaving of thread 1 and thread 2 from the Apache HTTP Server bug.]
CCI-Havoc sampling is:
– Bursty
– Thread-independent

CCI-FunRe Predicate
It tracks whether the execution of one function overlaps with the execution of the same function from a different thread.

CCI-FunRe Predicate Example
[Figure: thread 1 and thread 2 each call log_writer() { … return SUCCESS; }, once without overlap and once with overlapping executions.]
Predicate counts (first counter: correct runs; second counter: failure runs):

    Predicate             counts
    NonReent log_writer   2 1
    Reent log_writer      0 1

CCI-FunRe Predicate Instrumentation
Both threads run the instrumented function:

    log_writer() {
      oldCount = atomic_inc(Count);
      record("log_writer", oldCount);
      …
      atomic_dec(Count);
      return SUCCESS;
    }

Counter table (FuncName → Counter) and predicate counts as the slide proceeds:

    log_writer counter   NonReent log_writer   Reent log_writer
    0                    2 0                   0 0
    1                    2 1                   0 0
    2                    2 1                   0 1
    0                    2 1                   0 1
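The entry and exit accounting on this slide maps naturally onto C11 atomics. In the sketch below, atomic_fetch_add stands in for the slide's atomic_inc because it returns the counter's previous value; the function names funre_enter and funre_exit and the two hit counters are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int  log_writer_count;  /* threads currently inside log_writer() */
static atomic_uint reent_hits;        /* entries that overlapped another thread */
static atomic_uint nonreent_hits;     /* entries that found the function idle */

/* Record which FunRe predicate held on this entry. */
static void record_funre(int old_count) {
    if (old_count > 0)
        atomic_fetch_add(&reent_hits, 1);
    else
        atomic_fetch_add(&nonreent_hits, 1);
}

/* Call at function entry; returns the counter value before the increment. */
int funre_enter(void) {
    int old_count = atomic_fetch_add(&log_writer_count, 1);
    assert(old_count >= 0);
    record_funre(old_count);
    return old_count;
}

/* Call on every exit path of the function. */
void funre_exit(void) {
    atomic_fetch_sub(&log_writer_count, 1);
}
```

Two back-to-back calls each see a previous count of 0 and record NonReent; a second entry before the first exit sees 1 and records Reent.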

CCI-FunRe Sampling Strategy
[Figure: thread 1 and thread 2 run the instrumented log_writer(), which calls atomic_inc(Count) on entry and atomic_dec(Count) before returning; the log_writer counter reads 0 in each table shown.]
Function execution accounting is not suitable for sampling, so this part is unconditional.

CCI-FunRe Sampling Strategy
Function execution accounting:
– unconditional
FunRe predicate recording:
– thread-independent
– non-bursty

Experimental Evaluation
Implementation
– Static instrumentor based on the CBI framework
Real-world concurrency bug failures from:
– Apache HTTP Server, Cherokee
– Mozilla-JS, PBZIP2
– SPLASH-2: FFT, LU
Parameters used
– Roughly 1/100 sampling rate

Failure Diagnosis Evaluation
Methodology
– Uses concurrency bug failures that occurred in the real world
– Each application runs 3000 times on a multi-core machine
  – Random sleeps are added to get some failure runs
– Sampling is enabled
– Statistical debugging then returns a list of predictors
Which predictor in the list can diagnose the failure?

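Failure Diagnosis Results (with sampling)
[Table: rank of the best failure predictor reported for each program under CCI-Prev, CCI-Havoc, and CCI-FunRe. The column alignment is not recoverable; the surviving entries per program are listed below.]
Apache-1: top1
Apache-2: top1
Cherokee: top2
FFT: top1
LU: top1
Mozilla-JS-1: top2, top1
Mozilla-JS-2: top1
Mozilla-JS-3: top2, top1
PBZIP2: top1
[Figure: FunRe, Havoc, and Prev ordered along a capability axis.]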

Runtime Overhead

                 Prev                     Havoc                    FunRe
    Program      No Sampling   Sampling   No Sampling   Sampling   No Sampling   Sampling
    Apache-1     62.6%         1.9%       27.4%         2.8%       1.1%          1.8%
    Apache-2     8.4%          0.5%       4.2%          0.4%       0.2%
    Cherokee     19.1%         0.3%       2.1%          0.0%       0.3%          0.4%
    FFT          169%          24.0%      33.5%         5.5%       72.8%         30.0%
    LU           57857%        949%       1693%         8.9%       1682%         926%
    Mozilla-JS   11311%        606%       7587%         356%       123%          97.0%
    PBZIP2       0.2%, 0.3%, 0.2% (column placement not recoverable)

[Figure: FunRe, Havoc, and Prev ordered along an overhead axis.]

Conclusion
CCI is capable of, and suitable for, diagnosing many production-run concurrency bug failures.
Future predicates can leverage our effective sampling strategies.
Experiments confirm the design tradeoff.

Questions about CCI?

CBI on Concurrency Bug Failures
Concurrency bug from Apache HTTP Server
[Figure: the failing log_writer() interleaving of thread 1 and thread 2 on the shared variable idx.]
CBI does not work!
To diagnose production-run concurrency bug failures, interleaving-related events should be tracked.

CCI-Prev Predicate Instrumentation with Sampling

    if (gsample) {
      lock(glock);
      changed = test_and_insert(&cnt, curTid, &stale);
      record(stale ? P1 : P2, changed);
      temp = cnt;
      lLength++; gLength++;
      if ((iset == curTid && lLength > lMAX) || gLength > gMAX) {
        clear();
        iset = unusedTid;
        gsample = false;
      }
      unlock(glock);
    } else {
      temp = cnt;
      [[ gsample = true; iset = curTid; lLength = gLength = 0; ]]?
    }
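The burst bookkeeping on this slide can be isolated into a small state machine. The sketch below is one reading of the slide, not the authors' implementation: the CoordSampler struct and function names are hypothetical, the local length is counted only for the thread that started the burst, and clearing the global hash table is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define UNUSED_TID (-1)

typedef struct {
    bool gsample;   /* global flag: is a burst active for all threads? */
    int  iset;      /* thread that started the current burst */
    int  l_length;  /* sampled accesses by the initiating thread */
    int  g_length;  /* sampled accesses by all threads */
    int  l_max;     /* bound on l_length (lMAX on the slide) */
    int  g_max;     /* bound on g_length (gMAX on the slide) */
} CoordSampler;

/* Start a burst; every thread sees gsample and begins sampling. */
void coord_start(CoordSampler *s, int cur_tid) {
    assert(!s->gsample);
    s->gsample = true;
    s->iset = cur_tid;
    s->l_length = 0;
    s->g_length = 0;
}

/* Call after each sampled access with the global lock held.
 * Returns true if this access ended the burst. */
bool coord_after_access(CoordSampler *s, int cur_tid) {
    assert(s->gsample);
    if (cur_tid == s->iset) s->l_length++;
    s->g_length++;
    if ((cur_tid == s->iset && s->l_length > s->l_max) ||
        s->g_length > s->g_max) {
        s->iset = UNUSED_TID;   /* the global table would be cleared here */
        s->gsample = false;
        return true;
    }
    return false;
}
```

The two bounds serve different purposes in this reading: l_max ends the burst once the initiating thread has made enough accesses, and g_max ends it even if that thread stalls.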

CCI-Havoc Predicate Instrumentation with Sampling

    if (sample) {
      changed = test(&cnt, cnt, &stale);
      record(stale ? P1 : P2, changed);
      temp = cnt;
      insert(&cnt, cnt);
      length++;
      if (length > lMAX) {
        clear();
        sample = false;
      }
    } else {
      temp = cnt;
      [[ sample = true; length = 0; ]]?
    }

No global lock used!
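A per-thread burst sampler in the spirit of this slide might be sketched as follows. The BurstSampler struct is hypothetical, and the fixed countdown is an assumption standing in for the slide's randomized [[ … ]]? decision so that the behavior is deterministic and easy to check.

```c
#include <assert.h>
#include <stdbool.h>

/* One instance per thread, so no locking is needed. */
typedef struct {
    bool sample;     /* currently inside a sampling burst? */
    int  length;     /* accesses sampled in the current burst */
    int  l_max;      /* burst bound (lMAX on the slide) */
    int  countdown;  /* unsampled accesses left before the next burst */
    int  period;     /* reset value for countdown (assumption: fixed) */
} BurstSampler;

/* Returns true if the current access should run the predicate code. */
bool burst_should_sample(BurstSampler *s) {
    assert(s->l_max > 0 && s->period > 0);
    if (s->sample) {
        s->length++;
        if (s->length > s->l_max)
            s->sample = false;  /* the thread's table would be cleared here */
        return true;
    }
    if (--s->countdown <= 0) {  /* fast path: maybe arm the next burst */
        s->sample = true;
        s->length = 0;
        s->countdown = s->period;
    }
    return false;
}
```

As on the slide, the bound is checked after the increment, so a burst covers l_max + 1 consecutive accesses by the same thread.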

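Failure Diagnosis Results (with sampling)
[Table: rank of the best failure predictor reported for each program under CBI, CCI-Prev, CCI-Havoc, and CCI-FunRe. The column alignment is not recoverable; the surviving entries per program are listed below.]
Apache-1: top1
Apache-2: top1
Cherokee: top2
FFT: top1
LU: top1
Mozilla-JS-1: top2, top1
Mozilla-JS-2: top1
Mozilla-JS-3: top2, top1
PBZIP2: top1
[Figure: FunRe, Havoc, and Prev ordered along a capability axis.]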

Failure diagnosis is critical
