Cooperative Crug Isolation
Aditya Thakur, Rathijit Sen, Ben Liblit, Shan Lu
University of Wisconsin–Madison
Workshop on Dynamic Analysis 2009
Crug: concurrency bug.
Race! Thread 1 executes read(x) while Thread 2 executes write(x).
Atomicity violation! Thread 1 executes read(x) and then write(x); Thread 2's write(x) interleaves between the two.
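The atomicity violation above can be replayed deterministically, without real threads, by fixing where Thread 2's write lands. The sketch below is only an illustration: the helper name final_x, the slot numbering, and the concrete values (x starts at 0, Thread 1 increments it, Thread 2 writes 100) are assumptions, not part of the talk.

```c
#include <assert.h>

/* Hypothetical helper: replays one fixed interleaving of
 *   Thread 1: tmp = read(x); write(x, tmp + 1);
 *   Thread 2: write(x, 100);
 * t2_slot says where Thread 2's write lands:
 *   0 = before Thread 1's read,
 *   1 = between Thread 1's read and write,
 *   2 = after Thread 1's write.
 * Returns the final value of x, which starts at 0. */
int final_x(int t2_slot)
{
    assert(t2_slot >= 0 && t2_slot <= 2);
    int x = 0;
    int tmp;

    if (t2_slot == 0) x = 100;   /* Thread 2 runs first */
    tmp = x;                     /* Thread 1: read(x) */
    if (t2_slot == 1) x = 100;   /* Thread 2 interleaves here */
    x = tmp + 1;                 /* Thread 1: write(x) */
    if (t2_slot == 2) x = 100;   /* Thread 2 runs last */
    return x;
}
```

Slots 0 and 2 correspond to the two serial orders. Slot 1 yields a value that neither serial order can produce, because Thread 2's update is overwritten using a stale read.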
Non-determinism! The developer and the user both run threaded.exe file.in, yet the two runs can behave differently.
More cores, more threads, more crugs.
Simplified crug from PBZIP2 (mut is a global variable; global variables are shown in bold on the slide):
    Thread 1: lock(mut); unlock(mut);
    Thread 2: mut = NULL;
Goal: identify the root cause of the crug.
    Thread 1: lock(mut); unlock(mut);
    Thread 2: mut = NULL;
Current techniques: not scalable, high overhead; report benign crugs; target specific types of crugs and synchronization.
Cooperative Crug Isolation (CCI): scalable, low overhead; does not report benign crugs; handles multiple types of crugs and synchronization.
Cooperative Bug Isolation (CBI): Program Source -> Compiler (Sampler, Predicates) -> Shipping Application -> Counts & success/failure labels -> Statistical Debugging -> Top bugs with likely causes
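The slide does not say which statistical model ranks the predicates. As a hedged illustration, the sketch below computes the Increase score described in published CBI work (Failure(P) minus Context(P)) from aggregated counts. The struct layout and function names are hypothetical, and the numbers in the usage note are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Per-predicate counts aggregated over many runs (illustrative layout). */
struct pred_counts {
    unsigned true_fail;  /* failing runs in which P was observed true */
    unsigned true_succ;  /* successful runs in which P was observed true */
    unsigned obs_fail;   /* failing runs in which P's site was sampled */
    unsigned obs_succ;   /* successful runs in which P's site was sampled */
};

/* Failure(P): fraction of the runs with P true that failed. */
double failure_score(const struct pred_counts *c)
{
    assert(c != NULL);
    unsigned n = c->true_fail + c->true_succ;
    return n ? (double)c->true_fail / n : 0.0;
}

/* Context(P): fraction of the runs that merely reached P's site that failed. */
double context_score(const struct pred_counts *c)
{
    assert(c != NULL);
    unsigned n = c->obs_fail + c->obs_succ;
    return n ? (double)c->obs_fail / n : 0.0;
}

/* Increase(P): how much more failure-prone a run is when P is true
 * than when P's site is simply reached. */
double increase_score(const struct pred_counts *c)
{
    return failure_score(c) - context_score(c);
}
```

With made-up counts of 9 failing and 1 successful run where P is true, out of 10 failing and 90 successful runs that sampled the site, Failure(P) is 0.9, Context(P) is 0.1, and Increase(P) is about 0.8. A predicate that is true in every sampled run scores 0, which is the situation where predicate values do not differ between successful and failing runs.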
Cooperative Bug Isolation: CBI predicates are inadequate for crug isolation. In the PBZIP2 example (Thread 1: lock(mut); unlock(mut); Thread 2: mut = NULL;), the values of the predicates are the same for successful and failing runs.
Cooperative Bug Isolation: CBI sampling is inadequate for crug isolation. Sampling is thread-local and independent in each thread, even though the events of interest (Thread 1: lock(mut); unlock(mut); Thread 2: mut = NULL;) span threads.
Cooperative Bug Isolation: CBI was unable to diagnose the crugs in any of the benchmarks used. No bug predictors reported!
CCI extends the CBI framework to target crugs: a new predicate capturing interleaving events, and a new cross-thread sampling scheme.
Predicate Design
    Thread 1: lock(mut); S: unlock(mut);
    Thread 2: mut = NULL;
If Thread 2's mut = NULL interleaves before S, "remote S" is true. Otherwise "local S" is true.
Predicate Instrumentation: at runtime, maintain a hashtable that maps each address to the id of the thread that last accessed it.
    Address     Thread Id
    0xb1ab1a    1
    0xf00f00    2
    0xb1af00    1
Predicate Instrumentation
    lock(glock);
    differs = test_and_insert(&x, curTid);
    record(S, differs);
    access(x);
    unlock(glock);
curTid is the thread id of the currently executing thread.
test_and_insert: check if curTid was the thread which previously accessed x, set differs to true if it was not, and update the hashtable.
record: increment the counter for the predicate at S.
lock(glock) ... unlock(glock): execute the block atomically.
Handles accesses through pointers. No need for static pointer analysis.
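The instrumentation above can be sketched in C as follows. This is a minimal illustration, not the authors' implementation: the open-addressing table, its fixed size, the struct site counters, and passing the thread id as an explicit argument are all assumptions made so the example is self-contained. Treating a first-ever access as "local" is also an assumption, since the slides do not cover that case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024  /* assumed capacity, must be a power of two */

struct entry { const void *addr; int tid; };  /* addr == NULL: empty slot */
struct site  { unsigned long remote, local; };/* counters for one site S */

static struct entry table[TABLE_SIZE];
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Records cur_tid as the last accessor of addr. Returns true if the
 * previous accessor was a different thread. Caller must hold glock. */
static bool test_and_insert(const void *addr, int cur_tid)
{
    size_t i = ((uintptr_t)addr >> 3) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        struct entry *e = &table[i];
        if (e->addr == NULL) {          /* first access: assumed local */
            e->addr = addr;
            e->tid = cur_tid;
            return false;
        }
        if (e->addr == addr) {
            bool differs = (e->tid != cur_tid);
            e->tid = cur_tid;           /* update the hashtable */
            return differs;
        }
        i = (i + 1) & (TABLE_SIZE - 1); /* linear probing */
    }
    return false;                       /* table full: simplification */
}

/* Increments the counter for the predicate at site s. */
static void record(struct site *s, bool differs)
{
    if (differs) s->remote++; else s->local++;
}

/* Instrumented read of *x at site s, executed by thread cur_tid. */
int instrumented_read(struct site *s, const int *x, int cur_tid)
{
    assert(s != NULL && x != NULL);
    pthread_mutex_lock(&glock);         /* execute the block atomically */
    bool differs = test_and_insert(x, cur_tid);
    record(s, differs);
    int value = *x;                     /* access(x) */
    pthread_mutex_unlock(&glock);
    return value;
}
```

Because the table is keyed by the runtime address, an access through a pointer is handled the same way as a direct access. In a real deployment the thread id would come from the running thread rather than from a parameter.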
Sampling Mechanism
    lock(glock);
    if (gsample == 0) {                         // is sampling on? sampling not on
        gsample = curTid;                       // turn on sampling
        insert(&x, curTid);                     // update hashtable
        access(x);
    } else {                                    // sampling already on
        differs = test_and_insert(&x, curTid);
        record(S, differs);
        access(x);
        if (gsample == curTid) {                // did current thread initiate sampling?
            gsample = 0;                        // stop sampling
            clear();                            // clear hashtable
        }
    }
    unlock(glock);
Sampling Mechanism, step 1: Thread 1 is about to execute lock(mut). Hashtable is empty; gsample = 0.
Sampling Mechanism, step 2: Thread 1 turns sampling on at lock(mut). Hashtable: &mut -> 1; gsample = 1.
Sampling Mechanism, step 3: Thread 2 executes mut = NULL. Hashtable: &mut -> 2; gsample = 1.
Sampling Mechanism, step 4: Thread 1 reaches S: unlock(mut). Hashtable: &mut -> 2; gsample = 1. Record "remote S is true".
Sampling Mechanism, step 5: Thread 1 stops sampling. Hashtable is empty; gsample = 0.
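The walkthrough above can be replayed with a small C sketch. It is one reading of the slides under stated assumptions: the initiating thread stops sampling at its next instrumented access, thread id 0 is reserved to mean "sampling off", and the table is a small linear array. The names sampled_access and MAX_ENTRIES are hypothetical, and thread ids are passed explicitly so the replay is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 64  /* assumed bound on addresses per sampling period */

struct entry { const void *addr; int tid; };
struct site  { unsigned long remote, local; };

static struct entry table[MAX_ENTRIES];
static size_t n_entries;
static int gsample;     /* 0 = sampling off, else id of initiating thread */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

static struct entry *find(const void *addr)
{
    for (size_t i = 0; i < n_entries; i++)
        if (table[i].addr == addr) return &table[i];
    return NULL;
}

static void insert(const void *addr, int tid)
{
    struct entry *e = find(addr);
    if (e) { e->tid = tid; return; }
    if (n_entries < MAX_ENTRIES)
        table[n_entries++] = (struct entry){ addr, tid };
}

/* Returns true if a different thread last accessed addr, then updates. */
static bool test_and_insert(const void *addr, int tid)
{
    struct entry *e = find(addr);
    bool differs = (e != NULL && e->tid != tid);
    insert(addr, tid);
    return differs;
}

static void clear(void) { n_entries = 0; }

/* Call at site s just before thread cur_tid accesses addr. */
void sampled_access(struct site *s, const void *addr, int cur_tid)
{
    assert(s != NULL && cur_tid != 0);
    pthread_mutex_lock(&glock);
    if (gsample == 0) {                 /* sampling not on */
        gsample = cur_tid;              /* turn on sampling */
        insert(addr, cur_tid);          /* update hashtable */
    } else {                            /* sampling already on */
        bool differs = test_and_insert(addr, cur_tid);
        if (differs) s->remote++; else s->local++;
        if (gsample == cur_tid) {       /* this thread initiated sampling */
            gsample = 0;                /* stop sampling */
            clear();                    /* clear hashtable */
        }
    }
    pthread_mutex_unlock(&glock);
}
```

Calling sampled_access for Thread 1's lock(mut), then Thread 2's mut = NULL, then Thread 1's unlock(mut), all with the address of mut, leaves the unlock site with one remote observation and sampling switched off again. A production scheme would presumably decide when to start sampling at random rather than at every access made while sampling is off.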
Experimental Evaluation
Benchmarks used: Apache HTTP server, PBZIP2; SPLASH-2: FFT, LU.
Machine used: dual-core Intel P4.
Questions answered: runtime overhead; accuracy of predictors.
Runtime Overhead (compared to uninstrumented code)
    Benchmark   No sampling   Sampling
    Apache      25%           2%
    PBZIP2      200%          7%
    FFT         650%          25%
    LU          1,300%        800%
Low overheads for both real-world applications. Large difference between no sampling and sampling.
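To read the percentages as slowdown factors, the sketch below assumes that "overhead" means extra runtime relative to the uninstrumented run, so an overhead of p percent corresponds to a runtime of 1 + p/100 times the baseline. That interpretation and the helper name slowdown_factor are assumptions, not statements from the talk.

```c
#include <assert.h>

/* Converts an overhead percentage (extra time relative to the
 * uninstrumented run) into a slowdown factor. Assumes
 * overhead = (instrumented - baseline) / baseline * 100. */
double slowdown_factor(double overhead_percent)
{
    assert(overhead_percent >= 0.0);
    return 1.0 + overhead_percent / 100.0;
}
```

Under that reading, LU's 1,300% without sampling is a 14x slowdown and its 800% with sampling is 9x, while Apache's 2% with sampling is a factor of 1.02.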
Predictor Accuracy (R: remote predicate)
    Benchmark   Predictor                                 Function
    Apache      R: buf->outcnt += len                     ap_buffered_log_writer()
    PBZIP2      R: pthread_mutex_unlock(fifo->mut);       consumer_decompress()
Predictor Accuracy (R: remote predicate, L: local predicate)
    Benchmark   Predictor                                 Function
    FFT         R: Global->finishtime=finish              SlaveStart()
    FFT         R: Global->initdonetime=initdone          SlaveStart()
    FFT         R: printf("..",Global->transtime[0]…)     main()
    FFT         L: malloc(2*(rootN-1)*sizeof(double));    SlaveStart()
    LU          R: Global->rf=rf                          OneSolve()
    LU          L: (Global->start).gsense=-lsense;        OneSolve()
Conclusion
CCI is a low-overhead, scalable approach for root cause analysis of crugs.
Effective on two widely deployed applications.
Simple predicates are effective because of the use of statistical models.
Next time on Cooperative Crug Isolation:
What other events are useful for crug isolation?
Scope for static analysis to help?
Other cross-thread sampling mechanisms (e.g. bursty sampling)?
Crug isolation to crug tolerance?
Thank you!