Pallavi Joshi*, Mayur Naik†, Koushik Sen*, David Gay‡
*UC Berkeley, †Intel Labs Berkeley, ‡Google Inc
Motivation
  - Today's concurrent programs are rife with deadlocks
      - 6,500 of 198,000 (~3%) bug reports in Sun's bug database are deadlocks
  - Deadlocks are difficult to detect
      - usually triggered non-deterministically, on specific thread schedules
  - Fixing other concurrency bugs like races can introduce new deadlocks
      - our past experience with reporting races: developers often ask for a deadlock checker
Motivation
  - Most previous deadlock detection work has focused on resource deadlocks
  - Example:
      // Thread 1          // Thread 2
      sync (L1) {          sync (L2) {
        sync (L2) {          sync (L1) {
          ...                  ...
        }                    }
      }                    }
  [Figure: threads T1, T2 and locks L1, L2]
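Motivation
  - Other kinds of deadlocks, e.g. communication deadlocks, are equally notorious
  - Example (b is initially false):
      // Thread 1          // Thread 2
      if (!b) {            b = true;
        sync (L) {         sync (L) {
          L.wait();          L.notify();
        }                  }
      }
  - If T2 runs between T1's check of b and T1's wait, the notify is lost and T1 waits forever
  [Figure: timeline of T1 (if (!b), wait L) and T2 (b = true, notify L)]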
Goal
  - Build a dynamic-analysis-based tool that
      - detects communication deadlocks
      - scales to large programs
      - has a low false positive rate
Our Initial Effort
  - Take a cue from existing dynamic analyses for other concurrency errors
  - Existing dynamic analyses check for the violation of an idiom
      - Races: every shared variable is consistently protected by a lock
      - Resource deadlocks: no cycle in the lock graph
      - Atomicity violations: atomic blocks should have the pattern (R+B)*N(L+B)*
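The "no cycle in the lock graph" idiom listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular tool: the class `LockGraph` and its methods are hypothetical names, locks are identified by strings, and an edge `held -> acquired` is recorded whenever some thread acquires one lock while holding another. Practical lock-graph analyses also track which thread contributed each edge; that refinement is omitted here.

```java
import java.util.*;

// Hypothetical sketch: records lock-order edges and checks them for a cycle.
final class LockGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // Record that some thread acquired `acquired` while holding `held`.
    void addEdge(String held, String acquired) {
        edges.computeIfAbsent(held, k -> new HashSet<>()).add(acquired);
    }

    // Depth-first search: reaching a node that is still on the current
    // search path means the graph contains a cycle.
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : edges.keySet()) {
            if (visit(node, done, onPath)) return true;
        }
        return false;
    }

    private boolean visit(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;   // back edge: cycle found
        if (!done.add(node)) return false;        // already fully explored
        onPath.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (visit(next, done, onPath)) return true;
        }
        onPath.remove(node);
        return false;
    }
}
```

For the two-thread example from the Motivation slides, Thread 1 contributes the edge L1 -> L2 and Thread 2 contributes L2 -> L1, so `hasCycle()` reports a potential resource deadlock even if the observed run happened to complete.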
Our Initial Effort
  - Which idiom should we check for communication deadlocks?
Our Initial Effort
  - Recommended usage of condition variables:
      // F1                    // F2
      sync (L) {               sync (L) {
        while (!b)               b = true;
          L.wait();              L.notifyAll();
        assert (b == true);    }
      }
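The recommended usage above maps directly onto Java monitors. The sketch below is only an illustration: the class name `ConditionFlag` and the helper `demo()` are hypothetical, and `awaitTrue`/`setTrue` play the roles of F1 and F2. The waiter re-checks `b` in a loop while holding `L`, so it behaves correctly whether the notifier runs before or after it.

```java
// Hypothetical sketch of the recommended condition-variable idiom.
final class ConditionFlag {
    private final Object L = new Object();
    private boolean b = false;

    // F1: block until b is true, re-checking the condition after every wakeup.
    void awaitTrue() throws InterruptedException {
        synchronized (L) {
            while (!b) {
                L.wait();   // releases L while waiting, re-acquires it on wakeup
            }
            assert b;       // holds here because b is only changed under L
        }
    }

    // F2: set the condition and wake every waiter while holding L.
    void setTrue() {
        synchronized (L) {
            b = true;
            L.notifyAll();
        }
    }

    // Runs one waiter and one notifier; returns true if the waiter finished.
    static boolean demo() {
        ConditionFlag f = new ConditionFlag();
        Thread waiter = new Thread(() -> {
            try {
                f.awaitTrue();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        f.setTrue();
        try {
            waiter.join(5000);   // bounded wait so the demo itself cannot hang
        } catch (InterruptedException e) {
            return false;
        }
        return !waiter.isAlive();
    }
}
```

Because the check and the wait happen under the same lock that guards the update of `b`, the notification cannot slip in between them, which is exactly what the unsynchronized `if (!b)` version in the Motivation example gets wrong.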
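Our Initial Effort
  - Checking based on the recommended usage pattern (or idiom) does not work
  - Example:
      // Thread 1              // Thread 2
      sync (L1)                sync (L1)
        sync (L2)                sync (L2)
          while (!b)               L2.notifyAll();
            L2.wait();
  - Thread 1 releases only L2 while waiting and keeps holding L1, so Thread 2 can never enter sync (L1) to notify
  - No violation of the idiom, but there is still a deadlock!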
Revisiting existing analyses
  - Relax the dependencies between relevant events from different threads
      - verify all possible event orderings for errors
      - use data structures to check idioms (vector clocks, lock graphs, etc.) and thereby implicitly verify all event orderings
Revisiting existing analyses
  - Idiom-based checking does not work for communication deadlocks
  - But we can still explicitly verify all orderings of relevant events for deadlocks
Trace Program
  - Program (b is initially false):
      // Thread 1          // Thread 2
      if (!b) {            b = true;
        sync (L) {         sync (L) {
          L.wait();          L.notify();
        }                  }
      }
  - Observed events:
      T1: lock L, wait L, unlock L
      T2: lock L, notify L, unlock L
Trace Program
  - Observed events:
      T1: lock L, wait L, unlock L
      T2: lock L, notify L, unlock L
  - Trace program:
      Thread t1 {          Thread t2 {
        lock L;              lock L;
        wait L;              notify L;
        unlock L;            unlock L;
      }                    }
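Trace Program
  - Observed events:
      T1: lock L, wait L, unlock L
      T2: lock L, notify L, unlock L
  - Trace program, with the two threads composed in parallel:
      Thread t1 {          Thread t2 {
        lock L;              lock L;
        wait L;      ||      notify L;
        unlock L;            unlock L;
      }                    }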
Trace Program
  - Built out of only a subset of events
      - usually much smaller than the original program
  - Throws away a lot of dependencies between threads
      - could give false positives
      - but increases coverage
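To make "explicitly verify all orderings" concrete, here is a small sketch that enumerates every interleaving of a trace program built from lock, unlock, wait, and notify events and reports whether some schedule gets stuck. It is an illustration under simplifying assumptions, not the tool's algorithm: `TraceExplorer` and all of its members are hypothetical names, locks are non-reentrant, threads are assumed to hold a lock whenever they wait on, notify, or unlock it, spurious wakeups are ignored, and no state caching is done, so it only suits tiny programs.

```java
import java.util.*;

// Hypothetical sketch: brute-force search over all schedules of a trace program.
final class TraceExplorer {
    enum Kind { LOCK, UNLOCK, WAIT, NOTIFY }

    static final class Op {
        final Kind kind;
        final String lock;
        Op(Kind kind, String lock) { this.kind = kind; this.lock = lock; }
    }

    static Op lock(String l)     { return new Op(Kind.LOCK, l); }
    static Op unlock(String l)   { return new Op(Kind.UNLOCK, l); }
    static Op waitOn(String l)   { return new Op(Kind.WAIT, l); }
    static Op notifyOn(String l) { return new Op(Kind.NOTIFY, l); }

    // Thread modes: running, blocked inside wait, or notified and trying to
    // re-acquire the lock it was waiting on.
    private static final int RUN = 0, WAITING = 1, NOTIFIED = 2;

    // True if some schedule reaches a state where an unfinished thread exists
    // but no thread can take a step.
    static boolean canDeadlock(List<List<Op>> prog) {
        int n = prog.size();
        return explore(prog, new int[n], new int[n], new HashMap<>());
    }

    private static boolean explore(List<List<Op>> prog, int[] pc, int[] mode,
                                   Map<String, Integer> owner) {
        boolean allDone = true, anyEnabled = false;
        for (int t = 0; t < pc.length; t++) {
            if (pc[t] == prog.get(t).size()) continue;             // finished
            allDone = false;
            if (mode[t] == WAITING) continue;                      // needs a notify
            Op o = prog.get(t).get(pc[t]);
            boolean acquires = o.kind == Kind.LOCK || mode[t] == NOTIFIED;
            if (acquires && owner.containsKey(o.lock)) continue;   // lock is taken
            anyEnabled = true;

            // Work on copies so sibling schedules see the unmodified state.
            int[] p = pc.clone(), m = mode.clone();
            Map<String, Integer> ow = new HashMap<>(owner);
            List<int[]> nextModes = new ArrayList<>();
            if (m[t] == NOTIFIED) {                                // leave wait
                ow.put(o.lock, t); m[t] = RUN; p[t]++;
            } else switch (o.kind) {
                case LOCK:   ow.put(o.lock, t); p[t]++; break;
                case UNLOCK: ow.remove(o.lock); p[t]++; break;
                case WAIT:   ow.remove(o.lock); m[t] = WAITING; break;
                case NOTIFY:
                    p[t]++;
                    // Branch on each thread the notify could wake up.
                    for (int w = 0; w < m.length; w++) {
                        if (m[w] == WAITING && prog.get(w).get(pc[w]).lock.equals(o.lock)) {
                            int[] woken = m.clone();
                            woken[w] = NOTIFIED;
                            nextModes.add(woken);
                        }
                    }
                    break;
            }
            if (nextModes.isEmpty()) nextModes.add(m);  // includes a lost notify
            for (int[] next : nextModes) {
                if (explore(prog, p, next, ow)) return true;
            }
        }
        return !allDone && !anyEnabled;
    }
}
```

For the trace program t1 { lock L; wait L; unlock L; } and t2 { lock L; notify L; unlock L; }, `canDeadlock` returns true: in the schedule where t2 runs to completion first, the notify finds no waiter and t1 then waits with nobody left to wake it. Because this sketch has no dependencies on `b`, it explores more schedules than the original program allows, which is the source of both the extra coverage and the possible false positives noted above.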
Trace Program: Add Dependencies
  - Program (b is initially false):
      // Thread 1          // Thread 2
      if (!b) {            b = true;
        sync (L) {         sync (L) {
          L.wait();          L.notify();
        }                  }
      }
  - Observed events, now including the accesses to b:
      T1: if (!b), lock L, wait L, unlock L
      T2: b = true, lock L, notify L, unlock L
Trace Program: Add Dependencies
  - Observed events:
      T1: if (!b), lock L, wait L, unlock L
      T2: b = true, lock L, notify L, unlock L
  - Trace program:
      Thread t1 {          Thread t2 {
        if (!b) {            b = true;
          lock L;            lock L;
          wait L;            notify L;
          unlock L;          unlock L;
        }                  }
      }
Trace Program: Add Power
  - Use static analysis to add to the predictive power of the trace program
  - Program (b is initially false), annotated with !b => L.wait():
      // Thread 1          // Thread 2
      if (!b) {            b = true;
        sync (L) {         sync (L) {
          L.wait();          L.notify();
        }                  }
      }
  - Trace program:
      Thread t1 {
        if (!b) {
          lock L;
          wait L;
          unlock L;
        }
      }
Trace Program: Other Errors
  - Effective for concurrency errors that cannot be detected using an idiom
      - communication deadlocks, deadlocks because of exceptions, …
  - Example (b is initially false):
      // Thread 1          // Thread 2
      while (!b) {         try {
        sync (L) {           foo();    // can throw an exception
          L.wait();          b = true;
        }                    sync (L) { L.notify(); }
      }                    } catch (Exception e) {…}
Implementation and Evaluation
  - Implemented for deadlock detection
      - both communication and resource deadlocks
  - Built a prototype tool for Java called CHECKMATE
  - Experimented with a number of Java libraries and applications
      - log4j, pool, felix, lucene, jgroups, jruby, ...
  - Found both previously known and unknown deadlocks (17 in total)
Conclusion
  - CHECKMATE is a novel dynamic analysis for finding deadlocks
      - both resource and communication deadlocks
  - Effective on a number of real-world Java benchmarks
  - The trace-program-based approach is generic
      - can be applied to other errors, e.g. deadlocks because of exceptions