Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou Appeared in ASPLOS’08 Presented by Michelle Goodstein LBA Reading Group 3/27/08
Introduction Multi-core computers are common More programmers are having to write concurrent programs Concurrent programs have different bugs than sequential programs However, without a study, hard to know what those bugs are First real-world study of concurrency bugs
Introduction Knowing the types of concurrency bugs that actually occur in software will: ◦ Help create better bug detection schemes ◦ Inform the testing process software goes through ◦ Provide information to programming language designers
Introduction Current state of affairs ◦ Reproducing concurrency bugs is difficult ◦ Test cases are critical to being able to diagnose a bug ◦ Most detection research focuses on: data races, deadlock bugs, and some new work on detecting atomicity violations Few studies examine real-world concurrency bugs ◦ Most use programs that were made buggy by design for the study Most studies on bug characteristics focus on non-concurrent bugs
Methodology 4 representative open-source applications: ◦ MySQL ◦ Apache ◦ Mozilla ◦ OpenOffice Each application has ◦ 9-13 years of development history ◦ 1-4 million lines of code
Methodology Randomly selected bugs from bug databases that contained at least one keyword related to concurrency (e.g., “race”, “concurrency”, “deadlock”, “synchronization”, etc.) From these, randomly chose 500 bugs that have ◦ Root causes explained well and in detail ◦ Source code available ◦ Bug fix info available
Methodology Remove any bugs not truly caused by concurrency Result: 105 concurrency bugs Separate study of deadlock and non-deadlock bugs
Methodology Evaluated bugs in 3 dimensions ◦ Bug pattern: {atomicity-violation, order-violation, other} ◦ Manifestation: required conditions for bug to occur, # threads involved, # variables, # accesses ◦ Bug fix strategy: Look at final patch, mistakes in intermediate patches, and whether transactional memory (TM) can help Results organized as a collection of findings
Motivation 34/105 concurrency bugs cause program crashes 37/105 concurrency bugs cause programs to hang Concurrency bugs are important
Bug Patterns
Findings: Bug Patterns Two main patterns: ◦ Atomicity violation ◦ Order violation
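To illustrate the atomicity-violation pattern, here is a minimal sketch that replays a fixed interleaving instead of relying on real thread timing. The `shared["info"]` field, the `reader` and `writer` steps, and the `replay` helper are hypothetical names chosen for this example, not code from the studied applications.

```python
def reader(shared, log):
    """Check-then-use that the programmer assumed would run atomically."""
    if shared["info"] is not None:        # step 1: check
        yield                             # a context switch may happen here
        log.append(len(shared["info"]))   # step 2: use (fails if info is now None)
    yield


def writer(shared):
    """A second thread that clears the shared field."""
    shared["info"] = None
    yield


def replay(schedule):
    """Advance the named thread by one step per schedule entry.

    Returns (errors, log): which thread crashed, and what the reader logged.
    """
    shared, log, errors = {"info": "running"}, [], {}
    threads = {"reader": reader(shared, log), "writer": writer(shared)}
    for name in schedule:
        try:
            next(threads[name])
        except StopIteration:
            pass                          # that thread has already finished
        except TypeError as exc:
            errors[name] = type(exc).__name__
    return errors, log


# The writer runs between the reader's check and its use: the reader crashes.
bad_errors, bad_log = replay(["reader", "writer", "reader"])
# The reader's two steps run back to back: no crash.
good_errors, good_log = replay(["reader", "reader", "writer"])
```

Only the schedule differs between the two runs, which is the point: the code is correct only when the check and the use are not separated.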
Findings: Bug Patterns Most (72/74) of the examined non-deadlock concurrency bugs are either atomicity violations or order violations Focusing on atomicity and order violations should detect most non-deadlock concurrency bugs In fact, 24/74 are order violations Since current tools don’t address order violations, new tools must be developed
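An order violation can be sketched the same way: two operations that are each fine on their own, but only correct in one order. The `init` and `use` steps and the `handle` field below are hypothetical, and the two "threads" are modeled as plain functions called in a chosen order.

```python
def run_in_order(order):
    """Run the 'init' and 'use' steps in the given order and report each result."""
    state = {"handle": None}

    def init():
        # Thread 1: publishes the handle.
        state["handle"] = {"status": "ready"}

    def use():
        # Thread 2: assumes init() has already run.
        return state["handle"]["status"]

    steps = {"init": init, "use": use}
    results = []
    for name in order:
        try:
            results.append(steps[name]())
        except TypeError:
            results.append("crash")       # handle was still None
    return results


intended = run_in_order(["init", "use"])
violated = run_in_order(["use", "init"])
```

No access here is interrupted halfway, so making either step atomic would not help; the bug is purely about which step runs first.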
Bug Manifestations
Findings: Bug Manifestations Most (101/105) bugs involved ≤ 2 threads Most communication happens among a small number of threads Enforcing certain partial orderings among a small number of threads can expose bugs Heavy workloads can increase competition for resources, and make it more likely to observe a partial ordering that causes a bug Pairwise testing can find many bugs
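A rough sketch of what pairwise testing could look like for two threads: enumerate every interleaving of their steps and check the outcome of each. The unsynchronized counter increment (a load followed by a store) and the helper names are assumptions made for this example.

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every merge of n_a steps of thread A with n_b steps of thread B
    that keeps each thread's own program order."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield ["A" if i in slots else "B" for i in range(total)]


def run_counter(schedule):
    """Each thread does load-then-store on a shared counter, without a lock."""
    counter = 0
    local = {}
    pc = {"A": 0, "B": 0}                 # next step index per thread
    for t in schedule:
        if pc[t] == 0:
            local[t] = counter            # load
        else:
            counter = local[t] + 1        # store
        pc[t] += 1
    return counter


all_schedules = ["".join(s) for s in interleavings(2, 2)]
failing = [s for s in all_schedules if run_counter(list(s)) != 2]
```

With two steps per thread there are only six schedules to try, and the ones that lose an update are exactly those where the two threads' steps overlap.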
Findings: Bug Manifestations Some (7/31) deadlock bugs involve only 1 thread! Easy to detect/avoid
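A one-thread deadlock typically means a thread waits for a resource it already holds. A minimal sketch using Python's standard `threading` locks; the helper name `reacquire_succeeds` is hypothetical, and a non-blocking acquire stands in for the call that would otherwise hang forever.

```python
import threading


def reacquire_succeeds(lock):
    """Acquire `lock`, then try to acquire it again from the same thread."""
    lock.acquire()
    try:
        got_it = lock.acquire(blocking=False)   # a blocking call could hang here
        if got_it:
            lock.release()
        return got_it
    finally:
        lock.release()


plain = reacquire_succeeds(threading.Lock())     # non-reentrant lock
reentrant = reacquire_succeeds(threading.RLock())  # reentrant lock
```

The plain lock refuses the second acquire, which would be a self-deadlock with a blocking call; the reentrant lock allows it. This is why such bugs are comparatively easy to detect: no second thread or special timing is needed.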
Findings: Bug Manifestations Many (49/74) non-deadlock bugs involve 1 variable. However, 34% involve ≥ 2 variables Focusing on 1 variable is a good simplification However, new tools also necessary to discover multivariable concurrency bugs
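A multi-variable bug can be sketched as two fields that must stay consistent with each other. Every single access below is atomic, yet a reader scheduled between the writer's two updates sees a pair that never should exist. The `buf`/`length` fields and the `replay` helper are hypothetical.

```python
def replay(schedule):
    """Run writer ('W') and reader ('R') steps in the given order.

    Returns the (buf, length) pairs the reader observed.
    """
    state = {"buf": "abc", "length": 3}   # invariant: length == len(buf)
    observed = []
    steps = {
        "W": iter([
            lambda: state.update(buf=""),       # writer step 1: clear the buffer
            lambda: state.update(length=0),     # writer step 2: reset the length
        ]),
        "R": iter([
            lambda: observed.append((state["buf"], state["length"])),
        ]),
    }
    for name in schedule:
        next(steps[name])()
    return observed


def consistent(pair):
    buf, length = pair
    return len(buf) == length


torn = replay(["W", "R", "W"])    # reader runs between the two writes
clean = replay(["W", "W", "R"])   # reader runs after both writes
```

A detector that watches only `buf` or only `length` sees nothing wrong in either run, which is why single-variable tools miss this class of bug.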
Findings: Bug Manifestations Most (30/31) deadlock bugs involved ≤ 2 resources Pairwise testing of order among obtained and released resources should help reveal deadlocks
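One way to picture pairwise checking of resource order is to record, per thread, which lock was held when another was requested, then look for a pair requested in opposite orders. This is a sketch under assumed inputs: the trace format and the function names are hypothetical.

```python
def acquisition_pairs(trace):
    """trace: list of (op, lock) with op in {'acquire', 'release'} for one thread.

    Returns the set of (held, wanted) pairs seen in that thread.
    """
    held = []
    pairs = set()
    for op, lock in trace:
        if op == "acquire":
            pairs.update((h, lock) for h in held)
            held.append(lock)
        else:
            held.remove(lock)
    return pairs


def conflicting_orders(trace_a, trace_b):
    """Pairs that thread A takes in one order and thread B in the opposite order."""
    pairs_a = acquisition_pairs(trace_a)
    pairs_b = acquisition_pairs(trace_b)
    return sorted((x, y) for (x, y) in pairs_a if (y, x) in pairs_b)


t1 = [("acquire", "L1"), ("acquire", "L2"), ("release", "L2"), ("release", "L1")]
t2 = [("acquire", "L2"), ("acquire", "L1"), ("release", "L1"), ("release", "L2")]
t3 = [("acquire", "L1"), ("acquire", "L2"), ("release", "L2"), ("release", "L1")]

risky = conflicting_orders(t1, t2)   # opposite orders: potential deadlock
safe = conflicting_orders(t1, t3)    # same order: no conflict
```

Because the check only compares pairs of resources, it covers the two-resource case that accounts for nearly all of the deadlock bugs in the study.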
Findings: Bug Manifestations Most (92%) bugs manifest if certain partial orderings are enforced among ≤ 4 memory accesses Testing small groups of accesses takes polynomial time and exposes most bugs
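The polynomial-time claim can be made concrete with a rough count. The sketch below assumes two threads with `n` accesses each and compares the number of full interleavings with a simple upper bound on ordered groups of `k` accesses; both function names and this counting model are assumptions for illustration.

```python
from math import comb, perm


def full_interleavings(n):
    """Interleavings of two threads with n accesses each (program order kept)."""
    return comb(2 * n, n)


def small_group_orderings(n, k=4):
    """Upper bound on ordered groups of k distinct accesses out of the 2n total.

    This is at most (2n)**k, i.e. polynomial in n for fixed k.
    """
    return perm(2 * n, k)


exhaustive = full_interleavings(20)
bounded = small_group_orderings(20)
```

For 20 accesses per thread the bounded count is a few million, while the exhaustive count is several orders of magnitude larger and grows exponentially with `n`.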
Bug Fixes
Findings: Bug Fixes Adding/changing locks is the fix for only a minority (20/74) of non-deadlock concurrency bugs Locks aren’t enough to fix all concurrency bugs. Locks don’t promise ordering, just atomicity Addition of locks can hurt performance or create new deadlock bugs
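Since a lock gives mutual exclusion but no ordering, a fix for an ordering problem needs a signal instead. A minimal sketch using `threading.Event` from Python's standard library; the `handle` field and the thread functions are hypothetical.

```python
import threading


def run_with_event():
    """The user thread waits for an explicit signal that initialization is done."""
    state = {"handle": None}
    initialized = threading.Event()
    results = []

    def user():
        initialized.wait()                    # blocks until the initializer signals
        results.append(state["handle"]["status"])

    def initializer():
        state["handle"] = {"status": "ready"}
        initialized.set()                     # publish: initialization is complete

    t_user = threading.Thread(target=user)
    t_init = threading.Thread(target=initializer)
    t_user.start()                            # started first on purpose
    t_init.start()
    t_user.join()
    t_init.join()
    return results


outcome = run_with_event()
```

Wrapping both thread bodies in the same lock would not give this guarantee: the user thread could still take the lock first and find `handle` unset.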
Findings: Bug Fixes Most common fix (19/31) to deadlock bugs lets 1 thread give up acquiring a resource, like a lock This may get rid of deadlock bugs, but create other non-deadlock bugs Code may no longer be correct
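The "give up acquiring" strategy can be sketched with a non-blocking acquire: if the second lock is unavailable, release the first and report failure instead of waiting. The helper name `try_acquire_both` is hypothetical; the lock calls are from Python's standard `threading` module.

```python
import threading


def try_acquire_both(first, second):
    """Take `first`, then try `second` without blocking.

    Returns True with both locks held, or False with neither held.
    """
    first.acquire()
    if second.acquire(blocking=False):
        return True
    first.release()        # give up rather than wait while holding `first`
    return False


lock_a, lock_b = threading.Lock(), threading.Lock()

lock_b.acquire()                                  # simulate another holder of lock_b
gave_up = try_acquire_both(lock_a, lock_b)
a_held_after_giving_up = lock_a.locked()
lock_b.release()

succeeded = try_acquire_both(lock_a, lock_b)      # now both are free
```

The deadlock is gone, but the caller now has a failure path to handle. If that path skips work that was supposed to happen under both locks, the fix trades a deadlock for a non-deadlock bug, as the slide warns.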
Bug fixes: Buggy Patches 17/57 Mozilla bugs have ≥ 1 buggy patch On average, 0.4 buggy patches are released for every final correct patch Of 23 distinct buggy patches for the 17 bugs: ◦ 6 decrease probability of occurrence but do not eliminate original bug ◦ 5 create new concurrency bugs ◦ 12 create new non-concurrency bugs
Findings: Bug fixes In many (41/105) cases, TM can help avoid concurrency bugs
Findings: Bug fixes In many other cases (44/105), TM might be able to help with concurrency bugs ◦ Obstacles: TM would need to allow long regions and rollback of I/O, and cope with the strange “nature” of the code
Findings: Bug fixes In 20/105 cases, TM provides little help ◦ TM cannot help with many order-violation bugs While TM could be useful in preventing concurrency bugs, it will not fix all of them
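To give a feel for why TM suits atomicity violations but not order violations, here is a toy optimistic retry loop in the spirit of TM. It is not a real TM system: the `Cell` class, `atomically`, and the `interfere` test hook are all hypothetical, and a small commit lock stands in for hardware or runtime support.

```python
import threading


class Cell:
    """A single shared value with a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()

    def snapshot(self):
        with self._commit_lock:
            return self.version, self.value

    def try_commit(self, seen_version, new_value):
        with self._commit_lock:
            if self.version != seen_version:
                return False              # someone else committed first: abort
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update, interfere=None):
    """Apply `update` to the cell as one all-or-nothing step, retrying on conflict.

    Returns the number of attempts. `interfere` is a demo hook that simulates
    a conflicting commit during the first attempt.
    """
    attempts = 0
    while True:
        attempts += 1
        seen_version, seen_value = cell.snapshot()
        if interfere is not None and attempts == 1:
            interfere(cell)
        if cell.try_commit(seen_version, update(seen_value)):
            return attempts


def conflicting_writer(cell):
    version, value = cell.snapshot()
    cell.try_commit(version, value + 10)


cell = Cell(0)
first_attempts = atomically(cell, lambda v: v + 1)
second_attempts = atomically(cell, lambda v: v + 1, interfere=conflicting_writer)
```

The retry makes the read-modify-write appear atomic even when another commit lands in the middle. Nothing in it says which of two transactions should run first, which matches the slide's point that TM offers little for many order-violation bugs.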
Conclusion First real-world concurrency bug study Multiple findings on ◦ Type of concurrency bugs ◦ Conditions for manifestation ◦ Techniques for fixing concurrency bugs Several heuristics proposed for: ◦ Bug detection ◦ Testing ◦ Language design (i.e., TM) Future work can focus on detecting common types of errors ◦ Multi-variable bugs ◦ Order-violation bugs ◦ Multiple-access bugs