Extensible Shape Analysis by Designing with the User in Mind
Bor-Yuh Evan Chang, Xavier Rival, and George Necula
University of California, Berkeley
OSQ Retreat, May 16, 2008
2 Motivation
Analyses find many kinds of bugs. For example,
–Reading from a closed file: read( );
–Reacquiring a locked lock: acquire( );
But they often struggle when objects are put into data structures.
Shape Analysis is Data Structure Analysis
Bor-Yuh Evan Chang - Extensible Shape Analysis by Designing with the User in Mind
3 What’s hard about data structures?
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
[Figure: the ideal analysis state for x next to the abstracted analysis state]
For decidability, the analysis must abstract (e.g., merge objects).
An abstraction that is too coarse or not precise enough (e.g., one that has lost the fact that x is always unlocked) mislabels good code as buggy.
4 To address the precision challenge
Traditional program analysis mentality:
“Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
“Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that hopefully work most of the time.”
Our approach:
“Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use that code as specifications?”
5 Overview of contributions
Precise inference of data structure properties
  Able to check, for instance, the locking example
Targeted to software developers
  Uses data structure checking code for guidance
  Turns testing code into a specification for static analysis
Efficient
  Builds abstraction out of developer-supplied checking code
6 Shape analysis by example: Removing duplicates
// l is a sorted doubly-linked list
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
[Figure, Example/Testing: the concrete list l goes from 2 2 4 4 to 2 4 4 (at cur) to 2 4]
[Figure, Code Review/Static Analysis: l starts as a “sorted dl list” and ends with “no duplicates”; these properties are program-specific. The intermediate state is more complicated: a “segment with no duplicates” from l to cur, then a “sorted dl list” from cur.]
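The pseudocode on this slide can be made concrete as a small run-time test. The sketch below is only an illustration: the `Node` class, `build`, `to_list`, and the `nodup` flag on `sorted_dll` are hypothetical names and layouts that the slides do not define. It shows the kind of checking code a developer might already write, used here as ordinary assertions before and after the loop.

```python
class Node:
    """Doubly-linked list cell (hypothetical layout for illustration)."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def build(values):
    """Build a doubly-linked list from a Python list and return its head."""
    head = tail = None
    for v in values:
        node = Node(v)
        node.prev = tail
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def to_list(l):
    """Collect the values reachable through next pointers."""
    out = []
    while l is not None:
        out.append(l.value)
        l = l.next
    return out


def sorted_dll(h, p, nodup=False):
    """Checker: prev pointers are consistent and values are ordered.

    With nodup=True, equal neighbours are rejected as well.
    """
    if h is None:
        return True
    if h.prev is not p:
        return False
    if p is not None:
        if p.value > h.value or (nodup and p.value == h.value):
            return False
    return sorted_dll(h.next, h, nodup)


def remove_duplicates(l):
    """Unlink every node whose value equals its predecessor's value."""
    cur = l
    while cur is not None:
        nxt = cur.next
        if nxt is not None and nxt.value == cur.value:
            cur.next = nxt.next          # bypass the duplicate
            if nxt.next is not None:
                nxt.next.prev = cur      # repair the back pointer
        else:
            cur = nxt
    return l


l = build([2, 2, 4, 4])
assert sorted_dll(l, None)               # precondition from the slide
remove_duplicates(l)
assert sorted_dll(l, None, nodup=True)   # postcondition from the slide
```

A test like this only covers the inputs it is run on; the point of the talk is that a static analysis can try to establish the same postcondition for all inputs, using the checker as its specification.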
7 Shape analysis is not yet practical
Choosing the heap abstraction is difficult for precision.
Traditional approaches:
TVLA [Sagiv et al.]: Parametric in low-level, analyzer-oriented predicates
  + Very general and expressive
  - Hard for non-expert
Space Invader [Distefano et al.]: Built-in high-level predicates
  - Hard to extend
  + No additional user effort (if precise enough)
Our approach:
Xisa: Parametric in high-level, developer-oriented predicates
  + Extensible
  + Targeted to developers
8 Key insight for being developer-friendly and efficient
Utilize “run-time checking code” as specification for static analysis.
assert(sorted_dll(l,…));
for each node cur in list l {
  remove cur if duplicate;
}
assert(sorted_dll_nodup(l,…));
Checker:
dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
(p specifies where prev should point)
Contribution: Build the abstraction for analysis out of developer-specified checking code
Contribution: Automatically generalize checkers for complicated intermediate states
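The dll checker on this slide reads almost directly as executable code. The following is a minimal sketch, assuming a `Node` class with `prev` and `next` fields (a hypothetical layout; the slides give only the checker definition). It shows the role of the second parameter: `p` is the node that `h.prev` is expected to point to.

```python
class Node:
    """List cell with prev/next fields (assumed layout, not from the slides)."""
    def __init__(self):
        self.prev = None
        self.next = None


def dll(h, p):
    """dll(h, p) = if h = null then true else h.prev = p and dll(h.next, h)."""
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)


# Build a three-node list a <-> b <-> c by hand.
a, b, c = Node(), Node(), Node()
a.next, b.prev = b, a
b.next, c.prev = c, b

assert dll(a, None)      # whole list: the head has no predecessor
assert dll(b, a)         # suffix starting at b, whose prev must be a
assert not dll(b, None)  # wrong expectation for b.prev
```

Checking a suffix with `dll(b, a)` mirrors how the recursive call `dll(h.next, h)` passes the current node along as the expected predecessor.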
9 Our framework is …
An automated shape analysis with a precise memory abstraction based around invariant checkers.
Extensible and targeted for developers
–Parametric in developer-supplied checkers
Precise yet compact abstraction for efficiency
–Data structure-specific based on properties of interest to the developer
[Figure: checkers are the input to the shape analyzer]
checkers: dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
10 Shape analysis is an abstract interpretation on memory descriptions with …
Splitting of summaries (materialization)
  To reflect updates precisely (strong updates)
And summarizing for termination (widening)
11 Outline
checkers: dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
type “pre-analysis” on checker definitions
  Learn information about the checker to use it as an abstraction
shape analyzer: abstract interpretation
  splitting and interpreting update
  summarizing
12 Overview: Split summaries to interpret updates precisely
Want abstract update to be “exact”, that is, to update one “concrete memory cell”.
The example at a high level: iterate using cur, changing the doubly-linked list from purple to red.
  split at cur
  update cur purple to red
Challenge: How does the analysis “split” summaries and know where to “split”?
13 Split summaries by unfolding induction
dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
get: cur->next
Before unfolding: “dll segment l to cur”, then dll(cur, p)
Unfolding dll(cur, p) yields two cases:
  cur = null (the analysis doesn’t forget the empty case)
  or cur is a node with prev p and next n, followed by dll(n, cur)
Technical Details:
  What about unfolding segments? How are segments unfolded? (Key: Segments are also inductively defined) [POPL’08]
  How does the analysis know to do unfolding?
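The unfolding step can be sketched symbolically. In the toy encoding below, an abstract state is a set of facts over symbolic names; the tuple shapes `("dll", h, p)`, `("field", src, name, dst)`, `("eq", x, "null")`, and `("seg", l, cur)` are all assumptions made for this illustration and are not the analyzer's actual representation. The function replaces one dll summary by the two cases of the checker's if-then-else.

```python
def unfold_dll(state, h, p, fresh="n"):
    """Unfold the summary dll(h, p) in `state` into its two cases.

    `state` is a frozenset of fact tuples (hypothetical encoding).
    `fresh` names the node that h.next points to in the non-empty case;
    the caller must choose a name not already used in `state`.
    Returns [empty_case, nonempty_case].
    """
    summary = ("dll", h, p)
    if summary not in state:
        raise ValueError(f"no summary {summary} to unfold")
    rest = state - {summary}

    # Case 1: the 'then' branch of the checker, h = null.
    empty_case = rest | {("eq", h, "null")}

    # Case 2: the 'else' branch, h.prev = p and dll(h.next, h).
    nonempty_case = rest | {
        ("field", h, "prev", p),
        ("field", h, "next", fresh),
        ("dll", fresh, h),
    }
    return [frozenset(empty_case), frozenset(nonempty_case)]


# "dll segment l to cur" followed by dll(cur, p), as on the slide.
state = frozenset({("seg", "l", "cur"), ("dll", "cur", "p")})
empty, nonempty = unfold_dll(state, "cur", "p")

assert ("eq", "cur", "null") in empty
assert ("field", "cur", "next", "n") in nonempty
assert ("dll", "n", "cur") in nonempty
```

In the non-empty case the cell for `cur` is explicit, so a read of `cur.next` or a write to one of its fields affects exactly one cell, which is what makes a strong update possible.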
14 Outline
checkers: dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
type “pre-analysis” on checker definitions
  Contribution: Turns testing code into specification for static analysis
  How do we decide where to unfold?
  Derives additional information to guide unfolding
shape analyzer: abstract interpretation
  splitting and interpreting update
  summarizing
15 Types for deciding where to unfold
Checker Definition:
dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
Checker “Run” (call tree/derivation):
dll(l,null), dll(p,l), dll(cur,p), dll(n,cur), dll(null,n)
Summary: “dll segment l to cur”, then dll(cur, p)
If it exists, where is: cur->next? p->next?
The types say:
  For h->next/h->prev, unfold from h
  For p->next/p->prev, unfold before h
16 Types make the analysis robust with respect to how checkers are written
Doubly-linked list checker (as before):
dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
Alternative doubly-linked list checker:
dll'(h) = if (h->next = null) then true else h->next->prev = h and dll'(h->next)
Different types for different unfolding
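The two checker styles can be compared side by side as run-time code. This is a sketch under assumptions: the `Node` class and the name `dll_alt` (standing in for the alternative checker) are hypothetical, and the comparison covers only concrete lists, not the analysis itself.

```python
class Node:
    """List cell with prev/next fields (assumed layout for illustration)."""
    def __init__(self):
        self.prev = None
        self.next = None


def dll(h, p):
    """Checker as before: inspects h.prev at the current node."""
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)


def dll_alt(h):
    """Alternative checker: looks one node ahead, at h.next.prev.

    It dereferences h, so h must not be None.
    """
    if h.next is None:
        return True
    return h.next.prev is h and dll_alt(h.next)


# Well-formed list a <-> b <-> c: both checkers accept it.
a, b, c = Node(), Node(), Node()
a.next, b.prev = b, a
b.next, c.prev = c, b
assert dll(a, None) and dll_alt(a)

# Corrupt one back pointer: both checkers reject it.
c.prev = a
assert not dll(a, None) and not dll_alt(a)
```

The first checker reads fields of the node it is currently at, while the second reads a field of the following node. The two also differ at the head: `dll_alt` never inspects the `prev` field of the node it starts from, whereas `dll(h, p)` requires it to be `p`.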
17 Summary of checker parameter types
Tell where to unfold for which fields
Make analysis robust with respect to how checkers are written
Learn where in summaries unfolding won’t help
Can be inferred automatically with a fixed-point computation on the checker definitions
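To give a feel for what a fixed-point computation on a checker definition could look like, here is a deliberately simplified toy. It is not the inference used by the analysis: the input encoding (`derefs`, `call_args`) and the integer "levels" are assumptions made for this sketch. Level 0 means a field is read at the current unfolding; level -1 means it belongs to a node that was current one unfolding earlier.

```python
def infer_levels(derefs, call_args):
    """Toy level inference for one recursive checker (illustrative only).

    derefs:    {param: set of fields the body dereferences on that param}
    call_args: {param: name of the caller's parameter passed unchanged in
                the recursive call, or None if a new node is passed}
    Returns {param: {field: level}}.
    """
    # Fields read directly in the body are at level 0.
    types = {q: {f: 0 for f in fields} for q, fields in derefs.items()}

    changed = True
    while changed:                       # iterate until nothing new is learned
        changed = False
        for q, src in call_args.items():
            if src is None:
                continue
            # q receives the node that was `src` one unfolding earlier,
            # so it inherits src's fields shifted down by one level.
            for field, level in list(types[src].items()):
                if field not in types[q]:
                    types[q][field] = level - 1
                    changed = True
    return types


# dll(h, p): the body reads h.prev and h.next; the recursive call is
# dll(h.next, h), so the new p is the old h and the new h is a new node.
dll_types = infer_levels(
    derefs={"h": {"prev", "next"}, "p": set()},
    call_args={"h": None, "p": "h"},
)
assert dll_types["h"] == {"prev": 0, "next": 0}
assert dll_types["p"] == {"prev": -1, "next": -1}
```

Read against the slide on deciding where to unfold, the result lines up with the stated rule: fields of `h` are found by unfolding from `h`, and fields of `p` are found one step before `h`.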
18 Results: Performance
Benchmark                                     Max. Num. Graphs at a Program Pt   Analysis Time (ms)
singly-linked list reverse                    1                                  0.6
doubly-linked list reverse                    1                                  1.4
doubly-linked list copy                       2                                  5.3
doubly-linked list remove                     5                                  6.5
doubly-linked list remove and back            5                                  6.8
search tree with parent insert                5                                  8.3
search tree with parent insert and back       5                                  47.0
two-level skip list rebalance                 6                                  87.0
Linux scull driver (894 loc) (char arrays ignored, functions inlined)
Times negligible for data structure operations (often in sec or 1/10 sec)
Expressiveness: Different data structures
Verified shape invariant as given by the checker is preserved across the operation.
TVLA: 850 ms
TVLA: 290 ms
Space Invader only analyzes lists (built-in)
19 Conclusion
Key Insight: Checkers as specifications
  Developer View: Global, Expressed in a familiar style
  Analysis View: Capture developer intent, Not arbitrary inductive definitions
Constructing the program analysis
  Intermediate states: Generalized segment predicates (e.g., “dll segment l to cur”)
  Splitting: Checker parameter types with levels
What can inductive shape analysis do for you?