End-User Shape Analysis. National Taiwan University – August 11, 2009. Xavier Rival (INRIA/ENS Paris), Bor-Yuh Evan Chang 張博聿 (U of Colorado, Boulder), George C. Necula (U of California, Berkeley)
Programming Languages Research at the University of Colorado, Boulder
3 Software errors cost a lot: ~$60 billion annually (~0.5% of US GDP), according to a 2002 National Institute of Standards and Technology report. That is more than the total annual revenue of … and more than 10x the annual budget of … Bor-Yuh Evan Chang 張博聿, University of Colorado at Boulder - End-User Shape Analysis
4 But there’s hope in program analysis. Microsoft uses and distributes the Static Driver Verifier. Airbus applies the Astrée Static Analyzer. Companies such as Coverity and Fortify market static source code analysis tools.
5 Because program analysis can eliminate entire classes of bugs. For example: – Reading from a closed file: read( ); – Reacquiring a locked lock: acquire( ); How? – Systematically examine the program – Simulate running the program on “all inputs” – “Automated code review”
6 Program analysis by example: Checking for double acquires. Simulate running the program on “all inputs”:
… code …
// x now points to an unlocked lock
acquire(x);
… code …
Analysis state: x points to an unlocked lock, so acquire(x) is not a double acquire.
7 Program analysis by example: Checking for double acquires. Simulate running the program on “all inputs”:
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
Ideal analysis state: x points to an unlocked lock in a linked list of one node, or two nodes, or …
8 Must abstract.
… code …
// x now points to an unlocked lock in a linked list
acquire(x);
… code …
The ideal analysis state (x in a list of one node, or two nodes, or …) has no bound on its size. For decidability, the analysis must abstract to “model all inputs” (e.g., by merging objects), which is not precise. If the abstraction is too coarse (e.g., it loses that x is always unlocked), the analysis mislabels good code as buggy.
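To make the precision loss concrete, here is a minimal Java sketch of a lock-state (typestate) abstraction. It is not Xisa: the names LockState and LockAnalysis are hypothetical, and "merging objects" is modeled simply as taking the union of the states the merged objects might be in.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch (not Xisa): an abstract lock state is the set of
// concrete states the lock object might be in at a program point.
enum LockState { LOCKED, UNLOCKED }

final class LockAnalysis {
    // Transfer function for acquire(x): warn if x may already be locked.
    static Set<LockState> acquire(Set<LockState> before, StringBuilder warnings) {
        if (before.contains(LockState.LOCKED)) {
            warnings.append("possible double acquire\n");
        }
        return EnumSet.of(LockState.LOCKED); // after acquire, x is locked
    }

    // Merging objects (e.g., all nodes of a list) into one summary takes the
    // union of their possible states, so "x is unlocked" can be lost.
    static Set<LockState> join(Set<LockState> a, Set<LockState> b) {
        Set<LockState> result = EnumSet.noneOf(LockState.class);
        result.addAll(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        StringBuilder warnings = new StringBuilder();
        // Precise state: x is known to be unlocked, so no warning.
        acquire(EnumSet.of(LockState.UNLOCKED), warnings);
        // Coarse state: x's node is merged with a node that may be locked,
        // so the same (correct) acquire(x) now raises a false alarm.
        Set<LockState> merged = join(EnumSet.of(LockState.UNLOCKED), EnumSet.of(LockState.LOCKED));
        acquire(merged, warnings);
        System.out.print(warnings);
    }
}
```

The only warning comes from the merged state, which is exactly the "mislabels good code as buggy" outcome.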
9 To address the precision challenge. Traditional program analysis mentality: “Why can’t developers write more specifications for our analysis? Then, we could verify so much more.” “Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that hopefully work most of the time.” End-user approach: adapt the analysis. “Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use those as specifications?”
10 Summary of overview. Challenge in analysis: finding a good abstraction, precise enough but not more than necessary. Powerful, generic abstractions are expensive and hard to use and understand. Built-in, default abstractions are often not precise enough (e.g., for data structures). End-user approach: must involve the user in abstraction without expecting the user to be a program analysis expert.
11 Overview of contributions: Extensible Inductive Shape Analysis (Xisa). Precise: inference of data structure properties; able to check, for instance, the locking example. Targeted to software developers: uses data structure checking code for guidance; turns testing code into a specification for static analysis. Efficient: ~10-100x speed-up over generic approaches; builds the abstraction out of developer-supplied checking code.
Extensible Inductive Shape Analysis: Precise inference of data structure properties; end-user approach …
13 Shape analysis is a fundamental analysis. Data structures are at the core of – Traditional languages (C, C++, Java) – Emerging web scripting languages. Improves verifiers that try to – Eliminate resource usage bugs (locks, file handles) – Eliminate memory errors (leaks, dangling pointers) – Eliminate concurrency errors (data races) – Validate developer assertions. Enables program transformations – Compile-time garbage collection – Data structure refactorings …
14 Shape analysis by example: Removing duplicates
// l is a sorted doubly-linked list
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
Example/testing view: l = 2 2 4 4 becomes 2 4 4 (with cur partway through) and finally 2 4. Code review/static analysis view: l starts as a “sorted dl list” and ends with “no duplicates”; both properties are program-specific. The intermediate state is more complicated: a “segment with no duplicates” from l up to cur, followed by a “sorted dl list” from cur.
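One way to make the loop concrete is the Java sketch below. It assumes a hypothetical DNode type holding ints; a node is removed by unlinking it from its predecessor (the same update discussed for “split backward”), and a hypothetical run-time checker sortedDllNoDup stands in for the final assertion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for a doubly-linked list of ints.
final class DNode {
    final int val;
    DNode prev, next;
    DNode(int val) { this.val = val; }
}

final class Dedup {
    // "for each node cur in list l { remove cur if duplicate; }"
    // In a sorted list, cur is a duplicate exactly when it equals its predecessor.
    static void removeDuplicates(DNode l) {
        DNode cur = l;
        while (cur != null) {
            DNode next = cur.next;                 // cur.next is not modified below
            if (cur.prev != null && cur.prev.val == cur.val) {
                cur.prev.next = cur.next;          // unlink cur from its predecessor
                if (cur.next != null) cur.next.prev = cur.prev;
            }
            cur = next;
        }
    }

    // Run-time checker for the final assertion: doubly-linked, sorted, no duplicates.
    static boolean sortedDllNoDup(DNode h, DNode p) {
        if (h == null) return true;
        if (h.prev != p) return false;
        if (p != null && p.val >= h.val) return false;
        return sortedDllNoDup(h.next, h);
    }

    static DNode build(int... vals) {
        DNode head = null, tail = null;
        for (int v : vals) {
            DNode n = new DNode(v);
            n.prev = tail;
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    static List<Integer> toList(DNode h) {
        List<Integer> out = new ArrayList<>();
        for (DNode c = h; c != null; c = c.next) out.add(c.val);
        return out;
    }

    public static void main(String[] args) {
        DNode l = build(2, 2, 4, 4);
        removeDuplicates(l);
        System.out.println(toList(l) + " " + sortedDllNoDup(l, null)); // [2, 4] true
    }
}
```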
15 Shape analysis is not yet practical. Choosing the heap abstraction is difficult for precision. Some representative approaches: Parametric in low-level, analyzer-oriented predicates, e.g., TVLA [Sagiv et al.]: + very general and expressive; - harder for non-experts. Built-in high-level predicates, e.g., Space Invader [Distefano et al.]: - harder to extend; + no additional user effort (if precise enough). End-user approach, Xisa: parametric in high-level, developer-oriented predicates: + extensible; + targeted at developers.
16 Our approach: Executable specifications. Utilize “run-time checking code” as a specification for static analysis:
assert(sorted_dll(l,…));
for each node cur in list l {
  remove cur if duplicate;
}
assert(sorted_dll_nodup(l,…));
Checker: h.dll(p) = if (h = null) then true else h -> prev = p and h -> next.dll(h), where p specifies where prev should point.
Contribution: Build the abstraction for analysis out of developer-specified checking code.
Contribution: Automatically generalize checkers for complicated intermediate states.
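Because a checker is ordinary code, the dll checker on this slide can be run as-is at test time. A minimal Java rendering follows, assuming a hypothetical Node class with only prev and next fields; the class name Checkers is likewise illustrative.

```java
// Hypothetical node type with the two fields the checker mentions.
final class Node {
    Node prev, next;
}

final class Checkers {
    // Java rendering of the checker
    //   h.dll(p) = if (h = null) then true else h->prev = p and h->next.dll(h)
    // The parameter p says where h.prev should point.
    static boolean dll(Node h, Node p) {
        if (h == null) return true;
        return h.prev == p && dll(h.next, h);
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node();
        a.next = b;
        b.prev = a;
        System.out.println(dll(a, null));   // true
        b.prev = null;                      // break the back pointer
        System.out.println(dll(a, null));   // false
    }
}
```

The static analysis reads the same definition as an inductive predicate rather than executing it.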
17 Xisa is an automated shape analysis with a precise memory abstraction based around invariant checkers. – Extensible and targeted for developers: parametric in developer-supplied checkers, viewed as inductive definitions in separation logic. – Precise yet compact abstraction for efficiency: data structure-specific, based on properties of interest to the developer. Checkers: h.dll(p) = if (h = null) then true else h -> prev = p and h -> next.dll(h)
18 Shape analysis is an abstract interpretation on abstract memory descriptions with … splitting of summaries, to reflect updates precisely, and summarizing, for termination. (Figure: list l traversed with cur, with summaries split and re-summarized.)
19 Roadmap: Components of Xisa. Xisa shape analyzer: abstract interpretation, with splitting and interpreting update, and summarizing. Compare and contrast manual code review and our automated shape analysis. Level-type inference on checker definitions: learn information about the checker to use it as an abstraction. Checkers: h.dll(p) = if (h = null) then true else h -> prev = p and h -> next.dll(h)
20 Overview: Split summaries to interpret updates precisely. We want the abstract update to be “exact”, that is, to update one “concrete memory cell”. The example at a high level: iterate using cur, changing the doubly-linked list from purple to red. Split at cur, then update cur from purple to red. Challenge: How does the analysis “split” summaries and know where to “split”?
21 “Split forward” by unfolding the inductive definition h.dll(p) = if (h = null) then true else h -> prev = p and h -> next.dll(h). To get cur -> next, the analysis unfolds dll(cur, p) into two cases: cur = null ∨ (cur -> prev = p, cur -> next = n, and dll(n, cur)). The analysis doesn’t forget the empty case.
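The unfolding step can be pictured symbolically. The Java sketch below is only an illustration, not Xisa's internal representation: the atom types Dll, PointsTo, and IsNull are hypothetical, a disjunction is modeled as a list of alternative conjunctions, and it assumes Java 16+ for records.

```java
import java.util.List;

// Hypothetical symbolic heap atoms, for illustration only (Java 16+ records).
interface Atom {}
record Dll(String h, String p) implements Atom {}                      // h.dll(p)
record PointsTo(String h, String prev, String next) implements Atom {} // h -> {prev, next}
record IsNull(String h) implements Atom {}                             // h = null

final class Unfold {
    // Unfold h.dll(p) by one step of the checker's definition. The result is a
    // disjunction (a list of alternative conjunctions of atoms):
    //   h = null  OR  (h -> {prev: p, next: n}  and  n.dll(h)),  n fresh.
    static List<List<Atom>> unfold(Dll d, String fresh) {
        List<Atom> emptyCase = List.of(new IsNull(d.h()));
        List<Atom> nodeCase = List.of(new PointsTo(d.h(), d.p(), fresh), new Dll(fresh, d.h()));
        return List.of(emptyCase, nodeCase);
    }

    public static void main(String[] args) {
        // Reading cur->next needs a materialized node at cur, so dll(cur, p) is unfolded.
        // The empty case is kept: there, cur->next would be a null dereference.
        for (List<Atom> disjunct : unfold(new Dll("cur", "p"), "n")) {
            System.out.println(disjunct);
        }
    }
}
```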
22 “Split backward” is also possible and necessary. Checker: h.dll(p) = if (h = null) then true else h -> prev = p and h -> next.dll(h)
for each node cur in list l {
  remove cur if duplicate;
}
assert l is sorted, doubly-linked with no duplicates;
Before the update, the state is a “dll segment” from l to cur (where cur -> prev is p0), followed by dll(n, cur) from n = cur -> next. Interpreting cur -> prev -> next = cur -> next; requires getting cur -> prev -> next, so the analysis splits the “dll segment” backward at its last node; the result is again a disjunction (∨) of cases.
Technical details: How does the analysis do this unfolding? Why is this unfolding allowed? (Key: segments are also inductively defined) [POPL’08] How does the analysis know to do this unfolding?
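A segment can be written as a checker with a "hole" at its end, which shows why it is inductive and can be unfolded from either side. The Java sketch below is a hypothetical rendering: SegNode and dllSeg are illustrative names, and dllSeg(h, p, t, q) means the nodes from h up to (not including) t form a doubly-linked segment whose first prev is p and whose last node is q.

```java
// Hypothetical node type with prev/next fields.
final class SegNode { SegNode prev, next; }

final class Segments {
    // The checker from the slides: h.dll(p).
    static boolean dll(SegNode h, SegNode p) {
        if (h == null) return true;
        return h.prev == p && dll(h.next, h);
    }

    // Hypothetical segment checker: nodes from h up to (not including) t form a
    // doubly-linked segment whose first prev is p and whose last node is q
    // (so t.prev is expected to be q). It recurses exactly like dll.
    static boolean dllSeg(SegNode h, SegNode p, SegNode t, SegNode q) {
        if (h == t) return p == q;           // empty segment
        if (h == null) return false;         // reached the end without meeting t
        return h.prev == p && dllSeg(h.next, h, t, q);
    }

    public static void main(String[] args) {
        SegNode a = new SegNode(), b = new SegNode(), c = new SegNode();
        a.next = b; b.prev = a; b.next = c; c.prev = b;
        // The whole list from a splits into the segment a..c plus the list from c.
        System.out.println(dll(a, null));                       // true
        System.out.println(dllSeg(a, null, c, b) && dll(c, b)); // true
    }
}
```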
23 Roadmap: Components of Xisa. Xisa shape analyzer: abstract interpretation, with splitting and interpreting update, and summarizing. Contribution: Turns testing code into a specification for static analysis. How do we decide where to unfold? Level-type inference on checker definitions derives additional information to guide unfolding (… to be discussed this afternoon). Checkers: h.dll(p) = if (h = null) then true else h -> prev = p and h -> next.dll(h)
24 Summary of interpreting updates. Splitting of summaries is needed for precision. Unfolding checkers is a natural way to do splitting when the checker traversal matches the code traversal. Checker parameter type analysis is useful for guiding unfolding in difficult cases, for example, “back pointer” traversals.
25 Results: Performance (benchmark: maximum number of graphs at a program point; analysis time in ms)
– singly-linked list reverse: 1; 1.0
– doubly-linked list reverse: 1; 1.5
– doubly-linked list copy: 2; 5.4
– doubly-linked list remove: 5; 17.9
– doubly-linked list remove and back: 5; 18.1
– search tree with parent insert: 3; 16.6
– search tree with parent insert and back: 5; 64.7
– two-level skip list rebalance: 1; 11.7
– Linux scull driver (894 loc; char arrays ignored, functions inlined)
Times are negligible for the data structure operations (under 1/10 sec each). For comparison, TVLA: 850 ms and TVLA: 290 ms; Space Invader only analyzes lists (built-in). Expressiveness: different data structures. Verified that the shape invariant as given by the checker is preserved across the operation.
26 Demo: Doubly-linked list reversal. Body of the loop over the elements: swaps the next and prev fields of curr. (States shown: already-reversed segment; node whose next and prev fields were swapped; not-yet-reversed list.)
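For reference, a Java version of the kind of loop the demo analyzes might look like the sketch below; the exact demo program is not shown, so RNode, Reverse, and the helper methods are hypothetical. Each iteration swaps curr's next and prev fields, growing the already-reversed segment by one node, and the dll checker confirms the result is still a doubly-linked list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for a doubly-linked list of ints.
final class RNode {
    final int val;
    RNode prev, next;
    RNode(int val) { this.val = val; }
}

final class Reverse {
    // In-place reversal: the loop body swaps the next and prev fields of curr.
    static RNode reverse(RNode head) {
        RNode curr = head, last = null;
        while (curr != null) {
            RNode rest = curr.next;  // not-yet-reversed list
            curr.next = curr.prev;   // swap next and prev
            curr.prev = rest;
            last = curr;             // curr now heads the already-reversed segment
            curr = rest;
        }
        return last;
    }

    // The dll checker from the slides, used to confirm the shape is preserved.
    static boolean dll(RNode h, RNode p) {
        if (h == null) return true;
        return h.prev == p && dll(h.next, h);
    }

    static RNode build(int... vals) {
        RNode head = null, tail = null;
        for (int v : vals) {
            RNode n = new RNode(v);
            n.prev = tail;
            if (tail == null) head = n; else tail.next = n;
            tail = n;
        }
        return head;
    }

    static List<Integer> toList(RNode h) {
        List<Integer> out = new ArrayList<>();
        for (RNode c = h; c != null; c = c.next) out.add(c.val);
        return out;
    }

    public static void main(String[] args) {
        RNode r = reverse(build(1, 2, 3));
        System.out.println(toList(r) + " " + dll(r, null)); // [3, 2, 1] true
    }
}
```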
27 Experience with the tool. Checkers are easy to write and try out – Enlightening (e.g., red-black tree checker in 6 lines) – Harder to “reverse engineer” for someone else’s code – Default checkers based on types are useful. Future expressiveness and usability improvements – Pointer arithmetic and arrays (in progress) – More generic checkers: polymorphic (“element kind unspecified”) and higher-order (parameterized by other predicates). Future evaluation: user study.
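To give a feel for how short such a checker can be, here is a Java analogue of a red-black tree checker. It is not the 6-line Xisa checker from the slide: RB and RBCheck are hypothetical names, and it checks only the color invariants (black root, no red node with a red child, equal black height on every path), not key ordering.

```java
// Hypothetical red-black tree node (key ordering is not checked here).
final class RB {
    final int key;
    final boolean red;
    final RB left, right;
    RB(int key, boolean red, RB left, RB right) {
        this.key = key; this.red = red; this.left = left; this.right = right;
    }
}

final class RBCheck {
    // Returns the black height of t, or -1 if t violates the color invariants:
    // a red node must not have a red parent, and every path must cross the
    // same number of black nodes. Null leaves count as black.
    static int blackHeight(RB t, boolean parentRed) {
        if (t == null) return 1;
        if (t.red && parentRed) return -1;
        int l = blackHeight(t.left, t.red);
        int r = blackHeight(t.right, t.red);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (t.red ? 0 : 1);
    }

    // The root must be black.
    static boolean isRedBlack(RB root) {
        return (root == null || !root.red) && blackHeight(root, false) >= 0;
    }

    public static void main(String[] args) {
        RB ok = new RB(2, false, new RB(1, true, null, null), new RB(3, true, null, null));
        RB redRed = new RB(2, false, new RB(1, true, new RB(0, true, null, null), null), null);
        System.out.println(isRedBlack(ok) + " " + isRedBlack(redRed)); // true false
    }
}
```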
28 Near-term future work: Exploiting a common specification framework. Scenario: code instrumented with lots of checker calls (perhaps automatically, with object invariants):
assert( mychecker(x) );
// … operation on x …
assert( mychecker(x) );
Such checks are very slow to execute and, in general, hard to prove statically. Can we prove parts statically? Static analysis view: hybrid checking. Testing view: incrementalize invariant checking. Example: insert v into a sorted list l between nodes u and w. Preservation of sortedness is shown statically; emit a run-time check only for the new element: u ≤ v ≤ w.
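The hybrid idea can be sketched in Java as follows. The static part is only assumed here, not performed: the sketch takes for granted that the rest of the list is already known to be sorted and keeps just the residual local check u ≤ v ≤ w in place of a full O(n) checker walk. SNode, SortedInsert, findPred, and insertAfter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical singly-linked node.
final class SNode {
    final int val;
    SNode next;
    SNode(int val, SNode next) { this.val = val; this.next = next; }
}

final class SortedInsert {
    // Full checker: walks the whole list (what assert(mychecker(l)) would cost).
    static boolean sorted(SNode h) {
        for (SNode c = h; c != null && c.next != null; c = c.next) {
            if (c.val > c.next.val) return false;
        }
        return true;
    }

    // Last node with val <= v, or null if v belongs at the front.
    static SNode findPred(SNode head, int v) {
        SNode u = null;
        for (SNode c = head; c != null && c.val <= v; c = c.next) u = c;
        return u;
    }

    // Insert v between u and w = u.next (at the front when u == null). Assuming
    // the rest of the list is already known to be sorted, only the local
    // residual check u <= v <= w is left to run.
    static SNode insertAfter(SNode head, SNode u, int v) {
        SNode w = (u == null) ? head : u.next;
        if ((u != null && u.val > v) || (w != null && v > w.val)) {
            throw new IllegalStateException("insert would break sortedness");
        }
        SNode n = new SNode(v, w);
        if (u == null) return n;
        u.next = n;
        return head;
    }

    static List<Integer> toList(SNode h) {
        List<Integer> out = new ArrayList<>();
        for (SNode c = h; c != null; c = c.next) out.add(c.val);
        return out;
    }

    public static void main(String[] args) {
        SNode l = new SNode(1, new SNode(3, new SNode(5, null)));
        l = insertAfter(l, findPred(l, 4), 4);
        System.out.println(toList(l) + " " + sorted(l)); // [1, 3, 4, 5] true
    }
}
```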
29 Conclusion. Extensible Inductive Shape Analysis: precision-demanding program analysis improved by novel user interaction. Developer: gets results corresponding to intuition. Analysis: focused on what’s important to the developer. Practical precise tools for better software with an end-user approach!
Programming Languages Research at the University of Colorado, Boulder
31 Who we are: faculty and Ph.D. students. Faculty: Amer Diwan, Jeremy Siek, Sriram Sankaranarayanan, Bor-Yuh Evan Chang. Bor-Yuh Evan Chang 張博聿, University of Colorado at Boulder
32 Outline. Gradual Programming – a new collaborative project involving Amer Diwan, Jeremy Siek, and myself. Brief sketches of other activities.
Gradual Programming: Bridging the Semantic Gap
34 Have you noticed a time when your program is not optimized the way you expect? Programmer intent: “I need a map data structure.” Program meaning: load class file, run class initialization, create hashtable. Observation: a disconnect between programmer intent and program meaning, the semantic gap. Problem: tools (IDEs, checkers, optimizers) have no knowledge of what the programmer cares about, hampering programmer productivity, software reliability, and execution efficiency.
35 Example: Iteration Order
class OpenArray extends Object {
  private Double data[];
  public boolean contains(Object lookFor) {
    for (int i = 0; i < data.length; i++) {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
}
We must specify an iteration order even when it should not matter, so the compiler cannot choose a different iteration order (e.g., parallel).
36 Wild and Crazy Idea: Use Non-Determinism. The programmer starts with a potentially non-deterministic program; the analysis identifies instances of “under-determinedness”; the programmer eliminates the “under-determinedness”. The for loop in contains is “over-determined” because it fixes a left-to-right order. The starting point instead leaves the order open:
class OpenArray extends Object {
  private Double data[];
  public boolean contains(Object lookFor) {
    i ∈ 0 .. data.length-1 {
      if (data[i].equals(lookFor)) return true;
    }
    return false;
  }
}
The goal lies between “over-determined” and “under-determined”: just right. Question: What does this mean? Is it “under-determined”? Response: It depends: is the iteration order important?
37 Let’s try a few program variants. Right-to-left:
public boolean contains(Object lookFor) {
  for (int i = data.length-1; i >= 0; i--) {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
Left-to-right:
public boolean contains(Object lookFor) {
  for (int i = 0; i < data.length; i++) {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
Parallel:
public boolean contains(Object lookFor) {
  parallel_for (0, data.length-1) i => {
    if (data[i].equals(lookFor)) return true;
  }
  return false;
}
Do they compute the same result? Approach: try to verify equivalence of program variants up to a specification. If yes, pick any one; if no, ask the user. What about here?
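The three variants can be written as runnable Java. The slide's parallel_for is pseudocode, so the sketch below substitutes a parallel IntStream with anyMatch, which leaves the order in which indices are examined unspecified; the class name OpenArrayVariants and its method names are hypothetical. On arrays without null entries, the three agree.

```java
import java.util.stream.IntStream;

final class OpenArrayVariants {
    private final Double[] data;

    OpenArrayVariants(Double... data) { this.data = data; }

    // Left-to-right, as in the original contains.
    boolean containsForward(Object lookFor) {
        for (int i = 0; i < data.length; i++) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    // Right-to-left.
    boolean containsBackward(Object lookFor) {
        for (int i = data.length - 1; i >= 0; i--) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    // Stand-in for parallel_for: a parallel stream may examine the indices
    // 0 .. data.length-1 in any order and stops once a match is found.
    boolean containsAnyOrder(Object lookFor) {
        return IntStream.range(0, data.length).parallel()
                        .anyMatch(i -> data[i].equals(lookFor));
    }

    public static void main(String[] args) {
        OpenArrayVariants a = new OpenArrayVariants(1.0, 2.0, 3.0);
        System.out.println(a.containsForward(2.0) + " " + a.containsBackward(2.0) + " " + a.containsAnyOrder(2.0));
    }
}
```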
38 Surprisingly, the analysis says no. Why? Exceptions! For a.data = [ …, null ] and a.contains( … ), left-to-right iteration returns true, but right-to-left iteration throws NullPointerException. We need user interaction to refine the specification so that it captures programmer intent.
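The disagreement is easy to reproduce. The Java sketch below uses a hypothetical input, { 1.0, null }, chosen to match the described behavior (a match before a null entry), and a hypothetical outcome helper that records either the returned value or the name of the thrown exception, so that exceptions count as observable results.

```java
import java.util.function.BiPredicate;

final class Outcomes {
    static boolean forward(Double[] data, Object lookFor) {
        for (int i = 0; i < data.length; i++) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    static boolean backward(Double[] data, Object lookFor) {
        for (int i = data.length - 1; i >= 0; i--) {
            if (data[i].equals(lookFor)) return true;
        }
        return false;
    }

    // Run one variant and describe its observable outcome, including exceptions.
    static String outcome(BiPredicate<Double[], Object> variant, Double[] data, Object lookFor) {
        try {
            return String.valueOf(variant.test(data, lookFor));
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Double[] data = { 1.0, null };  // hypothetical input: a match before a null entry
        System.out.println(outcome(Outcomes::forward, data, 1.0));  // true
        System.out.println(outcome(Outcomes::backward, data, 1.0)); // NullPointerException
    }
}
```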
39 Proposal Summary. “Fix semantics per program”: abstract constructs with many possible concrete implementations; apply program analysis to find inconsistent implementations; interact with the user to refine the specification. Language designer role: can enumerate the possible implementations.
40 Bridging the Semantic Gap. Programmer: “I need a map data structure.” Tool: “Looks like iterator order matters for your program.” Programmer: “Yes, I need iteration in sorted order.” Tool: “Let’s use a balanced binary tree (TreeMap).”
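In Java terms, the choice the dialogue arrives at can be illustrated with the standard collections: TreeMap (documented as a red-black tree) iterates in ascending key order, while HashMap makes no ordering guarantee, yet both answer lookups and equality identically. The class MapChoice and its fill helper are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class MapChoice {
    // Fill any Map implementation with the same entries.
    static Map<String, Integer> fill(Map<String, Integer> m) {
        for (String k : new String[] {"pear", "apple", "fig"}) m.put(k, k.length());
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> sorted = fill(new TreeMap<>());
        Map<String, Integer> unordered = fill(new HashMap<>());
        // TreeMap iterates in ascending key order.
        System.out.println(sorted.keySet());          // [apple, fig, pear]
        // HashMap documents no particular iteration order.
        System.out.println(unordered.keySet());
        // Lookups and equality do not depend on the choice.
        System.out.println(sorted.equals(unordered)); // true
    }
}
```

If iteration order did not matter, either implementation would do; once the user says sorted order matters, only the TreeMap choice is consistent with that intent.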
Other Activities
42 Formal Methods. Prof. Sriram Sankaranarayanan (CS): cyber-physical systems verification – hybrid automata theory, control systems verification, analysis of Simulink and Stateflow diagrams – advanced mathematical techniques: convex optimization (linear and semi-definite), differential equations (set-valued analysis), SMT solvers over non-linear theories – applications to automotive software (with NEC Labs and GM Labs). Prof. Aaron Bradley (ECEE): decision procedures, model checking. Prof. Fabio Somenzi (ECEE).
43 Programming Languages and Analysis. Prof. Amer Diwan (CS): performance analysis of computer systems (How do we know that we have not perturbed our data? Using machine learning and statistical techniques to reason about data); tool-assisted program transformations (algorithmic optimizations for performance; program metamorphosis for improving code quality). Prof. Jeremy Siek (ECEE/CS): gradual type checking, between static (Java) and dynamic (Python); meta-programming: programs that write programs; compilers for optimizing scientific codes. Prof. Bor-Yuh Evan Chang (CS): end-user program analysis; precise analysis (shape, collections); interactive analysis refinement (type checking + symbolic evaluation).
44 Applying to Colorado. Computer Science Department information. Deadlines: Dec 1 for Fall (Sep 1 for Spring). Graduate Advisor: Nicholas Vocatura. Talk to me about an application fee waiver.