End-User Program Analysis Bor-Yuh Evan Chang University of California, Berkeley Dissertation Talk August 28, 2008 Advisor: George C. Necula, Collaborator:

Slides:



Advertisements
Similar presentations
Chapter 22 Implementing lists: linked implementations.
Advertisements

1 Parametric Heap Usage Analysis for Functional Programs Leena Unnikrishnan Scott D. Stoller.
Using Checkers for End-User Shape Analysis National Taiwan University – August 11, 2009 Bor-Yuh Evan Chang 張博聿 University of Colorado, Boulder If some.
Pointers.
Shape Analysis with Structural Invariant Checkers Bor-Yuh Evan Chang Xavier Rival George C. Necula May 10, 2007 OSQ Retreat.
Modular and Verified Automatic Program Repair Francesco Logozzo, Thomas Ball RiSE - Microsoft Research Redmond.
Extensible Shape Analysis by Designing with the User in Mind Bor-Yuh Evan Chang Bor-Yuh Evan Chang, Xavier Rival, and George Necula University of California,
Semantics Static semantics Dynamic semantics attribute grammars
Predicate Abstraction and Canonical Abstraction for Singly - linked Lists Roman Manevich Mooly Sagiv Tel Aviv University Eran Yahav G. Ramalingam IBM T.J.
Shape Analysis by Graph Decomposition R. Manevich M. Sagiv Tel Aviv University G. Ramalingam MSR India J. Berdine B. Cook MSR Cambridge.
Automatic Memory Management Noam Rinetzky Schreiber 123A /seminar/seminar1415a.html.
A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.
3-Valued Logic Analyzer (TVP) Tal Lev-Ami and Mooly Sagiv.
1 Symbolic Execution for Model Checking and Testing Corina Păsăreanu (Kestrel) Joint work with Sarfraz Khurshid (MIT) and Willem Visser (RIACS)
Chapter 4: Trees Part II - AVL Tree
Program Representations. Representing programs Goals.
Background for “KISS: Keep It Simple and Sequential” cs264 Ras Bodik spring 2005.
A survey of techniques for precise program slicing Komondoor V. Raghavan Indian Institute of Science, Bangalore.
Konstanz, Jens Gerken ZuiScat An Overview of data quality problems and data cleaning solution approaches Data Cleaning Seminarvortrag: Digital.
Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.
Relational Inductive Shape Analysis Bor-Yuh Evan Chang University of California, Berkeley Xavier Rival INRIA POPL 2008.
Lecture 8 CS203. Implementation of Data Structures 2 In the last couple of weeks, we have covered various data structures that are implemented in the.
Reduction in End-User Shape Analysis Dagstuhl - Typing, Analysis, and Verification of Heap-Manipulating Programs – July 24, 2009 Xavier Rival INRIA and.
End-User Shape Analysis National Taiwan University – August 11, 2009 Xavier Rival INRIA/ENS Paris Bor-Yuh Evan Chang 張博聿 U of Colorado, Boulder If some.
Specialized Reference Counting Garbage Collection using Data Structure Annotations By Eric Watkins and Dzin Avots for CS 343 Spring 2002.
Counterexample-Guided Focus TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA A A A AA A A Thomas Wies Institute of.
Establishing Local Temporal Heap Safety Properties with Applications to Compile-Time Memory Management Ran Shaham Eran Yahav Elliot Kolodner Mooly Sagiv.
Program analysis Mooly Sagiv html://
Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San.
Compile-Time Deallocation of Individual Objects Sigmund Cherem and Radu Rugina International Symposium on Memory Management June, 2006.
Overview of Software Requirements
Describing Syntax and Semantics
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Composing Dataflow Analyses and Transformations Sorin Lerner (University of Washington) David Grove (IBM T.J. Watson) Craig Chambers (University of Washington)
Cs164 Prof. Bodik, Fall Symbol Tables and Static Checks Lecture 14.
Precise Program Analysis with Data Structures Collaborators: George Necula, Xavier Rival (INRIA) Bor-Yuh Evan Chang University of California, Berkeley.
C o n f i d e n t i a l Developed By Nitendra NextHome Subject Name: Data Structure Using C Title: Overview of Data Structure.
OOP Languages: Java vs C++
1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Requirements for caBIG Infrastructure to Support Semantic Workflows Yolanda.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 15: Linked data structures.
CSCA48 Course Summary.
June 27, 2002 HornstrupCentret1 Using Compile-time Techniques to Generate and Visualize Invariants for Algorithm Explanation Thursday, 27 June :00-13:30.
Automatic Verification of Pointer Programs using Grammar-based Shape Analysis Hongseok Yang Seoul National University (Joint Work with Oukseh Lee and Kwangkeun.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Tevfik Bultan Lecture 12: Pointers continued, C strings.
Inferring Specifications to Detect Errors in Code Mana Taghdiri Presented by: Robert Seater MIT Computer Science & AI Lab.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
Checking Reachability using Matching Logic Grigore Rosu and Andrei Stefanescu University of Illinois, USA.
Mark Marron 1, Deepak Kapur 2, Manuel Hermenegildo 1 1 Imdea-Software (Spain) 2 University of New Mexico 1.
Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.
Mark Marron IMDEA-Software (Madrid, Spain) 1.
Symbolic Execution with Abstract Subsumption Checking Saswat Anand College of Computing, Georgia Institute of Technology Corina Păsăreanu QSS, NASA Ames.
Semantics In Text: Chapter 3.
Materialization in Shape Analysis with Structural Invariant Checkers Bor-Yuh Evan Chang Xavier Rival George C. Necula University of California, Berkeley.
Object Oriented Software Development 4. C# data types, objects and references.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
Static Techniques for V&V. Hierarchy of V&V techniques Static Analysis V&V Dynamic Techniques Model Checking Simulation Symbolic Execution Testing Informal.
Quality Assurance in the Presence of Variability Kim Lauenroth, Andreas Metzger, Klaus Pohl Institute for Computer Science and Business Information Systems.
Adaptive Shape Analysis Thomas Wies joint work with Josh Berdine Cristiano Calcagno TexPoint fonts used in EMF. Read the TexPoint manual before you delete.
Chapter 17: Linked Lists. Objectives In this chapter, you will: – Learn about linked lists – Learn the basic properties of linked lists – Explore insertion.
1 Chapter 4 Unordered List. 2 Learning Objectives ● Describe the properties of an unordered list. ● Study sequential search and analyze its worst- case.
Finding bugs with a constraint solver daniel jackson. mandana vaziri mit laboratory for computer science issta 2000.
ECE 750 Topic 8 Meta-programming languages, systems, and applications Automatic Program Specialization for J ava – U. P. Schultz, J. L. Lawall, C. Consel.
Arrays, Link Lists, and Recursion Chapter 3. Sorting Arrays: Insertion Sort Insertion Sort: Insertion sort is an elementary sorting algorithm that sorts.
LINKED LISTS.
Spring 2016 Program Analysis and Verification
Seminar in automatic tools for analyzing programs with dynamic memory
Reduction in End-User Shape Analysis
CSE 303 Concepts and Tools for Software Development
Introduction to Data Structure
Presentation transcript:

End-User Program Analysis Bor-Yuh Evan Chang University of California, Berkeley Dissertation Talk August 28, 2008 Advisor: George C. Necula, Collaborator: Xavier Rival (INRIA)

2 Software errors cost a lot $60 billion ~$60 billion annually (~0.5% of US GDP) –2002 National Institute of Standards and Technology report total annual revenue of> 10x annual budget of> Bor-Yuh Evan Chang - End-User Program Analysis

3 But there’s hope in program analysis Microsoft Microsoft uses and distributes Static Driver Verifier the Static Driver Verifier Airbus Airbus applies Astrée Static Analyzer the Astrée Static Analyzer Companies, such as Coverity and Fortify, market static source code analysis tools Bor-Yuh Evan Chang - End-User Program Analysis

4 Because program analysis can eliminate entire classes of bugs For example, –Reading from a closed file: –Reacquiring a locked lock: How? –Systematically examine the program –Simulate running program on “all inputs” –“Automated code review” read( ); acquire( ); Bor-Yuh Evan Chang - End-User Program Analysis

5 … code … // x now points to an unlocked lock acquire(x); … code … analysis state Program analysis by example: Checking for double acquires Bor-Yuh Evan Chang - End-User Program Analysis Simulate running program on “all inputs” x acquire(); acquire(x); … code …

6 in a linked list // x now points to an unlocked lock in a linked list acquire() acquire(x); … code … ideal analysis state Program analysis by example: Checking for double acquires Bor-Yuh Evan Chang - End-User Program Analysis Simulate running program on “all inputs” x xx or …

7 … code … in a linked list // x now points to an unlocked lock in a linked list acquire() acquire(x); … code … ideal analysis state analysis state Must abstract Bor-Yuh Evan Chang - End-User Program Analysis x xx or … x abstract For decidability, must abstract—“model all inputs” (e.g., merge objects) not precise Abstraction too coarse or not precise enough (e.g., lost x is always unlocked) mislabels good code as buggy

8 To address the precision challenge Traditional Traditional program analysis mentality: specifications for our analysis “ Why can’t developers write more specifications for our analysis? Then, we could verify so much more.” default abstractions “ Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that work hopefully most of the time.” End-user approach End-user approach: adapt the analysis “ Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use those as specifications?” Bor-Yuh Evan Chang - End-User Program Analysis

9 Summary of overview Challenge in analysis: Finding a good abstraction precise enough but not more than necessary Powerful, generic abstractions expensive, hard to use and understand Built-in, default abstractions often not precise enough (e.g., data structures) End-user approach End-user approach: Must involve the user in abstraction without expecting the user to be a program analysis expert Bor-Yuh Evan Chang - End-User Program Analysis

10 Overview of contributions Extensible Inductive Shape Analysis [POPL’08,SAS’07] Precise inference of data structure properties Able to check, for instance, the locking example Targeted to software developers Uses data structure checking code for guidance  Turns testing code into a specification for static analysis Efficient ~10-100x speed-up over generic approaches  Builds abstraction out of developer-supplied checking code Bor-Yuh Evan Chang - End-User Program Analysis

Extensible Inductive Shape Analysis Precise Precise inference of data structure properties End-user End-user approach [POPL’08, SAS’07] …

12 Shape analysis is a fundamental analysis Data structures are at the core of – Traditional languages (C, C++, Java) – Emerging web scripting languages Improves verifiers that try to – Eliminate resource usage bugs (locks, file handles) – Eliminate memory errors (leaks, dangling pointers) – Eliminate concurrency errors (data races) – Validate developer assertions Enables program transformations – Compile-time garbage collection – Data structure refactorings … Bor-Yuh Evan Chang - End-User Program Analysis

13 Shape analysis by example: Removing duplicates // l is a sorted doubly-linked list for each node cur in list l { remove cur if duplicate; } assertl is sorted, doubly-linked with no duplicates; Example/Testing Code Review/Static Analysis “no duplicates” l “sorted dl list” l program-specific l 2244 l 244 cur l 24 “sorted dl list” l “segment with no duplicates” cur intermediate state more complicated Bor-Yuh Evan Chang - End-User Program Analysis

14 Shape analysis is not yet practical Choosing the heap abstraction difficult for precision Parametric in high-level, developer-oriented predicates + +Extensible + +Targeted to developers Xisa Built-in high-level predicates - -Hard to extend + +No additional user effort (if precise enough) Parametric in low-level, analyzer-oriented predicates + +Very general and expressive - -Hard for non-expert 89 Bor-Yuh Evan Chang - End-User Program Analysis Traditional approaches Traditional approaches: End-user approach End-user approach: Space Invader [Distefano et al.] TVLA [Sagiv et al.]

15 Key insight for being developer-friendly and efficient checking code Utilize “run-time checking code” as specification for static analysis. assert(sorted_dll(l,…)); for each node cur in list l { remove cur if duplicate; } assert(sorted_dll_nodup(l,…)); ll cur l Bor-Yuh Evan Chang - End-User Program Analysis dll(h, p) = if (h = null) then true else h ! prev = p and dll(h ! next, h) checker Contribution: Automatically generalize checkers for complicated intermediate states Contribution: Build the abstraction for analysis out of developer-specified checking code Contribution: Build the abstraction for analysis out of developer-specified checking code p specifies where prev should point

16 Our framework is … Extensible and targeted for developers –Parametric in developer-supplied checkers Precise yet compact abstraction for efficiency –Data structure-specific based on properties of interest to the developer shape analysis invariant checkers An automated shape analysis with a precise memory abstraction based around invariant checkers. shape analyzer dll(h, p) = if (h = null) then true else h ! prev = prev and dll(h ! next, h) checkers Bor-Yuh Evan Chang - End-User Program Analysis

17 Splitting Splitting of summaries To reflect updates precisely summarizing And summarizing for termination Shape analysis is an abstract interpretation on abstract memory descriptions with … cur l l l l l l Bor-Yuh Evan Chang - End-User Program Analysis

18 Outline shape analyzer abstract interpretation splitting and interpreting update summarizing type inference on checker definitions dll(h, p) = if (h = null) then true else h ! prev = prev and dll(h ! next, h) checkers Bor-Yuh Evan Chang - End-User Program Analysis Learn information about the checker to use it as an abstraction Compare and contrast manual code review and our automated shape analysis

19 Overview: Split summaries to interpret updates precisely l cur l Bor-Yuh Evan Chang - End-User Program Analysis Want abstract update to be “exact”, that is, to update one “concrete memory cell”. The example at a high-level: iterate using cur changing the doubly-linked list from purple to red. l cur split at cur update cur purple to red l cur Challenge: How does the analysis “split” summaries and know where to “split”? Challenge: How does the analysis “split” summaries and know where to “split”?

20 “Split forward” by unfolding inductive definition Ç dll(h, p) = if (h = null) then true else h ! prev = p and dll(h ! next, h) Bor-Yuh Evan Chang - End-User Program Analysis l cur get: cur ! next l cur null p dll(cur, p) l cur p dll(n, cur) n Analysis doesn’t forget the empty case

21 “Split backward” also possible and necessary dll(h, p) = if (h = null) then true else h ! prev = p and dll(h ! next, h) Bor-Yuh Evan Chang - End-User Program Analysis l cur p dll(n, cur) n for each node cur in list l { remove cur if duplicate; } assertl is sorted, doubly- linked with no duplicates; “dll segment” l cur p0p0 dll(n, cur) n “dll segment” cur ! prev ! next = cur ! next; l cur dll(n, cur) n null get: cur ! prev ! next Ç Technical Details: How does the analysis do this unfolding? Why is this unfolding allowed? (Key: Segments are also inductively defined) [POPL’08] How does the analysis know to do this unfolding? Technical Details: How does the analysis do this unfolding? Why is this unfolding allowed? (Key: Segments are also inductively defined) [POPL’08] How does the analysis know to do this unfolding?

22 Outline shape analyzer abstract interpretation splitting and interpreting update summarizing type inference on checker definitions Bor-Yuh Evan Chang - End-User Program Analysis Contribution: Turns testing code into specification for static analysis How do we decide where to unfold? Derives additional information to guide unfolding dll(h, p) = if (h = null) then true else h ! prev = prev and dll(h ! next, h) checkers

23 memory cell (points-to: ° ! next = ± ) Abstract memory as graphs dll(h, p) = if (h = null) then true else h ! prev = p and dll(h ! next, h) Bor-Yuh Evan Chang - End-User Program Analysis l ® dll(null)dll( ¯ ) cur ° dll( ° ) ¯ prev next ± Make endpoints and segments explicit, yet high-level l dll( ±, ° ) ± “dll segment” cur ° ® segment summary checker summary (inductive pred) memory address (value) Contribution: Generalization of checker (Intuitively, dll( ®,null) up to dll( °, ¯ ).) Contribution: Generalization of checker (Intuitively, dll( ®,null) up to dll( °, ¯ ).) Some number of memory cells (thin edges) Which summary (thick edge), in what direction, and how far do we unfold to get the edge ¯ ! next (cur ! prev ! next)? ¯ next

24 Types for deciding where to unfold ® dll(null) dll( ¯ ) ° dll( ®,null) dll( ¯, ® ) dll( °, ¯ ) dll( ±, ° ) dll(null, ± ) Checker “Run” Checker “Run” (call tree/derivation) Instance Summary dll(h, p) = if (h = null) then true else h ! prev = p and dll(h ! next, h) If it exists, where is: ° ! next ? ¯ ! next ? If it exists, where is: ° ! next ? ¯ ! next ? Checker Definition Says Says: from For h ! next/h ! prev, unfold from h before For p ! next/p ! prev, unfold before h Says Says: from For h ! next/h ! prev, unfold from h before For p ! next/p ! prev, unfold before h Bor-Yuh Evan Chang - End-User Program Analysis

25 Types make the analysis robust with respect to how checkers are written ¯ dll( ® )dll( ¯ ) ° Instance Summary dll(h, p) = if (h = null) then true else h ! prev = p and dll(h ! next, h) Bor-Yuh Evan Chang - End-User Program Analysis Instance ¯ dll 0 ° Summary dll 0 (h) = if (h ! next = null) then true else h ! next ! prev = h and dll 0 (h ! next) Alternative doubly-linked list checker Doubly-linked list checker (as before) Different types for different unfolding

26 Summary of checker parameter types wherewhich Tell where to unfold for which fields robust Make analysis robust with respect to how checkers are written Learn where in summaries unfolding won’t help Bor-Yuh Evan Chang - End-User Program Analysis inferred automatically Can be inferred automatically with a fixed- point computation on the checker definitions

27 Summary of interpreting updates Splitting of summaries needed for precision Unfolding checkers is a natural way to do splitting When checker traversal matches code traversal Checker parameter types Enable, for example, “back pointer” traversal without blindly guessing where to unfold Bor-Yuh Evan Chang - End-User Program Analysis

28 Outline shape analyzer abstract interpretation splitting and interpreting update summarizing type inference on checker definitions Bor-Yuh Evan Chang - End-User Program Analysis dll(h, p) = if (h = null) then true else h ! prev = prev and dll(h ! next, h) checkers

29 Summarize by folding into inductive predicates last = l; cur = l ! next; while (cur != null) { // … cur, last … if (…) last = cur; cur = cur ! next; } list l, last next cur list l next curlast list l next curlast summarize list last list next cur list l Challenge: Precision (e.g., last, cur separated by at least one step) Previous approaches guess where to fold for each graph. Bor-Yuh Evan Chang - End-User Program Analysis Contribution: Determine where by comparing graphs across history

30 Summary: Given checkers, everything is automatic shape analyzer abstract interpretation splitting and interpreting update summarizing type inference on checker definitions Bor-Yuh Evan Chang - End-User Program Analysis dll(h, p) = if (h = null) then true else h ! prev = prev and dll(h ! next, h) checkers

31 Results: Performance Benchmark Max. Num. Graphs at a Program Pt ms Analysis Time (ms) singly-linked list reverse10.6 doubly-linked list reverse11.4 doubly-linked list copy25.3 doubly-linked list remove56.5 doubly-linked list remove and back56.8 search tree with parent insert58.3 search tree with parent insert and back547.0 two-level skip list rebalance687.0 Linux scull driver (894 loc) (char arrays ignored, functions inlined) Times negligible for data structure operations (often in sec or 1 / 10 sec) Expressiveness Expressiveness: Different data structures Verified shape invariant as given by the checker is preserved across the operation. Bor-Yuh Evan Chang - End-User Program Analysis TVLA: 850 ms TVLA: 290 ms Space Invader only analyzes lists (built-in) Space Invader only analyzes lists (built-in)

32 Demo: Doubly-linked list reversal Body of loop over the elements Body of loop over the elements: Swaps the next and prev fields of curr. Body of loop over the elements Body of loop over the elements: Swaps the next and prev fields of curr. Already reversed segment Node whose next and prev fields were swapped Not yet reversed list Bor-Yuh Evan Chang - End-User Program Analysis

33 Experience with the tool Checkers are easy to write Checkers are easy to write and try out – Enlightening (e.g., red-black tree checker in 6 lines) – Harder to “reverse engineer” for someone else’s code – Default checkers based on types useful Future expressiveness and usability improvements – Pointer arithmetic and arrays – More generic checkers: polymorphic“element kind unspecified” higher-orderparameterized by other predicates Future evaluation: user study Bor-Yuh Evan Chang - End-User Program Analysis

34 Summary of Extensible Inductive Shape Analysis Key Insight: Checkers as specifications Developer View:Global, Expressed in a familiar style Analysis View:Capture developer intent, Not arbitrary inductive definitions Constructing the program analysis Generalized segment Intermediate states: Generalized segment predicates types with levels Splitting: Checker parameter types with levels History-guided Summarizing: History-guided approach next list ®¯ c( ° )c0(°0)c0(°0) Bor-Yuh Evan Chang - End-User Program Analysis

35 Conclusion Extensible Inductive Shape Analysis precision demanding program analysis improved by novel user interaction Developer: Gets results corresponding to intuition Analysis:Focused on what’s important to the developer Practical precise tools for better software with an end-user approach! Bor-Yuh Evan Chang - End-User Program Analysis

What can inductive shape analysis do for you?