Precise Program Analysis with Data Structures Collaborators: George Necula, Xavier Rival (INRIA) Bor-Yuh Evan Chang University of California, Berkeley.

Presentation transcript:


Precise Program Analysis with Data Structures by Designing with the User in Mind Collaborators: George Necula, Xavier Rival (INRIA) Bor-Yuh Evan Chang University of California, Berkeley February-April 2008

3 Software errors cost a lot
~$60 billion annually (~0.5% of US GDP)
– 2002 National Institute of Standards and Technology report
– > total annual revenue of …
– > 10x annual budget of …
Bor-Yuh Evan Chang, UC Berkeley - Precise Program Analysis with Data Structures

4 But there’s hope in program analysis
– Microsoft uses and distributes the Static Driver Verifier
– Airbus applies the Astrée Static Analyzer
– Companies, such as Coverity and Fortify, market static source code analysis tools

5 Because program analysis can eliminate entire classes of bugs
For example,
– Reading from a closed file: read( );
– Reacquiring a locked lock: acquire( );
How?
– Systematically examine the program
– Simulate running program on “all inputs”
– “Automated code review”

6 Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
  … code …
  // x now points to an unlocked lock
  acquire(x);
  … code …
analysis state (diagram): x
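The check on this slide can be sketched as a tiny abstract simulation. This is a minimal sketch, not the analysis from the talk: the program encoding (a list of `(operation, variable)` pairs) and the names `check_double_acquire`, `ok`, and `bad` are assumptions made for illustration.

```python
def check_double_acquire(program, initial):
    """Simulate lock states abstractly; return indices of possible double acquires."""
    # variable -> "unlocked" | "locked" | "unknown" (hypothetical abstract values)
    state = dict(initial)
    warnings = []
    for index, (op, var) in enumerate(program):
        if op == "acquire":
            # Warn unless the lock is known to be unlocked at this point.
            if state.get(var, "unknown") != "unlocked":
                warnings.append(index)
            state[var] = "locked"
        elif op == "release":
            state[var] = "unlocked"
    return warnings


ok = [("acquire", "x"), ("release", "x"), ("acquire", "x")]
bad = [("acquire", "x"), ("acquire", "x")]
print(check_double_acquire(ok, {"x": "unlocked"}))
print(check_double_acquire(bad, {"x": "unlocked"}))
```

The sketch handles only straight-line code; branches and loops would need a join of states, which is where the precision question on the following slides comes from.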

7 Program analysis by example: Checking for double acquires
Simulate running program on “all inputs”
  // x now points to an unlocked lock in a linked list
  acquire(x);
  … code …
ideal analysis state (diagram): one case per possible list containing x (x, or x, or …)

8 Must abstract
  … code …
  // x now points to an unlocked lock in a linked list
  acquire(x);
  … code …
ideal analysis state (x, or x, or …) is abstracted to a single analysis state
– For decidability, must abstract: “model all inputs” (e.g., merge objects)
– Abstraction too coarse or not precise enough (e.g., lost x is always unlocked)
– Mislabels good code as buggy
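The loss of precision from merging objects can be illustrated with a small sketch. It assumes a hypothetical three-value lock abstraction ("unlocked", "locked", "unknown") and a deliberately coarse summary that merges every lock in the list into one abstract object; none of these names come from the talk.

```python
def join(a, b):
    """Least upper bound of two abstract lock states."""
    return a if a == b else "unknown"


def summarize(lock_states):
    """Merge all locks of a list into one summary object (a coarse abstraction)."""
    summary = lock_states[0]
    for state in lock_states[1:]:
        summary = join(summary, state)
    return summary


def may_double_acquire(abstract_state):
    """The analysis must warn unless the lock is definitely unlocked."""
    return abstract_state != "unlocked"


# Concretely, x is unlocked, but other locks in the same list are locked.
locks_in_list = ["locked", "unlocked", "locked"]
x_index = 1
precise = locks_in_list[x_index]    # tracks x separately
coarse = summarize(locks_in_list)   # x merged with the other locks
```

With the precise state, `acquire(x)` is accepted; with the merged state, the fact that x is unlocked is lost and good code is flagged.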

9 To address the precision challenge
Traditional program analysis mentality:
– “Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
– “Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that work hopefully most of the time.”
My approach:
– “Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use those as specifications?”

10 Summary of overview
Challenge in analysis: Finding a good abstraction, precise enough but not more than necessary
– Powerful, generic abstractions: expensive, hard to use and understand
– Built-in, default abstractions: often not precise enough (e.g., data structures)
My approach: Must involve the user in abstraction without expecting the user to be a program analysis expert

11 Overview of contributions
Extensible Inductive Shape Analysis [POPL’08, SAS’07]
– Precise inference of data structure properties: able to check, for instance, the locking example
– Targeted to software developers: uses data structure checking code for guidance (turns testing code into a specification for static analysis)
– Efficient: ~10-100x speed-up over generic approaches (builds abstraction out of developer-supplied checking code)

Part 1: Extensible Inductive Shape Analysis [POPL’08, SAS’07]
– Precise inference of data structure properties
– Developer-oriented approach

13 Shape analysis is a fundamental analysis
Data structures are at the core of
– Traditional languages (C, C++, Java)
– Emerging web scripting languages
Improves verifiers that try to
– Eliminate resource usage bugs (locks, file handles)
– Eliminate memory errors (leaks, dangling pointers)
– Eliminate concurrency errors (data races)
– Validate developer assertions
Enables program transformations
– Compile-time garbage collection
– Data structure refactorings
…

14 Shape analysis by example: Removing duplicates
  // l is a sorted doubly-linked list
  for each node cur in list l {
    remove cur if duplicate;
  }
  assert l is sorted, doubly-linked with no duplicates;
Example/Testing (concrete lists): l = 2, 2, 4, 4 before the loop; l = 2, 4, 4 with cur in the list during it; l = 2, 4 afterwards
Code Review/Static Analysis (abstract states): l is a “sorted dl list” before the loop; l is a “sorted dl list” with “no duplicates” afterwards (program-specific)
Intermediate state is more complicated: a “segment with no duplicates” from l to cur, then a “sorted dl list” from cur

15 Shape analysis is not yet practical
Choosing the heap abstraction is difficult for precision
Traditional approaches:
– Parametric in low-level, analyzer-oriented predicates (TVLA [Sagiv et al.]): + Very general and expressive; - Hard for non-expert
– Built-in high-level predicates (Space Invader [Distefano et al.]): - Hard to extend; + No additional user effort (if precise enough)
My approach:
– Parametric in high-level, developer-oriented predicates (Xisa): + Extensible; + Targeted to developers

16 Key insight for being developer-friendly and efficient
Utilize “run-time checking code” as specification for static analysis.
  assert(sorted_dll(l,…));
  for each node cur in list l {
    remove cur if duplicate;
  }
  assert(sorted_dll_nodup(l,…));
Checker:
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
  (p specifies where prev should point)
Contribution: Build the abstraction for analysis out of developer-specified checking code
Contribution: Automatically generalize checkers for complicated intermediate states
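As run-time checking code, the dll checker might look like the following. This is a minimal sketch: the `Node` class and the `make_dll` helper are assumptions for illustration, and the recursion of the slide's definition is written as a loop. It does not terminate on a cyclic next chain.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.prev = None
        self.next = None


def dll(h, p):
    """Run-time checker: each node's prev field points to its predecessor."""
    while h is not None:        # iterative form of the recursive definition
        if h.prev is not p:     # h->prev = p
            return False
        h, p = h.next, h        # continue with dll(h->next, h)
    return True                 # h = null: true


def make_dll(values):
    """Hypothetical helper: build a well-formed doubly-linked list."""
    head = prev = None
    for v in values:
        node = Node(v)
        node.prev = prev
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head
```

A developer would call it as `assert dll(l, None)` before and after an operation; the talk's point is that the same definition can also drive the static analysis.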

17 Our framework is …
An automated shape analysis with a precise memory abstraction based around invariant checkers.
– Extensible and targeted for developers: parametric in developer-supplied checkers
– Precise yet compact abstraction for efficiency: data structure-specific, based on properties of interest to the developer
Diagram: checkers are the input to the shape analyzer, e.g.,
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)

18 Shape analysis is an abstract interpretation on abstract memory descriptions with …
– Splitting of summaries (materialization), to reflect updates precisely (strong updates)
– And summarizing for termination (widening)

19 Outline
Diagram: checkers feed the shape analyzer, e.g.,
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
The shape analyzer consists of
– type “pre-analysis” on checker definitions
– abstract interpretation: splitting and interpreting update; summarizing
Annotations:
– Learn information about the checker to use it as an abstraction
– Compare and contrast manual code review and our automated shape analysis

20 Overview: Split summaries to interpret updates precisely
Want abstract update to be “exact”, that is, to update one “concrete memory cell”.
The example at a high-level: iterate using cur changing the doubly-linked list from purple to red.
Steps (diagram): split at cur; update cur from purple to red
Challenge: How does the analysis “split” summaries and know where to “split”?

21 “Split forward” by unfolding inductive definition
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
To get cur->next, unfold dll(cur, p) at cur into two cases (∨):
– cur = null
– cur is a cell with prev = p and next = n, followed by dll(n, cur)
Analysis doesn’t forget the empty case
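One step of forward unfolding can be sketched on a symbolic representation. The dictionary layout (`pure`, `cells`, `summaries`) and the fresh-name scheme are assumptions made for this sketch, not the tool's data structures.

```python
import itertools

_fresh = itertools.count()


def unfold_dll(h, p):
    """Unfold the summary dll(h, p) one step into its two cases."""
    n = f"n{next(_fresh)}"  # fresh symbolic name for h->next
    empty = {
        "pure": [(h, "=", "null")],   # h = null: the checker returns true
        "cells": [],
        "summaries": [],
    }
    nonempty = {
        "pure": [(h, "!=", "null")],
        "cells": [(h, {"prev": p, "next": n})],  # materialized cell: h->prev = p
        "summaries": [("dll", n, h)],            # remaining summary: dll(h->next, h)
    }
    return [empty, nonempty]


cases = unfold_dll("cur", "p")
```

Both cases are returned, mirroring the remark that the analysis does not forget the empty case; in the non-empty case the edge `cur->next` now exists explicitly and can be read or updated.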

22 “Split backward” also possible and necessary
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
  for each node cur in list l {
    remove cur if duplicate;
  }
  assert l is sorted, doubly-linked with no duplicates;
Statement to interpret: cur->prev->next = cur->next;
State (diagram): a “dll segment” from l up to cur, the cell at cur with prev = p0 and next = n, then dll(n, cur)
To get cur->prev->next, the “dll segment” is unfolded backward from cur; the result is a disjunction (∨) that includes a null case
Technical Details [POPL’08]:
– How does the analysis do this unfolding?
– Why is this unfolding allowed? (Key: Segments are also inductively defined)
– How does the analysis know to do this unfolding?

23 Outline
Diagram: checkers feed the shape analyzer (type “pre-analysis” on checker definitions; abstract interpretation with splitting and interpreting update, and summarizing), e.g.,
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
Contribution: Turns testing code into specification for static analysis
How do we decide where to unfold? Derives additional information to guide unfolding

24 Abstract memory as graphs
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
Make endpoints and segments explicit, yet high-level
Legend:
– memory address (value): a node such as α, β, γ, δ
– memory cell (points-to: γ->next = δ): a thin edge
– checker summary (inductive pred): a thick edge such as dll(δ, γ)
– segment summary: a thick edge between two nodes (“dll segment”), standing for some number of memory cells (thin edges)
Example graph: l = α and cur = γ; a “dll segment” runs from α, checked as dll(null), to γ, checked as dll(β); the cell at γ has prev = β and next = δ; the rest is dll(δ, γ)
Contribution: Generalization of checker (Intuitively, dll(α, null) up to dll(γ, β).)
Which summary (thick edge), in what direction, and how far do we unfold to get the edge β->next (cur->prev->next)?

25 Types for deciding where to unfold
Checker Definition:
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
Checker “Run” (call tree/derivation), in call order: dll(α, null); dll(β, α); dll(γ, β); dll(δ, γ); dll(null, δ)
Instance and Summary (diagram): the summary runs from α, checked as dll(null), to γ, checked as dll(β)
If it exists, where is: γ->next? β->next?
The checker definition says:
– For h->next/h->prev, unfold from h
– For p->next/p->prev, unfold before h

26 Types make the analysis robust with respect to how checkers are written
Doubly-linked list checker (as before):
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)
Alternative doubly-linked list checker:
  dll′(h) = if (h->next = null) then true else h->next->prev = h and dll′(h->next)
Diagrams: an instance and its summary under each checker
Different types for different unfolding
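The two checker styles can be compared as run-time code. This is a minimal sketch with assumed names (`Node`, `link`, `dll_alt` for dll′). Read literally, dll′ dereferences its argument, so `dll_alt` requires a non-null head, and it never inspects the head's own prev field.

```python
class Node:
    def __init__(self, val):
        self.val, self.prev, self.next = val, None, None


def link(values):
    """Hypothetical helper: build a well-formed list and return its nodes."""
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    return nodes


def dll(h, p):
    """Checker as before: looks at h->prev, with the predecessor passed as p."""
    while h is not None:
        if h.prev is not p:
            return False
        h, p = h.next, h
    return True


def dll_alt(h):
    """Alternative checker: looks ahead at h->next->prev; h must be non-null."""
    while h.next is not None:
        if h.next.prev is not h:
            return False
        h = h.next
    return True
```

Both accept well-formed lists and reject a broken interior back pointer, but they traverse the fields in a different order, which is why the slide says they get different parameter types and different unfolding.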

27 Summary of checker parameter types
– Tell where to unfold for which fields
– Make analysis robust with respect to how checkers are written
– Learn where in summaries unfolding won’t help
– Can be inferred automatically with a fixed-point computation on the checker definitions

28 Summary of interpreting updates
– Splitting of summaries needed for precision
– Unfolding checkers is a natural way to do splitting, when checker traversal matches code traversal
– Checker parameter types enable, for example, “back pointer” traversal without blindly guessing where to unfold

29 Outline
Diagram: checkers feed the shape analyzer (type “pre-analysis” on checker definitions; abstract interpretation with splitting and interpreting update, and summarizing), e.g.,
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)

30 Summarize by folding into inductive predicates
  last = l;
  cur = l->next;
  while (cur != null) {
    // … cur, last …
    if (…) last = cur;
    cur = cur->next;
  }
Diagram: graphs over l, last, and cur with next edges and list summaries at successive iterations, followed by the summarized graph
Challenge: Precision (e.g., last, cur separated by at least one step)
Previous approaches guess where to fold for each graph.
Contribution: Determine where by comparing graphs across history
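To make "folding" concrete, here is a deliberately naive sketch that summarizes every run of cells not named by a program variable. It is not the history-guided approach of the talk: it is the kind of per-graph folding the slide contrasts with, and the chain encoding and the edge kinds ("next", "segment", "list") are assumptions for illustration.

```python
def fold(chain, named):
    """Summarize a null-terminated chain of cells, keeping only named nodes.

    chain: node names in next order; named: nodes pointed to by variables
    (at least one node of the chain must be named).
    Returns edges (src, dst, kind): "next" for a single cell, "segment" for a
    summarized run of two or more cells, "list" for the remainder up to null.
    """
    kept = [i for i, node in enumerate(chain) if node in named]
    edges = []
    for a, b in zip(kept, kept[1:]):
        kind = "next" if b - a == 1 else "segment"
        edges.append((chain[a], chain[b], kind))
    edges.append((chain[kept[-1]], "null", "list"))
    return edges


# l = a, last = c, cur = d in a five-cell list
edges = fold(["a", "b", "c", "d", "e"], {"a", "c", "d"})
```

Keeping the single `next` edge between `last` and `cur` is what preserves the fact that they are separated by at least one step; folding it away would lose that precision.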

31 Summary: Given checkers, everything is automatic
Diagram: checkers feed the shape analyzer (type “pre-analysis” on checker definitions; abstract interpretation with splitting and interpreting update, and summarizing), e.g.,
  dll(h, p) = if (h = null) then true else h->prev = p and dll(h->next, h)

32 Results: Performance
Benchmark                                    Max. Num. Graphs at a Program Pt    Analysis Time (ms)
singly-linked list reverse                   1                                   0.6
doubly-linked list reverse                   1                                   1.4
doubly-linked list copy                      2                                   5.3
doubly-linked list remove                    5                                   6.5
doubly-linked list remove and back           5                                   6.8
search tree with parent insert               5                                   8.3
search tree with parent insert and back      5                                   47.0
two-level skip list rebalance                6                                   87.0
Linux scull driver (894 loc)                 …                                   …
  (char arrays ignored, functions inlined)
– Times negligible for data structure operations (often in sec or 1/10 sec)
– Expressiveness: Different data structures
– Verified shape invariant as given by the checker is preserved across the operation.
Comparison annotations: TVLA: 850 ms; TVLA: 290 ms; Space Invader only analyzes lists (built-in)

33 Demo: Doubly-linked list reversal
Body of loop over the elements: Swaps the next and prev fields of curr.
Diagram labels: Already reversed segment; Node whose next and prev fields were swapped; Not yet reversed list

34 Experience with the tool
Checkers are easy to write and try out
– Enlightening (e.g., red-black tree checker in 6 lines)
– Harder to “reverse engineer” for someone else’s code
– Default checkers based on types useful
Future expressiveness and usability improvements
– Pointer arithmetic and arrays
– More generic checkers: polymorphic (“element kind unspecified”), higher-order (parameterized by other predicates)
Future evaluation: user study

35 Short-term future work: Exploiting common specification framework
Scenario: Code instrumented with lots of checker calls (perhaps automatically with object invariants)
  assert( mychecker(x) );
  // … operation on x …
  assert( mychecker(x) );
– Very slow to execute
– Hard to prove statically (in general)
Can we prove parts statically?
– Static Analysis View: Hybrid checking
– Testing View: Incrementalize invariant checking
Example: Insert in a sorted list l (new element v between u and w)
– Preservation of sortedness shown statically
– Emit run-time check for new element: u ≤ v ≤ w
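The sorted-insert example can be sketched as follows. This assumes a Python list in place of the linked list and hypothetical names `is_sorted` and `insert_checked`: the full checker re-walks the whole structure, while the emitted run-time check only compares the new element with its neighbors.

```python
def is_sorted(values):
    """Full invariant check: linear in the length of the list."""
    return all(a <= b for a, b in zip(values, values[1:]))


def insert_checked(values, index, v):
    """Insert v at index, checking only u <= v <= w for the neighbors u and w."""
    u = values[index - 1] if index > 0 else None
    w = values[index] if index < len(values) else None
    if (u is not None and u > v) or (w is not None and v > w):
        raise AssertionError("local check u <= v <= w failed")
    values.insert(index, v)


xs = [1, 3, 5]
insert_checked(xs, 1, 2)   # u = 1, v = 2, w = 3
```

If the list was sorted before the call and the local check passes, sortedness is preserved; that preservation argument is the part the slide proposes to discharge statically.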

36 Summary of Extensible Inductive Shape Analysis
Key Insight: Checkers as specifications
– Developer View: Global, Expressed in a familiar style
– Analysis View: Capture developer intent, Not arbitrary inductive definitions
Constructing the program analysis
– Intermediate states: Generalized segment predicates
– Splitting: Checker parameter types with levels
– Summarizing: History-guided approach
Diagram: a generalized segment from α, checked as c(γ), to β, checked as c′(γ′); a list with a next edge

37 Are there other kinds of program analysis users?

38 Two kinds of users of program analysis
1. Software developer. Wants: Precise program analysis for development tools. Extensible Inductive Shape Analysis [POPL’08, SAS’07]
2. End-user. Wants: Program analysis to certify software is ok. Analysis of Low-Level Code Using Decompilers [SAS’06, TLDI’05]

Part 2: Analysis of Low-Level Code Using Cooperating Decompilers [SAS’06, TLDI’05]

40 End-users want low-level code analysis
– Want analyses to check code to be executed is ok (e.g., won’t crash, good wrt Static Driver Verifier)
– Do not know any details about the program, so the analysis must be fully automatic
– But can demand additional information from the developer to make analysis automatic
But most program analyses operate at the source-level!
Diagram labels: source code; executable code; end-user; analyzer for low-level code

41 Analyzers for low-level code are more difficult and tedious to build
Porting source-level analyses is error prone
– one statement becomes many instructions
– dependencies between instructions must be carefully tracked
Key Insight: Low-level complexity
– deals with compilation idioms
– mostly independent of the analysis
– can be captured with intermediate languages
Diagram labels: source code; executable code; end-user; analyzer for low-level code

42 Decompile code rather than port analysis
Framework of small, cooperating decompilers that gradually lift the level of the program for program analysis
Decompilation for program analysis
– Need not get to original, nor be human understandable
– Only concern is safety, not performance (unlike, e.g., Java VM platform (JIT compiler))
– Can use additional meta-data (e.g., source-level types)
Diagram labels: source code; executable code; end-user; analyzer for low-level code

43 Summary of results
Flexibility and usability
– 3 compilers (gcc/C, gcj/Java, coolc/Cool)
– 2 architectures (x86, MIPS)
– With 6 decompiler modules
– Basic Java type-checking for gcj output implemented in 3-4 hours, 500 lines of code
Benefits of modularity
– decompiler-based re-implementation of a low-level analysis uncovered 8 bugs in the original implementation (heavily used, deployed in the classroom)
Applicability of existing source-level tools
– applied C code tools, BLAST and Cqual, on decompiled benchmarks (size: ~10,000 lines of C)

Future Research

45 Long-term and outreach
Theme: Overcome decidability issues in program analysis by tailoring it to the user
“Programs” are no longer only written to be executed on computers
– E.g., computational models of biological pathways in systems biology
Need new “program” analysis tools
– Validate models (e.g., pathway model produces only expected products)
– Reason about models
How do these users work?

46 Conclusion
Extensible Inductive Shape Analysis: precision demanding program analysis improved by novel user interaction
– Developer: Gets results corresponding to intuition
– Analysis: Focused on what’s important to the developer
Cooperating Decompilers: adapt program analyses to code end-users run
Practical precise tools for better software!

What can inductive shape analysis do for you?

Bonus Slides: Extensible Inductive Shape Analysis

49 Intuition: Checkers and types
Checker: x.sorteddll(…)
  l.sorteddll(prev, min) = if (l = null) then true else l->prev = prev and min ≤ l->val and l->next.sorteddll(l, l->val)
– global specification (i.e., per data structure)
– more precise (typically)
– holds only in “steady-state”
– need generalization
Type: x : Dll
  struct Dll { int val; Dll* prev; Dll* next; };
– global specification (i.e., per data structure)
– less precise (typically)
– always holds
– doesn’t need generalization
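The contrast between the type and the checker can be written out as a small sketch. It assumes a Python dataclass standing in for `struct Dll` and a loop standing in for the recursive `sorteddll`; the use of negative infinity as the initial `min` is an assumption, since the slide elides the arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Dll:
    """Counterpart of struct Dll: the type only says which fields exist."""
    val: int
    prev: Optional["Dll"] = None
    next: Optional["Dll"] = None


def sorteddll(l, prev, min_val):
    """Checker: back pointers are consistent and values are non-decreasing."""
    while l is not None:
        if l.prev is not prev or not (min_val <= l.val):
            return False
        l, prev, min_val = l.next, l, l.val   # l->next.sorteddll(l, l->val)
    return True


a = Dll(2)
b = Dll(4, prev=a)
a.next = b
```

Every `Dll` value satisfies the type at all times, whereas `sorteddll` may fail in the middle of an update and hold again afterwards, which is the "steady-state" distinction on the slide.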

50 Segments as partial checkers
Checker “Run”, in call order: α.dll(null); β.dll(α); γ.dll(β); δ.dll(γ); null.dll(δ)
Summary: a segment of length i from α, checked as c(γ), to β, checked as c′(γ′); in the checker run it covers the calls from c(α, γ) down to c′(β, γ′)
When i = 0: c = c′, α = β, γ = γ′
Instance (diagram): cells linked by next and prev fields, ending in null; other labels on the slide: α = γ, β = null

51 To unfold backward, split the segment and then unfold forward
  cur = l->next;
  while (cur != null) {
    if (cur->prev->val == cur->val) {
      cur = cur->prev;
      remove_after(cur);
    }
    cur = cur->next;
  }
Unfolding rule: π.dll(ρ) := (π = null and emp) ∨ ∃η. (π ≠ null and π has prev = ρ and next = η, and η.dll(π))
materialize: cur->prev->next
Diagram: starting from l = α, checked as dll(null), a segment up to cur = δ, checked as dll(γ), and the prev edge of cur, the unfold step yields a disjunction (∨) of cases; one case has α = δ and γ = null, where l and cur coincide
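For reference, the loop on this slide can be run on concrete lists. This is a minimal sketch: the `Node` class and helpers are assumptions, the body of `remove_after` is a guess at the intended behavior (the slide does not define it), and a guard for the empty list is added.

```python
class Node:
    def __init__(self, val):
        self.val, self.prev, self.next = val, None, None


def from_values(values):
    head = tail = None
    for v in values:
        node = Node(v)
        if tail is None:
            head = node
        else:
            tail.next, node.prev = node, tail
        tail = node
    return head


def to_values(head):
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


def remove_after(cur):
    """Unlink the node following cur (assumed behavior)."""
    victim = cur.next
    cur.next = victim.next
    if victim.next is not None:
        victim.next.prev = cur


def remove_duplicates(l):
    if l is None:          # guard added here; the slide's loop dereferences l directly
        return
    cur = l.next
    while cur is not None:
        if cur.prev.val == cur.val:   # reads cur->prev, the "back pointer" traversal
            cur = cur.prev
            remove_after(cur)         # updates cur->next, i.e. the old cur->prev->next
        cur = cur.next
```

The reads and writes through `cur.prev` are the accesses that force the analysis to materialize the cell before `cur`, which is why backward unfolding is needed here.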

52 Backward unfolding by forward unfolding
Start (diagram): a segment of length i+1 from a node checked as dll(null) to β, with γ the prev target at β
– split (lemma): the segment of length i+1 becomes a segment of length i ending at δ, followed by a segment of length 1 from δ to β
– unfold forward at δ: the segment of length 1 becomes a cell at δ with prev and next (η) fields, followed by a segment of length 0
– reduce: η = β, δ = γ

53 History-guided folding
  last = l;
  cur = l->next;
  while (cur != null) {
    if (…) last = cur;
    cur = cur->next;
  }
– Match edges to identify where to fold
– Apply local folding rules
Diagram: graphs over l, last, and cur with next edges and list summaries from successive iterations; the inclusion check (⊑) between the folded graph and the previous one answers Yes
Chang, Rival, Necula - Shape Analysis with Structural Invariant Checkers

Bonus Slides: Analysis of Low-Level Code

55 Porting source-level analyses is error prone
Analyzers for low-level code are more difficult and tedious to build
Example: Java Type Analysis
Source:
  class C extends P { void m() { … } }
  P p = new P();
  P c = … ? new C() : p;
  …
  c.m();
Compiled code, with the type-analysis state after each instruction:
  r_c := m[r_sp]              ⟨r_c : P, …⟩
  if (r_c = 0) L_exc          ⟨r_c : nonnull P, …⟩
  r_1 := m[r_c]               ⟨r_1 : disp(P), …⟩
  r_1 := m[r_1+28]            ⟨r_1 : meth(P,28), …⟩
  r_sp := r_sp - 4
  m[r_sp] := m[r_sp+4]        ⟨m[r_sp] : nonnull P, …⟩   (equivalently, m[r_sp] := r_c)
  icall [r_1]
Type analysis intermixed with low-level reasoning (e.g., args on stack)

56 Porting source-level analyses is error prone
Analyzers for low-level code are more difficult and tedious to build
Example: Java Type Analysis
Source:
  class C extends P { void m() { … } }
  P p = new P();
  P c = … ? new C() : p;
  …
  c.m();
Compiled code, with the type-analysis state after each instruction:
  r_c := m[r_sp]              ⟨r_c : P, …⟩
  if (r_c = 0) L_exc          ⟨r_c : nonnull P, …⟩
  r_1 := m[r_c]               ⟨r_1 : disp(P), …⟩
  r_1 := m[r_1+28]            ⟨r_1 : meth(P,28), …⟩
  r_sp := r_sp - 4
  m[r_sp] := r_p              ⟨m[r_sp] : nonnull P, …⟩   unsound
  icall [r_1]
Dependencies must be carefully tracked

57 Framework of small, reusable cooperating decompiler modules
Source: static void f(C c) { c.m(); }
Low-level input:
  f: … r_c := m[r_sp+12]
     if (r_c = 0) L_exc
     r_1 := m[r_c]
     r_1 := m[r_1+28]
     r_sp := r_sp - 4
     m[r_sp] := m[r_sp+16]
     icall [r_1] …
After Locals (Local Variables):
  f(t_c): r_c := t_c
          if (r_c = 0) L_exc
          r_1 := m[r_c]
          r_1 := m[r_1+28]
          t_1 := t_c
          icall [r_1](t_1)
After SymEval (Symbolic Evaluation):
  f(c): if (c = 0) L_exc
        icall [m[m[c]+28]] (c)
After OO (Dynamic Dispatch):
  f(obj c): if (c = 0) L_exc
            invokevirtual [c, 28] ()
After JavaTypes:
  f(C c): if (c = 0) L_exc
          c.m()
The output of the last module is the input to your analyzer
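The step from the Locals output to the SymEval output can be approximated by substituting register definitions into later uses. This is a minimal sketch, not the decompiler from the talk: the tuple encoding of instructions and expressions is an assumption, the null-check branch is omitted, and the sketch assumes no memory stores occur between the loads and the call.

```python
def subst(expr, env):
    """Replace registers and temporaries in expr by their symbolic values."""
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(subst(arg, env) for arg in expr[1:])
    return env.get(expr, expr)   # names map to their value; constants stay as-is


def sym_eval(instrs):
    """Fold straight-line register assignments into the calls that use them."""
    env, out = {}, []
    for instr in instrs:
        if instr[0] == "assign":
            _, dst, expr = instr
            env[dst] = subst(expr, env)
        else:  # ("icall", target, args)
            _, target, args = instr
            out.append(("icall", subst(target, env), [subst(a, env) for a in args]))
    return out


# f(t_c) after Locals, without the null check; ("mem", e) stands for m[e]
body = [
    ("assign", "r_c", "t_c"),
    ("assign", "r_1", ("mem", "r_c")),
    ("assign", "r_1", ("mem", ("add", "r_1", 28))),
    ("assign", "t_1", "t_c"),
    ("icall", "r_1", ["t_1"]),
]
lifted = sym_eval(body)
```

The single resulting call has target `m[m[t_c]+28]` and argument `t_c`, matching the shape of `icall [m[m[c]+28]] (c)` in the SymEval output, so later modules no longer need to track individual registers.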