Pointer Analysis – Part II CS 6340

1 Unification vs. Inclusion
Earlier scalable pointer analysis was context-insensitive unification-based [Steensgaard ’96]
–Pointers are either un-aliased or point to the same set of objects
–Near-linear, but very imprecise
Inclusion-based pointer analysis
–Can point to overlapping sets of objects
–Closure calculation is O(n^3)
–Various optimizations [Fähndrich, Su, Heintze, …]
–BDD formulation, simple, scalable [Berndl, Zhu]
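The precision gap between the two styles can be seen on a three-statement program. The following is a minimal sketch, not Steensgaard's or Andersen's actual algorithm: the unification side is simplified to merging the two sides of every copy into one equivalence class, and the names `inclusion`, `unification`, `addr_of`, and `copies` are hypothetical.

```python
def inclusion(addr_of, copies):
    """Inclusion-based: p = q adds pts(q) into pts(p); iterate to a fixpoint."""
    pts = {}
    for p, obj in addr_of:
        pts.setdefault(p, set()).add(obj)
    changed = True
    while changed:
        changed = False
        for dst, src in copies:
            before = len(pts.setdefault(dst, set()))
            pts[dst] |= pts.get(src, set())
            changed |= len(pts[dst]) != before
    return pts

def unification(addr_of, copies):
    """Simplified unification: p = q merges p and q into one class (union-find)."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for dst, src in copies:
        parent[find(dst)] = find(src)
    merged = {}
    for p, obj in addr_of:
        merged.setdefault(find(p), set()).add(obj)
    variables = {p for p, _ in addr_of} | {v for c in copies for v in c}
    return {v: merged.get(find(v), set()) for v in variables}

# p = &a; q = &b; r = &c; p = q; p = r;
addr_of = [("p", "a"), ("q", "b"), ("r", "c")]
copies = [("p", "q"), ("p", "r")]
inc = inclusion(addr_of, copies)
uni = unification(addr_of, copies)
print(inc)   # q and r keep their own single target
print(uni)   # p, q and r all share one merged set
```

Under inclusion only `p` grows; under unification `q` and `r` are also reported as pointing to all three objects.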

2 Context Sensitivity
Context sensitivity is important for precision
–Unrealizable paths
    Object id(Object x) { return x; }
    a = id(b);
    c = id(d);

3 Context Sensitivity
Context sensitivity is important for precision
–Unrealizable paths
–Conceptually give each caller its own copy
    Object id(Object x) { return x; }
    a = id(b);
    c = id(d);
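The effect of giving each caller its own copy of `id` can be sketched as follows. This is a toy model rather than a real analysis: each argument variable is assumed to point to exactly one object named `obj_<arg>`, and `analyze` and the `x@<site>` naming are hypothetical.

```python
def analyze(calls, clone_per_call_site):
    """calls: list of (result_var, arg_var) for calls to id().
    Returns a points-to map for the formal x and the result variables."""
    pts = {}
    # Bind the actual argument to the formal parameter x of id().
    for site, (_, arg) in enumerate(calls):
        x = f"x@{site}" if clone_per_call_site else "x"
        pts.setdefault(x, set()).add(f"obj_{arg}")
    # 'return x;' flows the formal's targets to the call's result variable.
    for site, (result, _) in enumerate(calls):
        x = f"x@{site}" if clone_per_call_site else "x"
        pts.setdefault(result, set()).update(pts[x])
    return pts

calls = [("a", "b"), ("c", "d")]          # a = id(b); c = id(d);
insensitive = analyze(calls, clone_per_call_site=False)
sensitive = analyze(calls, clone_per_call_site=True)
print(insensitive["a"], sensitive["a"])
```

With a single shared `x`, the value of `d` appears to flow into `a`, which is the unrealizable path; with one clone per call site it does not.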

4 Summary-Based Analysis
Popular method for context sensitivity
Two kinds:
–Bottom-up
–Top-down
Problems:
–Difficult to summarize pointer analysis
–Summary-based analysis using BDD: not shown to scale [Zhu ’02]

5 Cloning-Based Analysis
Simple brute force technique
–Clone every path through the call graph
–Run context-insensitive algorithm on the expanded call graph
The catch: exponential blowup
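The blowup comes from the fact that a method needs one clone per call path reaching it. A small sketch that counts those paths in an acyclic call graph; the chain of methods `m0` to `m40` is a made-up example and `count_clones` is a hypothetical helper.

```python
from functools import lru_cache

def count_clones(calls, root):
    """Number of call paths from root to each method in an acyclic call graph.
    calls maps caller -> list of callees, one entry per call site."""
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    @lru_cache(maxsize=None)
    def paths(method):
        if method == root:
            return 1
        return sum(paths(c) for c in callers.get(method, []))

    methods = set(calls) | set(callers)
    return {m: paths(m) for m in methods}

# Hypothetical chain: each method calls the next one from two call sites.
depth = 40
calls = {f"m{i}": [f"m{i+1}", f"m{i+1}"] for i in range(depth)}
clones = count_clones(calls, "m0")
print(clones["m40"])
```

Each extra level doubles the count, so 40 levels already require 2^40 clones of the last method.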

6 Cloning is exponential!

7 Recursion
Actually, cloning is unbounded in the presence of recursive cycles
Solution: treat all methods in a strongly connected component as a single node
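Collapsing strongly connected components turns the call graph into a DAG, so the number of paths becomes finite again. A sketch using Tarjan's algorithm (one standard way to find the components; the slides do not say which algorithm is used) on a hypothetical call graph:

```python
def condense(calls):
    """Collapse each strongly connected component into one node.
    calls maps method -> set of callees. Returns (component_of, dag_edges)."""
    index, low, on_stack, stack = {}, {}, set(), []
    component_of = {}

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in calls.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            members = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                members.append(w)
                if w == v:
                    break
            name = frozenset(members)
            for w in members:
                component_of[w] = name

    for v in list(calls):
        if v not in index:
            visit(v)
    dag_edges = {(component_of[v], component_of[w])
                 for v in calls for w in calls[v]
                 if component_of[v] != component_of[w]}
    return component_of, dag_edges

# Hypothetical call graph: f and g are mutually recursive.
calls = {"main": {"f"}, "f": {"g"}, "g": {"f", "h"}, "h": set()}
component_of, dag_edges = condense(calls)
print(component_of["f"], len(dag_edges))
```

Here `f` and `g` end up in one node, and the remaining graph has only two edges and no cycle.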

8 Recursion
[Figure: a call graph over methods A to G, and its expanded version with cloned copies of E, F, and G]

9 Top 20 SourceForge Java Apps

10 Cloning is infeasible (?)
Typical large program has ~10^14 paths
If you need 1 byte to represent a clone, you would require 256 terabytes of storage
–Registered ECC 1GB DIMMs: $98.6 million
  Power: 96.4 kilowatts = Power for 128 homes
–300 GB hard disks: 939 x $250 = $234,750
  Time to read (sequential): 70.8 days
Seems unreasonable!
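The disk figures can be re-derived with a few lines of arithmetic. The unit conventions are assumptions, chosen because they reproduce the slide's 939 disks: "256 terabytes" is read as 256 x 2^40 bytes, a 1GB DIMM as 2^30 bytes, and a 300 GB disk as 300 x 10^9 bytes. The sequential throughput is not stated on the slide; the last value is only what the 70.8 days would imply.

```python
import math

storage_bytes = 256 * 2**40      # assumption: binary terabytes
dimm_bytes = 2**30               # assumption: 1GB DIMM = 2^30 bytes
disk_bytes = 300 * 10**9         # assumption: decimal gigabytes for disks

dimms = storage_bytes // dimm_bytes
disks = math.ceil(storage_bytes / disk_bytes)
disk_cost = disks * 250          # $250 per disk, from the slide

# Throughput implied by reading everything sequentially in 70.8 days.
implied_mb_per_s = storage_bytes / (70.8 * 86400) / 1e6

print(dimms, disks, disk_cost, round(implied_mb_per_s, 1))
```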

11 BDD comes to the rescue
There are many similarities across contexts
–Many copies of nearly-identical results
BDDs can represent large sets of redundant data efficiently
–Need a BDD encoding that exploits the similarities

12 Contribution (1)
Can represent context-sensitive call graph efficiently with BDDs and a clever context numbering scheme
–Inclusion-based pointer analysis contexts, 19 minutes
–Generates all answers

13 Contribution (2)
BDD hacking is complicated → bddbddb (BDD-based deductive database)
Pointer analysis in 6 lines of Datalog
Automatic translation into efficient BDD implementation
10x performance over hand-tuned solver (2164 lines of Java)

14 Contribution (3)
bddbddb: General Datalog solver
–Supports simple declarative queries
–Easy use of context-sensitive pointer results
Simple context-sensitive analyses:
–Escape analysis
–Type refinement
–Side effect analysis
–Many more presented in the paper

15 Call Graph Relation
Call graph expressed as a relation
–Five edges:
  Calls(A,B)
  Calls(A,C)
  Calls(A,D)
  Calls(B,D)
  Calls(C,D)
[Figure: call graph over nodes A, B, C, D]

16 Call Graph Relation
Relation expressed as a binary function.
–A=00, B=01, C=10, D=11
[Figure: truth table of f over the bits x1, x2, x3, x4, next to the call graph over A, B, C, D]
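The truth table can be regenerated from the five edges and the two-bit node codes. A minimal sketch, assuming that x1 x2 encode the caller and x3 x4 encode the callee (the bit assignment is not recoverable from the transcript); `CODE`, `CALLS`, and `f` are illustrative names.

```python
from itertools import product

CODE = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}
CALLS = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")}

# Assumption: x1 x2 encode the caller and x3 x4 encode the callee.
TRUE_ROWS = {CODE[caller] + CODE[callee] for caller, callee in CALLS}

def f(x1, x2, x3, x4):
    """Characteristic function of the Calls relation."""
    return int((x1, x2, x3, x4) in TRUE_ROWS)

truth_table = [(bits, f(*bits)) for bits in product((0, 1), repeat=4)]
for bits, value in truth_table:
    print(*bits, "->", value)
```

Exactly five of the sixteen rows evaluate to 1, one per call edge.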

17 Binary Decision Diagrams
Graphical encoding of a truth table
[Figure: decision tree over x1, x2, x3, x4 with 0 edges and 1 edges]

18–21 Binary Decision Diagrams
Collapse redundant nodes
[Figures: successive steps in which identical nodes are merged, starting from the terminals 0 and 1]

22–23 Binary Decision Diagrams
Eliminate unnecessary nodes
[Figures: remaining diagram with nodes labeled x1, x2, x3, x4 and terminals 0 and 1]

24 Binary Decision Diagrams
Size is correlated to amount of redundancy, NOT size of relation
–As the set gets larger, the number of don’t-care bits increases, leading to fewer necessary nodes
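Both reduction steps, and the claim about size, can be checked with a tiny reduced ordered BDD builder. This is an illustrative sketch, not how JavaBDD or bddbddb build BDDs: it enumerates the whole truth table, which is only workable for a handful of variables. It assumes the variable order x1, x2, x3, x4 with x1 x2 encoding the caller; `build_robdd` and `calls_f` are hypothetical names.

```python
def build_robdd(f, nvars):
    """Return (root, nodes) of the reduced ordered BDD of f.
    Terminals are the ints 0 and 1; internal nodes get ids >= 2."""
    unique = {}   # (var, low, high) -> node id; sharing collapses redundant nodes

    def mk(var, low, high):
        if low == high:           # both edges agree: the node is unnecessary
            return low
        return unique.setdefault((var, low, high), len(unique) + 2)

    def build(var, prefix):
        if var == nvars:
            return int(bool(f(*prefix)))
        return mk(var,
                  build(var + 1, prefix + (0,)),
                  build(var + 1, prefix + (1,)))

    return build(0, ()), unique

# Calls relation with A=00, B=01, C=10, D=11 (caller bits first: an assumption).
ROWS = {(0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 1, 1, 1), (1, 0, 1, 1)}
def calls_f(*bits):
    return bits in ROWS

root, nodes = build_robdd(calls_f, 4)
full_root, full_nodes = build_robdd(lambda *bits: True, 4)   # all 16 pairs
print(len(nodes), len(full_nodes))
```

The five-edge relation needs six internal nodes, while the relation containing all sixteen pairs reduces to the terminal 1 with no internal nodes at all.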

25 Expanded Call Graph
[Figure: call graph over methods A to H with call edges labeled 0, 1, 2, and its expanded version with cloned copies of E, F, G, and H]

26 Numbering Clones
[Figure: the call graph over methods A to H and its expanded version with cloned copies of E, F, G, and H]
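One way to number clones so that similar contexts get nearby numbers is to give every call edge a contiguous range of callee contexts. The sketch below is an assumption about the scheme, since the figure's numbers did not survive in the transcript: contexts start at 0, the call graph is acyclic, and caller context c on edge i becomes callee context c + offsets[i]. The graph and the function name are made up for illustration.

```python
from graphlib import TopologicalSorter

def number_contexts(edges, root):
    """edges: list of (caller, callee), one per call site, acyclic.
    Returns (num_contexts, offsets)."""
    preds = {}
    for caller, callee in edges:
        preds.setdefault(callee, set()).add(caller)
        preds.setdefault(caller, set())
    num_contexts = {}
    offsets = [None] * len(edges)
    # static_order() yields callers before their callees.
    for method in TopologicalSorter(preds).static_order():
        next_free = 0
        for i, (caller, callee) in enumerate(edges):
            if callee == method:
                offsets[i] = next_free            # this edge owns a contiguous range
                next_free += num_contexts[caller]
        num_contexts[method] = 1 if method == root else next_free
    return num_contexts, offsets

# Hypothetical graph: D is called from B and C, and calls E from two call sites.
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E"), ("D", "E")]
num_contexts, offsets = number_contexts(edges, "A")
print(num_contexts, offsets)
```

Because each edge is described by a single offset into a range, the expanded call graph relation consists of regular arithmetic patterns rather than arbitrary pairs.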

27 Pointer Analysis Example
    h1: v1 = new Object();
    h2: v2 = new Object();
    v1.f = v2;
    v3 = v1.f;
Input Relations
  vPointsTo(v1, h1)
  vPointsTo(v2, h2)
  Store(v1, f, v2)
  Load(v1, f, v3)
Output Relations
  hPointsTo(h1, f, h2)
  vPointsTo(v3, h2)
[Figure: points-to graph with v1 pointing to h1, v2 and v3 pointing to h2, and an f edge from h1 to h2]

28 Inference Rule in Datalog
Heap Writes:
    v1.f = v2;
hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2).
[Figure: points-to graph with v1 pointing to h1, v2 pointing to h2, and the derived f edge from h1 to h2]
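The heap-write rule can be evaluated by hand as a join over sets of tuples. The sketch below applies it to the example's input relations until nothing changes. The heap-read rule is not shown on the slides; the version used here is an assumption, chosen because it derives the example's second output, and `solve` is a hypothetical name.

```python
def solve(v_points_to, store, load):
    """Naive fixpoint over the heap-write rule plus an assumed heap-read rule."""
    v_pts = set(v_points_to)
    h_pts = set()
    while True:
        # hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2).
        new_h = {(h1, f, h2)
                 for (v1, f, v2) in store
                 for (va, h1) in v_pts if va == v1
                 for (vb, h2) in v_pts if vb == v2}
        # Assumed rule for heap reads (not on the slide):
        # vPointsTo(v3, h2) :- Load(v1, f, v3), vPointsTo(v1, h1), hPointsTo(h1, f, h2).
        new_v = {(v3, h2)
                 for (v1, f, v3) in load
                 for (va, h1) in v_pts if va == v1
                 for (hb, g, h2) in h_pts if hb == h1 and g == f}
        if new_h <= h_pts and new_v <= v_pts:
            return v_pts, h_pts
        h_pts |= new_h
        v_pts |= new_v

v_pts, h_pts = solve(
    v_points_to={("v1", "h1"), ("v2", "h2")},
    store={("v1", "f", "v2")},
    load={("v1", "f", "v3")},
)
print(sorted(v_pts), sorted(h_pts))
```

The first pass derives hPointsTo(h1, f, h2); the second uses it to derive vPointsTo(v3, h2); the third finds nothing new and stops.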

29 Context-sensitive pointer analysis
Compute call graph with context-insensitive pointer analysis
–Datalog rules for:
  assignments, loads, stores
  discover call targets, bind parameters
  type filtering
–Apply rules until fix-point reached
Compute expanded call graph relation
Apply context-insensitive algorithm to the expanded call graph

30 Datalog
Declarative logic programming language designed for databases
–Horn clauses
–Operates on relations
Datalog is expressive
–Relational algebra: Explicitly specify relational join, project, rename
–Relational calculus: Specify relations between variables; operations are implicit
–Datalog: Allows recursively-defined relations
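A recursively defined relation is what separates Datalog from plain relational algebra. As an illustration, here is a two-rule reachability query evaluated with an explicit join and projection, repeated until no new tuples appear. The `Reach` relation, the helper names, and the extra edge D to E are made up for the example; only the first five edges come from the call graph slide.

```python
def join_project(left, right):
    """Join left(x, y) with right(y, z) on y, then project onto (x, z)."""
    by_first = {}
    for y, z in right:
        by_first.setdefault(y, set()).add(z)
    return {(x, z) for x, y in left for z in by_first.get(y, ())}

def reach(calls):
    # Reach(x, y) :- Calls(x, y).
    # Reach(x, z) :- Reach(x, y), Calls(y, z).
    result = set(calls)
    while True:
        derived = join_project(result, calls) - result
        if not derived:
            return result
        result |= derived

calls = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")}
reachable = reach(calls)
print(sorted(reachable))
```

In relational algebra the loop and the join have to be spelled out as above; the two Datalog rules in the comments express the same query declaratively.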

31 Datalog → BDD
Join, project, rename are directly mapped to built-in BDD operations
Automatically optimizes:
–Rule application order
–Incrementalization
–Variable ordering
–BDD parameter tuning
–Many more…

32 Experimental Results
Top 20 Java projects on SourceForge
–Real programs with 100K+ users each
Using automatic bddbddb solver
–Each analysis only a few lines of code
–Easy to try new algorithms, new queries
Test system:
–Pentium 4 2.2GHz, 1GB RAM
–RedHat Fedora Core 1, JDK 1.4.2_04, javabdd library, Joeq compiler

35 Multi-type variables
A variable is multi-type if it can point to objects of different types
–Measure of analysis precision
–One line in Datalog
Two ways of handling context sensitivity:
–Projected: Merge all contexts together
–Full: Keep each context separate
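The difference between the projected and full measures can be illustrated on context-tagged points-to facts. This is a sketch under assumptions: the tuple layout (context, variable, heap object), the `heap_type` map, and the sample data are all hypothetical, and the slides' one-line Datalog query is not reproduced here.

```python
def multi_type(v_points_to_ctx, heap_type, full):
    """Return the keys that can point to objects of at least two types.
    Keys are (context, variable) pairs if full, else variables with all
    contexts merged (projected)."""
    types = {}
    for ctx, var, heap in v_points_to_ctx:
        key = (ctx, var) if full else var
        types.setdefault(key, set()).add(heap_type[heap])
    return {key for key, ts in types.items() if len(ts) > 1}

# Hypothetical facts: parameter x sees a String in context 0, an Integer in context 1.
facts = {(0, "x", "h_str"), (1, "x", "h_int"), (0, "y", "h_str")}
heap_type = {"h_str": "String", "h_int": "Integer"}

projected = multi_type(facts, heap_type, full=False)
full = multi_type(facts, heap_type, full=True)
print(projected, full)
```

Merging contexts reports `x` as multi-type, while keeping contexts separate shows that each clone of `x` sees a single type.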

37 Conclusion
The first scalable context-sensitive inclusion-based pointer analysis
–Achieves context sensitivity by cloning
bddbddb: Datalog → efficient BDD
Easy to query results, develop new analyses
Very efficient!
–< 19 minutes, < 600 MB on largest benchmark
Complete system is publicly available at: