Pointer Analysis B. Steensgaard: Points-to Analysis in Almost Linear Time. POPL 1996 M. Hind: Pointer analysis: haven't we solved this problem yet? PASTE.

Slides:



Advertisements
Similar presentations
Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.
Advertisements

Precise Interprocedural Analysis using Random Interpretation Sumit Gulwani George Necula UC-Berkeley.
R O O T S Field-Sensitive Points-to-Analysis Eda GÜNGÖR
Techniques for proving programs with pointers A. Tikhomirov.
Interprocedural Analysis. Currently, we only perform data-flow analysis on procedures one at a time. Such analyses are called intraprocedural analyses.
Scalable Points-to Analysis. Rupesh Nasre. Advisor: Prof. R. Govindarajan. Comprehensive Examination. Jun 22, 2009.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Advanced Compiler Techniques LIU Xianhua School of EECS, Peking University Pointer Analysis.
Shape Analysis by Graph Decomposition R. Manevich M. Sagiv Tel Aviv University G. Ramalingam MSR India J. Berdine B. Cook MSR Cambridge.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Programming Languages and Paradigms
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
Analysis of programs with pointers. Simple example What are the dependences in this program? Problem: just looking at variable names will not give you.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
Demand-driven Alias Analysis Implementation Based on Open64 Xiaomi An
Register Allocation CS 671 March 27, CS 671 – Spring Register Allocation - Motivation Consider adding two numbers together: Advantages: Fewer.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Pointer Analysis.
Fast Algorithms For Hierarchical Range Histogram Constructions
Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India.
Parallel Inclusion-based Points-to Analysis Mario Méndez-Lojo Augustine Mathew Keshav Pingali The University of Texas at Austin (USA) 1.
1 Practical Object-sensitive Points-to Analysis for Java Ana Milanova Atanas Rountev Barbara Ryder Rutgers University.
Chapter 9 Subprogram Control Consider program as a tree- –Each parent calls (transfers control to) child –Parent resumes when child completes –Copy rule.
Parameterized Object Sensitivity for Points-to Analysis for Java Presented By: - Anand Bahety Dan Bucatanschi.
The Ant and The Grasshopper Fast and Accurate Pointer Analysis for Millions of Lines of Code Ben Hardekopf and Calvin Lin PLDI 2007 (Best Paper & Best.
Semi-Sparse Flow-Sensitive Pointer Analysis Ben Hardekopf Calvin Lin The University of Texas at Austin POPL ’09 Simplified by Eric Villasenor.
Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.
4/25/08Prof. Hilfinger CS164 Lecture 371 Global Optimization Lecture 37 (From notes by R. Bodik & G. Necula)
Previous finals up on the web page use them as practice problems look at them early.
Run-Time Storage Organization
Range Analysis. Intraprocedural Points-to Analysis Want to compute may-points-to information Lattice:
Intraprocedural Points-to Analysis Flow functions:
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.
Projects. Dataflow analysis Dataflow analysis: what is it? A common framework for expressing algorithms that compute information about a program Why.
Comparison Caller precisionCallee precisionCode bloat Inlining context-insensitive interproc Context sensitive interproc Specialization.
Reps Horwitz and Sagiv 95 (RHS) Another approach to context-sensitive interprocedural analysis Express the problem as a graph reachability query Works.
Schedule Midterm out tomorrow, due by next Monday Final during finals week Project updates next week.
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Names Variables Type Checking Strong Typing Type Compatibility 1.
Fast Points-to Analysis for Languages with Structured Types Michael Jung and Sorin A. Huss Integrated Circuits and Systems Lab. Department of Computer.
Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India.
Static Program Analyses of DSP Software Systems Ramakrishnan Venkitaraman and Gopal Gupta.
Mark Marron 1, Deepak Kapur 2, Manuel Hermenegildo 1 1 Imdea-Software (Spain) 2 University of New Mexico 1.
Mark Marron IMDEA-Software (Madrid, Spain) 1.
Convergence of Model Checking & Program Analysis Philippe Giabbanelli CMPT 894 – Spring 2008.
ESEC/FSE-99 1 Data-Flow Analysis of Program Fragments Atanas Rountev 1 Barbara G. Ryder 1 William Landi 2 1 Department of Computer Science, Rutgers University.
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University Merging Equivalent Contexts for Scalable Heap-cloning-based Points-to.
Using Types to Analyze and Optimize Object-Oriented Programs By: Amer Diwan Presented By: Jess Martin, Noah Wallace, and Will von Rosenberg.
Pointer Analysis Survey. Rupesh Nasre. Aug 24, 2007.
Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India & K. V. Raghavan.
Points-To Analysis in Almost Linear Time Josh Bauman Jason Bartkowiak CSCI 3294 OCTOBER 9, 2001.
Pointer and Escape Analysis for Multithreaded Programs Alexandru Salcianu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.
Pointer Analysis. Rupesh Nasre. Advisor: Prof R Govindarajan. Apr 05, 2008.
Pointer Analysis – Part I CS Pointer Analysis Answers which pointers can point to which memory locations at run-time Central to many program optimization.
Points-to Analysis as a System of Linear Equations Rupesh Nasre. Computer Science and Automation Indian Institute of Science Advisor: Prof. R. Govindarajan.
Procedure Definitions and Semantics Procedures support control abstraction in programming languages. In most programming languages, a procedure is defined.
1PLDI 2000 Off-line Variable Substitution for Scaling Points-to Analysis Atanas (Nasko) Rountev PROLANGS Group Rutgers University Satish Chandra Bell Labs.
Manuel Fahndrich Jakob Rehof Manuvir Das
Names and Attributes Names are a key programming language feature
Dataflow analysis.
Harry Xu University of California, Irvine & Microsoft Research
Pointer Analysis Lecture 2
Topic 17: Memory Analysis
Hongtao Yu Wei Huo ZhaoQing Zhang XiaoBing Feng
Pointer Analysis Lecture 2
Pointer analysis.
UNIT V Run Time Environments.
Pointer Analysis Jeff Da Silva Sept 20, 2004 CARG.
Pointer analysis John Rollinson & Kaiyuan Li
Presentation transcript:

Pointer Analysis B. Steensgaard: Points-to Analysis in Almost Linear Time. POPL 1996 M. Hind: Pointer analysis: haven't we solved this problem yet? PASTE 2001 Presented by Ronnie Barequet

Outline Introduction – What is “Pointer Analysis” – Motivation – Initial approach – Dimension of problem Steensgaard’s Algorithm – Algorithm and its complexity – Pros and Cons Alternatives and additional features: – Andersen’s algorithm – More dimensions – Measures of precision – Hybrid algorithms Conclusions Discussion

Introduction: Pointer Analysis At which locations does a pointer refer to? Example: X = 5 Ptr = &x Ptr2 = ptr X X ptr ptr2

Points-to graph Nodes are program variables Edge (a,b): variable ‘a’ points-to variable ‘b‘ Out-degree of node may be more than one – May point to analysis – Depending on path taken x ptr y

Motivation Compute accesses of portions of code Liveness – Garbage collection – Complier Optimizations – Further static analysis improvement in efficiency and accuracy – Improve scalability of pointer analysis! [Khedker-Mycroft- Rawat 12] Optimization of runtime – Locality of pages – parallelization – profiling

Motivation (cont.) Data Flow: – Call graph construction – Basis for increasingly sensitive analysis Alias Analysis – when are two expression equal? – Want to be able to know this also when code uses multiple dereferences. Typestates – Constant propagation – Use before define [Yahav et al 06] Formal verification

Dimensions Intraprocedural / interprocedural Flow-sensitive / flow-insensitive Context-sensitive / context-insensitive May versus must …

Flow Sensitivity Flow-sensitive analysis: compute state (points-to graph in our case) for each program code\basic block Flow-insensitive analysis: compute least upper bound of states at any time in program execution Persistent data structure are vital for flow sensitive analyses.

Context Sensitivity Is calling context considered? Context-sensitive approach: – treat each function call separately just like real program execution would – problem: what do we do for recursive functions? approximate – Can be exponential Context-insensitive approach: – merge information from all call sites of a particular function – in effect, inter-procedural analysis problem is reduced to intra-procedural analysis problem

Context-insensitive approach x1 = &a y1 = &b swap(&x1,&y1) x2 = &a y2 = &b swap(&x2,&y2) swap (p1, p2) { t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1; }

Context-sensitive approach x1 = &a y1 = &b swap(&x1,&y1) x2 = &a y2 = &b swap(x2,&y2) swap (p1, p2) { t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1; } swap (p1, p2) { t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1; }

Dimensions (cont.) Memory modeling – For global\automatic variables, easy: use a single “node” – For dynamic memory – problem, multiple allocations. suggestions: For each alloc – one node – Context\path sensitivity? One node for each type Create abstract shape of heap (shape analysis)

Abstraction Undecidable problems Abstract problem to decidable overapproximation or underapproximation Sound static analysis overapproximates – Identifies all properties but may report some false alarms that cannot actually occur. – No false negatives. Complete static analysis underapproximates – Any property reported corresponds to actual one, but no guarantee that all actual properties will be reported

Sound Analysis All true properties Complete Analysis We will discuss (mainly) sound analysis

Steensgaard’s Algorithm

Problem Statement Consider may points-to analysis. Flow insensitive – Compute just one graph for the entire program – Consider all statements regardless of control-flow – SSA form makes flow sensitivity less worthwhile Context insensitive (for now) Pointer actions consist of following types: – p = &a (address of, includes allocation statements) – p = q – p = *q – *p = q Want to compute relation pts : P ∪ A → 2 A

Steensgaard’s Algorithm View memory slots as abstract types (roles) in the program. Each pointer can point to one type only – Strong intuition for strongly typed language, such as Java. What about C\x8086? View pointer assignments as constraints on types Features – Also functions are considered memory slots

Formulation as constraints Constraint type AssignmentConstraintMeaning Base (address)a = &b a ⊇ {b}loc(b) ∈ pts(a) Simple (copy)a = b pts(a) = pts(b) Complex (load)a = *b ∀ v ∈ pts(b). pts(a) = pts(v) Complex (store) *a = b ∀ v ∈ pts(a). pts(v) = pts(b)

Steensgaar’d Algorithm - Example z = &A X Y W Z A V B

Steensgaar’d Algorithm - Example x = *y X Y W Z A V B

Steensgaar’d Algorithm - Example x = *y X Y W A ZVZV B ABAB

Original Formulation as Typing rules pts(X) = pts(Y) loc(Y) ∈ pts(X) ∀ v ∈ pts(Y). pts(X) = pts(v) ∀ v ∈ pts(X). pts(v) = pts(Y)

Steensgaard’s Algorithm - Implementation Can be efficiently implemented using Union - Find algorithm – Nearly linear time: O(nα(n)) – α(2 80 ) ≤ 4 Each constraint is processed just once!

Steensgaar’d Algorithm - Advantages Very efficient Interprocedural – each invocation provides a copy constraints on formal parameters Modification for context sensitivity is possible – Due to high efficiency of context insensitive version – Summaries for non-recursive function easy Fast preliminary understanding of where data flows, where the “interesting logic” is.

Steensgaard’s Algorithm – Shortcoming Imprecise analysis Example: City NY City LA NY->building = Empire-State If (query == ‘NY’) {c = &NY} Else {c = &LA} NY LA Empire State C C NY, LA Merge

Alternative Algorithms and Additional Features

A different variation: Andersen’s Algorithm Pointer assignments are subset constraints Constraint type AssignmentConstraintMeaning Base (address)a = &b a ⊇ {b}loc(b) ∈ pts(a) Simple (copy)a = b a ⊇ bpts(a) ⊇ pts(b) Complex (load)a = *b a ⊇ *b ∀ v ∈ pts(b). pts(a) ⊇ pts(v) Complex (store) *a = b *a ⊇ b ∀ v ∈ pts(a). pts(v) ⊇ pts(b)

Andersen’s Algorithm Compute minimal point under constraints – Work queue algorithm. Complexity O(n 3 ). – In practice much better. Improvements – Copy edge cycle detection & contraction [Hardekopf-Lin 07] – Strong updates if out-degree of x in points-to graph is 1, we can do a strong update even for *x := … Representation might require sensitivity to flow. [Lhotak-Chung 11] – Analysis of Linux kernel in under two minutes

Summary of Steensgard, Andersen x := &z ptr := &x y := &w ptr := &y ptr x z yw w ptr x,y z,w Flow-sensitive algorithm Andersen’s algorithm Steensgard’s algorithm

More Practical Issues Imprecise analysis of structs, arrays. – neglecting offsets in analysis make it crude “Field Sensitivity” Easier in Java, harder in low-level languages – What about variable offsets? Arrays, heaps… Consider a nested packet parser “star” procedures cause massive spurious dataflow Imported procedure require stabs

Measures of Precision Average size of aliased sets – careful… depends on memory model. Compare to dynamic (fuzzer) reports Clients’ precision

Hybrid Algorithms D- level flow [Das 00] – Use Andersen’s rules at top d-levels of PTG, Steensgaard’s elsewhere – Assuming not many d-deep dereferences are taken – Idea: Adapt to approach context sensitive algorithm K-limiting points-to set [Shapiro-Horwitz97] – Steensgaard’s algorithm can be thought as restricting the out-degree of the graph produced by Andersen’s algorithm to 1, by merging nodes when that is exceeded Object sensitivity [Milanova et. al 02] – Inspired by context sensitivity – For context, use stack of method-invoking objects

Heuristics Limited area of enhanced precision – [Ryder et. al ’99] Probabilistic pointer analysis – Seems more useful for optimization

Conclusions Pointer analysis is a complex and developed field in static analysis. Highly useful in program analysis and optimization Many dimensions – heap modeling, offset classes, flow sensitivity, context sensitivity –  different clients with different needs – Tradeoffs

Conclusions (cont.) Flow insensitive analysis as a first step is feasible nowadays Challenges in pointer analysis: – Context (Object) sensitivity – Path sensitivity – Modeling heap (Shape analysis)

Questions?

Discussion

References B. Steensgaard: Points-to Analysis in Almost Linear Time. POPL 1996: M. Hind: Pointer analysis: haven't we solved this problem yet? PASTE 2001: U. P. Khedker, A. Mycroft, P. Singh Rawat: Liveness-Based Pointer Analysis. SAS 2012: S. J. Fink, E. Yahav, N. Dor, G. Ramalingam, E. Geay: Effective typestate verification in the presence of aliasing. ACM Trans. Softw. Eng. Methodol. 17(2) (2008) B. Hardekopf, C. Lin: The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code. PLDI 2007: O. Lhoták, K. Chung: Points-to analysis with efficient strong updates. POPL 2011: 3-16 A. Milanova, A. Rountev, B. Ryder: Parameterized object sensitivity for points-to and side-effect analyses for Java. ISSTA 2002: 1-11 M. Das: Unification-based pointer analysis with directional assignments. PLDI 2000: M. Shapiro, S. Horwitz: Fast and Accurate Flow-Insensitive Points-To Analysis. POPL 1997: 1-14 A. Rountev, B. Ryder, W. Landi: Data-Flow Analysis of Program Fragments. ESEC / SIGSOFT FSE 1999: