Context-Sensitive Flow Analysis Using Instantiation Constraints CS 343, Spring 01/02 John Whaley Based on a presentation by Chris Unkel.


Instantiation Constraints
• Flow-insensitive
• Context-sensitive
• Handles higher-order functions (function pointers) smoothly
• “Flow” analysis (“provenance” analysis)
• Inspired by Henglein’s “Type Inference with Polymorphic Recursion” (1993)
• Constraint-based analysis

Constraint-Based Analysis
• A pattern of program analysis
• Program is read to produce an abstract representation: a set of constraints
  - Graph
  - System of equations
• Abstract representation is processed
• Result is examined to tell us about the program

Types
[Figure: the three kinds of type node.]
• Alpha type: α_2
• Pointer type: ptr_3 (pointer) with pointee α_4
• Function type: func_4 (function) with input α_5 and return α_6

Equality Constraint
• Result of an assignment
• Values flow both ways
  - From *x to *y
  - From *y to *x
• Handle with unification

    *x = *y;

[Figure: x: ptr_1 with pointee *x: α_2; y: ptr_3 with pointee *y: α_4.]

Instantiation Constraint
• Used to make connections across procedures
• Values flow one direction only
• Generated by naming functions
• Identified with labels

    int foo(int x) { … }
    …
    foo^1(b);

[Figure: foo: func_4 with input x: α_5 and return α_6; foo^1: func_1 with input b: α_2 and return α_3; an instantiation edge labeled )1 connects foo and foo^1.]

Call Id Twice Example

    int *id(int *x) { return x; }

    void main(void) {
        int *a, *b, *c, *d;
        int e, f;
        b = &e;
        d = &f;
        a = id^1(b);
        c = id^2(d);
    }

Generated Graph
[Figure: nodes id: func, x: α_5, a: ptr, b: ptr, c: ptr, d: ptr, e, f, *a, *c, id^1: func, id^2: func; instantiation edges labeled )1 and )2.]

Processing Rules (1)
[Figure, before and after: an instantiation edge )1 between foo: func_4 and foo^1: func_1 adds a )1 edge and a (1 edge between their components (x: α_5, α_6 and b: α_2, α_3).]

Processing Rules (2)
[Figure, before and after: )1 and (1 edges between the pointer types ptr_1 and ptr_3 are propagated to their pointees α_2 and α_4.]

Processing Rules (3)
[Figure, “Closure rule”: α_1 and α_3 are each connected to α_2 by edges labeled 1; the rule merges them into a single node α_1, α_3.]

Processing Graph (1) to (5)
[Figures: the processing rules are applied step by step to the generated graph, adding edges labeled )1, )2, (1, and (2. From step (3) on, c and d share one node (c, d: ptr).]

Result
[Figure: final graph with nodes id: func, x: α_5, a, b: ptr, c, d: ptr, e, f, id^1: func, id^2: func. a and b share one node, as do c and d.]

Nuts and Bolts
• Build the constraints for a procedure
• Simplify those constraints
• Instantiate (copy) those constraints to the callers
• Polarity: distinguish between function inputs and outputs

Using the Results
• A value in one variable can end up in a second if:
  - There is a path in the graph consisting of 0 or more red/close-paren edges followed by 0 or more green/open-paren edges
  - Path pattern: )* (*
• From a to x: yes
• From a to c: no
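The path test above can be sketched as a search over (node, phase) pairs, where phase 0 still accepts close-paren edges and phase 1 accepts only open-paren edges. This is a minimal sketch, not the deck's implementation: the names (flow_graph, flow_add_edge, flows_to), the fixed table sizes, and the assumption that edges are stored in the direction values flow are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FLOW_MAX_NODES 16
#define FLOW_MAX_EDGES 64

enum flow_kind { CLOSE_PAREN, OPEN_PAREN };

struct flow_edge { int from, to; enum flow_kind kind; };

struct flow_graph {
    struct flow_edge edges[FLOW_MAX_EDGES];
    int edge_count;
};

/* Adds a directed edge; returns false if it does not fit in the tables. */
bool flow_add_edge(struct flow_graph *g, int from, int to, enum flow_kind kind)
{
    if (g->edge_count >= FLOW_MAX_EDGES) return false;
    if (from < 0 || from >= FLOW_MAX_NODES || to < 0 || to >= FLOW_MAX_NODES)
        return false;
    g->edges[g->edge_count++] = (struct flow_edge){ from, to, kind };
    return true;
}

/* True if dst is reachable from src along a path matching )* (* */
bool flows_to(const struct flow_graph *g, int src, int dst)
{
    bool seen[FLOW_MAX_NODES][2];        /* seen[node][phase] */
    int stack[FLOW_MAX_NODES * 2][2];    /* each (node, phase) is pushed at most once */
    int top = 0;

    assert(src >= 0 && src < FLOW_MAX_NODES);
    assert(dst >= 0 && dst < FLOW_MAX_NODES);
    memset(seen, 0, sizeof seen);
    seen[src][0] = true;
    stack[top][0] = src; stack[top][1] = 0; top++;

    while (top > 0) {
        top--;
        int node = stack[top][0], phase = stack[top][1];
        if (node == dst) return true;
        for (int i = 0; i < g->edge_count; i++) {
            const struct flow_edge *e = &g->edges[i];
            if (e->from != node) continue;
            /* after the first open paren, close parens are no longer allowed */
            if (phase == 1 && e->kind == CLOSE_PAREN) continue;
            int next_phase = (e->kind == OPEN_PAREN) ? 1 : phase;
            if (!seen[e->to][next_phase]) {
                seen[e->to][next_phase] = true;
                stack[top][0] = e->to; stack[top][1] = next_phase; top++;
            }
        }
    }
    return false;
}
```

On a hypothetical graph with edges 0 to 1 (close), 1 to 2 (open), and 2 to 3 (close), node 2 is reachable from node 0 but node 3 is not, because the last close-paren edge would come after an open-paren edge.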

An Application
• Format String Vulnerability
  - Some format strings can cause data to be overwritten
  - E.g. “%n%n%n%n%n%n%n%n”
  - Malicious format string can gain control of program
• Problem formulation
  - Values should not flow from an unsafe source to a format string
  - Source: network input in recv
  - Format string: first argument to printf
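The source-to-sink problem can be illustrated with a short C sketch. The function name format_untrusted is hypothetical, and the untrusted string stands in for bytes that would arrive through recv. The unsafe call appears only in a comment; the function shows the safe pattern, in which the format string is a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Unsafe: the untrusted bytes become the format string, so directives
 * such as %n or %x inside them are acted on:
 *     printf(untrusted);
 * Safe: the format string is a constant and the untrusted bytes are
 * only ever treated as data. */
int format_untrusted(char *out, size_t out_size, const char *untrusted)
{
    assert(out != NULL && out_size > 0 && untrusted != NULL);
    return snprintf(out, out_size, "%s", untrusted);
}
```

A flow analysis in the style described here would flag the commented-out call, because a value from the unsafe source reaches the first argument of printf, and would accept the snprintf call, because its first format argument is a literal.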

Inst. Constraints Summary
• Result shows provenance (the source of values)
• Can be used as a pointer analysis
• The underlying problem is actually undecidable in general
  - But the algorithm seems to run quickly in practice
• Context-sensitive
  - Paren-matching through closure rule
• Interesting base analysis to build tools on

The End

Some Other Pointer Work
• Andersen
  - Flow-insensitive, with directional assignments
  - Assignment copies points-to set of rhs to lhs
  - Cycles make it slow; other work to find and collapse cycles
• Das (“one level flow”)
  - One pointer level of directionality
  - Handles the common cases in C well
  - Multiple return values, altering parameters
• Wilson and Lam
  - Flow-sensitive, context-sensitive
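The directional assignment in Andersen's approach can be sketched as a fixpoint over points-to sets. This is a deliberately reduced sketch: it handles only address-of (p = &x) and copy (p = q) statements, omits loads and stores through pointers, and uses hypothetical names (struct andersen, add_address_of, add_copy, solve) and a 32-variable bitmask that are not from the deck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AND_MAX_VARS 32
#define AND_MAX_COPIES 64

struct andersen {
    uint32_t pts[AND_MAX_VARS];     /* pts[v]: bitmask of variables v may point to */
    int copy_lhs[AND_MAX_COPIES];   /* recorded copy statements lhs = rhs */
    int copy_rhs[AND_MAX_COPIES];
    int copy_count;
};

/* p = &x: x enters the points-to set of p. */
void add_address_of(struct andersen *s, int p, int x)
{
    assert(p >= 0 && p < AND_MAX_VARS && x >= 0 && x < AND_MAX_VARS);
    s->pts[p] |= (uint32_t)1 << x;
}

/* lhs = rhs: remember the statement; solve() applies it. */
void add_copy(struct andersen *s, int lhs, int rhs)
{
    assert(s->copy_count < AND_MAX_COPIES);
    assert(lhs >= 0 && lhs < AND_MAX_VARS && rhs >= 0 && rhs < AND_MAX_VARS);
    s->copy_lhs[s->copy_count] = lhs;
    s->copy_rhs[s->copy_count] = rhs;
    s->copy_count++;
}

/* Repeatedly copy pts(rhs) into pts(lhs) until nothing changes. */
void solve(struct andersen *s)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < s->copy_count; i++) {
            uint32_t before = s->pts[s->copy_lhs[i]];
            uint32_t after = before | s->pts[s->copy_rhs[i]];
            if (after != before) {
                s->pts[s->copy_lhs[i]] = after;
                changed = true;
            }
        }
    }
}
```

For a = &x; b = &y; a = b; this yields pts(a) = {x, y} and pts(b) = {y}: the copy flows one way only, whereas a unification-based analysis would give a and b the same set.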

Further Reading
• Manuvir Das. Unification-Based Pointer Analysis with Directional Assignments. Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, Vancouver, BC, Canada, June.
• Manuel Fähndrich, Jakob Rehof, Manuvir Das. Scalable context-sensitive flow analysis using instantiation constraints. Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, Vancouver, BC, Canada, June.
• R. P. Wilson and M. S. Lam. Efficient context-sensitive pointer analysis for C programs. Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, June.
• L. O. Andersen. Program analysis and specialization for the C programming language. Technical Report 94-19, University of Copenhagen, 1994.

Overview
• Steensgaard ’95
  - Flow-insensitive, context-insensitive alias analysis
• Liang and Harrold ’99
  - Straightforward context-sensitive extension of Steensgaard
• Fähndrich, Rehof, Das ’00
  - Context-sensitive extension of Steensgaard that handles function pointers and higher-order functions smoothly

Steensgaard Preliminaries
• “Type inference”
  - Types here do not refer to integer, char, etc.!
  - “Non-standard” or “extended” types
  - Two objects sharing the same type have some property in common: they point to the same things
  - Each variable in the program has a type
  - Type rules describe a consistent typing
• Points-to sets vs. alias pairs
• Begin by ignoring the conditional join stuff

Steensgaard
• Basic operation
  - After an assignment x=y, x and y point to the same set of things (*x and *y are the same)
  - Non-directional: x=y has same effect as y=x
  - Also implies that *x and *y point to the same set of things (and **x and **y, and so on)
  - Alias relation (not points-to relation) is symmetric, reflexive, transitive
• Types can be grouped into equivalence classes of objects that point to the same things
• Assignment joins equivalence classes of pointees together

Steensgaard Example (1) to (10) and Result

    a = &x;
    b = &y;
    if (p) y = &z; else y = &x;
    c = &y;

[Figures: nodes a, b, c and the pointee classes, which start out separate (x, y, z, *a, *b, *c, *y) and are joined one assignment at a time.]
• a = &x: x joins *a, giving {*a, x}
• b = &y: y joins *b, giving {*b, y}
• y = &z: z joins *y, giving {*y, z}
• y = &x: {*a, x} and {*y, z} join, giving {*a, x, *y, z}
• c = &y: *c joins {*b, y}, giving {*b, y, *c}
• Result: a, b, c remain separate; the pointee classes are {*a, x, *y, z} and {*b, y, *c}
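The joins in this example can be reproduced with a small union-find sketch in which every class has at most one pointee class, and joining two classes also joins their pointees. The names (struct steens, st_new, st_find, st_join, st_pointee, st_address_of, st_assign) and the fixed table size are hypothetical, the conditional join is ignored, and the sketch uses path halving without union by rank, so it does not by itself give the near-linear bound.

```c
#include <assert.h>

#define ST_MAX_NODES 64
#define ST_NONE (-1)

struct steens {
    int parent[ST_MAX_NODES];   /* union-find parent links */
    int pointee[ST_MAX_NODES];  /* class a representative points to, or ST_NONE */
    int count;                  /* number of nodes allocated so far */
};

int st_new(struct steens *s)
{
    assert(s->count < ST_MAX_NODES);
    int n = s->count++;
    s->parent[n] = n;
    s->pointee[n] = ST_NONE;
    return n;
}

int st_find(struct steens *s, int n)
{
    while (s->parent[n] != n) {
        s->parent[n] = s->parent[s->parent[n]];  /* path halving */
        n = s->parent[n];
    }
    return n;
}

/* Merge the classes of a and b, then merge what they point to. */
void st_join(struct steens *s, int a, int b)
{
    a = st_find(s, a);
    b = st_find(s, b);
    if (a == b)
        return;
    int pa = s->pointee[a], pb = s->pointee[b];
    s->parent[b] = a;
    if (pa == ST_NONE)
        s->pointee[a] = pb;
    else if (pb != ST_NONE)
        st_join(s, pa, pb);
}

/* The class that n points to, created on demand. */
int st_pointee(struct steens *s, int n)
{
    n = st_find(s, n);
    if (s->pointee[n] == ST_NONE) {
        int fresh = st_new(s);
        s->pointee[n] = fresh;
    }
    return s->pointee[n];
}

/* p = &x: x joins the class that p points to. */
void st_address_of(struct steens *s, int p, int x)
{
    st_join(s, st_pointee(s, p), x);
}

/* x = y: the pointees of x and y are joined (non-directional). */
void st_assign(struct steens *s, int x, int y)
{
    st_join(s, st_pointee(s, x), st_pointee(s, y));
}
```

Feeding it a = &x; b = &y; y = &z; y = &x; c = &y; in that order leaves x and z in one class, y in another, and b and c pointing at the class of y, which matches the result slide.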

• Observation: forcing pointees to join is sometimes too strong an action
  - Assignment is really directional
  - After x=y, x points to everything y points to
  - Using a directional assignment prevents using equivalence classes/union-find
  - But if y’s points-to set is null, it is OK to do nothing
  - When we see x=y for y not a pointer (bottom type in this notation), don’t join immediately, but record the fact that if y is later found to point to something, we must join it with x
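One way to picture this conditional join is a union-find in which each class keeps a list of assignments that are waiting for it to acquire a pointee. The sketch below is a simplified model under stated assumptions: variables are nodes, CJ_NONE plays the role of the bottom type, and all names (struct cjoin, cj_new, cj_find, cj_release, cj_point_to, cj_join, cj_assign, cj_address_of) and table sizes are hypothetical rather than Steensgaard's own formulation.

```c
#include <assert.h>

#define CJ_MAX_NODES 32
#define CJ_MAX_PENDING 8
#define CJ_NONE (-1)

struct cjoin {
    int parent[CJ_MAX_NODES];
    int pointee[CJ_MAX_NODES];                  /* CJ_NONE: points to nothing so far */
    int pending[CJ_MAX_NODES][CJ_MAX_PENDING];  /* nodes waiting for this class to get a pointee */
    int pending_count[CJ_MAX_NODES];
    int count;
};

void cj_join(struct cjoin *s, int a, int b);
void cj_point_to(struct cjoin *s, int n, int target);

int cj_new(struct cjoin *s)
{
    assert(s->count < CJ_MAX_NODES);
    int n = s->count++;
    s->parent[n] = n;
    s->pointee[n] = CJ_NONE;
    s->pending_count[n] = 0;
    return n;
}

int cj_find(struct cjoin *s, int n)
{
    while (s->parent[n] != n)
        n = s->parent[n];
    return n;
}

/* Class n has a pointee: perform the joins that were postponed on it. */
void cj_release(struct cjoin *s, int n)
{
    while (s->pending_count[n] > 0) {
        int waiting = s->pending[n][--s->pending_count[n]];
        cj_point_to(s, waiting, s->pointee[n]);
    }
}

/* Make n point to target, joining with an existing pointee if there is one. */
void cj_point_to(struct cjoin *s, int n, int target)
{
    n = cj_find(s, n);
    if (s->pointee[n] != CJ_NONE) {
        cj_join(s, s->pointee[n], target);
    } else {
        s->pointee[n] = target;
        cj_release(s, n);
    }
}

void cj_join(struct cjoin *s, int a, int b)
{
    a = cj_find(s, a);
    b = cj_find(s, b);
    if (a == b)
        return;
    s->parent[b] = a;
    while (s->pending_count[b] > 0) {   /* b's postponed joins now wait on a */
        assert(s->pending_count[a] < CJ_MAX_PENDING);
        s->pending[a][s->pending_count[a]++] = s->pending[b][--s->pending_count[b]];
    }
    if (s->pointee[b] != CJ_NONE)
        cj_point_to(s, a, s->pointee[b]);
    a = cj_find(s, a);
    if (s->pointee[a] != CJ_NONE)
        cj_release(s, a);
}

/* x = y: join now if y already points somewhere, otherwise postpone. */
void cj_assign(struct cjoin *s, int x, int y)
{
    x = cj_find(s, x);
    y = cj_find(s, y);
    if (s->pointee[y] == CJ_NONE) {
        assert(s->pending_count[y] < CJ_MAX_PENDING);
        s->pending[y][s->pending_count[y]++] = x;
    } else {
        cj_point_to(s, x, s->pointee[y]);
    }
}

/* p = &x */
void cj_address_of(struct cjoin *s, int p, int x)
{
    cj_point_to(s, p, x);
}
```

With p = q; p = &i; the variable q still points to nothing, so nothing is merged on its behalf. Only a later q = &j; fires the postponed join and puts i and j in one class.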

A Typing Rule
GIVEN
    x : ref(a)          x has type pointer to type a
    y : ref(b)          y has type pointer to type b
    b ⊴ a               b is not a pointer type, or a and b are the same type
we conclude that
    welltyped(x = y)    our types are well-typed for statement x = y (we have a consistent points-to graph)
Reading downward verifies consistency; upward gives constraints.
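The same rule can be written in inference-rule notation. This sketch assumes the relation symbol between b and a is ⊴ (\trianglelefteq, from the amssymb package), defined exactly as the slide's gloss says: b is the bottom (non-pointer) type, or a and b are the same type. The \text command needs amsmath.

```latex
\[
\frac{x : \mathrm{ref}(a) \qquad y : \mathrm{ref}(b) \qquad b \trianglelefteq a}
     {\mathit{welltyped}(x = y)}
\qquad\text{where}\qquad
b \trianglelefteq a \iff (b = \bot) \lor (b = a)
\]
```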

Steensgaard Summary
• Fast union-find allows solution in “near-linear” time (linear times the inverse Ackermann function)
• Does not handle structs (but see later paper by same author)
• Flow-insensitive
• Context-insensitive
• Non-directional assignments

Liang & Harrold Operation
• Do Steensgaard within each procedure to build a summary
• Then do bottom-up, top-down propagation of results
  - Bottom-up: aliases from callees to callers
  - Top-down: from callers to callees
• “FICS” = flow-insensitive context-sensitive

L & H Example (1)

    int *id(int *x) { return x; }

    int e, f;

    void main(void) {
        int *a, *b, *c, *d;
        b = &e;
        d = &f;
        a = id(b);
        c = id(d);
    }

[Figure: the two procedures, main and id.]

L & H Example (2) (Phase 1)
[Figure: nodes a, b, c, d, e, f for main; x, idret, and the class {*x, *idret} for id.]
This is the result of applying Steensgaard to each procedure individually; the result is a summary of the pointer behavior of each function.

L & H Example (3) (Phase 2)
[Figure: first call site, showing the bindings and the induced edge between main and id.]
Propagate pointer information from callees to callers (apply summaries of called functions). Bind formals to actuals, and returns to where they are assigned. (Bottom-up phase.)

L & H Example (4) (Phase 2)
[Figure: second call site.]

L & H Example (5) (Phase 3)
[Figure: first call site; the class in id becomes {*x, *idret, e}.]
Propagate pointer information from callers to callees. (Top-down phase.)

L & H Example (6) (Phase 3)
[Figure: second call site; the class in id becomes {*x, *idret, e, f}.]

L & H Summary
• Cycles in call graph
  - In the bottom-up/top-down (BU/TD) phases, iterate among procedures in each SCC to find a fixpoint
• Presumes call graph pre-exists
  - With function pointers, need pointer analysis to provide call graph!
  - Algorithm as expressed doesn’t handle function pointers