Analysis of programs with pointers. Simple example What are the dependences in this program? Problem: just looking at variable names will not give you.

Slides:



Advertisements
Similar presentations
Linked Lists Linked Lists Representation Traversing a Linked List
Advertisements

Data Structure Lecture-3 Prepared by: Shipra Shukla Assistant Professor Kaziranga University.
 Pointers, Arrays, Destructors Is this random stuff or are these somehow connected?
An introduction to pointers in c
Stack & Queues COP 3502.
Dynamic Memory Allocation in C.  What is Memory What is Memory  Memory Allocation in C Memory Allocation in C  Difference b\w static memory allocation.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Carnegie Mellon 1 Dynamic Memory Allocation: Basic Concepts : Introduction to Computer Systems 17 th Lecture, Oct. 21, 2010 Instructors: Randy Bryant.
Chapter 6 Path Testing Software Testing
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Program Memory MIPS memory operations Getting Variable Addresses Advanced structures.
Chapter 9 Pointers and Dynamic Arrays. Overview 9.1 Pointers 9.2 Dynamic Arrays.
Data Structures (Second Part) Lecture 2 : Pointers Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University.
© Janice Regan, CMPT 128, February CMPT 128: Introduction to Computing Science for Engineering Students Pointers.
3-Valued Logic Analyzer (TVP) Tal Lev-Ami and Mooly Sagiv.
Control Flow Analysis. Construct representations for the structure of flow-of-control of programs Control flow graphs represent the structure of flow-of-control.
1 Introduction to Data Flow Analysis. 2 Data Flow Analysis Construct representations for the structure of flow-of-data of programs based on the structure.
The University of Adelaide, School of Computer Science
Informática II Prof. Dr. Gustavo Patiño MJ
1 CS 201 Compiler Construction Lecture 12 Global Register Allocation.
1 1 Lecture 4 Structure – Array, Records and Alignment Memory- How to allocate memory to speed up operation Structure – Array, Records and Alignment Memory-
Pointers and Dynamic Variables. Objectives on completion of this topic, students should be able to: Correctly allocate data dynamically * Use the new.
Insertion into a B+ Tree Null Tree Ptr Data Pointer * Tree Node Ptr After Adding 8 and then 5… 85 Insert 1 : causes overflow – add a new level * 5 * 158.
Introduction to C Programming CE Lecture 19 Linear Linked Lists.
Stacks CS-240 & CS-341 Dick Steflik. Stacks Last In, First Out operation - LIFO As items are added they are chronologically ordered, items are removed.
Pointers Applications
1 Procedural Concept The main program coordinates calls to procedures and hands over appropriate data as parameters.
Self Referential Structure. A structure may not contain a member of its own type. struct check { int item; struct check n; // Invalid };
The Data Element. 2 Data type: A description of the set of values and the basic set of operations that can be applied to values of the type. Strong typing:
Comp 245 Data Structures Linked Lists. An Array Based List Usually is statically allocated; may not use memory efficiently Direct access to data; faster.
Stack and Heap Memory Stack resident variables include:
Computer Science Department Data Structure & Algorithms Lecture 8 Recursion.
Pointers OVERVIEW.
Chapter 1 Object Oriented Programming. OOP revolves around the concept of an objects. Objects are created using the class definition. Programming techniques.
Lists II. List ADT When using an array-based implementation of the List ADT we encounter two problems; 1. Overflow 2. Wasted Space These limitations are.
Pointers in C++. 7a-2 Pointers "pointer" is a basic type like int or double value of a pointer variable contains the location, or address in memory, of.
Introduction to C Programming Lecture 6. Functions – Call by value – Call by reference Arrays Today's Lecture Includes.
Review 1 Queue Operations on Queues A Dequeue Operation An Enqueue Operation Array Implementation Link list Implementation Examples.
CSCI 115 Chapter 8 Topics in Graph Theory. CSCI 115 §8.1 Graphs.
Chapter 16 – Data Structures and Recursion. Data Structures u Built-in –Array –struct u User developed –linked list –stack –queue –tree Lesson 16.1.
11 PART 2 ARRAYS. 22 PROCESSING ARRAY ELEMENTS Reassigning Array Reference Variables The third statement in the segment below copies the address stored.
Circular linked list A circular linked list is a linear linked list accept that last element points to the first element.
Linked list: a list of items (nodes), in which the order of the nodes is determined by the address, called the link, stored in each node C++ Programming:
Data Structure & Algorithms
 Data Type is a basic classification which identifies different types of data.  Data Types helps in: › Determining the possible values of a variable.
1 CSC103: Introduction to Computer and Programming Lecture No 17.
White-Box Testing Statement coverage Branch coverage Path coverage
CSC 143 P 1 CSC 143 Recursion [Chapter 5]. CSC 143 P 2 Recursion  A recursive definition is one which is defined in terms of itself  Example:  Compound.
© Oxford University Press All rights reserved. Data Structures Using C, 2e Reema Thareja.
Pointers as arrays C++ Programming Technologies. Pointers vs. Arrays Pointers and arrays are strongly related. In fact, pointers and arrays are interchangeable.
Recap Resizing the Vector Push_back function Parameters passing Mechanism Primitive Arrays of Constants Multidimensional Arrays The Standard Library string.
CS 412/413 Spring 2005Introduction to Compilers1 CS412/CS413 Introduction to Compilers Tim Teitelbaum Lecture 30: Loop Optimizations and Pointer Analysis.
Lectures linked lists Chapter 6 of textbook
Graph-Based Operational Semantics
Week 7 Part 2 Kyle Dewey.
Yung-Hsiang Lu Purdue University
Here is the graph of a function
Dynamic Memory Allocation
Program Slicing Baishakhi Ray University of Virginia
Tree A tree is a data structure in which each node is comprised of some data as well as node pointers to child nodes
Pointers & Objects.
C Programming Lecture-8 Pointers and Memory Management
Data Structures and Algorithms Introduction to Pointers
C Programming Pointers
Dynamic Memory And Objects
Variables and Computer Memory
Linked Lists.
Lecture 4: Instruction Set Design/Pipelining
Arrays and Pointers.
CS 201 Compiler Construction
Presentation transcript:

Analysis of programs with pointers

Simple example What are the dependences in this program? Problem: just looking at variable names will not give you the correct information –After statement S2, program names “x” and “*ptr” are both expressions that refer to the same memory location. –We say that ptr points-to x after statement S2. In a C-like language that has pointers, we must know the points-to relation to be able to determine dependences correctly x := 5 ptr *ptr := 9 y := x S1 S2 S3 S4 dependencesprogram

Program model For now, only types are int and int* No heap –All pointers point to only to stack variables No procedure or function calls Statements involving pointer variables: –address: x := &y –copy: x := y –load: x := *y –store: *x := y Arbitrary computations involving ints

Points-to relation Directed graph: –nodes are program variables –edge (a,b): variable a points-to variable b Can use a special node to represent NULL Points-to relation is different at different program points x ptr y

Out-degree of node may be more than one –if points-to graph has edges (a,b) and (a,c), it means that variable a may point to either b or c –depending on how we got to that point, one or the other will be true –path-sensitive analyses: track how you got to a program point (we will not do this) Points-to graph if (p) then x := &y else x := &z ….. p x := &y x := &z What does x point to here?

Ordering on points-to relation Subset ordering: for a given set of variables –Least element is graph with no edges –G1 <= G2 if G2 has all the edges G1 has and maybe some more Given two points-to relations G1 and G2 –G1 U G2: least graph that contains all the edges in G1 and in G2

Overview We will look at three different points-to analyses. Flow-sensitive points-to analysis –Dataflow analysis –Computes a different points-to relation at each point in program Flow-insensitive points-to analysis –Computes a single points-to graph for entire program –Andersen’s algorithm Natural simplification of flow-sensitive algorithm –Steensgard’s algorithm Nodes in tree are equivalence classes of variables –if x may point-to either y or z, put y and z in the same equivalence class Points-to relation is a tree with edges from children to parents rather than a general graph Less precise than Andersen’s algorithm but faster

Example x := &z ptr y ptr ptr x z yw w ptr x,y z,w Flow-sensitive algorithm Andersen’s algorithm Steensgard’s algorithm

Notation Suppose S and S1 are set-valued variables. S  S1: strong update –set assignment S U  S1: weak update –set union: this is like S  S U S1

Flow-sensitive algorithm

Dataflow equations Forward flow, any path analysis Confluence operator: G1 U G2 Statements x := &y G G’ = G with pt’(x)  {y} x := y G G’ = G with pt’(x)  pt(y) x := *y G G’ = G with pt’(x)  U pt(a) for all a in pt(y) *x := y G G’ = G with pt’(a) U  pt(y) for all a in pt(x)

Dataflow equations (contd.) x := &y G G’ = G with pt’(x)  {y} x := y G G’ = G with pt’(x)  pt(y) x := *y G G’ = G with pt’(x)  U pt(a) for all a in pt(y) *x := y G G’ = G with pt’(a) U  pt(y) for all a in pt(x) strong updates weak update (why?)

Strong vs. weak updates Strong update: –At assignment statement, you know precisely which variable is being written to –Example: x := …. –You can remove points-to information about x coming into the statement in the dataflow analysis. Weak update: –You do not know precisely which variable is being updated; only that it is one among some set of variables. –Example: *x := … –Problem: at analysis time, you may not know which variable x points to (see slide on control-flow and out-degree of nodes) –Refinement: if out-degree of x in points-to graph is 1 and x is known not be nil, we can do a strong update even for *x := …

Structures Structure types –struct cell {int value; struct cell *left, *right;} –struct cell x,y; Use a “field-sensitive” model –x and y are nodes –each node has three internal fields labeled value, left, right This representation permits pointers into fields of structures –If this is not necessary, we can simply have a node for each structure and label outgoing edges with field name

Example int main(void) { struct cell {int value; struct cell *next; }; struct cell x,y,z,*p; int sum; x.value = 5; x.next = &y; y.value = 6; y.next = &z; z.value = 7; z.next = NULL; p = &x; sum = 0; while (p != NULL) { sum = sum + (*p).value; p = (*p).next; } return sum; } x y z p next value next value next value x y z p next value next value next value NULL

Flow-insensitive algorithms

Flow-insensitive analysis Flow-sensitive analysis computes a different graph at each program point. This can be quite expensive. One alternative: flow-insensitive analysis –Intuition:compute a points-to relation which is the least upper bound of all the points-to relations computed by the flow- sensitive analysis Approach: –Ignore control-flow –Consider all assignment statements together replace strong updates in dataflow equations with weak updates –Compute a single points-to relation that holds regardless of the order in which assignment statements are actually executed

Andersen’s algorithm Statements x := &y G G = G with pt(x) U  {y} x := y G G = G with pt(x) U  pt(y) x := *y G G = G with pt(x) U  pt(a) for all a in pt(y) *x := y G G = G with pt(a) U  pt(y) for all a in pt(x) weak updates only

Example int main(void) { struct cell {int value; struct cell *next; }; struct cell x,y,z,*p; int sum; x.value = 5; x.next = &y; y.value = 6; y.next = &z; z.value = 7; z.next = NULL; p = &x; sum = 0; while (p != NULL) { sum = sum + (*p).value; p = (*p).next; } return sum; } x.next = &y; y.next = &z; z.next = NULL; p = &x; p = (*p).next; Assignments for flow-insensitive analysis G......

Solution to flow-insensitive equations x y z p next value next value next value NULL - Compare with points-to graphs for flow-sensitive solution - Why does p point-to NULL in this graph?

Andersen’s algorithm formulated using set constraints Statements x := &y x := y x := *y *x := y

Steensgard’s algorithm Flow-insensitive Computes a points-to graph in which there is no fan-out –In points-to graph produced by Andersen’s algorithm, if x points-to y and z, y and z are collapsed into an equivalence class –Less accurate than Andersen’s but faster We can exploit this to design an O(N*  (N)) algorithm, where N is the number of statements in the program.

Steensgard’s algorithm using set constraints Statements x := &y x := y x := *y *x := y No fan-out

Trick for one-pass processing Consider the following equations When first equation on left is processed, x and y are not pointing to anything. Once second equation is processed, we need to go back and reprocess first equation. Trick to avoid doing this: when processing first equation, if x and y are not pointing to anything, create a dummy node and make x and y point to that –this is like solving the system on the right It is easy to show that this avoids the need for revisiting equations.

Algorithm Can be implemented in single pass through program Algorithm uses union-find to maintain equivalence classes (sets) of nodes Points-to relation is implemented as a pointer from a variable to a representative of a set Basic operations for union find: –rep(v): find the node that is the representative of the set that v is in –union(v1,v2): create a set containing elements in sets containing v1 and v2, and return representative of that set

Auxiliary methods rec_union(var v1, var v2) { p1 = pt(rep(v1)); p2 = pt(rep(v2)); t1 = union(rep(v1), rep(v2)); if (p1 == p2) return; else if (p1 != null && p2 != null) t2 = rec_union(p1, p2); else if (p1 != null) t2 = p1; else if (p2 != null) t2 = p2; else t2 = null; t1.set_pt(t2); return t1; } pt(var v) { //v does not have to be representative t = rep(v); return t.get_pt(); //always returns a representative element } class var { //instance variables points_to: var; name: string; //constructor; also creates singleton set in union-find data structure var(string); //class method; also creates singleton set in union-find data structure make-dummy-var():var; //instance methods get_pt(): var; set_pt(var);//updates rep }

Algorithm Initialization: make each program variable into an object of type var and enter object into union-find data structure for each statement S in the program do S is x := &y: {if (pt(x) == null) x.set-pt(rep(y)); else rec-union(pt(x),y); } S is x := y: {if (pt(x) == null and pt(y) == null) x.set-pt(var.make-dummy-var()); y.set-pt(rec-union(pt(x),pt(y))); } S is x := *y:{if (pt(y) == null) y.set-pt(var.make-dummy-var()); var a := pt(y); if(pt(a) == null) a.set-pt(var.make-dummy-var()); x.set-pt(rec-union(pt(x),pt(a))); } S is *x := y:{if (pt(x) == null) x.set-pt(var.make-dummy-var()); var a := pt(x); if(pt(a) == null) a.set-pt(var.make-dummy-var()); y.set-pt(rec-union(pt(y),pt(a))); }

Inter-procedural analysis What do we do if there are function calls? x1 = &a y1 = &b swap(x1, y1) x2 = &a y2 = &b swap(x2, y2) swap (p1, p2) { t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1; }

Two approaches Context-sensitive approach: –treat each function call separately just like real program execution would –problem: what do we do for recursive functions? need to approximate Context-insensitive approach: –merge information from all call sites of a particular function –in effect, inter-procedural analysis problem is reduced to intra-procedural analysis problem Context-sensitive approach is obviously more accurate but also more expensive to compute

Context-insensitive approach x1 = &a y1 = &b swap(x1, y1) x2 = &a y2 = &b swap(x2, y2) swap (p1, p2) { t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1; }

Context-sensitive approach x1 = &a y1 = &b swap(x1, y1) x2 = &a y2 = &b swap(x2, y2) swap (p1, p2) { t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1; } swap (p1, p2) { t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1; }

Context-insensitive/Flow- insensitive Analysis For now, assume we do not have function parameters –this means we know all the call sites for a given function Set up equations for binding of actual and formal parameters at each call site for that function –use same variables for formal parameters for all call sites Intuition: each invocation provides a new set of constraints to formal parameters

Swap example x1 = &a y1 = &b p1 = x1 p2 = y1 x2 = &a y2 = &b p1 = x2 p2 = y2 t1 = *p1; t2 = *p2; *p1 = t2; *p2 = t1;

Heap allocation Simplest solution: –use one node in points-to graph to represent all heap cells More elaborate solution: –use a different node for each malloc site in the program Even more elaborate solution: shape analysis –goal: summarize potentially infinite data structures –but keep around enough information so we can disambiguate pointers from stack into the heap, if possible

Summary Less preciseMore precise Equality-basedSubset-based Flow-insensitiveFlow-sensitive Context-insensitiveContext-sensitive No consensus about which technique to use Experience: if you are context-insensitive, you might as well be flow-insensitive

History of points-to analysis from Ryder and Rayside