Points-to Analysis as a System of Linear Equations Rupesh Nasre. Computer Science and Automation Indian Institute of Science Advisor: Prof. R. Govindarajan.


Points-to Analysis as a System of Linear Equations Rupesh Nasre. Computer Science and Automation Indian Institute of Science Advisor: Prof. R. Govindarajan Feb 22, 2010

What is Pointer Analysis?
    a = &x;
    b = a;
    if (b == *p) { … } else { … }
Is this condition always satisfied?
Pointer analysis is a mechanism to statically find out the run-time values of a pointer.
Here, a and b are aliases, and a points to x.

Why Pointer Analysis?
For parallelization: fun(p) || fun(q);
For optimization: a = p + 2; b = q + 2;
For bug-finding.
For program understanding.
...
These are the clients of pointer analysis.

Placement of Pointer Analysis.
Pointer analysis feeds the following clients: parallelizing compiler, string vulnerability finder, program slicer, data flow analyzer, lock synchronizer, affine expression analyzer, memory leak detector, type analyzer.
Resulting benefits: improved runtime, secure code, better debugging, better compile time.

Normalized Input.
p = &q   (address-of)
p = q    (copy)
p = *q   (load)
*p = q   (store)
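The four normalized statement forms can be captured in a small data structure. The following is a minimal sketch, not the authors' implementation; the names `Kind`, `Constraint`, and `parse` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ADDRESS_OF = "address-of"  # p = &q
    COPY = "copy"              # p = q
    LOAD = "load"              # p = *q
    STORE = "store"            # *p = q


@dataclass(frozen=True)
class Constraint:
    kind: Kind
    lhs: str   # variable on the left, without any leading '*'
    rhs: str   # variable on the right, without any leading '&' or '*'


def parse(stmt: str) -> Constraint:
    """Classify one normalized statement such as 'p = &q' or '*p = q'."""
    lhs, rhs = (side.strip() for side in stmt.rstrip(";").split("="))
    if lhs.startswith("*"):
        return Constraint(Kind.STORE, lhs[1:], rhs)
    if rhs.startswith("&"):
        return Constraint(Kind.ADDRESS_OF, lhs, rhs[1:])
    if rhs.startswith("*"):
        return Constraint(Kind.LOAD, lhs, rhs[1:])
    return Constraint(Kind.COPY, lhs, rhs)
```

For example, `parse("b = *p;")` yields a load constraint with `lhs="b"` and `rhs="p"`.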

Why as a Linear System?
Scalability: code sizes are going into the billions.
Scalability: analyses trade off at least one of (i) memory requirement, (ii) analysis time, (iii) precision.
Scalability: linear algebra is a mature topic.

Outline. Introduction.  First-cut approach. Prime-factorization approach. Evaluation.

First-cut Approach: Transformations.
p = &q   becomes   p = q - 1
p = q    becomes   p = q
p = *q   becomes   p = q + 1
*p = q   becomes   p + 1 = q
Each address-taken variable (&v) is assigned a unique value.

First-cut Approach.
Program: a = &x; p = &a; b = *p; c = b;
Transform: a = x - 1, p = a - 1, b = p + 1, c = b.
Solve: x = r, a = r - 1, b = r - 1, c = r - 1, p = r - 2.
Expected result: a, b, c point to x. p points to a.
Result read off the solution: a, b, c point to x. p points to a. Because b and c have the same value as a, the solution also says that p points to b and p points to c.
Imprecise analysis.
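The first-cut example can be replayed in a few lines. This is a minimal sketch under stated assumptions: the helper names `linearize`, `solve`, and `interpret` are hypothetical, each equation is stored as a triple `(lhs, rhs, k)` meaning lhs = rhs + k, and solving is done by simple propagation relative to a free value r rather than with a real linear solver.

```python
def linearize(statements):
    """Turn normalized statements into triples (lhs, rhs, k) meaning lhs = rhs + k."""
    equations = []
    for stmt in statements:
        lhs, rhs = (side.strip() for side in stmt.rstrip(";").split("="))
        if lhs.startswith("*"):        # *p = q  ->  p + 1 = q, i.e. p = q - 1
            equations.append((lhs[1:], rhs, -1))
        elif rhs.startswith("&"):      # p = &q  ->  p = q - 1
            equations.append((lhs, rhs[1:], -1))
        elif rhs.startswith("*"):      # p = *q  ->  p = q + 1
            equations.append((lhs, rhs[1:], +1))
        else:                          # p = q   ->  p = q
            equations.append((lhs, rhs, 0))
    return equations


def solve(equations, root):
    """Express every variable as r + offset, where r is the free value of `root`."""
    offset = {root: 0}
    changed = True
    while changed:
        changed = False
        for lhs, rhs, k in equations:
            if lhs in offset and rhs in offset:
                if offset[lhs] != offset[rhs] + k:
                    raise ValueError(f"no solution: {lhs} = {rhs} {k:+d}")
            elif rhs in offset:
                offset[lhs] = offset[rhs] + k
                changed = True
            elif lhs in offset:
                offset[rhs] = offset[lhs] - k
                changed = True
    return offset


def interpret(offset):
    """Read v points to w whenever value(v) == value(w) - 1."""
    return {v: sorted(w for w in offset if offset[w] == offset[v] + 1)
            for v in offset}


equations = linearize(["a = &x;", "p = &a;", "b = *p;", "c = b;"])
offsets = solve(equations, root="x")
points_to = interpret(offsets)
print(points_to["p"])  # ['a', 'b', 'c']
```

The spurious targets b and c for p appear because a, b, and c all receive the value r - 1, so the solution cannot tell them apart.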

Issues with First-cut Approach.
Dereferencing: a = &x versus *a = x. The statement a = &x transforms to a = x - 1 and *a = x transforms to a + 1 = x. Semantically different, mathematically the same.
Multiple assignments: a = &x; a = &y; transforms to a = x - 1; a = y - 1. Since x and y are assigned distinct values, there is no solution.
Cyclic assignments: a = &a; transforms to a = a - 1. No solution.
Symmetry of assignment: as an equation, a = b implies b = a, whereas the assignment a = b is directional.
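The multiple-assignment and cyclic-assignment issues can be reproduced with a small consistency check. This is an illustrative sketch only: `find_conflict` is a hypothetical helper, equations are triples `(lhs, rhs, k)` meaning lhs = rhs + k, and the concrete values 10 and 20 for x and y are arbitrary distinct values assumed for the demonstration.

```python
def find_conflict(equations, address_values):
    """Return a description of the first inconsistency, or None if the
    equations can all hold. Each right-hand side must already have a value."""
    values = dict(address_values)
    for lhs, rhs, k in equations:
        if lhs == rhs:
            if k != 0:                       # e.g. a = a - 1
                return f"{lhs} = {lhs} {k:+d} has no solution"
            continue
        candidate = values[rhs] + k
        # setdefault stores the candidate if lhs is new, else returns the old value
        if values.setdefault(lhs, candidate) != candidate:
            return f"{lhs} = {values[lhs]} conflicts with {lhs} = {candidate}"
    return None


# a = &x; a = &y;  ->  a = x - 1; a = y - 1 with distinct x and y
print(find_conflict([("a", "x", -1), ("a", "y", -1)], {"x": 10, "y": 20}))
# a = &a;  ->  a = a - 1
print(find_conflict([("a", "a", -1)], {}))
```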

Outline. Introduction. First-cut approach.  Prime-factorization approach. Evaluation.

Important Ideas. Address of a variable as a prime number. Points-to set as a product of primes. Variable renaming to avoid inconsistency.
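The encoding idea can be sketched as follows. The prime assignment in `ADDRESS_PRIME` and the helper names `encode` and `decode` are assumptions made for illustration; the talk does not prescribe these names.

```python
from math import prod

# Hypothetical assignment of one unique prime per address-taken variable.
ADDRESS_PRIME = {"x": 7, "y": 11, "z": 13}


def encode(targets):
    """Represent a points-to set as the product of its targets' primes."""
    return prod(ADDRESS_PRIME[t] for t in targets)


def decode(value):
    """Recover the points-to set: every variable whose prime divides the value."""
    return {var for var, prime in ADDRESS_PRIME.items() if value % prime == 0}


print(encode({"x", "y"}))  # 77
print(decode(77) == {"x", "y"})  # True
```

An empty points-to set encodes to 1, the empty product.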

Prime-factorization Approach: Transformations.
p = &q   becomes   p = p_i * prime(&q)
p = q    becomes   p = p_i * q
p = *q   becomes   p = p_i * (q + 1)
*p = q   is handled separately
Here p_i is the renamed variable introduced for p. Each address-taken variable (&v) is assigned a unique prime number.

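Points-to Information Lattice.
[Figure: lattice of prime products, with 1 at the bottom, products such as 3*5*7, 3*5*11, 3*7*11, 5*7*11 in the middle, and 3*5*7*11*… at the top; an arrow is labeled "Precision increases".]
We start with larger primes to avoid the composition gap problem.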

Algorithm Outline.
do {
    equations = Linearize(constraints);
    solution = LinSolve(equations);
    points-to = Interpret(solution);
    constraints += AddConstraints(store-constraints, points-to);
} while (points-to information changes);
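The shape of this outer loop can be sketched in Python. This is a rough sketch under several assumptions and not the authors' algorithm: the LinSolve step is replaced by naive fixpoint propagation over prime products (merged with a least common multiple), stores are turned into extra copy constraints once the points-to set of the dereferenced pointer is known, and the primes in `PRIMES` as well as the names `solve`, `decode`, and `analyze` are hypothetical.

```python
from math import gcd

PRIMES = {"x": 3, "y": 5, "a": 7}  # hypothetical address primes


def lcm(m, n):
    return m * n // gcd(m, n)


def decode(value):
    """Variables whose address prime divides the value."""
    return {v for v, p in PRIMES.items() if value % p == 0}


def solve(constraints):
    """Stand-in for LinSolve: propagate prime products until nothing changes."""
    val = {}
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in constraints:
            old = val.get(lhs, 1)
            if kind == "address-of":
                new = lcm(old, PRIMES[rhs])
            elif kind == "copy":
                new = lcm(old, val.get(rhs, 1))
            elif kind == "load":
                new = old
                for target in decode(val.get(rhs, 1)):
                    new = lcm(new, val.get(target, 1))
            else:            # stores are handled by the outer loop
                continue
            if new != old:
                val[lhs] = new
                changed = True
    return val


def analyze(constraints):
    constraints = list(constraints)
    stores = [c for c in constraints if c[0] == "store"]
    points_to = None
    while True:
        solution = solve(constraints)
        new_points_to = {v: decode(value) for v, value in solution.items()}
        if new_points_to == points_to:       # points-to information is stable
            return points_to
        points_to = new_points_to
        for _, p, q in stores:               # *p = q: every target t of p gets t = q
            for target in points_to.get(p, ()):
                extra = ("copy", target, q)
                if extra not in constraints:
                    constraints.append(extra)


# p = &a; q = &x; *p = q; b = *p;
result = analyze([("address-of", "p", "a"), ("address-of", "q", "x"),
                  ("store", "p", "q"), ("load", "b", "p")])
print(result)
```

In this run the store `*p = q` adds the copy constraint `a = q` after the first iteration, which is why a second round of solving is needed before the result stabilizes.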

Example.
Program: a = &x; p = &a; b = *p; c = b;
Transform: a = a0*17, p = p0*101, b = b0*(p+1), c = c0*b.
Solve: &x = 17, &a = 101, a0 = 1, b0 = 1, c0 = 1, p0 = 1, giving a = 17, p = 101, b = 102, c = 102.
Interpret: a = 17, p = 101, b = 17, c = 17.
102 => 101 + 1 => 1 dereference on 101 => 1 dereference on &a => a => 17.
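The example can be replayed with a short script. This is a minimal sketch of one possible reading of the Solve and Interpret steps, assuming every renamed variable (a0, p0, b0, c0) equals 1 and that a value of the form q + 1 means one dereference on q; the helper names `as_address_set`, `solve`, `points_to`, and `interpret` are hypothetical.

```python
from math import prod

ADDRESS_PRIME = {"x": 17, "a": 101}  # primes used in the example


def as_address_set(value):
    """Variables whose primes make up `value`, or None if other factors remain."""
    found = set()
    for var, prime in ADDRESS_PRIME.items():
        if value % prime == 0:
            found.add(var)
            while value % prime == 0:
                value //= prime
    return found if value == 1 else None


def solve(statements):
    """Evaluate the equations in order, with all renamed variables equal to 1."""
    raw = {}
    for lhs, kind, rhs in statements:
        if kind == "address-of":
            raw[lhs] = ADDRESS_PRIME[rhs]
        elif kind == "copy":
            raw[lhs] = raw[rhs]
        elif kind == "load":
            raw[lhs] = raw[rhs] + 1
    return raw


def points_to(value, raw):
    """Decode a solved value into a set of pointees."""
    direct = as_address_set(value)
    if direct is not None:
        return direct
    result = set()
    for target in points_to(value - 1, raw):   # value = q + 1: dereference q once
        result |= points_to(raw.get(target, 1), raw)
    return result


def interpret(raw):
    return {var: prod(ADDRESS_PRIME[t] for t in points_to(value, raw))
            for var, value in raw.items()}


program = [("a", "address-of", "x"), ("p", "address-of", "a"),
           ("b", "load", "p"), ("c", "copy", "b")]
raw = solve(program)
print(raw)             # {'a': 17, 'p': 101, 'b': 102, 'c': 102}
print(interpret(raw))  # {'a': 17, 'p': 101, 'b': 17, 'c': 17}
```

Note that 102 is divisible by 17, yet it is not mistaken for a plain product of address primes because the leftover factor 6 remains; it is therefore decoded as 101 + 1.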

Solution Properties.
Integrality: only addition and multiplication over integers.
Feasibility: no negative-weight cycle.
Uniqueness: each variable is defined only once.

Soundness.
If &x = 7, &y = 11 and p points to x and y, then p is a multiple of 77.
Base: p points to x and y by direct assignment.
Induction: p points to x and y due to an indirect assignment (copy, load, store).
Prove that all indirect assignments are safe.
Argument: multiplication moves the dataflow fact upwards in the lattice.
Assumption: no problem due to composition gaps, that is, p1 + k1 is not misinterpreted as p2 + k2. The assumption can be enforced by careful offline selection of primes.

Precision.
If &x = 7, &y = 11 and p is a multiple of 77, then p points to x and y.
Argument: prime factorization is unique. Thus, 77 can be decomposed only as 7*11.
Prove that none of the address-of, copy, load, store statements add extra primes into the composition.
Assumption: no problem due to composition gaps, that is, p1 + k1 is not misinterpreted as p2 + k2. The assumption can be enforced by careful offline selection of primes.
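The composition-gap assumption can be checked by brute force for a small set of primes. The sketch below is only one possible way to test a candidate selection, not the offline selection procedure from the talk: it assumes that points-to values are products of distinct primes, that offsets range from 0 to a hypothetical bound `max_offset`, and that a collision means two distinct products lie within `max_offset` of each other.

```python
from itertools import combinations
from math import prod


def subset_products(primes):
    """All products of subsets of the given distinct primes (the empty product is 1)."""
    return sorted(prod(combo)
                  for size in range(len(primes) + 1)
                  for combo in combinations(primes, size))


def has_composition_gap_problem(primes, max_offset):
    """True if some p1 + k1 equals some p2 + k2 for distinct products p1, p2
    and offsets 0 <= k1, k2 <= max_offset."""
    products = subset_products(primes)
    # Products are sorted, so it suffices to compare neighbours.
    return any(b - a <= max_offset for a, b in zip(products, products[1:]))


print(has_composition_gap_problem([2, 3], 1))     # True: 2 + 1 == 3
print(has_composition_gap_problem([17, 101], 2))  # False
```

The brute-force enumeration is exponential in the number of primes, so it is meant only to illustrate the assumption on toy inputs.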

Properties.
If the value of a pointer p is a prime number, then it defines a must-point-to relation; otherwise it is a may-point-to relation.
If the value of p is 1, then p is unused.
If pointers p1 and p2 have the same value, then p1 and p2 are pointer equivalent.
Variables x and y are location equivalent when &x dividing the value of a pointer p implies that &x*&y also divides the value.
Pointers p1 and p2 are aliases if gcd(p1, p2) != 1.
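These properties translate directly into integer checks. The following is a small sketch; the function names `classify`, `pointer_equivalent`, and `may_alias` are hypothetical, and the values 7, 11, and 13 are assumed address primes used only for the demonstration.

```python
from math import gcd, isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small illustrative values."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def classify(value):
    """Classify a pointer's solved value."""
    if value == 1:
        return "unused"
    return "must-point-to" if is_prime(value) else "may-point-to"


def pointer_equivalent(p1, p2):
    """Two pointers with the same value have the same points-to set."""
    return p1 == p2


def may_alias(p1, p2):
    """Two pointers are aliases when their values share a prime factor."""
    return gcd(p1, p2) != 1


print(classify(7))               # must-point-to
print(classify(7 * 11))          # may-point-to
print(may_alias(7 * 11, 11 * 13))  # True
print(may_alias(7, 13))          # False
```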

Outline. Introduction. First-cut approach. Prime-factorization approach.  Evaluation.

Evaluation.
Benchmarks: SPEC 2000, httpd, sendmail.
Configuration: Intel Xeon, 2 GHz clock, 4 MB L2 cache, 3 GB RAM.
Analysis: context-sensitive, flow-insensitive.

Analysis Time (seconds).

Memory (MB).

Summary. We proposed a novel representation of points-to information using prime factorization. We solved pointer analysis as a system of linear equations. We empirically showed that it is competitive with state-of-the-art algorithms.

Our Contributions. Ordering points-to statements in an intelligent way to improve the analysis time. Dynamic partitioning of points-to statements for a prioritized points-to analysis. Probabilistic points-to analysis using bloom filters.  Points-to analysis as a set of linear equations.
