Off-line Variable Substitution for Scaling Points-to Analysis
PLDI 2000
Atanas (Nasko) Rountev, PROLANGS Group, Rutgers University
Satish Chandra, Bell Labs, Lucent Technologies

What is Pointer Analysis?
Given a pointer p, which variables does *p refer to?
  p = &x;        (p points to x)
  q = &y;        (q points to y)
  if (z) p = q;
*p may refer to either x or y.

Why Should We Care?
Indirect access to memory
– e.g., *p = *q;
Indirect flow of control
– e.g., calls through function pointers
Clients:
– Optimizing compilers
– Software productivity tools
– Static verification tools
– Test coverage tools

Efficiency versus Precision
Hard problem
Approximation algorithms
– $O(n)$ to $O(n^7)$
– Flow and context sensitivity
Difficult tradeoffs between efficiency and precision

Our Contributions
Off-line variable substitution
– Trading precision for efficiency
– Flexibility in choosing the right approximations
Precision-preserving substitution
– Andersen's points-to analysis
– More than 50% cost reduction

Outline
– Introduction
– Off-line Variable Substitution
– Scaling Andersen's Analysis
– Empirical Results
– Summary

Variable Substitution
A technique for cost reduction
Replaces a set of variables with a single variable
– Smaller points-to graphs
– Possible loss of precision
Used by several analyses
– Done on-the-fly, during the analysis

Off-line Variable Substitution
Alternative: modify the input problem
Use variable substitution to simplify the program
– Reduces the size of the lattice
Analyze the simplified program
Recover a solution for the original program
– Precise versus approximate solution
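
Substitution Example
Original program:
  p = &a;  p = &b;  q = p;  s = &p;  t = s;
Substitution: x replaces p and q; y replaces s and t
Simplified program:
  x = &a;  x = &b;  y = &x;
(q = p and t = s become the trivial x = x and y = y, so they disappear.)
[Figure: points-to graph of the simplified program over x, y, a, b, and two points-to graphs over the original variables p, q, s, t, a, b]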
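
The three off-line steps (simplify, analyze, recover) can be sketched in Python on this example. This is a minimal illustration under stated assumptions, not the authors' implementation: the tuple encoding of statements, the substitution map `rep`, and the helpers `simplify`, `analyze`, and `recover` are hypothetical; the solver handles only `p = &x` and `p = q`; and the recovery simply expands each representative back into the variables it replaced.

```python
# Statements are (kind, lhs, rhs) tuples (hypothetical encoding):
#   ("addr", p, x) means p = &x;  ("copy", p, q) means p = q.

def simplify(stmts, rep):
    """Step 1: rename variables via rep and drop trivial or duplicate statements."""
    r = lambda v: rep.get(v, v)
    out = []
    for kind, lhs, rhs in stmts:
        s = (kind, r(lhs), r(rhs))
        if kind == "copy" and s[1] == s[2]:
            continue  # v = v carries no information
        if s not in out:
            out.append(s)
    return out

def analyze(stmts):
    """Step 2: flow-insensitive fixed point for address-of and copy statements."""
    pt = {}
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            new = {rhs} if kind == "addr" else pt.get(rhs, set())
            cur = pt.setdefault(lhs, set())
            if not new <= cur:
                cur |= new
                changed = True
    return pt

def recover(pt_small, rep, variables):
    """Step 3: map the simplified solution back onto the original variables."""
    r = lambda v: rep.get(v, v)
    members = {}  # representative -> original variables it stands for
    for v in variables:
        members.setdefault(r(v), set()).add(v)
    return {
        v: set().union(*(members.get(t, {t}) for t in pt_small.get(r(v), ())))
        for v in variables
    }

program = [("addr", "p", "a"), ("addr", "p", "b"), ("copy", "q", "p"),
           ("addr", "s", "p"), ("copy", "t", "s")]
rep = {"p": "x", "q": "x", "s": "y", "t": "y"}

small = simplify(program, rep)
print(small)  # [('addr', 'x', 'a'), ('addr', 'x', 'b'), ('addr', 'y', 'x')]
solution = recover(analyze(small), rep, ["p", "q", "s", "t", "a", "b"])
print(sorted(solution["s"]))  # ['p', 'q']
```

In the original program s and t point only to p, but this recovery reports {p, q}: merging p, whose address is taken, with q makes the result approximate. The precision-preserving substitution in the following slides avoids this by never substituting targets of points-to edges.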

The Big Picture
Construct approximate problems
– Choose the right approximations
– Control tradeoffs between cost and precision
Separation from the analysis algorithm
– Easy modifications

Andersen's Points-to Analysis
Flow and context insensitive
Worst case: $O(n^3)$
Practicality: must reduce analysis cost
– Hundreds of thousands of LOC in a few minutes
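
A minimal sketch of the inclusion constraints behind Andersen's analysis, assuming a hypothetical tuple encoding for the four basic pointer-assignment forms. It uses naive iteration to a fixed point, which is easy to check by hand but is neither the cubic worklist algorithm nor the BANE-based implementation used in the experiments; the function name `andersen` is illustrative only.

```python
from collections import defaultdict

def andersen(stmts):
    """Naive fixed-point solver for Andersen-style inclusion constraints.

    Hypothetical statement encoding (kind, a, b):
      ("addr",  p, x)  p = &x   {x} ⊆ pt(p)
      ("copy",  p, q)  p = q    pt(q) ⊆ pt(p)
      ("load",  p, q)  p = *q   pt(v) ⊆ pt(p) for every v in pt(q)
      ("store", p, q)  *p = q   pt(q) ⊆ pt(v) for every v in pt(p)
    """
    pt = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, a, b in stmts:
            if kind == "addr":
                pairs = [(a, {b})]
            elif kind == "copy":
                pairs = [(a, pt[b])]
            elif kind == "load":
                pairs = [(a, pt[v]) for v in list(pt[b])]
            elif kind == "store":
                pairs = [(v, pt[b]) for v in list(pt[a])]
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for dst, src in pairs:  # enforce src ⊆ pt[dst]
                if not src <= pt[dst]:
                    pt[dst] |= src
                    changed = True
    return dict(pt)

program = [("addr", "p", "x"), ("addr", "q", "y"), ("addr", "r", "p"),
           ("store", "r", "q"),   # *r = q
           ("load", "s", "r")]    # s = *r
pt = andersen(program)
print(sorted(pt["s"]))  # ['x', 'y']
```

Each pass rescans every statement, so this is far slower than a real solver; it only makes the four constraint forms concrete.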

A specific off-line variable substitution that preserves precision

Precision-preserving Substitution
Equivalent variables
– Have the same points-to sets
Substitution
– Replaces a set of equivalent variables
– None of these variables may be the target of a points-to edge
Linear-time computation of equivalence sets
– The computed sets are not necessarily maximal
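
One way to turn equivalence information into a substitution is sketched below. It assumes that equivalence labels have already been computed (the hypothetical `label` map) and that the set of address-taken variables is known. Address-taken variables are left unchanged, reflecting the requirement that substituted variables are not targets of points-to edges. Choosing the alphabetically smallest member as representative is an arbitrary choice made here for determinism, and `substitution_from_labels` is an illustrative name.

```python
def substitution_from_labels(label, address_taken):
    """Map each substitutable variable to a representative of its label class.

    label:         variable -> integer equivalence label (hypothetical input)
    address_taken: variables that may be targets of points-to edges
    """
    classes = {}
    for v in sorted(label):
        if v in address_taken:
            continue  # targets of points-to edges are never substituted
        classes.setdefault(label[v], []).append(v)
    rep = {}
    for members in classes.values():
        head = members[0]  # smallest name, since members were added in sorted order
        for v in members[1:]:
            rep[v] = head
    return rep

labels = {"p": 1, "q": 1, "r": 1, "s": 2, "t": 2, "x": 3}
print(substitution_from_labels(labels, address_taken={"r", "x"}))
# {'q': 'p', 't': 's'}
```

Here r shares a label with p and q but keeps its own name because its address is taken.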
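
Subset Graph
Nodes: sets of variables
– &v stands for $\{v\}$, v for $Pt(v)$, and *v for $Pt(*v)$
Edges: subset relationships
– p = &x gives $\{x\} \subseteq Pt(p)$ (edge &x → p) and $Pt(x) \subseteq Pt(*p)$ (edge x → *p)
– q = p gives $Pt(p) \subseteq Pt(q)$ (edge p → q) and $Pt(*p) \subseteq Pt(*q)$ (edge *p → *q)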
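
The two edge rules of the subset graph can be written as a small graph builder. The node naming ("&v", "v", "*v"), the tuple encoding of statements, and the function name `subset_graph` are assumptions for illustration. Statements that dereference a pointer (p = *q, *p = q) are not covered on this slide, so the sketch rejects them.

```python
def subset_graph(stmts):
    """Return subset edges (a, b), each meaning set(a) ⊆ set(b).

    Node "&v" stands for {v}, "v" for Pt(v), and "*v" for Pt(*v).
    Statements are (kind, lhs, rhs): ("addr", p, x) is p = &x and
    ("copy", q, p) is q = p.
    """
    edges = set()
    for kind, lhs, rhs in stmts:
        if kind == "addr":                      # lhs = &rhs
            edges.add(("&" + rhs, lhs))         # {rhs} ⊆ Pt(lhs)
            edges.add((rhs, "*" + lhs))         # Pt(rhs) ⊆ Pt(*lhs)
        elif kind == "copy":                    # lhs = rhs
            edges.add((rhs, lhs))               # Pt(rhs) ⊆ Pt(lhs)
            edges.add(("*" + rhs, "*" + lhs))   # Pt(*rhs) ⊆ Pt(*lhs)
        else:
            raise ValueError(f"statement kind not covered here: {kind}")
    return edges

print(sorted(subset_graph([("addr", "p", "x"), ("copy", "q", "p")])))
# [('&x', 'p'), ('*p', '*q'), ('p', 'q'), ('x', '*p')]
```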
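
Sources of Equivalent Variables
Strongly-connected components
– $Pt(v_1) \subseteq Pt(v_2) \subseteq \cdots \subseteq Pt(v_k) \subseteq Pt(v_1)$, so all $Pt(v_i)$ are equal
Direct nodes
– No indirect assignments to v (the address of v is not taken)
– If the predecessors of a direct node v are $S_1$, $S_2$, $S_3$, then $Pt(v) = S_1 \cup S_2 \cup S_3$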
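
Because all nodes in a strongly-connected component must have the same set, a standard SCC algorithm finds this first source of equivalences. The sketch below uses Tarjan's algorithm, a common choice; the slides do not say which SCC algorithm the authors used, and `sccs` is an illustrative name. Tarjan's algorithm emits each component only after every component reachable from it, so reversing its output gives the topological order in which the SCC DAG is traversed. The recursion makes it suitable only for small graphs.

```python
def sccs(edges):
    """Tarjan's algorithm over a set of (a, b) edges; returns a list of frozensets."""
    nodes = sorted({n for edge in edges for n in edge})
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(frozenset(comp))

    for v in nodes:
        if v not in index:
            visit(v)
    return out  # reverse topological order of the condensed DAG

# Copy cycle p -> q -> r -> p, fed by &x
comps = sccs({("p", "q"), ("q", "r"), ("r", "p"), ("&x", "p")})
print([sorted(c) for c in comps])  # [['p', 'q', 'r'], ['&x']]
```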

Algorithm Highlights
Traversal of the SCC DAG in topological sort order
– Integer label for each SCC
Direct SCC: contains only direct nodes
– If all of its predecessors have the same label, it receives that label
Direct SCC with no predecessors: non-pointers
– Eliminate irrelevant statements
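
The labeling rules above can be sketched as a single pass over the condensed DAG, assuming the components are already in topological order (predecessors first) and that the caller has marked which nodes are direct. Giving every non-direct node a fresh label is a simplification made here; the function name `label_nodes`, its inputs, and the use of 0 as the non-pointer label are hypothetical.

```python
def label_nodes(order, preds, direct):
    """Assign equivalence labels in one pass over a topologically sorted DAG.

    order:  DAG nodes (collapsed SCCs), every predecessor before its successors
    preds:  node -> iterable of predecessor nodes
    direct: nodes that are direct; all other nodes get a fresh label
    """
    label, fresh = {}, 0
    for n in order:
        pred_labels = {label[p] for p in preds.get(n, ())}
        if n in direct and not pred_labels:
            label[n] = 0                    # no incoming sets: a non-pointer
        elif n in direct and len(pred_labels) == 1:
            label[n] = pred_labels.pop()    # all predecessors share one label
        else:
            fresh += 1                      # not direct, or predecessors differ
            label[n] = fresh
    return label

# a = &x; b = a; c = a; d = b; d = &y; e is never assigned
# (only the nodes relevant to this example are listed)
order = ["&x", "&y", "a", "b", "c", "d", "e"]
preds = {"a": ["&x"], "b": ["a"], "c": ["a"], "d": ["b", "&y"]}
labels = label_nodes(order, preds, direct={"a", "b", "c", "d", "e"})
print(labels)
# {'&x': 1, '&y': 2, 'a': 1, 'b': 1, 'c': 1, 'd': 3, 'e': 0}
```

Here a, b, and c share one label because they all point to {x}; d gets its own label because it also receives y; and e, a direct node with no predecessors, is a non-pointer.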

Computation of Equivalence Sets
[Figure: example subset graph with nodes r, &p, p, q, &x, &y, *r, s, t, *p, x, *s, y, *q and equivalence labels i, j, k]

Experiments Overview
Large C programs: KLOC
Constraint-based implementation of Andersen's analysis
– BANE constraint solver from Berkeley
Results:
– Many variables have the same points-to sets
– 53% reduction in running time
– 59% reduction in memory usage

[Figure: Program Size]

[Figure: Running Time]

[Figure: Memory Usage]

Summary
Off-line variable substitution
– Control tradeoffs between cost and precision
Precision-preserving substitution
– Simple and effective linear-time algorithm
Empirical results
– More than 50% reduction in both time and space
Applications for other pointer analyses?