Published by Cleopatra Goodwin. Modified over 9 years ago.
1
Points-to Analysis as a System of Linear Equations. Rupesh Nasre, Computer Science and Automation, Indian Institute of Science. Advisor: Prof. R. Govindarajan. Feb 22, 2010.
2
What is Pointer Analysis?
a = &x;
b = a;
if (b == *p) { … } else { … }
Is this condition always satisfied?
Pointer analysis is a mechanism to statically find out the run-time values of a pointer.
Here, a points to x, and a and b are aliases.
3
Why Pointer Analysis? For Parallelization. fun(p) || fun(q); For Optimization. a = p + 2; b = q + 2; For Bug-Finding. For Program Understanding.... Clients of Pointer Analysis.
4
Placement of Pointer Analysis. Pointer Analysis. Parallelizing compiler. String vulnerability finder. Program slicer. Data flow analyzer. Lock synchronizer. Affine expression analyzer. Memory leak detector. Type analyzer. Improved runtime. Secure code. Better debugging. Better compile time.
5
Normalized Input.
p = &q (address-of)
p = q (copy)
p = *q (load)
*p = q (store)
14
Why as a Linear System?
Scalability: code sizes are going into billions.
Analyses trade off at least one of (i) memory requirement, (ii) analysis time, (iii) precision.
Linear algebra is a mature topic.
15
Outline. Introduction. First-cut approach. Prime-factorization approach. Evaluation.
16
First-cut Approach: Transformations.
p = &q becomes p = q - 1
p = q becomes p = q
p = *q becomes p = q + 1
*p = q becomes p + 1 = q
Each address-taken variable (&v) would be assigned a unique value.
17
First-cut Approach.
Program: a = &x; p = &a; b = *p; c = b;
Transform: a = x - 1, p = a - 1, b = p + 1, c = b.
Solve: x = r, a = r - 1, b = r - 1, c = r - 1, p = r - 2.
Interpret (first step): a points to x.
Solving the pointer constraints directly: a, b, c point to x. p points to a.
21
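First-cut Approach.
Program: a = &x; p = &a; b = *p; c = b;
Transform: a = x - 1, p = a - 1, b = p + 1, c = b.
Solve: x = r, a = r - 1, b = r - 1, c = r - 1, p = r - 2.
Interpreting the linear solution: a, b, c point to x. p points to a. p points to b. p points to c.
Solving the pointer constraints directly: a, b, c point to x. p points to a.
Imprecise analysis: a, b and c all receive the value r - 1, so p = r - 2 appears to point to b and c as well as a.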
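The imprecision can be reproduced with a small Python sketch. It assumes a simple propagation in place of a real linear solver, stores every value as an offset from the free value r, and reads "v points to w" as value(v) = value(w) - 1. The function names are illustrative only.

```python
# Equations as (lhs, rhs, offset), meaning lhs = rhs + offset.
equations = [("a", "x", -1), ("p", "a", -1), ("b", "p", 1), ("c", "b", 0)]

def solve(equations, free_var="x"):
    """Propagate offsets relative to r, where free_var = r + 0."""
    value = {free_var: 0}
    changed = True
    while changed:
        changed = False
        for lhs, rhs, off in equations:
            if rhs in value and lhs not in value:
                value[lhs] = value[rhs] + off
                changed = True
    return value

def interpret(value):
    """v points to w whenever value[v] == value[w] - 1."""
    return {(v, w) for v in value for w in value if value[v] == value[w] - 1}

value = solve(equations)      # x = r, a = b = c = r - 1, p = r - 2
derived = interpret(value)
exact = {("a", "x"), ("b", "x"), ("c", "x"), ("p", "a")}
spurious = derived - exact    # facts reported only by the linear encoding
print(sorted(spurious))
```

The spurious pairs arise because the encoding cannot tell apart variables that happen to share a value.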
22
Issues with First-cut Approach.
Dereferencing: a = &x versus *a = x.
a = &x becomes a = x - 1; *a = x becomes a + 1 = x.
Semantically different, mathematically the same.
23
Issues with First-cut Approach.
Dereferencing: a = &x versus *a = x.
Multiple assignments: a = &x; a = &y;
Transform: a = x - 1; a = y - 1;
Solve: no solution, because x and y are assigned unique values.
24
Issues with First-cut Approach.
Dereferencing: a = &x versus *a = x.
Multiple assignments: a = &x; a = &y;
Cyclic assignments: a = &a;
Transform: a = a - 1
Solve: no solution.
25
Issues with First-cut Approach.
Dereferencing: a = &x versus *a = x.
Multiple assignments: a = &x; a = &y;
Cyclic assignments: a = &a;
Symmetry of assignment: a = b implies b = a.
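Two of these issues, multiple assignments and cyclic assignments, show up as contradictory equations. The sketch below is a deliberately limited check, not a general solver: it assumes equations of the form `lhs = rhs + offset`, takes the unique values of address-taken variables as given, and reports the first equation that cannot hold. The name `find_conflict` and the sample values 10 and 20 are made up for illustration.

```python
def find_conflict(equations, fixed):
    """Return the first contradictory (lhs, rhs, offset) equation, or None.

    equations: list of (lhs, rhs, offset), meaning lhs = rhs + offset.
    fixed: values already assigned, e.g. unique values of address-taken variables.
    """
    value = dict(fixed)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, off in equations:
            if lhs == rhs and off != 0:      # a = a - 1 can never hold
                return (lhs, rhs, off)
            if rhs not in value:
                continue
            candidate = value[rhs] + off
            if lhs not in value:
                value[lhs] = candidate
                changed = True
            elif value[lhs] != candidate:    # a already has a different value
                return (lhs, rhs, off)
    return None

# a = &x; a = &y;  with distinct values for x and y
multiple = find_conflict([("a", "x", -1), ("a", "y", -1)], {"x": 10, "y": 20})
# a = &a;
cyclic = find_conflict([("a", "a", -1)], {})
```

The dereferencing and symmetry issues are semantic rather than arithmetic, so a consistency check of this kind does not detect them.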
26
Outline. Introduction. First-cut approach. Prime-factorization approach. Evaluation.
27
Important Ideas. Address of a variable as a prime number. Points-to set as a multiplication of primes. Variable renaming to avoid inconsistency.
28
Prime-factorization Approach: Transformations.
p = &q becomes p = p_i * prime(&q)
p = q becomes p = p_i * q
p = *q becomes p = p_i * (q + 1)
*p = q is handled separately.
Here p_i is the renamed earlier version of p. Each address-taken variable (&v) would be assigned a unique prime number.
29
Points-to Information Lattice.
1
3, 5, 7, 11, …
15, 21, 33, 35, 55, 77, …
3*5*7, 3*5*11, 3*7*11, 5*7*11, …
3*5*7*11*…
Precision increases toward values with fewer prime factors.
We start with larger primes to avoid the composition gap problem.
30
Algorithm Outline.
do {
    equations = Linearize(constraints);
    solution = LinSolve(equations);
    points-to = Interpret(solution);
    constraints += AddConstraints(store-constraints, points-to);
} while (points-to information changes);
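A minimal Python sketch of the outer loop follows. It is an assumption-laden illustration: `solve` is a stand-in that handles only address-of and copy constraints by multiplying in missing primes (it is not the linear solver from the talk), the primes in `PRIME_OF` are arbitrary, and loads are left out. What it does show is how store constraints are expanded into copy constraints from the current points-to information until nothing changes.

```python
from math import gcd

PRIME_OF = {"x": 7, "y": 11, "a": 13}   # hypothetical primes for address-taken variables

def decode(value):
    """Points-to set encoded by a product of primes."""
    return {v for v, prime in PRIME_OF.items() if value % prime == 0}

def solve(constraints):
    """Stand-in for Linearize + LinSolve + Interpret (address-of and copy only)."""
    value = {}
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in constraints:
            factor = PRIME_OF[rhs] if kind == "address-of" else value.get(rhs, 1)
            old = value.get(lhs, 1)
            new = old * factor // gcd(old, factor)   # multiply in only the missing primes
            if new != old:
                value[lhs] = new
                changed = True
    return {v: decode(n) for v, n in value.items()}

def add_constraints(stores, points_to):
    """*p = q: for every a that p may point to, add the copy a = q."""
    return {("copy", a, q) for p, q in stores for a in points_to.get(p, set())}

def analyze(constraints, stores):
    constraints = set(constraints)
    points_to = None
    while True:
        new_points_to = solve(constraints)
        if new_points_to == points_to:               # points-to information stopped changing
            return points_to
        points_to = new_points_to
        constraints |= add_constraints(stores, points_to)

# p = &a; q = &x; *p = q;
result = analyze([("address-of", "p", "a"), ("address-of", "q", "x")], [("p", "q")])
```

In this run the store `*p = q` becomes the copy `a = q` once `p` is known to point to `a`, so `a` ends up pointing to `x`.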
31
Example.
Program: a = &x; p = &a; b = *p; c = b;
Transform: a = a0*17, p = p0*101, b = b0*(p+1), c = c0*b.
Solve: &x = 17, &a = 101, a0 = 1, b0 = 1, c0 = 1, p0 = 1, giving a = 17, p = 101, b = 102, c = 102.
Interpret: a = 17, p = 101, b = 17, c = 17.
102 => 1 + 101 => 1 dereference on 101 => 1 dereference on &a => a => 17.
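The arithmetic of this example can be checked with a short Python sketch. It makes several simplifying assumptions that are not in the slides: statements are evaluated once in program order instead of being handed to a linear solver, every renamed earlier version (a0, p0, b0, c0) is taken to be 1, and the interpretation step only handles a single pending dereference on a single prime.

```python
PRIME_OF = {"x": 17, "a": 101}                    # &x = 17, &a = 101
VAR_OF_PRIME = {prime: v for v, prime in PRIME_OF.items()}

# a = &x; p = &a; b = *p; c = b;
program = [
    ("address-of", "a", "x"),
    ("address-of", "p", "a"),
    ("load", "b", "p"),
    ("copy", "c", "b"),
]

def solve(program):
    value = {}
    for kind, lhs, rhs in program:
        prev = 1                                  # a0 = p0 = b0 = c0 = 1
        if kind == "address-of":
            value[lhs] = prev * PRIME_OF[rhs]     # a = a0 * 17
        elif kind == "copy":
            value[lhs] = prev * value[rhs]        # c = c0 * b
        elif kind == "load":
            value[lhs] = prev * (value[rhs] + 1)  # b = b0 * (p + 1)
    return value

def interpret(v, value):
    """102 => 1 + 101 => one dereference on &a => a => 17."""
    target = VAR_OF_PRIME.get(v - 1)
    return value[target] if target is not None else v

solved = solve(program)
interpreted = {name: interpret(v, solved) for name, v in solved.items()}
```

After interpretation, b and c carry the value 17, the prime assigned to &x, so both are read as pointing to x.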
32
Solution Properties.
Integrality: only addition and multiplication over integers.
Feasibility: no negative weight cycle.
Uniqueness: each variable is defined only once.
33
Soundness. If &x = 7, &y = 11 and p points to x and y, then p is a multiple of 77. Base: p points to x and y by direct assignment. Induction: p points to x and y due to an indirect assignment (copy, load, store). Prove that all indirect assignments are safe. Argument: Multiplication moves the dataflow fact upwards in the lattice. Assumption: No problem due to composition gaps. p1 + k1 is not misinterpreted as p2 + k2. The assumption can be enforced by careful offline selection of primes.
34
Precision. If &x = 7, &y = 11 and p is a multiple of 77, then p points to x and y. Argument: Prime factorization is unique. Thus, 77 can be decomposed only as 7*11. Prove that none of the address-of, copy, load, store statements add extra primes into the composition. Assumption: No problem due to composition gaps. p1 + k1 is not misinterpreted as p2 + k2. The assumption can be enforced by careful offline selection of primes.
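The soundness and precision claims both rest on encoding a points-to set as a product of primes and decoding it by divisibility. A minimal Python sketch, assuming the primes from the slide for x and y plus an extra hypothetical variable z with prime 13:

```python
from math import prod

PRIME_OF = {"x": 7, "y": 11, "z": 13}   # z and its prime are assumptions for illustration

def encode(targets):
    """Product of the primes of all pointees; the empty set encodes as 1."""
    return prod(PRIME_OF[t] for t in targets)

def decode(value):
    """Every variable whose prime divides the value is a pointee."""
    return {v for v, prime in PRIME_OF.items() if value % prime == 0}

p = encode({"x", "y"})                   # 7 * 11 = 77
pointees = decode(p)
merged = decode(encode({"x"}) * encode({"y", "z"}))
```

Because prime factorization is unique, decoding recovers exactly the variables that were multiplied in, and multiplying two encodings yields the union of their sets.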
35
Properties.
If the value of a pointer p is a prime number, then it defines a must-point-to relation; otherwise it is a may-point-to relation.
If the value of p is 1, then p is unused.
If pointers p1 and p2 have the same value, then p1 and p2 are pointer equivalent.
Variables x and y are location equivalent when, for every pointer p, &x dividing the value of p implies that &x*&y also divides the value.
Pointers p1 and p2 are aliases if gcd(p1, p2) != 1.
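Several of these properties reduce to simple integer tests. The Python sketch below illustrates three of them (unused, must versus may, and aliasing); the function names are chosen here for illustration, and the sample values reuse 7, 11 and 13 as example primes.

```python
from math import gcd, isqrt

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def relation(value):
    """Classify a pointer by its value: unused, must-point-to, or may-point-to."""
    if value == 1:
        return "unused"
    return "must" if is_prime(value) else "may"

def are_aliases(p1, p2):
    """Two pointers are aliases when their values share a prime factor."""
    return gcd(p1, p2) != 1

def pointer_equivalent(p1, p2):
    return p1 == p2

examples = {
    "single": relation(7),                       # one pointee
    "pair": relation(7 * 11),                    # two possible pointees
    "shared": are_aliases(7 * 11, 7 * 13),       # both may point to the variable with prime 7
    "disjoint": are_aliases(7, 11),
}
```

Testing aliasing with a gcd avoids factorizing either value.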
36
Outline. Introduction. First-cut approach. Prime-factorization approach. Evaluation.
37
Evaluation.
Benchmarks: SPEC 2000, httpd, sendmail.
Configuration: Intel Xeon, 2 GHz clock, 4 MB L2 cache, 3 GB RAM.
Analysis: context-sensitive, flow-insensitive.
38
Analysis Time (seconds).
39
Memory (MB).
40
Summary. We proposed a novel representation of points-to information using prime factorization. We solved pointer analysis as a system of linear equations. We empirically showed that it is competitive with state-of-the-art algorithms.
41
Points-to Analysis as a System of Linear Equations. Rupesh Nasre, nasre@csa.iisc.ernet.in, Computer Science and Automation, Indian Institute of Science. Advisor: Prof. R. Govindarajan. Feb 22, 2010.
42
Our Contributions.
Ordering points-to statements in an intelligent way to improve the analysis time.
Dynamic partitioning of points-to statements for a prioritized points-to analysis.
Probabilistic points-to analysis using Bloom filters.
Points-to analysis as a set of linear equations.