Pointer Analysis Jeff Da Silva Sept 20, 2004 CARG.


1 Pointer Analysis Jeff Da Silva Sept 20, 2004 CARG

2 The Pointer Alias Analysis Problem
Statically decide, for any pair of pointers at any point in the program, whether the two pointers point to the same memory location. This problem is known to be undecidable [Landi 1992].

3 Example
main.c:
    #include <stdlib.h>
    #define N 1024
    extern void f();
    int *a, *b;
    int main(int argc, char *argv[]) {
        int i;
        a = (int *)calloc(N, sizeof(int));
        b = (int *)calloc(N, sizeof(int));
        f();
        for (i = 0; i < N-1; i++)
            a[i] = b[i] + i;
        return 0;
    }
link.c:
    extern int *a, *b;
    void f() { a = &b[1]; }
Can this loop be parallelized?
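The effect of link.c on the loop can be checked with a small experiment. The sketch below is an illustration only, not part of the original slides: the helper names run_loop and order_independent are hypothetical, and running the iterations in reverse order is used as a stand-in for "iterations executed out of order". When a is set to &b[1], the loop body becomes b[i+1] = b[i] + i, so each iteration reads what the previous one wrote.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N 1024

/* Run the slide's loop over [0, N-1), in forward or reverse order.
   Reverse order stands in for iterations executed out of order. */
static void run_loop(int *a, const int *b, int reverse) {
    for (int k = 0; k < N - 1; k++) {
        int i = reverse ? (N - 2 - k) : k;
        a[i] = b[i] + i;
    }
}

/* Returns 1 if forward and reverse execution produce the same a[],
   0 otherwise. If alias is nonzero, a is set to &b[1], as f() in
   link.c does; otherwise a gets its own zeroed buffer. */
int order_independent(int alias) {
    int *b1 = calloc(N, sizeof(int));
    int *b2 = calloc(N, sizeof(int));
    int *a1 = alias ? &b1[1] : calloc(N, sizeof(int));
    int *a2 = alias ? &b2[1] : calloc(N, sizeof(int));
    assert(b1 && b2 && a1 && a2);

    run_loop(a1, b1, 0);  /* forward */
    run_loop(a2, b2, 1);  /* reverse */
    int same = memcmp(a1, a2, (N - 1) * sizeof(int)) == 0;

    if (!alias) { free(a1); free(a2); }
    free(b1);
    free(b2);
    return same;
}
```

Calling order_independent(0) models the program without f(); order_independent(1) models it with f(). Only the first case is insensitive to iteration order, which is why the compiler needs to know what a and b can point to before parallelizing.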

4 The Compiler Writer vs. The Programmer
Pointers are needed by programmers to realize complex data structures. Pointers can make life difficult for optimizing and parallelizing compilers.

5 Pointer Analysis Published by Year
Haven’t we solved this problem yet?
[Figure: number of pointer analysis publications per year; graph from M. Hind, “Pointer Analysis: Haven’t We Solved This Problem Yet?”]

6 How is Pointer Analysis used?
- Parallelizing compiler: can the loop be parallelized? (TLP)
      void foo(int *a, int *b) {
          for (i = 1; i < N; i++) {
              a[i] = b[i] / 13;
          }
          *a = *b + 1;
      }
- Guide compiler optimizations: load/store redundancy elimination, register allocation, CSE, dead code elimination, live variable analysis, instruction scheduling [e.g. VLIW] (ILP), loop-invariant code motion, etc.
- Behavioral synthesis and data flow processors: necessary for partitioning and instruction scheduling.
- Error detection and program understanding: the programmer can detect errors or discover poorly written code.

7 Pointer Analysis Issues
Scalability vs. accuracy: generally, a ‘difficult’ tradeoff exists between the amount of computation and memory required and the accuracy of the analysis. Where is the sweet spot of the precision/efficiency tradeoff?
Which metric should be used?
- Direct metric: the average number of objects aliased to a pointer expression appearing in the program
- Report performance applied to an optimization
- Dynamically measure false positives
Which benchmark suite?
Are the results reproducible?

8 Pointer Analysis Issues
Complications associated with pointer arithmetic, casting, function pointers, long jumps, and multithreaded applications. Can these be ignored? Different pointer analysis uses have different needs. A universal pointer analysis probably doesn’t exist.

9 Pointer Analysis Design Choices
- Flow sensitivity
- Context sensitivity
- Heap modeling
- Aggregate modeling
- Alias representation
- Whole program

10 Flow Sensitivity
Flow sensitive
- The order of the statements matters
- Needs a control flow graph
- Requires pointer information storage for each program point
Flow insensitive
- The order of statements doesn’t matter
- Analysis is the same regardless of statement order
- Uses a single global state
- Not as precise
Path sensitive
- Each path in the control flow graph is considered

11 Flow Sensitivity Example
    S1: a = malloc(…);
    S2: b = malloc(…);
    S3: a = b;
    S4: a = malloc(…);
    S5: if(c) a = b;
    S6: if(!c) a = malloc(…);
    S7: ~ = *a;
Flow insensitive: a (at all points) => [heap_S1, heap_S2, heap_S4, heap_S6]
Flow sensitive: a at S7 => [heap_S2, heap_S4, heap_S6]
Path sensitive: a at S7 => [heap_S2, heap_S6]
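The three answers for this example can be reproduced by hand-coding each style of reasoning. The sketch below is not a general analysis: it hard-wires statements S1 to S7, represents a points-to set as a bitmask with one bit per allocation site, and uses function names that are purely illustrative.

```c
/* One bit per allocation site in the example (S1, S2, S4, S6). */
enum { HEAP_S1 = 1 << 0, HEAP_S2 = 1 << 1, HEAP_S4 = 1 << 2, HEAP_S6 = 1 << 3 };

/* Flow insensitive: ignore statement order and union everything
   that is ever assigned to a. */
unsigned a_flow_insensitive(void) {
    unsigned b = HEAP_S2;                     /* S2 */
    /* S1, S3 (a = b), S4, S5 (a = b), S6 */
    return HEAP_S1 | b | HEAP_S4 | b | HEAP_S6;
}

/* Flow sensitive: follow statement order. Unconditional assignments
   overwrite a; a conditional assignment is merged with the state in
   which the branch was not taken. */
unsigned a_flow_sensitive_at_S7(void) {
    unsigned a, b;
    a = HEAP_S1;       /* S1 */
    b = HEAP_S2;       /* S2 */
    a = b;             /* S3 */
    a = HEAP_S4;       /* S4 */
    a = a | b;         /* S5: if(c) a = b; */
    a = a | HEAP_S6;   /* S6: if(!c) a = malloc(...); */
    return a;
}

/* Path sensitive: evaluate each value of c separately, so S5 and S6
   are known to be mutually exclusive, then union the results. */
unsigned a_path_sensitive_at_S7(void) {
    unsigned result = 0;
    for (int c = 0; c <= 1; c++) {
        unsigned b = HEAP_S2;
        unsigned a = HEAP_S4;      /* state after S4 */
        if (c)  a = b;             /* S5 */
        if (!c) a = HEAP_S6;       /* S6 */
        result |= a;
    }
    return result;
}
```

The path-sensitive version drops heap_S4 because on every path exactly one of S5 and S6 overwrites a, a fact the flow-sensitive merge cannot see.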

12 Context Sensitivity
Is calling context considered?
    int a, b, *p;
    int main() { f(); p = &a; g(); }
    int f() { p = &b; g(); }
    int g() { ~ = *p; }
At the call to g() in f(): p => [b]
At the call to g() in main(): p => [a]

13 Heap Modeling
- Heap merged, i.e. ‘no heap modeling’
- Allocation site: each malloc/calloc is considered to be one memory address
- Shape analysis: is it a linked list, tree, DAG, or cyclic graph?

14 Aggregate Modeling
Arrays, Structures
[Figure: alternative ways to model arrays and structures; the diagrams are not recoverable]

15 Alias representation
    a = &b; b = &c; if(…) b = &d; e = b;
Track pointer aliases: <*a, b>, <*a, e>, <b, e>, <**a, c>, <**a, d>, … More precise, less efficient.
Track points-to info: <a, b>, <b, c>, <b, d>, <e, c>, <e, d>. Less precise, more efficient.
[Figures: alias graph and points-to graph over a, b, c, d, e for the code above]

16 Whole Program & Incremental Compilation
Does the analysis require the whole program? Most analyses require the whole program. Is this practical for commercial compilers? Analysis must be performed at the linking stage. Recompilation will take a long time. What about library code? Incremental analysis: can results from a previous analysis be reused? Is the data structure adaptable?

17 Pointer Analysis Output
Typical output for a pair of accesses such as
    *A = ~
    ~ = *B
is one of: Definitely Not, Definitely, Maybe.
Typically, ‘Maybe’ is treated like ‘Definitely’. Is Maybe useful?
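One way a client could consume such a three-valued answer is sketched below. How the answer is derived here is an assumption, not something the slides specify: points-to sets are bitmasks over abstract objects, and "Definitely" is only reported when both sets are the same single object, which is sound only if that abstract object stands for exactly one concrete location. The names alias_query and must_serialize are hypothetical.

```c
enum alias_result { DEFINITELY_NOT, MAYBE, DEFINITELY };

/* Classify the relationship between *A and *B from the points-to
   sets of A and B, given as bitmasks over abstract objects. */
enum alias_result alias_query(unsigned pts_a, unsigned pts_b) {
    if ((pts_a & pts_b) == 0)
        return DEFINITELY_NOT;            /* no common target */

    /* Both sets are non-empty here; x & (x - 1) == 0 tests for a
       single set bit. */
    int a_single = (pts_a & (pts_a - 1)) == 0;
    int b_single = (pts_b & (pts_b - 1)) == 0;
    if (a_single && b_single)
        return DEFINITELY;                /* same single target */

    return MAYBE;
}

/* A conservative client treats Maybe like Definitely: the two
   accesses must stay ordered unless they definitely do not alias. */
int must_serialize(enum alias_result r) {
    return r != DEFINITELY_NOT;
}
```

With this client, a Maybe answer gives the optimizer nothing beyond what Definitely does, which is the question the slide raises.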

18 Example
    T *p, *q, *r;
    int main() {
        S1: p = alloc(T);
            f();
            g(&p);
        S4: p = alloc(T);
        S5: … = *p;
    }
    void f() {
        S6: q = alloc(T);
            g(&q);
        S8: r = alloc(T);
    }
    g(T **fp) {
        T local;
        if(…) p = &local;
    }
What can p point to at program point S5?

19 Address Taken
The most basic/intuitive algorithm possible. A flow-insensitive algorithm often used in production compilers.
The algorithm:
- Generate the set of all variables whose addresses are assigned to another variable.
- Assume that any pointer can potentially point to any variable in that set.
Linear with respect to the size of the program. Very imprecise.
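The algorithm amounts to a single scan over the program. The sketch below assumes a deliberately simplified, hypothetical statement form (struct stmt) that only records which object, if any, has its address taken on the right-hand side; allocation sites are treated as address-taken objects named after their statement label, matching the example slides. The linear-search de-duplication is for brevity; a hash set would keep the scan linear.

```c
#include <string.h>

#define MAX_OBJS 32

/* Hypothetical statement form: the object whose address is taken
   (&x or an allocation site), or NULL if the statement takes none. */
struct stmt {
    const char *addr_of;
};

/* Collect each address-taken object once into out[]; returns how
   many were collected. Every pointer is then assumed to be able to
   point to any object in out[]. */
int address_taken(const struct stmt *prog, int n, const char **out) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        const char *name = prog[i].addr_of;
        if (!name)
            continue;
        int seen = 0;
        for (int j = 0; j < count; j++)
            if (strcmp(out[j], name) == 0) { seen = 1; break; }
        if (!seen && count < MAX_OBJS)
            out[count++] = name;
    }
    return count;
}

/* The running example, statement by statement. */
int running_example_count(void) {
    static const struct stmt prog[] = {
        {"heap_S1"},  /* S1: p = alloc(T);  */
        {"p"},        /*     g(&p);         */
        {"heap_S4"},  /* S4: p = alloc(T);  */
        {NULL},       /* S5: ... = *p;      */
        {"heap_S6"},  /* S6: q = alloc(T);  */
        {"q"},        /*     g(&q);         */
        {"heap_S8"},  /* S8: r = alloc(T);  */
        {"local"},    /*     p = &local;    */
    };
    const char *set[MAX_OBJS];
    return address_taken(prog, 8, set);
}
```

For the running example this yields seven objects, the same set the example slides build up step by step.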

20 Address Taken Example
    T *p, *q, *r;
    int main() { S1: p = alloc(T); f(); g(&p); S4: p = alloc(T); … = *p; }
    void f() { S6: q = alloc(T); g(&q); S8: r = alloc(T); }
    g(T **fp) { T local; if(…) p = &local; }
Points-to set = { }

21 Address Taken Example
(same program)
Points-to set = {heap_S1}

22 Address Taken Example
(same program)
Points-to set = {heap_S1, p, heap_S4}

23 Address Taken Example
(same program)
Points-to set = {heap_S1, p, heap_S4, heap_S6}

24 Address Taken Example
(same program)
Points-to set = {heap_S1, p, heap_S4, heap_S6, heap_S8, q, local}
p can point to: {heap_S1, p, heap_S4, heap_S6, heap_S8, q, local}

25 Andersen Analysis
Flow-insensitive iterative algorithm that conservatively uses one points-to graph to represent the entire program. Each abstract stack location is represented by exactly one node in the points-to graph. Given a program statement, the points-to graph building rules are:
    y = &x    y points-to x
    y = x     if x points-to w then y points-to w
    *y = x    if y points-to z and x points-to w then z points-to w
    y = *x    if x points-to z and z points-to w then y points-to w
Worst case complexity is O(n^3), where n is the size of the program, since it is necessary to iterate until the graph no longer changes.
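The four rules and the iterate-until-stable loop can be sketched directly. This is a minimal illustration, not a production solver: the points-to graph is a dense boolean matrix, abstract locations are small integers, and the names (struct constraint, andersen, the two demo functions) are hypothetical. Calls are modeled by hand as fp = &p and fp = &q.

```c
#define NV 16  /* maximum number of abstract locations in this sketch */

enum kind { ADDR, COPY, STORE, LOAD };        /* y=&x, y=x, *y=x, y=*x */
struct constraint { enum kind k; int y, x; };

/* Add edge a -> b; returns 1 if the graph changed. */
static int add_edge(unsigned char pts[NV][NV], int a, int b) {
    if (pts[a][b]) return 0;
    pts[a][b] = 1;
    return 1;
}

/* Apply the four rules repeatedly until no edge is added. */
void andersen(const struct constraint *c, int n, unsigned char pts[NV][NV]) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            int y = c[i].y, x = c[i].x;
            switch (c[i].k) {
            case ADDR:   /* y = &x: y points-to x */
                changed |= add_edge(pts, y, x);
                break;
            case COPY:   /* y = x: x -> w implies y -> w */
                for (int w = 0; w < NV; w++)
                    if (pts[x][w]) changed |= add_edge(pts, y, w);
                break;
            case STORE:  /* *y = x: y -> z and x -> w implies z -> w */
                for (int z = 0; z < NV; z++)
                    if (pts[y][z])
                        for (int w = 0; w < NV; w++)
                            if (pts[x][w]) changed |= add_edge(pts, z, w);
                break;
            case LOAD:   /* y = *x: x -> z and z -> w implies y -> w */
                for (int z = 0; z < NV; z++)
                    if (pts[x][z])
                        for (int w = 0; w < NV; w++)
                            if (pts[z][w]) changed |= add_edge(pts, y, w);
                break;
            }
        }
    }
}

/* Running example; returns a bitmask of what p may point to. */
unsigned running_example_p(void) {
    enum { P, Q, R, FP, LOCAL, H1, H4, H6, H8 };
    static const struct constraint prog[] = {
        {ADDR, P, H1}, {ADDR, FP, P}, {ADDR, P, H4},
        {ADDR, Q, H6}, {ADDR, FP, Q}, {ADDR, R, H8},
        {ADDR, P, LOCAL},
    };
    unsigned char pts[NV][NV] = {{0}};
    unsigned mask = 0;
    andersen(prog, 7, pts);
    for (int w = 0; w < NV; w++)
        if (pts[P][w]) mask |= 1u << w;
    return mask;
}

/* a = &b; c = a; *c = a; d = *a;  returns 1 if c, b and d all end up
   pointing to b, as the COPY, STORE and LOAD rules require. */
int copy_store_load_demo(void) {
    enum { A, B, C, D };
    static const struct constraint prog[] = {
        {ADDR, A, B}, {COPY, C, A}, {STORE, C, A}, {LOAD, D, A},
    };
    unsigned char pts[NV][NV] = {{0}};
    andersen(prog, 4, pts);
    return pts[C][B] && pts[B][B] && pts[D][B];
}
```

For the running example the mask contains exactly local, heap_S1 and heap_S4 (bits 4, 5 and 6), the answer the example slides arrive at for p.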

26 Andersen Analysis Example
    T *p, *q, *r;
    int main() { S1: p = alloc(T); f(); g(&p); S4: p = alloc(T); … = *p; }
    void f() { S6: q = alloc(T); g(&q); S8: r = alloc(T); }
    g(T **fp) { T local; if(…) p = &local; }
[Points-to graph, nodes: p, fp, q, r]

27 Andersen Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, fp, q, r]

28 Andersen Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, fp, q, r]

29 Andersen Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, heap_S4, fp, q, r]

30 Andersen Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, heap_S4, fp, q, heap_S6, r]

31 Andersen Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, heap_S4, fp, q, heap_S6, r]

32 Andersen Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, heap_S4, fp, q, heap_S6, r, heap_S1]

33 Andersen Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, heap_S4, fp, q, local, heap_S6, r, heap_S8]

34 Andersen Analysis Example
Need to iterate until the graph does not change.
(same program)
[Points-to graph, nodes: heap_S1, p, heap_S4, fp, q, local, heap_S6, r, heap_S8]

35 Andersen Analysis Example
(same program)
[Final points-to graph, nodes: p, q, r, fp, heap_S1, heap_S4, heap_S6, heap_S8, local]
p can point to: {heap_S1, heap_S4, local}

36 Andersen’s Nightmare Scenario
    *x = &z; *w = &y; *v = &x; v = &w;
[Figure: points-to graph over v, w, x, y, z, shown as the initial graph and after iterations 1, 2, 3 and 4]

37 Steensgaard
- Flow-insensitive algorithm that uses a more compact points-to graph to represent the entire program.
- Each node within the graph can represent a variable number of abstract stack locations but can only point to one other node [i.e. every node has a fan-out of 1 or 0].
- A fast union-find data structure is used to enforce this fan-out rule.
- The unioning removes the necessity of iteration from the algorithm.
- Has almost linear worst case complexity, since it requires only one pass through the program.
- Not as precise as Andersen’s analysis.
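A minimal sketch of the unification idea follows. It is an illustration under stated assumptions rather than Steensgaard's full type-inference formulation: locations are small integers, each equivalence class keeps at most one outgoing edge in target[], classes with no target yet get a fresh placeholder node on demand, and all st_* names are hypothetical. Calls are modeled by hand as fp = &p and fp = &q.

```c
#include <assert.h>

#define MAXN 64

static int parent[MAXN];  /* union-find parent links */
static int target[MAXN];  /* the single outgoing edge of a class, or -1 */
static int nnodes;        /* variables plus placeholder nodes in use */

void st_init(int nvars) {
    assert(nvars <= MAXN);
    nnodes = nvars;
    for (int i = 0; i < MAXN; i++) { parent[i] = i; target[i] = -1; }
}

int st_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];  /* path halving */
        x = parent[x];
    }
    return x;
}

/* Unify two classes, then unify what they point to, so the merged
   class still has a fan-out of at most 1. */
static void st_join(int a, int b) {
    a = st_find(a);
    b = st_find(b);
    if (a == b) return;
    int ta = target[a], tb = target[b];
    parent[a] = b;
    if (tb < 0) target[b] = ta;
    else if (ta >= 0) st_join(ta, tb);
}

/* Class that x points to; creates a placeholder if x has none yet. */
static int st_target(int x) {
    x = st_find(x);
    if (target[x] < 0) {
        assert(nnodes < MAXN);
        target[x] = nnodes++;
    }
    return st_find(target[x]);
}

void st_addr(int y, int x)  { st_join(st_target(y), x); }             /* y = &x */
void st_copy(int y, int x)  { st_join(st_target(y), st_target(x)); }  /* y = x  */
void st_store(int y, int x) { st_join(st_target(st_target(y)), st_target(x)); } /* *y = x */
void st_load(int y, int x)  { st_join(st_target(y), st_target(st_target(x))); } /* y = *x */

/* Does y's class point to x's class? Does not create nodes. */
int st_points_to(int y, int x) {
    int t = target[st_find(y)];
    return t >= 0 && st_find(t) == st_find(x);
}

/* Running example, one pass; returns a bitmask of what var may point to. */
unsigned st_running_example(int var) {
    enum { P, Q, R, FP, LOCAL, H1, H4, H6, H8, NVARS };
    unsigned mask = 0;
    st_init(NVARS);
    st_addr(P, H1); st_addr(FP, P); st_addr(P, H4);
    st_addr(Q, H6); st_addr(FP, Q); st_addr(R, H8);
    st_addr(P, LOCAL);
    for (int x = 0; x < NVARS; x++)
        if (st_points_to(var, x)) mask |= 1u << x;
    return mask;
}
```

Because fp points to both p and q, the two are merged into one node and so are their targets: p and q both report local, heap_S1, heap_S4 and heap_S6 (bits 4 to 7), while r keeps heap_S8 to itself. Each statement is processed once, with no outer fixed-point loop.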

38 Steensgaard Analysis Example
    T *p, *q, *r;
    int main() { S1: p = alloc(T); f(); g(&p); S4: p = alloc(T); … = *p; }
    void f() { S6: q = alloc(T); g(&q); S8: r = alloc(T); }
    g(T **fp) { T local; if(…) p = &local; }
[Points-to graph, nodes: p, fp, q, r]

39 Steensgaard Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, fp, q, r]

40 Steensgaard Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, p, fp, q, r]

41 Steensgaard Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, heap_S4, p, fp, q, r]

42 Steensgaard Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, heap_S4, p, fp, q, heap_S6, r]

43 Steensgaard Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, heap_S4, heap_S6, {p, q}, fp, r]

44 Steensgaard Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, heap_S4, heap_S6, {p, q}, fp, r, heap_S8]

45 Steensgaard Analysis Example
(same program)
[Points-to graph, nodes: heap_S1, heap_S4, heap_S6, local, {p, q}, fp, r, heap_S8]

46 Steensgaard Analysis Example
(same program)
[Final points-to graph, nodes: {p, q}, r, fp, heap_S1, heap_S4, heap_S6, local, heap_S8]
p can point to: {heap_S1, heap_S4, heap_S6, local}

47 Steensgaard applied to Andersen’s Nightmare Scenario
    *x = &z; *w = &y; *v = &x; v = &w;
[Figure: points-to graph over v, w, x, y, z, shown as the initial graph and after processing each of *v = &x, *x = &z, *w = &y and v = &w in a single pass]

48 A Flow Sensitive, Context Insensitive Analysis
    T *p, *q, *r;
    int main() { S1: p = alloc(T); f(); g(&p); S4: p = alloc(T); … = *p; }
    void f() { S6: q = alloc(T); g(&q); S8: r = alloc(T); }
    g(T **fp) { T local; if(…) p = &local; }
[Points-to graph, nodes: heap_S1, p, heap_S4, q, local, heap_S6, r, heap_S8]

49 A Flow Sensitive, Context Insensitive Analysis
(same program)
[Points-to graph, nodes: heap_S1, p, heap_S4, fp, q, local, heap_S6, r, heap_S8, undefined]

50 Emami
Flow sensitive, context sensitive, interprocedural points-to algorithm. Has exponential worst case complexity. Very precise, poor scalability.
Properties:
- Arrays aggregated into 2 abstract sets: a_head, a_tail.
- Structs separated.
- Heap represented as one abstract location.
- The analysis tracks both possible and definite points-to relationships.

51 Emami – Points-to Node
A definite points-to relation from x to y is denoted (x,y,D). A possible points-to relation from x to y is denoted (x,y,P).
S = {(a,b,D), (b,d,P), (b,c,P)}
[Figure: points-to graph for S over a, b, c, d, with a definite edge from a to b and possible edges from b to d and from b to c]

52 Emami – L-locations and R-locations
Given that a points-to set S has been calculated for a program point p, you can define the set of locations referred to by each kind of variable reference in the statement at p. L-loc sets and R-loc sets are represented as sets of pairs of the form (x,D) or (x,P).
    Var Ref        L-Loc Set   R-Loc Set
    &a             N/A         {(a,D)}
    &a[0]          N/A         {(a_head,D)}
    &a[i] (i>0)    N/A         {(a_tail,D)}
    &a[i] (i≥0)    N/A         {(a_tail,P), (a_head,P)}
    malloc         N/A         {(heap,P)}

53 Emami – L-locations and R-locations
    Var Ref        L-Loc Set                   R-Loc Set
    a              {(a,D)}                     {(x,d) | (a,x,d) ∈ S}
    a[0]           {(a_head,D)}                {(x,d) | (a_head,x,d) ∈ S}
    a[i] (i>0)     {(a_tail,D)}                {(x,d) | (a_tail,x,d) ∈ S}
    a[i] (i≥0)     {(a_tail,P), (a_head,P)}    {(x,P) | (a_head,x,d) ∈ S ∪ (a_tail,x,d) ∈ S}

    Var Ref          L-Loc Set
    *a               {(x,d) | (a,x,d) ∈ S}
    (*a)[0]          {(x,d) | (a_head,x,d) ∈ S}
    (*a)[i] (i>0)    {(x,d) | (a_tail,x,d) ∈ S}
    (*a)[i] (i≥0)    {(x,P) | (a_head,x,d) ∈ S ∪ (a_tail,x,d) ∈ S}
(The R-Loc Set column of the second table is not present in the transcript.)

54 Emami Algorithm
Given a stmt S and an input set I, what is the points-to set O?
    if (!is_pointer_type(S))
        O = I;
    else { }

55 Emami’s dataflow framework
    else {
        /* kill all relationships of definite L-locations of lhs(S) */
        kill_set = {(p,x,d) | (p,D) ∈ L-locations(lhs(S))};
        /* change from definite to possible */
        change_set = …
        /* generate all possible relationships between lhs(S) and rhs(S);
           definite iff lhs(S) and rhs(S) are definite */
        gen_set = …
        O = ((I ∪ change_set) - kill_set) ∪ gen_set;
    }

56 Emami Example
S: *a = r;
[Figures: IN and kill_set, drawn as points-to graphs over a, b, c, r, s, x, y, z]

57 Emami Example
[Figures: change_set and gen_set, drawn as points-to graphs over a, b, c, r, s, x, y, z]

58 Emami Example
S: *a = r;
[Figures: IN and OUT, drawn as points-to graphs over a, b, c, r, s, x, y, z]

59 Emami Algorithm
Merge operation used to handle control flow and loops.
[Figure: two flow diagrams, one labeled IN = Merge(IN1, IN2) and one labeled IN = Merge(IN, OUT) with an OUT edge]
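The slides do not spell out Merge itself. A plausible reading, stated here as an assumption, is that a relationship stays definite only if it is definite on both incoming sets, and becomes possible if it appears on either side in any other way. The sketch below encodes a points-to set as a small matrix of NONE/POSSIBLE/DEFINITE entries; merge_points_to and merge_demo are hypothetical names.

```c
#define NLOC 4  /* number of abstract locations in this sketch */

enum rel { NONE = 0, POSSIBLE = 1, DEFINITE = 2 };

/* out[x][y] describes the relationship x -> y after the merge:
   DEFINITE only if definite in both inputs, POSSIBLE if present in
   at least one input otherwise, NONE if absent from both. */
void merge_points_to(unsigned char in1[NLOC][NLOC],
                     unsigned char in2[NLOC][NLOC],
                     unsigned char out[NLOC][NLOC]) {
    for (int x = 0; x < NLOC; x++) {
        for (int y = 0; y < NLOC; y++) {
            unsigned char a = in1[x][y], b = in2[x][y];
            if (a == DEFINITE && b == DEFINITE)
                out[x][y] = DEFINITE;
            else if (a != NONE || b != NONE)
                out[x][y] = POSSIBLE;
            else
                out[x][y] = NONE;
        }
    }
}

/* IN1 is the set S from slide 51: {(a,b,D), (b,d,P), (b,c,P)}.
   IN2 is a made-up second input: {(a,b,D), (b,c,D)}.
   Returns the merged relationship from location x to location y. */
int merge_demo(int x, int y) {
    enum { A, B, C, D };
    unsigned char in1[NLOC][NLOC] = {{0}};
    unsigned char in2[NLOC][NLOC] = {{0}};
    unsigned char out[NLOC][NLOC];
    in1[A][B] = DEFINITE; in1[B][D] = POSSIBLE; in1[B][C] = POSSIBLE;
    in2[A][B] = DEFINITE; in2[B][C] = DEFINITE;
    merge_points_to(in1, in2, out);
    return out[x][y];
}
```

In the demo only a -> b survives as definite; b -> c is demoted to possible because one branch was unsure of it, and b -> d is possible because it exists on one side only. For a loop, the same operation would be applied repeatedly to the loop's IN and OUT sets until IN stops changing.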

60 Concluding Remarks
Pointer analysis techniques:
- Address taken
- Andersen’s analysis
- Steensgaard
- Emami
Pointer analysis is a difficult problem that may never be solved. Does hardware support for data speculation make the analysis easier for the compiler?

61 The End… Any Comments or Questions?

