Fast Points-to Analysis for Languages with Structured Types

Michael Jung and Sorin A. Huss
Integrated Circuits and Systems Lab.
Department of Computer Science
Technische Universität Darmstadt, Germany

Outline

A points-to analysis is applied to answer questions like:
"Which variables might be accessed by the expression *a->b?"

- Motivation: why is this of benefit to know?
  - Code Motion
  - Partial Evaluation
- Design decisions for a points-to analysis
- Concepts of Steensgaard's points-to analyses
- Differences between the proposed and the original PTA
- Results

Motivation – Code Motion

Source:
    *a = b * c;
    d = *e + f;

Straightforward schedule:
    MULT  r1, b, c
    STORE r1, (a)
    LOAD  r2, (e)
    ADD   d, r2, f
- RAW hazard on r2: LOAD is multi-cycle, so the pipeline stalls before ADD

Optimize CPU utilization by code motion (LOAD hoisted to the top):
    LOAD  r2, (e)
    MULT  r1, b, c
    STORE r1, (a)
    ADD   d, r2, f

Counterexample:
    a = &g; e = &g;
    *a = b * c;
    d = *e + f;
- RAW violation: the hoisted LOAD reads g before the STORE has written it

- If the sets of locations a and e may point to are disjoint → the optimization is correct
- Points-to analysis computes a conservative approximation of these sets
- ANSI/ISO C99 restrict type qualifier (the programmer asserts that p and q do not alias):
    void f(int n, int * restrict p, int * restrict q) { while (n--) *p++ = *q++; }
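
The aliasing hazard on this slide can be reproduced in plain C. The following is a minimal sketch, not taken from the talk: the function names `store_then_load` and `load_then_store` are hypothetical and simply model the two instruction orders, with the hoisted load written out by hand.

```c
#include <assert.h>

/* Original order: store through a first, then load through e. */
int store_then_load(int *a, int *e, int b, int c, int f) {
    assert(a && e);      /* illustration only: both pointers must be valid */
    *a = b * c;          /* MULT + STORE */
    return *e + f;       /* LOAD + ADD   */
}

/* Reordered: the load through e is hoisted above the store through a. */
int load_then_store(int *a, int *e, int b, int c, int f) {
    assert(a && e);
    int r2 = *e;         /* LOAD (hoisted) */
    *a = b * c;          /* MULT + STORE   */
    return r2 + f;       /* ADD            */
}
```

With disjoint targets both functions return the same value. When a and e both point to g, the reordered version reads the old value of g, which is why a compiler may only perform the motion if a points-to analysis (or a restrict qualifier) rules out the alias.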

Motivation – Partial Evaluation

Original program (x: dynamic, n: static):
    int power(int x, unsigned n) {
      int r=1;
      while (n) {
        if (n&1) r=r*x;
        x=x*x;
        n=n>>1;
      }
      return r;
    }

Generating extension:
    void power_gen(unsigned n) {
      printf("void power(int x, unsigned n) {\n");
      printf("  int r=1;\n");
      printf("  assert(n==%u);\n", n);
      while (n) {
        if (n&1) printf("  r=r*x;\n");
        printf("  x=x*x;\n");
        n=n>>1;
      }
      printf("  return r;\n}");
    }

Residual program for n = 3:
    void power(int x, unsigned n) {
      int r=1;
      assert(n==3);
      r=r*x;
      x=x*x;
      r=r*x;
      x=x*x;
      return r;
    }

Relation to points-to analysis: can the expression *y be evaluated statically?

Storage Shape Graphs

    int *p, a, b;
    if (a) { p = &a; }
    else   { p = &b; }
    b = *p;

[Figure: storage shape graph in which p points to a and b]   PT( *p ) = { a, b }

- PTA computes a storage shape graph
- Undecidable in general -> conservative approximation
- Tradeoff between efficiency and accuracy:
  - flow sensitivity: one graph per program point, PT( *p ) = { a } in one branch and PT( *p ) = { b } in the other
  - context sensitivity:
        int *p, a, b;
        if (a) { p = &a; f(p); }
        else   { p = &b; f(p); }
        b = *p;
  - complexity of the storage shape graph: p points to a single merged node a,b, so PT( *p ) = { a, b }

Steensgaard's first algorithm [1]:
- flow insensitive
- context insensitive
- graph complexity O(n)
- almost linear time complexity
- 80/20 rule: 80% benefit, 20% cost [2]

[1] B. Steensgaard. Points-to analysis in almost linear time. In Symposium on Principles of Programming Languages, 1996.
[2] M. Hind and A. Pioli. Which pointer analysis should I use? In International Symposium on Software Testing and Analysis, 2000.
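
Concepts of Steensgaard's PTA

Keeping the storage shape graph O(N)
[Figure: storage shape graph over a, b, c, d, e; processing a = &d joins nodes b and d, and in turn c and e]
- Disjoint-set forests [3] => Join is O(α(N,N)) => Analysis is O(N α(N,N))

Data flow direction (Pending Joins)
    a = &b;
    a = c;
    c = &d;
[Figure: while c still points to NIL, the Join( ) for a = c is kept pending; once c = &d is processed, b and d are joined]

[3] R. Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 1975.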
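
The join mechanism can be sketched with a small disjoint-set forest in C. This is an illustrative simplification under stated assumptions, not the algorithm from the talk or from Steensgaard's paper: all names (`new_node`, `find`, `join`, `target`, `assign_addr`, `assign_copy`) are hypothetical, the node table has a fixed size, and pending joins are replaced by eagerly creating a placeholder target node, which is simpler but can be less precise for non-pointer assignments.

```c
#include <assert.h>

#define MAX_NODES 64
#define NONE (-1)

static int parent[MAX_NODES];   /* disjoint-set forest                   */
static int rank_[MAX_NODES];    /* union-by-rank bookkeeping             */
static int pointee[MAX_NODES];  /* points-to edge, valid at class roots  */
static int node_count = 0;

int new_node(void) {
    assert(node_count < MAX_NODES);
    int n = node_count++;
    parent[n] = n; rank_[n] = 0; pointee[n] = NONE;
    return n;
}

int find(int n) {
    while (parent[n] != n) {
        parent[n] = parent[parent[n]];  /* path halving */
        n = parent[n];
    }
    return n;
}

/* Merge two classes; if both have a target, merge the targets as well. */
void join(int x, int y) {
    x = find(x); y = find(y);
    if (x == y) return;
    if (rank_[x] < rank_[y]) { int t = x; x = y; y = t; }
    if (rank_[x] == rank_[y]) rank_[x]++;
    parent[y] = x;
    int px = pointee[x], py = pointee[y];
    if (px == NONE) pointee[x] = py;
    else if (py != NONE) join(px, py);
}

/* Class that n points to; created on demand (assumption: no pending joins). */
int target(int n) {
    n = find(n);
    if (pointee[n] == NONE) pointee[n] = new_node();
    return find(pointee[n]);
}

void assign_addr(int a, int b) { join(target(a), b); }          /* a = &b */
void assign_copy(int a, int b) { join(target(a), target(b)); }  /* a = b  */
```

Feeding the statements a = &b; a = c; c = &d; into this sketch places b and d in the same class, matching the merged node b,d on the slide, while a and c themselves stay separate.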

Points-to Analysis Comparison I

Example with inconsistent access:
    s->a = &a;
    s->b = &b;
    *(int *)s = c;

Steensgaard's analysis [4]
- 4 types of nodes: blank, simple, struct, object
[Figure: the inconsistent access turns the struct node for s into an object node, merging a and b into one node a,b]

Proposed analysis
- Single kind of representation: node with fields
- Contra: memory layout dependent
- Pro: conceptually simpler (algorithm less complex)
- Pro: more precise in case of inconsistent access

[4] B. Steensgaard. Points-to analysis by type inference of programs with structures and unions. In International Conference on Compiler Construction, 1996.

Points-to Analysis Comparison II

    struct { int *a, *b, *c; } s;
    int **d, f, g, h;
    s.a = &f;
    s.b = &g;
    s.c = &h;
    d = &s.a;
    d = &s.b;

[Figure: step-by-step storage shape graphs over d, s, f, g, h for both analyses]

Steensgaard's analysis [4]: Points-to( **d ) = { f, g, h }
Proposed analysis:          Points-to( **d ) = { f, g }

1. Graph initialization: one node per variable
   - Join nodes as necessary
   - Establish links
2. Iterate over statements

Data Structures

Storage shape graphs are composed of:
- Abstract locations
- Fields
- Field extents
- Pointer offset ranges

Relations

- Intervals-overlap relation:
- Sub-interval relation:
- Field inclusion:
- Assignment data flow: *a = s *b;

[Figure: field extents versus pointer offset ranges, shown for two of the relations]
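
The two interval relations named on this slide can be illustrated with a small C sketch. The slide's own definitions are not reproduced here; the sketch assumes that field extents and pointer offset ranges are half-open byte ranges [lo, hi), and the names `interval`, `make_interval`, `intervals_overlap` and `sub_interval` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a field extent or pointer offset range is a half-open
 * byte range [lo, hi) relative to the start of an abstract location. */
typedef struct { unsigned lo, hi; } interval;

interval make_interval(unsigned lo, unsigned hi) {
    assert(lo < hi);                 /* reject empty or inverted ranges */
    interval i = { lo, hi };
    return i;
}

/* True if the two ranges share at least one byte. */
bool intervals_overlap(interval a, interval b) {
    return a.lo < b.hi && b.lo < a.hi;
}

/* True if a lies completely inside b. */
bool sub_interval(interval a, interval b) {
    return b.lo <= a.lo && a.hi <= b.hi;
}
```

As a usage example, assume 8-byte pointers so that the fields a, b, c of the struct from Comparison II occupy [0,8), [8,16) and [16,24). A pointer offset range [0,16) then overlaps the extents of a and b but not of c, which is the kind of distinction that lets a field-aware analysis report { f, g } instead of { f, g, h }.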

Constraints Deduction Example

    x = *y;

Results

Twelve benchmarks, taken from Todd Austin's benchmarks as well as from the SPEC92 benchmarks. Excerpt:

    benchmark   LOC    number of variables aliased
    bc          6,
    espresso    11,
    li          7,

- The benchmarks are those commonly used for points-to analyses
- Hard to compare: Steensgaard's second paper does not report this kind of result

Conclusions

We propose an improvement to Steensgaard's PTA. We are confident that, compared with the original, it is more precise, as fast, and conceptually simpler and thus easier to implement.

Thanks for your attention!