University of Massachusetts, Amherst
Department of Computer Science

Emery Berger
University of Massachusetts, Amherst

Advanced Compilers
CMPSCI 710, Spring 2003

Pointer Analysis

Pointer Analysis (= alias analysis, = points-to analysis)
- Goal: statically determine the possible runtime values of a pointer
- Necessary for optimizations and error detection
- Vital for C and C++
  - but: undecidable
- Luckily: good approximations exist
  - Trade-off between efficiency and precision
- Result: points-to sets

Points-to sets
    if ()
        p = &a;
    else
        p = &b;
    *p = 3;
- Points-to set of p at the store: { a, b }
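
The slide's fragment can be made runnable to see why the static answer has to be { a, b }. This is a minimal sketch: the function name `choose` and its parameter `cond` are hypothetical stand-ins for the condition the slide leaves blank, and `a` and `b` are assumed to be plain `int` globals.

```c
#include <stddef.h>

static int a, b;

/* Returns the address p holds after the branch.
 * `cond` stands in for the condition elided on the slide. */
int *choose(int cond) {
    int *p = NULL;
    if (cond)
        p = &a;   /* on this path p points to a */
    else
        p = &b;   /* on this path p points to b */
    *p = 3;       /* writes a or b, depending on the path taken */
    return p;
}
```

Because the analysis cannot know `cond` at compile time, it has to assume that the store `*p = 3` may write either `a` or `b`.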

Analysis Sensitivity
- Flow-insensitive
  - Ignores statement order: what may happen (on at least one path)
  - As cheap as linear time
- Flow-sensitive
  - Considers control flow (what must happen)
  - Iterative data-flow: possibly exponential
- Context-insensitive
  - A call is treated the same regardless of caller
  - "Monovariant" analysis
- Context-sensitive
  - Reanalyze the callee for each caller
  - "Polyvariant" analysis
- More sensitivity ⇒ more accuracy, but more expense

Flow-Sensitive Pointer Analysis
- Too bad it's exponential…

Flow-Insensitivity: Dimensions
- Address-taken
  - Any object whose address is taken = potential alias
  - Brain-dead, but widely used in practice
- Equality-based
  - Treat assignments as bidirectional
  - Steensgaard's
  - Almost linear
- Subset-based
  - Assignment is unidirectional
  - Andersen's
  - Polynomial complexity
  - Up to 30 times slower, but more precise

Steensgaard's Algorithm
    p = q;
[Figure: points-to graph over nodes p and q]
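
The equality-based idea can be sketched with a union-find structure in which every equivalence class has at most one pointee class, and an assignment merges the pointee classes of both sides. This is a simplified illustration, not Steensgaard's full algorithm: it has no functions, types, or struct fields, and every name in it (`uf_find`, `uf_join`, `deref`, `stmt_*`, `points_to`) is hypothetical.

```c
#include <assert.h>

#define MAX_NODES 64

static int parent[MAX_NODES];   /* union-find parent links */
static int pointee[MAX_NODES];  /* pointee node of each class representative, or -1 */
static int node_count = 0;

/* Create a fresh node (a variable or an anonymous pointee). */
int new_node(void) {
    assert(node_count < MAX_NODES);
    parent[node_count] = node_count;
    pointee[node_count] = -1;
    return node_count++;
}

/* Representative of x's class, with path halving. */
int uf_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Merge two classes, then recursively merge what they point to. */
void uf_join(int x, int y) {
    x = uf_find(x);
    y = uf_find(y);
    if (x == y) return;
    int px = pointee[x], py = pointee[y];
    parent[y] = x;
    if (px == -1) pointee[x] = py;
    else if (py != -1) uf_join(px, py);
}

/* Pointee class of x, created on demand. */
int deref(int x) {
    x = uf_find(x);
    if (pointee[x] == -1) {
        int fresh = new_node();
        pointee[x] = fresh;
    }
    return uf_find(pointee[x]);
}

void stmt_addr(int p, int q)  { uf_join(deref(p), q); }                /* p = &q */
void stmt_copy(int p, int q)  { uf_join(deref(p), deref(q)); }         /* p = q  */
void stmt_load(int p, int q)  { uf_join(deref(p), deref(deref(q))); }  /* p = *q */
void stmt_store(int p, int q) { uf_join(deref(deref(p)), deref(q)); }  /* *p = q */

/* Nonzero if x is in the class that p points to. */
int points_to(int p, int x) {
    int t = pointee[uf_find(p)];
    return t != -1 && uf_find(t) == uf_find(x);
}
```

With nodes for `a`, `b`, `p`, and `q`, the sequence `p = &a; q = &b; p = q;` ends with `a` and `b` in one class, so `q` is also reported as possibly pointing to `a`. That loss of precision is the price of treating the assignment as bidirectional.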

Andersen's Algorithm
    p = &q;
    p = q;
[Figure: points-to graphs for the two statements, over nodes p, q, r1, r2, r3]
- Statement forms handled:
    p = &q;
    p = q;
    p = *q;
    *p = q;

Andersen's Algorithm
    p = *q;
    *p = q;
[Figure: points-to graphs for the two statements, over nodes p, q, r1, r2, s1, s2, s3]
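
The four statement forms can be read as subset constraints and solved by iterating to a fixed point. The sketch below is a naive illustration under stated assumptions, not Andersen's published formulation: variables are numbered 0 to 63, each points-to set is a 64-bit mask, and the names `struct constraint`, `add_bits`, and `andersen_solve` are hypothetical. Practical solvers use a worklist and a constraint graph instead of rescanning every constraint.

```c
#include <assert.h>
#include <stdint.h>

enum kind {
    ADDR_OF,  /* lhs = &rhs */
    COPY,     /* lhs = rhs  */
    LOAD,     /* lhs = *rhs */
    STORE     /* *lhs = rhs */
};

struct constraint {
    enum kind kind;
    int lhs;
    int rhs;
};

/* OR bits into *set; return 1 if the set grew. */
static int add_bits(uint64_t *set, uint64_t bits) {
    uint64_t before = *set;
    *set |= bits;
    return *set != before;
}

/* On return, bit i of pts[v] is set if v may point to variable i. */
void andersen_solve(const struct constraint *cs, int n,
                    int nvars, uint64_t pts[]) {
    assert(nvars <= 64);
    for (int v = 0; v < nvars; v++) pts[v] = 0;

    int changed = 1;
    while (changed) {              /* repeat until no set grows */
        changed = 0;
        for (int i = 0; i < n; i++) {
            int p = cs[i].lhs, q = cs[i].rhs;
            switch (cs[i].kind) {
            case ADDR_OF:          /* {q} is a subset of pts(p) */
                changed |= add_bits(&pts[p], (uint64_t)1 << q);
                break;
            case COPY:             /* pts(q) is a subset of pts(p) */
                changed |= add_bits(&pts[p], pts[q]);
                break;
            case LOAD:             /* for each r in pts(q): pts(r) into pts(p) */
                for (int r = 0; r < nvars; r++)
                    if ((pts[q] >> r) & 1)
                        changed |= add_bits(&pts[p], pts[r]);
                break;
            case STORE:            /* for each r in pts(p): pts(q) into pts(r) */
                for (int r = 0; r < nvars; r++)
                    if ((pts[p] >> r) & 1)
                        changed |= add_bits(&pts[r], pts[q]);
                break;
            }
        }
    }
}
```

For `p = &a; q = &b; p = q;` this gives pts(p) = { a, b } while pts(q) stays { b }: the assignment flows in one direction only, which is where the extra precision over the equality-based approach comes from.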

Analysis Examples
- Compare points-to sets at S5:
- Address-taken
  - One set of objects for the whole program
  - {heap S1, heap S4, heap S6, heap S8, local, p, q}
- Steensgaard's
  - Unify objects pointed to by the same pointer into one
  - {heap S1, heap S4, heap S6, local}
- Andersen's
  - Don't merge objects
  - {heap S1, heap S4, local}

More Dimensions
- Heap modeling
  - Objects can be named by allocation site
    - Source line, n levels in the call stack
  - More sophisticated: shape analysis
- Aggregate modeling
  - Elements (of arrays, structs, unions) treated as distinct
  - Or collapsed into one object

Haven't We Solved This Problem Yet?
- From [Hind, 2001]:
  - Past 21 years: at least 75 papers and nine Ph.D. theses published on pointer analysis

Many Publications…

So Which Pointer Analysis is Best?
- Comparisons between algorithms are difficult
  - Size of points-to sets is an inadequate metric
    - Model the heap as one blob = one object for all heap pointers!
- Trade-offs are unclear
  - A faster pointer analysis can mean more objects = more time for the client analysis
  - A more precise analysis can reduce client analysis time!
- Idea: use the client to drive the pointer analyzer…