1
An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages
John Whaley, Monica S. Lam
Computer Systems Laboratory, Stanford University
September 18, 2002
2
September 18, 2002, SAS 2002, Slide 2
Background
- Andersen’s points-to analysis for C (1994)
  - Flow-insensitive, context-insensitive
  - Inclusion-based, more accurate than Steensgaard’s unification-based analysis
  - O(n^3), considered too slow to be practical
- CLA optimization to Andersen’s analysis (Heintze & Tardieu, PLDI’01)
  - Online caching/cycle elimination
  - Field-independent: 1.3M lines of code in 137s
3
Doing it for Java
- We want Andersen-level pointers for Java
- Naïve port of CLA algorithm:
  - Spec “compress” benchmark: 2+ hours!
  - Call graph accuracy: same as RTA (terrible)
- Our paper: how to do CLA for Java
  - Spec “compress” benchmark: 5 seconds!
  - JEdit (1371 classes): ~10 minutes!
  - Call graph accuracy: very good
4
Java vs. C: Virtual calls
- Java has many virtual calls
  - Accuracy of analysis strongly affects the number of call targets
  - More call targets lead to more code being analyzed and longer analysis times
5
Java vs. C: Treatment of Fields
- Field-independent: in o.f, use only o
  - Most C pointer analyses
  - Sound even for non-type-safe languages
- Field-based: in o.f, use only f
  - Very inaccurate, requires type safety
- Field-sensitive: in o.f, use both o and f
  - Strictly more accurate than field-independent or field-based
  - Essential for Java
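The three treatments differ only in which part of `o.f` names the abstract location that a store updates. The sketch below is a hypothetical illustration (the class `FieldModes` and its methods are invented here, not taken from the joeq sources) that applies the same three stores under each treatment.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration: how each field treatment names the abstract
// location written by a store "o.f = v". Not taken from the joeq sources.
final class FieldModes {
    enum Mode { FIELD_INDEPENDENT, FIELD_BASED, FIELD_SENSITIVE }

    // Returns the abstract location that a store to obj.field updates.
    static String key(Mode mode, String obj, String field) {
        switch (mode) {
            case FIELD_INDEPENDENT: return obj;               // use only o
            case FIELD_BASED:       return field;             // use only f
            default:                return obj + "." + field; // use both o and f
        }
    }

    // Applies stores given as {obj, field, value}; returns location -> stored values.
    static Map<String, Set<String>> run(Mode mode, String[][] stores) {
        Map<String, Set<String>> pts = new TreeMap<>();
        for (String[] s : stores) {
            pts.computeIfAbsent(key(mode, s[0], s[1]), k -> new TreeSet<>()).add(s[2]);
        }
        return pts;
    }

    public static void main(String[] args) {
        // a.f = X; a.g = Y; b.f = Z;
        String[][] stores = { {"a", "f", "X"}, {"a", "g", "Y"}, {"b", "f", "Z"} };
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + run(m, stores));
        }
    }
}
```

With these stores, the field-independent run merges `a.f` and `a.g`, the field-based run merges `a.f` and `b.f`, and only the field-sensitive run keeps three separate locations.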
6
Java vs. C: Local variables
- Local variables/stack locations are reused
- Flow insensitivity causes many false aliases
- Local flow sensitivity is necessary
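A small sketch of why reused locals hurt a flow-insensitive analysis. It assumes straight-line code in which every statement has the form `var = new Site`; the class `LocalReuse` and both methods are hypothetical names introduced for this illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: straight-line code only, each statement is "var = new Site".
final class LocalReuse {
    // Flow-insensitive: one points-to set per variable for the whole method body.
    static Set<String> flowInsensitive(String[][] assigns, String var) {
        Set<String> pts = new TreeSet<>();
        for (String[] a : assigns) {
            if (a[0].equals(var)) pts.add(a[1]);
        }
        return pts;
    }

    // Locally flow-sensitive: a strong update, so the latest assignment
    // before statement index 'pos' replaces earlier ones.
    static Set<String> flowSensitive(String[][] assigns, String var, int pos) {
        Set<String> pts = new TreeSet<>();
        for (int i = 0; i < pos; i++) {
            if (assigns[i][0].equals(var)) {
                pts.clear();
                pts.add(assigns[i][1]);
            }
        }
        return pts;
    }

    public static void main(String[] args) {
        // t = new A(); t = new B();   (the local t is reused)
        String[][] body = { {"t", "A"}, {"t", "B"} };
        System.out.println(flowInsensitive(body, "t"));
        System.out.println(flowSensitive(body, "t", 1));
        System.out.println(flowSensitive(body, "t", 2));
    }
}
```

The flow-insensitive view reports that `t` may point to both `A` and `B` everywhere, while the flow-sensitive view reports a single target at each program point.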
7
Our Contribution
- Andersen-style inclusion-based points-to analysis for Java, based on ideas from CLA
  - Field sensitivity
    - Tracks separate fields of separate objects
  - Uses “method summary graphs”
    - Sparse representation, uses local flow sensitivity
  - Optimizations
    - Caching across iterations, reducing redundant ops
  - Supports all features of Java
8
Algorithm Overview
- Intraprocedural: generate a sparse, flow-insensitive summary graph for each method
  - Based on access paths, uses local flow sensitivity
- Interprocedural: using summary graphs, build inclusion graph to obtain whole-program result
9
Method Summaries
- Sparse, flow-insensitive summary of the semantics of each method
  - Stores (writes) in method
  - Calls made by method and their parameters
  - Return values, thrown and caught exceptions
- Use a flow-sensitive technique to generate method summaries
  - Precisely model updates to stack and locals
10
Method Summary: Example
Code for method foo:
  static void foo(C x, C y) {
    C t = x.f;
    t.g = y;
    x.g = x;
    t.bar(y);
  }
Summary for method foo:
[Figure: summary graph with nodes x, y, x.f, edges labeled f and g, and the call bar(t, y). Legend: read edge, write edge, parameter map edge]
11
Node types
A node represents an object at run time.
- Concrete type nodes
  - Objects that have a known concrete type
  - new statements and constant objects
- Abstract nodes
  - Parameters, return values, dereferences
  - Interprocedural phase maps an abstract node to the set of concrete nodes it can represent
12
Edge types
- Read edge:
  - Created by load statements
  - Represent dereferences (access paths) of known locations
- Write edge:
  - Created by store statements
  - Represent references created by the method
[Figure: a read edge and a write edge, each labeled f]
13
Outgoing parameter map
- Records which nodes are passed as which parameters
- This is used in the interprocedural phase to match call sites to call targets
[Figure: summary graph with nodes x, y, x.f, edges labeled f and g, and the call site t.bar(y)]
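The node types, edge types and outgoing parameter map described on these slides can be pictured as a small data structure. This is a minimal sketch under assumptions: the classes `SummaryNode`, `CallSite` and `MethodSummary` are hypothetical and are not the joeq classes. It builds the summary for the `foo` example, where the local `t` disappears and is represented by the dereference node `x.f`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical classes for illustration; these are not the joeq data structures.
final class SummaryNode {
    final String name;
    final boolean concrete; // true for "new" sites and constants, false for abstract nodes
    final Map<String, Set<SummaryNode>> readEdges = new LinkedHashMap<>();
    final Map<String, Set<SummaryNode>> writeEdges = new LinkedHashMap<>();

    SummaryNode(String name, boolean concrete) {
        this.name = name;
        this.concrete = concrete;
    }

    void addRead(String field, SummaryNode target) {
        readEdges.computeIfAbsent(field, k -> new LinkedHashSet<>()).add(target);
    }

    void addWrite(String field, SummaryNode target) {
        writeEdges.computeIfAbsent(field, k -> new LinkedHashSet<>()).add(target);
    }

    @Override public String toString() { return name; }
}

final class CallSite {
    final String method;
    final List<SummaryNode> args; // outgoing parameter map, receiver first

    CallSite(String method, List<SummaryNode> args) {
        this.method = method;
        this.args = args;
    }
}

final class MethodSummary {
    final List<SummaryNode> params = new ArrayList<>();
    final List<CallSite> calls = new ArrayList<>();

    // Summary for: static void foo(C x, C y) { C t = x.f; t.g = y; x.g = x; t.bar(y); }
    static MethodSummary forFoo() {
        MethodSummary s = new MethodSummary();
        SummaryNode x = new SummaryNode("x", false);
        SummaryNode y = new SummaryNode("y", false);
        SummaryNode xf = new SummaryNode("x.f", false); // dereference node standing for t
        s.params.add(x);
        s.params.add(y);
        x.addRead("f", xf);   // t = x.f  -> read edge x --f--> x.f
        xf.addWrite("g", y);  // t.g = y  -> write edge x.f --g--> y
        x.addWrite("g", x);   // x.g = x  -> write edge x --g--> x
        s.calls.add(new CallSite("bar", Arrays.asList(xf, y))); // t.bar(y)
        return s;
    }

    public static void main(String[] args) {
        MethodSummary s = forFoo();
        System.out.println("call " + s.calls.get(0).method + s.calls.get(0).args);
    }
}
```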
14
Generating method summary
- Worklist data flow solver (flow-sensitive)
- Strong updates on locals, weak on others
- Detect and close cycles in access paths
- More detail in the paper
15
Review: Andersen’s Points-to
- Points-to is encoded as inclusion relations
- x = y implies x ⊇ y
- x ⊇ y is also written as an inclusion edge between x and y in the graph
16
Review: Andersen’s Points-to

  Rule name           If code contains:   Apply rule:
  Store               x.f = e;            x ⊇ new y  implies  new y.f ⊇ e
  Load                e = x.f;            x ⊇ new y  implies  e ⊇ new y.f
  Copy                e1 = e2;            e1 ⊇ e2
  Transitive closure                      e1 ⊇ e2, e2 ⊇ e3  implies  e1 ⊇ e3
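The Store, Load and Copy rules can be run as a naive fixpoint over explicit points-to sets. The sketch below is not the talk's algorithm (which avoids full propagation); it is a baseline under assumptions, with a hypothetical class `Andersen` in which heap locations are named `site.field` and transitive closure falls out of repeating the rules until nothing changes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical naive solver for the rules above; not the CLA-style algorithm.
final class Andersen {
    // Maps a variable, or a heap location "site.field", to allocation sites.
    final Map<String, Set<String>> pts = new HashMap<>();
    final List<String[]> copies = new ArrayList<>(); // {dst, src}:         dst = src
    final List<String[]> loads = new ArrayList<>();  // {dst, base, field}: dst = base.field
    final List<String[]> stores = new ArrayList<>(); // {base, field, src}: base.field = src

    Set<String> get(String loc) {
        return pts.computeIfAbsent(loc, k -> new TreeSet<>());
    }

    void alloc(String var, String site) {
        get(var).add(site);
    }

    // Applies Copy, Load and Store until no points-to set grows.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : copies) {
                changed |= get(c[0]).addAll(get(c[1]));
            }
            for (String[] l : loads) {
                for (String site : new ArrayList<>(get(l[1]))) {
                    changed |= get(l[0]).addAll(get(site + "." + l[2]));
                }
            }
            for (String[] s : stores) {
                for (String site : new ArrayList<>(get(s[0]))) {
                    changed |= get(site + "." + s[1]).addAll(get(s[2]));
                }
            }
        }
    }

    public static void main(String[] args) {
        Andersen a = new Andersen();
        // Assumed initial facts: x -> C, y -> E, C.f -> D.
        a.alloc("x", "C");
        a.alloc("y", "E");
        a.get("C.f").add("D");
        a.loads.add(new String[] {"t", "x", "f"});  // t = x.f;
        a.stores.add(new String[] {"t", "g", "y"}); // t.g = y;
        a.stores.add(new String[] {"x", "g", "x"}); // x.g = x;
        a.solve();
        System.out.println("t=" + a.get("t") + " D.g=" + a.get("D.g") + " C.g=" + a.get("C.g"));
    }
}
```

The initial facts in `main` are an assumption made for the example; they are chosen so that the three statements match the code used in the slides.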
17
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
[Figure: summary graph with nodes x, y, x.f and edges labeled f and g]
18
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
[Figure: summary graph (nodes x, y, x.f; edges f, g) alongside concrete nodes C, D, E and an f edge]
19
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
Rule applied: Load. If code contains e = x.f; and x ⊇ new y, then e ⊇ new y.f.
[Figure: summary graph (nodes x, y, x.f; edges f, g) alongside concrete nodes C, D, E and an f edge]
20
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
Rule applied: Load. If code contains e = x.f; and x ⊇ new y, then e ⊇ new y.f.
[Figure: summary graph (nodes x, y, x.f; edges f, g) alongside concrete nodes C, D, E and an f edge]
21
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
Rule applied: Store. If code contains x.f = e; and x ⊇ new y, then new y.f ⊇ e.
[Figure: summary graph (nodes x, y, x.f; edges f, g) alongside concrete nodes C, D, E and an f edge]
22
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
Rule applied: Store. If code contains x.f = e; and x ⊇ new y, then new y.f ⊇ e.
[Figure: summary graph (nodes x, y, x.f; edges f, g) alongside concrete nodes C, D, E with an f edge and a new g edge]
23
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
Rule applied: Store. If code contains x.f = e; and x ⊇ new y, then new y.f ⊇ e.
[Figure: summary graph (nodes x, y, x.f; edges f, g) alongside concrete nodes C, D, E with an f edge and a new g edge]
24
Andersen example
  t = x.f;
  t.g = y;
  x.g = x;
Rule applied: Store. If code contains x.f = e; and x ⊇ new y, then new y.f ⊇ e.
[Figure: summary graph (nodes x, y, x.f; edges f, g) alongside concrete nodes C, D, E with an f edge and two g edges]
25
Mapping method calls
  t = x.f;
  t.g = y;
  x.g = x;
  t.bar(y);
[Figure: summary graph and concrete nodes C, D, E with f and g edges; call site t.bar(y)]
26
Mapping method calls
  t = x.f;
  t.g = y;
  x.g = x;
  t.bar(y);
[Figure: summary graph and concrete nodes C, D, E with f and g edges; call site t.bar(y)]
27
Mapping method calls
  t = x.f;
  t.g = y;
  x.g = x;
  t.bar(y);
[Figure: summary graph and concrete nodes C, D, E with f and g edges; call site t.bar(y) mapped to the target’s parameters “Bar: this” and “Bar: p1”]
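One way to picture the mapping step: for every concrete type the receiver may have, look up the target method and connect each actual argument to the matching formal parameter. The sketch below is hypothetical throughout (the class `CallMapper`, the class table, and the edge string format are invented here), and it ignores inherited methods for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of matching a call site to its targets; not joeq code.
final class CallMapper {
    // Class table: concrete type -> method name -> formal parameter names.
    final Map<String, Map<String, List<String>>> methods = new HashMap<>();
    // Inclusion edges recorded as text: "Type.method:formal >= actual".
    final Set<String> edges = new TreeSet<>();

    void declare(String type, String method, String... formals) {
        methods.computeIfAbsent(type, k -> new HashMap<>()).put(method, Arrays.asList(formals));
    }

    // Maps the actuals (receiver first) onto the formals of each possible target.
    void mapCall(Set<String> receiverTypes, String method, List<String> actuals) {
        for (String type : receiverTypes) {
            Map<String, List<String>> declared =
                    methods.getOrDefault(type, Collections.<String, List<String>>emptyMap());
            List<String> formals = declared.get(method);
            if (formals == null || formals.size() != actuals.size()) {
                continue; // no such method on this type (inheritance is not modeled)
            }
            for (int i = 0; i < formals.size(); i++) {
                edges.add(type + "." + method + ":" + formals.get(i) + " >= " + actuals.get(i));
            }
        }
    }

    public static void main(String[] args) {
        CallMapper m = new CallMapper();
        m.declare("D", "bar", "this", "p1");
        // t.bar(y), where t is the node x.f and its only concrete type is assumed to be D.
        m.mapCall(Collections.singleton("D"), "bar", Arrays.asList("x.f", "y"));
        System.out.println(m.edges);
    }
}
```

A more precise receiver set yields fewer targets and therefore fewer edges, which is why analysis accuracy and analysis time are linked for Java.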
28
Overall Picture
[Figure: nodes C, D, E, F; regions labeled “Concrete” world and “Abstract” world]
29
Graph-based Andersen
- Computing full transitive closure is prohibitively expensive
- Store the graph in pre-transitive form, and calculate reachable nodes on demand
30
Algorithm
  foreach write edge e1 → e2 (labeled f) do
    foreach n in getConcreteNodes(e1)
      add write edge n.f → e2
  foreach read edge e1 → e2 (labeled f) do
    foreach n in getConcreteNodes(e1)
      add inclusion edge e2 ⊇ n.f
  foreach method call e1.f()
    foreach n in getConcreteNodes(e1)
      add parameter mappings for target method
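A minimal sketch of this loop over a pre-transitive inclusion graph, under stated assumptions: the class `InclusionGraph` is hypothetical, nodes are plain strings, a write edge on a concrete node is modeled as the inclusion "n.f includes e2", and the method-call step is left out. Reachable concrete nodes are computed on demand by a graph search instead of being propagated.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the slide's loop; names are illustrative, not joeq's.
final class InclusionGraph {
    final Set<String> concrete = new HashSet<>();
    // includes.get(a) holds every b such that a includes b (pre-transitive form).
    final Map<String, Set<String>> includes = new HashMap<>();
    final List<String[]> readEdges = new ArrayList<>();  // {e1, f, e2}: e2 stands for e1.f
    final List<String[]> writeEdges = new ArrayList<>(); // {e1, f, e2}: e1.f = e2

    boolean addInclusion(String a, String b) {
        return includes.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    }

    // On-demand reachability: concrete nodes reachable from e along inclusion edges.
    Set<String> getConcreteNodes(String e) {
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(e);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (!seen.add(n)) continue;
            if (concrete.contains(n)) result.add(n);
            for (String m : includes.getOrDefault(n, Collections.<String>emptySet())) {
                stack.push(m);
            }
        }
        return result;
    }

    // Repeats the write-edge and read-edge steps until no edge is added.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] w : writeEdges) {
                for (String n : getConcreteNodes(w[0])) {
                    changed |= addInclusion(n + "." + w[1], w[2]);
                }
            }
            for (String[] r : readEdges) {
                for (String n : getConcreteNodes(r[0])) {
                    changed |= addInclusion(r[2], n + "." + r[1]);
                }
            }
        }
    }

    public static void main(String[] args) {
        InclusionGraph g = new InclusionGraph();
        g.concrete.addAll(java.util.Arrays.asList("C", "D", "E"));
        g.addInclusion("x", "C");   // assumed: x may be C
        g.addInclusion("y", "E");   // assumed: y may be E
        g.addInclusion("C.f", "D"); // assumed: C.f may be D
        g.readEdges.add(new String[] {"x", "f", "x.f"});  // t = x.f;
        g.writeEdges.add(new String[] {"x.f", "g", "y"}); // t.g = y;
        g.writeEdges.add(new String[] {"x", "g", "x"});   // x.g = x;
        g.solve();
        System.out.println(g.getConcreteNodes("x.f") + " " + g.getConcreteNodes("D.g"));
    }
}
```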
31
Caching reachability queries
- getConcreteNodes(e): transitive closure query on the inclusion graph
- The same queries are repeated many times
- Store the result in a hash table
  - Cached result may be stale due to edges added since the last query
  - Iterate until convergence
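A sketch of the caching idea, assuming the simplest possible policy: answers are stored in a hash table, returned even when stale, and discarded when a new iteration starts. The class `CachedReachability`, the `misses` counter and `newIteration()` are hypothetical names for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a reachability cache; not the joeq implementation.
final class CachedReachability {
    final Set<String> concrete = new HashSet<>();
    final Map<String, Set<String>> includes = new HashMap<>();
    final Map<String, Set<String>> cache = new HashMap<>();
    int misses = 0; // number of queries that had to traverse the graph

    void addInclusion(String a, String b) {
        includes.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    }

    // Returns the cached answer if present, even if edges were added since.
    Set<String> getConcreteNodes(String e) {
        Set<String> hit = cache.get(e);
        if (hit != null) return hit;
        misses++;
        Set<String> result = compute(e);
        cache.put(e, result);
        return result;
    }

    // Uncached transitive-closure query over the inclusion graph.
    private Set<String> compute(String e) {
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(e);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (!seen.add(n)) continue;
            if (concrete.contains(n)) result.add(n);
            for (String m : includes.getOrDefault(n, Collections.<String>emptySet())) {
                stack.push(m);
            }
        }
        return result;
    }

    // Starts a new iteration: stale answers are dropped and recomputed on demand.
    void newIteration() {
        cache.clear();
    }

    public static void main(String[] args) {
        CachedReachability c = new CachedReachability();
        c.concrete.add("C");
        c.concrete.add("D");
        c.addInclusion("x", "C");
        System.out.println(c.getConcreteNodes("x"));
        c.addInclusion("x", "D");                    // edge added after the query
        System.out.println(c.getConcreteNodes("x")); // stale answer from the cache
        c.newIteration();
        System.out.println(c.getConcreteNodes("x")); // recomputed
    }
}
```

Because a stale answer can only be missing nodes, never contain wrong ones, repeating the outer loop until nothing changes still reaches the full result.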
32
Online cycle detection
- Inclusion graph includes cycles
- The algorithm collapses cycles as they are traversed
  - During traversal, keeps track of current path
  - If a node on current path is revisited, collapse all nodes in cycle
  - Each node has a “skip” pointer, which is set when collapsed and followed on all accesses
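The traversal with cycle collapsing might look like the following sketch. It is an assumption-laden illustration rather than the authors' code: `CNode`, `CycleCollapser` and the `collapsed` counter are hypothetical, and collapsed nodes are merged into the earliest node of the cycle on the current path.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical node type; equality is object identity.
final class CNode {
    final String name;
    CNode skip;                                         // set when collapsed into another node
    final Set<CNode> succ = new LinkedHashSet<>();      // inclusion edges: this includes succ
    final Set<String> concreteNames = new TreeSet<>();  // concrete nodes merged into this one

    CNode(String name, boolean concrete) {
        this.name = name;
        if (concrete) concreteNames.add(name);
    }

    // Follows skip pointers to the representative of a collapsed cycle.
    CNode rep() {
        CNode n = this;
        while (n.skip != null) n = n.skip;
        return n;
    }
}

final class CycleCollapser {
    int collapsed = 0; // number of nodes merged into a representative so far

    Set<String> getConcreteNodes(CNode start) {
        Set<String> out = new TreeSet<>();
        visit(start.rep(), new ArrayList<CNode>(), new HashSet<CNode>(), out);
        return out;
    }

    private void visit(CNode n, List<CNode> path, Set<CNode> done, Set<String> out) {
        path.add(n);
        out.addAll(n.concreteNames);
        for (CNode s : new ArrayList<>(n.succ)) { // snapshot: succ may grow on collapse
            CNode r = s.rep();
            int idx = path.indexOf(r);
            if (idx >= 0) {
                // r is on the current path: everything after it lies on a cycle with r.
                for (int k = idx + 1; k < path.size(); k++) {
                    collapseInto(path.get(k).rep(), r);
                }
            } else if (!done.contains(r)) {
                visit(r, path, done, out);
            }
        }
        path.remove(path.size() - 1);
        done.add(n);
    }

    private void collapseInto(CNode m, CNode r) {
        if (m == r) return;
        m.skip = r;
        r.succ.addAll(m.succ);
        r.concreteNames.addAll(m.concreteNames);
        collapsed++;
    }

    public static void main(String[] args) {
        CNode a = new CNode("a", false), b = new CNode("b", false), c = new CNode("c", false);
        CNode k = new CNode("K", true);
        a.succ.add(b); b.succ.add(c); c.succ.add(a); c.succ.add(k); // cycle a -> b -> c -> a
        CycleCollapser cc = new CycleCollapser();
        System.out.println(cc.getConcreteNodes(a) + " collapsed=" + cc.collapsed);
    }
}
```

After the first query the cycle is a single node, so a later query starting from `b` or `c` follows the skip pointer to `a` and does not walk the cycle again.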
33
Reusing caches
- Concrete node cache values don’t change much between algorithm iterations
- Reallocating and rebuilding them is expensive
- Reuse caches from old iterations
  - Keep track of an iteration ‘version’ number for each cache entry
34
Minimizing set union operations
- Many caches don’t change across iterations
- Avoid set union operations for caches that haven’t changed since the last iteration
  - Keep a ‘changed’ flag for each cache entry that records whether the last computation changed the entry
  - If the input set hasn’t changed, the set union operation is redundant
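Cache reuse and union skipping can be sketched together. This is one possible reading, not the implementation from the paper: the hypothetical class `VersionedCache` keeps entries alive across iterations and, as a simplifying assumption, fuses the version number and the changed flag into a single field `changedAt` holding the iteration in which an entry last grew. A union is skipped when its input has not grown since the previous iteration and the edge has been merged before.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; 'changedAt' stands in for both the version number
// and the changed flag described on the slides.
final class VersionedCache {
    static final class CacheEntry {
        final Set<String> nodes = new TreeSet<>();  // cached concrete nodes
        final Set<String> merged = new HashSet<>(); // successors unioned at least once
        int changedAt = 0;                          // iteration in which 'nodes' last grew
    }

    final Map<String, Set<String>> includes = new LinkedHashMap<>();
    final Map<String, CacheEntry> cache = new HashMap<>(); // reused across iterations
    int iteration = 0;
    int unions = 0;  // unions actually performed
    int skipped = 0; // unions avoided because the input had not changed

    CacheEntry entry(String n) {
        return cache.computeIfAbsent(n, k -> new CacheEntry());
    }

    void addConcrete(String n) {
        CacheEntry e = entry(n);
        e.nodes.add(n);
        e.changedAt = iteration;
    }

    void addInclusion(String a, String b) {
        includes.computeIfAbsent(a, k -> new HashSet<>()).add(b);
    }

    // One pass over all inclusion edges; returns true if any entry grew.
    boolean iterate() {
        iteration++;
        boolean any = false;
        for (Map.Entry<String, Set<String>> e : includes.entrySet()) {
            CacheEntry dst = entry(e.getKey());
            for (String m : e.getValue()) {
                CacheEntry src = entry(m);
                boolean fresh = dst.merged.add(m);
                if (!fresh && src.changedAt < iteration - 1) {
                    skipped++; // input unchanged since the last iteration: union is redundant
                    continue;
                }
                unions++;
                if (dst.nodes.addAll(src.nodes)) {
                    dst.changedAt = iteration;
                    any = true;
                }
            }
        }
        return any;
    }

    void solve() {
        while (iterate()) { }
    }

    public static void main(String[] args) {
        VersionedCache v = new VersionedCache();
        v.addConcrete("K");
        v.addInclusion("a", "b");
        v.addInclusion("b", "K");
        v.solve();
        System.out.println(v.entry("a").nodes + " unions=" + v.unions + " skipped=" + v.skipped);
    }
}
```

Entries are never reallocated: when new edges or concrete nodes are added and `solve()` runs again, only the sets downstream of the change are unioned.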
35
Experimental Results
- Concrete type inference
- Static call graph
- Implemented in ~800 lines of Java
- Freely available at: http://joeq.sourceforge.net
36
Programs
- SpecJVM
  - Standard benchmark suite
- J2EE: Java 2 Enterprise Edition v1.3
  - Massive (1+ million lines) business framework
- joeq
  - Compiler infrastructure, 75K lines
- Cloudscape
  - Database shipped with J2EE, no source code
- JEdit
  - Full-featured editor, 100K lines
37
Experimental Results
- We analyzed the reachable code for each application
  - Results include code in class library
  - Analysis was very effective in reducing total program size
- Pentium 4 2GHz, 2GB RAM, Redhat 7.2
- Sun JDK 1.3.1_01 with 512MB heap
38
Analysis Precision vs. RTA
39
Analysis time: Small benchmarks
40
Analysis time: Large benchmarks
41
Analysis time (speedup)
42
Analysis time (bytecodes/second)
43
Related Work
- Original CLA paper
  - Heintze and Tardieu (PLDI 2001)
- Andersen’s analysis for Java
  - Rountev, Milanova, Ryder (OOPSLA 2001)
  - Liang, Pennings, Harrold (PASTE 2001)
  - Many others…
- Concrete type inference
  - CHA, RTA
  - Flow and context sensitivity, 0-CFA
44
Conclusion
- Improved precision
  - Field sensitivity
  - Local flow sensitivity
- Improved efficiency
  - Reuse reachability cache across iterations
  - Minimize set-union operations
- Scales to the largest Java programs
- A new baseline for Java pointers
  - No reason to use a less precise analysis