An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages John Whaley Monica S. Lam Computer Systems Laboratory Stanford University.

Slide 1
An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages
John Whaley, Monica S. Lam
Computer Systems Laboratory, Stanford University
September 18, 2002

Slide 2 (SAS 2002, September 18, 2002): Background
- Andersen’s points-to analysis for C (1994)
  - Flow-insensitive, context-insensitive
  - Inclusion-based, more accurate than Steensgaard’s unification-based analysis
  - O(n^3), considered too slow to be practical
- CLA optimization to Andersen’s analysis (Heintze & Tardieu, PLDI’01)
  - Online caching/cycle elimination
  - Field-independent: 1.3M lines of code in 137 s

Slide 3: Doing it for Java
- We want Andersen-level pointer information for Java
- Naïve port of the CLA algorithm:
  - Spec “compress” benchmark: 2+ hours!
  - Call graph accuracy: same as RTA (terrible)
- Our paper: how to do CLA for Java
  - Spec “compress” benchmark: 5 seconds!
  - JEdit (1371 classes): ~10 minutes!
  - Call graph accuracy: very good

Slide 4: Java vs. C: Virtual calls
- Java has many virtual calls
  - Accuracy of the analysis strongly affects the number of call targets
  - More call targets lead to more code being analyzed and longer analysis times

Slide 5: Java vs. C: Treatment of Fields
- Field-independent: in o.f, use only o
  - Most C pointer analyses
  - Sound even for non-type-safe languages
- Field-based: in o.f, use only f
  - Very inaccurate, requires type safety
- Field-sensitive: in o.f, use both o and f
  - Strictly more accurate than field-independent or field-based
  - Essential for Java
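The three field treatments can be compared with a minimal sketch. It is not code from the talk: the class `FieldModels`, the `Mode` enum, and the object names `o1`, `o2`, `A`, `B`, `C` are all hypothetical, chosen only to show which heap locations each model merges.

```java
import java.util.*;

class FieldModels {
    enum Mode { FIELD_INDEPENDENT, FIELD_BASED, FIELD_SENSITIVE }

    /** Heap location that the access obj.field is mapped to under each model. */
    static String key(Mode mode, String obj, String field) {
        switch (mode) {
            case FIELD_INDEPENDENT: return obj;               // in o.f, use only o
            case FIELD_BASED:       return field;             // in o.f, use only f
            default:                return obj + "." + field; // use both o and f
        }
    }

    /** Applies stores {obj, field, value}, then returns what a load of obj.field may see. */
    static Set<String> load(Mode mode, List<String[]> stores, String obj, String field) {
        Map<String, Set<String>> heap = new HashMap<>();
        for (String[] s : stores) {
            heap.computeIfAbsent(key(mode, s[0], s[1]), k -> new TreeSet<>()).add(s[2]);
        }
        return heap.getOrDefault(key(mode, obj, field), new TreeSet<>());
    }

    public static void main(String[] args) {
        // Hypothetical program: o1.f = A; o1.g = B; o2.f = C; then read o1.f.
        List<String[]> stores = Arrays.asList(
            new String[] {"o1", "f", "A"},
            new String[] {"o1", "g", "B"},
            new String[] {"o2", "f", "C"});
        for (Mode m : Mode.values()) {
            System.out.println(m + ": o1.f may hold " + load(m, stores, "o1", "f"));
        }
    }
}
```

In this toy program only the field-sensitive model reports that o1.f holds A alone; the other two each merge in one unrelated store.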

Slide 6: Java vs. C: Local variables
- Local variables/stack locations are reused
- Flow insensitivity causes many false aliases
- Local flow sensitivity is necessary
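A small sketch of why reused locals hurt a flow-insensitive analysis. The class `LocalFlow` and the statement encoding (`{local, object}` pairs in program order) are assumptions made for illustration, not part of the presented system.

```java
import java.util.*;

class LocalFlow {
    /** Flow-insensitive: every assignment to a local lands in one merged set. */
    static Map<String, Set<String>> flowInsensitive(String[][] assigns) {
        Map<String, Set<String>> pts = new HashMap<>();
        for (String[] a : assigns) {
            pts.computeIfAbsent(a[0], k -> new TreeSet<>()).add(a[1]);
        }
        return pts;
    }

    /**
     * Locally flow-sensitive: a later assignment replaces the earlier value.
     * Returns the points-to set of var as seen after each statement.
     */
    static List<Set<String>> flowSensitive(String[][] assigns, String var) {
        Map<String, Set<String>> pts = new HashMap<>();
        List<Set<String>> trace = new ArrayList<>();
        for (String[] a : assigns) {
            Set<String> fresh = new TreeSet<>();
            fresh.add(a[1]);
            pts.put(a[0], fresh); // overwrite instead of union
            Set<String> now = new TreeSet<>(pts.getOrDefault(var, Collections.<String>emptySet()));
            trace.add(now);
        }
        return trace;
    }
}
```

For the sequence `t = A; t = B;` the flow-insensitive result says t may be A or B everywhere, while the flow-sensitive trace keeps the two uses of the reused slot apart.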

Slide 7: Our Contribution
- Andersen-style inclusion-based points-to analysis for Java, based on ideas from CLA
  - Field sensitivity
    - Tracks separate fields of separate objects
  - Uses “method summary graphs”
    - Sparse representation, uses local flow sensitivity
  - Optimizations
    - Caching across iterations, reducing redundant ops
  - Supports all features of Java

Slide 8: Algorithm Overview
- Intraprocedural: Generate a sparse, flow-insensitive summary graph for each method
  - Based on access paths, uses local flow sensitivity
- Interprocedural: Using summary graphs, build inclusion graph to obtain whole-program result

Slide 9: Method Summaries
- Sparse, flow-insensitive summary of the semantics of each method
  - Stores (writes) in method
  - Calls made by method and their parameters
  - Return values, thrown and caught exceptions
- Use a flow-sensitive technique to generate method summaries
  - Precisely model updates to stack and locals

Slide 10: Method Summary: Example
Code for method foo:
    static void foo(C x, C y) {
        C t = x.f;
        t.g = y;
        x.g = x;
        t.bar(y);
    }
Summary for method foo:
[Figure: graph with nodes x, y, and x.f, edges labeled f and g, and the call bar(t, y). Legend: read edge, write edge, parameter map edge.]
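One way to picture the summary for foo is the sketch below. The class `MethodSummary`, its string encoding of edges, and the helper `forFoo` are hypothetical. The talk's actual summary builder is a data flow solver, whereas this version only walks straight-line code and tracks which node each local currently names.

```java
import java.util.*;

class MethodSummary {
    final List<String> readEdges = new ArrayList<>();
    final List<String> writeEdges = new ArrayList<>();
    final List<String> calls = new ArrayList<>();
    // Which summary node each local currently refers to (local flow sensitivity).
    private final Map<String, String> locals = new HashMap<>();

    MethodSummary(String... params) {
        for (String p : params) locals.put(p, p); // each parameter is its own node
    }

    /** dst = base.field: record a read edge and bind dst to the access-path node. */
    void load(String dst, String base, String field) {
        String baseNode = locals.get(base);
        String node = baseNode + "." + field;
        readEdges.add(baseNode + " -" + field + "-> " + node);
        locals.put(dst, node);
    }

    /** base.field = src: record a write edge between the current nodes. */
    void store(String base, String field, String src) {
        writeEdges.add(locals.get(base) + " -" + field + "-> " + locals.get(src));
    }

    /** Record which nodes are passed at a call site (receiver first). */
    void call(String method, String... args) {
        List<String> nodes = new ArrayList<>();
        for (String a : args) nodes.add(locals.get(a));
        calls.add(method + nodes);
    }

    /** Builds the summary for the method foo shown on the slide. */
    static MethodSummary forFoo() {
        MethodSummary s = new MethodSummary("x", "y");
        s.load("t", "x", "f");   // C t = x.f;
        s.store("t", "g", "y");  // t.g = y;
        s.store("x", "g", "x");  // x.g = x;
        s.call("bar", "t", "y"); // t.bar(y);
        return s;
    }
}
```

Note that the local t never appears in the result: it is replaced by the node x.f, which is what makes the summary sparse.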

Slide 11: Node types
A node represents an object at run time.
- Concrete type nodes
  - Objects that have a known concrete type
  - new statements and constant objects
- Abstract nodes
  - Parameters, return values, dereferences
  - Interprocedural phase maps an abstract node to the set of concrete nodes it can represent

Slide 12: Edge types
- Read edge:
  - Created by load statements
  - Represents dereferences (access paths) of known locations
- Write edge:
  - Created by store statements
  - Represents references created by the method
[Figure: an example read edge and an example write edge, each labeled f.]

Slide 13: Outgoing parameter map
- Records which nodes are passed as which parameters
- This is used in the interprocedural phase to match call sites to call targets
[Figure: summary graph for foo (nodes x, y, x.f; edges labeled f and g) with the call site t.bar(y).]

Slide 14: Generating method summary
- Worklist data flow solver (flow-sensitive)
- Strong updates on locals, weak on others
- Detect and close cycles in access paths
- More detail in the paper
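A worklist solver with strong updates on locals could look roughly like the following. This is a generic textbook-style sketch under simplifying assumptions, not the solver from the paper: the `Block` type, the `{local, object}` encoding of assignments, and the choice of block 0 as entry are all hypothetical, and heap stores and access-path cycles are left out.

```java
import java.util.*;

class WorklistSolver {
    /** A basic block: local assignments {local, object} plus successor block indices. */
    static final class Block {
        final String[][] assigns;
        final int[] succs;
        Block(String[][] assigns, int... succs) { this.assigns = assigns; this.succs = succs; }
    }

    /** Returns, for each block, the points-to sets of locals on entry to that block. */
    static List<Map<String, Set<String>>> solve(List<Block> cfg) {
        List<Map<String, Set<String>>> in = new ArrayList<>();
        for (int i = 0; i < cfg.size(); i++) in.add(new HashMap<>());
        boolean[] visited = new boolean[cfg.size()];
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.add(0); // block 0 is assumed to be the entry
        while (!worklist.isEmpty()) {
            int b = worklist.poll();
            visited[b] = true;
            // Transfer function: copy the entry state, then apply strong updates.
            Map<String, Set<String>> out = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : in.get(b).entrySet()) {
                out.put(e.getKey(), new TreeSet<>(e.getValue()));
            }
            for (String[] a : cfg.get(b).assigns) {
                out.put(a[0], new TreeSet<>(Collections.singleton(a[1])));
            }
            // Join: union the exit state into each successor's entry state.
            for (int s : cfg.get(b).succs) {
                boolean changed = !visited[s];
                for (Map.Entry<String, Set<String>> e : out.entrySet()) {
                    changed |= in.get(s).computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                                 .addAll(e.getValue());
                }
                if (changed) worklist.add(s);
            }
        }
        return in;
    }
}
```

On a diamond-shaped control flow graph where only one branch reassigns t, the join block sees both values while each branch sees only its own.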

Slide 15: Review: Andersen’s Points-to
- Points-to is encoded as inclusion relations
  - x = y implies x ⊇ y
  - x ⊇ y is also drawn as an inclusion edge between x and y

Slide 16: Review: Andersen’s Points-to
Rule name:          If code contains:   Apply rule:
Store               x.f = e;            x ⊇ new_y  implies  new_y.f ⊇ e
Load                e = x.f;            x ⊇ new_y  implies  e ⊇ new_y.f
Copy                e1 = e2;            e1 ⊇ e2
Transitive closure  (any)               e1 ⊇ e2, e2 ⊇ e3  implies  e1 ⊇ e3
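The rules can be exercised with a naive fixpoint solver. This is a plain textbook-style sketch, not the CLA-based algorithm the talk goes on to describe: the class `AndersenSolver`, the string names for abstract objects, and the `"obj.field"` key convention are assumptions, and transitive closure is realized implicitly by propagating sets until nothing changes.

```java
import java.util.*;

class AndersenSolver {
    private final Map<String, Set<String>> pts = new HashMap<>();
    private final List<String[]> copies = new ArrayList<>(); // {dst, src}
    private final List<String[]> loads = new ArrayList<>();  // {dst, base, field}
    private final List<String[]> stores = new ArrayList<>(); // {base, field, src}

    Set<String> pointsTo(String v) { return pts.computeIfAbsent(v, k -> new TreeSet<>()); }

    void alloc(String v, String obj) { pointsTo(v).add(obj); }                                // v includes obj
    void copy(String dst, String src) { copies.add(new String[] {dst, src}); }                // dst = src;
    void load(String dst, String base, String f) { loads.add(new String[] {dst, base, f}); }  // dst = base.f;
    void store(String base, String f, String src) { stores.add(new String[] {base, f, src}); } // base.f = src;

    /** Re-applies the Copy, Load, and Store rules until no points-to set grows. */
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] c : copies) {
                changed |= pointsTo(c[0]).addAll(pointsTo(c[1]));
            }
            for (String[] l : loads) {
                for (String o : new ArrayList<>(pointsTo(l[1]))) {
                    changed |= pointsTo(l[0]).addAll(pointsTo(o + "." + l[2]));
                }
            }
            for (String[] s : stores) {
                for (String o : new ArrayList<>(pointsTo(s[0]))) {
                    changed |= pointsTo(o + "." + s[1]).addAll(pointsTo(s[2]));
                }
            }
        }
    }
}
```

As a usage example, assume x may be object C, y may be object D, and C.f already holds E (these bindings are invented for illustration). Analyzing `t = x.f; t.g = y; x.g = x;` then yields t → {E}, E.g → {D}, and C.g → {C}.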

Slides 17-24: Andersen example
Code under analysis:
    t = x.f;
    t.g = y;
    x.g = x;
[Figure, built up across slides 17-24: the summary graph for foo (nodes x, y, x.f; edges labeled f and g) shown together with concrete nodes C, D, and E.]
- Slides 19-20 show the Load rule: if code contains e = x.f; then x ⊇ new_y implies e ⊇ new_y.f
- Slides 21-24 show the Store rule: if code contains x.f = e; then x ⊇ new_y implies new_y.f ⊇ e. Edges labeled g are added to the figure step by step.

Slides 25-27: Mapping method calls
Code under analysis:
    t = x.f;
    t.g = y;
    x.g = x;
    t.bar(y);
[Figure, built up across slides 25-27: the graph from the Andersen example with the call site t.bar(y). On slide 27 the call is mapped to nodes labeled “Bar: this” and “Bar: p1”.]
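Matching a call site to its targets can be sketched as follows. Everything here is illustrative: the class `CallMapper`, the dispatch table keyed by `"Type.method"`, and the textual edge format are hypothetical, and real Java dispatch (inheritance, interfaces, overloading) is reduced to a simple lookup.

```java
import java.util.*;

class CallMapper {
    /**
     * dispatch maps "Type.method" to the method a receiver of that concrete type invokes.
     * Returns one "formal includes actual" edge per parameter of every possible target.
     */
    static List<String> mapCall(Map<String, String> dispatch, Set<String> receiverTypes,
                                String method, String receiverNode, List<String> argNodes) {
        List<String> edges = new ArrayList<>();
        for (String type : new TreeSet<>(receiverTypes)) { // sorted for stable output
            String target = dispatch.get(type + "." + method);
            if (target == null) continue; // this type does not understand the call
            edges.add(target + ":this includes " + receiverNode);
            for (int i = 0; i < argNodes.size(); i++) {
                edges.add(target + ":p" + (i + 1) + " includes " + argNodes.get(i));
            }
        }
        return edges;
    }
}
```

The size of the receiver set drives the cost: if the analysis proves the receiver of `t.bar(y)` can only be an E, one target is mapped, whereas a coarser answer of {D, E} doubles the edges and pulls a second method body into the analysis.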

Slide 28: Overall Picture
[Figure: nodes C, D, E, F; regions labeled “Concrete” world and “Abstract” world.]

Slide 29: Graph-based Andersen
- Computing the full transitive closure is prohibitively expensive
- Store the graph in pre-transitive form, and calculate reachable nodes on demand

Slide 30: Algorithm
    foreach write edge e1 → e2 do
        foreach n in getConcreteNodes(e1)
            add write edge n.f → e2
    foreach read edge e1 → e2 do
        foreach n in getConcreteNodes(e1)
            add inclusion edge e2 ⊇ n.f
    foreach method call e1.f()
        foreach n in getConcreteNodes(e1)
            add parameter mappings for target method
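The read and write loops of the algorithm, together with an on-demand reachability query over a pre-transitive graph, might be sketched like this. The class `InclusionSolver` and its edge encodings are assumptions. The sketch models the concrete write edge n.f → e2 as the inclusion "n.f includes e2", repeats the loops until no edge is added, and omits method calls entirely.

```java
import java.util.*;

class InclusionSolver {
    // Pre-transitive inclusion graph: an entry a -> {b, ...} records "a includes b".
    private final Map<String, Set<String>> includes = new HashMap<>();
    private final Set<String> concrete = new HashSet<>();
    private final List<String[]> readEdges = new ArrayList<>();  // {base, field, dst}
    private final List<String[]> writeEdges = new ArrayList<>(); // {base, field, src}

    void addConcrete(String n) { concrete.add(n); }
    boolean addInclusion(String a, String b) {
        return includes.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
    }
    void addRead(String base, String field, String dst) { readEdges.add(new String[] {base, field, dst}); }
    void addWrite(String base, String field, String src) { writeEdges.add(new String[] {base, field, src}); }

    /** On-demand query: concrete nodes reachable from e along inclusion edges. */
    Set<String> getConcreteNodes(String e) {
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        seen.add(e);
        stack.push(e);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (concrete.contains(n)) result.add(n);
            for (String m : includes.getOrDefault(n, Collections.<String>emptySet())) {
                if (seen.add(m)) stack.push(m);
            }
        }
        return result;
    }

    /** Repeats the write-edge and read-edge loops until no new edge is added. */
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] w : writeEdges) {
                for (String n : getConcreteNodes(w[0])) {
                    changed |= addInclusion(n + "." + w[1], w[2]); // n.f includes e2
                }
            }
            for (String[] r : readEdges) {
                for (String n : getConcreteNodes(r[0])) {
                    changed |= addInclusion(r[2], n + "." + r[1]); // e2 includes n.f
                }
            }
        }
    }
}
```

No points-to set is ever materialized: the graph only stores direct edges, and each query walks it when an answer is needed.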

Slide 31: Caching reachability queries
- getConcreteNodes(e): transitive closure query on the inclusion graph
- The same queries are repeated many times
- Store the result in a hash table
  - Cached result may be stale due to edges added since the last query
  - Iterate until convergence
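The staleness issue is easy to reproduce in a small sketch. The class `CachedReachability`, the `traversals` counter, and the `startIteration` method are hypothetical. Here the cache is simply cleared at the start of each iteration, which is a simplification and not the cache-reuse scheme the talk describes later.

```java
import java.util.*;

class CachedReachability {
    private final Map<String, Set<String>> includes = new HashMap<>();
    private final Set<String> concrete = new HashSet<>();
    private final Map<String, Set<String>> cache = new HashMap<>();
    int traversals = 0; // number of queries that actually walked the graph

    void addConcrete(String n) { concrete.add(n); }
    boolean addInclusion(String a, String b) {
        return includes.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
    }

    /** Simplification: forget all cached answers at the start of an iteration. */
    void startIteration() { cache.clear(); }

    /** Returns the cached answer if present, even if edges were added since. */
    Set<String> getConcreteNodes(String e) {
        Set<String> hit = cache.get(e);
        if (hit != null) return hit; // possibly stale
        traversals++;
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>(Collections.singleton(e));
        Deque<String> stack = new ArrayDeque<>(seen);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            if (concrete.contains(n)) result.add(n);
            for (String m : includes.getOrDefault(n, Collections.<String>emptySet())) {
                if (seen.add(m)) stack.push(m);
            }
        }
        Set<String> frozen = Collections.unmodifiableSet(result);
        cache.put(e, frozen);
        return frozen;
    }
}
```

A driver would call `startIteration()`, run the edge-adding loops, and stop once a full iteration adds no edge. At that point every cached answer was computed on the final graph, so stale results from earlier iterations cannot survive into the output.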

Slide 32: Online cycle detection
- Inclusion graph includes cycles
- The algorithm collapses cycles as they are traversed
  - During traversal, keeps track of the current path
  - If a node on the current path is revisited, collapse all nodes in the cycle
  - Each node has a “skip” pointer, which is set when collapsed and followed on all accesses
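A minimal sketch of collapsing cycles during a depth-first traversal, using skip pointers. The class `CycleCollapsingGraph` and its method names are hypothetical, and the way edges are merged into the representative is one possible choice rather than the presented implementation.

```java
import java.util.*;

class CycleCollapsingGraph {
    private final Map<String, Set<String>> succ = new HashMap<>();
    private final Map<String, String> skip = new HashMap<>(); // collapsed node -> representative

    /** Follows skip pointers to the node that currently represents n. */
    String find(String n) {
        String next;
        while ((next = skip.get(n)) != null) n = next;
        return n;
    }

    void addEdge(String from, String to) {
        succ.computeIfAbsent(find(from), k -> new LinkedHashSet<>()).add(to);
    }

    /** Representatives reachable from start (start included); collapses cycles it meets. */
    Set<String> reachable(String start) {
        Set<String> visited = new HashSet<>();
        visit(find(start), new ArrayList<>(), visited);
        Set<String> result = new TreeSet<>();
        for (String v : visited) result.add(find(v));
        return result;
    }

    private void visit(String n, List<String> path, Set<String> visited) {
        visited.add(n);
        path.add(n);
        // Iterate over a copy, since collapsing rewrites the successor map.
        List<String> edges = new ArrayList<>(succ.getOrDefault(n, Collections.<String>emptySet()));
        for (String raw : edges) {
            String m = find(raw);
            if (m.equals(find(n))) continue;      // edge inside an already collapsed cycle
            int i = indexOnPath(path, m);
            if (i >= 0) collapse(path, i);        // revisited a node on the current path
            else if (!visited.contains(m)) visit(m, path, visited);
        }
        path.remove(path.size() - 1);
    }

    private int indexOnPath(List<String> path, String m) {
        for (int j = 0; j < path.size(); j++) {
            if (find(path.get(j)).equals(m)) return j;
        }
        return -1;
    }

    /** Merges every path node above index i into the node at index i. */
    private void collapse(List<String> path, int i) {
        String rep = find(path.get(i));
        for (int j = i + 1; j < path.size(); j++) {
            String node = find(path.get(j));
            if (node.equals(rep)) continue;
            skip.put(node, rep);
            Set<String> moved = succ.remove(node);
            if (moved != null) succ.computeIfAbsent(rep, k -> new LinkedHashSet<>()).addAll(moved);
        }
    }
}
```

After one traversal of a graph with edges a → b → c → a and c → d, the nodes b and c both skip to a, so later queries starting at any of the three walk a single merged node.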

Slide 33: Reusing caches
- Concrete node cache values don’t change much between algorithm iterations
- Reallocating and rebuilding them is expensive
- Reuse caches from old iterations
  - Keep track of an iteration ‘version’ number for each cache entry

Slide 34: Minimizing set union operations
- Many caches don’t change across iterations
- Avoid set union operations for caches that haven’t changed since the last iteration
  - Keep a ‘changed’ flag for each cache entry that records whether the last computation changed the entry
  - If the input set hasn’t changed, the set union operation is redundant

Slide 35: Experimental Results
- Concrete type inference
- Static call graph
- Implemented in ~800 lines of Java
- Freely available at: http://joeq.sourceforge.net

Slide 36: Programs
- SpecJVM
  - Standard benchmark suite
- J2EE (Java 2 Enterprise Edition v1.3)
  - Massive (1+ million lines) business framework
- joeq
  - Compiler infrastructure, 75K lines
- Cloudscape
  - Database shipped with J2EE, no source code
- JEdit
  - Full-featured editor, 100K lines

Slide 37: Experimental Results
- We analyzed the reachable code for each application
  - Results include code in the class library
  - Analysis was very effective in reducing total program size
- Pentium 4 2 GHz, 2 GB RAM, Red Hat 7.2
- Sun JDK 1.3.1_01 with 512 MB heap

Slide 38: Analysis Precision vs. RTA [chart]

Slide 39: Analysis time: Small benchmarks [chart]

Slide 40: Analysis time: Large benchmarks [chart]

Slide 41: Analysis time (speedup) [chart]

Slide 42: Analysis time (bytecodes/second) [chart]

Slide 43: Related Work
- Original CLA paper
  - Heintze and Tardieu (PLDI 2001)
- Andersen’s analysis for Java
  - Rountev, Milanova, Ryder (OOPSLA 2001)
  - Liang, Pennings, Harrold (PASTE 2001)
  - Many others…
- Concrete type inference
  - CHA, RTA
  - Flow and context sensitivity, 0-CFA

Slide 44: Conclusion
- Improved precision
  - Field sensitivity
  - Local flow sensitivity
- Improved efficiency
  - Reuse reachability cache across iterations
  - Minimize set-union operations
- Scales to the largest Java programs
- A new baseline for Java pointers
  - No reason to use a less precise analysis
