
PRESTO: Program Analyses and Software Tools Research Group, Ohio State University
STATIC ANALYSES FOR JAVA IN THE PRESENCE OF DISTRIBUTED COMPONENTS AND LARGE LIBRARIES
Mariana Sharp
Adviser: Prof. Atanas Rountev
Committee: Prof. Paul Sivilotti, Prof. Neelam Soundarajan

Introduction
• Static program analysis infers properties of program behavior based on code structure and semantics
• Analyses considered in our research:
  - Points-to analysis
  - Type analysis
  - Side-effect analysis
  - Dependence analysis
• Static analysis techniques are used in:
  - performance optimization
  - program understanding and maintenance
  - software testing
  - verification of program properties

Two Challenges for Analysis of Modern Java Software
• Distributed Java applications
  - Traditional algorithms are designed for non-distributed software only
  - Distributed Java applications using RMI have complex semantics that are not modeled by existing static analyses
• Large-scale applications
  - Existing algorithms are designed to operate on whole homogeneous programs
      • They start from scratch at each analysis execution
      • Library code is analyzed together with the application code (90% of methods are in libraries)
  - Potential scalability problems limit the practical use of the analyses for real-world Java applications

Contributions
• Theoretical model for the analysis of distributed Java applications
  - Extension of points-to analysis and side-effect analysis for RMI-based applications
  - Experimental evaluation on a collection of RMI programs shows practical cost and high precision of the technique
• Incremental approach for analyzing Java applications built with reusable components
  - Use of precomputed summaries for reusable components as input to the analysis of client code
  - Summary generation algorithms for type analysis and dependence analysis
  - Experimental evaluation on a collection of Java programs shows dramatic savings compared to whole-program analysis

Outline
• Static Analysis of Object References in RMI-based Java Software
• Type Analysis in the Presence of Large Libraries
• Dependence Analysis in the Presence of Large Libraries

Static Analysis of Object References in RMI-based Java Software
• Points-to analysis: which objects may variable x refer to?
  - Builds a points-to graph
• Side-effect analysis: which objects could be modified by a statement s?
  - Based on points-to information

Analysis in the presence of RMI calls
• Java Remote Method Invocation (RMI)
  - There is a set of components $C_1, C_2, \ldots, C_n$
      • Each runs on a different JVM
  - References to remote objects cross JVM boundaries
  - Methods declared in remote interfaces can be invoked remotely
  - Parameter passing for remote calls involves object serialization
      • Entire graphs of serialized objects are copied over

Java RMI
• Example of a remote class:
    interface Listener extends java.rmi.Remote { void update(Event b); }
    class MyListener extends … implements Listener { void update(Event b) { … } }
• Parameter passing in the remote calls affects the points-to information
  - Remote references
  - Passing of serialized objects

Passing a Remote Reference (Points-to Relations)
• Component 1: Channel f = (Channel) Naming.lookup(...); Listener g = new MyListener(); f.add(g); (remote call)
• Component 2: void add(Listener p) {...}
• [Figure: points-to graph over Component 1 and Component 2 in which f refers to o_channel, and both g and p refer to o_listener]

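Passing a Serialized Object (Points-to Relations)
• Component 1: Event e = new Event(); f.notify(e); (remote call; Event is serializable)
• Component 2: void notify(Event p) {...}
• [Figure: points-to graph over Component 1 and Component 2 in which f refers to o_channel, e refers to o_event, and p refers to a copy of o_event]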
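The copy semantics on this slide can be observed without any network setup, because RMI marshals non-remote arguments with ordinary Java object serialization. The following is a minimal sketch under that assumption: it round-trips a small object graph through ObjectOutputStream and ObjectInputStream and checks that the receiver gets distinct objects. The class SerializationCopyDemo, the method copyBySerialization, and the fields of Event are hypothetical; only the name Event comes from the slides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationCopyDemo {
    // Round-trips an object graph through Java object serialization and
    // returns the deserialized copy. Checked exceptions are wrapped so that
    // callers do not need a throws clause.
    static Event copyBySerialization(Event original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Event) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException("serialization round trip failed", ex);
        }
    }

    public static void main(String[] args) {
        Event e = new Event("first");
        e.next = new Event("second");
        Event p = copyBySerialization(e);
        System.out.println("same object: " + (p == e));
        System.out.println("same successor: " + (p.next == e.next));
        System.out.println("copied labels: " + p.label + ", " + p.next.label);
    }
}

// Hypothetical payload class; only the name Event appears in the slides.
final class Event implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;
    Event next; // a second reachable object, so the argument is a small graph

    Event(String label) {
        this.label = label;
    }
}
```

Because every object reachable from the argument is copied, a points-to analysis has to give the callee's formal a fresh abstract object ("copy of o_event") rather than the caller's o_event.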

Analysis algorithm
• The result is a Pointer Assignment Graph (PAG)
  - Nodes are variables and object fields
      • Nodes have points-to sets attached
  - Edges represent flow of values
• Each node has a local points-to set $Pt_L$ and a remote points-to set $Pt_R$
• v1 = v2 in $C_i$: creates edge node($v2^i$) → node($v1^i$)
  - The edge represents the following relationships: $Pt_L(v2^i) \subseteq Pt_L(v1^i)$ and $Pt_R(v2^i) \subseteq Pt_R(v1^i)$
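The subset constraints attached to an assignment edge can be sketched as a small worklist propagation. This is an illustrative sketch only, not the Soot-based implementation described later: the class PagSketch, its method names, and the variables h, k, and m are hypothetical, and only assignment edges inside a single component are modeled.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PagSketch {
    // Local and remote points-to sets, keyed by node name.
    private final Map<String, Set<String>> ptLocal = new HashMap<>();
    private final Map<String, Set<String>> ptRemote = new HashMap<>();
    // Successor lists: an edge src -> dst records the assignment "dst = src".
    private final Map<String, Set<String>> edges = new HashMap<>();

    Set<String> local(String node) {
        return ptLocal.computeIfAbsent(node, k -> new HashSet<>());
    }

    Set<String> remote(String node) {
        return ptRemote.computeIfAbsent(node, k -> new HashSet<>());
    }

    void addAssignment(String dst, String src) {
        edges.computeIfAbsent(src, k -> new HashSet<>()).add(dst);
    }

    // Propagates both sets along every edge until nothing changes, so that
    // Pt_L(src) is a subset of Pt_L(dst) and Pt_R(src) is a subset of Pt_R(dst).
    void solve() {
        Deque<String> worklist = new ArrayDeque<>(edges.keySet());
        while (!worklist.isEmpty()) {
            String src = worklist.poll();
            for (String dst : edges.getOrDefault(src, Collections.emptySet())) {
                boolean changed = local(dst).addAll(local(src));
                changed |= remote(dst).addAll(remote(src));
                if (changed) {
                    worklist.add(dst);
                }
            }
        }
    }

    public static void main(String[] args) {
        PagSketch pag = new PagSketch();
        pag.local("g").add("o_listener");  // g = new MyListener()
        pag.remote("f").add("o_channel");  // f = (Channel) Naming.lookup(...)
        pag.addAssignment("h", "g");       // h = g (hypothetical statement)
        pag.addAssignment("k", "f");       // k = f (hypothetical statement)
        pag.solve();
        System.out.println("Pt_L(h) = " + pag.local("h"));
        System.out.println("Pt_R(k) = " + pag.remote("k"));
    }
}
```

Keeping the two sets separate means a plain assignment never turns a remote reference into a local one or the reverse; only the rules for remote calls move objects between the two kinds of sets.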

Rules for Handling Statements

Passing a Remote Reference (PAG)
• Component 1: Channel f = (Channel) Naming.lookup(...); Listener g = new MyListener(); f.add(g); (remote call)
• Component 2: void add(Listener p) {...}
• Resulting sets: $Pt_R(f) = \{o_{channel}\}$, $Pt_L(g) = \{o_{listener}\}$, $Pt_R(p) = \{o_{listener}\}$

Passing a Serialized Object (PAG)
• Component 1: Event e = new Event(); f.notify(e); (remote call; Event is serializable)
• Component 2: void notify(Event p) {...}
• Resulting sets: $Pt_R(f) = \{o_{channel}\}$, $Pt_L(e) = \{o_{event}\}$, $Pt_L(p) = \{\text{copy of } o_{event}\}$
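The two PAG examples suggest how an actual argument is bound to the callee's formal at a remote call. The sketch below encodes only those two cases, since the full rule table is not part of this transcript: a remote object in the caller's local set lands in the formal's remote set, and a serializable object yields a copy object in the formal's local set. All type and method names (RemoteCallRule, Kind, AbstractObject, FormalSets, bindActualToFormal) are hypothetical, and the sketch ignores the actual's remote set and the fields of the copied object graph.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class RemoteCallRule {
    // The two kinds of abstract objects distinguished in the slides' examples.
    enum Kind { REMOTE, SERIALIZABLE }

    static final class AbstractObject {
        final String name;
        final Kind kind;

        AbstractObject(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    // Points-to sets of the callee's formal parameter.
    static final class FormalSets {
        final Set<String> local = new TreeSet<>();
        final Set<String> remote = new TreeSet<>();
    }

    // Binds the caller-side Pt_L of the actual argument to the formal:
    // remote objects are passed by reference, serializable ones by copy.
    static FormalSets bindActualToFormal(Set<AbstractObject> actualLocal) {
        FormalSets formal = new FormalSets();
        for (AbstractObject o : actualLocal) {
            if (o.kind == Kind.REMOTE) {
                formal.remote.add(o.name);
            } else {
                formal.local.add("copy of " + o.name);
            }
        }
        return formal;
    }

    public static void main(String[] args) {
        Set<AbstractObject> g = new HashSet<>(
            Arrays.asList(new AbstractObject("o_listener", Kind.REMOTE)));
        Set<AbstractObject> e = new HashSet<>(
            Arrays.asList(new AbstractObject("o_event", Kind.SERIALIZABLE)));
        System.out.println("add:    Pt_R(p) = " + bindActualToFormal(g).remote);
        System.out.println("notify: Pt_L(p) = " + bindActualToFormal(e).local);
    }
}
```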

Experimental results
• 11 RMI applications
  - Between 12 and 125 methods
  - Analysis includes ~7000 library methods
• Implementation
  - Generalized the points-to analysis in the Soot analysis framework
• Analysis running time
  - 2.8 GHz PC with 3 GB memory
  - Time: 5-6 minutes per application
  - Special handling of libraries: standard libraries are not replicated across components

Experimental results
• Passing of remote references and serialization are common
• High precision of the call graph at remote calls
• All remote call sites with serialization used acyclic and uniquely-typed object graphs

Static Analysis for RMI-based Software: Conclusions
• Existing formalisms generalized to handle RMI features
• Key points:
  - Two separate points-to sets per variable
  - Remote PAG edges
  - Propagation of references to deserialized copy objects
• Precise and practical choice for the analysis of RMI applications
• Approach can be generalized to side-effect analysis
  - Determine inter-component data dependencies

Type Analysis in the Presence of Large Libraries
• Type analysis: what are the types of objects variable x may refer to?
  - A form of points-to analysis
  - Dynamically allocated objects are represented with one abstract object per class
• Libraries represent a large part of the code
  - Roughly 93% of methods in benchmark programs are in the standard Java libraries
• Summary-based analysis: the code is split into a user component and a library component. Two steps:
  - Computing the information about the library
  - Running the analysis that uses this information

Summary Representation
• We restate the analysis in terms of graph representations and operations
  - Based on the formulation of Interprocedural Distributive Environment (IDE) problems
  - Compact representation of dataflow functions
  - Efficient operations of functional meet and functional composition
• Dataflow function: encoded by a graph that represents the flow of data in a method
  - Nodes represent variables and object types
  - Edges:
      • Object type to variable: represents a type relation
      • Variable to variable: represents object flow

Summary Generation for Type Analysis
• Input: the code of the library classes
• Output: for each method
  - Dataflow graph of the method
  - Information about the call sites occurring in the method
• Algorithm with 3 steps:
  - Step 1: Computing dataflow functions for library methods
      • Calls are not considered
  - Step 2: Computing the closure of each dataflow function
      • Type relationships due to transitivity
  - Step 3: Minimizing dataflow functions
      • Eliminating edges that have only intraprocedural effect
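Steps 2 and 3 can be illustrated on a plain edge map. The sketch below is hypothetical: it computes the transitive closure by reachability and then keeps only edges whose endpoints belong to a caller-supplied set of externally visible nodes. That visibility set is an assumed stand-in for the slides' criterion of "edges that have only intraprocedural effect", and the names DataflowSummarySketch, edge, closure, and minimize are not from the original work.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DataflowSummarySketch {
    // Adds the directed edge from -> to ("values flow from 'from' into 'to'").
    static void edge(Map<String, Set<String>> graph, String from, String to) {
        graph.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
    }

    // Step 2 (sketch): for every source node, collect all reachable nodes.
    static Map<String, Set<String>> closure(Map<String, Set<String>> graph) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String start : graph.keySet()) {
            Set<String> reached = new TreeSet<>();
            Deque<String> stack = new ArrayDeque<>(graph.get(start));
            while (!stack.isEmpty()) {
                String node = stack.pop();
                if (reached.add(node)) {
                    stack.addAll(graph.getOrDefault(node, Collections.emptySet()));
                }
            }
            result.put(start, reached);
        }
        return result;
    }

    // Step 3 (sketch): drop every edge that touches a node outside 'visible'.
    static Map<String, Set<String>> minimize(Map<String, Set<String>> closed,
                                             Set<String> visible) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : closed.entrySet()) {
            if (!visible.contains(entry.getKey())) {
                continue;
            }
            Set<String> kept = new TreeSet<>(entry.getValue());
            kept.retainAll(visible);
            if (!kept.isEmpty()) {
                result.put(entry.getKey(), kept);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        edge(graph, "param0", "r1");      // r1 := param0
        edge(graph, "r1", "r3");          // r3 := r1
        edge(graph, "r1", "array_elem");  // r2[0] := r1
        Set<String> visible = new HashSet<>(Arrays.asList("param0", "array_elem"));
        System.out.println(minimize(closure(graph), visible));
    }
}
```

In this example the locals r1 and r3 disappear from the summary, while the closure preserves the fact that values from param0 can reach array_elem.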

Code and Dataflow Function for Sample Method
public class MyClass {
    private String[] names;
    String replaceName(String) {
        MyClass r0;
        String r1, r3;
        String[] r2;
        r0 := this;
        r1 := param0;
        r2 := r0.names;
        r2[0] := r1;
        r3 := r1;
        r4 := new String;
        specialinvoke r4.<init>("New name:");
        r5 := r4;
        r6 := virtualinvoke r5.concat(r3);
        return r6;
    }
}
[Figure: dataflow graph over the nodes this, param0, r0, r1, r2, r3, r4, r5, r6, names, array_elem, String, and return]

Experimental Study: Generating the Summary
• Input of the summary generation: classes from the Java standard libraries in J2SE 1.4.2
  - Packages java.*, javax.*, com.*, COM.*, org.*, and sun.* (10,238 classes, 77,190 methods, 1,496,003 statements)
• Output: file that stores the summary representation of these classes, methods, and the corresponding dataflow functions
  - Summary file size: 12.2 MB

Study: Summary-based Type Analysis
• Summary-based type analysis implemented with the Soot 2.2.2 framework (Spark)
• Jimple representation used for classes that belong to the user component of the analyzed program
• For library classes, a representation is obtained from the summary
• Experimental study with 20 Java programs

Comparison of the Whole-program Analysis and the Summary-based Analysis (Running Time)

Comparison of the Whole-program Analysis and the Summary-based Analysis (Memory Usage)

Experimental Study for Summary-Based Type Analysis
• Compared to its whole-program counterpart, summary-based type analysis can achieve significant savings of running time and memory usage
  - For all experimental subjects, the running time reduction was at least 55%, with an average running time reduction of 70%
• Savings come from avoiding the cost of reading the library code and building its Jimple representation, as well as the type propagation in library methods

Dependence Analysis in the Presence of Large Libraries
• Dependence analysis: two kinds of dependencies:
  - Control dependencies: relationships between conditional expressions and the statements guarded by them
  - Data dependencies: represent the flow of values from one statement to another due to writes and reads of shared memory locations
• Our focus:
  - Analysis of data dependencies
      • Data dependencies between the formals and the return of a method
      • Dependence analysis based on an earlier type analysis
  - Addressing the problem of reducing analysis cost for applications built with large libraries

Whole-Program Dependence Analysis Algorithm
• Step 1: Intraprocedural reaching definitions analysis
  - Computes a set of reaching definitions for each statement inside a method body
  - Intraprocedural def-use analysis calculates direct dependencies between the statements of a method
      • Does not consider the effects of calls
  - Output: "reduced" CFG
• Step 2: Intraprocedural dependence analysis
  - Transitive dependencies are computed
  - Calls are not considered
• Step 3: Interprocedural dependence analysis
  - Bottom-up traversal of the call graph
  - For each call site, the callee's dependence information is inlined into the caller's dependence information
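The three steps can be sketched for the simplest possible case: a straight-line method body, where one forward pass over the statements stands in for the reaching-definitions fixpoint, and a callee's precomputed summary is applied at each call site. Everything here is a hypothetical simplification: heap writes are ignored, there is no branching, a callee summary is just the set of argument positions that its return value depends on (position 0 being the receiver), and a call with no known summary is treated conservatively as depending on all of its arguments. The names DependenceSketch, Stmt, and returnDependencies are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DependenceSketch {
    // One statement "def := f(uses)". 'callee' is null for a plain assignment.
    static final class Stmt {
        final String def;
        final String callee;
        final List<String> uses;

        private Stmt(String def, String callee, List<String> uses) {
            this.def = def;
            this.callee = callee;
            this.uses = uses;
        }

        static Stmt assign(String def, String... uses) {
            return new Stmt(def, null, Arrays.asList(uses));
        }

        static Stmt call(String def, String callee, String... args) {
            return new Stmt(def, callee, Arrays.asList(args));
        }
    }

    // Returns the formals that the value of 'returnVar' may depend on.
    // 'summaries' maps a callee name to the argument positions that its
    // return value depends on (a hypothetical summary format).
    static Set<String> returnDependencies(List<String> formals, List<Stmt> body,
                                          String returnVar,
                                          Map<String, Set<Integer>> summaries) {
        Map<String, Set<String>> deps = new HashMap<>();
        for (String formal : formals) {
            deps.put(formal, new TreeSet<>(Collections.singleton(formal)));
        }
        for (Stmt s : body) {
            Set<Integer> positions = s.callee == null ? null : summaries.get(s.callee);
            Set<String> d = new TreeSet<>();
            for (int i = 0; i < s.uses.size(); i++) {
                // Plain assignments and calls without a summary use every operand.
                boolean relevant = positions == null || positions.contains(i);
                if (relevant) {
                    d.addAll(deps.getOrDefault(s.uses.get(i), Collections.emptySet()));
                }
            }
            deps.put(s.def, d);
        }
        return deps.getOrDefault(returnVar, Collections.emptySet());
    }

    public static void main(String[] args) {
        List<Stmt> body = Arrays.asList(
            Stmt.assign("r0", "this"),
            Stmt.assign("r1", "param0"),
            Stmt.assign("r3", "r1"),
            Stmt.assign("r4"),                      // r4 := new String
            Stmt.assign("r5", "r4"),
            Stmt.call("r6", "concat", "r5", "r3")); // r6 := r5.concat(r3)
        Map<String, Set<Integer>> summaries = new HashMap<>();
        // Assumed summary: concat's result depends on its receiver and its argument.
        summaries.put("concat", new HashSet<>(Arrays.asList(0, 1)));
        System.out.println(
            returnDependencies(Arrays.asList("this", "param0"), body, "r6", summaries));
    }
}
```

Processing callees before callers, as in the bottom-up traversal, guarantees that the summary consulted at a call site is already final when there is no recursion.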

Example: Reduced CFG for Sample Method
[Figure: reduced CFG with nodes N1 to N10 over the statements r0 := this; r1 := param0; r2 := r0.names; r2[0] := r1; r3 := r1; r4 := new String; specialinvoke r4...; r5 := r4; r6 := virtualinvoke r5.concat(r3); return r6. Labels shown in the figure: (r0, N1), (r1, N6), (r3, N6), (r6, N9), r4, r5.]

Summary Generation
• Summary information
  - Only dependencies that may ultimately affect the return value of a method are considered
  - Dependencies related to the return value of a method and the call sites
  - Dependence pairs
• Summary Generation Algorithm
  - Two phases that are similar to the first two phases of the whole-program dependence analysis
  - A third phase with summary optimizations

Example:
[Figure: reduced CFG of the sample method with nodes N1 to N10 over the statements r0 := this; r1 := param0; r2 := r0.names; r2[0] := r1; r3 := r1; r4 := new String; specialinvoke r4...; r5 := r4; r6 := virtualinvoke r5.concat(r3); return r6. Labels shown in the figure: (r0, N1), (r1, N6), (r3, N6), (r6, N9), r4, r5.]

Summary Optimizations
• Definitions:
  - Fixed call: call with exactly one target method
  - Fixed method: method that either does not make any calls, or makes only fixed calls to fixed methods
• Optimization 1: Inlining all fixed methods (into callers that can be both fixed and non-fixed)
• Definition:
  - Method with known callers: all the callers of such a method can be determined when the summary is built
      • Cannot be called by future user code
• Optimization 2: Completely removing from the summary all fixed methods that have known callers
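The recursive definition of a fixed method can be computed as a least fixed point over the call graph. The sketch below assumes a hypothetical call-graph encoding in which each method maps to a list of call sites and each call site is the set of its possible targets. Under this reading, a method that participates in recursion is never marked fixed, which is a conservative assumption the slides do not spell out; a target that is not itself a key of the map is likewise treated as not fixed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FixedMethodSketch {
    // callSites: method name -> its call sites, each given as the set of
    // possible target methods. Returns the set of fixed methods.
    static Set<String> fixedMethods(Map<String, List<Set<String>>> callSites) {
        Set<String> fixed = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<Set<String>>> entry : callSites.entrySet()) {
                if (fixed.contains(entry.getKey())) {
                    continue;
                }
                boolean allFixed = true;
                for (Set<String> targets : entry.getValue()) {
                    // A fixed call has exactly one target, which must itself be fixed.
                    if (targets.size() != 1 || !fixed.containsAll(targets)) {
                        allFixed = false;
                        break;
                    }
                }
                if (allFixed) {
                    fixed.add(entry.getKey());
                    changed = true;
                }
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        Map<String, List<Set<String>>> callSites = new HashMap<>();
        callSites.put("leaf", Collections.emptyList());
        callSites.put("other", Collections.emptyList());
        callSites.put("wrapper", Arrays.asList(Collections.singleton("leaf")));
        callSites.put("poly",
            Arrays.asList(new HashSet<>(Arrays.asList("leaf", "other"))));
        callSites.put("rec", Arrays.asList(Collections.singleton("rec")));
        System.out.println(fixedMethods(callSites));
    }
}
```

Here leaf, other, and wrapper are fixed, poly is not because its call site has two targets, and rec is not because of the conservative treatment of recursion.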

Experimental Study
• Generating the summary:
  - Improvements after the first optimization:
      • number of calls in the library reduced by 36.9%
      • number of dependence pairs reduced by 16.2%
  - Improvements after the second optimization:
      • number of dependence pairs remaining after the first optimization reduced by 16.8%
  - File size on disk: 14.4 MB
      • 2.2 MB of dependence information
• Summary-based analysis
  - Same 20 benchmarks as for type analysis

Experimental Study: Summary-based Analysis
• Baseline analysis:
  - Uses an artificial summary with empty dependency information
  - Represents a limit on the time reduction that can be achieved
• The average reduction in time is 79.78% in the summary-based version

Experimental Study: Summary-based Analysis
• The memory usage is reduced on average by 96.93%

Conclusions
• Limitations of the traditional model of whole-program data-flow analysis
  - Whole-program analysis cannot be applied to distributed software
  - Whole-program analysis does not scale to very large programs
• Solutions to address these limitations
  - Theoretical model for points-to analysis of distributed Java applications
  - Analysis approach which employs precomputed library summary information

Future Work
• Static Analysis for RMI Software
  - Various flow- and context-sensitive points-to analyses
  - RMI generalizations for other analyses
• Summary-based static analyses
  - Summary-based algorithms for other dataflow analyses
  - Systems built with multiple library components
