Mark Marron IMDEA-Software (Madrid, Spain) 1.

1
Mark Marron, IMDEA-Software (Madrid, Spain), mark.marron@imdea.org

2
- Many optimization and software engineering applications use heap information
  - Optimization
    - Parallelization
    - Memory management
  - Software Engineering and Debugging
    - Interactive debugging of heap data structures
    - Taint or information-flow analysis through heap objects
- Existing heap analysis techniques are often inapplicable due to imprecision (points-to style) or computational cost (shape analysis)

3
- General-purpose model capable of supporting these applications:
  - Must model the range of fundamental properties needed by our target application domains
  - Cannot place significant restrictions on the program being analyzed
  - Must be computationally efficient and compact
  - Willing to sacrifice some precision
- Our focus is on providing general classes of information for others to build on

4
- Connectivity
  - Reachability
  - Interference
  - Paths
- Logical data structures (Regions)
  - Group related sections of the heap
  - Keep unrelated sections of the heap separate
- Shape of a region
  - Cycle, Dag, Tree, List, Singleton

5
- Identity
  - Given an object at program point p, track the flow of this object to all later program points q
- Heap-Based Use-Mod
  - Find all program points at which a given memory location may be read or written
- Escape
  - Objects that are freshly allocated
  - Objects that escape the local call context

6
- The theory of Abstract Interpretation provides a framework for static program analysis
  - Takes a lattice (set) of abstract models, each of which represents a set of concrete program states
  - Computes, for each program point, an abstract model that represents all possible heap states that may occur at that program point
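As a rough illustration of the per-program-point computation described on this slide, the sketch below runs a worklist fixpoint over a tiny control-flow graph. Everything in it is an assumption made for illustration: the lattice is just sets of string facts joined by union, and the `FixpointSketch` class and the example transfer functions are hypothetical. It is not the heap model presented in these slides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical worklist solver: abstract states are sets of facts, join is set union.
final class FixpointSketch {
    // succ.get(p) lists the successors of program point p;
    // transfer.get(p) maps the state before p to the state after p.
    static List<Set<String>> solve(List<List<Integer>> succ,
                                   List<UnaryOperator<Set<String>>> transfer) {
        int n = succ.size();
        List<Set<String>> in = new ArrayList<>();
        for (int i = 0; i < n; i++) in.add(new HashSet<>());
        Deque<Integer> work = new ArrayDeque<>();
        for (int i = 0; i < n; i++) work.add(i);
        while (!work.isEmpty()) {
            int p = work.poll();
            // Apply the transfer function to a copy so the stored state is not mutated.
            Set<String> out = transfer.get(p).apply(new HashSet<>(in.get(p)));
            for (int s : succ.get(p)) {
                // Join the outgoing state into the successor; requeue it if it grew.
                if (in.get(s).addAll(out)) work.add(s);
            }
        }
        return in;
    }

    // Example graph: 0 -> 1, 1 -> 2, 2 -> 1 (loop back edge), 1 -> 3 (exit).
    static List<Set<String>> example() {
        List<List<Integer>> succ = Arrays.asList(
            Arrays.asList(1), Arrays.asList(2, 3), Arrays.asList(1),
            Collections.<Integer>emptyList());
        List<UnaryOperator<Set<String>>> transfer = Arrays.asList(
            s -> { s.add("x allocated"); return s; },
            s -> s,
            s -> { s.add("y allocated"); return s; },
            s -> s);
        return solve(succ, transfer);
    }
}
```

Because the loop body feeds back into point 1, the state at the exit point ends up covering both facts, which is the "all states that may occur" reading of the slide.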

7
- A surprising benefit of building a model suitable for abstract interpretation is that the model also works for dynamic analysis:
  - Debugging
  - Specification mining/checking
- Given a snapshot of the single current program heap, compute the corresponding abstract model

8
- Handles a large fragment of Java 1.5 and commonly used libraries (lang, util, io)
- Precisely models (in static and dynamic analyses) the properties of interest
- Can efficiently (on the order of seconds) statically analyze moderate-sized programs (~15 KLOC to date)
- Have a simple implementation of a debugger and specification miner (a few seconds to compute models of multi-MB heaps)

9
- Based on the storage shape graph
  - Nodes represent sets of objects (or recursive data structures), edges represent sets of pointers
  - Has a natural representation for many of the properties we are interested in
  - Easy to visualize
  - Efficient to compute with
- Annotate nodes and edges with additional instrumentation properties

10
- Key issue in a shape graph is how to pick the nodes that abstract concrete objects
  - Too many nodes are confusing and computationally expensive
  - Too few nodes lead to imprecision (as a single node must represent multiple logical structures)
  - Often done via allocation site or types
- Solution: nodes are related sets of objects
  - Recursive type information (recursive vs. non-recursive types)
  - Objects stored in the same collection, array, or structure
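To make the node-selection question concrete, the sketch below implements only the type-based baseline the slide mentions ("often done via allocation site or types"), not the related-sets grouping proposed as the solution. The `HeapObj` and `TypeBasedShapeGraph` classes are hypothetical and stand in for a real heap snapshot.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TypeBasedShapeGraph {
    // Hypothetical concrete-heap object: a type name plus named pointer fields.
    static final class HeapObj {
        final String type;
        final Map<String, HeapObj> fields = new LinkedHashMap<>();
        HeapObj(String type) { this.type = type; }
    }

    // Abstract nodes: type name -> number of concrete objects the node represents.
    final Map<String, Integer> nodes = new LinkedHashMap<>();
    // Abstract edges, rendered as "SrcType.field->DstType"; each stands for a set of pointers.
    final Set<String> edges = new TreeSet<>();

    // Collapse every object of the same type into one node.
    static TypeBasedShapeGraph abstractHeap(List<HeapObj> snapshot) {
        TypeBasedShapeGraph g = new TypeBasedShapeGraph();
        for (HeapObj o : snapshot) {
            g.nodes.merge(o.type, 1, Integer::sum);
            for (Map.Entry<String, HeapObj> f : o.fields.entrySet()) {
                if (f.getValue() != null) {
                    g.edges.add(o.type + "." + f.getKey() + "->" + f.getValue().type);
                }
            }
        }
        return g;
    }
}
```

With this baseline, two unrelated lists built from the same node class collapse into a single abstract node, which is the imprecision the "too few nodes" bullet describes.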

13
- Most general way objects in a region are connected
  - (S)ingleton: no pointers between any objects
  - (L)ist: may contain a linear List or simpler structures
  - (T)ree: may contain a Tree or simpler structures
  - (D)ag: may contain a Dag or simpler structures
  - (C)ycle: may contain a cyclic or simpler structure
- E.g., a region with a (T)ree layout may contain tree, list, or singleton structures, but no dag or cyclic structures.
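The "or simpler structures" wording suggests the five layouts form a chain, which can be sketched as an ordered enum. The `ShapeKind` name, the `mayContain` helper, and the use of "most general of the two" as the join are assumptions made for illustration; the slides do not show how the actual analysis encodes or merges shapes.

```java
// Hypothetical encoding of the region layouts as a totally ordered chain.
enum ShapeKind {
    SINGLETON, LIST, TREE, DAG, CYCLE;   // declared from most to least restrictive

    // True if a region with this layout may contain a structure of kind `other`.
    boolean mayContain(ShapeKind other) {
        return other.ordinal() <= this.ordinal();
    }

    // Assumed join: the more general of the two kinds, as one might use when merging states.
    static ShapeKind join(ShapeKind a, ShapeKind b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }
}
```

For instance, `ShapeKind.TREE.mayContain(ShapeKind.LIST)` holds while `ShapeKind.TREE.mayContain(ShapeKind.DAG)` does not, matching the example on the slide.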

15
- Edges abstract sets of references (variable references or pointers)
- The heap graph can track some sharing properties but is insufficiently precise to model many important ones
  - E.g., given an array of objects, does any object appear multiple times?
- Sharing may occur between references abstracted by the same edge or by two different edges
  - Interference: abstracted by the same edge
  - Connectivity: abstracted by different edges

16
- Interference: does a single edge abstract only references with disjoint targets, or may some of these references alias or be related?
- Edge e is:
  - non-interfering: all pairs of references r1, r2 in γ(e) must be unrelated (refer to disjoint data structures).
  - interfering: there may be a pair of references r1, r2 in γ(e) that are related (refer to the same data structure).

18
- Connectivity: do two edges abstract sets of references with disjoint targets, or may some of these references alias or be related?
- Edges e1, e2 are:
  - disjoint: all pairs of references r1 in γ(e1), r2 in γ(e2) are unrelated (refer to disjoint data structures).
  - connected: there may be a pair of references r1 in γ(e1), r2 in γ(e2) that are related (refer to the same data structure).
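The interference and connectivity definitions can be checked directly on a concrete heap snapshot, as in the sketch below. It assumes one particular reading of "related", namely that the two reference targets reach at least one common object; the slides do not spell the relation out. The `Obj` and `SharingSketch` classes are hypothetical, and each list argument stands for the targets of the concrete references in γ(e) for one edge.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class SharingSketch {
    // Hypothetical heap object with outgoing (non-null) pointers.
    static final class Obj {
        final List<Obj> pointsTo = new ArrayList<>();
    }

    // All objects reachable from `root`, compared by identity.
    static Set<Obj> reachable(Obj root) {
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (seen.add(o)) {
                for (Obj t : o.pointsTo) stack.push(t);
            }
        }
        return seen;
    }

    // Assumed reading of "related": the two targets reach a common object.
    static boolean related(Obj a, Obj b) {
        return !Collections.disjoint(reachable(a), reachable(b));
    }

    // Interference: some pair of distinct references within one edge is related.
    static boolean interfering(List<Obj> edgeTargets) {
        for (int i = 0; i < edgeTargets.size(); i++)
            for (int j = i + 1; j < edgeTargets.size(); j++)
                if (related(edgeTargets.get(i), edgeTargets.get(j))) return true;
        return false;
    }

    // Connectivity: some reference in e1 is related to some reference in e2.
    static boolean connected(List<Obj> e1Targets, List<Obj> e2Targets) {
        for (Obj a : e1Targets)
            for (Obj b : e2Targets)
                if (related(a, b)) return true;
        return false;
    }
}
```

An array whose slots hold the same object twice is reported as interfering, which answers the "does any object appear multiple times?" question for that snapshot. The pairwise check is quadratic and is meant only to mirror the definitions, not to be efficient.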

20
- Object Identity
  - Across each method call, track how data structures are split, merged, and reconnected
- Field-Sensitive Use/Mod
  - For each method, track the fields of the objects in each region (node) and whether each field is used or modified in the method
  - At each line, track which regions (nodes) and fields may be used or modified
- Object Allocation
  - Track which objects are allocated in this scope and which may escape

21
void swap(Pair p) {
    Data temp = p.first;
    p.first = p.second;
    p.second = temp;
}
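To show what field-sensitive use/mod information for `swap` would contain, the sketch below records field reads and writes by hand at run time. This is only an illustration: the analysis in these slides computes the information statically, and the `UseModSketch` class, its `used` and `modified` sets, and the empty `Data` class are all hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class UseModSketch {
    static final class Data {}
    static final class Pair { Data first; Data second; }

    // Fields read (use) and written (mod) during the call, recorded by name.
    static final Set<String> used = new TreeSet<>();
    static final Set<String> modified = new TreeSet<>();

    // Same body as the slide's swap, with a manual record before each field access.
    static void swap(Pair p) {
        used.add("Pair.first");
        Data temp = p.first;
        used.add("Pair.second");
        modified.add("Pair.first");
        p.first = p.second;
        modified.add("Pair.second");
        p.second = temp;
    }
}
```

After one call, both sets contain exactly `Pair.first` and `Pair.second`. No field of any `Data` object is read or written, so the method touches only the region holding the `Pair`.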

23
- N-body simulation in 3 dimensions
- Uses the Fast Multi-Pole method with a space decomposition tree
  - For nearby bodies, use the naive n^2 algorithm
  - For distant bodies, compute the center of mass of many bodies and treat it as a single point mass
- Updates the space decomposition tree to account for body motion
- Has not been analyzed with other existing (precise) heap analysis methods
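The distant-body approximation amounts to replacing a group of bodies with one point mass at their mass-weighted mean position. The sketch below shows just that step; the `Body` class and `centerOfMass` method are hypothetical and are not taken from the benchmark's source.

```java
import java.util.List;

final class CenterOfMassSketch {
    // Hypothetical body: a mass and a 3-D position.
    static final class Body {
        final double mass;
        final double x, y, z;
        Body(double mass, double x, double y, double z) {
            this.mass = mass;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    // Returns a single point mass: the total mass placed at the mass-weighted mean position.
    static Body centerOfMass(List<Body> bodies) {
        double m = 0, sx = 0, sy = 0, sz = 0;
        for (Body b : bodies) {
            m += b.mass;
            sx += b.mass * b.x;
            sy += b.mass * b.y;
            sz += b.mass * b.z;
        }
        if (m <= 0) throw new IllegalArgumentException("total mass must be positive");
        return new Body(m, sx / m, sy / m, sz / m);
    }
}
```

A body of mass 1 at the origin and a body of mass 3 at x = 4 combine into a point mass of 4 at x = 3.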

25
- Inline Double[] into MathVector objects: 23% serial speedup, 37% memory use reduction
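One plausible reading of this optimization is sketched below: a vector that owns a separate array of boxed doubles is replaced by a vector that stores its components directly. Both classes are hypothetical and the benchmark's actual `MathVector` is not shown in the slides. Such a rewrite is presumably only safe when the analysis shows that each array is reachable solely through its owning vector.

```java
final class InlineSketch {
    // Before: each vector points to a separate array of boxed Double objects.
    static final class BoxedVector {
        final Double[] data;
        BoxedVector(double x, double y, double z) {
            data = new Double[] { x, y, z };
        }
        double dot(BoxedVector o) {
            double s = 0;
            for (int i = 0; i < 3; i++) s += data[i] * o.data[i];
            return s;
        }
    }

    // After: the components live inside the vector object itself.
    static final class InlinedVector {
        final double x, y, z;
        InlinedVector(double x, double y, double z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
        double dot(InlinedVector o) {
            return x * o.x + y * o.y + z * o.z;
        }
    }
}
```

The inlined form drops one array object and three boxed objects per vector, which is in line with the kind of memory saving the slide reports, although the exact transformation used there may differ.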

26
Iterator b = this.bodyTabRev.iterator();
while (b.hasNext())
    ((Body) b.next()).hackGravity(rsize, root);

27
- TLP (thread-level parallelism) for the update loop over bodyTabRev: 3.09x speedup on a quad-core machine
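A loop like the one over bodyTabRev can run in parallel when each iteration writes only to its own body and merely reads the shared data. The sketch below illustrates that pattern with Java 8 parallel streams, which postdate the Java 1.5 setting of these slides; it is not the transformation the authors applied. The `Body` class and its `update` method are hypothetical stand-ins for the benchmark's `Body.hackGravity`.

```java
import java.util.List;

final class ParallelUpdateSketch {
    static final class Body {
        final double mass;
        double acc;   // written only by this body's own update
        Body(double mass) { this.mass = mass; }

        // Hypothetical stand-in for hackGravity: reads shared data, writes only `this`.
        void update(double rsize, double[] sharedReadOnly) {
            double sum = 0;
            for (double v : sharedReadOnly) sum += v;
            acc = mass * sum / rsize;
        }
    }

    // Safe to parallelize because the iterations touch disjoint objects for writing.
    static void updateAll(List<Body> bodies, double rsize, double[] shared) {
        bodies.parallelStream().forEach(b -> b.update(rsize, shared));
    }
}
```

The disjointness that makes this safe is exactly the kind of fact the interference and use/mod information is meant to establish. Without it, concurrent writes to a shared object would race.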

28
Benchmark     LOC     Analysis Time   Analysis Mem   Shape   Sharing   Use/Mod*
tsp           910     0.03s           <30MB          100%    98%       Y
em3d          1103    0.09s           <30MB          100%              Y
voronoi       1324    0.50s           <30MB          98%     97%       Y
bh            2304    0.72s           <30MB          94%     96%       Y
db            1985    0.68s           <30MB          100%    82%       Y
raytrace      5809    15.5s           38MB           98%     92%       Y
Exp           3567    152.3s          48MB           100%              Y
Interpreter   15293   114.8s          122MB          97%     86%       Y

29
Benchmark   Objects   Nodes   Time
bisort      ~100      5       0.01s
bisort      ~17000    5       1.20s
tsp         ~250      4       0.12s
tsp         ~8000     4       1.62s
health      ~300      20      0.01s
health      ~5000     20      1.35s
exp         ~70       12      0.02s
exp         ~4000     12      1.55s

30
- Have the core of a practical analysis system
  - Performance:
    - Analyzes moderate-size, non-trivial Java programs
    - 15 KLOC programs in 114 seconds using ~120 MB of memory (average 2 contexts per method)
    - Debugging abstraction efficiently compresses large heaps to a compact abstract representation
  - Accuracy:
    - Precisely represents connectivity, sharing, and shape properties, plus region, frame, and dependence information
  - Qualitatively useful:
    - Used results in multiple optimization domains and in debugging applications

31
- Currently working on transforming core concepts from prototype to robust tools
  - Implementing static analysis for MSIL bytecode + core libraries
  - Implementing full-featured debugger support and specification mining (for both MSIL and Java)
- Enrich the model
  - Wider range of properties (what is useful in general)
  - Allow the user to easily extend it with new properties
- Apply information in more client applications
  - Additional optimization domains
  - Support for programmer-assisted refactorings

33
- Simple interpreter and debug environment for a large subset of the Java language
- 14,000+ LOC (in normalized form), 90 classes
  - Additional 1,500 LOC for specialized standard library handling stubs
- Large recursive call structures, large inheritance trees with numerous virtual method implementations
- Wide range of data structure types, extensive use of java.util collections, uses both shared and unshared structures

