Download presentation
Presentation is loading. Please wait.
1
Mark Marron IMDEA-Software (Madrid, Spain) mark.marron@imdea.org 1
2
Many optimization and software engineering applications utilize heap information Optimization Parallelization Memory management Software Engineering and Debugging Interactive debugging of heap data structures Tainted or Information Flow analysis through heap objects Existing heap analysis techniques often inapplicable due to imprecision (points-to style) or computational cost (shape analysis) 2
3
General purpose model capable of supporting these applications: Must model the range of fundamental properties needed by our target application domains Cannot place significant restrictions on the program being analyzed Must be computationally efficient and compact Willing to sacrifice some precision Our focus is on providing general classes of information for others to build on 3
4
Connectivity Reachability Interference Paths Logical data structures (Regions) Group related sections of the heap Keep unrelated sections of the heap separate Shape of a region Cycle, Dag, Tree, List, Singleton 4
5
Identity Given an object at point p, track the flow of this object at all later program points q Heap Based Use-Mod Find all program points a given memory location may be read/written at Escape Objects that are freshly allocated Objects that escape the local call context 5
6
The theory of Abstract Interpretation provides framework for static program analysis Takes a lattice (set) of abstract models, each of which represents a set of concrete program states Computes, for each program point, an abstract model that represents all possible heap states that may occur at the program point 6
7
A surprising benefit of building a model suitable for abstract interpretation that is the model also works for dynamic analysis: Debugging Specification mining/checking Given a snapshot of the single current program heap compute the corresponding abstract model 7
8
Handle a large fragment of Java 1.5 and commonly used libraries (lang, util, io) Precisely model (in static and dynamic analyses) the properties of interest Can efficiently (on the order of seconds) statically analyze moderate sized programs (~15KLOC to date) Have simple implementation of debugger and specification miner (a few seconds to compute models of Multi-MB heaps) 8
9
Base on storage shape graph Nodes represent sets of objects (or recursive data structures), edges represent sets of pointers Has natural representation for many of the properties we are interested in Easy to visualize Efficient to compute with Annotate nodes and edges with additional instrumentation properties 9
10
Key issue in shape graph is how to pick nodes that abstract concrete objects Too many nodes is confusing and computationally expensive Too few nodes leads to imprecision (as a single node must represent multiple logical structures) Often done via allocation site or types Solution: nodes are related sets of objects Recursive type information (recursive vs. non- recursive types) Objects stored in the same collection, array or structure 10
11
11
12
12
13
Most general way objects in a region are connected (S)ingleton: no pointers between any objects (L)ist: may contain a linear List or simpler structures (T)ree: may contain a Tree or simpler structures (D)ag: may contain a Dag or simpler structures (C)ycle: may a cyclic or simpler structures E.g. A region with a (T)ree layout may contain tree, list or singleton structures, but no dag or cyclic structures. 13
14
14
15
Edges abstract sets of references (variable references or pointers) Heap Graph has ability to track some sharing properties but insufficiently precise to model many important properties E.g. given an array of objects does any object appear multiple times? May occur between references abstracted by same edge or two different edges Interference: abstracted by same edge Connectivity: abstracted by different edges 15
16
Does a single edge abstract only references with disjoint targets or may some of these references alias/related? Edge e is: non-interfering: all pairs of references r 1, r 2 in γ(e) must be unrelated (refer to disjoint data structures). interfering: may be a pair of references r 1, r 2 in γ(e) that are related (refer to the same data structure). 16
17
17
18
Connectivity: Do two edges abstract sets of references with disjoint targets or do some of these references alias/related? Edges e 1, e 2 are: disjoint: all pairs of references r 1 in γ(e 1 ), r 2 in γ(e 2 ) are unrelated (refer to disjoint data structures). connected: may be pair of references r 1 in γ(e 1 ), r 2 in γ(e 2 ) that are related (refer to the same data structure). 18
19
19
20
Object Identity Across each method call track how data structures are split, merged, reconnected Field Sensitive Use/Mod For each method track the fields for the objects in each region (node) and if the field is used/modified in the method At each line track which regions (nodes) and fields may be used modified Object Allocation Track which objects are allocated in this scope and which may escape 20
21
21 1 void swap(Pair p) { 2 Data temp = p.first; 3 p.first = p.second; 4 p.second = temp; 5 }
22
22
23
N-Body simulation in 3-dimensions Uses Fast Multi-Pole method with space decomposition tree For nearby bodies use naive n 2 algorithm For distant bodies compute center of mass of many bodies and treat as single point mass Updates space decomposition tree to account for body motion Has not been analyzed with other existing (precise) heap analysis methods 23
24
24
25
Inline Double[] into MathVector objects, 23% serial speedup 37% memory use reduction 25
26
Iterator b = this.bodyTabRev.iterator(); while(b.hasNext()) ((Body) b.next()).hackGravity(rsize, root); 26
27
TLP update loop over bodyTabRev, factor 3.09 speedup on quad-core machine 27
28
BenchmarkLOCAnalysis Time Analysis Mem ShapeSharingUse/Mod* tsp9100.03s<30MB100%98%Y em3d11030.09s<30MB100% Y voronoi13240.50s<30MB98%97%Y bh23040.72s<30MB94%96%Y db19850.68s<30MB100%82%Y raytrace580915.5s38MB98%92%Y Exp3567152.3s48MB100% Y Interpreter15293114.8s122MB97%86%Y 28
29
BenchmarkObjectsNodesTime bisort~10050.01s bisort~1700051.20s tsp~25040.12s tsp~800041.62s health~300200.01s health~5000201.35s exp~70120.02s exp~4000121.55s 29
30
Have the core of a practical analysis system Performance: Analyze moderate size non-trivial Java programs 15KLoc programs in a 114 seconds using ~120MB of memory (average 2 contexts per method) Debugging abstraction efficiently compresses large heaps to compact abstract representation Accuracy: Precisely represent connectivity, sharing, shape properties + region, frame, and dependence information Qualitatively Useful Used results in multiple optimization domains and in debugging applications 30
31
Currently working on transforming core concepts from prototype to robust tools Implementing static analysis for MSIL bytecode + core libraries Implementing full featured debugger support and specification mining (for both MSIL and Java) Enrich the model Wider range of properties (what is useful in general) Allow user to easily extend with new properties Apply information in more client applications Additional optimization domains Support for programmer assisted refactorings 31
32
32
33
Simple interpreter and debug environment for large subset of Java language 14,000+ Loc (in normalized form), 90 Classes Additional 1500 Loc for specialized standard library handling stubs Large recursive call structures, large inheritance trees with numerous virtual method implementations Wide range of data structure types, extensive use of java.util collections, uses both shared and unshared structures 33
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.