Presentation is loading. Please wait.

Presentation is loading. Please wait.

Mark Marron IMDEA-Software (Madrid, Spain) 1.

Similar presentations


Presentation on theme: "Mark Marron IMDEA-Software (Madrid, Spain) 1."— Presentation transcript:

1 Mark Marron IMDEA-Software (Madrid, Spain) mark.marron@imdea.org 1

2 “Algorithms + Data Structures = Programs” Niklaus Wirth “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.” Fred Brooks 2

3  Understanding the heap is increasingly important to understanding the program Object Oriented Programming makes objects (data not code) central organizing principle of program Introduction of GC encourages extensive use of heap allocated structures  Support large class of client applications in IDE, compiler, and testing  Focus on scalable, manageable models/tools even at cost of overall expressivity/analytic power 3

4  Optimization Parallelization and Redundancy Elimination Memory Allocation and GC  Program Understanding Pre/Post Annotation Extraction Rich Class Invariant Computation  Testing Test Case Generation Debugger Visualizations 4

5  Model of the Program heap Represent key heap properties Amenable to efficient static analysis Understandable by non-experts  Static Analysis Scalable to module level Efficient (time/memory) to run interactively on developer machines  Dynamic Support Debugger hooks Hooks for specification checking Heap structure test case generation 5

6 Representation 6

7  Logical data structures (Regions) Group related sections of the heap Keep unrelated sections of the heap separate  Connectivity Inside and Between Regions Shape Reachability/Interference  Between variables  Arbitrary objects on heap  Important special case is uniqueness or aliasing of elements in collections/arrays 7

8  Object/Data Structure Identity Track objects (data structures) across time in program execution (how are structures rearranged)  Heap Based Use-Mod Given a stmt which memory locations may be used/mod at that line Given a method which memory locations may be used/mod in the body (frame or footprint)  Allocation and Escape Sites  Field Nullity and Collection Size/Iterators 8

9  Base on storage shape graph Nodes represent sets of objects (or data structures), edges represent sets of pointers Has natural representation for many of the properties we are interested in Easy to visualize and understand by non-experts Efficient and compact for computation  Disjointness properties are natural (and mostly free)  Majority of computation is local  Imprecision spread is minimal  Annotate nodes and edges with additional instrumentation properties 9

10  Key issue for shape graph approach is how to group concrete objects into abstract nodes Too many nodes is confusing and computationally expensive Too few nodes leads to imprecision (as a single node must represent multiple logical structures) Often done via allocation site or types  Solution: nodes are related sets of objects Recursive type information (recursive vs. non- recursive types) Objects stored in the same collection, array or structure 10

11 11

12 12

13  Most general way objects in a region are connected (S)ingleton: no pointers between any objects (L)ist: may contain a linear List or simpler structures (T)ree: may contain a Tree or simpler structures (C)ycle: may a cyclic or simpler structures  E.g. A region with a (T)ree layout may contain Tree, List or Singleton structures, but no Cyclic structures. 13

14 14

15  Edges abstract sets of references (variable references or pointers)  Heap graph has ability to track some sharing properties but insufficiently precise to model many important properties E.g. given an array of objects does any object appear multiple times?  May occur between references abstracted by same edge or two different edges Interference: abstracted by same edge Connectivity: abstracted by different edges 15

16  Does a single edge abstract only references with disjoint targets or may some of these references be aliased/related?  Edge e is: non-interfering: all pairs of references r 1, r 2 must refer to disjoint data structures. interfering: may exist a pair of references r 1, r 2 that refer to the same data structure. 16

17 17

18  Want to know how heap structure is changed between points in the program Often access paths are used to specify sets of objects but these quickly become very complicated  Shape graph is a partition of the heap Pick an arbitrary points in the program to use as the canonical partitions At each future program point track how partitioning has changed relative to canonical partition  Most natural (and generally useful) point is each method entry 18

19  Use the notion of Data Structure Identity Each node represents a set of objects Add tags to each node tracking the most recent stmt that  Read from a field in an object represented by the node  Wrote to a field in an object represented by the node  Can then compute stmt level and method level use/mod sets (footprints, frames) via Structure Identity Use/Mod tags in Nodes 19

20 20 int mangle(Pair p) { int res = p.first.val; p.second = p.first; p.first = null; return res; }

21  Track object allocation sites and escape sites similarly to tracking use/mod Allocation sites of the objects Escape sites of the objects  Nullity and collection information Per field nullity tag in each node For nodes that may represent collections (lists, sets, dictionaries) track size of collection (0, 1+, 0+) 21

22  N-Body simulation in 3-dimensions  Uses Fast Multi-Pole method with space decomposition tree For nearby bodies use naive n 2 algorithm For distant bodies compute center of mass of many bodies and treat as single point mass 22

23 23

24  Inline double[] into MathVector objects and unroll loops, 23% speedup 37% memory use reduction 24

25 Iterator b = this.bodyTabRev.iterator(); while(b.hasNext()) ((Body) b.next()).hackGravity(rsize, root); 25

26  TLP update loop over bodyTabRev, factor 3.09 speedup on quad-core machine 26

27 Static Analysis 27

28  Computational Cost Flow Sensitivity Disjunctive Representations (powersets) Context Sensitivity  Extensive use of Collections and other standard libraries Need to deal with IO, Files, etc. Collections particularly important (wrt. heap analysis) and used extensively 28

29  Adopt existing work on Partially Disjunctive Domains (SAS 04) to help with flow and disjunctive costs  Our recent work (in submission) addresses the disjunctive and context sensitivity issues Calls take/return a model set of size 1 Virtual calls handled efficiently, e.g. do not result in explosion in set size Average number of contexts per method (in all benchmarks) is less than 2.2 All with near optimal precision (more later) 29

30  Most libraries are not interesting from the standpoint of userland code Don’t care how FileStream is implemented Just treat as uninterrupted heap region  Can analyze Collection implementations directly but Many use specialized highly optimized data structures, so very hard to model correctly Analysis is expensive and redundant  Use simple high level semantics for common collections (precise and efficient) 30

31  Prototype implemented in C++ analyzing Java source (timing results reported here)  Currently working on an implementation for CLI (C#/.net) bytecode Working on support for bytecodes generated by C# compiler (currently function pointers, references, exceptions not handled) Support key portions of System, Collections, and IO handled - plans to support GUI libs and threads Goal is module level analysis with other techniques for inter-module analysis 31

32 BenchmarkLOCAnalysis Time Analysis Mem RegionShapeShare tsp9100.03s<30MB100% 98% em3d11030.09s<30MB100% 99% voronoi13240.50s<30MB97%98% bh23040.72s<30MB100%96% db19850.68s<30MB100% 81% raytrace580915.5s38MB100%95%92% Exp3567152.3s48MB100% Interpreter15293114.8s122MB100%97%96% 32

33 33

34  Model of the Program heap Captures key properties for client applications Amenable to static analysis Intuitive and understandable by non-experts  Static Analysis Analyze 15KLoc in a few minutes and ~200MB (average 2 contexts per method) Precisely represent range of useful information  Dynamic Support Debugger hooks in place for C# and Visual Studio Other tool support in progress 34

35  Transform core concepts from prototype to robust tools Finish implementation of static analysis for CLI bytecode + core libraries (also runtime support) Export results to Visual Studio for inspection, spec. generation, or other tools Build strong foundation for other tools  Apply results in optimization, refactoring, testing, or error detection applications 35

36 36

37  Simple interpreter and debug environment for simple OO-Java like language  14,000+ Loc (in normalized form), 90 Classes Additional 1500 Loc for specialized standard library handling stubs  Large recursive call structures, large inheritance trees with numerous virtual method implementations  Wide range of data structure types, extensive use of java.util collections, uses both shared and unshared structures 37

38  Track object allocation sites and escape sites similarly to tracking use/mod Add tags to each node  Allocation sites of the objects represented by the node  Escape sites of the objects represented by the node  Then based on Identity information and these Tags we can identify Where objects are allocated locally Which objects escape Which calls created objects that escaped into this call and which data structures these objects are in 38

39  Via Common Compiler Infrastructure (CCI) and reflection, rewrite bytecode with shadow memory Extract snapshot Apply already defined abstraction function  More precise than static analysis (less chance for user confusion) No error propagation issues Abstract Model represents a single heap (not a set) Does not have the overhead of static analysis 39

40  Experience with using model to understand heap state shows promise Has been used by a number of individuals outside the group with positive reports  Our implementation (done as a Visual Studio addin) can abstract nontrivial heaps 220K objects in 9s and approx. 300MB of Memory  Code is available on CodePlex... 40


Download ppt "Mark Marron IMDEA-Software (Madrid, Spain) 1."

Similar presentations


Ads by Google