Mark Marron IMDEA-Software (Madrid, Spain) 1.

Slides:



Advertisements
Similar presentations
Pointers.
Advertisements

Symbol Table.
Objects and Classes David Walker CS 320. Advanced Languages advanced programming features –ML data types, exceptions, modules, objects, concurrency,...
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
Part IV: Memory Management
Static Single-Assignment ? ? Introduction: Over last few years [1991] SSA has been Stablished as… Intermediate program representation.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Automatic Memory Management Noam Rinetzky Schreiber 123A /seminar/seminar1415a.html.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
Names and Bindings.
Dec 5, 2007University of Virginia1 Efficient Dynamic Tainting using Multiple Cores Yan Huang University of Virginia Dec
Program Representations. Representing programs Goals.
Parallel Inclusion-based Points-to Analysis Mario Méndez-Lojo Augustine Mathew Keshav Pingali The University of Texas at Austin (USA) 1.
Objects and Classes David Walker CS 320. Advanced Languages advanced programming features –ML data types, exceptions, modules, objects, concurrency,...
Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
Reasons to study concepts of PL
CS 104 Introduction to Computer Science and Graphics Problems
Previous finals up on the web page use them as practice problems look at them early.
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Generative Programming. Generic vs Generative Generic Programming focuses on representing families of domain concepts Generic Programming focuses on representing.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
C o n f i d e n t i a l Developed By Nitendra NextHome Subject Name: Data Structure Using C Title: Overview of Data Structure.
Mark Marron IMDEA-Software (Madrid, Spain) 1.
Introduction Overview Static analysis Memory analysis Kernel integrity checking Implementation and evaluation Limitations and future work Conclusions.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
June 27, 2002 HornstrupCentret1 Using Compile-time Techniques to Generate and Visualize Invariants for Algorithm Explanation Thursday, 27 June :00-13:30.
Chapter 34 Java Technology for Active Web Documents methods used to provide continuous Web updates to browser – Server push – Active documents.
Programming Models & Runtime Systems Breakout Report MICS PI Meeting, June 27, 2002.
Mark Marron, Mario Mendez-Lojo Manuel Hermenegildo, Darko Stefanovic, Deepak Kapur 1.
Generative Programming. Automated Assembly Lines.
DATA STRUCTURE & ALGORITHMS (BCS 1223) NURUL HASLINDA NGAH SEMESTER /2014.
1.  10% Assignments/ class participation  10% Pop Quizzes  05% Attendance  25% Mid Term  50% Final Term 2.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Memory: Relocation.
Mark Marron 1, Deepak Kapur 2, Manuel Hermenegildo 1 1 Imdea-Software (Spain) 2 University of New Mexico 1.
Issues Autonomic operation (fault tolerance) Minimize interference to applications Hardware support for new operating systems Resource management (global.
Model construction and verification for dynamic programming languages Radu Iosif
1 Recursive Data Structure Profiling Easwaran Raman David I. August Princeton University.
Mark Marron IMDEA-Software (Madrid, Spain) 1.
By Teacher Asma Aleisa Year 1433 H.   Goals of memory management  To provide a convenient abstraction for programming.  To allocate scarce memory.
COMP3190: Principle of Programming Languages
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Compiler Optimizations ECE 454 Computer Systems Programming Topics: The Role of the Compiler Common Compiler (Automatic) Code Optimizations Cristiana Amza.
PHP vs. Python. Similarities are interpreted, high level languages with dynamic typing are Open Source are supported by large developer communities are.
Mark Marron 1, Deepak Kapur 2, Manuel Hermenegildo 1 1 Imdea-Software (Spain) 2 University of New Mexico 1.
GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.
Chapter 1: Preliminaries Lecture # 2. Chapter 1: Preliminaries Reasons for Studying Concepts of Programming Languages Programming Domains Language Evaluation.
Dynamic Representation and Prefetching of Linked Data Structures (or Introspection for Low-Level Data Prefetching) Mark Whitney & John Kubiatowicz ROC.
Eliminating External Fragmentation in a Non-Moving Garbage Collector for Java Author: Fridtjof Siebert, CASES 2000 Michael Sallas Object-Oriented Languages.
Advanced Computer Systems
Support for Program Analysis as a First-Class Design Constraint in Legion Michael Bauer 02/22/17.
Data Structure Interview Question and Answers
Compiler Construction (CS-636)
Spring 2016 Program Analysis and Verification
Compositional Pointer and Escape Analysis for Java Programs
Seminar in automatic tools for analyzing programs with dynamic memory
Harry Xu University of California, Irvine & Microsoft Research
Capriccio – A Thread Model
Chap. 8 :: Subroutines and Control Abstraction
Chap. 8 :: Subroutines and Control Abstraction
Binding Times Binding is an association between two things Examples:
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
Optimization 薛智文 (textbook ch# 9) 薛智文 96 Spring.
COMP755 Advanced Operating Systems
Pointer analysis John Rollinson & Kaiyuan Li
Presentation transcript:

Mark Marron IMDEA-Software (Madrid, Spain) 1

“Algorithms + Data Structures = Programs” Niklaus Wirth “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.” Fred Brooks 2

 Understanding the heap is increasingly important to understanding the program Object Oriented Programming makes objects (data not code) central organizing principle of program Introduction of GC encourages extensive use of heap allocated structures  Support large class of client applications in IDE, compiler, and testing  Focus on scalable, manageable models/tools even at cost of overall expressivity/analytic power 3

 Optimization Parallelization and Redundancy Elimination Memory Allocation and GC  Program Understanding Pre/Post Annotation Extraction Rich Class Invariant Computation  Testing Test Case Generation Debugger Visualizations 4

 Model of the Program heap Represent key heap properties Amenable to efficient static analysis Understandable by non-experts  Static Analysis Scalable to module level Efficient (time/memory) to run interactively on developer machines  Dynamic Support Debugger hooks Hooks for specification checking Heap structure test case generation 5

Representation 6

 Logical data structures (Regions) Group related sections of the heap Keep unrelated sections of the heap separate  Connectivity Inside and Between Regions Shape Reachability/Interference  Between variables  Arbitrary objects on heap  Important special case is uniqueness or aliasing of elements in collections/arrays 7

 Object/Data Structure Identity Track objects (data structures) across time in program execution (how are structures rearranged)  Heap Based Use-Mod Given a stmt which memory locations may be used/mod at that line Given a method which memory locations may be used/mod in the body (frame or footprint)  Allocation and Escape Sites  Field Nullity and Collection Size/Iterators 8

 Base on storage shape graph Nodes represent sets of objects (or data structures), edges represent sets of pointers Has natural representation for many of the properties we are interested in Easy to visualize and understand by non-experts Efficient and compact for computation  Disjointness properties are natural (and mostly free)  Majority of computation is local  Imprecision spread is minimal  Annotate nodes and edges with additional instrumentation properties 9

 Key issue for shape graph approach is how to group concrete objects into abstract nodes Too many nodes is confusing and computationally expensive Too few nodes leads to imprecision (as a single node must represent multiple logical structures) Often done via allocation site or types  Solution: nodes are related sets of objects Recursive type information (recursive vs. non- recursive types) Objects stored in the same collection, array or structure 10

11

12

 Most general way objects in a region are connected (S)ingleton: no pointers between any objects (L)ist: may contain a linear List or simpler structures (T)ree: may contain a Tree or simpler structures (C)ycle: may a cyclic or simpler structures  E.g. A region with a (T)ree layout may contain Tree, List or Singleton structures, but no Cyclic structures. 13

14

 Edges abstract sets of references (variable references or pointers)  Heap graph has ability to track some sharing properties but insufficiently precise to model many important properties E.g. given an array of objects does any object appear multiple times?  May occur between references abstracted by same edge or two different edges Interference: abstracted by same edge Connectivity: abstracted by different edges 15

 Does a single edge abstract only references with disjoint targets or may some of these references be aliased/related?  Edge e is: non-interfering: all pairs of references r 1, r 2 must refer to disjoint data structures. interfering: may exist a pair of references r 1, r 2 that refer to the same data structure. 16

17

 Want to know how heap structure is changed between points in the program Often access paths are used to specify sets of objects but these quickly become very complicated  Shape graph is a partition of the heap Pick an arbitrary points in the program to use as the canonical partitions At each future program point track how partitioning has changed relative to canonical partition  Most natural (and generally useful) point is each method entry 18

 Use the notion of Data Structure Identity Each node represents a set of objects Add tags to each node tracking the most recent stmt that  Read from a field in an object represented by the node  Wrote to a field in an object represented by the node  Can then compute stmt level and method level use/mod sets (footprints, frames) via Structure Identity Use/Mod tags in Nodes 19

20 int mangle(Pair p) { int res = p.first.val; p.second = p.first; p.first = null; return res; }

 Track object allocation sites and escape sites similarly to tracking use/mod Allocation sites of the objects Escape sites of the objects  Nullity and collection information Per field nullity tag in each node For nodes that may represent collections (lists, sets, dictionaries) track size of collection (0, 1+, 0+) 21

 N-Body simulation in 3-dimensions  Uses Fast Multi-Pole method with space decomposition tree For nearby bodies use naive n 2 algorithm For distant bodies compute center of mass of many bodies and treat as single point mass 22

23

 Inline double[] into MathVector objects and unroll loops, 23% speedup 37% memory use reduction 24

Iterator b = this.bodyTabRev.iterator(); while(b.hasNext()) ((Body) b.next()).hackGravity(rsize, root); 25

 TLP update loop over bodyTabRev, factor 3.09 speedup on quad-core machine 26

Static Analysis 27

 Computational Cost Flow Sensitivity Disjunctive Representations (powersets) Context Sensitivity  Extensive use of Collections and other standard libraries Need to deal with IO, Files, etc. Collections particularly important (wrt. heap analysis) and used extensively 28

 Adopt existing work on Partially Disjunctive Domains (SAS 04) to help with flow and disjunctive costs  Our recent work (in submission) addresses the disjunctive and context sensitivity issues Calls take/return a model set of size 1 Virtual calls handled efficiently, e.g. do not result in explosion in set size Average number of contexts per method (in all benchmarks) is less than 2.2 All with near optimal precision (more later) 29

 Most libraries are not interesting from the standpoint of userland code Don’t care how FileStream is implemented Just treat as uninterrupted heap region  Can analyze Collection implementations directly but Many use specialized highly optimized data structures, so very hard to model correctly Analysis is expensive and redundant  Use simple high level semantics for common collections (precise and efficient) 30

 Prototype implemented in C++ analyzing Java source (timing results reported here)  Currently working on an implementation for CLI (C#/.net) bytecode Working on support for bytecodes generated by C# compiler (currently function pointers, references, exceptions not handled) Support key portions of System, Collections, and IO handled - plans to support GUI libs and threads Goal is module level analysis with other techniques for inter-module analysis 31

BenchmarkLOCAnalysis Time Analysis Mem RegionShapeShare tsp s<30MB100% 98% em3d s<30MB100% 99% voronoi s<30MB97%98% bh s<30MB100%96% db s<30MB100% 81% raytrace s38MB100%95%92% Exp s48MB100% Interpreter s122MB100%97%96% 32

33

 Model of the Program heap Captures key properties for client applications Amenable to static analysis Intuitive and understandable by non-experts  Static Analysis Analyze 15KLoc in a few minutes and ~200MB (average 2 contexts per method) Precisely represent range of useful information  Dynamic Support Debugger hooks in place for C# and Visual Studio Other tool support in progress 34

 Transform core concepts from prototype to robust tools Finish implementation of static analysis for CLI bytecode + core libraries (also runtime support) Export results to Visual Studio for inspection, spec. generation, or other tools Build strong foundation for other tools  Apply results in optimization, refactoring, testing, or error detection applications 35

36

 Simple interpreter and debug environment for simple OO-Java like language  14,000+ Loc (in normalized form), 90 Classes Additional 1500 Loc for specialized standard library handling stubs  Large recursive call structures, large inheritance trees with numerous virtual method implementations  Wide range of data structure types, extensive use of java.util collections, uses both shared and unshared structures 37

 Track object allocation sites and escape sites similarly to tracking use/mod Add tags to each node  Allocation sites of the objects represented by the node  Escape sites of the objects represented by the node  Then based on Identity information and these Tags we can identify Where objects are allocated locally Which objects escape Which calls created objects that escaped into this call and which data structures these objects are in 38

 Via Common Compiler Infrastructure (CCI) and reflection, rewrite bytecode with shadow memory Extract snapshot Apply already defined abstraction function  More precise than static analysis (less chance for user confusion) No error propagation issues Abstract Model represents a single heap (not a set) Does not have the overhead of static analysis 39

 Experience with using model to understand heap state shows promise Has been used by a number of individuals outside the group with positive reports  Our implementation (done as a Visual Studio addin) can abstract nontrivial heaps 220K objects in 9s and approx. 300MB of Memory  Code is available on CodePlex... 40