Mark Marron, IMDEA-Software (Madrid, Spain)

• Many optimization and software engineering applications use heap information:
  - Optimization: parallelization, memory management
  - Software engineering and debugging: interactive debugging of heap data structures, taint or information-flow analysis through heap objects
• Existing heap analysis techniques are often inapplicable because of imprecision (points-to style analyses) or computational cost (shape analysis)

• A general-purpose model capable of supporting these applications:
  - Must model the range of fundamental properties needed by our target application domains
  - Cannot place significant restrictions on the program being analyzed
  - Must be computationally efficient and compact
  - Is willing to sacrifice some precision
• Our focus is on providing general classes of information for others to build on

• Connectivity: reachability, interference, paths
• Logical data structures (regions):
  - Group related sections of the heap
  - Keep unrelated sections of the heap separate
• Shape of a region: Cycle, Dag, Tree, List, Singleton

• Identity: given an object at program point p, track the flow of this object to all later program points q
• Heap-based use-mod: find all program points at which a given memory location may be read or written
• Escape: track which objects are freshly allocated and which escape the local call context

• The theory of abstract interpretation provides a framework for static program analysis:
  - It takes a lattice (set) of abstract models, each of which represents a set of concrete program states
  - It computes, for each program point, an abstract model that represents all possible heap states that may occur at that program point
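
To make the per-program-point computation concrete, here is a minimal Java sketch of Kleene iteration at a loop head: the entry state is repeatedly joined with the effect of one more loop iteration until the state stops changing. The toy abstract domain (the set of variables that may point into a heap region R, ordered by inclusion and joined by union) and the names `loopHead`, `body`, and `regionVarsAtLoopHead` are illustrative assumptions, not the analysis's actual heap model.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

public class LoopFixpoint {
    // Kleene iteration at a loop head: head = entry JOIN body(head).
    // Terminates when the lattice has finite height and `body` is monotone.
    static <S> S loopHead(S entry, UnaryOperator<S> body, BinaryOperator<S> join) {
        S current = entry;
        while (true) {
            S next = join.apply(entry, body.apply(current));
            if (next.equals(current)) {
                return current;          // fixed point reached
            }
            current = next;
        }
    }

    // Toy domain (assumption): variables that may point into region R; join = union.
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> u = new TreeSet<>(a);
        u.addAll(b);
        return u;
    }

    // Transfer function for the loop body "y = x; x = x.next;",
    // assuming x.next always stays inside R.
    static Set<String> body(Set<String> in) {
        Set<String> out = new TreeSet<>(in);
        out.remove("y");                 // y is overwritten...
        if (in.contains("x")) {
            out.add("y");                // ...with x, so y points into R iff x does
        }
        return out;                      // x = x.next leaves x inside R
    }

    static Set<String> regionVarsAtLoopHead() {
        Set<String> entry = new TreeSet<>(Set.of("x"));
        return loopHead(entry, LoopFixpoint::body, LoopFixpoint::union);
    }

    public static void main(String[] args) {
        System.out.println(regionVarsAtLoopHead()); // [x, y]
    }
}
```

The second iteration reproduces the state of the first, so the analysis stops after two rounds with both variables possibly pointing into R.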

• A surprising benefit of building a model suitable for abstract interpretation is that the model also works for dynamic analysis:
  - Debugging
  - Specification mining/checking
• Given a snapshot of the single, current concrete program heap, compute the corresponding abstract model
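
A minimal sketch of the dynamic use case for one property, the shape of a region. It assumes a region of a heap snapshot is given as an adjacency map from object ids to the ids they point to (one entry per pointer field that stays inside the region), and classifies it by checking for cycles, objects with several incoming pointers, and objects with several outgoing pointers. The classification rules and the names `HeapSnapshotShape` and `classify` are assumptions for illustration, not the tool's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeapSnapshotShape {
    enum Shape { SINGLETON, LIST, TREE, DAG, CYCLE }

    // `region`: object id -> ids it points to via pointers that stay in the region.
    static Shape classify(Map<Integer, List<Integer>> region) {
        Map<Integer, Integer> inDegree = new HashMap<>();
        boolean anyPointer = false;
        boolean outAtMostOne = true;
        for (List<Integer> targets : region.values()) {
            anyPointer |= !targets.isEmpty();
            outAtMostOne &= targets.size() <= 1;
            for (Integer t : targets) {
                inDegree.merge(t, 1, Integer::sum);
            }
        }
        if (!anyPointer) return Shape.SINGLETON;            // no pointers between objects
        if (hasCycle(region)) return Shape.CYCLE;
        boolean inAtMostOne = inDegree.values().stream().allMatch(d -> d <= 1);
        if (!inAtMostOne) return Shape.DAG;                  // some object has two parents
        return outAtMostOne ? Shape.LIST : Shape.TREE;
    }

    // Depth-first search; meeting an object still on the stack means a cycle.
    static boolean hasCycle(Map<Integer, List<Integer>> region) {
        Map<Integer, Boolean> onStack = new HashMap<>();     // true = on stack, false = done
        for (Integer start : region.keySet()) {
            if (dfs(start, region, onStack)) return true;
        }
        return false;
    }

    private static boolean dfs(Integer obj, Map<Integer, List<Integer>> region,
                               Map<Integer, Boolean> onStack) {
        Boolean state = onStack.get(obj);
        if (state != null) return state;                     // back edge => cycle
        onStack.put(obj, true);
        for (Integer next : region.getOrDefault(obj, List.of())) {
            if (dfs(next, region, onStack)) return true;
        }
        onStack.put(obj, false);
        return false;
    }

    public static void main(String[] args) {
        System.out.println(classify(Map.of(1, List.of(2), 2, List.of(3), 3, List.of())));      // LIST
        System.out.println(classify(Map.of(1, List.of(2, 3), 2, List.of(), 3, List.of())));    // TREE
        System.out.println(classify(Map.of(1, List.of(2, 3), 2, List.of(4), 3, List.of(4)))); // DAG
        System.out.println(classify(Map.of(1, List.of(2), 2, List.of(1))));                    // CYCLE
        System.out.println(classify(Map.of(1, List.of(), 2, List.of())));                       // SINGLETON
    }
}
```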

• Handles a large fragment of Java 1.5 and commonly used libraries (lang, util, io)
• Precisely models (in both the static and dynamic analyses) the properties of interest
• Can efficiently (on the order of seconds) statically analyze moderately sized programs (~15 KLOC to date)
• Simple implementation of a debugger and specification miner (a few seconds to compute models of multi-MB heaps)

• Based on the storage shape graph:
  - Nodes represent sets of objects (or recursive data structures); edges represent sets of pointers
  - Has a natural representation for many of the properties we are interested in
  - Easy to visualize
  - Efficient to compute with
• Nodes and edges are annotated with additional instrumentation properties
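
One way such an annotated shape graph could be represented, as a hedged sketch: nodes carry a shape (layout) annotation and edges carry an interference flag, standing in for the instrumentation properties. All class, field, and method names here (`ShapeGraph`, `Node`, `Edge`, `interfering`, `exampleList`) are hypothetical, and treating a node's internal pointers as summarized by its layout is a simplification made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ShapeGraph {
    enum Layout { SINGLETON, LIST, TREE, DAG, CYCLE }

    // A node abstracts a set of concrete objects, e.g. all cells of one list.
    static final class Node {
        final String name;
        final Layout layout;            // instrumentation: shape of pointers inside the node
        Node(String name, Layout layout) { this.name = name; this.layout = layout; }
    }

    // An edge abstracts a set of concrete references: the local variable `label`
    // (when `from` is null) or field `label` of the objects abstracted by `from`.
    static final class Edge {
        final Node from;
        final String label;
        final Node to;
        final boolean interfering;      // instrumentation: may two of its references be related?
        Edge(Node from, String label, Node to, boolean interfering) {
            this.from = from;
            this.label = label;
            this.to = to;
            this.interfering = interfering;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node addNode(String name, Layout layout) {
        Node n = new Node(name, layout);
        nodes.add(n);
        return n;
    }

    void addEdge(Node from, String label, Node to, boolean interfering) {
        edges.add(new Edge(from, label, to, interfering));
    }

    // Hypothetical example: variable `list` refers to a linked list whose cells
    // each hold a distinct Data object; `next` pointers are summarized by LIST.
    static ShapeGraph exampleList() {
        ShapeGraph g = new ShapeGraph();
        Node cells = g.addNode("cells", Layout.LIST);
        Node data = g.addNode("data", Layout.SINGLETON);
        g.addEdge(null, "list", cells, false);
        g.addEdge(cells, "data", data, false);   // no Data object is stored in two cells
        return g;
    }

    public static void main(String[] args) {
        for (Edge e : exampleList().edges) {
            String src = (e.from == null) ? "variable" : e.from.name;
            System.out.println(src + " --" + e.label + "--> " + e.to.name + " ["
                    + e.to.layout + (e.interfering ? ", interfering]" : ", non-interfering]"));
        }
    }
}
```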

• A key issue in a shape graph is how to choose the nodes that abstract concrete objects:
  - Too many nodes is confusing and computationally expensive
  - Too few nodes leads to imprecision (a single node must represent multiple logical structures)
  - Often done via allocation sites or types
• Solution: nodes are related sets of objects, grouped using:
  - Recursive type information (recursive vs. non-recursive types)
  - Objects stored in the same collection, array, or structure
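
The two grouping criteria can be approximated with union-find over a concrete heap. In this sketch (an assumption, not the analysis's actual algorithm), objects linked through a recursive field are merged into one node, and the targets of the same non-recursive field out of one node are merged until nothing changes, so the elements stored in one structure end up together. `RegionPartition`, `Pointer`, and `partition` are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegionPartition {
    // Concrete pointer: object `src` refers to `dst` through `field`.
    // `recursiveField` is true when the field's type is recursive with the source
    // type (e.g. Cell.next in a linked list).
    static final class Pointer {
        final int src; final String field; final int dst; final boolean recursiveField;
        Pointer(int src, String field, int dst, boolean recursiveField) {
            this.src = src; this.field = field; this.dst = dst; this.recursiveField = recursiveField;
        }
    }

    private final int[] parent;

    RegionPartition(int objectCount) {
        parent = new int[objectCount];
        for (int i = 0; i < objectCount; i++) parent[i] = i;
    }

    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path halving
            x = parent[x];
        }
        return x;
    }

    boolean union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        parent[ra] = rb;
        return true;
    }

    static RegionPartition partition(int objectCount, List<Pointer> pointers) {
        RegionPartition p = new RegionPartition(objectCount);
        // Rule 1: objects linked through recursive fields form one recursive structure.
        for (Pointer ptr : pointers) {
            if (ptr.recursiveField) p.union(ptr.src, ptr.dst);
        }
        // Rule 2: targets of the same non-recursive field out of one region are
        // stored in the same structure, so they share a region. Repeat until stable.
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<String, Integer> firstTarget = new HashMap<>();
            for (Pointer ptr : pointers) {
                if (ptr.recursiveField) continue;
                String key = p.find(ptr.src) + "." + ptr.field;
                Integer prev = firstTarget.putIfAbsent(key, ptr.dst);
                if (prev != null) changed |= p.union(prev, ptr.dst);
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical heap: cells 0 -> 1 -> 2 via `next`; cell i holds data object 3 + i.
        List<Pointer> heap = List.of(
                new Pointer(0, "next", 1, true), new Pointer(1, "next", 2, true),
                new Pointer(0, "data", 3, false), new Pointer(1, "data", 4, false),
                new Pointer(2, "data", 5, false));
        RegionPartition p = partition(6, heap);
        for (int o = 0; o < 6; o++) System.out.println("object " + o + " -> region " + p.find(o));
    }
}
```

For this three-cell list the sketch yields two regions, one for the cells and one for the data objects, instead of a single node for everything or one node per object.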

• Shape of a region: the most general way the objects in the region are connected
  - (S)ingleton: no pointers between any objects
  - (L)ist: may contain a linear list or simpler structures
  - (T)ree: may contain a tree or simpler structures
  - (D)ag: may contain a DAG or simpler structures
  - (C)ycle: may contain a cycle or simpler structures
• E.g. a region with a (T)ree layout may contain tree, list, or singleton structures, but no DAG or cyclic structures.
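
Because each layout admits every simpler one, the five layouts form a chain, and combining two abstract states (for example at a control-flow merge) keeps the more general layout. A minimal sketch of that lattice; the names `LayoutLattice`, `join`, and `admits` are illustrative.

```java
public class LayoutLattice {
    // Ordered from most to least restrictive; each layout admits all earlier ones.
    enum Layout {
        SINGLETON, LIST, TREE, DAG, CYCLE;

        // Least upper bound in the chain: the more general of the two layouts.
        Layout join(Layout other) {
            return ordinal() >= other.ordinal() ? this : other;
        }

        // A region summarized as `this` may contain a structure of layout `concrete`.
        boolean admits(Layout concrete) {
            return concrete.ordinal() <= ordinal();
        }
    }

    public static void main(String[] args) {
        System.out.println(Layout.LIST.join(Layout.TREE));    // TREE
        System.out.println(Layout.TREE.admits(Layout.LIST));  // true
        System.out.println(Layout.TREE.admits(Layout.DAG));   // false
    }
}
```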

• Edges abstract sets of references (variable references or pointers)
• The heap graph can track some sharing properties but is insufficiently precise to model many important ones, e.g. given an array of objects, does any object appear multiple times?
• Sharing may occur between references abstracted by the same edge or by two different edges:
  - Interference: references abstracted by the same edge
  - Connectivity: references abstracted by different edges

• Interference: does a single edge abstract only references with disjoint targets, or may some of these references alias or be related?
• Edge e is:
  - non-interfering: all pairs of references r1, r2 in γ(e) must be unrelated (refer to disjoint data structures)
  - interfering: there may be a pair of references r1, r2 in γ(e) that are related (refer to the same data structure)

• Connectivity: do two edges abstract sets of references with disjoint targets, or may some of these references alias or be related?
• Edges e1, e2 are:
  - disjoint: all pairs of references r1 in γ(e1), r2 in γ(e2) are unrelated (refer to disjoint data structures)
  - connected: there may be a pair of references r1 in γ(e1), r2 in γ(e2) that are related (refer to the same data structure)
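
Both properties can be checked directly on a concrete heap, roughly what a dynamic analysis would do for one snapshot. This sketch assumes that two references are related when the sets of objects reachable from their targets overlap; that reading of "refer to the same data structure", the adjacency-map heap, and all names (`RefRelations`, `reachable`, `related`, `interfering`, `connected`) are assumptions. An edge is modeled simply as the list of target ids of the references it abstracts, so an array storing the same object twice shows up as a repeated id.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RefRelations {
    // Objects reachable from `root` in a concrete heap (object id -> pointed-to ids).
    static Set<Integer> reachable(int root, Map<Integer, List<Integer>> heap) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            int obj = work.pop();
            if (seen.add(obj)) {
                work.addAll(heap.getOrDefault(obj, List.of()));
            }
        }
        return seen;
    }

    // Assumed notion of "related": the two targets reach at least one common object.
    static boolean related(int a, int b, Map<Integer, List<Integer>> heap) {
        Set<Integer> fromA = reachable(a, heap);
        for (int obj : reachable(b, heap)) {
            if (fromA.contains(obj)) return true;
        }
        return false;
    }

    // Interference: two different references abstracted by ONE edge are related.
    static boolean interfering(List<Integer> targets, Map<Integer, List<Integer>> heap) {
        for (int i = 0; i < targets.size(); i++)
            for (int j = i + 1; j < targets.size(); j++)
                if (related(targets.get(i), targets.get(j), heap)) return true;
        return false;
    }

    // Connectivity: some reference of edge e1 and some reference of edge e2 are related.
    static boolean connected(List<Integer> e1, List<Integer> e2, Map<Integer, List<Integer>> heap) {
        for (int a : e1)
            for (int b : e2)
                if (related(a, b, heap)) return true;
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical heap: objects 2 and 3 both point to object 4.
        Map<Integer, List<Integer>> heap = Map.of(2, List.of(4), 3, List.of(4));
        System.out.println(interfering(List.of(1, 2, 1), heap));        // true: object 1 stored twice
        System.out.println(interfering(List.of(1, 2, 3), heap));        // true: 2 and 3 share 4
        System.out.println(interfering(List.of(1, 2), heap));           // false
        System.out.println(connected(List.of(1), List.of(2, 3), heap)); // false
        System.out.println(connected(List.of(2), List.of(3), heap));    // true
    }
}
```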

• Object identity: across each method call, track how data structures are split, merged, and reconnected
• Field-sensitive use/mod:
  - For each method, track the fields of the objects in each region (node) and whether each field is used/modified in the method
  - At each line, track which regions (nodes) and fields may be used or modified
• Object allocation: track which objects are allocated in this scope and which may escape

    void swap(Pair p) {
        Data temp = p.first;
        p.first = p.second;
        p.second = temp;
    }
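
As a dynamic analogue of the identity and field-sensitive use/mod information for `swap`, the sketch below hand-instruments the method to log which fields it reads and writes, and checks that the object that was in `first` ends up in `second`. The `Data` and `Pair` definitions and the `USE`/`MOD` logs are hypothetical additions; the static analysis computes such summaries without running the code.

```java
import java.util.Set;
import java.util.TreeSet;

public class SwapUseMod {
    static final class Data {}

    static final class Pair {
        Data first;
        Data second;
    }

    // Hypothetical logs of the fields a call reads (use) and writes (mod).
    static final Set<String> USE = new TreeSet<>();
    static final Set<String> MOD = new TreeSet<>();

    static void swap(Pair p) {
        Data temp = p.first;      // reads Pair.first
        USE.add("Pair.first");
        p.first = p.second;       // reads Pair.second, writes Pair.first
        USE.add("Pair.second");
        MOD.add("Pair.first");
        p.second = temp;          // writes Pair.second
        MOD.add("Pair.second");
    }

    public static void main(String[] args) {
        Data a = new Data();
        Data b = new Data();
        Pair p = new Pair();
        p.first = a;
        p.second = b;
        swap(p);
        // Identity: the object that was in `first` is now the one in `second`.
        System.out.println(p.first == b && p.second == a);  // true
        System.out.println("use=" + USE + " mod=" + MOD);
    }
}
```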

• N-body simulation in three dimensions
• Uses the Fast Multi-Pole method with a space decomposition tree:
  - For nearby bodies, use the naive n² algorithm
  - For distant bodies, compute the center of mass of many bodies and treat them as a single point mass
• Updates the space decomposition tree to account for body motion
• Has not been analyzed with other existing (precise) heap analysis methods
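
The far-field approximation rests on the standard center-of-mass formula: total mass M = Σ m_i and position c = (Σ m_i p_i) / M, computed per coordinate. A small sketch; the name `combine` and the array-based layout are hypothetical, not the benchmark's code.

```java
import java.util.Arrays;

public class CenterOfMass {
    // Replaces a group of bodies by one point mass:
    //   M = sum_i m_i,  c = (sum_i m_i * p_i) / M   (per coordinate).
    // Returns {M, cx, cy, cz}.
    static double[] combine(double[] masses, double[][] positions) {
        double total = 0.0;
        double[] weighted = new double[3];
        for (int i = 0; i < masses.length; i++) {
            total += masses[i];
            for (int k = 0; k < 3; k++) {
                weighted[k] += masses[i] * positions[i][k];
            }
        }
        return new double[] {total, weighted[0] / total, weighted[1] / total, weighted[2] / total};
    }

    public static void main(String[] args) {
        double[] masses = {1.0, 3.0};
        double[][] positions = {{0.0, 0.0, 0.0}, {4.0, 0.0, 0.0}};
        System.out.println(Arrays.toString(combine(masses, positions))); // [4.0, 3.0, 0.0, 0.0]
    }
}
```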

• Inline Double[] into MathVector objects: 23% serial speedup, 37% reduction in memory use
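
The transformation can be pictured as follows, using hypothetical, simplified classes rather than the benchmark's actual `MathVector`: the "before" vector owns a separate `Double[]` object, while the "after" vector stores its components directly. Presumably this is only safe when each array is reachable solely through its owning vector and never escapes, which is the kind of fact the sharing and escape information supplies; that connection is an assumption here.

```java
public class InlineVector {
    // Before: each vector points to a separate Double[] (plus boxed Double values).
    static final class BoxedVector {
        final Double[] data = {0.0, 0.0, 0.0};
        double get(int i) { return data[i]; }
        void set(int i, double v) { data[i] = v; }
    }

    // After: the three components live directly in the vector object.
    static final class InlinedVector {
        private double x, y, z;

        double get(int i) {
            switch (i) {
                case 0: return x;
                case 1: return y;
                case 2: return z;
                default: throw new IndexOutOfBoundsException(i);
            }
        }

        void set(int i, double v) {
            switch (i) {
                case 0: x = v; break;
                case 1: y = v; break;
                case 2: z = v; break;
                default: throw new IndexOutOfBoundsException(i);
            }
        }
    }

    public static void main(String[] args) {
        BoxedVector before = new BoxedVector();
        InlinedVector after = new InlinedVector();
        for (int i = 0; i < 3; i++) {
            before.set(i, i + 1.0);
            after.set(i, i + 1.0);
        }
        boolean same = true;
        for (int i = 0; i < 3; i++) same &= before.get(i) == after.get(i);
        System.out.println(same); // true
    }
}
```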

    Iterator b = this.bodyTabRev.iterator();
    while (b.hasNext())
        ((Body) b.next()).hackGravity(rsize, root);

• Thread-level parallelism (TLP) for the update loop over bodyTabRev: 3.09× speedup on a quad-core machine
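
A sketch of the parallelized loop using Java parallel streams, with heavily simplified, hypothetical one-dimensional `Body` and `Cell` classes. It assumes the property that makes the transformation legal: each iteration writes only its own `Body` and only reads the shared tree, so the iterations are independent. This is not necessarily the mechanism used for the reported speedup.

```java
import java.util.List;

public class ParallelGravity {
    // Hypothetical stand-in for a tree cell summarizing many bodies.
    static final class Cell {
        final double mass;
        final double pos;
        Cell(double mass, double pos) { this.mass = mass; this.pos = pos; }
    }

    static final class Body {
        final double pos;
        double acc;                        // written only by this body's own update

        Body(double pos) { this.pos = pos; }

        // Reads the shared tree (here a single cell) and writes only this.acc.
        void hackGravity(double rsize, Cell root) {
            double d = root.pos - pos;
            acc = (d == 0.0) ? 0.0 : root.mass * Math.signum(d) / (d * d);
        }
    }

    // Each iteration writes a different Body and only reads `root`,
    // so the iterations can run on separate threads.
    static void updateAll(List<Body> bodies, double rsize, Cell root) {
        bodies.parallelStream().forEach(b -> b.hackGravity(rsize, root));
    }

    public static void main(String[] args) {
        List<Body> bodies = List.of(new Body(0.0), new Body(1.0), new Body(3.0));
        updateAll(bodies, 1.0, new Cell(4.0, 2.0));
        for (Body b : bodies) System.out.println(b.pos + " -> " + b.acc);
    }
}
```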

Benchmark     LOC   Analysis Time   Analysis Mem   Shape   Sharing   Use/Mod*
tsp           ?     ? s             <30MB          100%    98%       Y
em3d          ?     ? s             <30MB          100%    ?         Y
voronoi       ?     ? s             <30MB          98%     97%       Y
bh            ?     ? s             <30MB          94%     96%       Y
db            ?     ? s             <30MB          100%    82%       Y
raytrace      ?     ? s             38MB           98%     92%       Y
Exp           ?     ? s             48MB           100%    ?         Y
Interpreter   ?     ? s             122MB          97%     86%       Y

Benchmark   Objects   Nodes   Time
bisort      ~?        ?       ? s
bisort      ~?        ?       ? s
tsp         ~?        ?       ? s
tsp         ~?        ?       ? s
health      ~?        ?       ? s
health      ~?        ?       ? s
exp         ~?        ?       ? s
exp         ~?        ?       ? s

• We have the core of a practical analysis system
• Performance:
  - Analyzes moderately sized, non-trivial Java programs: 15 KLOC programs in 114 seconds using ~120MB of memory (on average 2 contexts per method)
  - The debugging abstraction efficiently compresses large heaps into a compact abstract representation
• Accuracy:
  - Precisely represents connectivity, sharing, and shape properties, plus region, frame, and dependence information
• Qualitatively useful:
  - Results have been used in multiple optimization domains and in debugging applications

• Currently working on turning the core concepts from a prototype into robust tools:
  - Implementing static analysis for MSIL bytecode + core libraries
  - Implementing full-featured debugger support and specification mining (for both MSIL and Java)
• Enrich the model:
  - Wider range of properties (what is useful in general)
  - Allow users to easily extend it with new properties
• Apply the information in more client applications:
  - Additional optimization domains
  - Support for programmer-assisted refactorings

• A simple interpreter and debug environment for a large subset of the Java language
  - 14,000+ LOC (in normalized form), 90 classes, plus an additional 1,500 LOC of stubs for specialized handling of the standard library
  - Large recursive call structures; large inheritance trees with numerous virtual method implementations
  - Wide range of data structure types; extensive use of java.util collections; uses both shared and unshared structures