Mark Marron, Mario Mendez-Lojo Manuel Hermenegildo, Darko Stefanovic, Deepak Kapur 1.

Slides:



Advertisements
Similar presentations
Instructor Notes Lecture discusses parallel implementation of a simple embarrassingly parallel nbody algorithm We aim to provide some correspondence between.
Advertisements

School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Greedy Algorithms.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
Analysis of programs with pointers. Simple example What are the dependences in this program? Problem: just looking at variable names will not give you.
1 Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
Edited by Malak Abdullah Jordan University of Science and Technology Data Structures Using C++ 2E Chapter 12 Graphs.
Performance Visualizations using XML Representations Presented by Kristof Beyls Yijun Yu Erik H. D’Hollander.
Introduction to Advanced Topics Chapter 1 Mooly Sagiv Schrierber
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500 Cluster.
Fall 2011SYSC 5704: Elements of Computer Systems 1 SYSC 5704 Elements of Computer Systems Optimization to take advantage of hardware.
Lecture 8 – Collective Pattern Collectives Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science.
Galois Performance Mario Mendez-Lojo Donald Nguyen.
1cs542g-term Sparse matrix data structure  Typically either Compressed Sparse Row (CSR) or Compressed Sparse Column (CSC) Informally “ia-ja” format.
Chapter 9 Graph algorithms. Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
Data Parallel Algorithms Presented By: M.Mohsin Butt
Recap from last time Saw several examples of optimizations –Constant folding –Constant Prop –Copy Prop –Common Sub-expression Elim –Partial Redundancy.
Using Interfaces to Analyze Compositionality Haiyang Zheng and Rachel Zhou EE290N Class Project Presentation Dec. 10, 2004.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
Chapter 9 Graph algorithms Lec 21 Dec 1, Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
Comparison Caller precisionCallee precisionCode bloat Inlining context-insensitive interproc Context sensitive interproc Specialization.
Improving Code Generation Honors Compilers April 16 th 2002.
Reps Horwitz and Sagiv 95 (RHS) Another approach to context-sensitive interprocedural analysis Express the problem as a graph reachability query Works.
Building An Interpreter After having done all of the analysis, it’s possible to run the program directly rather than compile it … and it may be worth it.
ECE669 L23: Parallel Compilation April 29, 2004 ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation.
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982.
Important Problem Types and Fundamental Data Structures
DATA STRUCTURE Subject Code -14B11CI211.
C o n f i d e n t i a l Developed By Nitendra NextHome Subject Name: Data Structure Using C Title: Overview of Data Structure.
Pointers (Continuation) 1. Data Pointer A pointer is a programming language data type whose value refers directly to ("points to") another value stored.
Upcrc.illinois.edu OpenMP Lab Introduction. Compiling for OpenMP Open project Properties dialog box Select OpenMP Support from C/C++ -> Language.
Mark Marron IMDEA-Software (Madrid, Spain) 1.
Mark Marron IMDEA-Software (Madrid, Spain) 1.
Chapters 7, 8, & 9 Quiz 3 Review 1. 2 Algorithms Algorithm A set of unambiguous instructions for solving a problem or subproblem in a finite amount of.
A GPU Implementation of Inclusion-based Points-to Analysis Mario Méndez-Lojo (AMD) Martin Burtscher (Texas State University, USA) Keshav Pingali (U.T.
CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz.
Applications of Data-Structure
Synchronization Transformations for Parallel Computing Pedro Diniz and Martin Rinard Department of Computer Science University of California, Santa Barbara.
CPSC 320: Intermediate Algorithm Design & Analysis Greedy Algorithms and Graphs Steve Wolfman 1.
Mark Marron 1, Deepak Kapur 2, Manuel Hermenegildo 1 1 Imdea-Software (Spain) 2 University of New Mexico 1.
Parallel graph algorithms Antonio-Gabriel Sturzu, SCPD Adela Diana Almasi, SCPD Adela Diana Almasi, SCPD Iulia Alexandra Floroiu, ISI Iulia Alexandra Floroiu,
Mark Marron IMDEA-Software (Madrid, Spain) 1.
Computer Science: A Structured Programming Approach Using C Graphs A graph is a collection of nodes, called vertices, and a collection of segments,
Introduction to Code Generation and Intermediate Representations
P-Tree Implementation Anne Denton. So far: Logical Definition C.f. Dr. Perrizo’s slides Logical definition Defines node information Representation of.
Basic Linear Algebra Subroutines (BLAS) – 3 levels of operations Memory hierarchy efficiently exploited by higher level BLAS BLASMemor y Refs. FlopsFlops/
Data Flow Analysis for Software Prefetching Linked Data Structures in Java Brendon Cahoon Dept. of Computer Science University of Massachusetts Amherst,
An Efficient CUDA Implementation of the Tree-Based Barnes Hut n-body Algorithm By Martin Burtscher and Keshav Pingali Jason Wengert.
1 Directed Graphs Chapter 8. 2 Objectives You will be able to: Say what a directed graph is. Describe two ways to represent a directed graph: Adjacency.
Programming for Performance CS 740 Oct. 4, 2000 Topics How architecture impacts your programs How (and how not) to tune your code.
Mark Marron 1, Deepak Kapur 2, Manuel Hermenegildo 1 1 Imdea-Software (Spain) 2 University of New Mexico 1.
High Performance Embedded Computing © 2007 Elsevier Lecture 10: Code Generation Embedded Computing Systems Michael Schulte Based on slides and textbook.
3/2/2016© Hal Perkins & UW CSES-1 CSE P 501 – Compilers Optimizing Transformations Hal Perkins Autumn 2009.
©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.
Code Optimization More Optimization Techniques. More Optimization Techniques  Loop optimization  Code motion  Strength reduction for induction variables.
Parallel Computing Chapter 3 - Patterns R. HALVERSON MIDWESTERN STATE UNIVERSITY 1.
Harry Xu University of California, Irvine & Microsoft Research
Optimization Code Optimization ©SoftMoore Consulting.
Presented by: Huston Bokinsky Ying Zhang 25 April, 2013
Optimizing Transformations Hal Perkins Autumn 2011
Closure Representations in Higher-Order Programming Languages
Optimizing Transformations Hal Perkins Winter 2008
Compiler Code Optimizations
Pointer analysis.
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Force Directed Placement: GPU Implementation
Presentation transcript:

Mark Marron, Mario Mendez-Lojo Manuel Hermenegildo, Darko Stefanovic, Deepak Kapur 1

 Want to optimize object-oriented programs which make use of pointer rich structures ◦ In an Array or Collection (e.g. java.util.List) are there any elements that appear multiple times? ◦ Differentiate structures like compiler AST with/without interned symbols --- backbone is tree with shared symbol objects or a pure tree 2

 Ability to answer these sharing questions enables application of many classic optimizations ◦ Thread Level Parallelization ◦ Redundancy Elimination ◦ Object co-location ◦ Vectorization, Loop Unroll Schedule 3

 Start with classic Abstract Heap Graph Model and add additional instrumentation relations ◦ Nodes represent sets of objects (or recursive data structures), edges represent sets of pointers ◦ Has natural representation for data structures and connectivity properties ◦ Naturally groups related sets of pointers ◦ Efficient to work with  Augment edges, which represent sets of pointers with additional information on the sharing relations between the pointers 4

5

 Region of the heap (O, P, P c ) ◦ O is a set of objects ◦ P is the set of the pointers between them ◦ P c the references that enter/exit the region  Given references r 1, r 2 in P c pointing to objects o 1, o 2 respectively we say: ◦ alias: o 1 == o 2 ◦ related: o 1 != o 2 but in same weakly-connected component ◦ unrelated: o 1 and o 2 in different weakly-connected components 6

7

8

 Edges abstract sets of references (variable references or pointers)  Introduce 2 related abstract properties to model sharing ◦ Interference: Does a single edge (which abstracts possible many references) abstract only references with disjoint targets or do some of these references alias/related? ◦ Connectivity: Do two edges abstract sets of references with disjoint targets or do some of these references alias/related? 9

 For a single edge how are the targets of the references it abstracts related  Edge e is: ◦ non-interfering: all pairs of references r 1, r 2 in γ(e) must be unrelated (there are none that alias or are related). ◦ interfering: all pairs of references r 1, r 2 in γ(e), may either be unrelated or related (there are none that alias). ◦ share: all pairs of references r 1, r 2 in γ(e), may be aliasing, unrelated or related. 10

11

 For two different edges how are the targets of the references they abstract related  Edges e 1, e 2 are: ◦ disjoint: all pairs of references r 1 in γ(e 1 ), r 2 in γ(e 2 ) are unrelated (there are none that alias or are related). ◦ connected: all pairs of references r 1 in γ(e 1 ), r 2 in γ(e 2 ) may either be unrelated or related (there are none that alias). ◦ share: all pairs of references r 1 in γ(e 1 ), r 2 in γ(e 2 ) may be aliasing, unrelated or related. 12

13

 N-Body Simulation in 3-dimensions  Uses Fast Multi-Pole method with space decomposition tree ◦ For nearby bodies use naive n 2 algorithm ◦ For distant bodies compute center of mass of many bodies and treat as single point mass  Dynamically Updates Space Decomposition Tree to Account for Body Motion  Has not been successfully analyzed with other existing shape analysis methods 14

15

 Inline Double[] into MathVector objects, 23% serial speedup 37% memory use reduction 16

 TLP update loop over bodyTabRev, factor 3.09 speedup on quad-core machine 17

18

BenchmarkLOCAnalysis Time bisort s mst s tsp s em3d s health s voronoi s power s bh s db s raytrace s 19

 Presented a practical abstraction for modeling sharing in programs  Allows us to accurately model how objects are stored arrays (or Collections from java.util)  This information can be usefully applied to compiler optimizations ◦ Thread-Level Parallelization ◦ Vectorization or Loop Unrolling ◦ Various memory locality optimizations 20

Demo of the (shape) analysis available at:

22