Experiences with Enumeration of Integer Projections of Parametric Polytopes Sven Verdoolaege, Kristof Beyls, Maurice Bruynooghe, Francky Catthoor Compiler Construction.


Experiences with Enumeration of Integer Projections of Parametric Polytopes Sven Verdoolaege, Kristof Beyls, Maurice Bruynooghe, Francky Catthoor Compiler Construction

2 Overview  Explaining the title.  Useful in which Compiler Analyses?  High-level Algorithm Overview  Experiments  Conclusion

3 Overview  Explaining the title.  Useful in which Compiler Analyses?  High-level Algorithm Overview  Experiments  Conclusion

4 Introduction  Counting problems in compilers: How many executed calculations? How many data addresses accessed? How many cache misses? How many dynamically allocated bytes? How many live array elements at a symbolic iteration (i,j)? How much communication between parallel processes? … Often, answering these questions leads to counting the number of integer solutions to a system of linear inequalities: - when the code consists of loops with linear loop bounds. - when the array index expressions have a linear form.

5 Example 1: Counting solutions to systems of linear inequalities void s(int N, int M) { int i,j; for(i=max(0,N-M); i<=N-M+3; i++) for(j=0; j<=N-2*i; j++) S1; } How many times is statement S1 executed? Equals the number of elements in the set {(i,j) ∈ Z² : max(0,N-M) ≤ i ≤ N-M+3 ∧ 0 ≤ j ≤ N-2i}: linear inequalities defining a bounded domain → polytope; parameters N, M → parametric.
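The count described on this slide can be cross-checked by brute force for concrete parameter values. A minimal sketch (sample values N=10, M=9 are chosen for illustration, not taken from the slides):

```python
# Brute-force count of S1 executions, mirroring the loop nest on the
# slide: i in [max(0, N-M), N-M+3], j in [0, N-2i].
def count_s1(N, M):
    count = 0
    for i in range(max(0, N - M), N - M + 3 + 1):
        for j in range(0, N - 2 * i + 1):
            count += 1
    return count

print(count_s1(10, 9))  # 24 iterations for N=10, M=9
```

The parametric-polytope methods discussed later compute this count symbolically, as a closed-form expression in N and M, instead of per parameter value.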

6 Geometric representation: parametric integer polytope

7 Solution: counting the number of integer points in a parametric polytope  Algorithm see CASES2004: “Analytical Computation of Ehrhart Polynomials: Enabling more Compiler Analyses and Optimizations”.  Solution:

8 Contribution: Extension to include existential variables  Goal: count the number of elements in parameterized sets of the form:  CASES2004: sets without existential variables (parametric polytopes).  CC2005: sets with existential variables (integer projections of parametric polytopes).

9 Example: How many array elements are accessed in the following loop? for j := 1 to P do for i := 1 to 8 do a(6i+9j-7) += 5  The accessed set is { l : ∃ i, j : l = 6i+9j-7 ∧ 1 ≤ j ≤ P ∧ 1 ≤ i ≤ 8 }.

10 Geometric representation: Integer projection of parametric polytope. for j:= 1 to P do for i:= 1 to 8 do a(6i+9j-7) += 5 P = 3 Answer: Not a polytope!
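The "not a polytope" observation is easy to check by enumeration: the accessed set has gaps between its minimum and maximum, so it cannot be the integer-point set of a single polytope. A brute-force sketch (this is for verification only, not the paper's method):

```python
# Enumerate the accessed array elements {6i+9j-7 : 1<=i<=8, 1<=j<=P},
# as in the example loop on the slide, for a given value of P.
def accessed_elements(P):
    return {6 * i + 9 * j - 7 for j in range(1, P + 1) for i in range(1, 9)}

elems = accessed_elements(3)      # P = 3, as on the slide
print(len(elems))                 # number of distinct elements accessed
# Holes: the span from min to max is larger than the set itself,
# so the set is not a contiguous range of integers.
print(max(elems) - min(elems) + 1 > len(elems))
```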

11 Overview  Explaining the title.  Useful in which Compiler Analyses?  High-level Algorithm Overview  Experiments  Conclusion

12 Examples of compiler analyses benefiting from our work  Data placement while taking into account real-time constraints Anantharaman et al., RTSS 1998  Memory size estimation of loop nests after translation to VLSI designs Balasa et al., IEEE T.VLSI 1995 Zhao et al., IEEE T.VLSI 2000  Compilation to parallel FPGA / VLSI Bednara et al., Samos 2002 Hannig et al., PaCT 2001  Calculating Cache Behavior Beyls et al., JSA 2005 Chatterjee et al., PLDI 2001  Computing communication in distributed memory computers (HPF) Boulet et al., Euro-Par 1998 Heine et al., Euro-Par 2000 Su et al., ICS 1995  Low-Power Compilation D’Alberto et al., COLP 2001

13 Usefulness  In many of the above papers, the authors spent most of the paper discussing estimation methods to get approximate answers to the question: How many elements in S?  In this paper, we answer this question exactly.

14 Overview  Explaining the title.  Useful in which Compiler Analyses?  High-level Algorithm Overview  Experiments  Conclusion

15 Overall idea: PIP/heuristics + Barvinok.  Pipeline: PIP (Feautrier’88) or 3 heuristics (novel) → Ehrhart (Clauss’96) or Barvinok (Verdoolaege’04) → solution as a closed-form Ehrhart quasi-polynomial.  Boulet (1998): worst-case exponential execution time, even for a fixed number of variables.  Novel method: worst-case polynomial execution time, for a fixed number of variables.

16 Parametric Integer Programming (PIP, Feautrier’88)  PIP computes the lexicographically minimal element of a parametric polytope.  Reduction: compute the lexicographic minimum of all points in S that are projected onto the same point in S’; each projected point has exactly one such minimum, so counting minima counts the projection. (Worst-case exponential time)
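The reduction can be checked by brute force on the running example: counting distinct accessed elements is the same as counting, per element, the lexicographically smallest (j, i) pair that produces it. A sketch (assuming the same bounds 1 ≤ i ≤ 8, 1 ≤ j ≤ P as before; PIP itself does this symbolically, not by enumeration):

```python
# For each accessed element l = 6i+9j-7, keep only the lexicographically
# smallest (j, i) producing it; the number of such minima equals the
# number of distinct elements in the projection.
def lexmin_representatives(P):
    reps = {}
    for j in range(1, P + 1):      # lexicographic order: j outer, i inner
        for i in range(1, 9):
            l = 6 * i + 9 * j - 7
            if l not in reps:      # first visit in lex order is the minimum
                reps[l] = (j, i)
    return reps

print(len(lexmin_representatives(3)))
```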

17 3 Polynomial-time Heuristics (novel)  1. Unique existential variables: “thickness” always smaller than 1: treat the existential variable as a regular variable.  2. Redundant existential variables: “thickness” always larger than 1: project the polytope and count the projection. Legal, since there are no “holes” (= Omega test).  3. Independent splits: if none of the above applies, try to split the polytope into multiple polytopes for which one of the above rules applies.

18 3 Heuristics: example (figure: polytope regions with “thickness” ≥ 1 and “thickness” ≤ 1)
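The first two heuristics can be illustrated on toy sets (these examples are hypothetical, chosen for illustration, and not taken from the paper):

```python
# Heuristic 1 (thickness <= 1): S1 = { x | exists e: 3e <= x <= 3e+2 },
# 0 <= x <= N.  Each x determines e uniquely (e = x // 3), so counting
# (x, e) pairs with e as a regular variable equals counting x.
def count_pairs_h1(N):
    return sum(1 for x in range(N + 1) for e in range(-1, N + 1)
               if 3 * e <= x <= 3 * e + 2)

# Heuristic 2 (thickness >= 1, no holes): S2 = { x | exists e:
# x <= 2e <= x + N }, 0 <= x <= N.  Every x in the projected bounds has
# at least one witness e, so we may drop e and count x directly.
def count_projection_h2(N):
    return sum(1 for x in range(N + 1)
               if any(x <= 2 * e <= x + N for e in range(0, 2 * N + 2)))

print(count_pairs_h1(10), count_projection_h2(10))
```

In both cases the brute-force answer for N=10 is 11 = N+1, matching what the corresponding heuristic computes without enumerating the existential variable's values.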

19 Overview  Explaining the title.  Useful in which Compiler Analyses?  High-level Algorithm Overview  Experiments  Conclusion

20 Experiments  Reuse Distance Calculation [Beyls05]  Communication volume computation in HPF [Boulet98]  Memory Size Estimation [Balasa95]  Parametric Cache Miss Calculation [Chatterjee01]

21 Reuse Distance Calculation  Computes the number of data locations accessed between two consecutive reuses of the same data.  Parameters: iteration point where reuse occurs + program parameters.
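The metric defined on this slide can be stated as executable pseudocode. A brute-force sketch over an explicit address trace (for illustrating the definition only; the paper computes these distances analytically, parameterized by the iteration point):

```python
# Reuse distance per access: the number of distinct addresses touched
# since the previous access to the same address (None for a first use).
def reuse_distances(trace):
    last_pos = {}
    dists = []
    for pos, addr in enumerate(trace):
        if addr in last_pos:
            dists.append(len(set(trace[last_pos[addr] + 1:pos])))
        else:
            dists.append(None)  # cold access: no previous use
        last_pos[addr] = pos
    return dists

print(reuse_distances(['a', 'b', 'c', 'a', 'b']))  # [None, None, None, 2, 2]
```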

22 Test 1: Matrix multiply, matrix size multiple of cache line size.  PIP vs. Heuristics (figure: execution-time comparison of PIP (Feautrier’88) and the 3 novel heuristics, each followed by Barvinok (Verdoolaege’04))

23 Test 2: Matrix multiply, matrix sizes 19 and 41, cache line size 4  Heuristics: 2 sets could not be computed within one hour (vertex calculation during change of basis).  PIP: 4 sets could not be computed within one hour.  Conclusion: there are sets for which neither method can compute the solution in reasonable time.

24 Test 3: Ehrhart vs. Barvinok (figure: execution-time comparison of PIP (Feautrier’88) followed by Ehrhart (Clauss’96) vs. Barvinok (Verdoolaege’04))

25 Other applications. a) Communication in HPF [Boulet98]  Computation of communication volume (HPF, Boulet98):

                    Ehrhart         Barvinok
  8x8 processors    713s / 0.04s    0.01s
  64x64 processors  6855s / 1.43s   0.01s

26 Other applications b) Memory size estimation [Balasa95]  Memory accessed by 4 different references in a motion estimation loop kernel, with symbolic loop bounds:

                        Ehrhart                          Barvinok
  4 memory references   1.38s / 0.01s / 1.41s / 1.41s    0.06s / 0.01s / 0.07s / 0.04s

27 Other applications c) Cache miss analysis [Chatterjee01]  Computes the number of cache misses in a two-way set-associative cache, for matrix-vector multiplication with symbolic loop bounds:

                                  PIP + Ehrhart   PIP + Barvinok   Heuristics + Barvinok
  symbolic cache miss counting    > 15 h          449.39s          434.47s

28 Overview  Explaining the title.  Useful in which Compiler Analyses?  High-level Algorithm Overview  Experiments  Conclusion

29 Conclusions  Many compiler analyses and optimizations require the enumeration of integer projections of parametric polytopes.  This can be done by reduction to the enumeration of parametric polytopes.  No clear performance difference between PIP and the heuristics.  Can solve many problems that were previously considered very difficult or unsolvable.  Software available at