Memory Allocations for Tiled Uniform Dependence Programs Tomofumi Yuki and Sanjay Rajopadhye.

Slides:



Advertisements
Similar presentations
Image Segmentation with Level Sets Group reading
Advertisements

Compiler Support for Superscalar Processors. Loop Unrolling Assumption: Standard five stage pipeline Empty cycles between instructions before the result.
Optimizing Compilers for Modern Architectures Allen and Kennedy, Chapter 13 Compiling Array Assignments.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Parallelism & Locality Optimization.
AlphaZ: A System for Design Space Exploration in the Polyhedral Model
Chapter 6: Memory Management
Delivering High Performance to Parallel Applications Using Advanced Scheduling Nikolaos Drosinos, Georgios Goumas Maria Athanasaki and Nectarios Koziris.
Parametric Tiling of Affine Loop Nests* Sanket Tavarageri 1 Albert Hartono 1 Muthu Baskaran 1 Louis-Noel Pouchet 1 J. “Ram” Ramanujam 2 P. “Saday” Sadayappan.
Informed Search Methods How can we improve searching strategy by using intelligence? Map example: Heuristic: Expand those nodes closest in “as the crow.
Chapter 2: Understanding Structure
*time Optimization Heiko, Diego, Thomas, Kevin, Andreas, Jens.
Linear Obfuscation to Combat Symbolic Execution Zhi Wang 1, Jiang Ming 2, Chunfu Jia 1 and Debin Gao 3 1 Nankai University 2 Pennsylvania State University.
Stanford University CS243 Winter 2006 Wei Li 1 Loop Transformations and Locality.
Compiler Challenges, Introduction to Data Dependences Allen and Kennedy, Chapter 1, 2.
Reducing the Communication Cost via Chain Pattern Scheduling Florina M. Ciorba, Theodore Andronikos, Ioannis Drositis, George Papakonstantinou and Panayotis.
CS 104 Introduction to Computer Science and Graphics Problems
Shortest Geodesics on Polyhedral Surfaces Project Summary Efrat Barak.
CMPUT Compiler Design and Optimization1 CMPUT680 - Winter 2006 Topic B: Loop Restructuring José Nelson Amaral
Parallelizing Compilers Presented by Yiwei Zhang.
A Data Locality Optimizing Algorithm based on A Data Locality Optimizing Algorithm by Michael E. Wolf and Monica S. Lam.
Arun Kejariwal Paolo D’Alberto Alexandru Nicolau Paolo D’Alberto Alexandru Nicolau Constantine D. Polychronopoulos A Geometric Approach for Partitioning.
Data Dependences CS 524 – High-Performance Computing.
Copyright 2008 Koren ECE666/Koren Part.6a.1 Israel Koren Spring 2008 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer.
Names and Bindings Introduction Names Variables The concept of binding Chapter 5-a.
Introduction to machine learning
State-Space Searches. 2 State spaces A state space consists of –A (possibly infinite) set of states The start state represents the initial problem Each.
State-Space Searches.
Improved results for a memory allocation problem Rob van Stee University of Karlsruhe Germany Leah Epstein University of Haifa Israel WADS 2007 WAOA 2007.
1 CE 530 Molecular Simulation Lecture 7 David A. Kofke Department of Chemical Engineering SUNY Buffalo
Applying Data Copy To Improve Memory Performance of General Array Computations Qing Yi University of Texas at San Antonio.
Optimal Parallelogram Selection for Hierarchical Tiling Authors: Xing Zhou, Maria J. Garzaran, David Padua University of Illinois Presenter: Wei Zuo.
Computational Stochastic Optimization: Bridging communities October 25, 2012 Warren Powell CASTLE Laboratory Princeton University
Introduction to variable selection I Qi Yu. 2 Problems due to poor variable selection: Input dimension is too large; the curse of dimensionality problem.
Chapter 4: Control Structures I (Selection). Objectives In this chapter, you will: – Learn about control structures – Examine relational and logical operators.
BSG-Route: A Length-Matching Router for General Topology T. Yan and M. D. F. Wong University of Illinois at Urbana-Champaign ICCAD 2008.
Experiences with Enumeration of Integer Projections of Parametric Polytopes Sven Verdoolaege, Kristof Beyls, Maurice Bruynooghe, Francky Catthoor Compiler.
State-Space Searches. 2 State spaces A state space consists of A (possibly infinite) set of states The start state represents the initial problem Each.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2011 Dependence Analysis and Loop Transformations.
Programming 1 DCT 1033 Control Structures I (Selection) if selection statement If..else double selection statement Switch multiple selection statement.
Obtaining Electric Field from Electric Potential Assume, to start, that E has only an x component Similar statements would apply to the y and z.
Séminaire COSI-Roscoff’011 Séminaire COSI ’01 Power Driven Processor Array Partitionning for FPGA SoC S.Derrien, S. Rajopadhye.
U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Overview of Compilers and JikesRVM John.
CURE: EFFICIENT CLUSTERING ALGORITHM FOR LARGE DATASETS VULAVALA VAMSHI PRIYA.
CR18: Advanced Compilers L04: Scheduling Tomofumi Yuki 1.
Operational Research & ManagementOperations Scheduling Economic Lot Scheduling 1.Summary Machine Scheduling 2.ELSP (one item, multiple items) 3.Arbitrary.
CR18: Advanced Compilers L06: Code Generation Tomofumi Yuki 1.
CR18: Advanced Compilers L08: Memory & Power Tomofumi Yuki.
CR18: Advanced Compilers L05: Scheduling for Locality Tomofumi Yuki 1.
Chapter 4: Control Structures I (Selection). Objectives In this chapter, you will: – Learn about control structures – Examine relational and logical operators.
Database Management Systems, R. Ramakrishnan 1 Algorithms for clustering large datasets in arbitrary metric spaces.
Revisiting Loop Transformations with X10 Clocks Tomofumi Yuki Inria / LIP / ENS Lyon X10 Workshop 2015.
Mesh Resampling Wolfgang Knoll, Reinhard Russ, Cornelia Hasil 1 Institute of Computer Graphics and Algorithms Vienna University of Technology.
Learning A Better Compiler Predicting Unroll Factors using Supervised Classification And Integrating CPU and L2 Cache Voltage Scaling using Machine Learning.
ECE 750 Topic 8 Meta-programming languages, systems, and applications Automatic Program Specialization for J ava – U. P. Schultz, J. L. Lawall, C. Consel.
C++ Programming: From Problem Analysis to Program Design, Fifth Edition Chapter 2: Control Structures (Selection & Repetition)
1 A* search Outline In this topic, we will look at the A* search algorithm: –It solves the single-source shortest path problem –Restricted to physical.

Chapter 4 – C Program Control
Conception of parallel algorithms
Morphing and Shape Processing
Loop Restructuring Loop unswitching Loop peeling Loop fusion
A Practical Stride Prefetching Implementation in Global Optimizer
Agent-Centered Search
A Unified Framework for Schedule and Storage Optimization
Agent-Centered Search
State-Space Searches.
State-Space Searches.
Introduction to Optimization
Optimizing single thread performance
State-Space Searches.
Presentation transcript:

Memory Allocations for Tiled Uniform Dependence Programs Tomofumi Yuki and Sanjay Rajopadhye

Parametric Tiling Series of advances Perfect loop nests [Renganarayanan2007] Imperfectly nested loops [Hartono2009, Kim2009] Parallelization [Hartono2010, Kim2010] Key idea: Step out of the polyhedral model Parametric tiling is not affine Use syntactic manipulations 1/21/13 IMPACT

Memory Allocations Series of polyhedral approaches Affine Projections [Wilde & Rajopadhye 1996] Pseudo-Projections [Lefebvre & Feautrier 1998] Dimension-wise “optimal” [Quilleré & Rajopadhye 2000] Lattice-based [Darte et al. 2005] Cannot be used for parametric tiles Can be used to allocate per tile [Guelton et al. 2011] Difficult to combine parametric tiling with memory-reallocation 1/21/13 IMPACT

This paper Find allocations valid for a set of schedules Tiled execution by any tile size Based on Occupancy Vectors [Strout et al. 1998] Restrict the universe to tiled execution Quasi-Universal Occupancy Vectors More compact allocations than UOV Analytically find the shortest Quasi-UOV UOV-guided index-set splitting Separate boundaries to reduce memory usage 1/21/13 IMPACT

Outline Introduction Universal Occupancy Vectors (review) Lengths of UOVs Overview of the proposed flow Finding the shortest QUOV UOV-guided Index-set Splitting Related Work Conclusions 1/21/13 IMPACT

Universal Occupancy Vectors Find a valid allocation for any legal schedules Occupancy vector: ov Value produced at z is dead by z+ov Assumptions Same dependence pattern Single statement Legal schedule can even be from run-time scheduler 1/21/13 IMPACT Live until these 4 iterations are executed. Find an iteration that depends on all the uses.

Lengths of UOVs Shorter ≠ Better The shape of iteration space has influence A good “rule of thumb” when shape is not known Increase in Manhattan distance usually leads increase in memory usage 1/21/13 IMPACT

Proposed Flow Input: Polyhedral representation of a program no memory-based dependences Make scheduling choices The result should be (partially) tilable Apply schedules as affine transforms Lex. scan of the space now reflects schedule Apply UOV-based index-set splitting Apply QUOV-based allocation 1/21/13 IMPACT

UOV for Tilable Space We know that the iteration space will be tiled Dependences are always in the first orthant Certain order is always imposed  Implicit dependences 1/21/13 IMPACT

Finding the shortest QUOV 1. Create a bounding hyper-rectangle Smallest that contains all dependences 2. The diagonal is the shortest UOV Intuition No dependence goes “backwards” Property of tilable space 1/21/13 IMPACT

Outline Introduction Universal Occupancy Vectors (review) Lengths of UOVs Overview of the proposed flow Finding the shortest QUOV UOV-guided Index-set Splitting Related Work Conclusions 1/21/13 IMPACT

Dependences at Boundaries Many boundary conditions in polyhedral representation of programs e.g., Gauss Seidel 2D (from polybench) Single C statement, 10+ boundary cases May negatively influence storage mapping With per-statement projective allocations Different life-times at boundaries May be longer than the main body Allocating separately may also be inefficient 1/21/13 IMPACT

UOV-Based Index-Set Splitting “Smart” choice of boundaries to separate out Those that influence the shortest QUOV Example: Dashed dependences = boundary dependences Removing one has no effect Removing the other shrinks the bounding hyper-rect. 1/21/13 IMPACT

Related Work Affine Occupancy Vectors [Thies et al. 2001] Restrict the universe to affine schedules Comparison with schedule-dependent methods Schedule-dependent methods are at least as good as UOV or QUOV based approaches UOV based methods may not be as inefficient as one might think Provided O(d-1) data is required for d dimensional space UOV-based methods are single projection 1/21/13 IMPACT

Example Smith-Waterman (-like) dependences 1/21/13 IMPACT

Summary and Conclusion We “expand” the concept of UOV to a smaller universe: tiled execution We use properties in such universe to find: More compact allocations Shortest QUOVs Profitable index-set splitting Possible approach for parametrically tiled programs 1/21/13 IMPACT

Acknowledgements Michelle Strout For discussion and feedback IMPACT PC and Chairs Our paper is in a much better shape after revisions 1/21/13 IMPACT

Extensions to Multi-Statement Schedule-Independent mapping is for programs with single statement We reduce the universality to tiled execution Multi-statement programs can be handled Intuition: When tiling a loop nest, the same affine transform (schedule) is applied to all statements Dependences remain the same 1/21/13 IMPACT

Dependence Subsumption Some dependences may be excluded when considering UOVs and QUOVs A dependence f subsumes a set of dependences I if f can be expressed transitively by dependences in I 1/21/13 IMPACT Valid UOV for the left is also valid for the right.