Published by Jeffery Norris. Modified over 9 years ago.
1
Optimal Parallelogram Selection for Hierarchical Tiling
Authors: Xing Zhou, Maria J. Garzaran, David Padua (University of Illinois)
Presenter: Wei Zuo
2
Motivation and Background
The importance of loop tiling:
Optimizes multiply nested loops (the most time-consuming code)
Improves data locality
Exposes parallelism
3
What Is Hierarchical Tiling?
Tile the loop hierarchically so that the tiling fits the organization of the target machine.
4
Why Hierarchical Tiling?
The advantage of hierarchy-aware optimization: it unleashes the potential of hierarchically organized systems.
5
Challenges of (hierarchical) tiling
Selection of tile sizes
Selection of tile shapes
Shapes have a significant impact on execution time
Shapes at different levels interact with each other
Tile shapes cannot be selected at each level separately
A global model that considers hierarchical tiling is needed
6
Contribution of the paper
An automatic system for selecting tile shapes in a hierarchical system
A model that computes the execution time of the tile shapes
A demonstration that optimal tile shape selection is a nonlinear bi-level programming problem
7
Math Concepts
Iteration space
Tiling representation
Dependence vectors
Execution time
8
Iteration space representation
The edge matrix E of iteration space I
An example: [figure not captured in transcript]
The function span(E) describes the iteration space I
9
Tiling transformation representation
Tiling matrix: T = (t_0, t_1, ..., t_{n-1})
After tiling: E' = A E
E' is the new edge matrix
A is the transformation matrix
We have: A = T^{-1} is the affine transformation of tiling with shape T
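The relation A = T^{-1} can be illustrated with a small 2D sketch. The matrices `E` and `T` below are hypothetical example values (not taken from the slides), columns are treated as edge vectors, and the product `A E` is an assumed reading of how the new edge matrix is obtained.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Return the exact inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("tiling matrix must be non-singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

# Hypothetical example values; each column is one edge vector.
E = [[8, 0], [0, 8]]    # 8x8 rectangular iteration space
T = [[2, 2], [0, 2]]    # parallelogram tile with edges (2, 0) and (2, 2)

A = inverse_2x2(T)      # A = T^-1
E_tiled = matmul(A, E)  # edge matrix expressed in units of tiles
print(E_tiled)
```

In this example the transformed space has edge vectors (4, 0) and (-4, 4), that is, the original space measured in tile coordinates.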
10
Hierarchical tiling
Recursively tile an iteration space I, bottom-up
T_0: finest tile -> T_n: original space
I_k: iteration space of the k-th level
T_k: tile shape of the k-th level
E_k: edge matrix of the k-th level
11
Dependence Vectors
Dependence matrix D
A dependence vector d = (d_0, d_1, ..., d_{n-1}) indicates that any iteration i must finish before iteration i + d.
Assume tiles are computed atomically (no overlap of computation and communication)
It must be possible to topologically sort all tiles
There can be no cycles in the inter-tile dependence graph
Hyperplanes defining the tiles must not be crossed by two dependence vectors with different directions.
12
Dependence Vectors
No cycles => each dependence vector d must be covered by the cone spanned by the extensions of t_0, ..., t_{n-1}
Tiles are large => inter-tile dependences exist only between adjacent tiles
Combined: [equation not captured in transcript]
After the transformation, D_k denotes the dependences at the k-th level of tiling
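The cone condition can be checked mechanically in 2D: a dependence vector d is covered by the cone of the tile edges exactly when the solution x of T x = d has no negative component. The sketch below also treats "tiles are large" as x having no component above 1; that bound is an assumption made here for illustration, and the names `tile_coords`, `is_valid_tiling`, `square`, `diamond` and `jacobi` are hypothetical.

```python
from fractions import Fraction

def tile_coords(T, d):
    """Solve T * x = d for a 2x2 tiling matrix T (columns are tile edges)."""
    (a, b), (c, e) = T
    det = Fraction(a * e - b * c)
    if det == 0:
        raise ValueError("tile edges must be linearly independent")
    return [(e * d[0] - b * d[1]) / det, (a * d[1] - c * d[0]) / det]

def is_valid_tiling(T, deps):
    """True if every dependence vector lies in the cone of the tile edges
    (coordinates >= 0) and spans at most one tile along each edge
    (coordinates <= 1, an assumed reading of 'tiles are large')."""
    return all(0 <= x <= 1 for d in deps for x in tile_coords(T, d))

# Hypothetical example: 1D Jacobi-style dependences in (time, space).
jacobi = [(1, -1), (1, 0), (1, 1)]
square = [[4, 0], [0, 4]]     # edges (4, 0) and (0, 4)
diamond = [[4, 4], [4, -4]]   # edges (4, 4) and (4, -4)

print(is_valid_tiling(square, jacobi))   # (1, -1) leaves the cone
print(is_valid_tiling(diamond, jacobi))  # all three vectors are covered
```

The square tile fails because (1, -1) has a negative coordinate along the second edge, while the diamond tile covers all three vectors.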
13
Execution Model
The sequential execution time of a loop with iteration space I is: [equation not captured in transcript]
Considering parallelism, the ideal execution time is the minimal execution time of an iteration space I that can be achieved by any valid schedule of iterations
(E: edge matrix; D: dependence matrix; L(E, D) denotes the length of the longest path of dependent iterations in the iteration space)
Example: [equation not captured in transcript]
After simplification: [equation not captured in transcript]
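To make L(E, D) concrete, the following sketch computes the longest chain of dependent iterations by brute force on a small rectangular iteration space. It is an illustration only, not the method used in the paper; the function name, the restriction to rectangular spaces, and the convention of counting iterations (rather than edges) on the path are assumptions. The dependence vectors must not form cycles.

```python
from functools import lru_cache
from itertools import product

def longest_dependent_path(shape, deps):
    """Number of iterations on the longest dependence chain in a
    rectangular iteration space with the given extents."""
    def inside(p):
        return all(0 <= x < n for x, n in zip(p, shape))

    @lru_cache(maxsize=None)
    def chain_from(p):
        # Longest chain starting at iteration p; p itself counts as 1.
        best = 1
        for d in deps:
            q = tuple(x + dx for x, dx in zip(p, d))
            if inside(q):
                best = max(best, 1 + chain_from(q))
        return best

    # The longest chain need not start at the origin, so try every point.
    return max(chain_from(p) for p in product(*(range(n) for n in shape)))

# 4x4 space: 16 iterations sequentially, but the critical path is shorter.
print(longest_dependent_path((4, 4), [(1, 0), (0, 1)]))
print(longest_dependent_path((4, 4), [(1, -1), (1, 0), (1, 1)]))
```

With unlimited parallel resources, the critical path rather than the total iteration count bounds the execution time, which is why the model is built around L.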
14
The Tile Size Selection Model
Problem statement
The optimization formulation
Computing the longest dependent path
Automated framework
15
Tile Size Selection Model
Problem statement: selecting the tile shapes for hierarchical tiling.
Identify the tile shapes defining an l-level hierarchical tiling that minimizes the execution time of the computation defined by a given n-dimensional hyperparallelepiped-shaped iteration space I and m dependence vectors; i.e., determine the sequence of tiling matrices T_0, T_1, ..., T_{l-1}.
Assumptions:
The model considers parallelogram tile shapes
At a given level, all non-boundary tiles have the same shape
Tiling is an affine transformation
Computation within a tile is atomic
Infinite resources for parallelism
16
Model Formulation
Iteration space
Execution time per tile for a bottom-level tile: [equation not captured in transcript]
The recursion for upper-level tiles: [equation not captured in transcript]
Each tile at the level below is considered as a single iteration
The per-iteration execution time is Time(T_{k-1})
D_k is the dependence at the k-th level, with D_0 = D
t_{s_k} is the synchronization and communication overhead of each tile
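The shape of the recursion can be sketched as follows. The exact formula is not recoverable from the slide text, so this sketch assumes that the time of a level-k tile is its longest dependent path (counted in level-(k-1) tiles) multiplied by the time of one lower-level tile, plus a per-tile overhead charged once. The function name, the parameters and the example numbers are all hypothetical.

```python
def hierarchical_time(path_lengths, overheads, t_iter=1):
    """Estimate the execution time of an l-level tiling.

    path_lengths[k]: assumed longest dependent path at level k, counted in
                     level-(k-1) tiles (plain iterations for k = 0).
    overheads[k]:    assumed synchronization/communication cost of one
                     level-k tile, charged once per tile.
    t_iter:          time of a single loop iteration.
    """
    time = t_iter
    for length, overhead in zip(path_lengths, overheads):
        # Each lower-level tile is treated as one atomic "iteration".
        time = length * time + overhead
    return time

# Hypothetical numbers for a 3-level tiling.
print(hierarchical_time([7, 3, 5], [2, 10, 0]))
```

Because each level multiplies the cost of the level below, a poor shape at the finest level is amplified at every level above it, which is one way to see why the levels cannot be tuned independently.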
17
Model Formulation
Optimization: select t_1, ..., t_{l-1} to minimize the total execution time
Constraints (dependence): [equation not captured in transcript]
Question: how to compute "L"?
18
Contribution of the paper
An automatic system for selecting tile shapes in a hierarchical system
A model that computes the execution time of the tile shapes
A demonstration that optimal tile shape selection is a nonlinear bi-level programming problem
Computing L(T_k, I_n), 0 < k < n-1
Computing L(E, I_n)
19
Computation of L
Computing L(T_k, I_n)
By affine transformation: [equation not captured in transcript]
Since: [equation not captured in transcript]
To compute L(T_k, I_n), we must find the longest path P = (p_0, p_1, ..., p_{L-1})
Therefore: L(T_k, I_n) = max{L}
20
Computation of L
Computing L(E, I_n)
Since dependence vectors d can point in any direction, the longest dependent path does not necessarily start from the origin (0, 0, ..., 0) of the hypercube iteration space.
L is estimated approximately using binary optimization
21
The Automatic Framework
Multidimensional nonlinear optimization problem without a known analytical solution
NOMAD
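NOMAD is a black-box optimizer based on mesh adaptive direct search, so it needs only cost evaluations and no derivatives. The sketch below is not NOMAD and does not use its API; it is a much simpler compass search, swapped in to show how a derivative-free method can explore integer tile parameters. The cost function is a hypothetical stand-in for the execution time model.

```python
def compass_search(cost, start, step=4, min_step=1):
    """Derivative-free integer search: poll +/- step along each axis,
    move to any improving point, halve the step when nothing improves."""
    best = tuple(start)
    best_cost = cost(best)
    while step >= min_step:
        improved = False
        for axis in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[axis] += delta
                cand = tuple(cand)
                c = cost(cand)
                if c < best_cost:
                    best, best_cost, improved = cand, c, True
        if not improved:
            step //= 2  # refine the poll size; reaching 0 ends the loop
    return best, best_cost

def toy_cost(p):
    """Hypothetical stand-in for the model's predicted execution time."""
    return (p[0] - 10) ** 2 + (p[1] - 3) ** 2

result = compass_search(toy_cost, start=(0, 0))
print(result)
```

A real tile shape search would also have to reject candidates that violate the dependence constraints, for example by returning an infinite cost for them.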
22
Experiments
Platform: Blue Waters supercomputer
First level: 256 nodes
Second level: each node has an NVIDIA Tesla GPU accelerator with 2688 CUDA cores
Tiling schemes:
Schemes 1 & 2: the common tile shapes
The hierarchical overlapped tiling method
Note: tile shapes include Square, Diamond, and Skewing 1 & 2
23
Comparing the performance
24
Testing the model accuracy
The accuracy of the analytical model for execution time estimation: 15%, except for 1D-Jacobi
Causes of inaccuracy:
The communication time and execution time vary across programs
The hardware resources for parallelism are not unlimited
25
Conclusion
An automatic system for selecting tile shapes in a hierarchical system
A model that computes the execution time of the tile shapes
A demonstration that optimal tile shape selection is a nonlinear bi-level programming problem
Limitations, which can be addressed in future work:
Affine tiling, regular parallelism, adding a hardware resource model, considering different metrics, e.g. power, area …