Prepass Optimizations - Session 11 -
Heiko Falk, TU Dortmund, Informatik 12, Germany (technische universität dortmund, fakultät für informatik, informatik 12)
Slides use Microsoft cliparts. All Microsoft restrictions apply.
- 2 - technische universität dortmund fakultät für informatik h. falk, informatik 12, 2008 TU Dortmund
Schedule of the course
Time | Monday | Tuesday | Wednesday | Thursday | Friday
09:30-11:00 | 1: Orientation, introduction | 5: Models of computation + specs | 9: Mapping of applications to platforms | 13: Memory aware compilation | 17: Memory aware compilation
11:00 | Brief break
11:15-12:30 | 2: Models of computation + specs | 6: Lab*: Ptolemy | 10: Lab*: Scheduling | 14: Lab*: Mem. opt. | 18: Lab*: Mem. opt.
12:30 | Lunch
14:00-15:20 | 3: Models of computation + specs | 7: Mapping of applications to platforms | 11: High-level optimizations* | 15: Memory aware compilation | 19: WCET & compilers*
15:20 | Break
15:40-17:00 | 4: Lab*: Kahn process networks | 8: Mapping of applications to platforms | 12: High-level optimizations* | 16: Memory aware compilation | 20: Wrap-up
* Dr. Heiko Falk
- 3 - Outline
- Motivation of Prepass Optimizations
- Loop Nest Splitting
    Introduction and Code Examples
    Workflow of Loop Nest Splitting
    Condition Satisfiability
    Condition Optimization
    Search Space Generation
    Search Space Exploration
    Results
- References & Summary
- 4 - Motivation of Prepass Optimizations
Structure of an optimizing compiler:
Source Code → Lexical Analysis → Tokens → Syntactical Analysis → Syntax Tree → Semantical Analysis → High-Level IR → Code Optimization (High-Level IR) → Code Selection → Low-Level IR → Code Optimization, Register Allocation, Instruction Scheduling (Low-Level IR) → ASM Code
Question: Does only the compiler optimize code?
- 5 - Motivation of Prepass Optimizations
Optimizations outside a compiler are called
- postpass optimizations if applied after the compiler,
- prepass optimizations if applied before the compiler.
Advantages of prepass optimizations:
- Source code transformations are easier to understand.
- They allow manual experimentation with an optimization technique before a costly full implementation.
- Independence of the actual compiler: basically applicable with every compiler supporting the source language.
- Independence of the actual target processor: basically applicable to arbitrary processors.
- 6 - Application Domain of Loop Nest Splitting
Embedded multimedia applications:
- Data flow dominated, i.e. they are applied to huge amounts of data and produce huge amounts of data as output.
- Most of the execution time is spent in (deeply) nested loops.
- Simple loop structures with known or statically analyzable lower and upper loop bounds.
- Manipulation of large multi-dimensional arrays.
Typical example: streaming applications such as MPEG4.
- 7 - Example: MPEG4 Motion Estimation
[Figure: reference frame with a 36x36-pixel search area and current frame with 4x4-pixel blocks; frame size 144 x 196 pixels; labels v4x1, v4y1, x4, y4, x, y, i]
- 8 - Source Code MPEG4 Motion Estimation
    for (i=0; i<20; i++)
      for (x=0; x<36; x++)
        for (y=0; y<49; y++)
          for (vx=0; vx<9; vx++)
            for (vy=0; vy<9; vy++)
              for (x4=0; x4<4; x4++)
                for (y4=0; y4<4; y4++) {
                  if (4*x+x4<0 || 4*x+x4>35 || 4*y+y4<0 || 4*y+y4>48)
                    then_block_1;
                  else
                    else_block_1;
                  if (4*x+vx+x4-4<0 || 4*x+vx+x4-4>35 || 4*y+vy+y4-4<0 || 4*y+vy+y4-4>48)
                    then_block_2;
                  else
                    else_block_2;
                }
- 9 - Observations
Compiling and executing this code yields:
- An overall 91,445,760 executions of if-statements.
- Very irregular control flow due to the if-statements.
- Additional arithmetic overhead: multiplications, additions, comparisons, logical ORs, …
The performance of this code is constrained by its control flow, not by the computation of the motion vectors!
- 10 - Loop Nest Splitting
Automatic analysis of loops and if-statements:
- x, y, x4 and y4 never take values such that the conditions 4*x+x4<0 and 4*y+y4<0 would ever be satisfied. These conditions can be replaced by the constant truth value '0'.
- For x ≥ 10 or y ≥ 14, both if-statements are provably satisfied, so that their then-parts are provably executed.
- Both if-statements are satisfied for more than 92% of all executions of the innermost y4-loop.
- 11 - Source Code after Loop Nest Splitting
    for (i=0; i<20; i++)
      for (x=0; x<36; x++)
        for (y=0; y<49; y++)
          if (x>=10 || y>=14)                        // Splitting-If
            for (; y<49; y++)                        // 2nd y-Loop
              for (vx=0; vx<9; vx++)...
              { then_block_1; then_block_2; }        // No If-Stmts
          else
            for (vx=0; vx<9; vx++)...
            {
              if (0 || 4*x+x4>35 || 0 || 4*y+y4>48)  // Old If-Stmts
                then_block_1;
              else
                else_block_1;
              if (4*x+vx+x4-4<0 || 4*x+vx+x4-4>35 ||...)
                then_block_2;
              else
                else_block_2;
            }
- 12 - Structure of Optimized Code
Splitting-if:
- A satisfied splitting-if automatically implies that the conditions of all original if-statements are satisfied. The then-part of the splitting-if therefore no longer contains the original if-statements, only their then-parts.
- An unsatisfied splitting-if does not allow any conclusion about the satisfaction of the original if-statements. The else-part of the splitting-if therefore contains all original if-statements in order to keep the optimized code correct.
- 13 - Why a Second Y-Loop?
Intuitive code:
    for (x=0; x<36; x++)
      for (y=0; y<49; y++)
        if (x>=10 || y>=14)
          for (vx=0; vx<9; vx++)...
Splitting-if: 1 execution for every single y ∈ [14, 48] (y = 14, 15, 16, …).
Optimized code:
    for (x=0; x<36; x++)
      for (y=0; y<49; y++)
        if (x>=10 || y>=14)
          for (; y<49; y++)
            for (vx=0; vx<9; vx++)...
Splitting-if: 1 single execution for all y ∈ [14, 48].
- 14 - Stages of Loop Nest Splitting
- Condition Satisfiability: find single conditions of if-statements that are either always true or always false.
- Condition Optimization: for each condition C, find a "simpler" condition C' such that C' ⇒ C always holds (if C' is true, C is true as well).
- Search Space Generation: combine all conditions C' into a structure G modeling all if-statements, including their logical structure (&&, ||).
- Search Space Exploration: using G, determine a condition for the splitting-if that leads to an overall minimization of if-statement executions.
- 15 - Workflow of Loop Nest Splitting
[Figure: example if-statement over the loops x and x4 with the conditions 4*x+3*x4>20, x-x4>3, 3*x+x4<0 and 6*x-20*x4<61, connected by the operators &&, || and ||. Step 1: Condition Satisfiability.]
- 16 - Workflow of Loop Nest Splitting
[Figure: the same example after step 1 (Condition Satisfiability) and step 2 (Condition Optimization).]
- 17 - Workflow of Loop Nest Splitting
[Figure: the same example after steps 1 and 2 and step 3 (Search Space Generation).]
- 18 - Workflow of Loop Nest Splitting
[Figure: the same example after steps 1 to 3 and step 4 (Search Space Exploration), yielding the splitting-if x>=7 || x4>=1.]
- 19 - Assumptions
Loop bounds: all lower and upper bounds ($l_L$, $u_L$) are constant.
If-statements: a sequence of loop-dependent conditions connected with logical AND or logical OR. Format: if ($C_1 \otimes C_2 \otimes \dots$), $\otimes \in \{\&\&, ||\}$
Loop-dependent conditions: linear terms depending on the index variables $i_L$ of the loops. Format:
$$C_x \equiv \sum_{L=1}^{N} c_L \cdot i_L + c \geq 0, \qquad c_L, c \in \mathbb{Z}$$
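- 20 - Polytopes & Linear Conditions
Definition (Polyhedron & Polytope): a polyhedron is a set $P = \{ x \in \mathbb{Z}^N \mid Ax \geq b \}$ with $A \in \mathbb{Z}^{m \times N}$, $b \in \mathbb{Z}^m$. A polyhedron P is called a polytope iff $|P| < \infty$.
Model of linear conditions in nested loops: the condition 4*x + 3*x4 > 35 for x ∈ [0, 35], x4 ∈ [0, 3] is equivalent to 4*x + 3*x4 ≥ 36 over the integers and is modeled as the polytope
$$P = \left\{ p = \begin{pmatrix} x \\ x4 \end{pmatrix} \in \mathbb{Z}^2 \;\middle|\; \begin{pmatrix} 4 & 3 \\ 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} p \geq \begin{pmatrix} 36 \\ 0 \\ -35 \\ 0 \\ -3 \end{pmatrix} \right\}$$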
- 21 - Stage 1 – Condition Satisfiability
Goal: determine the loop-dependent conditions $C_x$ that constantly evaluate to 'true' or to 'false' for all values of the index variables of all surrounding loops.
Approach:
- Translate each condition $C_x$ into a polytope $P_x$ (cf. previous slide).
- Compare with the empty set: $P_x = \emptyset \Rightarrow C_x$ is always 'false'.
- Compare with the universe U (the complete iteration space): $P_x = U \Rightarrow C_x$ is always 'true'.
Modification of the source code: replace such constant conditions by the truth values '0' and '1', respectively.
- 22 - Stage 2 – Condition Optimization
Given: a loop-dependent condition $C = \sum_{L=1}^{N} c_L \cdot i_L + c \geq 0$
Approach: use a genetic algorithm (GA) to determine the values $l_{C,L}$ and $u_{C,L}$.
Goal: determine values $l_{C,L}$ and $u_{C,L}$ per condition C and per loop L such that:
- C is provably satisfied for all $l_{C,L} \leq i_L \leq u_{C,L}$, and
- the values $l_{C,L}$ and $u_{C,L}$ lead to a minimization of if-statement executions.
- 23 - Workflow of Genetic Algorithms (1)
Analogy to natural evolution: "survival of the fittest".
Optimization in a loop over the iterations i = 0, 1, …
Iteration i maintains a population $P_i$; a population consists of several individuals.
An individual represents one possible solution of the modeled optimization problem.
- 24 - Workflow of Genetic Algorithms (2)
Data structure of an individual: chromosome.
A chromosome is a sequence of genes storing data.
The actual value stored in a gene is called an allele.
- 25 - Workflow of Genetic Algorithms (3)
A fitness function computes the fitness of each individual of $P_i$.
Selection determines the subset $P_i'$ of $P_i$ with the highest (or lowest, depending on the problem) fitness.
Variation generates the next population $P_{i+1}$ by adding randomly generated individuals to $P_i'$.
- 26 - Workflow of Genetic Algorithms (4)
Variation is based on two fundamental genetic operators: crossover and mutation.
[Figures: crossover, mutation]
- 27 - Workflow of Genetic Algorithms (5)
Randomized variation may cause $P_i$ to contain individuals that do not represent a valid solution ↝ repair mechanism.
Termination of the optimization if
- the N-th iteration has been performed,
- the best fitness found has not improved for k iterations,
- …
The individual with the best fitness from the last population $P_i$ is returned as the final result.
- 28 - Result of Condition Optimization
Input of condition optimization: a linear loop-dependent condition C and the loop bounds $[l_L, u_L]$.
Output of the genetic algorithm: the values $(l_{C,1}, u_{C,1}, \dots, l_{C,N}, u_{C,N})$ of the individual with the best fitness.
Output of condition optimization: the polytope
$$P'_C = \{ (x_1, \dots, x_N) \in \mathbb{Z}^N \mid \forall \text{ loops } L: l_{C,L} \leq x_L \leq u_{C,L} \}$$
- 29 - Stage 3 – Search Space Generation
Given: if-statements, conditions and polytopes: $IF_i = (C_{i,1} \otimes C_{i,2} \otimes \dots \otimes C_{i,n})$, $\otimes \in \{\&\&, ||\}$, and $C_{i,j} \leadsto P_{i,j}$ for all $C_{i,j}$.
Construction of a polytope $P_i$ for each if-statement $IF_i$:
- $C_{i,j-1}$ && $C_{i,j}$ $\Rightarrow P_{i,j-1} \cap P_{i,j}$
- $C_{i,j-1}$ || $C_{i,j}$ $\Rightarrow P_{i,j-1} \cup P_{i,j}$
Construction of a global polytope: the global search space G models the iteration space where all if-statements are satisfied:
$$G = \bigcap_i P_i$$
- 30 - Stage 4 – Search Space Exploration
Given: the polytope G containing the points where all if-statements are satisfied.
Goal: determine a final polytope G' ⊆ G such that translating G' into the conditions of the splitting-if leads to an overall minimization of if-statement executions.
Approach: use a second genetic algorithm (omitted here).
Resulting splitting-if:
- placed into the outermost possible loop,
- consists of all linear constraints included in G'.
- 31 - Relative Runtimes after LNS
[Figure: relative runtimes after loop nest splitting]
- 32 - Relative Energy Dissipation (ARM7) after LNS
[Figure: relative energy dissipation on the ARM7 after loop nest splitting]
- 33 - Relative Code Sizes after LNS
[Figure: relative code sizes after loop nest splitting]
- 34 - References
Loop Nest Splitting:
- H. Falk, P. Marwedel. Control Flow driven Splitting of Loop Nests at the Source Code Level. DATE Conference, Munich, 2003.
- H. Falk. Control Flow Optimization by Loop Nest Splitting at the Source Code Level. Research Report No. 773, University of Dortmund, Dortmund, 2003.
- 35 - Summary
Non-compiler optimizations:
- postpass, if performed after the compiler, e.g. at the linker level;
- prepass, if performed before the compiler, e.g. at the source code level.
Loop nest splitting:
- control flow optimization for data flow dominated embedded multimedia applications;
- polytopes model linear conditions and loops;
- genetic algorithms optimize the polytope models;
- large improvements in ACET (average-case execution time) and energy (and, incidentally, WCET), but potentially large increases in code size.
- 36 - Coffee/tea break (if on schedule)
Q&A?