Prepass Optimizations - Session 11 - Heiko Falk, TU Dortmund, Fakultät für Informatik, Informatik 12, Germany


Slides use Microsoft cliparts. All Microsoft restrictions apply.

Schedule of the course
H. Falk, Informatik 12, TU Dortmund, 2008

Time        | Monday                           | Tuesday                                 | Wednesday                               | Thursday                     | Friday
09:30-11:00 | 1: Orientation, introduction     | 5: Models of computation + specs        | 9: Mapping of applications to platforms | 13: Memory aware compilation | 17: Memory aware compilation
11:00       | Brief break
11:15-12:30 | 2: Models of computation + specs | 6: Lab*: Ptolemy                        | 10: Lab*: Scheduling                    | 14: Lab*: Mem. opt.          | 18: Lab*: Mem. opt.
12:30       | Lunch
14:00-15:20 | 3: Models of computation + specs | 7: Mapping of applications to platforms | 11: High-level optimizations*           | 15: Memory aware compilation | 19: WCET & compilers*
15:20       | Break
15:40-17:00 | 4: Lab*: Kahn process networks   | 8: Mapping of applications to platforms | 12: High-level optimizations*           | 16: Memory aware compilation | 20: Wrap-up

* Dr. Heiko Falk

Outline

- Motivation of Prepass Optimizations
- Loop Nest Splitting
  - Introduction and Code Examples
  - Workflow of Loop Nest Splitting
    - Condition Satisfiability
    - Condition Optimization
    - Search Space Generation
    - Search Space Exploration
  - Results
- References & Summary

Motivation of Prepass Optimizations

Structure of an optimizing compiler:

[Figure: Source Code -> Lexical Analysis -> Tokens -> Syntactical Analysis -> Syntax Tree -> Semantical Analysis -> High-Level IR -> Code Optimization -> High-Level IR -> back end on the Low-Level IR (Code Selection, Code Optimization, Register Allocation, Instruction Scheduling) -> ASM Code]

Question: Does only the compiler optimize code?

Motivation of Prepass Optimizations

Optimizations outside a compiler are called
- postpass optimizations if applied after the compiler,
- prepass optimizations if applied before the compiler.

Advantages of prepass optimizations:
- Source code transformations are easier to understand.
- They allow manual experimentation with an optimization technique before a costly full implementation.
- Independence of the actual compiler: applicable, in principle, with every compiler supporting the source language.
- Independence of the actual target processor: applicable, in principle, to arbitrary processors.

Application Domain of Loop Nest Splitting

Embedded multimedia applications:
- Data flow dominated, i.e. they are applied to huge amounts of data and produce huge amounts of data as output.
- Most of the execution time is spent in (deeply) nested loops.
- Simple loop structures with known or statically analyzable lower and upper loop bounds.
- Manipulation of large multi-dimensional arrays.
- Typical example: streaming applications such as MPEG4.

Example: MPEG4 Motion Estimation

[Figure: a Current Frame of 144 x 196 pixels (frame index i) is divided into blocks of 4x4 pixels at block position (x, y), with pixel offsets x4 and y4 inside a block. Each block is compared against a Search Area of 36x36 pixels in the Reference Frame.]

Source Code MPEG4 Motion Estimation

    for (i=0; i<20; i++)
      for (x=0; x<36; x++)
        for (y=0; y<49; y++)
          for (vx=0; vx<9; vx++)
            for (vy=0; vy<9; vy++)
              for (x4=0; x4<4; x4++)
                for (y4=0; y4<4; y4++) {
                  if (4*x+x4<0 || 4*x+x4>35 || 4*y+y4<0 || 4*y+y4>48)
                    then_block_1; else else_block_1;
                  if (4*x+vx+x4-4<0 || 4*x+vx+x4-4>35 ||
                      4*y+vy+y4-4<0 || 4*y+vy+y4-4>48)
                    then_block_2; else else_block_2;
                }

Observations

Compiling and executing this source code yields:
- An overall execution of 91,445,760 if-statements (20 * 36 * 49 * 9 * 9 * 4 * 4 innermost iterations with two if-statements each).
- Very irregular control flow due to the if-statements.
- Additional arithmetic overhead: multiplications, additions, comparisons, logical or, …
- The performance of this code is constrained by control flow, not by the computation of motion vectors!

Loop Nest Splitting

Automatic analysis of loops & if-statements:
- x, y, x4 and y4 never take values for which the conditions 4*x+x4<0 and 4*y+y4<0 are satisfied.
- These conditions can therefore be replaced by the constant truth value '0'.
- For x ≥ 10 or y ≥ 14, both if-statements are provably satisfied, so their then-parts are provably executed.
- Both if-statements are satisfied for more than 92% of all executions of the innermost y4-loop.

Source Code after Loop Nest Splitting

    for (i=0; i<20; i++)
      for (x=0; x<36; x++)
        for (y=0; y<49; y++)
          if (x>=10 || y>=14)                          // Splitting-If
            for (; y<49; y++)                          // 2nd y-Loop
              for (vx=0; vx<9; vx++) ...
                { then_block_1; then_block_2; }        // No If-Stmts
          else
            for (vx=0; vx<9; vx++) ...
              { if (0 || 4*x+x4>35 || 0 || 4*y+y4>48)  // Old If-Stmts
                  then_block_1; else else_block_1;
                if (4*x+vx+x4-4<0 || ...)
                  then_block_2; else else_block_2; }

Structure of Optimized Code

Splitting-If:
- A satisfied splitting-if automatically implies that the conditions of all original if-statements are satisfied.
- The then-part of the splitting-if no longer contains the original if-statements, but only their then-parts.
- An unsatisfied splitting-if does not allow any statement about the satisfaction of the original if-statements.
- The else-part of the splitting-if contains all original if-statements in order to keep the optimized code correct.

Why a Second Y-Loop?

Intuitive code:

    for (x=0; x<36; x++)
      for (y=0; y<49; y++)
        if (x>=10 || y>=14)
          for (vx=0; vx<9; vx++) ...

Splitting-If: 1 execution for every single y ∈ [14, 48]

Optimized code:

    for (x=0; x<36; x++)
      for (y=0; y<49; y++)
        if (x>=10 || y>=14)
          for (; y<49; y++)
            for (vx=0; vx<9; vx++) ...

Splitting-If: 1 single execution for all y ∈ [14, 48]

Stages of Loop Nest Splitting

1. Condition Satisfiability: Find single conditions of if-statements that are either always satisfied or never satisfied.
2. Condition Optimization: For each condition C, find a "simpler" condition C' such that C' ⇒ C always holds (if C' is true, C is also true).
3. Search Space Generation: Combine all conditions C' into a structure G that models all if-statements, including their logical structure (&&, ||).
4. Search Space Exploration: Using G, determine a condition for the splitting-if that minimizes the overall number of if-statement executions.

Workflow of Loop Nest Splitting

[Figure, built up over four slides: the example conditions 4*x+3*x4>20, x-x4>3, 3*x+x4<0 and 6*x-20*x4<61, combined with && and ||, are shown as regions in the (x, x4) plane after each stage: 1 - Condition Satisfiability, 2 - Condition Optimization, 3 - Search Space Generation, 4 - Search Space Exploration. Resulting splitting-if condition: x>=7 || x4>=1]

Assumptions

Loop Bounds: All lower & upper bounds $(l_L, u_L)$ are constant.

If-Statements: Sequence of loop-dependent conditions, connected with logical AND or logical OR.
Format: if ( $C_1 \otimes C_2 \otimes \dots$ ), $\otimes \in \{ \&\&, || \}$

Loop-dependent Conditions: Linear terms depending on the index variables $i_L$ of the loops.
Format: $C_x \cong \sum_{L=1}^{N} (c_L \cdot i_L) + c \ge 0$, with $c_L, c \in \mathbb{Z}$

Polytopes & Linear Conditions

Definition (Polyhedron & Polytope):
- Polyhedron $P = \{ x \in \mathbb{Z}^N \mid Ax \ge b \}$, with $A \in \mathbb{Z}^{m \times N}$, $b \in \mathbb{Z}^m$
- A polyhedron P is called a polytope iff $|P| < \infty$.

Model of linear conditions in nested loops:
4*x + 3*x4 > 35 for x ∈ [0, 35], x4 ∈ [0, 3] as a polytope
$P = \{ p \in \mathbb{Z}^2 \mid A p \ge b \}$, where the rows of A and b encode the loop bounds x ≥ 0, -x ≥ -35, x4 ≥ 0, -x4 ≥ -3 and the condition 4*x + 3*x4 ≥ 36.

[Figure: the polytope as a region in the (x, x4) plane]

Stage 1 – Condition Satisfiability

Goal: To determine loop-dependent conditions $C_x$ that constantly evaluate to 'true' or 'false' for all values of the index variables of all surrounding loops.

Approach:
- Translate each condition $C_x$ into a polytope $P_x$ (cf. previous slide).
- Compare with the empty set: $P_x = \emptyset \Rightarrow C_x$ is always 'false'.
- Compare with the universe: $P_x = U \Rightarrow C_x$ is always 'true'.

Modification of Source Code: Replace such constant conditions by the truth values '0' and '1', respectively.

Stage 2 – Condition Optimization

Given: Loop-dependent condition $C = \sum_{L=1}^{N} (c_L \cdot i_L) + c \ge 0$

Goal: To determine values $l_{C,L}$ and $u_{C,L}$ per condition C and per loop L such that
- C is provably satisfied for all $l_{C,L} \le i_L \le u_{C,L}$, and
- the values $l_{C,L}$ and $u_{C,L}$ lead to a minimization of if-statement executions.

Approach:
- Use a genetic algorithm (GA) to determine the values $l_{C,L}$ and $u_{C,L}$.

Workflow of Genetic Algorithms (1)

- Analogy to natural evolution, "survival of the fittest".
- Optimization in a loop i = 0, 1, …
- Iteration i maintains a population $P_i$; a population consists of several individuals.
- An individual represents one possible solution of the modeled optimization problem.

Workflow of Genetic Algorithms (2)

- Data structure of an individual: the chromosome.
- A chromosome is a sequence of many genes storing data.
- The actual value stored in a gene is called an allele.

Workflow of Genetic Algorithms (3)

- The fitness function computes the fitness of each individual of $P_i$.
- Selection determines the subset $P_i'$ of $P_i$ with the highest/lowest fitness.
- Variation generates the next population $P_{i+1}$ by adding random individuals to $P_i'$.

Workflow of Genetic Algorithms (4)

Variation is based on two fundamental genetic operators:
- Crossover
- Mutation

[Figures illustrating crossover and mutation]

Workflow of Genetic Algorithms (5)

- Randomized variation may cause $P_i$ to contain individuals that do not represent a valid solution ↝ repair mechanism.
- The optimization terminates if
  - the N-th iteration has been performed,
  - the best determined fitness does not improve for k iterations,
  - …
- The individual with the best fitness from the last population $P_i$ is returned as the final result.

Result of Condition Optimization

Input of Condition Optimization:
- Linear loop-dependent condition C
- Loop bounds $[l_L, u_L]$

Output of Genetic Algorithm:
- Values $(l_{C,1}, u_{C,1}, \dots, l_{C,N}, u_{C,N})$ of the individual with the best fitness.

Output of Condition Optimization:
- Polytope $P'_C = \{ (x_1, \dots, x_N) \in \mathbb{Z}^N \mid \forall \text{ loops } L: l_{C,L} \le x_L \le u_{C,L} \}$

Stage 3 – Search Space Generation

Given: If-statements, conditions & polytopes
$IF_i = (C_{i,1} \otimes C_{i,2} \otimes \dots \otimes C_{i,n})$, $\otimes \in \{ \&\&, || \}$, and for every $C_{i,j}$ its polytope $P_{i,j}$

Construction of a polytope $P_i$ for each if-statement $IF_i$:
- if $C_{i,j-1}$ && $C_{i,j}$: $P_{i,j-1} \cap P_{i,j}$
- if $C_{i,j-1}$ || $C_{i,j}$: $P_{i,j-1} \cup P_{i,j}$

Construction of a global polytope: The global search space G models the iteration space where all if-statements are satisfied.
$G = \bigcap_i P_i$

Stage 4 – Search Space Exploration

Given: Polytope G containing the points where all if-statements are satisfied.

Goal: To determine a final polytope G' ⊆ G such that translating G' into the conditions of the splitting-if leads to an overall minimization of if-statement executions.

Approach: Use a second genetic algorithm (omitted here).

Resulting Splitting-If:
- Placed into the outermost possible loop.
- Consists of all linear constraints included in G'.

[Figure: Relative Runtimes after LNS]

[Figure: Relative Energy Dissipation (ARM7) after LNS]

[Figure: Relative Code Sizes after LNS]

References

Loop Nest Splitting:
- H. Falk, P. Marwedel, Control Flow driven Splitting of Loop Nests at the Source Code Level, DATE Conference, Munich
- H. Falk, Control Flow Optimization by Loop Nest Splitting at the Source Code Level, University of Dortmund, Research Report No. 773, Dortmund 2003.

Summary

Non-Compiler Optimizations
- Postpass if performed after the compiler, e.g. at linker level
- Prepass if performed before the compiler, e.g. at source code level

Loop Nest Splitting
- Control flow optimization in data flow dominated embedded multimedia applications
- Polytopes model linear conditions and loops
- Genetic algorithms optimize polytope models
- Huge improvements in terms of ACET and energy (and, as a side effect, WCET), but potentially large increases in code size

Coffee/tea break (if on schedule)

Q&A?