CSE P501 – Compiler Construction
Optimizing Transformations: A Sampler
Jim Hogg - UW CSE - Spring 2014
Will dig deeper later, after analyzing loops.



The Compiler Pipeline
  Source -> Front End -> 'Middle End' -> Back End -> Target
  Front End:  Scan (chars -> tokens), Parse (tokens -> AST), Convert (AST -> IR)
  Middle End: Optimize (IR -> IR) - where these transformations run
  Back End:   Select Instructions, Allocate Registers, Emit Machine Code
AST = Abstract Syntax Tree; IR = Intermediate Representation

Role of Transformations
- Data-flow analysis discovers opportunities for code improvement
- The compiler rewrites the IR to make these improvements
- A transformation may reveal additional opportunities for further optimization
- It may also block opportunities by obscuring information!

Reminder: A Flowgraph
Optimizations & code transformations operate on the program flowgraph. Recall:
- Directed graph of nodes and edges
- Node = basic block
- Edge = possible flow of control between nodes
- Basic block = branch-free sequence of instructions
(Figure: example flowgraph with basic blocks B0-B6)

Organizing Transformations in a Compiler
Typically, the optimizer runs many 'phases':
- Analyze the IR
- Identify an optimization opportunity
- Rewrite the IR to apply that optimization
- And repeat! (50 phases in a commercial compiler is typical)
Each individual optimization is supported by rigorous formal theory, but there is no formal theory for what order, or how often, to apply each one - only rules of thumb and best practices. The same transformation may be applied several times, at different phases.

Optimization 'Phases'
- Each optimization requires a 'pass' (linear scan) over the IR
- The IR may sometimes shrink, sometimes expand
- Some optimizations may be repeated
- The 'best' ordering is heuristic
Don't try to beat an optimizing compiler - you will lose!
Note: not all programs are written by humans! Machine-generated code can pose a challenge for optimizers, eg: a single function with 10,000 statements, 1,000+ local variables, loops nested 15 deep, a spaghetti of GOTOs, etc.

Taxonomy of Optimizations
Machine-Independent Transformations
- Mostly independent of the target machine
- Eg: unrolling a loop will likely make it run faster, whether for x86, ARM, MIPS, etc
- "Mostly"? - eg: vectorize only if the target is SIMD-capable
- A good investment - applies to all targets
Machine-Dependent Transformations
- Mostly concerned with instruction selection, instruction scheduling, and register allocation
- Need to tune for each different target

Machine-Independent Transformations
- Dead code elimination: code that is unreachable, or whose result is not actually used later
- Code motion: 'hoist' loop-invariant code out of a loop
- Strength reduction: x + x versus 2*x; p += elemsize versus ((i * numcols) + j) * elemsize
- Enable other transformations
- Eliminate redundant computations, eg: remember & re-use sqrt(b*b - 4*a*c) rather than re-calculate it; eg: value numbering

Machine-Dependent Transformations
- Take advantage of special hardware
  - Eg: expose instruction-level parallelism (ILP)
  - Eg: use special instructions (eg: VAX polyf, etc)
  - Eg: use SIMD instructions & registers
- Manage or hide latencies
  - Eg: tiling/blocking and loop interchange
  - Improves cache behavior - hugely important
- Deal with finite resources, eg: number of functional units on the chip
Compilers generate code for a vanilla machine, eg: SSE2, but provide switches to tune: /arch:AVX, /arch:IA32, etc. A JIT compiler knows its target architecture!

Optimizer Contracts
Prime directive: no optimization will change observable program behavior!
This can be subtle. Eg: what is "observable"? (via IO? to another thread?) May we dead-code-eliminate a throw? The Language Reference Manual is often ambiguous/undefined/negotiable.
Avoid harmful optimizations:
- If an optimization does not improve code significantly, don't do it: it harms compile-time throughput
- If an optimization degrades code quality, don't do it

Is this hoist legal?

  for (int i = start; i < finish; ++i) a[i] += 7;

Original:
    i = start
  loop:
    if (i >= finish) goto done
    if (i >= a.length) throw OutOfBounds
    a[i] += 7
    goto loop
  done:

With the bounds check hoisted out of the loop:
    if (start >= a.length) throw OutOfBounds
    i = start
  loop:
    if (i >= finish) goto done
    a[i] += 7
    goto loop
  done:

Another example: "volatile" pretty much kills all attempts to optimize.

Dead Code Elimination (DCE)
If the compiler can prove a computation is not required, remove it. Either:
- Unreachable operations - always safe to remove
- Useless operations - reachable, and executed, but results not actually required
Dead code often results from other transformations, so we often want to run DCE several times.

Dead Code Elimination (DCE)
The classic algorithm is similar to "garbage collection":
Pass I - mark all useful operations:
- Instructions whose result does, or can, affect visible behavior:
  - Input or output
  - Updates to object fields that might be used later
  - Instructions that may throw an exception (eg: array bounds check)
  - Calls to functions that might perform IO or affect visible behavior
  (Remember, for many languages, the compiler does not see the entire program at one time)
- Then mark every instruction a marked instruction depends on; repeat until no more changes
Pass II - delete all unmarked instructions
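The two-pass mark algorithm can be sketched over a toy three-address IR. This is a minimal illustration, not a real compiler's implementation; the instruction layout and names (`Instr`, `dce_mark`, etc.) are invented for the example:

```c
#include <assert.h>

/* A minimal mark-and-sweep DCE sketch over a toy three-address IR.
   Assumes each virtual register has a single defining instruction. */

#define NINSTR 5
#define MAXOPS 2

typedef struct {
    int dest;            /* virtual register defined; -1 if none */
    int ops[MAXOPS];     /* virtual registers used; -1 = unused slot */
    int has_side_effect; /* IO, store, call, possible exception... */
    int marked;
} Instr;

/* Example program:
     r0 = 1          (useless: only feeds r2, which is dead)
     r1 = 2
     r2 = r0 + r0    (useless: r2 never reaches a side effect)
     r3 = r1 + r1
     print r3        (side effect: the only root) */
Instr prog[NINSTR] = {
    { 0, {-1, -1}, 0, 0},
    { 1, {-1, -1}, 0, 0},
    { 2, { 0,  0}, 0, 0},
    { 3, { 1,  1}, 0, 0},
    {-1, { 3, -1}, 1, 0},
};

/* Pass I: mark roots (side-effecting instructions), then propagate marks
   to the defining instruction of every operand, to a fixed point. */
void dce_mark(Instr *p, int n) {
    for (int i = 0; i < n; i++)
        if (p[i].has_side_effect) p[i].marked = 1;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (!p[i].marked) continue;
            for (int k = 0; k < MAXOPS; k++) {
                int r = p[i].ops[k];
                if (r < 0) continue;
                for (int j = 0; j < n; j++)     /* find the def of r */
                    if (p[j].dest == r && !p[j].marked) {
                        p[j].marked = 1;
                        changed = 1;
                    }
            }
        }
    }
}

/* Pass II would delete unmarked instructions; here we just count survivors. */
int dce_live_count(Instr *p, int n) {
    dce_mark(p, n);
    int live = 0;
    for (int i = 0; i < n; i++) live += p[i].marked;
    return live;
}
```

On the example program, only the print, `r3 = r1 + r1`, and `r1 = 2` survive; the computations of r0 and r2 are swept away.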

Code Motion
Idea: move an operation to a location where it is executed less frequently.
- Classic situation: hoist loop-invariant code - execute it once, rather than on every iteration
- Lazy code motion & partial redundancy: if a = b * c is computed on one arm of a branch and again after the join point, it must be re-calculated - wasteful if control took the other arm. Replicating the computation onto each arm means a need not be re-calculated after the join.
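The classic hoist, applied by hand in C (a sketch of the transformation, not actual compiler output; the function names are invented):

```c
#include <assert.h>

/* Loop-invariant code motion: x*y does not change inside the loop. */

int sum_original(const int *a, int n, int x, int y) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i] + x * y;   /* x*y re-computed on every iteration */
    return s;
}

int sum_hoisted(const int *a, int n, int x, int y) {
    int s = 0;
    int t = x * y;           /* hoisted: computed once, before the loop */
    for (int i = 0; i < n; i++)
        s += a[i] + t;
    return s;
}
```

Both versions compute the same sum; the hoisted form simply does n-1 fewer multiplications.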

Specialization: 1
Idea: replace a general operation in the IR with a more specific one.
- Constant folding:
    speedInFeetPerMinute = speedInMilesPerHour * FeetPerMile / MinutesPerHour
    speedInFeetPerMinute = speedInMilesPerHour * 5280 / 60
                         = speedInMilesPerHour * 88
- Replace multiplication by a power-of-two constant with a left shift
- Replace division by a power-of-two constant with a right shift
- Peephole optimizations, eg: mov eax, 0  =>  xor eax, eax
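Both specializations applied by hand in C, so the before/after forms can be compared directly (an illustrative sketch; a real compiler performs these rewrites on the IR):

```c
#include <assert.h>

/* Constant folding: 5280 / 60 folds to 88 at compile time. */
int feet_per_minute(int mph)        { return mph * 5280 / 60; }
int feet_per_minute_folded(int mph) { return mph * 88; }

/* Multiply by a power-of-two constant becomes a left shift. */
unsigned times8(unsigned x)       { return x * 8; }
unsigned times8_shift(unsigned x) { return x << 3; }
```

Note the folded form is not just faster to execute; it also avoids a possible intermediate overflow ordering question, which is exactly the kind of subtlety the "prime directive" forces the optimizer to check.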

Specialization: 2 - Eliminate Tail Recursion
Factorial - recursive:
  int fac(n) = if (n <= 1) return 1; else return n * fac(n - 1);
'Accumulating' factorial - tail-recursive:
  fac'(n, r) = if (n <= 1) return r; else return fac'(n - 1, n*r)
  called as fac'(10, 1)
Optimize away the call overhead; replace the tail call with a simple jump:
  fac'(n, r) = start: if (n <= 1) return r; else { r = n*r; n = n - 1; jump to start }
So the recursive call becomes a loop using just one stack frame.
Issue? It avoids stack overflow - good! - but is that an "observable" change?
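The accumulating factorial and its loop form after tail-call elimination, written out in C (a hand-applied sketch of the transformation):

```c
#include <assert.h>

/* Tail-recursive accumulating factorial. */
long fac_acc(long n, long r) {
    if (n <= 1) return r;
    return fac_acc(n - 1, n * r);   /* tail call: nothing left to do after it */
}

/* After eliminating the tail call: the 'jump back to start' becomes a loop. */
long fac_loop(long n) {
    long r = 1;
    while (n > 1) {
        r = n * r;                  /* same accumulator update as the tail call */
        n = n - 1;
    }
    return r;
}
```

The two agree on every input, but `fac_loop` uses a single stack frame regardless of n.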

Strength Reduction
Classic example: array references in a loop:
  for (k = 0; k < n; k++) a[k] = 0;
Naive codegen for a[k] = 0 in the loop body:
  mov  eax, 4                ; elemsize = 4 bytes
  imul eax, [ebp+offset_k]   ; k * elemsize
  add  eax, [ebp+offset_a]   ; &a[0] + k * elemsize
  mov  [eax], 0              ; a[k] = 0
Better:
  mov  eax, [ebp+offset_a]   ; &a[0], once, before the loop
  ; in the loop body:
  mov  [eax], 0              ; a[k] = 0
  add  eax, 4                ; eax = &a[k+1]
Note: pointers allow a user to do this directly in C or C++, eg:
  for (p = a; p < a + n; ) *p++ = 0;

Implementing Strength Reduction
Idea: look for operations in a loop involving:
- A value that does not change in the loop: the region constant
- A value that varies systematically from iteration to iteration: the induction variable
Create a new induction variable that directly computes the required sequence of values, using an addition in each iteration to update it.
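The same idea expressed in C rather than assembly: the per-iteration multiply `k * elemsize` is replaced by a new induction variable advanced by addition (a hand-applied sketch; the byte-offset arithmetic mirrors what the code generator would emit):

```c
#include <assert.h>
#include <stddef.h>

/* Before: address of a[k] computed with a multiplication each iteration. */
long sum_multiply(const int *base, int n) {
    long s = 0;
    for (int k = 0; k < n; k++)
        s += *(const int *)((const char *)base + (size_t)k * sizeof(int));
    return s;
}

/* After: 'off' is the new induction variable, updated by addition. */
long sum_reduced(const int *base, int n) {
    long s = 0;
    size_t off = 0;
    for (int k = 0; k < n; k++) {
        s += *(const int *)((const char *)base + off);
        off += sizeof(int);   /* addition replaces the multiplication */
    }
    return s;
}
```

Here `sizeof(int)` is the region constant and `off` computes the sequence 0, 4, 8, ... directly.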

Other Common Transformations
- Inline substitution ("inlining")
- Cloning/replicating
- Loop unrolling
- Loop unswitching

Inline Substitution - "Inlining"
- Eliminates call overhead
- Opens opportunities for more optimizations
- Can be applied to large method bodies too; an aggressive optimizer will inline 2 or more levels deep
- Increases total code size
- With care, a huge win for OO code
Example: class C has a trivial getter; method f calls getx; the compiler inlines the body of getx into f:

  class C { int x; int getx() { return x; } }

  // before inlining
  class X { void f() { C c = new C(); int total = c.getx() + 42; } }

  // after inlining
  class X { void f() { C c = new C(); int total = c.x + 42; } }

Code Replication
Original:
  if (x < y) { p = x + y; } else { p = z + 1; }
  q = p * 3;
  w = y + x;
Replicated:
  if (x < y) { p = x + y; q = p * 3; w = y + x; }
  else       { p = z + 1; q = p * 3; w = y + x; }
+ : extra opportunities to optimize in larger basic blocks (eg: LVN)
- : increases total code size; may impact effectiveness of the I-cache

Loop Unroll
Idea: replicate the loop body:
- More opportunity to optimize the loop body
- Reduced loop overhead (eg: 4x unrolling cuts checks & jumps by 75%)
Catches:
- Must ensure the unrolled code produces the same answer: "loop-carried dependence analysis"
- Code bloat
- Don't overwhelm the registers

Loop Unroll Example
Original:
  for (i = 1; i <= n; i++) {
    a[i] = a[i] + b[i];
  }
Unrolled 4x:
  i = 1;
  while (i + 3 <= n) {
    a[i]   = a[i]   + b[i];
    a[i+1] = a[i+1] + b[i+1];
    a[i+2] = a[i+2] + b[i+2];
    a[i+3] = a[i+3] + b[i+3];
    i += 4;
  }
  // tidy-up loop for the remainder
  while (i <= n) {
    a[i] = a[i] + b[i];
    i++;
  }
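The same 4x unroll as runnable C (hand-applied, using 0-based indexing rather than the slide's 1-based pseudocode):

```c
#include <assert.h>

/* Original loop. */
void add_simple(int *a, const int *b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] + b[i];
}

/* Unrolled 4x, with a tidy-up loop for the remainder. */
void add_unrolled(int *a, const int *b, int n) {
    int i = 0;
    for (; i + 3 < n; i += 4) {       /* main loop: 1 check per 4 elements */
        a[i]   += b[i];
        a[i+1] += b[i+1];
        a[i+2] += b[i+2];
        a[i+3] += b[i+3];
    }
    for (; i < n; i++)                /* tidy-up: at most 3 iterations */
        a[i] += b[i];
}
```

For n = 7 the main loop runs once (elements 0-3) and the tidy-up loop handles the remaining three elements, so both versions produce identical arrays.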

Loop Unswitch Example
Original:
  for (i = 1; i <= n; i++) {
    if (x > y) { a[i] = b[i]*x; }
    else       { a[i] = b[i]*y; }
  }
Unswitched:
  if (x > y) {
    for (i = 1; i <= n; i++) { a[i] = b[i]*x; }
  } else {
    for (i = 1; i <= n; i++) { a[i] = b[i]*y; }
  }
The IF condition does not change value in this code snippet, so there is no need to check x > y on every iteration - do the IF check once!
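The unswitching transformation as runnable C (hand-applied, 0-based indexing; function names are invented for the example):

```c
#include <assert.h>

/* Original: the loop-invariant test x > y is evaluated every iteration. */
void scale_original(int *a, const int *b, int n, int x, int y) {
    for (int i = 0; i < n; i++) {
        if (x > y) a[i] = b[i] * x;
        else       a[i] = b[i] * y;
    }
}

/* Unswitched: the test is done once, selecting one of two branch-free loops. */
void scale_unswitched(int *a, const int *b, int n, int x, int y) {
    if (x > y) {
        for (int i = 0; i < n; i++) a[i] = b[i] * x;
    } else {
        for (int i = 0; i < n; i++) a[i] = b[i] * y;
    }
}
```

Beyond saving the repeated test, each specialized loop body is now branch-free, which makes it a better candidate for unrolling or vectorization.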

Summary
This was just a sampler - there are 100s of transformations in the literature. We will examine several in more detail, particularly those involving loops.
A big part of engineering a compiler is deciding:
- which transformations to use
- in what order
- whether & when to repeat each transformation
Compilers offer options:
- optimize for speed
- optimize for code size
- optimize for a specific target micro-architecture
Competitive benchmarking will investigate many permutations.

Next: Loops, Loops, Loops
- Dominators
- Loop-invariant code
- Loop transforms