Case Studies of Compilers and Future Trends Chapter 21 Mooly Sagiv

Goals
Learn about existing compilers
– which of the studied techniques they implement
– mention techniques we did not study
Future trends (textbook)
Other trends
Techniques used in optimizing compilers
Have some fun

Compilers Studied
Sun compilers for SPARC 8 and 9
IBM XL compilers for the POWER and PowerPC architectures
Digital compiler for the Alpha
Intel reference compiler for the 386
Comparison criteria:
– duration and history of development
– structure
– optimizations performed on two example programs

int length, width, radius;
enum figure {RECTANGLE, CIRCLE};

main()
{
    int area = 0, volume = 0, height;
    enum figure kind = RECTANGLE;
    for (height = 0; height < 10; height++) {
        if (kind == RECTANGLE) {
            area   += length * width;
            volume += length * width * height;
        } else if (kind == CIRCLE) {
            area   += 3.14 * radius * radius;
            volume += 3.14 * radius * height;
        }
    }
    process(area, volume);
}

      integer a(500, 500), k, l
      do 20 k = 1, 500
        do 20 l = 1, 500
          a(k, l) = k + l
20    continue
      call s1(a, 500)
      end

      subroutine s1(a, n)
      integer a(500, 500), n
      do 100 i = 1, n
        do 100 j = i + 1, n
          do 100 k = 1, n
            l = a(k, i)
            m = a(k, j)
            a(k, j) = l + m
100   continue
      end

The SPARC Architecture
SPARC 8:
– 32-bit RISC superscalar system with a pipeline
– integer and floating-point units
– 8 general-purpose integer registers (r0 = 0)
– load, store, arithmetic, shift, branch, call, and system-control instructions
– addressing modes: register+register and register+displacement
– three-address instructions
– several 24-register windows (spilling handled by the OS)
SPARC 9: 64-bit architecture (upward compatible)

The Sun SPARC Compilers
Front ends for C, C++, Fortran 77, and Pascal
Originated from the Berkeley 4.2 BSD Unix compilers
Developed at Sun since 1982
Original back end targeted the Motorola 68010; migrated to the 68020 and then to SPARC
Global optimization developed in 1984
Interprocedural optimization begun in 1984
Mixed compiler model

The Sun SPARC Compiler (structure)
Front end → Sun IR
Direct path: Sun IR → yabe (back end) → relocatable code
Optimizing path: Sun IR → automatic inliner → aliaser → iropt (global optimizer) → Sun IR → code generator → relocatable code

ENTRY "s1_" {IS_EXT_ENTRY, ENTRY_IS_GLOBAL}
        goto LAB_32
LAB_32: LTEMP.1 = (.n { ACCESS V41});
        i = 1;
        CBRANCH(i <= LTEMP.1, 1: LAB_36, 0: LAB_35);
LAB_36: LTEMP.2 = (.n { ACCESS V41});
        j = i + 1;
        CBRANCH(j <= LTEMP.2, 1: LAB_41, 0: LAB_40);
LAB_41: LTEMP.3 = (.n { ACCESS V41});
        k = 1;
        CBRANCH(k <= LTEMP.3, 1: LAB_46, 0: LAB_45);
LAB_46: l = (.a[k, i] { ACCESS V20});
        m = (.a[k, j] { ACCESS V20});
        *(a[k, j] = l + m { ACCESS V20, INT});
LAB_34: k = k + 1;
        CBRANCH(k > LTEMP.3, 1: LAB_45, 0: LAB_46);
LAB_45: j = j + 1;
        ...
LAB_35:

Sun Optimization Levels
O1: limited optimizations
O2: optimize expressions that do not involve global, aliased local, or volatile variables (see the sketch below)
O3: worst-case assumptions on pointer aliases; adds automatic inlining, software pipelining, loop unrolling, and instruction scheduling
O4: the front end provides alias information
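To see why aliased variables are excluded at O2, here is a minimal C sketch (not from the textbook; the names f, g, and p are illustrative): without alias information, a store through a pointer may change a global, so its value cannot be kept in a register or reused.

/* Without alias information the compiler must assume the store
   through p may change g, so the value of g * 2 computed into t
   cannot be reused afterwards (no CSE, no register promotion). */
int g;

int f(int *p) {
    int t = g * 2;    /* first computation of g * 2 */
    *p = 5;           /* may alias g: kills knowledge of g */
    return t + g * 2; /* must recompute g * 2 */
}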

iropt
Processes each procedure separately
Uses basic blocks
Control-flow analysis using dominators
The parallelizer uses structural analysis
Other optimizations use iterative algorithms
Optimizations translate Sun IR → Sun IR

Optimizations in iropt
– scalar replacement of aggregates and expansion of Fortran arithmetic on complex numbers
– dependence-based analysis and transformations (O3, O4)
– linearization of array addresses
– algebraic simplification and reassociation of address expressions
– loop-invariant code motion (sketch below)
– strength reduction and induction-variable removal
– global common-subexpression elimination
– dead-code elimination
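As a sketch of the first two loop optimizations, here is one plausible source-level rendering on the rectangle loop from the C example above (the temporaries t and t2 and the wrapper function rect are introduced for illustration):

/* Before (from the example):
     for (height = 0; height < 10; height++) {
         area   += length * width;
         volume += length * width * height;
     }
   After loop-invariant code motion and strength reduction: */
int length, width;

void rect(int *area, int *volume) {
    int t  = length * width;   /* hoisted loop invariant */
    int t2 = 0;                /* maintains length * width * height */
    for (int height = 0; height < 10; height++) {
        *area   += t;
        *volume += t2;
        t2      += t;          /* running sum replaces the multiply */
    }
}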

Dependence-Based Analysis
– constant propagation
– dead-code elimination
– structural control-flow analysis
– loop discovery (index variables, lower and upper bounds)
– segregation of loops that have calls and early exits
– dependence analysis using the GCD test
– loop distribution (splitting loops)
– loop interchange (sketch below)
– loop fusion
– scalar replacement of array elements
– recognition of reductions
– data-cache tiling
– profitability analysis for parallel code generation
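A hypothetical C sketch of loop interchange (the functions add_cols and add_rows are illustrative): the interchange is legal here because dependence analysis finds no loop-carried dependences, and it improves locality in row-major C.

#define N 500

/* Before: the i loop is innermost, so a[i][j] is accessed with
   stride N (poor cache behavior in row-major C). */
void add_cols(int a[N][N], int b[N][N]) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] += b[i][j];
}

/* After interchange: the j loop is innermost and memory is
   accessed with stride 1. */
void add_rows(int a[N][N], int b[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] += b[i][j];
}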

Code Generator
First translates Sun IR to asm+, then performs:
– instruction selection
– inlining of assembly-language templates
– local optimizations (dead-code elimination, branch chaining, ...)
– macro expansion
– data-flow analysis of live variables
– early instruction scheduling
– register allocation by graph coloring
– stack-frame layout
– macro expansion (MAX, MIN, MOV)
– late instruction scheduling
– inlining of assembly-language constructs
– macro expansion
– emission of relocatable code

Optimizations on main
– removal of unreachable code in the else branch (except the floating-point computations)
– hoisting of the loop invariant "length*width"
– strength reduction of the multiplication by "height"
– loop unrolling by a factor of four (sketch below)
– local variables kept in registers
– all computations done in registers
– tail call identified
– stack frame eliminated
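A source-level sketch of what main might look like after these transformations (hypothetical, not the compiler's actual output; the temporaries t and v and the wrapper main_opt are introduced for illustration):

int length, width;
extern void process(int area, int volume);

void main_opt(void) {
    int area = 0, volume = 0, height;
    /* kind is a known constant, so the CIRCLE branch is gone */
    int t = length * width;  /* hoisted loop invariant */
    int v = 0;               /* strength-reduced length*width*height */
    for (height = 0; height + 3 < 10; height += 4) { /* unrolled by 4 */
        area += t; volume += v; v += t;
        area += t; volume += v; v += t;
        area += t; volume += v; v += t;
        area += t; volume += v; v += t;
    }
    for (; height < 10; height++) { /* cleanup for the 2 leftovers */
        area += t; volume += v; v += t;
    }
    process(area, volume);   /* tail call: frame can be freed first */
}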

Missed optimizations on main
– removal of the unreachable floating-point computations
– computing area in a single instruction (area = 10 * length * width)
– completely unrolling the loop (the trip count is the constant 10)

Optimizations on the Fortran example
– procedure integration of s1 (n = 500)
– common-subexpression elimination of "a(k, j)"
– loop unrolling
– local variables kept in registers
– software pipelining

Missed optimizations on the Fortran example
– eliminating s1 after integration
– eliminating the addition in the loop via linear-function test replacement (sketch below)
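A small C sketch of linear-function test replacement (hypothetical functions incr and incr_lftr): the exit test on the counter is rewritten in terms of a derived induction variable, after which the counter itself becomes dead and its update can be removed.

/* Before: k is tested and a + k recomputed as an address. */
void incr(int *a, int n) {
    for (int k = 0; k < n; k++)
        a[k] += 1;
}

/* After LFTR: the test k < n becomes p < a + n, and k disappears. */
void incr_lftr(int *a, int n) {
    for (int *p = a; p < a + n; p++)
        *p += 1;
}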

The POWER/PowerPC Architecture
POWER:
– 32-bit RISC superscalar system
– branch, integer, and floating-point units
– optional multiprocessors (one branch unit)
– 32 (shared) general-purpose integer registers (gr0 = 0)
– load, store, arithmetic, shift, branch, call, and system-control instructions
– addressing modes: register+register and register+displacement
– three-address instructions
PowerPC: both 32- and 64-bit architectures

The IBM XL Compilers
Front ends for PL.8, C, C++, Fortran 77, and Pascal
Originated in 1983; written in PL.8
First released for the PC/RT
Generate code for POWER, Intel 386, SPARC, and PowerPC
No interprocedural optimizations
(Almost) all optimizations are done on a low-level IR (XIL)

The IBM XL Compiler (structure)
Front end → translator (instruction selection) → XIL → optimizer → XIL → instruction scheduler → register allocation → instruction scheduler → final assembly → relocatable code

TOBEY
Processes each procedure separately
Uses basic blocks
Control-flow analysis using DFS and intervals
YIL, a higher-level representation, adds:
– loops
– SSA form (sketch below)
Data-flow analysis by interval analysis
Iterative algorithms for irreducible flow graphs
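A tiny C analogy for SSA form (illustrative only; the function ssa_demo and the renamed variables are hypothetical): each variable is assigned exactly once, and a join point selects among the renamed versions, which is the role the phi-function plays in real SSA.

int ssa_demo(int p) {
    /* original: x = 1; if (p) x = x + 1; return x; */
    int x1 = 1;            /* single static assignment to x1 */
    int x2 = x1 + 1;       /* the redefinition gets a new name */
    int x3 = p ? x2 : x1;  /* join: models the phi-function */
    return x3;
}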

Optimizations in TOBEY
– switches → compare or table branch
– mapping local variables to register+offset
– inlining within the current module
– aggressive value numbering (sketch below)
– global common-subexpression elimination
– loop-invariant code motion
– downward store motion
– dead-store elimination
– reassociation, strength reduction
– global constant propagation
– architecture-specific optimizations (MAX)
– value numbering
– global common-subexpression elimination
– dead-code elimination
– elimination of dead induction variables
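A minimal sketch of value numbering (illustrative; the function vn_demo is hypothetical): expressions that receive the same value number are rewritten as reuses of a single computation, even through copies.

int vn_demo(int a, int b) {
    int c = a + b;   /* a + b gets value number V1 */
    int d = a + b;   /* also V1: can be rewritten as d = c */
    int e = a;       /* copy: e shares a's value number */
    int f = e + b;   /* e + b is also V1: f = c as well */
    return c + d + e + f;
}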

Final Assembly
Two passes over XIL:
– peephole optimizations
– generation of the relocatable image

Optimizations on main
– removal of unreachable code in the else branch
– hoisting of the loop invariant "length*width"
– strength reduction of the multiplication by "height"
– loop unrolling by a factor of two
– local variables kept in registers
– all computations done in registers

Missed optimizations on main
– identifying the tail call
– computing area in a single instruction

Optimizations on the Fortran example
– n = 500
– common-subexpression elimination of "a(k, j)"
– loop unrolling
– local variables kept in registers
– software pipelining

Missed optimizations on the Fortran example
– procedure integration of s1
– eliminating the addition in the loop via linear-function test replacement

The Intel 386 Architecture
– 32-bit CISC system
– eight 32-bit integer registers
– supports 16- and 8-bit registers
– dedicated registers (e.g., stack frame)
– many addressing modes
– two-address instructions
– 80-bit floating point

The Intel Compilers
Front ends for C, C++, Fortran 77, and Fortran 90
Front ends from Multiflow and the Edison Design Group (EDG)
Generate 386 code
Interprocedural optimizations were added in 1991
Mixed optimization model
Many optimizations based on partial-redundancy elimination

The Intel Compiler (structure)
Front end → IL-1 → interprocedural optimizer → memory optimizer → global optimizer → IL-1 + IL-2 → code selector → register allocation → instruction scheduler → relocatable code

Interprocedural Optimizer
– works across modules
– saves intermediate representations
– interprocedural constant propagation (sketch below)
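A C sketch of interprocedural constant propagation, mirroring the Fortran example (the functions s1 and driver are hypothetical stand-ins): because the only call site passes n = 500, the bound inside s1 becomes a compile-time constant, enabling unrolling and exact trip counts.

#define N 500

/* Every call passes n == 500, so interprocedural constant
   propagation lets the optimizer treat n as the constant 500. */
static void s1(int a[N][N], int n) {
    for (int i = 0; i < n; i++)     /* n is known to be 500 here */
        for (int k = 0; k < N; k++)
            a[k][i] += 1;
}

void driver(int a[N][N]) {
    s1(a, 500);                     /* the only call site */
}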

Memory Optimizer
– improves use of memory and caches via loop transformations (tiling sketch below)
– uses SSA form
– uses data-dependence analysis
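One classic transformation of this kind is data-cache tiling; here is a hypothetical sketch on a matrix multiply (the tile size B and the function matmul_tiled are illustrative assumptions, not the compiler's actual choices):

#define N 512
#define B 64   /* hypothetical tile size tuned to the data cache */

/* The k and j loops are blocked so each B-wide strip of b stays
   cache-resident while the i loop sweeps over it.
   Assumes c is zero-initialized by the caller. */
void matmul_tiled(double a[N][N], double b[N][N], double c[N][N]) {
    for (int kk = 0; kk < N; kk += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = 0; i < N; i++)
                for (int k = kk; k < kk + B; k++)
                    for (int j = jj; j < jj + B; j++)
                        c[i][j] += a[i][k] * b[k][j];
}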

Global Optimizer
– constant propagation
– dead-code elimination
– local common-subexpression elimination
– copy propagation
– partial-redundancy elimination (sketch below)
– copy propagation
– dead-code elimination
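A minimal C sketch of partial-redundancy elimination (the functions before and after are illustrative): a + b is redundant only along the path through the then-branch, so PRE inserts a computation on the other path, making the later one fully redundant and removable.

/* Before PRE: a + b is computed twice when flag is true. */
int before(int a, int b, int flag) {
    int x = 0;
    if (flag)
        x = a + b;
    int y = a + b;     /* partially redundant */
    return x + y;
}

/* After PRE: the expression is available on every path. */
int after(int a, int b, int flag) {
    int t, x = 0;
    if (flag) {
        t = a + b;
        x = t;
    } else {
        t = a + b;     /* computation inserted by PRE */
    }
    int y = t;         /* reuse instead of recomputation */
    return x + y;
}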

Optimizations on main
– removal of unreachable code in the else branch
– hoisting of the loop invariant "length*width"
– strength reduction of the multiplication by "height"
– local variables kept in registers

Missed optimizations on main
– computing area in a single instruction
– identifying the tail call
– loop unrolling

Optimizations on the Fortran example
– inlining of s1 (n = 500)
– common-subexpression elimination of "a(k, j)"
– local variables kept in registers
– linear-function test replacement

Missed optimizations on the Fortran example
– eliminating s1 after inlining
– unrolling the inlined loop

opt                        Sun          IBM   Intel
constant kind              yes          yes   yes
dead-code                  yes (almost) yes   yes
loop invariant             yes          yes   yes
strength reduction         yes          yes   yes
loop-unrolling (factor)    4            2     —
register allocation        yes          yes   yes
stack-frame eliminated     yes          —     —
tail-call                  yes          —     —

opt                              Sun   IBM   Intel
CSE of a(k, j)                   yes   yes   yes
integrate s1                     yes   —     yes
loop-unrolling (factor)          4     2     —
instructions (inner loop)        21    9     4
linear function test replacement —     —     yes
software pipelined               yes   yes   —
register allocation              yes   yes   yes

Future Trends in Compiler Design and Implementation
– SSA form is being used more and more: it generalizes basic-block optimizations to extended basic blocks and leads to performance improvements
– partial-redundancy elimination is being used more
– partial-redundancy elimination and SSA form are being combined
– parallelization and vectorization are being integrated into production compilers
– data-dependence testing, data-cache optimization, and software pipelining will advance significantly
– the most active research area in scalar compilation will be optimization

Other Trends
– more and more work will be shifted from hardware to compilers
– more advanced hardware will be available
– higher-level programming languages will be used: memory management will be simpler, modularity facilities will improve, and assembly programming will hardly be used
– dynamic (runtime) compilation will become more significant

Theoretical Techniques in Compilers