Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Review: Phases of a Compiler
- Intermediate code optimizations are not machine specific
- Low-level optimizations can be machine specific
Review: Compiler Options
Review: Basic Processor Parts
Review: CISC vs RISC
- CISC (x86, Intel): multi-clock complex instructions; memory access incorporated into instructions; complex instruction set
- RISC (e.g., the PowerPC in a Mac PowerBook): single-clock instructions; memory accesses are separate instructions; simple instruction set
Review: Memory Hierarchy
- Memory access becomes exponentially slower at each level farther from the CPU
- Memory-access-intensive programs require special optimizations
Review: Multiple Cores
- Need to create and exploit instruction-level parallelism (ILP)
- Multiple cores on the same die can share cache, working together faster
- Can only execute trivial parallelism (Dr. Doughty)
- Must eliminate hazards
Review: Pipelines
Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Optimization Goals
- Speed
- Executable Size
- Memory Access
- Power Usage (Embedded)
- Debugging
Optimizing for Speed*
- Useful for CPU-intensive applications (graphics, video editing, sorting)
- Scheduling: out-of-order execution
- Removal of dependencies increases ILP
- Instruction latency
- Multiple ALUs, cores, etc.
- Mix instruction types (int, float, multiply, read, write)
- Eliminate jumps
- Buffer writes (cannot write out of order)
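A minimal sketch of the dependency-removal point: a reduction written with a single accumulator forces every addition to wait on the previous one, while splitting it into two independent accumulators gives the scheduler work it can overlap. The function names are illustrative; note that with floating point a compiler will not make this transformation on its own unless reassociation is explicitly allowed.

    /* Sketch: breaking a dependence chain to expose ILP. */
    double sum_serial(const double *a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += a[i];              /* every add depends on the previous one */
        return s;
    }

    double sum_unrolled(const double *a, int n) {
        double s0 = 0.0, s1 = 0.0;  /* two independent accumulators */
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            s0 += a[i];             /* these two adds have no dependence ...  */
            s1 += a[i + 1];         /* ... so the scheduler can overlap them  */
        }
        if (i < n) s0 += a[i];      /* handle an odd element count */
        return s0 + s1;
    }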
Optimizing for Size
- More common for embedded applications
- Competes with power/speed optimizations
- Limit code size to keep critical loops in memory
- Choose the smaller form of an instruction (CISC)
- Use short constants for jumps (simpler form of addressing)
- Increase instruction length for loop alignment
Optimizing for Memory
- Useful for memory-I/O-intensive applications
- Properly align data and instructions to reduce cache misses and improve the results of paging
- Use instructions for controlling the cache
- Partially addresses the von Neumann bottleneck
- Reading the lowest-level cache on a Pentium 4 takes about 3 clocks; each higher level costs roughly an order of magnitude more (~10, then ~100 clocks)
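A small illustration of the locality point: traversing a matrix in the order it is laid out in memory uses every byte of each cache line that is fetched, while the transposed traversal touches a new line on nearly every access. The dimension N and the function names are assumptions for the sketch.

    #define N 1024

    double sum_row_major(double m[N][N]) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += m[i][j];   /* consecutive addresses: each cache line is fully used */
        return s;
    }

    double sum_column_major(double m[N][N]) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += m[i][j];   /* stride of N*8 bytes: a likely cache miss per access */
        return s;
    }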
Analysis
- Alias
- Control flow
- Data flow
- Dependence
- Interprocedural
Alias Analysis
- Determines whether there are multiple ways to access a single piece of data
- Knowing aliases helps identify optimizations by recognizing data dependencies and locating redundant code/data updates
- Critical for global optimizations (reference parameters, globally defined data, pointers)
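A sketch of why alias information matters: if the compiler cannot prove that two pointers refer to different objects, it must assume a store through one may change the value seen through the other. The C99 restrict qualifier shown in the second function is the standard way for the programmer to assert no aliasing; the function names are illustrative.

    int sum_twice(int *p, int *q) {
        *q = 5;
        int a = *p;     /* may now be 5 if p and q alias                        */
        *q = 7;
        int b = *p;     /* cannot be assumed equal to a without alias analysis  */
        return a + b;
    }

    int sum_twice_restrict(int *restrict p, int *restrict q) {
        *q = 5;
        int a = *p;
        *q = 7;
        int b = *p;     /* provably equal to a: one load suffices, the value
                           can stay in a register across the stores to *q      */
        return a + b;
    }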
Control Flow Analysis
- Precursor to critical-loop reductions and replacement of inefficient code
- Gathers information concerning the hierarchical flow of control
- Identifies potential branches in program execution, useful for mitigating pipeline hazards
Example: Fibonacci
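A small iterative Fibonacci routine of the kind typically used for this example, with comments marking the basic blocks a control flow analysis would identify; this is a sketch, not necessarily the exact code from the original example.

    int fib(int n) {
        if (n < 2)                  /* B1: entry, conditional branch         */
            return n;               /* B2: early exit                        */
        int prev = 0, cur = 1;      /* B3: loop pre-header                   */
        for (int i = 2; i <= n; i++) {
            int next = prev + cur;  /* B4: loop body, back edge to loop test */
            prev = cur;
            cur = next;
        }
        return cur;                 /* B5: loop exit                         */
    }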
Data Flow Analysis
- Procures information about how a procedure uses data
- Builds on structures from control flow analysis
- There are many ways to achieve the goal:
  - Reaching definitions: calculate which definitions can reach a given point in the code
  - Iterative analysis using the control flow graph
  - Structural analysis, etc.
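A toy example of reaching definitions, as a sketch: the assignments to x are labeled d1 through d3 in the comments, and the analysis determines which of them can reach each use of x.

    int example(int flag) {
        int x = 1;          /* d1 */
        if (flag)
            x = 2;          /* d2: kills d1 on this path                            */
        int y = x + 1;      /* d1 and d2 both reach here, so x is not a constant    */
        x = 3;              /* d3: from here on, only d3 reaches uses of x          */
        return y + x;       /* only d3 reaches this use: x can be folded to 3       */
    }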
Dependence Analysis*
- Recognizes relationships between instructions using a DAG
- True/flow dependence
- Antidependence
- Output dependence
- Input dependence (does not affect execution order)
- Used for instruction scheduling and data caching
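The four dependence kinds on straight-line scalar code, as a sketch (variable names are arbitrary):

    void deps(void) {
        int a, b, c, d;
        a = 10;         /* write a                                                  */
        b = a + 1;      /* true (flow) dependence: read of a after the write above  */
        a = 20;         /* antidependence: write to a after the read of a above     */
        a = 30;         /* output dependence: write to a after the write above      */
        c = b + 5;      /* reads b                                                  */
        d = b + 7;      /* input dependence with the previous line: both read b,
                           so their order does not matter                           */
        (void)a; (void)c; (void)d;
    }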
Interprocedural Analysis
- Incorporates the analysis methods discussed earlier, but at a broader level
- OOD and high-level coding methodologies are optimal for human understanding, not computer processing
- Includes analysis of relationships between function calls to mitigate the overhead of OO-style code
Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Loop Optimizations*
- Loop optimizations have the greatest impact on overall code performance
- Reduce dependencies to allow ILP
- Reduce the overhead of jumping and branching in the loop
- Predictability: predicting loop behavior to mitigate pipeline hazards
- Loops must be well behaved: single return, no breaks, branches out, etc.
Loop Strength Reduction
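A typical strength reduction at the source level, as a sketch: the per-iteration multiply by the induction variable is replaced with a running addition. Function names are illustrative.

    /* Before: each iteration computes i * k. */
    void scale_before(int *a, int n, int k) {
        for (int i = 0; i < n; i++)
            a[i] = i * k;          /* multiply inside the loop */
    }

    /* After: the multiply is replaced by a running addition. */
    void scale_after(int *a, int n, int k) {
        int ik = 0;                /* tracks i * k */
        for (int i = 0; i < n; i++) {
            a[i] = ik;
            ik += k;               /* addition replaces multiplication */
        }
    }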
Procedure Optimizations
- Based on control flow
- Goal: eliminate the overhead of context switches, possibly turning function calls into branches
- Optimizations occur at both high and low level:
  - High level: procedure integration
  - Low level: inline expansion
- Conventions: leaf routines (which call no other routines) have reduced overhead
- Shrink wrapping creates pseudo-leaves by adding data flow analysis
Tail Call Optimization: Tail Recursion
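A sketch of the transformation: a tail-recursive factorial and the equivalent loop a tail-call optimization effectively produces; because nothing remains to be done after the recursive call, no new stack frame is needed. Names are illustrative.

    long fact_tail(long n, long acc) {
        if (n <= 1)
            return acc;
        return fact_tail(n - 1, acc * n);   /* tail call: nothing happens after it */
    }

    long fact_loop(long n) {
        long acc = 1;
        while (n > 1) {            /* the recursion rewritten as a branch back */
            acc *= n;
            n -= 1;
        }
        return acc;
    }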
Code Scheduling*
- Block scheduling: blocks optimized as independent pieces of code; cross-block scheduling then applied to the optimized blocks
- Branch scheduling: fill stall cycles after a branch with independent code; reduces the effect of bad branch predictions in the hardware pipeline
- Software pipelining: overlaps the execution of multiple loop iterations
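A source-level sketch of the software-pipelining idea: the load for the next iteration is issued while the current element is still being processed, so memory latency overlaps computation. The function name and the single-stage overlap are assumptions for illustration; a real scheduler does this on machine instructions.

    void scale(float *dst, const float *src, int n, float k) {
        if (n <= 0) return;
        float cur = src[0];            /* prologue: first load                      */
        for (int i = 0; i < n - 1; i++) {
            float next = src[i + 1];   /* load for iteration i+1 ...                */
            dst[i] = cur * k;          /* ... overlaps the multiply for iteration i */
            cur = next;
        }
        dst[n - 1] = cur * k;          /* epilogue: last multiply                   */
    }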
Register Allocation
- Applies to low-level (assembly) code
- Loops and nesting are used to weigh which values should be maintained in registers; nested loops weigh more heavily
- Considers variable activity before and after the block of code is accessed
- Uses operation costs and the number of times each operation is performed
Register Allocation Calculation
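One weighting commonly used in Chaitin-style allocators, consistent with the previous slide (definitions and uses weighted by loop-nesting depth), is roughly the following; this is a typical form, not necessarily the exact calculation used here.

    \text{cost}(v) \;=\; \sum_{\text{blocks } B} \bigl(\text{uses}_B(v) + \text{defs}_B(v)\bigr) \cdot 10^{\,\text{depth}(B)}

Values whose cost is high relative to the size of their live range are kept in registers; the rest become spill candidates.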
Register Allocation: Graph Coloring
- Nodes are the subset of objects that should be allocated to registers
- Arcs connect two objects that are live at the same time: such conflicts mean the objects cannot share a register (with separate handling for int vs. float register classes)
- Color the graph with a number of colors equal to the number of registers
- Assign registers based on color
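A tiny coloring example, as a sketch: with the live ranges below, a interferes with b and b interferes with c, but a and c never overlap, so two registers (two colors) suffice, with a and c sharing one register and b taking the other.

    int coloring_example(int x) {
        int a = x + 1;     /* a becomes live                   */
        int b = a * 2;     /* b becomes live; a still live     */
        int c = b - a;     /* last use of a: a dies, c is live */
        return b + c;      /* b and c die                      */
    }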
Redundancy Elimination
- Based on data flow analysis
- An intermediate-level optimization
- Includes: common subexpression elimination, loop-invariant code motion, partial redundancy elimination, code hoisting
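A sketch combining two of these: the subexpression a*b below is both redundant (computed twice per iteration) and loop-invariant, so it can be computed once before the loop. Function names are illustrative.

    void before(float *out, const float *in, int n, float a, float b) {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * (a * b) + (a * b);   /* a*b computed twice, every iteration */
    }

    void after(float *out, const float *in, int n, float a, float b) {
        float ab = a * b;          /* hoisted out of the loop (loop-invariant) and
                                      computed once (common subexpression)          */
        for (int i = 0; i < n; i++)
            out[i] = in[i] * ab + ab;
    }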
Peephole Optimizations
- Focused on very small windows of code
- Generally performed late in the compilation process
- Arguably cover up bad or incomplete optimizations from earlier passes
- Examples: dead code elimination (of code created by earlier optimizations), strength reduction, constant folding, instruction combining, copy propagation, algebraic simplifications
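Source-level analogues of several of these rewrites in one sketch; a real peephole pass works on the generated instructions rather than on source code.

    int peephole(int x) {
        int a = 4 * 60;        /* constant folding: becomes 240 at compile time   */
        int b = x * 8;         /* strength reduction: becomes x << 3              */
        int c = x + 0;         /* algebraic simplification: becomes x             */
        int d = c;             /* copy propagation: uses of d become uses of x    */
        if (0) { a = -1; }     /* dead code elimination: branch can never execute */
        return a + b + d;
    }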
Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development
Continuous Relevance of Compiler Development
- Back ends of compilers for older languages are reworked to take advantage of advances in hardware
- Pipelines are becoming longer
- Multiple cores are now common, allowing more use of parallel instructions
Research Areas
- Domain-specific subjects: security, reliability, parallel, distributed, embedded, mobile
- Analysis, prediction, and debugging tools
- Embedded JIT compilation
- Development of a research compiler (GCC)
- Enhancing compiler optimization times, specifically iterative and whole-program optimizations
- Microsoft F#: a functional language for .NET in the ML family
Compiler Job Options
- Further exploitation of parallel computing environments on desktop platforms
- Multiple OS/environment support
- Integration of AI and machine learning techniques to decide when, how, and where to apply optimizations (GCC)
- Special-purpose languages for video, graphics, and audio processing (NVIDIA)
- Special-purpose vendors for embedded products (Wind River VxWorks)
Compiler Job Options
- Library adaptation for reconfigurable processors (GCC)
- Fault tolerance and exception handling for security
Compiler Optimization Problems
- Many optimizations are localized; non-local optimizations create increased overhead in the compilation process
- The multiple objectives of optimization conflict with one another, for example speed vs. executable size