Compiler Optimization Overview


1 Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

2 Review: Phases of a Compiler
Intermediate code optimizations are not machine specific
Low-level optimizations can be machine specific

3 Review: Compiler Options

4 Review: Basic Processor Parts

5 Review: CISC vs RISC
CISC (x86, Intel)
  Multi-clock complex instructions
  Memory access incorporated in instructions
  Complex instruction set
RISC (Mac PowerBook)
  Single-clock instructions
  Memory accesses are separate instructions
  Simple instruction set

6 Review: Memory Hierarchy
Memory access becomes exponentially slower at higher levels
Memory-access-intensive programs require special optimizations

7 Review: Multiple Cores
Need to create and use instruction-level parallelism (ILP)
Multiple cores on the same die can share cache, working together faster
Can only execute trivial parallelism (Dr. Doughty)
Must eliminate hazards

8 Review: Pipelines

9 Review: Pipelines

10 Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

11 Optimization Goals
Speed
Executable size
Memory access
Power usage (embedded)
Debugging

12 Optimizing for Speed*
Useful for CPU-intensive applications (graphics, video editing, sorting)
Scheduling: out-of-order execution
Removing dependencies increases ILP
Instruction latency
Multiple ALUs, cores, etc.
Mix instruction types (int, float, mult, read, write)
Eliminate jumps
Buffer writes (cannot write out of order)

13 Optimizing for Size
More common for embedded applications
Competes with power/speed optimizations
Limit code size to keep critical loops in memory
Choose the smaller form of an instruction (CISC)
Use short constants for jumps (simpler form of addressing)
Increase instruction length for loop alignment

14 Optimizing for Memory
Useful for memory I/O intensive applications
Properly align data and instructions to reduce cache misses and improve paging behavior
Use instructions for controlling cache
Partially addresses the von Neumann bottleneck
Reading the lowest-level cache in a P4 takes 3 clocks
Each higher level takes an order of magnitude longer (10, 100)

15 Analysis
Alias
Control flow
Data flow
Dependence
Interprocedural

16 Alias Analysis
Determines if there are multiple ways to access a single data point
Knowing aliases helps identify optimizations by recognizing data dependencies and locating redundant code/data updates
Alias analysis is critical for global optimizations (reference parameters, globally defined data, pointers)
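
A minimal sketch of why aliases matter to an optimizer. The function names and the `run` helper are hypothetical, not from the slides. Reusing the first load of `a[0]` is only safe when `a` and `b` cannot refer to the same object.

```python
def sum_twice(a, b):
    """Reads a[0], writes b[0], then reads a[0] again."""
    first = a[0]
    b[0] = 99          # may or may not change a[0], depending on aliasing
    second = a[0]      # must be reloaded unless a and b are known not to alias
    return first + second


def sum_twice_cached(a, b):
    """'Optimized' variant that reuses the first load; valid only without aliasing."""
    first = a[0]
    b[0] = 99
    return first + first


def run(fn, aliased):
    """Call fn with two names for one list (aliased) or with two separate lists."""
    a = [1]
    b = a if aliased else [2]
    return fn(a, b)


# Without aliasing both versions agree; with aliasing the cached version is wrong.
agree_without_alias = run(sum_twice, False) == run(sum_twice_cached, False)
agree_with_alias = run(sum_twice, True) == run(sum_twice_cached, True)
```

An alias analysis that proves `a` and `b` are distinct is what would license the cached form.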

17 Control Flow Analysis
Precursor to critical loop reductions
Replacement of inefficient code
Gathers information concerning hierarchical flow of control
Identifies potential branches in program execution, useful for mitigating pipeline hazards
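
As a rough illustration of what control flow analysis gathers, the sketch below splits a small instruction list into basic blocks and records the edges between them. The three-address encoding and the opcode names are assumptions made for this example (an iterative Fibonacci loop), not the deck's own figures.

```python
# Hypothetical three-address code; jump targets are instruction indices.
code = [
    ("set", "a", 0),            # 0
    ("set", "b", 1),            # 1
    ("jump_if_zero", "n", 8),   # 2: leave the loop when n == 0
    ("add", "t", "a", "b"),     # 3
    ("copy", "a", "b"),         # 4
    ("copy", "b", "t"),         # 5
    ("dec", "n"),               # 6
    ("jump", 2),                # 7: back edge to the loop test
    ("ret", "a"),               # 8
]

JUMPS = ("jump", "jump_if_zero")


def find_leaders(code):
    """A leader is the first instruction, any jump target, or any instruction after a jump."""
    leaders = {0}
    for i, ins in enumerate(code):
        if ins[0] in JUMPS:
            leaders.add(ins[-1])
            if i + 1 < len(code):
                leaders.add(i + 1)
    return sorted(leaders)


def build_cfg(code):
    """Return basic blocks as (first, last) index pairs and successor lists per block."""
    leaders = find_leaders(code)
    starts = leaders + [len(code)]
    blocks = [(starts[k], starts[k + 1] - 1) for k in range(len(leaders))]
    block_of = {start: k for k, (start, _) in enumerate(blocks)}
    edges = {}
    for k, (_, end) in enumerate(blocks):
        op = code[end][0]
        succ = []
        if op in JUMPS:
            succ.append(block_of[code[end][-1]])      # taken branch
        if op not in ("jump", "ret") and end + 1 < len(code):
            succ.append(block_of[end + 1])            # fall-through
        edges[k] = succ
    return blocks, edges


blocks, edges = build_cfg(code)
```

The edge from block 2 back to block 1 is the loop that later loop optimizations would target.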

18 Example: Fibonacci

19 Example: Fibonacci

20 Example: Fibonacci

21 Data Flow Analysis
Procures information about how a procedure uses data
Builds on structures from control flow analysis
There are many ways to achieve this goal:
  Reaching definitions: calculate potential definitions at a given point in the code
  Iterative analysis: uses the control flow graph
  Structural analysis
  etc.
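
A minimal sketch of reaching definitions computed by iterative analysis over a control flow graph. The block names, definition labels (`d1` to `d4`), and data layout are hypothetical; a real compiler would use bit vectors and a worklist.

```python
def reaching_definitions(blocks, preds):
    """blocks: name -> list of (def_id, variable); preds: name -> predecessor names."""
    all_defs = {d: v for defs in blocks.values() for d, v in defs}
    gen, kill = {}, {}
    for name, defs in blocks.items():
        last = {}
        for d, v in defs:
            last[v] = d                      # a later definition in the block wins
        gen[name] = set(last.values())
        kill[name] = {d for d, v in all_defs.items() if v in last} - gen[name]

    in_sets = {n: set() for n in blocks}
    out_sets = {n: set(gen[n]) for n in blocks}
    changed = True
    while changed:                           # iterate to a fixed point
        changed = False
        for n in blocks:
            in_sets[n] = set().union(*(out_sets[p] for p in preds[n]))
            new_out = gen[n] | (in_sets[n] - kill[n])
            if new_out != out_sets[n]:
                out_sets[n] = new_out
                changed = True
    return in_sets, out_sets


# entry: i = 0 (d1), s = 0 (d2);  loop: s = s + i (d3), i = i + 1 (d4);  exit: no definitions
blocks = {
    "entry": [("d1", "i"), ("d2", "s")],
    "loop": [("d3", "s"), ("d4", "i")],
    "exit": [],
}
preds = {"entry": [], "loop": ["entry", "loop"], "exit": ["loop"]}
in_sets, out_sets = reaching_definitions(blocks, preds)
```

All four definitions reach the top of `loop` (from the entry and from the back edge), while only `d3` and `d4` reach `exit`.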

22 Dependence Analysis*
Recognizes relationships using a DAG
True/flow dependence
Antidependence
Output dependence
Input dependence (does not affect execution order)
Instruction scheduling
Data caching
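
The four dependence kinds can be sketched by comparing the read and write sets of two statements. The `(writes, reads)` representation and the function name are assumptions for illustration only.

```python
def classify_dependences(first, second):
    """Each statement is (writes, reads) as sets of names; `first` executes before `second`."""
    w1, r1 = first
    w2, r2 = second
    kinds = []
    if w1 & r2:
        kinds.append("flow")     # read after write (true dependence)
    if r1 & w2:
        kinds.append("anti")     # write after read
    if w1 & w2:
        kinds.append("output")   # write after write
    if r1 & r2:
        kinds.append("input")    # read after read; does not constrain order
    return kinds


s1 = ({"a"}, {"b", "c"})   # a = b + c
s2 = ({"d"}, {"a"})        # d = a * 2
s3 = ({"b"}, set())        # b = 7
s4 = ({"a"}, set())        # a = 0
s5 = ({"e"}, {"c"})        # e = c + 1

flow = classify_dependences(s1, s2)      # ['flow']
anti = classify_dependences(s1, s3)      # ['anti']
output = classify_dependences(s1, s4)    # ['output']
inp = classify_dependences(s1, s5)       # ['input']
```

A scheduler may reorder `s1` and `s5` freely, but must keep `s1` before `s2`, `s3`, and `s4`.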

23 Interprocedural Analysis
Incorporates analysis methods discussed earlier, but on a broader level
OOD and high-level coding methodologies are optimal for human understanding, not computer processing
Includes analysis of relationships between function calls to mitigate overhead of OOD-style code

24 Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

25 Loop Optimizations*
Loop optimizations have the greatest impact on overall code performance
Desire to reduce dependencies to allow ILP
Desire to reduce overhead of jumping and branching in loops
Predictability: predicting loop behavior to mitigate pipeline hazards
Loops must be well behaved
  Single return
  No breaks, branches, etc.

26 Loop Strength Reduction
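
The slide's own example is not part of this transcript. As a stand-in, here is a minimal sketch of the idea, with hypothetical function names: a multiplication by the loop index is replaced with a running addition.

```python
def offsets_multiply(n, stride):
    """Original form: one multiplication per iteration."""
    out = []
    for i in range(n):
        out.append(i * stride)
    return out


def offsets_reduced(n, stride):
    """Strength-reduced form: the multiplication becomes a running addition."""
    out = []
    offset = 0
    for _ in range(n):
        out.append(offset)
        offset += stride       # cheaper add replaces i * stride
    return out


same = offsets_multiply(6, 4) == offsets_reduced(6, 4)   # True
```

A compiler applies this to induction variables such as array address calculations, where the multiply sits in the hot path of the loop.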

27 Procedure Optimizations
Based on control flow
Desire to eliminate overhead of context switches
Possibly turn function calls into branches
Optimizations occur at high and low level
  High level: procedure integration
  Low level: inline expansion
Conventions
  Leaf routines (call no others) have reduced overhead
  Shrink wrapping creates pseudo leaves by adding data flow analysis

28 Tail Call Optimization: Tail Recursion
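
The slide's example is not in the transcript. The sketch below, with hypothetical function names, shows the transformation by hand: a call in tail position becomes a jump back to the top of the function, so no new stack frame is needed. Python itself does not perform this optimization, which is why the loop form is written out explicitly.

```python
def sum_to_recursive(n, acc=0):
    """Tail-recursive: the recursive call is the last action, nothing runs after it."""
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_loop(n, acc=0):
    """What tail call optimization effectively produces: rebind the arguments and jump."""
    while n != 0:
        n, acc = n - 1, acc + n
    return acc


small_match = sum_to_recursive(100) == sum_to_loop(100)   # True, both 5050
large = sum_to_loop(100000)                                # no stack growth
```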

29 Code Scheduling*
Block Scheduling
  Blocks optimized as independent pieces of code
  Cross-block scheduling applied to optimized blocks
Branch Scheduling
  Fill stall cycles after a branch with independent code
  Reduces effect of bad branch predictions in the HW pipeline
Software Pipelining
  Overlaps the execution of multiple loop iterations

30 Register Allocation
Applies to low-level assembly
Loops and nesting are used to weigh which values should be kept in registers
Nested loops weigh more heavily
Considers variable activity before and after a block of code is accessed
Uses operation costs and the number of times operations are performed

31 Register Allocation Calculation

32 Register Allocation: Graph Coloring
Use subset of objects that should be allocated to registers
Arcs represent points where two objects exist at the same time
Arcs represent conflicts where the object cannot be assigned a register (int, float)
Color graph with number of colors equal to number of registers
Assign registers based on color
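
The coloring step can be sketched with a simple greedy heuristic. This is an illustrative simplification rather than a production allocator (it is not Chaitin-style simplify/select), and the graph, function name, and spill handling are assumptions.

```python
def color_registers(interference, num_registers):
    """interference: variable -> set of variables that are live at the same time."""
    assignment = {}
    spilled = []
    # Color the most constrained (highest-degree) variables first.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)          # no color left: keep this value in memory
    return assignment, spilled


# a, b, c are all live together; d overlaps only with c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
three_regs, three_spills = color_registers(graph, 3)
two_regs, two_spills = color_registers(graph, 2)
```

With three registers every variable gets a color; with two, the triangle `a`, `b`, `c` cannot be colored and one of its members is spilled.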

33 Redundancy Elimination
Based on data flow analysis
Intermediate-level optimization
Includes:
  Common subexpression elimination
  Loop-invariant code motion
  Partial redundancy elimination
  Code hoisting
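
Of the techniques listed, common subexpression elimination is the easiest to sketch. The version below works within a single basic block on a hypothetical `(dest, op, left, right)` instruction format and ignores commutativity.

```python
def eliminate_common_subexpressions(block):
    """Replace recomputations of an available expression with a copy of its earlier result."""
    available = {}   # (op, left, right) -> variable currently holding that value
    result = []
    for dest, op, left, right in block:
        key = (op, left, right)
        holder = available.get(key)
        if holder is not None:
            result.append((dest, "copy", holder, None))
        else:
            result.append((dest, op, left, right))
        # Redefining dest invalidates expressions that read dest or are held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        if holder is None and dest not in (left, right):
            available[key] = dest
    return result


block = [
    ("t1", "add", "a", "b"),
    ("t2", "add", "a", "b"),   # same expression, operands unchanged: redundant
    ("a", "mul", "t1", "c"),   # redefines a, so a + b is no longer available
    ("t3", "add", "a", "b"),   # must be recomputed
]
optimized = eliminate_common_subexpressions(block)
```

Only the second instruction is rewritten; the fourth looks identical to the first but depends on the new value of `a`.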

34 Peephole Optimizations
Focused on very small subsets of code
Generally performed late in the compilation process
Arguably covers up bad and incomplete optimizations from earlier processes
Some examples include:
  Dead code elimination (created from earlier optimizations)
  Strength reductions
  Constant folding
  Instruction combining
  Copy propagation
  Algebraic simplifications
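
Two of the listed examples, constant folding and algebraic simplification, can be sketched on a tiny expression tree. The tuple representation and the rule set are assumptions for illustration, and the rules assume integer arithmetic.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def simplify(expr):
    """expr is an int constant, a variable name (str), or a tuple (op, left, right)."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)          # constant folding
    if op == "+" and right == 0:
        return left                          # x + 0 -> x
    if op == "+" and left == 0:
        return right                         # 0 + x -> x
    if op == "*" and right == 1:
        return left                          # x * 1 -> x
    if op == "*" and left == 1:
        return right                         # 1 * x -> x
    if op == "*" and 0 in (left, right):
        return 0                             # x * 0 -> 0
    return (op, left, right)


folded = simplify(("+", ("*", "x", 1), ("*", 2, 3)))      # ('+', 'x', 6)
zeroed = simplify(("*", ("+", "y", 0), ("-", 4, 4)))      # 0
```

Each rule looks at only one operator and its immediate operands, which is what keeps this kind of pass cheap enough to run late.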

35 Compiler Optimization Overview
1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

36 Continuous Relevance of Compiler Development
Back ends of compilers for older languages are reworked to take advantage of advances in hardware
Pipelines are becoming longer
Multiple cores are now common, allowing more use of parallel instructions

37 Research Areas
Domain-specific subjects: security, reliability, parallel, distributed, embedded, mobile
Analysis, prediction, and debugging tools
Embedded JIT compilation
Development of a research compiler (GCC)
Enhancing compiler optimization times, specifically iterative and whole-program optimizations
MS F#: functional language for .NET, like ML

38 Compiler Job Options
Additional exploitation of parallel computing environments for desktop platforms
Multiple OS/environment support
Integration of AI techniques (machine learning) to know when, how, and where to apply optimizations (GCC)
Special-purpose languages for video, graphics, and audio processing (NVIDIA)
Special-purpose vendors for embedded products (Wind River, VxWorks)

39 Compiler Job Options
Library adaptation for reconfigurable processors (GCC)
Fault tolerance and exception handling for security

40 Compiler Optimization Problems
Many optimizations are localized
Non-local optimizations create increased overhead in the computation process
Multiple objectives of optimizations create conflicts
  For example: speed vs executable size

