Improving code generation

Better code generation requires greater context:
- Over expressions: optimal ordering of subtrees
- Over basic blocks: common subexpression elimination; register tracking with last-use information
- Over procedures: global register allocation, register coloring
- Over the program: interprocedural flow analysis

Basic blocks

Better code generation requires information about points of definition and points of use of variables. In the presence of flow of control, the value of a variable can depend on multiple points in the program:

    y := 12;
    x := y * 2;     -- here x = 24
    label1: …
    x := y * 2;     -- 24? Can't tell, y may be different

A basic block is a single-entry, single-exit code fragment. Values that are computed within a basic block have a single origin, which allows more constant folding and common subexpression elimination, and better register use.

Finding basic blocks

To partition a program into basic blocks, call the first instruction (quadruple) in a basic block its leader:
- The first instruction in the program is a leader
- Any instruction that is the target of a jump is a leader
- Any instruction that follows a jump is a leader
- In the presence of procedures with side effects, every procedure call ends a basic block

A basic block includes the leader and all instructions that follow, up to but not including the next leader.
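The leader rules can be sketched in a few lines of Python. The `Quad` record and its fields (`label`, `jump_to`, `is_call`) are hypothetical names chosen for this illustration, not part of the slides; the sample program is the dot-product quadruple list used later in this transcript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quad:
    """Hypothetical quadruple record, just enough to find leaders."""
    text: str                       # printable form of the instruction
    label: Optional[str] = None     # label attached to this instruction
    jump_to: Optional[str] = None   # label this instruction may jump to
    is_call: bool = False           # calls end a basic block


def find_leaders(quads):
    """Return the sorted indices of all leaders."""
    if not quads:
        return []
    label_index = {q.label: i for i, q in enumerate(quads) if q.label}
    leaders = {0}                                   # first instruction
    for i, q in enumerate(quads):
        if q.jump_to is not None:
            leaders.add(label_index[q.jump_to])     # target of a jump
        if (q.jump_to is not None or q.is_call) and i + 1 < len(quads):
            leaders.add(i + 1)                      # follows a jump or call
    return sorted(leaders)


def basic_blocks(quads):
    """Split the program at its leaders."""
    bounds = find_leaders(quads) + [len(quads)]
    return [quads[a:b] for a, b in zip(bounds, bounds[1:])]


program = [
    Quad("prod := 0"),
    Quad("j := 1"),
    Quad("T1 := 4 * j", label="start"),
    Quad("T2 := a (T1)"),
    Quad("T3 := 4 * j"),
    Quad("T4 := b (T3)"),
    Quad("T5 := T2 * T4"),
    Quad("T6 := prod + T5"),
    Quad("prod := T6"),
    Quad("T7 := j + 1"),
    Quad("j := T7"),
    Quad("if j <= 20 goto start", jump_to="start"),
]

blocks = basic_blocks(program)
```

Here the leaders are the first instruction and the instruction labelled `start`, so the program splits into a two-quadruple initialization block and a ten-quadruple loop body.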

Transformations on basic blocks

- Common subexpression elimination: recognize redundant computations, replace with a single temporary
- Dead-code elimination: recognize computations not used subsequently, remove quadruples
- Interchange statements, for better scheduling
- Renaming of temporaries, for better register usage

All of the above require symbolic execution of the basic block, to obtain definition/use information.

Simple symbolic interpretation: next-use information

- If x is computed in quadruple i, and is an operand of quadruple j, j > i, its value must be preserved (in a register or in memory) until j.
- If x is recomputed at quadruple k, k > i, the value computed at i has no further use beyond k and can be discarded (i.e. its register can be reused).
- Next-use information is an annotation over quadruples and the symbol table.
- It is computed in one backwards pass over the quadruples of the block.

Computing next-use

Use the symbol table to annotate the status of variables. Each operand in a quadruple carries additional information:
- Operand liveness (boolean)
- Operand next use (a later quadruple)

On exit from the block, all temporaries are dead (no next use).

For quadruple q: x := y op z;
- Record next uses of x, y, z into the quadruple
- Mark x dead (previous value has no next use)
- Next use of y is q; next use of z is q; y, z are live
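The backward pass can be sketched as follows. The tuple layout `(x, op, y, z)`, the function name and the `live_on_exit` parameter are assumptions made for this illustration; a status is a `(live, next_use)` pair, and integer operands are treated as constants.

```python
def compute_next_use(block, live_on_exit):
    """Annotate each quadruple with (live, next_use) for x, y and z.

    block        -- list of (x, op, y, z) tuples; int operands are constants
    live_on_exit -- names assumed live when the block ends (temporaries
                    are left out, so they are dead on exit)
    """
    table = {}  # plays the role of the symbol-table status column

    def status(name):
        # Before the pass reaches any later use, a name is live only if
        # it is live on exit, and it has no next use inside the block.
        return table.get(name, (name in live_on_exit, None))

    annotations = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, _op, y, z = block[i]
        operands = [v for v in (y, z) if isinstance(v, str)]
        # 1. record the current status of x, y, z in the quadruple
        annotations[i] = {v: status(v) for v in [x] + operands}
        # 2. mark x dead: its previous value has no next use
        table[x] = (False, None)
        # 3. y and z are live, with next use at this quadruple
        for v in operands:
            table[v] = (True, i)
    return annotations


block = [
    ("t", "-", "a", "b"),
    ("u", "-", "a", "c"),
    ("v", "+", "t", "u"),
    ("d", "+", "v", "u"),
]
info = compute_next_use(block, live_on_exit={"a", "b", "c", "d"})
```

Step 2 must come before step 3 so that a quadruple such as `x := x + 1` leaves x live on entry to the quadruple. In the example, `info[0]["a"]` is `(True, 1)` because a is used again in the second quadruple, while `info[2]["t"]` is `(False, None)` because t is a temporary with no later use.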

Dead-code elimination (within basic blocks)

Remove quadruple x := y op z if x is dead. Examples:
- x is a temporary that is not referenced in any later quadruple.
- x is overwritten before it is used:

    x := y         -- dead, can be removed
    …              -- no reference to x
    x := z + 2

After elimination, next-use information needs to be updated or recomputed.
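A minimal sketch of this transformation, assuming quadruples are `(x, op, y, z)` tuples with no side effects and that the set of names live on block exit is known (both are assumptions of this illustration). Walking backwards keeps the liveness set up to date as quadruples are dropped, so no separate recomputation is needed here.

```python
def eliminate_dead_code(block, live_on_exit):
    """Drop every quadruple whose target is dead at that point."""
    live = set(live_on_exit)
    kept = []
    for x, op, y, z in reversed(block):
        if x not in live:
            # x is dead here: skip the quadruple and do NOT mark its
            # operands live, so dead chains disappear in the same pass.
            continue
        live.discard(x)  # the value of x before this point is not needed
        live.update(v for v in (y, z) if isinstance(v, str))
        kept.append((x, op, y, z))
    kept.reverse()
    return kept


block = [
    ("x", "copy", "y", None),   # x := y      (dead: x is overwritten below)
    ("w", "+", "y", 1),         # w := y + 1  (no reference to x)
    ("x", "+", "z", 2),         # x := z + 2
]
result = eliminate_dead_code(block, live_on_exit={"x", "w"})
```

The first quadruple is removed because x is redefined before any use; the other two survive.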

Register allocation over basic block: tracking

Goal is to minimize use of registers and memory references. Doubly linked data structure:
- Register descriptor: for each register, indicate current contents (set of variables with equal values).
- Address descriptor: for each variable, indicate location of current value (memory and/or registers).

Procedure getreg determines the "optimal" choice of register to hold the result of the next quadruple.

Getreg: heuristics

For quadruple x := y op z;
- If y is in Ri, Ri contains no other variable, y is not live, and there is no next use of y, use Ri
- Else, try the same for z, provided the architecture supports the operation
- Else if there is an available register Rj, use it
- Else if there is a register Rk that holds a dead variable, use it
- If y is in Ri, Ri contains no other variable, and y is also in memory, use Ri
- Else find a register that holds a live variable, store the variable in memory (spill), and use the register. Choose the variable whose next use is farthest away.

Using getreg:

For x := y op z;
- Call getreg to obtain target register R
- Find current location of y, generate load into register if in memory, update address descriptor for y
- Ditto for z
- Emit instruction
- Update register descriptor for R, to indicate it holds x
- Update address descriptor for x to indicate it resides in R

For x := y;
- Single load; register descriptor indicates that both x and y are in R.

On block exit, store registers that contain live values.

Using getreg: example

The assignment d := (a-b) + (a-c) + (a-c) can be translated to:

    t := a - b;  u := a - c;  v := t + u;  d := v + u;

Statement     Code generated   Register description   Address description
                               Registers empty
t := a - b    MOV a, R0        R0 contains t          t in R0
              SUB b, R0
u := a - c    MOV a, R1        R0 contains t          t in R0
              SUB c, R1        R1 contains u          u in R1
v := t + u    ADD R1, R0       R0 contains v          u in R1
                               R1 contains u          v in R0
d := v + u    ADD R1, R0       R0 contains d          d in R0 and memory

Computing dependencies in a basic block: the dag

Use a directed acyclic graph (dag) to recognize common subexpressions and remove redundant quadruples.

Intermediate code optimization: basic block => dag => improved block => assembly

- Leaves are labeled with identifiers and constants.
- Internal nodes are labeled with operators and identifiers.

Dag construction

Forward pass over basic block.

For x := y op z;
- Find node labeled y, or create one
- Find node labeled z, or create one
- Create new node for op, or find an existing one with descendants y, z (need hash scheme)
- Add x to list of labels for new node
- Remove label x from node on which it appeared

For x := y;
- Add x to list of labels of node which currently holds y
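The construction steps can be sketched as a small Python class. The class and method names are invented for this illustration, the "hash scheme" is simply a dictionary keyed on `(op, left, right)`, and the sketch ignores complications such as array assignments invalidating earlier array references. The sample input is the loop body of the dot-product example in this transcript.

```python
class Dag:
    """Hypothetical dag builder for one basic block."""

    def __init__(self):
        self.nodes = []     # node id -> (op, left id, right id) or ("leaf", name, None)
        self.labels = []    # node id -> identifiers currently attached
        self.index = {}     # node key -> node id (the hash scheme)
        self.current = {}   # identifier or constant -> node holding its value

    def _node(self, key):
        # Find an existing node with this operator and descendants, or create one.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
            self.labels.append([])
        return self.index[key]

    def _operand(self, name):
        # Find the node that currently holds `name`, or create a leaf for it.
        if name not in self.current:
            self.current[name] = self._node(("leaf", name, None))
        return self.current[name]

    def _relabel(self, x, node):
        # Remove label x from the node on which it appeared, attach it to `node`.
        old = self.current.get(x)
        if old is not None and x in self.labels[old]:
            self.labels[old].remove(x)
        self.labels[node].append(x)
        self.current[x] = node

    def assign(self, x, op, y, z):      # x := y op z
        left, right = self._operand(y), self._operand(z)
        self._relabel(x, self._node((op, left, right)))

    def copy(self, x, y):               # x := y
        self._relabel(x, self._operand(y))


dag = Dag()
dag.assign("T1", "*", 4, "j")
dag.assign("T2", "[]", "a", "T1")
dag.assign("T3", "*", 4, "j")           # reuses the node built for T1
dag.assign("T4", "[]", "b", "T3")
dag.assign("T5", "*", "T2", "T4")
dag.assign("T6", "+", "prod", "T5")
dag.copy("prod", "T6")
dag.assign("T7", "+", "j", 1)
dag.copy("j", "T7")
```

After the pass, T1 and T3 label the same node, which is how the redundant `4 * j` is detected, and prod shares a node with T6.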

Example: dot product

    prod := 0;
    for j in 1 .. 20 loop
       prod := prod + a (j) * b (j);   -- assume 4-byte integer
    end loop;

Quadruples:

    prod := 0;                 -- basic block leader
    j := 1;
    start: T1 := 4 * j;        -- basic block leader
    T2 := a (T1);
    T3 := 4 * j;               -- redundant
    T4 := b (T3);
    T5 := T2 * T4;
    T6 := prod + T5;
    prod := T6;
    T7 := j + 1;
    j := T7;
    if j <= 20 goto start;

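Dag for body of loop

[Figure: dag for the loop body starting at start:. Interior nodes and their labels: + (T6, prod), * (T5), [ ] (T2), [ ] (T4), * (T1, T3), + (T7, j), <=. Leaves include a, b, 4, j0 and prod0. The common subexpression 4 * j is identified as the single node labeled T1, T3.]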

From dag to improved block

- Any topological sort of the dag is a legal evaluation order
- A node without a label is a dead value
- Prefer the label of a live variable over a temporary

    start: T1 := 4 * j;
    T2 := a [T1];
    T4 := b [T1];
    T5 := T2 * T4;
    prod := prod + T5;
    j := j + 1;
    if j <= 20 goto start;

Fewer quadruples, fewer temporaries.

Programmers don't produce common subexpressions, code generators do!

    A, B : array (lo1 .. hi1, lo2 .. hi2);   -- component size w bytes

A (j, k) is at location:

    base_a + ((j - lo1) * (hi2 - lo2 + 1) + k - lo2) * w

The following requires 19 quadruples:

    for k in lo .. hi loop
       A (j, k) := 1 + B (j, k);
    end loop;

- Can reduce to 11 with a dag
- base_a + (j - lo1) * (hi2 - lo2 + 1) * w is loop invariant (loop optimization)
- w is often a power of two (peephole optimization)
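To make the address arithmetic concrete, here is a small sketch in Python. The function names and the sample bounds are made up for illustration. The first function evaluates the full formula for every element; the second hoists the loop-invariant row base out of the loop and replaces the multiplication by w with a shift when w is a power of two.

```python
def address(base, j, k, lo1, lo2, hi2, w):
    """Row-major address of A (j, k), computed with the full formula."""
    return base + ((j - lo1) * (hi2 - lo2 + 1) + k - lo2) * w


def row_addresses(base, j, lo1, lo2, hi2, w):
    """Addresses of A (j, lo2 .. hi2) with the invariant part hoisted."""
    # Loop invariant: computed once, outside the loop over k.
    row_base = base + (j - lo1) * (hi2 - lo2 + 1) * w
    # Strength reduction: if w is a power of two, multiply by shifting.
    shift = w.bit_length() - 1 if w > 0 and w & (w - 1) == 0 else None
    result = []
    for k in range(lo2, hi2 + 1):
        offset = (k - lo2) << shift if shift is not None else (k - lo2) * w
        result.append(row_base + offset)
    return result


# Hypothetical array: A (1 .. 4, 1 .. 5) of 4-byte components at address 1000.
optimized = row_addresses(1000, 3, lo1=1, lo2=1, hi2=5, w=4)
direct = [address(1000, 3, k, lo1=1, lo2=1, hi2=5, w=4) for k in range(1, 6)]
```

Both lists hold the same addresses; for example A (3, 2) is at 1000 + ((3 - 1) * 5 + 2 - 1) * 4 = 1044.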

Beyond basic blocks: data flow analysis

Basic blocks are nodes in the flow graph. Global properties of the program can be computed as iterative algorithms on the graph:
- Constant folding
- Common subexpression elimination
- Live-dead analysis
- Loop invariant computations

Requires complex data structures and algorithms.

Using global information: register coloring

- Optimal use of registers in a subprogram: keep all variables in registers throughout
- To reuse registers, need to know the lifetime of each variable (set of instructions in the program)
- Two variables cannot be assigned the same register if their lifetimes overlap

Lifetime information is translated into an interference graph:
- Each variable is a node in the graph
- There is an edge between two nodes if the lifetimes of the corresponding variables overlap

Register assignment is equivalent to graph coloring.
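Building the interference graph from lifetimes can be sketched as below, taking a lifetime to be a set of instruction indices as the slide describes. The function name and the sample lifetimes are hypothetical, and real allocators usually refine the notion of overlap (for example around definition points), which this sketch does not attempt.

```python
from itertools import combinations


def interference_graph(lifetimes):
    """Map each variable to the set of variables whose lifetimes overlap its own.

    lifetimes -- dict: variable name -> set of instruction indices
    """
    graph = {v: set() for v in lifetimes}
    for u, v in combinations(lifetimes, 2):
        if lifetimes[u] & lifetimes[v]:     # some instruction in common
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical lifetimes: a and b overlap at instruction 2, c is disjoint.
graph = interference_graph({
    "a": {0, 1, 2},
    "b": {2, 3},
    "c": {4, 5},
})
```

In this example a and b need different registers, while c can share a register with either of them.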

Graph coloring

Given a graph and a set of N colors, assign a color to each vertex so that two vertices connected by an edge have different colors.
- Problem is NP-complete
- Fast heuristic algorithm (Chaitin) is usually linear:
  - Any node with fewer than N neighbors (that is, at most N - 1) is colorable, so it can be deleted from the graph. Start with the node with the smallest number of neighbors.
  - Iterate until the graph is empty, then assign colors in inverse order of removal.
  - If at any point every remaining node has more than N - 1 neighbors, need to free a register (spill). Can then remove that node and continue.

Example

[Figure: interference graph on nodes A, B, C, D, E, F, shown before and after removal of B.]

Order of removal: B, C, A, E, F, D

Assume 3 colors are available: assign colors in reverse order, constrained by already colored nodes.

    D (no constraint)
    F (D)
    E (D)
    A (F, E)
    C (D, A)
    B (A, C)
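The simplify-then-color heuristic can be sketched in Python and run on this example. The edge list below is inferred from the constraints listed above (each node is constrained by its already colored neighbors), so it is an assumption about the original figure. The sketch raises an error where a real allocator would spill, and it breaks ties alphabetically, so its removal order need not match the one shown.

```python
def color_graph(graph, n_colors):
    """Return (removal order, coloring) for an undirected graph.

    graph -- dict: node -> set of neighbors
    """
    remaining = {v: set(ns) for v, ns in graph.items()}
    order = []
    while remaining:
        # Pick the node with the fewest remaining neighbors (ties by name).
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        if len(remaining[v]) >= n_colors:
            # Every remaining node has N or more neighbors: a real
            # allocator would spill here; this sketch just gives up.
            raise ValueError("spill needed")
        order.append(v)
        for u in remaining.pop(v):
            remaining[u].discard(v)
    colors = {}
    for v in reversed(order):           # assign colors in inverse order
        used = {colors[u] for u in graph[v] if u in colors}
        colors[v] = min(c for c in range(n_colors) if c not in used)
    return order, colors


# Edges inferred from the constraints: D-F, D-E, A-F, A-E, C-D, C-A, B-A, B-C.
graph = {
    "A": {"B", "C", "E", "F"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"A", "D"},
    "F": {"A", "D"},
}
order, colors = color_graph(graph, 3)
```

With alphabetical tie-breaking the removal order is B, C, A, E, D, F, which differs from the order above only in the last two nodes; three colors still suffice.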

Better approach to spilling

- Compute the required number of colors in a second pass: R
- Need to place R - N variables in memory
- Spill variables with lowest usage count. Use loop structure to estimate usage.