Introduction to Code Generation Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.

Slides:



Advertisements
Similar presentations
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Advertisements

Intermediate Code Generation
Programming Languages and Paradigms
INSTRUCTION SET ARCHITECTURES
Register Allocation CS 671 March 27, CS 671 – Spring Register Allocation - Motivation Consider adding two numbers together: Advantages: Fewer.
1 Compiler Construction Intermediate Code Generation.
Programming Languages Marjan Sirjani 2 2. Language Design Issues Design to Run efficiently : early languages Easy to write correctly : new languages.
Program Representations. Representing programs Goals.
Chapter 5: Elementary Data Types Properties of types and objects –Data objects, variables and constants –Data types –Declarations –Type checking –Assignment.
Introduction to Code Optimization Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.
CS 330 Programming Languages 10 / 16 / 2008 Instructor: Michael Eckmann.
Code Shape IV Procedure Calls & Dispatch Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Intermediate Representations Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Code Shape III Booleans, Relationals, & Control flow Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled.
1 Handling nested procedures Method 1 : static (access) links –Reference to the frame of the lexically enclosing procedure –Static chains of such links.
Code Shape II Expressions & Arrays Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
The Procedure Abstraction Part III: Allocating Storage & Establishing Addressability Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all.
Instruction Selection Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Run time vs. Compile time
Instruction Selection, II Tree-pattern matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in.
Context-sensitive Analysis, II Ad-hoc syntax-directed translation, Symbol Tables, andTypes.
From Cooper & Torczon1 Implications Must recognize legal (and illegal) programs Must generate correct code Must manage storage of all variables (and code)
Improving Code Generation Honors Compilers April 16 th 2002.
Introduction to Optimization Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
Introduction to Code Generation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
1 Run time vs. Compile time The compiler must generate code to handle issues that arise at run time Representation of various data types Procedure linkage.
Instruction Selection Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Fast, Effective Code Generation in a Just-In-Time Java Compiler Rejin P. James & Roshan C. Subudhi CSE Department USC, Columbia.
Operator Precedence First the contents of all parentheses are evaluated beginning with the innermost set of parenthesis. Second all multiplications, divisions,
CSC3315 (Spring 2009)1 CSC 3315 Programming Languages Hamid Harroud School of Science and Engineering, Akhawayn University
Compiler Construction
The Procedure Abstraction, Part VI: Inheritance in OOLs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.
The Procedure Abstraction, Part V: Support for OOLs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in.
Compiler Chapter# 5 Intermediate code generation.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Cleaning up the CFG Eliminating useless nodes & edges C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper & Linda Torczon,
Introduction to Code Generation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.
The Procedure Abstraction, Part IV Addressability Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp.
Chapter 7 Object Code Generation. Chapter 7 -- Object Code Generation2  Statements in 3AC are simple enough that it is usually no great problem to map.
Chapter# 6 Code generation.  The final phase in our compiler model is the code generator.  It takes as input the intermediate representation(IR) produced.
Code Shape II Expressions & Assignment Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Boolean & Relational Values Control-flow Constructs Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in.
Instruction Selection, Part I Selection via Peephole Optimization Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.
Procedures and Functions Procedures and Functions – subprograms – are named fragments of program they can be called from numerous places  within a main.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
Intermediate Code Generation CS 671 February 14, 2008.
1 March 16, March 16, 2016March 16, 2016March 16, 2016 Azusa, CA Sheldon X. Liang Ph. D. Azusa Pacific University, Azusa, CA 91702, Tel: (800)
7-Nov Fall 2001: copyright ©T. Pearce, D. Hutchinson, L. Marshall Oct lecture23-24-hll-interrupts 1 High Level Language vs. Assembly.
©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.
Introduction to Optimization
Optimization Code Optimization ©SoftMoore Consulting.
The Procedure Abstraction Part IV: Allocating Storage & Establishing Addressability Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights.
Chapter 9 :: Subroutines and Control Abstraction
Introduction to Optimization
Intermediate Representations
Introduction to Code Generation
Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.
Code Shape III Booleans, Relationals, & Control flow
Instruction Selection, II Tree-pattern matching
Intermediate Representations
Code Shape IV Procedure Calls & Dispatch
Code Shape II Expressions & Assignment
Lecture 15: Code Generation - Instruction Selection
Introduction to Optimization
Procedure Linkages Standard procedure linkage Procedure has
Presentation transcript:

Introduction to Code Generation Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. COMP 412 FALL 2010

Comp 412, Fall Structure of a Compiler A compiler is a lot of fast stuff followed by some hard problems —The hard stuff is mostly in code generation and optimization —For multicores, we need to manage parallelism & sharing —For unicore performance, allocation & scheduling are critical Instruction Selection Register Allocation Instruction Scheduling ScannerParser Analysis & Optimization O(n log n) to Exponential O(n) NP-Complete Either fast or NP-Complete words IR asm ∞ regs ∞ regs k regs

Comp 412, Fall Structure of a Compiler For the rest of 412, we assume the following model Selection is fairly simple (problem of the 1980s) Allocation & scheduling are complex Operation placement is not yet critical (unified register set) What about the IR ? Low-level, R ISC -like IR such as I LOC Has “enough” registers I LOC was designed for this stuff Instruction Selection Instruction Scheduling Register Allocation Analysis & Optimization IR asm  regs  regs  regs  regs k regs asm IR Branches, compares, & labels Memory tags Hierarchy of loads & stores Provision for multiple ops/cycle Parallelism either comes from the application, or is derived with techniques that are beyond the level of this course.

Comp 412, Fall Definitions Instruction selection Mapping IR into assembly code Assumes a fixed storage mapping & code shape Combining operations, using address modes Instruction scheduling Reordering operations to hide latencies Assumes a fixed program (set of operations) Changes demand for registers Register allocation Deciding which values will reside in registers Changes the storage mapping, may add false sharing Concerns about placement of data & memory operations These 3 problems are tightly coupled.

Comp 412, Fall Code Shape (Chapter 7) Definition All those nebulous properties of the code that effect performance Includes code, approach for different constructs, cost, storage requirements & mapping, & choice of operations Code shape is the end product of many decisions (big & small) Impact Code shape influences algorithm choice & results Code shape can encode important facts, or hide them Rule of thumb: expose as much derived information as possible Example: explicit branch targets in I LOC simplify analysis Example: hierarchy of memory operations in I LOC ( EaC, p 237 ) See Morgan’s book for more I LOC examples

Comp 412, Fall Code Shape My favorite example What if x is 2 and z is 3? What if y+z is evaluated earlier? The “best” shape for x+y+z depends on contextual knowledge —There may be several conflicting options x + y + z x + y  t1 t1+ z  t2 x + z  t1 t1+ y  t2 y + z  t1 t1+ z  t2  zyx   z yx   y zx   x zy Addition is commutative & associative for integers

Comp 412, Fall Code Shape Another example -- the case statement Implement it as cascaded if-then-else statements —Cost depends on where your case actually occurs —O(number of cases) Implement it as a binary search —Need a dense set of conditions to search —Uniform (log n) cost Implement it as a jump table —Lookup address in a table & jump to it —Uniform (constant) cost Compiler must choose best implementation strategy No amount of massaging or transforming will convert one into another Performance depends on order of cases!

Comp 412, Fall Code Shape Why worry about code shape? Can’t we just trust the optimizer and the back end? Optimizer and back end approximate the answers to many hard problems The compiler’s individual passes must run quickly It often pays to encode useful information into the IR —Shape of an expression or a control structure —A value kept in a register rather than in memory Deriving such information may be expensive, when possible Recording it explicitly in the IR is often easier and cheaper

Comp 412, Fall Generating Code for Expressions The Concept Assume an AST as input & ILOC as output Use a postorder treewalk evaluator (visitor pattern in OOD) > Visits & evaluates children > Emits code for the op itself > Returns register with result Bury complexity of addressing names in routines that it calls > base(), offset(), & val() Works for simple expressions Easily extended to other operators Does not handle control flow The Concept Assume an AST as input & ILOC as output Use a postorder treewalk evaluator (visitor pattern in OOD) > Visits & evaluates children > Emits code for the op itself > Returns register with result Bury complexity of addressing names in routines that it calls > base(), offset(), & val() Works for simple expressions Easily extended to other operators Does not handle control flow expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; } expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; }

Comp 412, Fall Generating Code for Expressions Example: Produces:  xy expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; } expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; }

Comp 412, Fall Generating Code for Expressions Example: Produces:   x y 2 expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; } expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; }

Comp 412, Fall Extending the Simple Treewalk Algorithm More complex cases for IDENTIFIER What about values that reside in registers? —Modify the IDENTIFIER case —Already in a register  return the register name —Not in a register  load it as before, but record the fact —Choose names to avoid creating false dependences What about parameter values? —Many linkages pass the first several values in registers —Call-by-value  just a local variable with a negative offset —Call-by-reference  negative offset, extra indirection What about function calls in expressions? —Generate the calling sequence & load the return value —Severely limits compiler’s ability to reorder operations

Comp 412, Fall Extending the Simple Treewalk Algorithm Adding other operators Evaluate the operands, then perform the operation Complex operations may turn into library calls Handle assignment as an operator Mixed-type expressions Insert conversions as needed from conversion table Most languages have symmetric & rational conversion tables —Original PL/I had asymmetric tables for BCD & binary integers Typical Table for Addition

Comp 412, Fall Extending the Simple Treewalk Algorithm What about evaluation order? Can use commutativity & associativity to improve code This problem is truly hard Commuting operands at a single operation is much easier 1 st operand must be preserved while 2 nd is evaluated Takes an extra register for 2 nd operand Should evaluate more demanding operand expression first (Ershov in the 1950’s, Sethi in the 1970’s) Taken to its logical conclusion, this creates Sethi-Ullman scheme for register allocation [ See Bibliography in EaC ] Local rather than global

Comp 412, Fall Generating Code in the Parser Need to generate an initial IR form Chapter 4 talks about A ST s & I LOC Might generate an A ST, use it for some high-level, near- source work such as type checking and optimization, then traverse it and emit a lower-level IR similar to I LOC for further optimization and code generation The Big Picture Recursive treewalk performs its work in a bottom-up order —Actions on non-leaves occur after actions on children Can encode same basic structure into ad-hoc SDT scheme —Identifiers load themselves & stack virtual register name —Operators emit appropriate code & stack resulting VR name —Assignment requires evaluation to an lvalue or an rvalue

Comp 412, Fall Ad-hoc SDT versus a Recursive Treewalk Goal : Expr { $$ = $1; } ; Expr:Expr PLUS Term { t = NextRegister(); emit(add,$1,$3,t); $$ = t; } |Expr MINUS Term {…} |Term { $$ = $1; } ; Term:Term TIMES Factor { t = NextRegister(); emit(mult,$1,$3,t); $$ = t; }; |Term DIVIDES Factor {…} |Factor { $$ = $1; }; Factor:NUMBER { t = NextRegister(); emit(loadI,val($1),none, t ); $$ = t; } |ID { t1 = base($1); t2 = offset($1); t = NextRegister(); emit(loadAO,t1,t2,t); $$ = t; } Goal : Expr { $$ = $1; } ; Expr:Expr PLUS Term { t = NextRegister(); emit(add,$1,$3,t); $$ = t; } |Expr MINUS Term {…} |Term { $$ = $1; } ; Term:Term TIMES Factor { t = NextRegister(); emit(mult,$1,$3,t); $$ = t; }; |Term DIVIDES Factor {…} |Factor { $$ = $1; }; Factor:NUMBER { t = NextRegister(); emit(loadI,val($1),none, t ); $$ = t; } |ID { t1 = base($1); t2 = offset($1); t = NextRegister(); emit(loadAO,t1,t2,t); $$ = t; } expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; } expr(node) { int result, t1, t2; switch (type(node)) { case , , ,  : t1  expr(left child(node)); t2  expr(right child(node)); result  NextRegister(); emit (op(node), t1, t2, result); break; case IDENTIFIER: t1  base(node); t2  offset(node); result  NextRegister(); emit (loadAO, t1, t2, result); break; case NUMBER: result  NextRegister(); emit (loadI, val(node), none, result); break; } return result; } Some details omitted for space; see Chapter 4.

Comp 412, Fall Handling Assignment (just another operator) lhs  rhs Strategy Evaluate rhs to a value ( an rvalue ) Evaluate lhs to a location ( an lvalue ) —lvalue is a register  move rhs —lvalue is an address  store rhs If rvalue & lvalue have different types —Evaluate rvalue to its “natural” type —Convert that value to the type of *lvalue Unambiguous scalars go into registers Ambiguous scalars or aggregates go into memory Keeping ambiguous values in memory lets the hardware sort out the addresses.

Comp 412, Fall Handling Assignment What if the compiler cannot determine the type of the rhs? Issue is a property of the language & the specific program For type-safety, compiler must insert a run-time check —Some languages & implementations ignore safety ( bad idea ) Add a tag field to the data items to hold type information —Explicitly check tags at runtime Code for assignment becomes more complex evaluate rhs if type(lhs)  rhs.tag then convert rhs to type(lhs) or signal a run-time error lhs  rhs evaluate rhs if type(lhs)  rhs.tag then convert rhs to type(lhs) or signal a run-time error lhs  rhs Choice between conversion & a runtime exception depends on details of language & type system Much more complex than static checking, plus costs occur at runtime rather than compile time Choice between conversion & a runtime exception depends on details of language & type system Much more complex than static checking, plus costs occur at runtime rather than compile time

Comp 412, Fall Handling Assignment Compile-time type-checking Goal is to eliminate the need for both tags & runtime checks Determine, at compile time, the type of each subexpression Use runtime check only if compiler cannot determine types Optimization strategy If compiler knows the type, move the check to compile-time Unless tags are needed for garbage collection, eliminate them If check is needed, try to overlap it with other computation Can design the language so all checks are static

Comp 412, Fall Handling Assignment with Reference Counts Reference counting is an incremental strategy for implicit storage deallocation ( alternative to batch collectors ) Simple idea —Associate a count with each heap allocated object —Increment count when pointer is duplicated —Decrement count when pointer is destroyed —Free when count goes to zero Advantages —Smaller collections amortized over all pointer assignments  Useful in real-time applications, user interfaces —Counts will be in cache, ILP may reduce expense Disadvantages —Freeing root node of a graph implies a lot of work & disruption  Can adopt a protocol to bound the work done at each point —Cyclic structures pose a problem

Comp 412, Fall Handling Assignment with Reference Counts Implementing reference counts Must adjust the count on each pointer assignment Extra code on every counted ( e.g., pointer ) assignment Code for assignment becomes This adds 1 +, 1 -, 2 loads, & 2 stores With extra functional units & large caches, the overhead may become either cheap or free … evaluate rhs lhs  count - - lhs  addr(rhs) rhs  count + + if (rhs  count = 0) free rhs evaluate rhs lhs  count - - lhs  addr(rhs) rhs  count + + if (rhs  count = 0) free rhs Likely hits in the L1 cache

Evaluating Expressions What is left for next class? Boolean & relational values —Need to extend the grammar —Need to represent the value, explicitly or implicitly Loading values that are more complex than scalars —Array elements, structure elements, characters in a string Comp 412, Fall

Extra Slides Start Here Comp 412, Fall

Comp 412, Fall The Big Picture How hard are these problems? Instruction selection Can make locally optimal choices, with automated tool Global optimality is (undoubtedly) NP-Complete Instruction scheduling Single basic block  heuristics work quickly General problem, with control flow  NP-Complete Register allocation Single basic block, no spilling, & 1 register size  linear time Whole procedure is NP-Complete

Comp 412, Fall Conventional wisdom says that we lose little by solving these problems independently Instruction selection Use some form of pattern matching Assume enough registers or target “important” values Instruction scheduling Within a block, list scheduling is “close” to optimal Across blocks, build framework to apply list scheduling Register allocation Start from virtual registers & map “enough” into k The Big Picture This slide is full of “fuzzy” terms Optimal for > 85% of blocks

Comp 412, Fall The Big Picture What are today’s hard issues issues? Instruction selection Making actual use of the tools Impact of choices on power and on functional unit placement Instruction scheduling Modulo scheduling loops with control flow Schemes for scheduling long latency memory operations Finding enough ILP to keep functional units (& cores) busy Register allocation Cost of allocation, particularly for JITs Better spilling ( space & speed )? Meaning of optimality in SSA-based allocation