CSE P 501 – Compilers: Instruction Selection. Hal Perkins, Autumn 2009. © 2002-09 Hal Perkins & UW CSE.



Agenda
- Compiler back-end organization
- Instruction selection: tree pattern matching

Credits: slides by Keith Cooper (Rice); Appel ch. 9; burg/iburg slides by Preston Briggs, CSE 501 Sp09. Burg/iburg paper: "Engineering a Simple, Efficient Code Generator", Fraser, Hanson & Proebsting, ACM LOPLAS v1, n3 (Sept. 1992).

Compiler Organization
- Front end: scan, parse, semantics
- Middle: opt1, opt2, ..., optn
- Back end: instruction selection, instruction scheduling, register allocation
- Infrastructure: symbol tables, trees, graphs, etc.

Big Picture
A compiler consists of lots of fast stuff followed by hard problems:
- Scanner: O(n)
- Parser: O(n)
- Analysis & optimization: ~O(n log n)
- Instruction selection: fast or NP-complete
- Instruction scheduling: NP-complete
- Register allocation: NP-complete

IR for Code Generation
- Assume a low-level, RISC-like IR: 3-address, register-register instructions plus load/store
    r1 <- r2 op r3
- Could be tree-structured or linear
- Expose as much detail as possible
- Assume "enough" (i.e., ∞) registers
  - Invent new temporaries for intermediate results
  - Map to actual registers later

Overview: Instruction Selection
- Map IR into assembly code
- Assume known storage layout and code shape (i.e., the optimization phases have already done their thing)
- Combine low-level IR operations into machine instructions (take advantage of addressing modes, etc.)

Overview: Instruction Scheduling
- Reorder operations to hide latencies (processor function units; memory/cache)
- Originally invented for supercomputers (1960s); now important everywhere
  - Even on non-RISC machines, e.g., x86
  - Even if the processor reorders on the fly
- Assume a fixed program

Overview: Register Allocation
- Map values to actual registers
- Previous phases change the need for registers
- Add code to spill values to temporaries as needed, etc.
- Usually worth doing another pass of instruction scheduling afterwards if spill code was inserted

How Hard?
- Instruction selection: can make locally optimal choices; globally optimal selection is undoubtedly NP-complete
- Instruction scheduling: quick heuristics for a single basic block; the general problem is NP-complete
- Register allocation: linear for a single basic block with no spilling and interchangeable registers; the general problem is NP-complete

Conventional Wisdom
We typically lose little by solving these three problems independently (but not always, of course):
- Instruction selection: use some form of pattern matching; assume ∞ virtual registers, created as needed
- Instruction scheduling: within a block, list scheduling is close to optimal; across blocks, use EBBs or trace scheduling if list scheduling is not good enough
- Register allocation: start with unlimited virtual registers and map "enough" of them to K real registers

A Simple Low-Level IR (1)
The details are not important for our purposes; the point is to get a feeling for the level of detail involved. This example is from Appel.
Expressions:
- CONST(i) – integer constant i
- TEMP(t) – temporary t (i.e., register)
- BINOP(op, e1, e2) – application of op to e1, e2
- MEM(e) – contents of memory at address e; means the value when used in an expression, the address when used on the left side of an assignment
- CALL(f, args) – application of function f to argument list args

A Simple Low-Level IR (2)
Statements:
- MOVE(TEMP t, e) – evaluate e and store the result in temporary t
- MOVE(MEM(e1), e2) – evaluate e1 to yield address a; evaluate e2 and store the result at a
- EXP(e) – evaluate expression e and discard the result
- SEQ(s1, s2) – execute s1 followed by s2
- NAME(n) – assembly-language label n
- JUMP(e) – jump to e, which can be a NAME label or something more complex (e.g., a switch)
- CJUMP(op, e1, e2, t, f) – evaluate e1 op e2; if true, jump to label t, otherwise jump to f
- LABEL(n) – defines the location of label n in the code
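As a concrete sketch, the expression and statement constructors above can be encoded as a handful of Python classes. The names follow Appel's IR, but the dataclass encoding and the "PLUS" operator tag are illustrative assumptions, not the book's code.

```python
from dataclasses import dataclass

# Expression nodes (names follow Appel's low-level IR)
@dataclass
class Const:
    value: int            # CONST(i)

@dataclass
class Temp:
    name: str             # TEMP(t) -- a temporary, i.e., a virtual register

@dataclass
class Binop:
    op: str               # BINOP(op, e1, e2)
    left: object
    right: object

@dataclass
class Mem:
    addr: object          # MEM(e): value in an expression, address on the left of a MOVE

# Statement nodes
@dataclass
class Move:
    dst: object           # MOVE(TEMP t, e) or MOVE(MEM(e1), e2)
    src: object

@dataclass
class Seq:
    first: object         # SEQ(s1, s2)
    second: object

# A local variable at offset k from the frame pointer:
# MEM(BINOP(PLUS, TEMP fp, CONST k))
def local_var(k: int) -> Mem:
    return Mem(Binop("PLUS", Temp("fp"), Const(k)))

print(local_var(8))
```

With these classes, the frame-pointer-relative expression MEM(BINOP(PLUS, TEMP fp, CONST k)) is just local_var(k).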

Low-Level IR Example (1)
Expression for a local variable at a known offset k from the frame pointer fp:
Linear:
  MEM(BINOP(PLUS, TEMP fp, CONST k))
Tree:
  MEM
  └── +
      ├── TEMP fp
      └── CONST k

Low-Level IR Example (2)
For an array element e(k), where each element takes up w storage locations:
  MEM
  └── +
      ├── e
      └── *
          ├── k
          └── CONST w

Generating Low-Level IR
- Assuming the initial IR is an AST, a simple treewalk can be used to generate the low-level IR
- Can be done before, during, or after the optimizations in the middle part of the compiler
  - Typically the AST is lowered to some lower-level IR, but maybe not the final, lowest-level one used in instruction selection
- Create registers (temporaries) for values and intermediate results
  - A value can be safely allocated to a register when only one name can reference it
  - Trouble: pointers, arrays, reference parameters
- Assign a virtual register to anything that can go into one; generate loads/stores for the other values
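The treewalk described above can be sketched as follows. The tuple-encoded AST, the instruction text format, and the new_temp naming scheme are all illustrative assumptions, not the lecture's actual code.

```python
import itertools

_counter = itertools.count()

def new_temp() -> str:
    """Invent a fresh virtual register for an intermediate result."""
    return f"t{next(_counter)}"

def lower(ast, code):
    """Lower a tuple-encoded AST -- ('+', l, r), ('var', x), ('const', c) --
    to r1 <- r2 op r3 style instructions; returns the result register."""
    kind = ast[0]
    if kind == "const":
        t = new_temp()
        code.append(f"{t} <- {ast[1]}")
        return t
    if kind == "var":
        t = new_temp()
        code.append(f"{t} <- load {ast[1]}")
        return t
    # binary operator: lower the operands, then combine
    l = lower(ast[1], code)
    r = lower(ast[2], code)
    t = new_temp()
    code.append(f"{t} <- {l} {kind} {r}")
    return t

# a + b * 4
code = []
result = lower(("+", ("var", "a"), ("*", ("var", "b"), ("const", 4))), code)
for line in code:
    print(line)
# t0 <- load a
# t1 <- load b
# t2 <- 4
# t3 <- t1 * t2
# t4 <- t0 + t3
```

Every intermediate result lands in a fresh virtual register; mapping those to real registers is deferred to register allocation.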

Instruction Selection Issues
- Given the low-level IR, there are many possible code sequences that implement it correctly, e.g., to set eax to 0 on x86:
    mov eax, 0
    xor eax, eax
    sub eax, eax
    imul eax, 0
- Many machine instructions do several things at once, e.g., register arithmetic and effective-address calculation:
    lea rdst, [rbase + rindex*scale + offset]

Instruction Selection Criteria
Several possibilities:
- Fastest
- Smallest
- Minimize power consumption (e.g., don't use a function unit if leaving it powered down is a win)
Sometimes the choice is not obvious: e.g., if one of the function units in the processor is idle and we can select an instruction that uses that unit, it effectively executes for free, even if that instruction wouldn't normally be chosen. (Some interaction with scheduling here…)
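A toy illustration of cost-driven selection among the equivalent x86 sequences from the previous slide, under the "smallest" criterion. The byte costs are rough illustrative numbers, not authoritative encodings.

```python
# Equivalent ways to set eax to 0, with illustrative size costs (bytes).
# The exact encodings vary; these numbers are assumptions for the sketch.
candidates = {
    "mov eax, 0":   5,   # needs a 32-bit immediate
    "xor eax, eax": 2,
    "sub eax, eax": 2,
}
best = min(candidates, key=candidates.get)  # pick by the "smallest" criterion
print(best)
```

Swapping in a latency or power table instead of sizes changes which sequence wins, which is exactly why the criterion has to be chosen up front.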

Implementation
Problem: we need some representation of the target machine's instruction set that facilitates code generation.
Idea: describe machine instructions using the same low-level IR used for the program, then use pattern-matching techniques to pick machine instructions that match fragments of the program's IR tree.
- Want this to run quickly
- Would like to automate as much as possible

Matching: How?
Tree IR – pattern match on trees:
- Tree patterns as input
- Each pattern maps to a target machine instruction (or sequence), with at least one simple target match for each kind of tree node so we can always generate something
- Various algorithms: maximal munch, dynamic programming, …
Linear IR – some sort of string matching:
- Strings as input; each string maps to a target machine instruction sequence
- Use text matching or peephole matching (parsing!, but with a way, way ambiguous grammar for the target machine)
Both work well in practice; the algorithms are quite different.

An Example Target Machine (1)
Arithmetic instructions – result goes in an (unnamed) register ri:
- a TEMP node matches a register by itself
- ADD ri <- rj + rk (tile rooted at +)
- MUL ri <- rj * rk (tile rooted at *)
- SUB and DIV are similar

An Example Target Machine (2)
Immediate instructions – one operand is a constant:
- ADDI ri <- rj + c (tiles rooted at + with a CONST operand, or a bare CONST)
- SUBI ri <- rj - c (tile rooted at - with a CONST operand)

An Example Target Machine (3)
Load:
- LOAD ri <- M[rj + c] (tiles rooted at MEM, with an optional CONST offset under a +)

An Example Target Machine (4)
Store – not an expression; no result:
- STORE M[rj + c] <- ri (tiles rooted at MOVE(MEM(...), ...), with an optional CONST offset under a +)

An Example Target Machine (5)
Memory-to-memory copy – also not an expression:
- MOVEM M[rj] <- M[ri] (tile: MOVE(MEM, MEM))

Tree Pattern Matching (1)
Goal: tile the low-level tree with operation (instruction) trees.
A tiling is a collection of pairs <node, op>, where:
- node is a node in the tree
- op is an operation tree
- <node, op> means that op could implement the subtree at node

Tree Pattern Matching (2)
A tiling "implements" a tree if it covers every node in the tree and the overlap between any two tiles (trees) is limited to a single node.
- If <node, op> is in the tiling, then node is also covered by a leaf in another operation tree in the tiling – unless it is the root
- Where two operation trees meet, they must be compatible (i.e., expect the same value in the same location)

Example – Tree for a[i] := x
Linear: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))
  MOVE
  ├── MEM
  │   └── +
  │       ├── MEM
  │       │   └── +
  │       │       ├── FP
  │       │       └── CONST a
  │       └── *
  │           ├── TEMP i
  │           └── CONST 4
  └── MEM
      └── +
          ├── FP
          └── CONST x

Generating Code
Given a tiled tree, to generate code:
- Do a postorder treewalk, with a node-dependent order for the children
- Emit the code sequences corresponding to the tiles in order
- Connect the tiles by using the same register name to tie the boundaries together

Tiling Algorithms (1): Maximal Munch
- Start at the root of the tree and find the largest tile that fits; cover the root node and possibly other nearby nodes; then repeat for each subtree
- Generate an instruction as each tile is placed
- Generates instructions in reverse order
- Generates an optimal tiling – no two adjacent tiles can be combined into a single tile of lower cost
  - But not necessarily an optimum tiling – there may be another tiling whose tiles sum to a lower total cost
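A minimal maximal-munch sketch over the example target machine from the earlier slides. The tuple tree encoding, the emitted text format, and the r0-holds-zero convention for constants are illustrative assumptions; for simplicity this version emits in evaluation order rather than the reverse order mentioned above.

```python
import itertools

_t = itertools.count()

def munch(tree, code):
    """Return the register holding tree's value; emit instructions into code.
    Tiles are tried largest-first (maximal munch)."""
    op = tree[0]
    # Largest tile first: LOAD ri <- M[rj + c] covers MEM(+(e, CONST c))
    if op == "MEM" and tree[1][0] == "+" and tree[1][2][0] == "CONST":
        r = munch(tree[1][1], code)
        ri = f"r{next(_t)}"
        code.append(f"LOAD  {ri} <- M[{r} + {tree[1][2][1]}]")
        return ri
    if op == "MEM":                          # plain load, offset 0
        r = munch(tree[1], code)
        ri = f"r{next(_t)}"
        code.append(f"LOAD  {ri} <- M[{r} + 0]")
        return ri
    if op == "+" and tree[2][0] == "CONST":  # ADDI ri <- rj + c
        r = munch(tree[1], code)
        ri = f"r{next(_t)}"
        code.append(f"ADDI  {ri} <- {r} + {tree[2][1]}")
        return ri
    if op == "+":                            # ADD ri <- rj + rk
        rj, rk = munch(tree[1], code), munch(tree[2], code)
        ri = f"r{next(_t)}"
        code.append(f"ADD   {ri} <- {rj} + {rk}")
        return ri
    if op == "TEMP":                         # already in a register
        return tree[1]
    if op == "CONST":                        # materialize via an assumed zero register r0
        ri = f"r{next(_t)}"
        code.append(f"ADDI  {ri} <- r0 + {tree[1]}")
        return ri
    raise ValueError(f"no tile for {op}")

# MEM(+(TEMP fp, CONST 8)) -- a local variable at fp+8
code = []
munch(("MEM", ("+", ("TEMP", "fp"), ("CONST", 8))), code)
print(code)  # ['LOAD  r0 <- M[fp + 8]']
```

Because the big LOAD tile is tried before the smaller + and MEM tiles, the whole MEM(+(TEMP fp, CONST 8)) subtree becomes a single instruction.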

Tiling Algorithms (2): Dynamic Programming
- There may be many tiles that could match at a particular node
- Idea: walk the tree and accumulate the set of all possible tiles that could match at each point – Tiles(n)
- Then: select the minimal cost for the subtrees (bottom-up), and go top-down to select and emit the lowest-cost instructions

Tile(Node n)
  Tiles(n) <- empty
  if n has two children then
    Tile(left child of n)
    Tile(right child of n)
    for each rule r that implements n
      if left(r) is in Tiles(left(n)) and right(r) is in Tiles(right(n)) then
        Tiles(n) <- Tiles(n) + r
  else if n has one child then
    Tile(child of n)
    for each rule r that implements n
      if left(r) is in Tiles(child(n)) then
        Tiles(n) <- Tiles(n) + r
  else /* n is a leaf */
    Tiles(n) <- { all rules that implement n }

Tools
- burg, iburg, and others use a combination of dynamic programming and tree pattern matching to find optimal translations
- The product of years of research, going back to peephole optimizers
- Not really a research area now (like parsing), but still room for newer/better tools

Peephole Optimization
Idea: find pairs of instructions (not necessarily adjacent) and replace them with something better. Instead of
  t2 = &c
  t3 = *t2
use
  t3 = c

Which Pairs?
- Chris Fraser noticed that useful pairs were connected by use-def (UD) chains, so:
- Plan: consider each instruction together with the instructions that feed it
- UD chains reach across blocks, so the instruction selector can work globally
- Works great with SSA too

Patterns
Reduce pairs of instructions to schematics:
  t5 = 4
  t6 = t4 + t5
becomes
  %reg1 = %con
  %reg3 = %reg2 + %reg1
Look the pair up in a hash table; if found, replace it with
  %reg3 = %reg2 + %con
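The hash-table idea can be sketched as a dictionary keyed by schematic pairs. The schematics and the second (address-of/load) entry are illustrative; a real matcher would also abstract concrete operands into %reg/%con placeholders before the lookup and re-instantiate them afterwards.

```python
# Pattern table: (producer schematic, consumer schematic) -> replacement.
# %reg / %con are placeholders standing for registers and constants.
PATTERNS = {
    ("%reg1 = %con", "%reg3 = %reg2 + %reg1"): "%reg3 = %reg2 + %con",
    ("%reg1 = &%var", "%reg2 = *%reg1"): "%reg2 = %var",
}

def combine(producer, consumer):
    """Return the combined schematic, or None if the pair has no entry."""
    return PATTERNS.get((producer, consumer))

print(combine("%reg1 = %con", "%reg3 = %reg2 + %reg1"))
# %reg3 = %reg2 + %con
```

Pairs found along UD chains are fed through combine; whenever it returns a replacement, two instructions collapse into one.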

Tree-Based Representation
- The tree makes the UD chains explicit
- Each definition in the tree is used once
- Typically a basic block would be a sequence of expression trees

Example
Code for i = c + 4 [c a char; i an int]:
  t1 = &i
  t2 = &c
  t3 = *t2
  t4 = cvci(t3)
  t5 = 4
  t6 = t4 + t5
  *t1 = t6
As a tree:
  =
  ├── &i
  └── +
      ├── c2i
      │   └── ld
      │       └── &c
      └── 4

Tree Patterns
An iburg spec is a set of rules. BNF:
  rule = nonterm : tree = integer (cost)
  tree = term (tree, tree) | term (tree) | term | nonterm
- Terminals are IL operations
- Nonterminals name sets of rules

Rules
- Rules are numbered to the right of the = sign
- The cost of a rule is its own cost (often 0) plus the cost of any subtrees
- A rule may be nested to define a pattern that matches more than one level in a tree

Example
  stmt : ASSIGN(addr, reg) = 1  (1)
  stmt : reg               = 2  (0)
  reg  : ADD(reg, rc)      = 3  (1)
  reg  : ADD(rc, reg)      = 4  (1)
  reg  : LD(addr)          = 5  (1)
  reg  : C2I(LD(addr))     = 6  (1)
  reg  : addr              = 7  (1)
  reg  : con               = 8  (1)
  addr : ADD(reg, con)     = 9  (0)
  addr : ADD(con, reg)     = 10 (0)
  addr : ADDRLP            = 11 (0)   /* address of a local variable */
  rc   : con               = 12 (0)
  rc   : reg               = 13 (0)
  con  : CNST              = 14 (0)

Algorithm
Pass I: bottom-up (labeling)
- Label each node with the lowest-cost rule to produce its result from the available operands (cost = cost of the rule + cost of the subtrees)
- A label is (r, c): the best rule is r, with cost c
Pass II: top-down (reduction)
- Find the cheapest node that generates the result needed by the parent tree
- Emit instructions in reverse order as the choices are made
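The labeling pass can be sketched in Python for the running i = c + 4 example. The rule numbers and costs below are taken from the Example slide; the Node/match encoding and the closure loop that handles chain rules (stmt: reg, reg: addr, rc: reg, etc.) are my own illustrative choices.

```python
RULES = [
    # (number, nonterm, pattern, cost) -- patterns are either a nonterm
    # name (chain rule) or (op, (subpatterns...)), possibly nested.
    (1,  "stmt", ("ASSIGN", ("addr", "reg")), 1),
    (2,  "stmt", "reg", 0),
    (3,  "reg",  ("ADD", ("reg", "rc")), 1),
    (4,  "reg",  ("ADD", ("rc", "reg")), 1),
    (5,  "reg",  ("LD", ("addr",)), 1),
    (6,  "reg",  ("C2I", (("LD", ("addr",)),)), 1),
    (7,  "reg",  "addr", 1),
    (8,  "reg",  "con", 1),
    (9,  "addr", ("ADD", ("reg", "con")), 0),
    (10, "addr", ("ADD", ("con", "reg")), 0),
    (11, "addr", ("ADDRLP", ()), 0),
    (12, "rc",   "con", 0),
    (13, "rc",   "reg", 0),
    (14, "con",  ("CNST", ()), 0),
]

class Node:
    def __init__(self, op, *kids):
        self.op, self.kids = op, kids
        self.labels = {}          # nonterm -> (rule number, total cost)

def match(pat, node):
    """Cost of matching pattern `pat` at `node`, or None if it doesn't match."""
    if isinstance(pat, str):                       # nonterminal: use its label
        hit = node.labels.get(pat)
        return hit[1] if hit else None
    op, subpats = pat
    if node.op != op or len(subpats) != len(node.kids):
        return None
    total = 0
    for sp, kid in zip(subpats, node.kids):
        c = match(sp, kid)
        if c is None:
            return None
        total += c
    return total

def label(node):
    """Pass I: bottom-up labeling, iterating to a fixpoint for chain rules."""
    for k in node.kids:
        label(k)
    changed = True
    while changed:
        changed = False
        for num, nt, pat, cost in RULES:
            sub = match(pat, node)
            if sub is None:
                continue
            total = cost + sub
            best = node.labels.get(nt)
            if best is None or total < best[1]:
                node.labels[nt] = (num, total)
                changed = True
    return node

# i = c + 4  (c a char, i an int)
tree = Node("ASSIGN",
            Node("ADDRLP"),                                # &i
            Node("ADD",
                 Node("C2I", Node("LD", Node("ADDRLP"))),  # c2i(ld &c)
                 Node("CNST")))                            # 4
label(tree)
print(tree.labels["stmt"])   # (1, 3): best rule is 1, total cost 3
```

Running label reproduces the labels shown on the next slides: the root derives stmt via rule 1 at total cost 3, and the &i node derives addr at cost 0 via rule 11.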

Bottom-Up (Labeling)
The tree for i = c + 4, with nodes named a–g:
  =  (a)
  ├── &i (b)
  └── +  (c)
      ├── c2i (d)
      │   └── ld (f)
      │       └── &c (g)
      └── 4  (e)
Using the rules from the Example slide, each node is labeled with a (rule, cost) pair for every nonterminal it can produce.

Top-Down (Reduction)
The labels computed for each node, as (rule, cost) pairs:
  node   stmt    reg     addr    rc      con
  a      (1,3)   –       –       –       –
  b      (2,1)   (7,1)   (11,0)  (13,1)  –
  c      (2,2)   (3,2)   (9,1)   (13,2)  –
  d      (2,1)   (6,1)   –       (13,1)  –
  e      (2,1)   (8,1)   –       (12,0)  (14,0)
  f      (2,1)   (5,1)   –       (13,1)  –
  g      (2,1)   (7,1)   (11,0)  (13,1)  –
Reduction starts at the root, which is needed as a stmt: rule 1 (cost 3) is chosen, its addr operand comes from b via rule 11, its reg operand from c via rule 3, and so on down the tree, emitting instructions in reverse order as the choices are made.

Coming Attractions
- Instruction Scheduling
- Register Allocation
- And more…