Instruction Selection. Mooly Sagiv, Schreiber 317, 03-640-7606, Wed 10:00-12:00, http://www.math.tau.ac.il/~msagiv/courses/wcc01.html


Basic Compiler Phases

Source program (string)
  → lexical analysis → Tokens
  → syntax analysis → Abstract syntax tree
  → semantic analysis / Translate (using Frame information) → Intermediate representation
  → Instruction selection → Assembly
  → Register allocation → Final assembly

Instruction Selection
Input:
- Canonical IR
- A description of the translation rules from IR into machine language
Output:
- Machine code, with an unbounded number of registers
- Some prologue and epilogue instructions are still missing

LABEL(l3)
CJUMP(EQ, TEMP t128, CONST 0, l0, l1)
LABEL(l1)
MOVE(TEMP t131, TEMP t128)
MOVE(TEMP t130, CALL(nfactor, BINOP(MINUS, TEMP t128, CONST 1)))
MOVE(TEMP t129, BINOP(TIMES, TEMP t131, TEMP t130))
LABEL(l2)
MOVE(TEMP t103, TEMP t129)
JUMP(NAME lend)
LABEL(l0)
MOVE(TEMP t129, CONST 1)
JUMP(NAME l2)
(Missing: updates for the static link)

l3: beq t128, $0, l0
l1: or t131, $0, t128
    addi t132, t128, -1
    or $4, $0, t132
    jal nfactor
    or t130, $0, $2
    or t133, $0, t131
    mult t133, t130
    mflo t133
    or t129, $0, t133
l2: or t103, $0, t129
    b lend
l0: addi t129, $0, 1
    b l2

(The IR fragment and its MIPS translation above are shown side by side on the slide; the function prologue and epilogue still have to be added around the generated code.)

The Challenge
"Clumps" of IR tree nodes can be translated into a single machine instruction. For example, the clump
MOVE(TEMP t1, MEM(BINOP(PLUS, TEMP t2, CONST c)))
is implemented by the single instruction
lw t1, c(t2)

Outline
- The "tiling" problem
- An optimal solution
- An optimum solution (via dynamic programming)
- Tree grammars
- The Pentium architecture
- Instruction selection for Tiger
  - An abstract data type for machine instructions

Instruction Set of the Jouette Machine
ADD    ri ← rj + rk
MUL    ri ← rj * rk
SUB    ri ← rj - rk
DIV    ri ← rj / rk
ADDI   ri ← rj + c
SUBI   ri ← rj - c
LOAD   ri ← M[rj + c]
STORE  M[ri + c] ← rj
MOVEM  M[ri] ← M[rj]
Register r0 always contains 0.

Tree Patterns for the Jouette Machine
Name  Effect          Tree patterns
TEMP  rj              (a temporary is a register)
ADD   ri ← rj + rk    +(e1, e2)
MUL   ri ← rj * rk    *(e1, e2)
SUB   ri ← rj - rk    -(e1, e2)
DIV   ri ← rj / rk    /(e1, e2)

Tree Patterns for the Jouette Machine (cont.)
Name   Effect            Tree patterns
ADDI   ri ← rj + c       +(e, CONST c); +(CONST c, e); CONST c
SUBI   ri ← rj - c       -(e, CONST c)
LOAD   ri ← M[rj + c]    MEM(+(e, CONST c)); MEM(+(CONST c, e)); MEM(CONST c); MEM(e)
STORE  M[ri + c] ← rj    MOVE(MEM(+(e1, CONST c)), e2); MOVE(MEM(+(CONST c, e1)), e2); MOVE(MEM(CONST c), e2); MOVE(MEM(e1), e2)
MOVEM  M[ri] ← M[rj]     MOVE(MEM(e1), MEM(e2))

The Tiling Problem
- Cover the IR tree with non-overlapping tiles drawn from the tree patterns
- Minimize the cost of the generated code

Example
Tiger input: a[e] := x
The corresponding IR tree:
MOVE(
  MEM(BINOP(PLUS,
        MEM(BINOP(PLUS, TEMP FP, CONST -8)),
        BINOP(TIMES, TEMP te, CONST 4))),
  MEM(BINOP(PLUS, TEMP FP, CONST -4)))

One tiling of the tree and the code it produces:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
LOAD  r2 ← M[FP + -4]
STORE M[r1 + 0] ← r2

(The same tiling, redrawn with the tile boundaries marked on the tree.)

An alternative tiling, using MOVEM:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADDI  r2 ← FP + -4
MOVEM M[r1] ← M[r2]

A variant of the MOVEM tiling, as drawn on the slide:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADD   r2 ← FP + r2
MOVEM M[r1] ← M[r2]

The Tiling Problem
- Cover the tree with non-overlapping tiles from the tree patterns
- Minimize the cost of the generated code
- Tree patterns for all the "tiny" tiles assure that every tree can be covered

A tiling that uses mostly tiny tiles:
ADDI  r1 ← r0 + -8
ADD   r1 ← FP + r1
LOAD  r1 ← M[r1 + 0]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADDI  r2 ← FP + -4
LOAD  r2 ← M[r2 + 0]
STORE M[r1 + 0] ← r2

A tiling using only tiny tiles:
ADDI  r1 ← r0 + -8
ADD   r1 ← FP + r1
LOAD  r1 ← M[r1 + 0]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADDI  r2 ← r0 + -4
ADD   r2 ← FP + r2
LOAD  r2 ← M[r2 + 0]
STORE M[r1] ← r2

Optimal vs. Optimum Tiling
- Optimum tiling: the sum of the tile costs is minimal
- Optimal tiling: no two adjacent tiles can be combined into a single tile of lower cost


Optimum Tiling
Two six-instruction tilings of the example tree:

LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
LOAD  r2 ← M[FP + -4]
STORE M[r1 + 0] ← r2

LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADD   r2 ← FP + r2
MOVEM M[r1] ← M[r2]

RISC vs. CISC Machines
Feature               RISC                       CISC
Registers             ≥ 32                       6, 8, 16
Register classes      one                        some
Arithmetic operands   registers                  memory + registers
Instructions          3-address                  2-address
Addressing modes      r, M[r+c] (loads/stores)   several
Instruction length    32 bits                    variable
Side-effects          none                       some
Instruction cost      "uniform"                  varied

Architecture and Tiling Algorithm
RISC:
- The cost of operations is uniform
- Optimal tiling usually suffices
CISC:
- Optimum tiling may be significantly better

Optimal Tiling using "Maximal Munch"
- Top-down traversal of the IR tree
- At every node, try the relevant tree patterns in "cost order" (largest tile first)
- Generate the assembly code in reverse order
- The tiny tiles guarantee that the traversal can never get stuck

static void munchStm(T_stm s) {
  switch (s->kind) {
  case T_MOVE: {
    T_exp dst = s->u.MOVE.dst, src = s->u.MOVE.src;
    if (dst->kind == T_MEM) {
      if (dst->u.MEM->kind == T_BINOP &&
          dst->u.MEM->u.BINOP.op == T_PLUS &&
          dst->u.MEM->u.BINOP.right->kind == T_CONST) {
        T_exp e1 = dst->u.MEM->u.BINOP.left, e2 = src;
        /* MOVE(MEM(BINOP(PLUS, e1, CONST c)), e2) */
        munchExp(e1); munchExp(e2); emit("STORE");
      }
      else if (dst->u.MEM->kind == T_BINOP &&
               dst->u.MEM->u.BINOP.op == T_PLUS &&
               dst->u.MEM->u.BINOP.left->kind == T_CONST) {
        T_exp e1 = dst->u.MEM->u.BINOP.right, e2 = src;
        /* MOVE(MEM(BINOP(PLUS, CONST c, e1)), e2) */
        munchExp(e1); munchExp(e2); emit("STORE");
      }

static void munchStm(T_stm s) {
  switch (s->kind) {
  case T_MOVE: {
    T_exp dst = s->u.MOVE.dst, src = s->u.MOVE.src;
    if (dst->kind == T_MEM) {
      if (...) { /* MOVE(MEM(BINOP(PLUS, e1, CONST c)), e2) */
        munchExp(e1); munchExp(e2); emit("STORE");
      }
      else if (...) { /* MOVE(MEM(BINOP(PLUS, CONST c, e1)), e2) */
        munchExp(e1); munchExp(e2); emit("STORE");
      }
      else if (src->kind == T_MEM) {
        T_exp e1 = dst->u.MEM, e2 = src->u.MEM;
        /* MOVE(MEM(e1), MEM(e2)) */
        munchExp(e1); munchExp(e2); emit("MOVEM");
      }
      else {
        T_exp e1 = dst->u.MEM, e2 = src;
        /* MOVE(MEM(e1), e2) */
        munchExp(e1); munchExp(e2); emit("STORE");
      }
    }

case T_MOVE: {
  T_exp dst = s->u.MOVE.dst, src = s->u.MOVE.src;
  if (dst->kind == T_MEM) {
    if (...) { /* MOVE(MEM(BINOP(PLUS, e1, CONST c)), e2) */
      munchExp(e1); munchExp(e2); emit("STORE");
    }
    else if (...) { /* MOVE(MEM(BINOP(PLUS, CONST c, e1)), e2) */
      munchExp(e1); munchExp(e2); emit("STORE");
    }
    else if (...) { /* MOVE(MEM(e1), MEM(e2)) */
      munchExp(e1); munchExp(e2); emit("MOVEM");
    }
    else { /* MOVE(MEM(e1), e2) */
      munchExp(e1); munchExp(e2); emit("STORE");
    }
  }
  else if (dst->kind == T_TEMP) {
    T_exp e = src;
    /* MOVE(TEMP t, e) */
    munchExp(e); emit("ADD");
  }
  else assert(0);
}

static void munchStm(T_stm s), written as pattern → action:
MOVE(MEM(BINOP(PLUS, e1, CONST c)), e2) → munchExp(e1); munchExp(e2); emit("STORE");
MOVE(MEM(BINOP(PLUS, CONST c, e1)), e2) → munchExp(e1); munchExp(e2); emit("STORE");
MOVE(MEM(e1), MEM(e2)) → munchExp(e1); munchExp(e2); emit("MOVEM");
MOVE(MEM(e1), e2) → munchExp(e1); munchExp(e2); emit("STORE");
MOVE(TEMP t, e) → munchExp(e); emit("ADD");
JUMP(e) → …
CJUMP(e) → …
LABEL(l) → …

static void munchExp(T_exp e), written as pattern → action:
MEM(BINOP(PLUS, e, CONST c)) → munchExp(e); emit("LOAD");
MEM(BINOP(PLUS, CONST c, e)) → munchExp(e); emit("LOAD");
MEM(CONST c) → emit("LOAD");
MEM(e) → munchExp(e); emit("LOAD");
BINOP(PLUS, e, CONST c) → munchExp(e); emit("ADDI");
BINOP(PLUS, CONST c, e) → munchExp(e); emit("ADDI");
CONST c → emit("ADDI");
BINOP(PLUS, e1, e2) → munchExp(e1); munchExp(e2); emit("ADD");
BINOP(TIMES, e1, e2) → munchExp(e1); munchExp(e2); emit("MUL");
…
TEMP t → (nothing to emit)

Example: maximal munch on the tree for a[e] := x (the tree shown earlier).

Starting at the MOVE root and working top-down, the matching patterns (MOVE(MEM(BINOP(PLUS, e1, CONST c)), e2), MOVE(MEM(e1), MEM(e2)), MEM(BINOP(PLUS, e, CONST c)), BINOP(PLUS, e1, e2), ...) are tried at each node, and the slides animate the following tile choices:
- MOVEM for the root, matching MOVE(MEM(e1), MEM(e2))
- ADD for the PLUS under the left MEM, matching BINOP(PLUS, e1, e2)
- LOAD for MEM(BINOP(PLUS, TEMP FP, CONST -8))
- MUL for BINOP(TIMES, TEMP te, CONST 4), and ADDI for its CONST 4 operand
- ADDI for BINOP(PLUS, TEMP FP, CONST -4)
The assembly code is then emitted in reverse order.

The resulting tiling and code:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADDI  r2 ← FP + -4
MOVEM M[r1] ← M[r2]

The variant drawn on the final slide:
LOAD  r1 ← M[FP + -8]
ADDI  r2 ← r0 + 4
MUL   r2 ← te * r2
ADD   r1 ← r1 + r2
ADD   r2 ← FP + r2
MOVEM M[r1] ← M[r2]

Optimum Tiling
- Maximal munch does not necessarily produce optimum results, particularly on CISC machines
- How can an optimum tiling be generated efficiently?
- The number of potential code sequences is very large

Optimum Tiling via Dynamic Programming
- Assign an optimum cost to every subtree
- Two-phase solution:
  - Find the optimum cost of every subtree in a bottom-up traversal
  - Generate the optimum code in a top-down traversal, skipping the subtrees that are covered internally by the chosen tiles

Dynamic Programming
- For each subtree with root n:
  - For each tile t of cost c that matches at n, compute cost(t) = c + Σ ci, where the ci are the optimum costs of the subtrees at the leaves of t
  - The cost of the subtree rooted at n is the minimum over all matching tiles
- Generate the optimum code during a top-down traversal

Example
Compute the optimum tiling of the tree MEM(BINOP(PLUS, CONST 1, CONST 2)).

At the node CONST 1:
Tile      Instruction  Tile cost  Leaves cost  Total cost
CONST c   ADDI         1          0            1

At the node CONST 2:
Tile      Instruction  Tile cost  Leaves cost  Total cost
CONST c   ADDI         1          0            1

At the node BINOP(PLUS, CONST 1, CONST 2):
Tile                      Instruction  Tile cost  Leaves cost  Total cost
BINOP(PLUS, e1, e2)       ADD          1          1 + 1        3
BINOP(PLUS, e, CONST c)   ADDI         1          1            2
BINOP(PLUS, CONST c, e)   ADDI         1          1            2
The optimum cost of this subtree is 2.

Now compute the costs at the root, MEM(BINOP(PLUS, CONST 1, CONST 2)):

Tile                          Instruction  Tile cost  Leaves cost  Total cost
MEM(e)                        LOAD         1          2            3
MEM(BINOP(PLUS, e, CONST c))  LOAD         1          1            2
MEM(BINOP(PLUS, CONST c, e))  LOAD         1          1            2
The optimum cost of the whole tree is 2.

Top-Down Code Generation
Minimum costs at each node: CONST 1 → ADDI (1), PLUS → ADDI (2), MEM → LOAD (2). The top-down pass selects the LOAD tile MEM(BINOP(PLUS, e, CONST 2)) at the root and the ADDI tile at CONST 1, generating:
ADDI r1 ← r0 + 1
LOAD r1 ← M[r1 + 2]

The "Schizo"-Jouette Machine
In the spirit of the Motorola machines:
- Two types of registers: data registers and address registers
- Arithmetic is performed on data registers
- Load and store use address registers
- Machine instructions convert between addresses and data

Tree Patterns for Schizo-Jouette
Name  Effect           Tree patterns
TEMP  dj / aj          d: TEMP;  a: TEMP
ADD   di ← dj + dk     d: +(e1, e2)
MUL   di ← dj * dk     d: *(e1, e2)
SUB   di ← dj - dk     d: -(e1, e2)
DIV   di ← dj / dk     d: /(e1, e2)

Tree Patterns for the Schizo-Jouette Machine (cont.)
Name   Effect            Tree patterns
ADDI   di ← dj + c       d: +(e, CONST c); +(CONST c, e); CONST c
SUBI   di ← dj - c       d: -(e, CONST c)
LOAD   di ← M[aj + c]    d: MEM(+(a, CONST c)); MEM(+(CONST c, a)); MEM(CONST c); MEM(a)
STORE  M[ai + c] ← dj    MOVE(MEM(+(a1, CONST c)), d2); MOVE(MEM(+(CONST c, a1)), d2); MOVE(MEM(CONST c), d2); MOVE(MEM(a1), d2)
MOVEM  M[ai] ← M[aj]     MOVE(MEM(a1), MEM(a2))

Tree Patterns for Schizo-Jouette (cont.)
Name   Effect      Tree pattern
MOVEA  di ← aj     d: a
MOVED  ai ← dj     a: d

Tree Grammars
A generalization of dynamic programming.
Input:
- A (usually ambiguous) context-free grammar describing the machine's tree patterns; non-terminals correspond to machine types, and every production has a machine cost
- A linearized IR tree
Output:
- A parse tree of minimum cost

Partial Grammar for Schizo-Jouette
d → TEMP t
a → TEMP t
d → +(d, d)
d → +(d, CONST)
d → +(CONST, d)
d → MEM(+(a, CONST))
d → MEM(+(CONST, a))
d → MEM(CONST)
d → MEM(a)
d → a
a → d

Example input: MEM(+(CONST 1, CONST 2))

Simple Instruction Selection for the Pentium Architecture
- Six general-purpose registers
- Multiply requires that the left argument is in eax
- Two-address instructions
- Arithmetic on memory operands
- Several addressing modes
- Variable-length instructions
- Instructions with side-effects
- Requires good register allocation

For t1 ← t2 * t3:
  mov eax, t2
  mul t3
  mov t1, eax
For t1 ← t2 + t3:
  mov t1, t2
  add t1, t3
Arithmetic on memory, add [ebp-8], ecx, stands for:
  mov eax, [ebp-8]
  add eax, ecx
  mov [ebp-8], eax

Instruction Selection in the Tiger Compiler
- Use maximal munch
- Store the generated code in an abstract data type:
  - The following phases are machine-independent
  - The control flow of the program is explicitly represented
  - A special representation of MOVE instructions, which register allocation can later remove

/* assem.h */
typedef struct {Temp_labelList labels;} AS_targets;
AS_targets AS_Targets(Temp_labelList labels);

typedef struct AS_instr_ *AS_instr;
typedef enum {I_OPER, I_LABEL, I_MOVE} AS_instr_kind;

struct AS_instr_ {
  AS_instr_kind kind;
  union {
    struct {string assem; Temp_tempList dst, src; AS_targets jumps;} OPER;
    struct {string assem; Temp_label label;} LABEL;
    struct {string assem; Temp_tempList dst, src;} MOVE;
  } u;
};

AS_instr AS_Oper(string a, Temp_tempList d, Temp_tempList s, AS_targets j);
AS_instr AS_Label(string a, Temp_label label);
AS_instr AS_Move(string a, Temp_tempList d, Temp_tempList s);

Summary
Types of machines:
- CISC (Pentium, MC68000, IBM 370)
- RISC (MIPS, Sparc)
- Other: VLIW (Itanium)
Types of instruction-selection algorithms:
- Ad hoc
- Optimal, using maximal munch
- Optimum, using dynamic programming
- Optimum, using tree grammars and ambiguous parsers