COMPILERS Instruction Selection hussein suleman uct csc305h 2005.

Slides:



Advertisements
Similar presentations
CS412/413 Introduction to Compilers Radu Rugina Lecture 27: More Instruction Selection 03 Apr 02.
Advertisements

CSE 5317/4305 L9: Instruction Selection1 Instruction Selection Leonidas Fegaras.
Tiling Examples for X86 ISA Slides Selected from Radu Ruginas CS412/413 Lecture on Instruction Selection at Cornell.
Register Allocation COS 320 David Walker (with thanks to Andrew Myers for many of these slides)
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Register Allocation CS 320 David Walker (with thanks to Andrew Myers for most of the content of these slides)
Greedy Algorithms.
1 Code generation Our book's target machine (appendix A): opcode source1, source2, destination add r1, r2, r3 addI r1, c, r2 loadI c, r2 load r1, r2 loadAI.
COMPILERS Register Allocation hussein suleman uct csc305w 2004.
1 CS 201 Compiler Construction Machine Code Generation.
Topics covered: CPU Architecture CSE 243: Introduction to Computer Architecture and Hardware/Software Interface.
Dyn. Sched. CSE 471 Autumn 0219 Tomasulo’s algorithm “Weaknesses” in scoreboard: –Centralized control –No forwarding (more RAW than needed) Tomasulo’s.
Code Generation Steve Johnson. May 23, 2005Copyright (c) Stephen C. Johnson The Problem Given an expression tree and a machine architecture, generate.
COMPILERS Basic Blocks and Traces hussein suleman uct csc3005h 2006.
PART 4: (2/2) Central Processing Unit (CPU) Basics CHAPTER 13: REDUCED INSTRUCTION SET COMPUTERS (RISC) 1.
11/19/2002© 2002 Hal Perkins & UW CSEN-1 CSE 582 – Compilers Instruction Selection Hal Perkins Autumn 2002.
3 -1 Chapter 3 The Greedy Method 3 -2 The greedy method Suppose that a problem can be solved by a sequence of decisions. The greedy method has that each.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
Register Allocation (Slides from Andrew Myers). Main idea Want to replace temporary variables with some fixed set of registers First: need to know which.
Code Generation Professor Yihjia Tsai Tamkang University.
PSUCS322 HM 1 Languages and Compiler Design II Code Generation Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring.
Arithmetic Expression Consider the expression arithmetic expression: (a – b) + ((c + d) + (e * f)) that can be represented as the following tree.
Instruction Selection, II Tree-pattern matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in.
1 Basic Block and Trace Chapter 8. 2 Tree IR (1) Semantic gap (2) IR is not proper for optimization analysis Machine Languages Tree representation =>
Instruction Selection Mooly Sagiv Schrierber Wed 10:00-12:00 html://
Improving Code Generation Honors Compilers April 16 th 2002.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
Chapter 91 Memory Management Chapter 9   Review of process from source to executable (linking, loading, addressing)   General discussion of memory.
Computer Organization and Architecture Reduced Instruction Set Computers (RISC) Chapter 13.
© The McGraw-Hill Companies, Inc., Chapter 3 The Greedy Method.
RISC:Reduced Instruction Set Computing. Overview What is RISC architecture? How did RISC evolve? How does RISC use instruction pipelining? How does RISC.
Instruction Selection II CS 671 February 26, 2008.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
CSC 3210 Computer Organization and Programming Chapter 1 THE COMPUTER D.M. Rasanjalee Himali.
Lecture 4: MIPS Instruction Set Reminders: –Homework #1 posted: due next Wed. –Midterm #1 scheduled Friday September 26 th, 2014 Location: TODD 430 –Midterm.
Execution of an instruction
1 Code Generation. 2 Position of a Code Generator in the Compiler Model Front-End Code Optimizer Source program Symbol Table Lexical error Syntax error.
M. Mateen Yaqoob The University of Lahore Spring 2014.
Chapter# 6 Code generation.  The final phase in our compiler model is the code generator.  It takes as input the intermediate representation(IR) produced.
Compiler Design Introduction 1. 2 Course Outline Introduction to Compiling Lexical Analysis Syntax Analysis –Context Free Grammars –Top-Down Parsing –Bottom-Up.
1 Basic Block and Trace Chapter 8. 2 Tree IR (1) Semantic gap (2) IR is not proper for optimization analysis Machine Languages Eg: - Some expressions.
R-Trees: A Dynamic Index Structure For Spatial Searching Antonin Guttman.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 11: Functions and stack frames.
Internal and External Sorting External Searching
Instruction Selection Mooly Sagiv Schrierber Wed 10:00-12:00 html://
1 CSIS 7101: CSIS 7101: Spatial Data (Part 1) The R*-tree : An Efficient and Robust Access Method for Points and Rectangles Rollo Chan Chu Chung Man Mak.
COMPILERS Instruction Selection hussein suleman uct csc3003s 2007.
2/24/2016© Hal Perkins & UW CSEN-1 CSE P 501 – Compilers Instruction Selection Hal Perkins Autumn 2009.
Computer Organization Instructions Language of The Computer (MIPS) 2.
Instruction Selection CS 671 February 19, CS 671 – Spring The Back End Essential tasks: Instruction selection Map low-level IR to actual.
1 March 16, March 16, 2016March 16, 2016March 16, 2016 Azusa, CA Sheldon X. Liang Ph. D. Azusa Pacific University, Azusa, CA 91702, Tel: (800)
COMPILERS Liveness Analysis hussein suleman uct csc3003s 2009.
Topics to be covered Instruction Execution Characteristics
Overview of Instruction Set Architectures
COMPILERS Instruction Selection
Instruction Selection Hal Perkins Autumn 2011
Lecture 4: MIPS Instruction Set
Programming Languages (CS 550) Mini Language Compiler
Code Generation.
CS 201 Compiler Construction
Instruction Selection, II Tree-pattern matching
Compiler Construction
Static vs. dynamic scheduling
Instruction Selection Hal Perkins Autumn 2005
Chapter 12 Pipelining and RISC
8 Code Generation Topics A simple code generator algorithm
CPU Structure CPU must:
COMPILERS Liveness Analysis
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Programming Languages (CS 360) Mini Language Compiler
Presentation transcript:

COMPILERS Instruction Selection hussein suleman uct csc305h 2005

Introduction  IR expresses only one operation in each node.  MC performs several IR instructions in a single MC instruction. e.g., fetch and add BINOP PLUSeCONST i MEM

Preliminaries  Express each machine instruction as a fragment of an IR tree – “tree pattern”.  Instruction selection is then equivalent to tiling the tree with a minimal set of tree patterns.

Jouette Architecture 1/2 NameEffectTrees + TEMP riri rjrj +rkrk ADD * riri rjrj *rkrk MUL - riri rjrj -rkrk SUB / riri rjrj /rkrk DIV ADDI + CONST + riri rjrj +c SUBI riri rjrj -c - CONST LOAD M[r j + c] riri + CONST + MEM Note: All tiles on this page have an upward link like ADD

Jouette Architecture 2/2 + CONST + MEM MOVE NameEffectTrees STOREM[r j + c]riri MOVEMM[r j ]M[r i ] MEM MOVE MEM

Instruction Selection  The concept of instruction selection is tiling.  Tiles are the set of tree patterns corresponding to legal machine instructions.  We want to cover the tree with non- overlapping tiles.  Note: We wont worry about which registers to use - yet.

Tiled Tree 1 MOVE MEM + + FPCONST a * TEMP iCONST 4 + FPCONST x LOADM[fp + a] r1r1 ADDI MUL ADD LOAD STORE r2r2 r0r0 +4 r2r2 r2r2 *r3r3 r1r1 r1r1 +r2r2 M[fp + x] r4r4 M[r 1 + 0]r4r4 Operation: a[i] = x

Tiled Tree 2 MOVE MEM + + FPCONST a * TEMP iCONST 4 + FPCONST x LOADM[fp + a] r1r1 ADDI MUL ADD ADDI MOVEM r2r2 r0r0 +4 r2r2 r2r2 *r3r3 r1r1 r1r1 +r2r2 fp + x r4r4 M[r 1 ]M[r 4 ] Operation: a[i] = x

Optimum and Optimal Tilings  Best tiling corresponds to least cost instruction sequence.  Each instruction is costed (somehow).  Optimum tiling tiles sum to lowest possible value  Optimal tiling no two adjacent tiles can be combined to a tile of lower cost  Note: Optimum tiling is Optimal, but not vice versa!

Maximal Munch Algorithm  Start at the root.  Find the largest tile that fits.  Cover the root and possibly several other nodes with this tile.  Repeat for each subtree.  Generates instructions in reverse order.  If two tiles of equal size match the current node, choose either.

Maximal Munch Example + CONST 1 MEM CONST 2 MEM is matched by LOAD + CONST MEM CONST (2) is matched by ADDI Instructions emitted (in reverse order) are: ADDI r 1  r LOAD r 2  M[r 1 + 1] Note: In Jouette, r 0 is always zero!

Dynamic Programming Algorithm  Assign a cost to every node. Sum of instruction costs of the best instruction sequence that can tile that subtree.  For each node n, proceeding bottom-up: For each tile t of cost c that matches at n there will be zero or more subtrees, s i, that correspond to the leaves (bottom edges) of the tile.  Cost of matching t is cost of t + sum of costs of all child trees of t Assign tile with minimum cost to n.  Walk tree from root and emit instructions for assigned tiles.

Dynamic Programming Example 1/2 + CONST 1 MEM CONST 2 CONST is only matched by an ADDI instruction with cost 1 The + node can be matched by + + CONST + ADD cost 1 leaves 2 Total 3 ADDI cost 1 leaves 1 Total 2

Dynamic Programming Example 2/2 The MEM node can be matched by MEM + CONST + MEM LOAD cost 1 leaves 2 Total 3 LOAD cost 1 leaves 1 Total 2 Instructions emitted (in reverse order, in second pass) are: ADDI r 1  r LOAD r 2  M[r 1 + 2]

Efficiency of Algorithms  Assume (on average): T tiles K non-leaf nodes in matching tile Kp is largest number of nodes to check to find matching tile Tp no of different tiles matching at each node N nodes in tree  Cost of MM: O((Kp + Tp)N/K)  Cost of DP: O((Kp + Tp)N)  In both cases, with Kp, Tp, K constant O(N)

Handling CISC Machine Code  Fewer registers: E.g., Pentium has only 6 general registers Allocate TEMPs and solve problem later!  Register use is restricted: E.g., MUL on Pentium requires use of eax Introduce additional LOAD/MOVE instructions to copy values.  Complex addressing modes: E.g., Pentium allows ADD [ebp-8],ecx Simple code generation still works, but is not as size-efficient, and can trash registers.

Implementation Issues  If registers are allocated after instruction selection, generated code must have “holes”. Assembly code template: LOAD d0,s0 List of source registers: s0 List of destination registers: d0  Including registers trashed by instruction (e.g., return address and return value registers for CALLs)  Register allocation will then fill in the holes, by (simplistically) matching source and destination registers and eliminating redundancy.