COMPILERS Instruction Selection

Slides:



Advertisements
Similar presentations
CS412/413 Introduction to Compilers Radu Rugina Lecture 27: More Instruction Selection 03 Apr 02.
Advertisements

CSE 5317/4305 L9: Instruction Selection1 Instruction Selection Leonidas Fegaras.
Tiling Examples for X86 ISA Slides Selected from Radu Ruginas CS412/413 Lecture on Instruction Selection at Cornell.
Register Allocation COS 320 David Walker (with thanks to Andrew Myers for many of these slides)
1 Lecture 3: MIPS Instruction Set Today’s topic:  More MIPS instructions  Procedure call/return Reminder: Assignment 1 is on the class web-page (due.
P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.
Register Allocation CS 320 David Walker (with thanks to Andrew Myers for most of the content of these slides)
Greedy Algorithms.
Architecture-dependent optimizations Functional units, delay slots and dependency analysis.
1 Code generation Our book's target machine (appendix A): opcode source1, source2, destination add r1, r2, r3 addI r1, c, r2 loadI c, r2 load r1, r2 loadAI.
COMPILERS Register Allocation hussein suleman uct csc305w 2004.
1 CS 201 Compiler Construction Machine Code Generation.
Dyn. Sched. CSE 471 Autumn 0219 Tomasulo’s algorithm “Weaknesses” in scoreboard: –Centralized control –No forwarding (more RAW than needed) Tomasulo’s.
Code Generation Steve Johnson. May 23, 2005Copyright (c) Stephen C. Johnson The Problem Given an expression tree and a machine architecture, generate.
COMPILERS Basic Blocks and Traces hussein suleman uct csc3005h 2006.
11/19/2002© 2002 Hal Perkins & UW CSEN-1 CSE 582 – Compilers Instruction Selection Hal Perkins Autumn 2002.
3 -1 Chapter 3 The Greedy Method 3 -2 The greedy method Suppose that a problem can be solved by a sequence of decisions. The greedy method has that each.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
Register Allocation (Slides from Andrew Myers). Main idea Want to replace temporary variables with some fixed set of registers First: need to know which.
PSUCS322 HM 1 Languages and Compiler Design II Code Generation Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring.
Arithmetic Expression Consider the expression arithmetic expression: (a – b) + ((c + d) + (e * f)) that can be represented as the following tree.
Instruction Selection, II Tree-pattern matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in.
Instruction Selection Mooly Sagiv Schrierber Wed 10:00-12:00 html://
Improving Code Generation Honors Compilers April 16 th 2002.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
Computer Organization and Architecture Reduced Instruction Set Computers (RISC) Chapter 13.
© The McGraw-Hill Companies, Inc., Chapter 3 The Greedy Method.
Instruction Selection II CS 671 February 26, 2008.
Lecture 10 Trees –Definiton of trees –Uses of trees –Operations on a tree.
CSC 3210 Computer Organization and Programming Chapter 1 THE COMPUTER D.M. Rasanjalee Himali.
Lecture 4: MIPS Instruction Set Reminders: –Homework #1 posted: due next Wed. –Midterm #1 scheduled Friday September 26 th, 2014 Location: TODD 430 –Midterm.
Execution of an instruction
M. Mateen Yaqoob The University of Lahore Spring 2014.
Chapter# 6 Code generation.  The final phase in our compiler model is the code generator.  It takes as input the intermediate representation(IR) produced.
1 Basic Block and Trace Chapter 8. 2 Tree IR (1) Semantic gap (2) IR is not proper for optimization analysis Machine Languages Eg: - Some expressions.
COMPILERS Instruction Selection hussein suleman uct csc305h 2005.
Instruction Selection Mooly Sagiv Schrierber Wed 10:00-12:00 html://
COMPILERS Instruction Selection hussein suleman uct csc3003s 2007.
2/24/2016© Hal Perkins & UW CSEN-1 CSE P 501 – Compilers Instruction Selection Hal Perkins Autumn 2009.
Computer Organization Instructions Language of The Computer (MIPS) 2.
Instruction Selection CS 671 February 19, CS 671 – Spring The Back End Essential tasks: Instruction selection Map low-level IR to actual.
1 March 16, March 16, 2016March 16, 2016March 16, 2016 Azusa, CA Sheldon X. Liang Ph. D. Azusa Pacific University, Azusa, CA 91702, Tel: (800)
Heaps, Heap Sort, and Priority Queues. Background: Binary Trees * Has a root at the topmost level * Each node has zero, one or two children * A node that.
COMPILERS Liveness Analysis hussein suleman uct csc3003s 2009.
Topics to be covered Instruction Execution Characteristics
Assembly language.
Overview of Instruction Set Architectures
Instruction Selection II: Tree-pattern Matching Comp 412
Chapter 11 Instruction Sets
Ch. 4 – Semantic Analysis Errors can arise in syntax, static semantics, dynamic semantics Some PL features are impossible or infeasible to specify in grammar.
Compiler Construction
Instruction Selection Hal Perkins Autumn 2011
Lecture 4: MIPS Instruction Set
Programming Languages (CS 550) Mini Language Compiler
Code Generation.
Chapter 6 Intermediate-Code Generation
CS 201 Compiler Construction
Instruction Selection, II Tree-pattern matching
Compiler Construction
Static vs. dynamic scheduling
Instruction Selection Hal Perkins Autumn 2005
Data Structures and Analysis (COMP 410)
CS 261 – Data Structures Trees.
Chapter 12 Pipelining and RISC
8 Code Generation Topics A simple code generator algorithm
CPU Structure CPU must:
COMPILERS Liveness Analysis
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Programming Languages (CS 360) Mini Language Compiler
(via graph coloring and spilling)
Presentation transcript:

COMPILERS Instruction Selection hussein suleman uct csc3003s 2009

Introduction IR expresses only one operation in each node. MC performs several IR instructions in a single MC instruction. e.g., fetch and add MEM BINOP PLUS e CONST i

Preliminaries Express each machine instruction as a fragment of an IR tree – “tree pattern”. Instruction selection is then equivalent to tiling the tree with a minimal set of tree patterns.

Jouette Architecture 1/2 Name Effect Trees TEMP ADD ri rj + rk + * Note: All tiles on this page have an upward link like ADD MUL ri rj * rk SUB ri rj - rk - / DIV ri rj / rk + + CONST ADDI ri rj + c CONST CONST ri rj - c - SUBI CONST ri MEM MEM MEM MEM LOAD M[rj + c] + + CONST CONST CONST

Jouette Architecture 2/2 Name Effect Trees MOVE MOVE MOVE MOVE STORE M[rj + c] ri MEM MEM MEM MEM + + CONST CONST CONST MOVE MOVEM M[rj] M[ri] MEM MEM

Instruction Selection The concept of instruction selection is tiling. Tiles are the set of tree patterns corresponding to legal machine instructions. We want to cover the tree with non- overlapping tiles. Note: We wont worry about which registers to use - yet.

Tiled Tree 1 MOVE MEM MEM + + MEM * FP CONST x + TEMP i CONST 4 FP CONST a LOAD r1 M[fp + a] r2 r0 + 4 ADDI r2 r2 * r3 MUL r1 r1 + r2 ADD LOAD r4 M[fp + x] Operation: a[i] = x STORE M[r1 + 0] r4

Tiled Tree 2 MOVE MEM MEM + + MEM * FP CONST x + TEMP i CONST 4 FP CONST a LOAD r1 M[fp + a] r2 r0 + 4 ADDI r2 * r3 MUL r2 r1 r1 + r2 ADD ADDI r4 fp + x Operation: a[i] = x MOVEM M[r1] M[r4]

Optimum and Optimal Tilings Best tiling corresponds to least cost instruction sequence. Each instruction is costed (somehow). Optimum tiling tiles sum to lowest possible value Optimal tiling no two adjacent tiles can be combined to a tile of lower cost Note: Optimum tiling is Optimal, but not vice versa!

Maximal Munch Algorithm Start at the root. Find the largest tile that fits. Cover the root and possibly several other nodes with this tile. Repeat for each subtree. Generates instructions in reverse order. If two tiles of equal size match the current node, choose either.

Maximal Munch Example + MEM is matched by LOAD + CONST 1 CONST 2 + CONST MEM MEM is matched by LOAD CONST (2) is matched by ADDI Instructions emitted (in reverse order) are: ADDI r1  r0 + 2 LOAD r2  M[r1 + 1] Note: In Jouette, r0 is always zero!

Dynamic Programming Algorithm Assign a cost to every node. Sum of instruction costs of the best instruction sequence that can tile that subtree. For each node n, proceeding bottom-up: For each tile t of cost c that matches at n there will be zero or more subtrees, si, that correspond to the leaves (bottom edges) of the tile. Cost of matching t is cost of t + sum of costs of all child trees of t Assign tile with minimum cost to n. Walk tree from root and emit instructions for assigned tiles.

Dynamic Programming Example 1/2 MEM + CONST 1 CONST 2 CONST is only matched by an ADDI instruction with cost 1 The + node can be matched by + ADD cost 1 leaves 2 Total 3 + ADDI cost 1 leaves 1 Total 2 CONST + ADDI cost 1 leaves 1 Total 2 CONST

Dynamic Programming Example 2/2 The MEM node can be matched by MEM LOAD cost 1 leaves 2 Total 3 MEM + LOAD cost 1 leaves 1 Total 2 CONST MEM LOAD cost 1 leaves 1 Total 2 + CONST Instructions emitted (in reverse order, in second pass) are: ADDI r1  r0 + 1 LOAD r2  M[r1 + 2]

Efficiency of Algorithms Assume (on average): T tiles K non-leaf nodes in matching tile Kp is largest number of nodes to check to find matching tile Tp no of different tiles matching at each node N nodes in tree Cost of MM: O((Kp + Tp)N/K) Cost of DP: O((Kp + Tp)N) In both cases, with Kp, Tp, K constant O(N)

Handling CISC Machine Code Fewer registers: E.g., Pentium has only 6 general registers Allocate TEMPs and solve problem later! Register use is restricted: E.g., MUL on Pentium requires use of eax Introduce additional LOAD/MOVE instructions to copy values. Complex addressing modes: E.g., Pentium allows ADD [ebp-8],ecx Simple code generation still works, but is not as size-efficient, and can trash registers.

Implementation Issues If registers are allocated after instruction selection, generated code must have “holes”. Assembly code template: LOAD d0,s0 List of source registers: s0 List of destination registers: d0 Including registers trashed by instruction (e.g., return address and return value registers for CALLs) Register allocation will then fill in the holes, by (simplistically) matching source and destination registers and eliminating redundancy.