CSE 5317/4305 L9: Instruction Selection
Leonidas Fegaras

Basic Blocks and Traces

Many computer architectures have instructions that do not exactly match our IR representations:
- they do not support two-way branching as in CJUMP(op,e1,e2,l1,l2)
- nested calls, such as CALL(f,[CALL(g,[...])]), will cause interference between register arguments and returned results
- nested SEQs, such as SEQ(SEQ(s1,s2),s3), impose an order of evaluation, which restricts optimization; if s1 and s2 do not interfere with each other, we want to be able to replace SEQ(s1,s2) with SEQ(s2,s1), because that may result in a more efficient program

We will fix these problems in two phases:
- transforming IR trees into a list of canonical trees, and
- transforming unrestricted CJUMPs into CJUMPs that are followed by their false target label

Canonical Trees

An IR is a canonical tree if it does not contain SEQ or ESEQ, and the parent node of each CALL node is either an EXP or a MOVE(TEMP(t),...) node.
- we transform an IR in such a way that all ESEQs are pulled up in the IR and become SEQs at the top of the tree; at the end, we are left with nested SEQs at the top of the tree, which are eliminated to form a list of statements

For example, the IR:

    SEQ(MOVE(NAME(x),ESEQ(MOVE(TEMP(t),CONST(1)),TEMP(t))),
        JUMP(ESEQ(MOVE(NAME(z),NAME(L)),NAME(z))))

is translated into:

    SEQ(SEQ(MOVE(TEMP(t),CONST(1)),
            MOVE(NAME(x),TEMP(t))),
        SEQ(MOVE(NAME(z),NAME(L)),
            JUMP(NAME(z))))

which corresponds to the list of statements:

    [ MOVE(TEMP(t),CONST(1)),
      MOVE(NAME(x),TEMP(t)),
      MOVE(NAME(z),NAME(L)),
      JUMP(NAME(z)) ]
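The final flattening step is mechanical once every ESEQ has been lifted. Below is a minimal sketch in Python, assuming a hypothetical SEQ node class (the lecture does not prescribe an implementation language; names are illustrative):

    from dataclasses import dataclass

    @dataclass
    class SEQ:
        # sequencing node: evaluate left, then right
        left: object
        right: object

    def linearize(stm):
        # Flatten nested SEQ nodes into a flat statement list.
        # Assumes all ESEQs have already been pulled to the top of the tree.
        if isinstance(stm, SEQ):
            return linearize(stm.left) + linearize(stm.right)
        return [stm]

Applied to the translated tree above, linearize returns exactly the four-element statement list shown.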

Some Rules

    ESEQ(s1,ESEQ(s2,e)) = ESEQ(SEQ(s1,s2),e)
    BINOP(op,ESEQ(s,e1),e2) = ESEQ(s,BINOP(op,e1,e2))
    MEM(ESEQ(s,e)) = ESEQ(s,MEM(e))
    JUMP(ESEQ(s,e)) = SEQ(s,JUMP(e))
    CJUMP(op,ESEQ(s,e1),e2,l1,l2) = SEQ(s,CJUMP(op,e1,e2,l1,l2))
    BINOP(op,e1,ESEQ(s,e2)) = ESEQ(MOVE(TEMP(t),e1),ESEQ(s,BINOP(op,TEMP(t),e2)))
    CJUMP(op,e1,ESEQ(s,e2),l1,l2) = SEQ(MOVE(TEMP(t),e1),SEQ(s,CJUMP(op,TEMP(t),e2,l1,l2)))
    MOVE(ESEQ(s,e1),e2) = SEQ(s,MOVE(e1,e2))

To handle function calls, we store the function result into a new register:

    CALL(f,a) = ESEQ(MOVE(TEMP(t),CALL(f,a)),TEMP(t))

That way expressions such as +(CALL(f,a),CALL(g,b)) do not overwrite each other's result register.
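The first few rules translate directly into a tree-rewriting function. Here is a sketch in Python with hypothetical node classes; it implements three of the rules, and the remaining ones (including the temp-introducing rules for the right operand) follow the same shape. A real driver would reapply the pass until no ESEQ remains:

    from dataclasses import dataclass

    @dataclass
    class SEQ:
        left: object
        right: object

    @dataclass
    class ESEQ:
        stm: object
        exp: object

    @dataclass
    class BINOP:
        op: str
        e1: object
        e2: object

    @dataclass
    class MEM:
        addr: object

    def lift(exp):
        # One ESEQ-lifting pass over a single node.
        if isinstance(exp, ESEQ) and isinstance(exp.exp, ESEQ):
            # ESEQ(s1,ESEQ(s2,e)) = ESEQ(SEQ(s1,s2),e)
            return ESEQ(SEQ(exp.stm, exp.exp.stm), exp.exp.exp)
        if isinstance(exp, BINOP) and isinstance(exp.e1, ESEQ):
            # BINOP(op,ESEQ(s,e1),e2) = ESEQ(s,BINOP(op,e1,e2))
            return ESEQ(exp.e1.stm, BINOP(exp.op, exp.e1.exp, exp.e2))
        if isinstance(exp, MEM) and isinstance(exp.addr, ESEQ):
            # MEM(ESEQ(s,e)) = ESEQ(s,MEM(e))
            return ESEQ(exp.addr.stm, MEM(exp.addr.exp))
        return exp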

Basic Blocks

We need to transform any CJUMP into a CJUMP whose false target label is the next instruction after the CJUMP:
- this reflects the conditional jump found in most architectures
We will do that using basic blocks. A basic block is a sequence of statements whose first statement is a LABEL, whose last statement is a JUMP or CJUMP, and which does not contain any other LABELs, JUMPs, or CJUMPs:
- we can only enter at the beginning of a basic block and exit at the end
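Partitioning a statement list into basic blocks is a single scan. A minimal sketch in Python, using an illustrative dict encoding for statements (not the lecture's representation): falling through into a LABEL closes the current block with an explicit jump, and a block that does not begin with a LABEL gets a fresh one.

    import itertools

    _fresh = itertools.count()

    def new_label():
        # fresh LABEL statement (illustrative encoding)
        return {'kind': 'LABEL', 'name': f'L{next(_fresh)}'}

    def make_jump(name):
        return {'kind': 'JUMP', 'target': name}

    def basic_blocks(stmts, exit_label='EXIT'):
        # Split a statement list into basic blocks: each block starts
        # with a LABEL and ends with a JUMP or CJUMP.
        blocks, cur = [], []
        for s in stmts:
            if s['kind'] == 'LABEL':
                if cur:                               # fall-through into a label:
                    cur.append(make_jump(s['name']))  # close block with a jump
                    blocks.append(cur)
                cur = [s]
            else:
                if not cur:                           # block must start with a label
                    cur = [new_label()]
                cur.append(s)
                if s['kind'] in ('JUMP', 'CJUMP'):
                    blocks.append(cur)
                    cur = []
        if cur:                                       # last block: jump to the exit
            cur.append(make_jump(exit_label))
            blocks.append(cur)
        return blocks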

Algorithm

We first create the basic blocks for an IR tree; then we reorganize the basic blocks in such a way that every CJUMP at the end of a basic block is followed by the block that contains the CJUMP's false target label.
A secondary goal is to put the target of a JUMP immediately after the JUMP:
- that way, we can eliminate the JUMP (and maybe merge the two blocks)
The algorithm is based on traces.

Traces

You start a trace with an unmarked block and consider the target of the JUMP at the end of this block, or the false target block of its CJUMP:
- if the new block is unmarked, you mark it, append it to the trace, use it as your new start, and apply the algorithm recursively
- otherwise, you close this trace and start a new trace by going back to a point where there was a CJUMP, choosing the true target this time
You continue until all blocks are marked.
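A minimal sketch of this greedy loop in Python, assuming an illustrative interface: blocks maps a label to its block, and successors(block) yields the labels a block can jump to, with a CJUMP's false target first so that the false branch continues the trace.

    def traces(blocks, successors, entry):
        # Greedy trace construction; returns a list of traces,
        # each trace being a list of block labels.
        marked, result = set(), []
        work = [entry] + list(blocks)     # entry first, then any leftovers
        for start in work:
            if start in marked:
                continue
            trace, b = [], start
            while b is not None and b not in marked:
                marked.add(b)
                trace.append(b)
                # follow the first unmarked successor (false target first)
                b = next((s for s in successors(blocks[b])
                          if s in blocks and s not in marked), None)
            result.append(trace)
        return result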

Traces (cont.)

This is a greedy algorithm. At the end, there may still be some CJUMPs whose false target does not follow the CJUMP:
- this is the case where the false target label was the target of another JUMP or CJUMP found earlier in a trace
- in that case:
  - if we have a CJUMP followed by its true target, we negate the condition and switch the true and false targets
  - otherwise, we create a new block LABEL(L) followed by JUMP(F), and we replace CJUMP(op,a,b,T,F) with CJUMP(op,a,b,T,L)
Also, if there is a JUMP(L) followed by a LABEL(L), we remove the JUMP.
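These fix-ups amount to one pass over the ordered statement list, looking at each statement and the one that follows it. A sketch under the same illustrative encoding as above; negate_op is an assumed helper mapping each relational operator to its negation:

    import itertools

    _fresh2 = itertools.count()

    def _label():
        return {'kind': 'LABEL', 'name': f'F{next(_fresh2)}'}

    def fix_cjumps(stmts, negate_op):
        # Rewrite so every CJUMP is immediately followed by its false
        # label, and drop JUMPs whose target is the next statement.
        out = []
        for s, nxt in zip(stmts, stmts[1:] + [None]):
            follows = nxt['name'] if nxt and nxt['kind'] == 'LABEL' else None
            if s['kind'] == 'CJUMP':
                if s['false'] == follows:            # already in the right shape
                    out.append(s)
                elif s['true'] == follows:           # negate condition, swap targets
                    out.append({**s, 'op': negate_op(s['op']),
                                'true': s['false'], 'false': s['true']})
                else:                                # neither target follows:
                    l = _label()                     # emit CJUMP(...,T,L); L: jump F
                    out.append({**s, 'false': l['name']})
                    out.append(l)
                    out.append({'kind': 'JUMP', 'target': s['false']})
            elif s['kind'] == 'JUMP' and s['target'] == follows:
                pass                                 # JUMP(L) before LABEL(L): drop it
            else:
                out.append(s)
        return out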

Instruction Selection

After IR trees have been put into canonical form, they are used to generate assembly code. The obvious way to do this is to macro-expand each IR tree node. For example, MOVE(MEM(+(TEMP(fp),CONST(10))),CONST(3)) is macro-expanded into the pseudo-assembly code:

    TEMP(fp)                    t1 := fp
    CONST(10)                   t2 := 10
    +(TEMP(fp),CONST(10))       t3 := t1+t2
    CONST(3)                    t4 := 3
    MOVE(MEM(...),CONST(3))     M[t3] := t4

where ti stands for a temporary variable. This method generates very poor quality code: in most architectures the whole statement can be done using only one instruction, M[fp+10] := 3.
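The naive scheme is one emit per IR node, bottom-up, with a fresh temporary for every node. A sketch in Python; the tuple encoding of IR trees and the pseudo-instruction syntax are illustrative, not from the lecture:

    import itertools

    _t = itertools.count(1)

    def macro_expand(node, emit):
        # Naive per-node expansion: every IR node gets its own temporary
        # and its own instruction, with no pattern matching across nodes.
        kind = node[0]
        if kind in ('CONST', 'TEMP'):
            t = f't{next(_t)}'
            emit(f'{t} := {node[1]}')
            return t
        if kind == '+':
            a = macro_expand(node[1], emit)
            b = macro_expand(node[2], emit)
            t = f't{next(_t)}'
            emit(f'{t} := {a}+{b}')
            return t
        if kind == 'MOVE_MEM':            # MOVE(MEM(addr), src)
            a = macro_expand(node[1], emit)
            s = macro_expand(node[2], emit)
            emit(f'M[{a}] := {s}')
            return None
        raise ValueError(kind)

    # The slide's example, MOVE(MEM(+(TEMP(fp),CONST(10))),CONST(3)):
    macro_expand(('MOVE_MEM', ('+', ('TEMP', 'fp'), ('CONST', 10)),
                  ('CONST', 3)), print)

Running it prints the five-instruction expansion shown above.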

Maximum Munch

Maximum munch generates better code, especially for RISC machines. The idea is to use tree pattern matching to map a tree pattern (a fragment of an IR tree) into a list of assembly instructions:
- these tree patterns are called tiles
For RISC we always have a one-to-one mapping (one tile to one assembly instruction):
- for RISC machines the tiles are small (very few IR nodes each)
- for CISC machines the tiles are usually large, since CISC instructions are very complex

Tiles

The following is the mapping of some tiles into MIPS code:

    IR                                  Tile
    CONST(c)                            li 'd0, c
    +(e0,e1)                            add 'd0, 's0, 's1
    +(e0,CONST(c))                      add 'd0, 's0, c
    *(e0,e1)                            mult 'd0, 's0, 's1
    *(e0,CONST(2^k))                    sll 'd0, 's0, k
    MEM(e0)                             lw 'd0, ('s0)
    MEM(+(e0,CONST(c)))                 lw 'd0, c('s0)
    MOVE(MEM(e0),e1)                    sw 's1, ('s0)
    MOVE(MEM(+(e0,CONST(c))),e1)        sw 's1, c('s0)
    JUMP(NAME(X))                       b X
    JUMP(e0)                            jr 's0
    LABEL(X)                            X: nop

(diagram on the slide: a tile with IR subtree inputs e0, e1, ..., en takes its operands in registers s0, s1, ..., sn and leaves its result in d0)

Tiling

To translate an IR tree into assembly code, we perform tiling:
- we cover the IR tree with non-overlapping tiles
- we can see that there are many different tilings

For example, the IR for a[i]:=x is:

    MOVE(MEM(+(MEM(+(TEMP(fp),CONST(20))),
               *(TEMP(i),CONST(4)))),
         MEM(+(TEMP(fp),CONST(10))))

The following are two possible tilings of the IR:

    lw r1, 20($fp)          add r1, $fp, 20
    lw r2, i                lw r1, (r1)
    sll r2, r2, 2           lw r2, i
    add r1, r1, r2          sll r2, r2, 2
    lw r2, 10($fp)          add r1, r1, r2
    sw r2, (r1)             add r2, $fp, 10
                            lw r2, (r2)
                            sw r2, (r1)

The left tiling is obviously better, since it uses fewer instructions and can be executed faster.

Optimum Tiling

It is highly desirable to do optimum tiling:
- to generate the shortest instruction sequence
- or, alternatively, the sequence with the fewest machine cycles
This is not easy to achieve. There are two main ways of performing optimum tiling (both are sketched below):
- using maximal munch (a greedy algorithm): you start from the IR root and, among all matching tiles, you select the one with the maximum number of IR nodes; you then go to the children of this tile and apply the algorithm recursively until you reach the tree leaves
- using dynamic programming: it works from the leaves to the root; it assigns a cost to every tree node by considering every tile that matches the node and calculating the minimum value of:
    cost of a node = (number of nodes in the tile) + (total costs of all the tile children)
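Both strategies are short to state in code. A sketch in Python, with tiles represented as hypothetical (size, match) pairs, where match(node) returns the instruction text plus the subtrees the tile leaves uncovered, or None; register allocation and operand wiring are omitted:

    def maximal_munch(node, tiles, emit):
        # Greedy top-down tiling: pick the largest matching tile, tile the
        # uncovered subtrees first, then emit this tile's instruction so
        # its operands are already computed.
        for size, match in sorted(tiles, key=lambda t: -t[0]):
            m = match(node)
            if m is not None:
                instr, children = m
                for c in children:
                    maximal_munch(c, tiles, emit)
                emit(instr)
                return
        # single-node tiles should guarantee that every node matches
        raise ValueError('no tile matches this node')

    def cost(node, tiles):
        # Dynamic-programming variant: the cost of a node is the minimum,
        # over all matching tiles, of the tile size plus the total cost of
        # the subtrees the tile leaves uncovered (memoization omitted).
        options = []
        for size, match in tiles:
            m = match(node)
            if m is not None:
                instr, children = m
                options.append(size + sum(cost(c, tiles) for c in children))
        return min(options)

Using the tile size as its cost matches the "fewest instructions" goal; weighting tiles by machine cycles instead gives the "fewest cycles" variant.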

Maximal Munch Example

(the slide shows an IR tree covered by non-overlapping tiles labeled A through I; the instructions emitted for the tiles are:)

    A   lw r1, fp
    B   lw r2, 8(r1)
    C   lw r3, i
    D   sll r4, r3, 2
    E   add r5, r2, r4
    F   lw r6, fp
    G   lw r7, 16(r6)
    H   add r8, r7, 1
    I   sw r8, (r5)