CS412/413 Introduction to Compilers Radu Rugina Lecture 27: More Instruction Selection 03 Apr 02.

Slides:

Advertisements

Similar presentations

CSE 5317/4305 L9: Instruction Selection1 Instruction Selection Leonidas Fegaras.

Advertisements

Tiling Examples for X86 ISA Slides Selected from Radu Ruginas CS412/413 Lecture on Instruction Selection at Cornell.

Register Allocation COS 320 David Walker (with thanks to Andrew Myers for many of these slides)

Register Allocation CS 320 David Walker (with thanks to Andrew Myers for most of the content of these slides)

Architecture-dependent optimizations Functional units, delay slots and dependency analysis.

Register Allocation Mooly Sagiv Schrierber Wed 10:00-12:00 html://

1 CS 201 Compiler Construction Machine Code Generation.

A simple register allocation optimization scheme.

Intermediate Representation I High-Level to Low-Level IR Translation EECS 483 – Lecture 17 University of Michigan Monday, November 6, 2006.

CS412/413 Introduction to Compilers Radu Rugina Lecture 16: Efficient Translation to Low IR 25 Feb 02.

Intermediate code generation. Code Generation Create linear representation of program Result can be machine code, assembly code, code for an abstract.

PZ13A Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ13A - Processor design Programming Language Design.

CS 536 Spring Intermediate Code. Local Optimizations. Lecture 22.

1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.

1 ICS 51 Introductory Computer Organization Fall 2006 updated: Oct. 2, 2006.

CS 536 Spring Code generation I Lecture 20.

11/11/05ELEC CISC (Complex Instruction Set Computer) Veeraraghavan Ramamurthy ELEC 6200 Computer Architecture and Design Fall 2005.

Compiler Construction Recap Rina Zviel-Girshin and Ohad Shacham School of Computer Science Tel-Aviv University.

4/6/08Prof. Hilfinger CS164 Lecture 291 Code Generation Lecture 29 (based on slides by R. Bodik)

ITEC 352 Lecture 11 ISA - CPU. ISA (2) Review Questions? HW 2 due on Friday ISA –Machine language –Buses –Memory.

CS412/413 Introduction to Compilers Radu Rugina Lecture 15: Translating High IR to Low IR 22 Feb 02.

Fast, Effective Code Generation in a Just-In-Time Java Compiler Rejin P. James & Roshan C. Subudhi CSE Department USC, Columbia.

Dr. José M. Reyes Álamo 1.  The 80x86 memory addressing modes provide flexible access to memory, allowing you to easily access ◦ Variables ◦ Arrays ◦

Linked Lists in MIPS Let’s see how singly linked lists are implemented in MIPS on MP2, we have a special type of doubly linked list Each node consists.

Instruction Selection II CS 671 February 26, 2008.

1 4.2 MARIE This is the MARIE architecture shown graphically.

CS412/413 Introduction to Compilers and Translators May 3, 1999 Lecture 34: Compiler-like Systems JIT bytecode interpreter src-to-src translator bytecode.

Code Generation Compiler Baojian Hua

Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.

CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 8: Semantic Analysis and Symbol Tables.

Lecture 4: MIPS Instruction Set Reminders: –Homework #1 posted: due next Wed. –Midterm #1 scheduled Friday September 26 th, 2014 Location: TODD 430 –Midterm.

11/02/2009CA&O Lecture 03 by Engr. Umbreen Sabir Computer Architecture & Organization Instructions: Language of Computer Engr. Umbreen Sabir Computer Engineering.

Execution of an instruction

1 ICS 51 Introductory Computer Organization Fall 2009.

Ted Pedersen – CS 3011 – Chapter 10 1 A brief history of computer architectures CISC – complex instruction set computing –Intel x86, VAX –Evolved from.

M. Mateen Yaqoob The University of Lahore Spring 2014.

Chapter# 6 Code generation.  The final phase in our compiler model is the code generator.  It takes as input the intermediate representation(IR) produced.

The x86 Instruction Set Lecture 16 Mon, Mar 14, 2005.

COMPILERS Instruction Selection hussein suleman uct csc305h 2005.

MS108 Computer System I Lecture 3 ISA Prof. Xiaoyao Liang 2015/3/13 1.

Compilers Modern Compiler Design

CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 11: Functions and stack frames.

CS412/413 Introduction to Compilers Radu Rugina Lecture 18: Control Flow Graphs 29 Feb 02.

COMPILERS Instruction Selection hussein suleman uct csc3003s 2007.

Intermediate code generation. Code Generation Create linear representation of program Result can be machine code, assembly code, code for an abstract.

Instruction Selection CS 671 February 19, CS 671 – Spring The Back End Essential tasks: Instruction selection Map low-level IR to actual.

1 March 16, March 16, 2016March 16, 2016March 16, 2016 Azusa, CA Sheldon X. Liang Ph. D. Azusa Pacific University, Azusa, CA 91702, Tel: (800)

CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 10 Ahmed Ezzat.

LECTURE 19 Subroutines and Parameter Passing. ABSTRACTION Recall: Abstraction is the process by which we can hide larger or more complex code fragments.

1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.

©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.

CS 404 Introduction to Compiler Design

Assembly language.

COMPILERS Instruction Selection

Data Transfers, Addressing, and Arithmetic

Optimization Code Optimization ©SoftMoore Consulting.

Introduction to Compilers Tim Teitelbaum

Lecture 4: MIPS Instruction Set

Programming Languages (CS 550) Mini Language Compiler

Lecture 30 (based on slides by R. Bodik)

The University of Adelaide, School of Computer Science

CS 201 Compiler Construction

Compiler Construction

EECE.3170 Microprocessor Systems Design I

EECE.3170 Microprocessor Systems Design I

Lecture 15: Code Generation - Instruction Selection

Lecture 4: Instruction Set Design/Pipelining

Programming Languages (CS 360) Mini Language Compiler

Computer Architecture and System Programming Laboratory

Presentation transcript:

CS412/413 Introduction to Compilers Radu Rugina Lecture 27: More Instruction Selection 03 Apr 02

CS 412/413 Spring 2002 Introduction to Compilers2 Outline Tiles: review Maximal munch algorithm Some tricky tiles –conditional jumps –instructions with fixed registers Dynamic programming algorithm

CS 412/413 Spring 2002 Introduction to Compilers3 Instruction Selection Current step: converting low-level intermediate code into abstract assembly Implement each IR instruction with a sequence of one or more assembly instructions DAG of IR instructions are broken into tiles associated with one or more assembly instructions

CS 412/413 Spring 2002 Introduction to Compilers4 Tiles + 1 t1 mov t1, t2 add $1, t2 Tiles capture compilers understanding of instruction set Each tile: sequence of machine instructions that match a subgraph of the DAG May need additional move instructions Tiling = cover the DAG with tiles t2

CS 412/413 Spring 2002 Introduction to Compilers5 Maximal Munch Algorithm Maximal Munch = find largest tiles (greedy algorithm) Start from top of tree Find largest tile that matches top node Tile remaining subtrees recursively = [ ]4 + + ebp 8 * 4 [ ] + ebp 12

CS 412/413 Spring 2002 Introduction to Compilers6 DAG Representation DAG: a node may have multiple parents Algorithm: same, but nodes with multiple parents occur inside tiles only if all parents are in the tile = [ ] + + ebp 8 * 4 [ ] + ebp 12

CS 412/413 Spring 2002 Introduction to Compilers7 Another Example x = x + 1; = [ ] ebp [ ] ebp + 8 1

CS 412/413 Spring 2002 Introduction to Compilers8 Example x = x + 1; mov 8(%ebp),t1 mov t1, t2 add $1, t2 mov t2, 8(%ebp) = [ ] ebp [ ] ebp t2 t1

CS 412/413 Spring 2002 Introduction to Compilers9 Alternate (CISC) Tiling add $1, 8(%ebp) x = x + 1; = + const r/m32 = [ ] ebp [ ] ebp + 8 1

CS 412/413 Spring 2002 Introduction to Compilers10 ADD Expression Tiles mov t2, t1 add r/m32, t1 + t2 t1 t3 + t2 r/m32 t1 + t2 const t1 mov t2, t1 add imm32, t1

CS 412/413 Spring 2002 Introduction to Compilers11 ADD Statement Tiles = + constr/m32 Intel Architecture add r/m32, r32 add r32, r/m32 add imm32, %eax add imm32, r/m32 add imm8, r/m32 = + r/m32 = + r32

CS 412/413 Spring 2002 Introduction to Compilers12 Designing Tiles Only add tiles that are useful to compiler Many instructions will be too hard to use effectively or will offer no advantage Need tiles for all single-node trees to guarantee that every tree can be tiled, e.g. mov t2, t1 add t3, t1 + t2 t1 t3

CS 412/413 Spring 2002 Introduction to Compilers13 More Handy Tiles lea instruction computes a memory address lea (t1,t2), t3 + t2 t1 t3 + lea c1(t1,t2,c2), t3 * c2 t1 t2 t3 + c1

CS 412/413 Spring 2002 Introduction to Compilers14 br t1, L Matching Jump for RISC As defined in lecture, have tjump(cond, destination) fjump(cond, destination) Our tjump/fjump translates easily to RISC ISAs that have explicit comparison result tjump L t1 cmplt t2, t3, t1 < t2 t3 MIPS

CS 412/413 Spring 2002 Introduction to Compilers15 Condition Code ISA Pentium: condition encoded in jump instruction cmp: compare operands and set flags jcc: conditional jump according to flags cmp t1, t2 jl L set condition codes test condition codes tjump L < t2 t3

CS 412/413 Spring 2002 Introduction to Compilers16 Fixed-register instructions mul r/m32 Multiply value in register eax Result: low 32 bits in eax, high 32 bits in edx jecxz L Jump to label L if ecx is zero add r/m32, %eax Add to eax No fixed registers in low IR except frame pointer Need extra move instructions

CS 412/413 Spring 2002 Introduction to Compilers17 Implementation Maximal Munch: start from top node Find largest tile matching top node and all of the children nodes Invoke recursively on all children of tile Generate code for this tile Code for children will have been generated already in recursive calls How to find matching tiles?

CS 412/413 Spring 2002 Introduction to Compilers18 Matching Tiles abstract class LIR_Stmt { Assembly munch(); } class LIR_Assign extends LIR_Stmt { LIR_Expr src, dst; Assembly munch() { if (src instanceof IR_Plus && ((IR_Plus)src).lhs.equals(dst) && is_regmem32(dst) { Assembly e = ((LIR_Plus)src).rhs.munch(); return e.append(new AddIns(dst, e.target())); } else if... } = + r/m32

CS 412/413 Spring 2002 Introduction to Compilers19 Tile Specifications Previous approach simple, efficient, but hard-codes tiles and their priorities Another option: explicitly create data structures representing each tile in instruction set –Tiling performed by a generic tree-matching and code generation procedure –Can generate from instruction set description: code generator generators –For RISC instruction sets, over-engineering

CS 412/413 Spring 2002 Introduction to Compilers20 How Good Is It? Very rough approximation on modern pipelined architectures: execution time is number of tiles Maximal munch finds an optimal but not necessarily optimum tiling Metric used: tile size

CS 412/413 Spring 2002 Introduction to Compilers21 Improving Instruction Selection Because greedy, Maximal Munch does not necessarily generate best code –Always selects largest tile, but not necessarily the fastest instruction –May pull nodes up into tiles inappropriately – it may be better to leave below (use smaller tiles) Can do better using dynamic programming algorithm

CS 412/413 Spring 2002 Introduction to Compilers22 Timing Cost Model Idea: associate cost with each tile (proportional to number of cycles to execute) –may not be a good metric on modern architectures Total execution time is sum of costs of all tiles Total cost: 5 = [ ] ebp [ ] ebp Cost=1 Cost = 2

CS 412/413 Spring 2002 Introduction to Compilers23 Finding optimum tiling Goal: find minimum total cost tiling of DAG Algorithm: for every node, find minimum total cost tiling of that node and sub-graph Lemma: once minimum cost tiling of all nodes in subgraph, can find minimum cost tiling of the node by trying out all possible tiles matching the node Therefore: start from leaves, work upward to top node

CS 412/413 Spring 2002 Introduction to Compilers24 Dynamic Programming: a[i] [ ] + * + ebp 8 4 [ ] + ebp12 mov 8(%ebp), t1 mov 12(%ebp), t2 mov (t1,t2,4), t3

CS 412/413 Spring 2002 Introduction to Compilers25 Recursive Implementation Dynamic programming algorithm uses memoization For each node, record best tile for node Start at top, recurse: –First, check in table for best tile for this node –If not computed, try each matching tile to see which one has lowest cost –Store lowest-cost tile in table and return Finally, use entries in table to emit code

CS 412/413 Spring 2002 Introduction to Compilers26 Memoization class IR_Move extends IR_Stmt { IR_Expr src, dst; Assembly best; // initialized to null int optTileCost() { if (best != null) return best.cost(); if (src instanceof IR_Plus && ((IR_Plus)src).lhs.equals(dst) && is_regmem32(dst)) { int src_cost = ((IR_Plus)src).rhs.optTileCost(); int cost = src_cost + CISC_ADD_COST; if (cost < best.cost()) best = new AddIns(dst, e.target); } …consider all other tiles… return best.cost(); } = + r/m32

CS 412/413 Spring 2002 Introduction to Compilers27 Problems with Model Modern processors: –execution time not sum of tile times –instruction order matters Processors is pipelining instructions and executing different pieces of instructions in parallel bad ordering (e.g. too many memory operations in sequence) stalls processor pipeline processor can execute some instructions in parallel (super-scalar) –cost is merely an approximation –instruction scheduling needed

CS 412/413 Spring 2002 Introduction to Compilers28 Summary Can specify code generation process as a set of tiles that relate low IR trees (DAGs) to instruction sequences Instructions using fixed registers problematic but can be handled using extra temporaries Maximal Munch algorithm implemented simply as recursive traversal Dynamic programming algorithm generates better code, can be implemented recursively using memoization Real optimization will also require instruction scheduling