Instruction Selection Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.

The Problem
Writing a compiler is a lot of work
- Would like to reuse components whenever possible
- Would like to automate construction of components
- Front-end construction is largely automated
- Middle end is largely hand crafted
- (Parts of) the back end can be automated
[Diagram: Front End, Middle End, Back End, built on a shared Infrastructure]
Today's lecture: Automating Instruction Selection

Definitions
Instruction selection
- Mapping IR into assembly code
- Assumes a fixed storage mapping & code shape
- Combining operations, using address modes
Instruction scheduling
- Reordering operations to hide latencies
- Assumes a fixed program (set of operations)
- Changes demand for registers
Register allocation
- Deciding which values will reside in registers
- Changes the storage mapping, may add false sharing
- Concerns about placement of data & memory operations

The Problem
Modern computers (still) have many ways to do anything.
Consider a register-to-register copy in ILOC
- Obvious operation is i2i ri ⇒ rj
- Many others exist:

  addI    ri, 0 ⇒ rj
  subI    ri, 0 ⇒ rj
  lshiftI ri, 0 ⇒ rj
  multI   ri, 1 ⇒ rj
  divI    ri, 1 ⇒ rj
  rshiftI ri, 0 ⇒ rj
  orI     ri, 0 ⇒ rj
  xorI    ri, 0 ⇒ rj
  … and others …

- Human would ignore all of these
- Algorithm must look at all of them & find the low-cost encoding
  - Take context into account (busy functional unit?)
And ILOC is an overly-simplified case.
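To make the cost-driven choice concrete, here is a minimal sketch (in Python) of picking the cheapest equivalent encoding for a copy. The candidate table, cycle costs, functional-unit names, and the busy_units parameter are illustrative assumptions, not values defined by ILOC or the course.

# Hypothetical cost-based choice among equivalent encodings of a copy.
# The costs, functional-unit names, and busy_units values are made up.
CANDIDATES = [
    # (opcode, functional unit it needs, cost in cycles)
    ("i2i",     "copy",       1),
    ("addI",    "alu",        1),   # addI    ri, 0 => rj
    ("orI",     "alu",        1),   # orI     ri, 0 => rj
    ("lshiftI", "shifter",    1),   # lshiftI ri, 0 => rj
    ("multI",   "multiplier", 3),   # multI   ri, 1 => rj
]

def pick_copy_encoding(busy_units):
    """Return the cheapest candidate whose functional unit is free."""
    usable = [c for c in CANDIDATES if c[1] not in busy_units] or CANDIDATES
    return min(usable, key=lambda c: c[2])

print(pick_copy_encoding(set()))                  # ('i2i', 'copy', 1)
print(pick_copy_encoding({"copy", "alu"}))        # ('lshiftI', 'shifter', 1)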

The Goal
Want to automate generation of instruction selectors
- A machine description should also help with scheduling & allocation
[Diagram: a machine description feeds a back-end generator, which produces tables for a pattern-matching engine inside the back end: description-based retargeting]

The Big Picture
Need pattern matching techniques
- Must produce good code (some metric for good)
- Must run quickly
Our treewalk code generator (Lec. 24) ran quickly. How good was the code?

Tree: × (IDENT, IDENT)

Treewalk code:
  loadI  4      ⇒ r5
  loadAO r0, r5 ⇒ r6
  loadI  8      ⇒ r7
  loadAO r0, r7 ⇒ r8
  mult   r6, r8 ⇒ r9

Desired code:
  loadAI r0, 4  ⇒ r5
  loadAI r0, 8  ⇒ r6
  mult   r5, r6 ⇒ r7

Pretty easy to fix. (See the 1st digression in Ch. 7.)

The Big Picture
Need pattern matching techniques
- Must produce good code (some metric for good)
- Must run quickly
Our treewalk code generator (Lec. 24) ran quickly. How good was the code?

Tree: × (IDENT, NUMBER)

Treewalk code:
  loadI  4      ⇒ r5
  loadAO r0, r5 ⇒ r6
  loadI  2      ⇒ r7
  mult   r6, r7 ⇒ r8

Desired code:
  loadAI r0, 4 ⇒ r5
  multI  r5, 2 ⇒ r7

Must combine these: this is a nonlocal problem.

The Big Picture
Need pattern matching techniques
- Must produce good code (some metric for good)
- Must run quickly
Our treewalk code generator met the second criterion (Lec. 24). How did it do on the first?

Tree: × (IDENT, IDENT)

Treewalk code:
  loadI  ⟨label⟩ ⇒ r5
  loadI  4       ⇒ r6
  loadAO r5, r6  ⇒ r7
  loadI  ⟨label⟩ ⇒ r8
  loadI  4       ⇒ r9
  loadAO r8, r9  ⇒ r10
  mult   r7, r10 ⇒ r11

Desired code:
  loadI  4           ⇒ r5
  loadAI r5, ⟨label⟩ ⇒ r6
  loadAI r5, ⟨label⟩ ⇒ r7
  mult   r6, r7      ⇒ r8

Common offset. Again, a nonlocal problem.
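To see why a pure treewalk emits the weaker code, here is a minimal sketch of such a generator, assuming simple node classes and a register-numbering scheme of my own; it is not the generator from Lec. 24. Because each IDENT is translated in isolation, it reproduces the loadI/loadAO pattern of the first example above and can never fold an offset into a loadAI.

from dataclasses import dataclass

@dataclass
class Ident:            # a variable at a known offset from a base register
    base: str           # e.g. "r0", the ARP
    offset: int

@dataclass
class Times:            # multiplication node
    left: object
    right: object

class TreeWalk:
    def __init__(self):
        self.next_reg, self.code = 5, []
    def new_reg(self):
        r = f"r{self.next_reg}"; self.next_reg += 1; return r
    def gen(self, node):
        if isinstance(node, Ident):
            # Each IDENT is translated in isolation: materialize the offset,
            # then use loadAO.  The generator never sees enough context to
            # fold the offset into a loadAI.
            r_off, r_val = self.new_reg(), self.new_reg()
            self.code.append(f"loadI  {node.offset} => {r_off}")
            self.code.append(f"loadAO {node.base}, {r_off} => {r_val}")
            return r_val
        if isinstance(node, Times):
            a, b = self.gen(node.left), self.gen(node.right)
            r = self.new_reg()
            self.code.append(f"mult   {a}, {b} => {r}")
            return r
        raise ValueError(f"unknown node {node!r}")

tw = TreeWalk()
tw.gen(Times(Ident("r0", 4), Ident("r0", 8)))
print("\n".join(tw.code))    # reproduces the loadI/loadAO sequence above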

How do we perform this kind of matching?
Tree-oriented IR suggests pattern matching on trees
- Tree-patterns as input, matcher as output
- Each pattern maps to a target-machine instruction sequence
- Use dynamic programming or bottom-up rewrite systems
Linear IR suggests using some sort of string matching
- Strings as input, matcher as output
- Each string maps to a target-machine instruction sequence
- Use text matching (Aho-Corasick) or peephole matching
In practice, both work well; the matchers are quite different.
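For a taste of the tree-oriented approach, here is a minimal sketch of cost-based tiling over a tiny pattern set. The node encoding, the patterns, and the costs are illustrative assumptions, not a real dynamic-programming or BURS-style generator.

# A node is (operator, *children); leaves are ("REG",) and ("CONST",).
PATTERNS = [
    # (instruction, operator, child kinds, cost)
    ("loadI", "CONST", (),               1),
    ("add",   "PLUS",  ("REG", "REG"),   1),
    ("addI",  "PLUS",  ("REG", "CONST"), 1),
    ("mult",  "TIMES", ("REG", "REG"),   2),
    ("multI", "TIMES", ("REG", "CONST"), 2),
]

def best(tree):
    """Cheapest (cost, instruction) that reduces `tree` to a register."""
    op, kids = tree[0], tree[1:]
    if op == "REG":                       # already in a register
        return (0, None)
    options = []
    for insn, pat_op, kinds, cost in PATTERNS:
        if pat_op != op or len(kinds) != len(kids):
            continue
        total, feasible = cost, True
        for kind, kid in zip(kinds, kids):
            if kind == "CONST":           # folded into an immediate field
                feasible = feasible and kid[0] == "CONST"
            else:                         # operand must be in a register first
                total += best(kid)[0]
        if feasible:
            options.append((total, insn))
    return min(options)

# x * 2 with x already in a register: multI (cost 2) beats loadI + mult (cost 3)
print(best(("TIMES", ("REG",), ("CONST",))))      # (2, 'multI')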

Peephole Matching
Basic idea: the compiler can discover local improvements locally
- Look at a small set of adjacent operations
- Move a "peephole" over the code & search for improvement
Classic example was a store followed by a load:

Original code:
  storeAI r1    ⇒ r0, 8
  loadAI  r0, 8 ⇒ r15

Improved code:
  storeAI r1 ⇒ r0, 8
  i2i     r1 ⇒ r15
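A minimal sketch of that classic rewrite, assuming a simple tuple encoding for the operations (the encoding and the helper name are my own, not course infrastructure):

def peephole_store_load(code):
    """Rewrite a loadAI from an address the previous op just stored to."""
    out = []
    for op in code:
        prev = out[-1] if out else None
        if (prev is not None and prev[0] == "storeAI" and op[0] == "loadAI"
                and op[1:3] == prev[2:4]):           # same base register & offset
            out.append(("i2i", prev[1], op[3]))      # reuse the stored value
        else:
            out.append(op)
    return out

original = [("storeAI", "r1", "r0", 8),              # storeAI r1    => r0, 8
            ("loadAI",  "r0", 8, "r15")]             # loadAI  r0, 8 => r15
print(peephole_store_load(original))
# [('storeAI', 'r1', 'r0', 8), ('i2i', 'r1', 'r15')]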

Peephole Matching
Basic idea: the compiler can discover local improvements locally
- Look at a small set of adjacent operations
- Move a "peephole" over the code & search for improvement
Classic example was a store followed by a load.
Simple algebraic identities:

Original code:
  addI r2, 0  ⇒ r7
  mult r4, r7 ⇒ r10

Improved code:
  mult r4, r2 ⇒ r10

Peephole Matching
Basic idea: the compiler can discover local improvements locally
- Look at a small set of adjacent operations
- Move a "peephole" over the code & search for improvement
Classic example was a store followed by a load.
Simple algebraic identities.
Jump to a jump:

Original code:
  jumpI → L10
  L10: jumpI → L11

Improved code:
  jumpI → L11
  L10: jumpI → L11
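The other two examples can be written as equally small rewrites. The sketch below reuses the same made-up tuple encoding; the jump_targets table is an illustrative assumption, and a real simplifier would also confirm that the addI's result is dead before removing it.

def fold_add_zero(code):
    """addI rX, 0 => rY feeding a mult: use rX directly and drop the addI."""
    out = []
    for op in code:
        prev = out[-1] if out else None
        if (prev is not None and prev[0] == "addI" and prev[2] == 0
                and op[0] == "mult" and prev[3] in op[1:3]):
            out.pop()                    # the addI is now a dead effect
            # (a real simplifier would also confirm prev's result has
            #  no other uses before deleting it)
            a, b = (prev[1] if x == prev[3] else x for x in op[1:3])
            op = ("mult", a, b, op[3])
        out.append(op)
    return out

def thread_jumps(code, jump_targets):
    """jumpI -> L, where L: jumpI -> M, becomes jumpI -> M."""
    return [("jumpI", jump_targets.get(op[1], op[1])) if op[0] == "jumpI" else op
            for op in code]

print(fold_add_zero([("addI", "r2", 0, "r7"), ("mult", "r4", "r7", "r10")]))
# [('mult', 'r4', 'r2', 'r10')]
print(thread_jumps([("jumpI", "L10")], {"L10": "L11"}))
# [('jumpI', 'L11')]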

Peephole Matching
Implementing it
- Early systems used a limited set of hand-coded patterns
- Window size ensured quick processing
Modern peephole instruction selectors (Davidson)
- Break the problem into three tasks
- Apply symbolic interpretation & simplification systematically
  IR → Expander → LLIR → Simplifier → LLIR → Matcher → ASM

Peephole Matching
Expander (IR → LLIR)
- Turns IR code into a low-level IR (LLIR) such as RTL
- Operation-by-operation, template-driven rewriting
- LLIR form includes all direct effects (e.g., setting cc)
- Significant, albeit constant, expansion of size
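A minimal sketch of template-driven expansion, assuming a toy LLIR in which each operation is a (destination, right-hand side) pair and r0 plays the role of the ARP; this is an illustration, not RTL or the LLIR used in the course. On the IR operation (mult, 2, y, t1) it produces the first five LLIR operations of the worked example a few slides later.

class Expander:
    """Expand three-address IR into a toy LLIR of (destination, rhs) pairs."""
    def __init__(self):
        self.n, self.llir = 10, []
    def reg(self):
        r = f"r{self.n}"; self.n += 1; return r
    def value_of(self, arg):
        """Expand one operand: a literal constant or a variable in memory."""
        r = self.reg()
        if str(arg).isdigit():                        # constant
            self.llir.append((r, int(arg)))
            return r
        self.llir.append((r, f"@{arg}"))              # symbolic offset of the variable
        r_addr, r_val = self.reg(), self.reg()
        self.llir.append((r_addr, ("+", "r0", r)))    # r0 plays the role of the ARP
        self.llir.append((r_val, ("MEM", r_addr)))    # the load is an explicit effect
        return r_val
    def expand(self, ir):
        temps = {}
        for op, a, b, result in ir:                   # (op, arg1, arg2, result)
            x = temps.get(a) or self.value_of(a)
            y = temps.get(b) or self.value_of(b)
            r = self.reg()
            self.llir.append((r, ({"mult": "*", "sub": "-"}[op], x, y)))
            temps[result] = r
        return self.llir

for op in Expander().expand([("mult", "2", "y", "t1")]):
    print(op)      # the first five LLIR operations of the worked example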

Peephole Matching
Simplifier (LLIR → LLIR)
- Looks at the LLIR through a window and rewrites it
- Uses forward substitution, algebraic simplification, local constant propagation, and dead-effect elimination
- Performs local optimization within the window
This is the heart of the peephole system
- The benefit of peephole optimization shows up in this step
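A minimal sketch of the simplifier's two central rewrites, forward substitution and dead-effect elimination, over the same toy (destination, right-hand side) LLIR. For brevity it scans the whole block rather than a bounded window, substitutes only constants, symbols, and address adds, and omits algebraic simplification and constant folding; those choices are simplifying assumptions. On the expanded code from the worked example it yields the same five simplified operations the slides show.

def uses(x, reg):
    """Does expression or destination x mention register reg?"""
    return x == reg or (isinstance(x, tuple) and any(uses(e, reg) for e in x))

def subst(x, reg, val):
    """Replace reg by val inside x (forward substitution)."""
    if x == reg:
        return val
    if isinstance(x, tuple):
        return tuple(subst(e, reg, val) for e in x)
    return x

def simplify(llir):
    code = [list(op) for op in llir]                  # each op is [dst, rhs]
    i = 0
    while i < len(code):
        dst, rhs = code[i]
        is_store = isinstance(dst, tuple)             # dst = ("MEM", address)
        # Substitute simple definitions (constants, symbols, address adds)
        # into the operations that follow.
        if not is_store and (not isinstance(rhs, tuple) or rhs[0] == "+"):
            for op in code[i + 1:]:
                op[0] = subst(op[0], dst, rhs)        # store addresses
                op[1] = subst(op[1], dst, rhs)        # right-hand sides
        # Dead-effect elimination: drop a definition nobody reads any more.
        if not is_store and not any(uses(d, dst) or uses(r, dst)
                                    for d, r in code[i + 1:]):
            del code[i]
            continue
        i += 1
    return [tuple(op) for op in code]

llir = [("r10", 2), ("r11", "@y"), ("r12", ("+", "r0", "r11")),
        ("r13", ("MEM", "r12")), ("r14", ("*", "r10", "r13")),
        ("r15", "@x"), ("r16", ("+", "r0", "r15")), ("r17", ("MEM", "r16")),
        ("r18", ("-", "r17", "r14")), ("r19", "@w"),
        ("r20", ("+", "r0", "r19")), (("MEM", "r20"), "r18")]
for op in simplify(llir):
    print(op)     # the five simplified operations from the worked example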

Peephole Matching
Matcher (LLIR → ASM)
- Compares simplified LLIR against a library of patterns
- Picks the low-cost pattern that captures the effects
- Must preserve LLIR effects, may add new ones (e.g., set cc)
- Generates the assembly-code output
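A minimal sketch of the matching step in the same toy encoding: each simplified LLIR operation is mapped to one ILOC instruction by its shape. Assuming every operation matches exactly one pattern is a deliberate simplification; a real matcher weighs costs among alternative patterns and must account for every LLIR effect.

def match_op(op):
    """Map one simplified LLIR operation to an ILOC instruction."""
    dst, rhs = op
    if isinstance(dst, tuple) and dst[0] == "MEM":    # MEM(r0 + @w) <- r18
        _, base, offset = dst[1]                      # the (+, base, offset) address
        return f"storeAI {rhs} => {base}, {offset}"
    if isinstance(rhs, tuple):
        if rhs[0] == "MEM":                           # r13 <- MEM(r0 + @y)
            _, base, offset = rhs[1]
            return f"loadAI {base}, {offset} => {dst}"
        if rhs[0] == "*" and isinstance(rhs[1], int): # r14 <- 2 * r13
            return f"multI {rhs[2]}, {rhs[1]} => {dst}"
        if rhs[0] == "-":                             # r18 <- r17 - r14
            return f"sub {rhs[1]}, {rhs[2]} => {dst}"
    return f"loadI {rhs} => {dst}"                    # leftover constant or symbol

simplified = [("r13", ("MEM", ("+", "r0", "@y"))),
              ("r14", ("*", 2, "r13")),
              ("r17", ("MEM", ("+", "r0", "@x"))),
              ("r18", ("-", "r17", "r14")),
              (("MEM", ("+", "r0", "@w")), "r18")]
for op in simplified:
    print(match_op(op))     # the ILOC sequence shown on the Example: Match slide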

Example: Expand

Original IR code:
  Op    Arg1  Arg2  Result
  mult  2     y     t1
  sub   x     t1    w

LLIR code:
  r10 ← 2
  r11 ← @y
  r12 ← r0 + r11
  r13 ← MEM(r12)
  r14 ← r10 × r13
  r15 ← @x
  r16 ← r0 + r15
  r17 ← MEM(r16)
  r18 ← r17 - r14
  r19 ← @w
  r20 ← r0 + r19
  MEM(r20) ← r18

Example: Simplify

LLIR code:
  r10 ← 2
  r11 ← @y
  r12 ← r0 + r11
  r13 ← MEM(r12)
  r14 ← r10 × r13
  r15 ← @x
  r16 ← r0 + r15
  r17 ← MEM(r16)
  r18 ← r17 - r14
  r19 ← @w
  r20 ← r0 + r19
  MEM(r20) ← r18

Simplified LLIR code:
  r13 ← MEM(r0 + @y)
  r14 ← 2 × r13
  r17 ← MEM(r0 + @x)
  r18 ← r17 - r14
  MEM(r0 + @w) ← r18

Example: Match

LLIR code:
  r13 ← MEM(r0 + @y)
  r14 ← 2 × r13
  r17 ← MEM(r0 + @x)
  r18 ← r17 - r14
  MEM(r0 + @w) ← r18

ILOC code:
  loadAI  r0, @y   ⇒ r13
  multI   r13, 2   ⇒ r14
  loadAI  r0, @x   ⇒ r17
  sub     r17, r14 ⇒ r18
  storeAI r18      ⇒ r0, @w

Introduced all memory operations & temporary names. Turned out pretty good code.

Steps of the Simplifier (3-operation window)

LLIR code:
  r10 ← 2
  r11 ← @y
  r12 ← r0 + r11
  r13 ← MEM(r12)
  r14 ← r10 × r13
  r15 ← @x
  r16 ← r0 + r15
  r17 ← MEM(r16)
  r18 ← r17 - r14
  r19 ← @w
  r20 ← r0 + r19
  MEM(r20) ← r18

Contents of the window at each step:
  1.  r10 ← 2;  r11 ← @y;  r12 ← r0 + r11
  2.  r10 ← 2;  r12 ← r0 + @y;  r13 ← MEM(r12)
  3.  r10 ← 2;  r13 ← MEM(r0 + @y);  r14 ← r10 × r13
  4.  r13 ← MEM(r0 + @y);  r14 ← 2 × r13;  r15 ← @x
  5.  r14 ← 2 × r13;  r15 ← @x;  r16 ← r0 + r15      (the 1st op has rolled out of the window)
  6.  r14 ← 2 × r13;  r16 ← r0 + @x;  r17 ← MEM(r16)
  7.  r14 ← 2 × r13;  r17 ← MEM(r0 + @x);  r18 ← r17 - r14
  8.  r17 ← MEM(r0 + @x);  r18 ← r17 - r14;  r19 ← @w
  9.  r18 ← r17 - r14;  r19 ← @w;  r20 ← r0 + r19
  10. r18 ← r17 - r14;  r20 ← r0 + @w;  MEM(r20) ← r18
  11. r18 ← r17 - r14;  MEM(r0 + @w) ← r18

Making It All Work
Details
- LLIR is largely machine independent (RTL)
- Target machine described as LLIR → ASM patterns
- Actual pattern matching
  - Use a hand-coded pattern matcher (gcc)
  - Turn patterns into a grammar & use an LR parser (VPO)
Several important compilers use this technology.
It seems to produce good, portable instruction selectors.
The key strength appears to be late, low-level optimization.
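To suggest what "target machine described as LLIR → ASM patterns" can look like, here is a minimal table-driven sketch in the same toy encoding. The table format, the shape language, and the templates are illustrative assumptions, not gcc's machine description language or VPO's grammar; the point is that retargeting amounts to swapping in a different DESCRIPTION table while the matcher stays the same.

DESCRIPTION = [
    # (dst shape, rhs shape, assembly template; {n} is the n-th matched leaf)
    ("reg", ("MEM", ("+", "reg", "imm")), "loadAI {1}, {2} => {0}"),
    ("reg", ("*", "imm", "reg"),          "multI {2}, {1} => {0}"),
    ("reg", ("-", "reg", "reg"),          "sub {1}, {2} => {0}"),
    (("MEM", ("+", "reg", "imm")), "reg", "storeAI {2} => {0}, {1}"),
]

def match_shape(shape, value, leaves):
    """Structural match; collects matched leaf operands in order."""
    if shape in ("reg", "imm"):
        if isinstance(value, tuple):
            return False
        kind = "imm" if isinstance(value, int) or str(value).startswith("@") else "reg"
        if kind == shape:
            leaves.append(value)
            return True
        return False
    return (isinstance(value, tuple) and value[0] == shape[0]
            and len(value) == len(shape)
            and all(match_shape(s, v, leaves) for s, v in zip(shape[1:], value[1:])))

def emit(op, description=DESCRIPTION):
    for dst_shape, rhs_shape, template in description:
        leaves = []
        if match_shape(dst_shape, op[0], leaves) and match_shape(rhs_shape, op[1], leaves):
            return template.format(*leaves)
    raise ValueError(f"no pattern covers {op}")

print(emit(("r14", ("*", 2, "r13"))))                 # multI r13, 2 => r14
print(emit((("MEM", ("+", "r0", "@w")), "r18")))      # storeAI r18 => r0, @w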