Intermediate Representations
COMP 412, Fall 2010

Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Most of the material in this lecture comes from Chapter 5 of EaC.

Where In The Course Are We?

Obvious answer: at the start of Chapter 5 in EaC.
More important answer: we are on the cusp of the art, science, & engineering of compilation.
- Scanning & parsing are applications of automata theory.
- Context-sensitive analysis, as covered in class, is mostly software engineering.
- The mid-section of the course focuses on issues where the compiler writer needs to choose among alternatives.
  - The choices matter; they affect the quality of compiled code.
  - There may be no "best answer" or "best practice".
To my mind, the fun begins at this point.

Intermediate Representations

- Front end: produces an intermediate representation (IR) for the program.
- Middle end: transforms the IR into an equivalent IR that runs more efficiently. It usually consists of several passes.
- Back end: transforms the IR into native code.

The IR encodes the compiler's knowledge of the program.

[Diagram: Source Code -> Front End -> IR -> Middle End -> IR -> Back End -> Target Code]

Intermediate Representations

Decisions in IR design affect the speed and efficiency of the compiler.
Some important IR properties:
- Ease of generation
- Ease of manipulation
- Procedure size
- Freedom of expression
- Level of abstraction

The importance of different properties varies between compilers, so selecting an appropriate IR for a compiler is critical.

Types of Intermediate Representations

Three major categories:

Structural
- Graphically oriented
- Heavily used in source-to-source translators
- Tend to be large
- Examples: trees, DAGs

Linear
- Pseudo-code for an abstract machine
- Level of abstraction varies
- Simple, compact data structures
- Easier to rearrange
- Examples: three-address code, stack machine code

Hybrid
- Combination of graphs and linear code
- Example: control-flow graph

Level of Abstraction

The level of detail exposed in an IR influences the profitability and feasibility of different optimizations.

Two different representations of an array reference A(i,j):

High-level AST (good for memory disambiguation):

    subscript(A, i, j)

Low-level linear code (good for address calculation):

    loadI 1      => r1
    sub   rj, r1 => r2
    loadI 10     => r3
    mult  r2, r3 => r4
    sub   ri, r1 => r5
    add   r4, r5 => r6
    loadI @A     => r7
    add   r7, r6 => r8
    load  r8     => rAij
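As a sketch of what the low-level code computes, here is the same address calculation in C. It assumes a 10x10 array stored column-major with 1-based indices (which is what the multiplier of 10 on (j - 1) implies); the function name and test values are invented for illustration.

    #include <stdio.h>

    /* Offset of A(i,j) in a 10x10 column-major array with 1-based
       indices: (j - 1) * 10 + (i - 1). This mirrors the loadI/sub/
       mult/add sequence in the low-level code above.               */
    static int offset_of(int i, int j) {
        return (j - 1) * 10 + (i - 1);
    }

    int main(void) {
        double A[100];                  /* storage for the 10x10 array   */
        for (int k = 0; k < 100; k++)   /* fill with recognizable values */
            A[k] = k;
        int i = 3, j = 4;
        /* base + offset plays the role of r7 + r6; the indexed
           access plays the role of "load r8 => rAij".            */
        double Aij = A[offset_of(i, j)];
        printf("A(%d,%d) = %g\n", i, j, Aij);
        return 0;
    }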

Level of Abstraction

Structural IRs are usually considered high-level; linear IRs are usually considered low-level. Not necessarily true:

[Figure: a low-level AST that spells out the full address arithmetic for the array reference, versus the single high-level linear instruction "loadArray A, i, j"]

In Chapter 11 of EaC, we will see trees that have a lower level of abstraction than the machine code!

Abstract Syntax Tree

An abstract syntax tree (AST) is the procedure's parse tree with the nodes for most non-terminals removed.

Example: x - 2 * y

[Figure: AST with "-" at the root, "x" as its left child, and "*" over "2" and "y" as its right child]

Can use a linearized form of the tree, which is easier to manipulate than pointers:
- x 2 y * -  (postfix form)
- - * 2 y x  (prefix form)

S-expressions (Scheme, Lisp) are (essentially) ASTs.
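A minimal sketch of this AST and its postfix linearization in C; the node layout and labels are invented for illustration, not taken from EaC.

    #include <stdio.h>

    /* A leaf holds a name or literal; an interior node holds an operator. */
    typedef struct Node {
        const char  *label;          /* "x", "2", "y", "*", "-" */
        struct Node *left, *right;   /* NULL for leaves         */
    } Node;

    /* Postfix walk: left subtree, right subtree, then the operator.
       On the tree below this prints "x 2 y * -".                     */
    static void postfix(const Node *n) {
        if (n == NULL) return;
        postfix(n->left);
        postfix(n->right);
        printf("%s ", n->label);
    }

    int main(void) {
        Node x   = {"x", NULL, NULL};
        Node two = {"2", NULL, NULL};
        Node y   = {"y", NULL, NULL};
        Node mul = {"*", &two, &y};
        Node sub = {"-", &x, &mul};   /* root of the AST for x - 2 * y */
        postfix(&sub);
        printf("\n");
        return 0;
    }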

Directed Acyclic Graph

A directed acyclic graph (DAG) is an AST with a unique node for each value.
- Makes sharing explicit
- Encodes redundancy

Example:

    z ← x - 2 * y
    w ← x / 2

[Figure: a single DAG for both statements; the nodes for x and 2 are shared between the two expressions]

With two copies of the same expression, the compiler might be able to arrange the code to evaluate it only once.
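One way to make sharing explicit is to memoize node construction: before building a node, look for an existing node with the same operator and children. A minimal sketch, assuming a fixed-size node pool and a linear search where a real compiler would hash:

    #include <stdio.h>
    #include <string.h>

    typedef struct {
        const char *op;            /* operator or leaf name          */
        int         left, right;   /* indices of children, -1 = none */
    } DagNode;

    static DagNode nodes[256];
    static int     nnodes = 0;

    /* Return an existing node with this (op, left, right) if one
       exists; otherwise create it. Reuse turns the AST into a DAG. */
    static int dag(const char *op, int left, int right) {
        for (int k = 0; k < nnodes; k++)
            if (strcmp(nodes[k].op, op) == 0 &&
                nodes[k].left == left && nodes[k].right == right)
                return k;                      /* shared node */
        nodes[nnodes] = (DagNode){op, left, right};
        return nnodes++;
    }

    int main(void) {
        /* z <- x - 2 * y ;  w <- x / 2 */
        int x = dag("x", -1, -1), two = dag("2", -1, -1), y = dag("y", -1, -1);
        int z = dag("-", x, dag("*", two, y));
        int w = dag("/", x, two);     /* reuses the nodes for x and 2 */
        printf("%d nodes for z=%d, w=%d\n", nnodes, z, w);  /* 6 nodes */
        return 0;
    }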

Stack Machine Code

Originally used for stack-based computers, now Java.

Example: x - 2 * y becomes

    push x
    push 2
    push y
    multiply
    subtract

Advantages:
- Compact form; introduced names are implicit, not explicit (implicit names take up no space, where explicit ones do!)
- Simple to generate and execute code
- Useful where code is transmitted over slow communication links (the net)
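A minimal sketch of an evaluator for this sequence, with invented opcode names; operands are implicit because every operation reads from and writes to the top of the stack.

    #include <stdio.h>

    enum Op { PUSH, MULTIPLY, SUBTRACT };       /* tiny instruction set */
    typedef struct { enum Op op; int val; } Insn;

    /* Evaluate a straight-line stack-machine program. */
    static int eval(const Insn *code, int n) {
        int stack[64], sp = 0;
        for (int k = 0; k < n; k++) {
            switch (code[k].op) {
            case PUSH:     stack[sp++] = code[k].val;       break;
            case MULTIPLY: sp--; stack[sp-1] *= stack[sp];  break;
            case SUBTRACT: sp--; stack[sp-1] -= stack[sp];  break;
            }
        }
        return stack[0];
    }

    int main(void) {
        int x = 10, y = 3;
        /* push x; push 2; push y; multiply; subtract  ==>  x - 2 * y */
        Insn prog[] = {{PUSH,x}, {PUSH,2}, {PUSH,y}, {MULTIPLY,0}, {SUBTRACT,0}};
        printf("%d\n", eval(prog, 5));   /* prints 4 */
        return 0;
    }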

Three Address Code

Several different representations of three address code exist. In general, three address code has statements of the form

    x ← y op z

with one operator (op) and, at most, three names (x, y, and z).

Example: z ← x - 2 * y becomes

    t ← 2 * y
    z ← x - t

Advantages:
- Resembles many real machines
- Introduces a new set of names
- Compact form

Three Address Code: Quadruples

Naïve representation of three address code:
- Table of k * 4 small integers
- Simple record structure
- Easy to reorder
- Explicit names

Quadruples:            RISC assembly code:

    load  1  y             load  r1, y
    loadI 2  2             loadI r2, 2
    mult  3  2  1          mult  r3, r2, r1
    load  4  x             load  r4, x
    sub   5  4  3          sub   r5, r4, r3

The original FORTRAN compiler used "quads".
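A sketch of the quad table as records of four small integers, laid out as on the slide (op, result, arg1, arg2). The opcode encoding and the symbol-table indices for x and y are assumptions made for illustration.

    #include <stdio.h>

    /* One quadruple: an opcode plus up to three small-integer names. */
    typedef struct { int op, result, arg1, arg2; } Quad;

    enum { OP_LOAD = 1, OP_LOADI, OP_MULT, OP_SUB };

    int main(void) {
        /* z <- x - 2 * y. Because results are explicit, rows can be
           reordered freely as long as definitions precede uses.     */
        enum { Y = 20, X = 21 };            /* assumed name indices  */
        Quad code[] = {
            { OP_LOAD,  1, Y, 0 },          /* load  1  y    */
            { OP_LOADI, 2, 2, 0 },          /* loadI 2  2    */
            { OP_MULT,  3, 2, 1 },          /* mult  3  2  1 */
            { OP_LOAD,  4, X, 0 },          /* load  4  x    */
            { OP_SUB,   5, 4, 3 },          /* sub   5  4  3 */
        };
        for (int k = 0; k < 5; k++)
            printf("op=%d r%d <- (%d, %d)\n", code[k].op,
                   code[k].result, code[k].arg1, code[k].arg2);
        return 0;
    }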

Three Address Code: Triples

- Index used as implicit name
- 25% less space consumed than quads
- Much harder to reorder

    (1) load  y
    (2) loadI 2
    (3) mult  (1) (2)
    (4) load  x
    (5) sub   (4) (3)

Implicit names occupy no space. Remember, for a long time, 640Kb was a lot of RAM.
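The same code as triples, sketched in C under the same assumed encodings as the quad example: the result field disappears because a triple's name is its own row index, which is exactly why reordering is hard.

    #include <stdio.h>

    /* One triple: an opcode and two operands, no result field.
       An operand referring to an earlier result stores that row's
       number, so moving a row renumbers every later reference.    */
    typedef struct { int op, arg1, arg2; } Triple;

    enum { OP_LOAD = 1, OP_LOADI, OP_MULT, OP_SUB };

    int main(void) {
        enum { Y = 20, X = 21 };             /* assumed name indices */
        Triple code[] = {
            /* (1) */ { OP_LOAD,  Y, 0 },
            /* (2) */ { OP_LOADI, 2, 0 },
            /* (3) */ { OP_MULT,  1, 2 },    /* operands: (1), (2)   */
            /* (4) */ { OP_LOAD,  X, 0 },
            /* (5) */ { OP_SUB,   4, 3 },    /* operands: (4), (3)   */
        };
        for (int k = 0; k < 5; k++)
            printf("(%d) op=%d %d %d\n", k + 1,
                   code[k].op, code[k].arg1, code[k].arg2);
        return 0;
    }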

Three Address Code: Indirect Triples

- List the first triple in each statement
- Implicit name space
- Uses more space than triples, but easier to reorder

Stmt list:    Triples:

    (100)         (100) load  y
    (105)         (101) loadI 2
                  (102) mult  (100) (101)
                  (103) load  x
                  (104) sub   (103) (102)

The major tradeoff between quads and triples is compactness versus ease of manipulation:
- In the past, compile-time space was critical
- Today, speed may be more important

Two Address Code

Allows statements of the form x ← x op y, with one operator (op) and, at most, two names (x and y).

Example: z ← x - 2 * y becomes

    t1 ← 2
    t2 ← load y
    t2 ← t2 * t1
    z  ← load x
    z  ← z - t2

Can be very compact.
Problems:
- Machines no longer rely on destructive operations
- Difficult name space: destructive operations make reuse hard
It was a good model for machines with destructive ops (PDP-11).

Control-flow Graph

Models the transfer of control in the procedure:
- Nodes in the graph are basic blocks: maximal-length sequences of straight-line code. Blocks can be represented with quads or any other linear representation.
- Edges in the graph represent control flow.

Example:

[Figure: CFG in which the test "if (x = y)" branches to either "a ← 2; b ← 5" or "a ← 3; b ← 4", and both branches merge at "c ← a * b"]
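A minimal sketch of this CFG in C. The block contents are stand-in strings where a real compiler would store a list of quads; the structure and names are invented for illustration.

    #include <stdio.h>

    /* A basic block: a maximal straight-line sequence plus its
       out-edges. A conditional block has two successors; a
       straight-line block has one.                              */
    typedef struct Block {
        const char   *code;           /* stand-in for a list of quads */
        struct Block *succ[2];        /* up to two successors         */
    } Block;

    int main(void) {
        Block merge = {"c <- a * b",       {NULL, NULL}};
        Block then_ = {"a <- 2; b <- 5",   {&merge, NULL}};
        Block else_ = {"a <- 3; b <- 4",   {&merge, NULL}};
        Block entry = {"if (x = y)",       {&then_, &else_}};
        printf("entry -> {%s} | {%s}\n",
               entry.succ[0]->code, entry.succ[1]->code);
        return 0;
    }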

Static Single Assignment Form

The main idea: each name is defined exactly once; introduce φ-functions to make it work.

Original:

    x ← …
    y ← …
    while (x < k)
        x ← x + 1
        y ← y + x

SSA-form:

    x0 ← …
    y0 ← …
    if (x0 >= k) goto next
    loop: x1 ← φ(x0, x2)
          y1 ← φ(y0, y2)
          x2 ← x1 + 1
          y2 ← y1 + x2
          if (x2 < k) goto loop
    next: …

Strengths of SSA-form:
- Sharper analysis
- φ-functions give hints about placement
- (Sometimes) faster algorithms

Using Multiple Representations

Repeatedly lower the level of the intermediate representation; each intermediate representation is suited towards certain optimizations.

[Diagram: Source Code -> Front End -> IR 1 -> Middle End -> IR 2 -> Middle End -> IR 3 -> Back End -> Target Code]

Example: the Open64 compiler's WHIRL intermediate format consists of 5 different IRs that are progressively more detailed and less abstract.

Memory Models

Two major models:

Register-to-register model
- Keep all values that can legally be stored in a register in registers
- Ignore machine limitations on the number of registers
- Compiler back-end must insert loads and stores

Memory-to-memory model
- Keep all values in memory
- Only promote values to registers directly before they are used
- Compiler back-end can remove loads and stores

Compilers for RISC machines usually use the register-to-register model:
- Reflects the programming model
- Easier to determine when registers are used

The Rest of the Story…

Representing the code is only part of an IR. There are other necessary components:
- Symbol table
- Constant table
  - Representation, type
  - Storage class, offset
- Storage map
  - Overall storage layout
  - Overlap information
  - Virtual register assignments

Symbol Tables

The classic approach to building a symbol table uses hashing. Personal preference: a two-table scheme.
- A sparse index to reduce the chance of collisions
- A dense table to hold the actual data; easy to expand, to traverse, to read & write from/to files
- Use chains in the index to handle collisions (a collision occurs when h() returns a slot in the sparse index that is already full)

[Figure: h("fee") and h("foe") map into the sparse index, whose entries point to records ("fee | integer | scalar | …", "fie | char * | array | …", "fum | float | scalar | …") in the dense table, which grows stack-like via NextSlot]

See §B.3 in EaC for a longer explanation.
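A minimal sketch of the two-table scheme in C: a sparse index of chain heads plus a dense table that grows like a stack. The record layout, table sizes, and hash function are simplified placeholders, not EaC's implementation.

    #include <stdio.h>
    #include <string.h>

    #define INDEX_SIZE 1024           /* sparse: many more slots than names */
    #define TABLE_SIZE 256            /* dense: one record per actual name  */

    typedef struct {
        const char *name;             /* stand-in for name | type | kind... */
        int         next;             /* collision chain link, -1 ends it   */
    } Record;

    static int    sparse[INDEX_SIZE]; /* slot -> index into dense, -1 empty */
    static Record dense[TABLE_SIZE];  /* grows stack-like via NextSlot      */
    static int    NextSlot = 0;

    static unsigned h(const char *s) {           /* placeholder hash */
        unsigned v = 0;
        while (*s) v = v * 31 + (unsigned char)*s++;
        return v % INDEX_SIZE;
    }

    static int insert(const char *name) {
        unsigned slot = h(name);
        dense[NextSlot] = (Record){name, sparse[slot]}; /* chain on collision */
        sparse[slot] = NextSlot;
        return NextSlot++;
    }

    static int lookup(const char *name) {
        for (int k = sparse[h(name)]; k != -1; k = dense[k].next)
            if (strcmp(dense[k].name, name) == 0) return k;
        return -1;
    }

    int main(void) {
        memset(sparse, -1, sizeof sparse);        /* all slots empty */
        insert("fee"); insert("fie"); insert("foe"); insert("fum");
        printf("fie is at dense slot %d\n", lookup("fie"));
        return 0;
    }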

Hash-less Symbol Tables

The classic approach to building a symbol table uses hashing (the two-table scheme above is my favorite hash table organization). There is some concern about worst-case behavior:
- Collisions in the hash function can lead to linear search
- Some authors advocate a "perfect" hash for keyword lookup

Automata theory lets us avoid this worst-case behavior.

Hash-less Symbol Tables

One alternative is Paige & Cai's multiset discrimination:
- Order the name space offline
- Assign indices to each name
- Replace the names in the input with their encoded indices

Using DFA techniques, we can build a guaranteed linear-time replacement for the hash function h. The DFA that results from a list of words is acyclic:
- The RE looks like r1 | r2 | r3 | … | rk
- Could process the input twice, once to build the DFA, once to use it

We can do even better. (Digression on page 241 of EaC)

Hash-less Symbol Tables

Automata theory lets us avoid the worst-case behavior of hashing: replace the hash function, h, and the sparse index with an efficient direct map, d, …

[Figure: d("fee"), d("fie"), d("foe"), and d("fum") map directly to the records in the dense table; NextSlot still gives stack-like growth]

Hash-less Symbol Tables

Incremental construction of an acyclic DFA:
- To add a word, run it through the DFA
- At some point, it will face a transition to the error state
- At that point, start building states & transitions to recognize it (see the sketch below)

This requires a memory access per character in the key:
- If the DFA grows too large, memory access costs become excessive
- For small key sets (e.g., the names in a procedure), this is not a problem

Optimizations:
- The last state on each path can be explicit
  - Substantial reduction in memory costs
  - Instantiate when the path is lengthened
- Trade off granularity against the size of the state representation
- Encode capitalization separately (bit strings tied to final state?)
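A minimal sketch of the incremental construction as a trie in C (one state per node, one transition per byte): adding a word walks existing transitions until one is missing, then builds new states from that point. This is an illustration of the idea, not EaC's implementation, and it pays the memory cost the slide warns about (256 pointers per state).

    #include <stdio.h>
    #include <stdlib.h>

    /* One DFA state: a transition per byte, plus the symbol-table
       index if this state accepts a complete name (-1 otherwise). */
    typedef struct State {
        struct State *next[256];
        int           name_index;
    } State;

    static State *new_state(void) {
        State *s = calloc(1, sizeof *s);   /* all transitions -> error */
        s->name_index = -1;
        return s;
    }

    /* Walk the word through the DFA; where a transition is missing
       (would go to the error state), start building states.        */
    static int add_word(State *root, const char *w, int index) {
        State *s = root;
        for (; *w; w++) {
            unsigned char c = (unsigned char)*w;
            if (s->next[c] == NULL) s->next[c] = new_state();
            s = s->next[c];
        }
        if (s->name_index == -1) s->name_index = index;
        return s->name_index;              /* the direct map d(word) */
    }

    int main(void) {
        State *root = new_state();
        int n = 0;
        printf("fee -> %d\n", add_word(root, "fee", n++));
        printf("fie -> %d\n", add_word(root, "fie", n++));
        printf("fee -> %d\n", add_word(root, "fee", n));  /* already mapped: 0 */
        return 0;
    }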