CSC 8505 Compiler Construction Intermediate Representations.


2 The Role of Intermediate Code
[Diagram: compiler pipeline. source code → lexical analysis → (tokens) → syntax analysis → static checking → intermediate code generation → (intermediate code) → final code generation → final code]
Intermediate Reps.

3 Why Intermediate Code?
Closer to target language.
– simplifies code generation.
Machine-independent.
– simplifies retargeting of the compiler.
– allows a variety of optimizations to be implemented in a machine-independent way.
Many compilers use several different intermediate representations.

4 Different Kinds of IRs
Graphical IRs: the program structure is represented as a graph (or tree).
– Examples: parse trees, syntax trees, DAGs.
Linear IRs: the program is represented as a list of instructions for some virtual machine.
– Example: three-address code.
Hybrid IRs: combine elements of graphical and linear IRs.
– Example: control flow graphs with three-address code.

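5 Types of Intermediate Languages
Graphical Representations.
– Consider the assignment a := b * -c + b * -c.
[Figure: syntax tree (left) and DAG (right) for this assignment. In the syntax tree, assign has children a and +, and + has two separate * subtrees, each with children b and uminus c. In the DAG, both operands of + are the single shared node * with children b and uminus c.]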

6 Graphical IRs 1: Parse Trees
A parse tree is a tree representation of a derivation during parsing.
Constructing a parse tree:
– The root is the start symbol S of the grammar.
– Given a parse tree for αXβ, if the next derivation step is αXβ ⇒ αγ1…γnβ, then the new parse tree is obtained by giving the node for X the children γ1, …, γn.

7 Graphical IRs 2: Abstract Syntax Trees (AST)
A syntax tree shows the structure of a program by abstracting away irrelevant details from a parse tree.
– Each node represents a computation to be performed;
– the children of the node represent what that computation is performed on.

8 Abstract Syntax Trees: Example
Grammar:
  E → E + T | T
  T → T * F | F
  F → ( E ) | id
Input: id + id * id
[Figure: parse tree and syntax tree for the input. The syntax tree has + at the root with children id and *, where * has children id and id.]

9 Syntax Trees: Structure
Expressions:
– leaves are identifiers or constants;
– internal nodes are labeled with operators;
– the children of a node are its operands.
Statements:
– a node’s label indicates what kind of statement it is;
– the children correspond to the components of the statement.
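The expression part of this structure can be sketched in a few lines of Python. This is only an illustration: the class names Leaf and BinOp and the helper show are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """A leaf: an identifier or a constant."""
    name: str


@dataclass(frozen=True)
class BinOp:
    """An internal node labeled with an operator; children are its operands."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def show(node: Node) -> str:
    """Render the tree as a fully parenthesized expression."""
    if isinstance(node, Leaf):
        return node.name
    return f"({show(node.left)} {node.op} {show(node.right)})"


# id + id * id: '*' binds tighter, so it sits below '+' in the tree.
tree = BinOp("+", Leaf("id"), BinOp("*", Leaf("id"), Leaf("id")))
print(show(tree))
```

Statement nodes would follow the same pattern, with the node label naming the statement kind and the children holding its components.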

10 Graphical IRs 3: Directed Acyclic Graphs (DAGs)
A DAG is a contraction of an AST that avoids duplication of nodes. This:
– reduces compiler memory requirements;
– exposes redundancies.
E.g., for the expression (x+y)*(x+y):
– AST: the root * has two separate + subtrees, each with leaves x and y.
– DAG: both operands of * are the same + node, with leaves x and y.
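One common way to obtain such a DAG is to look up every node in a table before creating it, so that structurally identical subexpressions share one node. The sketch below assumes this table-based approach; the class DagBuilder and its methods are hypothetical names, and the slides do not say how the DAG is built.

```python
class DagBuilder:
    """Builds expression DAGs by reusing structurally identical nodes."""

    def __init__(self):
        self.nodes = []   # node id -> ("leaf", name) or (op, left id, right id)
        self.index = {}   # node description -> node id

    def _intern(self, key):
        # Return the existing node for this description, or create one.
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._intern(("leaf", name))

    def binop(self, op, left, right):
        return self._intern((op, left, right))


b = DagBuilder()
lhs = b.binop("+", b.leaf("x"), b.leaf("y"))
rhs = b.binop("+", b.leaf("x"), b.leaf("y"))  # found in the table: same node as lhs
root = b.binop("*", lhs, rhs)
print(lhs == rhs, len(b.nodes))
```

For (x+y)*(x+y) the builder ends up with four nodes (x, y, +, *), where the corresponding AST has seven.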

11 Linear IRs
A linear IR consists of a sequence of instructions that execute in order.
– “machine-independent assembly code”
Instructions may contain multiple operations, which (if present) execute in parallel.
Linear IRs often form a starting point for hybrid representations (e.g., control flow graphs).

12 Linear IR 1: Three Address Code
Instructions are of the form ‘x = y op z’, where x, y, z are variables, constants, or “temporaries”.
At most one operator is allowed on the RHS, so there are no “built-up” expressions. Instead, expressions are computed using temporaries (compiler-generated variables).
The specific set of operators represented, and their level of abstraction, can vary widely.

13 Three Address Code: Example
Source:
  if ( x + y*z > x*y + z) a = 0;
Three Address Code:
  t1 = y*z
  t2 = x+t1      // x + y*z
  t3 = x*y
  t4 = t3+z      // x*y + z
  if (t2 <= t4) goto L
  a = 0
L:

14 Three Address Code
Statements have the general form x := y op z.
No built-up arithmetic expressions are allowed. As a result, x := y + z * w should be represented as:
  t1 := z * w
  t2 := y + t1
  x := t2

15 Three Address Code
Given the syntax tree or the DAG of the graphical representation, we can easily derive three-address code for assignments as above. In fact, three-address code is a linearization of the tree.
Three-address code is useful because it is close to machine language, simple, and optimizable.
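The linearization can be sketched as a post-order walk that emits one instruction per internal node. This is a minimal illustration under assumed conventions: trees are nested tuples (op, left, right) with strings as leaves, and the function name gen_tac is hypothetical.

```python
import itertools


def gen_tac(target, tree):
    """Emit three-address code for `target := tree` by post-order traversal.

    A tree is a string (variable or constant) or a tuple (op, left, right).
    """
    code = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):
            return node                      # leaves need no code
        op, left, right = node
        left_name = walk(left)               # operands first...
        right_name = walk(right)
        temp = f"t{next(counter)}"           # ...then the operator itself
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code


for line in gen_tac("x", ("+", "y", ("*", "z", "w"))):
    print(line)
```

For x := y + z * w this yields the same three statements as the hand-written sequence shown earlier.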

16 Example of 3-address code
For the assignment a := b * -c + b * -c.
Code for the DAG:
  t1 := - c
  t2 := b * t1
  t5 := t2 + t2
  a := t5
Code for the syntax tree:
  t1 := - c
  t2 := b * t1
  t3 := - c
  t4 := b * t3
  t5 := t2 + t4
  a := t5

17 Types of Three-Address Statements
  Assignment statement (binary):  x := y op z
  Assignment statement (unary):   x := op z
  Copy statement:                 x := z
  Unconditional jump:             goto L
  Conditional jump:               if x relop y goto L
  Stack operations:               push / pop

18 Types of Three-Address Statements
Procedure call:
  param x1
  param x2
  …
  param xn
  call p, n
Indexed assignments:
  x := y[i]
  x[i] := y
Address and pointer assignments:
  x := &y
  x := *y
  *x := y

19 An Example Intermediate Instruction Set
Assignment:
– x = y op z (op binary)
– x = op y (op unary)
– x = y
Jumps:
– if ( x op y ) goto L (L a label)
– goto L
Pointer and indexed assignments:
– x = y[ z ]
– y[ z ] = x
– x = &y
– x = *y
– *y = x
Procedure call/return:
– param x, k (x is the kth param)
– retval x
– call p
– enter p
– leave p
– return
– retrieve x
Type conversion:
– x = cvt_A_to_B y (A, B base types), e.g.: cvt_int_to_float
Miscellaneous:
– label L
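To make the semantics of the assignment, jump, and label instructions concrete, here is a small interpreter sketch for that subset only. The tuple encoding ("binop", "copy", "if", "goto", "label") and the function run are assumptions made for this illustration; pointer, call, and conversion instructions are not modeled.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       ">": operator.gt, "<=": operator.le}


def run(program, env):
    """Execute a list of three-address instructions; return the final env."""
    env = dict(env)
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}

    def val(operand):
        # Operands are either integer constants or variable names.
        return operand if isinstance(operand, int) else env[operand]

    pc = 0
    while pc < len(program):
        ins = program[pc]
        kind = ins[0]
        pc += 1
        if kind == "binop":            # x = y op z
            _, dst, op, a, b = ins
            env[dst] = OPS[op](val(a), val(b))
        elif kind == "copy":           # x = y
            _, dst, src = ins
            env[dst] = val(src)
        elif kind == "if":             # if (x op y) goto L
            _, a, op, b, target = ins
            if OPS[op](val(a), val(b)):
                pc = labels[target]
        elif kind == "goto":           # goto L
            pc = labels[ins[1]]
        # "label" instructions have no effect at run time
    return env


# if (x + y*z > x*y + z) a = 0;  the jump skips the assignment
# when the source condition is false.
program = [
    ("binop", "t1", "*", "y", "z"),
    ("binop", "t2", "+", "x", "t1"),
    ("binop", "t3", "*", "x", "y"),
    ("binop", "t4", "+", "t3", "z"),
    ("if", "t2", "<=", "t4", "L"),
    ("copy", "a", 0),
    ("label", "L"),
]
print(run(program, {"x": 1, "y": 2, "z": 3, "a": 7})["a"])
```

With x=1, y=2, z=3 the source condition holds (7 > 5), so a is set to 0; with x=3, y=2, z=1 it does not (5 > 7 is false) and a keeps its old value.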

20 Three Address Code: Representation
Each instruction is represented as a structure called a quadruple (or “quad”):
– it contains information about the operation and up to 3 operands;
– for operands, a bit indicates whether the operand is a constant or a Symbol Table pointer.
E.g.: x = y + z; if ( x op y ) goto L (op a relational operator).

21 Implementations of 3-address statements
Quadruples, for the code:
  t1 := - c
  t2 := b * t1
  t3 := - c
  t4 := b * t3
  t5 := t2 + t4
  a := t5

       op      arg1  arg2  result
  (0)  uminus  c           t1
  (1)  *       b     t1    t2
  (2)  uminus  c           t3
  (3)  *       b     t3    t4
  (4)  +       t2    t4    t5
  (5)  :=      t5          a

Temporary names must be entered into the symbol table as they are created.

22 Implementations of 3-address statements, II
Triples, for the code:
  t1 := - c
  t2 := b * t1
  t3 := - c
  t4 := b * t3
  t5 := t2 + t4
  a := t5

       op      arg1  arg2
  (0)  uminus  c
  (1)  *       b     (0)
  (2)  uminus  c
  (3)  *       b     (2)
  (4)  +       (1)   (3)
  (5)  assign  a     (4)

Temporary names are not entered into the symbol table.
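The relationship between the two encodings can be sketched as a conversion: each temporary in a quadruple is replaced by the position of the triple that computes it. The names Quad and quads_to_triples are hypothetical, and the sketch assumes that every non-copy result is a temporary defined exactly once.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


def quads_to_triples(quads):
    """Replace temporaries by references to the triple that computes them."""
    defined_at = {}   # temporary name -> index of the triple computing it
    triples = []

    def ref(arg):
        # An operand computed by an earlier triple becomes "(index)".
        return f"({defined_at[arg]})" if arg in defined_at else arg

    for q in quads:
        if q.op == ":=":
            # Copy: the destination becomes arg1 of an 'assign' triple.
            triples.append(("assign", q.result, ref(q.arg1)))
        else:
            arg2 = ref(q.arg2) if q.arg2 is not None else None
            triples.append((q.op, ref(q.arg1), arg2))
            defined_at[q.result] = len(triples) - 1
    return triples


quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]
for i, t in enumerate(quads_to_triples(quads)):
    print(f"({i})", *(x for x in t if x is not None))
```

Because triples name results by position, no temporary has to be stored anywhere, which is why they need no symbol table entries.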

23 Other types of 3-address statements
Ternary operations like x[i] := y and x := y[i] require two or more entries, e.g.:

x[i] := y
       op      arg1  arg2
  (0)  []=     x     i
  (1)  assign  (0)   y

x := y[i]
       op      arg1  arg2
  (0)  =[]     y     i
  (1)  assign  x     (0)

24 Implementations of 3-address statements, III
Indirect Triples: a separate list of pointers to triples gives the execution order.

Statement list:
  (0)  (14)
  (1)  (15)
  (2)  (16)
  (3)  (17)
  (4)  (18)
  (5)  (19)

Triples:
        op      arg1  arg2
  (14)  uminus  c
  (15)  *       b     (14)
  (16)  uminus  c
  (17)  *       b     (16)
  (18)  +       (15)  (17)
  (19)  assign  a     (18)

25 Linear IRs 2: Stack Machine Code
Sometimes called “one-address code.”
Assumes the presence of an operand stack.
– Most operations take (pop) their operands from the stack and push the result on the stack.
Example: code for “x*y + z”
Stack machine code:
  push x
  push y
  mult
  push z
  add
Three Address Code:
  tmp1 = x
  tmp2 = y
  tmp3 = tmp1 * tmp2
  tmp4 = z
  tmp5 = tmp3 + tmp4
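Both generating and executing stack code for the example take only a few lines. The sketch below assumes expression trees written as nested tuples (op, left, right); the function names gen_stack_code and run_stack_code are hypothetical, and only the push, mult, and add operations from the example are supported.

```python
import operator

BINARY = {"mult": operator.mul, "add": operator.add}


def gen_stack_code(node):
    """Post-order walk: push the operands, then emit the operator."""
    if isinstance(node, str):
        return [("push", node)]
    op, left, right = node
    return gen_stack_code(left) + gen_stack_code(right) + [(op,)]


def run_stack_code(code, env):
    """Interpret the code with an explicit operand stack."""
    stack = []
    for ins in code:
        if ins[0] == "push":
            stack.append(env[ins[1]])
        else:
            right = stack.pop()      # operands come off in reverse order
            left = stack.pop()
            stack.append(BINARY[ins[0]](left, right))
    return stack.pop()


code = gen_stack_code(("add", ("mult", "x", "y"), "z"))   # x*y + z
for ins in code:
    print(*ins)
print(run_stack_code(code, {"x": 2, "y": 3, "z": 4}))
```

Note that the operators never name their operands: the stack supplies them implicitly.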

26 Stack Machine Code: Features
Compact:
– the stack creates an implicit name space, so many operands don’t have to be named explicitly in instructions;
– this shrinks the size of the IR.
Necessitates new operations for manipulating the stack, e.g., “swap top two values”, “duplicate value on top.”
Simple to generate and execute.
Interpreted stack machine codes are easy to port.

27 Linear IRs 3: Register Transfer Lang. (GNU RTL)
Inspired by (and has syntax resembling) Lisp lists.
Expressions are not “flattened” as in three-address code, but may be nested.
– this gives them a tree structure.
Incorporates a variety of machine-level information.

28 RTLs (cont’d)
Low-level information associated with an RTL expression includes:
– “machine modes”, which give the size of a data object;
– information about access to registers and memory;
– information relating to instruction scheduling and delay slots;
– whether a memory reference is “volatile.”

29 RTLs: Examples
Example operations:
– (plus:m x y), (minus:m x y), (compare:m x y), etc., where m is a machine mode.
– (cond [test1 value1 test2 value2 …] default)
– (set lval x) (assigns x to the place denoted by lval).
– (call func argsz), (return)
– (parallel [x0 x1 …]) (simultaneous side effects).
– (sequence [ins1 ins2 …])

30 RTL Examples (cont’d)
A call to a function at address a passing n bytes of arguments, where the return value is in a (“hard”) register r:
  (set (reg:m r) (call (mem:fm a) n))
– here m and fm are machine modes.
A division operation where the result is truncated to a smaller size:
  (truncate:m1 (div:m2 x (sign_extend:m2 y)))

31 Hybrid IRs
Combine features of graphical and linear IRs:
– linear IR aspects capture a lower-level program representation;
– graphical IR aspects make control flow behavior explicit.
Examples:
– control flow graphs;
– static single assignment form (SSA).

32 Hybrid IRs 1: Control Flow Graphs
Example:
L1: if x > y goto L0
    t1 = x+1
    x = t1
L0: y = 0
    goto L1
Definition: A control flow graph for a function is a directed graph G = (V, E) such that:
– each v ∈ V is a straight-line code sequence (“basic block”); and
– there is an edge a → b ∈ E iff control can go directly from a to b.

33 Basic Blocks
Definition: A basic block B is a sequence of consecutive instructions such that:
1. control enters B only at its beginning; and
2. control leaves B only at its end (under normal execution).
This implies that if any instruction in a basic block B is executed, then all instructions in B are executed.
⇒ for program analysis purposes, we can treat a basic block as a single entity.

34 Identifying Basic Blocks
1. Determine the set of leaders, i.e., the first instruction of each basic block:
– the entry point of the function is a leader;
– any instruction that is the target of a branch is a leader;
– any instruction following a (conditional or unconditional) branch is a leader.
2. For each leader, its basic block consists of:
– the leader itself;
– all subsequent instructions up to, but not including, the next leader.
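The two steps above, followed by edge construction for the control flow graph, can be sketched as below. The encoding is an assumption for illustration: each instruction is a (label, op, target) tuple in which only "if" and "goto" are branches, and return instructions are not modeled. The function names are hypothetical.

```python
def find_leaders(code):
    """Step 1: return the sorted indices of all leaders."""
    label_at = {label: i for i, (label, _, _) in enumerate(code) if label}
    leaders = {0}                               # entry point
    for i, (_, op, target) in enumerate(code):
        if op in ("goto", "if"):
            leaders.add(label_at[target])       # target of a branch
            if i + 1 < len(code):
                leaders.add(i + 1)              # instruction after a branch
    return sorted(leaders)


def basic_blocks(code):
    """Step 2: a block runs from a leader up to, not including, the next."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]


def cfg_edges(code):
    """Edge a -> b iff control can go directly from block a to block b."""
    blocks = basic_blocks(code)
    label_at = {label: i for i, (label, _, _) in enumerate(code) if label}
    block_of = {blk[0]: n for n, blk in enumerate(blocks)}
    edges = set()
    for n, blk in enumerate(blocks):
        _, op, target = code[blk[-1]]
        if op in ("goto", "if"):
            edges.add((n, block_of[label_at[target]]))
        if op != "goto" and n + 1 < len(blocks):
            edges.add((n, n + 1))               # fall through
    return sorted(edges)


code = [
    ("L1", "if", "L0"),        # L1: if x > y goto L0
    (None, "assign", None),    #     t1 = x+1
    (None, "assign", None),    #     x = t1
    ("L0", "assign", None),    # L0: y = 0
    (None, "goto", "L1"),      #     goto L1
]
print(basic_blocks(code))
print(cfg_edges(code))
```

On the five-instruction control flow example from the previous section this gives three blocks, with the conditional branch producing two outgoing edges from the first block.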

35 Example
int dotprod(int a[], int b[], int N)
{
    int i, prod = 0;
    for (i = 1; i <= N; i++) {
        prod += a[i] * b[i];
    }
    return prod;
}

  No.  Instruction        Leader?  Block No.
  1    enter dotprod      Y        1
  2    prod = 0                    1
  3    i = 1                       1
  4    t1 = 4*i           Y        2
  5    t2 = a[t1]                  2
  6    t3 = 4*i                    2
  7    t4 = b[t3]                  2
  8    t5 = t2*t4                  2
  9    t6 = prod+t5                2
  10   prod = t6                   2
  11   t7 = i+1                    2
  12   i = t7                      2
  13   if i <= N goto 4            2
  14   retval prod        Y        3
  15   leave dotprod               3
  16   return                      3

36 Hybrid IRs 2: Static Single Assignment Form
The Static Single Assignment (SSA) form of a program makes information about variable definitions and uses explicit.
– This can simplify program analysis.
A program is in SSA form if it satisfies:
– each definition has a distinct name; and
– each use refers to a single definition.
To make this work, the compiler inserts special operations, called φ-functions, at points where control flow paths join.
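For straight-line code, where no control flow paths join, the two conditions can be met by renaming alone. The sketch below covers only that simple case and is not a general SSA construction; the function name to_ssa is hypothetical, and it assumes source names do not already end in digits that could clash with the version suffixes.

```python
import re


def to_ssa(statements):
    """Rename variables in straight-line code so each is defined once.

    Each statement is a (target, expression) pair.
    """
    version = {}   # variable -> number of its latest definition
    out = []

    def current(match):
        name = match.group(0)
        # A use refers to the latest definition; undefined names stay as-is.
        return f"{name}{version[name]}" if name in version else name

    for target, expr in statements:
        # Rewrite the uses first, then give the definition a fresh name.
        new_expr = re.sub(r"[A-Za-z_]\w*", current, expr)
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", new_expr))
    return out


original = [("x", "5"), ("y", "x + 2"), ("x", "x + 1"), ("y", "x")]
for target, expr in to_ssa(original):
    print(f"{target} = {expr}")
```

After renaming, the use of x in the last statement can only refer to the second definition of x, which is exactly the information an analysis would otherwise have to compute.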

37 SSA Form: φ-Functions
A φ-function behaves as follows:
  x1 = …              (left predecessor)
  x2 = …              (right predecessor)
  x3 = φ(x1, x2)      (join block)
This assigns to x3 the value of x1 if control comes from the left, and that of x2 if control comes from the right.
On entry to a basic block, all the φ-functions in the block execute (conceptually) in parallel.
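The parallel semantics matters when one φ-function reads a name that another φ-function in the same block writes. A minimal sketch, assuming each φ-function is stored as a target plus a map from predecessor block to source name (the function run_phis and the block names are hypothetical):

```python
def run_phis(phis, pred, env):
    """Execute all φ-functions of a block in parallel.

    phis: list of (target, {predecessor: source name}).
    pred: the predecessor block that control arrived from.
    """
    # Read every source first, then write, so that one φ never sees
    # a value written by another φ in the same block.
    values = [env[sources[pred]] for _, sources in phis]
    new_env = dict(env)
    for (target, _), value in zip(phis, values):
        new_env[target] = value
    return new_env


# x3 = φ(x1, x2): the value chosen depends on the incoming edge.
join = [("x3", {"left": "x1", "right": "x2"})]
print(run_phis(join, "left", {"x1": 10, "x2": 20})["x3"])
print(run_phis(join, "right", {"x1": 10, "x2": 20})["x3"])

# Two φ-functions that swap values along a loop back edge.
swap = [("a2", {"entry": "a1", "loop": "b2"}),
        ("b2", {"entry": "b1", "loop": "a2"})]
print(run_phis(swap, "loop", {"a2": 1, "b2": 2}))
```

Executing the two swapping φ-functions one after the other would give both names the same value; reading all sources before writing any target avoids that.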

38 SSA Form: Example
[Figure: original code and the same code in SSA form]