1 CMPSC 160 Translation of Programming Languages Fall 2002 Lecture-Modules 17 and 18 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon.

Slides:



Advertisements
Similar presentations
Chapter 6 Intermediate Code Generation
Advertisements

1 Lecture 10 Intermediate Representations. 2 front end »produces an intermediate representation (IR) for the program. optimizer »transforms the code in.
Intermediate Code Generation
Intermediate Code Generation. 2 Intermediate languages Declarations Expressions Statements.
Intermediate Representations Saumya Debray Dept. of Computer Science The University of Arizona Tucson, AZ
Short circuit code for boolean expressions: Boolean expressions are typically used in the flow of control statements, such as if, while and for statements,
8 Intermediate code generation
1 Compiler Construction Intermediate Code Generation.
Overview of Previous Lesson(s) Over View  Front end analyzes a source program and creates an intermediate representation from which the back end generates.
Prof. Necula CS 164 Lecture 141 Run-time Environments Lecture 8.
Program Representations. Representing programs Goals.
Intermediate Representation I High-Level to Low-Level IR Translation EECS 483 – Lecture 17 University of Michigan Monday, November 6, 2006.
CS412/413 Introduction to Compilers Radu Rugina Lecture 16: Efficient Translation to Low IR 25 Feb 02.
Code Shape, Part II Addressing Arrays, Aggregates, & Strings Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled.
Intermediate code generation. Code Generation Create linear representation of program Result can be machine code, assembly code, code for an abstract.
Intermediate Representations Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
CS 536 Spring Run-time organization Lecture 19.
Code Shape II Expressions & Arrays Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
Intermediate Code CS 471 October 29, CS 471 – Fall Intermediate Code Generation Source code Lexical Analysis Syntactic Analysis Semantic.
1 Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing.
1 Intermediate Code generation. 2 Intermediate Code Generation l Intermediate languages l Declarations l Expressions l Statements l Reference: »Chapter.
Run time vs. Compile time
Run-time Environment and Program Organization
Compiler Construction A Compulsory Module for Students in Computer Science Department Faculty of IT / Al – Al Bayt University Second Semester 2008/2009.
CH4.1 CSE244 Intermediate Code Generation Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Unit.
Static checking and symbol table Chapter 6, Chapter 7.6 and Chapter 8.2 Static checking: check whether the program follows both the syntactic and semantic.
CS412/413 Introduction to Compilers Radu Rugina Lecture 15: Translating High IR to Low IR 22 Feb 02.
1 Structure of a Compiler Front end of a compiler is efficient and can be automated Back end is generally hard to automate and finding the optimum solution.
Compiler Construction
Compiler Chapter# 5 Intermediate code generation.
1 Intermediate Code Generation Part I Chapter 8 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Chapter 8: Intermediate Code Generation
Review: –What is an activation record? –What are the typical fields in an activation record? –What are the storage allocation strategies? Which program.
1 October 25, October 25, 2015October 25, 2015October 25, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
Intermediate Code Generation
1 June 3, June 3, 2016June 3, 2016June 3, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University,
1 CIS 461 Compiler Design and Construction Winter 2012 Lecture-Module #16 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon.
Topic #7: Intermediate Code EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 Intermediate Code Generation Abstraction at the source level identifiers, operators, expressions, statements, conditionals, iteration, functions (user.
Review: Syntax directed translation. –Translation is done according to the parse tree. Each production (when used in the parsing) is a sub- structure of.
Code Generation How to produce intermediate or target code.
Chap. 4, Intermediate Code Generation
Three Address Code Generation of Control Statements continued..
Intermediate Code Generation CS 671 February 14, 2008.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 10 Ahmed Ezzat.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS 404 Introduction to Compiler Design
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compiler Construction
Run-time organization
Compiler Construction
Subject Name:COMPILER DESIGN Subject Code:10CS63
Compiler Optimization and Code Generation
Intermediate Code Generation Part I
Intermediate Code Generation
Intermediate Representations
Chapter 6 Intermediate-Code Generation
Three Address Code Generation - Control Statements
Intermediate Code Generation Part I
Intermediate Representations
Three-address code A more common representation is THREE-ADDRESS CODE . Three address code is close to assembly language, making machine code generation.
7.4 Boolean Expression and Control Flow Statements
Compiler Design 21. Intermediate Code Generation
Intermediate Code Generation Part I
Intermediate Code Generation
Intermediate Code Generation Part I
Compiler Design 21. Intermediate Code Generation
Procedure Linkages Standard procedure linkage Procedure has
Review: What is an activation record?
Presentation transcript:

1 CMPSC 160 Translation of Programming Languages Fall 2002 Lecture-Modules 17 and 18 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Department of Computer Science University of California, Santa Barbara

2 Structure of a Compiler Front end of a compiler is efficient and can be automated Back end is generally hard to automate and finding the optimum solution requires exponential time Intermediate code generation can effect the performance of the back end Instruction Selection Instruction Scheduling Register Allocation ScannerParser Semantic Analysis Code Optimization Intermediate Code Generation IR

3 Intermediate Representations Abstract Syntax Trees (AST) Directed Acyclic Graphs (DAG) Control Flow Graphs (CFG) Static Single Assignment Form (SSA) Stack Machine Code Three Address Code Hybrid approaches mix graphical and linear representations –SGI and SUN compilers use three address code but provide ASTs for loops if-statements and array references –Use three-address code in basic blocks in control flow graphs high-level low-level Graphical IRs Linear IRs

4 Abstract Syntax Trees (ASTs) if (x < y) x = 5*y + 5*y/3; else y = 5; x = x+y; Statements < AssignStmt + * x IfStmt AssignStmt xxy+ yx y / 5 y 3 * 5 y 5

5 Directed Acyclic Graphs (DAGs) Use directed acyclic graphs to represent expressions –Use a unique node for each expression if (x < y) x = 5*y + 5*y/3; else y = 5; x = x+y; Statements < AssignStmt * IfStmt AssignStmt x+y / 5 3

6 Control Flow Graphs (CFGs) Nodes in the control flow graph are basic blocks –A basic block is a sequence of statements always entered at the beginning of the block and exited at the end Edges in the control flow graph represent the control flow if (x < y) x = 5*y + 5*y/3; else y = 5; x = x+y; if (x < y) goto B 1 else goto B 2 x = 5*y + 5*y/3y = 5 x = x+y B1B1 B2B2 B0B0 B3B3 Each block has a sequence of statements No jump from or to the middle of the block Once a block starts executing, it will execute till the end

7 Stack Machine Code if (x < y) x = 5*y + 5*y/3; else y = 5; x = x+y; load x load y iflt L1 goto L2 L1: push 5 load y multiply push 5 load y multiply push 3 divide add store x goto L3 L2: push 5 store y L3: load x load y add store x pops the top two elements and compares them pops the top two elements, multiplies them, and pushes the result back to the stack JVM: A stack machine In the project we generate assembly code for JVM which is converted to bytecode by Jasmin assembler JVM interpreter executes the bytecode on different machines JVM has an operand stack which we use to evaluate expressions JVM provides local variables for each method The local variables are like registers so we do not have to worry about register allocation Each local variable in JVM is denoted by a number between 0 and (x and y in the example will be assigned unique numbers) pushes the value at the location x to the stack stores the value at the top of the stack to the location x

8 Three-Address Code Each instruction can have at most three operands Assignments –x := y –x := y op z op: binary arithmetic or logical operators –x := op yop: unary operators (minus, negation, integer to float conversion) Branch –goto LExecute the statement with labeled L next Conditional Branch –if x relop y goto Lrelop: =, ==, != if the condition holds we execute statement labeled L next if the condition does not hold we execute the statement following this statement next

9 Three-Address Code if (x < y) x = 5*y + 5*y/3; else y = 5; x = x+y; if x < y goto L1 goto L2 L1:t1 := 5 * y t2 := 5 * y t3 := t2 / 3 x := t1 + t2 goto L3 L2:y := 5 L3:x := x + y Temporaries: temporaries correspond to the internal nodes of the syntax tree Variables can be represented with their locations in the symbol table Three address code instructions can be represented as an array of quadruples: operation, argument1, argument2, result triples: operation, argument1, argument2 (each triple implicitly corresponds to a temporary)

10 Three-Address Code Generation for a Simple Grammar ProductionsSemantic Rules S  id := Eid.place  lookup(id.name); S.code  E.code || gen(id.place ‘:=‘ E.place); E  E 1 + E 2 E.place  newtemp(); E.code  E 1.code || E 2.code || gen(E.place ‘:=‘ E 1.place ‘+’ E 2.place); E  E 1 * E 2 E.place  newtemp(); E.code  E 1.code || E 2.code || gen(E.place ‘:=‘ E 1.place ‘*’ E 2.place); E  ( E 1 )E.code  E 1.code; E.place  E 1.place; E   E 1 E.place  newtemp(); E.code  E 1.code || gen(E.place ‘:=‘ ‘uminus’ E 1.place); E  id E.place  lookup(id.name); E.code  ‘’ (empty string) Attributes:E.place: location that holds the value of expression E E.code: sequence of instructions that are generated for E Procedures:newtemp(): Returns a new temporary each time it is called gen(): Generates instruction (have to call it with appropriate arguments) lookup(id.name): Returns the location of id from the symbol table

11 Stack Machine Code Generation for a Simple Grammar ProductionsSemantic Rules S  id := Eid.place  lookup(id.name); S.code  E.code || gen(‘store’ id.place); E  E 1 + E 2 E.code  E 1.code || E 2.code || gen(‘add’); (arguments for the add instruction are in the top of the stack) E  E 1 * E 2 E.code  E 1.code || E 2.code || gen(‘multiply’); E  ( E 1 )E.code  E 1.code; E   E 1 E.code  E 1.code || gen( ‘negate‘); E  id E.code  gen(‘load’ id.place) Attributes:E.code: sequence of instructions that are generated for E (no place for an expression is needed since the result of an expression is stored in the operand stack) Procedures:newtemp(): Returns a new temporary each time it is called gen(): Generates instruction (have to call it with appropriate arguments) lookup(id.name): Returns the location of id from the symbol table

12 Code Generation for Boolean Expressions Two approaches –Numerical representation –Implicit representation Numerical representation –Use 1 to represent true, use 0 to represent false –For three-address code store this result in a temporary –For stack machine code store this result in the stack Implicit representation –For the boolean expressions which are used in flow-of-control statements (such as if-statements, while-statements etc.) boolean expressions do not have to explicitly compute a value, they just need to branch to the right instruction –Generate code for boolean expressions which branch to the appropriate instruction based on the result of the boolean expression

13 Boolean Expressions: Numerical Representation Attributes :E.place: location that holds the value of expression E E.code: sequence of instructions that are generated for E id.place: location for id Global variable:nextstat: Returns the location of the next instruction to be generated (each call to gen() increments nextstat by 1) ProductionsSemantic Rules E  id 1 relop id 2 E.place  newtemp(); E.code  gen(‘if’ id 1.place relop.op id 2.place ‘goto’ nextstat+3); || gen(E.place ‘:=‘ ‘0’) || gen(‘goto’ nextstat+2) || gen(E.place ‘:=‘ ‘1’); E  E 1 and E 2 E.place  newtemp(); E.code  E 1.code || E 2.code || gen(E.place ‘:=‘ E 1.place ‘and’ E 2.place);

14 Boolean Expressions: Implicit Representation ProductionsSemantic Rules E  id 1 relop id 2 E.code  gen(‘if’ id 1.place relop.op id 2.place ‘goto’ E.true) || gen(‘goto’ E.false); E  E 1 and E 2 E 1.true  newlabel(); E 1.false  E. false; (short-circuiting) E 2.true  E. true; E 2.false  E. false; E.code  E 1.code || gen(E 1.true ‘:’) || E 2.code ; Attributes :E.code: sequence of instructions that are generated for E E.false: instruction to branch to if E evaluates to false E.true: instruction to branch to if E evaluates to true (E.code is synthesized whereas E.true and E.false are inherited) id.place: location for id can be any relational operator: ==, = != These places will be filled with lables later on when they become available This generated label will be inserted to the place for E 1.true in the code generated for E 1

15 Example 100if x < y goto t1 := 0 102goto t1 := 1 104if a = b goto t2 := 0 106goto t2 := 1 108t3 := t1 and t2 Input boolean expression: x < y and a == b Numerical representation: if x < y goto L1 goto LFalse L1:if a = b goto LTrue goto LFalse... LTrue: LFalse: Implicit representation: These are the locations of three-address code instructions, they are not labels These labels will be generated later on, and will be inserted to the corresponding places

16 Flow-of-Control Statements If-then-else Branch based on the result of boolean expression Loops Evaluate condition before loop (if needed) Evaluate condition after loop Branch back to the top if condition holds Merges test with last block of loop body While, for, do, and until all fit this basic model Pre-test Loop body Post-test Next block

17 Flow-of-Control Statements: Code Structure E.code S 1.code goto S.next S 2.code    to E.true to E.false E.true: E.false: S.next: S  if E then S 1 else S 2 if E evaluates to true if E evaluates to false E.code S 1.code goto S.begin    to E.true to E.false E.true: E.false: S.begin: S  while E do S 1 Another approach is to place E.code after S 1.code

18 Flow-of-Control Statements ProductionsSemantic Rules S  if E then S 1 else S 2 E.true  newlabel(); E.false  newlabel(); S 1.next  S. next; S 2.next  S. next; S.code  E.code || gen(E.true ‘:’) || S 1.code || gen(‘goto’ S.next) || gen(E.false ‘:’) || S 2.code ; S  while E do S 1 S.begin  newlabel(); E.true  newlabel(); E.false  S. next; S 1.next  S. begin; S.code  gen(S.begin ‘:’) || E.code || gen(E.true ‘:’) || S 1.code || gen(‘goto’ S.begin); S  S 1 ; S 2 S 1.next  newlabel(); S 2.next  S.next; S.code  S 1.code || gen(S 1.next ‘:’) || S 2.code Attributes : S.code: sequence of instructions that are generated for S S.next: label of the instruction that will be executed immediately after S (S.next is an inherited attribute)

19 Example Input code fragment: while (a < b) { if (c < d) x = y + z; else x = y – z } L1:if a < b goto L2 goto LNext L2:if c < d goto L3 goto L4 L3:t1 := y + z x := t1 goto L1 L4:t2 := y – z x := t2 goto L1 LNext:...

20 Backpatching E.true, E.false, S.next may not be computed in a single pass (they are inherited attributes) Backpatching is a technique for generating labels for E.true, E.false, S.next and inserting them to the appropriate locations Basic idea –Keep lists E.truelist, E.falselist, S.nextlist E.truelist: the list of instructions where the label for E.true have to be inserted when it becomes available S.nextlist: the list of instructions where the label for S.next have to be inserted when it becomes available –When labels E.true, E.false, S.next are computed these labels are inserted to the instructions in these lists

21 Flow-of-Control Statements: Case Statements Case Statements 1 Evaluate the controlling expression 2 Branch to the selected case 3 Execute the code for that case 4 Branch to the statement after the case Part 2 is the key Strategies Linear search (nested if-then-else constructs) Build a table of case values & binary search it Directly compute an address (requires dense case set) –Use an array of labels that is addressed by the case value

22 Type Conversions Mixed-type expressions Insert conversions as needed from conversion table –i2f r 1, r 2 (convert the integer value in register r 1 to float, and store the result in register r 2 ) Most languages have symmetric conversion tables Typical Addition Table

23 Memory Layout Placing run time data structures Alignment and padding Machines have alignment restrictions –32-bit floating point numbers and integers should begin on a full-word boundary (32-bit boundary) Place values with identical restrictions next to each other Assign offsets from most restrictive to least If needed, insert padding (space left unused due to alignment restrictions) to match restrictions CodeCode S G t l a & o t b i a c l HeapHeap StackStack 0 high Stack and heap share free space Fixed-size areas together For compiler, this is the entire picture

24 Memory Allocation P  D D  D; D D  id : T T  char | int | float | array[num] of T | pointer T Attributes: T.type, T.width Basic types: char width 4, integer width 4, float width 8 Type constructors: array(size,type) width is size * (width of type) pointer(type) width is 4 Enter the variables to the symbol table with their type and memory location Set the type attribute T.type Layout the storage for variables Calculate the offset for each local variable and enter it to the symbol table Offset can be offset from a static data area or from the beginning of the local data area in the activation record

25 A Translation Scheme for Memory Allocation P  {offset  0;} D D  D; D D  id : T {enter(id.name,T.type, offset); offset  offset + T.width; } T  char { T.type  char; T.width  4; } | int { T.type  integer; T.width  14; } | float { T.type  float; T.width  8; } | array[num] of T 1 { T.type  array(num.val, T 1.type); T.width  num.val * T 1.width; } | pointer T { T.type  pointer(T 1.type); T.width  8; } Note that if the size of the array is not a constant we cannot compute its width at compile time In that case allocate the memory for the array in the heap and allocate the memory for the pointer to the heap at compile time

26 Memory Allocation for Nested Procedures Use two stacks: –Stack for symbol table: top(tblptr) points to current symbol table –Stack for offset: top(offset) gives the value of the offset in the current symbol table addwidth(table,width) stores the cumulative width of all the entries in the symbol table in the header of the symbol table mktbl(previous) creates a new symbol table linked to the previous enter(table, name, type, offset) enters a variable to the symbol table enterProc(table,name,newtable) enters a procedure to the symbol table P  D D  D; D D  id : T | proc id; D; S

27 Memory Allocation for Nested Procedures P  {t  mktable(nil); push(t,tblptr); push(0, offset)} D {addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset);} D  D; D D  proc id ; { t  mktable(top(tblptr)); push(t,tblptr); push(0,offset); } D ; S { t  top(tblptr); addwidth(t, top(offset)); pop(tblptr); pop(offset); D  id : T {enter(top(tblptr), id.name, T.type, top(offset)); top(offset)  top(offset) + T.width; }

28 Array access: How does the compiler handle A[ i ][ j ] ? First, must agree on a storage scheme Row-major order (most languages) Lay out as a sequence of consecutive rows Rightmost subscript varies fastest A[1,1], A[1,2], A[1,3], A[2,1], A[2,2], A[2,3] Column-major order (Fortran) Lay out as a sequence of columns Leftmost subscript varies fastest A[1,1], A[2,1], A[1,2], A[2,2], A[1,3], A[2,3] Indirection vectors (Java) Vector of pointers to pointers to … to values Takes more space Trades indirection for arithmetic (in modern processors memory access is more costly than arithmetic) Locality may not be good if indirection vectors are used

29 Laying Out Arrays The Concept Row-major order Column-major order Indirection vectors 1,11,21,31,42,12,22,32,4 A 1,12,11,22,21,32,31,42,4 A 1,11,21,31,4 2,12,22,32,4 A 1,11,21,31,4 2,12,22,32,4 A These have distinct & different cache behavior  The order of traversal of an array can effect the performance

30 Computing an Array Address A[ i + ( i - low )  sizeof(A[1]) What about A[i 1 ][i 2 ] ? Row-major order, two + (( i 1 - low 1 )  (high 2 - low 2 + 1) + i 2 - low 2 )  sizeof(A[1]) Column-major order, two + (( i 2 - low 2 )  (high 1 - low 1 + 1) + i 1 - low 1 )  sizeof(A[1]) Indirection vectors, two dimensions *(A[i 1 ])[i 2 ] — where A[i 1 ] is, itself, a 1-d array reference Expensive computation! Lots of +, -, x operations int A[1:10]  low is 1 Make low 0 for faster access (save a subtraction ) Almost always a power of 2, known at compile- time  use a shift for speed Base of A (starting address of the array)

31 Optimizing Address Calculation for A[ i ][ j ] In row-major + (i-low 1 )(high 2 -low 2 +1)  sizeof(A[1][1]) + (j - low 2 )  sizeof(A[1][1]) Which can be factored + i  (high 2 -low 2 +1)  sizeof(A[1][1]) + j  sizeof(A[1][1]) - (low 1  (high 2 -low 2 +1)  sizeof(A[1][1]) + low 2  sizeof(A[1][1])) If low i, high i, and sizeof(A[1][1]) are known, the last term is a constant 0 - (low 1  (high 2 -low 2 +1)  sizeof(A[1][1]) + low 2  sizeof(A[1][1])) And len 2 as (high 2 -low 2 +1) Then, the address expression 0 + (i  len 2 + j )  sizeof(A[1][1]) Compile-time constants

32 Array References What about arrays as actual parameters? Whole arrays, as call-by-reference parameters Need dimension information  build a dope vector Store the values in the calling sequence Pass the address of the dope vector in the parameter slot Generate complete address polynomial at each reference Some improvement is possible Save len i and low i rather than low i and high i Pre-compute the fixed terms in prolog sequence What about call-by-value? Most call-by-value languages pass arrays by low 1 high 1 low 2 high 2

33 Structure of a Compiler Instruction Selection Instruction Scheduling Register Allocation ScannerParser Semantic Analysis Code Optimization Intermediate Code Generation