Theory of Compilation 236360 Erez Petrank Lecture 6: Intermediate Representation 1.

Slides:



Advertisements
Similar presentations
Chapter 6 Intermediate Code Generation
Advertisements

Intermediate Code Generation
Backpatching: The syntax directed definition we discussed before can be implemented in two or more passes (we have both synthesized attributes and inheritent.
Lecture 08a – Backpatching & Recap Eran Yahav 1 Reference: Dragon 6.2,6.3,6.4,6.6.
Short circuit code for boolean expressions: Boolean expressions are typically used in the flow of control statements, such as if, while and for statements,
8 Intermediate code generation
Chapter 8 Intermediate Code Generation. Intermediate languages: Syntax trees, three-address code, quadruples. Types of Three – Address Statements: x :=
1 Compiler Construction Intermediate Code Generation.
Generation of Intermediate Code Compiler Design Lecture (03/30//98) Computer Science Rensselaer Polytechnic.
PSUCS322 HM 1 Languages and Compiler Design II IR Code Generation I Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU.
Overview of Previous Lesson(s) Over View  Front end analyzes a source program and creates an intermediate representation from which the back end generates.
Lecture 07 – attribute grammars + intro to IR Eran Yahav 1.
Compiler Construction Sohail Aslam Lecture Boolean Expressions E → E 1 and M E 2 {backpatch(E 1.truelist, M.quad); E.truelist = E 2.truelist; E.falselist.
Three Address Code Generation Backpatching-I Prepared By: Siddharth Tiwary 04CS3010.
CS412/413 Introduction to Compilers Radu Rugina Lecture 16: Efficient Translation to Low IR 25 Feb 02.
Lecture 09 – IR (Backpatching) Eran Yahav 1 Reference: Dragon 6.2,6.3,6.4,6.6
Theory of Compilation Erez Petrank Lecture 7: Intermediate Representation Part 2: Backpatching 1.
1 CMPSC 160 Translation of Programming Languages Fall 2002 Lecture-Modules 17 and 18 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon.
Intermediate Code Generation Professor Yihjia Tsai Tamkang University.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
1 Intermediate Code generation. 2 Intermediate Code Generation l Intermediate languages l Declarations l Expressions l Statements l Reference: »Chapter.
Improving Code Generation Honors Compilers April 16 th 2002.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
CH4.1 CSE244 Intermediate Code Generation Aggelos Kiayias Computer Science & Engineering Department The University of Connecticut 371 Fairfield Road, Unit.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
CSc 453 Intermediate Code Generation Saumya Debray The University of Arizona Tucson.
1 Structure of a Compiler Front end of a compiler is efficient and can be automated Back end is generally hard to automate and finding the optimum solution.
COP4020 Programming Languages Semantics Prof. Xin Yuan.
Compiler Chapter# 5 Intermediate code generation.
1 Intermediate Code Generation Part I Chapter 8 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Chapter 8: Intermediate Code Generation
Review: –What is an activation record? –What are the typical fields in an activation record? –What are the storage allocation strategies? Which program.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
Intermediate Code Generation
1 June 3, June 3, 2016June 3, 2016June 3, 2016 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa Pacific University,
1 Compiler Design (40-414)  Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007  Evaluation:  Midterm.
Chapter 5. Syntax-Directed Translation. 2 Fig Syntax-directed definition of a simple desk calculator ProductionSemantic Rules L  E n print ( E.val.
Topic #7: Intermediate Code EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 Intermediate Code Generation Abstraction at the source level identifiers, operators, expressions, statements, conditionals, iteration, functions (user.
Review: Syntax directed translation. –Translation is done according to the parse tree. Each production (when used in the parsing) is a sub- structure of.
Boolean expressions 1 productionsemantic action E  E1 or E2E1.trueLabel = E.trueLabel; E1.falseLabel = freshLabel(); E2.trueLabel = E.trueLabel; E2.falseLabel.
Code Generation CPSC 388 Ellen Walker Hiram College.
Code Generation How to produce intermediate or target code.
Chap. 4, Intermediate Code Generation
LECTURE 3 Compiler Phases. COMPILER PHASES Compilation of a program proceeds through a fixed series of phases.  Each phase uses an (intermediate) form.
Theory of Compilation Erez Petrank Lecture 6: Intermediate Representation and Attribute Grammars 1.
Three Address Code Generation of Control Statements continued..
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 10 Ahmed Ezzat.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Constructing Precedence Table
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Compiler Construction
Compiler Construction
Subject Name:COMPILER DESIGN Subject Code:10CS63
Compiler Optimization and Code Generation
Intermediate Code Generation Part I
Intermediate Code Generation
Chapter 6 Intermediate-Code Generation
Intermediate Code Generation Part I
Three-address code A more common representation is THREE-ADDRESS CODE . Three address code is close to assembly language, making machine code generation.
Compiler Design 21. Intermediate Code Generation
Intermediate Code Generation Part I
Three Address Code Generation – Backpatching
Intermediate Code Generation
Intermediate Code Generation Part I
Compiler Design 21. Intermediate Code Generation
Review: What is an activation record?
Intermediate Code Generating machine-independent intermediate form.
Presentation transcript:

Theory of Compilation Erez Petrank Lecture 6: Intermediate Representation 1

You are here 2 Executable code Executable code exe Source text Source text txt Compiler Lexical Analysis Syntax Analysis Semantic Analysis Inter. Rep. (IR) Code Gen. characterstokens AST ( Abstract Syntax Tree) Annotated AST

GENERIC Intermediate Representation “neutral” representation between the front-end and the back-end – Abstracts away details of the source language – Abstract away details of the target language A compiler may have multiple intermediate representations and move between them 3 Java C Objective C Ada arm x86ia64 … …

Intermediate Representation(s) Annotated abstract syntax tree Three address code … 4

Example: Annotated AST makeNode – creates new node for unary/binary operator makeLeaf – creates a leaf id.place – pointer to symbol table 5 productionsemantic rule S ➞ id := E S.nptr = makeNode(‘assign’, makeLeaf(id,id.place), E.nptr) E ➞ E1 + E2 E.nptr = makeNode(‘+’,E1.nptr,E2.nptr) E ➞ E1 * E2 E.nptr = makeNode(‘*’,E1.nptr,E2.nptr) E ➞ -E1 E.nptr = makeNode(‘uminus’,E1.nptr) E ➞ (E1) E.nptr = E1.nptr E ➞ id E.nptr = makeLeaf(id,id.place)

Example 6 * * uminus cidc b b assign + aid        b 0 c 1 1uminus2 20*3 bid4 c 5 5uminus6 64* aid9 89assign10 · · ·11 a = b * -c + b* -c

Three Address Code (3AC) Every instruction operates on three addresses – result = operand1 operator operand2 Close to low-level operations in the machine language – Operator is a basic operation Statements in the source language may be mapped to multiple instructions in three address code 7

Three address code: example 8 assign a a + + * * b b uminus c c * * b b c c t5t5 :=a t 2 + t 4 :=t5t5 b * t 3 :=t4t4 – c:=t3t3 b * t 1 :=t2t2 – c:=t1t1

Three address code: example instructions 9 instructionmeaning x := y op zassignment with binary operator x := op yassignment unary operator x:= yassignment x := &yassign address of y x:=*yassignment from deref y *x := yassignment to deref x instructionmeaning goto Lunconditional jump if x relop y goto Lconditional jump

Array operations Are these 3AC operations? 10 t1 := &y ; t1 = address-of y t2 := t1 + i ; t2 = address of y[i] X := *t2 ; value stored at y[i] t1 := &x ; t1 = address-of x t2 := t1 + i ; t2 = address of x[i] *t2:= y ; store through pointer x := y[i] x[i] := y

Three address code: example 11 int main(void) { int i; int b[10]; for (i = 0; i < 10; ++i) b[i] = i*i; } i := 0 ; assignment L1: if i >= 10 goto L2 ; conditional jump t0 := i*i t1 := &b ; address-of operation t2 := t1 + i ; t2 holds the address of b[i] *t2 := t0 ; store through pointer i := i + 1 goto L1 L2: (example source: wikipedia)

Three address code Choice of instructions and operators affects code generation and optimization Small set of instructions – Easy to generate machine code – Harder to optimize Large set of instructions – Harder to generate machine code Typically prefer small set and smart optimizer 12

Creating 3AC Assume bottom up parser – Covers a wider range of grammars – LALR sufficient to cover most programming languages Creating 3AC via syntax directed translation Attributes examples: – code – code generated for a nonterminal – var – name of variable that stores result of nonterminal freshVar() – helper function that returns the name of a fresh variable 13

Creating 3AC: expressions 14 productionsemantic rule S ➞ id := E S.code := E. code || gen(id.var ‘:=‘ E.var) E ➞ E1 + E2 E.var := freshVar(); E.code = E1.code || E2.code || gen(E.var ‘:=‘ E1.var ‘+’ E2.var) E ➞ E1 * E2 E.var := freshVar(); E.code = E1.code || E2.code || gen(E.var ‘:=‘ E1.var ‘*’ E2.var) E ➞ - E1 E.var := freshVar(); E.code = E1.code || gen(E.var ‘:=‘ ‘uminu’ E1.var) E ➞ (E1) E.var := E1.var E.code = ‘(‘ || E1.code || ‘)’ E ➞ id E.var := id.var; E.code = ‘’ (we use || to denote concatenation of intermediate code fragments)

example 15 assign a a + + * * b b uminus c c * * b b c c E.var = c E.code =‘’ E.var = b E.code =‘’ E.var = t2 E.code =‘t1 = -c t2 = b*t1’ E.var = t1 E.code =‘t1 = -c’ E.var = b E.code =‘’ E.var = c E.code =‘’ E.var = t3 E.code =‘t3 = -c’ E.var = t4 E.code =‘t3 = -c t4 = b*t3’ E.var = t5 E.code =‘t1 = -c t2 = b*t1 t3 = -c t4 = b*t3 t5 = t2+t4’ a = b * -c + b* -c

Creating 3AC: control statements 3AC only supports conditional/unconditional jumps Add labels Attributes – begin – label marks beginning of code – after – label marks end of code Helper function freshLabel() allocates a new fresh label 16

Creating 3AC: control statements S  while E do S1 17 productionsemantic rule S  while E do S1S.begin := freshLabel(); S.after = freshLabel(); S.code := gen(S.begin ‘:’) || E.code || gen(‘if’ E.var ‘=‘ ‘0’ ‘goto’ S.after) || S1.code || gen(‘goto’ S.begin) || gen(S.after ‘:’) E.code S.begin: if E.var = 0 goto S.after S 1.code goto S.begin · · ·S.after:

Representing 3AC Quadruple (op,arg1,arg2,result) Result of every instruction is written into a new temporary variable Generates many variable names Can move code fragments without complicated renaming Alternative representations may be more compact 18 at5t5 :=(5) t5t5 t4t4 t2t2 +(4) t4t4 t3t3 b*(3) t3t3 cuminus(2) t2t2 t1t1 b*(1) t1t1 cuminus(0) resultarg 2arg 1op t 1 = - c t 2 = b * t 1 t 3 = - c t 4 = b * t 3 t 5 = t 2 * t 4 a = t 5

Allocating Memory Type checking helped us guarantee correctness Also tells us – How much memory allocate on the heap/stack for varaibles – Where to find fields (based on lengths) – Compute address of an element inside array (size of stride based on type of element) 19

Allocating Memory Global variable “offset” with memory allocated so far 20 productionsemantic rule P ➞ D { offset := 0} D ➞ D D D ➞ T id; { enter(id.name, T.type, offset); offset += T.width } T ➞ integer { T.type := int; T.width = 4 } T ➞ float { T.type := float; T.width = 8} T ➞ T1[num] { T.type = array (num.val,T1.Type); T.width = num.val * T1.width; } T ➞ *T1 { T.type := pointer(T1.type); T.width = 4}

Allocating Memory P P D2 D1 D4 T1 id int count T2 id float money D5 T3 id balances T4 [ [ num ] ] int 42 T 2.type = float T 2.width = 4 id.name = money T 1.type = int T 1.width = 4 id.name = count enter(count, int, 0) offset = offset + 4 enter(money, float, 4) offset = offset + 4 D6

Adjusting to bottom-up 22 productionsemantic rule P ➞ M D M ➞  { offset := 0} D ➞ D D D ➞ T id; { enter(id.name, T.type, offset); offset += T.width } T ➞ integer { T.type := int; T.width = 4 } T ➞ float { T.type := float; T.width = 8} T ➞ T1[num] { T.type = array (num.val,T1.Type); T.width = num.val * T1.width; } T ➞ *T1 { T.type := pointer(T1.type); T.width = 4}

Allocating Memory P P D2 D1 D4 T1 id int count T2 id float money D5 T3 id balances T4 [ [ num ] ] int 42 T 1.type = int T 1.width = 4 id.name = count enter(count, int, 0) offset = offset + 4 D6D6 D6D6 M M offset = 0

Generating IR code Option 1 accumulate code in AST attributes Option 2 emit IR code to a file during compilation – If for every production the code of the left-hand-side is constructed from a concatenation of the code of the RHS in some fixed order 24

Expressions and assignments 25 productionsemantic action S ➞ id := E { p:= lookup(id.name); if p ≠ null then emit(p ‘:=‘ E.var) else error } E ➞ E1 op E2 { E.var := freshVar(); emit(E.var ‘:=‘ E1.var op E2.var) } E ➞ - E1 { E.var := freshVar(); emit(E.var ‘:=‘ ‘uminus’ E1.var) } E ➞ ( E1) { E.var := E1.var } E ➞ id { p:= lookup(id.name); if p ≠ null then E.var :=p else error }

Boolean Expressions 26 productionsemantic action E ➞ E1 op E2 { E.var := freshVar(); emit(E.var ‘:=‘ E1.var op E2.var) } E ➞ not E1 { E.var := freshVar(); emit(E.var ‘:=‘ ‘not’ E1.var) } E ➞ ( E1) { E.var := E1.var } E ➞ true { E.var := freshVar(); emit(E.var ‘:=‘ ‘1’) } E ➞ false { E.var := freshVar(); emit(E.var ‘:=‘ ‘0’) } Represent true as 1, false as 0 Wasteful representation, creating variables for true/false

Boolean expressions via jumps 27 productionsemantic action E ➞ id1 op id2 { E.var := freshVar(); emit(‘if’ id1.var relop id2.var ‘goto’ nextStmt+2); emit( E.var ‘:=‘ ‘0’); emit(‘goto ‘ nextStmt + 1); emit(E.var ‘:=‘ ‘1’) } This is useful for short circuit evaluation. Let us start with an example.

Boolean Expressions: an Example 28 E E E E E E a a < < b b or E E c c < < d d E E e e < < f f and

Boolean Expressions: an Example 29 E E E E E E a a < < b b or E E c c < < d d E E e e < < f f and if a < b goto : T 1 := 0101: goto : T 1 := 1103:

Boolean Expressions: an Example 30 E E E E E E a a < < b b or E E c c < < d d E E e e < < f f and if a < b goto : T 1 := 0101: goto : T 1 := 1103: if c < d goto : T 2 := 0105: goto : T 2 := 1107:

Boolean Expressions: an Example 31 E E E E E E a a < < b b or E E c c < < d d E E e e < < f f and if a < b goto : T 1 := 0101: goto : T 1 := 1103: if c < d goto : T 2 := 0105: goto : T 2 := 1107: if e < f goto : T 3 := 0109: goto : T 3 := 1111: 112:

Boolean Expressions: an Example 32 E E E E E E a a < < b b or E E c c < < d d E E e e < < f f and if a < b goto : T 1 := 0101: goto : T 1 := 1103: if c < d goto : T 2 := 0105: goto : T 2 := 1107: if e < f goto : T 3 := 0109: goto : T 3 := 1111: 112: 113: T 4 := T 2 and T 3

Boolean Expressions: an Example 33 E E E E E E a a < < b b or E E c c < < d d E E e e < < f f and if a < b goto : T 1 := 0101: goto : T 1 := 1103: if c < d goto : T 2 := 0105: goto : T 2 := 1107: if e < f goto : T 3 := 0109: goto : T 3 := 1111: 112: 113: T 4 := T 2 and T 3 T 5 := T 1 or T 4

Short circuit evaluation Second argument of a boolean operator is only evaluated if the first argument does not already determine the outcome (x and y) is equivalent to: “if x then y else false”; (x or y) is equivalent to: “if x then true else y”; Is short circuit evaluation equivalent to standard evaluation? We use it if the language definition dictates its use. 34

More examples 35 int denom = 0; if (denom && nom/denom) { oops_i_just_divided_by_zero(); } if ( (file=open(“c:\grades”) or (die) ) printfile(file,…);

Our previous example 36 a < b or (c<d and e<f) 100: if a < b goto : T 1 := 0 102: goto : T 1 := 1 104: if c < d goto : T 2 := 0 106: goto : T 2 := 1 108: if e < f goto : T 3 := 0 110: goto : T 3 := 1 112: T 4 := T 2 and T 3 113: T 5 := T 1 and T 4 100: if a < b goto : if !(c < d) goto : if e < f goto : T := 0 104: goto : T := 1 106: naiveShort circuit evaluation

Control Structure: if, else, while. Consider the conditional jumps: One option is to create code for B, create code for S, and then create a jump to the beginning of S or the end of S according to B’s value. But it would be more efficient to create code that while executing B will jump to the correct location immediately when the value of B is discovered. if B then S 1 if B then S 1 else S 2 while B do S 1 S → |

Control Structure: if, else, while. The problem is that we do not know where to jump to… While parsing B’s tree for “if B then S”, we don’t know S’s code and where it starts or ends. Yet, we must generate jumps to these locations. Solution: for each expression B keep labels: B.trueLabel and B.falseLabel. We jump to these labels when B’s value is true or false correspondingly. Also, for each statement S, we have a label S.nextLabel which keeps the address of the code that follows the statement S. S S if B B then S S The semantic equation will be B.false = S.next. For B.true, we create a new label between B’s code and S’s code.

התכונה next While parsing statement S, generate the code with its next label. The attribute S.next is inherited: the child gets it. The attribute code is generated: the parent gets it. The label S.next is symbolic. The address will only be known after we parse S. productionsemantic action P  SS.next = freshLabel(); P.code = S.code || label(S.next) S  S1 S2S1.next = freshLabel(); S2.next = S.next; S.code = S1.code || label(S1.next) || S2.code

Control Structures: conditional 40 productionsemantic action S  if B then S1B.trueLabel = freshLabel(); B.falseLabel = S.next; S1.next = S.next; S.code = B.code || gen (B.trueLabel ‘:’) || S1.code  Are S1.next, B.falseLabel inherited or synthesized?  Is S.code inherited or synthesized?

Control Structures: conditional 41 productionsemantic action S  if B then S1 else S2 B.trueLabel = freshLabel(); B.falseLabel = freshLabel(); S1.next = S.next; S2.next = S.next; S.code = B.code || gen(B.trueLabel ‘:’) || S1.code || gen(‘goto’ S.next) || gen(B.falseLabel ‘:’) || S2.code  B.trueLabel and B.falseLabel considered inherited

S  if B then S1 else S2 42 B.code S1.code goto S.next S2.code … B.trueLabel: B.falseLabel: S.next: B.trueLabel = freshLabel(); B.falseLabel = freshLabel(); S1.next = S.next; S2.next = S.next; S.code = B.code || gen(B.trueLabel ‘:’) || S1.code || gen(‘goto’ S.next) || gen(B.falseLabel ‘:’) || S2.code

Boolean expressions 43 productionsemantic action B  B1 or B2B1.trueLabel = B.trueLabel; B1.falseLabel = freshLabel(); B2.trueLabel = B.trueLabel; B2.falseLabel = B.falseLabel; B.code = B1.code || gen (B1.falseLabel ‘:’) || B2.code B  B1 and B2B1.trueLabel = freshLabel(); B1.falseLabel = B.falseLabel; B2.trueLabel = B.trueLabel; B2.falseLabel = B.falseLabel; B.code = B1.code || gen (B1.trueLabel ‘:’) || B2.code B  not B1B1.trueLabel = B.falseLabel; B1.falseLabel = B.trueLabel; B.code = B1.code; B  (B1)B1.trueLabel = B.trueLabel; B1.falseLabel = B.falseLabel; B.code = B1.code; B  id1 relop id2B.code=gen (‘if’ id1.var relop id2.var ‘goto’ B.trueLabel)||gen(‘goto’ B.falseLabel); B  trueB.code = gen(‘goto’ B.trueLabel) B  falseB.code = gen(‘goto’ B.falseLabel); Generate code that jumps to B.trueLabel if B’s value is true and to B.falseLabel if B’s value is false.

Boolean expressions How can we determine the address of B1.falseLabel? Only possible after we know the code of B1 and all the code preceding B1 44 productionsemantic action B  B1 or B2B1.trueLabel = B.trueLabel; B1.falseLabel = freshLabel(); B2.trueLabel = B.trueLabel; B.falseLabel = B.falseLabel; B.code = B1.code || gen (B1.falseLabel ‘:’) || B2.code

Example 45 S S if B B then S1 B1 B2 and false true B1.trueLabel = freshLabel(); B1.falseLabel = B.falseLabel; B2.trueLabel = B.trueLabel; B2.falseLabel = B.falseLabel; B.code = B1.code || gen (B1.trueLabel ‘:’) || B2.code B.code = gen(‘goto’ B1.trueLabel) B.code = gen(‘goto’ B2.falseLabel)

Example 46 S S if B B then S1 B1 B2 and false true B.trueLabel = freshLabel(); B.falseLabel = S.next; S1.next = S.next; S.code = B.code || gen (B.trueLabel ‘:’) || S1.code B1.trueLabel = freshLabel(); B1.falseLabel = B.falseLabel; B2.trueLabel = B.trueLabel; B2.falseLabel = B.falseLabel; B.code = B1.code || gen (B1.trueLabel ‘:’) || B2.code B.code = gen(‘goto’ B1.trueLabel) B.code = gen(‘goto’ B2.falseLabel)

Computing the labels We can build the code while parsing the tree bottom- up, leaving the jump targets vacant. We can compute the values for the labels in a second traversal of the AST Can we do it in a single pass? 47

So far… Three address code. Intermediate code generation is executed with parsing (via semantic actions). Creating code for Boolean expressions and for control statements is more involved. We typically use short circuit evaluation, value of expression is implicit in control location. We need to compute the branching addresses. Option 1: compute them in a second AST pass. 48

Backpatching ( תיקון לאחור ) Goal: generate code in a single pass Generate code as we did before, but manage labels differently Keep targets of jumps empty until values are known, and then back-patch them New synthesized attributes for B – B.truelist – list of jump instructions that eventually get the label where B goes when B is true. – B.falselist – list of jump instructions that eventually get the label where B goes when B is false. 49

Backpatching For every label, maintain a list of instructions that jump to this label When the address of the label is known, go over the list and update the address of the label Previous solutions do not guarantee a single pass – The attribute grammar we had before is not S-attributed (e.g., next is inherited), and is not L-attributed (because the inherited attributes are not necessarily from left siblings). 50

Why is it difficult? Think of an if-then-else statement. To compute S1.next, we need to know the code of all blocks S1, B, S2, in order to compute the address that follows them. On the other hand, to compute the code of S1, we need to know S1.next, or S.next, which is not known before S1 is computed. So we cannot compute S1’s code with all jump targets, but we can compute it up to “leaving space” for future insertion of S1.next. Using backpatching we maintain a list of all empty spaces that need to jump to S1.next. When we know S1.next, we go over this list and update the jump targets with S1.next’s value. if B S thenelse S1S1 S2S2

Functions for list handling makelist(addr) – create a list of instructions containing addr merge(p1,p2) – concatenate the lists pointed to by p1 and p2, returns a pointer to the new list backpatch(p,addr) – inserts i as the target label for each of the instructions in the list pointed to by p 52

Accumulating the code Assume a bottom-up parsing so that the code is created in the correct order from left to right, bottom-up. The semantic analysis is actually executed during parsing and the code is accumulated. Recall that we associated two labels B.trueLabel and B.falseLabel for each expression B, which are the labels to which we should jump if B turns out true or false. We will now replace these with two lists B.truelist and B.falselist, which record all instructions that should be updated with the address of B.trueLabel or B.falseLabel when we discover their values. In addition, we also replace S.next with S.nextlist, maintaining all instructions that should jump to S.next, when we know what it is.

Accumulating Code (cont’d) B.truelist and B.falselist are generated attributes: the descendants tell the parent where there is code to fix. When ascending to the parent of B, we know what are the relevant addresses and we can backpatch them into all the instruction locations that are in the list. Similarly for S.nextlist. We will need a function nextinstr() that retruns the address of the next instruction.

Computing Boolean expressions Recall the code jumps to B.true ot B.false according to B’s value. Previously: Now with backpatching: { B.code := gen ( ' if ' id 1.var relop.op id 2.var ' goto ' B.true ) || gen ( ' goto ' B.false ) } B → id 1 relop id 2 { B.code := gen ( ' goto ' B.true ) } B → true { B.code := gen ( ' goto ' B.false ) } B → false { B.truelist := makelist ( nextinstr ) ; B.falselist := makelist ( nextinstr + 1 ) ; emit ( ' if ' id 1.var relop.op id 2.var ' goto_ ' ) || emit ( ' goto_ ' ) } B → id 1 relop id 2 { B.truelist := makelist ( nextinstr ) ; emit ( ' goto_ ' ) } B → true { B.falselist := makelist ( nextinstr ) ; emit ( ' goto_ ' ) } B → false

Computing Boolean expressions Recall the code jumps to B.true ot B.false according to B’s value. Previously: Now with backpatching: { B 1.true := B.false ; B 1.false := B.true ; B.code := B 1.code } B → not B 1 { B 1.true := B.true ; B 1.false := B.false ; B.code := B 1.code } B → ( B 1 ) { B.truelist := B 1.falselist ; B.falselist := B 1.truelist } B → not B 1 { B.truelist := B 1.truelist ; B.falselist := B 1.falselist } B → ( B 1 )

Computing Boolean expressions Recall the code jumps to B.true ot B.false according to B’s value. Previously: Now with backpatching: { B 1.true := B.true ; B 1.false := newlable() ; B 2.true := B.true ; B 2.false := B.false ; B.code := B 1.code || gen ( B 1.false ' : ') || B 2.code } B → B 1 or B 2 { B 1.true := newlabel (); B 1.false := B.false ; B 2.true := B.true ; B 2.false := B.false ; B.code := B 1.code || gen ( B 1.true ' : (' || B 2.code } B → B 1 and B 2 { backpatch ( B 1.falselist, M.instr ) ; B.truelist := merge ( B 1.truelist, B 2.truelist ) ; B.falselist := B 2.falselist } B → B 1 or M B 2 { backpatch ( B 1.truelist, M.instr ) ; B.truelist := B 2.truelist ; B.falselist := merge ( B 1.falselist, B 2.falselist ) } B → B 1 and M B 2 { M.instr := nextinstr } M → 

Backpatching Boolean expressions 58 productionsemantic action B  B1 or M B2backpatch(B1.falseList,M.instr); B.trueList = merge(B1.trueList,B2.trueList); B.falseList = B2.falseList; B  B1 and M B2backpatch(B1.trueList,M.instr); B.trueList = B2.trueList; B.falseList = merge(B1.falseList,B2.falseList); B  not B1B.trueList = B1.falseList; B.falseList = B1.trueList; B  (B1)B.trueList = B1.trueList; B.falseList = B1.falseList; B  id1 relop id2B.trueList = makeList(nextInstr); B.falseList = makeList(nextInstr+1); emit (‘if’ id1.var relop id2.var ‘goto _’) || emit(‘goto _’); B  trueB.trueList = makeList(nextInstr); emit (‘goto _’); B  falseB.falseList = makeList(nextInstr); emit (‘goto _’); M  M   M.instr = nextinstr;

Using a marker { M.instr = nextinstr;} Use M to obtain the address just before B2 code starts being generated 59 B1 or B B M M B2   Example:

Example 60 X 200 and x != y B B B B x x < < 100 B B x x > > 200 B B x x != y y B B and or M M M M   100: if x< 100 goto _ 101: goto _ B  id1 relop id2B.trueList = makeList(nextInstr); B.falseList = makeList(nextInstr+1); emit (‘if’ id1.var relop id2.var ‘goto _’) || emit(‘goto _’); B.t = {100} B.f = {101}  

Example 61 X 200 and x != y B B B B x x < < 100 B B x x > > 200 B B x x != y y B B and or M M M M   100: if x< 100 goto _ 101: goto _ B.t = {100} B.f = {101} M  M   M.instr = nextinstr; M.i = 102  

Example 62 X 200 and x != y B B B B x x < < 100 B B x x > > 200 B B x x != y y B B and or M M M M   100: if x< 100 goto _ 101: goto _ B.t = {100} B.f = {101} M.i = 102 B  id1 relop id2B.trueList = makeList(nextInstr); B.falseList = makeList(nextInstr+1); emit (‘if’ id1.var relop id2.var ‘goto _’) || emit(‘goto _’); 102: if x> 200 goto _ 103: goto _ B.t = {102} B.f = {103}  

Example 63 X 200 and x != y B B B B x x < < 100 B B x x > > 200 B B x x != y y B B and or M M M M   100: if x< 100 goto _ 101: goto _ B.t = {100} B.f = {101} M.i = : if x> 200 goto _ 103: goto _ B.t = {102} B.f = {103}   M  M   M.instr = nextinstr; M.i = 104

Example 64 X 200 and x != y B B B B x x < < 100 B B x x > > 200 B B x x != y y B B and or M M M M   100: if x< 100 goto _ 101: goto _ B.t = {100} B.f = {101} M.i = : if x> 200 goto _ 103: goto _ B.t = {102} B.f = {103}   M.i = 104 B  id1 relop id2B.trueList = makeList(nextInstr); B.falseList = makeList(nextInstr+1); emit (‘if’ id1.var relop id2.var ‘goto _’) || emit(‘goto _’); 104: if x!=y goto _ 105: goto _ B.t = {104} B.f = {105}

Example 65 X 200 and x != y B B B B x x < < 100 B B x x > > 200 B B x x != y y B B and or M M M M   100: if x< 100 goto _ 101: goto _ B.t = {100} B.f = {101} M.i = : if x> 200 goto : goto _ B.t = {102} B.f = {103}   M.i = : if x!=y goto _ 105: goto _ B.t = {104} B.f = {105} B  B1 and M B2backpatch(B1.trueList,M.instr); B.trueList = B2.trueList; B.falseList = merge(B1.falseList,B2.falseList); B.t = {104} B.f = {103,105}

Example 66 X 200 and x != y B B B B x x < < 100 B B x x > > 200 B B x x != y y B B and or M M M M   100: if x< 100 goto _ 101: goto 102 B.t = {100} B.f = {101} M.i = : if x> 200 goto : goto _ B.t = {102} B.f = {103}   M.i = : if x!=y goto _ 105: goto _ B.t = {104} B.f = {105} B.t = {104} B.f = {103,105} B  B1 or M B2backpatch(B1.falseList,M.instr); B.trueList = merge(B1.trueList,B2.trueList); B.falseList = B2.falseList; B.t = {100,104} B.f = {103,105}

Example : if x<100 goto _ 101: goto _ 102: if x>200 goto _ 103: goto _ 104: if x!=y goto _ 105: goto _ 100: if x<100 goto _ 101: goto _ 102: if x>200 goto : goto _ 104: if x!=y goto _ 105: goto _ 100: if x<100 goto _ 101: goto : if x>200 goto : goto _ 104: if x!=y goto _ 105: goto _ Before backpatching After backpatching by the production B  B1 and M B2 After backpatching by the production B  B1 or M B2

Backpatching for statements 68 productionsemantic action S  if (B) M S1backpatch(B.trueList,M.instr); S.nextList = merge(B.falseList,S1.nextList); S  if (B) M1 S1 N else M2 S2 backpatch(B.trueList,M1.instr); backpatch(B.falseList,M2.instr); temp = merge(S1.nextList,N.nextList); S.nextList = merge(temp,S2.nextList); S  while M1 (B) M2 S1 backpatch(S1.nextList,M1.instr); backpatch(B.trueList,M2.instr); S.nextList = B.falseList; emit(‘goto’ M1.instr); S  { L }S.nextList = L.nextList; S  AS.nextList = null; M  M   M.instr = nextinstr; N  N   N.nextList = makeList(nextInstr); emit(‘goto _’); L  L1 M Sbackpatch(L1.nextList,M.instr); L.nextList = S.nextList; L  SL.nextList = S.nextList

Generate code for procedures we will see handling of procedure calls in much more detail later 69 n = f(a[i]); t1 = i * 4 t2 = a[t1] // could have expanded this as well param t2 t3 = call f, 1 n = t3

Extend grammar for procedures type checking – function type: return type, type of formal parameters – within an expression function treated like any other operator symbol table – parameter names 70 D  define T id (F) { S } F   | T id, F S  return E; | … E  id (A) | … A   | E, A expressions statements

Summary Three address code. Intermediate code generation is executed with parsing (via semantic actions). Creating code for Boolean expressions and for control statements is more involved. We typically use short circuit evaluation, value of expression is implicit in control location. We need to compute the branching addresses. Option 1: compute them in a second AST pass. Option 2: backpatching (a single pass): maintain lists of incomplete jumps, where all jumps in a list have the same target. When the target becomes known, all instructions on its list are “filled in”. 71