Compiler Principles Fall Compiler Principles Lecture 6: Intermediate Representation Roman Manevich Ben-Gurion University of the Negev
Tentative syllabus
Front End: Scanning, Top-down Parsing (LL), Bottom-up Parsing (LR)
Intermediate Representation: Lowering, Operational Semantics
Optimizations: Dataflow Analysis, Loop Optimizations
Code Generation: Register Allocation, Instruction Selection
Previously
Becoming parsing ninjas – Going from text to an Abstract Syntax Tree
[Image credit: Admiral Ham, GFDL or CC-BY-SA-3.0, via Wikimedia Commons]
From scanning to parsing
Program text: 59 + (1257 * xPosition)
Lexical Analyzer: program text → token stream: num + ( num * id )
Parser: token stream → Abstract Syntax Tree (valid) or syntax error
Grammar: E → id | num | E + E | E * E | ( E )
Agenda
The role of intermediate representations
Two example languages
– A high-level language
– An intermediate language
Lowering
Correctness
– Formal meaning of programs
Role of intermediate representation
Bridge between front-end and back-end
Allow implementing optimizations independent of source language and executable (target) language
[Figure: High-level Language (scheme) → Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Inter. Rep. (IR) → Code Generation → Executable Code]
Motivation for intermediate representation 7
Intermediate representation
A language that is between the source language and the target language
– Not specific to any source language or machine language
Goal 1: retargeting compiler components for different source languages/target machines
[Figure: source languages C++, Python, Java → IR → targets Pentium, Java bytecode, Sparc]
Intermediate representation
A language that is between the source language and the target language
– Not specific to any source language or machine language
Goal 1: retargeting compiler components for different source languages/target machines
Goal 2: machine-independent optimizer
– Narrow interface: small number of node types (instructions)
[Figure: source languages C++, Python, Java → Lowering → IR (optimize) → Code Gen. → targets Pentium, Java bytecode, Sparc]
Multiple IRs
Some optimizations require high-level structure
Others are more appropriate on low-level code
Solution: use multiple IR stages
[Figure: AST → HIR (optimize) → LIR (optimize) → Pentium, Java bytecode, Sparc]
Multiple IRs example
Elixir – a language for parallel graph algorithms
[Figure: Elixir compilation pipeline. Components shown: Elixir Program, Delta Inferencer, Automated Reasoning (Boogie+Z3) answering queries, Elixir Program + delta, IL Synthesizer, Planning Problem, Automated Planner, Plan, HIR, Lowering, LIR, C++ backend, C++ code, Galois Library]
Mini-project on parallel graph algorithms
AST vs. LIR for imperative languages
AST
– Rich set of language constructs
– Rich type system
– Declarations: types (classes, interfaces), functions, variables
– Control flow statements: if-then-else, while-do, break-continue, switch, exceptions
– Data statements: assignments, array access, field access
– Expressions: variables, constants, arithmetic operators, logical operators, function calls
LIR
– An abstract machine language
– Very limited type system
– Only computation-related code
– Labels and conditional/unconditional jumps, no looping
– Data movements, generic memory access statements
– No sub-expressions, logical as numeric, temporaries, constants, function calls with explicit argument passing
three address code 13
Three-Address Code IR A popular form of IR High-level assembly where instructions have at most three operands There exist other types of IR – For example, IR based on acyclic graphs – more amenable for analysis and optimizations 14 Chapter 8
Base language: While 15
Syntax
A ::= n | x | A ArithOp A | ( A )
ArithOp ::= - | + | * | /
B ::= true | false | A = A | A ≤ A | ¬B | B ∧ B | ( B )
S ::= x := A | skip | S ; S | { S } | if B then S else S | while B S
n ∈ Num (numerals)
x ∈ Var (program variables)
Example program
while x < y {
  x := x + 1
}
y := x;
Intermediate language: IL 18
Syntax
V ::= n | x
R ::= V Op V
Op ::= - | + | * | / | = | ≤ | > | …
C ::= l: skip | l: x := R | l: Goto l' | l: IfZ x Goto l' | l: IfNZ x Goto l'
IR ::= C+
n ∈ Num (numerals)
l ∈ Num (labels)
x ∈ Temp ∪ Var (temporaries and variables)
Intermediate language programs
An intermediate program P has the form
1: c_1
…
n: c_n
We can view it as a map from labels to individual commands and write P(j) = c_j
Example:
1: t0 := 137
2: y := t0 + 3
3: IfZ x Goto 7
4: t1 := y
5: z := t1
6: Goto 9
7: t2 := y
8: x := t2
9: skip
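The "map from labels to commands" view can be sketched as a small interpreter. This is only an illustration: the tuple encoding of commands, the `run` helper, the choice to start at label 1, and the default value 0 for unset variables are all assumptions, not part of the lecture.

```python
import operator

# Hypothetical encoding of IL commands as tuples:
#   ("skip",)  ("assign", x, v)  ("binop", x, v1, op, v2)
#   ("goto", l)  ("ifz", x, l)  ("ifnz", x, l)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "=": lambda a, b: int(a == b)}

def value(v, state):
    """An operand is a numeral (int) or a variable name (str).
    Unset variables read as 0 (an assumption of this sketch)."""
    return v if isinstance(v, int) else state.get(v, 0)

def run(program, state):
    """Execute P from label 1 until control reaches a label outside P."""
    state = dict(state)
    pc = 1
    while pc in program:
        cmd = program[pc]          # P(pc) = c_pc
        kind = cmd[0]
        nxt = pc + 1               # default: fall through to the next label
        if kind == "assign":
            state[cmd[1]] = value(cmd[2], state)
        elif kind == "binop":
            _, x, v1, op, v2 = cmd
            state[x] = OPS[op](value(v1, state), value(v2, state))
        elif kind == "goto":
            nxt = cmd[1]
        elif kind == "ifz" and value(cmd[1], state) == 0:
            nxt = cmd[2]
        elif kind == "ifnz" and value(cmd[1], state) != 0:
            nxt = cmd[2]
        pc = nxt
    return state

# A small program: y := 2 if x is zero, otherwise y := 1.
P = {
    1: ("assign", "t1", "x"),
    2: ("ifz", "t1", 5),
    3: ("assign", "y", 1),
    4: ("goto", 6),
    5: ("assign", "y", 2),
    6: ("skip",),
}
```

Calling `run(P, {"x": 0})` follows the jump at label 2, while `run(P, {"x": 3})` falls through to label 3.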
Lowering 21
TAC generation At this stage in compilation, we have – an AST – annotated with scope information – and annotated with type information To generate TAC for the program, we do recursive tree traversal – Generate TAC for any subexpressions and substatements – Using the result, generate TAC for the overall expression (bottom-up manner) 22
TAC generation for expressions Define a function cgen(expr) that generates TAC that computes an expression, stores it in a temporary variable, then hands back the name of that temporary Define cgen directly for atomic expressions (constants, this, identifiers, etc.) Define cgen recursively for compound expressions (binary operators, function calls, etc.) 23
Translation rules for expressions

cgen(n) = (l: t := n, t)    where l and t are fresh
cgen(x) = (l: t := x, t)    where l and t are fresh

cgen(e1) = (P1, t1)    cgen(e2) = (P2, t2)
─────────────────────────────────────────────────
cgen(e1 op e2) = (P1 · P2 · l: t := t1 op t2, t)    where l and t are fresh
cgen for basic expressions
Maintain a counter for temporaries in c, and a counter for labels in l
Initially: c = 0, l = 0
cgen(k) = { // k is a constant
  c = c + 1, l = l + 1
  Emit(l: tc := k)
  Return tc
}
cgen(id) = { // id is an identifier
  c = c + 1, l = l + 1
  Emit(l: tc := id)
  Return tc
}
Naive cgen for binary expressions
cgen(e1 op e2) = {
  Let A = cgen(e1)
  Let B = cgen(e2)
  c = c + 1, l = l + 1
  Emit(l: tc := A op B)
  Return tc
}
The translation emits code to evaluate e1 before e2. Why is that?
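The counter-based cgen for constants, identifiers, and binary operators can be sketched in Python as follows. The `CodeGen` class, its `fresh` helper, and the encoding of expressions (an int, a str, or a tuple `(op, e1, e2)`) are assumptions made for this illustration.

```python
class CodeGen:
    """Naive TAC generator for expressions (a sketch under stated assumptions)."""

    def __init__(self):
        self.c = 0      # counter for temporaries
        self.l = 0      # counter for labels
        self.code = []  # emitted instructions, in emission order

    def fresh(self):
        """Advance both counters and return the new temporary's name."""
        self.c += 1
        self.l += 1
        return f"t{self.c}"

    def cgen(self, e):
        # Hypothetical AST encoding: an int constant, a str identifier,
        # or a tuple (op, e1, e2) for a binary expression.
        if isinstance(e, (int, str)):
            t = self.fresh()
            self.code.append(f"{self.l}: {t} := {e}")
            return t
        op, e1, e2 = e
        a = self.cgen(e1)   # code for e1 is emitted first
        b = self.cgen(e2)   # then code for e2
        t = self.fresh()
        self.code.append(f"{self.l}: {t} := {a} {op} {b}")
        return t


gen = CodeGen()
result = gen.cgen(("-", ("*", "a", "b"), "d"))   # (a*b)-d
```

After the call, `gen.code` holds one instruction per AST node and `result` names the temporary that holds the value of the whole expression.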
Example: cgen for binary expressions
c = 0, l = 0
cgen((a*b)-d) = {
  Let A = cgen(a*b)
  Let B = cgen(d)
  c = c + 1, l = l + 1
  Emit(l: tc := A - B)
  Return tc
}
Example: cgen for binary expressions (expanding cgen(a*b))
c = 0, l = 0
cgen((a*b)-d) = {
  Let A = {
    Let A = cgen(a)
    Let B = cgen(b)
    c = c + 1, l = l + 1
    Emit(l: tc := A * B)
    Return tc
  }
  Let B = cgen(d)
  c = c + 1, l = l + 1
  Emit(l: tc := A - B)
  Return tc
}
Example: cgen for binary expressions (fully expanded, with emitted code)
c = 0, l = 0
cgen((a*b)-d) = {
  Let A = {
    Let A = { c = c + 1, l = l + 1, Emit(l: tc := a), Return tc }   // here A = t1
    Let B = { c = c + 1, l = l + 1, Emit(l: tc := b), Return tc }
    c = c + 1, l = l + 1
    Emit(l: tc := A * B)
    Return tc
  }                                                                  // here A = t3
  Let B = { c = c + 1, l = l + 1, Emit(l: tc := d), Return tc }
  c = c + 1, l = l + 1
  Emit(l: tc := A - B)
  Return tc
}
Code:
1: t1 := a
2: t2 := b
3: t3 := t1 * t2
4: t4 := d
5: t5 := t3 - t4
cgen for statements We can extend the cgen function to operate over statements as well Unlike cgen for expressions, cgen for statements does not return the name of a temporary holding a value – (Why?) 38
Syntax
A ::= n | x | A ArithOp A | ( A )
ArithOp ::= - | + | * | /
B ::= true | false | A = A | A ≤ A | ¬B | B ∧ B | ( B )
S ::= x := A | skip | S ; S | { S } | if B then S else S | while B S
n ∈ Num (numerals)
x ∈ Var (program variables)
Translation rules for statements

cgen(skip) = l: skip    where l is fresh

cgen(e) = (P, t)
──────────────────────────────
cgen(x := e) = P · l: x := t    where l is fresh

cgen(b) = (Pb, t)    cgen(S1) = P1    cgen(S2) = P2
─────────────────────────────────────────────────
cgen(if b then S1 else S2) =
    Pb
    IfZ t Goto l_false
    P1
    l_finish: Goto l_after
    l_false: skip
    P2
    l_after: skip
where l_finish, l_false, l_after are fresh
Translation rules for loops

cgen(b) = (Pb, t)    cgen(S) = P
─────────────────────────────────
cgen(while b S) =
    l_before: skip
    Pb
    IfZ t Goto l_after
    P
    l_loop: Goto l_before
    l_after: skip
where l_after, l_before, l_loop are fresh
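The statement rules can be sketched as a recursive Python function that emits labeled commands. Everything named here is hypothetical: the `Lowering` class, the symbolic labels `L1, L2, …`, and the tuple encoding of statements. Two further assumptions: every emitted command receives a label, and `S1 ; S2` is lowered by concatenating the code for the two statements.

```python
import itertools

class Lowering:
    """Lowers While statements to labeled TAC commands (a sketch)."""

    def __init__(self):
        self._labels = itertools.count(1)
        self._temps = itertools.count(1)
        self.code = []                      # list of (label, command) pairs

    def label(self):
        return f"L{next(self._labels)}"     # a fresh label

    def emit(self, command, label=None):
        # Use the given label, or allocate a fresh one for this command.
        self.code.append((label or self.label(), command))

    def cgen_expr(self, e):
        """e is an int, a variable name, or (op, e1, e2); returns a temporary."""
        if isinstance(e, (int, str)):
            rhs = str(e)
        else:
            op, e1, e2 = e
            t1 = self.cgen_expr(e1)
            t2 = self.cgen_expr(e2)
            rhs = f"{t1} {op} {t2}"
        t = f"t{next(self._temps)}"
        self.emit(f"{t} := {rhs}")
        return t

    def cgen_stmt(self, s):
        kind = s[0]
        if kind == "skip":
            self.emit("skip")
        elif kind == "assign":              # ("assign", x, e)
            _, x, e = s
            t = self.cgen_expr(e)
            self.emit(f"{x} := {t}")
        elif kind == "seq":                 # ("seq", s1, s2): concatenation
            self.cgen_stmt(s[1])
            self.cgen_stmt(s[2])
        elif kind == "if":                  # ("if", b, s1, s2)
            _, b, s1, s2 = s
            l_false, l_after = self.label(), self.label()
            t = self.cgen_expr(b)
            self.emit(f"IfZ {t} Goto {l_false}")
            self.cgen_stmt(s1)
            self.emit(f"Goto {l_after}")
            self.emit("skip", l_false)
            self.cgen_stmt(s2)
            self.emit("skip", l_after)
        elif kind == "while":               # ("while", b, body)
            _, b, body = s
            l_before, l_after = self.label(), self.label()
            self.emit("skip", l_before)
            t = self.cgen_expr(b)
            self.emit(f"IfZ {t} Goto {l_after}")
            self.cgen_stmt(body)
            self.emit(f"Goto {l_before}")
            self.emit("skip", l_after)
        else:
            raise ValueError(f"unknown statement: {kind}")


lowering = Lowering()
lowering.cgen_stmt(("while", ("<", "x", "y"),
                    ("assign", "x", ("+", "x", 1))))
```

The labels used as jump targets are allocated before the code that jumps to them is emitted, which is why `L2` appears last in `lowering.code` even though it was created second.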
Translation example
Source program:
y := 137+3;
if x=0
  z := y;
else
  x := y;
Generated code:
1: t1 := 137
2: t2 := 3
3: t3 := t1 + t2
4: y := t3
5: t4 := x
6: t5 := 0
7: t6 := t4=t5
8: IfZ t6 Goto 12
9: t7 := y
10: z := t7
11: Goto 14
12: t8 := y
13: x := t8
14: skip
Correctness 43
Compiler correctness
Intuitively, a compiler translates programs in one language (usually high-level) to another language (usually lower-level) such that the source program and its translation are equivalent
Our goal is to formally define the meaning of this equivalence
But first, we must define the meaning of a programming language
Formal semantics 45
What is formal semantics? 47 “Formal semantics is concerned with rigorously specifying the meaning, or behavior, of programs, pieces of hardware, etc.”
What is formal semantics? 48 “This theory allows a program to be manipulated like a formula – that is to say, its properties can be calculated.” Gérard Huet & Philippe Flajolet homage to Gilles Kahn
Why formal semantics?
Implementation-independent definition of a programming language
Automatically generating interpreters (and some day maybe full-fledged compilers)
Optimization, verification, and debugging
– If you don't know what it does, how do you know it's correct/incorrect?
– How do you know whether a given optimization is correct?
Operational semantics Elements of the semantics States/configurations: the (aggregate) values that a program computes during execution Transition rules: how the program advances from one configuration to another 50
Operational semantics of while 51
While syntax reminder
A ::= n | x | A ArithOp A | ( A )
ArithOp ::= - | + | * | /
B ::= true | false | A = A | A ≤ A | ¬B | B ∧ B | ( B )
S ::= x := A | skip | S ; S | { S } | if B then S else S | while B S
n ∈ Num (numerals)
x ∈ Var (program variables)
Semantic categories
Z: Integers {0, 1, -1, 2, -2, …}
T: Truth values {ff, tt}
State = Var → Z
Example state: σ = [x ↦ 5, y ↦ 7, z ↦ 0]
Lookup: σ(x) = 5
Update: σ[x ↦ 6] = [x ↦ 6, y ↦ 7, z ↦ 0]
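A state can be modeled as a Python dict from variable names to integers. The `update` helper below is a hypothetical name chosen for this sketch. It returns a new state instead of mutating the old one, matching the mathematical reading of update, in which the original state is unchanged.

```python
def update(state, x, value):
    """Return a copy of state in which x maps to value; state is untouched."""
    new_state = dict(state)
    new_state[x] = value
    return new_state


sigma = {"x": 5, "y": 7, "z": 0}    # the example state
lookup = sigma["x"]                 # lookup of x
sigma2 = update(sigma, "x", 6)      # update of x to 6
```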
Semantics of expressions 54
Semantics of arithmetic expressions
Semantic function A⟦A⟧ : State → Z
Defined by induction on the syntax tree
A⟦n⟧σ = n
A⟦x⟧σ = σ(x)
A⟦a1 + a2⟧σ = A⟦a1⟧σ + A⟦a2⟧σ
A⟦a1 - a2⟧σ = A⟦a1⟧σ - A⟦a2⟧σ
A⟦a1 * a2⟧σ = A⟦a1⟧σ × A⟦a2⟧σ
A⟦(a1)⟧σ = A⟦a1⟧σ (not needed)
A⟦-a⟧σ = 0 - A⟦a⟧σ
Compositional
Expressions in While are side-effect free
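The inductive definition maps directly onto a recursive function. In this sketch, the tuple encoding of expressions and the `"neg"` tag for unary minus are assumptions; division is left out because no rule for it is given here.

```python
def A(a, sigma):
    """Evaluate arithmetic expression a in state sigma (a dict Var -> int)."""
    if isinstance(a, int):          # numeral n
        return a
    if isinstance(a, str):          # variable x: look it up in the state
        return sigma[a]
    if a[0] == "neg":               # unary minus: -a means 0 - a
        return 0 - A(a[1], sigma)
    op, a1, a2 = a
    # Compositional: the result depends only on the meanings of the parts.
    v1, v2 = A(a1, sigma), A(a2, sigma)
    if op == "+":
        return v1 + v2
    if op == "-":
        return v1 - v2
    if op == "*":
        return v1 * v2
    raise ValueError(f"unsupported operator: {op}")
```

The function never modifies `sigma`, which reflects the fact that expressions in While are side-effect free.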
Arithmetic expression exercise
Suppose σ(x) = 3
Evaluate A⟦x+1⟧σ
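One way to work the exercise is to unfold the definition of the semantic function one syntax node at a time. The derivation below assumes the exercise asks for the value of x+1 in a state σ with σ(x) = 3.

```latex
\begin{align*}
\mathcal{A}[\![x+1]\!]\sigma
  &= \mathcal{A}[\![x]\!]\sigma + \mathcal{A}[\![1]\!]\sigma \\
  &= \sigma(x) + 1 \\
  &= 3 + 1 \\
  &= 4
\end{align*}
```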
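Semantics of boolean expressions
Semantic function B⟦B⟧ : State → T
Defined by induction on the syntax tree
B⟦true⟧σ = tt
B⟦false⟧σ = ff
B⟦a1 = a2⟧σ = tt if A⟦a1⟧σ = A⟦a2⟧σ, ff otherwise
B⟦a1 ≤ a2⟧σ = tt if A⟦a1⟧σ ≤ A⟦a2⟧σ, ff otherwise
B⟦b1 ∧ b2⟧σ = tt if B⟦b1⟧σ = tt and B⟦b2⟧σ = tt, ff otherwise
B⟦¬b⟧σ = tt if B⟦b⟧σ = ff, ff otherwise
Compositional
Expressions in While are side-effect free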
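A sketch of the boolean semantic function in Python, with tt and ff modeled as `True` and `False`. The tuple encoding, the tags `"not"` and `"and"`, and the reading of the comparison operator as `<=` are assumptions. A minimal arithmetic evaluator is included only to keep the example self-contained.

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def A(a, sigma):
    """Minimal arithmetic evaluator: ints, variable names, (op, a1, a2)."""
    if isinstance(a, int):
        return a
    if isinstance(a, str):
        return sigma[a]
    op, a1, a2 = a
    return ARITH[op](A(a1, sigma), A(a2, sigma))

def B(b, sigma):
    """Evaluate boolean expression b in state sigma to True (tt) or False (ff)."""
    if isinstance(b, bool):             # the constants true and false
        return b
    if b[0] == "not":                   # negation
        return not B(b[1], sigma)
    if b[0] == "and":                   # conjunction
        return B(b[1], sigma) and B(b[2], sigma)
    op, a1, a2 = b
    v1, v2 = A(a1, sigma), A(a2, sigma)
    if op == "=":
        return v1 == v2
    if op == "<=":
        return v1 <= v2
    raise ValueError(f"unsupported operator: {op}")
```

Python's `and` short-circuits, but since expressions have no side effects this gives the same truth value as evaluating both operands.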
Natural operational semantics
Developed by Gilles Kahn [STACS 1987]
Configurations
– ⟨S, σ⟩: statement S is about to execute on state σ
– σ: terminal (final) state
Transitions
– ⟨S, σ⟩ → σ': execution of S from σ will terminate with the result state σ'
– Ignores non-terminating computations
Natural operational semantics
Defined by rules of the form

⟨S1, σ1⟩ → σ1', …, ⟨Sn, σn⟩ → σn'      (premise)
──────────────────────────────────    if …  (side condition)
⟨S, σ⟩ → σ'                            (conclusion)

The meaning of a compound statement is defined using the meaning of its immediate constituent statements
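Natural semantics for While

[ass_ns]     ⟨x := a, σ⟩ → σ[x ↦ A⟦a⟧σ]        (axiom)

[skip_ns]    ⟨skip, σ⟩ → σ                      (axiom)

             ⟨S1, σ⟩ → σ'    ⟨S2, σ'⟩ → σ''
[comp_ns]    ───────────────────────────────
             ⟨S1 ; S2, σ⟩ → σ''

             ⟨S1, σ⟩ → σ'
[if_tt_ns]   ───────────────────────────────    if B⟦b⟧σ = tt
             ⟨if b then S1 else S2, σ⟩ → σ'

             ⟨S2, σ⟩ → σ'
[if_ff_ns]   ───────────────────────────────    if B⟦b⟧σ = ff
             ⟨if b then S1 else S2, σ⟩ → σ'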
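Natural semantics for While (loops)

               ⟨S, σ⟩ → σ'    ⟨while b S, σ'⟩ → σ''
[while_tt_ns]  ─────────────────────────────────────    if B⟦b⟧σ = tt
               ⟨while b S, σ⟩ → σ''

[while_ff_ns]  ⟨while b S, σ⟩ → σ                       if B⟦b⟧σ = ff

Non-compositional: the premise of [while_tt_ns] refers to the whole loop again, not only to its constituents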
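Read from the conclusion up to the premises, the natural semantics rules give a recursive big-step interpreter. The sketch below assumes a tuple encoding of statements, dict states, and `<=` as the comparison operator; the function name `ns` is hypothetical. For a non-terminating loop no final state exists, and the function correspondingly never returns a value (in practice Python raises `RecursionError`).

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def A(a, s):
    """Arithmetic expressions: ints, variable names, (op, a1, a2)."""
    if isinstance(a, int):
        return a
    if isinstance(a, str):
        return s[a]
    op, a1, a2 = a
    return ARITH[op](A(a1, s), A(a2, s))

def B(b, s):
    """Boolean expressions: bools, ("not", b), ("and", b1, b2), ("=" or "<=", a1, a2)."""
    if isinstance(b, bool):
        return b
    if b[0] == "not":
        return not B(b[1], s)
    if b[0] == "and":
        return B(b[1], s) and B(b[2], s)
    op, a1, a2 = b
    return A(a1, s) == A(a2, s) if op == "=" else A(a1, s) <= A(a2, s)

def ns(S, s):
    """Return the final state reached by executing statement S from state s."""
    kind = S[0]
    if kind == "skip":                      # [skip_ns]
        return s
    if kind == "assign":                    # [ass_ns]
        _, x, a = S
        return {**s, x: A(a, s)}
    if kind == "seq":                       # [comp_ns]
        _, S1, S2 = S
        return ns(S2, ns(S1, s))
    if kind == "if":                        # [if_tt_ns] / [if_ff_ns]
        _, b, S1, S2 = S
        return ns(S1 if B(b, s) else S2, s)
    if kind == "while":
        _, b, body = S
        if B(b, s):                         # [while_tt_ns]: run body, then the loop again
            return ns(S, ns(body, s))
        return s                            # [while_ff_ns]
    raise ValueError(f"unknown statement: {kind}")


# while x <= y { x := x + 1 } ; y := x
program = ("seq",
           ("while", ("<=", "x", "y"), ("assign", "x", ("+", "x", 1))),
           ("assign", "y", "x"))
final = ns(program, {"x": 0, "y": 2})
```

The recursive call `ns(S, …)` in the `while` case passes the whole loop `S` again, which is exactly where the definition stops being compositional.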
Next lecture: Correctness of lowering