Presentation is loading. Please wait.

Presentation is loading. Please wait.

Fall Compiler Principles Lecture 5: Intermediate Representation

Similar presentations


Presentation on theme: "Fall Compiler Principles Lecture 5: Intermediate Representation"— Presentation transcript:

1 Fall 2017-2018 Compiler Principles Lecture 5: Intermediate Representation
Roman Manevich Ben-Gurion University of the Negev

2 Tentative syllabus mid-term exam Front End Intermediate Representation
Scanning Top-down Parsing (LL) Bottom-up Parsing (LR) Intermediate Representation Operational Semantics Lowering Optimizations Dataflow Analysis Loop Optimizations Code Generation Register Allocation Instruction Selection mid-term exam

3 Previously Becoming parsing ninjas
Going from text to an Abstract Syntax Tree By Admiral Ham [GFDL ( or CC-BY-SA-3.0 ( via Wikimedia Commons

4 From scanning to parsing
program text 59 + (1257 * xPosition) Lexical Analyzer token stream ) id * num ( + Grammar: E  id E  num E  E + E E  E * E E  ( E ) Parser syntax error valid + num x * Abstract Syntax Tree

5 Agenda The role of intermediate representations Two example languages
A high-level language An intermediate language Lowering Correctness Formal meaning of programs

6 Role of intermediate representation
Bridge between front-end and back-end Allow implementing optimizations independent of source language and executable (target) language High-level Language (scheme) Executable Code Lexical Analysis Syntax Analysis Parsing AST Symbol Table etc. Inter. Rep. (IR) Code Generation

7 Motivation for intermediate representation

8 Intermediate representation
A language that is between the source language and the target language Not specific to any source language or machine language Goal 1: retargeting compiler components for different source languages/target machines Java Pentium C++ IR Java bytecode Pyhton Sparc

9 Intermediate representation
A language that is between the source language and the target language Not specific to any source language or machine language Goal 1: retargeting compiler components for different source languages/target machines Goal 2: machine-independent optimizer Narrow interface: small number of node types (instructions) Lowering Code Gen. Java Pentium optimize C++ IR Java bytecode Pyhton Sparc

10 Multiple IRs Some optimizations require high-level constructs while others more appropriate on low-level code Solution: use multiple IR stages Pentium optimize optimize AST HIR LIR Java bytecode Sparc

11 Multiple IRs example Elixir – a language for parallel graph algorithms
Elixir Program + delta Elixir Program Delta Inferencer HIR Lowering IL Synthesizer HIR LIR C++ backend Query Answer Planning Problem Plan Automated Reasoning (Boogie+Z3) C++ code Automated Planner Galois Library

12 AST vs. LIR for imperative languages
Rich set of language constructs Rich type system Declarations: types (classes, interfaces), functions, variables Control flow statements: if-then-else, while-do, break-continue, switch, exceptions Data statements: assignments, array access, field access Expressions: variables, constants, arithmetic operators, logical operators, function calls An abstract machine language Very limited type system Only computation-related code Labels and conditional/ unconditional jumps, no looping Data movements, generic memory access statements No sub-expressions, logical as numeric, temporaries, constants, function calls – explicit argument passing

13 three address code

14 Three-Address Code IR A popular form of IR
High-level assembly where instructions have at most three operands There exist other types of IR For example, IR based on acyclic graphs – more amenable for analysis and optimizations Chapter 8

15 Base language: While

16 Syntax n  Num numerals x  Var program variables A  n | x | A ArithOp A | (A) ArithOp  - | + | * B  true | false | A = A | A  A | B | B  B | (B) S  x := A | skip | S; S | { S } | if B then S else S | while B S

17 Example program while x < y { x := x + 1 { y := x;

18 Intermediate language: IL

19 Syntax n  Num Numerals l  Num Labels
x  Temp  Var Temporaries and variables V  n | x R  V Op V Op  - | + | * | = |  | << | >> | … C  l: skip | l: x := R | l: Goto l’ | l: IfZ x Goto l’ | l: IfNZ x Goto l’ IR  C+

20 Intermediate language programs
An intermediate program P has the form 1:c1 … n:cn We can view it as a map from labels to individual commands and write P(j) = cj 1: t0 := 137 2: y := t0 + 3 3: IfZ x Goto 7 4: t1 := y 5: z := t1 6: Goto 9 7: t2 := y 8: x := t2 9: skip

21 Lowering

22 TAC generation At this stage in compilation, we have
an AST annotated with scope information and annotated with type information We generate TAC recursively (bottom-up) First for expressions Then for statements

23 TAC generation for expressions
Define a function cgen(expr) that generates TAC that: Computes an expression, Stores it in a temporary variable, Returns the name of that temporary Define cgen directly for atomic expressions (constants, this, identifiers, etc.) Define cgen recursively for compound expressions (binary operators, function calls, etc.)

24 Translation rules for expressions
cgen(n) = (l: t:=n, t) where l and t are fresh cgen(x) = (l: t:=x, t) where l and t are fresh cgen(e1) = (P1, t1) cgen(e2) = (P2, t2) cgen(e1 op e2) = (P1· P2· l: t:=t1 op t2, t) where l and t are fresh

25 cgen for basic expressions
Maintain a counter for temporaries in c, and a counter for labels in l Initially: c = 0, l = 0 cgen(k) = { // k is a constant c = c + 1, l = l emit(l: tc := k) return tc { cgen(id) = { // id is an identifier c = c + 1, l = l emit(l: tc := id) return tc {

26 Naive cgen for binary expressions
cgen(e1 op e2) = { let A = cgen(e1) let B = cgen(e2) c = c + 1, l = l emit( l: tc := A op B; ) return tc } The translation emits code to evaluate e1 before e2. Why is that?

27 Example: cgen for binary expressions
cgen( (a*b)-d)

28 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d)

29 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = cgen(a*b) let B = cgen(d) c = c + 1 , l = l emit(l: tc := A - B; ) return tc }

30 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = cgen(a) let B = cgen(b) c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = cgen(d) c = c + 1, l = l emit(l: tc := A - B; ) return tc }

31 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc } let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc } c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc } c = c + 1, l = l emit(l: tc := A - B; ) return tc } Code here A=t1

32 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc } let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc } c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc } c = c + 1, l = l emit(l: tc := A - B; ) return tc } Code 1: t1:=a; here A=t1

33 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc } let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc } c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc } c = c + 1, l = l emit(l: tc := A - B; ) return tc } Code 1: t1:=a; 2: t2:=b; here A=t1

34 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc } let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc } c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc } c = c + 1, l = l emit(l: tc := A - B; ) return tc } Code 1: t1:=a; 2: t2:=b; 3: t3:=t1*t2 here A=t1

35 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc } let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc } c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc } c = c + 1, l = l emit(l: tc := A - B; ) return tc } here A=t3 Code 1: t1:=a; 2: t2:=b; 3: t3:=t1*t2 here A=t1

36 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc } let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc } c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc } c = c + 1, l = l emit(l: tc := A - B; ) return tc } here A=t3 Code 1: t1:=a; 2: t2:=b; 3: t3:=t1*t2 4: t4:=d here A=t1

37 Example: cgen for binary expressions
c = 0, l = 0 cgen( (a*b)-d) = { let A = { let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc } let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc } c = c + 1, l = l emit(l: tc := A * B; ) return tc } let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc } c = c + 1, l = l emit(l: tc := A - B; ) return tc } here A=t3 Code 1: t1:=a; 2: t2:=b; 3: t3:=t1*t2 4: t4:=d 5: t5:=t3-t4 here A=t1

38 cgen for statements We can extend the cgen function to operate over statements as well Unlike cgen for expressions, cgen for statements does not return the name of a temporary holding a value (Why?)

39 Syntax n  Num numerals x  Var program variables A  n | x | A ArithOp A | (A) ArithOp  - | + | * | / B  true | false | A = A | A  A | B | B  B | (B) S  x := A | skip | S; S | { S } | if B then S else S | while B S

40 Translation rules for statements
cgen(skip) = l: skip where l is fresh cgen(e) = (P, t) cgen(x := e) = P · l: x :=t where l is fresh cgen(S1) = P1, cgen(S2) = P cgen(S1; S2) = P1 · P2

41 Translation rules for conditions
cgen(b) = (Pb, t), cgen(S1) = P1, cgen(S2) = P cgen(if b then S1 else S2) = Pb IfZ t Goto lfalse P lfinish: Goto Lafter lfalse: skip P lafter: skip where lfinish, lfalse, lafter are fresh

42 Translation example y := 137+3; if x=0 z := y; else x := y;
1: t1 := 137 2: t2 := 3 3: t3 := t1 + t2 4: y := t3 5: t4 := x 6: t5 := 0 7: t6 := t4=t5 8: IfZ t6 Goto 12 9: t7 := y 10: z := t7 11: Goto 14 12: t8 := y 13: x := t8 14: skip

43 Translation rule for loops
cgen(b) = (Pb, t), cgen(S) = P cgen(while b S) = lbefore: skip Pb IfZ t Goto lafter P lloop: Goto Lbefore lafter: skip where lafter, lbefore, lloop are fresh

44 Translation example y := 137+3; while x=0 z := y;
1: t1 := 137 2: t2 := 3 3: t3 := t1 + t2 4: y := t3 5: t4 := x 6: t5 := 0 7: t6 := t4=t5 8: IfZ t6 Goto 12 9: t7 := y 10: z := t7 11: Goto 14 12: skip

45 Translation Correctness

46 Compiler correctness Compilers translate programs in one language (usually high) to another language (usually lower) such that they are both equivalent Our goal is to formally define the meaning of this equivalence First, we must formally define the meaning of programs

47 Formal semantics

48

49 What is formal semantics?
“Formal semantics is concerned with rigorously specifying the meaning, or behavior, of programs, pieces of hardware, etc.”

50 Why formal semantics? Implementation-independent definition of a programming language Automatically generating interpreters (and some day maybe full fledged compilers) Optimization, verification, and debugging If you don’t know what it does, how do you know its correct/incorrect? How do you know whether a given optimization is correct?

51 Operational semantics
Elements of the semantics States/configurations: the (aggregate) values that a program computes during execution Transition rules: how the program advances from one configuration to another

52 Operational semantics of while

53 While syntax reminder n  Num numerals x  Var program variables A  n | x | A ArithOp A | (A) ArithOp  - | + | * B  true | false | A = A | A  A | B | B  B | (B) S  x := A | skip | S; S | { S } | if B then S else S | while B do S

54 Semantic categories Z Integers {0, 1, -1, 2, -2, …} T Truth values {ff, tt} State Var  Z Example state:  =[x5, y7, z0] Lookup: (x) = 5 Update: [x6] = [x6, y7, z0]

55 Semantics of expressions

56 Semantics of arithmetic expressions
Semantic function  A  : State  Z Defined by induction on the syntax tree  n   = n  x   = (x)  a1 + a2   =  a1   +  a2    a1 - a2   =  a1   -  a2    a1 * a2   =  a1     a2    (a1)   =  a1   not needed  - a   = 0 -  a1   Compositional Expressions in While are side-effect free

57 Arithmetic expression exercise
Suppose  x = 3 Evaluate x+1 

58 Semantics of boolean expressions
Semantic function  B  : State  T Defined by induction on the syntax tree  true   = tt  false   = ff  a1 = a2  =  a1  a2   =  b1  b2   =  b   = Compositional Expressions in While are side-effect free

59 Semantics of statements
first step By Vanillase (Own work) [CC BY-SA 3.0 ( via Wikimedia Commons This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

60 Operational semantics
Developed by Gordon Plotkin Configurations:  has one of two forms: S,  Statement S is about to execute on state   Terminal (final) state Transitions S,    = S’, ’ Execution of S from  is not completed and remaining computation proceeds from intermediate configuration  = ’ Execution of S from  has terminated and the final state is ’ S,  is stuck if there is no  such that S,    first step PLOTKIN G .D., “A Structural Approach to Operational Semantics”, DAIMI FN-19, Computer Science Department, Aarhus University, Aarhus, Denmark, September 1981.

61 Form of semantic rules  defined by rules of the form
The meaning of compound statements is defined using the meaning immediate constituent statements side condition premise S1, 1  1’, … , Sn, n  n’ S,   ’ if… conclusion

62 Operational semantics for While
x:=a,   [xa] [ass] skip,    [skip] S1,   S1’, ’ S1; S2,   S1’; S2, ’ [comp1] When does this happen? S1,   ’ S1; S2,   S2, ’ [comp2] if b then S1 else S2,   S1,  if b  = tt [iftt] if b then S1 else S2,   S2,  if b  = ff [ifff] while b do S,   S; while b do S,  [whilett] if b  = tt while b do S,    [whileff] if b  = ff

63 Factorial (n!) example Input state  such that  x = 3
y := 1; while (x=1) do (y := y * x; x := x – 1) y :=1 ; W,   W, [y1]  ((y := y * x; x := x – 1); W), [y1]  (x := x – 1; W), [y3]  W , [y3][x2]  ((y := y * x; x := x – 1); W), [y3] [x2]  (x := x – 1; W) , [y6] [x2]  W, [y6][x1]  [y6][x1]

64 Derivation sequences An interpreter
Given a statement S and an input state  start with the initial configuration S,  Repeatedly apply rules until either a terminal state is reached or an infinite sequence We will write S,  k ’ if ’ can be obtained from S,  by k derivation steps We will write S,  * ’ if S,  k ’ for some k0

65 The semantics of statements
The meaning of a statement S is defined as Examples: skip =  x:=1 = [x 1] while true do skip =  S  = ’ if S,  * ’  else

66 Properties of operational semantics

67 Deterministic semantics for While
Theorem: for all statements S and states 1, 2 if S,  * 1 and S,  * 2 then 1= 2 Theorem: for all statements S and states , no derivation sequence from S,  ever reaches a stuck state

68 Program termination Given a statement S and input 
S terminates on  if there exists a state ’ such that S,  * ’ S loops on  if there is no state ’ such that S,  * ’ Given a statement S S always terminates if for every input state , S terminates on  S always loops if for every input state , S loops on 

69 Semantic equivalence S1 and S2 are semantically equivalent if for all  : S1,  *  if and only if S2,  *   is either stuck or terminal and there is an infinite derivation sequence for S1,  if and only if there is one for S2,  A formal basis for correct optimizations of While programs Examples S; skip and S while b do S and if b then (S; while b S) else skip ((S1; S2); S3) and (S1; (S2; S3)) (x:=5; y:=x*8) and (x:=5; y:=40)

70 Operational semantics of IL

71 IL syntax reminder n  Num Numerals l  Num Labels
x  Var  Temp Variables and temporaries V  n | x R  V Op V |V Op  - | + | * | = |  | << | >> | … C  l: skip | l: x := R | l: Goto l’ | l: IfZ x Goto l’ IR  C+

72 Intermediate program states
Z Integers {0, 1, -1, 2, -2, …} IState (Var  Temp  {pc})  Z pc (for program counter) is a special variable Var, Temp, and {pc} are all disjoint For every intermediate program state m and program P=1:c1,…,n:cn we have that 1  m(pc)  n+1 We can check that the labels used in P are legal

73 Rules for executing commands
We will use rules of the following form Here m is the pre-state, which is scheduled to be executed as the program counter indicates, and m’ is the post-state The rules specialize for the particular type of command C and possibly other conditions m(pc) = l P(l) = C m  m’

74 Evaluating values nm = n xm = m(x) v1 op v2m = v1m op v2m

75 Rules for executing commands
m(pc) = l P(l) = skip m  m[pcl+1] m(pc) = l P(l) = x:=v m  m[pcl+1, xv m] m(pc) = l P(l) = x:=v1 op v2 m  m[pcl+1, xv1 op v2 m] m(pc) = l P(l) = Goto l’ m  m[pcl’]

76 Rules for executing commands
m(pc) = l P(l) = IfZ x Goto l’ m(x)=0 m  m[pcl’] m(pc) = l P(l) = IfZ x Goto l’ m(x)0 m  m[pcl+1] m(pc) = l P(l) = IfNZ x Goto l’ m(x)  0 m  m[pcl’] m(pc) = l P(l) = IfNZ x Goto l’ m(x)=0 m  m[pcl+1]

77 Executing programs For a program P=1:c1,…,n:cn we define executions as finite or infinite sequences m1  m2 …  mk … We write m * m’ if there is a finite execution starting at m and ending at m’: m = m1  m2 …  mk = m’

78 Semantics of a program For a program P=1:c1,…,n:cn and a state m s.t. m(pc)=1 we define the result of executing P on m as Lemma: the function is well-defined (i.e., at most one output state) P m = m’ if m * m’ and m’(pc)=n+1  else

79 Execution example Execute the following intermediate language program on a state where all variables evaluate to 0 1: t1 := 137 2: y := t1 + 3 3: IfZ x Goto 7 4: t2 := y 5: z := t2 6: Goto 9 7: t3 := y 8: x := t3 9: skip m = [pc1, t10 , t20 , t30 , x0 , y0]  ?

80 Execution example Execute the following intermediate language program on a state where all variables evaluate to 0 1: t1 := 137 2: y := t1 + 3 3: IfZ x Goto 7 4: t2 := y 5: z := t2 6: Goto 9 7: t3 := y 8: x := t3 9: skip m = [pc1, t10 , t20 , t30 , x0 , y0]  m[pc2, t1137]  m[pc3, t1137, y140]  m[pc7, t1137, y140]  m[pc8, t1137, t3140, y140]  m[pc9, t1137, t3140, , x140, y140]  m[pc10, t1137, t3140, , x140, y140]

81 Semantic equivalence P1 and P2 are semantically equivalent if for all m the following holds: ?

82 Are these programs semantically equivalent?
Semantic equivalence P1 and P2 are semantically equivalent if for all m the following holds: Attempt 1: P1 m = P2 m Are these programs semantically equivalent? 1: t1 := 137 2: y := t1 + 3 1: y := 140

83 Semantic equivalence P1 and P2 are semantically equivalent if for all m the following holds: (P1 m)|Var = (P2 m)|Var

84 Exercise Why do we need this?

85 Adding procedures

86 While+procedures syntax
n  Num Numerals x  Var Program variables f Function identifiers, including main A  n | x | A ArithOp A | (A) | f(A*) ArithOp  - | + | * | / B  true | false | A = A | A  A | B | B  B | (B) S  x := A | skip | S; S | { S } | if B then S else S | while B S | f(A*) F  f(x*) { S } P ::= F+ Can be used to call side-effecting functions and to return values from functions

87 IL+procedures syntax n  Num Numerals
l  Num Labels and function identifiers x  Var  Temp  Params Variables, temporaries, and parameters f Function identifiers V  n | x R  V Op V | V | Call f(V*) | pn Op  - | + | * | / | = |  | << | >> | … C  l: skip | l: x := R | l: Goto l’ | l: IfZ x Goto l’ | l: IfNZ x Goto l’ | l: Call f(x*) | l: Ret x IR  C+ Accesses value of function parameter n

88 Exercise

89 Formalizing the correctness of lowering

90 Exercise 1 Are the following equivalent in your opinion? While IL
2: x := t1

91 Exercise 2 Are the following equivalent in your opinion? While IL
2: x := y

92 Exercise 3 Are the following equivalent in your opinion? While IL
2: t2 := 138 3: t1 := 137 4: x := t1

93 Exercise 4 Are the following equivalent in your opinion? While IL
2: t2 := 138 3: t1 := 137 4: x := t1 5: t1 := 138

94 Equivalence of arithmetic expressions
While state:   Var  Z IL state: m  (Var  Temp  {pc})  Z Define label(P) to be first label of IL program P cgen(a) = (P,res) An arithmetic While expression a is equivalent to an IL program P and resVar  Temp iff for every input state m such that m(pc)=label(P): a m|Var = (P m) res

95 Expression equivalence example
137+x m|Var = (P m) t3 for every state m such that m(pc)=1 Example: m=[pc1, t10 , t20 , t30 , x3] then m|Var is m|{x} = [x3] (P [pc1, t10 , t20 , t30 , x3])= [pc4, t13 , t2137 , t3140 , x3] t3 = 140 While IL 137+x 1: t1 := x 2: t2 := 137 3: t3 := t1 + t2

96 Statement equivalence
We define state equivalence by  m iff =m|Var That is, for each xVar (x)=m(x) A While statement S is equivalent to an IL program P iff for every input states  m such that m(pc)=label(P): S   P m

97 Example While IL S m|Var  P m
Example: if m=[pc1, t10 , t20 , t30 , x3] then m|Var is m|{x} = [x3] S m|Var = [x137] P m = [pc6, t1137 , t2138 , t30 , x137] While IL x := 137 2: t2 := 138 3: t1 := 137 4: x := t1 5: t1 := 138

98 Next lecture: Dataflow-based Optimizations


Download ppt "Fall Compiler Principles Lecture 5: Intermediate Representation"

Similar presentations


Ads by Google