Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:

Similar presentations


Presentation on theme: "Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:"— Presentation transcript:

1 Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Jeff Ullman, Eran Yahav

2 Admin Next week: Trubowicz 101 (Law school) Mobiles... 2

3 What is a Compiler? “A compiler is a computer program that transforms source code written in a programming language (source language) into another language (target language). The most common reason for wanting to transform source code is to create an executable program.” --Wikipedia 3

4 Conceptual Structure of a Compiler Executable code exe Source text txt Semantic Representation Backend Compiler Frontend Lexical Analysis Syntax Analysis Parsing Semantic Analysis Intermediate Representation (IR) Code Generation wordssentences 4

5 From scanning to parsing ((23 + 7) * x) )x*)7+23(( RPIdOPRPNumOPNumLP Lexical Analyzer program text token stream Parser Grammar: E ... | Id Id  ‘a’ |... | ‘z’ Op(*) Id(b) Num(23)Num(7) Op(+) Abstract Syntax Tree valid syntax error 5

6 Context Free Grammars V – non terminals (syntactic variables) T – terminals (tokens) P – derivation rules –Each rule of the form V  (T  V)* S – start symbol G = (V,T,P,S) 6

7 CFG terminology Symbols: Terminals (tokens): ; := ( ) id num print Non-terminals: S E L Start non-terminal: S Convention: the non-terminal appearing in the first derivation rule Grammar productions (rules) N  μ S  S ; S S  id := E E  id E  num E  E + E 7

8 CFG terminology Derivation - a sequence of replacements of non-terminals using the derivation rules Language - the set of strings of terminals derivable from the start symbol Sentential form - the result of a partial derivation in which there may be non- terminals 8

9 Parse Tree S SS ; id := ES ; id := idS ; id := E ; id := idid := E + E ; id := idid := E + id ; id := idid := id + id ; x:= z;y := x + z S S S S ; ; S S id := E E id := E E E E + + E E id 9

10 Leftmost/rightmost Derivation Leftmost derivation –always expand leftmost non-terminal Rightmost derivation –Always expand rightmost non-terminal Allows us to describe derivation by listing the sequence of rules –always know what a rule is applied to 10

11 Leftmost Derivation x := z; y := x + z S  S;S S  id := E E  id | E + E | E * E | ( E ) S SS ; id := ES ; id := idS ; id := E ; id := id id := E + E ; id := id id := id + E ; id := id id := id + id ; S  S;S S  id := E E  id S  id := E E  E + E E  id x := z ; y := x + z 11

12 Broad kinds of parsers Parsers for arbitrary grammars –Earley’s method, CYK method –Usually, not used in practice (though might change) Top-Down parsers –Construct parse tree in a top-down matter –Find the leftmost derivation Linear Bottom-Up parsers –Construct parse tree in a bottom-up manner –Find the rightmost derivation in a reverse order 12

13 Intuition: Top-Down Parsing Begin with start symbol “Guess” the productions Check if parse tree yields user's program 13

14 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) +*321 FFF T T E T E 14 Recall: Non standard precedence …

15 We need this rule to get the * +*321 E 15 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

16 +*321 E T E 16 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

17 +*321 T E T E 17 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

18 +*321 F T T E T E 18 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

19 +*321 FF T T E T E 19 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

20 +*321 FF T T E E 20 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E ) T

21 +*321 FF T T E T E 21 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

22 +*321 FFF T T E T E 22 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

23 +*321 FFF T T E T E 23 Intuition: Top-Down parsing Unambiguous grammar E  E * T E  T T  T + F T  F F  id F  num F  ( E )

24 Challenges in top-down parsing Top-down parsing begins with virtually no information –Begins with just the start symbol, which matches every program How can we know which productions to apply? 24

25 Blind exhaustive search –Goes over all possible production rules –Read & parse prefixes of input –Backtracks if guesses wrong Implementation –Uses (possibly recursive) functions for every production rule –Backtracks  “rewind” input 25 Recursive Descent

26 Recursive descent bool A(){ // A  A 1 | … | A n pos = recordCurrentPosition(); for (i = 1; i ≤ n; i++){ if (A i ()) return true; rewindCurrent(pos); } return false; } bool A i (){ // A i = X 1 X 2 …X k for (j=1; j ≤ k; j++) if (X j is a terminal) if (X j == current) match(current); else return false; else if (! X j ()) return false; return true; } 26 )x*)7+23(( RPIdOPRPNumOPNumLP token stream current

27 Example Grammar –E  T | T + E –T  int | int * T | ( E ) Input: ( 5 ) Token stream: LP int RP 27

28 int E() { return E() && match(token(‘-’)) && term(); } E  E - term | term  What happens with this procedure?  Recursive descent parsers cannot handle left-recursive grammars p. 127 28 Problem: Left Recursion

29 Left recursion removal L(G 1 ) = β, βα, βαα, βααα, … L(G 2 ) = same N  Nα | β N  βN’ N’  αN’ |  G1G1 G2G2 E  E - term | term E  term TE | term TE  - term TE |   For our 3 rd example: p. 130 Can be done algorithmically. Problem: grammar becomes mangled beyond recognition 29

30 Challenges in top-down parsing Top-down parsing begins with virtually no information –Begins with just the start symbol, which matches every program How can we know which productions to apply? Wanted: Top-Down parsing without backtracking 30

31 Predictive parsing Given a grammar G and a word w derive w using G –Apply production to leftmost nonterminal –Pick production rule based on next input token General grammar –More than one option for choosing the next production based on a token Restricted grammars (LL) –Know exactly which single rule to apply based on Non terminal Next (k) tokens (lookahead) 31

32 Boolean expressions example E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor 32 not ( not true or false )

33 Boolean expressions example not ( not true or false ) E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false ) not E E (EOPE) notLITorLIT truefalse production to apply known from next token E  LIT | (E OP E) | not E LIT  true | false OP  and | or | xor 33

34 Cannot tell which rule to use based on lookahead (ID) term  ID | ID [ expr ] Problem: productions with common prefix 34

35 Solution: left factoring Rewrite the grammar to be in LL(1) Intuition: just like factoring x*y + x*z into x*(y+z) term  ID | ID [ expr ] term  ID after_ID After_ID  [ expr ] |  35

36 S  if E then S else S | if E then S | T S  if E then S S’ | T S’  else S |  Left factoring – another example 36

37 LL(k) Parsers Predictive parser –Can be generated automatically –Does not use recursion –Efficient In contrast, recursive descent –Manual construction –Recursive –Expensive 37

38 Pushdown automaton uses –Prediction stack –Input token stream –Transition table nonterminals x tokens -> production alternative Entry indexed by nonterminal N and token t contains the alternative of N that must be predicated when current input starts with t LL(k) parsing via pushdown automata and prediction table 38

39 ()nottruefalseandorxor$ E2311 LIT45 OP678 (1) E → LIT (2) E → ( E OP E ) (3) E → not E (4) LIT → true (5) LIT → false (6) OP → and (7) OP → or (8) OP → xor Nonterminals Input tokens Which rule should be used Example transition table 39 Table

40 Non-Recursive Predictive Parser Predictive Parsing program Parsing Table X Y Z $ Stack $b+a Output 40

41 LL(k) parsing via pushdown automata and prediction table Two possible moves –Prediction When top of stack is nonterminal N, pop N, lookup table[N,t]. If table[N,t] is not empty, push table[N,t] on prediction stack, otherwise – syntax error –Match When top of prediction stack is a terminal T, must be equal to next input token t. If (t == T), pop T and consume t. If (t ≠ T) syntax error Parsing terminates when prediction stack is empty –If input is empty at that point, success. Otherwise, syntax error 41

42 abc AA  aAbA  c A  aAb | c aacbb$ Input suffixStack contentMove aacbb$A$predict(A,a) = A  aAb aacbb$aAb$match(a,a) acbb$Ab$predict(A,a) = A  aAb acbb$aAbb$match(a,a) cbb$Abb$predict(A,c) = A  c cbb$ match(c,c) bb$ match(b,b) b$ match(b,b) $$match($,$) – success Running parser example 42

43 FIRST sets FIRST(α) = { t | α  * t β} ∪ {X  * ℇ } –FIRST(α) = all terminals that α can appear as first in some derivation for α + ℇ if can be derived from X Example: –FIRST( LIT ) = { true, false } –FIRST( ( E OP E ) ) = { ‘(‘ } –FIRST( not E ) = { not } 43

44 Computing FIRST sets FIRST (t) = { t } // “t” non terminal ℇ∈ FIRST(X) if –X  ℇ or –X  A 1.. A k and ℇ∈ FIRST(A i ) i=1…k FIRST(α) ⊆ FIRST(X) if –X  A 1.. A k α and ℇ∈ FIRST(A i ) i=1…k 44

45 FIRST sets computation example STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant TERMEXPRSTMT 45

46 1. Initialization TERMEXPRSTMT id constant zero? Not ++ -- if while 46 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant

47 TERMEXPRSTMT id constant zero? Not ++ -- if while zero? Not ++ -- 47 2. Iterate 1 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant

48 2. Iterate 2 TERMEXPRSTMT id constant zero? Not ++ -- if while id constant zero? Not ++ -- 48 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM -> id | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant

49 2. Iterate 3 – fixed-point TERMEXPRSTMT id constant zero? Not ++ -- if while id constant zero? Not ++ -- id constant 49 STMT  if EXPR then STMT | while EXPR do STMT | EXPR ; EXPR  TERM | zero? TERM | not EXPR | ++ id | -- id TERM  id | constant

50 FOLLOW sets FOLLOW(X) = { t | S  * α X t β} FOLLOW(X) = set of tokens that can immediately follow X in some sentential form NOT related to what can be derived from X Intuition: X  A B –FIRST(B) ⊆ FOLLOW(A) –FOLLOW(B) ⊆ FOLLOW(X) –If B  *  then FOLLOW(A) ⊆ FOLLOW(X) p. 189 50

51 FOLLOW sets: Constraints $ ∈ FOLLOW(S) FIRST(β) – { ℇ } ⊆ FOLLOW(X) –For each A  α X β FOLLOW(A) ⊆ FOLLOW(X) –For each A  α X β and ℇ ∈ FIRST(β) 51

52 Example: FOLLOW sets E  TX X  + E | ℇ T  (E) | int Y Y  * T | ℇ 52 Terminal+(*)int FOLLOWint, ( _, ), $*, ), +, $ Non. Term. ETXY FOLLOW), $+, ), $$, )_, ), $

53 Prediction Table A  α T[A,t] = α if t ∈ FIRST(α) T[A,t] = α if ℇ ∈ FIRST(α) and t ∈ FOLLOW(A) –t can also be $ T is not well defined  the grammar is not LL(1) 53

54 Problem: Non LL Grammars bool S() { return A() && match(token(‘a’)) && match(token(‘b’)); } bool A() { return match(token(‘a’)) || true; } S  A a b A  a |   What happens for input “ab”?  What happens if you flip order of alternatives and try “aab”? 54

55 FIRST(S) = { a }FOLLOW(S) = { $ } FIRST(A) = { a  }FOLLOW(A) = { a } FIRST/FOLLOW conflict S  A a b A  a |  55 Problem: Non LL Grammars

56 Solution: substitution S  A a b A  a |  S  a a b | a b Substitute A in S S  a after_A after_A  a b | b Left factoring 56

57 LL(k) grammars A grammar is LL(k) if it can be derived via: –Top-down derivation –Scanning the input from left to right (L) –Producing the leftmost derivation (L) –With lookahead of k tokens (k) –T is well defined A language is said to be LL(k) when it has an LL(k) grammar 57

58 LL(k) grammars A grammar is not LL(k) if it is –Ambiguous –Left recursive –Not left factored –… 58

59 Earley Parsing Invented by Earley [PhD. 1968] Handles arbitrary CFG Can handle ambiguous grammars Complexity O(N 3 ) when N = |input| Uses dynamic programming –Compactly encodes ambiguity 59

60 Dynamic programming Break a problem P into subproblems P 1 …P k –Solve P by combining solutions for P 1 …P k –Memo-ize (store) solutions to subproblems instead of re-computation Bellman-Ford shortest path algorithm –Sol(x,y,i) = minimum of Sol(x,y,i-1) Sol(t,y,i-1) + weight(x,t) for edges (x,t) 60

61 Earley Parsing Dynamic programming implementation of a recursive descent parser –S[N+1] Sequence of sets of “Earley states” N = |INPUT| Earley state (item) s is a sentential form + aux info –S[i] All parse tree that can be produced (by a RDP) after reading the first i tokens S[i+1] built using S[0] … S[i] 61

62 Earley States s = –constituent (dotted rule) for A  αβ A  αβ predicated constituents A  αβ in-progress constituents A  αβ completed constituents –back previous Early state in derivation 62

63 Earley Parser Input = x[1…N] S[0] = ; S[1] = … S[N] = {} for i = 0... N do until S[i] does not change do foreach s ∈ S[i] if s = and a=x[i+1] then S[i+1] = S[i+1] ∪ { } // scan if s = and X  α then S[i] = S[i] ∪ { } // predict if s = and ∈ S[b] then S[i] = S[i] ∪ { } // complete 63

64 Example 64

65 65


Download ppt "Compilation 0368-3133 (Semester A, 2013/14) Lecture 4: Syntax Analysis (Top-Down Parsing) Modern Compiler Design: Chapter 2.2 Noam Rinetzky 1 Slides credit:"

Similar presentations


Ads by Google