
1 Compilation 0368-3133, Lecture 4: Syntax Analysis. Noam Rinetzky. (Image credit: Zijian Xu, Hong Chen, Song-Chun Zhu and Jiebo Luo, "A hierarchical compositional model for face representation and sketching," IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 2008.)

2 Where are we? Source text (txt) → [Lexical Analysis: process text input] → tokens → [Syntax Analysis] → AST → [Semantic Analysis] → Annotated AST → [Intermediate code generation] → IR → [Intermediate code optimization] → IR → [Code generation] → Symbolic Instructions (SI) → [Target code optimization] → SI → [Machine code generation] → MI → [Write executable output] → Executable code (exe). This lecture: Lexical Analysis → Syntax Analysis.

3 From scanning to parsing. The program text characters "((23 + 7) * x)" enter the Lexical Analyzer, which emits the token stream LP LP Num(23) OP(+) Num(7) RP OP(*) Id(x) RP. The Parser consumes this token stream using a context-free grammar (Exp → ... | Exp + Exp | Id) and produces either an Abstract Syntax Tree — here Op(*) with children Op(+)(Num(23), Num(7)) and Id(x) — for valid input, or a syntax error. The scanner, by contrast, works with a regular language: Id → 'a' | ... | 'z'.

4 Broad kinds of parsers Top-Down parsers –Construct the parse tree in a top-down manner –Find the leftmost derivation Bottom-Up parsers –Construct the parse tree in a bottom-up manner –Find the rightmost derivation in reverse order Parsers for arbitrary grammars –Earley’s method, CYK method –Usually not used in practice (though this might change)


6 Context free grammars (CFGs) G = (V, T, P, S): V – nonterminals (syntactic variables); T – terminals (tokens); P – derivation rules, each of the form V → (T ∪ V)*; S – start symbol.

7 Derivations. Show that a sentence ω is in a grammar G by repeatedly applying production rules: given a sentential form αNβ and a rule N → µ, the derived sentential form is αNβ ⇒ αµβ. We write µ1 ⇒* µ2 if µ1 ⇒ … ⇒ µ2.

8 Leftmost Derivation. Grammar: S → S;S | id := E; E → id | E + E | E * E | ( E ). Input: x := z ; y := x + z. At each step, rewrite the leftmost nonterminal:
S
⇒ S ; S (S → S;S)
⇒ id := E ; S (S → id := E)
⇒ id := id ; S (E → id)
⇒ id := id ; id := E (S → id := E)
⇒ id := id ; id := E + E (E → E + E)
⇒ id := id ; id := id + E (E → id)
⇒ id := id ; id := id + id (E → id)
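The stepping above can be replayed mechanically. A minimal Python sketch (the rule encoding and function names are mine, not from the slides) that always rewrites the leftmost nonterminal:

```python
# Replaying the slide's leftmost derivation of x := z ; y := x + z by always
# rewriting the leftmost nonterminal (rule encoding and names are mine).
RULES = {
    "S": [["S", ";", "S"], ["id", ":=", "E"]],
    "E": [["id"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
}

def apply_leftmost(form, rule_index):
    """Rewrite the leftmost nonterminal using its rule_index-th alternative."""
    for i, sym in enumerate(form):
        if sym in RULES:
            return form[:i] + RULES[sym][rule_index] + form[i + 1:]
    raise ValueError("no nonterminal left")

form = ["S"]
for choice in (0, 1, 0, 1, 1, 0, 0):   # the rule choices used on the slide
    form = apply_leftmost(form, choice)
print(" ".join(form))   # id := id ; id := id + id
```

A rightmost derivation would differ only in picking the rightmost nonterminal at each step.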

9 Rightmost Derivation. Grammar: S → S;S | id := E | …; E → id | E + E | E * E | …. At each step, rewrite the rightmost nonterminal:
S
⇒ S ; S (S → S;S)
⇒ S ; id := E (S → id := E)
⇒ S ; id := E + E (E → E + E)
⇒ S ; id := E + id (E → id)
⇒ S ; id := id + id (E → id)
⇒ id := E ; id := id + id (S → id := E)
⇒ id := id ; id := id + id (E → id)

10 Parse tree for x := z ; y := x + z. The root S has children S, ';', S. The left S derives id := E with E → id; the right S derives id := E with E → E + E, where both inner Es derive id. The derivation steps are the same as in the leftmost derivation: S ⇒ S;S ⇒ id := E ; S ⇒ id := id ; S ⇒ … ⇒ id := id ; id := id + id.

11 Ambiguity. For x := y+z*w with the grammar S → S ; S, S → id := E | …, E → id | E + E | E * E | …, there are two parse trees: one parses the right-hand side as (y + z) * w (the + node below the * node), the other as y + (z * w) (the * node below the + node). The grammar is ambiguous.

12 Top-down parsing 12 Begin with Start symbol Apply production rules Until desired word is derived

13 Top-down parsing 13 Begin with Start symbol Apply production rules Until desired word is derived Can be implemented using recursion

14 Recursive descent parsing Define a function for every nonterminal Every function works as follows –Find an applicable production rule –For a terminal, check a match with the next input token –For a nonterminal, call (recursively) the other nonterminal functions

15 Recursive descent parsing Define a function for every nonterminal Every function works as follows –Find an applicable production rule –For a terminal, check a match with the next input token –For a nonterminal, call (recursively) the other nonterminal functions If there are several applicable productions …

16 Top-down parsing 16

17 Recursive descent parsing with lookahead Define a function for every nonterminal Every function works as follows –Find an applicable production rule –For a terminal, check a match with the next input token –For a nonterminal, call (recursively) the other nonterminal functions If there are several applicable productions, decide based on the next unmatched token

18 Predictive parsing Recursive descent LL(k) grammars 18

19 A predictive (recursive descent) parser for E → LIT | (E OP E) | not E; LIT → true | false; OP → and | or | xor. (Reminder: the variable current holds the current input token; TRUE is the token for “true”, etc.)

E() {
  if (current ∈ {TRUE, FALSE}) LIT();
  else if (current == LPAREN) { match(LPAREN); E(); OP(); E(); match(RPAREN); }
  else if (current == NOT) { match(NOT); E(); }
  else error();
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error();
}

OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error();
}

match(token t) {
  if (current == t) current = next_token();
  else error();
}
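The slide's parser ports directly to a runnable sketch. Below is a hedged Python version (the Parser class, tokenization by whitespace, and the parses helper are my additions; the slide's code is C-like pseudocode):

```python
# Runnable version of the slide's recursive-descent parser for
#   E -> LIT | ( E OP E ) | not E ; LIT -> true | false ; OP -> and | or | xor
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    @property
    def current(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, t):                     # consume the expected token
        if self.current != t:
            raise ParseError(f"expected {t}, got {self.current}")
        self.pos += 1

    def E(self):
        if self.current in ("true", "false"):
            self.LIT()
        elif self.current == "(":
            self.match("("); self.E(); self.OP(); self.E(); self.match(")")
        elif self.current == "not":
            self.match("not"); self.E()
        else:
            raise ParseError(f"unexpected {self.current}")

    def LIT(self):
        if self.current not in ("true", "false"):
            raise ParseError(f"unexpected {self.current}")
        self.match(self.current)

    def OP(self):
        if self.current not in ("and", "or", "xor"):
            raise ParseError(f"unexpected {self.current}")
        self.match(self.current)

def parses(text):
    """True iff text (whitespace-separated tokens) is a valid E."""
    p = Parser(text.split())
    try:
        p.E()
        return p.current is None            # require all input consumed
    except ParseError:
        return False

print(parses("not ( true and false )"))    # True
print(parses("true and false"))            # False: missing parentheses
```

Note how each branch of E() is chosen by inspecting only the current token — exactly the one-token lookahead the next slides formalize.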

20 What we want: Lookaheads! In the E() function above, each decision is made by looking ahead one token: E → LIT when current ∈ {TRUE, FALSE}, E → (E OP E) when current == LPAREN, and E → not E when current == NOT (grammar: E → LIT | (E OP E) | not E; LIT → true | false; OP → and | or | xor).

21 Why we want it: Prediction Table! Given a nonterminal X with derivation rules X → α1 | … | αk and a terminal t, the prediction table entry T[X, t] = αi says that we should apply rule X → αi when the next token is t.

22 How we get it? First FIRST Then FOLLOW –(then FIRST again …) 22

23 FIRST Sets FIRST(µ): the set of first tokens of words in L(µ). For E → LIT | (E OP E) | not E: FIRST(LIT) = { TRUE, FALSE }; FIRST(( E OP E )) = { LPAREN }; FIRST(not E) = { NOT }. For X → α1 | … | αk: FIRST(X) = FIRST(α1) ∪ … ∪ FIRST(αk), plus { ℇ } if αi ⇒* ℇ for some αi.

24 FIRST Sets FIRST(X) = { t | X ⇒* tβ } ∪ { ℇ | X ⇒* ℇ }: all terminals t that can appear first in some derivation from X, plus ℇ if it can be derived from X. If for every α and β such that X → … | α | … | β | … we have FIRST(α) ∩ FIRST(β) = {}, then we can always predict (choose) which rule to apply based on the next token.

25 FIRST Sets FIRST(X) = { t | X ⇒* tβ } ∪ { ℇ | X ⇒* ℇ }. If FIRST sets of alternatives are pairwise disjoint, we can always predict which rule to apply based on the next token: for X → … | α | … | β | … and input a…, if a ∈ FIRST(α) then use X → α.

26 Computing FIRST sets. For a terminal t (viewed as a sentential form), FIRST(t) = { t }. ℇ ∈ FIRST(X) if X → ℇ, or X → A1 … Ak and ℇ ∈ FIRST(Ai) for i = 1..k.

27 Computing FIRST sets (take I): constraints. Assume no null productions X → … | ℇ | …. Observation: if X → … | tα | … | Nβ | … then { t } ⊆ FIRST(X) and FIRST(N) ⊆ FIRST(X). Compute FIRST by solving the constraint system: if we know the (minimal) solution to this constraint system then we have FIRST, and if we know FIRST then we have the solution to this constraint system.

28 Fixed-point algorithm for computing FIRST sets (say we have m nonterminals; assume no null productions Xi → … | ℇ | …). Initialization: FIRST(Xi) = { t | Xi → tβ for some β }, i = 1..m. Body: do { for every Xi → Xk β: FIRST(Xi) = FIRST(Xi) ∪ FIRST(Xk) } until FIRST(Xi) does not change for any Xi.
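The fixed-point algorithm above can be sketched as runnable Python (the grammar encoding and function name are mine), using the ℇ-free STMT/EXPR/TERM grammar of the following slides:

```python
# Fixed-point computation of FIRST for the (epsilon-free) grammar:
#   STMT -> if EXPR then STMT | while EXPR do STMT | EXPR ;
#   EXPR -> TERM -> id | zero? TERM | not EXPR | ++ id | -- id
#   TERM -> id | constant
GRAMMAR = {
    "STMT": [["if", "EXPR", "then", "STMT"],
             ["while", "EXPR", "do", "STMT"],
             ["EXPR", ";"]],
    "EXPR": [["TERM", "->", "id"], ["zero?", "TERM"],
             ["not", "EXPR"], ["++", "id"], ["--", "id"]],
    "TERM": [["id"], ["constant"]],
}

def first_sets(grammar):
    first = {n: set() for n in grammar}
    # Initialization: alternatives that start with a terminal
    for n, alts in grammar.items():
        for alt in alts:
            if alt[0] not in grammar:
                first[n].add(alt[0])
    changed = True
    while changed:                      # iterate until a fixed point
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                if alt[0] in grammar and not first[alt[0]] <= first[n]:
                    first[n] |= first[alt[0]]
                    changed = True
    return first

F = first_sets(GRAMMAR)
print(sorted(F["STMT"]))
# ['++', '--', 'constant', 'id', 'if', 'not', 'while', 'zero?']
```

The iterations mirror the table filled in on slides 31–35: TERM stabilizes immediately, EXPR absorbs FIRST(TERM), and STMT absorbs FIRST(EXPR).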

29 FIRST sets constraints example. Grammar: STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; — EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id — TERM → id | constant. (F = FIRST.) Initialization: F(STMT) = {if, while}; F(EXPR) = {zero?, not, ++, --}; F(TERM) = {id, constant}. do { F’(STMT) = F(STMT); F’(EXPR) = F(EXPR); F’(TERM) = F(TERM); F(STMT) = F(STMT) ∪ F(EXPR); F(EXPR) = F(EXPR) ∪ F(TERM); } until (F’(STMT) == F(STMT) && F’(EXPR) == F(EXPR) && F’(TERM) == F(TERM));

30 FIRST sets computation example. Grammar: STMT → if EXPR then STMT | while EXPR do STMT | EXPR ; — EXPR → TERM -> id | zero? TERM | not EXPR | ++ id | -- id — TERM → id | constant. The next slides fill in a table with columns TERM, EXPR, STMT.

31 1. Initialization: TERM = {id, constant}; EXPR = {zero?, not, ++, --}; STMT = {if, while}.

32 2. F(STMT) = F(STMT) ∪ F(EXPR): STMT gains zero?, not, ++, -- → STMT = {if, while, zero?, not, ++, --}.

33 3. F(EXPR) = F(EXPR) ∪ F(TERM): EXPR gains id, constant → EXPR = {zero?, not, ++, --, id, constant}.

34 4. F(STMT) = F(STMT) ∪ F(EXPR): STMT gains id, constant → STMT = {if, while, zero?, not, ++, --, id, constant}.

35 We reached a fixed point: TERM = {id, constant}; EXPR = {zero?, not, ++, --, id, constant}; STMT = {if, while, zero?, not, ++, --, id, constant}.

36 Fixed-point algorithm for computing FIRST sets: what to do with null productions? Consider X → Y a | Z b; Y → ℇ; Z → ℇ. Say the input is “a”: which rule do we use? a ∉ FIRST(Y) and a ∉ FIRST(Z), so we must use what comes after Y/Z.

37 Computing FIRST sets (take II): constraint. Observation: if X → A1 … Ak N α | … and ℇ ∈ FIRST(A1), …, ℇ ∈ FIRST(Ak), then FIRST(N) \ { ℇ } ⊆ FIRST(X). That is, when A1..Ak can all derive ℇ, use what comes after A1..Ak to predict which production rule of X to use.

38 FOLLOW sets FOLLOW(N) = the set of tokens that can immediately follow the nonterminal N in some sentential form: if S ⇒* αNtβ then t ∈ FOLLOW(N). (p. 189)

39 FOLLOW sets FOLLOW(N) = the set of tokens that can immediately follow the nonterminal N in some sentential form: if S ⇒* αNtβ then t ∈ FOLLOW(N). Similarly, FOLLOW(t) = the set of tokens that can immediately follow the terminal t in some sentential form: if αNtβ ⇒* α’tqβ’ then q ∈ FOLLOW(t). (p. 189)

40 FOLLOW sets: Constraints. $ ∈ FOLLOW(S), where S is the start symbol and $ marks the end of input. If X → α N β then FIRST(β) – { ℇ } ⊆ FOLLOW(N). If X → α N β and ℇ ∈ FIRST(β) then FOLLOW(X) ⊆ FOLLOW(N). Compute FIRST and FOLLOW by solving the extended constraint system.

41 Example: FOLLOW sets. Grammar: E → T X; X → + E | ℇ; T → ( E ) | int Y; Y → * T | ℇ.
Terminals: FOLLOW(+) = {int, (}; FOLLOW(() = {int, (}; FOLLOW(*) = {int, (}; FOLLOW()) = {+, ), $}; FOLLOW(int) = {*, +, ), $}.
Nonterminals: FOLLOW(E) = {), $}; FOLLOW(T) = {+, ), $}; FOLLOW(X) = {), $}; FOLLOW(Y) = {+, ), $}.
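The FIRST/FOLLOW constraint solving for this grammar can be checked with a small Python sketch (the encoding, helper names, and the assumption that E is the start symbol are mine):

```python
# FIRST/FOLLOW (with epsilon) for E -> T X ; X -> + E | eps ;
# T -> ( E ) | int Y ; Y -> * T | eps, assuming E is the start symbol.
EPS = "ℇ"
G = {
    "E": [["T", "X"]],
    "X": [["+", "E"], [EPS]],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], [EPS]],
}

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols; contains EPS iff all can vanish."""
    out = set()
    for sym in seq:
        s = {EPS} if sym == EPS else first.get(sym, {sym})
        out |= s - {EPS}
        if EPS not in s:
            return out
    out.add(EPS)
    return out

def compute(grammar, start):
    first = {n: set() for n in grammar}
    follow = {n: set() for n in grammar}
    follow[start].add("$")
    changed = True
    while changed:                      # fixed-point over both constraint sets
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                f = first_of_seq(alt, first)
                if not f <= first[n]:
                    first[n] |= f; changed = True
                for i, sym in enumerate(alt):
                    if sym in grammar:  # FOLLOW constraints for nonterminals
                        rest = first_of_seq(alt[i + 1:], first)
                        new = (rest - {EPS}) | (follow[n] if EPS in rest else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new; changed = True
    return first, follow

first, follow = compute(G, "E")
print(sorted(follow["T"]))   # ['$', ')', '+']
```

The computed sets match the table on this slide: FOLLOW(E) = {), $}, FOLLOW(T) = FOLLOW(Y) = {+, ), $}, FOLLOW(X) = {), $}.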

42 Prediction Table. For A → α: T[A, t] = α if t ∈ FIRST(α), and T[A, t] = α if ℇ ∈ FIRST(α) and t ∈ FOLLOW(A) (here t can also be $). If T is not well defined this way, the grammar is not LL(1).

43 LL(k) grammars A grammar is in class LL(1) iff for every two productions A → α and A → β: FIRST(α) ∩ FIRST(β) = {} — in particular, α ⇒* ℇ and β ⇒* ℇ cannot both hold — and if β ⇒* ℇ then FIRST(α) ∩ FOLLOW(A) = {}.

44 Problem: Non LL(k) grammars 44

45 LL(k) grammars An LL(k) grammar G can be parsed via: –Top-down derivation –Scanning the input from left to right (L) –Producing the leftmost derivation (L) –With a lookahead of k tokens (k) –G is not ambiguous –G is not left-recursive A language is said to be LL(k) when it has an LL(k) grammar

46 Non LL grammar: Common prefix. term → ID | indexed_elem; indexed_elem → ID [ expr ]. FIRST(term) = { ID } and FIRST(indexed_elem) = { ID }: a FIRST/FIRST conflict.

47 Solution: left factoring. Rewrite the grammar to be LL(1). Intuition: just like factoring x*y + x*z into x*(y+z). term → ID | indexed_elem with indexed_elem → ID [ expr ] becomes term → ID after_ID; after_ID → [ expr ] | ℇ.

48 Left factoring – another example. S → if E then S else S | if E then S | T becomes S → if E then S S’ | T; S’ → else S | ℇ.

49 Non LL grammar: Problematic null productions. S → X a b; X → a | ℇ. FIRST(S) = { a }, FOLLOW(S) = { $ }; FIRST(X) = { a, ℇ }, FOLLOW(X) = { a }. This is a FIRST/FOLLOW conflict: T[X, a] = a because a ∈ FIRST(a), and also T[X, a] = ℇ because ℇ ∈ FIRST(ℇ) and a ∈ FOLLOW(X). T is not well defined, so the grammar is not LL(1).

50 Solution: substitution. S → A a b; A → a | ℇ. Substituting A in S gives S → a a b | a b, and left factoring then gives S → a after_A; after_A → a b | b.

51 Non LL grammar: Left-recursion. E → E - term | term. Left recursion cannot be handled with a bounded lookahead. What can we do?

52 Solution: Left recursion removal. G1: N → Nα | β. G2: N → βN’; N’ → αN’ | ℇ. L(G1) = L(G2) = { β, βα, βαα, βααα, … }. This can be done algorithmically (p. 130). Problem: the grammar becomes mangled beyond recognition.

53 Solution: Left recursion removal (continued). Applying the G1 → G2 schema to E → E - term | term gives E → term TE; TE → - term TE | ℇ.
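The G1 → G2 schema is mechanical enough to write as a small transformation. A Python sketch (the function name and the ℇ spelling are mine):

```python
# Removal of immediate left recursion, following the G1 -> G2 schema:
#   N -> N a | b   becomes   N -> b N' ;  N' -> a N' | eps
EPS = "ℇ"

def remove_left_recursion(name, alternatives):
    """alternatives: list of symbol lists for N. Returns the new rules."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    others = [alt for alt in alternatives if not alt or alt[0] != name]
    if not recursive:
        return {name: alternatives}         # nothing to do
    fresh = name + "'"                      # the new nonterminal N'
    return {
        name: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[EPS]],
    }

# E -> E - term | term   becomes   E -> term E' ; E' -> - term E' | eps
new_rules = remove_left_recursion("E", [["E", "-", "term"], ["term"]])
print(new_rules)
```

This handles only immediate left recursion; the general algorithm (p. 130) also eliminates indirect left recursion by substitution.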

54 LL(k) Parsers Recursive Descent –Manual construction –Uses recursion Wanted –A parser that can be generated automatically –Does not use recursion 54

55 LL(k) parsing via PDA. The pushdown automaton uses a prediction stack, an input stream, and a transition table: nonterminals × tokens → production alternative. Entries are indexed by nonterminal N and token t; an entry contains the alternative of N that must be predicted when the current input starts with t.

56 LL(k) parsing via PDA: Moves. Prediction: when top(prediction stack) = N, pop N; if table[N, current] = α, push α onto the prediction stack, otherwise – syntax error. Match: when top(prediction stack) = t, if t == current then pop the prediction stack, otherwise – syntax error.

57 LL(k) parsing via PDA: Termination Parsing terminates when prediction stack is empty –If input is empty at that point, success, otherwise, syntax error 57

58 Example transition table for (1) E → LIT, (2) E → ( E OP E ), (3) E → not E, (4) LIT → true, (5) LIT → false, (6) OP → and, (7) OP → or, (8) OP → xor. Rows are nonterminals, columns are input tokens, and each entry says which rule to use:
E: ( → 2, not → 3, true → 1, false → 1
LIT: true → 4, false → 5
OP: and → 6, or → 7, xor → 8

59 Model of a non-recursive predictive parser: a predictive parsing program driven by a parsing table, with a stack (X Y Z $), an input buffer (a + b $), and an output.

60 Running parser example for A → aAb | c on input aacbb$ (prediction table: predict(A,a) = A → aAb, predict(A,c) = A → c):
Input suffix | Stack content | Move
aacbb$ | A$ | predict(A,a) = A → aAb
aacbb$ | aAb$ | match(a,a)
acbb$ | Ab$ | predict(A,a) = A → aAb
acbb$ | aAbb$ | match(a,a)
cbb$ | Abb$ | predict(A,c) = A → c
cbb$ | cbb$ | match(c,c)
bb$ | bb$ | match(b,b)
b$ | b$ | match(b,b)
$ | $ | match($,$) – success
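The run above can be reproduced with a short non-recursive predictive parser in Python (a sketch; the table encoding and the ll_parse name are mine):

```python
# Non-recursive predictive parser for A -> aAb | c, driven by the
# prediction table on the slide.
TABLE = {("A", "a"): ["a", "A", "b"],   # predict(A,a) = A -> aAb
         ("A", "c"): ["c"]}             # predict(A,c) = A -> c

def ll_parse(tokens):
    stack = ["$", "A"]                  # top of stack = end of the list
    i = 0
    while stack:
        top = stack.pop()
        tok = tokens[i]
        if top in ("a", "b", "c", "$"):     # terminal: must match the input
            if top != tok:
                return False                # syntax error
            i += 1
        else:                               # nonterminal: consult the table
            alt = TABLE.get((top, tok))
            if alt is None:
                return False                # e.g. predict(A,b) = ERROR
            stack.extend(reversed(alt))     # leftmost symbol ends up on top
    return i == len(tokens)

print(ll_parse(list("aacbb$")))   # True
print(ll_parse(list("abcbb$")))   # False
```

The second call fails exactly as in the illegal-input example on slide 65: after matching the first a, predict(A,b) has no table entry.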

61 Errors

62 Handling Syntax Errors Report and locate the error Diagnose the error Correct the error Recover from the error in order to discover more errors –without reporting too many “strange” errors 62

63 Error Diagnosis Line number –may be far from the actual error The current token The expected tokens Parser configuration 63

64 Error Recovery Becomes less important in interactive environments Example heuristics: –Search for a semicolon and ignore the statement –Try to “replace” tokens for common errors –Refrain from reporting 3 subsequent errors Globally optimal solutions –For every input w, find a valid program w’ with a “minimal distance” from w

65 Illegal input example for A → aAb | c on input abcbb$:
Input suffix | Stack content | Move
abcbb$ | A$ | predict(A,a) = A → aAb
abcbb$ | aAb$ | match(a,a)
bcbb$ | Ab$ | predict(A,b) = ERROR

66 Error handling in LL parsers. S → a c | b S; input c$. Input suffix c$, stack S$, move predict(S,c) = ERROR. Now what? Predict S → b S anyway: “missing token b inserted in line XXX”.

67 Error handling in LL parsers. S → a c | b S; input bc$. Input suffix bc$, stack S$, move predict(S,b) = S → bS; then bc$ | bS$ | match(b,b); then c$ | S$ — looks familiar? Result: an infinite loop.

68 Error handling and recovery x = a * (p+q * ( -b * (r-s); Where should we report the error? The valid prefix property 68

69 The Valid Prefix Property For every prefix of tokens t1, t2, …, ti that the parser identifies as legal, there exist tokens ti+1, ti+2, …, tn such that t1, t2, …, tn is a syntactically valid program. If every token is considered a single character: for every prefix word u that the parser identifies as legal, there exists w such that u.w is a valid program.

70 Recovery is tricky Heuristics for dropping tokens, skipping to semicolon, etc. 70

71 Building the Parse Tree 71

72 Adding semantic actions Can add an action to perform on each production rule Can build the parse tree –Every function returns an object of type Node –Every Node maintains a list of children –Function calls can add new children 72

73 Building the parse tree:

Node E() {
  result = new Node();
  result.name = “E”;
  if (current ∈ {TRUE, FALSE}) // E → LIT
    result.addChild(LIT());
  else if (current == LPAREN) { // E → ( E OP E )
    result.addChild(match(LPAREN));
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  } else if (current == NOT) { // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  } else error();
  return result;
}

74 Parser for Fully Parenthesized Expressions (the transcript dropped the parse of the right operand; it is restored here):

static int Parse_Expression(Expression **expr_p) {
  Expression *expr = *expr_p = new_expression();
  /* try to parse a digit */
  if (Token.class == DIGIT) {
    expr->type = 'D';
    expr->value = Token.repr - '0';
    get_next_token();
    return 1;
  }
  /* try to parse a parenthesized expression */
  if (Token.class == '(') {
    expr->type = 'P';
    get_next_token();
    if (!Parse_Expression(&expr->left)) Error("missing expression");
    if (!Parse_Operator(&expr->oper)) Error("missing operator");
    if (!Parse_Expression(&expr->right)) Error("missing expression");
    if (Token.class != ')') Error("missing )");
    get_next_token();
    return 1;
  }
  return 0;
}

75 Bottom-up parsing 75

76 Intuition: Bottom-Up Parsing Begin with the user's program Guess parse (sub)trees Check if root is the start symbol 76

77 Bottom-up parsing. Unambiguous grammar: E → E * T | T; T → T + F | F; F → id | num | ( E ). Input: 1 + 2 * 3.

78 Bottom-up parsing (continued): the first reduction turns a num leaf into an F.

79 Bottom-up parsing (continued): each num is reduced to an F, and Fs are reduced to Ts (T → F), building partial parse trees bottom-up.

80 Top-Down vs Bottom-Up. Top-down (predict, match/scan-complete): for A → aAb | c on input aacbb$, the tree grows from the root A downward, with a frontier separating the already-read input from the input still to be read.

81 Top-Down vs Bottom-Up. Top-down (predict, match/scan-complete) grows the tree from the root; bottom-up (shift, reduce) grows partial trees from the leaves — c is reduced to A, then aAb to A — until the start symbol covers the input (A → aAb | c, input aacbb$).

82 Bottom-up parsing: LR(k) Grammars A grammar is in the class LR(k) when it can be parsed via: –Bottom-up derivation –Scanning the input from left to right (L) –Producing the rightmost derivation (R) –With a lookahead of k tokens (k)

83 Bottom-up parsing: LR(k) Grammars A language is said to be LR(k) if it has an LR(k) grammar The simplest case is LR(0), which we will discuss 83

84 Terminology: Reductions & Handles The opposite of derivation is called reduction. Let A → α be a production rule. Derivation: βAµ ⇒ βαµ. Reduction: βαµ ⇒ βAµ. A handle is the reduced substring: α is the handle in βαµ.

85 Goal: Reduce the Input to the Start Symbol. Grammar: E → E * B | E + B | B; B → 0 | 1. Example, for input 0 + 0 * 1:
0 + 0 * 1
B + 0 * 1
E + 0 * 1
E + B * 1
E * 1
E * B
E
Go over the input so far, and upon seeing a right-hand side of a rule, “invoke” the rule and replace the right-hand side with the left-hand side (reduce).

86 Use Shift & Reduce In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules. 86

87 Use Shift & Reduce. In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules. Example, “0+0*1” with E → E * B | E + B | B; B → 0 | 1:
Stack | Input | Action
 | 0+0*1$ | shift
0 | +0*1$ | reduce B → 0
B | +0*1$ | reduce E → B
E | +0*1$ | shift
E+0 | *1$ | reduce B → 0
E+B | *1$ | reduce E → E + B
E | *1$ | shift
E*1 | $ | reduce B → 1
E*B | $ | reduce E → E * B
E | $ | accept
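The trace above can be reproduced by a deliberately naive parser that reduces greedily whenever the top of the stack matches a right-hand side. This happens to work for this grammar and input, but it is only an illustration of the shift/reduce idea — a real LR parser consults a table instead (function and names are mine):

```python
# Greedy shift-reduce walk for E -> E*B | E+B | B ; B -> 0 | 1.
PRODS = [("E", ["E", "*", "B"]), ("E", ["E", "+", "B"]),
         ("E", ["B"]), ("B", ["0"]), ("B", ["1"])]

def shift_reduce(text):
    stack, moves, tokens = [], [], list(text)
    while True:
        for lhs, rhs in PRODS:                      # try to reduce first
            if len(stack) >= len(rhs) and stack[len(stack) - len(rhs):] == rhs:
                del stack[len(stack) - len(rhs):]   # pop the handle
                stack.append(lhs)                   # push the left-hand side
                moves.append(f"reduce {lhs} -> {''.join(rhs)}")
                break
        else:
            if not tokens:
                break                               # nothing left to shift
            stack.append(tokens.pop(0))             # shift one token
            moves.append("shift")
    return stack, moves

stack, moves = shift_reduce("0+0*1")
print(stack)    # ['E'] -- the whole input reduced to the start symbol
```

Greedy reduction fails on grammars where the right decision depends on lookahead; that is precisely what the LR states and tables of the following slides provide.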

88 How does the parser know what to do? The parser reads the token stream LP LP Num(23) OP(+) Num(7) RP OP(*) Id(x) RP, maintains a stack, consults an action table and a goto table, and emits the AST Op(*)(Op(+)(Num(23), Num(7)), Id(x)) as output.

89 Set of LR(0) items. A state keeps the info gathered on handle(s): it is a state in the “control” of the PDA, and also (part of) the stack. A table tells the parser “what to do” based on the current state and the next token (the transition function of the PDA). A stack records the “nesting level”: prefixes of handles.

90 LR item. N → α ● β: α is already matched, β is still to be matched against the input. The item is a hypothesis about αβ being a possible handle: so far we’ve matched α, and we expect to see β.

91 Example: LR(0) Items. Grammar: (1) S → E $; (2) E → T; (3) E → E + T; (4) T → id; (5) T → ( E ). All items are obtained by placing a dot at every position of every production: 1: S → ● E $; 2: S → E ● $; 3: S → E $ ●; 4: E → ● T; 5: E → T ●; 6: E → ● E + T; 7: E → E ● + T; 8: E → E + ● T; 9: E → E + T ●; 10: T → ● id; 11: T → id ●; 12: T → ● ( E ); 13: T → ( ● E ); 14: T → ( E ● ); 15: T → ( E ) ●.

92 N → α ● β is a shift item; N → αβ ● is a reduce item.

93 States and LR(0) Items. Grammar: E → E * B | E + B | B; B → 0 | 1. The state will “remember” the potential derivation rules given the part that was already identified. For example, if we have already identified E, the state remembers the two alternatives (1) E → E * B and (2) E → E + B — and also where we are in each of them: (1) E → E ● * B, (2) E → E ● + B. A derivation rule with a location marker is called an LR(0) item. The state is actually a set of LR(0) items, e.g. q13 = { E → E ● * B, E → E ● + B }.

94 Intuition Gather input token by token until we find a right-hand side of a rule and then replace it with the non-terminal on the left hand side –Going over a token and remembering it in the stack is a shift Each shift moves to a state that remembers what we’ve seen so far –A reduce replaces a string in the stack with the non-terminal that derives it 94

95 Model of an LR parser: an LR parser program driven by an action table (over terminals) and a goto table (over terminals and nonterminals), with a stack of alternating states and symbols — e.g. 0 T 2 + 7 id 5 — an input buffer ending in $, and an output.

96 LR parser stack. A sequence made of state, symbol pairs. For instance, a possible stack for the grammar S → E $; E → T; E → E + T; T → id; T → ( E ) could be: 0 T 2 + 7 id 5 (the stack grows to the right).

97 Form of LR parsing table. Rows are states. The Shift/Reduce actions part is indexed by terminals, the Goto part by nonterminals. sn = shift to state n; rk = reduce by rule k; gm = goto state m; acc = accept; an empty entry means error.

98 LR parser table example (for the grammar of slide 91; sn = shift to state n, rk = reduce by rule k, gm = goto state m, acc = accept, empty = error):
State | id | + | ( | ) | $ | E | T
0 | s5 | | s7 | | | g1 | g6
1 | | s3 | | | acc | |
2 | | | | | | |
3 | s5 | | s7 | | | | g4
4 | r3 (E → E + T)
5 | r4 (T → id)
6 | r2 (E → T)
7 | s5 | | s7 | | | g8 | g6
8 | | s3 | | s9 | | |
9 | r5 (T → ( E ))

99 Shift move: the LR parsing program, with state q on top of the stack and current input token a, performs a shift when action[q, a] = sn.

100 Result of shift: when action[q, a] = sn, the parser pushes a and then the new state n onto the stack, and advances the input.

101 Reduce move: if action[qn, a] = rk, where production (k) is A → β with β = σ1 … σn, the top of the stack looks like q σ1 q1 … σn qn — that is, 2·|β| entries above q. The parser pops these 2·|β| entries; goto[q, A] = qm gives the next state.

102 Result of reduce move: the 2·|β| entries are popped, A is pushed, and the new state qm = goto[q, A] is pushed on top of A (q is the state now exposed at the top of the stack).

103 Accept move: if action[q, a] = accept, parsing is completed.

104 Error move: if action[q, a] = error (usually an empty entry), the parser has discovered a syntax error.

105 Example 105 Z  E $ E  T | E + T T  i | ( E )

106 Example: parsing with LR items. Grammar: Z → E $; E → T | E + T; T → i | ( E ). Input: i + i $. We start with the items Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ). Why do we need these additional LR items? Where do they come from? What do they mean?

107 ℇ-closure. Given a set S of LR(0) items: if P → α ● N β is in S, then for each rule N → γ in the grammar, S must also contain N → ● γ. For example, with Z → E $; E → T | E + T; T → i | ( E ): ℇ-closure({Z → ● E $}) = { Z → ● E $, E → ● T, E → ● E + T, T → ● i, T → ● ( E ) }.
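The ℇ-closure rule can be sketched as a worklist algorithm in Python (the (lhs, rhs, dot) item representation is mine):

```python
# Epsilon-closure over LR(0) items for Z -> E$ ; E -> T | E+T ; T -> i | (E).
# An item is a triple (lhs, rhs, dot).
G = {
    "Z": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("i",), ("(", "E", ")")],
}

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in G:     # dot just before nonterminal N
            for alt in G[rhs[dot]]:              # add N -> . alt
                item = (rhs[dot], alt, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

c = closure({("Z", ("E", "$"), 0)})
for lhs, rhs, dot in sorted(c):
    print(lhs, "->", " ".join(rhs[:dot] + (".",) + rhs[dot:]))
```

Running this on {Z → ● E $} yields exactly the five items on the slide.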

108–118 Example: parsing with LR items (grammar Z → E $; E → T | E + T; T → i | ( E ); input i + i $). Items denote possible future handles; we remember the position from which we are trying to reduce, and match items against the current token:
- On the first i, T → i ● is a reduce item — reduce T → i, leaving T + i $.
- E → T ● is a reduce item — reduce E → T, leaving E + i $.
- With E matched, the items are Z → E ● $ and E → E ● + T; shifting + yields E → E + ● T, T → ● i, T → ● ( E ).
- On the second i, reduce T → i; then E → E + T ● is a reduce item — reduce to E, leaving E $.
- From Z → E ● $, shifting $ yields Z → E $ ●, a reduce item — reduce to Z. Done.

119 GOTO/ACTION tables (an empty entry is an error move):
State | i | + | ( | ) | $ | E | T | action
q0 | q5 | | q7 | | | q1 | q6 | shift
q1 | | q3 | | | q2 | | | shift
q2 | | | | | | | | Z → E$
q3 | q5 | | q7 | | | | q4 | shift
q4 | | | | | | | | E → E+T
q5 | | | | | | | | T → i
q6 | | | | | | | | E → T
q7 | q5 | | q7 | | | q8 | q6 | shift
q8 | | q3 | | q9 | | | | shift
q9 | | | | | | | | T → (E)
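The GOTO/ACTION table drives a simple parser loop. A Python sketch (the dict encoding and the lr_parse name are mine; shift states consult GOTO, reduce states carry one fixed rule independent of the current token):

```python
# Table-driven LR(0) parser for Z -> E$ ; E -> T | E+T ; T -> i | (E),
# encoded from the GOTO/ACTION table above.
GOTO = {
    0: {"i": 5, "(": 7, "E": 1, "T": 6},
    1: {"+": 3, "$": 2},
    3: {"i": 5, "(": 7, "T": 4},
    7: {"i": 5, "(": 7, "E": 8, "T": 6},
    8: {"+": 3, ")": 9},
}
REDUCE = {2: ("Z", 2), 4: ("E", 3), 5: ("T", 1), 6: ("E", 1), 9: ("T", 3)}
#        state: (left-hand side, length of the right-hand side)

def lr_parse(tokens):
    stack, i = [0], 0                   # one state per shifted/goto'd symbol
    while True:
        state = stack[-1]
        if state in REDUCE:
            lhs, n = REDUCE[state]
            if lhs == "Z":
                return True             # reduced by Z -> E$ : accept
            del stack[len(stack) - n:]  # pop one state per RHS symbol
            nxt = GOTO.get(stack[-1], {}).get(lhs)  # goto on the nonterminal
        else:                           # shift state
            if i >= len(tokens):
                return False
            nxt = GOTO[state].get(tokens[i])
            i += 1
        if nxt is None:
            return False                # empty entry = error move
        stack.append(nxt)

print(lr_parse(list("i+i$")))   # True
print(lr_parse(list("i+$")))    # False
```

The symbols themselves need not be kept on the stack: since every shift and goto pushes exactly one state per symbol, popping |β| states suffices for a reduce.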

120 LR(0) parser tables Two types of rows: –Shift row – tells which state to GOTO for the current token –Reduce row – tells which rule to reduce by (independent of the current token); in reduce rows, the GOTO entries are blank

121 LR parser data structures. Input – the remainder of the text to be processed. Stack – a sequence of pairs N, qi, where N is a symbol (terminal or nonterminal) and qi is a state at which decisions are made. The initial stack contains q0; e.g., with remaining input +i$, the stack may be q0 i q5.

122 LR(0) pushdown automaton: two moves, shift and reduce. Shift move: remove the first token from the input, push it on the stack, compute the next state based on the GOTO table, and push the new state on the stack; if the new state is error – report an error. E.g., with stack q0 and input i+i$, shifting i (GOTO[q0, i] = q5) yields stack q0 i q5 and input +i$ (the stack grows to the right).

123 LR(0) pushdown automaton: reduce move. Using a rule N → α: the symbols of α and their following states are removed from the stack; the new state is computed based on the GOTO table (using the state at the top of the stack before pushing N); N is pushed on the stack, and the new state is pushed on top of N. E.g., with stack q0 i q5 and input +i$, reducing T → i yields stack q0 T q6.


