Context-Free Languages Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Translators.


1 Context-Free Languages Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Translators

2 Context-Free Grammars Definition: A context-free grammar (CFG) is a quadruple G = (Φ, Σ, P, S), where all productions are of the form A → α, with A ∈ Φ and α ∈ (Φ ∪ Σ)*. Left-most derivation: At each step, the left-most nonterminal is re-written. Right-most derivation: At each step, the right-most nonterminal is re-written.


4 Derivation Trees Derivation trees: Describe re-writes, independently of the order (left-most or right-most). Each tree branch matches a production rule in the grammar.


6 Derivation Trees (cont’d) Notes: 1) Leaves are terminals. 2) Bottom contour is the sentence. 3) Left recursion causes left branching. 4) Right recursion causes right branching.

7 Goals of Parsing Examine input string, determine whether it's legal. Equivalent to building derivation tree. Added benefit: tree embodies syntactic structure of input. Therefore, tree should be unique.

8 Grammar Ambiguity Definition: A CFG is ambiguous if there exist two different right-most (or left-most, but not both) derivations for some sentence z. (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.

9 Ambiguous Grammars Classic ambiguities:
– Simultaneous left/right recursion: E → E + E
– Dangling else problem: S → if E then S
                           → if E then S else S


11 Grammar Reduction What language does this grammar generate?
S → a        D → EDBC
A → BCDEF    E → CBA
B → ASDFA    F → S
C → DDCF
L(G) = {a}. Problem: Many nonterminals (and productions) cannot be used in the generation of any sentence.

12 Grammar Reduction Definition: A CFG is reduced iff for all A ∈ Φ, a) S =>* αAβ, for some α, β ∈ V* (we say A is generable), and b) A =>* z, for some z ∈ Σ* (we say A is terminable). G is reduced iff every nonterminal A is both generable and terminable.

13 Grammar Reduction Example:
S → BB    A → aA
B → bB      → a
B is not terminable, since there is no z ∈ Σ* with B =>* z. A is not generable, since there are no α, β ∈ V* with S =>* αAβ.

14 Grammar Reduction To find out which nonterminals are generable: 1. Build the graph (Φ, δ), where (A, B) ∈ δ iff A → αBβ is a production. 2. Check that all nodes are reachable from S.

15 Grammar Reduction Example:
S → BB    A → aA
B → bB      → a
A is not reachable from S, so A is not generable. (Graph figure: nodes S, B, A, with one edge S → B; A is isolated.)

16 Grammar Reduction Algorithmically,
Generable := {S}
while Generable changes do
  for each A → αBβ do
    if A ∈ Generable then Generable := Generable ∪ {B}
od
{ Now, Generable contains the nonterminals that are generable. }
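The fixpoint above is directly runnable. A minimal Python sketch, using the slide-11 grammar encoded as (left part, right part) pairs; the encoding and names are mine, not the slides':

```python
# Grammar from slide 11, one (lhs, rhs) pair per production.
PRODUCTIONS = [
    ("S", "a"),      ("D", "EDBC"),
    ("A", "BCDEF"),  ("E", "CBA"),
    ("B", "ASDFA"),  ("F", "S"),
    ("C", "DDCF"),
]
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}

def generable(productions, start="S"):
    """Slide 16's fixpoint: nonterminals reachable from the start symbol."""
    gen = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs in gen:
                for x in rhs:
                    if x in NONTERMINALS and x not in gen:
                        gen.add(x)
                        changed = True
    return gen
```

On slide 11's grammar, only S itself is generable (the sole S-production is S → a), which is exactly why L(G) = {a}.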

17 Grammar Reduction To find out which nonterminals are terminable: 1. Build the graph (2^Φ, δ), where (N, N ∪ {A}) ∈ δ iff A → X1 … Xn is a production and, for all i, either Xi ∈ Σ or Xi ∈ N. 2. Check that the node Φ (the set of all nonterminals) is reachable from node ø (the empty set).

18 Grammar Reduction Example:
S → BB    A → aA
B → bB      → a
Only {A} is reachable from ø; the node {A, S, B} is not. Thus S and B are not terminable. (Powerset-graph figure: nodes ø, {A}, {B}, {S}, {A,B}, {A,S}, {B,S}, {A,S,B}.)

19 Grammar Reduction Algorithmically,
Terminable := { }
while Terminable changes do
  for each A → X1 … Xn do
    if every nonterminal among the X's is in Terminable then Terminable := Terminable ∪ {A}
od
{ Now, Terminable contains the nonterminals that are terminable. }
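The same fixpoint style works here. A sketch on slide 13's grammar (encoding mine):

```python
# Grammar from slide 13: S → BB, A → aA | a, B → bB.
PRODUCTIONS = [("S", "BB"), ("A", "aA"), ("A", "a"), ("B", "bB")]
NONTERMINALS = {"S", "A", "B"}

def terminable(productions, nonterminals):
    """Slide 19's fixpoint: A is terminable once some A-production has
    only terminals and already-terminable nonterminals on its right part."""
    term = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in term and all(
                x not in nonterminals or x in term for x in rhs
            ):
                term.add(lhs)
                changed = True
    return term
```

Only A is terminable (via A → a), matching the discussion on slide 13.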

20 Grammar Reduction Reducing a grammar: 1. Find all generable nonterminals. 2. Find all terminable nonterminals. 3. Remove any production A → X1 … Xn if either a) A is not generable, or b) any Xi is not terminable. 4. If the new grammar is not reduced, repeat the process.
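The whole procedure can be sketched by combining the two fixpoints and iterating until the grammar stops shrinking. The grammar below is slide 21's, with right parts as lists so multi-character terminals like "not" stay intact (encoding mine):

```python
# Slide 21's grammar; "not", "(", ")", "+", "*", "/", "i" are terminals.
PRODUCTIONS = [
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("F", ["not", "F"]),
    ("Q", ["P", "/", "Q"]),
    ("T", ["F", "*", "T"]), ("T", ["P"]),
    ("P", ["(", "E", ")"]), ("P", ["i"]),
]
NONTERMINALS = {"E", "F", "Q", "T", "P"}

def reduce_grammar(prods, nts, start="E"):
    """Slide 20's procedure: drop productions whose left part is not
    generable or whose right part mentions a non-terminable nonterminal,
    and repeat until nothing more is removed."""
    while True:
        gen = {start}                     # generable fixpoint
        changed = True
        while changed:
            changed = False
            for lhs, rhs in prods:
                if lhs in gen:
                    new = {x for x in rhs if x in nts} - gen
                    if new:
                        gen |= new
                        changed = True
        term = set()                      # terminable fixpoint
        changed = True
        while changed:
            changed = False
            for lhs, rhs in prods:
                if lhs not in term and all(x not in nts or x in term for x in rhs):
                    term.add(lhs)
                    changed = True
        kept = [
            (lhs, rhs) for lhs, rhs in prods
            if lhs in gen and all(x not in nts or x in term for x in rhs)
        ]
        if kept == prods:
            return prods
        prods = kept
```

Running it reproduces slide 22's reduced grammar: E → E + T | T, T → P, P → (E) | i.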

21 Grammar Reduction Example:
E → E + T    F → not F
  → T        Q → P / Q
T → F * T    P → (E)
  → P          → i
Generable: {E, T, F, P}; not generable: {Q}. Terminable: {P, T, E}; not terminable: {F, Q}. So, eliminate every production for Q, and every production whose right-part contains either F or Q.

22 Grammar Reduction New Grammar:
E → E + T
  → T
T → P
P → (E)
  → i
Generable: {E, T, P}. Terminable: {P, T, E}. Now the grammar is reduced.

23 Operator Precedence and Associativity Let’s build a CFG for expressions consisting of: elementary identifier i. + and - (binary ops) have lowest precedence, and are left associative. * and / (binary ops) have middle precedence, and are right associative. + and - (unary ops) have highest precedence, and are right associative.

24 Sample Grammar for Expressions
E → E + T      E consists of T's, separated by -'s and +'s
  → E - T      (lowest precedence).
  → T
T → F * T      T consists of F's, separated by *'s and /'s
  → F / T      (next precedence).
  → F
F → - F        F consists of a single P, preceded by +'s and -'s
  → + F        (next precedence).
  → P
P → '(' E ')'  P consists of a parenthesized E, or a single i
  → i          (highest precedence).

25 Operator Precedence and Associativity (cont’d) Operator Precedence: –The lower in the grammar, the higher the precedence. Operator Associativity: –left recursion in the grammar means left associativity of the operator, and causes left branching in the tree. –right recursion in the grammar means right associativity of the operator, and causes right branching in the tree.

26 Building Derivation Trees Sample Input : - + i - i * ( i + i ) / i + i (Human) derivation tree construction: Bottom-up. On each pass, scan entire expression, process operators with highest precedence (parentheses are highest). Lowest precedence operators are last, at the top of tree.


28 Operator Precedence and Associativity Exercise: Write a grammar for expressions that consists of: –elementary identifier ‘i’. –‘&’, ‘¢’, ‘*’ are next (left associative) –‘%’, ‘#’, are next (right associative) –‘@’, ‘!’ have highest precedence (left associative.) –Parentheses override precedence and associativity.

29 Precedence and Associativity Grammar:
E0 → E0 & E1
   → E0 ¢ E1
   → E0 * E1
   → E1
E1 → E2 % E1
   → E2 # E1
   → E2
E2 → E2 @ E3
   → E2 ! E3
   → E3
E3 → (E0)
   → i

30 Operator Precedence and Associativity Example: Construct the derivation tree for: i & i @ i # i ¢ ( i * i & i ! ) % ( i & i ) # i @ i Easier to construct the tree from the leaves to the root. On each pass, scan the entire expression, and process first the operators with highest precedence. Leave operators with lowest precedence for last.

31 Derivation Tree

32 Transduction Grammars Definition: A transduction grammar (a.k.a. syntax-directed translation scheme) is like a CFG, except for the following generalization: Each production is a triple (A, β, ω) ∈ Φ × V* × V*, called a translation rule, denoted A → β => ω, where A is the left part, β is the right part, and ω is the translation part.

33 Sample Transduction Grammar Translation of infix to postfix expressions.
E → E + T  => E T +
  → T      => T
T → P * T  => P T *
  → P      => P
P → (E)    => E     Note: ()'s discarded
  → i      => i
The translation part describes how the output is generated, as the input is derived.

34 Sample Transduction Grammar We keep track of a pair (α, β), where α and β are the sentential forms of the input and output.
( E, E )         => ( E + T, E T + )
                 => ( T + T, T T + )
                 => ( P + T, P T + )
                 => ( i + T, i T + )
                 => ( i + P * T, i P T * + )
                 => ( i + i * T, i i T * + )
                 => ( i + i * i, i i i * + )
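The translation above can be sketched as a small recursive descent translator that emits postfix as it parses. Note one liberty: the left recursion in E is run as a loop here (the trick slides 108-111 introduce later); tokenization and names are mine:

```python
def to_postfix(tokens):
    """Infix-to-postfix sketch following slide 33's translation rules:
    E → E + T => E T +   T → P * T => P T *   P → (E) => E   P → i => i."""
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(t):
        nonlocal pos
        assert peek() == t, f"expected {t!r}"
        pos += 1

    def E():
        T()
        while peek() == "+":      # left recursion on E, written as a loop
            expect("+")
            T()
            out.append("+")       # translation part: E T +

    def T():
        P()
        if peek() == "*":         # right recursion, kept as recursion
            expect("*")
            T()
            out.append("*")       # translation part: P T *

    def P():
        if peek() == "(":
            expect("(")
            E()
            expect(")")           # parentheses are discarded
        else:
            expect("i")
            out.append("i")

    E()
    return out
```

For input i + i * i this yields i i i * +, matching the pair-of-sentential-forms trace above.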

35 String to Tree Transduction Transduction to Abstract Syntax Trees. Notation: <t1 … tn> N denotes a tree with root N and subtrees t1 … tn. String-to-tree transduction grammar:
E → E + T  => <E T> +
  → T      => T
T → P * T  => <P T> *
  → P      => P
P → (E)    => E
  → i      => i

36 String to Tree Transduction Example:
(E, E) => (E + T, <E T>+)
       => (T + T, <T T>+)
       => (P + T, <P T>+)
       => (i + T, <i T>+)
       => (i + P * T, <i <P T>*>+)
       => (i + i * T, <i <i T>*>+)
       => (i + i * P, <i <i P>*>+)
       => (i + i * i, <i <i i>*>+)
The input derived is i + i * i.

37 String to Tree Transduction Definition: A transduction grammar is simple if for every rule A → α => β, the sequence of nonterminals appearing in α is identical to the sequence appearing in β. Example:
E → E + T  => <E T> +
  → T      => T
T → P * T  => <P T> *
  → P      => P
P → (E)    => E
  → i      => i

38 String to Tree Transduction For notational convenience, we dispense with both the nonterminals and the tree notation in the translation parts, leaving
E → E + T  => +
  → T
T → P * T  => *
  → P
P → (E)
  → i      => i
Look familiar?

39 Abstract Syntax Trees AST is a condensed version of the derivation tree. No noise (intermediate nodes). Result of simple String-to-tree transduction grammar. Rules of the form A → ω => 's'. Build 's' tree node, with one child per tree from each nonterminal in ω. We transduce from vocabulary of input symbols (which appear in ω), to vocabulary of tree node names.

40 Sample AST Input: - + i - i * ( i + i ) / i + i (The transduction grammar (DTG) and the resulting AST are shown as figures.)

41 The Game of Syntactic Dominoes The grammar:
E → E + T    T → P * T    P → (E)
  → T          → P          → i
The playing pieces: An arbitrary supply of each piece (one per grammar rule). The game board: Start domino at the top. Bottom dominoes are the "input".


43 Parsing: The Game of Syntactic Dominoes (cont’d) Game rules: –Add game pieces to the board. –Match the flat parts and the symbols. –Lines are infinitely elastic. Object of the game: –Connect start domino with the input dominoes. –Leave no unmatched flat parts.

44 Parsing Strategies Same as for the game of syntactic dominoes. – “Top-down” parsing: start at the start symbol, work toward the input string. – “Bottom-up” parsing: start at the input string, work toward the goal symbol. In either strategy, the input can be processed left-to-right or right-to-left.

45 Top-Down Parsing Attempt a left-most derivation, by predicting the re-write that will match the remaining input. Use a string (a stack, really) from which the input can be derived.

46 Top-Down Parsing Start with S on the stack. At every step, two alternatives: 1) α (the stack) begins with a terminal t: match t against the first input symbol. 2) α begins with a nonterminal A: consult an OPF (omniscient parsing function) to determine which production for A would lead to a match with the first symbol of the input. The OPF does the “predicting” in such a predictive parser.


48 Classical Top-Down Parsing Algorithm
Push(Stack, S);
while not Empty(Stack) do
  if Top(Stack) ∈ Σ then
    if Top(Stack) = Head(input) then
      input := tail(input); Pop(Stack)
    else error(Stack, input)
  else
    P := OPF(Stack, input);
    Push(Pop(Stack), RHS(P))
od
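This driver is easy to sketch in Python once the predictions are in a table. The table below is for the later LL(1) example (slide 65: S → A, A → bAd | ε), with "$" standing in for ⊥; all names are illustrative:

```python
# LL(1) parse table for  S → A,  A → bAd | ε  (slide 65); "$" plays ⊥.
TABLE = {
    ("S", "b"): ["A"], ("S", "$"): ["A"],
    ("A", "b"): ["b", "A", "d"], ("A", "d"): [], ("A", "$"): [],
}
NONTERMS = {"S", "A"}

def parse(tokens):
    """Slide 48's driver: match terminals, expand nonterminals via the table."""
    input_ = list(tokens) + ["$"]
    stack = ["$", "S"]
    while stack:
        top = stack.pop()
        if top in NONTERMS:
            rhs = TABLE.get((top, input_[0]))
            if rhs is None:
                return False                  # error entry
            stack.extend(reversed(rhs))       # left-most symbol on top
        elif top == input_[0]:
            input_.pop(0)                     # match a terminal
        else:
            return False
    return not input_
```

The language here is {bⁿdⁿ | n ≥ 0}, so for example bbdd is accepted and bbd is rejected.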


50 Top-Down Parsing (cont’d) Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1). We must define OPF(A, t), where A is the top element of the stack, and t is the first symbol of the input. Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).

51 Top-Down Parsing OPF(A, t) = A → ω if either 1. ω =>* tβ, for some β, or 2. ω =>* ε, and S =>* αAγtβ, for some α, β, γ with γ =>* ε.

52 Top-Down Parsing Example (illustrating 1):
S → A      B → b
A → BAd    C → c
  → C
OPF   b          c        d
S     S → A      S → A    S → A
A     A → BAd    A → C    ???
B     B → b      B → b    B → b
C     C → c      C → c    C → c
OPF(A, b) = A → BAd because BAd =>* bAd. OPF(A, c) = A → C because C =>* c. I.e., B begins with b, and C begins with c. Tan entries are optional; so is the ??? entry.

53 Top-Down Parsing Example (illustrating 2): S → A, A → bAd | ε.
OPF   b          d        ⊥
S     S → A               S → A
A     A → bAd    A → ε    A → ε
OPF(S, b) = S → A, because A =>* bAd.
OPF(S, d) = error, because S⊥ =>* αSdβ holds for no α, β.
OPF(S, ⊥) = S → A, because S⊥ =>* ⊥ (the empty sentence is legal).
OPF(A, b) = A → bAd, because A =>* bAd.
OPF(A, d) = A → ε, because S⊥ =>* bAd⊥, so d can follow A.
OPF(A, ⊥) = A → ε, because S⊥ =>* A⊥, so ⊥ can follow A.

54 Top-Down Parsing Definition: First(A) = {t | A =>* tα, for some α}. Follow(A) = {t | S =>* αAtβ, for some α, β}. Computing First sets: 1. Build the graph (Φ, δ), where (A, B) ∈ δ if B → αAβ with α =>* ε (then First(A) ⊆ First(B)). 2. Attach to each node an empty set of terminals. 3. Add t to the set for A if A → αtβ with α =>* ε. 4. Propagate the elements of the sets along the edges of the graph.

55 Top-Down Parsing Example:
S → ABCD    A → CDA    C → A
B → BC        → a      D → AC
  → b         →
Nullable = {A, C, D}
(Graph figure; white sets after step 3, tan after step 4: First(S) = {a, b}, First(A) = First(C) = First(D) = {a}, First(B) = {b}.)
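The graph-plus-propagation computation can equivalently be run as one fixpoint; a sketch on this slide's grammar (ε-productions are empty lists; replacing the explicit graph by a fixpoint is an implementation choice of mine):

```python
# Slide 55's grammar; lowercase symbols are terminals.
PRODS = [
    ("S", ["A", "B", "C", "D"]),
    ("A", ["C", "D", "A"]), ("A", ["a"]), ("A", []),
    ("B", ["B", "C"]), ("B", ["b"]),
    ("C", ["A"]),
    ("D", ["A", "C"]),
]
NTS = {"S", "A", "B", "C", "D"}

def first_sets(prods, nts):
    """Slides 54-55: compute Nullable, then propagate First through
    nullable prefixes until nothing changes."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    first = {n: set() for n in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            for x in rhs:
                add = first[x] if x in nts else {x}
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
                if x not in nullable:     # stop at the first non-nullable symbol
                    break
    return nullable, first

NULLABLE, FIRST = first_sets(PRODS, NTS)
```

This reproduces the sets on the slide: Nullable = {A, C, D}, First(S) = {a, b}, First(B) = {b}, and First(A) = First(C) = First(D) = {a}.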

56 Top-Down Parsing Computing Follow Sets: 1. Build the graph (Φ, δ), where (A, B) ∈ δ if A → αBβ with β =>* ε. Then Follow(A) ⊆ Follow(B), because any symbol X that follows A also follows B.

57 Top-Down Parsing 2. Attach to each node an empty set of terminals. Add ⊥ to the set for the start symbol. 3. Add First(X) to the set for A (i.e., Follow(A)) if B → αAβXγ with β =>* ε. 4. Propagate the elements of the sets along the edges of the graph.

58 Top-Down Parsing Example:
S → ABCD    A → CDA    C → A
B → BC        → a      D → AC
  → b         →
Nullable = {A, C, D}
First(S) = {a, b}, First(A) = {a}, First(B) = {b}, First(C) = {a}, First(D) = {a}
(Graph figure with the Follow sets; white: after step 3, tan: after step 4.)

59 Top-Down Parsing So, Follow(S) = {⊥}, Follow(A) = Follow(C) = Follow(D) = {a, b, ⊥}, Follow(B) = {a, ⊥}.
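The Follow computation of slides 56-58 can be checked mechanically. A self-contained sketch (it recomputes Nullable and First internally; "$" stands in for ⊥, and the fixpoint replaces the explicit graph):

```python
# Slide 55's grammar again; ε-productions are empty lists.
PRODS = [
    ("S", ["A", "B", "C", "D"]),
    ("A", ["C", "D", "A"]), ("A", ["a"]), ("A", []),
    ("B", ["B", "C"]), ("B", ["b"]),
    ("C", ["A"]),
    ("D", ["A", "C"]),
]
NTS = {"S", "A", "B", "C", "D"}

def follow_sets(prods, nts, start="S", end="$"):
    """Slides 56-58 as a fixpoint: for each occurrence of a nonterminal,
    add the First of what can follow it; if that can all erase, add
    the Follow of the left part."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs); changed = True
    first = {n: set() for n in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            for x in rhs:
                add = first[x] if x in nts else {x}
                if not add <= first[lhs]:
                    first[lhs] |= add; changed = True
                if x not in nullable:
                    break
    follow = {n: set() for n in nts}
    follow[start].add(end)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in prods:
            for i, x in enumerate(rhs):
                if x not in nts:
                    continue
                trailer, rest_nullable = set(), True
                for y in rhs[i + 1:]:
                    trailer |= first[y] if y in nts else {y}
                    if y not in nullable:
                        rest_nullable = False
                        break
                if rest_nullable:
                    trailer |= follow[lhs]
                if not trailer <= follow[x]:
                    follow[x] |= trailer; changed = True
    return follow

FOLLOW = follow_sets(PRODS, NTS)
```

This reproduces slide 59's sets exactly.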

60 Top-Down Parsing Back to parsing… We want OPF(A, t) = A → ω if either 1. t ∈ First(ω), i.e., ω =>* tβ, or 2. ω =>* ε and t ∈ Follow(A), i.e., S =>* αAtβ.

61 Top-Down Parsing Definition: Select(A → ω) = First(ω) ∪ (if ω =>* ε then Follow(A) else ø). So PT(A, t) = A → ω if t ∈ Select(A → ω). “Parse Table”, rather than OPF, because it isn’t omniscient.

62 Top-Down Parsing Example:
First(S) = {a, b}    Follow(S) = {⊥}
First(A) = {a}       Follow(A) = {a, b, ⊥}
First(B) = {b}       Follow(B) = {a, ⊥}
First(C) = {a}       Follow(C) = {a, b, ⊥}
First(D) = {a}       Follow(D) = {a, b, ⊥}
Grammar        Select sets
S → ABCD       {a, b}
B → BC         {b}
  → b          {b}
A → CDA        {a, b, ⊥}
  → a          {a}
  →            {a, b, ⊥}
C → A          {a, b, ⊥}
D → AC         {a, b, ⊥}
The select sets for A (and for B) are not pair-wise disjoint. Grammar is not LL(1).
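Given First and Follow, each Select set is a short computation; a sketch with this slide's sets hard-coded ("$" for ⊥, names mine):

```python
# First/Follow from slide 62; Select(A → ω) = First(ω) ∪
# (Follow(A) if ω =>* ε else ø), per slide 61.
FIRST = {"S": {"a", "b"}, "A": {"a"}, "B": {"b"}, "C": {"a"}, "D": {"a"}}
FOLLOW = {"S": {"$"}, "A": {"a", "b", "$"}, "B": {"a", "$"},
          "C": {"a", "b", "$"}, "D": {"a", "b", "$"}}
NULLABLE = {"A", "C", "D"}

def select(lhs, rhs):
    """Walk the right part, collecting First sets through nullable symbols;
    if the whole right part can erase, add Follow of the left part."""
    sel, rhs_nullable = set(), True
    for x in rhs:
        sel |= FIRST.get(x, {x})      # a terminal's First is itself
        if x not in NULLABLE:
            rhs_nullable = False
            break
    if rhs_nullable:
        sel |= FOLLOW[lhs]
    return sel
```

Select(A → CDA) = Select(A → ε) = {a, b, $} while Select(A → a) = {a}, so A's alternatives overlap: not LL(1), as the slide concludes.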

63 Top-Down Parsing
PT   a                        b                   ⊥
S    S → ABCD                 S → ABCD
A    A → CDA, A → a, A → ε    A → CDA, A → ε      A → CDA, A → ε
B                             B → BC, B → b
C    C → A                    C → A               C → A
D    D → AC                   D → AC              D → AC
Non-LL(1) grammar: multiple entries in PT.

64 LL(1) Grammars Definition: A CFG G is LL(1) (Left-to-right, Left-most, (1)-symbol lookahead) iff for all A ∈ Φ, and for all productions A → α, A → β with α ≠ β, Select(A → α) ∩ Select(A → β) = ø. Previous example: grammar is not LL(1). More later on what to do about it.

65 Sample LL(1) Grammar
S → A      {b, ⊥}
A → bAd    {b}
  →        {d, ⊥}
Disjoint! Grammar is LL(1)!
PT   b          d        ⊥
S    S → A               S → A
A    A → bAd    A → ε    A → ε
One production per entry.

66 Example Build the LL(1) parse table for the following grammar.
S → begin SL end   {begin}
  → id := E ;      {id}
SL → SL S          {begin, id} *
   → S             {begin, id} *
E → E + T          {(, id} *
  → T              {(, id} *
T → P * T          {(, id} *
  → P              {(, id} *
P → (E)            {(}
  → id             {id}
* – the marked alternatives have overlapping select sets: not LL(1).


68 Example (cont’d) Lemma: Left recursion always produces a non-LL(1) grammar (e.g., SL, E above). Proof: Consider A → Aα | β. Select(A → Aα) = First(Aα) ⊇ First(β) (since A =>* β), or Follow(A) if β =>* ε; Select(A → β) ⊇ First(β) (or Follow(A)) as well, so the two select sets intersect.

69 Problems with our Grammar 1. SL is left recursive. 2. E is left recursive. 3. T → P * T and T → P both begin with the same sequence of symbols (P).

70 Solution to Problem 3 Change:
T → P * T    {(, id}
  → P        {(, id}
to:
T → P X      {(, id}
X → * T      {*}
  →          {+, ;, )}
Follow(X) = Follow(T)   (due to T → P X)
          = Follow(E)   (due to E → E + T, E → T)
          = {+, ;, )}   (due to E → E + T, S → id := E ; and P → (E))
Disjoint!

71 Solution to Problem 3 (cont’d) In general, change
A → αβ1
  → αβ2
  …
  → αβn
to
A → αX
X → β1
  …
  → βn
Hopefully all the β's begin with different symbols.
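A sketch of this left-factoring step for a one-symbol common prefix (the general multi-symbol α case is analogous; the helper and the fresh-nonterminal name are supplied by the caller, and the encoding is mine):

```python
def left_factor(lhs, rhss, new_name):
    """Slide 71's transformation: A → αβ1 | … | αβn becomes
    A → αX,  X → β1 | … | βn.  Assumes every alternative in `rhss`
    starts with the same first symbol."""
    prefix = rhss[0][0]
    assert all(r and r[0] == prefix for r in rhss), "no common prefix"
    return ([(lhs, [prefix, new_name])],
            [(new_name, r[1:]) for r in rhss])

# Slide 70's instance:  T → P * T | P  becomes  T → P X,  X → * T | ε.
t_prods, x_prods = left_factor("T", [["P", "*", "T"], ["P"]], "X")
```

An empty right part (here X → ε) falls out naturally as the empty list.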

72 Solution to Problems 1 and 2 We want (…(((T + T) + T) + T)…). Instead, we settle for (T)(+T)(+T)…(+T). Change:
E → E + T    {(, id}
  → T        {(, id}
to:
E → T Y      {(, id}
Y → + T Y    {+}
  →          {;, )}
Follow(Y) = Follow(E) = {;, )}. It no longer contains '+', because we eliminated the production E → E + T.

73 Solution to Problems 1 and 2 (cont’d) In general, change:
A → Aα1      → β1
  …           …
  → Aαn      → βm
to:
A → β1 X     X → α1 X
  …            …
  → βm X       → αn X
               →
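The same rewrite, mechanically; the fresh-nonterminal name is supplied by the caller, and the encoding is mine:

```python
def remove_left_recursion(lhs, rhss, new_name):
    """Slide 73's rewrite: A → Aα1 | … | Aαn | β1 | … | βm  becomes
    A → β1 X | … | βm X,  X → α1 X | … | αn X | ε."""
    alphas = [r[1:] for r in rhss if r and r[0] == lhs]        # the α's
    betas = [r for r in rhss if not (r and r[0] == lhs)]       # the β's
    a_prods = [(lhs, b + [new_name]) for b in betas]
    x_prods = [(new_name, a + [new_name]) for a in alphas] + [(new_name, [])]
    return a_prods, x_prods

# Slide 72's instance:  E → E + T | T  becomes  E → T Y,  Y → + T Y | ε.
e_prods, y_prods = remove_left_recursion("E", [["E", "+", "T"], ["T"]], "Y")
```

The left-branching grammar becomes right recursive, which is exactly the (T)(+T)…(+T) shape slide 72 asks for.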

74 Solution to Problems 1 and 2 (cont’d) In our example, change:
SL → SL S    {begin, id}
   → S       {begin, id}
to:
SL → S Z     {begin, id}
Z → S Z      {begin, id}
  →          {end}

75 Modified Grammar
S → begin SL end   {begin}
  → id := E ;      {id}
SL → S Z           {begin, id}
Z → S Z            {begin, id}
  →                {end}
E → T Y            {(, id}
Y → + T Y          {+}
  →                {;, )}
T → P X            {(, id}
X → * T            {*}
  →                {;, +, )}
P → (E)            {(}
  → id             {id}
Select sets are disjoint. Grammar is LL(1).



78 Recursive Descent Parsing Top-down parsing strategy, suitable for LL(1) grammars. One procedure per nonterminal. Contents of stack embedded in recursive call sequence. Each procedure “commits” to one production, based on the next input symbol, and the select sets. Good technique for hand-written parsers.

79 Sample Recursive Descent Parser
proc S; {S → begin SL end
           → id := E ; }
  case Next_Token of
    T_begin : Read(T_begin); SL; Read(T_end);
    T_id    : Read(T_id); Read(T_:=); E; Read(T_;);
    otherwise Error;
  end
end;
“Read(T_X)” verifies that the upcoming token is X, and consumes it. “Next_Token” is the upcoming token.

80 Sample Recursive Descent Parser proc SL; {SL → SZ} S; Z; end; proc E; {E → TY} T; Y; end; Technically, should have insisted that Next Token be either T_begin or T_id, but S will do that anyway. Checking early would aid error recovery. // Ditto for T_( and T_id.

81 Sample Recursive Descent Parser proc Z;{Z → SZ → } case Next Token of T_begin, T_id: S;Z; T_end: ; otherwise Error; end end;

82 Sample Recursive Descent Parser proc Y; {Y → +TY → } if Next Token = T_+ then Read (T_+) T; Y; end; proc T; {T → PX} P; X end; Could have used a case statement Could have checked for T_( and T_id.

83 Sample Recursive Descent Parser proc X;{X → *T → } if Next Token = T_* then Read (T_*); T; end;

84 Sample Recursive Descent Parser proc P; {P → (E) → id } case Next Token of T_(: Read (T_(); E; Read (T_)); T_id: Read (T_id); otherwise Error; end end;
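The Pascal-style procedures on slides 79-84 translate almost line for line into Python. In this sketch the recursion in Z, Y and X is written as loops or conditionals (anticipating slides 108-111), tokens are plain strings, and "$" marks end of input; all of that is my encoding, not the slides':

```python
def parse(tokens):
    """Recursive descent for the LL(1) grammar of slide 75."""
    toks = list(tokens) + ["$"]
    pos = 0

    def next_token():
        return toks[pos]

    def read(t):                      # "Read(T_X)": verify and consume
        nonlocal pos
        if toks[pos] != t:
            raise SyntaxError(f"expected {t!r}, saw {toks[pos]!r}")
        pos += 1

    def S():
        if next_token() == "begin":
            read("begin"); SL(); read("end")
        elif next_token() == "id":
            read("id"); read(":="); E(); read(";")
        else:
            raise SyntaxError("statement expected")

    def SL():                         # SL → S Z
        S(); Z()

    def Z():                          # Z → S Z | ε, as a loop
        while next_token() in ("begin", "id"):
            S()

    def E():                          # E → T Y,  Y → + T Y | ε
        T()
        while next_token() == "+":
            read("+"); T()

    def T():                          # T → P X,  X → * T | ε
        P()
        if next_token() == "*":
            read("*"); T()

    def P():                          # P → (E) | id
        if next_token() == "(":
            read("("); E(); read(")")
        else:
            read("id")

    S()
    return pos == len(toks) - 1       # everything but "$" consumed
```

On the running example begin id := (id + id) * id ; end the parse succeeds; an ill-formed input raises a SyntaxError at the first mismatch.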

85 String-To-Tree Transduction Can obtain derivation or abstract syntax tree. Tree can be generated top-down, or bottom-up. We will show how to obtain 1.Derivation tree top-down 2.AST for the original grammar, bottom-up.

86 TD Generation of Derivation Tree In each procedure, and for each alternative, write out the appropriate production AS SOON AS IT IS KNOWN

87 TD Generation of Derivation Tree proc S; {S → begin SL end → id := E; } case Next_Token of T_begin :Write(S → begin SL end); Read(T_begin); SL; Read(T_end);

88 TD Generation of Derivation Tree T_id : Write(S → id := E ;); Read(T_id); Read(T_:=); E; Read(T_;); otherwise Error; end end;

89 TD Generation of Derivation Tree proc SL; {SL → SZ} Write(SL → SZ); S; Z; end; proc E; {E → TY} Write(E → TY); T; Y; end;

90 TD Generation of Derivation Tree proc Z; {Z → SZ → } case Next_Token of T_begin, T_id: Write(Z → SZ); S; Z; T_end: Write(Z → ); otherwise Error; end end;

91 TD Generation of Derivation Tree proc Y; {Y → +TY → } if Next_Token = T_+ then Write (Y → +TY); Read (T_+); T; Y; else Write (Y → ); end;

92 TD Generation of Derivation Tree proc T; {T → PX} Write (T → PX); P; X end; proc X;{X → *T → }

93 TD Generation of Derivation Tree if Next_Token = T_* then Write (X → *T); Read (T_*); T; else Write (X → ); end;

94 TD Generation of Derivation Tree proc P;{P → (E) → id } case Next_Token of T_(: Write (P → (E)); Read (T_(); E; Read (T_)); T_id: Write (P → id); Read (T_id); otherwise Error; end;

95 Notes The placement of the Write statements is obvious precisely because the grammar is LL(1). Can build the tree “as we go”, or have it built by a post-processor.

96 Example Input String: begin id := (id + id) * id; end Output: S → begin SL end SL → SZ S → id :=E; E → TY T → PX P → (E) E → TY T → PX P → id X → Y → +TY T → PX P → id X → Y → X → *T T → PX P → id X → Y → Z →


98 Bottom-up Generation of the Derivation Tree We could have placed the write statements at the END of each phrase, instead of the beginning. If we do, the tree will be generated bottom-up. In each procedure, and for each alternative, write out the production A   AFTER  is parsed.

99 BU Generation of the Derivation Tree proc S;{S → begin SL end → id := E; } case Next_Token of T_begin:Read (T_begin); SL; Read (T_end); Write (S → begin SL end); T_id:Read (T_id); Read (T_:=); E; Read (T_;); Write (S → id:=E;); otherwise Error; end;

100 BU Generation of the Derivation Tree proc SL; {SL → SZ} S; Z; Write(SL → SZ); end; proc E; {E → TY} T; Y; Write(E → TY); end;

101 BU Generation of the Derivation Tree proc Z; {Z → SZ → } case Next_Token of T_begin, T_id: S; Z; Write(Z → SZ); T_end: Write(Z → ); otherwise Error; end end;

102 BU Generation of the Derivation Tree proc Y; {Y → +TY → } if Next_Token = T_+ then Read (T_+); T; Y; Write (Y → +TY); else Write (Y → ); end;

103 BU Generation of the Derivation Tree proc T; {T → PX } P; X; Write (T → PX) end; proc X;{X → *T → } if Next_Token = T_* then Read (T_*); T; Write (X → *T); else Write (X → ); end

104 BU Generation of the Derivation Tree proc P;{P → (E) → id } case Next_Token of T_(: Read (T_(); E; Read (T_)); Write (P → (E)); T_id: Read (T_id); Write (P → id); otherwise Error; end;

105 Notes The placement of the Write statements is still obvious. The productions are emitted as procedures quit, not as they start. Productions are emitted in reverse order, i.e., the sequence of productions must be used in reverse to obtain a right-most derivation. Again, can build the tree “as we go” (need a stack of trees), or later.

106 Example Input String : begin id := (id + id) * id; end Output: P → id X → T → PX P → id X → T → PX Y → Y → +TY E → TY P → (E) P → id X → T → PX X → *T T → PX Y → E → TY S → id:=E; Z → SL → SZ S → begin SL end


108 Replacing Recursion with Iteration Not all the nonterminals are needed. The recursion in SL, X, Y and Z can be replaced with iteration.

109 Replacing Recursion with Iteration
proc S; {S → begin SL end    SL → S Z
           → id := E ;        Z → S Z
                                →     }
  case Next_Token of
    T_begin : Read(T_begin);
              repeat S; until Next_Token ∉ {T_begin, T_id};   { replaces the call to SL and the recursion on Z }
              Read(T_end);
    T_id    : Read(T_id); Read(T_:=); E; Read(T_;);
    otherwise Error;
  end
end;

110 Replacing Recursion with Iteration proc E; {E → TY Y → +TY → } T; while Next_Token = T_+ do Read (T_+); T; od end; Replaces recursion on Y.

111 Replacing Recursion with Iteration proc T; {T → PX X → *T → } P; if Next_Token = T_* then Read (T_*); T; end; Replaces call to X.

112 Replacing Recursion with Iteration proc P;{P → (E) → id } case Next_Token of T_(: Read (T_(); E; Read (T_)); T_id: Read (T_id); otherwise Error; end end;

113 Construction of Derivation Tree for the Original Grammar (Bottom Up)
proc S; { (1) S → begin SL end    (2) S → begin SL end
              → id := E ;             → id := E ;
          SL → S Z                SL → SL S
          Z → S Z                    → S
            →                                }
  case Next_Token of
    T_begin : Read(T_begin);
              S; Write(SL → S);
              while Next_Token in {T_begin, T_id} do
                S; Write(SL → SL S);
              od
              Read(T_end); Write(S → begin SL end);
    T_id    : Read(T_id); Read(T_:=); E; Read(T_;); Write(S → id := E ;);
    otherwise Error;
  end
end;

114 Construction of Derivation Tree for the Original Grammar (Bottom Up) proc E; {(1)E → TY (2) E → E+T Y → +TY → T → } T; Write (E → T); while Next_Token = T_+ do Read (T_+); T; Write (E → E+T); od end

115 Construction of Derivation Tree for the Original Grammar (Bottom Up)
proc T; {(1) T → PX      (2) T → P*T
             X → *T          → P
               →                 }
  P;
  if Next_Token = T_* then Read(T_*); T; Write(T → P*T)
  else Write(T → P);
end;

116 Construction of Derivation Tree for the Original Grammar (Bottom Up) proc P;{(1)P → (E) (2)P → (E) → id → id } // SAME AS BEFORE end;

117 Example Input String : begin id := (id + id) * id; end Output : P → id T → P E → T P → id T → P E → E+T P → (E) P → id T → P T → P*T E → T S → id:=E; SL → S S → begin SL end


119 Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar
proc S; {S → begin S+ end => 'block'
           → id := E ;    => 'assign' }
  var N : integer;
  case Next_Token of
    T_begin : Read(T_begin); S; N := 1;
              while Next_Token in {T_begin, T_id} do S; N := N + 1; od
              Read(T_end); Build Tree('block', N);
    T_id    : Read(T_id); Read(T_:=); E; Read(T_;); Build Tree('assign', 2);
    otherwise Error;
  end
end;
Assume Read(T_id) also builds (and pushes) a leaf node. Build Tree('x', n) pops n trees from the stack, builds an 'x' node as their parent, and pushes the resulting tree.

120 Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar proc E; {E → E+T  '+' → T } T; while Next_Token = T_+ do Read (T_+) T; Build Tree ('+',2); od end; Left branching in tree!

121 Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar proc T; {T → P*T => '*' → P } P; if Next_Token = T_* then Read (T_*); T; Build Tree ('*',2); end; Right branching in tree!

122 Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar proc P;{P → (E) → id } // SAME AS BEFORE, // i.e.,no trees built end;

123 Example Input String: begin id1 := (id2 + id3) * id4 ; end Sequence of events: id1, id2, id3, BT('+',2), id4, BT('*',2), BT('assign',2), BT('block',1).


125 Summary Bottom-up or top-down tree construction. Original or modified grammar. Derivation Tree or Abstract Syntax Tree. Technique of choice: –Top-down, recursive descent parser. –Bottom-up tree construction for the original grammar.

126 LR Parsing Procedures in the recursive descent code can be “annotated” with items, i.e. productions with a “dot” marker somewhere in the right-part. We can use the items to describe the operation of the recursive descent parser. There is an FSA that describes all possible calling sequences in the R.D. parser.

127 Recursive Descent Parser with items Example:
proc E; {E → .E + T, E → .T}
  T; {E → E. + T, E → T.}
  while Next_Token = T_+ do {E → E. + T}
    Read(T_+); {E → E +. T}
    T; {E → E + T.}
  od {E → E + T., E → T.}
end;

128 FSA Connecting Items The FSA is: M = (DP, V, δ, S’ → .S, {S’ → S.}), where DP is the set of all possible items (DP: dotted productions), and δ is defined such that:
1. A → α.Bβ  --ε-->  B → .ω      (simulate a call to B)
2. A → α.Xβ  --X-->  A → αX.β    (simulate the execution of statement X, if X is a nonterminal, or Read(X), if X is a terminal)

129 FSA Connecting Items Example grammar: S → E⊥; E → E + T | T; T → i | (E). (The nondeterministic FSA over this grammar's items, with ε-edges from each A → α.Bβ into every B → .ω, is shown as a figure.)

130 FSA Connecting Items Need to run this machine with the aid of a stack, i.e. need to keep track of the recursive calling sequence. To “return” from A → ω., back up |ω| + 1 states, then advance on A. Problem with this machine: it is nondeterministic. No problem. Be happy. Transform it to a DFA !

131 Deterministic FSA Connecting Items THIS IS AN LR(0) AUTOMATON. (Figure: the subset-construction DFA, whose states are sets of items, for the grammar S → E⊥; E → E + T | T; T → i | (E).)

132 LR Parsing LR means “Left-to-Right, Right-most Derivation”. Need a stack of states to operate the parser. No look-ahead required, thus LR(0). DFA describes all possible positions in the R.D. parser’s code. Once the automaton is built, items can be discarded.

133 LR Parsing Operation of an LR parser Two moves: shift and reduce. Shift: Advance from current state on Next_Token, push new state on stack. Reduce: (on A → ω). Pop |ω| states from stack. Advance from new top state on A.

134 LR Parsing
Stack              Input            Move
1                  i + (i + i)⊥     shift
1 4                + (i + i)⊥       reduce T → i
1 3                + (i + i)⊥       reduce E → T
1 2                + (i + i)⊥       shift
1 2 7              (i + i)⊥         shift
1 2 7 5            i + i)⊥          shift
1 2 7 5 4          + i)⊥            reduce T → i
1 2 7 5 3          + i)⊥            reduce E → T
1 2 7 5 8          + i)⊥            shift
1 2 7 5 8 7        i)⊥              shift
1 2 7 5 8 7 4      )⊥               reduce T → i
1 2 7 5 8 7 9      )⊥               reduce E → E+T
1 2 7 5 8          )⊥               shift
1 2 7 5 8 10       ⊥                reduce T → (E)
1 2 7 9            ⊥                reduce E → E+T
1 2                ⊥                shift
1 2 6              ------           accept
(The derivation tree is built alongside, as shown in the figure.)

135 LR Parsing Table Representation of LR Parsers. Two tables: ACTION table: indexed by state and by terminal symbol; contains all shift and reduce moves. GOTO table: indexed by state and by nonterminal symbol; contains all transitions on nonterminal symbols.

136 LR Parsing Example (for the LR(0) automaton of the previous slides):
              ACTION                            GOTO
State   i       +       (       )       ⊥       E   T
1       S/4             S/5                     2   3
2               S/7                     S/6
3       R/E→T (in every column)
4       R/T→i (in every column)
5       S/4             S/5                     8   3
6       Accept
7       S/4             S/5                         9
8               S/7             S/10
9       R/E→E+T (in every column)
10      R/T→(E) (in every column)

137 LR Parsing Algorithm LR_Driver:
Push(Start_State, S);
while ACTION(Top(S), Next_Token) ≠ Accept do
  case ACTION(Top(S), Next_Token) of
    S/r     : Read(Next_Token); Push(r, S);
    R/A → ω : Pop(S) |ω| times; Push(GOTO(Top(S), A), S);
    empty   : Error;
  end
end;

138 LR Parsing Direct Construction of the LR(0) Automaton
PT(G) = Closure({S’ → .S⊥}) ∪ {Closure(P) | P ∈ Successors(P’), P’ ∈ PT(G)}
Closure(P) = P ∪ {A → .ω | B → α.Aβ ∈ Closure(P)}
Successors(P) = {Nucleus(P, X) | X ∈ V}
Nucleus(P, X) = {A → αX.β | A → α.Xβ ∈ P}
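These four equations are directly executable. A sketch on the grammar of the next slide (S → E⊥; E → E + T | T; T → i | (E)), with items encoded as (left part, right part, dot position) triples and "$" standing in for ⊥; the encoding is mine:

```python
# Grammar from slide 139; "$" plays the role of ⊥.
PRODS = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("i",)), ("T", ("(", "E", ")")),
]
NTS = {"S", "E", "T"}

def closure(items):
    """Closure(P) = P ∪ {A → .ω | B → α.Aβ ∈ Closure(P)} (slide 138)."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in NTS:
                for l2, r2 in PRODS:
                    if l2 == rhs[dot] and (l2, r2, 0) not in items:
                        items.add((l2, r2, 0))
                        changed = True
    return items

def nucleus(items, x):
    """Nucleus(P, X): advance the dot over X (slide 138)."""
    return {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}
```

Closing the start item reproduces state 1 of slide 139 (five items), and Closure(Nucleus(state 1, E)) is exactly state 2: {S → E.⊥, E → E. + T}.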

139 LR Parsing Direct Construction of the Previous Automaton
State 1: S → .E⊥, E → .E + T, E → .T, T → .i, T → .(E)    (E → 2, T → 3, i → 4, ( → 5)
State 2: S → E.⊥, E → E. + T                              (⊥ → 6, + → 7)
State 3: E → T.
State 4: T → i.
State 5: T → (.E), E → .E + T, E → .T, T → .i, T → .(E)   (E → 8, T → 3, i → 4, ( → 5)
State 6: S → E⊥.
State 7: E → E + .T, T → .i, T → .(E)                     (T → 9, i → 4, ( → 5)
State 8: T → (E.), E → E. + T                             () → 10, + → 7)
State 9: E → E + T.
State 10: T → (E).

140 LR Parsing Notes: Two states are “equal” if their nuclei are identical. This grammar is LR(0) because there are no “conflicts”. A conflict occurs when a state contains either i) both a final (dot-at-the-end) item and a non-final one (shift-reduce), or ii) two or more final items (reduce-reduce).

141 LR Parsing Example: E → E + T | T; T → P * T | P; P → (E) | i.
State 1: S → .E⊥, E → .E + T, E → .T, T → .P * T, T → .P, P → .i, P → .(E)   (E → 2, T → 3, P → 4, i → 5, ( → 6)
State 2: S → E.⊥, E → E. + T                                                 (⊥ → 7, + → 8)
State 3: E → T.
State 4: T → P. * T, T → P.                                                  (* → 9)
State 5: P → i.
State 6: P → (.E), E → .E + T, E → .T, T → .P * T, T → .P, P → .i, P → .(E)  (E → 10, T → 3, P → 4, i → 5, ( → 6)
State 7: S → E⊥.
State 8: E → E + .T, T → .P * T, T → .P, P → .i, P → .(E)                    (T → 11, P → 4, i → 5, ( → 6)
State 9: T → P * .T, T → .P * T, T → .P, P → .i, P → .(E)                    (T → 12, P → 4, i → 5, ( → 6)
State 10: P → (E.), E → E. + T                                               () → 13, + → 8)
State 11: E → E + T.
State 12: T → P * T.
State 13: P → (E).
Grammar is not LR(0): state 4 has a shift-reduce conflict.

142 LR Parsing Solution: Use lookahead! In LL(1), lookahead is used at the beginning of the production. In LR(1), lookahead is used at the end of the production. We will use:SLR(1) – Simple LR(1) LALR(1) – Lookahead LR(1)

143 LR Parsing The conflict appears in the ACTION table, as multiple entries (columns + * i ( ) ⊥):
1:               i: S/5,  (: S/6
2:   +: S/8,                          ⊥: S/7
3:   R/E→T (in every column)
4:   R/T→P (in every column), plus S/9 under *   ← multiple entries
5:   R/P→i (in every column)
6:               i: S/5,  (: S/6
7:   Accept
8:               i: S/5,  (: S/6
9:               i: S/5,  (: S/6
10:  +: S/8,              ): S/13
11:  R/E→E+T (in every column)
12:  R/T→P*T (in every column)
13:  R/P→(E) (in every column)

144 LR Parsing SLR(1): For each “inconsistent” state p, compute Follow(A) for each conflict production A → ω. Then place “R/A → ω” in the ACTION table, row p, column t, only if t ∈ Follow(A). In our case, Follow(T) = Follow(E) = {+, ), ⊥}. So row 4 becomes (columns + * i ( ) ⊥):
4:   +: R/T→P,  *: S/9,  ): R/T→P,  ⊥: R/T→P
No multiple entries. Grammar is SLR(1).

145 LR Parsing Example: S → aSb | ε; L(G) = {aⁿbⁿ | n ≥ 0}.
State 1: S’ → .S⊥, S → .aSb, S → .    (S → 2, a → 3)
State 2: S’ → S.⊥                     (⊥ → 4)
State 3: S → a.Sb, S → .aSb, S → .    (S → 5, a → 3)
State 4: S’ → S⊥.
State 5: S → aS.b                     (b → 6)
State 6: S → aSb.
ACTION (columns a, b, ⊥) and GOTO (S):
1:  a: S/3, R/S→ε    b: R/S→ε    ⊥: R/S→ε    S: 2
2:                               ⊥: S/4
3:  a: S/3, R/S→ε    b: R/S→ε    ⊥: R/S→ε    S: 5
4:  Accept (in every column)
5:                   b: S/6
6:  R/S→aSb (in every column)
Grammar is not LR(0): shift-reduce conflicts in rows 1 and 3.

146 LR Parsing SLR(1) Analysis: State 1: Follow(S) = {b, ⊥}. Since a ∉ Follow(S), the shift/reduce conflict is resolved. State 3: Same story. Rows 1 and 3 become (columns a, b, ⊥ and GOTO S):
1:  a: S/3    b: R/S→ε    ⊥: R/S→ε    S: 2
3:  a: S/3    b: R/S→ε    ⊥: R/S→ε    S: 5
All single entries. Grammar is SLR(1).

147 LR Parsing LALR(1) Grammars. Consider the grammar: S → AbAa | Ba; A → a; B → a.
(LR(0) automaton figure: 1 --A--> 3 --b--> 7 --A--> 9 --a--> 10 (S → AbAa.); 7 --a--> 11 (A → a.); 1 --B--> 4 --a--> 8 (S → Ba.); 1 --a--> 5, where state 5 contains both A → a. and B → a.)
Grammar is not LR(0): state 5 has a reduce-reduce conflict.

148 LR Parsing SLR(1) Analysis: (State 5) Follow(A) = {a, b} Follow(B) = {a} Conflict not resolved. Grammar is not SLR(1).

149 LR Parsing LALR(1) Technique: I. For each conflicting reduction A → ω at each inconsistent state q, find all nonterminal transitions (pᵢ, A) such that reading ω from pᵢ leads to q (each such pᵢ also has a transition on A). II. Compute Follow(pᵢ, A) (see below) for all i, and union the results together. The resulting set is the LALR(1) lookahead set for the A → ω reduction at q.

150 LR Parsing Computation of Follow(p, A): an ordinary Follow computation, except on a different grammar, called G’. G’ embodies both the structure of G and the structure of the LR(0) automaton. To build G’: for each nonterminal transition (p, A) and for each production A → w1 … wn, the LR(0) automaton contains a path p --w1--> p2 --w2--> … --wn--> q. For each such situation, G’ contains a production of the form: (p, A) → (p, w1)(p2, w2) … (pn, wn).

151 LR Parsing
In our example:
G:  S → AbAa | Ba
    A → a
    B → a
G': (1, S) → (1, A)(3, b)(7, A)(9, a)
           → (1, B)(4, a)
    (1, A) → (1, a)
    (7, A) → (7, a)
    (1, B) → (1, a)
Note that (1, A) and (7, A) have split!

152 LR Parsing
For the conflict in state 5, we need
  Follow(1, A) = {(3, b)}
  Follow(1, B) = {(4, a)}.
Extract the terminal symbols from these: the lookahead set for A → a is {b}, and for B → a it is {a}. Row 5 becomes

        a          b          ┴
  5   R/B → a    R/A → a

Conflict is resolved. Grammar is LALR(1).

153 LR Parsing
Example: S → bBb | aBa | acb
         B → A
         A → c
LR(0) automaton (key transitions):
  1 --b--> 3: S → b.Bb, B → .A, A → .c
  1 --a--> 4: S → a.Ba, S → a.cb, B → .A, A → .c
  3 --B--> 6, 6 --b--> (S → bBb.);  4 --B--> 9, 9 --a--> (S → aBa.)
  4 --c--> 10: S → ac.b, A → c.
  10 --b--> 13: S → acb.
State 10 is inconsistent (shift/reduce conflict). Grammar is not LR(0).

154 LR Parsing
SLR(1) Analysis, state 10: Follow(A) = Follow(B) = {a, b}. Since b ∈ Follow(A), the shift on b conflicts with the reduction A → c. Grammar is not SLR(1).
LALR(1) Analysis: Need Follow(4, A).
G': (1, S) → (1, b)(3, B)(6, b)
           → (1, a)(4, B)(9, a)
           → (1, a)(4, c)(10, b)
    (3, B) → (3, A)
    (4, B) → (4, A)
    (3, A) → (3, c)
    (4, A) → (4, c)
Thus Follow(4, A) = Follow(4, B) = {(9, a)}. The lookahead set is {a}, which does not contain the shift symbol b. The grammar is LALR(1).
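The Follow computation over G' can itself be done by the usual fixpoint, treating (state, symbol) pairs as the grammar symbols. A minimal sketch (my own code, not the author's), hard-coding the G' above:

```python
# Follow sets over the derived grammar G' of this example.
# Symbols of G' are (state, symbol) pairs; the nonterminals of G'
# are exactly the pairs that appear as a left-hand side.
GPRIME = [
    ((1, "S"), [(1, "b"), (3, "B"), (6, "b")]),
    ((1, "S"), [(1, "a"), (4, "B"), (9, "a")]),
    ((1, "S"), [(1, "a"), (4, "c"), (10, "b")]),
    ((3, "B"), [(3, "A")]),
    ((4, "B"), [(4, "A")]),
    ((3, "A"), [(3, "c")]),
    ((4, "A"), [(4, "c")]),
]

def gprime_follow():
    nts = {lhs for lhs, _ in GPRIME}
    follow = {A: set() for A in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GPRIME:
            for i, X in enumerate(rhs):
                if X not in nts:
                    continue
                if i + 1 < len(rhs):
                    # in this G' the symbol after a nonterminal is always
                    # a terminal pair, so First of the tail is trivial
                    new = {rhs[i + 1]}
                else:
                    new = follow[lhs]     # X at the end: add Follow(lhs)
                if not new <= follow[X]:
                    follow[X] |= new
                    changed = True
    return follow
```

Extracting the terminal components of Follow(4, A) = {(9, a)} yields the lookahead set {a}, disjoint from the shift symbol b, as claimed.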

155 LR Parsing
Example: S → aBb | aDa | bBa | bDb
         B → A
         A → a
         D → a
LR(0) automaton (key states):
  1 --a--> 3: S → a.Bb, S → a.Da, B → .A, A → .a, D → .a
  1 --b--> 4: S → b.Ba, S → b.Db, B → .A, A → .a, D → .a
  3 --B--> 5, 3 --D--> 7;  4 --B--> 9, 4 --D--> 10
  3 --a--> 8 and 4 --a--> 8, where state 8 = {A → a., D → a.}
State 8 is inconsistent (reduce/reduce conflict). Grammar is not LR(0).

156 LR Parsing
SLR(1) Analysis:
  Follow(A) = Follow(B) = {a, b}
  Follow(D) = {a, b}
Grammar is not SLR(1).
LALR(1) Analysis:
G': (1, S) → (1, a)(3, B)(5, b)
           → (1, a)(3, D)(7, a)
           → (1, b)(4, B)(9, a)
           → (1, b)(4, D)(10, b)
    (3, B) → (3, A)
    (4, B) → (4, A)
    (3, A) → (3, a)
    (4, A) → (4, a)
    (3, D) → (3, a)
    (4, D) → (4, a)
Need: Follow(3, A) ∪ Follow(4, A) = {a, b}
      Follow(3, D) ∪ Follow(4, D) = {a, b}
The lookahead sets are not disjoint. The grammar is not LALR(1).

157 LR Parsing
Solution: Modify the LR(0) automaton by splitting state 8 into two states.
LR(1) Parsers: Construction similar to LR(0). Difference: the lookahead symbol is carried explicitly, as part of each item, e.g. A → α.β : t.
  PT(G) = Closure({S' → .S : ┴}) ∪ {Closure(P) | P ∈ Successors(P'), P' ∈ PT(G)}

158 LR Parsing
  Closure(P) = P ∪ {A → .ω : t' | B → α.Aβ : t ∈ Closure(P), t' ∈ First(βt)}
  Successors(P) = {Nucleus(P, X) | X ∈ V}
  Nucleus(P, X) = {A → αX.β : t | A → α.Xβ : t ∈ P}
Notes:
  New lookahead symbols appear during Closure.
  Lookahead symbols are carried from state to state.
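These three operations translate almost directly into code. A minimal sketch (my own encoding, not the author's) for the running example, with an item represented as (lhs, rhs, dot, lookahead) and "#" standing in for ┴:

```python
# LR(1) Closure and Nucleus for the grammar
# S -> aBb | aDa | bBa | bDb,  B -> A,  A -> a,  D -> a.
GRAMMAR = {
    "S": [("a", "B", "b"), ("a", "D", "a"), ("b", "B", "a"), ("b", "D", "b")],
    "B": [("A",)],
    "A": [("a",)],
    "D": [("a",)],
}

def first(symbols):
    # First of a string; this grammar has no epsilon productions,
    # so only the leading symbol matters
    x = symbols[0]
    if x not in GRAMMAR:
        return {x}
    out = set()
    for rhs in GRAMMAR[x]:
        out |= first(rhs)
    return out

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, la) in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                B = rhs[dot]
                beta_t = rhs[dot + 1:] + (la,)
                for t in first(beta_t):          # new lookaheads appear here
                    for prod in GRAMMAR[B]:
                        item = (B, prod, 0, t)
                        if item not in items:
                            items.add(item)
                            changed = True
    return frozenset(items)

def nucleus(items, X):
    # advance the dot over X; lookaheads are carried along unchanged
    return {(l, r, d + 1, t) for (l, r, d, t) in items
            if d < len(r) and r[d] == X}
```

Starting from Closure({S' → .S : #}) and stepping on "a" then "a" again reproduces the split behavior of the next slide: A → a. gets lookahead b while D → a. gets lookahead a, so no conflict arises.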

159 LR Parsing
Example: S → aBb | aDa | bBa | bDb, B → A, A → a, D → a.
LR(1) states:
  1: S' → .S:┴, S → .aBb:┴, S → .aDa:┴, S → .bBa:┴, S → .bDb:┴   (S → 2, a → 3, b → 4)
  2: S' → S.:┴
  3: S → a.Bb:┴, S → a.Da:┴, B → .A:b, A → .a:b, D → .a:a   (B → 5, D → 6, A → 7, a → 8)
  4: S → b.Ba:┴, S → b.Db:┴, B → .A:a, A → .a:a, D → .a:b   (B → 9, D → 10, A → 11, a → 12)
  5: S → aB.b:┴   (b → 13)
  6: S → aD.a:┴   (a → 14)
  7: B → A.:b
  8: A → a.:b, D → a.:a
  9: S → bB.a:┴   (a → 15)
  10: S → bD.b:┴   (b → 16)
  11: B → A.:a
  12: A → a.:a, D → a.:b
  13: S → aBb.:┴   14: S → aDa.:┴   15: S → bBa.:┴   16: S → bDb.:┴

160 LR Parsing
No conflicts: in state 8 the reductions A → a and D → a have distinct lookaheads (b and a), and likewise in state 12 (a and b). The former LR(0) state 8 has split in two. Grammar is LR(1).

161 Summary of Parsing
Top-Down Parsing: hand-written or table-driven (LL(1)).
Picture: a derivation S ⇒* wβ. The input already parsed (w) corresponds to the known part of the tree; the stack contents (β) are the part of the tree left to predict, and must derive the remaining input.

162 Summary of Parsing
Two moves:
  Terminal on stack: match input.
  Nonterminal on stack: rewrite according to the LL(1) table.
(Diagram: driver with stack β, remaining input w, and the LL(1) table.)
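The two moves can be sketched as a small LL(1) driver. This is my own illustration, not the author's code, reusing the grammar S → aSb | ε from the earlier example (which is LL(1), since a ∉ Follow(S)) and writing "$" for the end marker.

```python
# LL(1) driver for S -> aSb | epsilon.
# TABLE maps (nonterminal, lookahead) to the production's right side.
TABLE = {("S", "a"): ["a", "S", "b"],
         ("S", "b"): [],            # S -> epsilon, since b is in Follow(S)
         ("S", "$"): []}

def ll1_parse(tokens):
    toks = list(tokens) + ["$"]
    stack = ["$", "S"]
    i = 0
    while stack:
        top = stack.pop()
        if top == "$":                       # bottom marker: input must be done
            return toks[i] == "$"
        if top in {"a", "b"}:                # terminal on stack: match input
            if toks[i] != top:
                return False
            i += 1
        else:                                # nonterminal: expand via the table
            rhs = TABLE.get((top, toks[i]))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))      # push rhs, leftmost symbol on top
    return False
```

For example, ll1_parse("aabb") succeeds while ll1_parse("aab") fails.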

163 Summary of Parsing
Bottom-Up Parsing: usually table-driven (shift-reduce parsing, e.g. LR(0), SLR(1), LALR(1), LR(1)).
Picture: a derivation S ⇒* αw. The stack contents (α) cover the input already parsed and are known; the part of the tree above them is unknown (left to predict), as is the remaining input w.

164 Summary of Parsing
Two moves:
  Shift: if ACTION(Top(S), Next_Token) = S/N, read the token and push N.
  Reduce: if ACTION(Top(S), Next_Token) = R/A → δ, Pop(|δ|) and Push(GOTO(Top(S), A)).
(Diagram: driver with state stack, remaining input, and the ACTION/GOTO tables.)
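These two moves can be sketched as a driver loop. This is my own illustration, not the author's code; the table is the resolved SLR(1) table for S → aSb | ε reconstructed from the earlier example, with "$" for ┴ and state 2 accepting directly on "$" as a simplification.

```python
# Shift-reduce driver for S -> aSb | epsilon using the SLR(1) table.
# ACTION maps (state, lookahead) to ("s", next_state), ("r", (A, rhs)),
# or ("acc", None); GOTO maps (state, nonterminal) to a state.
ACTION = {
    (1, "a"): ("s", 3), (1, "b"): ("r", ("S", "")), (1, "$"): ("r", ("S", "")),
    (2, "$"): ("acc", None),
    (3, "a"): ("s", 3), (3, "b"): ("r", ("S", "")), (3, "$"): ("r", ("S", "")),
    (5, "b"): ("s", 6),
    (6, "a"): ("r", ("S", "aSb")), (6, "b"): ("r", ("S", "aSb")),
    (6, "$"): ("r", ("S", "aSb")),
}
GOTO = {(1, "S"): 2, (3, "S"): 5}

def parse(tokens):
    stack = [1]                         # stack of states, start state on top
    toks = list(tokens) + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], toks[i]))
        if act is None:
            return False                # no table entry: syntax error
        kind, arg = act
        if kind == "acc":
            return True
        if kind == "s":                 # shift: read the token, push the state
            stack.append(arg)
            i += 1
        else:                           # reduce A -> rhs: pop |rhs|, then GOTO
            A, rhs = arg
            del stack[len(stack) - len(rhs):]
            stack.append(GOTO[(stack[-1], A)])
```

For example, parse("aabb") accepts while parse("aab") rejects, matching {aⁿbⁿ | n ≥ 0}.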

