Context-Free Languages Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Translators.

Context-Free Grammars Definition: A context-free grammar (CFG) is a quadruple G = (Ф, Σ, P, S), where all productions are of the form A → ω, where A ∈ Ф and ω ∈ (Ф ∪ Σ)*. Left-most derivation: At each step, the left-most nonterminal is re-written. Right-most derivation: At each step, the right-most nonterminal is re-written.

Derivation Trees Derivation trees: Describe re-writes, independently of the order (left-most or right-most). Each tree branch matches a production rule in the grammar.

Derivation Trees (cont’d) Notes: 1) Leaves are terminals. 2) The bottom contour is the sentence. 3) Left recursion causes left branching. 4) Right recursion causes right branching.

Goals of Parsing Examine input string, determine whether it's legal. Equivalent to building derivation tree. Added benefit: tree embodies syntactic structure of input. Therefore, tree should be unique.

Grammar Ambiguity Definition: A CFG is ambiguous if there exist two different right-most (or left-most, but not both) derivations for some sentence z. (Equivalent) Definition: A CFG is ambiguous if there exist two different derivation trees for some sentence z.

Ambiguous Grammars Classic ambiguities: – Simultaneous left/right recursion: E → E + E – Dangling else problem: S → if E then S → if E then S else S

Grammar Reduction What language does this grammar generate?
S → a        D → EDBC
A → BCDEF    E → CBA
B → ASDFA    F → S
C → DDCF
L(G) = {a} Problem: Many nonterminals (and productions) cannot be used in the generation of any sentence.

Grammar Reduction Definition: A CFG is reduced iff for all A ∈ Ф, a) S =>* αAβ, for some α, β ∈ V* (we say A is generable), and b) A =>* z, for some z ∈ Σ* (we say A is terminable). G is reduced iff every nonterminal A is both generable and terminable.

Grammar Reduction Example:
S → BB    A → aA
B → bB      → a
B is not terminable, since there is no z ∈ Σ* such that B =>* z. A is not generable, since there are no α, β ∈ V* such that S =>* αAβ.

Grammar Reduction To find out which nonterminals are generable: 1. Build the graph (Ф, δ), where (A, B) ∈ δ iff A → αBβ is a production. 2. Check that all nodes are reachable from S.

Grammar Reduction Example:
S → BB    A → aA
B → bB      → a
A is not reachable from S, so A is not generable. (Graph: edge S → B, with node A isolated.)

Grammar Reduction Algorithmically,
Generable := {S};
while Generable changes do
  for each A → αBβ do
    if A ∈ Generable then Generable := Generable ∪ {B}
od
{ Now, Generable contains the nonterminals that are generable. }

Grammar Reduction To find out which nonterminals are terminable: 1. Build the graph (2^Ф, δ), where (N, N ∪ {A}) ∈ δ iff A → X₁ … Xₙ is a production and, for all i, either Xᵢ ∈ Σ or Xᵢ ∈ N. 2. Check that the node Ф (the set of all nonterminals) is reachable from the node ø (the empty set).

Grammar Reduction Example:
S → BB    A → aA
B → bB      → a
The node Ф = {A, S, B} is not reachable from ø; only {A} is reachable from ø. Thus S and B are not terminable. (The graph's nodes are the subsets ø, {A}, {B}, {S}, {A,B}, {A,S}, {B,S}, {A,S,B}.)

Grammar Reduction Algorithmically,
Terminable := { };
while Terminable changes do
  for each A → X₁ … Xₙ do
    if every nonterminal among the X’s is in Terminable then Terminable := Terminable ∪ {A}
od
{ Now, Terminable contains the nonterminals that are terminable. }

Grammar Reduction Reducing a grammar: 1. Find all generable nonterminals. 2. Find all terminable nonterminals. 3. Remove any production A → X₁ … Xₙ if either a) A is not generable, or b) any Xᵢ is not terminable. 4. If the new grammar is not reduced, repeat the process.
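The two fixpoint computations and the reduction step can be sketched in Python (a sketch; the grammar representation, with uppercase letters as nonterminals, and the function names are my own):

```python
def generable(prods, start):
    # Fixpoint: B is generable if it appears in the right part of a
    # production whose left part is already generable.
    gen = {start}
    changed = True
    while changed:
        changed = False
        for a, rhs in prods:
            if a in gen:
                for x in rhs:
                    if x.isupper() and x not in gen:
                        gen.add(x); changed = True
    return gen

def terminable(prods):
    # Fixpoint: A is terminable if some production A -> X1..Xn has
    # every nonterminal among the X's already terminable.
    term = set()
    changed = True
    while changed:
        changed = False
        for a, rhs in prods:
            if a not in term and all(not x.isupper() or x in term for x in rhs):
                term.add(a); changed = True
    return term

def reduce_grammar(prods, start):
    # Drop productions whose left part is not generable or whose right
    # part mentions a non-terminable nonterminal; repeat to a fixpoint.
    while True:
        gen, term = generable(prods, start), terminable(prods)
        kept = [(a, rhs) for a, rhs in prods
                if a in gen and all(not x.isupper() or x in term for x in rhs)]
        if kept == prods:
            return prods
        prods = kept

# The slides' example: S -> BB, B -> bB, A -> aA | a
g = [("S", "BB"), ("B", "bB"), ("A", "aA"), ("A", "a")]
```

Here every production of the example is eventually removed: A is terminable but not generable, and B is generable but not terminable.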

Grammar Reduction Example:
E → E + T     F → not F
  → T         Q → P / Q
T → F * T     P → (E)
  → P           → i
Generable: {E, T, F, P}; not generable: {Q}. Terminable: {P, T, E}; not terminable: {F, Q}. So, eliminate every production for Q, and every production whose right part contains either F or Q.

Grammar Reduction New Grammar:
E → E + T
  → T
T → P
P → (E)
  → i
Generable: {E, T, P}. Terminable: {P, T, E}. Now the grammar is reduced.

Operator Precedence and Associativity Let’s build a CFG for expressions consisting of: elementary identifier i. + and - (binary ops) have lowest precedence, and are left associative. * and / (binary ops) have middle precedence, and are right associative. + and - (unary ops) have highest precedence, and are right associative.

Sample Grammar for Expressions
E → E + T       E consists of T's,
  → E - T       separated by -'s and +'s
  → T           (lowest precedence).
T → F * T       T consists of F's,
  → F / T       separated by *'s and /'s
  → F           (next precedence).
F → - F         F consists of a single P,
  → + F         preceded by +'s and -'s
  → P           (next precedence).
P → '(' E ')'   P consists of a parenthesized E,
  → i           or a single i (highest precedence).
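A recursive-descent sketch of this grammar makes the precedence and associativity claims testable; numbers stand in for the identifier i (my own substitution), and the result is a fully parenthesized string:

```python
# E handles +/- with a loop (left recursion => left associative),
# T handles */ / by recursing (right recursion => right associative),
# F handles unary +/-, P handles parentheses and operands.
def parse(src):
    toks = src.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    def peek(): return toks[pos] if pos < len(toks) else None
    def take():
        nonlocal pos; t = toks[pos]; pos += 1; return t
    def E():                      # E -> E + T | E - T | T
        left = T()
        while peek() in ("+", "-"):
            op = take(); left = f"({left}{op}{T()})"
        return left
    def T():                      # T -> F * T | F / T | F
        left = F()
        if peek() in ("*", "/"):
            op = take(); return f"({left}{op}{T()})"
        return left
    def F():                      # F -> -F | +F | P
        if peek() in ("+", "-"):
            op = take(); return f"({op}{F()})"
        return P()
    def P():                      # P -> ( E ) | operand
        if peek() == "(":
            take(); e = E(); take(); return e
        return take()
    return E()
```

For example, parse("1 - 2 - 3") groups to the left, while parse("8 / 4 / 2") groups to the right, exactly as the grammar's recursion predicts.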

Operator Precedence and Associativity (cont’d) Operator precedence: the lower in the grammar, the higher the precedence. Operator associativity: left recursion in the grammar means left associativity of the operator, and causes left branching in the tree; right recursion in the grammar means right associativity of the operator, and causes right branching in the tree.

Building Derivation Trees Sample Input : - + i - i * ( i + i ) / i + i (Human) derivation tree construction: Bottom-up. On each pass, scan entire expression, process operators with highest precedence (parentheses are highest). Lowest precedence operators are last, at the top of tree.

Operator Precedence and Associativity Exercise: Write a grammar for expressions consisting of: – elementary identifier ‘i’; – ‘&’, ‘¢’, ‘*’ with lowest precedence (left associative); – ‘%’, ‘#’ next (right associative); – ‘!’ with highest precedence (left associative); – parentheses, which override precedence and associativity.

Precedence and Associativity Grammar:
E0 → E0 & E1
   → E0 ¢ E1
   → E0 * E1
   → E1
E1 → E2 % E1
   → E2 # E1
   → E2
E2 → E2 ! E3
   → E3
E3 → (E0)
   → i

Operator Precedence and Associativity Example: Construct the derivation tree for: i & i # i ¢ ( i * i & i ! ) % ( i & i ) # i Easier to construct the tree from the leaves to the root. On each pass, scan the entire expression, and process first the operators with highest precedence. Leave operators with lowest precedence for last.

Derivation Tree

Transduction Grammars Definition: A transduction grammar (a.k.a. syntax-directed translation scheme) is like a CFG, except for the following generalization: Each production is a triple (A, β, ω) ∈ Ф × V* × V*, called a translation rule, denoted A → β => ω, where A is the left part, β is the right part, and ω is the translation part.

Sample Transduction Grammar Translation of infix to postfix expressions.
E → E + T => E T +
  → T     => T
T → P * T => P T *
  → P     => P
P → (E)   => E      (Note: ()’s discarded)
  → i     => i
The translation part describes how the output is generated, as the input is derived.
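The transduction above can be sketched as a recursive-descent translator (a sketch; the left recursion in E is handled with a loop, in the spirit of the later slides on replacing recursion with iteration):

```python
# Infix-to-postfix translator following the transduction grammar:
#   E -> E + T => E T +   |  T   => T
#   T -> P * T => P T *   |  P   => P
#   P -> (E)   => E       |  i   => i
def to_postfix(src):
    toks = list(src.replace(" ", ""))
    pos = 0
    def peek(): return toks[pos] if pos < len(toks) else None
    def take():
        nonlocal pos; t = toks[pos]; pos += 1; return t
    def E():                 # E T + : emit both operands, then the operator
        out = T()
        while peek() == "+":
            take(); out += T() + ["+"]
        return out
    def T():                 # P T * : right recursion, right associativity
        out = P()
        if peek() == "*":
            take(); out += T(); out.append("*")
        return out
    def P():                 # parentheses are discarded in the translation
        if peek() == "(":
            take(); out = E(); take(); return out
        return [take()]      # i => i
    return " ".join(E())
```

Running it on the slides' sentence i + i * i emits i i i * +, matching the derivation pair shown below.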

Sample Transduction Grammar We keep track of a pair (α, β), where α and β are the sentential forms of the input and the output.
( E, E )
=> ( E + T, E T + )
=> ( T + T, T T + )
=> ( P + T, P T + )
=> ( i + T, i T + )
=> ( i + P * T, i P T * + )
=> ( i + i * T, i i T * + )
=> ( i + i * i, i i i * + )

String to Tree Transduction Transduction to Abstract Syntax Trees. Notation: <t₁ … tₙ>N denotes a tree with root N and subtrees t₁ … tₙ. String-to-tree transduction grammar:
E → E + T => <E T>+
  → T     => T
T → P * T => <P T>*
  → P     => P
P → (E)   => E
  → i     => i

String to Tree Transduction Example:
(E, E)
=> (E + T, <E T>+)
=> (T + T, <T T>+)
=> (P + T, <P T>+)
=> (i + T, <i T>+)
=> (i + P * T, <i <P T>*>+)
=> (i + i * T, <i <i T>*>+)
=> (i + i * P, <i <i P>*>+)
=> (i + i * i, <i <i i>*>+)
The result is the AST for i + i * i.

String to Tree Transduction Definition: A transduction grammar is simple if for every rule A → α => β, the sequence of nonterminals appearing in α is identical to the sequence appearing in β. Example:
E → E + T => <E T>+
  → T     => T
T → P * T => <P T>*
  → P     => P
P → (E)   => E
  → i     => i

String to Tree Transduction For notational convenience, we dispense with both the nonterminals and the tree notation in the translation parts, leaving
E → E + T => +
  → T
T → P * T => *
  → P
P → (E)
  → i => i
Look familiar?

Abstract Syntax Trees The AST is a condensed version of the derivation tree: no noise (intermediate nodes). It is the result of a simple string-to-tree transduction grammar, with rules of the form A → ω => 's': build an 's' tree node, with one child per tree from each nonterminal in ω. We transduce from the vocabulary of input symbols (which appear in ω) to the vocabulary of tree-node names.

Sample AST Input: - + i - i * ( i + i ) / i + i (The DTG and the resulting AST are shown in the slide figure.)

The Game of Syntactic Dominoes The grammar:
E → E+T    T → P*T    P → (E)
  → T        → P        → i
The playing pieces: an arbitrary supply of each piece (one per grammar rule). The game board: start domino at the top; bottom dominoes are the "input".

Parsing: The Game of Syntactic Dominoes (cont’d) Game rules: –Add game pieces to the board. –Match the flat parts and the symbols. –Lines are infinitely elastic. Object of the game: –Connect start domino with the input dominoes. –Leave no unmatched flat parts.

Parsing Strategies Same as for the game of syntactic dominoes. – “Top-down” parsing: start at the start symbol, work toward the input string. – “Bottom-up” parsing: start at the input string, work toward the goal symbol. In either strategy, the input can be processed left-to-right or right-to-left.

Top-Down Parsing Attempt a left-most derivation, by predicting the re-write that will match the remaining input. Use a string (a stack, really) from which the input can be derived.

Top-Down Parsing Start with S on the stack. At every step, two alternatives: 1) α (the stack) begins with a terminal t: match t against the first input symbol. 2) α begins with a nonterminal A: consult an OPF (omniscient parsing function) to determine which production for A would lead to a match with the first symbol of the input. The OPF does the “predicting” in such a predictive parser.

Classical Top-Down Parsing Algorithm
Push (Stack, S);
while not Empty (Stack) do
  if Top(Stack) ∈ Σ then
    if Top(Stack) = Head(input) then
      input := tail(input);
      Pop(Stack)
    else error (Stack, input)
  else
    P := OPF (Stack, input);
    Push (Pop(Stack), RHS(P))
od

Top-Down Parsing (cont’d) Most parsing methods impose bounds on the amount of stack lookback and input lookahead. For programming languages, a common choice is (1,1). We must define OPF (A, t), where A is the top element of the stack, and t is the first symbol of the input. Storage requirements: O(n²), where n is the size of the grammar vocabulary (a few hundred).

Top-Down Parsing OPF (A, t) = A → ω if 1. ω =>* tα, for some α, or 2. ω =>* ε, and S =>* βAtγ, for some β, γ.

Top-Down Parsing Example (illustrating 1):
S → A      B → b
A → BAd    C → c
  → C
OPF    b         c        d
B    B → b     B → b    B → b
C    C → c     C → c    C → c
S    S → A     S → A    S → A
A    A → BAd   A → C    ???
OPF (A, b) = A → BAd because BAd =>* bAd. OPF (A, c) = A → C because C =>* c, i.e., B begins with b, and C begins with c. Tan entries are optional. So is the ??? entry.

Top-Down Parsing Example (illustrating 2): S → A; A → bAd | ε.
OPF    b          d        ┴
S    S → A      (empty)  S → A
A    A → bAd    A → ε    A → ε
OPF (S, b) = S → A, because A =>* bAd.
OPF (S, d) = empty, because there is no derivation S┴ =>* αSdβ.
OPF (S, ┴) = S → A, because S┴ is legal.
OPF (A, b) = A → bAd, because A =>* bAd.
OPF (A, d) = A → ε, because S =>* bAd (d can follow A).
OPF (A, ┴) = A → ε, because S┴ =>* A┴.

Top-Down Parsing Definition: First (A) = {t | A =>* tα, for some α}. Follow (A) = {t | S =>* αAtβ, for some α, β}. Computing First sets: 1. Build the graph (Ф, δ), where (A, B) ∈ δ if B → αAβ and α =>* ε (then First(A) ⊆ First(B)). 2. Attach to each node an empty set of terminals. 3. Add t to the set for A if A → αtβ and α =>* ε. 4. Propagate the elements of the sets along the edges of the graph.

Top-Down Parsing Example:
S → ABCD    A → CDA    C → A
B → BC        → a      D → AC
  → b         → ε
Nullable = {A, C, D}. (Graph over S, A, B, C, D with First sets attached; white sets after step 3, tan after step 4: First(S) = {a, b}, First(B) = {b}, First(A) = First(C) = First(D) = {a}.)

Top-Down Parsing Computing Follow Sets: 1. Build the graph (Ф, δ), where (A, B) ∈ δ if A → αBβ and β =>* ε. Then Follow(A) ⊆ Follow(B), because any symbol X that follows A also follows B.

Top-Down Parsing 2. Attach to each node an empty set of terminals. Add ┴ to the set for the start symbol. 3. Add First(X) to the set for A (i.e., to Follow(A)) if B → αAγXβ and γ =>* ε. 4. Propagate the elements of the sets along the edges of the graph.

Top-Down Parsing Example:
S → ABCD    A → CDA    C → A
B → BC        → a      D → AC
  → b         → ε
Nullable = {A, C, D}. First(S) = {a, b}, First(A) = {a}, First(B) = {b}, First(C) = {a}, First(D) = {a}. (Graph with Follow sets attached; white sets after step 3, tan after step 4.)

Top-Down Parsing So, Follow(S) = {} Follow(A) = Follow(C) = Follow(D) = {a, b, } Follow(B) = {a, }

Top-Down Parsing Back to parsing … We want OPF(A, t) = A → ω if either 1. t ∈ First(ω), i.e. ω =>* tβ, or 2. ω =>* ε and t ∈ Follow(A), i.e. S =>* αAtβ.

Top-Down Parsing Definition: Select (A → ω) = First(ω) ∪ (if ω =>* ε then Follow(A) else ø). So PT(A, t) = A → ω if t ∈ Select(A → ω). “Parse Table”, rather than OPF, because it isn’t omniscient.

Top-Down Parsing Example:
First (S) = {a, b}    Follow (S) = {┴}
First (A) = {a}       Follow (A) = {a, b, ┴}
First (B) = {b}       Follow (B) = {a, ┴}
First (C) = {a}       Follow (C) = {a, b, ┴}
First (D) = {a}       Follow (D) = {a, b, ┴}
Grammar        Select sets
S → ABCD       {a, b}
B → BC         {b}         (not disjoint)
  → b          {b}
A → CDA        {a, b, ┴}   (not pair-wise disjoint)
  → a          {a}
  → ε          {a, b, ┴}
C → A          {a, b, ┴}
D → AC         {a, b, ┴}
Grammar is not LL(1).
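The Nullable/First/Follow/Select computations above can be sketched as one fixpoint loop (a sketch; uppercase letters are nonterminals and '#' plays the end marker ┴):

```python
END = "#"

def ll1_sets(prods, start):
    nts = {a for a, _ in prods}
    nullable = set()
    first = {a: set() for a in nts}
    follow = {a: set() for a in nts}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, rhs in prods:
            # Nullable: every symbol of the right part is nullable.
            if a not in nullable and all(x in nullable for x in rhs):
                nullable.add(a); changed = True
            # First: scan the right part up to the first non-nullable symbol.
            for x in rhs:
                new = first[x] if x in nts else {x}
                if not new <= first[a]:
                    first[a] |= new; changed = True
                if x not in nullable:
                    break
            # Follow: what can come after each nonterminal occurrence.
            for i, x in enumerate(rhs):
                if x not in nts:
                    continue
                for y in rhs[i + 1:]:
                    new = first[y] if y in nts else {y}
                    if not new <= follow[x]:
                        follow[x] |= new; changed = True
                    if y not in nullable:
                        break
                else:  # everything after x is nullable: Follow(a) flows to x
                    if not follow[a] <= follow[x]:
                        follow[x] |= follow[a]; changed = True

    def select(a, rhs):
        s = set()
        for x in rhs:
            s |= first[x] if x in nts else {x}
            if x not in nullable:
                return s
        return s | follow[a]

    return nullable, first, follow, select

# The slides' example grammar ("" is the ε right part).
g = [("S", "ABCD"), ("A", "CDA"), ("A", "a"), ("A", ""),
     ("B", "BC"), ("B", "b"), ("C", "A"), ("D", "AC")]
nullable, first, follow, select = ll1_sets(g, "S")
```

The computed sets reproduce the slide: Select(A → CDA) and Select(A → ε) overlap, so the grammar is not LL(1).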

Top-Down Parsing
      a                        b                     ┴
S   S → ABCD               S → ABCD
A   A → CDA, A → a, A → ε  A → CDA, A → ε         A → CDA, A → ε
B                          B → BC, B → b
C   C → A                  C → A                  C → A
D   D → AC                 D → AC                 D → AC
Non-LL(1) grammar: multiple entries in PT.
Select sets: S → ABCD {a, b}; B → BC {b}; B → b {b}; A → CDA {a, b, ┴}; A → a {a}; A → ε {a, b, ┴}; C → A {a, b, ┴}; D → AC {a, b, ┴}.

LL(1) Grammars Definition: A CFG G is LL(1) (Left-to-right, Left-most, (1)-symbol lookahead) iff for all A ∈ Ф, and for all productions A → α, A → β with α ≠ β, Select (A → α) ∩ Select (A → β) = ø. Previous example: grammar is not LL(1). More later on what to do about it.

Sample LL(1) Grammar S → A{b,} A → bAd{b} → {d, } Disjoint! Grammar is LL(1) ! db  SS → A AA → A → bAdA → One production per entry.

Example Build the LL(1) parse table for the following grammar.
S → begin SL end   {begin}
  → id := E ;      {id}
SL → SL S          {begin, id} *
   → S             {begin, id} *
E → E + T          {(, id} *
  → T              {(, id} *
T → P * T          {(, id} *
  → P              {(, id} *
P → (E)            {(}
  → id             {id}
(* overlapping Select sets - not LL(1))

Example (cont’d) Lemma: Left recursion always produces a non-LL(1) grammar (e.g., SL, E above). Proof sketch: Consider A → Aα | β. Then Select(A → Aα) ⊇ First(A) ⊇ First(β), and Select(A → β) ⊇ First(β) (and both contain Follow(A) when the right parts are nullable), so the two Select sets cannot be disjoint.

Problems with our Grammar 1. SL is left recursive. 2. E is left recursive. 3. T → P * T and T → P both begin with the same sequence of symbols (P).

Solution to Problem 3 Change:
T → P * T   {(, id}
  → P       {(, id}
to:
T → P X     {(, id}
X → * T     {*}
  → ε       {+, ;, )} = Follow(X)
Follow(X) = Follow(T)   (due to T → P X)
          = Follow(E)   (due to E → E+T, E → T)
          = {+, ;, )}   (due to E → E+T, S → id := E ;, and P → (E))
Disjoint!

Solution to Problem 3 (cont’d) In general, change
A → αβ₁
  → αβ₂
  ...
  → αβₙ
to
A → αX
X → β₁
  ...
  → βₙ
Hopefully all the β’s begin with different symbols.

Solution to Problems 1 and 2 We want (…(((T + T) + T) + T)…). Instead, we parse (T) (+T) (+T) … (+T). Change:
E → E + T   {(, id}
  → T       {(, id}
to:
E → T Y     {(, id}
Y → + T Y   {+}
  → ε       {;, )} = Follow(Y)
Follow(Y) = Follow(E) = {;, )}. It no longer contains ‘+’, because we eliminated the production E → E + T.

Solution to Problems 1 and 2 (cont’d) In general, change:
A → Aα₁      A → β₁
  ...          ...
  → Aαₙ        → βₘ
to:
A → β₁ X     X → α₁ X
  ...          ...
  → βₘ X       → αₙ X
               → ε

Solution to Problems 1 and 2 (cont’d) In our example, change:
SL → SL S   {begin, id}
   → S      {begin, id}
to:
SL → S Z    {begin, id}
Z → S Z     {begin, id}
  → ε       {end}

Modified Grammar
S → begin SL end  {begin}
  → id := E ;     {id}
SL → S Z          {begin, id}
Z → S Z           {begin, id}
  → ε             {end}
E → T Y           {(, id}
Y → + T Y         {+}
  → ε             {;, )}
T → P X           {(, id}
X → * T           {*}
  → ε             {;, +, )}
P → (E)           {(}
  → id            {id}
Disjoint. Grammar is LL(1).

Recursive Descent Parsing Top-down parsing strategy, suitable for LL(1) grammars. One procedure per nonterminal. Contents of stack embedded in recursive call sequence. Each procedure “commits” to one production, based on the next input symbol, and the select sets. Good technique for hand-written parsers.

Sample Recursive Descent Parser
proc S; {S → begin SL end → id := E; }
  case Next_Token of
    T_begin: Read(T_begin); SL; Read(T_end);
    T_id:    Read(T_id); Read(T_:=); E; Read(T_;);
    otherwise Error;
  end
end;
“Read(T_X)” verifies that the upcoming token is X, and consumes it. “Next_Token” is the upcoming token.

Sample Recursive Descent Parser
proc SL; {SL → SZ}
  S; Z;
end;
proc E; {E → TY}
  T; Y;
end;
Technically, we should have insisted that Next_Token be T_begin or T_id (and, for E, T_( or T_id), but S (or T) will do that anyway. Checking early would aid error recovery.

Sample Recursive Descent Parser
proc Z; {Z → SZ → }
  case Next_Token of
    T_begin, T_id: S; Z;
    T_end: ;
    otherwise Error;
  end
end;

Sample Recursive Descent Parser
proc Y; {Y → +TY → }
  if Next_Token = T_+ then
    Read(T_+); T; Y;
end;                          (Could have used a case statement.)
proc T; {T → PX}
  P; X;
end;                          (Could have checked for T_( and T_id.)

Sample Recursive Descent Parser
proc X; {X → *T → }
  if Next_Token = T_* then
    Read(T_*); T;
end;

Sample Recursive Descent Parser
proc P; {P → (E) → id }
  case Next_Token of
    T_(:  Read(T_(); E; Read(T_));
    T_id: Read(T_id);
    otherwise Error;
  end
end;
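The proc S/SL/Z/E/Y/T/X/P sketches above translate directly into a runnable recognizer (a sketch; the whitespace tokenizer and the '$' end marker are my own):

```python
# One method per nonterminal; the recursion replaces the explicit stack.
class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ["$"]
        self.pos = 0

    def next(self):
        return self.toks[self.pos]

    def read(self, t):                 # "Read(T_X)": verify and consume
        if self.next() != t:
            raise SyntaxError(f"expected {t}, got {self.next()}")
        self.pos += 1

    def S(self):                       # S -> begin SL end | id := E ;
        if self.next() == "begin":
            self.read("begin"); self.SL(); self.read("end")
        elif self.next() == "id":
            self.read("id"); self.read(":="); self.E(); self.read(";")
        else:
            raise SyntaxError("expected begin or id")

    def SL(self):                      # SL -> S Z
        self.S(); self.Z()

    def Z(self):                       # Z -> S Z | ε
        if self.next() in ("begin", "id"):
            self.S(); self.Z()

    def E(self):                       # E -> T Y
        self.T(); self.Y()

    def Y(self):                       # Y -> + T Y | ε
        if self.next() == "+":
            self.read("+"); self.T(); self.Y()

    def T(self):                       # T -> P X
        self.P(); self.X()

    def X(self):                       # X -> * T | ε
        if self.next() == "*":
            self.read("*"); self.T()

    def P(self):                       # P -> ( E ) | id
        if self.next() == "(":
            self.read("("); self.E(); self.read(")")
        else:
            self.read("id")

def accepts(src):
    p = Parser(src.split())
    try:
        p.S()
        return p.next() == "$"
    except SyntaxError:
        return False
```

For instance, accepts("begin id := ( id + id ) * id ; end") holds, while the same input without the closing end is rejected.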

String-To-Tree Transduction Can obtain derivation or abstract syntax tree. Tree can be generated top-down, or bottom-up. We will show how to obtain 1.Derivation tree top-down 2.AST for the original grammar, bottom-up.

TD Generation of Derivation Tree In each procedure, and for each alternative, write out the appropriate production AS SOON AS IT IS KNOWN

TD Generation of Derivation Tree
proc S; {S → begin SL end → id := E; }
  case Next_Token of
    T_begin: Write(S → begin SL end);
             Read(T_begin); SL; Read(T_end);

TD Generation of Derivation Tree
    T_id:    Write(S → id := E ;);
             Read(T_id); Read(T_:=); E; Read(T_;);
    otherwise Error;
  end
end;

TD Generation of Derivation Tree
proc SL; {SL → SZ}
  Write(SL → SZ); S; Z;
end;
proc E; {E → TY}
  Write(E → TY); T; Y;
end;

TD Generation of Derivation Tree
proc Z; {Z → SZ → }
  case Next_Token of
    T_begin, T_id: Write(Z → SZ); S; Z;
    T_end: Write(Z → );
    otherwise Error;
  end
end;

TD Generation of Derivation Tree
proc Y; {Y → +TY → }
  if Next_Token = T_+ then
    Write(Y → +TY); Read(T_+); T; Y;
  else Write(Y → );
end;

TD Generation of Derivation Tree
proc T; {T → PX}
  Write(T → PX); P; X;
end;
proc X; {X → *T → }

TD Generation of Derivation Tree
  if Next_Token = T_* then
    Write(X → *T); Read(T_*); T;
  else Write(X → );
end;

TD Generation of Derivation Tree
proc P; {P → (E) → id }
  case Next_Token of
    T_(:  Write(P → (E)); Read(T_(); E; Read(T_));
    T_id: Write(P → id); Read(T_id);
    otherwise Error;
  end
end;

Notes The placement of the Write statements is obvious precisely because the grammar is LL(1). Can build the tree “as we go”, or have it built by a post-processor.

Example Input String: begin id := (id + id) * id; end Output: S → begin SL end SL → SZ S → id :=E; E → TY T → PX P → (E) E → TY T → PX P → id X → Y → +TY T → PX P → id X → Y → X → *T T → PX P → id X → Y → Z →

Bottom-up Generation of the Derivation Tree We could have placed the Write statements at the END of each phrase, instead of the beginning. If we do, the tree will be generated bottom-up. In each procedure, and for each alternative, write out the production A → ω AFTER ω is parsed.

BU Generation of the Derivation Tree
proc S; {S → begin SL end → id := E; }
  case Next_Token of
    T_begin: Read(T_begin); SL; Read(T_end); Write(S → begin SL end);
    T_id:    Read(T_id); Read(T_:=); E; Read(T_;); Write(S → id := E ;);
    otherwise Error;
  end
end;

BU Generation of the Derivation Tree
proc SL; {SL → SZ}
  S; Z; Write(SL → SZ);
end;
proc E; {E → TY}
  T; Y; Write(E → TY);
end;

BU Generation of the Derivation Tree
proc Z; {Z → SZ → }
  case Next_Token of
    T_begin, T_id: S; Z; Write(Z → SZ);
    T_end: Write(Z → );
    otherwise Error;
  end
end;

BU Generation of the Derivation Tree
proc Y; {Y → +TY → }
  if Next_Token = T_+ then
    Read(T_+); T; Y; Write(Y → +TY);
  else Write(Y → );
end;

BU Generation of the Derivation Tree
proc T; {T → PX}
  P; X; Write(T → PX);
end;
proc X; {X → *T → }
  if Next_Token = T_* then
    Read(T_*); T; Write(X → *T);
  else Write(X → );
end;

BU Generation of the Derivation Tree
proc P; {P → (E) → id }
  case Next_Token of
    T_(:  Read(T_(); E; Read(T_)); Write(P → (E));
    T_id: Read(T_id); Write(P → id);
    otherwise Error;
  end
end;

Notes The placement of the Write statements is still obvious. The productions are emitted as the procedures quit, not as they start. Productions are emitted in reverse order, i.e., the sequence of productions must be used in reverse to obtain a right-most derivation. Again, we can build the tree “as we go” (this requires a stack of trees), or later.

Example Input String : begin id := (id + id) * id; end Output: P → id X → T → PX P → id X → T → PX Y → Y → +TY E → TY P → (E) P → id X → T → PX X → *T T → PX Y → E → TY S → id:=E; Z → SL → SZ S → begin SL end

Replacing Recursion with Iteration Not all the nonterminals are needed. The recursion in SL, X, Y and Z can be replaced with iteration.

Replacing Recursion with Iteration
proc S; {S → begin SL end → id := E ;
         SL → S Z, Z → S Z → }
  case Next_Token of
    T_begin: Read(T_begin);
             repeat S; until Next_Token ∉ {T_begin, T_id};
                 (replaces the call to SL and the recursion on Z)
             Read(T_end);
    T_id:    Read(T_id); Read(T_:=); E; Read(T_;);
    otherwise Error;
  end
end;

Replacing Recursion with Iteration
proc E; {E → TY, Y → +TY → }
  T;
  while Next_Token = T_+ do
    Read(T_+); T;
  od                          (replaces the recursion on Y)
end;

Replacing Recursion with Iteration
proc T; {T → PX, X → *T → }
  P;
  if Next_Token = T_* then Read(T_*); T;   (replaces the call to X)
end;

Replacing Recursion with Iteration
proc P; {P → (E) → id }
  case Next_Token of
    T_(:  Read(T_(); E; Read(T_));
    T_id: Read(T_id);
    otherwise Error;
  end
end;

Construction of Derivation Tree for the Original Grammar (Bottom Up)
proc S; { (1) S → begin SL end    (2) S → begin SL end
              → id := E ;             → id := E ;
          SL → SZ                 SL → SL S
          Z → SZ                     → S
            → }
  case Next_Token of
    T_begin: Read(T_begin);
             S; Write(SL → S);
             while Next_Token in {T_begin, T_id} do
               S; Write(SL → SL S);
             od
             Read(T_end); Write(S → begin SL end);
    T_id:    Read(T_id); Read(T_:=); E; Read(T_;); Write(S → id := E ;);
    otherwise Error;
  end
end;

Construction of Derivation Tree for the Original Grammar (Bottom Up)
proc E; { (1) E → TY    (2) E → E+T
          Y → +TY          → T
            → }
  T; Write(E → T);
  while Next_Token = T_+ do
    Read(T_+); T; Write(E → E+T);
  od
end;

Construction of Derivation Tree for the Original Grammar (Bottom Up)
proc T; { (1) T → PX    (2) T → P*T
          X → *T           → P
            → }
  P;
  if Next_Token = T_* then
    Read(T_*); T; Write(T → P*T);
  else Write(T → P);
end;

Construction of Derivation Tree for the Original Grammar (Bottom Up)
proc P; { (1) P → (E)    (2) P → (E)
            → id            → id }
  // SAME AS BEFORE
end;

Example Input String : begin id := (id + id) * id; end Output : P → id T → P E → T P → id T → P E → E+T P → (E) P → id T → P T → P*T E → T S → id:=E; SL → S S → begin SL end

Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar
proc S; { S → begin S+ end => 'block'
            → id := E ;    => 'assign' }
  var N: integer;
  case Next_Token of
    T_begin: Read(T_begin);
             S; N := 1;
             while Next_Token in {T_begin, T_id} do
               S; N := N + 1;
             od
             Read(T_end); BuildTree('block', N);
    T_id:    Read(T_id);        (assume this builds a leaf node)
             Read(T_:=); E; Read(T_;); BuildTree('assign', 2);
    otherwise Error;
  end
end;
BuildTree('x', n) pops n trees from the stack, builds an 'x' node as their parent, and pushes the resulting tree.

Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar
proc E; { E → E+T => '+'
            → T }
  T;
  while Next_Token = T_+ do
    Read(T_+); T; BuildTree('+', 2);
  od
end;
Left branching in tree!

Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar
proc T; { T → P*T => '*'
            → P }
  P;
  if Next_Token = T_* then
    Read(T_*); T; BuildTree('*', 2);
end;
Right branching in tree!

Generating the Abstract Syntax Tree, Bottom Up, for the Original Grammar proc P;{P → (E) → id } // SAME AS BEFORE, // i.e.,no trees built end;

Example Input String: begin id₁ := (id₂ + id₃) * id₄ ; end Sequence of events: id₁ id₂ id₃ BT('+',2) id₄ BT('*',2) BT('assign',2) BT('block',1)
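The BuildTree scheme above can be sketched with an explicit tree stack (a sketch; tuples model tree nodes, and the tokenizer is my own):

```python
# Bottom-up AST construction: parse procedures push leaves and call
# build(name, n), which pops n subtrees and pushes their new parent.
class Builder:
    def __init__(self, tokens):
        self.toks = tokens + ["$"]
        self.pos = 0
        self.stack = []                   # stack of subtrees

    def next(self): return self.toks[self.pos]

    def read(self, t):
        assert self.next() == t, f"expected {t}"
        self.pos += 1

    def build(self, name, n):             # BuildTree('x', n)
        kids = self.stack[-n:]
        del self.stack[-n:]
        self.stack.append((name, *kids))

    def S(self):                          # S -> begin S+ end | id := E ;
        if self.next() == "begin":
            self.read("begin")
            self.S(); n = 1
            while self.next() in ("begin", "id"):
                self.S(); n += 1
            self.read("end"); self.build("block", n)
        else:
            self.read("id"); self.stack.append("id")   # leaf for the target
            self.read(":="); self.E(); self.read(";")
            self.build("assign", 2)

    def E(self):                          # E -> E+T | T : left branching
        self.T()
        while self.next() == "+":
            self.read("+"); self.T(); self.build("+", 2)

    def T(self):                          # T -> P*T | P : right branching
        self.P()
        if self.next() == "*":
            self.read("*"); self.T(); self.build("*", 2)

    def P(self):                          # P -> (E) | id
        if self.next() == "(":
            self.read("("); self.E(); self.read(")")
        else:
            self.read("id"); self.stack.append("id")   # leaf

def ast(src):
    b = Builder(src.split()); b.S(); return b.stack[0]
```

On the slide's input the events occur in the order id₁ id₂ id₃ BT('+',2) id₄ BT('*',2) BT('assign',2) BT('block',1), leaving a single 'block' tree on the stack.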

Summary Bottom-up or top-down tree construction. Original or modified grammar. Derivation Tree or Abstract Syntax Tree. Technique of choice: –Top-down, recursive descent parser. –Bottom-up tree construction for the original grammar.

LR Parsing Procedures in the recursive descent code can be “annotated” with items, i.e. productions with a “dot” marker somewhere in the right-part. We can use the items to describe the operation of the recursive descent parser. There is an FSA that describes all possible calling sequences in the R.D. parser.

Recursive Descent Parser with Items Example:
proc E; {E → .E + T, E → .T}
  T;                            {E → E. + T, E → T.}
  while Next_Token = T_+ do     {E → E. + T}
    Read(T_+);                  {E → E + .T}
    T;                          {E → E + T.}
  od                            {E → E + T., E → T.}
end;

FSA Connecting Items The FSA is: M = (DP, V, , S’ →.S, { S’ → S.}) where DP is the set of all possible items (DP: dotted productions), and  is defined such that simulate a call to B simulate the execution of statement X, if X is a nonterminal, or Read(X), if X is a terminal. 1 A → α. B β B →. ω 2 A → α. X β A → X.β X 

FSA Connecting Items Example: S → E┴; E → E + T | T; T → i | (E). The items S → .E┴, S → E.┴, S → E┴., E → .E+T, E → E.+T, E → E+.T, E → E+T., E → .T, E → T., T → .i, T → i., T → .(E), T → (.E), T → (E.), T → (E). are connected by ε-edges (from each A → α.Bβ to each B → .ω) and by symbol edges that advance the dot over E, T, +, i, (, ), and ┴.

FSA Connecting Items Need to run this machine with the aid of a stack, i.e. need to keep track of the recursive calling sequence. To “return” from A → ω., back up |ω| + 1 states, then advance on A. Problem with this machine: it is nondeterministic. No problem. Be happy. Transform it to a DFA !

Deterministic FSA Connecting Items THIS IS AN LR(0) AUTOMATON. Its states are sets of items:
1: S → .E┴, E → .E+T, E → .T, T → .i, T → .(E)
2: S → E.┴, E → E.+T
3: E → T.
4: T → i.
5: T → (.E), E → .E+T, E → .T, T → .i, T → .(E)
6: S → E┴.
7: E → E+.T, T → .i, T → .(E)
8: T → (E.), E → E.+T
9: E → E+T.
10: T → (E).
Transitions: 1-E->2, 1-T->3, 1-i->4, 1-(->5, 2-┴->6, 2-+->7, 5-E->8, 5-T->3, 5-i->4, 5-(->5, 7-T->9, 7-i->4, 7-(->5, 8-)->10, 8-+->7.

LR Parsing LR means “Left-to-Right, Right-most Derivation”. Need a stack of states to operate the parser. No look-ahead required, thus LR(0). DFA describes all possible positions in the R.D. parser’s code. Once the automaton is built, items can be discarded.

LR Parsing Operation of an LR parser Two moves: shift and reduce. Shift: Advance from current state on Next_Token, push new state on stack. Reduce: (on A → ω). Pop |ω| states from stack. Advance from new top state on A.

LR Parsing

Parse of i + (i + i)┴, using the states of the LR(0) automaton:

Stack            Input           Move
1                i + (i + i)┴    shift 4
1 4              + (i + i)┴      reduce T → i
1 3              + (i + i)┴      reduce E → T
1 2              + (i + i)┴      shift 7
1 2 7            (i + i)┴        shift 5
1 2 7 5          i + i)┴         shift 4
1 2 7 5 4        + i)┴           reduce T → i
1 2 7 5 3        + i)┴           reduce E → T
1 2 7 5 8        + i)┴           shift 7
1 2 7 5 8 7      i)┴             shift 4
1 2 7 5 8 7 4    )┴              reduce T → i
1 2 7 5 8 7 9    )┴              reduce E → E + T
1 2 7 5 8        )┴              shift 10
1 2 7 5 8 10     ┴               reduce T → (E)
1 2 7 9          ┴               reduce E → E + T
1 2              ┴               shift 6
1 2 6                            Accept

[The derivation tree for i + (i + i)┴ is built bottom-up as the reductions occur.]

LR Parsing

Table Representation of LR Parsers. Two tables:
ACTION table: indexed by state and by terminal symbol; contains all shift and reduce moves.
GOTO table: indexed by state and by nonterminal symbol; contains all transitions on nonterminal symbols.

LR Parsing

Example, for S → E┴, E → E + T | T, T → i | (E):

                  ACTION                    GOTO
State     i     +     (     )     ┴        E    T
  1      S/4         S/5                   2    3
  2            S/7               S/6
  3      R/E→T   (all columns)
  4      R/T→i   (all columns)
  5      S/4         S/5                   8    3
  6      Accept
  7      S/4         S/5                        9
  8            S/7        S/10
  9      R/E→E+T (all columns)
 10      R/T→(E) (all columns)

LR Parsing

Algorithm LR_Driver:
    Push(Start_State, S);
    while ACTION(Top(S), Next_Token) ≠ Accept do
        case ACTION(Top(S), Next_Token) of
            S/r:      Read(Next_Token); Push(r, S);
            R/A → ω:  Pop(S) |ω| times; Push(GOTO(Top(S), A), S);
            empty:    Error;
        end
    end;
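The driver loop above can be sketched in Python with the ACTION/GOTO tables built earlier for S → E┴, E → E + T | T, T → i | (E). The dict encoding is an assumption ('#' stands for ┴); reduce rows apply in every column, and state 6 accepts:

```python
# Shift entries of ACTION: (state, terminal) -> new state.
SHIFT = {
    (1, 'i'): 4, (1, '('): 5,
    (2, '+'): 7, (2, '#'): 6,
    (5, 'i'): 4, (5, '('): 5,
    (7, 'i'): 4, (7, '('): 5,
    (8, '+'): 7, (8, ')'): 10,
}
# Reduce states: state -> (A, |omega|) for the reduction A -> omega.
REDUCE = {3: ('E', 1), 4: ('T', 1), 9: ('E', 3), 10: ('T', 3)}
GOTO = {(1, 'E'): 2, (1, 'T'): 3, (5, 'E'): 8, (5, 'T'): 3, (7, 'T'): 9}

def lr_parse(s):
    stack, i = [1], 0                 # stack of states; i indexes the input
    while True:
        state = stack[-1]
        if state == 6:
            return True               # Accept
        if state in REDUCE:           # Reduce: pop |omega| states, goto on A
            lhs, n = REDUCE[state]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            tok = s[i] if i < len(s) else None
            if (state, tok) in SHIFT:  # Shift: consume token, push new state
                stack.append(SHIFT[(state, tok)])
                i += 1
            else:
                return False          # Error: empty ACTION entry
```

Running lr_parse("i+(i+i)#") reproduces the stack trace shown in the earlier slide.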

LR Parsing

Direct Construction of the LR(0) Automaton

PT(G) = Closure({S’ → .S┴}) ∪ { Closure(P) | P ∈ Successors(P’), P’ ∈ PT(G) }
Closure(P) = P ∪ { A → .ω | B → α.Aβ ∈ Closure(P), A → ω ∈ G }
Successors(P) = { Nucleus(P, X) | X ∈ V }
Nucleus(P, X) = { A → αX.β | A → α.Xβ ∈ P }
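These four equations translate almost directly into code. A sketch in Python for the running grammar, where an item is a (lhs, rhs, dot) triple and a state is a frozenset of items (the data encoding is an assumption):

```python
# Grammar S -> E#, E -> E + T | T, T -> i | (E); '#' stands for the
# end marker. Nonterminals are the dict keys; everything else is terminal.
GRAMMAR = {
    'S': [('E', '#')],
    'E': [('E', '+', 'T'), ('T',)],
    'T': [('i',), ('(', 'E', ')')],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:  # dot before nonterminal
                for prod in GRAMMAR[rhs[dot]]:
                    if (rhs[dot], prod, 0) not in items:
                        items.add((rhs[dot], prod, 0))
                        changed = True
    return frozenset(items)

def nucleus(state, X):
    # Advance the dot over X in every item that can read X.
    return {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == X}

def pt():
    # PT(G): start from Closure({S -> .E#}) and close under successors.
    start = closure({('S', GRAMMAR['S'][0], 0)})
    states, work = {start}, [start]
    while work:
        p = work.pop()
        for X in {r[d] for _, r, d in p if d < len(r)}:
            q = closure(nucleus(p, X))
            if q not in states:
                states.add(q)
                work.append(q)
    return states
```

For this grammar the construction yields exactly the ten states of the table above; for instance, the singleton state {E → T.} is state 3.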

LR Parsing

Direct Construction of the Previous Automaton

[The construction yields the same ten states as before, starting from Closure({S → .E┴}) = {S → .E┴, E → .E + T, E → .T, T → .i, T → .(E)} and following the transitions on E, T, i, (, ), + and ┴.]

LR Parsing

Notes: Two states are "equal" if their Nuclei are identical. This grammar is LR(0) because there are no "conflicts". A conflict occurs when a state contains:
i – both a final (dot-at-the-end) item and a non-final one (shift-reduce conflict), or
ii – two or more final items (reduce-reduce conflict).

LR Parsing

Example: S → E┴, E → E + T | T, T → P * T | P, P → i | (E).

[LR(0) automaton: the state reached on P contains both the non-final item T → P. * T and the final item T → P., a shift-reduce conflict.]

Grammar is not LR(0).

LR Parsing

Solution: Use lookahead! In LL(1), lookahead is used at the beginning of the production. In LR(1), lookahead is used at the end of the production. We will use:
SLR(1) – Simple LR(1)
LALR(1) – Lookahead LR(1)

LR Parsing

The conflict appears in the ACTION table, as multiple entries:

                      ACTION
State      +         *            i     (     )        ┴
  1                              S/5   S/6
  2       S/8                                         S/7
  3       R/E→T   (all columns)
  4       R/T→P    S/9, R/T→P               R/T→P    R/T→P
  5       R/P→i   (all columns)
  6                              S/5   S/6
  7       Accept
  8                              S/5   S/6
  9                              S/5   S/6
 10       S/8                               S/13
 11       R/E→E+T (all columns)
 12       R/T→P*T (all columns)
 13       R/P→(E) (all columns)

LR Parsing

SLR(1): For each "inconsistent" state p, compute Follow(A) for each conflict production A → ω. Then place "R/A → ω" in the ACTION table, row p, column t, only if t ∈ Follow(A). In our case, Follow(T) = Follow(E) = {+, ), ┴}. So row 4 becomes:

State      +        *      i    (    )        ┴
  4       R/T→P    S/9              R/T→P    R/T→P

Grammar is SLR(1).
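The Follow sets used here can be computed by the usual fixpoint iteration. A sketch in Python for this grammar (the encoding is an assumption; '#' stands for ┴). The code exploits a property of this particular grammar: every symbol appearing right after a nonterminal is a terminal, so no separate First computation is needed:

```python
# Grammar S -> E#, E -> E + T | T, T -> P * T | P, P -> i | (E).
PRODS = [
    ('S', ['E', '#']),
    ('E', ['E', '+', 'T']), ('E', ['T']),
    ('T', ['P', '*', 'T']), ('T', ['P']),
    ('P', ['i']), ('P', ['(', 'E', ')']),
]
NONTERMS = {'S', 'E', 'T', 'P'}

def follow_sets():
    follow = {A: set() for A in NONTERMS}
    changed = True
    while changed:                       # iterate to a fixpoint
        changed = False
        for lhs, rhs in PRODS:
            for k, X in enumerate(rhs):
                if X not in NONTERMS:
                    continue
                if k + 1 < len(rhs):
                    # In this grammar the symbol after a nonterminal is
                    # always a terminal, so First of it is just itself.
                    new = {rhs[k + 1]}
                else:
                    new = follow[lhs]    # X at the end: add Follow(lhs)
                if not new <= follow[X]:
                    follow[X] |= new
                    changed = True
    return follow
```

This reproduces Follow(T) = Follow(E) = {+, ), ┴}, and also Follow(P) = {*, +, ), ┴}.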

LR Parsing

Example: S → aSb | ε, generating {aⁿbⁿ | n ≥ 0}.

[LR(0) automaton: state 1 = {S’ → .S┴, S → .aSb, S → .}; on a, state 3 = {S → a.Sb, S → .aSb, S → .}, with a loop on a; state 5 = {S → aS.b}; state 6 = {S → aSb.}; state 2 = {S’ → S.┴}; state 4 accepts.]

             a            b        ┴        S
  1       S/3, R/S→ε    R/S→ε    R/S→ε     2
  2                               S/4
  3       S/3, R/S→ε    R/S→ε    R/S→ε     5
  4       Accept        Accept   Accept
  5                     S/6
  6       R/S→aSb  (all columns)

Grammar is not LR(0).

LR Parsing

SLR(1) Analysis: State 1: Follow(S) = {b, ┴}. Since a ∉ Follow(S), the shift/reduce conflict is resolved. State 3: same story. Rows 1 and 3 become:

             a      b        ┴        S
  1         S/3    R/S→ε    R/S→ε    2
  3         S/3    R/S→ε    R/S→ε    5

All single entries. Grammar is SLR(1).

LR Parsing

LALR(1) Grammars

Consider the grammar: S → AbAa | Ba, A → a, B → a.

[LR(0) automaton: from state 1, transitions on S (to 2), A (to 3), B (to 4) and a (to 5); from 3, b leads to 7, then A to 9 and a to 11; from 4, a completes S → Ba. State 5 = {A → a., B → a.}.]

Grammar is not LR(0): reduce-reduce conflict in state 5.

LR Parsing

SLR(1) Analysis (State 5):
Follow(A) = {a, b}
Follow(B) = {a}
The sets overlap on a, so the conflict is not resolved. Grammar is not SLR(1).

LR Parsing

LALR(1) Technique:
I. For each conflicting reduction A → ω at each inconsistent state q, find all nonterminal transitions (pᵢ, A) such that reading ω from pᵢ leads to q, i.e. each pᵢ is a state from which the ω-path reaches q, and pᵢ has a transition on A.
II. Compute Follow(pᵢ, A) (see below) for all i, and union the results together. The resulting set is the LALR(1) lookahead set for the A → ω reduction at q.

LR Parsing

Computation of Follow(p, A): an ordinary Follow computation, except on a different grammar, called G’. G’ embodies both the structure of G and the structure of the LR(0) automaton.

To build G’: for each nonterminal transition (p, A) and for each production A → w₁w₂…wₙ, the LR(0) automaton contains a path that reads w₁w₂…wₙ starting at p, passing through states p = p₁, p₂, …, pₙ, and p has a transition on A. For each such situation, G’ contains a production of the form:

(p, A) → (p₁, w₁)(p₂, w₂)…(pₙ, wₙ)

LR Parsing

In our example:

G:  S → AbAa | Ba
    A → a
    B → a

G’: (1, S) → (1, A)(3, b)(7, A)(9, a)
           → (1, B)(4, a)
    (1, A) → (1, a)
    (7, A) → (7, a)
    (1, B) → (1, a)

Note that A's productions have split: (1, A) and (7, A) are distinct nonterminals of G’.

LR Parsing

For the conflict in state 5, we need Follow(1, A) = {(3, b)} and Follow(1, B) = {(4, a)}. Extract the terminal symbols from these: A → a gets lookahead {b}, B → a gets lookahead {a}. Row 5 becomes:

             a        b        ┴
  5         R/B→a    R/A→a

The conflict is resolved. Grammar is LALR(1).
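The lookahead computation on G’ can be sketched directly. In Python, with G’ symbols encoded as (state, symbol) pairs (the encoding is an assumption); since in this G’ every symbol following a nonterminal is a terminal of G’, the Follow computation needs no propagation step:

```python
# G' for S -> AbAa | Ba, A -> a, B -> a, over the LR(0) automaton.
GP = [
    (('1', 'S'), [('1', 'A'), ('3', 'b'), ('7', 'A'), ('9', 'a')]),
    (('1', 'S'), [('1', 'B'), ('4', 'a')]),
    (('1', 'A'), [('1', 'a')]),
    (('7', 'A'), [('7', 'a')]),
    (('1', 'B'), [('1', 'a')]),
]

def follow(target):
    # In this G' every symbol after a nonterminal is a terminal of G',
    # so Follow is just the set of symbols appearing right after target.
    out = set()
    for _, rhs in GP:
        for k, sym in enumerate(rhs):
            if sym == target and k + 1 < len(rhs):
                out.add(rhs[k + 1])
    return out

# Lookahead set = terminal components of the G'-Follow set.
la_A = {t for _, t in follow(('1', 'A'))}   # for the A -> a reduction
la_B = {t for _, t in follow(('1', 'B'))}   # for the B -> a reduction
```

The two lookahead sets come out as {b} and {a}; they are disjoint, which is exactly why the reduce-reduce conflict in state 5 is resolved.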

LR Parsing

Example: S → bBb | aBa | acb, B → A, A → c.

[LR(0) automaton: from state 1, b leads toward bBb via B and A; a leads to a state containing S → a.Ba, S → a.cb, B → .A, A → .c, and from there c leads to state 10 = {A → c., S → ac.b}.]

State 10 is inconsistent (shift-reduce conflict). Grammar is not LR(0).

LR Parsing

SLR(1) Analysis, state 10: Follow(A) = Follow(B) = {a, b}, so b does not resolve the shift-reduce conflict. Grammar is not SLR(1).

LALR(1) Analysis: Need Follow(4, A).

G’: (1, S) → (1, b)(3, B)(6, b)
           → (1, a)(4, B)(9, a)
           → (1, a)(4, c)(10, b)
    (3, B) → (3, A)
    (4, B) → (4, A)
    (3, A) → (3, c)
    (4, A) → (4, c)

Thus Follow(4, A) = Follow(4, B) = {(9, a)}. The lookahead set is {a}, which does not contain the shifted symbol b. The grammar is LALR(1).

LR Parsing

Example: S → aBb | aDa | bBa | bDb, B → A, A → a, D → a.

[LR(0) automaton: after reading a (state 3) or b (state 4), the items B → .A, A → .a and D → .a appear; on a, both branches reach the same state, which contains both A → a. and D → a.]

State 10 is inconsistent (reduce-reduce conflict). Grammar is not LR(0).

LR Parsing

SLR(1) Analysis:
Follow(A) = Follow(B) = {a, b}
Follow(D) = {a, b}
Grammar is not SLR(1).

LALR(1) Analysis:
G’: (1, S) → (1, a)(3, B)(5, b)        (3, B) → (3, A)
           → (1, a)(3, D)(7, a)        (4, B) → (4, A)
           → (1, b)(4, B)(9, a)        (3, A) → (3, a)
           → (1, b)(4, D)(10, b)       (4, A) → (4, a)
                                       (3, D) → (3, a)
                                       (4, D) → (4, a)

Need: Follow(3, A) ∪ Follow(4, A) = {a, b}
      Follow(3, D) ∪ Follow(4, D) = {a, b}

The lookahead sets are not disjoint. The grammar is not LALR(1).

LR Parsing

Solution: Modify the LR(0) automaton by splitting the inconsistent state in two.

LR(1) Parsers: construction similar to LR(0). Difference: the lookahead symbol is carried explicitly as part of each item, e.g. A → α.β: t.

PT(G) = Closure({S’ → .S: ┴}) ∪ { Closure(P) | P ∈ Successors(P’), P’ ∈ PT(G) }

LR Parsing

Closure(P) = P ∪ { A → .ω: t’ | B → α.Aβ: t ∈ Closure(P), A → ω ∈ G, t’ ∈ First(βt) }
Successors(P) = { Nucleus(P, X) | X ∈ V }
Nucleus(P, X) = { A → αX.β: t | A → α.Xβ: t ∈ P }

Notes: New lookahead symbols appear during Closure. Lookahead symbols are carried from state to state.
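The LR(1) Closure can be sketched the same way as the LR(0) one, with an item now carrying its lookahead as a fourth component. A sketch in Python for the running example grammar (encoding is an assumption; '#' stands for ┴):

```python
# Grammar S' -> S#, S -> aBb | aDa | bBa | bDb, B -> A, A -> a, D -> a.
GRAMMAR = {
    "S'": [('S', '#')],
    'S': [('a', 'B', 'b'), ('a', 'D', 'a'), ('b', 'B', 'a'), ('b', 'D', 'b')],
    'B': [('A',)],
    'A': [('a',)],
    'D': [('a',)],
}

def first(seq):
    # First of a symbol sequence; this grammar has no epsilon-productions,
    # so only the head symbol matters.
    X = seq[0]
    if X not in GRAMMAR:
        return {X}
    out = set()
    for rhs in GRAMMAR[X]:
        out |= first(rhs)
    return out

def closure(items):
    # An item is (lhs, rhs, dot, lookahead).
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, t = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for tp in first(rhs[dot + 1:] + (t,)):   # t' in First(beta t)
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0, tp)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)
```

Closing the nucleus {S → a.Bb: ┴, S → a.Da: ┴} yields A → .a with lookahead b but D → .a with lookahead a, which is how the LR(1) construction keeps the two conflicting reduce states apart.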

LR Parsing

Example: S → aBb | aDa | bBa | bDb, B → A, A → a, D → a.

[LR(1) automaton: state 1 = {S’ → .S: ┴, S → .aBb: ┴, S → .aDa: ┴, S → .bBa: ┴, S → .bDb: ┴}. After a: {S → a.Bb: ┴, S → a.Da: ┴, B → .A: b, A → .a: b, D → .a: a}; after b: {S → b.Ba: ┴, S → b.Db: ┴, B → .A: a, A → .a: a, D → .a: b}. The two states reached on a further a, {A → a.: b, D → a.: a} and {A → a.: a, D → a.: b}, differ in their lookaheads, so they are not merged.]

LR Parsing

No conflicts. Grammar is LR(1).

[In the full LR(1) automaton the two reduce states stay separate, and each is consistent: in one, A → a. applies on b and D → a. on a; in the other, the lookaheads are reversed.]

Summary of Parsing

Top-Down Parsing: hand-written, or table-driven (LL(1)).

[Diagram: the parse tree grows top-down from S. The input already parsed (w) corresponds to the known part of the tree (α); the stack contents (β) are the part of the tree left to predict; the remaining input is still unread.]

Summary of Parsing

Two moves:
Terminal on stack: match the input.
Nonterminal on stack: rewrite it according to the LL(1) table.

[Diagram: the driver consults the LL(1) table, with the stack holding β and the remaining input w.]

Summary of Parsing

Bottom-up Parsing: usually table-driven (shift-reduce parsing, e.g. LR(0), SLR(1), LALR(1), LR(1)).

[Diagram: the parse tree grows bottom-up toward S. The stack holds the known part of the tree (α), built from the input already parsed (w); the upper part of the tree (β) is unknown, left to predict; the remaining input is unread.]

Summary of Parsing

Two moves:
Shift, if ACTION(Top(S), Next_Token) = S/N: Read the token and Push N.
Reduce, if ACTION(Top(S), Next_Token) = R/A → δ: Pop |δ| states and Push(GOTO(Top(S), A), S).