Bernd Fischer RW713: Compiler and Software Language Engineering.

Bernd Fischer bfischer@cs.sun.ac.za RW713: Compiler and Software Language Engineering

Bottom-Up Parsing

Top-down vs. bottom-up parsing

Ex +Nat* Ex Ex  Nat | (Ex) | Ex + Ex | Ex * Ex Matched input string  success ! Corresponds to a Leftmost derivation  hence LL Ex Ex + Ex Nat + Ex Nat + Ex * Ex Nat + Nat * Ex Nat + Nat * Nat

Top-down vs. bottom-up parsing

Grammar: Ex → Nat | (Ex) | Ex + Ex | Ex * Ex

Bottom-up reduction sequence:
Nat + Nat * Nat
Ex + Nat * Nat
Ex + Ex * Nat
Ex + Ex * Ex
Ex + Ex
Ex

Reached start symbol ⇒ success!
Corresponds to a rightmost derivation (in reverse), hence LR.

Shift-Reduce Parsing

REMINDER
Top-down parsing searches for the (leftmost) derivation using a stack.
Use a parse stack to represent the derivation:
- initialize s = S
- if x = ε: if s = ε then accept else reject
- if tos ∈ T: if x_i = tos then pop; skip x_i else reject
- if tos ∈ N: pick a production tos → α in P; pop; push(α) symbol by symbol, in reverse order
The parser stack can be explicit or implicit.
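The stack algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grammar S → a S b | c is hypothetical (it is not from the slides), and the "pick a production" step is made deterministic by indexing the alternatives by the next input token, which works only because this toy grammar allows it.

```python
# Hypothetical grammar S -> a S b | c, indexed by the next input token
# (an assumption of this sketch; the slides leave the choice open).
GRAMMAR = {"S": {"a": ("a", "S", "b"), "c": ("c",)}}


def top_down_accepts(tokens, start="S"):
    """Return True if the token sequence is derivable from the start symbol."""
    stack = [start]          # initialize s = S
    pos = 0                  # index of the next input symbol x_i
    while stack:
        tos = stack.pop()
        if tos in GRAMMAR:   # tos is a nonterminal
            if pos == len(tokens) or tokens[pos] not in GRAMMAR[tos]:
                return False
            # push(alpha) symbol by symbol, in reverse order
            stack.extend(reversed(GRAMMAR[tos][tokens[pos]]))
        elif pos < len(tokens) and tokens[pos] == tos:
            pos += 1         # terminal matches: pop (done above) and skip x_i
        else:
            return False
    return pos == len(tokens)  # stack empty: accept only if the input is empty too
```

Pushing the right-hand side in reverse order puts its first symbol on top of the stack, so the leftmost nonterminal is always expanded first.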

Shift-reduce parsing searches for the (rightmost) derivation using a stack.
Use a parse stack to represent the derivation:
- initialize s = ε
- if x = ε: if s = S then accept else reject
- shift: push(x_i)
- reduce: if s = αX_1 X_2 ... X_n (with X_n at the tos): pick a production A → X_1 X_2 ... X_n in P; pop n symbols; push(A)
Open questions: shift or reduce? Which production?
The parser stack is typically explicit.
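One way to make the open choice concrete is to try every possibility. The sketch below is a naive backtracking recognizer for the Ex grammar of the earlier slides; it is not how LR parsers work, it only shows the shift and reduce moves on an explicit stack. The function name and the memoization are assumptions of this sketch.

```python
from functools import lru_cache

# Grammar from the slides: Ex -> Nat | ( Ex ) | Ex + Ex | Ex * Ex
PRODUCTIONS = [
    ("Ex", ("Nat",)),
    ("Ex", ("(", "Ex", ")")),
    ("Ex", ("Ex", "+", "Ex")),
    ("Ex", ("Ex", "*", "Ex")),
]
START = "Ex"


def shift_reduce_accepts(tokens):
    """Return True if some sequence of shift/reduce moves reaches the start symbol."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def search(stack, pos):
        # Accept: input consumed and the stack holds exactly the start symbol.
        if pos == len(tokens) and stack == (START,):
            return True
        # Reduce: try every production whose right-hand side is on top of the stack.
        for lhs, rhs in PRODUCTIONS:
            n = len(rhs)
            if len(stack) >= n and stack[len(stack) - n:] == rhs:
                if search(stack[:len(stack) - n] + (lhs,), pos):
                    return True
        # Shift: push the next input symbol.
        if pos < len(tokens):
            return search(stack + (tokens[pos],), pos + 1)
        return False

    return search((), 0)
```

The search terminates here because the grammar has no ε-productions and no cyclic unit productions; the rest of the deck is about replacing this blind search with a table that says which move to make.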

Shift-reduce parsing searches for the (rightmost) derivation using a stack.
[Figure: schematic syntax tree with α ∈ (N ∪ T)*, x, y ∈ T*, a ∈ T, and start symbol S, showing the stack, the read pointer, and the two possible moves “shift a” and “reduce with A → γ”.]
Need to constrain the choice between the two moves.

Shift-reduce parsing maintains a viable prefix on the stack.
Definition: Let G = (N, T, P, S) be a context-free grammar and S ⇒ᵣ* βAy ⇒ᵣ βγy, where ⇒ᵣ denotes a rightmost derivation step. Then γ is called a handle or redex of the right-sentential form βγy. Each prefix of βγ is called a viable prefix of G.
Shift-reduce parsing invariants:
- The parser stack s is a viable prefix.
- S ⇒ᵣ* sy, where y is the remaining input.
Theorem: The language of viable prefixes of a grammar G is regular.
Corollary: We can build and use a DFA to recognize viable prefixes and so constrain the choice of a shift-reduce parser.

LR(0) Parsing

LR(0) items

Parsing with an NFA over LR(0) items.

Constructing the LR(0) NFA
Let G = (N, T, P, S) be a context-free grammar.
- For each nonterminal A ∈ N, construct the item automaton.
- Build the union of the item automata: the start state is the start state of the item automaton for S, the final states are the final states of the item automata.
- Add ε-transitions from each state that has the dot in front of a nonterminal A to the start state of the item automaton of A.
Theorem: The automaton obtained in this way accepts exactly the language of viable prefixes of G if all states are declared to be final.

Constructing the LR(0) NFA

Constructing the LR(0) DFA

Direct Construction of the LR(0) DFA
- needs closure operation on itemsets
- needs goto operation to represent the transition relation
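The closure and goto operations can be sketched as follows. The item representation (lhs, rhs, dot) and the small augmented grammar S' → E, E → E + T | T, T → n are assumptions chosen for illustration; they are not taken from the slides.

```python
# Hypothetical augmented grammar: S' -> E, E -> E + T | T, T -> n
GRAMMAR = {
    "S'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("n",)],
}


def closure(items):
    """Add A -> . alpha for every nonterminal A that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


start_state = closure({("S'", ("E",), 0)})
```

Starting from start_state and applying goto for every grammar symbol until no new itemsets appear yields the states of the DFA; an empty result means there is no transition on that symbol.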

Direct Construction of the LR(0) DFA Example:

Parsing with the LR(0) DFA
In principle the LR(0) DFA can be used for parsing:
- run the DFA over the sentential form until an accepting state is reached
- apply the accepting rule from the itemset to reduce the tail of the viable prefix (the redex)
- re-run the DFA over the new sentential form until an accepting state is reached
- apply the accepting rule from the itemset to reduce the tail of the viable prefix
- ...
⇒ instead: use a pushdown automaton

LR(0) Pushdown Automata
Basic ideas:
- states == itemsets (conceptually)
- uses two stacks:
  - states
  - grammar elements (usually ignored)
- uses four kinds of actions per state:
  - shift: push current input symbol
  - reduce(rule): reduce with rule
  - accept
  - error: default
- uses goto-table: state × grammar symbol → state

LR(0) Pushdown Automata
Basic loop:
- state contains shift item A → α ● aβ (“dot before terminal”):
  - check that x_i = a; syntax error if not
  - push a on symbol stack
  - push goto[tos, a] on state stack
- state contains reduce item A → α ● (“dot at end”):
  - pop |α| elements off symbol stack
  - pop |α| elements off state stack
  - push A on symbol stack
  - push goto[tos, A] on state stack
  - accept if A = S and x = ε
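The basic loop can be tried out with a small self-contained sketch. Several things here are assumptions rather than the slides' method: the grammar is a hypothetical LR(0) grammar, acceptance uses an explicit end marker "$" (instead of the "A = S and x = ε" test), only the state stack is kept, and goto is computed on the fly from the itemsets instead of being looked up in a precomputed table.

```python
# Hypothetical LR(0) grammar with an explicit end marker "$":
# S' -> E $, E -> E + T | T, T -> n | ( E )
GRAMMAR = {
    "S'": [("E", "$")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("n",), ("(", "E", ")")],
}
START_ITEM = ("S'", ("E", "$"), 0)
ACCEPT_ITEM = ("S'", ("E", "$"), 1)


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, symbol):
    return closure({(l, r, d + 1) for l, r, d in items
                    if d < len(r) and r[d] == symbol})


def parse(tokens):
    """Return the list of reductions performed, or None on a syntax error."""
    tokens = list(tokens) + ["$"]
    states = [closure({START_ITEM})]   # state stack; symbol stack omitted
    reductions = []
    pos = 0
    while True:
        state = states[-1]
        complete = [(l, r) for l, r, d in state if d == len(r)]
        if complete:
            # Reduce item ("dot at end"): pop |alpha| states, push goto[tos, A].
            lhs, rhs = complete[0]
            del states[len(states) - len(rhs):]
            states.append(goto(states[-1], lhs))
            reductions.append((lhs, rhs))
            continue
        if ACCEPT_ITEM in state and tokens[pos] == "$":
            return reductions
        # Shift item ("dot before terminal"): push goto[tos, a] or report an error.
        successor = goto(state, tokens[pos])
        if not successor:
            return None
        states.append(successor)
        pos += 1
```

Because the grammar is LR(0), a state with a complete item contains nothing else, so taking the first complete item is safe here; for a grammar with conflicts this sketch would silently pick one alternative.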

Parsing with the LR(0) PDA

Does this always work...?
No: an LR(0) state can contain a shift/reduce conflict or a reduce/reduce conflict.

Recess Refresher

Pop-Quiz... Remember: Check whether Γ 5 is LR(0)!

SLR(1)-Parsing

Does this always work...?
The example state has both a reduce/reduce conflict and a shift/reduce conflict.
Grammar tells us to…
- … reduce to A if next input is a
- … reduce to B if next input is b
- … shift if next input is c
⇒ use follow sets

SLR(1) parsing uses follow sets to resolve conflicts in an LR(0) state.
- follow(A) = {a}, follow(B) = {b}
- follow(A) ∩ follow(B) = ∅ ⇒ use next token to pick rule ⇒ resolves reduce/reduce conflict
- c ∉ follow(A) ∪ follow(B) ⇒ resolves shift/reduce conflict
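Follow sets are usually computed by fixpoint iteration. The following is a minimal sketch of that standard computation; the function name, the end marker "$", and the classic expression grammar used as the example are assumptions of this sketch, not taken from the slides.

```python
def follow_sets(grammar, start, end="$"):
    """Compute FOLLOW for every nonterminal of `grammar` (dict: lhs -> list of rhs tuples)."""
    nonterminals = set(grammar)
    nullable = set()
    first = {a: set() for a in nonterminals}
    follow = {a: set() for a in nonterminals}
    follow[start].add(end)

    def first_of(sequence):
        """FIRST of a symbol sequence, and whether the whole sequence can derive ε."""
        out = set()
        for symbol in sequence:
            if symbol not in nonterminals:
                out.add(symbol)
                return out, False
            out |= first[symbol]
            if symbol not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                      # iterate until nothing grows any more
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                symbols, derives_empty = first_of(rhs)
                if not symbols <= first[lhs]:
                    first[lhs] |= symbols
                    changed = True
                if derives_empty and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for i, symbol in enumerate(rhs):
                    if symbol in nonterminals:
                        after, rest_empty = first_of(rhs[i + 1:])
                        new = after | (follow[lhs] if rest_empty else set())
                        if not new <= follow[symbol]:
                            follow[symbol] |= new
                            changed = True
    return follow


# Hypothetical example: E -> E + T | T, T -> T * F | F, F -> ( E ) | id
EXPRESSIONS = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
FOLLOW = follow_sets(EXPRESSIONS, "E")
```

For this grammar the result is follow(E) = {$, +, )} and follow(T) = follow(F) = {$, +, ), *}.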

SLR(1) Grammars
Definition: Let G = (N, T, P, S) be a context-free grammar and I be a state of the LR(0) DFA for G. I has an SLR(1) conflict iff I contains
- two different reduce items A → α ● and B → β ● such that follow(A) ∩ follow(B) ≠ ∅; or
- two items A → α ● and B → β ● aγ such that a ∈ follow(A).
G is an SLR(1) grammar if no state of its LR(0) DFA has an SLR(1) conflict.
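The definition translates almost literally into a check on a single state. In this sketch a state is a set of (lhs, rhs, dot) items and the follow sets are passed in as a dictionary; the function name, the item encoding, and the example state are assumptions made for illustration.

```python
def slr1_conflicts(state, follow, nonterminals):
    """Return the SLR(1) conflicts of one LR(0) state as a list of tuples."""
    reduce_items = sorted((lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs))
    # Terminals that appear directly after a dot, i.e. symbols the state can shift.
    shiftable = {rhs[dot] for lhs, rhs, dot in state
                 if dot < len(rhs) and rhs[dot] not in nonterminals}
    conflicts = []
    for i, (a, _) in enumerate(reduce_items):
        for b, _ in reduce_items[i + 1:]:
            common = follow[a] & follow[b]
            if common:
                conflicts.append(("reduce/reduce", a, b, sorted(common)))
        common = follow[a] & shiftable
        if common:
            conflicts.append(("shift/reduce", a, sorted(common)))
    return conflicts


# Hypothetical state with two reduce items and one shift item on terminal c.
STATE = {("A", ("x",), 1), ("B", ("x",), 1), ("C", ("c",), 0)}
NONTERMINALS = {"A", "B", "C"}
```

With follow(A) = {a} and follow(B) = {b} the state is conflict-free; enlarging follow(A) to {a, b} makes the reduce/reduce conflict on b reappear.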

LR(0) vs. SLR(1)
LR(0):
- uses sets of LR(0) items as states
- uses GOTO[state, grammar symbol] as transitions
- actions depend on state only
SLR(1):
- uses sets of LR(0) items as states
- uses GOTO[state, grammar symbol] as transitions
- actions depend on state and next input token

xLR(1) parsing tables
Different LR(1) parsing variants use the same tables:
- ACTION table: empty entry == syntax error
- GOTO table: empty entry == can’t happen
SLR(1) tables can easily be constructed from the LR(0) DFA via the SLR(1) definition.

Construction of SLR(1) tables

Does this always work...?
follow(A) = {a,b}, follow(B) = {b}
(a ∈ follow(A) and b ∈ follow(A) arise from different derivation contexts.)
The follow sets...
- are a global approximation of possible continuations
- ignore (left) derivation context

Canonical LR(1)-Parsing

LR(1) items are pairs of LR(0) items and a look-ahead symbol.

LR(1) item computation
4. LR(1) property: same as SLR(1) property (but uses lookaheads from LR(1) items)

LR(1) DFA construction - example


LR(1) DFA construction - example
- same LR(0) item in different LR(1) states
- ⇒ different look-aheads reflect different derivation contexts
- ⇒ state-splitting removes SLR(1) conflict

LALR(1)-Parsing

LALR(1) DFA construction - conceptual
Merging LR(1) states takes the union of their lookahead sets.
LALR(1) tables can be constructed directly, without going via the LR(1) states. [ASLU, 4.7.5]

LALR(1) DFA construction - conceptual
4. LALR(1) property: same as SLR(1) property (but uses lookaheads from merged LR(1) items)
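The conceptual construction merges LR(1) states that differ only in their lookaheads. A minimal sketch of that merging step, assuming LR(1) items are encoded as (lhs, rhs, dot, lookahead) tuples and states as sets of such items (both encodings are assumptions of this sketch):

```python
from collections import defaultdict


def core(state):
    """The LR(0) items of an LR(1) state, i.e. the items without their lookaheads."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_lalr(lr1_states):
    """Merge all LR(1) states with the same core; lookahead sets are unioned."""
    merged = defaultdict(set)
    for state in lr1_states:
        merged[core(state)] |= set(state)
    return [frozenset(items) for items in merged.values()]


# Two LR(1) states for the item C -> d . with different lookaheads
# (from the textbook grammar S -> C C, C -> c C | d).
STATE_A = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
STATE_B = frozenset({("C", ("d",), 1, "$")})
MERGED = merge_lalr([STATE_A, STATE_B])
```

Here the two states collapse into one state whose item C → d ● carries the lookaheads c, d, and $. Merging can never introduce a shift/reduce conflict that was not already present in the LR(1) states, but it can introduce reduce/reduce conflicts.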

LALR(1) DFA construction - example

conflicts are rare, though

“Hacking” xLR(1) tables

Handling precedence and associativity

Idea: remove conflicting parse table entries
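A shift/reduce conflict between a rule and a lookahead operator can be settled by comparing precedence levels and, on a tie, by associativity; this is the convention used by yacc-style generators. The table contents and function name below are hypothetical, and the sketch simplifies by identifying a rule with its operator.

```python
# Hypothetical precedence table: operator -> (level, associativity).
PRECEDENCE = {
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
    "==": (0, "nonassoc"),
}


def resolve(rule_operator, lookahead):
    """Decide which ACTION entry survives for the rule `Ex -> Ex rule_operator Ex`
    when the next token is the operator `lookahead`."""
    rule_level, _ = PRECEDENCE[rule_operator]
    token_level, associativity = PRECEDENCE[lookahead]
    if rule_level > token_level:
        return "reduce"          # rule binds tighter: drop the shift entry
    if rule_level < token_level:
        return "shift"           # lookahead binds tighter: drop the reduce entry
    # Same level: associativity decides; non-associative operators become errors.
    return {"left": "reduce", "right": "shift"}.get(associativity, "error")
```

For example, with Ex + Ex on the stack and * as the next token the parser shifts, so Nat + Nat * Nat groups as Nat + (Nat * Nat); with Ex + Ex on the stack and + next it reduces, which makes + left-associative.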


Error handling
Idea: add specific error handlers into the ACTION table.
Default: report the legal tokens (i.e., those that have a non-error entry in the current state).
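The default handler only needs the ACTION row of the current state. A minimal sketch, assuming the ACTION table is a dictionary keyed by (state, token) in which missing keys are the error entries; the table contents and function names are hypothetical.

```python
def expected_tokens(action, state):
    """All tokens that have a non-error ACTION entry in `state`."""
    return sorted(token for (s, token) in action if s == state)


def syntax_error_message(action, state, found):
    """Default error handler: name the offending token and the legal ones."""
    expected = ", ".join(expected_tokens(action, state))
    return f"syntax error: found {found!r}, expected one of: {expected}"


# Hypothetical fragment of an ACTION table.
ACTION = {
    (0, "n"): ("shift", 3),
    (0, "("): ("shift", 4),
    (1, "+"): ("shift", 5),
}
```

A specific handler would replace selected empty entries with a custom message or recovery routine instead of falling back to this default.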

xLR comparison

LR vs LL

LR(0) vs SLR(1) vs LALR(1) vs LR(1)
LALR(1): method of choice (yacc and friends)

LR(0) vs SLR(1) vs LALR(1) vs LR(1)

Formal Language Theory

Classes of Grammars

Classes of Languages
