Bernd Fischer RW713: Compiler and Software Language Engineering
Bottom-Up Parsing
Top-down vs. bottom-up parsing
Grammar: Ex → Nat | (Ex) | Ex + Ex | Ex * Ex
Top-down, input Nat + Nat * Nat:
Ex ⇒ Ex + Ex ⇒ Nat + Ex ⇒ Nat + Ex * Ex ⇒ Nat + Nat * Ex ⇒ Nat + Nat * Nat
Matched input string: success! Corresponds to a leftmost derivation, hence LL.
Bottom-up, input Nat + Nat * Nat:
Nat + Nat * Nat ⇐ Ex + Nat * Nat ⇐ Ex + Ex * Nat ⇐ Ex + Ex * Ex ⇐ Ex + Ex ⇐ Ex
Reached start symbol: success! Corresponds to a rightmost derivation (in reverse), hence LR.
Shift-Reduce Parsing
REMINDER: Top-down parsing searches for the (leftmost) derivation using a stack.
Use a parse stack s to represent the derivation (x is the remaining input, tos the top of the stack):
 – initialize s = S
 – if x = ε: if s = ε then accept, else reject
 – if tos ∈ T: if x_i = tos then pop and skip x_i, else reject
 – if tos ∈ N: pick a production tos → α in P; pop; push(α) symbol by symbol, in reverse order
The parser stack can be explicit or implicit.
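As a minimal sketch, the nondeterministic step “pick a production” can be resolved by exhaustive search over stack configurations. The Python code below assumes the expression grammar Ex → Nat | (Ex) | Ex + Ex | Ex * Ex from above, treats every symbol without productions as a terminal, and assumes there are no ε-productions, so configurations whose stack is longer than the remaining input can be pruned (which also guarantees termination despite the left recursion). GRAMMAR, START and top_down are illustrative names.

```python
# Top-down stack parser with the nondeterministic choice resolved by
# exhaustive search (a sketch, not an efficient parser).
# Assumption: no ε-productions, so every stack symbol consumes >= 1 token.
GRAMMAR = {
    "Ex": [("Nat",), ("(", "Ex", ")"), ("Ex", "+", "Ex"), ("Ex", "*", "Ex")],
}
START = "Ex"


def top_down(tokens):
    """Return a leftmost derivation as a list of (lhs, rhs) pairs, or None."""
    # A configuration is (stack, input position, derivation so far).
    # The top of the stack (tos) is stack[0].
    todo = [((START,), 0, [])]
    while todo:
        stack, i, deriv = todo.pop()
        if not stack:
            if i == len(tokens):
                return deriv                       # x = ε and s = ε: accept
            continue                               # input left over: reject branch
        if len(stack) > len(tokens) - i:
            continue                               # prune: too little input left
        tos, rest = stack[0], stack[1:]
        if tos not in GRAMMAR:                     # tos ∈ T
            if tokens[i] == tos:
                todo.append((rest, i + 1, deriv))  # pop; skip x_i
        else:                                      # tos ∈ N: try every tos → α
            for alpha in GRAMMAR[tos]:
                # prepending α puts its first symbol on top, like push(α) in reverse
                todo.append((alpha + rest, i, deriv + [(tos, alpha)]))
    return None


print(top_down(["(", "Nat", ")"]))
# [('Ex', ('(', 'Ex', ')')), ('Ex', ('Nat',))]
```

The search is exponential in the worst case, which is why practical parsers constrain the choice of production instead of guessing.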
Shift-reduce parsing searches for the (rightmost) derivation using a stack.
Use a parse stack s to represent the derivation:
 – initialize s = ε
 – if x = ε: if s = S then accept, else reject
 – shift: push(x_i)
 – reduce: if s = αX_1X_2...X_n, pick a production A → X_1X_2...X_n in P; pop n; push(A)
The parser stack is typically explicit; its top (tos) is at the right end of s.
Open choices: shift or reduce? If reduce, with which production?
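The two open choices can likewise be settled by brute force. The Python sketch below is only an illustration under stated assumptions: it uses the expression grammar from the opening example, assumes there are no ε-productions (so the stack never holds more symbols than have been read, and the search space is finite), tries a shift and every applicable reduce in each configuration, and returns the reductions of the first successful run. Read backwards, these form a rightmost derivation. GRAMMAR, START and shift_reduce are illustrative names.

```python
# Shift-reduce parser with both choices resolved by exhaustive search (a sketch).
# Assumption: no ε-productions, so the search space is finite.
GRAMMAR = [
    ("Ex", ("Nat",)),
    ("Ex", ("(", "Ex", ")")),
    ("Ex", ("Ex", "+", "Ex")),
    ("Ex", ("Ex", "*", "Ex")),
]
START = "Ex"


def shift_reduce(tokens):
    """Return the reductions (a reversed rightmost derivation), or None."""
    # A configuration is (stack, input position, reductions so far);
    # the top of the stack is the last element of the tuple.
    todo = [((), 0, [])]
    seen = set()
    while todo:
        stack, i, steps = todo.pop()
        if (stack, i) in seen:
            continue                               # same future as before
        seen.add((stack, i))
        if i == len(tokens) and stack == (START,):
            return steps                           # x = ε and s = S: accept
        if i < len(tokens):                        # shift: push(x_i)
            todo.append((stack + (tokens[i],), i + 1, steps))
        for lhs, rhs in GRAMMAR:                   # reduce: s = αX_1...X_n
            n = len(rhs)
            if len(stack) >= n and stack[-n:] == rhs:
                todo.append((stack[:-n] + (lhs,), i, steps + [(lhs, rhs)]))
    return None


print(shift_reduce(["(", "Nat", ")"]))
# [('Ex', ('Nat',)), ('Ex', ('(', 'Ex', ')'))]
```

Again the running time is exponential in general; constraining the choices is what the viable-prefix machinery is for.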
Schematic syntax tree with α ∈ (N ∪ T)*, x, y ∈ T*, a ∈ T, and start symbol S, showing the stack, the read pointer, and the two possible moves “shift a” and “reduce with A → γ”. We need to constrain this choice.
Shift-reduce parsing maintains a viable prefix on the stack.
Definition: Let G = (N, T, P, S) be a context-free grammar and S ⇒*_r βAy ⇒_r βγy. Then γ is called a handle or redex of the right-sentential form βγy. Each prefix of βγ is called a viable prefix of G.
Shift-reduce parsing invariants: the parser stack s is a viable prefix, and S ⇒*_r sy, where y is the unread input.
Theorem: The language of viable prefixes of a grammar G is regular.
Corollary: We can build and use a DFA to recognize viable prefixes and so constrain the choices of a shift-reduce parser.
LR(0) Parsing
LR(0) items: productions with a dot, A → α ● β, where α has already been recognized and β is still expected.
Parsing with an NFA over LR(0) items.
Constructing the LR(0) NFA
Let G = (N, T, P, S) be a context-free grammar.
1. For each nonterminal A ∈ N, construct the item automaton.
2. Build the union of the item automata: the start state is the start state of the item automaton for S; the final states are the final states of the item automata.
3. Add ε-transitions from each state in which the dot stands in front of a nonterminal A to the start state of the item automaton of A.
Theorem: If all states are declared final, the automaton obtained in this way accepts exactly the language of viable prefixes of G.
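The construction can be simulated directly. In this Python sketch, the NFA states are the LR(0) items themselves, the items A → ● γ stand in for the start state of A's item automaton, and the augmented grammar S' → S, S → ( S ) | x is a made-up example. Because all states are final, a symbol string is a viable prefix exactly when the simulation never runs into the empty state set. All names are illustrative.

```python
# LR(0) NFA over items, simulated with ε-closures (a sketch).
# States are LR(0) items (lhs, rhs, dot); the items A → ● γ serve as the
# entry of A's item automaton. Assumed grammar: S' → S, S → ( S ) | x
GRAMMAR = {"S'": [("S",)], "S": [("(", "S", ")"), ("x",)]}
START = "S'"


def eps_moves(item):
    """ε-transitions: dot in front of nonterminal A ⇒ entry items A → ● γ."""
    _, rhs, dot = item
    if dot < len(rhs) and rhs[dot] in GRAMMAR:
        return [(rhs[dot], gamma, 0) for gamma in GRAMMAR[rhs[dot]]]
    return []


def eps_closure(states):
    result, todo = set(states), list(states)
    while todo:
        for nxt in eps_moves(todo.pop()):
            if nxt not in result:
                result.add(nxt)
                todo.append(nxt)
    return result


def step(states, symbol):
    """Move the dot over `symbol` where possible, then follow ε-transitions."""
    moved = {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in states
             if dot < len(rhs) and rhs[dot] == symbol}
    return eps_closure(moved)


def is_viable_prefix(symbols):
    """All states are final, so accept iff the simulation never gets stuck."""
    current = eps_closure({(START, rhs, 0) for rhs in GRAMMAR[START]})
    for symbol in symbols:
        current = step(current, symbol)
        if not current:
            return False
    return True


print(is_viable_prefix(["(", "(", "S"]))  # True
print(is_viable_prefix(["x", ")"]))       # False
```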
Constructing the LR(0) DFA
Direct Construction of the LR(0) DFA Needs closure operation on itemsets
Needs a goto operation to represent the transition relation.
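Both operations are short fixpoint computations. The Python sketch below represents an item A → α ● β as a tuple (lhs, rhs, dot) and itemsets as frozensets, so that they can serve as DFA states; the augmented grammar S' → E, E → E + T | T, T → ( E ) | id is an assumed example, not necessarily the one used on the slides.

```python
# closure and goto on sets of LR(0) items (a sketch).
# An item A → α ● β is the tuple (lhs, rhs, dot); itemsets are frozensets.
# Assumed grammar (augmented): S' → E, E → E + T | T, T → ( E ) | id
GRAMMAR = {
    "S'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Add B → ● γ for every item A → α ● B β, until nothing changes."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    return closure({(lhs, rhs, dot + 1) for (lhs, rhs, dot) in items
                    if dot < len(rhs) and rhs[dot] == symbol})


def show(items):
    """Render items as sorted 'A → α ● β' strings."""
    return sorted(f"{lhs} → {' '.join(rhs[:dot] + ('●',) + rhs[dot:])}"
                  for lhs, rhs, dot in items)


i0 = closure({("S'", ("E",), 0)})
for line in show(goto(i0, "E")):
    print(line)
# E → E ● + T
# S' → E ●
```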
Direct Construction of the LR(0) DFA Example:
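Putting closure and goto into a worklist loop yields the complete LR(0) DFA. This self-contained Python sketch repeats the two operations and numbers the states in discovery order; the grammar S' → E, E → E + T | T, T → ( E ) | id is again an assumed example, and lr0_dfa is an illustrative name.

```python
# Canonical LR(0) collection: worklist over closure/goto (a sketch).
# Assumed grammar (augmented): S' → E, E → E + T | T, T → ( E ) | id
GRAMMAR = {
    "S'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("(", "E", ")"), ("id",)],
}
START = "S'"


def closure(items):
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(items, symbol):
    return closure({(lhs, rhs, dot + 1) for (lhs, rhs, dot) in items
                    if dot < len(rhs) and rhs[dot] == symbol})


def lr0_dfa():
    """Return (states, transitions); states[0] is the start state."""
    start = closure({(START, rhs, 0) for rhs in GRAMMAR[START]})
    states, index, transitions = [start], {start: 0}, {}
    todo = [start]
    while todo:
        state = todo.pop()
        # every symbol that appears directly after a dot in this state
        symbols = {rhs[dot] for (_, rhs, dot) in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in index:                # new itemset: new state
                index[target] = len(states)
                states.append(target)
                todo.append(target)
            transitions[(index[state], symbol)] = index[target]
    return states, transitions


states, transitions = lr0_dfa()
print(len(states), len(transitions))  # 9 14
```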
Parsing with the LR(0) DFA
In principle, the LR(0) DFA can be used for parsing:
 – run the DFA over the sentential form until an accepting state is reached
 – apply the accepting rule from the itemset to reduce the tail (the redex) of the viable prefix
 – re-run the DFA over the new sentential form until an accepting state is reached
 – apply the accepting rule from the itemset to reduce the tail of the viable prefix...
⇒ Each re-run scans the viable prefix again from the start. Instead, use a pushdown automaton that remembers the DFA state reached after each stack symbol.
LR(0) Pushdown Automata
Basic ideas:
 – states == itemsets (conceptually)
 – uses two stacks: states, and grammar elements (the latter is usually ignored)
 – uses four kinds of actions per state: shift (push current input symbol), reduce(rule) (reduce with rule), accept, error (default)
 – uses a goto table: state × grammar symbol → state
Basic loop:
 state contains shift item A → α ● aβ (“dot before terminal”):
 – check that x_i = a; syntax error if not
 – push a on the symbol stack
 – push goto[tos, a] on the state stack and advance the read pointer
 state contains reduce item A → α ● (“dot at end”):
 – pop |α| elements off the symbol stack
 – pop |α| elements off the state stack
 – push A on the symbol stack
 – push goto[tos, A] on the state stack, where tos is the state exposed by the pops
 – accept if A = S and x = ε
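The basic loop translates almost line by line into code. The Python sketch below builds the itemset states and the goto table for the assumed LR(0) grammar S' → S, S → ( S ) | x, keeps the state and symbol stacks explicitly, and accepts when it reduces with the augmented start production S' → S at the end of the input. It relies on the grammar being LR(0), so a state with a reduce item contains no other reduce or shift item; build_tables and parse are illustrative names, and returning None stands in for reporting a syntax error.

```python
# LR(0) pushdown automaton: itemset states, goto table and the basic loop
# (a sketch). Assumed LR(0) grammar (augmented): S' → S, S → ( S ) | x
GRAMMAR = {"S'": [("S",)], "S": [("(", "S", ")"), ("x",)]}
START = "S'"


def closure(items):
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def build_tables():
    """Return the itemset states and goto: (state no., symbol) → state no."""
    start = closure({(START, rhs, 0) for rhs in GRAMMAR[START]})
    states, goto, todo = [start], {}, [start]
    while todo:
        state = todo.pop()
        for sym in {rhs[d] for (_, rhs, d) in state if d < len(rhs)}:
            target = closure({(a, rhs, d + 1) for (a, rhs, d) in state
                              if d < len(rhs) and rhs[d] == sym})
            if target not in states:
                states.append(target)
                todo.append(target)
            goto[(states.index(state), sym)] = states.index(target)
    return states, goto


def parse(tokens):
    """Return the reductions (a reversed rightmost derivation), or None on error."""
    states, goto = build_tables()
    state_stack, symbol_stack, i, reductions = [0], [], 0, []
    while True:
        items = states[state_stack[-1]]
        reduce_items = [(a, rhs) for (a, rhs, d) in items if d == len(rhs)]
        if reduce_items:                       # "dot at end": reduce
            lhs, rhs = reduce_items[0]         # LR(0): the only item here
            del state_stack[len(state_stack) - len(rhs):]
            del symbol_stack[len(symbol_stack) - len(rhs):]
            reductions.append((lhs, rhs))
            if lhs == START:                   # accept iff x = ε
                return reductions if i == len(tokens) else None
            symbol_stack.append(lhs)
            state_stack.append(goto[(state_stack[-1], lhs)])
        else:                                  # "dot before terminal": shift
            a = tokens[i] if i < len(tokens) else None
            if (state_stack[-1], a) not in goto:
                return None                    # syntax error
            symbol_stack.append(a)
            state_stack.append(goto[(state_stack[-1], a)])
            i += 1


print(parse(["(", "x", ")"]))
```

For the input ( x ), the loop reduces with S → x, then S → ( S ), and finally S' → S, i.e. a rightmost derivation in reverse.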
Parsing with the LR(0) PDA
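Does this always work...? No: a state of the LR(0) DFA may contain a shift/reduce conflict (a shift item and a reduce item) or a reduce/reduce conflict (two different reduce items).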
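Both kinds of conflict can be read off a single itemset. In this Python sketch, items are (lhs, rhs, dot) tuples, and the caller supplies the set of terminals so that items with the dot before a nonterminal (which only contribute goto entries) are not counted as shift items; lr0_conflicts and the example state are illustrative.

```python
# Detecting LR(0) conflicts in one itemset (a sketch).
# Items are (lhs, rhs, dot); `terminals` separates shift items from items
# with the dot before a nonterminal, which only produce goto entries.
def lr0_conflicts(itemset, terminals):
    reduce_items = [(a, rhs) for (a, rhs, d) in itemset if d == len(rhs)]
    shift_symbols = {rhs[d] for (_, rhs, d) in itemset
                     if d < len(rhs) and rhs[d] in terminals}
    conflicts = []
    if reduce_items and shift_symbols:
        conflicts.append("shift/reduce")
    if len(reduce_items) > 1:
        conflicts.append("reduce/reduce")
    return conflicts


# E → T ● and T → T ● * F: reduce, or shift '*'?
print(lr0_conflicts({("E", ("T",), 1), ("T", ("T", "*", "F"), 1)},
                    {"+", "*", "(", ")", "id"}))  # ['shift/reduce']
```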
Recess Refresher
Pop-Quiz... Remember: Check whether Γ_5 is LR(0)!
SLR(1)-Parsing
The state has both a reduce/reduce and a shift/reduce conflict, but the grammar tells us to
 – reduce to A if the next input is a,
 – reduce to B if the next input is b,
 – shift if the next input is c.
This is exactly the information captured by the follow sets.
SLR(1) parsing uses follow sets to resolve conflicts in an LR(0) state:
 – follow(A) = {a}, follow(B) = {b}, so follow(A) ∩ follow(B) = ∅ ⇒ use the next token to pick the rule ⇒ resolves the reduce/reduce conflict
 – c ∉ follow(A) ∪ follow(B) ⇒ resolves the shift/reduce conflict
SLR(1) Grammars
Definition: Let G = (N, T, P, S) be a context-free grammar and I be a state of the LR(0) DFA for G. I has an SLR(1) conflict iff I contains
 – two different reduce items A → α ● and B → β ● such that follow(A) ∩ follow(B) ≠ ∅, or
 – two items A → α ● and B → β ● aγ such that a ∈ follow(A).
G is an SLR(1) grammar if no state of its LR(0) DFA has an SLR(1) conflict.
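The definition can be checked mechanically once the follow sets are known. The Python sketch below computes FIRST, nullable and FOLLOW by the usual fixpoint iteration and then tests one LR(0) state against both clauses of the definition. The augmented expression grammar, the end marker $, and all function names are assumptions made for illustration.

```python
# FOLLOW sets and the SLR(1) conflict test for one LR(0) state (a sketch).
# Assumed grammar (augmented):
#   S' → E,  E → E + T | T,  T → T * F | F,  F → ( E ) | id
GRAMMAR = {
    "S'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
START, END = "S'", "$"


def first_and_nullable():
    first = {a: set() for a in GRAMMAR}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, alts in GRAMMAR.items():
            for rhs in alts:
                before = (len(first[a]), a in nullable)
                for x in rhs:
                    if x not in GRAMMAR:           # terminal
                        first[a].add(x)
                        break
                    first[a] |= first[x]
                    if x not in nullable:
                        break
                else:                              # all symbols nullable
                    nullable.add(a)
                if (len(first[a]), a in nullable) != before:
                    changed = True
    return first, nullable


def follow_sets():
    first, nullable = first_and_nullable()
    follow = {a: set() for a in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for a, alts in GRAMMAR.items():
            for rhs in alts:
                trailer = set(follow[a])           # what can follow the suffix
                for x in reversed(rhs):
                    if x not in GRAMMAR:
                        trailer = {x}
                        continue
                    if not trailer <= follow[x]:
                        follow[x] |= trailer
                        changed = True
                    if x in nullable:
                        trailer = trailer | first[x]
                    else:
                        trailer = set(first[x])
    return follow


def slr1_conflicts(itemset, follow):
    """Check both clauses of the SLR(1) definition for one LR(0) state."""
    reduces = [(a, rhs) for (a, rhs, d) in itemset if d == len(rhs)]
    shifts = {rhs[d] for (_, rhs, d) in itemset
              if d < len(rhs) and rhs[d] not in GRAMMAR}
    found = []
    for k, (a, _) in enumerate(reduces):
        for b, _ in reduces[k + 1:]:
            if follow[a] & follow[b]:
                found.append(f"reduce/reduce on {sorted(follow[a] & follow[b])}")
        if follow[a] & shifts:
            found.append(f"shift/reduce on {sorted(follow[a] & shifts)}")
    return found


follow = follow_sets()
print(sorted(follow["E"]))  # ['$', ')', '+']
print(slr1_conflicts({("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}, follow))  # []
```

The state {E → T ●, T → T ● * F} is an LR(0) conflict, but since * ∉ follow(E), SLR(1) resolves it.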
LR(0) vs. SLR(1)
LR(0):
 – uses sets of LR(0) items as states
 – uses GOTO[state, grammar symbol] as transitions
 – actions depend on state only
SLR(1):
 – uses sets of LR(0) items as states
 – uses GOTO[state, grammar symbol] as transitions
 – actions depend on state and next input token
xLR(1) parsing tables
Different LR(1) parsing variants use the same table format:
 – ACTION table (state × terminal): empty entry == syntax error
 – GOTO table (state × nonterminal): empty entry == can't happen
SLR(1) tables can easily be constructed from the LR(0) DFA via the SLR(1) definition.
Construction of SLR(1) tables
Does this always work...?
follow(A) = {a, b}, follow(B) = {b}: now b ∈ follow(A) ∩ follow(B), and a ∈ follow(A) as well, so the follow sets no longer separate the possible actions and SLR(1) conflicts arise.
The follow sets are a global approximation of the possible continuations: they ignore the (left) derivation context.
Canonical LR(1)-Parsing
LR(1) items are pairs of LR(0) items and a look-ahead symbol.
LR(1) item computation
4. LR(1) property: same as the SLR(1) property, but uses the lookaheads from the LR(1) items instead of the follow sets.
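The LR(1) property relies on the LR(1) closure, which propagates lookaheads: for [A → α ● B β, a] it adds [B → ● γ, b] for every b ∈ FIRST(βa). The Python sketch below implements this closure and goto under the simplifying assumption that the grammar has no ε-productions; the grammar S' → S, S → L = R | R, L → * R | id, R → L is a standard textbook example of a grammar that is LR(1) but not SLR(1), used here as an assumption rather than taken from the slides.

```python
# LR(1) closure and goto (a sketch). Items are (lhs, rhs, dot, lookahead).
# Assumes no ε-productions, which keeps FIRST(β a) simple.
# Assumed grammar: S' → S, S → L = R | R, L → * R | id, R → L
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}


def first_sets():
    first = {a: set() for a in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for a, alts in GRAMMAR.items():
            for rhs in alts:
                x = rhs[0]  # no ε-productions: only the first symbol matters
                new = first[x] if x in GRAMMAR else {x}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


FIRST = first_sets()


def first_of(symbols, lookahead):
    """FIRST(β a) without ε-productions: FIRST of β's first symbol, else {a}."""
    if not symbols:
        return {lookahead}
    x = symbols[0]
    return FIRST[x] if x in GRAMMAR else {x}


def closure(items):
    """For [A → α ● B β, a] add [B → ● γ, b] for every b in FIRST(β a)."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot, la = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for b in first_of(rhs[dot + 1:], la):
                for gamma in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], gamma, 0, b)
                    if new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)


def goto(items, symbol):
    return closure({(a, rhs, d + 1, la) for (a, rhs, d, la) in items
                    if d < len(rhs) and rhs[d] == symbol})


i0 = closure({("S'", ("S",), 0, "$")})
i2 = goto(i0, "L")
print(sorted((a, rhs, la) for (a, rhs, d, la) in i2 if d == len(rhs)))
# [('R', ('L',), '$')]
```

In the state reached on L, the reduce item R → L ● carries only the lookahead $, so it does not conflict with shifting =, whereas follow(R) contains = and the SLR(1) test reports a conflict.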
LR(1) DFA construction - example
The same LR(0) item can occur in different LR(1) states ⇒ the different look-aheads reflect different derivation contexts ⇒ state splitting removes the SLR(1) conflict.
LALR(1)-Parsing
LALR(1) DFA construction - conceptual
Merge the LR(1) states that have the same LR(0) core, taking the union of their lookahead sets.
LALR(1) tables can also be constructed directly, without going via the LR(1) states [ASLU, 4.7.5].
4. LALR(1) property: same as the SLR(1) property, but uses the lookaheads from the merged LR(1) items.
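Conceptually, LALR(1) states arise from LR(1) states by merging every group of states with the same LR(0) core. This Python sketch performs only the merge of the itemsets; because states with the same core have goto targets with the same core, the transitions can be merged in the same way (not shown). The item tuples (lhs, rhs, dot, lookahead), the function names, and the two example states for an item C → d ● are assumptions for illustration.

```python
# Conceptual LALR(1) construction: merge LR(1) states with equal LR(0) cores
# and unite their lookahead sets (a sketch). Items: (lhs, rhs, dot, lookahead).
from collections import defaultdict


def core(state):
    """The LR(0) core of an LR(1) state: its items without lookaheads."""
    return frozenset((a, rhs, d) for (a, rhs, d, _) in state)


def merge_by_core(lr1_states):
    """Return one merged state per distinct core."""
    merged = defaultdict(set)
    for state in lr1_states:
        merged[core(state)] |= state
    return [frozenset(items) for items in merged.values()]


def lookaheads(state, item):
    """Lookahead set of the LR(0) item (lhs, rhs, dot) within `state`."""
    return {la for (a, rhs, d, la) in state if (a, rhs, d) == item}


# Two hypothetical LR(1) states with the same core C → d ●
s4 = frozenset({("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")})
s7 = frozenset({("C", ("d",), 1, "$")})
(s47,) = merge_by_core([s4, s7])
print(sorted(lookaheads(s47, ("C", ("d",), 1))))  # ['$', 'c', 'd']
```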
LALR(1) DFA construction - example
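Merging states can introduce new reduce/reduce conflicts (but no new shift/reduce conflicts); such conflicts are rare, though.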
“Hacking” xLR(1) tables
Handling precedence and associativity
Idea: remove conflicting parse table entries
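Precedence and associativity decide which of the two conflicting entries to keep. The Python sketch below follows the convention used by yacc-style generators: a rule gets the precedence of its operator, a lookahead token with higher precedence wins the shift, a lower one yields the reduction, and on equal precedence left associativity reduces, right associativity shifts, and nonassociativity leaves an error entry. The PRECEDENCE table and the resolve function are hypothetical.

```python
# Resolving a shift/reduce conflict by precedence and associativity, in the
# style of yacc's %left / %right / %nonassoc (a sketch; table is hypothetical).
PRECEDENCE = {           # token: (level, associativity); higher binds tighter
    "<": (0, "nonassoc"),
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}


def resolve(rule_operator, lookahead):
    """Conflict between reducing E → E op E (op = rule_operator) and shifting
    `lookahead`; return the ACTION entry to keep: shift, reduce or error."""
    rule_level, _ = PRECEDENCE[rule_operator]
    token_level, assoc = PRECEDENCE[lookahead]
    if token_level > rule_level:
        return "shift"
    if token_level < rule_level:
        return "reduce"
    # equal levels: the associativity of the level decides
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


print(resolve("+", "*"), resolve("*", "+"), resolve("+", "+"), resolve("^", "^"))
# shift reduce reduce shift
```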
Error handling
Idea: add specific error handlers into the ACTION table:
 – default: report the legal tokens, i.e., those that have a non-error entry in the current state
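A default handler only needs the row of the ACTION table for the current state. In this Python sketch, ACTION is a hypothetical table fragment keyed by (state, terminal), with missing keys playing the role of empty (error) entries; the function names are illustrative.

```python
# Default syntax-error message listing the legal tokens of the current state,
# i.e. those with a non-error ACTION entry (a sketch).
# ACTION is a hypothetical fragment keyed by (state, terminal); missing keys
# stand for empty (error) entries.
ACTION = {
    (0, "id"): ("shift", 5),
    (0, "("): ("shift", 4),
    (1, "+"): ("shift", 6),
    (1, "$"): ("accept",),
}


def expected_tokens(action, state):
    return sorted(t for (s, t) in action if s == state)


def syntax_error(action, state, token):
    legal = ", ".join(expected_tokens(action, state))
    return f"syntax error at {token!r}: expected one of {legal}"


print(syntax_error(ACTION, 0, "+"))
# syntax error at '+': expected one of (, id
```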
xLR comparison
LR vs LL
LR(0) vs SLR(1) vs LALR(1) vs LR(1)
LALR(1) is the method of choice in practice (yacc and friends).
Formal Language Theory
Classes of Grammars
Classes of Languages