Announcements HW2 due on Tuesday Fall 18 CSCI 4430, A Milanova.

Last Class
Top-down (LL) parsing
  LL(1) parsing tables; FIRST, FOLLOW and PREDICT sets
  Writing an LL(1) grammar
Bottom-up (LR) parsing
  Intro with example

Today’s Lecture Outline
Bottom-up (LR) parsing
  Handles
  LR Items
  Characteristic Finite State Machine (CFSM)
  SLR(1) parsing tables
  Conflicts in SLR(1)
  LR parsing variants

Programming Language Syntax Bottom-up Parsing
Read: Scott, Chapter 2.3.3

Bottom-up Parsing
Also called LR parsing
LR parsers work with LR(k) grammars
  L stands for “left-to-right” scan of input
  R stands for “rightmost” derivation
  k stands for “need k tokens of lookahead”
We are interested in LR(0) and LR(1) and variants in between
LR parsing is more powerful than LL parsing
  Accepts a larger class of languages
  Just as efficient: both run in time linear in the length of the input

Model of the LR Parser
Stack: holds the part of the input seen so far
  A string of both terminals and nonterminals
Input: holds the remaining part of the input
  A string of terminals
Parser performs two actions
  Reduce: parser pops a “suitable” production right-hand side off the top of the stack, and pushes the production’s left-hand side onto the stack
  Shift: parser pushes the next terminal from the input onto the top of the stack

id + id*id
expr → expr + term | term
term → term * id | id
Stack        Input      Action
(empty)      id+id*id   shift id
id           +id*id     reduce by term → id
term         +id*id     reduce by expr → term
expr         +id*id     shift +
expr+        id*id      shift id
expr+id      *id        reduce by term → id

id + id*id (continued)
expr → expr + term | term
term → term * id | id
Stack          Input   Action
expr+term      *id     shift *
expr+term*     id      shift id
expr+term*id           reduce by term → term*id
expr+term              reduce by expr → expr+term
expr                   accept, SUCCESS

id + id*id
expr → expr + term | term
term → term * id | id
Sequence of reductions performed by parser:
  id+id*id
  term+id*id
  expr+id*id
  expr+term*id
  expr+term
  expr
A rightmost derivation in reverse
The stack (e.g., expr) concatenated with the remaining input (e.g., +id*id) gives a sentential form (expr+id*id) in the rightmost derivation. Such forms are called right sentential forms.
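The claim that the reductions form a rightmost derivation in reverse can be checked with a minimal Python sketch. The names NONTERMINALS, STEPS and rightmost_derivation are illustrative and not part of the lecture, and the production order in STEPS was worked out by hand for id+id*id.

```python
# Nonterminals of the running expression grammar.
NONTERMINALS = {"expr", "term"}

# Productions applied, in order, in a rightmost derivation of id+id*id.
# (Hand-written for this illustration; not produced by a parser.)
STEPS = [
    ("expr", ["expr", "+", "term"]),
    ("term", ["term", "*", "id"]),
    ("term", ["id"]),
    ("expr", ["term"]),
    ("term", ["id"]),
]

def rightmost_derivation(start, steps):
    """Expand the rightmost nonterminal at every step; return all sentential forms."""
    form = [start]
    forms = ["".join(form)]
    for lhs, rhs in steps:
        # Position of the rightmost nonterminal in the current form.
        i = max(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"rightmost nonterminal is {form[i]}, not {lhs}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append("".join(form))
    return forms

forms = rightmost_derivation("expr", STEPS)
# Reading the derivation backwards gives the parser's sequence of reductions.
for f in reversed(forms):
    print(f)
```

Each step replaces the rightmost nonterminal, so every intermediate string is a right sentential form; the reversed list is the reduction sequence from the slide.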

Handle
Notation: A, S are nonterminals. α, β are arbitrary sequences of terminals and nonterminals. w is a string of terminals.
A handle
  Consider a rightmost derivation S ⇒ … ⇒ αAw ⇒ αβw. We say that A → β at position α is a handle of αβw.
Recall our example id+id*id
  expr → expr + term | term
  term → term * id | id
  Stack: expr+term   Input: *id
  Is expr → expr+term at position ε a handle of expr+term*id?
  Is term → id at position expr+term* a handle of expr+term*id?
How does the parser know that it has a “handle” on top of the stack?
Let us begin with the definition of the handle. The parser starts with the string of terminals and continuously “reduces” this string until it either ends up at the starting nonterminal (which constitutes a successful parse) or terminates with an error along the way. The handle is a production A → β which reduces the sentential form αβw back into another sentential form, αAw, where αAw is a valid sentential form in some rightmost derivation (i.e., there is a rightmost derivation which starts at S and derives αAw). If a production A → β reduces the sentential form αβw into a form which is not derivable in a rightmost derivation, then A → β is not a handle of αβw at position α.

Question
expr → expr + term | term
term → term * id | id
Consider id*id*id
  Stack: term   Input: *id*id
Is expr → term at position ε a handle of term*id*id?
Answer: No! It brings sentential form term*id*id into expr*id*id, which is NOT derivable from expr.

Question
expr → expr + term | term
term → term * id | id
How about
  Stack: term*id   Input: *id
Is term → term*id at position ε a handle of term*id*id?
Answer: Yes! It brings sentential form term*id*id into term*id, which is clearly derivable: expr ⇒ term ⇒ term*id

id + id*id
expr → expr + term | term
term → term * id | id
Stack       Input      Action
0           id+id*id   On state 0 and id, action[0,id] = shift 3
0 id 3      +id*id     On 3 and +, action[3,+] = reduce by term → id. Pop 3 and id, push term.
0 term      +id*id     On 0 and term, goto[0,term] = 2
0 term 2    +id*id     On 2 and +, action[2,+] = … etcetera
How does the parser work? The states keep track of the configuration and help the parser recognize when it has a handle on top of the stack (and thus must reduce, as it did, for example, on state 3 and lookahead +) and when it does not have a handle and must continue to look for it.

Model of the LR parser
Input: a1 … ai … an $$
Stack: sm Xm sm-1 Xm-1 … (each s is a state, each X is a grammar symbol)
In fact, the stack does not consist of grammar symbols only (as in the simplified parsing examples); it consists of grammar symbols interspersed with integers that denote the parsing states. The states keep track of the configuration (stack + input) and help the parser recognize when it has a handle on top of the stack (and needs to pop and reduce), and when it does not have a handle (i.e., it needs to keep shifting).
Parsing table: action and goto
  action[s,a]: Do we shift or reduce?
  goto[s,A]: After reduction to nonterminal A, what state is pushed on top of the stack?

Lecture Outline
Bottom-up (LR) parsing
  Handles
  LR Items
  Characteristic Finite State Machine (CFSM)
  SLR(1) parsing table
  Conflicts in SLR(1)
  LR parsing variants
We will now see how to construct the parsing table. How do we figure out what the states are, when to shift, when to reduce, and where to goto? We begin with the discussion of LR items.

LR Items
start → expr
expr → expr + term | term
term → term * id | id
An LR item is a production with a dot at some position on the right-hand side, e.g., A → α•β
  We are trying to find an A
  We have already seen α (it is on top of the stack)
  We are looking for β
Group related LR items into sets. Sets correspond to parsing states, state 0, 1, etc.
  state 0: start → •expr, expr → •expr+term, expr → •term, term → •term*id, term → •id
  state 1 (transition from state 0 on expr): start → expr•, expr → expr•+term
First, we augment the original expression grammar with production start → expr. The LR items help us figure out when we have a handle at the top of the stack. State 0 represents the beginning of the parse. At the beginning of the parse, we have an empty stack and we are at the beginning of the production for starting symbol start. We represent our location with LR item start → •expr. This means we have seen nothing yet, and we are looking to see expr. Note, however, that since the • is right in front of nonterminal expr, we are about to see either expr+term or term. I.e., we must first see one of these right-hand sides and reduce it into expr. This is handled by taking the closure of LR item start → •expr.

Closure of an LR Item
The closure of an LR item A → α•β is the set of LR items formed as follows:
  A → α•β is in the closure of A → α•β
  If the dot is in front of a nonterminal B for some item in the closure, then all of B → •γ1, B → •γ2, …, B → •γn are in the closure (B → γ1, B → γ2, …, B → γn are all the productions for B)
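The closure rule above can be sketched as a small worklist algorithm in Python. This is only an illustration under assumed representations: an item is a hypothetical (lhs, rhs, dot) triple, and GRAMMAR, closure and show are names chosen here, not names from the lecture.

```python
# The augmented expression grammar; each right-hand side is a tuple of symbols.
GRAMMAR = {
    "start": [("expr",)],
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("term", "*", "id"), ("id",)],
}

def closure(items):
    """Return the closure of a set of LR items given as (lhs, rhs, dot) triples."""
    result = set(items)
    worklist = list(result)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        # If the dot is in front of a nonterminal B, add B -> .gamma for every B-production.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def show(item):
    """Render an item with '.' standing in for the dot."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"

for item in sorted(closure([("start", ("expr",), 0)])):
    print(show(item))
```

Starting from start → •expr, the loop first adds the two expr items, and those in turn pull in the two term items.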

Example
expr → expr + term | term
term → term * id | id
Compute closure of start → •expr
Answer:
  start → •expr
  expr → •expr + term
  expr → •term
  term → •term * id
  term → •id

Question
start → expr
expr → expr + term | term
term → term * id | id
Compute closure of expr → expr + •term
Answer:
  expr → expr + •term
  term → •term * id
  term → •id

Question
start → list
list → prefix ;
prefix → prefix , id | id
Compute closure of start → •list
Answer:
  start → •list
  list → •prefix ;
  prefix → •prefix , id
  prefix → •id

Collection of Sets of LR Items with Transitions
start → expr
expr → expr + term | term
term → term * id | id
States (sets of LR items):
  0: start → •expr, expr → •expr+term, expr → •term, term → •term*id, term → •id
  1: start → expr•, expr → expr•+term
  2: expr → term•, term → term•*id
  3: term → id•
  4: expr → expr+•term, term → •term*id, term → •id
  5: term → term*•id
  6: expr → expr+term•, term → term•*id
  7: term → term*id•
Transitions:
  0 on expr to 1; 0 on term to 2; 0 on id to 3
  1 on + to 4
  2 on * to 5
  4 on term to 6; 4 on id to 3
  5 on id to 7
  6 on * to 5
Construct a collection of sets of LR items as follows: Start from State 0, the closure of starting production start → •expr. Then transition on each possible symbol (terminal or nonterminal). For state 0 we can transition on expr, term, and id. The transition from 0 on expr moves the dot beyond expr. It leads to a new state, State 1, with items start → expr• and expr → expr•+term. The (intuitive) meaning is that when the parser is in State 1, one of the following is happening: 1) the parser has seen the entire string of terminals and is ready to reduce expr into start and ACCEPT, or 2) the parser has seen and reduced the expr part in expr + term and it is looking to see + term. The transition from 0 on term leads to State 2 with items expr → term• and term → term•*id. The transition from 0 on id leads to State 3. We continue to construct new states until no more states can be added.
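The construction described above (start from the closure of start → •expr and keep following transitions until no new state appears) can be sketched in Python as a breadth-first search. The item representation, the function names, and the symbol order in SYMBOLS are assumptions of this sketch; the symbol order was picked only so that the state numbers come out the same as on the slide.

```python
from collections import deque

GRAMMAR = {
    "start": [("expr",)],
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("term", "*", "id"), ("id",)],
}
# Order in which transitions are explored (chosen to reproduce the slide's numbering).
SYMBOLS = ["expr", "term", "id", "+", "*"]

def closure(items):
    """Closure of a set of (lhs, rhs, dot) items."""
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then take the closure."""
    moved = [(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol]
    return closure(moved) if moved else None

def build_collection():
    """Return (list of states, {(state number, symbol): state number})."""
    start = closure([("start", ("expr",), 0)])
    states, index, transitions = [start], {start: 0}, {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target is None:
                continue
            if target not in index:  # a set of items not seen before: new state
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            transitions[(index[state], symbol)] = index[target]
    return states, transitions

states, transitions = build_collection()
for (src, symbol), dst in sorted(transitions.items()):
    print(f"{src} --{symbol}--> {dst}")
```

Because states are frozensets of items, two paths that reach the same set of items (for example 0 on id and 4 on id) end up in the same state.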

Example
start → list
list → pre ;
pre → pre , id | id
Construct the collection of sets of LR items with transitions for the above grammar

Lecture Outline
Bottom-up (LR) parsing
  Handles (brief review)
  LR Items
  Characteristic Finite State Machine (CFSM)
  SLR(1) parsing table
  Conflicts in SLR(1)
  LR parsing variants

Sets of LR Items with Transitions
[Figure: the collection of sets of LR items with transitions, states 0 to 7, the same diagram as on the “Collection of Sets of LR Items with Transitions” slide]
The collection of sets of items with transitions is a DFA. This DFA is one part of the Characteristic Finite State Machine (CFSM) of the grammar (we’ll see the other part shortly).
  CFSM states are parsing states
  Transitions on terminals represent shifts
  Transitions on nonterminals represent gotos

[Figure: the collection of sets of LR items with transitions, states 0 to 7]
In states 3 and 7, the parser recognizes the complete handle. In 3, this handle is term → id; in 7, the handle is term → term*id.
  3, 7 contain only items of kind A → α•, i.e., reduce items
  0, 4, 5 contain items of kind A → α•aβ, i.e., shift items
  1, 2, 6 contain both reduce and shift items

Question
start → expr
expr → expr + term | term
term → term * id | id
Assume the parser is in state 2:
  2: expr → term•, term → term•*id
Should it reduce by expr → term, or should it shift * and continue looking for *id?
Answer: It depends on the lookahead! If what comes next is a + or a $$, then reduce. If it is a *, then shift.
This is because + and $$ can FOLLOW expr! But * cannot follow expr.

Question
[Figure: the collection of sets of LR items with transitions, states 0 to 7]
After the parser pops right-hand side term*id off the stack (as it reduces in state 7), what state(s) could end up on top of the stack?
Answer: The parser is either in state 4 or in state 0. Note that a walk on the DFA corresponds to a stack configuration. When the parser is in a state that contains “reduce items”, it may have a handle on top of the stack.

[Figure: the collection of sets of LR items with transitions, states 0 to 7]
  3, 7 contain reduce items (e.g., A → α•). Reduce states.
  0, 4, 5 contain shift items (A → α•aβ). Shift states.
  1, 2, 6 contain both reduce and shift items. Shift-reduce states.

“Reduce by” Labels
start → expr $$
expr → expr + term | term
term → term * id | id
For every state that contains a reduce item A → α•, add label “reduce by A → α on FOLLOW(A)”
For example, we add this label on state 2: reduce by expr → term on $$,+
  2: expr → term•, term → term•*id

Characteristic Finite State Machine (CFSM)
FOLLOW(expr) = { $$, + }
FOLLOW(term) = { $$, +, * }
State 0: start → •expr, expr → •expr+term, expr → •term, term → •term*id, term → •id
  on expr go to 1; on term go to 2; on id go to 3
State 1: start → expr•, expr → expr•+term
  on + go to 4; accept on $$
State 2: expr → term•, term → term•*id
  on * go to 5; reduce by expr → term on $$,+
State 3: term → id•
  reduce by term → id on $$,+,*
State 4: expr → expr+•term, term → •term*id, term → •id
  on term go to 6; on id go to 3
State 5: term → term*•id
  on id go to 7
State 6: expr → expr+term•, term → term•*id
  on * go to 5; reduce by expr → expr+term on $$,+
State 7: term → term*id•
  reduce by term → term*id on $$,+,*
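The FOLLOW sets used in the “reduce by” labels were covered last class; as a reminder, here is a minimal fixed-point sketch in Python. It assumes a grammar without ε-productions (true for the running example), and first_sets and follow_sets are names made up for this sketch.

```python
# Augmented grammar with the end marker, as on the "Reduce by" Labels slide.
GRAMMAR = {
    "start": [("expr", "$$")],
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("term", "*", "id"), ("id",)],
}

def first_sets(grammar):
    """FIRST sets of the nonterminals (assumes no epsilon-productions)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar):
    """FOLLOW sets of the nonterminals (assumes no epsilon-productions)."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]  # sym ends the production
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

follow = follow_sets(GRAMMAR)
print(sorted(follow["expr"]), sorted(follow["term"]))
```

The loop repeats until no set grows, which is the usual way to compute FOLLOW; a grammar with ε-productions would need the extra nullable cases that this sketch leaves out.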

CFSM
CFSM consists of 2 parts
  Collection of sets of LR items with transitions
  “Reduce by” labels
To construct the CFSM for a grammar G
  First, construct the collection of sets of LR items with transitions
  Second, add the “reduce by” labels

From CFSM to SLR(1) Parsing Table
1. expr → expr + term     3. term → term * id
2. expr → term            4. term → id
Columns id, +, *, $$ form the action table; columns expr, term form the goto table.
state   id        +          *          $$         expr   term
0       shift 3                                    1      2
1                 shift 4               accept
2                 reduce 2   shift 5    reduce 2
3                 reduce 4   reduce 4   reduce 4
4       shift 3                                           6
5       shift 7
6                 reduce 1   shift 5    reduce 1
7                 reduce 3   reduce 3   reduce 3
Reduce n means reduce by production n. For example, reduce 3 means “reduce by production term → term * id”
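To see how the parser uses the table, here is a minimal table-driven driver in Python. The ACTION and GOTO dictionaries are transcribed by hand from the CFSM of the running example; the function name parse and the choice to keep only states on the stack (the grammar symbols are implied by the states) are simplifications of this sketch, not the lecture's exact model.

```python
# Numbered productions: number -> (left-hand side, right-hand side).
PRODUCTIONS = {
    1: ("expr", ["expr", "+", "term"]),
    2: ("expr", ["term"]),
    3: ("term", ["term", "*", "id"]),
    4: ("term", ["id"]),
}
# action[state, terminal]; a missing entry means a syntax error.
ACTION = {
    (0, "id"): ("shift", 3),
    (1, "+"): ("shift", 4), (1, "$$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "*"): ("shift", 5), (2, "$$"): ("reduce", 2),
    (3, "+"): ("reduce", 4), (3, "*"): ("reduce", 4), (3, "$$"): ("reduce", 4),
    (4, "id"): ("shift", 3),
    (5, "id"): ("shift", 7),
    (6, "+"): ("reduce", 1), (6, "*"): ("shift", 5), (6, "$$"): ("reduce", 1),
    (7, "+"): ("reduce", 3), (7, "*"): ("reduce", 3), (7, "$$"): ("reduce", 3),
}
# goto[state, nonterminal]
GOTO = {(0, "expr"): 1, (0, "term"): 2, (4, "term"): 6}

def parse(tokens):
    """Return the production numbers in the order they are reduced; raise SyntaxError on bad input."""
    tokens = list(tokens) + ["$$"]
    stack = [0]  # stack of states only
    reductions = []
    pos = 0
    while True:
        state, lookahead = stack[-1], tokens[pos]
        entry = ACTION.get((state, lookahead))
        if entry is None:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, rhs = PRODUCTIONS[arg]
            del stack[len(stack) - len(rhs):]         # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])      # goto on the exposed state
            reductions.append(arg)
        else:  # accept
            return reductions

print(parse(["id", "+", "id", "*", "id"]))
```

For id + id * id the reductions come out in the same order as in the earlier shift-reduce trace: term → id, expr → term, term → id, term → term * id, expr → expr + term.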

SLR(1) Parsing Table
Input: An augmented grammar G’ (G with starting production start → …)
Output: Functions action and goto for G’
Construct C = {I0, I1, …, In}, the collection of sets of LR items with transitions
State i is constructed from Ii. The parsing actions for state i are
  a) If item A → α•aβ is in Ii and there is a transition from Ii to Ij on a, then set action[i,a] to “shift j”
  b) If item A → α• is in Ii, then set action[i,a] to “reduce by A → α” for all terminals a in FOLLOW(A)
  c) If start → …• is in Ii, then set action[i,$$] to “accept”
The goto transitions for state i are constructed for all nonterminals A using the rule: if there is a transition from Ii to Ij on A, set goto[i,A] = j
If the table contains no multiply-defined entries, the grammar is said to be SLR(1)
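Rules a), b), c) and the goto rule can be put together in a short Python sketch. Everything here is illustrative: items are hypothetical (production number, dot) pairs, the FOLLOW sets are copied from the CFSM slide rather than computed, and build_tables records multiply-defined entries in a conflicts list instead of failing.

```python
from collections import deque

PRODUCTIONS = [
    ("start", ("expr",)),              # 0: augmenting production
    ("expr", ("expr", "+", "term")),   # 1
    ("expr", ("term",)),               # 2
    ("term", ("term", "*", "id")),     # 3
    ("term", ("id",)),                 # 4
]
NONTERMINALS = {"start", "expr", "term"}
SYMBOLS = ["expr", "term", "id", "+", "*"]   # exploration order (matches slide numbering)
FOLLOW = {"expr": {"$$", "+"}, "term": {"$$", "+", "*"}}  # taken from the CFSM slide

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = PRODUCTIONS[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(PRODUCTIONS):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(state, sym):
    moved = [(p, d + 1) for p, d in state
             if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == sym]
    return closure(moved) if moved else None

def build_tables():
    start = closure([(0, 0)])
    states, index, queue = [start], {start: 0}, deque([start])
    action, goto_table, conflicts = {}, {}, []

    def set_action(i, a, entry):
        existing = action.setdefault((i, a), entry)  # keep the first entry
        if existing != entry:
            conflicts.append((i, a, existing, entry))  # multiply-defined entry

    while queue:
        state = queue.popleft()
        i = index[state]
        for sym in SYMBOLS:
            target = goto(state, sym)
            if target is None:
                continue
            if target not in index:
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            if sym in NONTERMINALS:
                goto_table[(i, sym)] = index[target]             # goto rule
            else:
                set_action(i, sym, ("shift", index[target]))      # rule a)
        for p, dot in state:
            lhs, rhs = PRODUCTIONS[p]
            if dot == len(rhs):
                if p == 0:
                    set_action(i, "$$", ("accept", None))         # rule c)
                else:
                    for a in FOLLOW[lhs]:
                        set_action(i, a, ("reduce", p))           # rule b)
    return action, goto_table, conflicts

action, goto_table, conflicts = build_tables()
print(len(action), "action entries;", "conflicts:", conflicts)
```

An empty conflicts list corresponds to “no multiply-defined entries”, i.e., the grammar is SLR(1).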

Exercise
start → expr
expr → expr + expr | id
Construct the CFSM for the above grammar
  First, construct the collection of sets of LR items with transitions
  Second, add “reduce by” labels

Lecture Outline
Bottom-up (LR) parsing
  Handles (brief review)
  LR Items
  Characteristic Finite State Machine (CFSM)
  SLR(1) parsing table
  Conflicts in SLR(1)
  LR parsing variants

Conflicts in SLR(1): Shift-reduce
Shift-reduce conflict in state k on a:
  State k contains item A → β• and a is in FOLLOW(A), and
  State k contains item A’ → α•aβ’
The parser does not know whether it is at the end of production A → β and thus should reduce by A → β, or whether it is in the middle of production A’ → αaβ’ and thus should shift a and continue looking for β’

Conflicts in SLR(1): Reduce-reduce
Reduce-reduce conflict in state k on a:
  State k contains item A → β• and a is in FOLLOW(A), and
  State k contains item A’ → β’• and a is in FOLLOW(A’)
The parser does not know whether it is at the end of production A → β and thus should reduce by A → β, or whether it is at the end of production A’ → β’ and thus should reduce by A’ → β’
Usually, a reduce-reduce conflict indicates a serious problem with the grammar

Resolving Conflicts in SLR(1)
In some cases, it makes sense to use a non-SLR(1) grammar (e.g., an ambiguous grammar)
Interestingly, we can still ensure desired behavior by choosing one of the conflicting actions
  E.g., we can resolve a shift-reduce conflict by deterministically choosing the shift or the reduce

Exercise
start → expr
expr → expr + expr | id
Construct the CFSM and SLR(1) parsing table for the above grammar
This grammar is ambiguous and, as expected, we have shift-reduce conflict(s)
Resolve the conflict(s) so that + is left-associative

Question
Recall the id-list grammars we saw earlier
“Top-down” grammar:
  list → id list_tail
  list_tail → , id list_tail | ;
“Bottom-up” grammar:
  list → list_prefix ;
  list_prefix → list_prefix , id
  list_prefix → id
We saw that the top-down grammar is LL(1), but the bottom-up one is not LL(1)
How about SLR(1)?
Both grammars are SLR(1). However, the top-down one is not ideal for LR parsing. Why?

LR Parsing Variants
LR(0), SLR(1), LALR(1), LR(1), … LR(k)
LR(0), SLR(1), LALR(1) are most practical
  Use the same set of parsing states
  Differ in the handling of “shift-reduce” states (states with both a shift item and a reduce item)
LR(0) uses 0 tokens of lookahead
  Disallows shift-reduce states
  Our running example is not LR(0). E.g., state 2:
    2: expr → term•, term → term•*id

LR Parsing Variants
SLR(1) uses 1 token of lookahead
  Resolves (some) shift-reduce states by peeking one token ahead
  Adds labels “reduce by A → β on FOLLOW(A)” to states containing items A → β•. The FOLLOW sets serve as filters
  If, after filtering by FOLLOW, there are no shift-reduce and no reduce-reduce conflicts, then the grammar is SLR(1)
Is our running example SLR(1)? Yes. Filtering by FOLLOW resolves the shift-reduce issue in state 2:
  2: expr → term•, term → term•*id
  reduce by expr → term on $$,+
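The difference between LR(0) and SLR(1) on state 2 can be illustrated with a small Python check. This is a sketch under assumptions: LR(0) is modeled here as “a reduce item applies on every terminal”, SLR(1) as “a reduce item applies only on FOLLOW(A)”, and find_conflicts is a hypothetical helper, not part of any parser generator.

```python
TERMINALS = {"id", "+", "*", "$$"}
FOLLOW = {"expr": {"$$", "+"}, "term": {"$$", "+", "*"}}  # from the CFSM slide

# State 2 of the running example, items as (lhs, rhs, dot).
STATE_2 = [
    ("expr", ("term",), 1),            # expr -> term .      (reduce item)
    ("term", ("term", "*", "id"), 1),  # term -> term . * id (shift item)
]

def find_conflicts(items, reduce_lookaheads):
    """Map each terminal with more than one possible action to those actions.

    reduce_lookaheads(lhs) gives the terminals on which a completed item for
    lhs is allowed to reduce.
    """
    actions = {}
    for lhs, rhs, dot in items:
        if dot == len(rhs):
            for a in reduce_lookaheads(lhs):
                actions.setdefault(a, []).append(f"reduce {lhs} -> {' '.join(rhs)}")
        elif rhs[dot] in TERMINALS:
            actions.setdefault(rhs[dot], []).append("shift")
    return {a: acts for a, acts in actions.items() if len(set(acts)) > 1}

lr0_conflicts = find_conflicts(STATE_2, lambda lhs: TERMINALS)       # no lookahead filter
slr1_conflicts = find_conflicts(STATE_2, lambda lhs: FOLLOW[lhs])    # filter by FOLLOW
print("LR(0):", lr0_conflicts)
print("SLR(1):", slr1_conflicts)
```

Without the filter, state 2 wants to both shift and reduce on *; with the FOLLOW filter the reduce is only offered on $$ and +, so the two actions never compete.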

LR Parsing Variants
LALR(1)
  Uses the same set of states as SLR(1)
  Constructs local, context-sensitive FOLLOW sets and is able to avoid more conflicts
  An efficiency hack
  Most common parsers in practice
LR(1)
  Uses a different set of states
  More states, in order to keep paths disjoint

Hierarchy of Grammar Classes
LL(0) < LL(1) < LL(k), where k>1
LR(0) < SLR(1) < LALR(1) < LR(1) < LR(k)
Also, LL(k) < LR(k)
Question: Why are SLR(1) parsers more powerful than LL(1) ones? What is the intuition?
Answer: LL(1) predicts a production before it has seen the string derived from this production. SLR(1) applies a production (reduce by) after it has seen the entire string!

Group Exercise
Consider the grammar:
  start → expr
  expr → expr + expr | expr * expr | id
Construct the CFSM for this grammar
  First, construct the sets of LR items with transitions
  Second, add “reduce by” labels
Resolve the conflicts in such a way that the operators will behave “normally”:
  + and * are left-associative
  * has higher precedence than +

Next class
We have finished parsing and programming language syntax!
Next class: logic programming and Prolog. Read Chapter 12.
