Published by祥刑 赵 Modified over 5 years ago
1
Announcements
- HW2 due on Tuesday
Fall 18 CSCI 4430, A Milanova
2
Last Class
- Top-down (LL) parsing
  - LL(1) parsing tables, FIRST, FOLLOW and PREDICT sets
  - Writing an LL(1) grammar
- Bottom-up (LR) parsing
  - Intro with example
3
Today’s Lecture Outline
- Bottom-up (LR) parsing
  - Handles
  - LR Items
  - Characteristic Finite State Machine (CFSM)
  - SLR(1) parsing tables
  - Conflicts in SLR(1)
  - LR parsing variants
4
Programming Language Syntax Bottom-up Parsing
Read: Scott, Chapter 2.3.3
5
Bottom-up Parsing
- Also called LR parsing
- LR parsers work with LR(k) grammars
  - L stands for “left-to-right” scan of input
  - R stands for “rightmost” derivation
  - k stands for “need k tokens of lookahead”
- We are interested in LR(0) and LR(1) and variants in between
- LR parsing is better than LL parsing!
  - Accepts larger class of languages
  - Just as efficient!
6
Model of the LR Parser
- Stack: holds the part of the input seen so far
  - A string of both terminals and nonterminals
- Input: holds the remaining part of the input
  - A string of terminals
- Parser performs two actions
  - Reduce: parser pops a “suitable” production right-hand side off the top of the stack, and pushes the production’s left-hand side on the stack
  - Shift: parser pushes the next terminal from the input on top of the stack
7
id + id*id
  expr → expr + term | term
  term → term * id | id

Stack      Input       Action
           id+id*id    shift id
id         +id*id      reduce by term → id
term       +id*id      reduce by expr → term
expr       +id*id      shift +
expr+      id*id       shift id
expr+id    *id         reduce by term → id
8
id + id*id
  expr → expr + term | term
  term → term * id | id

Stack           Input   Action
expr+term       *id     shift *
expr+term*      id      shift id
expr+term*id            reduce by term → term*id
expr+term               reduce by expr → expr+term
expr                    accept, SUCCESS
9
id + id*id
  expr → expr + term | term
  term → term * id | id
Sequence of reductions performed by parser:
  id+id*id
  term+id*id
  expr+id*id
  expr+term*id
  expr+term
  expr
- A rightmost derivation in reverse
- The stack (e.g., expr) concatenated with the remaining input (e.g., +id*id) gives a sentential form (expr+id*id) in the rightmost derivation. Such forms are called right sentential forms.
10
Handle
Notation: A, S are nonterminals. α, β are arbitrary sequences of terminals and nonterminals. w is a string of terminals.
- A handle: consider a rightmost derivation S ⇒ … ⇒ αAw ⇒ αβw. We say that A → β at position α is a handle of αβw.
- Recall our example id+id*id, with grammar
    expr → expr + term | term
    term → term * id | id
  - Stack: expr+term, Input: *id. Is expr → expr+term at position ε a handle of expr+term*id?
  - Stack: expr+term*id, Input: empty. Is term → id at position expr+term* a handle of expr+term*id?
- How does the parser know that it has a “handle” on top of the stack?
Notes: Let us begin with the definition of the handle. The parser starts with the string of terminals and repeatedly “reduces” this string until it either ends up at the starting nonterminal (a successful parse) or terminates with an error along the way. The handle is a production A → β which reduces the sentential form αβw back into another sentential form, αAw, where αAw is a valid sentential form in some rightmost derivation (i.e., there is a rightmost derivation which starts at S and derives αAw). If a production A → β reduces the sentential form αβw into a form which is not derivable in a rightmost derivation, then A → β is not a handle of αβw at position α.
11
Question
  expr → expr + term | term
  term → term * id | id
Consider id*id*id
  Stack: term    Input: *id*id
Is expr → term at position ε a handle of term*id*id?
Answer: No! It brings sentential form term*id*id into expr*id*id, which is NOT derivable from expr!
12
Question
  expr → expr + term | term
  term → term * id | id
How about
  Stack: term*id    Input: *id
Is term → term*id at position ε a handle of term*id*id?
Answer: Yes! It brings sentential form term*id*id into term*id, which is clearly derivable: expr ⇒ term ⇒ term*id
13
id + id*id
  expr → expr + term | term
  term → term * id | id

Stack       Input       Action
0           id+id*id    On state 0 and id, action[0,id] = shift 3
0 id 3      +id*id      On 3 and +, action[3,+] = reduce by term → id. Pop 3 and id, push term.
0 term      +id*id      On 0 and term, goto[0,term] = 2
0 term 2    +id*id      On 2 and +, action[2,+] = … etcetera
Notes: How does the parser work? The states keep track of the configuration and help the parser recognize when it has a handle on top of the stack (and thus must reduce, as it did for example on state 3 and lookahead +) and when it does not have a handle and must continue to look for it.
14
Model of the LR parser
- Input: a1 … ai … an $$
- Stack (top first): sm Xm sm-1 Xm-1 …, where each s is a state and each X is a grammar symbol
- Parsing table, consulted by the LR parser:
  - action[s,a]: Do we shift or reduce?
  - goto[s,A]: After reduction to nonterminal A, what state is pushed on top of the stack?
Notes: In fact, the stack does not consist of grammar symbols only (as in the simplified parsing examples); it consists of grammar symbols interspersed with integers which denote the parsing states. The states keep track of the configuration (stack + input) and help the parser recognize when it has a handle on top of the stack (and needs to pop and reduce), and when it does not have a handle (i.e., it needs to keep shifting).
15
Lecture Outline
- Bottom-up (LR) parsing
  - Handles
  - LR Items
  - Characteristic Finite State Machine (CFSM)
  - SLR(1) parsing table
  - Conflicts in SLR(1)
  - LR Parsing variants
Notes: We will now see how to construct the parsing table. How do we figure out what the states are, when to shift, when to reduce, and where to goto? We begin with the discussion of LR items.
16
LR Items
  start → expr
  expr → expr + term | term
  term → term * id | id
- An LR item is a production with a dot at some position on the right-hand side, e.g., A → α•β
  - We are trying to find an A
  - We have already seen α (it is on top of the stack)
  - We are looking for β
- Group related LR items into sets. Sets correspond to parsing states: state 0, 1, etc.
  state 0:
    start → •expr
    expr → •expr+term
    expr → •term
    term → •term*id
    term → •id
  state 1 (transition from state 0 on expr):
    start → expr•
    expr → expr•+term
Notes: First, we augment the original expression grammar with production start → expr. The LR items help us figure out when we have a handle at the top of the stack. State 0 represents the beginning of the parse. At the beginning of the parse, we have an empty stack and we are at the beginning of the production for starting symbol start. We represent our location with LR item start → •expr. This means we have seen nothing yet, and we are looking to see expr. Note, however, that since the • is right in front of nonterminal expr, we are about to see either expr+term or term. I.e., we must first see one of these right-hand sides and reduce it into expr. This is handled by taking the closure of LR item start → •expr.
17
Closure of an LR Item
The closure of an LR item A → α•β is the set of LR items formed as follows:
- A → α•β is in the closure of A → α•β
- If the dot is in front of a nonterminal B for some item in the closure, then all of B → •γ1, B → •γ2, …, B → •γn are in the closure (B → γ1, B → γ2, …, B → γn are all the productions for B)
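The closure rule can be sketched as a small worklist computation. This is a minimal illustration, not part of the lecture: the representation of an item as a pair (production index, dot position) and the names GRAMMAR, closure and show are assumptions made for this example.

```python
# Expression grammar from the slides, augmented with start -> expr.
# Each production is (left-hand side, right-hand side); the index is its number.
GRAMMAR = [
    ("start", ("expr",)),
    ("expr", ("expr", "+", "term")),
    ("expr", ("term",)),
    ("term", ("term", "*", "id")),
    ("term", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return the closure of a set of LR items.

    An item is a pair (production index, dot position); this encoding is
    an assumption of the sketch, not notation from the slides.
    """
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # The dot is in front of nonterminal B: add B -> •gamma
            # for every production of B.
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return result


def show(item):
    """Render an item in the slides' dot notation."""
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    parts = list(rhs[:dot]) + ["•"] + list(rhs[dot:])
    return f"{lhs} -> {' '.join(parts)}"


if __name__ == "__main__":
    for item in sorted(closure({(0, 0)})):
        print(show(item))
```

Run as a script, this prints the five items of the closure of start → •expr. Calling closure({(1, 2)}) gives the closure of expr → expr + •term.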
18
Example
  expr → expr + term | term
  term → term * id | id
Compute closure of start → •expr
Answer:
  start → •expr
  expr → •expr + term
  expr → •term
  term → •term * id
  term → •id
19
Question
  start → expr
  expr → expr + term | term
  term → term * id | id
Compute closure of expr → expr + •term
Answer:
  expr → expr + •term
  term → •term * id
  term → •id
20
Question
  start → list
  list → prefix ;
  prefix → prefix , id | id
Compute closure of start → •list
Answer:
  start → •list
  list → •prefix ;
  prefix → •prefix , id
  prefix → •id
21
Collection of Sets of LR Items with Transitions
  start → expr
  expr → expr + term | term
  term → term * id | id

  state 0: start → •expr, expr → •expr+term, expr → •term, term → •term*id, term → •id
  state 1: start → expr•, expr → expr•+term
  state 2: expr → term•, term → term•*id
  state 3: term → id•
  state 4: expr → expr+•term, term → •term*id, term → •id
  state 5: term → term*•id
  state 6: expr → expr+term•, term → term•*id
  state 7: term → term*id•

  Transitions:
    0 on expr: 1    0 on term: 2    0 on id: 3
    1 on +: 4       2 on *: 5
    4 on term: 6    4 on id: 3
    5 on id: 7      6 on *: 5
Notes: Construct a collection of sets of LR items as follows. Start from state 0, the closure of starting production start → •expr. Then transition on each possible symbol (terminal or nonterminal). For state 0 we can transition on expr, term, and id. The transition from 0 on expr moves the dot beyond expr. It leads to a new state, state 1, with items start → expr• and expr → expr•+term. The (intuitive) meaning is that when the parser is in state 1, one of the following is happening: 1) the parser has seen the entire string of terminals and is ready to reduce expr into start and ACCEPT, or 2) the parser has seen and reduced the expr part in expr + term and it is looking to see + term. The transition from 0 on term leads to state 2 with items expr → term• and term → term•*id. The transition from 0 on id leads to state 3. We continue to construct new states until no more states can be added.
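The construction described in the notes can be sketched as a worklist over states. This is an illustrative sketch under assumptions: items are encoded as (production index, dot position), and the names GRAMMAR, SYMBOL_ORDER, closure, goto and build_states are hypothetical. The symbol order is chosen only so that the resulting state numbers match the slide.

```python
GRAMMAR = [
    ("start", ("expr",)),
    ("expr", ("expr", "+", "term")),
    ("expr", ("term",)),
    ("term", ("term", "*", "id")),
    ("term", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# Order in which transitions are explored (an assumption, picked so that the
# numbering below agrees with states 0-7 on the slide).
SYMBOL_ORDER = ["expr", "term", "id", "+", "*"]


def closure(items):
    """Closure of a set of items, each item being (production index, dot)."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def build_states():
    """Return (states, transitions) for the grammar above."""
    states = [closure({(0, 0)})]          # state 0
    transitions = {}                      # (state number, symbol) -> state number
    index = 0
    while index < len(states):            # states is extended as we go
        state = states[index]
        after_dot = {GRAMMAR[p][1][d] for p, d in state
                     if d < len(GRAMMAR[p][1])}
        for symbol in SYMBOL_ORDER:
            if symbol in after_dot:
                target = goto(state, symbol)
                if target not in states:
                    states.append(target)
                transitions[(index, symbol)] = states.index(target)
        index += 1
    return states, transitions
```

For the expression grammar, build_states() produces eight states, and its transitions agree with the list on the slide, for example (0, "id") leads to 3 and (6, "*") leads to 5.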
22
Example
  start → list
  list → pre ;
  pre → pre , id | id
Construct the collection of sets of LR items with transitions for the above grammar
23
Lecture Outline
- Bottom-up (LR) parsing
  - Handles (brief review)
  - LR Items
  - Characteristic Finite State Machine (CFSM)
  - SLR(1) parsing table
  - Conflicts in SLR(1)
  - LR Parsing variants
24
Sets of LR Items with Transitions
[Figure: the collection of sets of LR items with transitions for the expression grammar, states 0 to 7]
- The collection of sets of items with transitions is a DFA
- This DFA is one part of the Characteristic Finite State Machine (CFSM) of the grammar (we’ll see the other part shortly)
- CFSM states are parsing states
- Transitions on terminals represent shifts
- Transitions on nonterminals represent gotos
25
Sets of LR Items with Transitions
[Figure: the collection of sets of LR items with transitions for the expression grammar, states 0 to 7]
- States 3 and 7 contain only items of kind A → α•, i.e., reduce items
- States 0, 4 and 5 contain items of kind A → α•aβ, i.e., shift items
- States 1, 2 and 6 contain both reduce and shift items
Notes: In states 3 and 7, the parser recognizes the complete handle. In 3, this handle is term → id; in 7, the handle is term → term*id.
26
Question
  start → expr
  expr → expr + term | term
  term → term * id | id
  state 2: expr → term•, term → term•*id
Assume the parser is in state 2: should it reduce by expr → term, or should it shift * and continue to look for *id?
Answer: It depends on the lookahead! If what comes next is a + or a $$, then reduce. If it is a *, then shift.
This is because + and $$ can FOLLOW expr! But * cannot follow expr.
27
Question
[Figure: the collection of sets of LR items with transitions for the expression grammar, states 0 to 7]
After the parser pops right-hand side term*id off the stack (as it reduces in state 7), what state(s) could end up on top of the stack?
Answer: The parser is either in state 4 or in state 0.
Notes: A walk on the DFA corresponds to a stack configuration. When the parser is in a state that contains “reduce items”, it may have a handle on top of the stack.
28
Sets of LR Items with Transitions
[Figure: the collection of sets of LR items with transitions for the expression grammar, states 0 to 7]
- States 3 and 7 contain reduce items (e.g., A → α•). Reduce states.
- States 0, 4 and 5 contain shift items (A → α•aβ). Shift states.
- States 1, 2 and 6 contain both reduce and shift items. Shift-reduce states.
Notes: In states 3 and 7, the parser recognizes the complete handle. In 3, this handle is term → id; in 7, the handle is term → term*id.
29
“Reduce by” Labels
  start → expr $$
  expr → expr + term | term
  term → term * id | id
- For every state that contains a reduce item A → α•, add label “reduce by A → α on FOLLOW(A)”
- For example, we add a label on state 2 (expr → term•, term → term•*id): reduce by expr → term on $$, +
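The labels depend on FOLLOW sets, which were covered in the previous class. As a reminder, here is a minimal fixed-point sketch for this grammar. It assumes the grammar has no ε-productions (true for the running example, but not in general), and the names GRAMMAR, first_sets and follow_sets are chosen for this illustration only.

```python
# Grammar from this slide, with the end marker $$ in the start production.
GRAMMAR = [
    ("start", ("expr", "$$")),
    ("expr", ("expr", "+", "term")),
    ("expr", ("term",)),
    ("term", ("term", "*", "id")),
    ("term", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal, assuming no epsilon productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """FOLLOW for each nonterminal, assuming no epsilon productions."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    # Whatever can start the next symbol can follow this one.
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # Last symbol of the production: inherit FOLLOW(lhs).
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow
```

For the grammar above, follow_sets() gives FOLLOW(expr) = { $$, + } and FOLLOW(term) = { $$, +, * }, which is why state 2 is labeled “reduce by expr → term on $$, +”.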
30
Characteristic Finite State Machine (CFSM)
  FOLLOW(expr) = { $$, + }
  FOLLOW(term) = { $$, +, * }

  state 0: start → •expr, expr → •expr+term, expr → •term, term → •term*id, term → •id
  state 1: start → expr•, expr → expr•+term
           accept on $$
  state 2: expr → term•, term → term•*id
           reduce by expr → term on $$, +
  state 3: term → id•
           reduce by term → id on $$, +, *
  state 4: expr → expr+•term, term → •term*id, term → •id
  state 5: term → term*•id
  state 6: expr → expr+term•, term → term•*id
           reduce by expr → expr+term on $$, +
  state 7: term → term*id•
           reduce by term → term*id on $$, +, *

  Transitions:
    0 on expr: 1    0 on term: 2    0 on id: 3
    1 on +: 4       2 on *: 5
    4 on term: 6    4 on id: 3
    5 on id: 7      6 on *: 5
31
CFSM
- CFSM consists of 2 parts
  - Collection of sets of LR items with transitions
  - “Reduce by” labels
- To construct the CFSM for a grammar G
  - First, construct the collection of sets of LR items with transitions
  - Second, add the “reduce by” labels
32
From CFSM to SLR(1) Parsing Table
  1. expr → expr + term
  2. expr → term
  3. term → term * id
  4. term → id

         action table                                  goto table
state    id        +          *          $$            expr   term
0        shift 3                                       1      2
1                  shift 4               accept
2                  reduce 2   shift 5    reduce 2
3                  reduce 4   reduce 4   reduce 4
4        shift 3                                              6
5        shift 7
6                  reduce 1   shift 5    reduce 1
7                  reduce 3   reduce 3   reduce 3

Reduce n means reduce by production n. For example, reduce 3 means “reduce by production term → term * id”.
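A table-driven parser for this table can be sketched in a few lines. The sketch below is illustrative only: the names PRODUCTIONS, ACTION, GOTO and parse are assumptions, and unlike the stack pictured in the model of the LR parser it keeps only states on the stack, since the grammar symbols are implied by the states.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {
    1: ("expr", 3),  # expr -> expr + term
    2: ("expr", 1),  # expr -> term
    3: ("term", 3),  # term -> term * id
    4: ("term", 1),  # term -> id
}
# The action and goto tables, transcribed from the slide.
ACTION = {
    (0, "id"): ("shift", 3),
    (1, "+"): ("shift", 4), (1, "$$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "*"): ("shift", 5), (2, "$$"): ("reduce", 2),
    (3, "+"): ("reduce", 4), (3, "*"): ("reduce", 4), (3, "$$"): ("reduce", 4),
    (4, "id"): ("shift", 3),
    (5, "id"): ("shift", 7),
    (6, "+"): ("reduce", 1), (6, "*"): ("shift", 5), (6, "$$"): ("reduce", 1),
    (7, "+"): ("reduce", 3), (7, "*"): ("reduce", 3), (7, "$$"): ("reduce", 3),
}
GOTO = {(0, "expr"): 1, (0, "term"): 2, (4, "term"): 6}


def parse(tokens):
    """Return the production numbers in the order they are reduced.

    Raises SyntaxError when the table has no entry for (state, lookahead).
    """
    tokens = list(tokens) + ["$$"]   # append the end marker
    stack = [0]                      # stack of states; state 0 at the bottom
    pos = 0
    reductions = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        if (state, lookahead) not in ACTION:
            raise SyntaxError(f"unexpected {lookahead!r} in state {state}")
        kind, arg = ACTION[(state, lookahead)]
        if kind == "shift":
            stack.append(arg)        # push the new state, consume the token
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]      # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        else:                        # accept
            return reductions
```

For the tokens id + id * id, parse returns [4, 2, 4, 3, 1]: term → id, expr → term, term → id, term → term * id, expr → expr + term. This is the sequence of reductions from the earlier trace, i.e., a rightmost derivation in reverse.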
33
SLR(1) Parsing Table
Input: An augmented grammar G’ (G with starting production start → …)
Output: Functions action and goto for G’
- Construct C = {I0, I1, …, In}, the collection of sets of LR items with transitions
- State i is constructed from Ii. The parsing actions for state i are
  a) If item A → α•aβ is in Ii and there is a transition from Ii to Ij on a, then set action[i,a] to “shift j”
  b) If item A → α• is in Ii, then set action[i,a] to “reduce by A → α” for all terminals a in FOLLOW(A)
  c) If start → …• is in Ii, then set action[i,$$] to “accept”
- The goto transitions for state i are constructed for all nonterminals A using the rule: if there is a transition from Ii to Ij on A, set goto[i,A] = j
- If the table contains no multiply-defined entries, the grammar is said to be SLR(1)
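The steps above can be sketched end to end as follows. This is a compact illustration under stated assumptions, not a reference implementation: the grammar has no ε-productions, production 0 is the augmenting production start → S (with $$ handled as the end marker rather than written in the production), and the function name build_slr_table and the encoding of items as (production index, dot) are hypothetical. State numbers come from the exploration order used here, so they need not match the numbering on the slides. Each action entry is kept as a set so that multiply-defined entries show up directly.

```python
from collections import defaultdict


def build_slr_table(grammar):
    """Return (action, goto, conflicts) for a list of (lhs, rhs) productions."""
    nonterminals = {lhs for lhs, _ in grammar}

    def closure(items):
        result, work = set(items), list(items)
        while work:
            p, d = work.pop()
            rhs = grammar[p][1]
            if d < len(rhs) and rhs[d] in nonterminals:
                for i, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[d] and (i, 0) not in result:
                        result.add((i, 0))
                        work.append((i, 0))
        return frozenset(result)

    # Step 1: collection of sets of LR items with transitions.
    states, trans = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):
        symbols = {grammar[p][1][d] for p, d in states[i]
                   if d < len(grammar[p][1])}
        for sym in sorted(symbols):
            target = closure({(p, d + 1) for p, d in states[i]
                              if d < len(grammar[p][1]) and grammar[p][1][d] == sym})
            if target not in states:
                states.append(target)
            trans[(i, sym)] = states.index(target)
        i += 1

    # FIRST and FOLLOW by fixed-point iteration (no epsilon productions).
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}
    follow[grammar[0][0]].add("$$")

    def starts(sym):
        return first[sym] if sym in nonterminals else {sym}

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            updates = [(first[lhs], starts(rhs[0]))]
            for k, sym in enumerate(rhs):
                if sym in nonterminals:
                    src = starts(rhs[k + 1]) if k + 1 < len(rhs) else follow[lhs]
                    updates.append((follow[sym], src))
            for dst, src in updates:
                if not src <= dst:
                    dst |= src          # in-place update of the stored set
                    changed = True

    # Steps 2 and 3: fill in action and goto.
    action, goto = defaultdict(set), {}
    for (i, sym), j in trans.items():
        if sym in nonterminals:
            goto[(i, sym)] = j
        else:
            action[(i, sym)].add(("shift", j))
    for i, state in enumerate(states):
        for p, d in state:
            lhs, rhs = grammar[p]
            if d == len(rhs):           # reduce item A -> alpha •
                if p == 0:
                    action[(i, "$$")].add(("accept", None))
                else:
                    for a in follow[lhs]:
                        action[(i, a)].add(("reduce", p))
    conflicts = {key: acts for key, acts in action.items() if len(acts) > 1}
    return dict(action), goto, conflicts
```

With the running expression grammar, conflicts comes back empty, so the grammar is SLR(1). With the ambiguous grammar start → expr, expr → expr + expr | id, one entry is multiply defined: in the state containing expr → expr+expr• and expr → expr•+expr, lookahead + allows both a shift and a reduce.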
34
Exercise
  start → expr
  expr → expr + expr | id
Construct the CFSM for the above grammar
- First, construct the collection of sets of LR items with transitions
- Second, add “reduce by” labels
35
Lecture Outline
- Bottom-up (LR) parsing
  - Handles (brief review)
  - LR Items
  - Characteristic Finite State Machine (CFSM)
  - SLR(1) parsing table
  - Conflicts in SLR(1)
  - LR Parsing variants
36
Conflicts in SLR(1): Shift-reduce
- Shift-reduce conflict in state k on a: state k contains item A → β• and a is in FOLLOW(A), and state k contains item A’ → α•aβ’
- The parser does not know whether it is at the end of production A → β and thus should reduce by A → β, or it is in the middle of production A’ → αaβ’ and thus should shift a and continue looking for β’
37
Conflicts in SLR(1): Reduce-reduce
- Reduce-reduce conflict in state k on a: state k contains item A → β• and a is in FOLLOW(A), and state k contains item A’ → β’• and a is in FOLLOW(A’)
- The parser does not know whether it is at the end of production A → β and thus should reduce by A → β, or whether it is at the end of production A’ → β’ and thus should reduce by A’ → β’
- Usually, a reduce-reduce conflict indicates a serious problem with the grammar
38
Resolving Conflicts in SLR(1)
- In some cases, it makes sense to use a non-SLR(1) grammar (e.g., an ambiguous grammar)
- Interestingly, we can still ensure the desired behavior by choosing one of the conflicting actions
  - E.g., we can resolve a shift-reduce conflict by deterministically choosing the shift or the reduce
39
Exercise
  start → expr
  expr → expr + expr | id
Construct the CFSM and SLR(1) parsing table for the above grammar
- This grammar is ambiguous and, as expected, we have shift-reduce conflict(s)
- Resolve the conflict(s) so that + is left-associative
40
Question
Recall the id-list grammars we saw earlier
  “Top-down” grammar:
    list → id list_tail
    list_tail → , id list_tail | ;
  “Bottom-up” grammar:
    list → list_prefix ;
    list_prefix → list_prefix , id
    list_prefix → id
- We saw that the top-down grammar is LL(1), but the bottom-up one is not LL(1)
- How about SLR(1)? Both grammars are SLR(1). However, the top-down one is not ideal for LR parsing. Why?
41
LR Parsing Variants
- LR(0), SLR(1), LALR(1), LR(1), … LR(k)
- LR(0), SLR(1), LALR(1) are most practical
  - Use the same set of parsing states
  - Differ in the handling of “shift-reduce” states (states with both a shift item and a reduce item)
- LR(0) uses 0 tokens of lookahead
  - Disallows shift-reduce states
  - Our running example is not LR(0). E.g., state 2: expr → term•, term → term•*id
42
LR Parsing Variants
- SLR(1): uses 1 token of lookahead
  - Resolves (some) shift-reduce states by peeking one token ahead
  - Adds labels “reduce by A → β on FOLLOW(A)” to states containing items A → β•. The FOLLOW sets serve as filters
  - If, after filtering by FOLLOW, there are no shift-reduce and no reduce-reduce conflicts, then the grammar is SLR(1)
- Is our running example SLR(1)? Yes. Filtering by FOLLOW resolves the shift-reduce issue in state 2 (expr → term•, term → term•*id): reduce by expr → term on $$, +
43
Parsing Variants
- LALR(1)
  - Uses the same set of states as SLR(1)
  - Constructs local, context-sensitive FOLLOW sets and is able to avoid more conflicts
  - An efficiency hack
  - Most common parsers in practice
- LR(1)
  - Uses a different set of states
  - More states, in order to keep paths disjoint
44
Hierarchy of Grammar Classes
- LL(0) < LL(1) < LL(k), where k > 1
- LR(0) < SLR(1) < LALR(1) < LR(1) < LR(k)
- Also, LL(k) < LR(k)
Question: SLR(1) parsers are more powerful than LL(1) ones? Why, what is the intuition?
Answer: LL(1) predicts the production before it has seen the string derived from this production. SLR applies the production (reduce by) after it has seen the entire string!
45
Group Exercise
Consider the grammar:
  start → expr
  expr → expr + expr | expr * expr | id
- Construct the CFSM for this grammar
  - First, construct the sets of LR items with transitions
  - Second, add “reduce by” labels
- Resolve the conflicts in such a way that the operators will behave “normally”:
  - + and * are left-associative
  - * has higher precedence than +
46
Next class
- We’ve concluded with parsing and programming language syntax!
- Next class: logic programming and Prolog. Read Chapter 12.