Published by Posy Powell. Modified over 9 years ago
1
Chap. 6, Bottom-Up Parsing
J. H. Wang
May 17, 2011
2
Outline
–Overview
–Shift-Reduce Parsers
–LR(0) Table Construction
–Conflict Diagnosis
–Conflict Resolution and Table Construction
3
Overview
Problems in top-down parsers
–Left recursion
–Common prefixes
–(Fig. 5.12 vs. Fig. 5.16)
Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically
6
A bottom-up parser begins with the parse tree's leaves and moves toward its root
A bottom-up parser traces a rightmost derivation in reverse
A bottom-up parser uses a grammar rule to replace the rule's RHS with its LHS (Fig. 4.5 & Fig. 4.6)
9
Bottom-up: from terminal symbols to the goal symbol
Shift-reduce: the two most prevalent actions
– Shift symbols onto the parse stack
– Reduce a string of symbols on top of the stack to a nonterminal
LR(k): scan the input from the Left, producing a Rightmost derivation in reverse, using k symbols of lookahead
LR parsers are more general than LL parsers
–Yacc: an LR parser generator (it builds LALR(1) tables)
10
Shift-Reduce Parsers
LR parsers and rightmost derivations
–LR parsers construct rightmost derivations in reverse
–Fig. 6.2
LR parsing as knitting
–How the RHS of a production is found
–Fig. 6.1
13
In Fig. 6.1:
–Right needle: unprocessed portion of the string
–Left needle: parser's stack (processed portion)
Operations
– Shift: transfers a symbol from the right needle to the left needle
– Reduction: the symbols $\alpha$ at the top of the parse stack (left needle) that form the RHS of a production $A \to \alpha$ are replaced by $A$
–(Fig. 6.1)
14
LR Parsing Engine
A simple driver for a shift-reduce parser
–Fig. 6.3
–Driven by a table (Sec. 6.2.4)
–Indexed by the parser's current state and the next input symbol
The current state is kept on the parse stack
–Shift and reduce actions are performed until
 Accepted: the input is reduced to the goal symbol
 Error: no valid action is found
15
[Fig. 6.3: shift-reduce driver using the operations PUSH, PEEK, POP, ADVANCE, PREPEND, and ERROR]
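A minimal sketch of such a table-driven driver, assuming a small hypothetical grammar and an LR(0) table worked out by hand (it is not the table of Fig. 6.5). The names `RULES`, `SHIFT`, `REDUCE`, and `lr_parse` are illustrative only. As with the PREPEND operation named in Fig. 6.3, a reduction pops the handle and pushes the LHS back onto the front of the input, so the next step shifts it.

```python
# Hypothetical grammar (not from the slides), numbered rules:
#   0: S -> E $      1: E -> E plus num      2: E -> num
RULES = {
    0: ("S", ("E", "$")),
    1: ("E", ("E", "plus", "num")),
    2: ("E", ("num",)),
}

# Hand-built LR(0) table for that grammar.
# SHIFT[state][symbol] -> next state (terminals and nonterminals alike).
SHIFT = {0: {"num": 2, "E": 1}, 1: {"$": 3, "plus": 4}, 4: {"num": 5}}
# REDUCE[state] -> rule to reduce by; LR(0) needs no lookahead to decide.
REDUCE = {2: 2, 3: 0, 5: 1}


def lr_parse(tokens):
    """Return the rules applied: a rightmost derivation in reverse."""
    stack = [0]              # parse stack of states
    stream = list(tokens)    # input; reductions prepend their LHS here
    applied = []
    while True:
        state = stack[-1]                      # PEEK at the current state
        if state in REDUCE:
            rule = REDUCE[state]
            lhs, rhs = RULES[rule]
            applied.append(rule)
            if rule == 0:                      # reduced to the goal symbol
                return applied                 # accept
            del stack[-len(rhs):]              # POP one state per RHS symbol
            stream.insert(0, lhs)              # PREPEND the LHS to the input
        else:
            sym = stream[0] if stream else None
            nxt = SHIFT.get(state, {}).get(sym)
            if nxt is None:                    # blank table entry
                raise SyntaxError(f"unexpected {sym!r} in state {state}")
            stack.append(nxt)                  # PUSH the new state
            stream.pop(0)                      # ADVANCE past the symbol


print(lr_parse(["num", "plus", "num", "$"]))  # [2, 1, 0]
```

Reading the result backwards (0, 1, 2) gives the rightmost derivation S ⇒ E $ ⇒ E plus num $ ⇒ num plus num $.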
16
LR Parse Table
Given a sentential form, the handle is the sequence of symbols that will next be replaced by a reduction
–How to identify the handle
–Which production to employ
–(Fig. 6.4 & Fig. 6.5)
19
In Fig. 6.5:
–[s]: shift to state s
–r: reduce by rule r
–Blank: error action
A bottom-up parse of “a b b d c $”
–Fig. 6.6 & Fig. 6.7
–A rightmost derivation in reverse
–Shift actions are implied by the inability to perform a useful reduction
Tokens are shifted until a handle appears
22
LR(k) Parsing
The concept of LR parsing was introduced by Knuth in 1965
LR(k)
–k: the number of lookahead symbols used in constructing the parse table
–LR(0) and LR(1) parsers both use one symbol of lookahead at parse time
–Number of columns in the parse table: $n^k$, where $n$ is the number of terminal symbols
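A quick check of the $n^k$ column count: each length-k string of terminals is a potential lookahead, so each is a potential column. The alphabet below is a hypothetical three-symbol example, and `lookahead_columns` is an illustrative helper.

```python
from itertools import product


def lookahead_columns(terminals, k):
    """Enumerate every length-k string over the terminals (one potential column each)."""
    return list(product(sorted(terminals), repeat=k))


terminals = {"a", "b", "$"}          # hypothetical alphabet, n = 3
for k in (1, 2, 3):
    print(k, len(lookahead_columns(terminals, k)))   # column counts 3, 9, 27 (n**k)
```

Some of these strings, such as one with a terminal after `$`, can never actually occur as lookahead, so a practical table may omit them; the growth is still exponential in k.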
23
Properties of LR(k) parsers
–Symbols are shifted and lookahead is examined until the end of the handle is found
–The handle is reduced to a nonterminal
–Whether to shift or reduce is determined by the symbols already shifted (left context) and the next k lookahead symbols (right context)
A grammar is LR(k) iff it is possible to construct an LR parse table such that k tokens of lookahead allow the parser to recognize exactly the strings in the grammar's language
–Deterministic: each cell in the LR parse table contains at most one entry
24
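Formal Definition of LR(k) Grammars
A grammar is LR(k) iff the following conditions imply $\alpha A y = \gamma B x$:
–$S \Rightarrow^*_{rm} \alpha A w \Rightarrow_{rm} \alpha \beta w$
–$S \Rightarrow^*_{rm} \gamma B x \Rightarrow_{rm} \alpha \beta y$
–$\mathrm{First}_k(w) = \mathrm{First}_k(y)$
LR(k) parsers can always determine the correct reduction ($A \to \beta$) given
–The left context ($\alpha\beta$) up to the end of the handle
–The next k symbols ($\mathrm{First}_k(w)$) of the input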
25
LR(0) Table Construction
(Fig. 6.2) –$E \to \mathit{plus}\ E\ E$
LR(0) item
–A grammar production with a bookmark that indicates the current progress through the production's RHS
 Fresh: $E \to \bullet\ \mathit{plus}\ E\ E$
 Reducible: $E \to \mathit{plus}\ E\ E\ \bullet$
–(Fig. 6.8)
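An LR(0) item can be modeled as a production plus a dot position. This is a sketch; the class `Item` and its method names are illustrative, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An LR(0) item: a production plus a bookmark (dot) position."""
    lhs: str
    rhs: tuple
    dot: int = 0

    def is_fresh(self):
        return self.dot == 0

    def is_reducible(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol just after the dot, or None if the item is reducible."""
        return None if self.is_reducible() else self.rhs[self.dot]

    def advance(self):
        """Move the bookmark past the next symbol."""
        if self.is_reducible():
            raise ValueError("cannot advance a reducible item")
        return Item(self.lhs, self.rhs, self.dot + 1)

    def __str__(self):
        syms = list(self.rhs)
        syms.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(syms)}"


fresh = Item("E", ("plus", "E", "E"))
print(fresh)                                   # E -> • plus E E
print(fresh.advance().advance().advance())     # E -> plus E E •
```

Freezing the dataclass makes items hashable, so a parser state can be stored as a `frozenset` of items.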
27
Parser state: a set of LR(0) items
LR(0) construction algorithm
–Fig. 6.9 & Fig. 6.10
–ComputeGoto
 Closure of state s
 Transitions from s
–E.g.: Fig. 6.11
 Kernel of state s
 A DFA called the CFSM (characteristic finite-state machine)
28
[Figure: LR(0) construction pseudocode (AddState, AdvanceDot, ProductionsFor, ExtractElement, ComputeGoto)]
29
[Figure: ComputeGoto and Closure pseudocode (AdvanceDot, ProductionsFor, AddState)]
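A compact sketch of the LR(0) construction: closure adds fresh items for every nonterminal right after a dot, goto advances the dot over one symbol to form a kernel, and a worklist collects the CFSM states. The grammar is hypothetical, and items are plain `(lhs, rhs, dot)` tuples; this is not a transcription of Figs. 6.9 and 6.10.

```python
# Hypothetical grammar: 0: S -> E $, 1: E -> E plus num, 2: E -> num
GRAMMAR = [("S", ("E", "$")), ("E", ("E", "plus", "num")), ("E", ("num",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add fresh items B -> • gamma for every nonterminal B right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b_lhs, b_rhs in GRAMMAR:
                if b_lhs == rhs[dot]:
                    fresh = (b_lhs, b_rhs, 0)
                    if fresh not in result:
                        result.add(fresh)
                        work.append(fresh)
    return frozenset(result)


def goto(state, symbol):
    """Kernel for the transition on symbol (dot advanced over it), then closed."""
    kernel = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol}
    return closure(kernel)


def build_cfsm():
    """Return the list of states and the CFSM transitions (state, symbol) -> state."""
    start = closure({(GRAMMAR[0][0], GRAMMAR[0][1], 0)})
    states, edges, work = [start], {}, [start]
    while work:
        s = work.pop()
        i = states.index(s)
        for x in sorted({r[d] for _, r, d in s if d < len(r)}):
            t = goto(s, x)
            if t not in states:
                states.append(t)
                work.append(t)
            edges[(i, x)] = states.index(t)
    return states, edges


states, edges = build_cfsm()
print(len(states))  # 6
```

The kernel of each non-start state is the set built inside `goto` before `closure` runs; storing states as frozensets lets equal item sets be recognized as the same state.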
31
The CFSM recognizes its grammar's viable prefixes
–Viable prefix: any prefix of a right sentential form that does not extend beyond the end of its handle
–Accept (double-boxed) state in the CFSM: reached by a viable prefix that ends with a handle
 Reduction (Fig. 6.12)
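To see the CFSM as a recognizer of viable prefixes, here is a sketch that runs a hand-written CFSM for a hypothetical grammar (S → E $, E → E plus num, E → num). A prefix is viable if the machine does not block on it; it ends with a handle if it stops in a double-boxed state. `CFSM`, `REDUCE_STATES`, and `run_cfsm` are illustrative names.

```python
# Hand-written CFSM transitions for the hypothetical grammar
#   S -> E $,  E -> E plus num,  E -> num
CFSM = {(0, "E"): 1, (0, "num"): 2, (1, "$"): 3, (1, "plus"): 4, (4, "num"): 5}
REDUCE_STATES = {2, 3, 5}   # double-boxed: each contains a reducible item


def run_cfsm(prefix):
    """Return the state reached, or None if prefix is not a viable prefix."""
    state = 0
    for sym in prefix:
        state = CFSM.get((state, sym))
        if state is None:
            return None
    return state


for p in (["E", "plus"], ["E", "plus", "num"], ["num", "plus"]):
    s = run_cfsm(p)
    status = "not viable" if s is None else "viable"
    if s in REDUCE_STATES:
        status += ", ends with a handle"
    print(p, status)
```

`num plus` is not viable because `num` is already a complete handle: the parser must reduce it to E before `plus` can be shifted.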
33
An LR(0) grammar has the following properties
–Given a syntactically correct input string, the CFSM will block only in double-boxed states
–There is at most one item in any double-boxed state
–If the input string is syntactically invalid, the parser will enter a state in which the offending symbol cannot be shifted
To complete the parse table
–(Fig. 6.13 & 6.14)
–E.g.: (Fig. 6.15)
34
[Figure: CompleteTable pseudocode (ComputeLookahead, TryRuleInState, AssertEntry, ReportConflict)]
35
[Figure: TryRuleInState, AssertEntry, and ComputeLookahead pseudocode]
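A sketch of completing an LR(0) table in the spirit of CompleteTable/AssertEntry: shift entries come from CFSM transitions, every reducible item asserts a reduce entry for every terminal, and a cell that receives two different actions is recorded as a conflict. The ambiguous sum grammar and its hand-built states are assumptions for illustration; `complete_table` and `assert_entry` are hypothetical names.

```python
# Hypothetical ambiguous grammar (rule numbers index RULES):
#   0: S -> E $    1: E -> E plus E    2: E -> num
RULES = [("S", ("E", "$")), ("E", ("E", "plus", "E")), ("E", ("num",))]
TERMINALS = ["num", "plus", "$"]

# Hand-built LR(0) states (items are (lhs, rhs, dot)) and CFSM transitions.
STATES = [
    {("S", ("E", "$"), 0), ("E", ("E", "plus", "E"), 0), ("E", ("num",), 0)},
    {("S", ("E", "$"), 1), ("E", ("E", "plus", "E"), 1)},
    {("E", ("num",), 1)},
    {("S", ("E", "$"), 2)},
    {("E", ("E", "plus", "E"), 2), ("E", ("E", "plus", "E"), 0), ("E", ("num",), 0)},
    {("E", ("E", "plus", "E"), 3), ("E", ("E", "plus", "E"), 1)},
]
EDGES = {(0, "E"): 1, (0, "num"): 2, (1, "$"): 3, (1, "plus"): 4,
         (4, "E"): 5, (4, "num"): 2, (5, "plus"): 4}


def complete_table(states, edges):
    """Fill an LR(0) table, recording conflicts instead of stopping."""
    table, conflicts = {}, []

    def assert_entry(state, symbol, action):
        old = table.setdefault((state, symbol), action)
        if old != action:                  # cell already holds another action
            conflicts.append((state, symbol, old, action))

    for (state, symbol), target in edges.items():
        assert_entry(state, symbol, ("shift", target))
    for i, items in enumerate(states):
        for lhs, rhs, dot in items:
            if dot == len(rhs):            # LR(0): reduce on every lookahead
                for t in TERMINALS:
                    assert_entry(i, t, ("reduce", RULES.index((lhs, rhs))))
    return table, conflicts


table, conflicts = complete_table(STATES, EDGES)
print(conflicts)  # [(5, 'plus', ('shift', 4), ('reduce', 1))]
```

A reduce by rule 0 plays the role of accept here. The single conflict is a shift/reduce conflict in the state holding both E → E plus E • and E → E • plus E, so this grammar is not LR(0).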
36
Conflict Diagnosis
A parse table conflict arises when the table-construction method cannot decide between multiple alternatives for some table entry
–Shift/reduce conflicts
–Reduce/reduce conflicts
Reasons for conflicts
–The grammar is ambiguous
–The grammar is not ambiguous, but the current table-building approach cannot resolve the conflict
 Use more lookahead
 Use a more powerful method
37
Ambiguous Grammars
38
Using state 5 in Fig. 6.16 as an example, the steps taken to understand a conflict:
–Determine a sequence of vocabulary symbols that causes the parser to move from the start state to the inadequate state: $E\ \mathit{plus}\ E$
–We obtain the snapshot $E\ \mathit{plus}\ E \bullet \mathit{plus}\ E$, where $\bullet$ separates the parse stack from the remaining input (Fig. 6.17)
40
Top parse tree
–Reduction
–Left-associative grouping for addition
Bottom parse tree
–Shift
–Right-associative grouping for addition
-> We eliminate the ambiguity by rewriting the grammar so that it favors left association
–(Fig. 6.18)
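The link between the conflict choice and associativity can be demonstrated with a tiny shift-reduce loop for the ambiguous grammar E → E plus E | num. This sketch fixes one policy for the inadequate state (stack ends in E plus E, next token plus): always reduce or always shift. `parse_sum` and `show` are hypothetical helpers, and num is reduced to E immediately on shifting, so a number stands for an E.

```python
def parse_sum(nums, on_conflict):
    """Shift-reduce parse of E -> E plus E | num; on_conflict is "reduce" or "shift"."""
    tokens = []
    for i, n in enumerate(nums):
        if i:
            tokens.append("plus")
        tokens.append(n)
    tokens.append("$")

    stack, pos = [], 0
    while True:
        look = tokens[pos]
        handle = len(stack) >= 3 and stack[-2] == "plus"   # E plus E on top
        if handle and (look == "$" or on_conflict == "reduce"):
            right, _, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((left, "plus", right))            # reduce E -> E plus E
        elif look == "$":
            break                                          # accept
        else:
            stack.append(look)                             # shift
            pos += 1
    return stack[0]


def show(tree):
    """Render a parse tree with explicit parentheses."""
    if isinstance(tree, int):
        return str(tree)
    left, _, right = tree
    return f"({show(left)} + {show(right)})"


print(show(parse_sum([1, 2, 3], "reduce")))  # ((1 + 2) + 3)
print(show(parse_sum([1, 2, 3], "shift")))   # (1 + (2 + 3))
```

Reducing in the inadequate state yields the top (left-associative) tree; shifting yields the bottom (right-associative) one.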
42
Grammars that are not LR(k)
43
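Reduce/reduce conflict
–$\mathit{Start} \Rightarrow_{rm} \mathit{Exprs}\ \$ \Rightarrow_{rm} E\ a\ \$ \Rightarrow_{rm} E\ \mathit{plus}\ \mathit{num}\ a\ \$ \Rightarrow^*_{rm} E\ \mathit{plus} \ldots \mathit{plus}\ \mathit{num}\ a\ \$ \Rightarrow_{rm} \mathit{num}\ \mathit{plus} \ldots \mathit{plus}\ \mathit{num}\ a\ \$$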
44
Conflict Resolution and Table Construction Increasingly sophisticated lookahead techniques to resolve conflicts –SLR(k): simple –LALR(k) –LR(k): the most powerful
45
SLR(k) Table Construction SLR(k): Simple LR with k tokens of lookahead –A grammar that is not LR(0): Fig. 6.20 –Input string: num plus num times num $
47
Replacing a terminal by a nonterminal whose role in the grammar is equivalent
–(Fig. 6.21)
LR(0) construction: (Fig. 6.22)
–Shift/reduce conflict in state 6
 Shift: (can continue as in Fig. 6.21)
 Reduce: blocks in state 3
–E times num $ is not a valid sentential form
–Reducing by E -> E plus T is appropriate only under some conditions
50
For the sentential forms
–E plus T $
–E plus T plus num $
If the reduction can lead to a successful parse, then plus can appear next to E in some valid sentential form
–$\mathit{plus} \in \mathrm{Follow}(E)$
–TryRuleInState(): (Fig. 6.23)
–SLR(1) parse table: (Fig. 6.24)
51
[Figure: SLR(1) TryRuleInState and AssertEntry pseudocode]
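SLR(1) needs Follow sets, which can be computed by iterating to a fixed point. This is a sketch for a hypothetical expression grammar in the spirit of the slides (it is not claimed to be Fig. 6.20 or 6.21); `first_and_follow` is an illustrative helper that also tracks nullable nonterminals.

```python
# Hypothetical expression grammar (not Fig. 6.20 itself).
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("E", "plus", "T")),
    ("E", ("T",)),
    ("T", ("T", "times", "num")),
    ("T", ("num",)),
]


def first_and_follow(grammar):
    """Compute First and Follow sets by iterating to a fixed point."""
    nonterms = {lhs for lhs, _ in grammar}
    nullable = set()
    first = {a: set() for a in nonterms}
    follow = {a: set() for a in nonterms}

    def first_of(seq):
        """Return (First(seq), whether seq derives the empty string)."""
        out = set()
        for sym in seq:
            if sym not in nonterms:
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            f, derives_empty = first_of(rhs)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            if derives_empty and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nonterms:
                    f, rest_empty = first_of(rhs[i + 1:])
                    new = f | (follow[lhs] if rest_empty else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


first, follow = first_and_follow(GRAMMAR)
print(sorted(follow["E"]))  # ['$', 'plus']
print(sorted(follow["T"]))  # ['$', 'plus', 'times']
```

In the state holding both E → E plus T • and T → T • times num, LR(0) would reduce on every symbol. SLR(1) reduces only on Follow(E) = {plus, $} and shifts on times, so that state no longer has a conflict.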
53
LALR(k) Table Construction
Sometimes SLR(k) construction fails only because the $\mathrm{Follow}_k$ information is not rule-specific
–(Fig. 6.25)
–The grammar is not ambiguous
–State 3 has a shift/reduce conflict
–$\mathrm{Follow}_k(A) = \{b\,\$^{k-1},\ c\,\$^{k-1},\ \$^{k}\}$
 Insufficient to resolve the conflict (Fig. 6.26)
56
LALR(k): Look-Ahead LR with k tokens of lookahead
–Same number of rows (states) as the LR(0) table
–The most popular LR table-building method
 Balance of power and efficiency
–Redefines two methods
 TryRuleInState: ItemFollow set (Fig. 6.27)
 ComputeLookahead: lookahead propagation graph (Fig. 6.28)
57
[Figure: LALR(1) TryRuleInState and AssertEntry pseudocode]
59
LALR Propagation Graph
Each LR(0) item occurs at most once in any state
–The pair $(s, A \to \alpha \bullet \beta)$: a vertex in the graph
–An edge from item i to item j: symbols that follow the reducible form of item i should be included in the corresponding set of symbols for item j
60
For an item $A \to \alpha \bullet B \gamma$, any symbol in $\mathrm{First}(\gamma)$ can follow each closure item $B \to \bullet \delta$
Propagation edges
–An edge is placed from an item $A \to \alpha \bullet B \gamma$ in state s to the item $A \to \alpha B \bullet \gamma$ in state t
–When $\gamma \Rightarrow^* \lambda$, any symbol that can follow A can also follow B
Example:
–Building the propagation graph (Fig. 6.29)
–Evaluating the propagation graph (Fig. 6.30)
63
In general, multiple passes can be required for convergence –(Fig. 6.31) –(Fig. 6.32) –(Fig. 6.33) In practice, LALR(1) lookahead computations converge quickly, usually in one or two passes LALR(1) is a powerful parsing method LALR(1) grammars are available for all popular programming languages
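Evaluating a propagation graph amounts to repeatedly pushing each vertex's lookahead set along its outgoing edges until a whole pass changes nothing. In this sketch the vertex ids, edges, and initial (spontaneous) lookaheads are a hypothetical toy graph; in a real LALR(1) builder the vertices would be (state, item) pairs. `evaluate_propagation` is an illustrative name.

```python
def evaluate_propagation(vertices, edges, spontaneous):
    """Push lookaheads along propagation edges until nothing changes.

    vertices:    item ids, e.g. (state, item) pairs
    edges:       vertex -> successors; symbols that follow vertex i
                 must also be in the set for each successor j
    spontaneous: vertex -> lookaheads generated directly (e.g. from First)
    Returns the final sets and the number of passes, counting the last
    pass that only confirms convergence.
    """
    item_follow = {v: set(spontaneous.get(v, ())) for v in vertices}
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for i in vertices:
            for j in edges.get(i, ()):
                if not item_follow[i] <= item_follow[j]:
                    item_follow[j] |= item_follow[i]
                    changed = True
    return item_follow, passes


# Toy graph (hypothetical vertex names): c -> a -> b, visited in order a, b, c.
sets, passes = evaluate_propagation(
    ["a", "b", "c"], {"c": ["a"], "a": ["b"]}, {"a": {"y"}, "c": {"x"}})
print(sorted(sets["b"]), passes)  # ['x', 'y'] 3
```

Because `c` is visited after `a`, the symbol `x` reaches `b` only in the second pass, and a third pass confirms that nothing else changes; visiting vertices in a better order reduces the number of passes.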
67
LR(k) Table Construction
LR(k) parsing is not very practical because
–LR(1) tables are typically much larger than the LR(0)-sized tables used by SLR(k) and LALR(k)
–It is rare that LR(1) can handle a grammar for which LALR(1) fails
 Ex. (Fig. 6.35)
When LALR(1) fails
–The grammar is ambiguous: LR(k) cannot help
–More lookahead is needed: LR(k) can help, but LALR(k) might suffice
–No amount of lookahead suffices: LR(k) cannot help
71
E.g. item 14 –ItemFollow(14)={rb, rp} –ItemFollow(15)={rb, rp} –Reduce/reduce conflict
72
A state in LR(k) is uniquely identified not only by its kernel, but also by its lookahead
–For LR(k), we extend an item's notation from $A \to \alpha \bullet \beta$ to $[A \to \alpha \bullet \beta, w]$
 For LR(1), w is a symbol that can follow A after reduction
 For LR(k), w is a k-length string that can follow A after reduction
–The number of states in LR(k) is usually much larger
We can also begin with LALR(1) and split states selectively
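With LR(1) items $[A \to \alpha \bullet B \gamma, w]$, closure adds $[B \to \bullet \delta, b]$ for every $b \in \mathrm{First}(\gamma w)$, so the lookahead becomes part of the state. This is a sketch over a small hypothetical grammar with no nullable nonterminals; to keep it short, its First sets are written out by hand rather than computed. `closure1` and `first_of` are illustrative names.

```python
# Hypothetical grammar: S' -> S,  S -> C C,  C -> c C | d
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMS = {lhs for lhs, _ in GRAMMAR}
FIRST = {"S'": {"c", "d"}, "S": {"c", "d"}, "C": {"c", "d"}}  # precomputed by hand


def first_of(seq, lookahead):
    """First(seq lookahead), assuming no nonterminal derives the empty string."""
    if not seq:
        return {lookahead}
    head = seq[0]
    return set(FIRST[head]) if head in NONTERMS else {head}


def closure1(items):
    """LR(1) closure over items (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, w = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for b in first_of(rhs[dot + 1:], w):
                for b_lhs, b_rhs in GRAMMAR:
                    if b_lhs == rhs[dot]:
                        new = (b_lhs, b_rhs, 0, b)
                        if new not in result:
                            result.add(new)
                            work.append(new)
    return result


for lhs, rhs, dot, w in sorted(closure1({("S'", ("S",), 0, "$")})):
    print(f"[{lhs} -> {' '.join(rhs[:dot] + ('•',) + rhs[dot:])}, {w}]")
```

The two C items each appear with lookaheads c and d, because what follows the first C in S → C C is whatever can begin the second C. An LR(0) state would carry only one copy of each item.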
75
Thanks for Your Attention!