Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011.

Outline Overview Shift-Reduce Parsers LR(0) Table Construction Conflict Diagnosis Conflict Resolution and Table Construction

Overview Problems in top-down parsers –Left recursion –Common prefixes –(Fig. 5.12 vs. Fig. 5.16) Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically

A bottom-up parser begins with parse tree’s leaves, and moves toward its root A bottom-up parser traces a rightmost derivation in reverse A bottom-up parser uses a grammar rule to replace the rule’s RHS with its LHS (Fig. 4.5 & Fig. 4.6)

Bottom-up: from terminal symbols to the goal symbol Shift-reduce: the two most prevalent actions – Shift symbols onto the parse stack – Reduce a string of symbols to a nonterminal LR(k): scan the input from the left, producing a rightmost derivation in reverse, using k symbols of lookahead LR parsers are more general than LL parsers –Yacc: LR parser generator

Shift-Reduce Parsers LR parsers and rightmost derivations –LR parsers construct rightmost derivations in reverse –Fig. 6.2 LR parsing as knitting –How the RHS of a production is found –Fig. 6.1

In Fig. 6.1: –Right needle: unprocessed portion of the string –Left needle: parser’s stack (processed portion) Operations – Shift : transfers a symbol from right needle to left needle – Reduction : symbols at the top of the parse stack (left needle) that match the RHS of a production A → γ are reduced to A –(Fig. 6.1)

LR Parsing Engine A simple driver for a shift-reduce parser –Fig. 6.3 –Driven by a table (Sec. 6.2.4) –Indexed by the parser’s current state and the next input symbol Current state: parser stack –Shift and reduce actions are performed until Accepted: input is reduced to the goal symbol Error: no valid action is found

(Fig. 6.3: driver pseudocode not reproduced; it uses the operations Push, Peek, Pop, Advance, Prepend, and Error.)
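The table-driven loop described above can be sketched in Python. This is a minimal sketch, not the code of Fig. 6.3: the grammar (Start → E $, rule 1: E → E plus num, rule 2: E → num), the hand-built `TABLE`, and the names `RULES` and `parse` are all assumptions made for illustration. A reduction pops the states for the RHS and prepends the LHS to the input, so that shifting the nonterminal plays the role of a goto.

```python
from collections import deque

# Hypothetical grammar: Start -> E $ ; rule 1: E -> E plus num ; rule 2: E -> num
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (LHS, length of RHS)

# TABLE[state][symbol] is ("shift", s), ("reduce", r) or ("accept",); missing = error
TABLE = {
    0: {"num": ("shift", 2), "E": ("shift", 1)},
    1: {"plus": ("shift", 3), "$": ("accept",)},
    2: {"plus": ("reduce", 2), "$": ("reduce", 2)},
    3: {"num": ("shift", 4)},
    4: {"plus": ("reduce", 1), "$": ("reduce", 1)},
}

def parse(tokens):
    """Return the rule numbers applied, i.e. a rightmost derivation in reverse."""
    inp = deque(tokens)   # unprocessed input (the "right needle")
    stack = [0]           # parse stack of states (the "left needle")
    applied = []
    while True:
        symbol = inp[0] if inp else None          # peek
        action = TABLE[stack[-1]].get(symbol)
        if action is None:                        # blank entry: error
            raise SyntaxError(f"unexpected {symbol!r} in state {stack[-1]}")
        if action[0] == "accept":
            return applied
        if action[0] == "shift":
            stack.append(action[1])               # push the new state
            inp.popleft()                         # advance
        else:
            lhs, length = RULES[action[1]]
            if length:
                del stack[-length:]               # pop the states of the RHS
            inp.appendleft(lhs)                   # prepend the LHS to the input
            applied.append(action[1])

print(parse(["num", "plus", "num", "$"]))
```

For the input `num plus num $` the sketch applies rule 2 and then rule 1, which read backwards give the rightmost derivation E ⇒ E plus num ⇒ num plus num.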

LR Parse Table Given a sentential form, the handle is defined as the sequence of symbols that will next be replaced by reduction –How to identify the handle –Which production to employ – (Fig. 6.4 & Fig. 6.5)

In Fig. 6.5: –[s]: Shift to state s –r: reduction by rule r –Blank: error actions A bottom-up parse of “a b b d c $” –Fig. 6.6 & Fig. 6.7 –A rightmost derivation in reverse –Shift actions are implied by inability to perform a useful reduction Tokens are shifted until a handle appears

LR(k) Parsing Concept of LR parsing introduced by Knuth in 1965 LR(k) –k: number of lookahead symbols used in constructing the parse table (0 for LR(0)) –LR(0) and LR(1): one symbol of lookahead at parse time –Number of columns in parse table: n^k

Properties of LR(k) parsers –Shifting symbols and examining lookahead until the end of the handle is found –Handle is reduced to a nonterminal –Determine whether to shift or reduce, based on the symbols already shifted (left context) and the next k lookahead symbols (right context) A grammar is LR(k) iff it is possible to construct an LR parse table such that k tokens of lookahead allow the parser to recognize exactly the strings in the grammar’s language –Deterministic: each cell in the LR parse table contains at most one entry

Formal Definition of LR(k) Grammars A grammar is LR(k) iff the following conditions imply αAy = γBx –S ⇒*rm αAw ⇒rm αβw –S ⇒*rm γBx ⇒rm αβy –First_k(w) = First_k(y) LR(k) parsers can always determine the correct reduction (A → β) given –The left context (αβ) up to the end of the handle –The next k symbols (First_k(w)) of the input

LR(0) Table Construction (Fig. 6.2) –E → plus E E LR(0) item –A grammar production with a bookmark that indicates the current progress through the production’s RHS Fresh: E → . plus E E Reducible: E → plus E E . –(Fig. 6.8)
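An LR(0) item can be modelled as a production plus a bookmark position. The sketch below is one possible representation; the class name `Item` and its methods are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    """A production with a bookmark (hypothetical representation)."""
    lhs: str
    rhs: tuple
    dot: int = 0  # number of RHS symbols already recognized

    def is_reducible(self):
        # The bookmark has reached the end of the RHS.
        return self.dot == len(self.rhs)

    def next_symbol(self):
        # Symbol just after the bookmark, or None for a reducible item.
        return None if self.is_reducible() else self.rhs[self.dot]

    def advance(self):
        # Move the bookmark one symbol to the right.
        if self.is_reducible():
            raise ValueError("bookmark is already at the end of the RHS")
        return Item(self.lhs, self.rhs, self.dot + 1)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(symbols)}"

fresh = Item("E", ("plus", "E", "E"))
done = fresh.advance().advance().advance()
print(fresh)  # fresh item
print(done)   # reducible item
```

Making the class frozen keeps items hashable, so a parser state can be stored as a set of items.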

Parser state: a set of LR(0) items LR(0) construction algorithm –Fig. 6.9 & Fig. 6.10 –ComputeGoto Closure of state s Transitions from s –E.g.: Fig. 6.11 Kernel of state s A DFA called CFSM (characteristic finite-state machine)

(Fig. 6.9 & Fig. 6.10: pseudocode not reproduced; procedure names recoverable from the slides: AddState, AdvanceDot, ProductionsFor, ExtractElement, ComputeGoto, Closure.)
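The closure and transition computations can be sketched as follows. This is an illustrative sketch, not the pseudocode of Fig. 6.9 and Fig. 6.10: the grammar uses the rule E → plus E E from the slides, while the rules Start → E $ and E → num, the encoding of an item as a `(rule, dot)` pair, and the function names are assumptions.

```python
# Assumed grammar: rule 0: Start -> E $ ; rule 1: E -> plus E E ; rule 2: E -> num
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add a fresh item B -> . rhs for every nonterminal B just after a bookmark."""
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for r, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (r, 0) not in result:
                    result.add((r, 0))
                    work.append((r, 0))
    return frozenset(result)

def goto(state, symbol):
    """Advance the bookmark over `symbol` (the kernel), then close the result."""
    kernel = {(r, d + 1) for r, d in state
              if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(kernel)

def build_cfsm():
    """Return (states, edges); edges maps (state number, symbol) to a state number."""
    states, edges = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):
        state = states[i]
        symbols = sorted({GRAMMAR[r][1][d] for r, d in state
                          if d < len(GRAMMAR[r][1])})
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            edges[(i, symbol)] = states.index(target)
        i += 1
    return states, edges

states, edges = build_cfsm()
print(len(states), "states,", len(edges), "transitions")
```

State numbers depend on the order in which symbols are explored, so they need not match the numbering in any figure.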

CFSM recognizes its grammar’s viable prefixes –Viable prefix: any prefix that does not extend beyond its handle –Accept state in CFSM: a viable prefix that ends with a handle Reduction (Fig. 6.12)

For an LR(0) grammar, the following properties hold –Given a syntactically correct input string, CFSM will block only in double-boxed states –There’s at most one item in any double-boxed state –If the input string is syntactically invalid, the parser will enter a state in which the offending symbol cannot be shifted To complete the parse table –(Fig. 6.13 & 6.14) –E.g.: (Fig. 6.15)

(Fig. 6.13 & Fig. 6.14: pseudocode not reproduced; procedure names recoverable from the slides: CompleteTable, ComputeLookahead, TryRuleInState, AssertEntry, ReportConflict.)

Conflict Diagnosis A parse table conflict arises when the table-construction method cannot decide between multiple alternatives for some table entry –Shift/reduce conflicts –Reduce/reduce conflicts Reasons for conflicts –Grammar is ambiguous –Grammar is not ambiguous, but the current table-building approach cannot resolve the conflict Given more lookahead Use a more powerful method
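Completing an LR(0) table and detecting conflicts can be sketched as below. This is a sketch under assumptions, not the code of Fig. 6.13 or Fig. 6.14: both grammars are assumed (only the rules E → plus E E and E → E plus E appear on the slides), an item is a `(rule, dot)` pair, and `assert_entry` is only loosely modelled on the AssertEntry procedure. A reducible item enters its reduction in every column of its row, and any cell that ends up with more than one action is reported as a conflict.

```python
def build_cfsm(grammar):
    """LR(0) states as frozensets of (rule, dot) items, plus the transition map."""
    nonterminals = {lhs for lhs, _ in grammar}

    def closure(items):
        result, work = set(items), list(items)
        while work:
            rule, dot = work.pop()
            rhs = grammar[rule][1]
            if dot < len(rhs) and rhs[dot] in nonterminals:
                for r, (lhs, _) in enumerate(grammar):
                    if lhs == rhs[dot] and (r, 0) not in result:
                        result.add((r, 0))
                        work.append((r, 0))
        return frozenset(result)

    states, edges = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):
        after_dot = sorted({grammar[r][1][d] for r, d in states[i]
                            if d < len(grammar[r][1])})
        for symbol in after_dot:
            target = closure({(r, d + 1) for r, d in states[i]
                              if d < len(grammar[r][1]) and grammar[r][1][d] == symbol})
            if target not in states:
                states.append(target)
            edges[(i, symbol)] = states.index(target)
        i += 1
    return states, edges

def complete_table(grammar):
    """Return (table, conflicts); a cell holding more than one action is a conflict."""
    states, edges = build_cfsm(grammar)
    vocabulary = sorted({symbol for _, rhs in grammar for symbol in rhs})
    table = {}

    def assert_entry(state, symbol, action):
        table.setdefault((state, symbol), set()).add(action)

    for (state, symbol), target in edges.items():
        assert_entry(state, symbol, ("shift", target))
    for state, items in enumerate(states):
        for rule, dot in items:
            if dot == len(grammar[rule][1]):      # reducible item
                action = ("accept",) if rule == 0 else ("reduce", rule)
                for symbol in vocabulary:         # LR(0): no lookahead restriction
                    assert_entry(state, symbol, action)
    conflicts = sorted(cell for cell, actions in table.items() if len(actions) > 1)
    return table, conflicts

PREFIX = [("Start", ("E", "$")), ("E", ("plus", "E", "E")), ("E", ("num",))]
AMBIGUOUS = [("Start", ("E", "$")), ("E", ("E", "plus", "E")), ("E", ("num",))]

print(complete_table(PREFIX)[1])
print(complete_table(AMBIGUOUS)[1])
```

With these assumed grammars the prefix grammar produces no conflicts, while the ambiguous grammar produces a shift/reduce conflict on `plus` in the state containing E → E plus E . and E → E . plus E.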

Ambiguous Grammars

Using state 5 in Fig. 6.16 as an example, the steps taken to understand conflicts –Determine a sequence of vocabulary symbols that causes the parser to move from the start state to the inadequate state: E plus E –We obtain a snapshot E plus E . plus E (Fig. 6.17)

Top parse tree –Reduction –Left-associative grouping for addition Bottom parse tree –Shift –Right-associative grouping for addition -> we eliminate the ambiguity by creating a grammar that favors left-association –(Fig. 6.18)

Grammars that are not LR(k)

Reduce/reduce conflict –Start=>rm Exprs $ =>rm E a $ =>rm E plus num a $ =>*rm E plus … plus num a $ =>rm num plus … plus num a $

Conflict Resolution and Table Construction Increasingly sophisticated lookahead techniques to resolve conflicts –SLR(k): simple –LALR(k) –LR(k): the most powerful

SLR(k) Table Construction SLR(k): Simple LR with k tokens of lookahead –A grammar that is not LR(0): Fig. 6.20 –Input string: num plus num times num $

Replacing a terminal by a nonterminal whose role in the grammar is equivalent –(Fig. 6.21) LR(0) construction: (Fig. 6.22) –Shift/reduce conflict of state 6 Shift: (can continue as in Fig. 6.21) Reduce: block in state 3 –E times num $ is not a valid sentential form –E → E plus T is appropriate under some conditions

For sentential forms –E plus T $ –E plus T plus num $ If the reduction can lead to a successful parse, then plus can appear next to E in some valid sentential form –plus ∈ Follow(E) –TryRuleInState(): (Fig. 6.23) –SLR(1) parse table: (Fig. 6.24)

(Fig. 6.23: pseudocode not reproduced; procedure names recoverable from the slide: TryRuleInState, AssertEntry.)
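The SLR(1) idea of reducing by a rule only on symbols in the Follow set of its LHS can be sketched as follows. This is a sketch under assumptions, not the code of Fig. 6.23: only E → E plus T and the input `num plus num times num $` come from the slides, the other rules are assumed, and the set computations rely on the grammar having no λ productions.

```python
# Assumed expression grammar (0-based rule numbers); it has no λ productions.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("E", "plus", "T")),
    ("E", ("T",)),
    ("T", ("T", "times", "num")),
    ("T", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """First sets of the nonterminals; only the leading RHS symbol matters here."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets():
    """Follow sets of the nonterminals, computed to a fixed point."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]   # symbol ends the RHS: inherit Follow(LHS)
                if not new <= follow[symbol]:
                    follow[symbol] |= set(new)
                    changed = True
    return follow

def slr_actions(state, follow):
    """Map each terminal to the set of actions the items of `state` propose."""
    actions = {}
    for rule, dot in state:
        lhs, rhs = GRAMMAR[rule]
        if dot < len(rhs):
            if rhs[dot] not in NONTERMINALS:
                actions.setdefault(rhs[dot], set()).add(("shift",))
        else:
            for terminal in follow[lhs]:   # reduce only on Follow(LHS)
                actions.setdefault(terminal, set()).add(("reduce", rule))
    return actions

follow = follow_sets()
# State with items E -> E plus T .  and  T -> T . times num
inadequate = {(1, 3), (3, 1)}
print(slr_actions(inadequate, follow))
```

Because `times` is not in Follow(E), the reduction by E → E plus T is not entered in the `times` column, so the shift wins there and every cell holds a single action.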

LALR(k) Table Construction Sometimes SLR(k) construction fails only because the Follow_k information is not rule specific –(Fig. 6.25) –Grammar is not ambiguous –State 3 has a shift/reduce conflict –Follow_k(A) = {b$^(k-1), c$^(k-1), $^k} Insufficient to resolve the conflict (Fig. 6.26)

LALR(k): Lookahead LR with k tokens of lookahead –Same number of rows (states) as LR(0) table –The most popular LR table-building method Balance of power and efficiency –Redefine two methods TryRuleInState: ItemFollow set (Fig. 6.27) ComputeLookahead: lookahead propagation graph (Fig. 6.28)

(Fig. 6.27: pseudocode not reproduced; procedure names recoverable from the slide: TryRuleInState, AssertEntry.)

LALR Propagation Graph Each LR(0) item occurs at most once in any state –The pair (s, A → α.β): a vertex in the graph –Edge between items i and j: symbols that follow the reducible form of item i should be included in the corresponding set of symbols for item j

For item A → α.Bγ, any symbol in First(γ) can follow each closure item B → .δ Propagation edges –An edge is placed from an item A → α.Bγ in state s to item A → αB.γ in state t –When γ ⇒* λ, any symbol that can follow A can also follow B Example: –Building propagation graph (Fig. 6.29) –Evaluating propagation graph (Fig. 6.30)

In general, multiple passes can be required for convergence –(Fig. 6.31) –(Fig. 6.32) –(Fig. 6.33) In practice, LALR(1) lookahead computations converge quickly, usually in one or two passes LALR(1) is a powerful parsing method LALR(1) grammars are available for all popular programming languages
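Evaluating a propagation graph amounts to pushing symbol sets along edges until nothing changes. The sketch below shows that fixed-point loop on a tiny hand-made graph; the graph, the state numbers, the grammar behind it (Start → S $, S → A, A → a), and the function name `evaluate` are all hypothetical and do not reproduce Fig. 6.29 through Fig. 6.33.

```python
def evaluate(initial, edges):
    """Propagate lookahead symbols along edges until no set grows.

    initial: dict vertex -> set of symbols placed there directly (from First)
    edges:   list of (source, target) propagation edges
    Returns the final sets and the number of passes, including the last pass
    that only confirms convergence.
    """
    item_follow = {vertex: set(symbols) for vertex, symbols in initial.items()}
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for source, target in edges:
            before = len(item_follow[target])
            item_follow[target] |= item_follow[source]
            if len(item_follow[target]) != before:
                changed = True
    return item_follow, passes

# Vertices are (state, item) pairs; all of them are made up for illustration.
v_s = (0, "S -> . A")    # closure item of Start -> . S $, so "$" follows it
v_a = (0, "A -> . a")    # closure item of S -> . A, inherits what follows S
v_r = (2, "A -> a .")    # reducible form of v_a in a later state
initial = {v_s: {"$"}, v_a: set(), v_r: set()}
# Edges listed in an unfavourable order, so one pass is not enough.
edges = [(v_a, v_r), (v_s, v_a)]

item_follow, passes = evaluate(initial, edges)
print(item_follow[v_r], passes)
```

Listing the edges in the opposite order would let the symbol reach every vertex in the first pass, which is why the number of passes depends on visiting order.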

LR(k) Table Construction LR(k) parsing: not very practical because –LR(1) tables are typically much larger than LR(0) tables (for SLR(k) and LALR(k)) –It’s rare that LR(1) can handle a grammar for which LALR(1) fails Ex. (Fig. 6.35) When LALR(1) fails –Grammar is ambiguous: LR(k) cannot help –More lookahead needed: LR(k) can help, but LALR(k) might suffice –No amount of lookahead suffices: LR(k) cannot help

E.g. item 14 –ItemFollow(14)={rb, rp} –ItemFollow(15)={rb, rp} –Reduce/reduce conflict

A state in LR(k) is uniquely identified not only by its kernel, but also by its lookahead –For LR(k), we extend an item’s notation from A → α.β to [A → α.β, w] For LR(1), w is a symbol that can follow A after reduction For LR(k), w is a k-length string that can follow A after reduction –The number of states in LR(k) is usually much larger We can also begin with LALR(1), and split states selectively
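The effect of attaching a lookahead to each item can be seen in the closure computation for LR(1). The following sketch assumes an expression grammar without λ productions (only E → E plus T appears on the slides), encodes an item [A → α.β, w] as a `(rule, dot, lookahead)` triple, and uses hypothetical function names.

```python
# Assumed expression grammar (0-based rule numbers); it has no λ productions.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("E", "plus", "T")),
    ("E", ("T",)),
    ("T", ("T", "times", "num")),
    ("T", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_of(symbol, seen=frozenset()):
    """First set of one symbol; sufficient because no rule derives λ."""
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:            # already being expanded on this path
        return set()
    result = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol:
            result |= first_of(rhs[0], seen | {symbol})
    return result

def closure_lr1(items):
    """Close a set of (rule, dot, lookahead) triples.

    For [A -> alpha . B beta, w], add [B -> . gamma, x] for each x in
    First(beta w); without λ rules that is First(beta) if beta is nonempty,
    and {w} otherwise.
    """
    result, work = set(items), list(items)
    while work:
        rule, dot, lookahead = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        lookaheads = first_of(rest[0]) if rest else {lookahead}
        for r, (lhs, _) in enumerate(GRAMMAR):
            if lhs != rhs[dot]:
                continue
            for x in lookaheads:
                if (r, 0, x) not in result:
                    result.add((r, 0, x))
                    work.append((r, 0, x))
    return frozenset(result)

# Kernel: [E -> E plus . T, $] and [E -> E plus . T, plus]
kernel = {(1, 2, "$"), (1, 2, "plus")}
state = closure_lr1(kernel)
print(len(state), "LR(1) items")
```

The LR(0) closure of the same kernel item E → E plus . T has three items, whereas the LR(1) closure here has eight, which hints at why LR(1) tables tend to be larger.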

Thanks for Your Attention!
