Kanat Bolazar February 16, 2010

Compiler Design 9. Table-Driven Bottom-Up Parsing: LR(0), SLR, LR(1), LALR Kanat Bolazar February 16, 2010

Table Driven Parsers
Both top-down and bottom-up parsers can be written to manage a stack explicitly while scanning the input, to determine whether the input can be generated from the grammar productions.
In top-down parsers, the stack holds non-terminals, each of which can be expanded by replacing it with the right-hand side (rhs) of one of its productions.
In bottom-up parsers, the stack holds sequences of terminals and non-terminals, which can be reduced by replacing a sequence with the non-terminal for which it is the rhs of a production.
Both techniques use a table to guide the parser in deciding which production to apply, given the top of the stack and the next input symbol.

Top-Down and Bottom-Up Parsers
Predictive parsers are top-down and non-backtracking.
  Sometimes called LL(k):
    Scan the input from Left to right.
    Generate a Leftmost derivation from the grammar.
    k is the number of lookahead symbols needed to make parsing deterministic.
  If a grammar is not in LL(k) form, removing left recursion and left-factoring may produce one.
  Not all context-free languages have an LL(k) grammar.
Shift-reduce parsers are bottom-up parsers, sometimes called LR(k).
  Scan the input from Left to right.
  Produce a Rightmost derivation from the grammar, in reverse.
  Not all context-free languages have LR grammars.

Bottom-Up (Shift-Reduce) Parsers
A bottom-up parser is also called a shift-reduce parser because at each step it will either:
  Reduce: replace a sequence of symbols on the stack that is the rhs of a production with that production's non-terminal.
  Shift: move an input symbol to the top of the stack.
[Figure: a shift-reduce parser reading the input "a b eot", with stack "X Y Z eot", consulting parsing table M and producing output.]

Shift Reduce Parser Actions
During the parse, the stack holds a sequence of terminal and non-terminal symbols representing the part of the input worked on so far, and the input holds the remaining symbols.
Parser actions:
  Reduce: If the stack has a sequence FE and there is a production N → E, we can replace E by N to get FN on the stack.
  Shift: If there is no possible reduction, transfer the next input symbol to the top of the stack.
  Error: Otherwise it is an error.
If, after a reduce, only the start symbol is left on the stack and there is no more input, then we have succeeded.
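The three actions above can be sketched as a naive recognizer. This is only an illustration, not the table-driven method developed later: it assumes the small grammar E → E - 1 | 1 (used in the LR(0) example further on) and assumes that trying the longer rhs first is enough to pick the right reduction, which holds for this grammar but not in general. The names RULES, START and shift_reduce are invented for the sketch.

```python
# Naive shift-reduce recognizer for the grammar  E -> E - 1 | 1.
# Assumption: reducing whenever some rhs is on top of the stack, and trying
# the longer rhs first, finds the correct handle for this particular grammar.
RULES = [
    ("E", ["E", "-", "1"]),  # listed first so that it wins over E -> 1
    ("E", ["1"]),
]
START = "E"


def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack, rest, trace = [], list(tokens), []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: replace the rhs on top of the stack by its lhs.
                del stack[-len(rhs):]
                stack.append(lhs)
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if rest:
                # Shift: no reduction is possible, move one input symbol.
                stack.append(rest.pop(0))
                trace.append(f"shift {stack[-1]}")
            elif stack == [START]:
                trace.append("accept")
                return trace
            else:
                raise SyntaxError(f"stuck with stack {stack}")


if __name__ == "__main__":
    for step in shift_reduce("1 - 1".split()):
        print(step)
```

If the two rules were listed in the opposite order, the second '1' would be reduced to E too early and the parse would get stuck with E - E on the stack. Choosing the right reduction is exactly the handle problem that the LR tables solve.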

Handles
During the parse, the term handle refers to a sequence of symbols on the stack that:
  Matches the rhs of a production.
  Will be a step along the path of producing a correct parse tree.
Finding the handle, i.e. identifying when to reduce, is the central problem of bottom-up parsing.
Note that ambiguous grammars do not fit (as they didn't for top-down parsing, either) because there may not be a unique handle at one step.
  E.g. the dangling else problem.

LR Parsing
A specific way to implement a shift-reduce parser is an LR parser.
This parser represents the state of the stack by a single state symbol on the top of the stack.
It uses two parsing tables, action and goto.
For any parsing state and input symbol, the action table tells what action to take:
  Sn, meaning shift and go to state n
  Rn, meaning reduce by rule n
  Accept
  Error
For any parsing state and non-terminal symbol N, the goto table gives the next state when a reduce has been performed to the non-terminal symbol N.

LR Parser
A shift-reduce parser that encodes the stack with a state on the top of the stack (TOS).
The TOS state and the next input symbol are used to look up the parser's actions and goto function in the table.
[Figure: an LR parser reading the input "a b eot", with stack "X Y Z eot" annotated with states S1 S2 S3, consulting parsing table M and producing output.]

Types of LR Parsers
LR parsers can work on more general grammars than LL parsers.
  They have more history on the stack on which to base decisions than top-down parsers.
LR parsers differ in how they generate the action and goto tables.
Types of parsers, listed in order of increasing power (ability to handle grammars) and decreasing efficiency (the parsing tables can become very large):
  LR(0)    Standard/general LR, with 0 lookahead
  SLR(1)   "Simple LR", with 1 lookahead
  LALR(1)  "Lookahead LR", with 1 lookahead
  LR(1)    Standard LR, with 1 lookahead

Types of LR Parsers: Comparisons
Here's a subjective (personal) comparison of the grammar classes. The LALR class of grammars is the most useful and the most complicated.
Columns: Grammar (lookahead), Name, Power, Table Size (+: small), Conceptual Complexity, Utility / Popularity
  LR(0): - - too weak + - - - never used
  SLR(1): simple - weak = (was + popular before LALR)
  LALR(1): look-ahead = or - ~= SLR complicated + + balanced
  LR(1): 10x larger! too large

LR(0) Parsing Tables
Although not used in practice, LR(0) table construction illustrates the key ideas.
An item (or configuration) is a production with a dot somewhere in its rhs. For example, there are three items from A → XY:
  A → •XY   X will be parsed next
  A → X•Y   X parsed; Y will be parsed next
  A → XY•   X and Y parsed; we can reduce to A
The item represents how much of the production we have seen so far in the parsing process.
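As a small illustration, an item can be stored as a production plus a dot position. The representation below, a (lhs, rhs, dot) tuple with the hypothetical helpers lr0_items and show, is just one convenient encoding chosen for this sketch.

```python
def lr0_items(lhs, rhs):
    """Return every item of lhs -> rhs as a (lhs, rhs, dot_position) tuple."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with a dot marker, e.g. 'A -> X • Y'."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('•',) + rhs[dot:])}"


if __name__ == "__main__":
    for item in lr0_items("A", ["X", "Y"]):
        print(show(item))
```

A production with n symbols on its rhs always yields n + 1 items, one for each possible dot position.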

LR(0): Closure and Goto Operations
Closure is defined to construct a configurating set for each item. For the starting item N → W•Y:
  N → W•Y is in the set.
  If Y begins with a terminal, we are done.
  If Y begins with a non-terminal N', add all N' productions with the dot at the start of the rhs, N' → •Z, and repeat for each item added.
For each configurating set and grammar symbol, the goto operation gives another configurating set.
  If a set of items I contains N → W•xY, where W and Y are sequences but x is a single grammar symbol, then goto(I, x) contains N → Wx•Y (together with its closure).
To create the family of configurating sets for a grammar, add an initial production S' → S, and construct sets starting from S' → •S.
Use the sets as parser states; items that end with a dot call for a reduce.
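The closure and goto operations can be sketched as follows. This is an illustrative sketch, not a full generator: it hard-codes the grammar E → E - 1 | 1 with the added production S → E, stores an item as a (rule_index, dot_position) pair, and tries grammar symbols in the order '1', E, '-' so that the resulting states come out in the same order as s1..s5 in the LR(0) example. All names are invented for the sketch.

```python
GRAMMAR = [
    ("S", ("E",)),            # rule 0: added start production
    ("E", ("E", "-", "1")),   # rule 1
    ("E", ("1",)),            # rule 2
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add N' -> •Z for every non-terminal N' that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for r, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (r, 0) not in result:
                    result.add((r, 0))
                    work.append((r, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved)


def canonical_collection():
    """All configurating sets reachable from S -> •E, in discovery order."""
    states = [closure({(0, 0)})]
    for state in states:                 # the list grows while we iterate
        for symbol in ("1", "E", "-"):   # order chosen to match s1..s5
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states


if __name__ == "__main__":
    for number, state in enumerate(canonical_collection(), start=1):
        print(f"s{number}: {sorted(state)}")
```

Representing each set as a frozenset makes "have we seen this state before?" a plain equality test, which is what keeps the collection finite.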

LR(0) Example
Consider the simple grammar E → E - 1 | 1, and add an initial rule:
  rule1: E → E - 1
  rule2: E → 1
  S → E   (start symbol added for LR(0))
The states are built one step at a time:
  s1: Start from S → •E. The dot is before the non-terminal E, so the closure adds E → •E - 1 and E → •1:
      s1: S → •E , E → •E - 1 , E → •1
      action(s1, '1') = shift2
      goto(s1, E) = s3
  s2: E → 1•
      action(s2, on any token) = reduce by rule2
  s3: S → E• , E → E• - 1
      action(s3, EOT) = accept
      action(s3, '-') = shift4
  s4: E → E - •1
      action(s4, '1') = shift5
  s5: E → E - 1•
      action(s5, on any token) = reduce by rule1

LR(0) Example: Table
The actions and gotos of states s1 to s5 fill in the following table. In the Action columns, sN means shift and go to state sN, rN means reduce by rule N, and err marks an error entry.

State   Action                  Goto
        -       1       EOT     E
s1      err     s2      err     s3
s2      r2      r2      r2
s3      s4      err     accept
s4      err     s5      err
s5      r1      r1      r1
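To show how a parser uses this table, here is a minimal driver sketch. The dictionaries ACTION, GOTO and RULES are a hand transcription of the LR(0) table for E → E - 1 | 1 (missing entries are errors), and the function name lr_parse and the choice to return the list of rule numbers used are assumptions made for the sketch.

```python
# Hand-transcribed LR(0) table; any (state, token) pair not listed is an error.
ACTION = {
    (1, "1"): ("shift", 2),
    (2, "-"): ("reduce", 2), (2, "1"): ("reduce", 2), (2, "EOT"): ("reduce", 2),
    (3, "-"): ("shift", 4), (3, "EOT"): ("accept", None),
    (4, "1"): ("shift", 5),
    (5, "-"): ("reduce", 1), (5, "1"): ("reduce", 1), (5, "EOT"): ("reduce", 1),
}
GOTO = {(1, "E"): 3}
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (lhs, length of rhs)


def lr_parse(tokens):
    """Return the rule numbers used in reductions, or raise SyntaxError."""
    stack = [1]                       # stack of states; s1 is the start state
    tokens = list(tokens) + ["EOT"]
    pos, reductions = 0, []
    while True:
        state, tok = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, tok), ("error", None))
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]       # pop one state per rhs symbol
            stack.append(GOTO[(stack[-1], lhs)])
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {tok!r} in state s{state}")


if __name__ == "__main__":
    print(lr_parse("1 - 1 - 1".split()))
```

Only states are kept on the stack; the grammar symbols are implied by them. The reductions come out in the order of a rightmost derivation read backwards.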

Limitations of LR(0)
Since there is no lookahead, the parser must decide whether to shift or reduce based only on the parsing stack so far.
A configurating set can call either for shifts only or for a single reduce, never a mix chosen by the input (e.g. we can't shift for '-' and reduce for '1').
Problematic examples:
  Epsilon rules create a shift/reduce conflict if there are other rules.
  Items like these have a shift/reduce conflict:
    T → id•          reduce?
    T → id• [ E ]    shift?
  Items like these have a reduce/reduce conflict:
    E → V• = E , V → id• , T → id•    reduce to V? to T?
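The two kinds of conflict can be detected mechanically from the items in a set. The sketch below assumes items are (lhs, rhs, dot) tuples and applies the strict rule stated above (shifts only, or one reduce); the function name lr0_conflicts and the terminal set are hypothetical.

```python
def lr0_conflicts(items, terminals):
    """Report which LR(0) conflicts a set of (lhs, rhs, dot) items contains."""
    complete = [it for it in items if it[2] == len(it[1])]
    shifts = [it for it in items
              if it[2] < len(it[1]) and it[1][it[2]] in terminals]
    found = []
    if complete and shifts:
        found.append("shift/reduce")
    if len(complete) > 1:
        found.append("reduce/reduce")
    return found


TERMINALS = {"id", "[", "]", "="}

if __name__ == "__main__":
    # T -> id•  together with  T -> id•[ E ]
    print(lr0_conflicts([("T", ("id",), 1),
                         ("T", ("id", "[", "E", "]"), 1)], TERMINALS))
    # V -> id•  together with  T -> id•
    print(lr0_conflicts([("V", ("id",), 1), ("T", ("id",), 1)], TERMINALS))
```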

SLR(1) Parsing
SLR(1), simple LR, uses the same configurating sets, table structures and parser operations as LR(0).
When assigning table actions, don't assume that every completed item should be reduced on every token.
  Look ahead by using the Follow set of the item's non-terminal.
  Reduce an item N → Y• only if the next input symbol is in the Follow set of N.
A configurating set may contain both shift and reduce items, but the tokens that can be shifted and the Follow sets of the completed items are required to be disjoint.
  In particular, two completed items in the same state must have disjoint Follow sets, so that there are no reduce/reduce conflicts in the state.
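The Follow-set rule can be sketched for the example grammar E → E - 1 | 1 with the added production S → E. This sketch assumes the grammar has no epsilon rules, which keeps the First and Follow computations short; the function names are invented for illustration.

```python
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "-", "1")),   # rule 1
    ("E", ("1",)),            # rule 2
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets, assuming no epsilon rules (only the first rhs symbol matters)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Follow sets, assuming no epsilon rules; EOT follows the start symbol."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow["S"].add("EOT")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]    # sym ends the rhs: inherit Follow(lhs)
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_reduce_row(lhs, rule_number):
    """Table entries for a completed item lhs -> Y•: reduce only on Follow(lhs)."""
    return {tok: f"r{rule_number}" for tok in sorted(follow_sets()[lhs])}


if __name__ == "__main__":
    print(follow_sets()["E"])
    print(slr_reduce_row("E", 2))
```

For this grammar Follow(E) is {-, EOT}, so the reduce entries on '1' that LR(0) would fill in become errors in the SLR(1) table.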

SLR(1) Table: Reduce Depends on Token
s1: S → •E , E → •E - 1 , E → •1
    action(s1, '1') = shift2
    goto(s1, E) = s3
s2: E → 1•
    action(s2, {-, EOT}) = reduce by rule2
s3: S → E• , E → E• - 1
    action(s3, EOT) = accept
    action(s3, '-') = shift4
s4: E → E - •1
    action(s4, '1') = shift5
s5: E → E - 1•
    action(s5, {-, EOT}) = reduce by rule1

SLR(1)  Action                  Goto
State   -       1       EOT     E
s1      err     s2      err     s3
s2      r2      err     r2
s3      s4      err     accept
s4      err     s5      err
s5      r1      err     r1

LR(0) Table For Comparison
s1: S → •E , E → •E - 1 , E → •1
    action(s1, '1') = shift2
    goto(s1, E) = s3
s2: E → 1•
    action(s2, on any token) = reduce by rule2
s3: S → E• , E → E• - 1
    action(s3, EOT) = accept
    action(s3, '-') = shift4
s4: E → E - •1
    action(s4, '1') = shift5
s5: E → E - 1•
    action(s5, on any token) = reduce by rule1

LR(0)   Action                  Goto
State   -       1       EOT     E
s1      err     s2      err     s3
s2      r2      r2      r2
s3      s4      err     accept
s4      err     s5      err
s5      r1      r1      r1

LR(1) Parsing
Although SLR(1) uses 1 lookahead symbol, it still does not use all of the information that could be obtained in a parsing state by keeping track of the path that led to an item.
  Not every token in Follow(X) is possible after every occurrence of X.
In LR(1) parsing tables, we keep the lookahead in the parsing state and separate those states, so that they can have more detailed successor states:
  A → B C • D E F , a/b/c
  A will eventually be reduced if the lookahead token after F is one of {a, b, c}.
  If any other token is seen, some other action may be taken; if there is no action, it's an error.
This leads to larger numbers of states (thousands instead of hundreds) for programming language parsers.

LALR(1) Parsing
LALR(1) compromises between the simplicity of SLR and the power of LR(1) by merging similar LR(1) states.
  Identify the core of each configurating set (its items without their lookaheads) and merge states that differ only in lookahead.
  This is not just SLR, because LALR will have fewer reduce actions; but merging may introduce reduce/reduce conflicts that LR(1) did not have.
Constructing LALR(1) parsing tables is not usually done by brute force, that is, by constructing all the LR(1) sets and then merging them.
  Instead, as configurating sets are generated, each new set is examined to see whether it can be merged with an existing one.
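The merge step itself is simple to sketch. Note that this shows the brute-force variant (merge after all LR(1) states exist), not the incremental construction used in practice. The four input states are hand-written for illustration in the style of the textbook grammar S → C C, C → c C | d, and the (item, lookahead) encoding and the name merge_by_core are assumptions of the sketch.

```python
from collections import defaultdict


def merge_by_core(states):
    """Merge LR(1) states whose items agree once lookaheads are ignored.

    Each input state is a set of (item, lookahead) pairs; each output state
    maps an item to the union of its lookaheads.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for state in states:
        core = frozenset(item for item, _ in state)
        for item, lookahead in state:
            merged[core][item].add(lookahead)
    return [dict(items) for items in merged.values()]


# Illustrative, hand-written LR(1) states (not computed by this sketch).
LR1_STATES = [
    {("C -> d •", "c"), ("C -> d •", "d")},
    {("C -> d •", "EOT")},
    {("C -> c C •", "c"), ("C -> c C •", "d")},
    {("C -> c C •", "EOT")},
]

if __name__ == "__main__":
    for state in merge_by_core(LR1_STATES):
        print(state)
```

Four LR(1) states collapse into two here. Because the lookahead sets are unioned, a merged state can end up with two completed items sharing a lookahead, which is how new reduce/reduce conflicts can appear.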

More on LR Parsing
Almost all shift-reduce parsing is done with automatically generated parser tables.
Look at the types of parsers in available parser generators, and note the types of parsers (but not the types of trees):
  Bison (yacc)
  ANTLR
  JavaCC
  Coco/R
  Elkhound
LR error messages can be made specific by giving different error codes to different error entries in the table.

One More Type of SR Parsing
Operator precedence parsing
  Useful for expression grammars and other types of ambiguities.
  Doesn't use a table, just uses operator precedence rules to resolve conflicts.
  Fits in with the various types of LR parsers: in addition to the action table, the parsing algorithm can appeal to an operator precedence table.

General Context Free Parsers
All of the table-driven parsers work on grammars in particular forms and may not work for arbitrary CFGs, including ambiguous ones.
General backtracking parsers, O(n^3):
  CYK (Cocke, Younger, Kasami) algorithm
    Produces a forest of parse trees.
  Earley's algorithm
    Notable for carrying along partial parses (subtrees); the first of the chart parsers.
General parallel parser, can be O(n^3):
  GLR: copies the relevant parts of the LR stack and parses in parallel whenever there is a conflict; otherwise the same as LALR.
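As an illustration of the O(n^3) bound, here is a minimal CYK sketch. It is a recognizer only (it answers yes or no rather than building the parse forest), and it assumes a hand-made Chomsky normal form grammar for the same language as E → E - 1 | 1, namely E → E R | '1', R → M O, M → '-', O → '1'. The helper non-terminals R, M and O and all function names are invented for this sketch.

```python
# Hand-made Chomsky normal form grammar for the language of  E -> E - 1 | 1.
TERMINAL_RULES = {"1": {"E", "O"}, "-": {"M"}}            # A -> 'a'
BINARY_RULES = {("E", "R"): {"E"}, ("M", "O"): {"R"}}     # A -> B C


def cyk(tokens, start="E"):
    """Return True if `tokens` can be derived from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][length] = non-terminals that derive tokens[i : i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = set(TERMINAL_RULES.get(tok, ()))
    for length in range(2, n + 1):             # span length
        for i in range(n - length + 1):        # span start
            for split in range(1, length):     # where the span is cut in two
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n]


if __name__ == "__main__":
    print(cyk("1 - 1 - 1".split()))
    print(cyk("1 1".split()))
```

The three nested loops over span length, start position and split point are where the cubic running time comes from.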
