5. Bottom-Up Parsing
Chih-Hung Wang, Compilers
References
1. C. N. Fischer, R. K. Cytron, and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
3. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986 (2nd ed. 2006).
Creating a bottom-up parser automatically: Left-to-right parse, Rightmost derivation (in reverse). Create a node when all its children are present. Handle: the nodes representing the right-hand side of a production. [Figure: bottom-up construction of the parse tree for aap + ( noot + mies )]
LR(0) Parsing: theoretically important but too weak to be useful.
Running example: expression grammar
input -> expression EOF
expression -> expression ‘+’ term | term
term -> IDENTIFIER | ‘(’ expression ‘)’
Short-hand notation:
Z -> E $
E -> E ‘+’ T | T
T -> i | ‘(’ E ‘)’
LR(0) Parsing: keep track of progress inside potential handles when consuming input tokens.
LR items: N -> α•β
Initial set S0: Z -> •E $, E -> •E ‘+’ T, E -> •T, T -> •i, T -> •‘(’ E ‘)’
Closure algorithm for LR(0): the important part is the inference rule: if an item set contains P -> α•Nβ, then for each production rule N -> γ it must also contain N -> •γ. This rule predicts new handle hypotheses from the hypothesis that we are looking for a certain non-terminal, and is sometimes called the prediction rule; it corresponds to an ε-move, in that it allows the automaton to move to another state without consuming input. Reduce item: an item with the dot at the end. Shift item: any other item.
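A minimal Python sketch of the LR(0) closure, assuming items are encoded as (rule index, dot position) pairs over the running example grammar; the encoding and the names `GRAMMAR`, `closure` and `show` are illustrative assumptions, not taken from the slides.

```python
# Running example grammar in short-hand notation; "$" is the end-of-input token.
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return the LR(0) closure of a set of items (rule_index, dot)."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # Prediction rule (an epsilon-move): we are looking for rhs[dot],
            # so add N -> . gamma for every alternative gamma of that non-terminal.
            for k, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    work.append((k, 0))
    return frozenset(result)


def show(item):
    """Render an item as text, e.g. 'E -> E • + T'."""
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} -> {' '.join(rhs[:dot] + ('•',) + rhs[dot:])}"


S0 = closure({(0, 0)})  # start from Z -> • E $
for item in sorted(S0):
    print(show(item))
```

Running it lists the five items of the initial set S0; all of them are shift items, since no dot is at the end.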
Transition Diagram [Figure: LR(0) transition diagram with transitions on i, T, E, ‘+’ and $]
LR(0) parsing example (1)
Grammar: Z -> E $; E -> E ‘+’ T; E -> T; T -> i; T -> ‘(’ E ‘)’
Stack: S0; input: i + i $
Action: shift the input token (i) onto the stack and compute the new state.
LR(0) parsing example (2)
Stack: S0 i S1; input: + i $
Action: reduce the handle on top of the stack and compute the new state.
Q: What does state S1 look like? A: Write it down on the blackboard, including the transition. Do so for each new state in the remainder of the animation.
LR(0) parsing example (3)
Stack: S0 T S2; input: + i $; tree so far: T(i)
Action: reduce the handle on top of the stack and compute the new state.
LR(0) parsing example (4)
Stack: S0 E S3; input: + i $; tree so far: E(T(i))
Action: shift the input token onto the stack and compute the new state.
LR(0) parsing example (5)
Stack: S0 E S3 + S4; input: i $; tree so far: E(T(i)) ‘+’
Action: shift the input token onto the stack and compute the new state.
LR(0) parsing example (6)
Stack: S0 E S3 + S4 i S1; input: $; tree so far: E(T(i)) ‘+’ i
Action: reduce the handle on top of the stack and compute the new state.
Q: Is it allowed to re-use state S1? A: Yes; it is the same item set, {T -> i•}.
LR(0) parsing example (7)
Stack: S0 E S3 + S4 T S5; input: $; tree so far: E(T(i)) ‘+’ T(i)
Action: reduce the handle on top of the stack and compute the new state.
Note: we cannot re-use state S2 here: after ‘+’, the T completes E -> E ‘+’ T•, not E -> T•.
LR(0) parsing example (8)
Stack: S0 E S3; input: $; tree so far: E(E(T(i)) ‘+’ T(i))
Action: shift the input token onto the stack and compute the new state.
LR(0) parsing example (9)
Stack: S0 E S3 $ S6; input: (empty); tree so far: E(E(T(i)) ‘+’ T(i)) $
Action: reduce the handle on top of the stack and compute the new state.
LR(0) parsing example (10)
Stack: S0 Z; accept!
Tree: Z(E(E(T(i)) ‘+’ T(i)) $)
Precomputing the item set (1) Initial item set
Precomputing the item set (2) Next item set
Complete transition diagram
The LR push-down automaton: two major moves and a minor move.
Shift move: remove the first token from the present input and push it onto the stack.
Reduce move: for a rule N -> α, the symbols of α are removed from the stack and N is then pushed onto the stack.
Termination: the input has been parsed successfully when it has been reduced to the start symbol.
GOTO and ACTION tables
LR(0) parsing of the input i+i$
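A minimal sketch of the LR push-down automaton driving hand-written LR(0) tables for the running example. States S0 to S6 follow the numbering of the parsing example; the states for parentheses (7 to 9), the 0-based rule numbering and all names are assumptions of this sketch. Each stack entry alternates between states and grammar symbols, so a reduce pops two entries per right-hand-side symbol.

```python
# Rules numbered from 0: (left-hand side, length of right-hand side).
RULES = [("Z", 2),   # 0: Z -> E $
         ("E", 3),   # 1: E -> E + T
         ("E", 1),   # 2: E -> T
         ("T", 1),   # 3: T -> i
         ("T", 3)]   # 4: T -> ( E )

# LR(0) ACTION table: the action depends on the state only.
ACTION = {0: "shift", 1: ("reduce", 3), 2: ("reduce", 2), 3: "shift",
          4: "shift", 5: ("reduce", 1), 6: ("reduce", 0), 7: "shift",
          8: "shift", 9: ("reduce", 4)}

# GOTO table: the new state for a state and a grammar symbol.
GOTO = {0: {"i": 1, "T": 2, "E": 3, "(": 7},
        3: {"+": 4, "$": 6},
        4: {"i": 1, "T": 5, "(": 7},
        7: {"i": 1, "T": 2, "E": 8, "(": 7},
        8: {"+": 4, ")": 9}}


def parse(tokens):
    """Run the LR push-down automaton on tokens (ending in "$"); return a trace."""
    stack = [0]            # S0 X1 S1 X2 S2 ...: states alternate with symbols
    pos, trace = 0, []
    while True:
        state = stack[-1]
        action = ACTION[state]
        if action == "shift":
            token = tokens[pos]
            if token not in GOTO[state]:
                raise SyntaxError(f"unexpected {token!r} in state S{state}")
            pos += 1
            stack += [token, GOTO[state][token]]
            trace.append(f"shift {token}")
        else:
            lhs, length = RULES[action[1]]
            del stack[len(stack) - 2 * length:]   # pop the handle with its states
            if lhs == "Z":                        # reduced to the start symbol
                trace.append("accept")
                return trace
            stack += [lhs, GOTO[stack[-1]][lhs]]
            trace.append(f"reduce to {lhs}, goto S{stack[-1]}")


for step in parse(list("i+i$")):
    print(step)
```

The printed steps follow the animation: the second i leads to S1 again, and the T built from it leads to S5 rather than S2.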
Another Example of LR(0) from Fischer (1)
Another Example of LR(0) from Fischer (2)
Another Example of LR(0) from Fischer (3)
Algorithm of LR(0) Construction (1)
Algorithm of LR(0) Construction (2)
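The construction can be sketched as a worklist algorithm: start from the closure of Z -> •E $, and for every state and every symbol after a dot compute the next item set (move the dot over the symbol, then close), adding it as a new state if it has not been seen before. A minimal Python sketch, assuming the (rule index, dot) item encoding; the names are illustrative and the state numbers it assigns need not match the slides' numbering.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure of a set of (rule_index, dot) items."""
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for k, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    work.append((k, 0))
    return frozenset(result)


def goto(items, symbol):
    """Next item set: move the dot over symbol, then take the closure."""
    return closure({(r, d + 1) for r, d in items
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol})


def build_automaton():
    """Worklist construction of all LR(0) item sets and their transitions."""
    start = closure({(0, 0)})
    states, number, edges = [start], {start: 0}, {}
    work = [start]
    while work:
        state = work.pop(0)
        symbols = []                      # symbols after a dot, first-seen order
        for r, d in sorted(state):
            rhs = GRAMMAR[r][1]
            if d < len(rhs) and rhs[d] not in symbols:
                symbols.append(rhs[d])
        for sym in symbols:
            target = goto(state, sym)
            if target not in number:      # a new item set becomes a new state
                number[target] = len(states)
                states.append(target)
                work.append(target)
            edges[(number[state], sym)] = number[target]
    return states, edges


def is_inadequate(state):
    """True if the state mixes a reduce item with other items (a conflict)."""
    reduces = [(r, d) for r, d in state if d == len(GRAMMAR[r][1])]
    return len(reduces) > 1 or (len(reduces) == 1 and len(state) > 1)


states, edges = build_automaton()
print(len(states), "states;", "conflicts:",
      [n for n, s in enumerate(states) if is_inadequate(s)])
```

For the expression grammar this yields ten states and no inadequate state, so the grammar is LR(0).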
LR(0) Table
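LR comments: bottom-up parsing, unlike top-down parsing, has no problems with left recursion. On the other hand, it has a slight problem with right recursion: with a rule such as L -> x ‘,’ L | x, nothing can be reduced until the last x has been shifted, so the stack grows linearly with the length of the input.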
LR(0) conflicts (1): shift-reduce conflict
Exists in a state when table construction cannot use the next k tokens to decide whether to shift the next input token or call for a reduction.
Array indexing: T -> i ‘[’ E ‘]’ gives a state with T -> i• ‘[’ E ‘]’ (shift) and T -> i• (reduce).
ε-rule: RestExpr -> ε gives a state with Expr -> Term• RestExpr (shift) and RestExpr -> • (reduce).
LR(0) conflicts (2): reduce-reduce conflict
Exists when table construction cannot use the next k tokens to distinguish between multiple reductions that can be applied in the inadequate state.
Assignment statement: Z -> V ‘:=’ E $ gives a state with V -> i• (reduce) and T -> i• (reduce), i.e. two different reduce rules.
A typical LR(0) table contains many conflicts.
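Both kinds of conflict can be detected per state by looking at where the dots are. A minimal sketch, assuming items written as (lhs, rhs tuple, dot) and closed item sets; the example states list only the items quoted in the two conflict examples, and the function name `conflicts` is hypothetical.

```python
def conflicts(state):
    """Classify the LR(0) conflicts in one closed item set.

    Each item is (lhs, rhs, dot) with rhs a tuple of grammar symbols.
    """
    reduce_items = [item for item in state if item[2] == len(item[1])]
    shift_items = [item for item in state if item[2] < len(item[1])]
    kinds = []
    if reduce_items and shift_items:
        kinds.append("shift-reduce")
    if len(reduce_items) > 1:
        kinds.append("reduce-reduce")
    return kinds


# Relevant items of the states from the conflict examples.
array_indexing = [("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)]
epsilon_rule = [("Expr", ("Term", "RestExpr"), 1), ("RestExpr", (), 0)]
assignment = [("V", ("i",), 1), ("T", ("i",), 1)]

for name, state in [("array indexing", array_indexing),
                    ("epsilon rule", epsilon_rule),
                    ("assignment", assignment)]:
    print(name, conflicts(state))
```

Note that an ε-rule item such as RestExpr -> • is a reduce item even though nothing has been recognized yet, which is why ε-rules so easily cause LR(0) conflicts.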
Handling LR(0) conflicts: use a one-token look-ahead, which requires a two-dimensional ACTION table (state × look-ahead token). Different constructions of the ACTION table give:
SLR(1) (Simple LR)
LR(1)
LALR(1) (Look-Ahead LR)
SLR(1) parsing: a handle should not be reduced to a non-terminal N if the look-ahead is a token that cannot follow N: reduce N -> α iff the look-ahead token ∈ FOLLOW(N).
FOLLOW(Z) = { $ }
FOLLOW(E) = { ‘+’, ‘)’, $ }
FOLLOW(T) = { ‘+’, ‘)’, $ }
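The FOLLOW sets above can be computed by fixed-point iteration: first compute FIRST sets and the nullable non-terminals, then for every occurrence of a non-terminal add the tokens that can start the rest of its rule, and FOLLOW of the rule's left-hand side if that rest can vanish. A minimal Python sketch, assuming the slide's convention of seeding FOLLOW(Z) with $; the function names are illustrative.

```python
# Running example grammar; "$" is the end-of-input token.
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets and nullable non-terminals, by fixed-point iteration."""
    first = {n: set() for n in NONTERMINALS}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            before = (len(first[lhs]), lhs in nullable)
            for sym in rhs:
                if sym not in NONTERMINALS:
                    first[lhs].add(sym)
                    break
                first[lhs] |= first[sym]
                if sym not in nullable:
                    break
            else:
                nullable.add(lhs)  # the whole right-hand side can vanish
            changed |= (len(first[lhs]), lhs in nullable) != before
    return first, nullable


def follow_sets(start="Z"):
    """FOLLOW sets; the start symbol is seeded with $ as on the slide."""
    first, nullable = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for pos, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                before = len(follow[sym])
                # Tokens that can start the rest of the rule may follow sym ...
                for nxt in rhs[pos + 1:]:
                    if nxt not in NONTERMINALS:
                        follow[sym].add(nxt)
                        break
                    follow[sym] |= first[nxt]
                    if nxt not in nullable:
                        break
                else:
                    # ... and if the rest can vanish, so can what follows lhs.
                    follow[sym] |= follow[lhs]
                changed |= len(follow[sym]) != before
    return follow


def slr_may_reduce(lhs, lookahead, follow):
    """SLR(1) rule: reduce to lhs only if the look-ahead is in FOLLOW(lhs)."""
    return lookahead in follow[lhs]


FOLLOW = follow_sets()
for n in ("Z", "E", "T"):
    print(f"FOLLOW({n}) =", sorted(FOLLOW[n]))
```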
SLR(1) ACTION table
SLR(1) ACTION/GOTO table
Rules: 1: Z -> E $; 2: E -> T; 3: E -> E ‘+’ T; 4: T -> i; 5: T -> ‘(’ E ‘)’
Legend: sn = shift to state n (e.g. s7); rn = reduce by rule n
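Example of resolving conflicts (1): a new rule T -> i ‘[’ E ‘]’
Rules: 1: Z -> E $; 2: E -> T; 3: E -> E ‘+’ T; 4: T -> i; 5: T -> ‘(’ E ‘)’; 6: T -> i ‘[’ E ‘]’
ACTION/GOTO table (columns: look-ahead tokens i + ( ) [ ] $ and stack symbols E T; a reduce entry applies to the look-aheads in FOLLOW of the rule's left-hand side):
state 0: i s5, ( s7, E s1, T s6
state 1: + s3, $ s2
state 2: r1
state 3: i s5, ( s7, T s4
state 4: r3
state 5: r4
state 6: r2
state 7: i s5, ( s7, E s8, T s6
state 8: + s3, ) s9
state 9: r5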
Example of resolving conflicts (2): with rule 6, state 5 contains the items T -> i• and T -> i• ‘[’ E ‘]’, an LR(0) shift-reduce conflict. Its row becomes: [ s10, and r4 on the look-aheads in FOLLOW(T) = { ‘+’, ‘)’, ‘]’, $ }. Since ‘[’ is not in FOLLOW(T), SLR(1) resolves the conflict.
Another Example of LR(0) Conflicts(1)
Another Example of LR(0) Conflicts(2)
Another Example of LR(0) Conflicts (3): input num plus num times num $
Another Example of LR(0) Conflicts(4) Follow(E)= {plus, $}
Unfortunately … SLR(1) leaves many shift-reduce conflicts unsolved.
Problem: the FOLLOW(N) set is the union of all look-aheads of all alternatives of N in all states.
Example: S -> A | x b; A -> a A b | B; B -> x
FOLLOW(S) = {$}, FOLLOW(A) = {b, $}, FOLLOW(B) = {b, $}
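The SLR(1) rule can be sketched as a function that fills in the ACTION entries of one LR(0) state: shift on a token after a dot, reduce on every token in FOLLOW of the reduce item's left-hand side. A minimal sketch, assuming items written as (lhs, rhs tuple, dot) and FOLLOW sets supplied by hand (those of the example above, and FOLLOW(T) = {+, ), ], $} for the expression grammar extended with T -> i ‘[’ E ‘]’). In the example, the state after shifting x holds S -> x• b and B -> x•; because b ∈ FOLLOW(B), the conflict remains, whereas state 5 of the array-indexing example has none.

```python
def slr_actions(state, follow, terminals):
    """ACTION entries of one LR(0) state under the SLR(1) rule.

    state: items (lhs, rhs, dot); follow: FOLLOW set per non-terminal.
    Returns {look-ahead token: set of actions}.
    """
    actions = {}
    for lhs, rhs, dot in state:
        if dot < len(rhs):
            if rhs[dot] in terminals:
                actions.setdefault(rhs[dot], set()).add("shift")
        else:
            rule = f"reduce {lhs} -> {' '.join(rhs)}"
            for token in follow[lhs]:
                actions.setdefault(token, set()).add(rule)
    return actions


def conflicting_tokens(actions):
    """Look-ahead tokens for which more than one action remains."""
    return sorted(t for t, acts in actions.items() if len(acts) > 1)


# State after shifting x in: S -> A | x b; A -> a A b | B; B -> x
after_x = [("S", ("x", "b"), 1), ("B", ("x",), 1)]
follow_1 = {"S": {"$"}, "A": {"b", "$"}, "B": {"b", "$"}}
print(conflicting_tokens(slr_actions(after_x, follow_1, {"a", "b", "x", "$"})))

# State 5 after adding T -> i [ E ] to the expression grammar
state_5 = [("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)]
follow_2 = {"T": {"+", ")", "]", "$"}}
print(conflicting_tokens(
    slr_actions(state_5, follow_2, {"i", "+", "(", ")", "[", "]", "$"})))
```

In the first state B can in fact only be followed by $ there; FOLLOW(B) also contains b because of A -> a A b elsewhere, which is exactly the imprecision the LR(1) technique removes.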
SLR(1) automaton
Another Example of SLR Problem Follow(A)={b, c, $}
Make the Grammar SLR(1) Follow(A1)={b, $}
LR(1) parsing: the LR(1) technique does not rely on FOLLOW sets, but rather keeps the specific look-ahead with each item.
LR(1) item: N -> α•β {σ}
ε-closure for LR(1) item sets: if set S contains an item P -> α•Nβ {σ}, then for each production rule N -> γ, S must contain the item N -> •γ {τ}, where τ = FIRST(β {σ}).
Creating look-ahead sets: extended definition of FIRST sets. If FIRST(β) does not contain ε, FIRST(β {σ}) is just equal to FIRST(β); if β can produce ε, FIRST(β {σ}) contains all the tokens in FIRST(β), excluding ε, plus the tokens in σ.
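A minimal Python sketch of FIRST(β {σ}) and the LR(1) closure, applied to the SLR(1) counter-example S -> A | x b; A -> a A b | B; B -> x. Assumptions of this sketch: the grammar is augmented with S' -> S and look-ahead $, and each item carries a single look-ahead token (rule index, dot, token) rather than a set; an item with a look-ahead set {σ} corresponds to several such items.

```python
GRAMMAR = [
    ("S'", ("S",)),           # augmented start rule (assumption of this sketch)
    ("S", ("A",)),
    ("S", ("x", "b")),
    ("A", ("a", "A", "b")),
    ("A", ("B",)),
    ("B", ("x",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets and nullable non-terminals, by fixed-point iteration."""
    first = {n: set() for n in NONTERMINALS}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            before = (len(first[lhs]), lhs in nullable)
            for sym in rhs:
                if sym not in NONTERMINALS:
                    first[lhs].add(sym)
                    break
                first[lhs] |= first[sym]
                if sym not in nullable:
                    break
            else:
                nullable.add(lhs)     # every symbol of rhs can produce epsilon
            changed |= (len(first[lhs]), lhs in nullable) != before
    return first, nullable


FIRST, NULLABLE = first_sets()


def first_of(beta, sigma):
    """FIRST(beta {sigma}): FIRST(beta), plus sigma if beta can produce epsilon."""
    result = set()
    for sym in beta:
        if sym not in NONTERMINALS:
            return result | {sym}
        result |= FIRST[sym]
        if sym not in NULLABLE:
            return result
    return result | set(sigma)


def closure(items):
    """LR(1) closure; an item is (rule_index, dot, look-ahead token)."""
    result, work = set(items), list(items)
    while work:
        rule, dot, la = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # N -> . gamma gets the look-aheads FIRST(beta {la})
            for token in first_of(rhs[dot + 1:], {la}):
                for k, (lhs, _) in enumerate(GRAMMAR):
                    new = (k, 0, token)
                    if lhs == rhs[dot] and new not in result:
                        result.add(new)
                        work.append(new)
    return frozenset(result)


def goto(items, symbol):
    """Next LR(1) item set: move the dot over symbol, keep look-aheads, close."""
    return closure({(r, d + 1, la) for r, d, la in items
                    if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol})


S0 = closure({(0, 0, "$")})
after_x = goto(S0, "x")
print(sorted(after_x))
```

After x the state holds S -> x• b {$} and B -> x• {$}: B is reduced only on $ and b is shifted, so the SLR(1) conflict on b disappears.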
LR(1) automaton
Another Example of LR(1) Construction (1)
Another Example of LR(1) Construction (2)
Another Example of LR(1) Construction (3)
Another Example of LR(1) Construction (4)
Another Example of LR(1) Construction (5)
LR(1) parsing comments: the LR(1) automaton is more discriminating than the SLR(1) automaton. In fact, it is so strong that any language that can be parsed from left to right with a one-token look-ahead in linear time can be parsed using LR(1). LR(1) tables are big; combining “equal” sets by merging their look-ahead sets gives LALR(1).
LALR(1) S3 and S10 are similar in that they are equal if one ignores the look-ahead sets, and so are S4 and S9, S6 and S11, and S8 and S12.
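Merging can be sketched by grouping LR(1) states on their core (the items without look-aheads) and uniting the items of each group. A minimal Python sketch; it does not use the states S3/S10 of the figure, but two hand-derived LR(1) states of the grammar S' -> S; S -> A | x b; A -> a A b | B; B -> x (rules numbered 0 to 5), reached after reading a and a a, whose cores are equal. The item encoding (rule index, dot, look-ahead token) and the names are assumptions.

```python
from collections import defaultdict


def core(state):
    """The LR(0) core of an LR(1) state: its items without look-aheads."""
    return frozenset((rule, dot) for rule, dot, _ in state)


def merge_lalr(states):
    """Merge LR(1) states with equal cores by uniting their look-aheads."""
    merged = defaultdict(set)
    order = []                      # keep the first-seen order of the cores
    for state in states:
        c = core(state)
        if c not in merged:
            order.append(c)
        merged[c] |= state
    return [frozenset(merged[c]) for c in order]


# LR(1) states after reading "a" and "a a" (worked out by hand), plus the
# state after "x", whose core differs and therefore stays separate.
after_a = frozenset({(3, 1, "$"), (3, 0, "b"), (4, 0, "b"), (5, 0, "b")})
after_aa = frozenset({(3, 1, "b"), (3, 0, "b"), (4, 0, "b"), (5, 0, "b")})
after_x = frozenset({(2, 1, "$"), (5, 1, "$")})

lalr = merge_lalr([after_a, after_aa, after_x])
print(len(lalr), "states after merging")
```

The merged state carries A -> a• A b with look-aheads {$, b}. Merging can never introduce a shift-reduce conflict that was not already present, but it can introduce reduce-reduce conflicts, which is the price of the smaller LALR(1) tables.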
LALR(1) automaton
Practice Derive the LALR(1) ACTION/GOTO table for the grammar in Fig. 2.95
Making a grammar LR(1) – or not: although the chances for a grammar to be LR(1) are much larger than those of being SLR(1) or LL(1), one often encounters a grammar that still is not LR(1). The reason is generally that the grammar is ambiguous. For example:
if_statement -> ‘if’ ‘(’ expression ‘)’ statement | ‘if’ ‘(’ expression ‘)’ statement ‘else’ statement
statement -> … | if_statement | …
The statement if (x>0) if (y>0) p=0; else q=0; has two possible syntax trees, since the else can belong to either if.
Possible syntax trees (1)
Possible syntax trees (2)
Other Examples of Ambiguous Grammar (1)
Other Examples of Ambiguous Grammar (2)
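Resolving shift-reduce conflicts (1): the longest possible sequence of grammar symbols is taken for reduction; that is, in a shift-reduce conflict, do shift.
Another example: E -> E ‘+’ E | E ‘*’ E with input i * i + i. After E ‘*’ E with look-ahead ‘+’, the conflicting items are E -> E ‘*’ E• (reduce) and E -> E• ‘+’ E (shift); always shifting here would group the input as i * (i + i).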
Resolving shift-reduce conflicts (2): the use of precedences between tokens.
Example: a shift-reduce conflict on t between
P -> α•tβ {…} (shift item)
Q -> γuR• {…t…} (reduce item)
where R is either empty or one non-terminal. If the look-ahead is t, we perform one of the following three actions:
If symbol u has a higher precedence than symbol t, we reduce.
If t has a higher precedence than symbol u, we shift.
If both have equal precedence, we also shift.
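A minimal sketch of the three precedence rules, assuming the ambiguous grammar E -> E ‘+’ E | E ‘*’ E | i and a hypothetical precedence table in which ‘*’ binds tighter than ‘+’. Instead of a full LR automaton, it simulates only the conflict points: whenever the stack ends in E op E (so u = op and R = E), it consults the rules to choose between reducing and shifting the next operator t. Tokens without a precedence fall back to the "do shift" rule.

```python
PRECEDENCE = {"+": 1, "*": 2}   # hypothetical table: higher binds tighter


def resolve(u, t, prec=PRECEDENCE):
    """Shift-reduce choice between a reduce item ending in token u
    (Q -> ... u R .) and a shift item on the look-ahead t (P -> ... . t ...)."""
    if u not in prec or t not in prec:
        return "shift"            # no precedences: fall back to "do shift"
    if prec[u] > prec[t]:
        return "reduce"           # u binds tighter: reduce first
    return "shift"                # t higher, or equal precedence: shift


def parse(tokens):
    """Simulate only the conflict points of E -> E + E | E * E | i;
    returns the input fully parenthesized according to the chosen actions."""
    stack = []
    for tok in tokens + ["$"]:
        # The stack ends in E op E: reduce while the rules say so
        # (at end of input there is nothing left to shift).
        while (len(stack) >= 3 and stack[-2] in PRECEDENCE
               and (tok == "$" or resolve(stack[-2], tok) == "reduce")):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append(f"({left}{op}{right})")
        if tok != "$":
            stack.append(tok)     # shift the operand or operator
    return stack[0]


for text in ("i*i+i", "i+i*i", "i+i+i"):
    print(text, "=>", parse(list(text)))
```

With these rules i * i + i groups as ((i*i)+i) and i + i * i as (i+(i*i)). The equal-precedence rule (shift) makes i + i + i group to the right, (i+(i+i)); yacc and bison instead let %left (reduce) or %right (shift) declarations decide the equal-precedence case.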
Bottom-up parser: yacc/bison. The most widely used parser generator is yacc, an LALR(1) parser generator. A yacc look-alike called bison is provided by GNU.
A very high-level view of text analysis techniques
Yacc code example (constructing parser tree)
Yacc code example (auxiliary code)