Compilers: Bottom-up/6
Compiler Structures
Semester 1
Bottom-up (LR) Parsing

Objective
– describe bottom-up (LR) parsing using shift-reduce and parse tables
– explain how LR parse tables are generated

Overview
1. What is an LR Parser?
2. Bottom-up using Shift-Reduce
3. Building an LR Parser
4. Generating the Parse Table
5. LR Conflicts
6. LL, SLR, LR, LALR Grammars

In this lecture
[Figure: compiler structure]
Front End: Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Int. Code Generator -> Intermediate Code
Back End: Intermediate Code -> Code Optimizer -> Target Code Generator -> Target Lang. Prog.
This lecture covers the Syntax Analyzer, but concentrating on bottom-up parsing.

1. What is an LR Parser?
An LR parser reads its input tokens from Left to right and produces a Rightmost derivation in reverse.
The parse tree is built bottom-up, starting from the leaves and working upwards to the start symbol.

LR in Action
Grammar:
  S => a A B e
  A => A b c | b
  B => d
Parse "a b b c d e" by reducing the sentence:
  a b b c d e
  a A b c d e     (reduce b to A)
  a A d e         (reduce A b c to A)
  a A B e         (reduce d to B)
  S               (reduce a A B e to S)
Each reduced substring matches a production's right-hand side.
The resulting tree corresponds to a rightmost derivation:
  S ⇒ a A B e ⇒ a A d e ⇒ a A b c d e ⇒ a b b c d e

LR(k) Parsing
The k is the number of input tokens that are looked at (the lookahead) when deciding which production to use.
– e.g. LR(0), LR(1)
We'll be using a variation of LR(0) parsing in this chapter.

LR versus LL
– LR parsers can deal with more complex (powerful) grammars than LL (top-down) parsers.
– LR parsers can detect errors more quickly than LL parsers.
– LR parsers can be implemented very efficiently, but they're difficult to build by hand (unlike LL parsers).

2. Bottom-up using Shift-Reduce
The usual way of implementing bottom-up parsing is by using shift-reduce:
– 'shift' means read in a new input token, and push it onto a stack
– 'reduce' means to group several symbols into a single non-terminal by choosing a production to use 'backwards'
  – the symbols are popped off the stack, and the production's non-terminal is pushed onto it

Shift-Reduce Parsing
Grammar:
  S => a A B e
  A => A b c | b
  B => d

Stack        Input            Action
$            a b b c d e $    Shift a
$ a          b b c d e $      Shift b
$ a b        b c d e $        Reduce A => b
$ a A        b c d e $        Shift b
$ a A b      c d e $          Shift c
$ a A b c    d e $            Reduce A => A b c
$ a A        d e $            Shift d
$ a A d      e $              Reduce B => d
$ a A B      e $              Shift e
$ a A B e    $                Reduce S => a A B e
$            $

3. Building an LR Parser
The standard way of writing a shift-reduce LR parser is to generate a parse table for the grammar, and 'plug' that into a standard LR parser framework.
The table has two main parts: actions and gotos.

3.1. Inside an LR Parser
[Figure: the LR parser reads the input tokens a1 a2 … ai … an $, keeps a stack X0 s0 … Xm-1 sm-1 Xm sm, consults the parse table (actions and gotos), and outputs a parse tree]
– X is a terminal or non-terminal, s is a state
– the stack operations are push and pop
– possible actions are shift, reduce, accept, error
– gotos involve state changes
– the parse table is the bit you create

Parse Table for the Example
  1: S => a A B e
  2: A => A b c
  3: A => b
  4: B => d
The table has an action part and a goto part. In the action part:
– sN means shift and go to state N
– rN means reduce by production number N

Table Algorithm
  push($, 0);                        /* push (symbol, state) pair */
  currToken = scanner();
  while (1) {
    (X, state) = pair on top of stack;
    /* 4 branches for the four possible actions that can be in a table cell */
    if (action[state, currToken] == shift state') {
      push(currToken, state');
      currToken = scanner();
    }
    else if (action[state, currToken] == reduce ruleNum) {
      A --> β is rule number ruleNum;
      bodySize = numElements(β);
      pop bodySize pairs off stack;
      state' = state part of pair on top of stack;
      push(A, goto[state', A]);
    }
    else if (action[state, currToken] == accept) {
      S --> β is the start symbol production;
      bodySize = numElements(β);
      pop bodySize pairs off stack;
      state' = state part of pair on top of stack;
      if (state' == 0) break;        // success; can now stop
      else error();
    }
    else error();                    // empty table cell
  } // of while loop
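
The table algorithm can be tried out with a short Python sketch. This is only an illustration under some assumptions: the names RULES, ACTION, GOTO, and lr_parse are invented here, tokens are single-character strings, and the table entries are the SLR entries for the example grammar S => a A B e, A => A b c | b, B => d.

```python
# Productions of the example grammar, numbered as in the parse table.
RULES = {
    1: ("S", ["a", "A", "B", "e"]),
    2: ("A", ["A", "b", "c"]),
    3: ("A", ["b"]),
    4: ("B", ["d"]),
}

# ACTION[state][token] is ("s", j), ("r", n) or ("acc",).
# A missing cell means error.
ACTION = {
    0: {"a": ("s", 1)},
    1: {"b": ("s", 3)},
    2: {"b": ("s", 5), "d": ("s", 6)},
    3: {"b": ("r", 3), "d": ("r", 3)},
    4: {"e": ("s", 7)},
    5: {"c": ("s", 8)},
    6: {"e": ("r", 4)},
    7: {"$": ("acc",)},
    8: {"b": ("r", 2), "d": ("r", 2)},
}
GOTO = {1: {"A": 2}, 2: {"B": 4}}


def lr_parse(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack = [("$", 0)]                 # (symbol, state) pairs
    tokens = list(tokens) + ["$"]      # end-of-input marker
    pos = 0
    trace = []
    while True:
        state = stack[-1][1]
        tok = tokens[pos]
        act = ACTION.get(state, {}).get(tok)
        if act is None:                # empty cell: error
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act[0] == "s":              # shift: push token, read the next one
            stack.append((tok, act[1]))
            pos += 1
            trace.append(f"shift {act[1]}")
        elif act[0] == "r":            # reduce: pop the body, push the head
            head, body = RULES[act[1]]
            del stack[len(stack) - len(body):]
            exposed = stack[-1][1]     # state' in the slides
            stack.append((head, GOTO[exposed][head]))
            trace.append(f"reduce {head} => {' '.join(body)}")
        else:                          # accept: pop the start rule's body
            head, body = RULES[1]
            del stack[len(stack) - len(body):]
            if stack[-1][1] != 0:
                raise SyntaxError("accept with symbols left on the stack")
            trace.append("accept")
            return trace
```

Calling lr_parse("abbcde") walks through the same shift and reduce steps as the worked trace for "a b b c d e", while an input such as "abce" hits an empty cell and raises SyntaxError.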

Table Parsing Example
Grammar:
  S => a A B e
  A => A b c | b
  B => d

Stack              Input            Action
$0                 a b b c d e $    Shift 1
$0,a1              b b c d e $      Shift 3
$0,a1,b3           b c d e $        Reduce A => b       (note 1)
$0,a1,A2           b c d e $        Shift 5
$0,a1,A2,b5        c d e $          Shift 8
$0,a1,A2,b5,c8     d e $            Reduce A => A b c   (note 2)
$0,a1,A2           d e $            Shift 6
$0,a1,A2,d6        e $              Reduce B => d
$0,a1,A2,B4        e $              Shift 7
$0,a1,A2,B4,e7     $                Accept S => a A B e
$0                 $

Note 1: pop 1 pair; state' == 1; push(A, goto(1, A)) = push(A, 2)
Note 2: pop 3 pairs; state' == 1; push(A, goto(1, A)) = push(A, 2)

The LR Parse Stack
The parse stack holds the branches of the tree being built bottom-up. For example:
– the stack $0,a1,A2,b5,c8 represents the subtrees:
    a    A(b)    b    c
– the next stack, $0,a1,A2, represents:
    a    A( A(b) b c )
– later, $0,a1,A2,B4,e7 represents:
    a    A( A(b) b c )    B(d)    e
Here N(...) denotes a tree node N with the listed children.

4. Generating the Parse Table
The example parse table was generated using the SLR (simple LR) algorithm
– an extension of LR(0) which uses the grammar's FOLLOW() sets
Other LR algorithms can also be used to make a parse table:
– e.g. LR(1), LALR(1)

Supporting Techniques
SLR table generation makes use of three techniques:
– LR(0) items
– the closure() function
– the goto() function
I'll explain each one first, before the table generation algorithm.

LR(0) Items
An LR(0) item is a grammar production with a • at some position of the right-hand side.
So, a production A => X Y Z has four items:
  A => • X Y Z
  A => X • Y Z
  A => X Y • Z
  A => X Y Z •
Production A => ε has one item, A => •
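
As a small illustration, the items of a production can be listed by sliding the dot across its right-hand side. The function name lr0_items and the string format are choices made for this sketch only.

```python
def lr0_items(head, body):
    """Return every LR(0) item of head => body, written with a dot."""
    items = []
    for dot in range(len(body) + 1):       # the dot can sit in len(body)+1 places
        marked = list(body[:dot]) + ["•"] + list(body[dot:])
        items.append(f"{head} => {' '.join(marked)}")
    return items


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(item)
```

An empty body, as in lr0_items("A", []), gives the single item for an ε production.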

The closure() Function
The closure() function generates a set of LR(0) items.
Assume that the grammar only has one production for the start symbol S:
  S => α
The initial closure set is:
  closure( { S => • α } )

If A => α • B β is in the set, then for each production B => γ, add the item B => • γ to the set, if it's not already there.
Repeat until no new items can be added to the set.
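
Example use of closure()
Grammar:
  S --> E
  E --> E + T | T
  T --> T * F | F
  F --> ( E ) | id

closure({ S => • E }) starts from:
  { S => • E }
Add the E productions:
  { S => • E, E => • E + T, E => • T }
Add the T productions:
  { S => • E, E => • E + T, E => • T, T => • T * F, T => • F }
Add the F productions:
  { S => • E, E => • E + T, E => • T, T => • T * F, T => • F, F => • ( E ), F => • id }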
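
The closure rule can be sketched in Python for the expression grammar. The representation is an assumption made for this example: an item is a (production index, dot position) pair, and the names GRAMMAR, closure, and show are invented here.

```python
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]


def closure(items, grammar):
    """Close a set of (production index, dot position) items."""
    result = set(items)
    worklist = list(items)
    while worklist:
        prod, dot = worklist.pop()
        body = grammar[prod][1]
        if dot == len(body):
            continue                      # dot at the end: nothing to expand
        symbol = body[dot]                # the symbol just after the dot
        for index, (head, _) in enumerate(grammar):
            # Terminals head no production, so they add nothing here.
            if head == symbol and (index, 0) not in result:
                result.add((index, 0))
                worklist.append((index, 0))
    return frozenset(result)


def show(item, grammar):
    """Format an item in the dotted notation used on the slides."""
    prod, dot = item
    head, body = grammar[prod]
    return f"{head} => {' '.join(body[:dot] + ('•',) + body[dot:])}"


for item in sorted(closure({(0, 0)}, GRAMMAR)):
    print(show(item, GRAMMAR))
```

The loop prints the seven items of the final set in the worked example, one per line.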

The goto() Function
goto(In, X) takes as input an existing closure set In, and a terminal/non-terminal symbol X.
The output is a new closure set In+1:
– for each item A => α • X β in In, add closure({ A => α X • β }) to In+1
– repeat until no more items can be added to In+1
In the closure graph this is drawn as an edge labelled X:
  In -X-> In+1

goto() Example 1
Grammar:
  S => A B     // rule 1, for start symbol
  A => a
  B => b
Initial state:
  I0 = closure( { S => • A B } )
     = { S => • A B, A => • a }

goto(I0, A) = closure( { S => A • B } )
            = { S => A • B, B => • b }     // call it I1
goto(I0, a) = closure( { A => a • } )
            = { A => a • }                 // call it I2
Closure graph so far:
  I0 -A-> I1
  I0 -a-> I2

goto(I1, B) = closure( { S => A B • } )
            = { S => A B • }               // call it I3
– this is the end of the S production
goto(I1, b) = closure( { B => b • } )
            = { B => b • }                 // call it I4
Closure graph:
  I0 -A-> I1
  I0 -a-> I2
  I1 -B-> I3   (end state)
  I1 -b-> I4
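
The goto() steps for example 1 can be reproduced with a short Python sketch. As an assumption for this example, an item is a (production index, dot position) pair, so (0, 1) stands for S => A • B. The function names are invented here.

```python
GRAMMAR = [
    ("S", ("A", "B")),   # rule 1, for the start symbol
    ("A", ("a",)),       # rule 2
    ("B", ("b",)),       # rule 3
]


def closure(items, grammar):
    """Close a set of (production index, dot position) items."""
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        body = grammar[prod][1]
        if dot == len(body):
            continue
        for index, (head, _) in enumerate(grammar):
            if head == body[dot] and (index, 0) not in result:
                result.add((index, 0))
                worklist.append((index, 0))
    return frozenset(result)


def goto(items, symbol, grammar):
    """Move the dot over symbol wherever possible, then take the closure."""
    moved = {
        (prod, dot + 1)
        for prod, dot in items
        if dot < len(grammar[prod][1]) and grammar[prod][1][dot] == symbol
    }
    return closure(moved, grammar)


I0 = closure({(0, 0)}, GRAMMAR)   # { S => • A B, A => • a }
I1 = goto(I0, "A", GRAMMAR)       # { S => A • B, B => • b }
I2 = goto(I0, "a", GRAMMAR)       # { A => a • }
I3 = goto(I1, "B", GRAMMAR)       # { S => A B • }
I4 = goto(I1, "b", GRAMMAR)       # { B => b • }
```

A symbol that cannot follow the dot in any item, such as goto(I2, "a", GRAMMAR), gives the empty set, which corresponds to a missing edge in the closure graph.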

goto() Example 2
Grammar:
  S => a A B e     // rule 1, for start symbol
  A => A b c | b
  B => d
Initial state:
  I0 = closure( { S => • a A B e } )
     = { S => • a A B e }

goto(I0, a) = closure( { S => a • A B e } )
            = { S => a • A B e, A => • A b c, A => • b }     // call it I1
Closure graph so far:
  I0 -a-> I1

goto(I1, A) = closure( { S => a A • B e, A => A • b c } )
            = { S => a A • B e, A => A • b c, B => • d }     // call it I2
goto(I1, b) = closure( { A => b • } )
            = { A => b • }                                   // call it I3
Closure graph so far:
  I0 -a-> I1
  I1 -A-> I2
  I1 -b-> I3

goto(I2, B) = closure( { S => a A B • e } )
            = { S => a A B • e }     // call it I4
Others:
– I5 : { A => A b • c }
– I6 : { B => d • }
– I7 : { S => a A B e • }     // end of start symbol rule
– I8 : { A => A b c • }
Closure graph:
  I0 -a-> I1
  I1 -A-> I2
  I1 -b-> I3
  I2 -B-> I4
  I2 -b-> I5
  I2 -d-> I6
  I4 -e-> I7
  I5 -c-> I8

Using goto() to make a Table
The columns of the table should be the grammar's terminals, $, and non-terminals.
The rows should be the numbers 0, 1, …, n of the closure sets I0, I1, …, In, which are what we've been calling states.

Stage 1
In stage 1, we add the shift, goto, and accept entries to the table.
– action[i, a] gets 'shift j' (written sj) if goto(Ii, a) == Ij, where a is a terminal
– goto[i, A] gets j if goto(Ii, A) == Ij, where A is a non-terminal
– action[i, $] gets accept if S => α • is in Ii (there must be only one S rule)
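
Example Grammar 1
  S --> A B
  A --> a
  B --> b
Closure graph:
  I0 -A-> I1
  I0 -a-> I2
  I1 -B-> I3
  I1 -b-> I4
Table after stage 1 (- marks an empty cell):

         action[]          goto[]
state    a    b    $       S  A  B
0        s2   -    -       -  1  -
1        -    s4   -       -  -  3
2        -    -    -       -  -  -
3        -    -    acc     -  -  -
4        -    -    -       -  -  -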

Stage 2
In stage 2, we add the reduce and error entries to the table.
– action[i, a] gets 'reduce ruleNum' (written r followed by the rule number) if
  – [A => α •] is in Ii, and
  – A is not S, and
  – a is in FOLLOW(A), and
  – A => α is rule number ruleNum
After filling the table cells with shift, goto, accept, and reduce actions, any remaining empty cells will trigger an error() call.

Finishing the Example Table
The reduce states are the state boxes at the leaves of the closure graph
– but exclude the end state
For the example 1 grammar, there are two boxes at the leaves: I2 and I4.
Closure graph:
  I0 -A-> I1
  I0 -a-> I2
  I1 -B-> I3   (end state)
  I1 -b-> I4

I2 Reduction
Grammar: 1: S --> A B, 2: A --> a, 3: B --> b
I2 = { A => a • }
– A => a is rule number 2
– FOLLOW(A) = FIRST(B) = { b }
So action[2, b] gets reduce 2 (r2).

I4 Reduction
Grammar: 1: S --> A B, 2: A --> a, 3: B --> b
I4 = { B => b • }
– B => b is rule number 3
– FOLLOW(B) = { $ }
So action[4, $] gets reduce 3 (r3).

Adding Reduce Entries
  S --> A B
  A --> a
  B --> b
Table after stage 2 (- marks an empty cell, which is an error):

         action[]          goto[]
state    a    b    $       S  A  B
0        s2   -    -       -  1  -
1        -    s4   -       -  -  3
2        -    r2   -       -  -  -
3        -    -    acc     -  -  -
4        -    -    r3      -  -  -

Using the Example 1 Table
  S --> A B
  A --> a
  B --> b

Stack        Input    Action
$0           a b $    Shift 2
$0,a2        b $      Reduce 2 (A --> a)     (note 1)
$0,A1        b $      Shift 4
$0,A1,b4     $        Reduce 3 (B --> b)     (note 2)
$0,A1,B3     $        Accept (S --> A B)
$0           $

Note 1: pop 1 pair; state' = 0; push(A, goto(0, A)) == push(A, 1)
Note 2: pop 1 pair; state' = 1; push(B, goto(1, B)) == push(B, 3)
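
Example Grammar 2
  S --> a A B e
  A --> A b c | b
  B --> d
Closure graph:
  I0 -a-> I1
  I1 -A-> I2
  I1 -b-> I3
  I2 -B-> I4
  I2 -b-> I5
  I2 -d-> I6
  I4 -e-> I7
  I5 -c-> I8
Table after stage 1 (- marks an empty cell):

         action[]                         goto[]
state    a    b    c    d    e    $       S  A  B
0        s1   -    -    -    -    -       -  -  -
1        -    s3   -    -    -    -       -  2  -
2        -    s5   -    s6   -    -       -  -  4
3        -    -    -    -    -    -       -  -  -
4        -    -    -    -    s7   -       -  -  -
5        -    -    s8   -    -    -       -  -  -
6        -    -    -    -    -    -       -  -  -
7        -    -    -    -    -    acc     -  -  -
8        -    -    -    -    -    -       -  -  -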

Reduce States
For the example 2 grammar, there are three boxes at the leaves: I3, I6, and I8.

I3 Reduction
Grammar: 1: S --> a A B e, 2: A --> A b c, 3: A --> b, 4: B --> d
I3 = { A => b • }
– A => b is rule number 3
– FOLLOW(A) = {b} ∪ FIRST(B) = {b, d}
So action[3, b] and action[3, d] get reduce 3 (r3).
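
The FOLLOW() sets used in the reductions can be checked mechanically. The sketch below is a simplified version that assumes the grammar has no ε productions (true for both example grammars). The function names are invented for this illustration.

```python
GRAMMAR = [
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]
NONTERMINALS = {"S", "A", "B"}


def first_sets(grammar, nonterminals):
    """FIRST sets, assuming no production has an empty body."""
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            lead = body[0]
            new = first[lead] if lead in nonterminals else {lead}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets(grammar, nonterminals, start):
    """FOLLOW sets, again assuming no production has an empty body."""
    first = first_sets(grammar, nonterminals)
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            for i, symbol in enumerate(body):
                if symbol not in nonterminals:
                    continue
                if i + 1 < len(body):
                    # Whatever can start the next symbol can follow this one.
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in nonterminals else {nxt}
                else:
                    # At the end of the body, inherit FOLLOW of the head.
                    new = follow[head]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, NONTERMINALS, "S")
```

For this grammar FOLLOW["A"] comes out as {"b", "d"} and FOLLOW["B"] as {"e"}, matching the sets used for the I3, I6, and I8 reductions.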

I6 Reduction
Grammar: 1: S --> a A B e, 2: A --> A b c, 3: A --> b, 4: B --> d
I6 = { B => d • }
– B => d is rule number 4
– FOLLOW(B) = {e}
So action[6, e] gets reduce 4 (r4).

I8 Reduction
Grammar: 1: S --> a A B e, 2: A --> A b c, 3: A --> b, 4: B --> d
I8 = { A => A b c • }
– A => A b c is rule number 2
– FOLLOW(A) = {b, d}
So action[8, b] and action[8, d] get reduce 2 (r2).
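
Adding Reduce Entries
  S --> a A B e
  A --> A b c | b
  B --> d
Table after stage 2 (- marks an empty cell, which is an error):

         action[]                         goto[]
state    a    b    c    d    e    $       S  A  B
0        s1   -    -    -    -    -       -  -  -
1        -    s3   -    -    -    -       -  2  -
2        -    s5   -    s6   -    -       -  -  4
3        -    r3   -    r3   -    -       -  -  -
4        -    -    -    -    s7   -       -  -  -
5        -    -    s8   -    -    -       -  -  -
6        -    -    -    -    r4   -       -  -  -
7        -    -    -    -    -    acc     -  -  -
8        -    r2   -    r2   -    -       -  -  -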
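
Both stages can be put together in one Python sketch that builds the example 2 table from the grammar. Several things are assumptions made for this illustration: items are (rule index, dot position) pairs, the FOLLOW sets are copied from the reduction slides rather than computed, and symbols are tried with non-terminals first so that the generated state numbers happen to match I0 to I8.

```python
GRAMMAR = [
    ("S", ("a", "A", "B", "e")),  # rule 1 (start symbol)
    ("A", ("A", "b", "c")),       # rule 2
    ("A", ("b",)),                # rule 3
    ("B", ("d",)),                # rule 4
]
NONTERMINALS = ["A", "B"]
TERMINALS = ["a", "b", "c", "d", "e"]
# FOLLOW sets copied from the reduction slides rather than computed here.
FOLLOW = {"A": {"b", "d"}, "B": {"e"}}


def closure(items):
    """Close a set of (rule index, dot position) items."""
    result, worklist = set(items), list(items)
    while worklist:
        rule, dot = worklist.pop()
        body = GRAMMAR[rule][1]
        if dot == len(body):
            continue
        for index, (head, _) in enumerate(GRAMMAR):
            if head == body[dot] and (index, 0) not in result:
                result.add((index, 0))
                worklist.append((index, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved)


def build_states():
    """Return the item sets I0..In and the labelled edges between them."""
    states, edges = [closure({(0, 0)})], {}
    i = 0
    while i < len(states):
        for symbol in NONTERMINALS + TERMINALS:
            target = goto(states[i], symbol)
            if target:                       # an empty set means no edge
                if target not in states:
                    states.append(target)
                edges[(i, symbol)] = states.index(target)
        i += 1
    return states, edges


def build_table():
    states, edges = build_states()
    action, goto_table = {}, {}

    def put(state, token, entry):
        # A second, different entry for the same cell is an LR conflict.
        if action.setdefault((state, token), entry) != entry:
            raise ValueError(f"conflict in action[{state}, {token}]")

    # Stage 1: shift and goto entries come from the edges.
    for (i, symbol), j in edges.items():
        if symbol in TERMINALS:
            put(i, symbol, f"s{j}")
        else:
            goto_table[(i, symbol)] = j
    # Accept (stage 1) and reduce (stage 2) come from completed items.
    for i, state in enumerate(states):
        for rule, dot in state:
            head, body = GRAMMAR[rule]
            if dot != len(body):
                continue
            if rule == 0:
                put(i, "$", "acc")
            else:
                for token in FOLLOW[head]:
                    put(i, token, f"r{rule + 1}")
    return action, goto_table


ACTION, GOTO = build_table()
```

Cells that never receive an entry are simply absent from ACTION, which plays the role of the error entries. The put() helper raises an exception if a cell would receive two different entries, which is how an LR conflict would show up in this sketch.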

5. LR Conflicts
An LR conflict occurs when a cell in the action part of the parse table contains more than one action.
There are two kinds of conflict:
– shift/reduce and reduce/reduce
Conflicts appear because of:
– grammar ambiguity
– limitations of the SLR parsing method (even when the grammar is unambiguous)

Shift/Reduce
A shift/reduce conflict occurs when the parser cannot decide whether to shift the next symbol or reduce with a production
– typically, the default action is to shift

Dangling Else Example
Grammar rule:
  IfStmt => if Expr then Stmt
          | if Expr then Stmt else Stmt
Example:
  if (a == 1) then
    if (b == 4) then
      x = 2;
  else ...        (which 'if' does this else go with?)

On the Stack

Stack                     Input       Action
$ …                       … $         …
$ … if Expr then Stmt     else … $    shift or reduce?

Choose shift, so the else matches the closest if.

Reduce/Reduce
A reduce/reduce conflict occurs when the parser cannot decide which production to use to make a reduction.
Typically, the first suitable production is used.

Example
Grammar:
  C => A B
  A => a
  B => a

Stack    Input    Action
$        a a $    shift
$ a      a $      reduce A => a or B => a ?

Choose A => a, since it's the first suitable one.

6. LL, SLR, LR, LALR Grammars
[Figure: ovals labelled LL(1), LR(0), SLR, LALR(1), and LR(1)]
– the ovals represent the complexity of the grammars that the notation can handle
– we've been using SLR in this chapter
– LL(1) was used in chapter 5 on top-down parsing

LR(1) Grammars
LR(1) parsing uses one token of lookahead to avoid conflicts in the parsing table.
It can deal with more complex/powerful grammars than LR(0) or SLR.
An LR(1) grammar takes longer to convert into a parse table.

LALR(1) Grammars
LALR(1) parsing (Look-Ahead LR) combines LR(1) states to reduce the size of the parse table.
LALR(1) is less powerful than LR(1)
– it may introduce reduce/reduce conflicts, but that's not likely for programming language grammars
LALR(1) is used by the YACC parsing tool
– see next chapter