Published by Harold Cannon. Modified over 8 years ago.
1
241-437 Compilers: Bottom-up/6
Compiler Structures
241-437, Semester 1, 2011-2012
6. Bottom-up (LR) Parsing
Objective
– describe bottom-up (LR) parsing using shift-reduce and parse tables
– explain how LR parse tables are generated
2
Overview
1. What is an LR Parser?
2. Bottom-up using Shift-Reduce
3. Building an LR Parser
4. Generating the Parse Table
5. LR Conflicts
6. LL, SLR, LR, LALR Grammars
3
In this lecture
[Diagram: compiler pipeline. Source Program, then the Front End (Lexical Analyzer, Syntax Analyzer, Semantic Analyzer), Int. Code Generator producing Intermediate Code, then the Back End (Code Optimizer, Target Code Generator), then Target Lang. Prog.]
This lecture concentrates on bottom-up parsing.
4
1. What is an LR Parser?
An LR parser reads its input tokens from Left to right and produces a Rightmost derivation in reverse.
The parse tree is built bottom-up, starting from the leaves and working upwards to the start symbol.
5
LR in Action: parse "a b b c d e"
Grammar:
  S => a A B e
  A => A b c | b
  B => d
Reducing a sentence (at each step the replaced symbols match a production's right-hand side):
  a b b c d e
  a A b c d e
  a A d e
  a A B e
  S
The tree corresponds to a rightmost derivation:
  S ⇒ a A B e ⇒ a A d e ⇒ a A b c d e ⇒ a b b c d e
[Figure: the parse tree for a b b c d e, built up one reduction at a time]
6
LR(k) Parsing
The k is the number of input tokens that are looked at (the lookahead) when deciding which production to use.
– e.g. LR(0), LR(1)
We'll be using a variation of LR(0) parsing in this chapter.
7
LR versus LL
LR can deal with more complex (powerful) grammars than LL (top-down parsers).
LR can detect errors quicker than LL.
LR parsers can be implemented very efficiently, but they're difficult to build by hand (unlike LL parsers).
8
2. Bottom-up using Shift-Reduce
The usual way of implementing bottom-up parsing is by using shift-reduce:
– 'shift' means read in a new input token, and push it onto a stack
– 'reduce' means to group several symbols into a single non-terminal by choosing a production to use 'backwards'
  – the symbols are popped off the stack, and the production's non-terminal is pushed onto it
9
Shift-Reduce Parsing
Grammar: S => a A B e;  A => A b c | b;  B => d
Stack        Input            Action
$            a b b c d e $    Shift
$ a          b b c d e $      Shift
$ a b        b c d e $        Reduce A => b
$ a A        b c d e $        Shift
$ a A b      c d e $          Shift
$ a A b c    d e $            Reduce A => A b c
$ a A        d e $            Shift
$ a A d      e $              Reduce B => d
$ a A B      e $              Shift
$ a A B e    $                Reduce S => a A B e
$            $
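The two stack operations can be tried out with a minimal Python sketch. It only replays a hand-written list of actions (the names `PRODUCTIONS`, `run_actions` and `SCRIPT` are illustrative, not from the slides); deciding which action to take is the parse table's job, covered in section 3. Unlike the trace on the slide, this sketch leaves S on the stack after the final reduce.

```python
# Grammar from the slides: production number -> (head, body)
PRODUCTIONS = {
    1: ("S", ["a", "A", "B", "e"]),
    2: ("A", ["A", "b", "c"]),
    3: ("A", ["b"]),
    4: ("B", ["d"]),
}

def run_actions(tokens, actions):
    """Apply a scripted list of shift/reduce actions and return the final stack.

    `actions` holds "shift" or ("reduce", rule_number). The script is written
    by hand here; a real parser would read the actions from its parse table.
    """
    stack = ["$"]
    remaining = list(tokens)
    for action in actions:
        if action == "shift":
            stack.append(remaining.pop(0))   # read a token and push it
        else:
            _, rule = action
            head, body = PRODUCTIONS[rule]
            if stack[-len(body):] != body:   # the body must be on top of the stack
                raise ValueError(f"cannot reduce by rule {rule}: stack is {stack}")
            del stack[-len(body):]           # pop the production's body ...
            stack.append(head)               # ... and push its non-terminal
    if remaining:
        raise ValueError(f"unread input: {remaining}")
    return stack

# The action column of the trace for "a b b c d e"
SCRIPT = ["shift", "shift", ("reduce", 3), "shift", "shift", ("reduce", 2),
          "shift", ("reduce", 4), "shift", ("reduce", 1)]
final_stack = run_actions("a b b c d e".split(), SCRIPT)
```

Changing any reduce in `SCRIPT` to a rule whose body is not on top of the stack raises a `ValueError`, which shows why the choice of action matters.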
10
3. Building an LR Parser
The standard way of writing a shift-reduce LR parser is to generate a parse table for the grammar, and 'plug' that into a standard LR parser framework.
The table has two main parts: actions and gotos.
11
3.1. Inside an LR Parser
[Diagram: the LR Parser reads the input tokens a1 a2 … ai … an $, pushes and pops a stack of pairs X0 s0 … Xm-1 sm-1 Xm sm, consults the parse table (actions and gotos), and outputs a parse tree.]
– each X is a terminal or non-terminal; each s is a state
– possible actions are shift, reduce, accept, error
– gotos involve state changes
– the parse table is the bit you create
12
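Parse Table for the Example
Rules: 1: S => a A B e   2: A => A b c   3: A => b   4: B => d
Action part: columns a to $. Goto part: columns S, A, B. A "-" marks an empty cell.
State   a    b    c    d    e    $    S    A    B
0       s1   -    -    -    -    -    -    -    -
1       -    s3   -    -    -    -    -    2    -
2       -    s5   -    s6   -    -    -    -    4
3       -    r3   -    r3   -    -    -    -    -
4       -    -    -    -    s7   -    -    -    -
5       -    -    s8   -    -    -    -    -    -
6       -    -    -    -    r4   -    -    -    -
7       -    -    -    -    -    acc  -    -    -
8       -    r2   -    r2   -    -    -    -    -
s means shift to that state; r means reduce by that numbered production.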
13
3.2. Table Algorithm
push($, 0);   /* push (symbol, state) pair */
currToken = scanner();
while (1) {
  (X, state) = pair on top of stack;
  if (action[state, currToken] == shift state') {
    push(currToken, state');
    currToken = scanner();
  }
  :   /* 4 branches, one for each of the four possible actions that can be in a table cell */
continued
14
  else if (action[state, currToken] == reduce ruleNum) {
    A --> α is rule number ruleNum;
    bodySize = numElements(α);
    pop bodySize pairs off stack;
    state' = state part of pair on top of stack;
    push(A, goto[state', A]);
  }
  :
continued
15
  else if (action[state, currToken] == accept) {
    S --> α is the start symbol production;
    bodySize = numElements(α);
    pop bodySize pairs off stack;
    state' = state part of pair on top of stack;
    if (state' == 0) break;   // success; can now stop
    else error();
  }
  else error();
}  // of while loop
16
3.3. Table Parsing Example
Grammar: S => a A B e;  A => A b c | b;  B => d
Stack             Input            Action
$0                a b b c d e $    Shift 1
$0,a1             b b c d e $      Shift 3
$0,a1,b3          b c d e $        Reduce A => b       (pop 1 pair; state' == 1; push(A, goto(1, A)) = push(A, 2))
$0,a1,A2          b c d e $        Shift 5
$0,a1,A2,b5       c d e $          Shift 8
$0,a1,A2,b5,c8    d e $            Reduce A => A b c   (pop 3 pairs; state' == 1; push(A, goto(1, A)) = push(A, 2))
$0,a1,A2          d e $            Shift 6
$0,a1,A2,d6       e $              Reduce B => d
$0,a1,A2,B4       e $              Shift 7
$0,a1,A2,B4,e7    $                Accept S => a A B e
$0                $
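The table-driven loop from section 3.2 can be sketched in Python as follows. This is one possible rendering, not the course's own code: the `ACTION` and `GOTO` dictionaries are transcribed by hand from the SLR table for this grammar, and the names `RULES` and `lr_parse` are assumptions made for the sketch.

```python
# Rule number -> (head, body length) for S => a A B e, A => A b c | b, B => d
RULES = {1: ("S", 4), 2: ("A", 3), 3: ("A", 1), 4: ("B", 1)}

# Parse table entered by hand; any missing cell means error.
ACTION = {
    (0, "a"): ("shift", 1),
    (1, "b"): ("shift", 3),
    (2, "b"): ("shift", 5), (2, "d"): ("shift", 6),
    (3, "b"): ("reduce", 3), (3, "d"): ("reduce", 3),
    (4, "e"): ("shift", 7),
    (5, "c"): ("shift", 8),
    (6, "e"): ("reduce", 4),
    (7, "$"): ("accept", 1),
    (8, "b"): ("reduce", 2), (8, "d"): ("reduce", 2),
}
GOTO = {(1, "A"): 2, (2, "B"): 4}

def lr_parse(tokens):
    """Return the list of actions taken; raise SyntaxError on an empty cell."""
    stack = [("$", 0)]                    # pairs of (symbol, state)
    tokens = list(tokens) + ["$"]
    pos, trace = 0, []
    while True:
        state, tok = stack[-1][1], tokens[pos]
        kind, arg = ACTION.get((state, tok), ("error", None))
        if kind == "shift":
            stack.append((tok, arg))
            pos += 1
            trace.append(f"shift {arg}")
        elif kind == "reduce":
            head, size = RULES[arg]
            del stack[-size:]             # pop one pair per body symbol
            stack.append((head, GOTO[(stack[-1][1], head)]))
            trace.append(f"reduce {arg}")
        elif kind == "accept":
            del stack[-RULES[arg][1]:]    # pop the start production's body
            if stack[-1][1] != 0:
                raise SyntaxError("accept reached with extra symbols on the stack")
            trace.append("accept")
            return trace
        else:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")

trace = lr_parse("a b b c d e".split())
```

The returned `trace` follows the Action column of the example: shift 1, shift 3, reduce 3, shift 5, shift 8, reduce 2, shift 6, reduce 4, shift 7, accept.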
17
3.4. The LR Parse Stack
The parse stack holds the branches of the tree being built bottom-up.
For example,
– the stack $0,a1,A2,b5,c8 represents four branches: a, A (with child b), b, and c
continued
18
The next stack, $0,a1,A2, represents two branches: a, and A (with children A b c, where the inner A has child b)
Later, $0,a1,A2,B4,e7 represents four branches: a, A (as before), B (with child d), and e
continued
19
4. Generating the Parse Table
The example parse table was generated using the SLR (simple LR) algorithm
– an extension of LR(0) which uses the grammar's FOLLOW() sets
Other LR algorithms can also be used to make a parse table:
– e.g. LR(1), LALR(1)
20
Supporting Techniques
SLR table generation makes use of three techniques:
– LR(0) items
– the closure() function
– the goto() function
I'll explain each one first, before the table generation algorithm.
21
4.1. LR(0) Items
An LR(0) item is a grammar production with a dot (•) at some position of the right-hand side.
So, a production A => X Y Z has four items:
  A => • X Y Z
  A => X • Y Z
  A => X Y • Z
  A => X Y Z •
The production A => ε has one item: A => •
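As a small illustration, the items of a production can be enumerated in Python. Representing an item as a (head, body, dot) triple is a choice made for this sketch, and `lr0_items` and `show` are illustrative names.

```python
def lr0_items(head, body):
    """Return every LR(0) item of head => body as (head, body, dot) triples.

    `dot` is the index of the symbol just after the dot, so a body of
    length n yields n + 1 items and an empty body (epsilon) yields one.
    """
    return [(head, tuple(body), dot) for dot in range(len(body) + 1)]

def show(item):
    """Format an item with a visible dot, e.g. 'A => X • Y Z'."""
    head, body, dot = item
    symbols = list(body[:dot]) + ["•"] + list(body[dot:])
    return f"{head} => {' '.join(symbols)}"

items = [show(item) for item in lr0_items("A", ["X", "Y", "Z"])]
epsilon_items = [show(item) for item in lr0_items("A", [])]
```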
22
4.2. The closure() Function
The closure() function generates a set of LR(0) items.
Assume that the grammar only has one production for the start symbol S, S => α
The initial closure set is:
  closure( { S => • α } )
continued
23
If A => α • B β is in the set, then for each production B => γ, add the item B => • γ to the set, if it's not already there.
Repeat until no new items can be added to the set.
24
Example use of closure()
Grammar:
  S => E
  E => E + T | T
  T => T * F | F
  F => ( E ) | id
closure({ S => • E }) is built up in steps:
  start:  { S => • E }
  add E:  { S => • E, E => • E + T, E => • T }
  add T:  { S => • E, E => • E + T, E => • T, T => • T * F, T => • F }
  add F:  { S => • E, E => • E + T, E => • T, T => • T * F, T => • F, F => • ( E ), F => • id }
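The closure rule can be sketched in Python with a worklist. This assumes items are stored as (head, body, dot) triples and the grammar as a dictionary from non-terminal to its production bodies; both are choices made for the sketch rather than anything fixed by the slides.

```python
# Expression grammar from the example: non-terminal -> list of bodies
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items, grammar=GRAMMAR):
    """Expand a set of LR(0) items until no new item can be added."""
    result = set(items)
    worklist = list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in grammar:    # dot sits before a non-terminal B
            for production in grammar[body[dot]]:
                new_item = (body[dot], production, 0)   # the item B => • gamma
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result

start = closure({("S", ("E",), 0)})
```

For this grammar `start` holds the seven items listed in the final step of the example.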
25
4.3. The goto() Function
goto(I_n, X) takes as input an existing closure set I_n, and a terminal/non-terminal symbol X.
The output is a new closure set I_n+1:
– for each item A => α • X β in I_n, add closure({ A => α X • β }) to I_n+1
– repeat until no more items can be added to I_n+1
[Diagram: I_n --X--> I_n+1]
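A minimal Python sketch of goto() follows, reusing the expression grammar from the closure() example. The (head, body, dot) item representation and the helper names are assumptions of the sketch.

```python
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Add B => • gamma for every non-terminal B that appears right after a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                new_item = (body[dot], production, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close the set."""
    moved = {(head, body, dot + 1)
             for head, body, dot in items
             if dot < len(body) and body[dot] == symbol}
    return closure(moved)

i0 = closure({("S", ("E",), 0)})
after_E = goto(i0, "E")        # { S => E •, E => E • + T }
after_paren = goto(i0, "(")    # F => ( • E ) plus the items closure() adds for E
```

An empty result from `goto` means there is no transition on that symbol.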
26
goto() Example 1
Grammar:
  S => A B   // rule 1, for start symbol
  A => a
  B => b
Initial state I0 = closure( { S => • A B } ) = { S => • A B, A => • a }
continued
27
goto(I0, A) = closure( { S => A • B } ) = { S => A • B, B => • b }   // call it I1
goto(I0, a) = closure( { A => a • } ) = { A => a • }   // call it I2
[Diagram: I0 --A--> I1; I0 --a--> I2]
continued
28
goto(I1, B) = closure( { S => A B • } ) = { S => A B • }   // call it I3
– this is the end of the S production
goto(I1, b) = closure( { B => b • } ) = { B => b • }   // call it I4
[Diagram: I0 --A--> I1; I0 --a--> I2; I1 --B--> I3 (end state); I1 --b--> I4]
29
goto() Example 2
Grammar:
  S => a A B e   // rule 1, for start symbol
  A => A b c | b
  B => d
Initial state I0 = closure( { S => • a A B e } ) = { S => • a A B e }
continued
30
goto(I0, a) = closure( { S => a • A B e } ) = { S => a • A B e, A => • A b c, A => • b }   // call it I1
[Diagram: I0 --a--> I1]
continued
31
goto(I1, A) = closure( { S => a A • B e, A => A • b c } ) = { S => a A • B e, A => A • b c, B => • d }   // call it I2
goto(I1, b) = closure( { A => b • } ) = { A => b • }   // call it I3
[Diagram: I0 --a--> I1; I1 --A--> I2; I1 --b--> I3]
continued
32
goto(I2, B) = closure( { S => a A B • e } ) = { S => a A B • e }   // call it I4
Others
– I5: { A => A b • c }
– I6: { B => d • }
– I7: { S => a A B e • }   // end of start symbol rule
– I8: { A => A b c • }
[Diagram: I0 --a--> I1; I1 --A--> I2; I1 --b--> I3; I2 --B--> I4; I2 --b--> I5; I2 --d--> I6; I4 --e--> I7; I5 --c--> I8]
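The whole set of states for this grammar can be generated by applying goto() repeatedly until nothing new appears. The Python sketch below is one way to do it. The symbol order in `SYMBOLS` is an assumption chosen so that the numbering comes out the same as I0 to I8 above; a different order gives the same sets under different numbers.

```python
GRAMMAR = {
    "S": [("a", "A", "B", "e")],
    "A": [("A", "b", "c"), ("b",)],
    "B": [("d",)],
}
SYMBOLS = ["S", "A", "B", "a", "b", "c", "d", "e"]   # non-terminals first

def closure(items):
    """Add B => • gamma for every non-terminal B that appears right after a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                new_item = (body[dot], production, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol`, then take the closure."""
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})

def build_states():
    """Return (states, transitions), where states[i] is the item set I_i."""
    states = [closure({("S", GRAMMAR["S"][0], 0)})]
    transitions = {}
    i = 0
    while i < len(states):                 # the list grows as new sets are found
        for symbol in SYMBOLS:
            target = goto(states[i], symbol)
            if not target:
                continue                   # no item has the dot before this symbol
            if target not in states:
                states.append(target)
            transitions[(i, symbol)] = states.index(target)
        i += 1
    return states, transitions

states, transitions = build_states()
```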
33
4.4. Using goto() to make a Table
The columns of the table should be the grammar's terminals, $, and non-terminals.
The rows should be the numbers 0, 1, …, n of the closure sets I0, I1, …, In (what we've been calling states).
34
Stage 1
In stage 1, we add the shift, goto, and accept entries to the table.
  action[i, a] gets shift j  if goto(Ii, a) == Ij   (a is a terminal)
  goto[i, A] gets j  if goto(Ii, A) == Ij   (A is a non-terminal)
continued
35
  action[i, $] gets accept  if S => α • is in Ii
  (there must be only one S rule)
36
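Example Grammar 1
Grammar: S --> A B;  A --> a;  B --> b
[Diagram: I0 --A--> I1; I0 --a--> I2; I1 --B--> I3; I1 --b--> I4]
Table after stage 1 (action[] columns a, b, $; goto[] columns S, A, B; "-" marks an empty cell):
State   a    b    $    S    A    B
0       s2   -    -    -    1    -
1       -    s4   -    -    -    3
2       -    -    -    -    -    -
3       -    -    acc  -    -    -
4       -    -    -    -    -    -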
37
Stage 2
In stage 2, we add the reduce and error entries to the table.
  action[i, a] gets reduce ruleNum  if A => α • is in Ii, and A is not S, and a is in FOLLOW(A), and A => α is rule number ruleNum
continued
38
After filling the table cells with shift, goto, accept, and reduce actions, any remaining empty cells will trigger an error() call.
39
Finishing the Example Table
The reduce states are the state boxes at the leaves of the closure graph.
– but exclude the end state
For the example 1 grammar, there are two such boxes at the leaves: I2 and I4.
[Diagram: I0 --A--> I1; I0 --a--> I2; I1 --B--> I3; I1 --b--> I4]
40
I2 Reduction
Rules: 1: S --> A B   2: A --> a   3: B --> b
I2 = { A => a • }
– A => a is rule number 2
– FOLLOW(A) = FIRST(B) = { b }
So action[2, b] gets reduce 2 (r2)
41
I4 Reduction
Rules: 1: S --> A B   2: A --> a   3: B --> b
I4 = { B => b • }
– B => b is rule number 3
– FOLLOW(B) = { $ }
So action[4, $] gets reduce 3 (r3)
42
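Adding Reduce Entries
Grammar: S --> A B;  A --> a;  B --> b
[Diagram: I0 --A--> I1; I0 --a--> I2; I1 --B--> I3; I1 --b--> I4]
Completed table (action[] columns a, b, $; goto[] columns S, A, B; "-" marks an empty cell):
State   a    b    $    S    A    B
0       s2   -    -    -    1    -
1       -    s4   -    -    -    3
2       -    r2   -    -    -    -
3       -    -    acc  -    -    -
4       -    -    r3   -    -    -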
43
Using the Example 1 Table
Grammar: S --> A B;  A --> a;  B --> b
Stack       Input    Action
$0          a b $    Shift 2
$0,a2       b $      Reduce 2 (A --> a)   (pop 1 pair; state' = 0; push(A, goto(0,A)) == push(A,1))
$0,A1       b $      Shift 4
$0,A1,b4    $        Reduce 3 (B --> b)   (pop 1 pair; state' = 1; push(B, goto(1,B)) == push(B,3))
$0,A1,B3    $        Accept (S --> A B)
$0          $
44
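4.5. Example Grammar 2
Grammar: S --> a A B e;  A --> A b c | b;  B --> d
[Diagram: I0 --a--> I1; I1 --A--> I2; I1 --b--> I3; I2 --B--> I4; I2 --b--> I5; I2 --d--> I6; I4 --e--> I7; I5 --c--> I8]
Table after Stage 1 (action[] columns a to $; goto[] columns S, A, B; "-" marks an empty cell):
State   a    b    c    d    e    $    S    A    B
0       s1   -    -    -    -    -    -    -    -
1       -    s3   -    -    -    -    -    2    -
2       -    s5   -    s6   -    -    -    -    4
3       -    -    -    -    -    -    -    -    -
4       -    -    -    -    s7   -    -    -    -
5       -    -    s8   -    -    -    -    -    -
6       -    -    -    -    -    -    -    -    -
7       -    -    -    -    -    acc  -    -    -
8       -    -    -    -    -    -    -    -    -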
45
Reduce States
For the example 2 grammar, there are three boxes at the leaves: I3, I6, and I8.
46
I3 Reduction
Rules: 1: S --> a A B e   2: A --> A b c   3: A --> b   4: B --> d
I3 = { A => b • }
– A => b is rule number 3
– FOLLOW(A) = {b} ∪ FIRST(B) = {b, d}
So action[3, b] and action[3, d] get reduce 3 (r3)
47
I6 Reduction
Rules: 1: S --> a A B e   2: A --> A b c   3: A --> b   4: B --> d
I6 = { B => d • }
– B => d is rule number 4
– FOLLOW(B) = {e}
So action[6, e] gets reduce 4 (r4)
48
I8 Reduction
Rules: 1: S --> a A B e   2: A --> A b c   3: A --> b   4: B --> d
I8 = { A => A b c • }
– A => A b c is rule number 2
– FOLLOW(A) = {b, d}
So action[8, b] and action[8, d] get reduce 2 (r2)
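The FOLLOW() sets used in these reductions can be computed mechanically. The Python sketch below is a simplified version that assumes no production has an empty (ε) body, which holds for both example grammars; the function names are illustrative.

```python
GRAMMAR = {
    "S": [("a", "A", "B", "e")],
    "A": [("A", "b", "c"), ("b",)],
    "B": [("d",)],
}

def first_sets(grammar):
    """FIRST of each non-terminal; assumes no production has an empty body."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                lead = body[0]
                new = first[lead] if lead in grammar else {lead}
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def follow_sets(grammar, start="S"):
    """FOLLOW of each non-terminal, with $ following the start symbol."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue                      # only non-terminals have FOLLOW sets
                    if i + 1 < len(body):
                        nxt = body[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[head]            # symbol ends the body
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

follow = follow_sets(GRAMMAR)
```

For example grammar 2 this gives FOLLOW(A) = {b, d} and FOLLOW(B) = {e}, the sets used for I3, I6 and I8.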
49
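Adding Reduce Entries
Grammar: S --> a A B e;  A --> A b c | b;  B --> d
[Diagram: I0 --a--> I1; I1 --A--> I2; I1 --b--> I3; I2 --B--> I4; I2 --b--> I5; I2 --d--> I6; I4 --e--> I7; I5 --c--> I8]
Completed table (action[] columns a to $; goto[] columns S, A, B; "-" marks an empty cell):
State   a    b    c    d    e    $    S    A    B
0       s1   -    -    -    -    -    -    -    -
1       -    s3   -    -    -    -    -    2    -
2       -    s5   -    s6   -    -    -    -    4
3       -    r3   -    r3   -    -    -    -    -
4       -    -    -    -    s7   -    -    -    -
5       -    -    s8   -    -    -    -    -    -
6       -    -    -    -    r4   -    -    -    -
7       -    -    -    -    -    acc  -    -    -
8       -    r2   -    r2   -    -    -    -    -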
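Stages 1 and 2 can be combined into a short table builder. The Python sketch below is one possible arrangement, under stated assumptions: the FOLLOW sets are typed in from the slides rather than computed, the symbol order is chosen so the state numbers match I0 to I8, and all names (`RULES`, `build_slr_table`, and so on) are illustrative.

```python
RULES = [("S", ("a", "A", "B", "e")),    # rule 1 (the single start production)
         ("A", ("A", "b", "c")),         # rule 2
         ("A", ("b",)),                  # rule 3
         ("B", ("d",))]                  # rule 4
NONTERMINALS = {"S", "A", "B"}
SYMBOLS = ["S", "A", "B", "a", "b", "c", "d", "e"]
FOLLOW = {"A": {"b", "d"}, "B": {"e"}}   # copied from the slides, not computed here

def closure(items):
    """Add B => • gamma for every non-terminal B that appears right after a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        head, body, dot = worklist.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in RULES:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    worklist.append((h, b, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol`, then take the closure."""
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})

def build_slr_table():
    """Return (action, goto_table) dictionaries keyed by (state, symbol)."""
    states = [closure({RULES[0] + (0,)})]
    action, goto_table = {}, {}

    def set_action(state, terminal, entry):
        # A second, different entry for the same cell is an LR conflict.
        if action.setdefault((state, terminal), entry) != entry:
            raise ValueError(f"conflict in action[{state}, {terminal}]")

    i = 0
    while i < len(states):
        for symbol in SYMBOLS:                      # stage 1: shift and goto entries
            target = goto(states[i], symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            j = states.index(target)
            if symbol in NONTERMINALS:
                goto_table[(i, symbol)] = j
            else:
                set_action(i, symbol, f"s{j}")
        for head, body, dot in states[i]:
            if dot < len(body):
                continue                            # dot not at the end: nothing to add
            if (head, body) == RULES[0]:
                set_action(i, "$", "acc")           # stage 1: accept entry
            else:
                rule_num = RULES.index((head, body)) + 1
                for terminal in FOLLOW[head]:       # stage 2: reduce entries
                    set_action(i, terminal, f"r{rule_num}")
        i += 1
    return action, goto_table

action, goto_table = build_slr_table()
```

Cells missing from `action` correspond to the empty (error) cells of the table.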
50
5. LR Conflicts
An LR conflict occurs when a cell in the action part of the parse table contains more than one action.
There are two kinds of conflict:
– shift/reduce and reduce/reduce
Conflicts appear because of:
– grammar ambiguity
– limitations of the SLR parsing method (even when the grammar is unambiguous)
51
5.1. Shift/Reduce
A shift/reduce conflict occurs when the parser cannot decide whether to shift the next symbol or reduce with a production
– typically, the default action is to shift
52
Dangling Else Example
Grammar rule:
  IfStmt => if Expr then Stmt
          | if Expr then Stmt else Stmt
Example:
  if (a == 1) then
    if (b == 4) then x = 2;
    else ...    <-- this goes with which 'if'?
53
On the Stack
Stack                       Input       Action
$ …                         … $         …
$ … if Expr then Stmt       else … $    shift or reduce?
Choose shift, so else matches closest if
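The conflicting cell can be reproduced with a few lines of Python that apply the SLR rules to the one state involved. This is a sketch under a loud assumption: the slide gives only the IfStmt rule, so the FOLLOW set below assumes a Stmt can itself be an IfStmt, which is what lets 'else' follow an IfStmt. The names `candidate_actions` and `resolve` are illustrative.

```python
IF_THEN = ("IfStmt", ("if", "Expr", "then", "Stmt"))
IF_ELSE = ("IfStmt", ("if", "Expr", "then", "Stmt", "else", "Stmt"))

# The item set reached after "if Expr then Stmt": the dot is at index 4 in both items.
STATE = {IF_THEN + (4,), IF_ELSE + (4,)}

# Assumption (not on the slide): Stmt can derive IfStmt, so 'else' is in FOLLOW(IfStmt).
FOLLOW = {"IfStmt": {"else", "$"}}

def candidate_actions(state, lookahead, follow):
    """List every action the SLR rules would place in action[state, lookahead]."""
    actions = []
    for head, body, dot in sorted(state):
        if dot < len(body) and body[dot] == lookahead:
            actions.append("shift")
        elif dot == len(body) and lookahead in follow[head]:
            actions.append(f"reduce {head} => {' '.join(body)}")
    return actions

def resolve(actions):
    """Usual default for a non-empty cell: prefer shift in a shift/reduce conflict."""
    return "shift" if "shift" in actions else actions[0]

on_else = candidate_actions(STATE, "else", FOLLOW)   # two entries, so a conflict
on_end = candidate_actions(STATE, "$", FOLLOW)       # one entry, no conflict
chosen = resolve(on_else)
```

Preferring shift pushes the else onto the stack above the inner if, so it is matched with the closest if.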
54
5.2. Reduce/Reduce
A reduce/reduce conflict occurs when the parser cannot decide which production to use to make a reduction.
Typically, the first suitable production is used.
55
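Example
Grammar: C => A B;  A => a;  B => a
Stack    Input    Action
$        a a $    shift
$ a      a $      reduce A => a or B => a ?
Choose A => a, since it's the first suitable one.
Note: this shows the choice facing a parser that looks only at the symbols on the stack. In the LR item sets for this particular grammar, A => a • and B => a • fall in different states, so its parse table has no conflict.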
56
6. LL, SLR, LR, LALR Grammars
[Diagram: ovals labelled LL(1), LR(0), SLR, LALR(1), and LR(1); LR(0) lies inside SLR, SLR inside LALR(1), and LALR(1) inside LR(1).]
– the ovals represent the complexity of the grammars that the notation can handle
– we've been using SLR in this chapter
– LL(1) was used in chapter 5 on top-down parsing
57
LR(1) Grammars
LR(1) parsing uses one token of lookahead to avoid conflicts in the parsing table.
It can deal with more complex/powerful grammars than LR(0) or SLR.
An LR(1) grammar takes longer to convert into a parse table.
58
LALR(1) Grammars
LALR(1) parsing (Look-Ahead LR) combines LR(1) states to reduce the size of the parse table.
LALR(1) is less powerful than LR(1)
– it may introduce reduce-reduce conflicts, but that's not likely for programming language grammars
LALR(1) is used by the YACC parsing tool
– see next chapter