CPSC46001 Bottom-up Parsing Reading Sections 4.5 and 4.7 from ASU
CPSC46002 Predictive Parsing Summary: First and Follow sets are used to construct predictive parsing tables. For non-terminal A and input t, use a production A → α where t ∈ First(α). For non-terminal A and input t, if ε ∈ First(A) and t ∈ Follow(A), then use a production A → α where ε ∈ First(α). Recursive descent without backtracking does not need the parse table explicitly.
CPSC46003 Bottom-Up Parsing(1): Bottom-up parsing is more general than top-down parsing, and just as efficient. It builds on ideas in top-down parsing. Bottom-up is the preferred method in practice.
CPSC46004 Bottom-Up Parsing(2): Table-driven, using an explicit stack (non-recursive). The stack can be viewed as containing both terminals and nonterminals. Basic operations: Shift: move terminals from the input stream to the stack until the right-hand side of an appropriate production rule has been identified on the stack. Reduce: replace the symbols on top of the stack that match the right-hand side of an appropriate production rule with the nonterminal appearing on the left-hand side of that production.
CPSC46005 An Introductory Example: Bottom-up parsers don't need left-factored grammars. Hence we can revert to the "natural" grammar for our example:
E → T + E | T
T → num * T | num | (E)
Consider the string: num * num + num
CPSC46006 The Idea: Bottom-up parsing reduces a string to the start symbol by inverting productions:
Input                  Production Used
num * num + num
num * T + num          T → num
T + num                T → num * T
T + T                  T → num
T + E                  E → T
E                      E → T + E
CPSC46007 Right-most Derivation: In a rightmost derivation, the rightmost nonterminal of a sentential form is replaced at each derivation step. Question: find the rightmost derivation of the string num * num + num.
CPSC46008 Observation: Read the productions found by the bottom-up parse in reverse (i.e., from bottom to top). This is a rightmost derivation!
Input                  Production Used
num * num + num
num * T + num          T → num
T + num                T → num * T
T + T                  T → num
T + E                  E → T
E                      E → T + E
CPSC46009 Important Facts A bottom-up parser traces a rightmost derivation in reverse
CPSC A Bottom-up Parse: E ⇒ T + E ⇒ T + T ⇒ T + num ⇒ num * T + num ⇒ num * num + num (Figure: the corresponding parse tree, with E at the root and num * num + num at the leaves.)
CPSC A Bottom-up Parse in Detail (1): num * num + num (Figure: the input tokens as leaves; no tree nodes yet.)
CPSC A Bottom-up Parse in Detail (2): num * T + num ⇒ num * num + num (Figure: partial parse tree.)
CPSC A Bottom-up Parse in Detail (3): T + num ⇒ num * T + num ⇒ num * num + num (Figure: partial parse tree.)
CPSC A Bottom-up Parse in Detail (4): T + T ⇒ T + num ⇒ num * T + num ⇒ num * num + num (Figure: partial parse tree.)
CPSC A Bottom-up Parse in Detail (5): T + E ⇒ T + T ⇒ T + num ⇒ num * T + num ⇒ num * num + num (Figure: partial parse tree.)
CPSC A Bottom-up Parse in Detail (6): E ⇒ T + E ⇒ T + T ⇒ T + num ⇒ num * T + num ⇒ num * num + num (Figure: complete parse tree rooted at E.)
CPSC Bottom-up Parsing: A trivial bottom-up parsing algorithm
Let I = input string
repeat
    pick a non-empty substring β of I where X → β is a production
    if no such β, backtrack
    replace one β by X in I
until I = "S" (the start symbol) or all possibilities are exhausted
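The trivial algorithm above can be sketched in Python as a depth-first search with backtracking. This is only an illustrative sketch: the names `GRAMMAR` and `naive_parse` and the tuple encoding of sentential forms are assumptions made here, and the grammar is the "natural" expression grammar from the introductory example. The `seen` set is an addition that avoids re-exploring forms already known to fail.

```python
# Hypothetical encoding of the example grammar: (left-hand side, right-hand side).
GRAMMAR = [
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("num", "*", "T")),
    ("T", ("num",)),
    ("T", ("(", "E", ")")),
]

def naive_parse(form, start="E", seen=None):
    """Return the reductions that turn `form` into `start`, or None if none exist."""
    if seen is None:
        seen = set()
    if form == (start,):
        return []
    if form in seen:              # this form was already tried without success
        return None
    seen.add(form)
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:          # substring matches a right-hand side
                reduced = form[:i] + (lhs,) + form[i + n:]
                rest = naive_parse(reduced, start, seen)
                if rest is not None:
                    return [(lhs, rhs, i)] + rest
    return None                   # every choice failed: backtrack

if __name__ == "__main__":
    for lhs, rhs, pos in naive_parse(("num", "*", "num", "+", "num")):
        print(f"reduce {lhs} -> {' '.join(rhs)} at position {pos}")
```

The search tries every matching substring, so it can explore many dead ends before finding the right sequence of reductions; that cost is what shift-reduce parsing is meant to avoid.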
CPSC Observations: Does the algorithm terminate, and when? What is its running time? If there is more than one choice for the substring to be replaced (reduced), which one should be chosen?
CPSC Where Do Reductions Happen: Recall that a bottom-up parser traces a rightmost derivation in reverse. Let αβω be a rightmost sentential form. Assume the next reduction is by X → β. Then ω is a string of terminals. Why? Because αXω ⇒ αβω is a step in a rightmost derivation.
CPSC Shift-Reduce Parsing: Bottom-up parsing uses only two kinds of actions: shift and reduce.
CPSC Shift: Move # (marking the part of the input that has been processed) one place to the right. This shifts a terminal to the left string: ABC#xyz ⇒ ABCx#yz
CPSC Reduce: Apply an inverse production at the right end of the left string. If A → xy is a production, then Cbxy#ijk ⇒ CbA#ijk
CPSC The Example with Shift-Reduce Parsing
Input                  Action
#num * num + num       shift
num # * num + num      shift
num * # num + num      shift
num * num # + num      reduce T → num
num * T # + num        reduce T → num * T
T # + num              shift
T + # num              shift
T + num #              reduce T → num
T + T #                reduce E → T
T + E #                reduce E → T + E
E #
CPSC A Shift-Reduce Parse in Detail (1): #num * num + num
CPSC A Shift-Reduce Parse in Detail (2): num # * num + num (after shift)
CPSC A Shift-Reduce Parse in Detail (3): num * # num + num (after shift)
CPSC A Shift-Reduce Parse in Detail (4): num * num # + num (after shift)
CPSC A Shift-Reduce Parse in Detail (5): num * T # + num (after reduce T → num)
CPSC A Shift-Reduce Parse in Detail (6): T # + num (after reduce T → num * T)
CPSC A Shift-Reduce Parse in Detail (7): T + # num (after shift)
CPSC A Shift-Reduce Parse in Detail (8): T + num # (after shift)
CPSC A Shift-Reduce Parse in Detail (9): T + T # (after reduce T → num)
CPSC A Shift-Reduce Parse in Detail (10): T + E # (after reduce E → T)
CPSC A Shift-Reduce Parse in Detail (11): E # (after reduce E → T + E; the parse tree is complete)
CPSC The Stack: The left string can be implemented by a stack. The top of the stack is the #. Shift pushes a terminal on the stack. Reduce pops 0 or more symbols off of the stack (the production rhs) and pushes a non-terminal on the stack (the production lhs).
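As a small illustration of the stack view, the sketch below replays the shift-reduce parse of num * num + num with a Python list as the stack. The helper names (`shift`, `reduce`, `replay`) and the hand-written `SCRIPT` of actions are assumptions for this example; deciding which action to take at each step is exactly what the later tables are for.

```python
def shift(stack, tokens):
    """Push the next input token on the stack (the # moves one place right)."""
    stack.append(tokens.pop(0))

def reduce(stack, lhs, rhs):
    """Pop the production's right-hand side off the stack and push its left-hand side."""
    n = len(rhs)
    if n:
        if stack[-n:] != list(rhs):
            raise ValueError(f"{rhs} is not on top of the stack")
        del stack[-n:]
    stack.append(lhs)

def replay(tokens, script):
    """Apply a fixed list of actions; return the final stack and a trace."""
    stack, tokens = [], list(tokens)
    trace = []
    for action in script:
        if action == "shift":
            shift(stack, tokens)
        else:
            reduce(stack, *action)
        trace.append(" ".join(stack) + " # " + " ".join(tokens))
    return stack, trace

# The action sequence from the worked example, written out by hand.
SCRIPT = [
    "shift", "shift", "shift",
    ("T", ("num",)),
    ("T", ("num", "*", "T")),
    "shift", "shift",
    ("T", ("num",)),
    ("E", ("T",)),
    ("E", ("T", "+", "E")),
]

if __name__ == "__main__":
    final_stack, trace = replay("num * num + num".split(), SCRIPT)
    print("\n".join(trace))
```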
CPSC Key Issue (will be resolved by algorithms): How do we decide when to shift or reduce? Consider the step num # * num + num. We could reduce by T → num, giving T # * num + num. This is a fatal mistake: there is then no way to reduce to the start symbol E.
CPSC Conflicts: Generic shift-reduce strategy: if there is a handle on top of the stack, reduce; otherwise, shift. But what if there is a choice? If it is legal to either shift or reduce, there is a shift-reduce conflict. If it is legal to reduce by two different productions, there is a reduce-reduce conflict.
CPSC Conflict Example: Consider the ambiguous grammar: E → E + E | E * E | (E) | num
CPSC One Shift-Reduce Parse
Input                  Action
#num * num + num       shift
...
E * E # + num          reduce E → E * E
E # + num              shift
E + # num              shift
E + num #              reduce E → num
E + E #                reduce E → E + E
E #
CPSC Another Shift-Reduce Parse
Input                  Action
#num * num + num       shift
...
E * E # + num          shift
E * E + # num          shift
E * E + num #          reduce E → num
E * E + E #            reduce E → E + E
E * E #                reduce E → E * E
E #
CPSC Observations: At the step E * E # + num we can either shift or reduce by E → E * E. The choice determines how + and * group (their precedence and associativity). As noted previously, the grammar can be rewritten to enforce precedence. Precedence declarations are an alternative.
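To make the effect of the choice concrete, here is a minimal sketch of a shift-reduce loop for the ambiguous grammar E → E + E | E * E | num (parentheses omitted). The function name `parse` and the `on_conflict` parameter are assumptions for illustration. For brevity a shifted num is treated as an already reduced E, and trees are nested tuples of the form (operator, left, right).

```python
OPS = ("+", "*")

def parse(tokens, on_conflict):
    """Parse with a fixed policy for the shift-reduce conflict.

    on_conflict is "shift" or "reduce": what to do when E op E is on top
    of the stack and more input remains.
    """
    stack, rest = [], list(tokens)
    while rest or len(stack) > 1:
        handle_on_top = (len(stack) >= 3
                         and stack[-2] in OPS
                         and stack[-1] not in OPS)
        if handle_on_top and (not rest or on_conflict == "reduce"):
            right, op, left = stack.pop(), stack.pop(), stack.pop()
            stack.append((op, left, right))      # reduce E -> E op E
        elif rest:
            stack.append(rest.pop(0))            # shift
        else:
            raise SyntaxError("cannot reduce to E")
    return stack[0]

if __name__ == "__main__":
    tokens = "num * num + num".split()
    print(parse(tokens, "reduce"))   # groups as (num * num) + num
    print(parse(tokens, "shift"))    # groups as num * (num + num)
```

Reducing at the conflict makes * bind tighter than +, while shifting makes + bind tighter, which is why a precedence declaration is enough to resolve the conflict.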
CPSC Overview: LR(k) parsing. L: scan the input left to right. R: produce a rightmost derivation (in reverse). k: tokens of lookahead. LR(0): zero tokens of lookahead. SLR (Simple LR): like LR(0), but uses FOLLOW sets to build more "precise" parsing tables.
CPSC Basic Terminologies: Handle: a substring that matches the right side of a production, and whose reduction to that production's left side constitutes one step, in reverse, of the rightmost derivation of the string from the start nonterminal of the grammar.
CPSC Model of Shift-Reduce Parsing: Stack + input = current right-sentential form. Locate the handle during parsing: shift zero or more terminals (tokens) onto the stack until a handle β is on top of the stack. Replace the handle with the proper non-terminal (handle pruning): reduce β to A, where A → β.
CPSC Model of an LR Parser
CPSC Problem: when to shift, when to reduce? Recall the grammar: E → T + E | T and T → num * T | num | (E). How do we know when to reduce and when to shift?
CPSC Model of Shift-Reduce Parsing: Stack + input = current right-sentential form. Locate the handle during parsing: shift zero or more terminals onto the stack until a handle is on top of the stack. Replace the handle with the proper non-terminal (handle pruning).
CPSC What we need to know to do LR parsing: LR(0) states: describe the states in which the parser can be. Note: LR(0) states are used by both LR(0) and SLR parsers. Parsing tables: transitions between LR(0) states, and the action to take at each transition: shift, reduce, accept, or error. How to construct LR(0) states. How to construct parsing tables. How to drive the parser.
CPSC An LR(0) state = a set of LR(0) items: An LR(0) item [X → α.β] says that the parser is looking for an X, it has an α on top of the stack, and it expects to find in the input a string derived from β. Notes: [X → α.aβ] means that if the terminal a is on the input, it can be shifted. That is, a is a correct token to see on the input, and shifting a would not "over-shift" (the stack is still a viable prefix). [X → α.] means that we could reduce α to X.
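CPSC LR(0) states (one state per line)
S' → .E; E → .T; E → .T + E; T → .(E); T → .num * T; T → .num
S' → E.
E → T.; E → T. + E
T → num. * T; T → num.
T → (.E); E → .T; E → .T + E; T → .(E); T → .num * T; T → .num
E → T + E.
E → T +. E; E → .T; E → .T + E; T → .(E); T → .num * T; T → .num
T → num *.T; T → .(E); T → .num * T; T → .num
T → num * T.
T → (E.)
T → (E).
(Figure: DFA edges between these states, labeled with grammar symbols such as E, T, (, num, * and ).)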
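The states above can be generated mechanically with the standard closure and goto operations on item sets. The sketch below is one possible construction, not necessarily the one intended by the slides: items are encoded as (production index, dot position) pairs, and all names (`GRAMMAR`, `closure`, `goto`, `build_states`) are assumptions made for this example.

```python
GRAMMAR = [                       # production 0 is the augmented start production
    ("S'", ("E",)),
    ("E", ("T",)),
    ("E", ("T", "+", "E")),
    ("T", ("(", "E", ")")),
    ("T", ("num", "*", "T")),
    ("T", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add [Y -> .gamma] for every nonterminal Y that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for k, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    work.append((k, 0))
    return frozenset(result)

def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def build_states():
    """Return the list of LR(0) states and the DFA edges between them."""
    start = closure({(0, 0)})
    states, edges, work = [start], {}, [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges

if __name__ == "__main__":
    states, edges = build_states()
    print(len(states), "states,", len(edges), "edges")
```

State numbers depend on the order in which states are discovered, so they will not necessarily match any numbering used in a hand-drawn DFA.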
CPSC SLR Parsing: Remember the state of the automaton on each prefix of the stack. Change the stack to contain pairs ⟨Symbol, DFA State⟩.
CPSC SLR Parsing (Contd.): For a stack ⟨sym_1, state_1⟩ ... ⟨sym_n, state_n⟩, state_n is the final state of the DFA on sym_1 … sym_n. Detail: the bottom of the stack is ⟨any, start⟩, where any is any dummy symbol and start is the start state of the DFA.
CPSC Goto Table: Define Goto[i,A] = j if the DFA has a transition from state i to state j on A, where A is a nonterminal. Goto is just the transition function of the DFA. It is one of the two parsing tables.
CPSC Parser Moves: Shift x: push ⟨a, x⟩ on the stack, where a is the current input and x is a DFA state. Reduce X → α: as before. Accept. Error.
CPSC Action Table: For each state s_i and terminal a: If s_i has item X → α.aβ and there is a transition on terminal a from state i to state j, then Action[i,a] = shift j. If s_i has item X → α. and a ∈ Follow(X) and X ≠ S', then Action[i,a] = reduce X → α. If s_i has item S' → S., then Action[i,$] = accept. Otherwise, Action[i,a] = error.
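The reduce entries of the action table depend on Follow sets. As a sketch, the fixed-point computation below derives First and Follow for the example grammar (augmented with S' → E). It assumes the grammar has no ε-productions, which holds here but is not true in general; the function names are hypothetical.

```python
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T",)),
    ("E", ("T", "+", "E")),
    ("T", ("(", "E", ")")),
    ("T", ("num", "*", "T")),
    ("T", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """First sets, assuming no epsilon-productions (only rhs[0] matters)."""
    first = {A: set() for A in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets(first):
    """Follow sets, again assuming no epsilon-productions."""
    follow = {A: set() for A in NONTERMINALS}
    follow["S'"].add("$")                    # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):         # something follows sym in this rhs
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                        # sym ends the rhs: inherit Follow(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

if __name__ == "__main__":
    first = first_sets()
    follow = follow_sets(first)
    for A in sorted(NONTERMINALS):
        print(A, "First:", sorted(first[A]), "Follow:", sorted(follow[A]))
```

For this grammar, a state containing E → T. therefore reduces only on $ or ), and shifts on +, so the two items E → T. and E → T. + E do not conflict.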
CPSC SLR Parsing Algorithm
Let I = w$ be the initial input
Let j = 0
Let DFA state 1 have item S' → .S
Let stack = ⟨dummy, 1⟩
repeat
    case action[top_state(stack), I[j]] of
        shift k: push ⟨I[j++], k⟩
        reduce X → α: pop |α| pairs, I[--j] = X   // prepend X to input
        accept: halt normally
        error: halt and report error
CPSC Notes on SLR Parsing Algorithm: Note that the algorithm uses only the DFA states and the input. The stack symbols are never used! However, we still need the symbols for semantic actions.
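A compact end-to-end sketch of the driver follows, under several stated assumptions. The Follow sets for the example grammar are written in by hand, states are numbered from 0 rather than 1, and after a reduction the driver pushes ⟨X, Goto[state, X]⟩ directly, which has the same effect as prepending X to the input and then shifting it. All function and variable names are hypothetical.

```python
GRAMMAR = [("S'", ("E",)), ("E", ("T",)), ("E", ("T", "+", "E")),
           ("T", ("(", "E", ")")), ("T", ("num", "*", "T")), ("T", ("num",))]
NONTERMINALS = {"S'", "E", "T"}
FOLLOW = {"E": {"$", ")"}, "T": {"+", "$", ")"}}   # Follow sets of this grammar

def closure(items):
    """LR(0) closure over (production index, dot position) items."""
    items = set(items)
    while True:
        new = {(k, 0)
               for p, d in items if d < len(GRAMMAR[p][1])
               for k, (lhs, _) in enumerate(GRAMMAR) if lhs == GRAMMAR[p][1][d]}
        if new <= items:
            return frozenset(items)
        items |= new

def put(table, key, value):
    """Store a table entry, refusing to overwrite a different one (a conflict)."""
    if table.setdefault(key, value) != value:
        raise ValueError(f"conflict at {key}")

def build_tables():
    states, action, goto = [closure({(0, 0)})], {}, {}
    i = 0
    while i < len(states):
        state = states[i]
        for p, d in state:
            lhs, rhs = GRAMMAR[p]
            if d == len(rhs):                      # item X -> alpha.
                if p == 0:
                    put(action, (i, "$"), ("accept",))
                else:
                    for a in FOLLOW[lhs]:
                        put(action, (i, a), ("reduce", p))
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = closure({(p, d + 1) for p, d in state
                              if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym})
            if target not in states:
                states.append(target)
            j = states.index(target)
            if sym in NONTERMINALS:
                goto[i, sym] = j
            else:
                put(action, (i, sym), ("shift", j))
        i += 1
    return action, goto

def slr_parse(tokens):
    """Return the reductions performed, in order, or raise SyntaxError."""
    action, goto = build_tables()
    stack = [("dummy", 0)]                         # pairs <symbol, DFA state>
    tokens = list(tokens) + ["$"]
    j, reductions = 0, []
    while True:
        move = action.get((stack[-1][1], tokens[j]), ("error",))
        if move[0] == "shift":
            stack.append((tokens[j], move[1]))
            j += 1
        elif move[0] == "reduce":
            lhs, rhs = GRAMMAR[move[1]]
            del stack[len(stack) - len(rhs):]      # pop |alpha| pairs
            stack.append((lhs, goto[stack[-1][1], lhs]))
            reductions.append((lhs, rhs))
        elif move[0] == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {tokens[j]!r} at position {j}")

if __name__ == "__main__":
    for lhs, rhs in slr_parse("num * num + num".split()):
        print(f"reduce {lhs} -> {' '.join(rhs)}")
```

Only the state component of each stack pair is consulted when choosing a move; the symbols are kept because a real parser would attach semantic values to them.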
CPSC The Compiler So Far: Lexical analysis: detects inputs with illegal tokens. Parsing: detects inputs with ill-formed parse trees. Semantic analysis: the last "front end" phase; catches all remaining errors.
CPSC Typical Semantic Errors: Multiple declarations: a variable should be declared (in the same scope) at most once. Undeclared variable: a variable should not be used before being declared. Type mismatch: the type of the left-hand side of an assignment should match the type of the right-hand side. Wrong arguments: methods should be called with the right number and types of arguments.
CPSC Sample Semantic Analyzer: For each scope in the program: process the declarations, adding new entries to the symbol table (or a similar structure) and reporting any variables that are multiply declared; then process the statements, finding uses of undeclared variables and using the symbol-table information to determine the type of each expression and to find type errors.
CPSC Scope Rules for Pascal-: Rule 6.1: All constants, types, variables, and procedures defined in the same block must have different names. Rule 6.2: A constant, type, or variable defined in a block is normally known from the end of its declaration to the end of the block. A procedure defined in a block B is normally known from the beginning of the procedure to the end of the block B. Rule 6.3: Consider a block Q that defines an object x. If Q contains a block R that defines another object named x, the first object is unknown in the scope of the second object.
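Rules 6.1 and 6.3 map naturally onto a stack of per-block dictionaries. The sketch below is an illustrative assumption rather than the Pascal- compiler's own data structure: the class `SymbolTable` and its methods are hypothetical, although their names echo the NewBlock, EndBlock, Define and Find operations used in the parser procedures. The usage example walks through a program shaped like the sample program P.

```python
class SymbolTable:
    """A stack of blocks; the innermost block is the last list element."""

    def __init__(self):
        self.blocks = [{}]            # the outermost (standard) block
        self.errors = []

    def new_block(self):
        self.blocks.append({})

    def end_block(self):
        self.blocks.pop()

    def define(self, name, kind):
        """Rule 6.1: names defined in the same block must differ."""
        block = self.blocks[-1]
        if name in block:
            self.errors.append(f"multiple declaration of {name}")
        else:
            block[name] = kind

    def find(self, name):
        """Rule 6.3: the innermost definition hides outer ones."""
        for block in reversed(self.blocks):
            if name in block:
                return block[name]
        self.errors.append(f"undeclared name {name}")
        return None

def demo():
    table = SymbolTable()
    table.define("integer", "type")       # standard block
    table.new_block()                     # program P
    table.define("T", "type")
    table.define("x", "variable")
    table.define("Q", "procedure")
    table.new_block()                     # procedure Q(x: integer)
    table.define("x", "parameter")
    table.define("c", "constant")
    x_in_q = table.find("x")              # the parameter hides the variable
    table.end_block()
    table.define("R", "procedure")
    table.new_block()                     # procedure R
    table.define("b", "variable")
    table.define("c", "variable")
    x_in_r = table.find("x")              # P's variable is visible again
    table.find("z")                       # never declared
    table.define("b", "variable")         # declared twice in R
    table.end_block()
    table.end_block()
    return x_in_q, x_in_r, table.errors

if __name__ == "__main__":
    print(demo())
```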
CPSC Pascal- Program (1)
{Begin Standard Block}
program P;
  type T = array[1..100] of integer;
  var x: T;

  procedure Q(x: integer);
    const c = 13;
  begin ... x ... end {Q};

  procedure R;
    var b, c: Boolean;
  begin ... x ... end {R};

begin ... end. {P}
{End Standard block}
CPSC Pascal- Program (2)
{Constant = Numeral | ConstantName.}
procedure Constant(Stop: Symbols);
begin
  if Symbol = Numeral1 then
    Expect(Numeral, Stop)
  else if Symbol = Name1 then
    begin
      Find(Argument);
      Expect(Name1, Stop)
    end
  else
    SyntaxError(Stop)
end;
CPSC Pascal- Program (3)
{ConstantDefinition = ConstantName '=' Constant ';'.}
procedure ConstantDefinition(stop: Symbols);
begin
  ExpectName(Name, Symbols[Equal1, Semicolon1] + ConstantSymbols + Stop);
  Expect(Equal1, ConstantSymbols + Symbols[Semicolon1] + Stop);
  Constant(Symbols[Semicolon1] + Stop);
  Define(Name);
  Expect(Semicolon1, Stop)
end;
CPSC Pascal- Program (4)
{Program = 'program' ProgramName ';' BlockBody '.'}
procedure Programx(Stop: Symbols);
begin
  Expect(Program1, Symbols[Name1, Semicolon1, Period1] + BlockSymbols + Stop);
  Expect(Name1, Symbols[Semicolon1, Period1] + BlockSymbols + Stop);
  Expect(Semicolon1, Symbols[Period1] + BlockSymbols + Stop);
  NewBlock;
  BlockBody(Symbols[Period1] + Stop);
  EndBlock;
  Expect(Period1, Stop)
end;
CPSC Pascal- Program (5-1)
{Constant = Numeral | ConstantName.}
procedure Constant(var Value: integer; var Typex: Pointer; Stop: Symbols);
begin
  if Symbol = Numeral1 then
    begin
      Value := Argument;
      Typex := TypeInteger;
      Expect(Numeral, Stop)
    end
  else if Symbol = Name1 then
    begin
      Find(Argument, Object);
      if = Constantx then
        begin
          Value :=
          Typex :=
        end
CPSC Pascal- Program (5-2)
      else
        begin
          KindError(object);
          Value := 0;
          Typex := TypeUniversal;
        end;
      Expect(Name1, Stop)
    end
  else
    begin
      SyntaxError(Stop);
      Value := 0;
      Typex := TypeUniversal;
    end;
CPSC Pascal- Program (6)
{ConstantDefinition = ConstantName '=' Constant ';'.}
procedure ConstantDefinition(stop: Symbols);
var Name, Value: integer;
    Constx, Typex: Pointer;
begin
  ExpectName(Name, Symbols[Equal1, Semicolon1] + ConstantSymbols + Stop);
  Expect(Equal1, ConstantSymbols + Symbols[Semicolon1] + Stop);
  Constant(Value, Typex, Symbols[Semicolon1] + Stop);
  Define(Name, Constantx, Constx);
  := Value;
  := Typex;
  Expect(Semicolon1, Stop)
end;
CPSC Static and Dynamic Scope
#include <stdio.h>

int x = 1;
char y = 'a';

void p(void)
{ double x = 2.5;
  printf("%c\n", y);
  { int y[10]; }
}

void q(void)
{ int y = 42;
  printf("%d\n", x);
  p();
}

int main(void)
{ char x = 'b';
  q();
  return 0;
}
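The difference between the two scoping disciplines can be simulated with a small Python sketch. It assumes the program has globals x = 1 and y = 'a', a main with a local char x = 'b' that calls q, a q with a local int y = 42 that prints x and calls p, and a p with a local double x = 2.5 that prints y. The `run` function and its frame dictionaries are hypothetical; they only model name lookup, not C semantics.

```python
def run(dynamic):
    """Simulate main -> q -> p and record which binding each printf sees."""
    globals_ = {"x": 1, "y": "a"}
    output = []

    def lookup(name, chain):
        for frame in chain:               # innermost frame first
            if name in frame:
                return frame[name]
        return globals_[name]             # fall back to the global scope

    def p(callers):
        local = {"x": 2.5}
        # Static scope: only p's own block and the globals are visible.
        # Dynamic scope: the callers' frames are searched as well.
        chain = [local] + (callers if dynamic else [])
        output.append(("p prints y", lookup("y", chain)))

    def q(callers):
        local = {"y": 42}
        chain = [local] + (callers if dynamic else [])
        output.append(("q prints x", lookup("x", chain)))
        p(chain)

    def main():
        local = {"x": "b"}
        q([local])

    main()
    return output

if __name__ == "__main__":
    print("static: ", run(dynamic=False))
    print("dynamic:", run(dynamic=True))
```

Under static scope q sees the global x and p sees the global y. Under dynamic scope q would see main's x and p would see q's y, so the same printf calls would be handed values of a different type than their format strings expect.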