2016-3-16 PITT CS 1622 1 Bottom Up Parsing

2016-3-16 PITT CS 1622 2 Bottom Up Parsing
- Also known as shift-reduce parsing
- More powerful than top down
  - Does not need left-factored grammars
  - Can handle left recursion
- Attempts to construct a parse tree from an input string, beginning at the leaves and working up to the top
- The process reduces strings to a non-terminal: shift-reduce
- Uses a parse stack
  - Contains the symbols already parsed
  - Symbols are "shifted" onto the stack until the top matches the RHS of a production
  - The matched symbols are "reduced" to the non-terminal on the LHS
  - Eventually everything is reduced to the start symbol

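2016-3-16 PITT CS 1622 3
Grammar:
  Z → b M b
  M → ( L | a
  L → M a ) | )
Considering the string w = b ( a a ) b
Rightmost derivation:
  Z ⇒ b M b ⇒ b ( L b ⇒ b ( M a ) b ⇒ b ( a a ) b
Bottom-up parsing tries to find handles and then reduce them, which walks through the same sentential forms in reverse:
  b ( a a ) b → b ( M a ) b → b ( L b → b M b → Z
[Figure: parse tree for b ( a a ) b with Z at the root and M, L, M as interior nodes]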

2016-3-16 PITT CS 1622 4
Grammar:
  E → E + E
  E → E * E
  E → ( E )
  E → id
Use # to indicate where we are in the string:
  id1 # + id2 * id3 ⇒ E # + id2 * id3 ⇒ E + # id2 * id3 ⇒ E + E # * id3 ⇒ E + E * id3 # ⇒ E + E * E # ⇒ E + E # ⇒ E

Sentential form      Handle   Production
id1 + id2 * id3      id1      E → id
E + id2 * id3        id2      E → id
E + E * id3          id3      E → id
E + E * E            E * E    E → E * E
E + E                E + E    E → E + E
E

2016-3-16 PITT CS 1622 5 Handle
- Intuition: reduce only if it leads to the start symbol
- A handle has to
  - match the RHS of a production, and
  - lead to a rightmost derivation if reduced to the LHS of some rule
- Definition:
  - Let αβw be a sentential form where
      α is an arbitrary string of symbols
      X → β is a production
      w is a string of terminals
  - Then β, in the position following α, is a handle of αβw if S ⇒* αXw ⇒ αβw by a rightmost derivation
- Handles formalize the intuition (reduce β to X), but do not really say how to find the handle

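2016-3-16 PITT CS 1622 6 Issues
- Locate the handle in the right sentential form
- Decide which production to reduce it by, i.e. which RHS it matches
- Notice that in a right sentential form, everything to the right of the handle consists of terminals only, so the parser never has to go into the middle of the string; it works left to right
- The parse is a rightmost derivation in reverse: derivation steps numbered 1 to 7 are undone in the order 7 6 5 4 3 2 1
[Figure: parse tree with nodes numbered 1 to 7 in rightmost-derivation order]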

2016-3-16 PITT CS 1622 7
Consider our usual grammar; the problem is when to reduce
  E → T + E | T
  T → int * T | int | ( E )
Consider the string: int * int + int

Sentential form      Production
int * int + int      T → int
int * T + int        T → int * T
T + int              T → int
T + T                E → T
T + E                E → T + E
E
[Figure: parse tree for int * int + int]

2016-3-16 PITT CS 1622 8 Viable Prefix
- Definition: α is a viable prefix if
  - there is a w such that αw is a right sentential form
  - α # w is a configuration of a shift-reduce parser
    b ( a # a ) b ⇒ b ( M # a ) b ⇒ b ( L # b ⇒ b M # b ⇒ Z #
- Alternatively, a prefix of a rightmost derived sentential form is viable if it does not extend past the right end of the handle

2016-3-16 PITT CS 2210 9 Properties of Viable Prefix
- A prefix is viable because it can be extended by adding terminals to form a valid (rightmost derived) sentential form
- As long as the parser has viable prefixes on the stack, no parsing error has been detected
- Types of bottom up parsers
  - Simple precedence
  - Operator precedence
  - LR family

2016-3-16 PITT CS 1622 10 Stack Implementation: Parser Driver
[Figure: parser driver connected to the stack, the parse table, and the input string terminated by $]
Operations
1. Shift: shift the input symbol onto the stack
2. Reduce: the RHS of a production (the "handle") is at the top of the stack; decide which non-terminal to reduce it to
3. Accept: success
4. Error

2016-3-16 PITT CS 1622 11
Grammar:
  Z → b M b
  M → ( L | a
  L → M a ) | )
String: b ( a a ) b $

Stack            Input              Action
$                b ( a a ) b $      shift
$ b              ( a a ) b $        shift
$ b (            a a ) b $          shift
$ b ( a          a ) b $            reduce M → a
$ b ( M          a ) b $            shift
$ b ( M a        ) b $              shift
$ b ( M a )      b $                reduce L → M a )
$ b ( L          b $                reduce M → ( L
$ b M            b $                shift
$ b M b          $                  reduce Z → b M b
$ Z              $                  accept

2016-3-16 PITT CS 1622 12 Ambiguous Grammars
- Conflicts arise with ambiguous grammars
  - Ambiguous grammars generate conflicts, but so do other types of grammars
- Example: consider the ambiguous grammar
    E → E * E | E + E | ( E ) | int

Choice 1: reduce first
Sentential form        Action
int * int + int        shift
…                      …
E * E # + int          reduce E → E * E
E # + int              shift
E + # int              shift
E + int #              reduce E → int
E + E #                reduce E → E + E
E #

Choice 2: shift first
Sentential form        Action
int * int + int        shift
…                      …
E * E # + int          shift
E * E + # int          shift
E * E + int #          reduce E → int
E * E + E #            reduce E → E + E
E * E #                reduce E → E * E
E #

2016-3-16 PITT CS 1622 13 Ambiguity
- In the first step shown (E * E # + int), we can either shift or reduce by E → E * E
  - The choice exists because the grammar does not fix the precedence of + and *
  - The same problem arises with the associativity of * and +
- Ambiguous grammars of this sort can always be rewritten to encode precedence and associativity in the grammar
  - This sometimes results in convoluted grammars
  - Tools have other means to encode precedence and associativity
- But we must get rid of the conflicts!
- We know what a handle is, but it is not yet clear how to detect it

2016-3-16 PITT CS 1622 14 Properties about Bottom Up Parsing
- Handles always appear at the top of the stack
  - Never in the middle of the stack
  - This justifies the use of a stack in shift-reduce parsing
- General shift-reduce strategy
  - If there is no handle on the stack, shift
  - If there is a handle, reduce to the non-terminal
- Conflicts
  - If it is legal to either shift or reduce, there is a shift-reduce conflict
  - If it is legal to reduce by two or more productions, there is a reduce-reduce conflict

2016-3-16 PITT CS 1622 15 LR Parsers
- LR family of parsers
  - LR(k): L = left to right scan, R = rightmost derivation in reverse, k = number of look ahead symbols
- Attractive because
  1. LR(k) is powerful: it handles virtually all language constructs
  2. It is efficient
  3. LL(k) ⊂ LR(k)
  4. LR parsers can detect an error as soon as it is possible to do so
  5. Automatic techniques exist to generate them: YACC, Bison

2016-3-16 PITT CS 1622 16 LR and LL Parsers
- In an LR parser, each reduction needed for the parse is detected on the basis of
  - the left context
  - the reducible phrase
  - k terminals of look ahead
- In an LL parser, each production is chosen on the basis of
  - the left context
  - the first k symbols of what the right hand side derives (the phrase combined with what is to the right of the phrase)

2016-3-16 PITT CS 1622 17 Types of LR Parsers  SLR – simple LR  Easiest to implement  Not as powerful  Canonical LR  Most powerful  Expensive to implement  LALR  Look ahead LR  In between the 2 previous ones in power and overhead Overall parsing algorithm is the same – table is different

2016-3-16 PITT CS 1622 18 Parsing Overview
- Each grammar symbol Xi on the stack is paired with a state Si (the state says it all)
- Each state summarizes the information contained in the stack below it: what has been seen so far
- Configuration: (S0 X1 S1 X2 S2 … Xm Sm # ai ai+1 … an $)
  - It represents the right sentential form X1 X2 … Xm ai … an
- The state at the top of the stack and the current input symbol index into the parsing table to determine whether to shift or reduce
[Figure: LR parsing program reading the input string a1 a2 … $, with a stack holding S0 X1 S1 … Xm Sm, driven by the Action and Goto tables]

2016-3-16 PITT CS 1622 19 Action
[Figure: stack and input before and after each action.
 Shift: with Sm on top of the stack and Xm+1 as the next input symbol, Xm+1 is pushed together with the new state Sm+1.
 Reduce by Xz → Xm Xm+1: (1) the entries for Xm and Xm+1 are popped, leaving Sm-1 on top; (2) Xz is pushed together with the state given by GOTO.]

2016-3-16 PITT CS 1622 20 Parse Table: Action and Goto
- Action[Sm, ai] is one of
  - shift S, where S is a state
  - reduce by a grammar production
  - accept
  - error
- Goto[Sm, grammar symbol]: change to another state

2016-3-16 PITT CS 1622 21 Actions
Assume the configuration (S0 X1 S1 X2 S2 … Xm Sm # ai ai+1 … an $), which represents the right sentential form X1 X2 … Xm ai … an
1. Action[Sm, ai] = shift S: shift the input symbol and go to state S
     (S0 X1 S1 X2 S2 … Xm Sm ai S # ai+1 … $)
2. Action[Sm, ai] = reduce A → β, with |β| = r: pop 2r entries (r grammar symbols and r states)
     (S0 X1 S1 … Xm-r Sm-r A S # ai ai+1 … $) where S = goto[Sm-r, A]
     Output is generated after the reduce, e.g. a piece of the parse tree
3. Action[Sm, ai] = accept: parsing is complete
4. Action[Sm, ai] = error: report and stop

2016-3-16 PITT CS 1622 22
Grammar:
  1. S → E
  2. E → E + T
  3. E → T
  4. T → id
  5. T → ( E )

Non-terminal   Follow
S              $
E              + ) $
T              + ) $

Sn = shift and go to state n, rn = reduce by production n

        ACTION                              GOTO
State   +     id    (     )     $           E    T    S
S0            S3    S4                      1    2
S1      S7                      accept
S2      r3                r3    r3
S3      r4                r4    r4
S4            S3    S4                      5    2
S5      S7                S6
S6      r5                r5    r5
S7            S3    S4                           8
S8      r2                r2    r2
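The table above can be exercised with a small driver. The following is a minimal sketch, not the course's implementation: the names `PRODUCTIONS`, `ACTION`, `GOTO` and `parse` are made up for illustration, and the stack holds states only, so a reduce pops r entries rather than the 2r entries of the symbol/state stack on the slides.

```python
# Productions numbered as on the slide: number -> (LHS, length of RHS).
PRODUCTIONS = {
    1: ("S", 1),  # S -> E
    2: ("E", 3),  # E -> E + T
    3: ("E", 1),  # E -> T
    4: ("T", 1),  # T -> id
    5: ("T", 3),  # T -> ( E )
}


def _reduce_on_follow(n):
    # Follow(E) = Follow(T) = {+, ), $}, so every reduce row has this shape.
    return {t: ("r", n) for t in ("+", ")", "$")}


# ACTION[state][terminal]: ("s", j) shifts and goes to state j,
# ("r", n) reduces by production n, ("acc",) accepts. Missing entry = error.
ACTION = {
    0: {"id": ("s", 3), "(": ("s", 4)},
    1: {"+": ("s", 7), "$": ("acc",)},
    2: _reduce_on_follow(3),
    3: _reduce_on_follow(4),
    4: {"id": ("s", 3), "(": ("s", 4)},
    5: {"+": ("s", 7), ")": ("s", 6)},
    6: _reduce_on_follow(5),
    7: {"id": ("s", 3), "(": ("s", 4)},
    8: _reduce_on_follow(2),
}

GOTO = {0: {"E": 1, "T": 2}, 4: {"E": 5, "T": 2}, 7: {"T": 8}}


def parse(tokens):
    """Return the production numbers used, in the order they are reduced.

    Raises SyntaxError when the table entry is empty.
    """
    stack = [0]                       # states only; the symbols are implied
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        entry = ACTION[state].get(tok)
        if entry is None:
            raise SyntaxError(f"unexpected {tok!r} in state S{state}")
        if entry[0] == "s":           # shift: push the new state, consume input
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "r":         # reduce: pop |RHS| states, then GOTO
            lhs, rhs_len = PRODUCTIONS[entry[1]]
            del stack[len(stack) - rhs_len:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(entry[1])
        else:                         # accept
            return reductions


print(parse(["id", "+", "id"]))
```

For `id + id` the reductions come out as productions 4, 3, 4, 2, which is the rightmost derivation in reverse.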

2016-3-16 PITT CS 1622 23 Power Added to DFA
- Return to the state where the non-terminal was predicted and continue
  - do this by counting states
- Example (input id + id + id $):

Stack                     Input            Action
S0 id S3                  + id + id $      r4, goto[S0, T]
S0 T S2                   + id + id $      r3, goto[S0, E]
…                         …                …
S0 E S1 + S7 T S8         + id $           r2, goto[S0, E]
…

2016-3-16 PITT CS 1622 24 Two Operations
- Bottom up parsing only uses 2 operations: shift and reduce
- A configuration is written stack # input, or stack . input
  - Shift:  ExT . abc ⇒ ExTa . bc
  - Reduce: ExTa . bc ⇒ ExF . bc

2016-3-16 PITT CS 1622 25 LR Parsers
- The parser can tell a handle by looking at the stack top [grammar symbol, state] and k input symbols, using a finite state automaton
- In practice, k <= 1
- How to construct an LR parse table from a grammar
  - First construct an SLR parser
  - Canonical LR and LALR augment the basic SLR technique
- 2 phases to construct the table
  - Build a deterministic finite state automaton to go from state to state
  - Build the table from the DFA
- Each state answers: how do we know from the grammar where we are in the parse? It records how much of each production has already been seen

2016-3-16 PITT CS 1622 26 Notation of an LR(0) item
- An item is a production with a distinguished position on the right hand side; the position indicates how much of the production has already been seen
- Example: S → a B S is a production
  Items for the production:
    S → . a B S
    S → a . B S
    S → a B . S
    S → a B S .
- These are called LR(0) items
- Basic idea: construct a DFA that recognizes the viable prefixes
  - group items into sets; each set is a state of the SLR parser

2016-3-16 PITT CS 1622 27 Construction of LR(0) items
1. Create the augmented grammar G'
     G:  S → α | β
     G': S' → S,  S → α | β
   What else is needed:
     A → c . d E   indicate the new state reached by consuming symbol d: need the go to function
     A → c d . E   what are all the possible things to see, i.e. all possible derivations from E? Add the strings derivable from E: need the closure function
     A → c d E .   reduce to A and go to another state
2. Compute the functions closure and go to; they will be used to determine the action and go to parts of the parsing table
     closure: essentially defines what is expected
     go to: moves from one state to another by consuming a symbol

2016-3-16 PITT CS 1622 28 Closure
- Closure(I), where I is a set of items, forms a state
- Let N be a non-terminal
  - If the distinguished point is in front of N, then add each production for N and put the distinguished point at the beginning of its RHS
  - A → α . B β is in I: we expect to see a string derived from B
  - B → . γ is added to the closure, where B → γ is a production
  - Apply the rule until nothing more is added
- Example grammar:
    S → E
    E → E + T
    E → T
    T → id | ( E )
  Assume I = { S → . E }
  Closure(I) = { S → . E, E → . E + T, E → . T, T → . id, T → . ( E ) }
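The closure rule can be sketched in a few lines. This is an illustrative sketch under assumed representations: an item is a tuple `(lhs, rhs, dot)` and the names `GRAMMAR`, `closure` and `show` are hypothetical, not from the slides.

```python
# Grammar from the slide; each non-terminal maps to its right hand sides.
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",), ("(", "E", ")")],
}


def closure(items):
    """items: set of (lhs, rhs, dot), where dot indexes into the tuple rhs."""
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        # If the point is in front of a non-terminal, add its productions
        # with the point at the beginning of the RHS.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"


for item in sorted(closure({("S", ("E",), 0)}), key=show):
    print(show(item))
```

The worklist makes "apply the rule until nothing is added" explicit: an item is processed once, when it first enters the set.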

2016-3-16 PITT CS 1622 29 Items
- Two kinds of items
  - Kernel items
    - Include S' → . S and all items whose point is not at the left end
  - Non-kernel items
    - Items with the point at the left end
    - They can always be regenerated, so they are not kept around, which saves storage
- A state is the closure of its kernel items

2016-3-16 PITT CS 1622 30 Goto Operation
- Goto(I, X), where X is a grammar symbol and I is a set of items
  - A shift action moves from one state to another by absorbing a single symbol
  - The successor state contains each item with the distinguished point advanced by 1 grammar symbol
- If A → α . X β is in I, then the closure of A → α X . β is added to goto(I, X)
- Sets correspond to viable prefixes
  - If γ is a viable prefix for I, then γX is a viable prefix for goto(I, X)
- Example: Goto(I, ( ) = closure({ T → ( . E ) })

2016-3-16 PITT CS 1622 31 Construction of set-of-items
procedure items(C)      (C is a collection of sets of items; each set is a state)
begin
  C := { closure({ S' → . S }) }
  repeat
    for each set of items I in C and each grammar symbol X
        such that goto(I, X) is not empty and not in C do
      add goto(I, X) to C
  until no more sets can be added to C
end
New states are generated by goto

2016-3-16 PITT CS 1622 32
Example: S → E;  E → E + T | T;  T → id | ( E )
S0 = closure({[S → . E]}) = { S → . E, E → . E + T, E → . T, T → . id, T → . ( E ) }
goto(S0, E) = closure({[S → E .], [E → E . + T]})     S1 = { S → E ., E → E . + T }
goto(S0, T) = closure({[E → T .]})                    S2 = { E → T . }
goto(S0, id) = closure({[T → id .]})                  S3 = { T → id . }
…
S8 = …
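The set-of-items construction can be run mechanically on this example. The sketch below assumes the same hypothetical item representation `(lhs, rhs, dot)`; `SYMBOLS`, `goto` and `build_collection` are illustrative names. States are numbered in discovery order, which need not match the S0 … S8 numbering on the slides.

```python
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",), ("(", "E", ")")],
}
SYMBOLS = ["E", "T", "id", "(", ")", "+"]   # every grammar symbol


def closure(items):
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    # Advance the point over `symbol` wherever possible, then close the result.
    moved = {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def build_collection():
    states = [closure({("S", ("E",), 0)})]
    transitions = {}                  # (state index, symbol) -> state index
    i = 0
    while i < len(states):            # states grows while we scan it
        for symbol in SYMBOLS:
            target = goto(states[i], symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, symbol)] = states.index(target)
        i += 1
    return states, transitions


states, transitions = build_collection()
print(len(states), "states,", len(transitions), "transitions")
```

Using a list plus `index` keeps the sketch short; a real generator would more likely map each frozen item set to its number in a dictionary.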

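2016-3-16 PITT CS 1622 33 DFA for the previous grammar (* marks kernel items; the other items are added by closure)
S0: * S → . E;  E → . E + T;  E → . T;  T → . id;  T → . ( E )
S1: * S → E .;  * E → E . + T
S2: * E → T .
S3: * T → id .
S4: * T → ( . E );  E → . E + T;  E → . T;  T → . id;  T → . ( E )
S5: * T → ( E . );  * E → E . + T
S6: * T → ( E ) .
S7: * E → E + . T;  T → . id;  T → . ( E )
S8: * E → E + T .
Transitions:
  from S0: E to S1, T to S2, id to S3, ( to S4
  from S1: + to S7
  from S4: E to S5, T to S2, id to S3, ( to S4
  from S5: ) to S6, + to S7
  from S7: T to S8, id to S3, ( to S4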

2016-3-16 PITT CS 1622 34 Building Parse Table from DFA
ACTION[state, input symbol (terminal)]
GOTO[state, non-terminal symbol]
- ACTION:
  1. If [A → α . a β] is in Si, a is a terminal, and goto(Si, a) = Sj, then ACTION[Si, a] = shift j
  2. If [A → α .] is in Si, then ACTION[Si, a] = reduce A → α for all a in Follow(A)
     If there are no conflicts between 1 and 2, the grammar is SLR(1)
  3. If [S' → S .] is in Si, then ACTION[Si, $] = accept
- GOTO:
  1. If goto(Si, A) = Sj, then GOTO[Si, A] = Sj
  2. All entries not filled are errors
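Rule 2 depends on Follow sets. As a hedged sketch, the following computes them for the running grammar by fixed-point iteration; it assumes a grammar with no ε-productions (true for this example), and `GRAMMAR`, `first_sets` and `follow_sets` are illustrative names.

```python
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    # Without epsilon productions, First(A) only depends on the first
    # symbol of each right hand side of A.
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="S"):
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):          # something follows sym in the RHS
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                          # sym ends the RHS: inherit Follow(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


follow = follow_sets()
for name in ("S", "E", "T"):
    print(name, sorted(follow[name]))
```

With these sets, a state containing the complete item E → T . gets a reduce entry in the columns +, ) and $, and nowhere else.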

2016-3-16 PITT CS 1622 36 Power Added to DFA
- Return to the state where the non-terminal was predicted and continue
  - do this by counting states

Stack                        Input         Action
S0                           id + id $     S3
S0 id S3                     + id $        r4, goto[S0, T]
S0 T S2                      + id $        r3, goto[S0, E]
S0 E S1                      + id $        S7
S0 E S1 + S7                 id $          S3
S0 E S1 + S7 id S3           $             r4, goto[S7, T]
S0 E S1 + S7 T S8            $             r2, goto[S0, E]
S0 E S1                      $             accept
[Figure: parse tree for id + id with the reduction steps numbered 1 to 5]

2016-3-16 PITT CS 1622 37
Consider the grammar G:
  S → A b c | B b d
  A → a
  B → a
b ∈ Follow(A) and also b ∈ Follow(B)
States:
  S0: S → . A b c;  S → . B b d;  A → . a;  B → . a
  S1 = goto(S0, a): A → a .;  B → a .     (conflict)
- What is reduced when "a b" is seen? Reduce to A or to B? This is a conflict
- G is not SLR(1) but it is SLR(2)
- We need 2 symbols of look ahead to look past b:
    b c: reduce to A
    b d: reduce to B
- It is possible to extend SLR(1) to k symbols of look ahead, which allows a larger class of CFGs to be parsed

2016-3-16 PITT CS 1622 38 SLR(k)
- Extend the SLR(1) definition to SLR(k) as follows; let α, β ∈ V*
- First_k(α) = { x ∈ V_T* | α ⇒* xβ and |x| <= k }
  - gives all terminal strings of size <= k derivable from α
  - and all k-symbol terminal prefixes of strings derivable from α
- Follow_k(B) = { w ∈ V_T* | S ⇒* αBβ and w ∈ First_k(β) }
  - gives all k-symbol terminal strings that can follow B in some derivation
  - and all shorter terminal strings that can follow B in a sentential form

2016-3-16 PITT CS 1622 39 Parse Table
Let S be a state and b ∈ V_T* such that |b| <= k
1. If A → α . ∈ S and b ∈ Follow_k(A), then Action(S, b) = reduce by production A → α
2. If D → α . a β ∈ S, a ∈ V_T, and b ∈ First_k(a β Follow_k(D)), then Action(S, b) = shift j, where goto(S, a) = Sj
For k = 1 this definition reduces to SLR(1), since First_1(a β Follow_1(D)) = {a}

2016-3-16 PITT CS 1622 40 SLR(k)
- Consider
    S → A b^(k-1) c | B b^(k-1) d
    A → a
    B → a
- This grammar is SLR(k) but not SLR(k-1)
  - with fewer than k symbols the parser cannot decide what to reduce
  - whether to reduce a to A or to B depends on the next k symbols: b^(k-1) c or b^(k-1) d

2016-3-16 PITT CS 1622 41 Non SLR(k)
Consider another grammar G:
  S → j A j | A m | a j
  A → a
Follow(A) = {j, m}
States:
  S0: S → . j A j;  S → . A m;  S → . a j;  A → . a
  S1 = goto(S0, a): S → a . j;  A → a .     (conflict; note that only m can follow A here)
  S2 = goto(S0, j): S → j . A j;  A → . a   (note that only j can follow A here)
- In state S1:
    [A → a .]    reduce using this production
    [S → a . j]  shift j
  so G is not SLR(1)
- Is it SLR(k)?
    Follow_k(A) contains First_k(j) = {j} and First_k(m) = {m}
    for S → a . j, First_k(j Follow_k(S)) = {j}
  so G is not SLR(k) for any k!

2016-3-16 PITT CS 1622 42 Why?
- The look ahead is too crude
  - In S1, if A → a is reduced, then m is the only symbol that can possibly be seen next: the only valid look ahead
  - The fact that j can follow A in another context is irrelevant
- We want to limit the look ahead in a state to those symbols that might actually occur in the context represented by the state
- This is done in Canonical LR!
  - Determine the look ahead appropriate to a configuration, working backwards from the DFA
  - States contain a look ahead, which is used only for reductions

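2016-3-16 PITT CS 1622 43
[Figure: how each method treats the complete item A → B . when it appears in several states, with Follow(A) = {a, b, c}.
 SLR(1): every state containing A → B . reduces on all of Follow(A).
 LALR(1): each state carries its own look ahead set, a subset of Follow(A), e.g. A → B ., a/b in one state and A → B ., b/c in another.
 LR(1): state splitting keeps the look aheads apart, e.g. A → B ., a and A → B ., b and A → B ., c in separate states.]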

2016-3-16 PITT CS 1622 44 Constructing Canonical LR
- Problem: the Follow set in SLR is not precise enough
  - Need to carry more information in the state for the reduction A → α
  - The extra information is used only for reductions
- The extra information is incorporated into the state by redefining items to include a terminal symbol as a second component
- Form of an LR(1) item
  - [A → α . β, a] where A → αβ is a production and a is a terminal or $
- Effect of the second component
  - it matters only for complete items: [A → α ., a] calls for a reduction only if the next input symbol is a
  - the second components for A will always form a subset of Follow(A)
  - [A → α ., a]: S ⇒* δAw ⇒ δαw, where a is the first symbol of w, or $

2016-3-16 PITT CS 1622 45 Constructing Canonical LR
- Essentially the same as the LR(0) set-of-items construction, only with look ahead added
  - modify the closure and goto functions
- Changes for closure
  - if [A → α . B β, a] is in the set and B → γ is a production, then add [B → . γ, c] for each c ∈ First(β a)
- Changes for the goto function
  - carry the look ahead along: if [A → α . X β, a] ∈ I, then goto(I, X) contains [A → α X . β, a]
Why?

2016-3-16 PITT CS 1622 46 Example
Grammar:
  S' → S
  S → C C
  C → e C | d
S0:
  [S' → . S, $]
  [S → . C C, $]        First(ε $) = {$}
  [C → . e C, e/d]      First(C $) = {e, d}
  [C → . d, e/d]
S1: goto(S0, S) = closure({[S' → S ., $]})
  [S' → S ., $]
S2: goto(S0, C) = closure({[S → C . C, $]})
  [S → C . C, $]
  [C → . e C, $]        First(ε $) = {$}
  [C → . d, $]

2016-3-16 PITT CS 1622 47
S3: goto(S0, e) = closure({[C → e . C, e/d]})
  [C → e . C, e/d]      First(ε e/d) = {e, d}
  [C → . e C, e/d]
  [C → . d, e/d]
S4: goto(S0, d) = closure({[C → d ., e/d]})
  [C → d ., e/d]
S5: goto(S2, C) = closure({[S → C C ., $]})
  [S → C C ., $]
S6: goto(S2, e) = closure({[C → e . C, $]})
  [C → e . C, $]        First(ε $) = {$}
  [C → . e C, $]
  [C → . d, $]
S7: goto(S2, d) = closure({[C → d ., $]})
  [C → d ., $]
S8: goto(S3, C) = closure({[C → e C ., e/d]})
  [C → e C ., e/d]
S9: goto(S6, C) = closure({[C → e C ., $]})
  [C → e C ., $]
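The LR(1) versions of closure and goto can be checked against the states listed above. This is a sketch under stated assumptions: items are hypothetical tuples `(lhs, rhs, dot, lookahead)`, `FIRST` is filled in by hand, and because the grammar has no ε-productions, First(βa) is determined by the first symbol of β (or by a when β is empty).

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("C", "C")],
    "C": [("e", "C"), ("d",)],
}
# First sets of the non-terminals, written out by hand for this grammar.
FIRST = {"S": {"e", "d"}, "C": {"e", "d"}}


def first_of(beta, lookahead):
    """First(beta lookahead), assuming no epsilon productions."""
    if not beta:
        return {lookahead}
    head = beta[0]
    return FIRST.get(head, {head})


def closure(items):
    result = set(items)
    work = list(items)
    while work:
        _, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # [A -> alpha . B beta, a]: add [B -> . gamma, c], c in First(beta a)
            for c in first_of(rhs[dot + 1:], la):
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0, c)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(items, symbol):
    # The look ahead is carried along unchanged when the point advances.
    return closure({(l, r, d + 1, a) for (l, r, d, a) in items
                    if d < len(r) and r[d] == symbol})


def build_collection():
    states = [closure({("S'", ("S",), 0, "$")})]
    i = 0
    while i < len(states):
        for symbol in ("S", "C", "e", "d"):
            target = goto(states[i], symbol)
            if target and target not in states:
                states.append(target)
        i += 1
    return states


def core(state):
    return {(l, r, d) for (l, r, d, _) in state}


states = build_collection()
print(len(states), "LR(1) states")
print("S3 and S6 share a core:", core(states[3]) == core(states[6]))
```

With the symbol order used in `build_collection`, the discovery order happens to coincide with the S0 … S9 numbering above; a different order would give the same sets under different numbers.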

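2016-3-16 PITT CS 1622 48
[Figure: LR(1) DFA for the example grammar, states S0 to S9.
 from S0: S to S1, C to S2, e to S3, d to S4
 from S2: C to S5, e to S6, d to S7
 from S3: C to S8, e to S3, d to S4
 from S6: C to S9, e to S6, d to S7]
- Note that S3 and S6 are the same except for the look ahead (the same holds for S4 and S7)
- In SLR(1), one state represents both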

2016-3-16 PITT CS 1622 49 Constructing Canonical LR Parse Table
- Same as before for shift
- Do not use the Follow set for reduce
  - reduce only on the look ahead carried by the item
- Action and GOTO
  1. If [A → α . a β, b] ∈ Si and goto(Si, a) = Sj, then Action[i, a] = shift and go to state j
  2. If [A → α ., a] ∈ Si, then Action[i, a] = reduce A → α
     (previously this held for all symbols in Follow(A))
  3. If [S' → S ., $] ∈ Si, then Action[i, $] = accept

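2016-3-16 PITT CS 1622 50 Revisit SLR and LR
Grammar:
  S → a E a | b E b | a F b | b F a
  E → e
  F → e
- SLR: reduce/reduce conflict, because Follow(E) = Follow(F) = {a, b}
    in a e a the e must be reduced by E → e, and in b e a by F → e
- Can LR(1) work? There will be no conflict, because states are split to take the left context into account
    reduce to E if e is followed by a and preceded by a, or followed by b and preceded by b
    reduce to F if e is followed by a and preceded by b, or followed by b and preceded by a
[Figure: LR(0) automaton fragment. S0 = { S → . a E a, S → . b E b, S → . a F b, S → . b F a }; on a: { S → a . E a, S → a . F b, E → . e, F → . e }; then on e: { E → e ., F → e . }; an edge on b leaves S0 as well]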

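2016-3-16 PITT CS 1622 51
States:
  S0: S → . a E a;  S → . b E b;  S → . a F b;  S → . b F a
  S1 = goto(S0, a): S → a . E a;  S → a . F b;  E → . e;  F → . e
  S2 = goto(S0, b): S → b . E b;  S → b . F a;  E → . e;  F → . e
- SLR: Follow(E) = Follow(F) = {a, b}, so a single state { E → e ., F → e . } is reached on e from both S1 and S2
- LR: the look ahead sets are more precise than the Follow sets
  S3 = goto(S1, e): [E → e ., a]  [F → e ., b]
  S4 = goto(S2, e): [E → e ., b]  [F → e ., a]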

2016-3-16 PITT CS 1622 52 SLR(1) and LR(1)
- Every SLR(1) grammar is LR(1), but an LR(1) parser has more states than an SLR(1) parser: orders of magnitude more
- LALR(1): look ahead LR parsing method
  - used in practice because most syntactic structures can be represented by LALR (not true for SLR)
  - same number of states as SLR
  - can be constructed by merging LR(1) states with the same core

2016-3-16 PITT CS 1622 53 Example
Grammar: S' → S;  S → C C;  C → e C | d
States with look ahead e/d:
  S3: goto(S0, e) = closure({[C → e . C, e/d]}):  [C → e . C, e/d]  [C → . e C, e/d]  [C → . d, e/d]
  S4: goto(S0, d) = closure({[C → d ., e/d]}):    [C → d ., e/d]
  S8: goto(S3, C) = closure({[C → e C ., e/d]}):  [C → e C ., e/d]
States with look ahead $:
  S6: goto(S2, e) = closure({[C → e . C, $]}):    [C → e . C, $]  [C → . e C, $]  [C → . d, $]
  S7: goto(S2, d) = closure({[C → d ., $]}):      [C → d ., $]
  S9: goto(S6, C) = closure({[C → e C ., $]}):    [C → e C ., $]

2016-3-16 PITT CS 1622 55 Merging states
- Can merge S3 and S6
    S3: goto(S0, e) = closure({[C → e . C, e/d]}):  [C → e . C, e/d]  [C → . e C, e/d]  [C → . d, e/d]
    S6: goto(S2, e) = closure({[C → e . C, $]}):    [C → e . C, $]  [C → . e C, $]  [C → . d, $]
    S36: [C → e . C, e/d/$]  [C → . e C, e/d/$]  [C → . d, e/d/$]
- Similarly
    S47: [C → d ., e/d/$]
    S89: [C → e C ., e/d/$]
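Merging by core amounts to grouping the LR(1) states by their items with the look ahead stripped, then taking the union of each group. A minimal sketch follows; the six states are typed in by hand from the slides, and `items`, `core` and `merge_by_core` are illustrative names.

```python
from collections import defaultdict


def items(lhs, rhs, dot, lookaheads):
    """Expand the slide shorthand [A -> alpha . beta, a/b] into single items."""
    return {(lhs, rhs, dot, a) for a in lookaheads}


E_C = ("e", "C")
D = ("d",)

STATES = {
    "S3": items("C", E_C, 1, "ed") | items("C", E_C, 0, "ed") | items("C", D, 0, "ed"),
    "S6": items("C", E_C, 1, "$") | items("C", E_C, 0, "$") | items("C", D, 0, "$"),
    "S4": items("C", D, 1, "ed"),
    "S7": items("C", D, 1, "$"),
    "S8": items("C", E_C, 2, "ed"),
    "S9": items("C", E_C, 2, "$"),
}


def core(state):
    # The core of a state is its set of items without the look ahead.
    return frozenset((l, r, d) for (l, r, d, _) in state)


def merge_by_core(states):
    groups = defaultdict(list)
    for name, state in states.items():
        groups[core(state)].append(name)
    merged = {}
    for names in groups.values():
        merged["+".join(names)] = set().union(*(states[n] for n in names))
    return merged


merged = merge_by_core(STATES)
for name, state in merged.items():
    lookaheads = sorted({a for (_, _, _, a) in state})
    print(name, "look aheads:", lookaheads)
```

In a full generator the transitions would be redirected to the merged states as well; that step is well defined because goto depends only on the core.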

2016-3-16 PITT CS 1622 56 Effects of Merging
1. Detection of errors can be delayed
- An LALR parser will not shift another symbol after the LR parser declares an error, but it may proceed to do some more reductions
- Example: S' → S;  S → C C;  C → e C | d, and the string e e d $
  Canonical LR:
    parse stack: 0 e 3 e 3 d 4; in state 4 with input $: error, since S4 = { [C → d ., e/d] }
  LALR:
    stack: 0 e 36 e 36 d 47    state 47, input $: reduce C → d
    stack: 0 e 36 e 36 C 89    reduce C → e C
    stack: 0 e 36 C 89         reduce C → e C
    stack: 0 C 2               state 2, input $: error
Why?

2016-3-16 PITT CS 1622 57 Effects of Merging
2. Merging of states can introduce conflicts
- It cannot introduce shift-reduce conflicts
- It can introduce reduce-reduce conflicts
- Shift-reduce conflicts: suppose the merged state Sij, formed by merging Si and Sj, contains
    [A → α ., a]       reduce on input a
    [B → β . a γ, b]   shift on input a
  However, Si and Sj must have the same core, so the state that contributed [A → α ., a] also contains an item [B → β . a γ, c] for some c, and it shifts on a as well
  So the shift-reduce conflict was already present in Si or Sj and was not introduced by merging
Why?

2016-3-16 PITT CS 1622 58 Reduce-reduce Conflicts
Grammar:
  S → a E a | b E b | a F b | b F a
  E → e
  F → e
S3 (viable prefix a e):  [E → e ., a]  [F → e ., b]
S4 (viable prefix b e):  [E → e ., b]  [F → e ., a]
Merging gives S34:       [E → e ., a/b]  [F → e ., a/b]
Both reductions are called for on inputs a and b, i.e. a reduce-reduce conflict

2016-3-16 PITT CS 1622 60 Construction of LALR Parser
- One solution:
  - construct the LR(1) items
  - merge states with the same core
  - if there are no conflicts, you have an LALR parser
- This is inefficient because building the LR(1) items is expensive in time and space
- Efficient construction of LALR parsers
  - avoid constructing the LR(1) items
  - construct states containing only LR(0) kernel items
  - compute the look ahead for the kernel items
  - predict actions using the kernel items

2016-3-16 PITT CS 1622 61 LALR vs. LR Parsing
- LALR languages are not natural
  - They are an efficiency hack on LR languages
- Any reasonable programming language has an LALR(1) grammar
- LALR(1) has become a standard for programming languages and for parser generators

2016-3-16 PITT CS 1622 62 A Hierarchy of Grammar Classes

2016-3-16 PITT CS 1622 63 Using Automatic Tools -- YACC

2016-3-16 PITT CS 1622 64 Using Parser Generator
- Most common parser generators are LALR(1)
- A parser generator constructs an LALR(1) table and reports an error when a table entry is multiply defined
  - a shift and a reduce: reports a shift/reduce conflict
  - multiple reduces: reports a reduce/reduce conflict
- An ambiguous grammar will generate conflicts
- The conflicts must be resolved

2016-3-16 PITT CS 1622 65 Shift/Reduce Conflicts
- Typically due to ambiguities in the grammar
- Classic example: the dangling else
    S → if E then S | if E then S else S | OTHER
  will have a DFA state containing
    [S → if E then S ., else]
    [S → if E then S . else S, x]
  if else follows, then we can shift or reduce
- The default (YACC, bison, etc.) is to shift; the default behavior is what is needed in this case

2016-3-16 PITT CS 1622 66 More Shift/Reduce Conflicts
- Consider the ambiguous grammar
    E → E + E | E * E | int
  we will have the states containing
    [E → E * . E, +]                  [E → E * E ., +]
    [E → . E + E, +]    on E go to    [E → E . + E, +]
    ...                               ...
- Again we have a shift/reduce conflict on input +
  - we need to reduce (* has higher precedence than +)
  - recall the solution: declare the precedence of * and +

2016-3-16 PITT CS 1622 67 More Shift/Reduce Conflicts
- In YACC, declare precedence and associativity (later declarations have higher precedence, so * binds tighter than +):
    %left '+'
    %left '*'
- Precedence of a rule = that of its last terminal
  - see the yacc manual for ways to override this default
- Resolve a shift/reduce conflict with a shift if:
  - no precedence is declared for either the rule or the terminal
  - the input terminal has higher precedence than the rule
  - the precedences are the same and the terminal is right associative
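The three rules above fit in a small decision function. This is a sketch of the rules as stated on the slide, not of yacc's actual source; `PRECEDENCE`, `rule_precedence` and `resolve` are hypothetical names, and cases the slide does not mention (such as non-associative operators) are left out.

```python
# Declared by "%left '+'" then "%left '*'": later lines get a higher level.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}
NONTERMINALS = {"E"}


def rule_precedence(rhs):
    """Precedence of a rule = that of its last terminal (None if undeclared)."""
    terminals = [sym for sym in rhs if sym not in NONTERMINALS]
    if not terminals:
        return None
    return PRECEDENCE.get(terminals[-1])


def resolve(rule_rhs, lookahead):
    """Return "shift" or "reduce" for a shift/reduce conflict.

    rule_rhs is the right hand side of the completed rule; lookahead is the
    terminal that could be shifted instead.
    """
    rule = rule_precedence(rule_rhs)
    token = PRECEDENCE.get(lookahead)
    if rule is None or token is None:
        return "shift"                 # no precedence declared: default to shift
    if token[0] > rule[0]:
        return "shift"                 # input terminal binds tighter
    if token[0] == rule[0] and token[1] == "right":
        return "shift"                 # same level, right associative
    return "reduce"


print(resolve(("E", "*", "E"), "+"))   # E * E . with + next
print(resolve(("E", "+", "E"), "+"))   # E + E . with + next
print(resolve(("E", "+", "E"), "*"))   # E + E . with * next
```

The first two calls choose reduce and the third chooses shift, which yields the usual grouping of `int + int * int` and left-to-right grouping of `int + int + int`.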

2016-3-16 PITT CS 1622 68 Use Precedence to Solve S/R Conflict
    [E → E * . E, +]                  [E → E * E ., +]
    [E → . E + E, +]    on E go to    [E → E . + E, +]
    ...                               ...
- We will choose reduce, because the precedence of rule E → E * E is higher than that of terminal +
    [E → E + . E, +]                  [E → E + E ., +]
    [E → . E + E, +]    on E go to    [E → E . + E, +]
    ...                               ...
- We will choose reduce, because E → E + E and + have the same precedence and + is left-associative

2016-3-16 PITT CS 1622 69
- Back to our dangling else example
    [S → if E then S ., else]
    [S → if E then S . else S, x]
- We can eliminate the conflict by declaring else with higher precedence than then
- But this starts to look like "hacking the tables"
- It is best to avoid overuse of precedence declarations, or you will end up with unexpected parse trees

2016-3-16 PITT CS 1622 70 Reduce/Reduce Conflicts
- Usually due to ambiguity in the grammar
- Example: a sequence of identifiers
    S → ε | id | id S
- There are two parse trees for the string id
    S ⇒ id
    S ⇒ id S ⇒ id
- How does this confuse the parser?

2016-3-16 PITT CS 1622 71 Reduce/Reduce Conflicts
- Consider the states
    [S' → . S, $]                      [S → id ., $]
    [S → . , $]                        [S → id . S, $]
    [S → . id, $]      on id go to     [S → . , $]
    [S → . id S, $]                    [S → . id, $]
                                       [S → . id S, $]
- Reduce/reduce conflict on input "id $": after id, with $ next, both S → id and S → ε can be reduced
    S' ⇒ S ⇒ id
    S' ⇒ S ⇒ id S ⇒ id
- Better to rewrite the grammar: S → ε | id S

2016-3-16 PITT CS 1622 72 Semantic Actions
- Semantic actions are implemented for LR parsing
  - keep attributes on a semantic stack, parallel to the syntax stack
    - on shift a: push the attribute for a on the semantic stack
    - on reduce X → α: pop the attributes for α, compute the attribute for X, and push it on the semantic stack
- Creating the AST
  - Bottom up
  - Create leaf nodes for tokens and create internal nodes from subtrees

2016-3-16 PITT CS 1622 73 Performing Semantic Actions
- Compute the value
    E → T + E1      { E.val = T.val + E1.val }
      | T           { E.val = T.val }
    T → int * T1    { T.val = int.val * T1.val }
      | int         { T.val = int.val }
  consider the parsing of the string 3 * 5 + 8
- Recall: creating the AST
    E → int         E.ast = mkleaf(int.lexval)
      | E1 + E2     E.ast = mktree(plus, E1.ast, E2.ast)
      | ( E1 )      E.ast = E1.ast
- A bottom-up evaluation of the ast attribute:
    E.ast = mktree(plus, mkleaf(5), mktree(plus, mkleaf(2), mkleaf(3)))
[Figure: AST with PLUS at the root, leaf 5 as its left child, and a PLUS node over leaves 2 and 3 as its right child]
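The semantic stack for 3 * 5 + 8 can be traced with a short script. As a simplifying assumption, the sequence of shift and reduce steps is written out by hand instead of coming from a parse table; `ACTIONS`, `run` and `STEPS` are illustrative names, and tokens without a value push `None` as a placeholder attribute.

```python
# Rule -> (number of RHS symbols, action computing the LHS attribute).
ACTIONS = {
    "E -> T + E": (3, lambda t, _plus, e: t + e),
    "E -> T": (1, lambda t: t),
    "T -> int * T": (3, lambda i, _star, t: i * t),
    "T -> int": (1, lambda i: i),
}


def run(steps):
    """Replay parser steps and return the final semantic stack."""
    semantic = []
    for step in steps:
        if step[0] == "shift":
            # On shift a: push the attribute for a.
            semantic.append(step[2])
        else:
            # On reduce X -> alpha: pop the attributes for alpha,
            # compute the attribute for X, push it.
            arity, action = ACTIONS[step[1]]
            args = semantic[len(semantic) - arity:]
            del semantic[len(semantic) - arity:]
            semantic.append(action(*args))
    return semantic


# Steps a shift-reduce parser takes on 3 * 5 + 8 with the grammar above.
STEPS = [
    ("shift", "int", 3), ("shift", "*", None), ("shift", "int", 5),
    ("reduce", "T -> int"), ("reduce", "T -> int * T"),
    ("shift", "+", None), ("shift", "int", 8),
    ("reduce", "T -> int"), ("reduce", "E -> T"), ("reduce", "E -> T + E"),
]

print(run(STEPS))
```

Swapping the lambdas for functions that build tree nodes would give the AST construction instead of the value, with the stack discipline unchanged.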

2016-3-16 PITT CS 1622 74 Error Recovery
- An error is detected when the parser consults the parsing table and finds an empty entry
  - Canonical LR will not make a single reduction before announcing an error
  - SLR and LALR parsers may make several reductions before announcing an error, but will not shift an erroneous symbol onto the stack
- Simple error recovery
  - scan down the stack until a state S with a goto on a particular non-terminal A is found
  - discard zero or more input symbols until a symbol "a" is found that can follow A
  - push goto[S, A] on the stack and continue parsing
- Choice of A: a non-terminal representing a major program piece
  - e.g. if A is 'stmt', then 'a' may be 'end' or ';'

2016-3-16 PITT CS 1622 75 Compaction of LR Parse Table
- A typical language grammar with 50-100 terminals and 100 productions may have an LALR parser with several hundred states and thousands of action entries
- Often rows of the table are the same, so the rows can be shared
- Rows can be made shorter by using lists of (input, action) pairs
  - this slows access to the table
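Both compaction ideas can be tried on the small SLR table from the earlier slides. The sketch below is illustrative only: the ACTION table is re-typed with string entries, and `share_rows` and `lookup` are hypothetical helper names.

```python
def _reduce_row(n):
    return {t: f"r{n}" for t in ("+", ")", "$")}


# ACTION part of the SLR table for S -> E, E -> E+T | T, T -> id | (E).
ACTION = {
    0: {"id": "s3", "(": "s4"},
    1: {"+": "s7", "$": "accept"},
    2: _reduce_row(3),
    3: _reduce_row(4),
    4: {"id": "s3", "(": "s4"},
    5: {"+": "s7", ")": "s6"},
    6: _reduce_row(5),
    7: {"id": "s3", "(": "s4"},
    8: _reduce_row(2),
}


def share_rows(table):
    """Return (rows, row_index): the distinct rows, and each state's row number.

    Each row is stored as a short list of (input, action) pairs.
    """
    rows, seen, row_index = [], {}, {}
    for state, row in table.items():
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen[key] = len(rows)
            rows.append(list(key))
        row_index[state] = seen[key]
    return rows, row_index


def lookup(rows, row_index, state, token):
    # Linear scan of the pair list: smaller table, slower access.
    for inp, action in rows[row_index[state]]:
        if inp == token:
            return action
    return None                        # empty entry = error


rows, row_index = share_rows(ACTION)
print(len(ACTION), "states share", len(rows), "rows")
print(lookup(rows, row_index, 7, "id"))
```

Here states S0, S4 and S7 end up pointing at a single shared row, since all three only shift on id and (.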

2016-3-16 PITT CS 1622 76 Notes on Parsing  Parsing  A solid foundation: CFG  A simple parser: LL(1)  A more powerful parser: LR(1)  An efficient hack: LALR(1)  LALR(1) parser generators
