Formal Aspects, Term 2, Week 4
LECTURE: LR “Shift-Reduce” Parsers. The JavaCup parser generator CREATES LR “Shift-Reduce” parsers; they are very commonly used and handle a larger class of grammars than LL parsers.
TUTORIAL: How to use, and how to create, a Shift-Reduce Parser
LR “Shift-Reduce” Parsers
LR shift-reduce parsers consist of a Parsing Table (a DFSM) PLUS a Stack of States and Symbols. States are numbered in the Table, and Symbols are tokens or non-terminals. The Parser is given a string of tokens to parse. It shifts the tokens from the string onto the stack.
[Figure: the PARSING TABLE takes Tokens and States and gives ACTIONS ON THE STACK; the STACK holds alternating States and Symbols.]
LR “Shift-Reduce” Parser - The Start
Assume the string T1 T2 T3 ... Tn is input. The first token T1 from the Left of the string is input to the Table together with state 1. The Table is used to find out what to do: SHIFT or REDUCE.
EXAMPLE:
Stack 1: state 1
INPUT T1 ... consult table => SHIFT T1, move to state X
Stack 2: state X, T1, state 1 (top of stack first)
LR “Shift-Reduce” Parsers - General Workings
Given a symbol and a state input to the Table, carry out the action found there (see PAGE 60 in Appel’s book):
Sn (means “Shift symbol, move to state n”): Put the symbol onto the top of the stack; put the new state number n on top of it.
Rk (means “Reduce with rule k”): Match the RHS of rule k against the top of the stack and REMOVE all the matched symbols together with their states; push the LHS of rule k onto the top of the stack; input the LHS of rule k plus the state below it to the Table to find the next state, and push that state.
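The shift and reduce actions above can be sketched as a small driver loop. This is a minimal illustration, not JavaCup's implementation: the two-rule grammar, the hand-built `TABLE`, and the names `RULES`, `TOKENS` and `parse` are all assumptions made up for the example.

```python
# Hypothetical grammar, chosen only for illustration (not from the lecture):
#   rule 1: S ::= ( S )
#   rule 2: S ::= x
RULES = {1: ("S", 3), 2: ("S", 1)}  # rule number -> (LHS, length of RHS)
TOKENS = ["(", ")", "x", "$"]

# Hand-built parsing table: state -> symbol -> (action, number).
# "s" = shift, "r" = reduce, "g" = goto, "a" = accept.
TABLE = {
    1: {"(": ("s", 3), "x": ("s", 2), "S": ("g", 4)},
    2: {t: ("r", 2) for t in TOKENS},
    3: {"(": ("s", 3), "x": ("s", 2), "S": ("g", 5)},
    4: {"$": ("a", 0)},
    5: {")": ("s", 6)},
    6: {t: ("r", 1) for t in TOKENS},
}

def parse(tokens):
    """Run the shift-reduce loop; return the list of actions taken."""
    stack = [1]                      # states and symbols alternate; state 1 at the bottom
    tokens = list(tokens) + ["$"]    # end-of-input marker
    pos, trace = 0, []
    while True:
        state, tok = stack[-1], tokens[pos]
        action = TABLE[state].get(tok)
        if action is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        kind, n = action
        if kind == "s":              # Sn: push the symbol, then state n
            stack += [tok, n]
            pos += 1
            trace.append(f"shift {tok}")
        elif kind == "r":            # Rk: pop the RHS, push the LHS, consult goto
            lhs, length = RULES[n]
            del stack[len(stack) - 2 * length:]
            stack += [lhs, TABLE[stack[-1]][lhs][1]]
            trace.append(f"reduce {n}")
        else:                        # accept
            return trace

print(parse(["(", "x", ")"]))
```

For the input `( x )` the trace is shift `(`, shift `x`, reduce 2, shift `)`, reduce 1, after which the parser accepts.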
To Create an LR(1) Parser
We will now go through the steps required to BUILD a shift-reduce parser. This method is embedded in JavaCup.
Jargon 1: ITEM
An ITEM is one of a grammar’s production rules with a “DOT” somewhere in its Right Hand Side. The DOT represents a notional parsing position, e.g.
E ::= ( . S , E )
E ::= ( S , . E )
S ::= . S ; S
S ::= . id := E
are example items from Grammar 3.1.
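One possible way to represent an item in code is a triple of LHS, RHS and DOT position. The representation and the helper names `items_of` and `show` are assumptions for illustration only.

```python
def items_of(lhs, rhs):
    """Return every item of the production lhs ::= rhs, one per DOT position."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]

def show(item):
    """Format an item with the DOT written as '.'."""
    lhs, rhs, dot = item
    return f"{lhs} ::= " + " ".join(rhs[:dot] + (".",) + rhs[dot:])

# The production E ::= ( S , E ) has five RHS symbols, hence six items.
items = items_of("E", ["(", "S", ",", "E", ")"])
for item in items:
    print(show(item))
```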
Jargon 2: Closure of an Item
The CLOSURE of an item R (or set of items) is the smallest set C of items such that
(1) C contains R, AND
(2) IF there is a member of C of the form X ::= w . Y z where Y is a non-terminal, then ALL the defining production rules of Y must appear in C with the DOT at the start of their RHS.
E.g. closure( E ::= ( . S , E ) ) = {
E ::= ( . S , E )
S ::= . S ; S
S ::= . id := E
S ::= . print ( L )
}
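The closure definition can be sketched as a worklist computation. The `GRAMMAR` dictionary below is an assumed transcription of Grammar 3.1: only the S rules and E ::= ( S , E ) appear in these slides, and the remaining productions are filled in from memory of Appel's book, so treat them as an assumption. Items are (LHS, RHS, DOT position) triples, which is also an illustrative choice.

```python
# Assumed transcription of Grammar 3.1 (non-terminal -> list of RHS tuples).
GRAMMAR = {
    "S": [("S", ";", "S"), ("id", ":=", "E"), ("print", "(", "L", ")")],
    "E": [("id",), ("num",), ("E", "+", "E"), ("(", "S", ",", "E", ")")],
    "L": [("E",), ("L", ",", "E")],
}

def closure(items):
    """Smallest set containing `items` and satisfying condition (2)."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        # Condition (2): the DOT stands just before a non-terminal Y.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)   # DOT at the start of the RHS
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def show(item):
    lhs, rhs, dot = item
    return f"{lhs} ::= " + " ".join(rhs[:dot] + (".",) + rhs[dot:])

c = closure({("E", ("(", "S", ",", "E", ")"), 1)})
for line in sorted(show(i) for i in c):
    print(line)
```

Under these assumptions the result contains the same four items as the example in the definition.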
LR “Shift-Reduce” Parsers - Generation
TWO STAGE PROCESS:
1: CREATE A FINITE STATE MACHINE WITH NODES = SETS OF ITEMS, ARCS ANNOTATED WITH NON-TERMINALS OR TOKENS
2: CREATE A PARSING TABLE FROM THE MACHINE
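1: CREATING THE FINITE STATE MACHINE
To generate a new state from an old one:
newstate(w: SYMBOL, S: OLDSTATE) = closure( set of items of the form Z ::= ... w . ... where Z ::= ... . w ... is a member of S )
That is, move the DOT over w in every item of S that has the DOT just before w, then take the closure of the result.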
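A minimal sketch of `newstate`, assuming items are (LHS, RHS, DOT position) triples and using a small made-up grammar (S' ::= S $, S ::= ( S ), S ::= x) rather than one from the lecture:

```python
# Hypothetical grammar for illustration: non-terminal -> list of RHS tuples.
GRAMMAR = {"S'": [("S", "$")], "S": [("(", "S", ")"), ("x",)]}

def closure(items):
    """Add Y ::= . rhs for every non-terminal Y that follows a DOT."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def newstate(w, state):
    """Move the DOT over w wherever it stands just before w, then close."""
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in state
             if dot < len(rhs) and rhs[dot] == w}
    return closure(moved)

start = closure({("S'", ("S", "$"), 0)})
after_paren = newstate("(", start)   # S ::= ( . S ) plus the two S rules at DOT 0
after_x = newstate("x", start)       # just S ::= x .
```

In this sketch a symbol with no matching item yields the empty set, which a caller would treat as "no transition".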
ALGORITHM TO CREATE FSM
T = set of STATES in the FSM, E = set of TRANSITIONS
E = { } ;
T = { closure( S’ ::= . S $ ) } ;
repeat
  for each state S in T
    for each item ‘Z ::= ... . w ...’ in S
      add newstate(w,S) to T
      add S --w--> newstate(w,S) to E
    end for
  end for
until E and T do not change
NB ‘ACCEPT’ STATE OF FSM = newstate($, anystate)
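The algorithm can be sketched as follows. The grammar (S' ::= S $, S ::= ( S ), S ::= x) and all function names are assumptions for illustration, and the sketch visits each state once with a worklist instead of the repeat-until loop; it reaches the same fixed point.

```python
# Hypothetical grammar for illustration: non-terminal -> list of RHS tuples.
GRAMMAR = {"S'": [("S", "$")], "S": [("(", "S", ")"), ("x",)]}

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def newstate(w, state):
    return closure({(l, r, d + 1) for (l, r, d) in state
                    if d < len(r) and r[d] == w})

def build_fsm():
    """Return (states, edges); states are numbered from 1 in discovery order."""
    states = [closure({("S'", ("S", "$"), 0)})]   # T
    edges = set()                                  # E, as (n, symbol, m) triples
    i = 0
    while i < len(states):                         # states appended later are visited too
        symbols = {r[d] for (_, r, d) in states[i] if d < len(r)}
        for w in sorted(symbols):                  # sorted only to make numbering repeatable
            target = newstate(w, states[i])
            if target not in states:
                states.append(target)
            edges.add((i + 1, w, states.index(target) + 1))
        i += 1
    return states, edges

states, edges = build_fsm()
for n, w, m in sorted(edges):
    print(f"{n} --{w}--> {m}")
```

For this grammar the sketch finds seven states (including the accept state reached on `$`) and eight transitions.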
2: TO CREATE THE TABLE FROM THE FSM
1. NUMBER THE STATES 1, 2, 3, ...
2. For a transition n ---- x ----> m where m contains an item of the form Z ::= ... w . (DOT at the end), put ‘reduce X’ all along row m under every token column, where X is the number of the rule Z ::= ... w
Otherwise:
3. For a transition n ---- x ----> m where x is a token, put ‘Shift m’ in row n column x
4. For a transition n ---- Y ----> m where Y is a non-terminal, put ‘goto m’ in row n column Y
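The four steps can be sketched end to end on a small made-up grammar. Everything named here (`RULES`, `build_fsm`, `build_table`, the entry strings such as `s2`) is an assumption for illustration. Two further assumptions go beyond the steps above: a transition on `$` is recorded as `accept` rather than as a shift, and the sketch does not detect conflicts, so a later entry silently overwrites an earlier one.

```python
# Hypothetical numbered rules: rule 0 is the start rule S' ::= S $.
RULES = [("S'", ("S", "$")), ("S", ("(", "S", ")")), ("S", ("x",))]
NONTERMS = {lhs for lhs, _ in RULES}
TOKENS = ["(", ")", "x", "$"]

def closure(items):
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMS:
            for lhs, prod in RULES:
                item = (lhs, prod, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def newstate(w, state):
    return closure({(l, r, d + 1) for (l, r, d) in state
                    if d < len(r) and r[d] == w})

def build_fsm():
    states, edges, i = [closure({("S'", ("S", "$"), 0)})], [], 0
    while i < len(states):
        for w in sorted({r[d] for (_, r, d) in states[i] if d < len(r)}):
            target = newstate(w, states[i])
            if target not in states:
                states.append(target)
            edges.append((i + 1, w, states.index(target) + 1))  # step 1: numbering
        i += 1
    return states, edges

def build_table(states, edges):
    table = {n: {} for n in range(1, len(states) + 1)}
    for n, state in enumerate(states, start=1):
        for lhs, rhs, dot in state:
            if dot == len(rhs) and lhs != "S'":     # step 2: DOT at the end
                for t in TOKENS:
                    table[n][t] = f"r{RULES.index((lhs, rhs))}"
    for n, x, m in edges:
        if x == "$":
            table[n][x] = "accept"                  # assumption: accept on $
        elif x in NONTERMS:
            table[n][x] = f"g{m}"                   # step 4: goto
        else:
            table[n][x] = f"s{m}"                   # step 3: shift
    return table

table = build_table(*build_fsm())
for n in sorted(table):
    print(n, table[n])
```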
LR Parsers - Summary
In this lecture we have seen HOW LR parsers work and HOW they can be automatically created from a grammar specification.
NB LR means: parse the string from Left to right, and construct a Rightmost derivation in reverse, i.e. build the parse tree bottom-up, from the leaves towards the root.
Most LR parsers in practical use are “LR(1)” or a variant of it: the “1” means they look at the next 1 token in the string.