Building Readahead FSMs for Grammars


Goals for This Section

To understand what a readahead FSM does.
To understand how to build one, and what we need in order to build one: a slightly augmented grammar.

What’s a Readahead FSM?

How does a Parser Work Again!
1. Find the right end of a handle, while munching inputs, stacking stuff (3 stacks that grow on the right), and moving R (the indicator for the right end of a handle).
2. Find the left end of the handle, while traversing the stack from right to left and moving L.
3. Reduce to a nonterminal A: using the stack contents between L and R, build a new tree and replace everything by an A-token.
Repeat until there is no more input (EndOfFile encountered); this is equivalent to reaching an accept table.

How does a Parser Work Again!
Find the right end of a handle. Find the left end of a handle. Reduce to a nonterminal A. A readahead FSM is used to guide this process.

Grammar
  G {EndOfFile} -> a *.

Readahead FSM
  1 --'|-'--> 2
  2 --a--> 2
  2 --@G-->

An @G edge indicates that we can be at the right end of a handle for G. @G is short form for Follow (G).

Most books use "goalpost" symbols '|-' and '-|' (instead of EndOfFile) for goals but don't discuss scanners with multiple end goalposts.

A readahead FSM always has 1 initial state, and a state is final if it has an @ transition. So it's not worth showing which states are initial or final states.

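Note
When we write a single transition from state 2 to state 3 labelled @G, or labelled {a, b, c}, where @G = Follow (G) = {a, b, c}, we mean three transitions from 2 to 3 labelled {a}, {b} and {c}.
What is in braces is lookahead information; i.e., a look attribute, not a read attribute.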

If a Grammar is More Complicated
There is still exactly one readahead FSM for the whole grammar.

Grammar
  G {EndOfFile} -> A *.
  A -> a.

Readahead FSM
  1 --'|-'--> 2
  2 --A--> 2
  2 --a--> 3
  2 --@G-->
  3 --@A-->

It's convenient to add an extra production and call the result an augmented grammar.

Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

The extra production also has to be added to the readahead FSM.

How do we use it to find the right end of a handle?
Ignoring the mechanism involving peek and next to process the input, let's just assume the input is all there.

Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

Readahead FSM
  1 --'|-'--> 2
  2 --A--> 2
  2 --G--> 3
  2 --a--> 5
  3 --{EndOfFile}--> 4
  2 --@G-->
  4 --@G'-->
  5 --@A-->

Input: |- a a EndOfFile

Trace |- a and reach state 5. The readahead FSM finds the right end of the handle (see R), and because state 5 has an @A edge it tells us to reduce the handle to A. It does not know where the left end is. Humans (us, for now) find the left end (see L) and reduce, giving

  |- A a EndOfFile

Let's repeat.

How do we use it to find the right end of a handle?
Same augmented grammar and readahead FSM as before, showing R only. At each step, restart from the beginning and go as far as you can.

  |- a a EndOfFile     Reached state 5
  |- A a EndOfFile     Reached state 5 again
  |- A A EndOfFile     Reached state 2
  |- G EndOfFile       Reached state 4. This means STOP

We Have Performed A Sequence of Reductions
The sentential forms, in the order we produced them:

  |- a a EndOfFile
  |- A a EndOfFile
  |- A A EndOfFile
  |- G EndOfFile

Flipping the order:

  G => AA => Aa => aa

This seems to be replacing the rightmost nonterminal at each step (called a right derivation).

These kinds of parsers
1. Work left to right
2. Work bottom up
3. Simulate the reverse of a right derivation

Not Necessary to Restart from the Beginning Each Time
Same augmented grammar and readahead FSM as before (1 --'|-'--> 2, 2 --A--> 2, 2 --G--> 3, 2 --a--> 5, 3 --{EndOfFile}--> 4, with @G at 2, @G' at 4, @A at 5), also showing the table number stack. Each stacked symbol is followed by the state reached after it. Look to the left of the handle for where to restart.

  -1 |-2 a5 a EndOfFile     Reached state 5. Resume at 2 (to the left of a)
  -1 |-2 A2 a5 EndOfFile    Reached state 5 again. Resume at 2 (to the left of a)
  -1 |-2 A2 A2 EndOfFile    Reached state 2. Resume at 2 (to the left of first A)
  -1 |-2 G3 EndOfFile       Reached state 4. This means STOP
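The stack-based trace can be sketched in Python. This is only an illustration under stated assumptions: the Follow sets are worked out by hand for this toy grammar, and the function pop_handle is a hypothetical stand-in for the "human finds the left end" step, since finding the left end is not covered here.

```python
# Read transitions of the toy readahead FSM: state -> {symbol: next state}.
READ = {1: {"|-": 2}, 2: {"A": 2, "G": 3, "a": 5}, 3: {"EndOfFile": 4}}

# Look transitions: state -> (nonterminal to reduce to, Follow set).
# Follow sets computed by hand for this grammar (an assumption of the sketch).
LOOK = {2: ("G", {"EndOfFile"}), 5: ("A", {"a", "EndOfFile"})}

ACCEPT = 4  # the state carrying @G'


def pop_handle(stack, nonterminal):
    """Hypothetical stand-in for finding the left end of the handle."""
    if nonterminal == "A":            # A -> a: the handle is one symbol
        stack.pop()
    else:                             # G -> A*: the handle is every trailing A
        while stack[-1][0] == "A":
            stack.pop()


def parse(tokens):
    """Return the sequence of reductions performed on the token list."""
    stack = [(None, 1)]               # (symbol, table number), grows on the right
    reductions = []
    pos = 0
    while True:
        state = stack[-1][1]
        if state == ACCEPT:
            return reductions
        peek = tokens[pos] if pos < len(tokens) else None
        if peek in READ.get(state, {}):          # read: stack the token
            stack.append((peek, READ[state][peek]))
            pos += 1
        elif state in LOOK and peek in LOOK[state][1]:
            nonterminal = LOOK[state][0]         # right end of a handle found
            pop_handle(stack, nonterminal)
            resume = stack[-1][1]                # state to the left of the handle
            stack.append((nonterminal, READ[resume][nonterminal]))
            reductions.append(nonterminal)
        else:
            raise SyntaxError(f"unexpected {peek!r} in state {state}")


print(parse(["|-", "a", "a", "EndOfFile"]))
```

Resuming from the state to the left of the handle is what makes restarting from the beginning unnecessary.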

A More Complex Readahead FSM to Make it Clearer
Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

Readahead FSM
  1 --'|-'--> 2
  2 --A--> 3
  3 --A--> 3
  2 --G--> 4
  4 --{EndOfFile}--> 5
  2 --a--> 6
  3 --a--> 6
  2 --@G-->
  3 --@G-->
  5 --@G'-->
  6 --@A-->

Look to the left of the handle for where to restart.

  -1 |-2 a6 a EndOfFile     Reached state 6. Resume at 2 (to the left of a)
  -1 |-2 A3 a6 EndOfFile    Reached state 6 again. Resume at 3 (to the left of a)
  -1 |-2 A3 A3 EndOfFile    Reached state 3. Resume at 2 (to the left of first A)
  -1 |-2 G4 EndOfFile       Reached state 5. This means STOP

Building Readahead FSMs

By hand via tracing
By hand via dot pushing
Algorithmically

Building Readahead FSMs By Hand Via Tracing

Start with the augmented grammar and expand the right part of the extra production into an FSM, adding an @G' edge at the end because it is a G'-production.

Augmented Grammar
  G' -> '|-' G {EndOfFile}     (this is a G' production)
  G -> A *.
  A -> a.

Readahead FSM
  1 --'|-'--> 2 --G--> 3 --{EndOfFile}--> 4 --@G'-->

Pick another nonterminal in the FSM and expand it the same way (if they have all been expanded, you're done). YES, you need a loop.

The G expansion: 0 or more A's followed …

But there is 1 Constraint
The FSM we build MUST be deterministic. So, if there is already a path in the FSM for AbBc and the new path to be added starts the same way, the new path must run along the old path, and only what remains is added at the end of it.

What to Watch Out For… When Creating Loops
Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

Readahead FSM
  1 --'|-'--> 2 --G--> 3 --{EndOfFile}--> 4 --@G'-->

Let's expand G.

Expansion with 1 A (more minimal):
  2 --A--> 2
  2 --@G-->

OR

Expansion with 2 A's (eliminates potential future conflicts):
  2 --@G-->
  2 --A--> 5
  5 --A--> 5
  5 --@G-->

Continuing

Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

Readahead FSM so far
  1 --'|-'--> 2 --G--> 3 --{EndOfFile}--> 4 --@G'-->
  2 --@G-->
  2 --A--> 5
  5 --A--> 5
  5 --@G-->

Expanding first A (the A leaving state 2):
  2 --a--> 6
  6 --@A-->

Expanding second A (the A leaving state 5): next slide.

Share States or Not

Readahead FSM after expanding the first A
  1 --'|-'--> 2 --G--> 3 --{EndOfFile}--> 4 --@G'-->
  2 --@G-->
  2 --A--> 5
  5 --A--> 5
  5 --@G-->
  2 --a--> 6
  6 --@A-->

Expanding second A: state 5 also needs an a transition leading to a state with an @A edge. It is more minimal to share state 6 rather than creating a new state 7:
  5 --a--> 6

Another Example: Creating a Readahead FSM By Hand Via Tracing

  E {EndOfFile} -> E + T | E - T | T .
  T -> T * P | T / P | P .
  P -> '(' E ')' | i .

Step 1: Create augmented grammar

  E' -> '|-' E {EndOfFile}
  E -> E + T | E - T | T .
  T -> T * P | T / P | P .
  P -> '(' E ')' | i .

Step 2: Create initial expansion of E'

  1 --'|-'--> 2 --E--> 3 --{EndOfFile}--> 4 --@E'-->

All subsequent steps: pick a nonterminal to expand, expand it, and mark it expanded with a checkmark.

Pick E (the E leaving state 2). Expanding E -> E + T | E - T | T gives
  3 --'+'--> 5 --T--> 6 --@E-->
  3 --'-'--> 7 --T--> 8 --@E-->
  2 --T--> 9 --@E-->
Keep it deterministic: note the first E of E + T was already there (2 --E--> 3), so the new paths continue from state 3. Also, it looks like states 5 and 7 are EQUAL, and 6 and 8 are EQUAL (so don't duplicate).

Without duplicating states, the '-' edge goes to 5:
  3 --'+'--> 5 --T--> 6 --@E-->
  3 --'-'--> 5
  2 --T--> 7 --@E-->

More compact way of drawing it:
  3 --'+', '-'--> 5 --T--> 6 --@E-->
  2 --T--> 7 --@E-->

Pick T (the T leaving state 2). Expanding T -> T * P | T / P | P gives
  7 --'*', '/'--> 8 --P--> 9 --@T-->
  2 --P--> 10 --@T-->
Keep it deterministic: note the first T of T * P was already there (2 --T--> 7).

Pick P (the P leaving state 2).

Expand P (the P leaving state 2). Expanding P -> '(' E ')' | i gives
  2 --'('--> 11 --E--> 12 --')'--> 13 --@P-->
  2 --i--> 14 --@P-->

Pick E (the E leaving state 11). Avoid duplicating if it's going to be exactly the same:
  12 --'+', '-'--> 5
  11 --T--> 7
Keep it deterministic: note the first E of E + T was already there (11 --E--> 12).

Pick T (the T leaving state 5). Avoid duplicating if it's going to be exactly the same:
  6 --'*', '/'--> 8
  5 --P--> 10
Keep it deterministic: note the first T of T * P was already there (5 --T--> 6).

Pick P (the P leaving state 5):
  5 --'('--> 11
  5 --i--> 14

Pick P (the P leaving state 8):
  8 --'('--> 11
  8 --i--> 14

Pick T (the T leaving state 11). T * P is already there and T / P is already there (via state 7); P is not there:
  11 --P--> 10

Pick P (the P leaving state 11):
  11 --'('--> 11
  11 --i--> 14

Every nonterminal has been checked. SO WE ARE DONE.

This is the readahead FSM for grammar

  E -> E + T | E - T | T .
  T -> T * P | T / P | P .
  P -> '(' E ')' | i .

  State   Transitions
  1       '|-' to 2
  2       E to 3, T to 7, P to 10, '(' to 11, i to 14
  3       {EndOfFile} to 4; '+', '-' to 5
  4       @E'
  5       T to 6, P to 10, '(' to 11, i to 14
  6       @E; '*', '/' to 8
  7       @E; '*', '/' to 8
  8       P to 9, '(' to 11, i to 14
  9       @T
  10      @T
  11      E to 12, T to 7, P to 10, '(' to 11, i to 14
  12      ')' to 13; '+', '-' to 5
  13      @P
  14      @P
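One way to check a hand-built FSM is to write it down as a table and walk some paths through it. The sketch below assumes the transitions as I read them off the diagram above; the names READ, LOOK and run are made up for this example. A dictionary keyed by symbol cannot hold two targets for the same symbol, so writing the table this way also forces the FSM to be deterministic.

```python
# Read transitions of the expression-grammar readahead FSM (state -> {symbol: state}).
READ = {
    1: {"|-": 2},
    2: {"E": 3, "T": 7, "P": 10, "(": 11, "i": 14},
    3: {"EndOfFile": 4, "+": 5, "-": 5},
    5: {"T": 6, "P": 10, "(": 11, "i": 14},
    6: {"*": 8, "/": 8},
    7: {"*": 8, "/": 8},
    8: {"P": 9, "(": 11, "i": 14},
    11: {"E": 12, "T": 7, "P": 10, "(": 11, "i": 14},
    12: {")": 13, "+": 5, "-": 5},
}

# Look transitions: the nonterminal X of the @X edge leaving each state.
LOOK = {4: "E'", 6: "E", 7: "E", 9: "T", 10: "T", 13: "P", 14: "P"}


def run(symbols, start=1):
    """Follow read transitions from start; return the state reached, or None."""
    state = start
    for symbol in symbols:
        state = READ.get(state, {}).get(symbol)
        if state is None:
            return None
    return state


reached = run(["|-", "(", "E", "+", "T"])
print(reached, LOOK.get(reached))   # state reached and the reduction it offers
```

Walking |- ( E + T ends in a state with an @E edge, i.e. the right end of a handle for E, provided the next input is in Follow (E).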

What’s Wrong With This Technique
Impossible to automate.
Deciding where to loop back is not so obvious.
Doing it by hand for a toy grammar is doable, but not for a real grammar. The grammar for C++ is estimated to be about 1000 productions.

Building Readahead FSMs By Hand Via Dot Pushing

Introduce the notion of where you are in a production by using a dot.

Rule 1: Dots are allowed only in front of transition labels and at the VERY END, not in front of metasymbols or brackets.

Example: for the production A -> a b c* d

  A -> .a b c* d       At the beginning
  A -> a .b c* d       After moving the dot right past a
  A -> a b .c* .d      After moving the dot right past b (2 dots represent 2 dotted productions)
  A -> a b c* d.       After moving the dot right past d

You need a clear rule for moving a dot right.

Moving the Dot Right
Rule 2: Move a dot right by determining which symbols can come after (you need to understand regular expressions to do this). It may move to more than one place. Because regular expressions can be complex, it can be hard to do.

  A -> .a (b | c?)* d        At the beginning

After moving the dot right past a, all of these are valid:

  A -> a (.b | c?)* d
  A -> a (b | .c?)* d
  A -> a (b | c?)* .d

More compact way of showing it:

  A -> a (.b | .c?)* .d

After moving the dot right past c (ignoring b and d):

  A -> a (.b | .c?)* .d

Because of *, you can be back to .b and .c if you iterate again AND to .d if you don't iterate again.

Moving the Dot Right
Now try

  A -> .a (b | c?)* d – .ab*      At the beginning
  ?                                After moving the dot right past a

It can be done but it's hard in general. Can you write a program to do it?

That's why existing compiler books only allow grammars with productions of the form

  A -> abc | aBcde | g | PQ

i.e., strings separated by "|".

Such grammars don't allow arbitrarily long handles; so, even if the parser automatically constructed trees (they don't), you could not get a FUNCTION PARAMETER list with an arbitrary number of children.

We Need 2 Building Rules To Proceed
Push Right Rule (.a)
Move the dot over the a to all places it can go.
Notation for pushing right over a: →a

Push Down Rule (.A)
If the symbol is a nonterminal A, move the dot in front of each "starting" symbol on the right side of the A production. It's possible to have a dot at the end if nothing has to be read first (as in A -> b*).
Notation for pushing down below A: ↓A

What's the Plan?
Use a set of dotted productions to represent a readahead state.

  The set obtained before down operations:
    dotted production 1
    dotted production 2
    …
  The set added by down operations:
    more dotted production 1
    more dotted production 2
    …
    more dotted production n

Doing something repeatedly is sometimes called taking the closure.

Two states are equal if they contain exactly the same dotted productions; i.e., the same productions with the dots at exactly the same places.

An example
To make sure there is no confusion, let's use a really big dot.

Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

Two unequal states:

  First state (3 productions, 4 dots)
    G' -> '|-' . G {EndOfFile}
    G -> . A *.
    A -> . a

  Second state (2 productions, 3 dots)
    G -> . A *.
    A -> . a

In G -> . A *. there are 2 dots: one in front of A and one at the very end.

An algorithm
1. Create a "states" collection with the first one built from the extra production of the augmented grammar.
2. Pick the next state in the collection to process.
3. Apply downs.
4. Make a pass to determine the transitions.
5. For each transition, construct a potential successor by applying right. If it already exists (there is one equal to it already there), use the existing one. Otherwise, add the potential successor to the collection.
This stops once all states have been processed.
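The steps above can be sketched in Python for the simpler case where every right part is a plain string of symbols (no *, ? or brackets), so moving a dot right is just "advance by one". That restriction is an assumption of this sketch, which is why it uses the expression grammar rather than G -> A *. The names GRAMMAR, downs and build are made up for the example.

```python
# Plain alternatives only; a dotted production is (left, right, dot position).
GRAMMAR = {
    "E'": [("|-", "E", "EndOfFile")],
    "E": [("E", "+", "T"), ("E", "-", "T"), ("T",)],
    "T": [("T", "*", "P"), ("T", "/", "P"), ("P",)],
    "P": [("(", "E", ")"), ("i",)],
}


def downs(items):
    """Apply down repeatedly: add .alternative for every nonterminal after a dot."""
    closed = set(items)
    work = list(items)
    while work:
        _, right, dot = work.pop()
        if dot < len(right) and right[dot] in GRAMMAR:
            for alternative in GRAMMAR[right[dot]]:
                item = (right[dot], alternative, 0)
                if item not in closed:
                    closed.add(item)
                    work.append(item)
    return closed


def build(start_item):
    """Return (states, transitions); states are kept WITHOUT their downs."""
    states = [frozenset([start_item])]
    transitions = {}                      # (state number, symbol) -> state number
    index = 0
    while index < len(states):            # stops once all states are processed
        closed = downs(states[index])
        names = sorted({r[d] for (_, r, d) in closed if d < len(r)})
        for name in names:                # one successor per name: deterministic
            successor = frozenset(
                (l, r, d + 1) for (l, r, d) in closed if d < len(r) and r[d] == name
            )
            if successor not in states:   # equal state already there? reuse it
                states.append(successor)
            transitions[(index + 1, name)] = states.index(successor) + 1
        index += 1
    return states, transitions


states, transitions = build(("E'", GRAMMAR["E'"][0], 0))
print(len(states), "states")
```

Because + and - (and * and /) are separate alternatives here, this sketch keeps their successors apart and produces more states than a hand-built FSM that merges them into a single '+', '-' edge.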

The algorithm in Pictures
The states collection holds the processed states (downs all done) on the left and the unprocessed states (no downs done) on the right. It is empty initially.

Step 1: Pick a state to process. Fill in downs.
Step 2: Determine transition names (for example a, A, …, d). Using one successor per name guarantees the FSM is deterministic.
Step 3: Compute potential successors (rights, no downs).
Step 4: If they're new, add to states (on the right); if it's there already, refer to it.

Creating a Readahead FSM: Processing State 1

Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

Use ↓ to increase the dot count. Use → to compute successors.

State 1
  G' -> .'|-' G {EndOfFile}

Step 1: Fill in downs using ↓. Can't: the dot is not in front of a nonterminal.
Step 2: Determine transition names. One, namely '|-'.
Step 3: Compute potential successors using → (no downs):
  G' -> '|-' . G {EndOfFile}
Step 4: It's new (becomes state 2).

Processing State 2

Augmented Grammar
  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

Step 1: Fill in downs using ↓ until there are no more downs to do.

  State 2
    G' -> '|-' . G {EndOfFile}
    G -> . A *.          (2 dots because of *)
    A -> . a

Step 2: Determine transition names.

  G     from G' -> '|-' . G {EndOfFile}
  A     from G -> . A *.
  @G    from G -> . A *. (dot at the end)
  a     from A -> . a

Step 3: Compute potential successors (no downs).

  G-successor, New State 3:   G' -> '|-' G {. EndOfFile}
  A-successor, New State 4:   G -> . A *.
  a-successor, New State 5:   A -> a .

All new, so they become states 3, 4, 5.

Processing State 3

  State 3
    G' -> '|-' G {. EndOfFile}

Step 1: Fill in downs. Nothing to do.
Step 2: Determine transition names: {EndOfFile}.
Step 3: Compute potential successors (no downs):
  G' -> '|-' G {EndOfFile}.
It's new, so it becomes state 6. Working ahead, state 6 has only an @G' edge.

Processing State 4

Step 1: Fill in downs using ↓ until there are no more downs to do.

  State 4
    G -> . A *.
    A -> . a

Step 2: Determine transition names.

  A     from G -> . A *.
  @G    from G -> . A *. (dot at the end)
  a     from A -> . a

Step 3: Compute potential successors (no downs).

  A-successor:   G -> . A *.     Already exists as State 4
  a-successor:   A -> a .        Already exists as State 5

The A-successor exists and the a-successor exists (compare only the portion without downs).

Processing State 5

  State 5
    A -> a .

Step 1: Fill in downs using ↓. Nothing to do.
Step 2: Determine transition names: @A.
Step 3: Compute potential successors using →. There are none.
Step 4: Nothing to do.

Drawing it All Together

  State 1
    G' -> .'|-' G {EndOfFile}
    '|-' to State 2

  State 2
    G' -> '|-' . G {EndOfFile}
    G -> . A *.
    A -> . a
    G to State 3, A to State 4, a to State 5, @G

  State 3
    G' -> '|-' G {. EndOfFile}
    {EndOfFile} to State 6

  State 4
    G -> . A *.
    A -> . a
    A to State 4, a to State 5, @G

  State 5
    A -> a .
    @A

  State 6
    G' -> '|-' G {EndOfFile}.
    @G'

Discarding the Dotted Productions
and numbering…

  1 --'|-'--> 2
  2 --G--> 3
  2 --A--> 4
  2 --a--> 5
  2 --@G-->
  3 --{EndOfFile}--> 6
  4 --A--> 4
  4 --a--> 5
  4 --@G-->
  5 --@A-->
  6 --@G'-->

This can be redrawn.

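Redrawn

Readahead FSM via dot pushing
  1 --'|-'--> 2
  2 --G--> 3 --{EndOfFile}--> 6 --@G'-->
  2 --A--> 4
  4 --A--> 4
  2 --a--> 5
  4 --a--> 5
  2 --@G-->
  4 --@G-->
  5 --@A-->

Readahead FSM via tracing
  1 --'|-'--> 2
  2 --G--> 3 --{EndOfFile}--> 4 --@G'-->
  2 --A--> 5
  5 --A--> 5
  2 --a--> 6
  5 --a--> 6
  2 --@G-->
  5 --@G-->
  6 --@A-->

Same FSM (6 states) though drawn slightly differently and numbered differently.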

Attaching a Type To Each State
Ra = Readahead. In the readahead FSM built via dot pushing, every one of the states 1 to 6 is given the type Ra.

What's wrong with this technique?
It is difficult to automate the push right rule (→a) because the right side is a regular expression. The rest can be automated.
No problem with looping states (taken care of automatically by the algorithm).
Doing it by hand for a toy grammar is doable, but still not possible for a real grammar. The grammar for Java is probably a little less than 1000 productions.

Building Readahead FSMs Algorithmically

What's Different
Just that we are using productions with FSMs in the right part.
We are still using →a and ↓A. We will now call them relations.
Let's redefine the relations more exactly. With the relations, we can describe the algorithm concisely.

2 Relations

The right relation (to go right...)
  →X = {(p,q) such that there is a production right part of the form A -> … p --X--> q …}

The down relation (to go down...)
  ↓A = {(p,r) such that there is a production right part of the form ? -> … p --A--> q … and a production A -> r …, where r is an initial state}

Here p, q and r are states of the right part FSMs.

Is a relation really a set?

Relations

What's a Relation
A relation is a set of pairs; in each pair [p,q], p comes from the domain and q from the range.

Denote the equal relation on the non-negative integers as =. You're already familiar with = but perhaps not this way of looking at it. Then
  = is {[0,0], [1,1], [2,2], ...}

Denote the "add 1" relation on the non-negative integers as Plus1. Then
  Plus1 = {[0,1], [1,2], [2,3], ...}

We are more used to writing a = b rather than "[a,b] is in =".

What Does It Mean To Apply A Relation
Applying a relation R to a set A (denoted AR) means constructing a new set B such that if p is in A and [p,q] is in R (or p R q), then q is in B.

  {10, 40} Plus1 = {11, 41}

And you can apply a relation to the result too.

  {10, 40} Plus1 Plus1 = {11, 41} Plus1 = {12, 42}

More generally, you can use regular expressions on the relations and even inverse, where Plus1^-1 = {(q,p) | (p,q) is in Plus1}.

  {10, 40} (Plus1 | Plus1 Plus1) = {11, 41, 12, 42}
  {10, 40} Plus1? = {10, 40, 11, 41}
  {10, 40} Plus1^-1 = {9, 39}

A Note About Relations
Given a relation R and the fact that a R b,
  The set {a} R contains b among others.
  The set {b} R^-1 contains a among others.

For example, if a R b, a R c, d R b, then
  {a} R = {b, c} and {b} R^-1 = {a, d}

More sophisticated variations
  {a} R* = {a, b, c}
  {a, d} R+ = {b, c}     (b and c come from {a} R; {d} R adds only b, which is already there)
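These examples are easy to replay in Python by storing a relation as a set of pairs. A minimal sketch: the function names are made up for the example, and Plus1 is cut off at 100 (an assumption, since a Python set has to be finite).

```python
def apply(a_set, relation):
    """A R: every q such that p is in A and (p, q) is in R."""
    return {q for (p, q) in relation if p in a_set}


def inverse(relation):
    """R^-1: every pair flipped."""
    return {(q, p) for (p, q) in relation}


def plus(a_set, relation):
    """A R+: apply R one or more times."""
    result = set()
    frontier = apply(a_set, relation)
    while not frontier <= result:          # stop when nothing new shows up
        result |= frontier
        frontier = apply(result, relation)
    return result


def star(a_set, relation):
    """A R*: apply R zero or more times."""
    return set(a_set) | plus(a_set, relation)


def optional(a_set, relation):
    """A R?: apply R zero times or once."""
    return set(a_set) | apply(a_set, relation)


PLUS1 = {(n, n + 1) for n in range(100)}   # finite slice of the "add 1" relation
R = {("a", "b"), ("a", "c"), ("d", "b")}

print(apply({10, 40}, PLUS1))              # the set containing 11 and 41
print(apply({"b"}, inverse(R)))            # the set containing a and d
print(star({"a"}, R))                      # the set containing a, b and c
```

The closure functions only terminate because the relation is finite, which is also the situation for relations over the states of a grammar's right parts.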

Small Extension
When we write the right relation without indicating an X, as in →, we mean the union of all →X; equivalently, →X for any X.
When we write the down relation without indicating an A, as in ↓, we mean the union of all ↓A; equivalently, ↓A for any A.

Back to Building Readahead FSMs

Recall: 2 Relations

The right relation (to go right...)
  →X = {(p,q) such that there is a production right part of the form A -> … p --X--> q …}

The down relation (to go down...)
  ↓A = {(p,r) such that there is a production right part of the form ? -> … p --A--> q … and a production A -> r …, where r is an initial state}

Implications
Our example grammar

  G' -> '|-' G {EndOfFile}
  G -> A *.
  A -> a.

now looks like

  G' -> 1 --'|-'--> 2 --G--> 3 --{EndOfFile}--> 4
  G  -> 5, with 5 --A--> 5
  A  -> 6 --a--> 7

We already have an algorithm that does this conversion. Let me call these right part states.
We use the same algorithm as before, substituting right part states for dotted productions.

Instead of This

  State 1:  G' -> .'|-' G {EndOfFile}                         '|-' to State 2
  State 2:  G' -> '|-' . G {EndOfFile};  G -> . A *.;  A -> . a     G to State 3, A to State 4, a to State 5, @G
  State 3:  G' -> '|-' G {. EndOfFile}                        {EndOfFile} to State 6
  State 4:  G -> . A *.;  A -> . a                            A to State 4, a to State 5, @G
  State 5:  A -> a .                                          @A
  State 6:  G' -> '|-' G {EndOfFile}.                         @G'

These are all new states because no other state has the same set. We need to build their successors too.

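It Looks Like This

Right part states
  G' -> 1 --'|-'--> 2 --G--> 3 --{EndOfFile}--> 4
  G  -> 5, with 5 --A--> 5
  A  -> 6 --a--> 7

Readahead states (right part states without downs; those added by downs)
  State 1:  1            '|-' to State 2
  State 2:  2; 5, 6      G to State 3, A to State 4, a to State 5, @G
  State 3:  3            {EndOfFile} to State 6
  State 4:  5; 6         A to State 4, a to State 5, @G
  State 5:  7            @A
  State 6:  4            @G'

Manually check it.
What do the symbols inside a readahead state mean? Do they contain state objects or state numbers?
These are all new states because no other state has the same set. We need to build their successors too.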

Programming Considerations
The readaheadState should contain state objects (not state numbers).
That way, you can get the transitions you need directly from the right part state objects to compute right and down. You can even renumber the states without affecting anything.
If you insist on using state numbers, you will need a scheme to get the right part state objects from the state numbers ALL OVER THE PLACE.

This Suggests a Class Hierarchy

  FiniteStateMachineState
    isInitial
    isFinal
    transitions
      ReadaheadState
        withoutDowns     (past name: unclosured)
        withDowns        (past name: closured)

Example: for State 2, withoutDowns = {2} and withDowns = {2, 5, 6}.
The act of repeatedly applying down is called "taking the closure" in text books.
It's easier to compute right from 1 collection than it is to do it from 2 independent collections.
In Smalltalk: withDowns := withoutDowns shallowCopy

Other Programming Considerations
Don't need set objects.

  Uses ==
    orderedCollection addIfIdenticalAbsent: anObject
    orderedCollection includesIdentical: anObject
    orderedCollection includesAllIdentical: aCollection
  Uses =
    orderedCollection includes: anObject
    orderedCollection includesAll: aCollection

Consider an = implemented in class ReadaheadFSM. The keyword messages need parentheses; otherwise includesAllIdentical: and ifFalse: are parsed as one two-part message.

  = anotherReadaheadFSM
      "Equal when both hold the identical right part states before downs."
      self withoutDowns size = anotherReadaheadFSM withoutDowns size ifFalse: [^false].
      (self withoutDowns includesAllIdentical: anotherReadaheadFSM withoutDowns) ifFalse: [^false].
      (anotherReadaheadFSM withoutDowns includesAllIdentical: self withoutDowns) ifFalse: [^false].
      ^true

When You Implement This
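Right part states
  G' -> 1 --'|-'--> 2 --G--> 3 --{EndOfFile}--> 4
  G  -> 5, with 5 --A--> 5
  A  -> 6 --a--> 7

Readahead states (right part states without downs; those added by downs)
  State 1:  1            '|-' to State 2
  State 2:  2; 5, 6      G to State 3, A to State 4, a to State 5
  State 3:  3            {EndOfFile} to State 6
  State 4:  5; 6         A to State 4, a to State 5
  State 5:  7
  State 6:  4

Don't bother with the @ transitions (@G, @A, @G'), because you can tell what they are from the right part states (the ones that are final).
These are all new states because no other state has the same set. We need to build their successors too.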

Review
We described an algorithm that builds a readahead FSM that works from left to right to find a handle.
The readahead FSM was built using right and down operations; i.e., →a and ↓A.
Can we describe the steps of the process in terms of the relations?

The Table Building Algorithm
Let the initial goal dotted production (or equivalent) for an augmented grammar be IG'.

  The initial readahead state before closure:   {IG'}
  The initial readahead state after closure:    {IG'} ↓*

Let R be any readahead state after closure; i.e., a set of dotted productions (or equivalent).

  The M-successor of R before closure:   R →M
      provided this set has something in it; i.e., something in R has an M-successor
  The M-successor of R after closure:    R →M ↓*
      provided R →M has something in it
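To see the relation formulation at work, here is a small Python sketch for the toy grammar G' -> '|-' G {EndOfFile}, G -> A *, A -> a, using right part states 1 to 7 in place of dotted productions. The numbering follows the right part FSMs shown earlier; the names RIGHT, INITIAL, DOWN and build are made up for the example, and the sketch stores state numbers rather than state objects only to stay short.

```python
# Right part FSMs: (state, symbol) -> state.
# G' -> 1 --'|-'--> 2 --G--> 3 --EndOfFile--> 4;  G -> 5 with 5 --A--> 5;  A -> 6 --a--> 7
RIGHT = {(1, "|-"): 2, (2, "G"): 3, (3, "EndOfFile"): 4, (5, "A"): 5, (6, "a"): 7}
INITIAL = {"G'": 1, "G": 5, "A": 6}     # initial right part state of each nonterminal

# The down relation: (p, r) when p has a transition on A and r is A's initial state.
DOWN = {(p, INITIAL[x]) for (p, x) in RIGHT if x in INITIAL}


def apply(a_set, relation):
    return {q for (p, q) in relation if p in a_set}


def star(a_set, relation):
    """A R*: keep applying R until nothing new is added (the closure)."""
    result = set(a_set)
    while not apply(result, relation) <= result:
        result |= apply(result, relation)
    return frozenset(result)


def right(symbol):
    """The right relation for one symbol M."""
    return {(p, q) for (p, x), q in RIGHT.items() if x == symbol}


def build():
    states = [star({INITIAL["G'"]}, DOWN)]          # {IG'} down*
    transitions = {}
    index = 0
    while index < len(states):
        current = states[index]
        for symbol in sorted({x for (p, x) in RIGHT if p in current}):
            successor = star(apply(current, right(symbol)), DOWN)   # R right-M down*
            if successor not in states:
                states.append(successor)
            transitions[(index + 1, symbol)] = states.index(successor) + 1
        index += 1
    return states, transitions


states, transitions = build()
for number, members in enumerate(states, start=1):
    print(number, sorted(members))
```

The state numbers this sketch assigns depend on the order in which symbols are tried, so they need not match a hand numbering; the sets themselves are what identify a state.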

What's wrong with this?
Nothing as far as it goes. But in the next section, we will want to build a readback FSM. If we don't prepare for that task DURING THE PROCESS of building a readahead FSM, it will become difficult.
This means you should NOT start to implement this YET. Let's carry on with the next task.
PS: Scanners don't need readback, so you could implement it for scanners. But why implement it twice? Wait for the parser version and use it for scanners too.
