Chapter 4 - Part 3: Bottom-Up Parsing

Prof. Steven A. Demurjian, Computer Science & Engineering Department, The University of Connecticut, 371 Fairfield Way, Unit 2155, Storrs, CT (860). Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre.

Basic Intuition
Recall that LL(k) works TOP-DOWN with a LEFTMOST derivation: it predicts the right production to select based on the lookahead. Our new motto: LR(k) works BOTTOM-UP with a RIGHTMOST derivation: it commits to the production choice only after seeing the whole body (right-hand side), working in "reverse".

Bottom-Up Parsing: Inverse or Complement of Top-Down Parsing
Top-down parsing starts from the start symbol and attempts to derive the input string using productions. Bottom-up parsing makes modifications to the input string that allow it to be reduced to the start symbol. For example, consider the grammar S → a A B e, A → Abc | b, B → d and its derivations. What does each derivation represent? Top-down: a leftmost derivation, S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde. Bottom-up: a rightmost derivation in reverse, reducing abbcde → aAbcde → aAde → aABe → S.

Types of Derivation
Grammar: S → a A B e, A → Abc | b, B → d. Key issues: How do we determine which substring to "reduce"? How do we know which production rule to use? What is the general processing for BUP? How are conflicts resolved? What types of BUP are considered? TDP (leftmost): S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde. BUP (rightmost): S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde, a rightmost derivation that the parser traces in reverse!

What is a Handle? Definition: a right-sentential form is a sentential form that has been derived in a rightmost derivation, e.g., every form in S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde. A handle is a substring of a right-sentential form that appears as the right-hand side of a production rule and can be replaced by that rule's left-hand side to obtain the previous step of the rightmost derivation. Formally, a handle of a right-sentential form γ is a rule A → β together with a position in γ such that S ⇒*rm αAw ⇒rm αβw = γ and β occurs in γ immediately after α. Example: in the right-sentential form γ = aAbcde, the handle is Abc, the right-hand side of rule A → Abc, at position 2.

What is a Handle? Consider again: S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
S → a A B e, A → Abc | b, B → d

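Handle Pruning: What bottom-up really means... abbcde → aAbcde (handle b, reduce by A → b)

Handle Pruning: aAbcde → aAde (handle Abc, reduce by A → Abc)

Handle Pruning: aAde → aABe (handle d, reduce by B → d)

Handle Pruning: aABe → S (handle aABe, reduce by S → aABe)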

What's Going on in the Parse Tree?
Consider the right-sentential form αβw and the rule A → β, where S is the root of the parse tree and A is the parent of β. What does α signify? The input already processed, still on the parsing stack. What does β represent? The candidate handle to be reduced, on top of the stack. What does w contain? The input yet to be consumed.

Bottom-Up Parsing … Recognize the body of the last production applied in the rightmost derivation, and replace that body's symbol sequence by the left-hand side of the production rule, based on the "current" input. Repeat. At the end, either we are left with the start symbol (success!) or we get "stuck" somewhere (syntax error!). Key issue: if there are multiple handles for the "same" right-sentential form, then the grammar G is ambiguous.

General Processing of BUP
Basic mechanisms: "shift" and "reduce". Basic data structure: a stack of grammar symbols (terminals and non-terminals). Basic idea: shift input symbols onto the stack until the entire handle of the last rightmost reduction is on top of the stack. When the body of that production is on the stack, reduce it by replacing the body with the left-hand side of the production rule. When only the start symbol is left (and the input is exhausted), we are done.

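Example
Stack | Input | Action
$ | abbcde$ | shift
$a | bbcde$ | shift
$ab | bcde$ | reduce by A → b (handle b)
$aA | bcde$ | shift
$aAb | cde$ | shift
$aAbc | de$ | reduce by A → Abc (handle Abc)
$aA | de$ | shift
$aAd | e$ | reduce by B → d (handle d)
$aAB | e$ | shift
$aABe | $ | reduce by S → aABe (handle aABe)
$S | $ | accept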

Key Observation: At any point in time,
the content of the stack is a prefix of a right-sentential form. This prefix is called a viable prefix. Check again! The rightmost derivation S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde lists all the right-sentential forms, and each of the successive stack contents $, $a, $ab, $aA, $aAb, $aAbc, $aAd, $aAB, $aABe, $S is a prefix of one of them.

What is General Processing for BUP?
Utilize a stack implementation: the stack contains grammar symbols (terminals and non-terminals), and the input is examined with respect to the stack/current state. General operation: the options to process the stack are to shift symbols from the input onto the stack, or, when a handle β is on top of the stack, to reduce by using rule A → β, i.e., pop all symbols of the handle β and push the non-terminal A onto the stack. When the configuration is ($S, $), ACCEPT. An error occurs when a handle can't be found, or when S is on the stack with non-empty input.
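The shift/reduce loop above can be sketched in Python for the grammar S → a A B e, A → Abc | b, B → d. This is an illustration only, under the assumption that a naive depth-first search is acceptable: the hypothetical function `shift_reduce` tries every reduction whose body sits on top of the stack, backtracks on dead ends, and otherwise shifts. It is exponential in the worst case; the LR tables developed in the rest of this part exist precisely to make each shift/reduce decision deterministically.

```python
# Grammar of the running example: S -> a A B e, A -> A b c | b, B -> d.
# Each production is (lhs, body), with the body as a tuple of symbols.
GRAMMAR = [
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]
START = "S"


def shift_reduce(tokens, stack=(), trace=()):
    """Backtracking shift-reduce search (illustrative, not an LR parser).

    Returns a list of (stack, remaining input, action) rows that reduce the
    tokens to START, or None if no sequence of shifts and reductions works.
    """
    rest = "".join(tokens) + "$"
    shown = "$" + "".join(stack)
    if stack == (START,) and not tokens:          # configuration ($S, $)
        return list(trace) + [(shown, rest, "accept")]
    # Try each production whose body is on top of the stack (candidate handle).
    for lhs, body in GRAMMAR:
        n = len(body)
        if n <= len(stack) and stack[len(stack) - n:] == body:
            row = (shown, rest, f"reduce by {lhs} -> {''.join(body)}")
            result = shift_reduce(tokens, stack[:len(stack) - n] + (lhs,), trace + (row,))
            if result is not None:
                return result
    # No reduction led to success: shift the next input symbol, if any.
    if tokens:
        row = (shown, rest, "shift")
        return shift_reduce(tokens[1:], stack + (tokens[0],), trace + (row,))
    return None


for row in shift_reduce(list("abbcde")):
    print(row)
```

The rows printed for abbcde reproduce the stack/input/action trace of the example, including the point at $aAb where reducing b by A → b is tried first and abandoned in favor of shifting c.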

Consider the Example Below

What are Possible Grammar Conflicts?
Shift-Reduce (S/R) conflict: given the content of the stack and the current input, there is more than one option for what to do next. Consider stmt → if expr then stmt | if expr then stmt else stmt | other, with the stack as below and the token else as the current input: $ … if expr then stmt. Do we reduce if expr then stmt to stmt, or do we shift else onto the stack?

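What are Possible Grammar Conflicts?
Reduce-Reduce (R/R) conflict: consider stmt → id ( parameter_list ), parameter_list → parameter_list , parameter, parameter → id, expr → id ( expression_list ) | id, expression_list → expression_list , expr | expr. Consider the stack as below with the input token $: $ … id ( id , … , id ) … Do we reduce to stmt? Do we reduce to expr? The choice already arises when each inner id is on top of the stack: reducing it by parameter → id leads toward stmt, while reducing it by expr → id leads toward expr.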

Bottom-Up Parsing Techniques
LR(k) parsers: L = left-to-right input scanning; R = construct a rightmost derivation in reverse; k = number of lookahead symbols used for decisions. Advantages: well suited to almost all programming languages; the most general approach, and efficiently implemented; detects syntax errors very quickly. Disadvantages: difficult to build by hand, so tools assist parser construction (Yacc, Bison).

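Components of an LR Parser
The grammar feeds a table generator, which produces the parsing table; the parsing table differs based on the grammar and the lookaheads. The driver routines, common to all LR parsers, read the input tokens, consult the parsing table, and output the parse tree.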

Three Classes of LR Parsers
Simple LR (SLR), built from LR(0) items: easiest, but limited in grammar applicability, since many grammars lead to S/R and R/R conflicts. Canonical LR: powerful but expensive; LR(k), usually LR(1). Lookahead LR (LALR): in between the two. Two-fold focus: parser table construction (items and item sets) and examination of the LR parsing algorithm.

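LR Parser Structure
The input is a1 … ai ai+1 … an $. The stack holds s0 X1 s1 X2 … Xm-1 sm-1 Xm sm, with sm on top, where each Xj is a grammar symbol (terminal or non-terminal) and each sj is a state. The LR parsing program reads the stack and the input, consults the parsing table (action and goto), and produces the output. A configuration is written (s0 X1 s1 X2 … Xm-1 sm-1 Xm sm, ai ai+1 … an $). action[sm, ai] is the parsing table entry, with four options: 1. shift a state s onto the stack; 2. reduce by a rule A → β; 3. accept ($, $); 4. report an error. goto[s, A] determines the next state after a reduction to the non-terminal A. Question: what does X1 X2 … Xm-1 Xm ai ai+1 … an represent?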

What is the Parsing Table?
A combination of state, action, and goto. Shift s5 means shift the input symbol and push state 5. Reduce r2 means reduce using rule 2. goto[state, non-terminal] indicates the next state.

Actions Against Configuration
Configuration: (s0 X1 s1 X2 … Xm-1 sm-1 Xm sm, ai ai+1 … an $). If action[sm, ai] = shift s in the parsing table: push ai and s onto the stack, giving (s0 X1 s1 X2 … Xm sm ai s, ai+1 … an $). If action[sm, ai] = reduce A → β: with r = |β|, remove 2r entries (r symbols and r states) from the stack, exposing state sm-r, then push A along with the state s = goto[sm-r, A]. The goto uses the prior state exposed after popping. Accept: parsing is complete. Error: call a recovery routine.
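As a sketch of these actions, here is a minimal Python driver. It assumes the SLR table for the expression grammar, with rules numbered 1 to 7 as in the augmented grammar constructed below and states numbered like the item sets I0 to I11; the names `ACTION`, `GOTO`, `RULES`, and `lr_parse` are hypothetical. The stack interleaves grammar symbols and states exactly as in the configuration (s0 X1 s1 … Xm sm, …).

```python
# Rule number -> (left-hand side, body length) for the augmented expression grammar
# 1: E' -> E $, 2: E -> E + T, 3: E -> T, 4: T -> T * F, 5: T -> F,
# 6: F -> ( E ), 7: F -> id.  Rule 1 is never reduced; "$" in state 1 means accept.
RULES = {2: ("E", 3), 3: ("E", 1), 4: ("T", 3), 5: ("T", 1), 6: ("F", 3), 7: ("F", 1)}

# action[state][token]: ("s", next_state), ("r", rule_number) or ("acc",)
ACTION = {
    0: {"id": ("s", 5), "(": ("s", 4)},
    1: {"+": ("s", 6), "$": ("acc",)},
    2: {"+": ("r", 3), "*": ("s", 7), ")": ("r", 3), "$": ("r", 3)},
    3: {t: ("r", 5) for t in ("+", "*", ")", "$")},
    4: {"id": ("s", 5), "(": ("s", 4)},
    5: {t: ("r", 7) for t in ("+", "*", ")", "$")},
    6: {"id": ("s", 5), "(": ("s", 4)},
    7: {"id": ("s", 5), "(": ("s", 4)},
    8: {"+": ("s", 6), ")": ("s", 11)},
    9: {"+": ("r", 2), "*": ("s", 7), ")": ("r", 2), "$": ("r", 2)},
    10: {t: ("r", 4) for t in ("+", "*", ")", "$")},
    11: {t: ("r", 6) for t in ("+", "*", ")", "$")},
}
# goto[state][non-terminal]
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}


def lr_parse(tokens):
    """Table-driven LR driver; returns the rule numbers used by each reduction."""
    stack = [0]                      # s0 X1 s1 ... Xm sm, with sm at the end
    tokens = list(tokens) + ["$"]
    pos, reductions = 0, []
    while True:
        state, tok = stack[-1], tokens[pos]
        act = ACTION[state].get(tok)
        if act is None:              # blank entry: report an error
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act[0] == "s":            # shift: push a_i and the new state
            stack += [tok, act[1]]
            pos += 1
        elif act[0] == "r":          # reduce by A -> beta
            lhs, size = RULES[act[1]]
            del stack[len(stack) - 2 * size:]      # pop 2*|beta| entries
            stack += [lhs, GOTO[stack[-1]][lhs]]   # goto on the exposed state
            reductions.append(act[1])
        else:                        # accept
            return reductions


print(lr_parse(["id", "+", "id", "*", "id"]))  # [7, 5, 3, 7, 5, 7, 4, 2]
```

Read backwards, the list of reductions is the rightmost derivation of id + id * id, which is exactly the "rightmost derivation in reverse" that gives LR its name.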

How Does BUP Work? (Figure: a trace table with columns Stack, Input, and Action.)

Another Detailed Example

Constructing Parsing Tables
The three types of parsers (SLR, canonical, LALR) all share a concept for parsing table construction. An item characterizes, for each grammar rule, what we've seen or derived and what we've yet to see or derive. Consider the grammar rule E → E + T. There are four items for this rule: E → . E + T, E → E . + T, E → E + . T, and E → E + T . The item E → E . + T means we've derived E and have yet to derive + T, so we are expecting "+" next. In general, in an item the part to the left of "." has been seen/derived, and the part to the right has yet to be seen/derived. Note: A → ε has the single item A → . (nothing seen, nothing left to see).

Another Characterization of Items
Consider the grammar rule E → E + T with its four items E → . E + T, E → E . + T, E → E + . T, E → E + T . These represent a summary of the history of the parse. Each item refers to what's been placed on the stack (left of ".") and what remains to be reduced for the rule (right of "."). For example, in E → E + . T: on the stack, we have seen a string derived from E and found input through the "+"; left to derive/reduce, we are looking for a string derivable from T, whose input is yet to be processed.
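A small, hypothetical helper makes the dot positions concrete: for a production with n body symbols it lists the n + 1 items, matching the four items of E → E + T above, and for an empty body it yields the single item A → . as noted earlier.

```python
def items_of(lhs, body):
    """Return every LR(0) item of the production lhs -> body as a string.

    A body of n symbols gives n + 1 items (one per dot position); an empty
    body (A -> epsilon) gives the single item "A -> .".
    """
    result = []
    for dot in range(len(body) + 1):
        symbols = list(body[:dot]) + ["."] + list(body[dot:])
        result.append(f"{lhs} -> " + " ".join(symbols))
    return result


for item in items_of("E", ("E", "+", "T")):
    print(item)
```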

Start with SLR Parsing Table Construction
Step 1: construct an augmented grammar with a new start symbol E' that has a single production rule, so every derivation starts with the rule E' → E $. Original: E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id. Augmented: E' → E $, E → E + T, E → T, T → T * F, T → F, F → ( E ), F → id.

Start with SLR Parsing Table Construction
Step 2: construct the closure of all items. Intuitively, if A → α . B β is in the closure, we expect to see B β at some point in the derivation; if B → γ is a production rule, we also expect to see a substring derivable from γ, so B → . γ is added. Step 3: compute GOTO(Item_Set, X), where X is a grammar symbol. Intuitively, if Item_Set is the set of items valid for a viable prefix γ, then GOTO(Item_Set, X) is the set of items valid for the viable prefix γX. It is utilized to determine the next action (state) for the parser. Note: this GOTO on item sets is different from the goto table discussed previously, although the goto table entries are read off from it.

Calculating Closure: Closure([I]) where I is a Set of Items
Grammar: 1: E' → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id. Closure([I]) is computed as follows: (1) all items in I are in Closure([I]); (2) if A → α . B β is in Closure([I]) and B → γ is a production rule, then add B → . γ to Closure([I]); (3) repeat step 2 until no new items are added. I0 = Closure([E' → . E $]) adds in the following items: E' → . E $ (rule 1). Any rules E → γ? Yes: E → . E + T (rule 2) and E → . T (rule 3). Any rules T → γ? Yes: T → . T * F (rule 4) and T → . F (rule 5). Any rules F → γ? Yes: F → . ( E ) (rule 6) and F → . id (rule 7).
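The closure rule can be sketched in Python with a worklist instead of the repeat-until loop; the result is the same, but each new item is examined once. As an assumption of this sketch, an item is represented as an (lhs, body, dot) tuple with the dot placed before body[dot]; `closure` and `show` are illustrative names.

```python
# Augmented expression grammar, rules 1-7 in the order listed above.
GRAMMAR = [
    ("E'", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(kernel):
    """LR(0) closure; an item is (lhs, body, dot) with the dot before body[dot]."""
    result = set(kernel)
    work = list(kernel)
    while work:                                  # stop when no new items appear
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for g_lhs, g_body in GRAMMAR:
                if g_lhs == body[dot]:
                    new = (g_lhs, g_body, 0)     # add B -> . gamma
                    if new not in result:
                        result.add(new)
                        work.append(new)
    return result


def show(item):
    """Render an item in the notation of the notes, e.g. "E -> E . + T"."""
    lhs, body, dot = item
    return f"{lhs} -> " + " ".join(body[:dot] + (".",) + body[dot:])


I0 = closure({("E'", ("E", "$"), 0)})
for item in sorted(I0, key=show):
    print(show(item))
```

The seven printed items are exactly the members of I0 derived by hand above.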

What's the Next Step? Recall the Parsing Table
States are 0, 1, 2, …, 11, which correspond to item sets; actions are based on the input and the current state; goto is the state to transition to next. This is a pushdown automaton (PDA)! What are the three critical functions to calculate? State closure: to compute the set of productions (items) in a given state. Transition function: to compute the states reachable from a given state. Items: to compute the set of states in the PDA.

What is an Important Part of the Process?
Viable prefix definition: (1) a string that equals a prefix of a right-sentential form up to (and including) its unique handle; (2) any prefix of a string that satisfies (1). Essentially, a viable prefix is a prefix of a right-sentential form that does not extend past the handle; it may include the entire handle (the right-hand side of a production rule). Examples of viable prefixes (for S → a A B e, A → Abc | b, B → d) are: a, aA, aAd, aAbc, ab, aAb, … Not viable prefixes: aAde, Abc, aAA, …

What is the Big Deal? Consider the stack again: $, $a, $ab, $aA, $aAb, $aAbc, $aAd, $aAB, $aABe, $S
Each stack content is a prefix of a right-sentential form, and they are all viable prefixes. When parsing, we are either lengthening a viable prefix (shift) or pruning a handle (reduce). In other words, states represent viable prefixes, and we transition between viable prefixes!

Intuition for this Process
Objective: turn a grammar into a PDA. We want a PDA whose states capture viable prefixes. We have a grammar with production rules. We know that production rules are used to derive handles, and that viable prefixes are prefixes of right-sentential forms that end no later than the handle.

Example: Consider the augmented grammar given below. Assume that
we start the parsing (with E') and are therefore at the initial state of the PDA, with some input (e.g., id + id * id). Questions: which productions are activated at this point? In other words, which productions could be used to match the rest of the input? 1: E' → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id.

Example II: Consider the Derivation Given Below
In this example, production rules 1, 2, 3, 5, and 7 are active and utilized to "lead" to the viable prefix "id". Rules: 1: E' → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id. Derivation: E' ⇒ E $ by (1) ⇒ E + T $ by (2) ⇒ T + T $ by (3) ⇒ F + T $ by (5) ⇒ id + T $ by (7) ⇒ …

PDA State: Closure([E' → . E $])
A PDA state is the set of productions (items) that are active in the state. Question: how do we compute that from G? Rules: 1: E' → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id. State I0 is built up step by step: start with E' → . E $; add E → . E + T and E → . T; then T → . T * F and T → . F; then F → . ( E ) and F → . id, giving I0 = { E' → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id }.

PDA Transition: How Can We Leave State I0?
What does it mean to leave I0? On a terminal: we've consumed (shifted) the terminal from the input stream. On a non-terminal: a reduction has just pushed that non-terminal onto the stack, together with the state that allows for future reductions. State I0 = { E' → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id } has transitions on E, T, F, (, and id, the symbols immediately to the right of the dot. This defines the GOTO function!

The GOTO Function: GOTO(I, X) is Defined for an Item Set I
and a grammar symbol X (non-terminal or terminal): GOTO(I, X) = Closure({ [A → α X . β] | [A → α . X β] ∈ I }). Algorithmically: look for rules of the form A → α . X β, i.e., identify the grammar symbols in I to the right of "."; group all items A → α . X β with the same X, move the dot past X, and form a new state; compute the closure of the new state. Do this for all X. This leads to …
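The definition translates almost literally into a sketch. Assumptions: items are (lhs, body, dot) tuples, and `closure` and `goto` are illustrative names; the closure is repeated here so that the block runs on its own.

```python
GRAMMAR = [
    ("E'", ("E", "$")), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(kernel):
    """LR(0) closure of a set of (lhs, body, dot) items."""
    result, work = set(kernel), list(kernel)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for lhs, rhs in GRAMMAR:
                if lhs == body[dot] and (lhs, rhs, 0) not in result:
                    result.add((lhs, rhs, 0))
                    work.append((lhs, rhs, 0))
    return result


def goto(items, x):
    """GOTO(I, X): move the dot over X in every item A -> alpha . X beta, then close."""
    moved = {(lhs, body, dot + 1) for lhs, body, dot in items
             if dot < len(body) and body[dot] == x}
    return closure(moved)          # an empty set means "no transition on X"


I0 = closure({("E'", ("E", "$"), 0)})
I1 = goto(I0, "E")                 # E' -> E . $, E -> E . + T
I4 = goto(I0, "(")                 # F -> ( . E ) plus its closure
print(len(I1), len(I4))
```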

Destination States from State I0
GOTO(I0, E) = State I1: E' → E . $, E → E . + T
GOTO(I0, T) = State I2: E → T . , T → T . * F
GOTO(I0, F) = State I3: T → F .
GOTO(I0, ( ) = State I4: F → ( . E ), E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id
GOTO(I0, id) = State I5: F → id .

Destination States from State I0 (continued)
For GOTO(I0, ( ), starting from State I0 = { E' → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id }, we compute Closure([F → ( . E )]): since E → E + T and E → T, include E → . E + T and E → . T; since T → T * F and T → F, include T → . T * F and T → . F; since F → ( E ) and F → id, include F → . ( E ) and F → . id. The result is State I4 = { F → ( . E ), E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id }. Now, compute GOTO(I4, X) for X = E, T, F, (, id.

What Does it Mean when "." is at the End of a Rule?
Starting from State I0 = { E' → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id }: GOTO(I0, T) = State I2 = { E → T ., T → T . * F }, GOTO(I0, F) = State I3 = { T → F . }, and GOTO(I0, id) = State I5 = { F → id . }. In these three states, the "." occurs at the end of an item: E → T ., T → F ., and F → id . Each of these is a "reduction": replace T by E on the stack, replace F by T on the stack, and replace id by F on the stack.

How is this Interpreted …
From State I0 = { E' → . E $, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id }, GOTO(I0, E) = State I1 = { E' → E . $, E → E . + T }. The items represent the possible next steps in a derivation. Consider the symbol directly to the right of ".": that is what we expect to see next in a derivation. For two rules, we expect to see "E". Move the "." to the right to consume "E" for both production rules: we've seen "E", and we expect to see what follows "." next. Now compute Closure([E' → E . $, E → E . + T]) = State I1 = { E' → E . $, E → E . + T }; nothing is added, since neither $ nor + is a non-terminal.

Continue the Process to Yield the Complete State Machine …
The state machine also represents viable prefixes, i.e., the possible combinations of symbols that can appear on the parsing stack.

Viable Prefixes and Valid Items
Consider a derivation S' ⇒*rm α A w ⇒rm α β1 β2 w, and let α β1 be a viable prefix. The item A → β1 . β2 is a valid item for α β1 if such a derivation exists. When α β1 is on the parsing stack, there are two cases: if β2 ≠ ε, then we don't yet have the whole handle on the stack; if β2 = ε, then perhaps A → β1 is the reduction. However, the reduction choice may not be limited to a single production rule: there may be two or more valid items for the same viable prefix, so shift/reduce or reduce/reduce conflicts are possible!

How Does this Relate to the State Machine?
Consider the viable prefix E + T *. Each state in the machine represents a set of one or more items. Specifically, for E + T *, we end up in state I7 if we follow the transitions of the state machine.

Consider State I7, Whose Item Set is T → T * . F, F → . ( E ), F → . id, with Three Possible Derivations:
E' ⇒ E ⇒ E + T ⇒ E + T * F; E' ⇒ E ⇒ E + T ⇒ E + T * F ⇒ E + T * ( E ); E' ⇒ E ⇒ E + T ⇒ E + T * F ⇒ E + T * id. Which do you choose? Why?

End Result of the Process? A machine that contains all item-set states,
with transitions between states on terminals and non-terminals. What do we need this for? To construct the parsing table!

What's the Next Step? Constructing the SLR Parsing Table: action[state, symbol] and
goto[state, symbol]. The easy part of this process is determining the "shift" actions: examine the machine for all terminal transitions. These are "shifts" from one state to the next, which push both the terminal and the state onto the parsing stack. The more difficult part of this process: reductions come from items with "." at the end of the item. Two questions: what is the "input" that determines the correct reduction? What is the "state" to push onto the stack?

Recall First and Follow Calculations
Recall the grammar 1: E' → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id. First(E') = First(E) = First(T) = First(F) = { (, id }. Follow(E') = { $ }. Follow(E) = First(+T) ∪ First( ) ) ∪ First($) = { +, ), $ }. Follow(T) = Follow(E) ∪ First(*F) = { +, ), $, * }. Follow(F) = Follow(T) = { +, ), $, * }.
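The First and Follow sets above can be checked with a small fixed-point computation. This sketch assumes a grammar without ε-productions (true for this grammar), which keeps First of a body equal to First of its first symbol; it also seeds Follow(E') with $, following the convention on this slide. The function names are illustrative.

```python
GRAMMAR = [
    ("E'", ("E", "$")), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol, first):
    """FIRST of a single grammar symbol; a terminal is its own FIRST set."""
    return first[symbol] if symbol in NONTERMINALS else {symbol}


def compute_first():
    """Fixed-point FIRST computation. Assumes no epsilon productions."""
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, body in GRAMMAR:
            new = first_of(body[0], first) - first[lhs]
            if new:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first, start="E'"):
    """Fixed-point FOLLOW computation, seeded with Follow(start) = {$}."""
    follow = {a: set() for a in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, body in GRAMMAR:
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                # Without epsilon rules: FIRST of the next symbol, or
                # Follow(lhs) when sym is the last symbol of the body.
                if i + 1 < len(body):
                    source = first_of(body[i + 1], first)
                else:
                    source = follow[lhs]
                new = source - follow[sym]
                if new:
                    follow[sym] |= new
                    changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(FOLLOW)
```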

Return to Item Sets: Suppose an Item Set Contains the Item A → α .
When we reach this item, it is time to reduce, replacing α on the stack with A. However, what is the "input" under which this reduction is allowed to occur? We want to replace α with A while reading some current input x: only do the reduction if x ∈ Follow(A). Consider two reductions in the same item set, A → α . and B → α ., with current input x: if x ∈ Follow(A), reduce using A → α; if x ∈ Follow(B), reduce using B → α; if x is in both, there is a reduce/reduce conflict! We'll see two examples shortly …
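The SLR reduction rule can be sketched as a function from an item set to the reductions allowed on each lookahead token. Assumptions: items are (lhs, body, dot) tuples, Follow sets are supplied as a dictionary, and the names `slr_reductions` and `reduce_reduce_conflicts` as well as the second item set (A → x . and B → x .) are hypothetical, chosen only to show a conflict.

```python
def slr_reductions(item_set, follow):
    """For each lookahead token, list the productions an SLR parser may reduce by.

    A completed item A -> alpha . allows a reduction on x only if x is in Follow(A).
    """
    table = {}
    for lhs, body, dot in item_set:
        if dot == len(body):                         # dot at the end: A -> alpha .
            for x in follow[lhs]:
                table.setdefault(x, []).append(f"{lhs} -> {' '.join(body)}")
    return table


def reduce_reduce_conflicts(table):
    """Tokens on which more than one production could be reduced."""
    return {x: sorted(prods) for x, prods in table.items() if len(prods) > 1}


# State I2 of the expression grammar: E -> T . and T -> T . * F
I2 = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}}
print(slr_reductions(I2, FOLLOW))   # reduce by E -> T only on +, ) and $

# Hypothetical item set {A -> x ., B -> x .} whose Follow sets overlap on $.
clash = {("A", ("x",), 1), ("B", ("x",), 1)}
print(reduce_reduce_conflicts(slr_reductions(clash, {"A": {"y", "$"}, "B": {"z", "$"}})))
```

In state I2 no reduction is offered on *, which leaves that entry free for the shift from T → T . * F.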

Back to Item Sets/State Machine
In the figure, RED underlines mark all shifts with their associated gotos, BLUE circles mark all gotos for non-terminals, and GREEN underlines mark all reductions, which are based on Follow.

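Action and goto tables (rules: 1: E' → E $, 2: E → E + T, 3: E → T, 4: T → T * F, 5: T → F, 6: F → ( E ), 7: F → id)
Action contains shifts (sN: shift and go to state N), reductions (rN: reduce by rule N), and accept (acc); all other (blank) entries are error entries. Goto contains the next state to push onto the stack after a reduction.
State | id | + | * | ( | ) | $ | E | T | F
0 | s5 | | | s4 | | | 1 | 2 | 3
1 | | s6 | | | | acc | | |
2 | | r3 | s7 | | r3 | r3 | | |
3 | | r5 | r5 | | r5 | r5 | | |
4 | s5 | | | s4 | | | 8 | 2 | 3
5 | | r7 | r7 | | r7 | r7 | | |
6 | s5 | | | s4 | | | | 9 | 3
7 | s5 | | | s4 | | | | | 10
8 | | s6 | | | s11 | | | |
9 | | r2 | s7 | | r2 | r2 | | |
10 | | r4 | r4 | | r4 | r4 | | |
11 | | r6 | r6 | | r6 | r6 | | |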

Formal Algorithms: To calculate the parsing table, we require three algorithms. State closure: to compute the set of productions (items) in a given state. Transition function: to compute the states reachable from a given state. Items: to compute the set of states in the PDA. Algorithms from Prof. Michel …

State Closure Algorithm
function closure(set{Item} I) : set{Item} {
    set{Item} J0 = I; i = 0;
    repeat
        Ji+1 = Ji;
        for each A → α.Bβ in Ji and each B → γ in P s.t. B → .γ ∉ Ji+1
            Ji+1 = Ji+1 ∪ { B → .γ };
        i = i + 1;
    until Ji = Ji-1;
    return Ji;
}

GOTO Function
function GOTO(set{Item} s, symbol X) : set{Item} {
    set{Item} J = ∅;
    for each item c in s
        if c is of the form A → α.Xβ
            J = J ∪ { A → αX.β };
    return closure(J);
}

All States Function (Set of Items)
function items(Grammar G') : set{State} {
    set{State} C0 = { closure({ S' → .S }) }; i = 0;
    repeat
        Ci+1 = Ci;
        for each state S in Ci and each symbol X in G'
            Z = GOTO(S, X);
            if Z ≠ ∅ and Z ∉ Ci+1 then Ci+1 = Ci+1 ∪ { Z };
        i = i + 1;
    until Ci = Ci-1;
    return Ci;
}
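The three functions combine into a short Python sketch of the canonical LR(0) collection for the expression grammar. Assumptions of this sketch: $ is not used as a transition symbol (as in the parsing table, reaching $ in state I1 means accept), and the fixed `SYMBOLS` order is chosen so that the states come out numbered like I0 to I11 in these notes; `items` also records the transitions, which become the shift and goto entries.

```python
GRAMMAR = [
    ("E'", ("E", "$")), ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# Order chosen so that states are numbered as I0..I11; "$" is handled as accept.
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]


def closure(kernel):
    """LR(0) closure of a set of (lhs, body, dot) items."""
    result, work = set(kernel), list(kernel)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for lhs, rhs in GRAMMAR:
                if lhs == body[dot] and (lhs, rhs, 0) not in result:
                    result.add((lhs, rhs, 0))
                    work.append((lhs, rhs, 0))
    return result


def goto(state, x):
    return frozenset(closure({(l, b, d + 1) for l, b, d in state if d < len(b) and b[d] == x}))


def items(start_item):
    """Canonical collection of LR(0) item sets plus the transition map."""
    states = [frozenset(closure({start_item}))]
    transitions = {}
    i = 0
    while i < len(states):       # each state, including new ones, is expanded once
        for x in SYMBOLS:
            target = goto(states[i], x)
            if not target:       # GOTO(I, X) is empty: no transition on X
                continue
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


STATES, TRANSITIONS = items(("E'", ("E", "$"), 0))
print(len(STATES))
```

Processing each state exactly once replaces the "repeat until Ci = Ci-1" test: the loop ends when no unexpanded state is left.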

Using Ambiguous Grammars
Ambiguous grammars will cause multiple entries for a given state/terminal in the parsing table. This results in two types of conflicts: shift/reduce conflicts and reduce/reduce conflicts. Compiler-writing tools (Yacc, Bison, etc.) automatically resolve these by default: for shift/reduce, they choose shift; for reduce/reduce, they reduce by the "earlier" rule (the one listed first in the grammar). Consider two examples: the dangling else and a simplified expression grammar.

Dangling Else Ambiguity
Recall the grammar stmt → if expr then stmt else stmt | if expr then stmt | other. Rewrite the grammar as s → i s e s | i s | a, essentially collapsing "if expr then" into "i", with "e" standing for "else" and "a" representing all other statements. Now compute the LR(0) items and the SLR parsing table.

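The Item Sets for the Grammar
Augmented with s' → s. Follow(s') = { $ }, Follow(s) = { $, e }.
I0: s' → . s, s → . i s e s, s → . i s, s → . a
I1 = GOTO(I0, s): s' → s .
I2 = GOTO(I0, i) (also reached from I2 and I5 on i): s → i . s e s, s → i . s, s → . i s e s, s → . i s, s → . a
I3 = GOTO(I0, a) (also reached from I2 and I5 on a): s → a .
I4 = GOTO(I2, s): s → i s . e s, s → i s .
I5 = GOTO(I4, e): s → i s e . s, s → . i s e s, s → . i s, s → . a
I6 = GOTO(I5, s): s → i s e s .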

The Parsing Table (rules: 1: s → i s e s, 2: s → i s, 3: s → a; Follow(s') = { $ }, Follow(s) = { $, e })
State | i | e | a | $ | s
0 | s2 | | s3 | | 1
1 | | | | acc |
2 | s2 | | s3 | | 4
3 | | r3 | | r3 |
4 | | s5 / r2 | | r2 |
5 | s2 | | s3 | | 6
6 | | r1 | | r1 |
Notice the s/r conflict for action[4, e] in if <expr> then <stmt> else <stmt>. If we shift on else, what is the result w.r.t. the language? If we reduce on else, what is the result w.r.t. the language?

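Solution to Dangling Else
Pick shift over reduce: action[4, e] = s5. Consider the input iiaea, which is equivalent to if <expr> then if <expr> then <stmt> else <stmt>. The parser proceeds as follows w.r.t. stack/input:
Stack | Input | Action
$ | iiaea$ | shift i
$i | iaea$ | shift i
$ii | aea$ | shift a
$iia | ea$ | reduce using s → a
$iis | ea$ | shift e (shift chosen over reduce using s → i s)
$iise | a$ | shift a
$iisea | $ | reduce using s → a
$iises | $ | reduce using s → i s e s
$is | $ | reduce using s → i s
$s | $ | accept
Each else is therefore matched with the closest unmatched then. Using this approach, we eliminate the need for a more complex unambiguous grammar with more rules.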
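A sketch of the table-driven parse with the conflict resolved in favor of shift reproduces the trace above. The tables are transcribed from the SLR table for s → i s e s | i s | a with action[4, e] fixed to s5; the names `ACTION`, `GOTO`, `RULES`, and `parse` are hypothetical, and each input character is treated as one token.

```python
# Rules: 1: s -> i s e s, 2: s -> i s, 3: s -> a   (i = if-then, e = else, a = other)
RULES = {1: ("s", 4), 2: ("s", 2), 3: ("s", 1)}      # rule -> (lhs, body length)
ACTION = {
    0: {"i": ("s", 2), "a": ("s", 3)},
    1: {"$": ("acc",)},
    2: {"i": ("s", 2), "a": ("s", 3)},
    3: {"e": ("r", 3), "$": ("r", 3)},
    4: {"e": ("s", 5), "$": ("r", 2)},   # s/r conflict on e resolved as shift
    5: {"i": ("s", 2), "a": ("s", 3)},
    6: {"e": ("r", 1), "$": ("r", 1)},
}
GOTO = {0: 1, 2: 4, 5: 6}   # goto[state, s]; s is the only non-terminal


def parse(text):
    """Return (stack symbols, remaining input, action) rows for the input string."""
    stack, pos, trace = [0], 0, []
    text += "$"
    while True:
        state, tok = stack[-1], text[pos]
        act = ACTION[state].get(tok)
        symbols = "$" + "".join(stack[1::2])   # grammar symbols sit at odd positions
        if act is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act[0] == "s":
            trace.append((symbols, text[pos:], "shift " + tok))
            stack += [tok, act[1]]
            pos += 1
        elif act[0] == "r":
            lhs, n = RULES[act[1]]
            trace.append((symbols, text[pos:], f"reduce by rule {act[1]}"))
            del stack[len(stack) - 2 * n:]
            stack += [lhs, GOTO[stack[-1]]]
        else:
            trace.append((symbols, text[pos:], "accept"))
            return trace


for row in parse("iiaea"):
    print(row)
```

The inner reduction by rule 1 (s → i s e s) happens before the outer reduction by rule 2 (s → i s), which is the parse that attaches else to the nearest then.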

Example 2: Simplified Expression Grammar
Consider the grammar E → E + E | E * E | ( E ) | id. What's the problem with this grammar? Why would this grammar be preferable? Employ techniques similar to the previous example to remove multiple table entries. The result is to achieve both associativity and precedence behavior for + and *. Associativity/precedence can be changed by changing the table. No more extra work (fewer rules and reductions) ⇒ improved performance.

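First, Calculate Item Sets
Augmented grammar: E' → E, E → E + E | E * E | ( E ) | id. Follow(E') = { $ }, Follow(E) = { $, +, *, ) }.
I0: E' → . E, E → . E + E, E → . E * E, E → . ( E ), E → . id
I1 = GOTO(I0, E): E' → E ., E → E . + E, E → E . * E
I2 = GOTO(I0, ( ) (also from I2, I4, I5 on ( ): E → ( . E ), E → . E + E, E → . E * E, E → . ( E ), E → . id
I3 = GOTO(I0, id) (also from I2, I4, I5 on id): E → id .
I4 = GOTO(I1, +) (also from I6, I7, I8 on +): E → E + . E, E → . E + E, E → . E * E, E → . ( E ), E → . id
I5 = GOTO(I1, *) (also from I6, I7, I8 on *): E → E * . E, E → . E + E, E → . E * E, E → . ( E ), E → . id
I6 = GOTO(I2, E): E → ( E . ), E → E . + E, E → E . * E
I7 = GOTO(I4, E): E → E + E ., E → E . + E, E → E . * E
I8 = GOTO(I5, E): E → E * E ., E → E . + E, E → E . * E
I9 = GOTO(I6, ) ): E → ( E ) .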

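Consider States I7 and I8
State I7 = { E → E + E ., E → E . + E, E → E . * E }. From E → E + E ., with Follow(E) = { +, *, ), $ }: action[7, +], action[7, *], action[7, )], and action[7, $] = reduce by E → E + E. From E → E . + E: action[7, +] = shift to state 4. From E → E . * E: action[7, *] = shift to state 5. So action[7, +] = reduce by E → E + E or shift to state 4, and action[7, *] = reduce by E → E + E or shift to state 5.
State I8 = { E → E * E ., E → E . + E, E → E . * E } has the same pair of conflicts: action[8, +] = reduce by E → E * E or shift to state 4, and action[8, *] = reduce by E → E * E or shift to state 5. How is each conflict resolved?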

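Parsing Table (rules: 1: E' → E, 2: E → E + E, 3: E → E * E, 4: E → ( E ), 5: E → id)
State | id | + | * | ( | ) | $ | E
0 | s3 | | | s2 | | | 1
1 | | s4 | s5 | | | acc |
2 | s3 | | | s2 | | | 6
3 | | r5 | r5 | | r5 | r5 |
4 | s3 | | | s2 | | | 7
5 | s3 | | | s2 | | | 8
6 | | s4 | s5 | | s9 | |
7 | | r2 | s5 | | r2 | r2 |
8 | | r3 | r3 | | r3 | r3 |
9 | | r4 | r4 | | r4 | r4 |
In state 7, "+" is left-associative, so reduce by rule 2 on +; shift "*" onto the stack, since it has higher precedence. In state 8, reduce using rule 3 regardless of + or *.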
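To see that the resolved table really yields the intended grouping, here is a sketch of a driver that builds a tree while parsing with the table above. Assumptions of this sketch: the stack holds only states, with a parallel list of semantic values (equivalent to the interleaved symbol/state stack, so a reduction pops |β| states rather than 2|β| entries); any token other than +, *, (, ) is treated as an id; and trees are hypothetical nested tuples (operator, left, right).

```python
# Rules: 2: E -> E + E, 3: E -> E * E, 4: E -> ( E ), 5: E -> id (rule 1 is E' -> E).
BODY_LENGTH = {2: 3, 3: 3, 4: 3, 5: 1}
ACTION = {
    0: {"id": ("s", 3), "(": ("s", 2)},
    1: {"+": ("s", 4), "*": ("s", 5), "$": ("acc",)},
    2: {"id": ("s", 3), "(": ("s", 2)},
    3: {t: ("r", 5) for t in "+*)$"},
    4: {"id": ("s", 3), "(": ("s", 2)},
    5: {"id": ("s", 3), "(": ("s", 2)},
    6: {"+": ("s", 4), "*": ("s", 5), ")": ("s", 9)},
    7: {"+": ("r", 2), "*": ("s", 5), ")": ("r", 2), "$": ("r", 2)},  # + left assoc, * wins
    8: {t: ("r", 3) for t in "+*)$"},                                # * binds tightest
    9: {t: ("r", 4) for t in "+*)$"},
}
GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}   # goto[state, E]


def parse(tokens):
    """Return a nested-tuple tree (operator, left, right) for the token list."""
    states, values = [0], []
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        tok = tokens[pos]
        kind = tok if tok in ("+", "*", "(", ")", "$") else "id"
        act = ACTION[states[-1]].get(kind)
        if act is None:
            raise SyntaxError(f"unexpected {tok!r}")
        if act[0] == "s":
            states.append(act[1])
            values.append(tok)
            pos += 1
        elif act[0] == "r":
            rule, n = act[1], BODY_LENGTH[act[1]]
            body = values[len(values) - n:]
            del states[len(states) - n:]
            del values[len(values) - n:]
            if rule in (2, 3):
                tree = (body[1], body[0], body[2])         # (operator, left, right)
            else:
                tree = body[1] if rule == 4 else body[0]   # ( E ) or id
            states.append(GOTO_E[states[-1]])
            values.append(tree)
        else:
            return values[0]


print(parse(["a", "+", "b", "*", "c"]))   # ('+', 'a', ('*', 'b', 'c'))
```

Changing a single entry, for example making action[7, +] = s4, would make + right-associative, which is the sense in which associativity and precedence are "changed by changing the table".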

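Canonical Parser Table Construction
Not all parser tables are created equal! Differentiate between SLR (built from LR(0) items), LR(1), and LALR(1) (Yacc/Bison). The key issue is the utilization of lookaheads. SLR: reduce by A → α whenever the current input token is in Follow(A). LR(1): one lookahead token (the current input), but each item carries the specific lookahead token(s) that can follow it in that context, so reductions happen only on those tokens. LR(k): k lookahead tokens, starting with the current input. Consider id + id * id: after id is shifted, the next token (+) determines whether to shift or reduce and by which rule; where the item sets alone would conflict, the lookahead breaks the tie on the fly, so a state may shift on one token and reduce on another, depending on that token.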

Recall the Prior Grammar
Item set I0 is given below for LR(0) and for LR(1). For LR(1) items, we must consider the lookahead terminal on which each rule can cause a reduction. When we put E' → . E into an LR(1) set, we must also consider the first terminal that can appear after E: this is the lookahead. LR(0): E' → . E, E → . E + T, E → . T, T → . T * F, T → . F, F → . ( E ), F → . id. Step 1, LR(1): E' → . E, $; E → . E + T, $; E → . T, $. Step 2, LR(1): E' → . E, $; E → . E + T, $/+; E → . T, $. Step 3, LR(1): E' → . E, $; E → . E + T, $/+; E → . T, $/+. What appears after E in the 2nd item? If it appears after E, what else does it appear after?

Another Way to View the Process …
Closure([E' → . E]) begins by placing E' → . E, $ into the item set. Since E → E + T, we place E → . E + T, $ into the item set, carrying along the lookahead $ from E' → . E, $. Now, for E → . E + T, what can the "E" on the right-hand side be replaced with? E → E + T again! If we do this replacement, we need to ask: what is the lookahead that follows E on the r.h.s. of E → E + T? We calculate First(+T), the remainder of the rule. This is "+", so we add this additional lookahead: E → . E + T, +. We abbreviate E' → . E, $, E → . E + T, $, and E → . E + T, + as E' → . E, $ and E → . E + T, $/+.

Continuing … Since E → T, we add E → . T, $/+ into the set.
Now, what does T go to? T → T * F and T → F, so we add T → . T * F, $/+ and T → . F, $/+ into the set. What can the leading T in T → T * F go to? T → T * F again. What is the first token following that T? First(*F) = *. So, add * to get T → . T * F, $/+/*. Since T → F, we also add "*" to yield T → . F, $/+/*. Are we done?

Continuing … Since T → . F, we now consider the two F rules, F → ( E ) and F → id. We add the items F → . ( E ), $/+/* and F → . id, $/+/*, bringing along the lookaheads from T → . F, $/+/*. The lookaheads in this case are First(what follows F, concatenated with $/+/*); since nothing follows F in T → F, this is just $/+/*! We arrive at the LR(1) item set I0: E' → . E, $; E → . E + T, $/+; E → . T, $/+; T → . T * F, $/+/*; T → . F, $/+/*; F → . ( E ), $/+/*; F → . id, $/+/*.
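The lookahead rule just applied (for A → α . B β, a, add B → . γ, b for every b in First(β a)) can be sketched as an LR(1) closure. Assumptions of this sketch: items are (lhs, body, dot, lookahead) tuples, the augmented rule is E' → E as on these slides, and the grammar has no ε-productions, so First(β a) is First of β's first symbol when β is non-empty and {a} otherwise.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first(symbol, seen=frozenset()):
    """FIRST of one symbol; valid because this grammar has no epsilon rules."""
    if symbol not in NONTERMINALS:
        return {symbol}
    out = set()
    for lhs, body in GRAMMAR:
        # Skip (directly or indirectly) left-recursive bodies: they add nothing new.
        if lhs == symbol and body[0] not in seen | {symbol}:
            out |= first(body[0], seen | {symbol})
    return out


def closure1(kernel):
    """LR(1) closure; an item is (lhs, body, dot, lookahead)."""
    result, work = set(kernel), list(kernel)
    while work:
        _, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            beta = body[dot + 1:]
            new_las = first(beta[0]) if beta else {la}   # First(beta a)
            for lhs, rhs in GRAMMAR:
                if lhs == body[dot]:
                    for b in new_las:
                        new = (lhs, rhs, 0, b)
                        if new not in result:
                            result.add(new)
                            work.append(new)
    return result


I0 = closure1({("E'", ("E",), 0, "$")})
grouped = {}                       # core item -> set of lookaheads
for lhs, body, dot, la in I0:
    grouped.setdefault((lhs, body, dot), set()).add(la)
for (lhs, body, dot), las in sorted(grouped.items()):
    print(lhs, "->", " ".join(body[:dot] + (".",) + body[dot:]), ",", "/".join(sorted(las)))
```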

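Another Example … LR(0) Sets
Grammar: S' → S, S → C C, C → c C | d. Follow(S') = { $ }, Follow(S) = { $ }, Follow(C) = { c, d, $ }.
I0: S' → . S, S → . C C, C → . c C, C → . d
I1 = GOTO(I0, S): S' → S .
I2 = GOTO(I0, C): S → C . C, C → . c C, C → . d
I3 = GOTO(I0, c) (also from I2 and I3 on c): C → c . C, C → . c C, C → . d
I4 = GOTO(I0, d) (also from I2 and I3 on d): C → d .
I5 = GOTO(I2, C): S → C C .
I6 = GOTO(I3, C): C → c C .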

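Now Consider … LR(1) Sets
(Follow(S') = { $ }, Follow(S) = { $ }, Follow(C) = { c, d, $ } are shown for comparison; LR(1) uses the per-item lookaheads instead.)
I0: S' → . S, $; S → . C C, $; C → . c C, c/d; C → . d, c/d
I1 = GOTO(I0, S): S' → S ., $
I2 = GOTO(I0, C): S → C . C, $; C → . c C, $; C → . d, $
I3 = GOTO(I0, c) (also from I3 on c): C → c . C, c/d; C → . c C, c/d; C → . d, c/d
I4 = GOTO(I0, d) (also from I3 on d): C → d ., c/d
I5 = GOTO(I2, C): S → C C ., $
I6 = GOTO(I2, c) (also from I6 on c): C → c . C, $; C → . c C, $; C → . d, $
I7 = GOTO(I2, d) (also from I6 on d): C → d ., $
I8 = GOTO(I3, C): C → c C ., c/d
I9 = GOTO(I6, C): C → c C ., $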

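The Parsing Table is Easy to Construct from the State Machine …
Shifts on terminals (arcs), reductions based on lookaheads, and gotos as in the SLR case (rules: 1: S → C C, 2: C → c C, 3: C → d).
State | c | d | $ | S | C
0 | s3 | s4 | | 1 | 2
1 | | | acc | |
2 | s6 | s7 | | | 5
3 | s3 | s4 | | | 8
4 | r3 | r3 | | |
5 | | | r1 | |
6 | s6 | s7 | | | 9
7 | | | r3 | |
8 | r2 | r2 | | |
9 | | | r2 | |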

What's the Real Problem Here?
The grammar we used has only 3 production rules, yet the result was 10 LR(1) states! For the expression grammar (slide 58), LR(1) would have 22 states. Lookahead LR parsing (LALR), on which compiler tools (Yacc, Bison) are based, achieves similar results with fewer states. The objective is to create the LR(1) sets, identify sets with identical cores (the items are the same but the lookaheads may be different), and merge the sets with identical cores. This can reduce the number of states by roughly a factor of 10.
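The LR(1)-then-merge recipe can be sketched end to end for the grammar S' → S, S → C C, C → c C | d. Assumptions of this sketch: items are (lhs, body, dot, lookahead) tuples; the `SYMBOLS` order is chosen so the LR(1) states come out numbered I0 to I9 as above; and the hypothetical `lalr_merge` simply unions the states that share a core. Real generators build LALR(1) tables more efficiently than by constructing the full LR(1) collection first.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["S", "C", "c", "d"]    # order chosen so states come out as I0..I9


def first(symbol):
    """FIRST of one symbol (no epsilon rules and no left recursion in this grammar)."""
    if symbol not in NONTERMINALS:
        return {symbol}
    return set().union(*(first(body[0]) for lhs, body in GRAMMAR if lhs == symbol))


def closure1(kernel):
    """LR(1) closure; an item is (lhs, body, dot, lookahead)."""
    result, work = set(kernel), list(kernel)
    while work:
        _, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            las = first(body[dot + 1]) if dot + 1 < len(body) else {la}
            for lhs, rhs in GRAMMAR:
                for b in las:
                    if lhs == body[dot] and (lhs, rhs, 0, b) not in result:
                        result.add((lhs, rhs, 0, b))
                        work.append((lhs, rhs, 0, b))
    return frozenset(result)


def goto1(state, x):
    return closure1({(l, b, d + 1, a) for l, b, d, a in state if d < len(b) and b[d] == x})


def lr1_states():
    """Canonical collection of LR(1) item sets."""
    states = [closure1({("S'", ("S",), 0, "$")})]
    i = 0
    while i < len(states):
        for x in SYMBOLS:
            target = goto1(states[i], x)
            if target and target not in states:
                states.append(target)
        i += 1
    return states


def core(state):
    """The LR(0) core: the items with their lookaheads dropped."""
    return frozenset((l, b, d) for l, b, d, _ in state)


def lalr_merge(states):
    """Union the LR(1) states that share a core (first-appearance order kept)."""
    merged = {}
    for s in states:
        merged.setdefault(core(s), set()).update(s)
    return [frozenset(s) for s in merged.values()]


LR1 = lr1_states()
LALR = lalr_merge(LR1)
print(len(LR1), len(LALR))
```

The three merged pairs are I3/I6, I4/I7, and I8/I9, giving the states I36, I47, and I89.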

What are the Similar Cores?
In the LR(1) sets above, three pairs of states have identical cores and differ only in their lookaheads: I3 and I6 (C → c . C, C → . c C, C → . d; lookaheads c/d versus $), I4 and I7 (C → d .; c/d versus $), and I8 and I9 (C → c C .; c/d versus $).

Resulting State Machine …
I0: S' → . S, $; S → . C C, $; C → . c C, c/d; C → . d, c/d
I1: S' → S ., $
I2: S → C . C, $; C → . c C, $; C → . d, $
I36: C → c . C, c/d/$; C → . c C, c/d/$; C → . d, c/d/$
I47: C → d ., c/d/$
I5: S → C C ., $
I89: C → c C ., c/d/$

… With Simplified Parsing Table
State | c | d | $ | S | C
0 | s36 | s47 | | 1 | 2
1 | | | acc | |
2 | s36 | s47 | | | 5
36 | s36 | s47 | | | 89
47 | r3 | r3 | r3 | |
5 | | | r1 | |
89 | r2 | r2 | r2 | |

Parser Generators: The entire process we described can be automated:
computation of the machine states, computation of the lookaheads, computation of the action and goto tables, and optimization of the LALR tables. Therefore... tools exist to do this for you!

Parser Generators II
In the C/C++ world: the most famous parser generator is YACC, LALR(1); the most used parser generator is BISON, LALR(1); for table-driven leftmost (top-down) parsing, there is PCCTS, LL(k). In the Java world, there are several alternatives: CUP (a BISON/YACC lookalike), LALR(1); JACK, LALR(1).

Big Picture

The Road Ahead: What are we missing? A parse tree!
How can we get one? By augmenting the grammar with actions (pieces of Java code). The purpose of the actions is to manufacture the tree as a side effect of parsing. Reading: syntax-directed translation via attribute grammars; Yacc.
