CS 2130 Lecture 18 Bottom-Up Parsing or Shift-Reduce Parsing
Bottom Up Parsing Assume we have a group of tokens. Bottom-up parsing tries to group tokens into things it can reduce, much as our precedence rules make 1 + 2 * 3 understood to be (1 + (2 * 3)). It is the same idea, but with bottom-up parsing we use special operators: Wirth-Weber.
Wirth-Weber Operators
x < y: y has higher precedence than x
x = y: x and y have equal precedence
x > y: x has higher precedence than y
Operation Given a b c d e f g h Keep in mind that the letters represent tokens; thus they can be operators, operands, special symbols, etc.
Operation Given < a b c d e f g h > The leading < and trailing > are assumed.
Operation Given < a = b c d e f g h > Parser moves through tokens comparing precedence
Operation Given < a = b > c d e f g h > A series starting with < and ending with > is called a handle. When a handle is found, the parser looks for a match with the right-hand side of some production.
Operation Assume the following is in effect: r ::= a b, s ::= d e f, t ::= r c s. We continue, reducing each handle as it is found:
< a = b > c d e f g h >   (handle a b matches r ::= a b)
< r c d e f g h >
< r = c d e f g h >
< r = c < d e f g h >
< r = c < d = e f g h >
< r = c < d = e = f g h >
< r = c < d = e = f > g h >   (handle d e f matches s ::= d e f)
< r = c s g h >
< r = c = s g h >
< r = c = s > g h >   (handle r c s matches t ::= r c s)
< t g h >
< t = g h >
Operation Given < t = g = h > If there is a rule for "t g h": successful parse. If not: syntax error.
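The walk-through above can be sketched in Python. This is a minimal sketch, not the course's implementation. The table REL is copied from the operators shown on the slides, the function and variable names are hypothetical, and the rule u ::= t g h is an assumed rule, added only so that the final handle has something to match.

```python
# Precedence relations read off the slides: (left symbol, right symbol) -> operator.
REL = {
    ("a", "b"): "=", ("b", "c"): ">", ("r", "c"): "=", ("c", "d"): "<",
    ("d", "e"): "=", ("e", "f"): "=", ("f", "g"): ">", ("c", "s"): "=",
    ("s", "g"): ">", ("t", "g"): "=", ("g", "h"): "=",
}
# Right-hand side -> left-hand side. "u ::= t g h" is an assumption of this sketch.
RULES = {("a", "b"): "r", ("d", "e", "f"): "s", ("r", "c", "s"): "t",
         ("t", "g", "h"): "u"}


def parse(tokens, rel, rules, start):
    """Reduce handles until only the start symbol is left; return every form seen."""
    form = list(tokens)
    history = [list(form)]
    while form != [start]:
        # Scan right until a '>' is found (a '>' is assumed after the last symbol).
        end = 0
        while end + 1 < len(form):
            pair = (form[end], form[end + 1])
            if pair not in rel:
                raise ValueError(f"syntax error: no relationship for {pair}")
            if rel[pair] == ">":
                break
            end += 1
        # Trace back over '=' to the nearest '<' (a '<' is assumed at the start).
        begin = end
        while begin > 0 and rel[(form[begin - 1], form[begin])] == "=":
            begin -= 1
        handle = tuple(form[begin:end + 1])
        if handle not in rules:
            raise ValueError(f"syntax error: no rule for handle {handle}")
        form[begin:end + 1] = [rules[handle]]
        history.append(list(form))
    return history


for step in parse("a b c d e f g h".split(), REL, RULES, "u"):
    print(" ".join(step))
```

Each printed line is the token string after one reduction, so the output follows the same order of reductions as the slides: a b, then d e f, then r c s, then t g h.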
What kind of algorithm? Stack based. Known as the semantic stack or shift/reduce algorithm.
Example
< a = b < c = d > e < f > g >
< a = b > x = e < f > g >
< y < x = e = z = g >
< y = m >
< n >
If this is the start symbol: Successful Parse!
< a = b < c = d > e < f > g > Stack a, < < a = b < c = d > e < f > g > We process the tokens, adding each token and the precedence operator on its left to the stack. Encountering a > indicates a handle has been found and thus will initiate special processing.
Stack b, = a, < < a = b < c = d > e < f > g >
Stack c, < b, = a, < < a = b < c = d > e < f > g >
Stack d, = c, < b, = a, < < a = b < c = d > e < f > g >
Stack d, = c, < b, = a, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > Finding a ">" between d and e, we trace back...
Stack x, > b, = a, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > x is not really on the stack yet, because of the > on its left
Stack x, < y, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > < y < x = e < f > g > We trace back to process the ab handle and then x goes on stack
Stack e, = x, < y, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > < y < x = e < f > g > Continuing...
Stack f, < e, = x, < y, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > < y < x = e < f > g > < y < x = e = z = g > Converting f to z
Stack g, = z, = e, = x, < y, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > < y < x = e < f > g > < y < x = e = z = g >
Stack #, > g, = z, = e, = x, < y, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > < y < x = e < f > g > < y < x = e = z = g > # signifies end of input
Stack #, > m, = y, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > < y < x = e < f > g > < y < x = e = z = g > < y = m > Reducing
Stack #, > n, < < a = b < c = d > e < f > g > < a = b > x = e < f > g > < y < x = e < f > g > < y < x = e = z = g > < y = m > < n > If n is our start symbol: successful parse!
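The stack trace above can be reproduced with a short Python sketch. The relations and rules below are inferred from the slide sequence (the slides never list this grammar explicitly), so treat them, and all the names used here, as assumptions. As on the slides, each stack entry holds a symbol together with the operator on its left, and a symbol produced by a reduction is processed like the next input symbol.

```python
from collections import deque

# Relations and rules inferred from the slide sequence (assumptions of this sketch).
REL = {
    ("a", "b"): "=", ("b", "c"): "<", ("c", "d"): "=", ("d", "e"): ">",
    ("b", "x"): ">", ("y", "x"): "<", ("x", "e"): "=", ("e", "f"): "<",
    ("f", "g"): ">", ("e", "z"): "=", ("z", "g"): "=", ("y", "m"): "=",
}
RULES = {("c", "d"): "x", ("a", "b"): "y", ("f",): "z",
         ("x", "e", "z", "g"): "m", ("y", "m"): "n"}
END = "#"  # end-of-input marker, as on the slides


def shift_reduce(tokens, rel, rules, start):
    """Parse tokens; return the stack snapshots taken after every shift."""
    stack = []                          # entries: (symbol, operator on its left)
    pending = deque(list(tokens) + [END])
    snapshots = []
    while True:
        sym = pending[0]
        if sym == END and stack == [(start, "<")]:
            return snapshots            # successful parse
        if sym == END:
            op = ">" if stack else None  # assumed '>' before end of input
        elif not stack:
            op = "<"                    # assumed '<' before the first symbol
        else:
            op = rel.get((stack[-1][0], sym))
        if op in ("<", "="):            # shift
            stack.append((pending.popleft(), op))
            snapshots.append(list(stack))
        elif op == ">":                 # reduce: pop back to the nearest '<'
            handle = []
            while True:
                symbol, left_op = stack.pop()
                handle.insert(0, symbol)
                if left_op == "<":
                    break
            lhs = rules.get(tuple(handle))
            if lhs is None:
                raise ValueError(f"syntax error: no rule for {handle}")
            pending.appendleft(lhs)     # the new symbol is handled like input
        else:
            raise ValueError(f"syntax error at {sym!r}")


for snapshot in shift_reduce("a b c d e f g".split(), REL, RULES, "n"):
    print(snapshot)
```

The bottom entry of the stack always carries a <, so the pop loop in the reduce step cannot run past the bottom of the stack.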
Shift/Reduce Parsing
x < y or x = y: Shift
x > y: Reduce
No relationship: Syntax error
More than one relationship: Not allowed. Actually < together with = would be okay; > together with = or with < is called a shift/reduce error
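The decision on this slide fits in one small function. A minimal sketch, assuming the set of operators found in a table cell is passed in; the function name and the returned strings are hypothetical.

```python
def choose_action(ops):
    """Map the operators found between two symbols to a parser action."""
    ops = set(ops)
    if not ops:
        return "syntax error"          # no relationship
    if ops <= {"<", "="}:
        return "shift"                 # '<', '=', or both together
    if ops == {">"}:
        return "reduce"
    return "shift/reduce error"        # '>' mixed with '<' or '='


print(choose_action({"<", "="}))   # shift
print(choose_action({">", "="}))   # shift/reduce error
```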
Shift/Reduce Parsing
Reduce-Reduce Error: When trying to match, the parser finds two rules which match
No Right Hand Side Match: Syntax error
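Both failure cases show up when the parser looks up a handle. A minimal sketch, assuming productions are stored as (left-hand side, right-hand side) pairs; the function name is hypothetical, and the rules are the r, s, t rules used earlier in the lecture.

```python
def match_handle(handle, productions):
    """Return the left-hand side whose right-hand side equals the handle."""
    matches = [lhs for lhs, rhs in productions if tuple(rhs) == tuple(handle)]
    if not matches:
        raise ValueError("syntax error: no right-hand side matches")
    if len(matches) > 1:
        raise ValueError(f"reduce-reduce error: {matches} all match")
    return matches[0]


PRODUCTIONS = [("r", ("a", "b")), ("s", ("d", "e", "f")), ("t", ("r", "c", "s"))]
print(match_handle(["d", "e", "f"], PRODUCTIONS))  # prints s
```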
Simple Precedence Only possibilities: x < y, x = y, x > y, or no relationship exists (no others allowed)
Bottom-Up Parsing No issues regarding left-recursive versus right-recursive grammars such as those found with top-down parsing. Note: There are grammars that will break a bottom-up parser.
Precedence Consider our simple grammar... Between symbols there must be ?
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= '(' <expr> ')' | num | id
Precedence Consider our simple grammar... Between symbols there must be =
<expr> ::= <expr> = + = <term> | <term>
<term> ::= <term> = * = <factor> | <factor>
<factor> ::= '(' = <expr> = ')' | num | id
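Collecting the = pairs is mechanical: every two symbols that sit next to each other on a right-hand side get an =. A minimal sketch, assuming the grammar is stored as a dictionary from each non-terminal to its alternatives; the names GRAMMAR and equal_pairs are hypothetical.

```python
GRAMMAR = {
    "<expr>": [("<expr>", "+", "<term>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("(", "<expr>", ")"), ("num",), ("id",)],
}


def equal_pairs(grammar):
    """Pairs (x, y) where x is immediately followed by y in some right-hand side."""
    return {
        (x, y)
        for alternatives in grammar.values()
        for rhs in alternatives
        for x, y in zip(rhs, rhs[1:])
    }


for left, right in sorted(equal_pairs(GRAMMAR)):
    print(left, "=", right)
```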
Precedence To determine "end points" we must look at multiple rules to see how they interact... x c = d
Precedence To determine "end points" we must look at multiple rules to see how they interact... x c = d To determine what goes here...
Precedence To determine "end points" we must look at multiple rules to see how they interact... x c = d We look here.
Precedence
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= '(' <expr> ')' | num | id
+ = <term> * = <factor> + < <factor> * < ( = <expr> = ) + < ( = <expr> = )
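The "end points" can be computed by repeatedly asking which symbols can start, or end, whatever a non-terminal derives. The sketch below follows the usual textbook definition of the Wirth-Weber relations rather than anything stated on the slides, so it may list more pairs than the hand-built examples show. The names GRAMMAR and end_points are hypothetical, and the grammar is assumed to have no empty right-hand sides.

```python
GRAMMAR = {
    "<expr>": [("<expr>", "+", "<term>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("(", "<expr>", ")"), ("num",), ("id",)],
}


def end_points(grammar, index):
    """Symbols that can begin (index=0) or end (index=-1) each non-terminal."""
    sets = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:                      # iterate until nothing new is added
        changed = False
        for nonterminal, alternatives in grammar.items():
            for rhs in alternatives:
                edge = rhs[index]
                new = {edge} | sets.get(edge, set())
                if not new <= sets[nonterminal]:
                    sets[nonterminal] |= new
                    changed = True
    return sets


first = end_points(GRAMMAR, 0)
last = end_points(GRAMMAR, -1)
# '+' is followed by <term> in a rule, so '+' < everything that can begin <term>.
print(sorted(("+", "<", head) for head in first["<term>"]))
# <expr> is followed by '+' in a rule, so everything that can end <expr> > '+'.
print(sorted((tail, ">", "+") for tail in last["<expr>"]))
```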
Precedence Table <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id L R <expr> <term> <factor> + * ( ) num id <expr> <term> <factor> + * ( ) num id
Precedence Table <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id L R <expr> <term> <factor> + * ( ) num id <expr> = = <term> > = > <factor> > > > + < < < < * = < < < ( < < < ) > > > num > > > id > > >
Precedence Table '(' <expr>... '(' < '(' <expr>... <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id L R <expr> <term> <factor> + * ( ) num id <expr> = = <term> > = > <factor> > > > + < < < < * = < < < ( < < < '(' <expr>... '(' < '(' <expr>... ) > > > num > > > id > > >
Precedence Table <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id L R <expr> <term> <factor> + * ( ) num id <expr> = = <term> > = > <factor> > > > + = < < < < < * = < < < ( = < < < < ) > > > num > > > id > > >
Precedence Table '(' = <expr> ')' '(' < <expr> + <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id L R <expr> <term> <factor> + * ( ) num id <expr> = = <term> > = > '(' = <expr> ')' <factor> > > > + = < < < < < * = < < < ( = < < < < ) > > > '(' < <expr> + num > > > id > > >
Resolving Ambiguity
Conflict: + = <term> versus + < <term> * <factor>
Solve by lookahead:
+ = <term> when the next token is + or )
+ < <term> when the next token is *
Ambiguity can be resolved by increasing k in LR(k), but that's not the only way: we could rewrite the grammar
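The lookahead rule can be written as a small lookup. A minimal sketch: the entries for + and <term> come from this slide, the entries for ( and <expr> come from the annotations on the precedence table, and treating end of input (written # here) like + or ) is an assumption based on the last step of the worked example at the end of the lecture. The names are hypothetical.

```python
END = "#"  # end-of-input marker

# (left symbol, right symbol) -> {lookahead token: operator to use}
LOOKAHEAD = {
    ("+", "<term>"): {"+": "=", ")": "=", END: "=", "*": "<"},
    ("(", "<expr>"): {")": "=", "+": "<"},
}


def resolve(left, right, lookahead):
    """Choose one operator for a conflicting pair; None means no relationship."""
    return LOOKAHEAD[(left, right)].get(lookahead)


print(resolve("+", "<term>", "*"))  # prints <
print(resolve("+", "<term>", "+"))  # prints =
```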
Original Grammar <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id
Sources of Ambiguity <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id
Add 2 New Rules <expr> ::= <expr> + <term> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <expr> ')' | num | id <e> ::= <expr> <t> ::= <term>
Modify <expr> ::= <expr> + <t> | <term> <term> ::= <term> * <factor> | <factor> <factor> ::= '(' <e> ')' | num | id <e> ::= <expr> <t> ::= <term>
Original Grammar
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= '(' <expr> ')' | num | id
Rewritten Grammar
<expr> ::= <expr> + <t> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= '(' <e> ')' | num | id
<e> ::= <expr>
<t> ::= <term>
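One way to check the effect of the rewrite is to build the relations for both grammars and look for cells that hold more than one operator. The sketch below uses the usual textbook definition of the Wirth-Weber relations, which is an assumption since the lecture does not spell the definition out, and all function names are hypothetical. It only checks for conflicting relations. It does not check whether two rules share a right-hand side.

```python
from collections import defaultdict

ORIGINAL = {
    "<expr>": [("<expr>", "+", "<term>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("(", "<expr>", ")"), ("num",), ("id",)],
}
REWRITTEN = {
    "<expr>": [("<expr>", "+", "<t>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("(", "<e>", ")"), ("num",), ("id",)],
    "<e>": [("<expr>",)],
    "<t>": [("<term>",)],
}


def end_points(grammar, index):
    """Symbols that can begin (index=0) or end (index=-1) each non-terminal."""
    sets = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for rhs in alternatives:
                new = {rhs[index]} | sets.get(rhs[index], set())
                if not new <= sets[nonterminal]:
                    sets[nonterminal] |= new
                    changed = True
    return sets


def relations(grammar):
    """Build the table: (left symbol, right symbol) -> set of operators."""
    first, last = end_points(grammar, 0), end_points(grammar, -1)
    table = defaultdict(set)
    for alternatives in grammar.values():
        for rhs in alternatives:
            for x, y in zip(rhs, rhs[1:]):
                table[x, y].add("=")
                for head in first.get(y, ()):
                    table[x, head].add("<")
                for tail in last.get(x, ()):
                    for head in {y} | first.get(y, set()):
                        if head not in grammar:   # '>' only before terminals
                            table[tail, head].add(">")
    return table


def conflicts(grammar):
    """Cells of the table that hold more than one operator."""
    return {pair: ops for pair, ops in relations(grammar).items() if len(ops) > 1}


print("original: ", conflicts(ORIGINAL))
print("rewritten:", conflicts(REWRITTEN))
```

Under these assumptions the original grammar has two conflicting cells, + with <term> and ( with <expr>, each holding both = and <, and the rewritten grammar has none.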
Performance Size of table is O(n^2), where n is the number of symbols in the table. If we use operator precedence, then it only uses terminals; thus the table size is not affected by adding non-terminals
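The effect on table size is easy to count for the two grammars from this lecture. A minimal sketch with hypothetical names, counting one cell per pair of symbols.

```python
ORIGINAL = {
    "<expr>": [("<expr>", "+", "<term>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("(", "<expr>", ")"), ("num",), ("id",)],
}
REWRITTEN = {
    "<expr>": [("<expr>", "+", "<t>"), ("<term>",)],
    "<term>": [("<term>", "*", "<factor>"), ("<factor>",)],
    "<factor>": [("(", "<e>", ")"), ("num",), ("id",)],
    "<e>": [("<expr>",)],
    "<t>": [("<term>",)],
}


def table_cells(grammar, terminals_only=False):
    """Number of cells in a square table over the grammar's symbols."""
    symbols = set(grammar)
    for alternatives in grammar.values():
        for rhs in alternatives:
            symbols.update(rhs)
    if terminals_only:
        symbols -= set(grammar)        # drop the non-terminals
    return len(symbols) ** 2


print(table_cells(ORIGINAL), table_cells(REWRITTEN))              # 81 121
print(table_cells(ORIGINAL, True), table_cells(REWRITTEN, True))  # 36 36
```

Adding the two non-terminals grows the full table from 9 x 9 to 11 x 11 cells, while the terminals-only table stays at 6 x 6.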
Question 1 a < b = c = d > e a ? x e What happens if there is no relationship here?
Question 2 Where do precedence relationships come from? Make a table by hand, or write a program to make the table. How such a program works or how to write it are topics beyond the scope of this course.
Example
1 + 2 * 3 Tokenized: num + num * num
< num > +
< <factor> > +
< <term> > +
< <expr> = +
< <expr> = + < num
< <expr> = + < num > *
< <expr> = + < <factor> > *
< <expr> = + < <term> = *
< <expr> = + < <term> = * < num
< <expr> = + < <term> = * < num >
< <expr> = + < <term> = * = <factor> >
< <expr> = + = <term> >
< <expr> >
Questions?