CS 381 - Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour June 20, 2005 Slide acknowledgment: Radu Rugina, CS 412.

Simplified Compiler Structure
Source code: if (b == 0) a = b;
→ Front end (machine-independent): understand source code
→ Intermediate code
→ Optimizer: optimize
→ Back end (machine-dependent): generate assembly code
→ Assembly code: cmp $0,ecx; cmovz edx,ecx

Simplified Front-End Structure
Source code (character stream): if (b == 0) a = b;
→ Lexical Analysis
Token stream: if ( b == 0 ) a = b ;
→ Syntax Analysis (Parsing)
Abstract Syntax Tree (AST): an "if" node with children "==" (over b and 0) and "=" (over a and b)
→ Semantic Analysis

Parse Tree vs. AST
Parse tree: also called "concrete syntax"
Abstract syntax tree: discards (abstracts) unneeded information
[Figure: a parse tree (concrete syntax) next to the abstract syntax tree for the same sum over the numbers 1 to 5]

How to build an AST
Need to find a derivation for the program in the grammar
Want an efficient algorithm
  – should only read the token stream once
  – exponential brute-force search is out of the question
  – even CKY, which takes time cubic in the input length, is too slow
Two main ways to parse:
  – top-down parsing (recursive descent)
  – bottom-up parsing (shift-reduce)

Parsing Top-down
Goal: construct a leftmost derivation of the string while reading in the token stream
Grammar: S → E + S | E ; E → num | ( S )
Input: (1+2+(3+4))+5 (each row of the slide splits it into a parsed part and an unparsed part)
Partly-derived string     Lookahead
S                         (
→ E+S                     (
→ (S)+S                   1
→ (E+S)+S                 1
→ (1+S)+S                 2
→ (1+E+S)+S               2
→ (1+2+S)+S               2
→ (1+2+E)+S               (
→ (1+2+(S))+S             3
→ (1+2+(E+S))+S           3

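Problem
Grammar: S → E + S | E ; E → num | ( S )
Want to decide which production to apply based on the next symbol
Input (1):    S → E → (S) → (E) → (1)
Input (1)+2:  S → E + S → (S) + S → (E) + S → (1) + S → (1) + E → (1)+2
Both inputs start with the same symbol, yet they need different first productions.
Why is this hard?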

Grammar is the Problem
This grammar cannot be parsed top-down with only a single look-ahead symbol
Not LL(1) = Left-to-right scanning, Left-most derivation, 1 look-ahead symbol
Is it LL(k) for some k?
Can rewrite the grammar to allow top-down parsing: create an LL(1) grammar for the same language

Making a grammar LL(1)
Original grammar:       S → E + S ; S → E ; E → num ; E → ( S )
Left-factored grammar:  S → E S' ; S' → ε ; S' → + S ; E → num ; E → ( S )
Problem: can't decide which S production to apply until we see the symbol after the first expression
Left-factoring: factor the common prefix of the S productions, add a new non-terminal S' at the decision point. S' derives (+E)*

Parsing with new grammar
Grammar: S → E S' ; S' → ε | + S ; E → num | ( S )
Input: (1+2+(3+4))+5
Partly-derived string            Lookahead
S                                (
→ E S'                           (
→ (S) S'                         1
→ (E S') S'                      1
→ (1 S') S'                      +
→ (1+E S') S'                    2
→ (1+2 S') S'                    +
→ (1+2 + S) S'                   (
→ (1+2 + E S') S'                (
→ (1+2 + (S) S') S'              3
→ (1+2 + (E S') S') S'           3
→ (1+2 + (3 S') S') S'           +
→ (1+2 + (3 + E) S') S'          4

Predictive Parsing
LL(1) grammar:
  – for a given non-terminal, the look-ahead symbol uniquely determines the production to apply
  – top-down parsing = predictive parsing
  – driven by a predictive parsing table: non-terminals × terminals → productions

Using Table
Grammar: S → E S' ; S' → ε | + S ; E → num | ( S )
Input: (1+2+(3+4))+5
Partly-derived string     Lookahead
S                         (
→ E S'                    (
→ (S) S'                  1
→ (E S') S'               1
→ (1 S') S'               +
→ (1 + S) S'              2
→ (1+E S') S'             2
→ (1+2 S') S'             +
Parsing table:
        num       +        (          )       $
S       → E S'             → E S'
S'                → +S                → ε     → ε
E       → num              → ( S )
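The table lookup on this slide can be sketched as a small driver loop over an explicit stack. This is a minimal Java sketch, not the course's implementation: the class name `TableDrivenLL1`, the string encoding of table keys, and the pre-tokenized input (every number already replaced by the token `num`) are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical table-driven LL(1) recognizer for
//   S -> E S'    S' -> epsilon | + S    E -> num | ( S )
final class TableDrivenLL1 {
    // Key "nonTerminal,lookahead" -> right-hand side (empty array = epsilon).
    private static final Map<String, String[]> TABLE = new HashMap<>();
    static {
        TABLE.put("S,num", new String[] {"E", "S'"});
        TABLE.put("S,(",   new String[] {"E", "S'"});
        TABLE.put("S',+",  new String[] {"+", "S"});
        TABLE.put("S',)",  new String[] {});
        TABLE.put("S',$",  new String[] {});
        TABLE.put("E,num", new String[] {"num"});
        TABLE.put("E,(",   new String[] {"(", "S", ")"});
    }

    private static boolean isNonTerminal(String symbol) {
        return symbol.equals("S") || symbol.equals("S'") || symbol.equals("E");
    }

    // tokens: the input without an end marker, e.g. "(", "num", "+", "num", ")".
    static boolean accepts(String... tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");   // end marker sits below the start symbol
        stack.push("S");
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String lookahead = pos < tokens.length ? tokens[pos] : "$";
            if (isNonTerminal(top)) {
                String[] rhs = TABLE.get(top + "," + lookahead);
                if (rhs == null) return false;           // empty table cell: syntax error
                for (int i = rhs.length - 1; i >= 0; i--) {
                    stack.push(rhs[i]);                  // push right-hand side, leftmost on top
                }
            } else {
                if (!top.equals(lookahead)) return false; // terminal mismatch
                pos++;                                    // match and consume
            }
        }
        return pos == tokens.length + 1;                  // all tokens plus the end marker matched
    }

    public static void main(String[] args) {
        System.out.println(accepts("(", "num", "+", "num", ")", "+", "num"));
        System.out.println(accepts("num", "+"));
    }
}
```

The stack here plays the role of the not-yet-matched suffix of the partly-derived string; the recursive-descent parser on the later slides keeps the same information implicitly in its call stack.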

How to Implement?
The table can be converted easily into a recursive-descent parser
        num       +        (          )       $
S       → E S'             → E S'
S'                → +S                → ε     → ε
E       → num              → ( S )
Three procedures: parse_S, parse_S', parse_E

Recursive-Descent Parser
    // parse_S' is spelled parse_Sprime because ' is not legal in an identifier
    void parse_S() {
        switch (token) {               // token is the lookahead token
            case num: parse_E(); parse_Sprime(); return;
            case '(': parse_E(); parse_Sprime(); return;
            default: throw new ParseError();
        }
    }
Table row for S: num → E S' ; ( → E S'

Recursive-Descent Parser
    // parse_S' is spelled parse_Sprime because ' is not legal in an identifier
    void parse_Sprime() {
        switch (token) {
            case '+': token = input.read(); parse_S(); return;   // S' → + S
            case ')': return;                                    // S' → ε
            case EOF: return;                                    // S' → ε
            default: throw new ParseError();
        }
    }
Table row for S': + → +S ; ) → ε ; $ → ε

Recursive-Descent Parser
    void parse_E() {
        switch (token) {
            case num:                                  // E → num
                token = input.read();
                return;
            case '(':                                  // E → ( S )
                token = input.read();
                parse_S();
                if (token != ')') throw new ParseError();
                token = input.read();
                return;
            default: throw new ParseError();
        }
    }
Table row for E: num → num ; ( → ( S )
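The three procedures can be put together into a complete recognizer. The sketch below is one possible self-contained Java version under stated assumptions: the class name `RecursiveDescentDemo`, the tiny tokenizer, the token strings `num` and `$`, and the use of `IllegalStateException` in place of the slides' `ParseError` are all hypothetical choices, not part of the original slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recognizer for  S -> E S'   S' -> epsilon | + S   E -> num | ( S )
final class RecursiveDescentDemo {
    private final List<String> tokens;
    private int pos = 0;

    private RecursiveDescentDemo(List<String> tokens) { this.tokens = tokens; }

    // Split the source into tokens: "num", "+", "(", ")", then the end marker "$".
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add("num");
            } else if (c == '+' || c == '(' || c == ')') {
                out.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        out.add("$");
        return out;
    }

    private String token() { return tokens.get(pos); }   // current lookahead
    private void advance() { pos++; }                     // never called on "$"

    private void parseS() {
        switch (token()) {
            case "num":
            case "(":
                parseE(); parseSPrime(); return;          // S -> E S'
            default:
                throw new IllegalStateException("unexpected " + token());
        }
    }

    private void parseSPrime() {
        switch (token()) {
            case "+": advance(); parseS(); return;        // S' -> + S
            case ")":
            case "$": return;                             // S' -> epsilon
            default:
                throw new IllegalStateException("unexpected " + token());
        }
    }

    private void parseE() {
        switch (token()) {
            case "num": advance(); return;                // E -> num
            case "(":                                     // E -> ( S )
                advance();
                parseS();
                if (!token().equals(")")) throw new IllegalStateException("expected )");
                advance();
                return;
            default:
                throw new IllegalStateException("unexpected " + token());
        }
    }

    // True if the whole input is a sentence of the grammar.
    static boolean accepts(String src) {
        try {
            RecursiveDescentDemo parser = new RecursiveDescentDemo(tokenize(src));
            parser.parseS();
            return parser.token().equals("$");            // no trailing tokens allowed
        } catch (IllegalStateException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("(1+2+(3+4))+5"));
        System.out.println(accepts("(1+2"));
    }
}
```

Each `switch` mirrors one row of the parsing table, and the final check against `$` rejects inputs such as `1)` where a valid prefix is followed by leftover tokens.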

Call Tree = Parse Tree
Input: (1 + 2 + (3 + 4)) + 5
[Figure: the parse tree of the input under the grammar S → E S' ; S' → ε | + S ; E → num | ( S ), drawn next to the tree of calls to parse_S, parse_E and parse_S'. Each call corresponds to the parse-tree node for its non-terminal.]

How to Construct Parsing Tables
There exists an algorithm for automatically generating a predictive parse table from a grammar (take 412 for details)
Grammar: S → E S' ; S' → ε | + S ; E → number | ( S )
        number      +        (          )       $
S       → E S'               → E S'
S'                  → +S                → ε     → ε
E       → number             → ( S )

Summary for top-down parsing
LL(k) grammars
  – left-to-right scanning
  – leftmost derivation
  – can determine what production to apply from the next k symbols
  – can automatically build predictive parsing tables
Predictive parsers
  – can be easily built for LL(k) grammars from the parsing tables
  – also called recursive-descent, or top-down parsers

Top-Down Parsing Summary
Language grammar
→ (left-recursion elimination, left-factoring) → LL(1) grammar
→ predictive parsing table
→ recursive-descent parser
→ parser with AST generation
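The last step of this pipeline, adding AST generation to a recursive-descent parser, can be sketched by letting each parse procedure return a tree node. The Java below is a hedged illustration only: the class `AstParserDemo`, the node classes `Num` and `Add`, and the prefix-style `toString` output are assumptions, not something defined in the slides.

```java
// Hypothetical AST-building parser for  S -> E S'   S' -> epsilon | + S   E -> num | ( S )
final class AstParserDemo {
    // AST nodes: a number leaf or an addition. Parentheses leave no node behind.
    interface Node {}

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Add implements Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(+ " + left + " " + right + ")"; }
    }

    private final String src;
    private int pos = 0;

    private AstParserDemo(String src) { this.src = src; }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private boolean atEnd() { skipSpaces(); return pos >= src.length(); }

    // Next significant character, or '\0' at end of input.
    private char peek() { return atEnd() ? '\0' : src.charAt(pos); }

    // S -> E S'
    private Node parseS() {
        Node left = parseE();
        return parseSPrime(left);
    }

    // S' -> epsilon | + S ; the grammar is right-recursive, so sums nest to the right.
    private Node parseSPrime(Node left) {
        if (peek() == '+') {
            pos++;
            return new Add(left, parseS());
        }
        if (atEnd() || peek() == ')') return left;
        throw new IllegalStateException("unexpected '" + peek() + "' at " + pos);
    }

    // E -> num | ( S )
    private Node parseE() {
        char c = peek();
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return new Num(Integer.parseInt(src.substring(start, pos)));
        }
        if (c == '(') {
            pos++;
            Node inner = parseS();
            if (peek() != ')') throw new IllegalStateException("expected ) at " + pos);
            pos++;
            return inner;                      // the parentheses are discarded
        }
        throw new IllegalStateException("unexpected input at " + pos);
    }

    static Node parse(String src) {
        AstParserDemo parser = new AstParserDemo(src);
        Node root = parser.parseS();
        if (!parser.atEnd()) throw new IllegalStateException("trailing input at " + parser.pos);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1+2+(3+4))+5"));
    }
}
```

Note that the tree follows the right-recursive LL(1) grammar, so `1+2+3` groups as `1+(2+3)`. That is harmless for addition but would need extra care for a non-associative operator such as subtraction.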

Now: Bottom-up Parsing
A more powerful parsing technology
LR grammars: more expressive than LL
  – construct a right-most derivation of the program
  – cover virtually all programming languages
  – make it easier to express programming language syntax
Shift-reduce parsers
  – parsers for LR grammars
  – automatic parser generators (e.g. yacc, CUP)

Bottom-up Parsing
Right-most derivation, built backward
  – start with the tokens
  – end with the start symbol
Grammar: S → S + E | E ; E → num | ( S )
(1+2+(3+4))+5
← (E+2+(3+4))+5
← (S+2+(3+4))+5
← (S+E+(3+4))+5
← (S+(3+4))+5
← (S+(E+4))+5
← (S+(S+4))+5
← (S+(S+E))+5
← (S+(S))+5
← (S+E)+5
← (S)+5
← E+5
← S+E
← S

Progress of Bottom-up Parsing
Read bottom to top, the left column is a right-most derivation.
Derivation step       Scanned input      Unscanned input
(1+2+(3+4))+5 ←                          (1+2+(3+4))+5
(E+2+(3+4))+5 ←       (1                 +2+(3+4))+5
(S+2+(3+4))+5 ←       (1                 +2+(3+4))+5
(S+E+(3+4))+5 ←       (1+2               +(3+4))+5
(S+(3+4))+5 ←         (1+2+(3            +4))+5
(S+(E+4))+5 ←         (1+2+(3            +4))+5
(S+(S+4))+5 ←         (1+2+(3            +4))+5
(S+(S+E))+5 ←         (1+2+(3+4          ))+5
(S+(S))+5 ←           (1+2+(3+4          ))+5
(S+E)+5 ←             (1+2+(3+4)         )+5
(S)+5 ←               (1+2+(3+4)         )+5
E+5 ←                 (1+2+(3+4))        +5
S+E ←                 (1+2+(3+4))+5
S                     (1+2+(3+4))+5

Bottom-up Parsing
Grammar: S → S + E | E ; E → num | ( S )
(1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 …
Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned
[Figure: parse tree of (1+2+(3+4))+5 under this grammar, with the part built so far highlighted]

Top-down Parsing
Grammar: S → S + E | E ; E → num | ( S )
Input: (1+2+(3+4))+5
S → S+E → E+E → (S)+E → (S+E)+E → (S+E+E)+E → (E+E+E)+E → (1+E+E)+E → (1+2+E)+E ...
In a left-most derivation, the entire tree above a token (here 2) has been expanded by the time the token is encountered
[Figure: parse tree of the input under this grammar, with the part expanded so far highlighted]

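Top-down vs. Bottom-up
[Figure: two parse-tree sketches over an input divided into scanned and unscanned parts, one labelled Top-down and one labelled Bottom-up]
Bottom-up: don't need to figure out as much of the parse tree for a given amount of input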

Shift-reduce Parsing
Parsing actions: a sequence of shift and reduce operations
Parser state: a stack of terminals and non-terminals (grows to the right)
Current derivation step = always stack + input
Derivation step       Stack       Unconsumed input
(1+2+(3+4))+5 ←                   (1+2+(3+4))+5
(E+2+(3+4))+5 ←       (E          +2+(3+4))+5
(S+2+(3+4))+5 ←       (S          +2+(3+4))+5
(S+E+(3+4))+5 ←       (S+E        +(3+4))+5

Shift-reduce Parsing
Parsing is a sequence of shifts and reduces
Shift: move the look-ahead token to the stack
    Stack       Input               Action
    (           1+2+(3+4))+5        shift 1
    (1          +2+(3+4))+5
Reduce: replace symbols γ from the top of the stack with non-terminal symbol X, corresponding to production X → γ (pop γ, push X)
    Stack       Input               Action
    (S+E        +(3+4))+5           reduce S → S+E
    (S          +(3+4))+5

Shift-reduce Parsing
Grammar: S → S + E | E ; E → num | ( S )
Derivation            Stack       Input stream        Action
(1+2+(3+4))+5 ←                   (1+2+(3+4))+5       shift
(1+2+(3+4))+5 ←       (1          +2+(3+4))+5         reduce E → num
(E+2+(3+4))+5 ←       (E          +2+(3+4))+5         reduce S → E
(S+2+(3+4))+5 ←       (S          +2+(3+4))+5         shift
(S+2+(3+4))+5 ←       (S+2        +(3+4))+5           reduce E → num
(S+E+(3+4))+5 ←       (S+E        +(3+4))+5           reduce S → S + E
(S+(3+4))+5 ←         (S          +(3+4))+5           shift
(S+(3+4))+5 ←         (S+(3       +4))+5              reduce E → num
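The trace above can be reproduced with a short program. The Java sketch below is an assumption-laden illustration, not a real LR parser: instead of consulting a parsing table it reduces whenever a right-hand side of this particular grammar sits on top of the stack and shifts otherwise, which happens to be sufficient for S → S + E | E ; E → num | ( S ) but does not generalise. The class name `ShiftReduceDemo`, the action strings, and the pre-tokenized input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written shift-reduce recognizer for
//   S -> S + E | E     E -> num | ( S )
final class ShiftReduceDemo {
    // True if the top of the stack (the end of the list) equals the given symbols.
    private static boolean endsWith(List<String> stack, String... suffix) {
        int start = stack.size() - suffix.length;
        return start >= 0
            && stack.subList(start, stack.size()).equals(Arrays.asList(suffix));
    }

    // Pop `count` symbols and push the non-terminal `lhs`.
    private static void reduce(List<String> stack, int count, String lhs) {
        for (int i = 0; i < count; i++) stack.remove(stack.size() - 1);
        stack.add(lhs);
    }

    // Returns the actions taken, or null if the input is rejected.
    static List<String> parse(String... tokens) {
        List<String> stack = new ArrayList<>();
        List<String> actions = new ArrayList<>();
        int pos = 0;
        while (true) {
            if (endsWith(stack, "num")) {
                reduce(stack, 1, "E");
                actions.add("reduce E -> num");
            } else if (endsWith(stack, "(", "S", ")")) {
                reduce(stack, 3, "E");
                actions.add("reduce E -> ( S )");
            } else if (endsWith(stack, "S", "+", "E")) {   // must be tested before S -> E
                reduce(stack, 3, "S");
                actions.add("reduce S -> S + E");
            } else if (endsWith(stack, "E")) {
                reduce(stack, 1, "S");
                actions.add("reduce S -> E");
            } else if (pos < tokens.length) {
                stack.add(tokens[pos]);                    // shift the lookahead token
                actions.add("shift " + tokens[pos]);
                pos++;
            } else {
                break;                                     // nothing left to do
            }
        }
        // Accept only if the whole input was reduced to the start symbol.
        boolean accepted = stack.size() == 1 && stack.get(0).equals("S");
        return accepted ? actions : null;
    }

    public static void main(String[] args) {
        List<String> actions = parse("(", "num", "+", "num", ")", "+", "num");
        if (actions != null) {
            for (String action : actions) System.out.println(action);
        }
    }
}
```

The order of the `endsWith` checks encodes the decision the later slides delegate to a parsing table: with `S + E` on top, reducing by S → E would be legal according to the grammar but wrong.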

Problem
How do we know which action to take: whether to shift or reduce, and which production?
Issues:
  – sometimes can reduce but shouldn't
  – sometimes can reduce in different ways

Action Selection Problem
Given stack σ and look-ahead symbol b, should the parser:
  – shift b onto the stack (making it σb), or
  – reduce X → γ, assuming that the stack has the form αγ (making it αX)?
If the stack has the form αγ, should apply reduction X → γ (or shift) depending on the stack prefix α
  – α is different for different possible reductions, since the γ's have different lengths

LR Parsing Engine
Basic mechanism:
  – use a set of parser states
  – use a stack with alternating symbols and states, e.g. 1 ( 6 S 10 + 5
  – use a parsing table to:
      determine what action to apply (shift/reduce)
      determine the next state
The parser actions can be precisely determined from the table

The LR Parsing Table
Action table: State × Terminals → next action and next state
Goto table: State × Non-terminals → next state
Algorithm: look at the entry for current state S and input terminal C
  – If Table[S,C] = s(S') then shift: push(C), push(S')
  – If Table[S,C] = X → γ then reduce: pop(2*|γ|), S' = top(), push(X), push(Table[S',X])
The reduce step pops 2*|γ| entries because every symbol on the stack is paired with a state.

LR Parsing Table Example
Entries: sN = shift and go to state N, gN = go to state N, a production = reduce by that production
State   (        )        id       ,        $        S     L
1       s3                s2                         g4
2       S→id     S→id     S→id     S→id     S→id
3       s3                s2                         g7    g5
4                                           accept
5                s6                s8
6       S→(L)    S→(L)    S→(L)    S→(L)    S→(L)
7       L→S      L→S      L→S      L→S      L→S
8       s3                s2                         g9
9       L→L,S    L→L,S    L→L,S    L→L,S    L→L,S
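The example table can be run by a small LR engine. The Java below is a sketch under assumptions: the column placement of the table entries is a reconstruction of the flattened slide, the class name `LrEngineDemo` and the string encoding (`s3` for shift, `r0` to `r3` for reductions by production index, `acc` for accept) are hypothetical, and the stack stores states only rather than alternating symbols and states, so a reduction pops |γ| entries instead of 2*|γ|.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical LR engine for  S -> ( L ) | id     L -> S | L , S
final class LrEngineDemo {
    // Productions by index: 0: S->(L)  1: S->id  2: L->S  3: L->L,S
    private static final String[] LHS = {"S", "S", "L", "L"};
    private static final int[] RHS_LENGTH = {3, 1, 1, 3};
    private static final String[] TERMINALS = {"(", ")", "id", ",", "$"};

    private static final Map<String, String> ACTION = new HashMap<>();
    private static final Map<String, Integer> GOTO = new HashMap<>();

    private static void shift(int state, String terminal, int next) {
        ACTION.put(state + "," + terminal, "s" + next);
    }

    // LR(0) reduce states reduce on every terminal.
    private static void reduceAll(int state, int production) {
        for (String t : TERMINALS) ACTION.put(state + "," + t, "r" + production);
    }

    static {
        shift(1, "(", 3); shift(1, "id", 2); GOTO.put("1,S", 4);
        reduceAll(2, 1);
        shift(3, "(", 3); shift(3, "id", 2); GOTO.put("3,S", 7); GOTO.put("3,L", 5);
        ACTION.put("4,$", "acc");
        shift(5, ")", 6); shift(5, ",", 8);
        reduceAll(6, 0);
        reduceAll(7, 2);
        shift(8, "(", 3); shift(8, "id", 2); GOTO.put("8,S", 9);
        reduceAll(9, 3);
    }

    // tokens: the input without the end marker, e.g. "(", "id", ",", "id", ")".
    static boolean accepts(String... tokens) {
        Deque<Integer> states = new ArrayDeque<>();
        states.push(1);                                   // start state
        int pos = 0;
        while (true) {
            String lookahead = pos < tokens.length ? tokens[pos] : "$";
            String action = ACTION.get(states.peek() + "," + lookahead);
            if (action == null) return false;             // empty cell: syntax error
            if (action.equals("acc")) return true;
            int arg = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 's') {
                states.push(arg);                         // shift: consume the token
                pos++;
            } else {
                for (int i = 0; i < RHS_LENGTH[arg]; i++) states.pop();
                Integer next = GOTO.get(states.peek() + "," + LHS[arg]);
                if (next == null) return false;           // defensive; not expected for this table
                states.push(next);                        // goto on the reduced non-terminal
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("(", "id", ",", "id", ")"));
        System.out.println(accepts("(", ")"));
    }
}
```

Keeping only states on the stack works because each state already determines the symbol that led to it; the alternating layout on the slides carries the same information in a form that is easier to read in a trace.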

LR(k) Grammars
LR(k) = Left-to-right scanning, Right-most derivation, k look-ahead symbols
Main cases: LR(0), LR(1), and some variations (SLR and LALR(1))
Parsers for LR(0) grammars:
  – determine the actions without any look-ahead symbol

Building LR(0) Parsing Tables
To build the parsing table:
  – define the states of the parser
  – build a DFA to describe the transitions between states
  – use the DFA to build the parsing table

Summary for bottom-up parsing
LR(k) grammars
  – left-to-right scanning
  – rightmost derivation
  – can determine whether to shift or reduce from the next k symbols
  – can automatically build shift-reduce (LR) parsing tables
Shift-reduce parsers
  – can be built for LR(k) grammars using automated parser generator tools, e.g. CUP, yacc

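Top-down vs. Bottom-up again
[Figure: two parse-tree sketches over an input divided into scanned and unscanned parts, one labelled Top-down and one labelled Bottom-up]
Top-down: LL(k), recursive descent
Bottom-up: LR(k), shift-reduce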
