Download presentation
Presentation is loading. Please wait.
1
CS 381 - Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour June 20, 2005 Slide acknowledgment: Radu Rugina, CS 412
2
cmp $0,ecx cmovz edx,ecx Simplified Compiler Structure Source code Understand source code Generate assembly code Assembly code Front end (machine-independent) Back end (machine-dependent) if (b == 0) a = b; Optimize Intermediate code Optimizer
3
Simplified Front-End Structure Source code (character stream) Lexical Analysis Syntax Analysis (Parsing) Token stream Abstract Syntax Tree (AST) Semantic Analysis if (b == 0) a = b; if(b)a=b;0== if == b0 = ab
4
Parse Tree vs. AST Parse tree also called “concrete syntax” Parse Tree (Concrete Syntax) Abstract Syntax Tree Discards (abstracts) unneeded information + 5+ + 34 1 2 + ( S ) S E + S ( S ) 1 2 3 E E + S E 4 5 E S + E
5
How to build an AST Need to find a derivation for the program in the grammar Want an efficient algorithm –should only read token stream once –exponential brute-force search out of question –even CKY is too slow Two main ways to parse: –top-down parsing (recursive descent) –bottom-up parsing (shift-reduce)
6
Parsing Top-down Goal: construct a leftmost derivation of string while reading in token stream Partly-derived StringLookahead S ((1+2+(3+4))+5 E+S ((1+2+(3+4))+5 (S) +S 1(1+2+(3+4))+5 (E+S)+S 1 (1+2+(3+4))+5 (1+S)+S2 (1+2+(3+4))+5 (1+E+S)+S 2 (1+2+(3+4))+5 (1+2+S)+S2 (1+2+(3+4))+5 (1+2+E)+S((1+2+(3+4))+5 (1+2+(S))+S3(1+2+(3+4))+5 (1+2+(E+S))+S3(1+2+(3+4))+5 parsed part unparsed part S E + S | E E num | ( S )
7
Problem S E + S | E E num | ( S ) Want to decide which production to apply based on next symbol (1) S E (S) (E) (1) (1)+2 S E + S (S) + S (E) + S (1)+E (1)+2 Why is this hard?
8
Grammar is Problem This grammar cannot be parsed top-down with only a single look-ahead symbol Not LL(1) = L eft-to-right-scanning, L eft-most derivation, 1 look-ahead symbol Is it LL(k) for some k? Can rewrite grammar to allow top-down parsing: create LL(1) grammar for same language
9
Making a grammar LL(1) S E + S S E E num E ( S ) S ES' S' S' + S E num E ( S ) Problem: can’t decide which S production to apply until we see symbol after first expression Left-factoring: Factor common S prefix, add new non-terminal S' at decision point. S' derives (+E)*
10
Parsing with new grammar S ((1+2+(3+4))+5 E S' ((1+2+(3+4))+5 (S) S' 1(1+2+(3+4))+5 (E S') S' 1 (1+2+(3+4))+5 (1 S') S' + (1+2+(3+4))+5 (1+E S' ) S' 2 (1+2+(3+4))+5 (1+2 S') S' + (1+2+(3+4))+5 (1+2 + S) S' ( (1+2+(3+4))+5 (1+2 + E S') S' ( (1+2+(3+4))+5 (1+2 + (S) S') S'3 (1+2+(3+4))+5 (1+2 + (E S' ) S') S' 3 (1+2+(3+4))+5 (1+2 + (3 S') S') S' + (1+2+(3+4))+5 (1+2 + (3 + E) S') S' 4 (1+2+(3+4))+5 S ES 'S ' | + S E num | ( S )
11
Predictive Parsing LL(1) grammar: –for a given non-terminal, the look-ahead symbol uniquely determines the production to apply –top-down parsing = predictive parsing –Driven by predictive parsing table of non-terminals terminals productions
12
Using Table S ((1+2+(3+4))+5 E S' ((1+2+(3+4))+5 (S) S' 1(1+2+(3+4))+5 (E S' ) S' 1 (1+2+(3+4))+5 (1 S') S' + (1+2+(3+4))+5 (1 + S) S' 2 (1+2+(3+4))+5 (1+E S' ) S' 2 (1+2+(3+4))+5 (1+2 S') S' + (1+2+(3+4))+5 num + ( ) $ S E S ' E S ' S ' +S E num ( S ) S E S ' S ' | + S E num | ( S )
13
How to Implement? Table can be converted easily into a recursive- descent parser num + ( ) $ S E S ' E S ' S ' +S E num ( S ) Three procedures: parse_S, parse_S’, parse_E
14
Recursive-Descent Parser void parse_S () { switch (token) { case num: parse_E(); parse_S’(); return; case ‘(’: parse_E(); parse_S’(); return; default: throw new ParseError(); } number + ( ) $ S ES’ ES’ S’ +S E number ( S ) lookahead token
15
Recursive-Descent Parser void parse_S’() { switch (token) { case ‘+’: token = input.read(); parse_S(); return; case ‘)’: return; case EOF: return; default: throw new ParseError(); } number + ( ) $ S ES’ ES’ S’ +S E number ( S )
16
Recursive-Descent Parser void parse_E() { switch (token) { case number: token = input.read(); return; case ‘(‘: token = input.read(); parse_S(); if (token != ‘)’) throw new ParseError(); token = input.read(); return; default: throw new ParseError(); } } number + ( ) $ S ES’ ES’ S’ +S E number ( S )
17
Call Tree = Parse Tree (1 + 2 + (3 + 4)) + 5 S E S’ ( S ) + S E S’ 5 1 + S E S’ 2 + S E S’ ( S ) E S’ + S E 4 3 parse_S parse_E parse_S’ parse_S parse_E parse_S’ parse_S parse_E parse_S’ parse_S parse_E parse_S’ parse_S
18
N + ( ) $ SES’ ES’ S’ +S E N ( S ) How to Construct Parsing Tables There exists an algorithm for automatically generating a predictive parse table from a grammar (take 412 for details) S ES’ S’ | + S E number | ( S )
19
Summary for top-down parsing LL(k) grammars –left-to-right scanning –leftmost derivation –can determine what production to apply from the next k symbols –Can automatically build predictive parsing tables Predictive parsers –Can be easily built for LL(k) grammars from the parsing tables –Also called recursive-descent, or top-down parsers
20
Top-Down Parsing Summary Language grammar LL(1) grammar predictive parsing table recursive-descent parser parser with AST generation Left-recursion elimination Left-factoring
21
Now: Bottom-up Parsing A more powerful parsing technology LR grammars -- more expressive than LL –construct right-most derivation of program –virtually all programming languages –easier to express programming language syntax Shift-reduce parsers –Parsers for LR grammars –automatic parser generators (e.g. yacc,CUP)
22
Bottom-up Parsing Right-most derivation -- backward –Start with the tokens –End with the start symbol (1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 (S+(3+4))+5 (S+(E+4))+5 (S+(S+4))+5 (S+(S+E))+5 (S+(S))+5 (S+E)+5 (S)+5 E+5 S+E S S S + E | E E num | ( S )
23
Progress of Bottom-up Parsing (1+2+(3+4))+5 (1+2+(3+4))+5 (E+2+(3+4))+5 (1 +2+(3+4))+5 (S+2+(3+4))+5 (1 +2+(3+4))+5 (S+E+(3+4))+5 (1+2 +(3+4))+5 (S+(3+4))+5 (1+2+(3 +4))+5 (S+(E+4))+5 (1+2+(3 +4))+5 (S+(S+4))+5 (1+2+(3 +4))+5 (S+(S+E))+5 (1+2+(3+4 ))+5 (S+(S))+5 (1+2+(3+4 ))+5 (S+E)+5 (1+2+(3+4) )+5 (S)+5 (1+2+(3+4) )+5 E+5 (1+2+(3+4)) +5 S+E (1+2+(3+4))+5 S(1+2+(3+4))+5 right-most derivation
24
Bottom-up Parsing (1+2+(3+4))+5 (E+2+(3+4))+5 (S+2+(3+4))+5 (S+E+(3+4))+5 … Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned S S + E E ( S ) 5 S + E S + ES + E ( S ) S + E E E 1 2 3 4 S S + E | E E num | ( S )
25
Top-down Parsing S S+E E+E (S)+E (S+E)+E (S+E+E)+E (E+E+E)+E (1+E+E)+E (1+2+E)+E... In left-most derivation, entire tree above a token (2) has been expanded when encountered S S + E E ( S ) 5 S + E ( S ) S + E E E 1 2 3 4 S S + E | E E num | ( S ) (1+2+(3+4))+5
26
Top-down vs. Bottom-up scanned unscanned Top-downBottom-up Bottom-up: Don’t need to figure out as much of the parse tree for a given amount of input
27
Shift-reduce Parsing Parsing actions: is a sequence of shift and reduce operations Parser state: a stack of terminals and non-terminals (grows to the right) Current derivation step = always stack+input Derivation stepstack unconsumed input (1+2+(3+4))+5 (1+2+(3+4))+5 (E+2+(3+4))+5 (E +2+(3+4))+5 (S+2+(3+4))+5 (S +2+(3+4))+5 (S+E+(3+4))+5 (S+E +(3+4))+5
28
Shift-reduce Parsing Parsing is a sequence of shifts and reduces Shift : move look-ahead token to stack stack inputaction ( 1+2+(3+4))+5 shift 1 (1 +2+(3+4))+5 Reduce : Replace symbols from top of stack with non-terminal symbol X, corresponding to production X (pop , push X) stack input action (S+E +(3+4))+5 reduce S S+E (S +(3+4))+5
29
Shift-reduce Parsing (1+2+(3+4))+5 (1+2+(3+4))+5shift (1+2+(3+4))+5 (1 +2+(3+4))+5 reduce E num (E+2+(3+4))+5 (E +2+(3+4))+5 reduce S E (S+2+(3+4))+5 (S +2+(3+4))+5 shift (S+2+(3+4))+5 (S+2 +(3+4))+5reduce E num (S+E+(3+4))+5 (S+E +(3+4))+5reduce S S + E (S+(3+4))+5 (S +(3+4))+5shift (S+(3+4))+5 (S+(3 +4))+5reduce E num derivation stack input streamaction S S + E | E E num | ( S )
30
Problem How do we know which action to take: whether to shift or reduce, and which production? Issues: –Sometimes can reduce but shouldn’t –Sometimes can reduce in different ways
31
Action Selection Problem Given stack and look-ahead symbol b, should parser: –shift b onto the stack (making it b) –reduce X assuming that stack has the form (making it X) If stack has form , should apply reduction X (or shift) depending on stack prefix – is different for different possible reductions, since ’s have different length.
32
LR Parsing Engine Basic mechanism: –Use a set of parser states –Use a stack with alternating symbols and states E.g: 1 ( 6 S 10 + 5 –Use a parsing table to: Determine what action to apply (shift/reduce) Determine the next state The parser actions can be precisely determined from the table
33
The LR Parsing Table Algorithm: look at entry for current state S and input terminal C If Table[S,C] = s(S’) then shift: push(C), push(S’) If Table[S,C] = X then reduce: pop(2*| |), S’=top(), push(X), push(Table[S’,X]) Non-terminals Next state Terminals State Next action and next state Action table Goto table
34
LR Parsing Table Example ()id,$SL 1s3s2g4 2 S id S id S id S id S id 3s3s2g7g5 4 accept 5 s6s8 6 S (L) S (L) S (L) S (L) S (L) 7 L S L S L S L S L S 8 s3 s2g9 9 L L,S L L,S L L,S L L,S L L,S
35
LR(k) Grammars LR(k) = Left-to-right scanning, Right-most derivation, k look-ahead characters Main cases: LR(0), LR(1), and some variations (SLR and LALR(1)) Parsers for LR(0) Grammars: –Determine the actions without any lookahead symbol
36
Building LR(0) Parsing Tables To build the parsing table: –Define states of the parser –Build a DFA to describe the transitions between states –Use the DFA to build the parsing table
37
Summary for bottom-up parsing LR(k) grammars –left-to-right scanning –rightmost derivation –can determine whether to shift or reduce from the next k symbols –Can automatically build predictive parsing tables Shift-reduce parsers –Can be built for LR(k) grammars using automated parser generator tools, eg. CUP, yacc.
38
Top-down vs. Bottom-up again scanned unscanned Top-downBottom-up LL(k), recursive descent LR(k), shift-reduce
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.