CS 381 - Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour June 20, 2005 Slide acknowledgment: Radu Rugina, CS 412.



Simplified Compiler Structure
Source code: if (b == 0) a = b;
Front end (machine-independent): understand source code
Intermediate code
Optimizer: optimize
Back end (machine-dependent): generate assembly code
Assembly code: cmp $0,ecx  cmovz edx,ecx

Simplified Front-End Structure
Source code (character stream): if (b == 0) a = b;
Lexical Analysis
Token stream: if ( b == 0 ) a = b ;
Syntax Analysis (Parsing)
Abstract Syntax Tree (AST): if, with children == (b, 0) and = (a, b)
Semantic Analysis
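The lexical analysis step can be illustrated with a minimal sketch. It is not the course's lexer: the class name LexerSketch, the method tokenize, and the small set of recognized symbols are assumptions chosen only to cover the slide's example statement.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal lexer sketch (hypothetical names): character stream in, token list out.
final class LexerSketch {
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace separates tokens
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;                         // identifier, keyword, or number
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (c == '=' && i + 1 < src.length() && src.charAt(i + 1) == '=') {
                tokens.add("==");                      // two-character operator
                i += 2;
            } else if ("()=;+".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));         // single-character symbol
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if (b == 0) a = b;"));
    }
}
```

The parser in the following slides consumes such a token list one token at a time, left to right.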

Parse Tree vs. AST
Parse tree: also called “concrete syntax”
Abstract syntax tree: discards (abstracts) unneeded information
[Figure: a parse tree (concrete syntax) beside the corresponding abstract syntax tree]

How to build an AST
Need to find a derivation for the program in the grammar
Want an efficient algorithm
– should only read the token stream once
– exponential brute-force search is out of the question
– even CKY (cubic time) is too slow
Two main ways to parse:
– top-down parsing (recursive descent)
– bottom-up parsing (shift-reduce)

Parsing Top-down
Goal: construct a leftmost derivation of the string while reading in the token stream
Grammar: S → E + S | E    E → num | ( S )
Input: (1+2+(3+4))+5
Partly-derived string    Lookahead
  S                      (
→ E+S                    (
→ (S)+S                  1
→ (E+S)+S                1
→ (1+S)+S                2
→ (1+E+S)+S              2
→ (1+2+S)+S              (
→ (1+2+E)+S              (
→ (1+2+(S))+S            3
→ (1+2+(E+S))+S          3
On the slide each row also shows the input split into a parsed part and an unparsed part; the lookahead is the first token of the unparsed part.

Problem
Grammar: S → E + S | E    E → num | ( S )
Want to decide which production to apply based on the next symbol
Input (1):    S → E → (S) → (E) → (1)
Input (1)+2:  S → E + S → (S) + S → (E) + S → (1)+E → (1)+2
Why is this hard?

Grammar is Problem
This grammar cannot be parsed top-down with only a single look-ahead symbol
Not LL(1) = Left-to-right scanning, Left-most derivation, 1 look-ahead symbol
Is it LL(k) for some k?
Can rewrite the grammar to allow top-down parsing: create an LL(1) grammar for the same language

Making a grammar LL(1)
Original grammar        Left-factored grammar
S → E + S               S → E S'
S → E                   S' → ε
E → num                 S' → + S
E → ( S )               E → num
                        E → ( S )
Problem: can’t decide which S production to apply until we see the symbol after the first expression
Left-factoring: factor the common S prefix, add a new non-terminal S' at the decision point. S' derives (+E)*

Parsing with new grammar
Grammar: S → E S'    S' → ε | + S    E → num | ( S )
Input: (1+2+(3+4))+5
Partly-derived string              Lookahead
  S                                (
→ E S'                             (
→ (S) S'                           1
→ (E S') S'                        1
→ (1 S') S'                        +
→ (1+E S') S'                      2
→ (1+2 S') S'                      +
→ (1+2 + S) S'                     (
→ (1+2 + E S') S'                  (
→ (1+2 + (S) S') S'                3
→ (1+2 + (E S') S') S'             3
→ (1+2 + (3 S') S') S'             +
→ (1+2 + (3 + E) S') S'            4

Predictive Parsing
LL(1) grammar:
– for a given non-terminal, the look-ahead symbol uniquely determines the production to apply
– top-down parsing = predictive parsing
– driven by a predictive parsing table: non-terminals × terminals → productions

Using Table
Grammar: S → E S'    S' → ε | + S    E → num | ( S )
        num        +         (           )       $
S       → E S'               → E S'
S'                 → + S                 → ε     → ε
E       → num                → ( S )
Input: (1+2+(3+4))+5
Partly-derived string    Lookahead
  S                      (
→ E S'                   (
→ (S) S'                 1
→ (E S') S'              1
→ (1 S') S'              +
→ (1 + S) S'             2
→ (1+E S') S'            2
→ (1+2 S') S'            +
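The table lookup traced above can also be run directly with an explicit stack instead of recursion. The following is a minimal sketch under stated assumptions: the class name PredictiveParserSketch, the single-character encoding ('P' stands for S', 'n' for num), and the map-based table layout are illustrative choices, not part of the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Table-driven LL(1) recognizer for S -> E S', S' -> epsilon | + S, E -> num | ( S ).
final class PredictiveParserSketch {
    // Key = non-terminal followed by lookahead; value = right-hand side ("" is epsilon).
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("Sn", "EP");  TABLE.put("S(", "EP");
        TABLE.put("P+", "+S");  TABLE.put("P)", "");   TABLE.put("P$", "");
        TABLE.put("En", "n");   TABLE.put("E(", "(S)");
    }

    static boolean accepts(String src) {
        if (!src.matches("[0-9+()\\s]*")) return false;      // only digits, +, (, ), blanks
        // Collapse each number to the token 'n', drop blanks, append end marker '$'.
        String input = src.replaceAll("[0-9]+", "n").replaceAll("\\s+", "") + "$";
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');
        stack.push('S');
        int pos = 0;
        while (!stack.isEmpty()) {
            char top = stack.pop();
            char look = input.charAt(pos);
            if (top == 'S' || top == 'P' || top == 'E') {
                String rhs = TABLE.get("" + top + look);      // predict a production
                if (rhs == null) return false;                // empty table cell: syntax error
                for (int i = rhs.length() - 1; i >= 0; i--) stack.push(rhs.charAt(i));
            } else if (top == look) {
                pos++;                                        // match a terminal
            } else {
                return false;
            }
        }
        return pos == input.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts("(1+2+(3+4))+5"));
    }
}
```

Pushing a right-hand side in reverse keeps its leftmost symbol on top of the stack, which is what makes the derivation leftmost.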

How to Implement?
Table can be converted easily into a recursive-descent parser
        num        +         (           )       $
S       → E S'               → E S'
S'                 → + S                 → ε     → ε
E       → num                → ( S )
Three procedures: parse_S, parse_S’, parse_E

Recursive-Descent Parser
// token holds the lookahead token.
// parse_S_prime is the slide's parse_S’ (a prime is not legal in an identifier).
// Table row: S → E S’ on number and on (
void parse_S() {
    switch (token) {
        case num: parse_E(); parse_S_prime(); return;
        case '(': parse_E(); parse_S_prime(); return;
        default: throw new ParseError();
    }
}

Recursive-Descent Parser
// parse_S_prime is the slide's parse_S’ (a prime is not legal in an identifier).
// Table row: S’ → + S on +, S’ → ε on ) and on $ (EOF)
void parse_S_prime() {
    switch (token) {
        case '+': token = input.read(); parse_S(); return;
        case ')': return;   // S’ → ε
        case EOF: return;   // S’ → ε
        default: throw new ParseError();
    }
}

Recursive-Descent Parser
// Table row: E → number on number, E → ( S ) on (
void parse_E() {
    switch (token) {
        case num: token = input.read(); return;
        case '(':
            token = input.read();
            parse_S();
            if (token != ')') throw new ParseError();
            token = input.read();
            return;
        default: throw new ParseError();
    }
}
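The three procedures can be assembled into one runnable class. This is a sketch under assumptions rather than the course's code: the class name RecursiveDescentSketch, the built-in one-token lexer, and the prefix-string form of the result (a stand-in for AST nodes) are all illustrative.

```java
// Recursive-descent parser for S -> E S', S' -> epsilon | + S, E -> num | ( S ).
final class RecursiveDescentSketch {
    static final class ParseError extends RuntimeException {
        ParseError(String message) { super(message); }
    }

    private final String src;
    private int pos = 0;
    private char token;     // lookahead: 'n' = number, '$' = end of input, else the symbol
    private String lexeme;  // digits of the current number token

    private RecursiveDescentSketch(String src) { this.src = src; advance(); }

    // Read the next token into the lookahead.
    private void advance() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        if (pos == src.length()) { token = '$'; return; }
        char c = src.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            token = 'n';
            lexeme = src.substring(start, pos);
        } else if ("+()".indexOf(c) >= 0) {
            token = c;
            pos++;
        } else {
            throw new ParseError("unexpected character " + c);
        }
    }

    // S -> E S'
    private String parseS() {
        if (token != 'n' && token != '(') throw new ParseError("expected number or (");
        String left = parseE();
        return parseSPrime(left);
    }

    // S' -> + S | epsilon; the left operand is passed in so a + node can be built.
    private String parseSPrime(String left) {
        switch (token) {
            case '+':
                advance();
                return "(+ " + left + " " + parseS() + ")";
            case ')':
            case '$':
                return left;
            default:
                throw new ParseError("expected +, ) or end of input");
        }
    }

    // E -> num | ( S )
    private String parseE() {
        switch (token) {
            case 'n': {
                String number = lexeme;
                advance();
                return number;
            }
            case '(': {
                advance();
                String inner = parseS();
                if (token != ')') throw new ParseError("expected )");
                advance();
                return inner;   // parentheses are dropped: the AST keeps only structure
            }
            default:
                throw new ParseError("expected number or (");
        }
    }

    static String parse(String src) {
        RecursiveDescentSketch parser = new RecursiveDescentSketch(src);
        String ast = parser.parseS();
        if (parser.token != '$') throw new ParseError("trailing input");
        return ast;
    }

    public static void main(String[] args) {
        System.out.println(parse("(1+2+(3+4))+5"));
    }
}
```

Because S' → + S recurses on the right, a chain such as 1+2+3 comes out right-nested; whether that grouping is acceptable depends on the operator's intended associativity.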

Call Tree = Parse Tree
[Figure: for the input (1 + 2 + (3 + 4)) + 5, the parse tree over S, E and S’ drawn beside the tree of calls to parse_S, parse_E and parse_S’; the two trees have the same shape]

How to Construct Parsing Tables
There exists an algorithm for automatically generating a predictive parse table from a grammar (take 412 for details)
Grammar: S → E S’    S’ → ε | + S    E → number | ( S )
        N          +         (           )       $
S       → E S’               → E S’
S’                 → + S                 → ε     → ε
E       → N                  → ( S )

Summary for top-down parsing
LL(k) grammars
– left-to-right scanning
– leftmost derivation
– can determine what production to apply from the next k symbols
– can automatically build predictive parsing tables
Predictive parsers
– can be easily built for LL(k) grammars from the parsing tables
– also called recursive-descent, or top-down parsers

Top-Down Parsing Summary
Language grammar
→ (left-recursion elimination, left-factoring) → LL(1) grammar
→ predictive parsing table
→ recursive-descent parser
→ parser with AST generation

Now: Bottom-up Parsing
A more powerful parsing technology
LR grammars -- more expressive than LL
– construct right-most derivation of program
– virtually all programming languages
– easier to express programming language syntax
Shift-reduce parsers
– parsers for LR grammars
– automatic parser generators (e.g. yacc, CUP)

Bottom-up Parsing
Right-most derivation -- backward
– Start with the tokens
– End with the start symbol
Grammar: S → S + E | E    E → num | ( S )
(1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 ← (S+(3+4))+5 ← (S+(E+4))+5 ← (S+(S+4))+5 ← (S+(S+E))+5 ← (S+(S))+5 ← (S+E)+5 ← (S)+5 ← E+5 ← S+E ← S

Progress of Bottom-up Parsing
Right-most derivation (backward)   Input (the gap separates scanned from unscanned input)
(1+2+(3+4))+5 ←                    (1+2+(3+4))+5
(E+2+(3+4))+5 ←                    (1 +2+(3+4))+5
(S+2+(3+4))+5 ←                    (1 +2+(3+4))+5
(S+E+(3+4))+5 ←                    (1+2 +(3+4))+5
(S+(3+4))+5 ←                      (1+2+(3 +4))+5
(S+(E+4))+5 ←                      (1+2+(3 +4))+5
(S+(S+4))+5 ←                      (1+2+(3 +4))+5
(S+(S+E))+5 ←                      (1+2+(3+4 ))+5
(S+(S))+5 ←                        (1+2+(3+4 ))+5
(S+E)+5 ←                          (1+2+(3+4) )+5
(S)+5 ←                            (1+2+(3+4) )+5
E+5 ←                              (1+2+(3+4)) +5
S+E ←                              (1+2+(3+4))+5
S                                  (1+2+(3+4))+5

Bottom-up Parsing
Grammar: S → S + E | E    E → num | ( S )
(1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 …
Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned
[Figure: parse tree for (1+2+(3+4))+5]

Top-down Parsing
Grammar: S → S + E | E    E → num | ( S )
Input: (1+2+(3+4))+5
S → S+E → E+E → (S)+E → (S+E)+E → (S+E+E)+E → (E+E+E)+E → (1+E+E)+E → (1+2+E)+E ...
In a left-most derivation, the entire tree above a token (2) has been expanded when the token is encountered
[Figure: parse tree for (1+2+(3+4))+5]

Top-down vs. Bottom-up
[Figure: two parse trees, labeled Top-down and Bottom-up, each drawn over an input divided into scanned and unscanned parts]
Bottom-up: don’t need to figure out as much of the parse tree for a given amount of input

Shift-reduce Parsing
Parsing actions: a sequence of shift and reduce operations
Parser state: a stack of terminals and non-terminals (grows to the right)
Current derivation step = always stack + input
Derivation step       Stack      Unconsumed input
(1+2+(3+4))+5 ←       (empty)    (1+2+(3+4))+5
(E+2+(3+4))+5 ←       (E         +2+(3+4))+5
(S+2+(3+4))+5 ←       (S         +2+(3+4))+5
(S+E+(3+4))+5 ←       (S+E       +(3+4))+5

Shift-reduce Parsing
Parsing is a sequence of shifts and reduces
Shift: move the look-ahead token to the stack
Stack      Input            Action
(          1+2+(3+4))+5     shift 1
(1         +2+(3+4))+5
Reduce: replace symbols γ on top of the stack with non-terminal symbol X, corresponding to production X → γ (pop γ, push X)
Stack      Input            Action
(S+E       +(3+4))+5        reduce S → S+E
(S         +(3+4))+5

Shift-reduce Parsing
Grammar: S → S + E | E    E → num | ( S )
Derivation            Stack      Input stream       Action
(1+2+(3+4))+5 ←       (empty)    (1+2+(3+4))+5      shift
(1+2+(3+4))+5 ←       (1         +2+(3+4))+5        reduce E → num
(E+2+(3+4))+5 ←       (E         +2+(3+4))+5        reduce S → E
(S+2+(3+4))+5 ←       (S         +2+(3+4))+5        shift
(S+2+(3+4))+5 ←       (S+2       +(3+4))+5          reduce E → num
(S+E+(3+4))+5 ←       (S+E       +(3+4))+5          reduce S → S + E
(S+(3+4))+5 ←         (S         +(3+4))+5          shift
(S+(3+4))+5 ←         (S+(3      +4))+5             reduce E → num
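The shift and reduce steps in this trace can be reproduced with a small hand-written loop. This is only a sketch: instead of an LR table it uses a fixed priority of reductions that happens to work for this one grammar, and the class name ShiftReduceSketch, the 'n' encoding for num, and the action strings are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Shift-reduce recognizer for S -> S + E | E, E -> num | ( S ), recording its actions.
final class ShiftReduceSketch {
    // If the top of the stack equals rhs, replace it with lhs (one reduce step).
    private static boolean reduce(StringBuilder stack, String rhs, char lhs) {
        int start = stack.length() - rhs.length();
        if (start < 0 || !stack.substring(start).equals(rhs)) return false;
        stack.setLength(start);   // pop the right-hand side
        stack.append(lhs);        // push the non-terminal
        return true;
    }

    static List<String> parse(String src) {
        if (!src.matches("[0-9+()\\s]*")) throw new IllegalArgumentException("unexpected character");
        // Collapse each number to the token 'n' and drop blanks.
        String input = src.replaceAll("[0-9]+", "n").replaceAll("\\s+", "");
        StringBuilder stack = new StringBuilder();
        List<String> actions = new ArrayList<>();
        int pos = 0;
        while (true) {
            // Grammar-specific rule: try reductions in this order, otherwise shift.
            if (reduce(stack, "n", 'E')) actions.add("reduce E -> num");
            else if (reduce(stack, "(S)", 'E')) actions.add("reduce E -> ( S )");
            else if (reduce(stack, "S+E", 'S')) actions.add("reduce S -> S + E");
            else if (reduce(stack, "E", 'S')) actions.add("reduce S -> E");
            else if (pos < input.length()) {
                stack.append(input.charAt(pos++));
                actions.add("shift");
            } else {
                break;
            }
        }
        if (!stack.toString().equals("S")) {
            throw new IllegalArgumentException("syntax error, stack: " + stack);
        }
        return actions;
    }

    public static void main(String[] args) {
        for (String action : parse("(1+2+(3+4))+5")) System.out.println(action);
    }
}
```

Note that S → S + E is tried before S → E: with S + E on the stack, reducing the E alone to S would be a legal reduction but the wrong one, which is an instance of the action selection problem.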

Problem
How do we know which action to take: whether to shift or reduce, and which production?
Issues:
– Sometimes can reduce but shouldn’t
– Sometimes can reduce in different ways

Action Selection Problem
Given stack σ and look-ahead symbol b, should the parser:
– shift b onto the stack (making it σb)
– reduce X → γ, assuming that the stack has the form αγ (making it αX)
If the stack has the form αγ, should apply reduction X → γ (or shift) depending on the stack prefix α
– α is different for different possible reductions, since the γ’s have different lengths.

LR Parsing Engine
Basic mechanism:
– Use a set of parser states
– Use a stack with alternating symbols and states, e.g.: 1 ( 6 S
– Use a parsing table to determine what action to apply (shift/reduce) and to determine the next state
The parser actions can be precisely determined from the table

The LR Parsing Table
Action table: indexed by state and terminal; each entry gives the next action and next state
Goto table: indexed by state and non-terminal; each entry gives the next state
Algorithm: look at the entry for current state S and input terminal C
– If Table[S,C] = s(S’) then shift: push(C), push(S’)
– If Table[S,C] = X → γ then reduce: pop(2*|γ|), S’ = top(), push(X), push(Table[S’,X])

LR Parsing Table Example
State   (        )        id       ,        $        S     L
1       s3                s2                         g4
2       S→id     S→id     S→id     S→id     S→id
3       s3                s2                         g7    g5
4                                           accept
5                s6                s8
6       S→(L)    S→(L)    S→(L)    S→(L)    S→(L)
7       L→S      L→S      L→S      L→S      L→S
8       s3                s2                         g9
9       L→L,S    L→L,S    L→L,S    L→L,S    L→L,S
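A minimal sketch of an engine driven by this table follows. Several things here are assumptions: the grammar S → ( L ) | id, L → S | L , S is read off the reduce entries, the class name LrEngineSketch and the array encoding are illustrative, and the stack holds only states (the symbols are implied by the states), so a reduce pops |γ| entries rather than the 2*|γ| of a stack with alternating symbols and states.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Table-driven LR recognizer for the example table (states 1..9).
final class LrEngineSketch {
    private static final List<String> TERMINALS = Arrays.asList("(", ")", "id", ",", "$");
    // Productions: 0: S -> ( L )   1: S -> id   2: L -> S   3: L -> L , S
    private static final char[] LHS = {'S', 'S', 'L', 'L'};
    private static final int[] RHS_LEN = {3, 1, 1, 3};
    // ACTION[state - 1][terminal]: "sN" = shift to state N, "rK" = reduce by production K.
    private static final String[][] ACTION = {
        /* 1 */ {"s3", null, "s2", null, null},
        /* 2 */ {"r1", "r1", "r1", "r1", "r1"},
        /* 3 */ {"s3", null, "s2", null, null},
        /* 4 */ {null, null, null, null, "acc"},
        /* 5 */ {null, "s6", null, "s8", null},
        /* 6 */ {"r0", "r0", "r0", "r0", "r0"},
        /* 7 */ {"r2", "r2", "r2", "r2", "r2"},
        /* 8 */ {"s3", null, "s2", null, null},
        /* 9 */ {"r3", "r3", "r3", "r3", "r3"},
    };
    // GOTO[state - 1] = {next state on S, next state on L}; 0 means no entry.
    private static final int[][] GOTO = {
        {4, 0}, {0, 0}, {7, 5}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {9, 0}, {0, 0}
    };

    static boolean accepts(List<String> tokens) {
        if (tokens.contains("$")) return false;   // '$' is reserved for end of input
        Deque<Integer> states = new ArrayDeque<>();
        states.push(1);
        int pos = 0;
        while (true) {
            String look = pos < tokens.size() ? tokens.get(pos) : "$";
            int column = TERMINALS.indexOf(look);
            if (column < 0) return false;                    // unknown token
            String action = ACTION[states.peek() - 1][column];
            if (action == null) return false;                // empty cell: syntax error
            if (action.equals("acc")) return true;
            int n = Integer.parseInt(action.substring(1));
            if (action.charAt(0) == 's') {
                states.push(n);                              // shift: consume the token
                pos++;
            } else {
                for (int i = 0; i < RHS_LEN[n]; i++) states.pop();   // pop the right-hand side
                int next = GOTO[states.peek() - 1][LHS[n] == 'S' ? 0 : 1];
                if (next == 0) return false;
                states.push(next);                           // goto on the left-hand side
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList("(", "id", ",", "id", ")")));
    }
}
```

The loop itself contains nothing specific to this grammar; replacing the four tables is enough to parse a different language, which is what parser generators such as yacc and CUP automate.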

LR(k) Grammars
LR(k) = Left-to-right scanning, Right-most derivation, k look-ahead symbols
Main cases: LR(0), LR(1), and some variations (SLR and LALR(1))
Parsers for LR(0) grammars:
– determine the actions without any lookahead symbol

Building LR(0) Parsing Tables
To build the parsing table:
– Define states of the parser
– Build a DFA to describe the transitions between states
– Use the DFA to build the parsing table

Summary for bottom-up parsing
LR(k) grammars
– left-to-right scanning
– rightmost derivation
– can determine whether to shift or reduce from the next k symbols
– can automatically build parsing tables
Shift-reduce parsers
– can be built for LR(k) grammars using automated parser generator tools, e.g. CUP, yacc

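Top-down vs. Bottom-up again
[Figure: two parse trees, labeled Top-down and Bottom-up, each drawn over an input divided into scanned and unscanned parts]
Top-down: LL(k), recursive descent
Bottom-up: LR(k), shift-reduce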