Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table(s) generated automatically from the grammar. (Diagram: a generator builds the tables from the language's grammar; the parser reads the input, driven by the tables and a stack.)

Pushdown Automata A context-free grammar can be recognized by a finite state machine with a stack: a PDA. The PDA is defined by a set of internal states and a transition table. The PDA can read the input and read/write on the stack. The actions of the PDA are determined by its current state, the current top of the stack, and the current input symbol. There are three distinguished states:
– start state: nothing seen
– accept state: sentence complete
– error state: current symbol doesn’t belong

Top-down parsing Parse tree is synthesized from the root (sentence symbol). Stack contains symbols of the rhs of the current production, and pending non-terminals. Automaton is trivial (no need for explicit states). Transition table is indexed by grammar symbol G and input symbol a. Entries in the table are terminals or productions: P → ABC…

Top-down parsing Actions:
– initially, the stack contains the sentence symbol
– at each step, let S be the symbol on top of the stack, and a the next input token
– if T (S, a) is the terminal a, read the token and pop the symbol from the stack
– if T (S, a) is a production P → ABC…, remove S from the stack and push the symbols A, B, C on the stack (A on top)
– if S is the sentence symbol and a is the end of file, accept
– if T (S, a) is undefined, signal an error
Semantic action: when starting a production, build a tree node for the non-terminal and attach it to its parent.
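Below is a minimal Python sketch of this driver loop. The grammar, table and names (TABLE, parse) are illustrative assumptions rather than the lecture's example: a tiny grammar S → ( S ) S | ε keeps the table small, and terminals are matched directly instead of being looked up in the table.

# Sketch of an LL(1) table-driven parser for the hypothetical grammar
#   S -> ( S ) S | epsilon
# Table entries are productions (their right-hand sides); '$' is end of input.
TABLE = {
    ('S', '('): ['(', 'S', ')', 'S'],   # S -> ( S ) S
    ('S', ')'): [],                     # S -> epsilon
    ('S', '$'): [],                     # S -> epsilon
}

def parse(tokens):
    tokens = list(tokens) + ['$']
    stack = ['$', 'S']                  # sentence symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        a = tokens[pos]
        if top == a:                    # terminal (or $) on top: match and advance
            pos += 1
        elif (top, a) in TABLE:         # non-terminal: expand using the table entry
            stack.extend(reversed(TABLE[(top, a)]))   # leftmost symbol ends up on top
        else:
            raise SyntaxError(f"unexpected {a!r} while expecting {top!r}")
    return pos == len(tokens)

print(parse("(()())"))                  # True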

Table-driven parsing and recursive descent parsing Recursive descent: every production is a procedure. Call stack holds active procedures corresponding to pending non-terminals. Stack still needed for context-sensitive legality checks, error messages, etc. Table-driven parser: recursion simulated with explicit stack.

Building the parse table Define two functions on the symbols of the grammar: FIRST and FOLLOW. For a non-terminal N, FIRST (N) is the set of terminal symbols that can start any derivation from N.
– First (If_Statement) = {if}
– First (Expr) = {id, ( }
FOLLOW (N) is the set of terminals that can appear after a string derived from N:
– Follow (Expr) = {+, ), $ }

Computing FIRST (N)
If N → ε, First (N) includes ε.
If N → aABC…, First (N) includes a.
If N → X1X2…, First (N) includes First (X1).
If N → X1X2… and X1 can derive ε, First (N) includes First (X2).
Obvious generalization to First (α) where α is X1X2...

Computing First (N) Grammar for expressions, without left-recursion:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → id | (E)
First (F) = { id, ( }
First (T’) = { *, ε }
First (T) = { id, ( }
First (E’) = { +, ε }
First (E) = { id, ( }
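A small Python sketch of this computation as a fixpoint over the rules above, applied to the expression grammar. The names GRAMMAR, EPS, first_of, compute_first, and E1/T1 standing in for E’/T’, are assumptions of the sketch; the printed sets should agree with the slide.

# Fixpoint computation of First sets for the grammar above; EPS stands for epsilon.
EPS = 'EPS'
GRAMMAR = {
    'E':  [['T', 'E1']],
    'E1': [['+', 'T', 'E1'], [EPS]],
    'T':  [['F', 'T1']],
    'T1': [['*', 'F', 'T1'], [EPS]],
    'F':  [['id'], ['(', 'E', ')']],
}
NONTERMINALS = set(GRAMMAR)

def first_of(symbols, first):
    # First set of a string X1 X2 ... given the current (partial) First table.
    out = set()
    for x in symbols:
        if x not in NONTERMINALS:        # terminal (or EPS): the string starts with it
            out.add(x)
            return out
        out |= first[x] - {EPS}
        if EPS not in first[x]:          # X cannot vanish, so nothing further contributes
            return out
    out.add(EPS)                         # every symbol was nullable (or the string is empty)
    return out

def compute_first():
    first = {n: set() for n in GRAMMAR}
    changed = True
    while changed:                       # iterate until no set grows any more
        changed = False
        for n, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of(rhs, first)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first

print(compute_first())   # e.g. First(E) == {'id', '('}, First(E1) == {'+', 'EPS'}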

Computing Follow (N) Follow (N) is computed from the productions in which N appears on the rhs.
For the sentence symbol S, Follow (S) includes $.
If A → αNβ, Follow (N) includes First (β)
– because an expansion of N will be followed by an expansion from β.
If A → αN, Follow (N) includes Follow (A)
– because N will be expanded in the context in which A is expanded.
If A → αNB and B can derive ε, Follow (N) includes Follow (A).

Computing Follow (N)
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → id | (E)
Follow (E) = { ), $ }
Follow (E’) = { ), $ }
Follow (T) = (First (E’) − {ε}) ∪ Follow (E’) = { +, ), $ }
Follow (T’) = Follow (T) = { +, ), $ }
Follow (F) = (First (T’) − {ε}) ∪ Follow (T’) = { *, +, ), $ }
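Continuing the First sketch (it reuses GRAMMAR, EPS, NONTERMINALS, first_of and compute_first, and assumes E is the sentence symbol), a Follow computation along the same lines should reproduce the sets above.

# Follow sets, continuing the First sketch; 'E' is taken to be the sentence symbol.
def compute_follow(first, start='E'):
    follow = {n: set() for n in GRAMMAR}
    follow[start].add('$')                    # $ follows the sentence symbol
    changed = True
    while changed:
        changed = False
        for a, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, x in enumerate(rhs):
                    if x not in NONTERMINALS:
                        continue              # only non-terminals get Follow sets
                    f_beta = first_of(rhs[i + 1:], first)   # First of what follows x
                    new = f_beta - {EPS}      # A -> ... x beta: add First(beta)
                    if EPS in f_beta:         # beta can vanish: add Follow(A)
                        new |= follow[a]
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow

first = compute_first()
print(compute_follow(first))                  # e.g. Follow('T') == {'+', ')', '$'}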

Building LL (1) parse tables Table indexed by non-terminal and token. Table entry is a production:
for each production P: A → α loop
  for each terminal a in First (α) loop
    T (A, a) := P;
  end loop;
  if ε in First (α) then
    for each terminal b in Follow (A) loop
      T (A, b) := P;
    end loop;
  end if;
end loop;
All other entries are errors. If two assignments conflict, the parse table cannot be built.
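The same construction written in Python, again continuing the earlier sketch (it reuses GRAMMAR, EPS, first_of, compute_first and compute_follow); this is a sketch of the algorithm, not a full parser generator.

# LL(1) table construction, a Python rendering of the pseudocode above.
def build_ll1_table():
    first = compute_first()
    follow = compute_follow(first)
    table = {}
    for a, alternatives in GRAMMAR.items():
        for rhs in alternatives:                      # production P: A -> alpha
            f_alpha = first_of(rhs, first)
            targets = f_alpha - {EPS}                 # an entry for each terminal in First(alpha)
            if EPS in f_alpha:
                targets |= follow[a]                  # plus Follow(A) if alpha can vanish
            for t in targets:
                if (a, t) in table and table[(a, t)] != rhs:
                    raise ValueError(f"conflict at ({a}, {t}): grammar is not LL(1)")
                table[(a, t)] = rhs
    return table

for (a, t), rhs in sorted(build_ll1_table().items()):
    print(f"T({a}, {t}) = {a} -> {' '.join(rhs)}")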

LL (1) grammars If table construction is successful, the grammar is LL (1): left-to-right, leftmost derivation with one-token lookahead. If construction fails, one can conceive of LL (2), etc. Ambiguous grammars are never LL (k). If a terminal is in First for two different productions of A, the grammar cannot be LL (1). Grammars with left-recursion are never LL (k). Some useful constructs are not LL (k).

Bottom-up parsing Synthesize the tree from fragments. The automaton performs two actions:
– shift: push the next symbol on the stack
– reduce: replace the symbols of a recognized rhs on the stack with its non-terminal
The automaton synthesizes (reduces) when the end of a production is recognized. States of the automaton encode the synthesis so far, and the expectation of pending non-terminals. The automaton has a potentially large set of states. The technique is more general than LL (k).

LR (k) parsing Left-to-right scan, rightmost derivation (constructed in reverse), with k-token lookahead. Most general parsing technique for deterministic context-free languages. In general, not practical: tables are too large (10^6 states for C++, Ada). Common subsets: SLR (1), LALR (1).

The states of the LR(0) automaton An item is a point within a production, indicating that part of the production has been recognized:
– A → α.Bβ : we have seen the expansion of α, and expect to see an expansion of B.
A state is a set of items. Transitions between states are determined by terminals and non-terminals. Parsing tables are built from the automaton:
– action: shift / reduce, depending on the next symbol
– goto: change state, depending on the synthesized non-terminal

Building LR (0) states If a state includes an item A → α.Bβ, it also includes every item that is the start of B: B → .X Y Z. Informally: if I expect to see B next, I expect to see anything that B can start with, and so on: X → .G H I. States are built by closure from individual items.

A grammar of expressions: initial state
E’ → E
E → E + T | T; -- left-recursion ok here.
T → T * F | F;
F → id | (E)
S0 = { E’ → .E, E → .E + T, E → .T, F → .id, F → .(E), T → .T * F, T → .F }
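A fresh Python sketch (independent of the LL(1) code earlier) of the closure operation on this grammar; items are (lhs, rhs, dot) triples, and GRAMMAR, closure and E1 (standing in for E’) are names assumed here. Printing the closure of { E’ → .E } reproduces S0.

# LR(0) item closure for the grammar above; an item is a (lhs, rhs, dot) triple.
GRAMMAR = [
    ('E1', ('E',)),
    ('E',  ('E', '+', 'T')), ('E', ('T',)),
    ('T',  ('T', '*', 'F')), ('T', ('F',)),
    ('F',  ('id',)),         ('F', ('(', 'E', ')')),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    items = set(items)
    while True:
        new = set()
        for lhs, rhs, dot in items:
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:       # dot sits before some B
                b = rhs[dot]
                new |= {(l, r, 0) for l, r in GRAMMAR if l == b}  # add every B -> .X Y Z
        if new <= items:
            return items
        items |= new

S0 = closure({('E1', ('E',), 0)})          # initial state: E' -> .E plus what it pulls in
for lhs, rhs, dot in sorted(S0):
    print(lhs, '->', ' '.join(rhs[:dot]) + ' . ' + ' '.join(rhs[dot:]))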

Adding states If a state has an item A → α.aβ, and the next symbol in the input is a, we shift a on the stack and enter a state that contains the item A → αa.β (as well as all other items brought in by closure). If a state has an item A → α., this indicates the end of a production: a reduce action. If a state has an item A → α.Nβ, then after a reduction that finds an N, go to a state with A → αN.β.

The LR (0) states for expressions
S1 = { E’ → E., E → E. + T }
S2 = { E → T., T → T. * F }
S3 = { T → F. }
S4 = { F → (. E) } + S0 (by closure)
S5 = { F → id. }
S6 = { E → E +. T, T → .T * F, T → .F, F → .id, F → .(E) }
S7 = { T → T *. F, F → .id, F → .(E) }
S8 = { F → (E.), E → E. + T }
S9 = { E → E + T., T → T. * F }
S10 = { T → T * F. }
S11 = { F → (E). }
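Continuing the closure sketch, a goto function and a worklist loop construct these states automatically (goto_state and build_states are assumed names; the numbering of the resulting states may differ from S1..S11 depending on exploration order).

# Building all LR(0) states from S0 (continues the closure sketch above).
def goto_state(items, symbol):
    # Move the dot over `symbol` in every item that allows it, then close.
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return frozenset(closure(moved)) if moved else frozenset()

def build_states(start_items):
    states = [frozenset(closure(start_items))]
    symbols = {s for _, rhs in GRAMMAR for s in rhs} | NONTERMINALS
    transitions = {}                       # (state index, symbol) -> state index
    work = [0]
    while work:
        i = work.pop()
        for x in sorted(symbols):
            target = goto_state(states[i], x)
            if not target:
                continue
            if target not in states:
                states.append(target)
                work.append(len(states) - 1)
            transitions[(i, x)] = states.index(target)
    return states, transitions

states, transitions = build_states({('E1', ('E',), 0)})
print(len(states))                         # 12 states, matching S0..S11 on the slide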

Building SLR tables An arc between two states labeled with a terminal is a shift action. An arc between two states labeled with a non-terminal is a goto action. If a state contains an item A → α. (a reduce item), the action is to reduce by this production, for all terminals in Follow (A). If there are shift-reduce conflicts or reduce-reduce conflicts, more elaborate techniques are needed.
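A sketch of this table-filling step, continuing the LR(0) sketch above (it reuses GRAMMAR, NONTERMINALS, states and transitions). The Follow sets are written out by hand here for brevity; they could be computed as in the earlier Follow sketch.

# Filling the SLR(1) tables from the LR(0) automaton (continues the sketch above).
FOLLOW = {'E1': {'$'}, 'E': {'+', ')', '$'},
          'T': {'+', '*', ')', '$'}, 'F': {'+', '*', ')', '$'}}

def build_slr_tables(states, transitions):
    action, goto = {}, {}
    for (i, x), j in transitions.items():
        if x in NONTERMINALS:
            goto[(i, x)] = j                     # arc labeled with a non-terminal: goto
        else:
            action[(i, x)] = ('shift', j)        # arc labeled with a terminal: shift
    for i, state in enumerate(states):
        for lhs, rhs, dot in state:
            if dot != len(rhs):
                continue                         # dot not at the end: not a reduce item
            if lhs == 'E1':
                action[(i, '$')] = ('accept',)   # E' -> E. on end of input
                continue
            for a in FOLLOW[lhs]:                # reduce on every terminal in Follow(lhs)
                if (i, a) in action:
                    raise ValueError(f"SLR conflict in state {i} on {a!r}")
                action[(i, a)] = ('reduce', lhs, rhs)
    return action, goto

action, goto = build_slr_tables(states, transitions)
print(action[(0, 'id')])                         # a shift into the state holding F -> id.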

LR (k) parsing Canonical LR (1): annotate each item with its own follow set: (A → α.β, f). f is a subset of the follow set of A, because it is derived from a single specific production for A. A state that includes (A → α., f) is a reduce state only if the next symbol is in f: fewer reduce actions, fewer conflicts; the technique is more powerful than SLR (1). Generalization: use sequences of k symbols in f. Disadvantage: state explosion: impractical in general, even for LR (1).

LALR (1) Compute lookahead (follow) sets for the items of the LR (0) states. Tables are no bigger than SLR (1). Nearly the power of LR (1) in practice (merging states can reintroduce some conflicts), with slightly worse error diagnostics. Incorporated into yacc, bison, etc.