Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.

Slides:



Advertisements
Similar presentations
Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
Advertisements

YANGYANG 1 Chap 5 LL(1) Parsing LL(1) left-to-right scanning leftmost derivation 1-token lookahead parser generator: Parsing becomes the easiest! Modifying.
Top-Down Parsing.
1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #7 Parsing.
1 Contents Introduction A Simple Compiler Scanning – Theory and Practice Grammars and Parsing LL(1) Parsing LR Parsing Lex and yacc Semantic Processing.
Parsing VI The LR(1) Table Construction. LR(k) items The LR(1) table construction algorithm uses LR(1) items to represent valid configurations of an LR(1)
Parsing III (Eliminating left recursion, recursive descent parsing)
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
Parsing V Introduction to LR(1) Parsers. from Cooper & Torczon2 LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
Professor Yihjia Tsai Tamkang University
Top-Down Parsing.
1 CIS 461 Compiler Design & Construction Fall 2012 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #12 Parsing 4.
– 1 – CSCE 531 Spring 2006 Lecture 7 Predictive Parsing Topics Review Top Down Parsing First Follow LL (1) Table construction Readings: 4.4 Homework: Program.
Compiler Construction Parsing Part I
COP4020 Programming Languages Computing LL(1) parsing table Prof. Xin Yuan.
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing IV Bottom-up Parsing. Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves.
LR(1) Parsers Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.
Syntax and Semantics Structure of programming languages.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
Top-Down Parsing - recursive descent - predictive parsing
4 4 (c) parsing. Parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings.
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
Introduction to Parsing Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 5 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Syntax and Semantics Structure of programming languages.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Exercise 1 A ::= B EOF B ::=  | B B | (B) Tokens: EOF, (, ) Generate constraints and compute nullable and first for this grammar. Check whether first.
Parsing III (Top-down parsing: recursive descent & LL(1) ) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Bottom-up Parsing, Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Parsing V LR(1) Parsers C OMP 412 Rice University Houston, Texas Fall 2001 Copyright 2000, Keith D. Cooper, Ken Kennedy, & Linda Torczon, all rights reserved.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
1 Nonrecursive Predictive Parsing  It is possible to build a nonrecursive predictive parser  This is done by maintaining an explicit stack.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Top-Down Parsing.
Top-Down Predictive Parsing We will look at two different ways to implement a non- backtracking top-down parser called a predictive parser. A predictive.
Parsing methods: –Top-down parsing –Bottom-up parsing –Universal.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
Bernd Fischer RW713: Compiler and Software Language Engineering.
Parsing V LR(1) Parsers N.B.: This lecture uses a left-recursive version of the SheepNoise grammar. The book uses a right-recursive version. The derivations.
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
Parsing III (Top-down parsing: recursive descent & LL(1) )
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Parsing V LR(1) Parsers. LR(1) Parsers LR(1) parsers are table-driven, shift-reduce parsers that use a limited right context (1 token) for handle recognition.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing III (Top-down parsing: recursive descent & LL(1) )
Programming Languages Translator
Parsing — Part II (Top-down parsing, left-recursion removal)
Parsing VI The LR(1) Table Construction
Parsing Techniques.
Top-Down Parsing CS 671 January 29, 2008.
Lecture 8 Bottom Up Parsing
Lecture 8: Top-Down Parsing
Introduction to Parsing
Introduction to Parsing
Parsing — Part II (Top-down parsing, left-recursion removal)
Nonrecursive Predictive Parsing
Compiler Construction
Presentation transcript:

Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved. COMP 412 FALL 2010 The lecture is self consistent, but the order of the three productions for Factor in the expression grammar is different than in 2e

Comp 412, Fall Roadmap (Where are we?) We set out to study parsing Specifying syntax —Context-free grammars  Top-down parsers —Algorithm & its problem with left recursion  —Ambiguity  —Left-recursion removal  Predictive top-down parsing —The LL(1) condition  —Simple recursive descent parsers today —First and Follow sets today —Table-driven LL(1) parsers today

Comp 412, Fall Predictive Parsing Basic idea Given A    , the parser should be able to choose between  &  F IRST sets For some rhs   G, define F IRST (  ) as the set of tokens that appear as the first symbol in some string that derives from  That is, x  F IRST (  ) iff   * x , for some  The LL(1) Property If A   and A   both appear in the grammar, we would like F IRST (  )  F IRST (  ) =  This would allow the parser to make a correct choice with a lookahead of exactly one symbol ! This is almost correct See the next slide

Comp 412, Fall Predictive Parsing What about  -productions?  They complicate the definition of LL(1) If A   and A   and   F IRST (  ), then we need to ensure that F IRST (  ) is disjoint from F OLLOW (A), too, where F OLLOW (A) = the set of terminal symbols that can immediately follow A in a sentential form Define F IRST + (A  ) as F IRST (  )  F OLLOW (A), if   F IRST (  ) F IRST (  ), otherwise Then, a grammar is LL(1) iff A   and A   implies F IRST + (A  )  F IRST + (A  ) =  The intuition is straightforward. If A expands with 2 right hand sides, those rhs’ must have distinct first symbols. We only need F IRST + sets because of  -productions.

Comp 412, Fall What If My Grammar Is Not LL(1) ? Can we transform a non-LL(1) grammar into an LL(1) gramar? In general, the answer is no In some cases, however, the answer is yes Assume a grammar G with productions A    1 and A    2 If  derives anything other than , then F IRST + (A    1 )  F IRST + (A    2 ) ≠  And the grammar is not LL(1) If we pull the common prefix, , into a separate production, we may make the grammar LL(1). A   A’, A’   1 and A’   2 Now, if F IRST + (A’   1 )  F IRST + (A’   2 ) = , G may be LL(1)

Comp 412, Fall What If My Grammar Is Not LL(1) ? Left Factoring This transformation makes some grammars into LL(1) grammars There are languages for which no LL(1) grammar exists For each nonterminal A find the longest prefix  common to 2 or more alternatives for A if  ≠  then replace all of the productions A    1 |   2 |   3 | … |   n | γ with A   A’ | γ A’   1 |  2 |  3 | … |  n Repeat until no nonterminal has alternative rhs’ with a common prefix

Comp 412, Fall Left Factoring Example Consider a simple right-recursive expression grammar 0Goal  Expr 1  Term + Expr 2|Term - Expr 3|Term 4  Factor * Term 5|Factor / Term 6|Factor 7  number 8|id

Comp 412, Fall Left Factoring Example After Left Factoring, we have 0Goal  Expr 1  Term Expr’ 2Expr’  + Expr 3|- Expr 4|  5Term  Factor Term’ 6Term’  * Term 7|/ Term 8|  9Factor  number 10|id This transformation makes some grammars into LL(1) grammars. There are languages for which no LL(1) grammar exists.

Comp 412, Fall F IRST and F OLLOW Sets F IRST (  ) For some   ( T  NT )*, define F IRST (  ) as the set of tokens that appear as the first symbol in some string that derives from  That is, x  F IRST (  ) iff   * x , for some  F OLLOW (A) For some A  NT, define F OLLOW (A) as the set of symbols that can occur immediately after A in a valid sentential form F OLLOW (S) = { EOF }, where S is the start symbol To build F OLLOW sets, we need F IRST sets …

Comp 412, Fall Computing F IRST Sets for each x  T, FIRST(x)  { x } for each A  NT, FIRST(A)  Ø while (FIRST sets are still changing) do for each p  P, of the form A  do if  is B 1 B 2 …B k then begin; rhs  FIRST(B 1 ) – {  } for i  1 to k–1 by 1 while   FIRST(B i ) do rhs  rhs  ( FIRST(B i+1 ) – {  } ) end // for loop end // if-then if i = k and   FIRST(B k ) then rhs  rhs  {  } FIRST(A)  FIRST(A)  rhs end // for loop end // while loop Outer loop is monotone increasing for FIRST sets  | T  NT   | is bounded, so it terminates Inner loop is bounded by the length of the productions in the grammar For SheepNoise: F IRST (Goal) = { baa } F IRST (SN ) = { baa } F IRST (baa) = { baa }

Comp 412, Fall Computing F IRST Sets Outer loop is monotone increasing for FIRST sets  | T  NT   | is bounded, so it terminates Inner loop is bounded by the length of the productions in the grammar For SheepNoise: F IRST (Goal) = { baa } F IRST (SN ) = { baa } F IRST (baa) = { baa } For SN  baa for each x  T, FIRST(x)  { x } for each A  NT, FIRST(A)  Ø while (FIRST sets are still changing) do for each p  P, of the form A  do if  is B 1 B 2 …B k then begin; rhs  FIRST(B 1 ) – {  } for i  1 to k–1 by 1 while   FIRST(B i ) do rhs  rhs  ( FIRST(B i+1 ) – {  } ) end // for loop end // if-then if i = k and   FIRST(B k ) then rhs  rhs  {  } FIRST(A)  FIRST(A)  rhs end // for loop end // while loop

Comp 412, Fall Computing F IRST Sets Outer loop is monotone increasing for FIRST sets  | T  NT   | is bounded, so it terminates Inner loop is bounded by the length of the productions in the grammar For SheepNoise: F IRST (Goal) = { baa } F IRST (SN ) = { baa } F IRST (baa) = { baa } For Goal  SN for each x  T, FIRST(x)  { x } for each A  NT, FIRST(A)  Ø while (FIRST sets are still changing) do for each p  P, of the form A  do if  is B 1 B 2 …B k then begin; rhs  FIRST(B 1 ) – {  } for i  1 to k–1 by 1 while   FIRST(B i ) do rhs  rhs  ( FIRST(B i+1 ) – {  } ) end // for loop end // if-then if i = k and   FIRST(B k ) then rhs  rhs  {  } FIRST(A)  FIRST(A)  rhs end // for loop end // while loop

Comp 412, Fall Computing F OLLOW Sets for each A  NT, FOLLOW(A)  Ø FOLLOW(S)  {EOF } while (FOLLOW sets are still changing) for each p  P, of the form A  B 1 B 2 … B k TRAILER  FOLLOW(A) for i  k down to 1 if B i  NT then // domain check FOLLOW(B i )  FOLLOW(B i )  TRAILER if   FIRST(B i ) // add right context then TRAILER  TRAILER  ( FIRST(B i ) – {  } ) else TRAILER  FIRST(B i ) // no  => no right context else TRAILER  {B i } // B i  T => only 1 symbol

Comp 412, Fall Classic Expression Grammar SymbolFIRSTFOLLOW num Ø id Ø ++Ø --Ø **Ø //Ø ((Ø ))Ø eof Ø  Ø Goal(,id,numeof Expr(,id,num), eof Expr’ +, -,  ), eof Term(,id,num+, -, ), eof Term’ *, /,  +,-, ), eof Factor(,id,num+,-,*,/,),eof FIRST + (A  ) is identical to FIRST(  ) except for productiond 4 and 8 FIRST + (Expr’   ) is { ,), eof} FIRST + (Term’   ) is { ,+,-, ), eof} 0Goal  Expr 1  Term Expr’ 2Expr’  + Term Expr’ 3|- Term Expr’ 4|  5Term  Factor Term’ 6Term’  * Factor Term’ 7|/ Factor Term’ 8|  9Factor  number 10|id 11|( Expr )

Comp 412, Fall Classic Expression Grammar Prod’nFIRST+ 0 (,id,num ,), eof 5 (,id,num 6 * 7 / 8 ,+,-, ), eof 9 number 10 id 11 ( 0Goal  Expr 1  Term Expr’ 2Expr’  + Term Expr’ 3|- Term Expr’ 4|  5Term  Factor Term’ 6Term’  * Factor Term’ 7|/ Factor Term’ 8|  9Factor  number 10|id 11|( Expr )

Comp 412, Fall Building Top-down Parsers for LL(1) Grammars Given an LL(1) grammar, and its F IRST & F OLLOW sets … Emit a routine for each non-terminal —Nest of if-then-else statements to check alternate rhs’s —Each returns true on success and throws an error on false —Simple, working (perhaps ugly) code This automatically constructs a recursive-descent parser Improving matters Nest of if-then-else statements may be slow —Good case statement implementation would be better What about a table to encode the options? —Interpret the table with a skeleton, as we did in scanning I don’t know of a system that does this …

Strategy Encode knowledge in a table Use a standard “skeleton” parser to interpret the table Example The non-terminal Factor has 3 expansions —( Expr ) or Identifier or Number Table might look like: Comp 412, Fall Goal  Expr 1  Term Expr’ 2Expr’  + Term Expr’ 3|- Term Expr’ 4|  5Term  Factor Term’ 6Term’  * Factor Term’ 7|/ Factor Term’ 8|  9Factor  number 10|id 11|( Expr ) Building Top-down Parsers +-*/Id.Num.()EOF Factor————10911—— Terminal Symbols Non- terminal Symbols Expand Factor by rule 9 with input “number” Cannot expand Factor into an operator  error

Comp 412, Fall Building Top-down Parsers Building the complete table Need a row for every NT & a column for every T

Comp 412, Fall LL(1) Expression Parsing Table +–*/IdNum()EOF Goal————000—— Expr————111—— Expr’23—————44 Term————555—— Term’8867———88 Factor————10911—— Table differs from 2e because order of productions in Factor differs. Row we built earlier

Comp 412, Fall Building Top-down Parsers Building the complete table Need a row for every NT & a column for every T Need an interpreter for the table (skeleton parser)

Comp 412, Fall LL(1) Skeleton Parser word  NextWord() // Initial conditions, including push EOF onto Stack // a stack to track local goals push the start symbol, S, onto Stack TOS  top of Stack loop forever if TOS = EOF and word = EOF then break & report success // exit on success else if TOS is a terminal then if TOS matches word then pop Stack // recognized TOS word  NextWord() else report error looking for TOS // error exit else // TOS is a non-terminal if TABLE[TOS,word] is A  B 1 B 2 …B k then pop Stack // get rid of A push B k, B k-1, …, B 1 // in that order else break & report error expanding TOS TOS  top of Stack

Comp 412, Fall Building Top-down Parsers Building the complete table Need a row for every NT & a column for every T Need a table-driven interpreter for the table Need an algorithm to build the table Filling in TABLE[X,y], X  NT, y  T 1. entry is the rule X  , if y  F IRST + (X   ) 2. entry is error if rule 1 does not define If any entry has more than one rule, G is not LL(1) We call this algorithm the LL(1) table construction algorithm

Comp 412, Fall Grammar and Sets for the LL(1) Construction 0Goal  Expr 1  Term Expr’ 2Expr’  + Term Expr’ 3|- Term Expr’ 4|  5Term  Factor Term’ 6Term’  * Factor Term’ 7|/ Factor Term’ 8|  9Factor  number 10|id 11|( Expr ) Prod’n FIRST+ 0 (,id,num ,), eof 5 (,id,num 6 * 7 / 8 ,+,-, ), eof 9 number 10 id 11 ( Right-recursive variant of the classic expression grammar Augmented FIRST sets for the grammar