1 CMPSC 160 Translation of Programming Languages Fall 2002 slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon Lecture-Module #7 Parsing.

2 Announcements
Programming assignment 2 will be on the class webpage, due in two weeks, October 31, Thursday
– In this assignment you will work in teams of two. Please find a partner.
– Start the project early; don't leave it to the last weekend!
Homework 2 will be due
Read chapter 4
Midterm will be in two weeks, November 5, Tuesday
– in class
– closed books, closed notes

3 Parsing Techniques
Top-down parsers (LL(1), recursive descent)
– Start at the root of the parse tree (the start symbol) and grow toward the leaves (similar to a derivation)
– Pick a production and try to match the input
– Bad "pick" ⇒ may need to backtrack
– Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers (LR(1), operator precedence)
– Start at the leaves and grow toward the root
– We can think of the process as reducing the input string to the start symbol
– At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left side of the production
– Bottom-up parsers handle a large class of grammars

4 Eliminating Immediate Left Recursion
To remove left recursion, we can transform the grammar.
Consider a grammar fragment of the form
    A → A α
      | β
where α and β are strings of terminal and nonterminal symbols and neither α nor β starts with A.
We can rewrite this as
    A → β R
    R → α R
      | ε
where R is a new nonterminal.
This accepts the same language, but uses only right recursion.
(Figure: the left-recursive parse tree built from A, α and β, beside the equivalent right-recursive tree built from A, β, R, α and ε.)
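A minimal C sketch of this transformation for one nonterminal, assuming every grammar symbol is a single character and that the empty string "" stands for ε. The function name `remove_left_recursion`, the limits `MAX_ALTS` and `MAX_LEN`, and the caller-chosen name of the new nonterminal are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ALTS 8   /* hypothetical limit on alternatives per nonterminal */
#define MAX_LEN 16   /* hypothetical limit on symbols per alternative */

/* Rewrites A -> A alpha_1 | ... | beta_1 | ... into
 *   A -> beta_j R          (written to a_out)
 *   R -> alpha_i R | ""    (written to r_out; "" stands for epsilon).
 * Symbols are single characters; r_out needs room for n + 1 rows. */
static void remove_left_recursion(char a, char r, const char *alts[], int n,
                                  char a_out[][MAX_LEN], int *na,
                                  char r_out[][MAX_LEN], int *nr) {
    assert(n <= MAX_ALTS);
    *na = *nr = 0;
    for (int i = 0; i < n; i++) {
        if (alts[i][0] == a)   /* A -> A alpha  becomes  R -> alpha R */
            snprintf(r_out[(*nr)++], MAX_LEN, "%s%c", alts[i] + 1, r);
        else                   /* A -> beta  becomes  A -> beta R */
            snprintf(a_out[(*na)++], MAX_LEN, "%s%c", alts[i], r);
    }
    r_out[(*nr)++][0] = '\0';  /* R -> epsilon */
}
```

Called with a = 'E', r = 'X' and the alternatives "E+T", "E-T", "T", it produces E → TX and X → +TX | -TX | ε, the right-recursive Expr/Expr' pair of the next slide with X in place of Expr'.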

5 Left-Recursive and Right-Recursive Expression Grammar
Left-recursive:
1  S      → Expr
2  Expr   → Expr + Term
3         | Expr - Term
4         | Term
5  Term   → Term * Factor
6         | Term / Factor
7         | Factor
8  Factor → num
9         | id
Right-recursive:
1  S      → Expr
2  Expr   → Term Expr'
3  Expr'  → + Term Expr'
4         | - Term Expr'
5         | ε
6  Term   → Factor Term'
7  Term'  → * Factor Term'
8         | / Factor Term'
9         | ε
10 Factor → num
11        | id

6 Predictive Parsing
Basic idea
– Given A → α | β, the parser should be able to choose between α and β
FIRST sets
– For a string of grammar symbols α, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α
– That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ
The LL(1) Property
– If A → α and A → β both appear in the grammar, we would like FIRST(α) ∩ FIRST(β) = ∅
– This would allow the parser to make a correct choice with a lookahead of exactly one symbol!
(Pursuing this idea leads to LL(1) parser generators...)
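The LL(1) property can be checked mechanically once FIRST sets are known. A small C sketch, assuming each FIRST set is encoded as a bitmask over terminal codes (bit t on means terminal t is in the set); the encoding and the name `first_sets_disjoint` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if no terminal appears in two of the n FIRST sets,
   i.e. one lookahead token always selects at most one alternative. */
static bool first_sets_disjoint(const unsigned first_alts[], int n) {
    assert(n >= 0);
    unsigned seen = 0;                           /* union of sets checked so far */
    for (int i = 0; i < n; i++) {
        if (first_alts[i] & seen) return false;  /* a token predicts two alternatives */
        seen |= first_alts[i];
    }
    return true;
}
```

For Expr' → + Term Expr' | - Term Expr', the sets {+} and {-} share no bit, so the check succeeds.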

7 Recursive Descent Parsing
Recursive-descent parsing
– A top-down parsing method
– The term descent refers to the direction in which the parse tree is traversed (or built)
Use a set of mutually recursive procedures (one procedure for each nonterminal symbol)
– Start the parsing process by calling the procedure that corresponds to the start symbol
– Each production becomes one clause in the procedure for its left-hand side
We consider a special type of recursive-descent parsing called predictive parsing
– Use a lookahead symbol to decide which production to use

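8 Recursive Descent Parsing: Expression Grammar
    /* Token codes: the slide gives PLUS=1, MINUS=2, ...; the remaining values are placeholders */
    enum { PLUS = 1, MINUS = 2, TIMES, DIV, ID, NUMBER, EOF };
    int lookahead;                 /* current lookahead token */
    int getNextToken(void);        /* provided by the scanner */
    void error(void);              /* reports a syntax error */
    void S(void), Expr(void), ExprPrime(void), Term(void), TermPrime(void), Factor(void);
    void match(int token) {
        if (lookahead == token) lookahead = getNextToken();
        else error();
    }
    int main(void) {
        lookahead = getNextToken();
        S();
        match(EOF);
        return 0;
    }
    void S(void) { Expr(); }                     /* S -> Expr */
    void Expr(void) { Term(); ExprPrime(); }     /* Expr -> Term Expr' */
    void ExprPrime(void) {
        switch (lookahead) {
        case PLUS:  match(PLUS);  Term(); ExprPrime(); break;
        case MINUS: match(MINUS); Term(); ExprPrime(); break;
        default: return;                         /* Expr' -> epsilon */
        }
    }
    void Term(void) { Factor(); TermPrime(); }   /* Term -> Factor Term' */
    void TermPrime(void) {
        switch (lookahead) {
        case TIMES: match(TIMES); Factor(); TermPrime(); break;
        case DIV:   match(DIV);   Factor(); TermPrime(); break;
        default: return;                         /* Term' -> epsilon */
        }
    }
    void Factor(void) {
        switch (lookahead) {
        case ID:     match(ID);     break;
        case NUMBER: match(NUMBER); break;
        default: error();
        }
    }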

9 Recursive Descent Parsing: Another Grammar
1 S → if E then S else S
2   | begin S L
3   | print E
4 L → end
5   | ; S L
6 E → num = num
    /* match(), lookahead, getNextToken() and error() as on slide 8; token codes IF, THEN, ... declared similarly */
    void E(void); void L(void);
    void S(void) {
        switch (lookahead) {
        case IF:    match(IF); E(); match(THEN); S(); match(ELSE); S(); break;
        case BEGIN: match(BEGIN); S(); L(); break;
        case PRINT: match(PRINT); E(); break;
        default: error();
        }
    }
    void E(void) { match(NUM); match(EQ); match(NUM); }
    void L(void) {
        switch (lookahead) {
        case END:  match(END); break;
        case SEMI: match(SEMI); S(); L(); break;
        default: error();
        }
    }
    int main(void) {
        lookahead = getNextToken();
        S();
        match(EOF);
        return 0;
    }

10 Example Execution For Input: if 2=2 then print 5=5 else print 1=1
main: call S();
S1: find the production for (S, IF): S → if E then S else S
S1: match(IF);
S1: call E();
E1: find the production for (E, NUM): E → num = num
E1: match(NUM); match(EQ); match(NUM);
E1: return from E1 to S1
S1: match(THEN);
S1: call S();
S2: find the production for (S, PRINT): S → print E
S2: match(PRINT);
S2: call E();
E2: find the production for (E, NUM): E → num = num
E2: match(NUM); match(EQ); match(NUM);
E2: return from E2 to S2
S2: return from S2 to S1
S1: match(ELSE);
S1: call S();
S3: find the production for (S, PRINT): S → print E
S3: match(PRINT);
S3: call E();
E3: find the production for (E, NUM): E → num = num
E3: match(NUM); match(EQ); match(NUM);
E3: return from E3 to S3
S3: return from S3 to S1
S1: return from S1 to main
main: match(EOF); return success;
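The trace above can be reproduced with a self-contained version of the slide 9 parser that reads from an array of tokens instead of calling a scanner. This is a sketch under stated assumptions: the token codes, `EOF_TOK` and `parse` are hypothetical, and `error()` only records failure in a flag (instead of aborting) so that the caller gets a yes/no answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token codes for the slide 9 grammar. */
typedef enum { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, EOF_TOK } Token;

static const Token *input;   /* token array terminated by EOF_TOK */
static size_t pos;           /* index of the lookahead token */
static Token lookahead;
static bool failed;          /* set by error(); later calls then do nothing */

static void error(void) { failed = true; }

static void match(Token t) {
    if (failed || lookahead != t) { error(); return; }
    if (t != EOF_TOK) lookahead = input[++pos];   /* never read past EOF_TOK */
}

static void S(void);

static void E(void) { match(NUM); match(EQ); match(NUM); }  /* E -> num = num */

static void L(void) {
    if (failed) return;
    switch (lookahead) {
    case END:  match(END); break;              /* L -> end */
    case SEMI: match(SEMI); S(); L(); break;   /* L -> ; S L */
    default:   error();
    }
}

static void S(void) {
    if (failed) return;
    switch (lookahead) {
    case IF:    match(IF); E(); match(THEN); S(); match(ELSE); S(); break;
    case BEGIN: match(BEGIN); S(); L(); break;
    case PRINT: match(PRINT); E(); break;
    default:    error();
    }
}

/* Returns true when the tokens form a sentence of the grammar. */
static bool parse(const Token *tokens) {
    assert(tokens != NULL);
    input = tokens;
    pos = 0;
    failed = false;
    lookahead = input[0];
    S();
    match(EOF_TOK);
    return !failed;
}
```

The early `if (failed) return;` checks keep the recursion from looping after an error, because every recursive call on a successful path is preceded by consuming at least one token.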

11 Another Approach: Stack-Based Table-Driven Parsing
The parsing table
– A two-dimensional array M[A, a] gives a production
– A: a nonterminal symbol
– a: a terminal symbol
What does it mean?
– If the top of the stack is A and the lookahead symbol is a, then we apply the production M[A, a]
    IF                       BEGIN           PRINT         END       SEMI        NUM
S   S → if E then S else S   S → begin S L   S → print E
L                                                          L → end   L → ; S L
E                                                                                E → num = num

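12 Table-driven Parsers
A table-driven parser looks like:
(Figure: source code → Scanner → Table-driven Parser, which uses a Stack and produces IR; a Parser Generator builds the Parsing Table from the grammar, and the parser consults that table.)
Parsing tables can be built automatically!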

13 Table-Driven Predictive Parsing Algorithm
Push the end-of-file symbol ($) and the start symbol onto the stack.
Repeatedly consider the symbol X on the top of the stack and the lookahead symbol a:
– If X = a = $, announce a successful parse and halt
– If X = a ≠ $, pop X off the stack and advance the input pointer to the next input symbol
– If X is a nonterminal, look at the production M[X, a]
   – If there is no such production (M[X, a] = error), then call an error routine
   – If M[X, a] is a production X → Y1 Y2 ... Yk, then pop X and push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top
– If none of the cases above apply, then call an error routine

14 Table-Driven Predictive Parsing Algorithm
    Push($);   // $ is the end-of-file symbol
    Push(S);   // S is the start symbol of the grammar
    lookahead = get_next_token();
    repeat
        X = top_of_stack();
        if (X is a terminal or X == $) then
            if (X == lookahead) then
                pop(X);
                lookahead = get_next_token();
            else error();
        else // X is a nonterminal
            if (M[X, lookahead] == X → Y1 Y2 ... Yk) then
                pop(X);
                push(Yk); push(Yk-1); ... push(Y1);
            else error();
    until (X == $)
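A self-contained C sketch of this algorithm for the slide 9 grammar, with the parse table of slide 11 written out by hand using C99 designated initializers. The symbol codes (`IF` ... `DOLLAR`, `NT_S`, `NT_L`, `NT_E`), the production numbering, and the names `ll1_parse` and `MAXSTACK` are hypothetical; a generated table would replace `M`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical symbol codes: terminals (DOLLAR is $) before nonterminals. */
enum { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, DOLLAR,
       NT_S, NT_L, NT_E, NSYM };
#define NTERMINALS NT_S   /* codes below this are terminals */
#define MAXSTACK 64

typedef struct { int len; int rhs[6]; } Prod;

static const Prod prods[] = {
    {0, {0}},                                 /* 0: error entry */
    {6, {IF, NT_E, THEN, NT_S, ELSE, NT_S}},  /* 1: S -> if E then S else S */
    {3, {BEGIN, NT_S, NT_L}},                 /* 2: S -> begin S L */
    {2, {PRINT, NT_E}},                       /* 3: S -> print E */
    {1, {END}},                               /* 4: L -> end */
    {3, {SEMI, NT_S, NT_L}},                  /* 5: L -> ; S L */
    {3, {NUM, EQ, NUM}},                      /* 6: E -> num = num */
};

/* M[A][a] = production number; 0 (the default) means error. */
static const int M[NSYM - NTERMINALS][NTERMINALS] = {
    [NT_S - NTERMINALS] = {[IF] = 1, [BEGIN] = 2, [PRINT] = 3},
    [NT_L - NTERMINALS] = {[END] = 4, [SEMI] = 5},
    [NT_E - NTERMINALS] = {[NUM] = 6},
};

/* tokens must be terminal codes ending with DOLLAR; true on success. */
static bool ll1_parse(const int *tokens) {
    int stack[MAXSTACK];
    int sp = 0;
    stack[sp++] = DOLLAR;                    /* push $ */
    stack[sp++] = NT_S;                      /* push the start symbol */
    int lookahead = *tokens++;
    for (;;) {
        int X = stack[sp - 1];               /* symbol on top of the stack */
        if (X < NTERMINALS) {                /* X is a terminal or $ */
            if (X != lookahead) return false;
            if (X == DOLLAR) return true;    /* X = a = $: success */
            sp--;                            /* X = a != $: pop, advance input */
            lookahead = *tokens++;
        } else {                             /* X is a nonterminal */
            int p = M[X - NTERMINALS][lookahead];
            if (p == 0) return false;        /* M[X, a] = error */
            sp--;                            /* pop X, push Yk ... Y1 (Y1 on top) */
            for (int i = prods[p].len - 1; i >= 0; i--) {
                assert(sp < MAXSTACK);
                stack[sp++] = prods[p].rhs[i];
            }
        }
    }
}
```

Fed the codes for `if 2=2 then print 5=5 else print 1=1 $`, this loop should pass through the same stack contents as the trace on slide 16.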

15 Recursive Descent Parser On: if 2=2 then print 5=5 else print 1=1
(The call trace is identical to the one on slide 10; it is repeated here for comparison with the table-driven parser on the next slide.)

16 Table Driven Parser On: if 2=2 then print 5=5 else print 1=1$
Stack (top at right)        Lookahead   Parse-table lookup
$S                          IF          M[S,IF]: S → if E then S else S
$S,ELSE,S,THEN,E,IF         IF
$S,ELSE,S,THEN,E            NUM         M[E,NUM]: E → num = num
$S,ELSE,S,THEN,NUM,EQ,NUM   NUM
$S,ELSE,S,THEN,NUM,EQ       EQ
$S,ELSE,S,THEN,NUM          NUM
$S,ELSE,S,THEN              THEN
$S,ELSE,S                   PRINT       M[S,PRINT]: S → print E
$S,ELSE,E,PRINT             PRINT
$S,ELSE,E                   NUM         M[E,NUM]: E → num = num
$S,ELSE,NUM,EQ,NUM          NUM
$S,ELSE,NUM,EQ              EQ
$S,ELSE,NUM                 NUM
$S,ELSE                     ELSE
$S                          PRINT       M[S,PRINT]: S → print E
$E,PRINT                    PRINT
$E                          NUM         M[E,NUM]: E → num = num
$NUM,EQ,NUM                 NUM
$NUM,EQ                     EQ
$NUM                        NUM
$                           $           report success!

17 How to Build Parse Tables? FIRST Sets
For a string of grammar symbols α, define FIRST(α) as
– the set of tokens that appear as the first symbol in some string that derives from α
– if α ⇒* ε, then ε is in FIRST(α)
To construct FIRST(X) for a grammar symbol X, apply the following rules until no more symbols can be added to FIRST(X):
– If X is a terminal, FIRST(X) is {X}
– If X → ε is a production, then ε is in FIRST(X)
– If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then put every symbol in FIRST(Y1) other than ε in FIRST(X)
– If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then put terminal a in FIRST(X) if a is in FIRST(Yi) and ε is in FIRST(Yj) for all 1 ≤ j < i
– If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then put ε in FIRST(X) if ε is in FIRST(Yi) for all 1 ≤ i ≤ k
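The rules above describe a fixed-point computation. A C sketch for the right-recursive expression grammar of slide 5, assuming each FIRST set is stored as a bitmask over terminal codes with one extra bit for ε; the symbol codes and the names `compute_first`, `BIT` and `EPS` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical symbol codes: terminals 0..6 (END plays the role of $),
   then the nonterminals of the right-recursive expression grammar. */
enum { NUM, ID, PLUS, MINUS, TIMES, DIV, END, NTERM,
       S = NTERM, EXPR, EXPRP, TERM, TERMP, FACTOR, NSYM };

#define BIT(t) (1u << (t))
#define EPS BIT(NTERM)          /* one extra bit stands for epsilon */

typedef struct { int lhs; int len; int rhs[3]; } Prod;

static const Prod G[] = {       /* productions 1-11 */
    {S, 1, {EXPR}},
    {EXPR, 2, {TERM, EXPRP}},
    {EXPRP, 3, {PLUS, TERM, EXPRP}},
    {EXPRP, 3, {MINUS, TERM, EXPRP}},
    {EXPRP, 0, {0}},            /* Expr' -> epsilon */
    {TERM, 2, {FACTOR, TERMP}},
    {TERMP, 3, {TIMES, FACTOR, TERMP}},
    {TERMP, 3, {DIV, FACTOR, TERMP}},
    {TERMP, 0, {0}},            /* Term' -> epsilon */
    {FACTOR, 1, {NUM}},
    {FACTOR, 1, {ID}},
};
#define NPROD ((int)(sizeof G / sizeof G[0]))

static unsigned first[NSYM];

/* Apply the FIRST rules repeatedly until no set grows. */
static void compute_first(void) {
    assert(NTERM < 32);         /* terminals plus epsilon must fit in 32 bits */
    for (int x = 0; x < NSYM; x++)
        first[x] = x < NTERM ? BIT(x) : 0;    /* FIRST(a) = {a} for terminals */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned add = 0;
            int i;
            for (i = 0; i < G[p].len; i++) {
                unsigned f = first[G[p].rhs[i]];
                add |= f & ~EPS;             /* FIRST(Yi) minus epsilon */
                if (!(f & EPS)) break;       /* Yi not nullable: stop here */
            }
            if (i == G[p].len) add |= EPS;   /* every Yi nullable, or X -> epsilon */
            unsigned before = first[G[p].lhs];
            first[G[p].lhs] |= add;
            if (first[G[p].lhs] != before) changed = true;
        }
    }
}
```

After `compute_first()` returns, the nonterminal rows should match the FIRST table on slide 19.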

18 Computing FIRST Sets for Strings of Symbols
To construct the FIRST set for any string of grammar symbols X1 X2 ... Xn (given the FIRST sets for the symbols X1, X2, ..., Xn), apply the following rules. FIRST(X1 X2 ... Xn) contains:
– any symbol in FIRST(X1) other than ε
– any symbol in FIRST(Xi) other than ε, if ε is in FIRST(Xj) for all 1 ≤ j < i
– ε, if ε is in FIRST(Xi) for all 1 ≤ i ≤ n
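These three rules fit in one short loop. A C sketch, assuming the FIRST sets of the individual symbols are already available as bitmasks and that a reserved bit `EPS` (a hypothetical encoding) marks ε; `first_of_string` is a hypothetical name.

```c
#include <assert.h>

#define EPS 0x80000000u   /* hypothetical: top bit marks epsilon */

/* first_of[x] is FIRST(x) as a bitmask over terminals (plus EPS if x is
   nullable). Returns FIRST(X1 X2 ... Xn) for the symbols xs[0..n-1]. */
static unsigned first_of_string(const unsigned first_of[], const int xs[], int n) {
    assert(n >= 0);
    unsigned result = 0;
    for (int i = 0; i < n; i++) {
        unsigned f = first_of[xs[i]];
        result |= f & ~EPS;             /* FIRST(Xi) other than epsilon */
        if (!(f & EPS)) return result;  /* Xi is not nullable: stop */
    }
    return result | EPS;                /* every Xi nullable (or n == 0) */
}
```

With a terminal a, a nullable B whose FIRST is {b, ε}, and a terminal c, FIRST(B c) is {b, c}, FIRST(B) is {b, ε}, and FIRST(a B) is {a}.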

19 FIRST Sets
Grammar:
1  S      → Expr
2  Expr   → Term Expr'
3  Expr'  → + Term Expr'
4         | - Term Expr'
5         | ε
6  Term   → Factor Term'
7  Term'  → * Factor Term'
8         | / Factor Term'
9         | ε
10 Factor → num
11        | id
Symbol   FIRST
S        {num, id}
Expr     {num, id}
Expr'    {ε, +, -}
Term     {num, id}
Term'    {ε, *, /}
Factor   {num, id}
num      {num}
id       {id}
+        {+}
-        {-}
*        {*}
/        {/}

20 How to Build Parse Tables? FOLLOW Sets
For a nonterminal symbol A, define FOLLOW(A) as
– the set of terminal symbols that can appear immediately to the right of A in some sentential form
To construct FOLLOW(A) for a nonterminal symbol A, apply the following rules until no more symbols can be added to FOLLOW(A):
– Place $ in FOLLOW(S) ($ is the end-of-file symbol, S is the start symbol)
– If there is a production A → α B β, then everything in FIRST(β) except ε is placed in FOLLOW(B)
– If there is a production A → α B, then everything in FOLLOW(A) is placed in FOLLOW(B)
– If there is a production A → α B β and ε is in FIRST(β), then everything in FOLLOW(A) is placed in FOLLOW(B)
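A C sketch of the FOLLOW rules for the same expression grammar, with the FIRST sets of slide 19 copied in rather than recomputed. Instead of testing the three production rules separately, it walks each right-hand side from right to left while carrying a "trailer" set (what can follow the current symbol), a common way to apply all three rules in one pass. Symbol codes and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM, ID, PLUS, MINUS, TIMES, DIV, END, NTERM,   /* END plays the role of $ */
       S = NTERM, EXPR, EXPRP, TERM, TERMP, FACTOR, NSYM };
#define BIT(t) (1u << (t))
#define EPS BIT(NTERM)
#define NUMID (BIT(NUM) | BIT(ID))

typedef struct { int lhs; int len; int rhs[3]; } Prod;
static const Prod G[] = {
    {S, 1, {EXPR}}, {EXPR, 2, {TERM, EXPRP}},
    {EXPRP, 3, {PLUS, TERM, EXPRP}}, {EXPRP, 3, {MINUS, TERM, EXPRP}}, {EXPRP, 0, {0}},
    {TERM, 2, {FACTOR, TERMP}},
    {TERMP, 3, {TIMES, FACTOR, TERMP}}, {TERMP, 3, {DIV, FACTOR, TERMP}}, {TERMP, 0, {0}},
    {FACTOR, 1, {NUM}}, {FACTOR, 1, {ID}},
};
#define NPROD ((int)(sizeof G / sizeof G[0]))

/* FIRST sets copied from slide 19. */
static const unsigned first[NSYM] = {
    [NUM] = BIT(NUM), [ID] = BIT(ID), [PLUS] = BIT(PLUS), [MINUS] = BIT(MINUS),
    [TIMES] = BIT(TIMES), [DIV] = BIT(DIV), [END] = BIT(END),
    [S] = NUMID, [EXPR] = NUMID, [TERM] = NUMID, [FACTOR] = NUMID,
    [EXPRP] = EPS | BIT(PLUS) | BIT(MINUS), [TERMP] = EPS | BIT(TIMES) | BIT(DIV),
};
static unsigned follow[NSYM];

static void compute_follow(void) {
    assert(NTERM < 32);                       /* terminals plus epsilon fit in 32 bits */
    for (int x = 0; x < NSYM; x++) follow[x] = 0;
    follow[S] = BIT(END);                     /* $ is in FOLLOW(start symbol) */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NPROD; p++) {
            unsigned trailer = follow[G[p].lhs];   /* what can follow the whole rhs */
            for (int i = G[p].len - 1; i >= 0; i--) {
                int x = G[p].rhs[i];
                if (x >= NTERM) {             /* nonterminal: FOLLOW(x) gets trailer */
                    unsigned before = follow[x];
                    follow[x] |= trailer;
                    if (follow[x] != before) changed = true;
                }
                if (first[x] & EPS) trailer |= first[x] & ~EPS;  /* x nullable */
                else trailer = first[x];                          /* x not nullable */
            }
        }
    }
}
```

For this grammar it also yields FOLLOW(Expr') = {$} and FOLLOW(Term') = {$, +, -}, which the table construction on slide 22 needs for the ε-productions.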

21 FOLLOW Sets
Grammar:
1  S      → Expr
2  Expr   → Term Expr'
3  Expr'  → + Term Expr'
4         | - Term Expr'
5         | ε
6  Term   → Factor Term'
7  Term'  → * Factor Term'
8         | / Factor Term'
9         | ε
10 Factor → num
11        | id
Symbol   FOLLOW
S        { $ }
Expr     { $ }
Expr'    { $ }
Term     { $, +, - }
Term'    { $, +, - }
Factor   { $, +, -, *, / }

22 LL(1) Parse Table Construction
For all productions A → α, perform the following steps:
– For each terminal symbol a in FIRST(α), add A → α to M[A, a]
– If ε is in FIRST(α), then add A → α to M[A, b] for each terminal symbol b in FOLLOW(A), and add A → α to M[A, $] if $ is in FOLLOW(A)
Set all the undefined entries in M to error
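Putting the pieces together, a C sketch of this construction for the expression grammar, with FIRST and FOLLOW copied from slides 19 and 21. Treating $ as an ordinary terminal column covers the M[A, $] case. The encoding, the 0-based production numbering (in slide order), and `build_table` are hypothetical; the function also counts cells that receive a second production.

```c
#include <assert.h>

enum { NUM, ID, PLUS, MINUS, TIMES, DIV, END, NTERM,   /* END plays the role of $ */
       S = NTERM, EXPR, EXPRP, TERM, TERMP, FACTOR, NSYM };
#define BIT(t) (1u << (t))
#define EPS BIT(NTERM)
#define NUMID (BIT(NUM) | BIT(ID))
#define PM (BIT(END) | BIT(PLUS) | BIT(MINUS))

typedef struct { int lhs; int len; int rhs[3]; } Prod;
static const Prod G[] = {   /* G[0] is production 1, ..., G[10] is production 11 */
    {S, 1, {EXPR}}, {EXPR, 2, {TERM, EXPRP}},
    {EXPRP, 3, {PLUS, TERM, EXPRP}}, {EXPRP, 3, {MINUS, TERM, EXPRP}}, {EXPRP, 0, {0}},
    {TERM, 2, {FACTOR, TERMP}},
    {TERMP, 3, {TIMES, FACTOR, TERMP}}, {TERMP, 3, {DIV, FACTOR, TERMP}}, {TERMP, 0, {0}},
    {FACTOR, 1, {NUM}}, {FACTOR, 1, {ID}},
};
#define NPROD ((int)(sizeof G / sizeof G[0]))

static const unsigned first[NSYM] = {
    [NUM] = BIT(NUM), [ID] = BIT(ID), [PLUS] = BIT(PLUS), [MINUS] = BIT(MINUS),
    [TIMES] = BIT(TIMES), [DIV] = BIT(DIV), [END] = BIT(END),
    [S] = NUMID, [EXPR] = NUMID, [TERM] = NUMID, [FACTOR] = NUMID,
    [EXPRP] = EPS | BIT(PLUS) | BIT(MINUS), [TERMP] = EPS | BIT(TIMES) | BIT(DIV),
};
static const unsigned follow[NSYM] = {
    [S] = BIT(END), [EXPR] = BIT(END), [EXPRP] = BIT(END),
    [TERM] = PM, [TERMP] = PM, [FACTOR] = PM | BIT(TIMES) | BIT(DIV),
};

static int M[NSYM][NTERM];   /* only nonterminal rows are used; -1 = error */

/* Fills M; returns how many cells were assigned a second production. */
static int build_table(void) {
    assert(NTERM < 32);
    int conflicts = 0;
    for (int x = 0; x < NSYM; x++)
        for (int t = 0; t < NTERM; t++) M[x][t] = -1;   /* undefined entries = error */
    for (int p = 0; p < NPROD; p++) {
        unsigned fa = 0;                 /* FIRST(alpha) for production p */
        int i;
        for (i = 0; i < G[p].len; i++) {
            unsigned f = first[G[p].rhs[i]];
            fa |= f & ~EPS;
            if (!(f & EPS)) break;
        }
        if (i == G[p].len) fa |= follow[G[p].lhs];   /* alpha =>* epsilon: FOLLOW(A), $ included */
        for (int t = 0; t < NTERM; t++) {
            if (!(fa & BIT(t))) continue;
            if (M[G[p].lhs][t] != -1) conflicts++;    /* multiple entries in one cell */
            M[G[p].lhs][t] = p;
        }
    }
    return conflicts;
}
```

A return value of 0 means no cell holds more than one production, which is the first definition of an LL(1) grammar on slide 24.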

23 LL(1) Parse Table
Grammar:
1  S      → Expr
2  Expr   → Term Expr'
3  Expr'  → + Term Expr'
4         | - Term Expr'
5         | ε
6  Term   → Factor Term'
7  Term'  → * Factor Term'
8         | / Factor Term'
9         | ε
10 Factor → num
11        | id
LL(1) parse table (E = Expr, E' = Expr', T = Term, T' = Term', F = Factor; empty cells are error entries):
      id          num         +            -            *            /            $
S     S → E       S → E
E     E → T E'    E → T E'
E'                            E' → + T E'  E' → - T E'                            E' → ε
T     T → F T'    T → F T'
T'                            T' → ε       T' → ε       T' → * F T'  T' → / F T'  T' → ε
F     F → id      F → num

24 LL(1) Grammars
LL(1): Left-to-right scan of the input, Leftmost derivation, 1-token lookahead
Two alternative definitions of LL(1) grammars:
1. A grammar G is LL(1) if there are no multiple entries in its LL(1) parse table
2. A grammar G is LL(1) if for each set of its productions A → α1 | α2 | ... | αn
– FIRST(α1), FIRST(α2), ..., FIRST(αn) are all pairwise disjoint
– If αi ⇒* ε, then FIRST(αj) ∩ FOLLOW(A) = ∅ for all 1 ≤ j ≤ n, j ≠ i
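The second definition can be checked one nonterminal at a time. A C sketch, assuming each FIRST(αi) is given as a bitmask whose `EPS` bit (a hypothetical encoding) is set when αi ⇒* ε, and FOLLOW(A) as a bitmask without that bit; `ll1_alternatives` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define EPS 0x80000000u   /* hypothetical bit meaning "alpha =>* epsilon" */

/* first_alts[i] = FIRST(alpha_i); follow_a = FOLLOW(A).
   Returns true if A's alternatives satisfy the second LL(1) definition. */
static bool ll1_alternatives(const unsigned first_alts[], int n, unsigned follow_a) {
    assert(n >= 0 && !(follow_a & EPS));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (i == j) continue;
            /* pairwise disjoint; EPS counts, so two nullable alternatives clash */
            if (j > i && (first_alts[i] & first_alts[j])) return false;
            /* alpha_i nullable: no other alternative may start with FOLLOW(A) */
            if ((first_alts[i] & EPS) && (first_alts[j] & follow_a)) return false;
        }
    }
    return true;
}
```

For Expr' → + Term Expr' | - Term Expr' | ε with FOLLOW(Expr') = {$}, the sets {+}, {-}, {ε} pass both tests, consistent with the single-entry cells of the slide 23 table.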