Profs. Necula, CS 164 Lectures 6-7: Top-Down Parsing (also ICOM 4036, Lecture 5)

Review

A parser consumes a sequence of tokens s and produces a parse tree.

Issues:
- How do we recognize that s ∈ L(G)?
- A parse tree of s describes how s ∈ L(G)
- Ambiguity: more than one parse tree (interpretation) for some string s
- Error: no parse tree for some string s
- How do we construct the parse tree?

Ambiguity

Grammar:
  E → E + E | E * E | ( E ) | int

Strings:
  int + int + int
  int * int + int

Ambiguity. Example

The string int + int + int has two parse trees.

[Two parse trees: one grouping the left +, one grouping the right +]

+ is left-associative

Ambiguity. Example

The string int * int + int has two parse trees.

[Two parse trees: one grouping int * int first, one grouping int + int first]

* has higher precedence than +

Ambiguity (Cont.)

A grammar is ambiguous if it has more than one parse tree for some string
- Equivalently, there is more than one right-most or left-most derivation for some string

Ambiguity is bad
- Leaves meaning of some programs ill-defined

Ambiguity is common in programming languages
- Arithmetic expressions
- IF-THEN-ELSE

Dealing with Ambiguity

There are several ways to handle ambiguity.

The most direct method is to rewrite the grammar unambiguously:
  E → E + T | T
  T → T * int | int | ( E )

This enforces precedence of * over +, and left-associativity of + and *.

Ambiguity. Example

The string int * int + int has only one parse tree now.

[Parse tree: E → E + T, with int * int grouped under the left subtree]

Ambiguity: The Dangling Else

Consider the grammar
  E → if E then E
    | if E then E else E
    | OTHER

This grammar is also ambiguous.

The Dangling Else: Example

The expression
  if E1 then if E2 then E3 else E4
has two parse trees:

[Two parse trees: the else can attach to the outer if or to the inner if]

Typically we want the second form, where else matches the closest then.

The Dangling Else: A Fix

else matches the closest unmatched then.

We can describe this in the grammar (distinguish between matched and unmatched “then”):

  E   → MIF              /* all then are matched */
      | UIF              /* some then are unmatched */
  MIF → if E then MIF else MIF
      | OTHER
  UIF → if E then E
      | if E then MIF else UIF

This describes the same set of strings.

The Dangling Else: Example Revisited

The expression if E1 then if E2 then E3 else E4:

[Parse tree attaching else E4 to the inner if: a valid parse tree (for a UIF)]
[Parse tree attaching else E4 to the outer if: not valid, because the then expression is not a MIF]

Ambiguity

There are no general techniques for handling ambiguity.

It is impossible to automatically convert an ambiguous grammar to an unambiguous one.

Used with care, ambiguity can simplify the grammar
- Sometimes allows more natural definitions
- We need disambiguation mechanisms

Precedence and Associativity Declarations

Instead of rewriting the grammar
- Use the more natural (ambiguous) grammar
- Along with disambiguating declarations

Most tools allow precedence and associativity declarations to disambiguate grammars.

Examples …

Associativity Declarations

Consider the grammar
  E → E + E | int

Ambiguous: two parse trees of int + int + int

[Left-leaning and right-leaning parse trees]

Left-associativity declaration: %left +

Precedence Declarations

Consider the grammar
  E → E + E | E * E | int
and the string int + int * int

[Two parse trees: one grouping int * int first, one grouping int + int first]

Precedence declarations:
  %left +
  %left *

Review

We can specify language syntax using a CFG.

A parser will answer whether s ∈ L(G)
… and will build a parse tree
… and pass it on to the rest of the compiler

Next:
- How do we answer s ∈ L(G) and build a parse tree?

Approach 1: Top-Down Parsing

Intro to Top-Down Parsing

Terminals are seen in order of appearance in the token stream: t1 t2 t3 t4 t5

The parse tree is constructed
- From the top
- From left to right

[Parse tree sketch: root A with subtrees B and C; the leaves t1 … t4 are filled in left to right]

Recursive Descent Parsing

Consider the grammar
  E → T + E | T
  T → int | int * T | ( E )

Token stream is: int5 * int2

Start with the top-level non-terminal E. Try the rules for E in order.

Recursive Descent Parsing. Example (Cont.)

Try E0 → T1 + E2
- Then try a rule for T1 → ( E3 )
  - But ( does not match input token int5
- Try T1 → int. Token matches.
  - But + after T1 does not match input token *
- Try T1 → int * T2
  - This will match but + after T1 will be unmatched
- Have exhausted the choices for T1
  - Backtrack to choice for E0

Recursive Descent Parsing. Example (Cont.)

Try E0 → T1
- Follow same steps as before for T1
- And succeed with T1 → int * T2 and T2 → int, with the following parse tree:

[Parse tree: E0 → T1; T1 → int5 * T2; T2 → int2]

Recursive Descent Parsing. Notes.

Easy to implement by hand
- An example implementation is provided as a supplement, “Recursive Descent Parsing”

But does not always work …

Recursive-Descent Parsing

Parsing: given a string of tokens t1 t2 … tn, find its parse tree.

Recursive-descent parsing: try all the productions exhaustively
- At a given moment the fringe of the parse tree is: t1 t2 … tk A …
- Try all the productions for A: if A → B C is a production, the new fringe is t1 t2 … tk B C …
- Backtrack when the fringe doesn’t match the string
- Stop when there are no more non-terminals
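The exhaustive strategy above can be sketched in a few lines of Python. This is an illustrative sketch, not the course's implementation: token spellings ("int", "+", …) are assumed, and backtracking is expressed by generators that yield every position where a phrase could end, for the grammar E → T + E | T and T → int | int * T | ( E ).

```python
# Backtracking recursive-descent recognizer (a sketch, not the official code).
# Each parse_* generator yields every index just past a possible match.

def parse(tokens):
    """Return True iff tokens is a sentence of the grammar."""

    def parse_E(i):
        for j in parse_T(i):
            if j < len(tokens) and tokens[j] == "+":
                yield from parse_E(j + 1)   # E -> T + E
            yield j                          # E -> T

    def parse_T(i):
        if i < len(tokens) and tokens[i] == "int":
            if i + 1 < len(tokens) and tokens[i + 1] == "*":
                yield from parse_T(i + 2)    # T -> int * T
            yield i + 1                      # T -> int
        if i < len(tokens) and tokens[i] == "(":
            for j in parse_E(i + 1):         # T -> ( E )
                if j < len(tokens) and tokens[j] == ")":
                    yield j + 1

    # Accept if some complete derivation consumes the whole input.
    return any(j == len(tokens) for j in parse_E(0))

print(parse(["int", "*", "int"]))   # True
print(parse(["int", "+"]))          # False
```

Trying `E → T + E` before `E → T` mirrors the slides' "try the rules in order"; the generator simply resumes the next alternative when a choice fails downstream.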

When Recursive Descent Does Not Work

Consider a production S → S a:
- In the process of parsing S we try the above rule
- What goes wrong?

A left-recursive grammar has a non-terminal S such that
  S →+ S α   for some α

Recursive descent does not work in such cases
- It goes into an infinite loop

Elimination of Left Recursion

Consider the left-recursive grammar
  S → S α | β

S generates all strings starting with a β and followed by a number of α.

Can rewrite using right-recursion:
  S  → β S’
  S’ → α S’ | ε

Elimination of Left-Recursion. Example

Consider the grammar
  S → 1 | S 0     (β = 1 and α = 0)

It can be rewritten as
  S  → 1 S’
  S’ → 0 S’ | ε

More Elimination of Left-Recursion

In general
  S → S α1 | … | S αn | β1 | … | βm

All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn.

Rewrite as
  S  → β1 S’ | … | βm S’
  S’ → α1 S’ | … | αn S’ | ε
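The general rewrite above is mechanical enough to sketch in code. The representation below is an assumption for illustration: productions are lists of symbols, `[]` stands for the ε production, and the new non-terminal is named by appending a prime.

```python
# Immediate left-recursion removal (a sketch):
#   S -> S a1 | ... | S an | b1 | ... | bm
# becomes
#   S  -> b1 S' | ... | bm S'
#   S' -> a1 S' | ... | an S' | eps

def eliminate_left_recursion(nt, productions):
    # Split productions into left-recursive (S alpha) and the rest (beta).
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    other = [p for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: productions}        # nothing to do
    new = nt + "'"                      # assumed fresh name
    return {
        nt: [p + [new] for p in other],
        new: [p + [new] for p in recursive] + [[]],   # [] is epsilon
    }

# The slide's example: S -> 1 | S 0  becomes  S -> 1 S' ; S' -> 0 S' | eps
print(eliminate_left_recursion("S", [["S", "0"], ["1"]]))
# {'S': [['1', "S'"]], "S'": [['0', "S'"], []]}
```

This handles only immediate left recursion, matching the slide; the general (indirect) case needs the full Dragon Book algorithm mentioned two slides below.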

General Left Recursion

The grammar
  S → A α | δ
  A → S β
is also left-recursive, because
  S →+ S β α

This left-recursion can also be eliminated. See the Dragon Book, Section 4.3, for the general algorithm.

Summary of Recursive Descent

Simple and general parsing strategy
- Left-recursion must be eliminated first
- … but that can be done automatically

Unpopular because of backtracking
- Thought to be too inefficient

In practice, backtracking is eliminated by restricting the grammar.

Predictive Parsers

Like recursive-descent, but the parser can “predict” which production to use
- By looking at the next few tokens
- No backtracking

Predictive parsers accept LL(k) grammars
- The first L means “left-to-right” scan of input
- The second L means “leftmost derivation”
- k means “predict based on k tokens of lookahead”

In practice, LL(1) is used.

LL(1) Languages

In recursive-descent, for each non-terminal and input token there may be a choice of production.

LL(1) means that for each non-terminal and token there is only one production that could lead to success.

Can be specified as a 2D table
- One dimension for current non-terminal to expand
- One dimension for next token
- A table entry contains one production

Predictive Parsing and Left Factoring

Recall the grammar
  E → T + E | T
  T → int | int * T | ( E )

Impossible to predict because
- For T, two productions start with int
- For E, it is not clear how to predict

A grammar must be left-factored before use for predictive parsing.

Left-Factoring Example

Recall the grammar
  E → T + E | T
  T → int | int * T | ( E )

Factor out common prefixes of productions:
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε

LL(1) Parsing Table Example

Left-factored grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε

The LL(1) parsing table ($ marks end of input):

      | int    | *    | +    | (      | )  | $
  E   | T X    |      |      | T X    |    |
  X   |        |      | + E  |        | ε  | ε
  T   | int Y  |      |      | ( E )  |    |
  Y   |        | * T  | ε    |        | ε  | ε

LL(1) Parsing Table Example (Cont.)

Consider the [E, int] entry
- “When current non-terminal is E and next input is int, use production E → T X”
- This production can generate an int in the first place

Consider the [Y, +] entry
- “When current non-terminal is Y and current token is +, get rid of Y”
- We’ll see later why this is so

LL(1) Parsing Tables. Errors

Blank entries indicate error situations
- Consider the [E, *] entry
- “There is no way to derive a string starting with * from non-terminal E”

Using Parsing Tables

Method similar to recursive descent, except
- For each non-terminal S
- We look at the next token a
- And choose the production shown at [S, a]

We use a stack to keep track of pending non-terminals.
We reject when we encounter an error state.
We accept when we encounter end-of-input.

LL(1) Parsing Algorithm

  initialize stack = <S $> and next (pointer to tokens)
  repeat
    case stack of
      <X, rest> : if T[X, *next] = Y1 … Yn
                    then stack ← <Y1 … Yn rest>
                    else error();
      <t, rest> : if t == *next++
                    then stack ← <rest>
                    else error();
  until stack == < >

Here X is the non-terminal on top of the stack and t is a terminal (or $).
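The driver loop above can be sketched as a short table-driven parser. The dictionary below transcribes the LL(1) table from the earlier slide; the token spellings ("int", "+", "$", …) are illustrative assumptions.

```python
# Table-driven LL(1) recognizer (a sketch of the algorithm above).
# TABLE maps (non-terminal, lookahead) to a right-hand side; missing
# entries are the blank (error) cells of the slide's table.

TABLE = {
    ("E", "int"): ["T", "X"], ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"],   ("X", ")"): [], ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"],   ("Y", "+"): [], ("Y", ")"): [], ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}

def ll1_parse(tokens):
    tokens = tokens + ["$"]     # append end-of-input marker
    stack = ["$", "E"]          # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[pos]))
            if rhs is None:
                return False    # blank table entry: syntax error
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == tokens[pos]:
            pos += 1            # terminal (or $) matches the input
        else:
            return False
    return True

print(ll1_parse(["int", "*", "int"]))   # True
print(ll1_parse(["int", "+"]))          # False
```

Running it on `int * int` reproduces, step for step, the stack trace shown on the next slide.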

LL(1) Parsing Example

  Stack            Input            Action
  E $              int * int $      T X
  T X $            int * int $      int Y
  int Y X $        int * int $      terminal
  Y X $            * int $          * T
  * T X $          * int $          terminal
  T X $            int $            int Y
  int Y X $        int $            terminal
  Y X $            $                ε
  X $              $                ε
  $                $                ACCEPT

Constructing Parsing Tables

LL(1) languages are those defined by a parsing table for the LL(1) algorithm.

No table entry can be multiply defined.

We want to generate parsing tables from a CFG.

Top-Down Parsing. Review

Top-down parsing expands a parse tree from the start symbol to the leaves
- Always expand the leftmost non-terminal

[Parse tree so far: E → T + E, against input int * int + int]

Top-Down Parsing. Review (Cont.)

[Parse tree so far: the leftmost T expanded to int * T, against input int * int + int]

The leaves at any point form a string β A γ
- β contains only terminals
- The input string is β b δ
- The prefix β matches
- The next token is b

Top-Down Parsing. Review (Cont.)

[Parse tree so far: the expansion continues with the next leftmost non-terminal, against input int * int + int]

The leaves at any point form a string β A γ
- β contains only terminals
- The input string is β b δ
- The prefix β matches
- The next token is b

Predictive Parsing. Review.

A predictive parser is described by a table
- For each non-terminal A and for each token b we specify a production A → α
- When trying to expand A, we use A → α if b follows next

Once we have the table
- The parsing algorithm is simple and fast
- No backtracking is necessary

Constructing Predictive Parsing Tables

Consider the state S →* β A γ
- With b the next token
- Trying to match β b δ

There are two possibilities:

1. b belongs to an expansion of A
   - Any A → α can be used if b can start a string derived from α
   - In this case we say that b ∈ First(α)

Or…

Constructing Predictive Parsing Tables (Cont.)

2. b does not belong to an expansion of A
   - The expansion of A is empty and b belongs to an expansion of γ
   - Means that b can appear after A in a derivation of the form S →* β A b δ
   - We say that b ∈ Follow(A) in this case
   - What productions can we use in this case?
     - Any A → α can be used if α can expand to ε
     - We say that ε ∈ First(A) in this case

Computing First Sets

Definition: First(X) = { b | X →* b α } ∪ { ε | X →* ε }

1. First(b) = { b } for a terminal b
2. For all productions X → A1 … An
   - Add First(A1) – { ε } to First(X). Stop if ε ∉ First(A1)
   - Add First(A2) – { ε } to First(X). Stop if ε ∉ First(A2)
   - …
   - Add First(An) – { ε } to First(X). Stop if ε ∉ First(An)
   - Add ε to First(X)
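The rules above amount to a fixpoint computation, iterated until no First set grows. A sketch follows; the dict-of-productions grammar encoding (with `[]` as the ε production and `"eps"` as the ε marker) is an assumption chosen for illustration, using the running example grammar.

```python
# Fixpoint computation of First sets (a sketch of the rules above).

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
EPS = "eps"

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                before = len(first[nt])
                for sym in rhs:
                    if sym in grammar:                  # non-terminal
                        first[nt] |= first[sym] - {EPS}
                        if EPS not in first[sym]:
                            break                       # stop: sym not nullable
                    else:                               # terminal: First(b) = {b}
                        first[nt].add(sym)
                        break
                else:
                    first[nt].add(EPS)                  # whole rhs can vanish
                if len(first[nt]) != before:
                    changed = True
    return first

print(first_sets(GRAMMAR)["E"])   # the set {'int', '('} (in some order)
```

The result reproduces the example slide: First(E) = First(T) = {int, (}, First(X) = {+, ε}, First(Y) = {*, ε}.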

First Sets. Example

Recall the grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε

First sets
  First( ( )   = { ( }        First( T ) = { int, ( }
  First( ) )   = { ) }        First( E ) = { int, ( }
  First( int ) = { int }      First( X ) = { +, ε }
  First( + )   = { + }        First( Y ) = { *, ε }
  First( * )   = { * }

Computing Follow Sets

Definition: Follow(X) = { b | S →* β X b δ }

1. Compute the First sets for all non-terminals first
2. Add $ to Follow(S) (if S is the start non-terminal)
3. For all productions Y → … X A1 … An
   - Add First(A1) – { ε } to Follow(X). Stop if ε ∉ First(A1)
   - Add First(A2) – { ε } to Follow(X). Stop if ε ∉ First(A2)
   - …
   - Add First(An) – { ε } to Follow(X). Stop if ε ∉ First(An)
   - Add Follow(Y) to Follow(X)
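These rules are also a fixpoint computation. The sketch below assumes the same illustrative grammar encoding as before, with the First sets written out as literals copied from the example slide (`"eps"` marks ε).

```python
# Fixpoint computation of Follow sets (a sketch of the rules above).

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
FIRST = {"E": {"int", "("}, "X": {"+", "eps"},
         "T": {"int", "("}, "Y": {"*", "eps"}}

def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                  # rule 2: $ in Follow(start)
    changed = True
    while changed:
        changed = False
        for y, prods in grammar.items():
            for rhs in prods:
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue            # Follow sets only for non-terminals
                    before = len(follow[x])
                    nullable_suffix = True
                    for sym in rhs[i + 1:]:
                        f = first.get(sym, {sym})   # terminal: First(b) = {b}
                        follow[x] |= f - {"eps"}
                        if "eps" not in f:
                            nullable_suffix = False
                            break
                    if nullable_suffix:     # everything after x can vanish
                        follow[x] |= follow[y]
                    if len(follow[x]) != before:
                        changed = True
    return follow

print(follow_sets(GRAMMAR, FIRST, "E")["T"])   # the set {'+', ')', '$'}
```

For the non-terminals, this reproduces the example slide: Follow(E) = Follow(X) = {), $} and Follow(T) = Follow(Y) = {+, ), $}.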

Follow Sets. Example

Recall the grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε

Follow sets
  Follow( + )   = { int, ( }      Follow( E ) = { ), $ }
  Follow( * )   = { int, ( }      Follow( X ) = { $, ) }
  Follow( ( )   = { int, ( }      Follow( T ) = { +, ), $ }
  Follow( ) )   = { +, ), $ }     Follow( Y ) = { +, ), $ }
  Follow( int ) = { *, +, ), $ }

Constructing LL(1) Parsing Tables

Construct a parsing table T for a CFG G.

For each production A → α in G do:
- For each terminal b ∈ First(α), do T[A, b] = α
- If α →* ε, for each b ∈ Follow(A), do T[A, b] = α
- If α →* ε and $ ∈ Follow(A), do T[A, $] = α
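The three rules above can be sketched directly. The grammar encoding and the literal First/Follow sets below (copied from the example slides, with `"eps"` marking ε) are illustrative assumptions; Follow sets here already contain $, which folds the second and third rules together.

```python
# Building the LL(1) table from First and Follow sets (a sketch).

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
FIRST = {"E": {"int", "("}, "X": {"+", "eps"},
         "T": {"int", "("}, "Y": {"*", "eps"}}
FOLLOW = {"E": {")", "$"}, "X": {")", "$"},
          "T": {"+", ")", "$"}, "Y": {"+", ")", "$"}}

def first_of(rhs, first):
    """First set of a whole right-hand side alpha."""
    out = set()
    for sym in rhs:
        f = first.get(sym, {sym})        # terminal: First(b) = {b}
        out |= f - {"eps"}
        if "eps" not in f:
            return out
    out.add("eps")                       # every symbol can derive eps
    return out

def build_table(grammar, first, follow):
    table = {}
    for a, prods in grammar.items():
        for rhs in prods:
            f = first_of(rhs, first)
            for b in f - {"eps"}:        # b in First(alpha)
                assert (a, b) not in table, "grammar is not LL(1)"
                table[(a, b)] = rhs
            if "eps" in f:               # alpha =>* eps: use Follow(A)
                for b in follow[a]:      # Follow(A) already includes $
                    assert (a, b) not in table, "grammar is not LL(1)"
                    table[(a, b)] = rhs
    return table

table = build_table(GRAMMAR, FIRST, FOLLOW)
print(table[("E", "int")])   # ['T', 'X']
print(("E", "*") in table)   # False: a blank entry, i.e. a syntax error
```

The asserts implement the "no table entry can be multiply defined" check: they fire exactly when the grammar is not LL(1).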

Constructing LL(1) Tables. Example

Recall the grammar
  E → T X
  X → + E | ε
  T → ( E ) | int Y
  Y → * T | ε

Where in the row of Y do we put Y → * T?
- In the columns of First( * T ) = { * }

Where in the row of Y do we put Y → ε?
- In the columns of Follow(Y) = { $, +, ) }

Notes on LL(1) Parsing Tables

If any entry is multiply defined then G is not LL(1)
- If G is ambiguous
- If G is left recursive
- If G is not left-factored
- And in other cases as well

Most programming language grammars are not LL(1).

There are tools that build LL(1) tables.

Review

For some grammars there is a simple parsing strategy
- Predictive parsing

Next: a more powerful parsing strategy.