1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.

Slides:



Advertisements
Similar presentations
Parsing II : Top-down Parsing
Advertisements

1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
Lecture # 11 Grammar Problems.
Top-Down Parsing.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
Parsing III (Eliminating left recursion, recursive descent parsing)
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
Professor Yihjia Tsai Tamkang University
Top-Down Parsing.
Parsing II : Top-down Parsing Lecture 7 CS 4318/5331 Apan Qasem Texas State University Spring 2015 *some slides adopted from Cooper and Torczon.
1 Contents Introduction Introduction A Simple Compiler A Simple Compiler Scanning – Theory and Practice Scanning – Theory and Practice Grammars and Parsing.
Chapter 3 Chang Chi-Chung Parse tree intermediate representation The Role of the Parser Lexical Analyzer Parser Source Program Token Symbol.
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
Parsing IV Bottom-up Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Syntax and Semantics Structure of programming languages.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
Top-Down Parsing - recursive descent - predictive parsing
4 4 (c) parsing. Parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings.
Chapter 5 Top-Down Parsing.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
-Mandakinee Singh (11CS10026).  What is parsing? ◦ Discovering the derivation of a string: If one exists. ◦ Harder than generating strings.  Two major.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
1 Compiler Construction Syntax Analysis Top-down parsing.
Syntax and Semantics Structure of programming languages.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
11 Chapter 4 Grammars and Parsing Grammar Grammars, or more precisely, context-free grammars, are the formalism for describing the structure of.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Lecture 3: Parsing CS 540 George Mason University.
1 Context free grammars  Terminals  Nonterminals  Start symbol  productions E --> E + T E --> E – T E --> T T --> T * F T --> T / F T --> F F --> (F)
1 Nonrecursive Predictive Parsing  It is possible to build a nonrecursive predictive parser  This is done by maintaining an explicit stack.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Top-Down Parsing.
Syntax Analyzer (Parser)
1 Pertemuan 7 & 8 Syntax Analysis (Parsing) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
Top-Down Predictive Parsing We will look at two different ways to implement a non- backtracking top-down parser called a predictive parser. A predictive.
Parsing methods: –Top-down parsing –Bottom-up parsing –Universal.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
UMBC  CSEE   1 Chapter 4 Chapter 4 (b) parsing.
Parsing III (Top-down parsing: recursive descent & LL(1) )
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Lecture # 10 Grammar Problems. Problems with grammar Ambiguity Left Recursion Left Factoring Removal of Useless Symbols These can create problems for.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 due on Monday February 8 th Name and date your submission Submit electronically in Homework Server.
Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Syntax and Semantics Structure of programming languages.
Parsing COMP 3002 School of Computer Science. 2 The Structure of a Compiler syntactic analyzer code generator program text interm. rep. machine code tokenizer.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Programming Languages Translator
Context free grammars Terminals Nonterminals Start symbol productions
Compilers Welcome to a journey to CS419 Lecture15: Syntax Analysis:
Top-down parsing cannot be performed on left recursive grammars.
Parsing Techniques.
Top-Down Parsing CS 671 January 29, 2008.
Syntax Analysis source program lexical analyzer tokens syntax analyzer
Top-Down Parsing Identify a leftmost derivation for an input string
Lecture 7: Introduction to Parsing (Syntax Analysis)
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Presentation transcript:

1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated

2 Parsing The scanner recognizes words The parser recognizes syntactic units Can we use regular expressions? –No. How would we recognize L={p k q k } then? We use Context-Free Grammars (CFGs) to specify context-free syntax. Coming up: –CFGs –Top-down parsing –Bottom-up parsing

3 CFGs A CFG describes how a sentence of a language may be generated. Example: Use this grammar to generate the sentence mwa ha ha ha EvilLaugh  mwa EvilCackle EvilCackle  ha EvilCackle EvilCackle  ha

4 CFGs A CFG is a quadruple (N, T, R, S) where –N is the set of non-terminal symbols –T is the set of terminal symbols –S  N is the starting symbol –R  N  (N  T)* is a set of rules Example: Consider G = (N, T, R, S) where –N = {S} –T ={ (, ) } –R ={ S  (S), S  SS, S  }

5 Derivations The language described by a CFG is the set of strings that can be derived from the start symbol using the rules of the grammar. At each step, we choose a non-terminal to replace. S  (S)  (SS)  ((S)S)  (( )S)  (( )(S))  (( )((S)))  (( )(( ))) derivation sentential form This example demonstrates a leftmost derivation : one where we always expand the leftmost non-terminal in the sentential form.

6 Derivations and parse trees We can describe a derivation using a graphical representation called parse tree: –the root is labeled with the start symbol, S –each internal node is labeled with a non- terminal –the children of an internal node A are the right-hand side of a production A  –each leaf is labeled with a terminal A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree)

7 Derivations and parse trees So, how can we use the grammar described earlier to verify the syntax of "(( )((( ))))"? –We must try to find a derivation for that string. –We can work top-down (starting at the root/start symbol) or bottom-up (starting at the leaves). Careful! –There may be more than one grammars to describe the same language. –Not all grammars are suitable

8 Problems in parsing Consider S  if E then S else S | if E then S –What is the parse tree for if E1 then S1 else if E2 then S2 else S3 –There are two possible parse trees! This problem is called ambiguity A CFG is ambiguous if one or more terminal strings have multiple leftmost derivations from the start symbol.

9 Ambiguity There is no general algorithm to tell whether a CFG is ambiguous or not. There is no standard procedure for eliminating ambiguity. Some languages are inherently ambiguous. –In those cases, any grammar we come up with will be ambiguous.

10 Ambiguity In general, we try to eliminate ambiguity by rewriting the grammar. Example: –E  E+E | E  E | id becomes: E  E+T | T T  T  F | F F  id

11 Ambiguity In general, we try to eliminate ambiguity by rewriting the grammar. Example: –S  if E then S else S | if E then S | other becomes: S  E withElse | E noElse E withElse  if E then E withElse else E withElse | other E noElse  if E then S | if E then E withElse else E noElse

12 Top-down parsing Main idea: –Start at the root, grow towards leaves –Pick a production and try to match input –May need to backtrack –Some grammars are backtrack-free Example: –Use the expression grammar to parse x-2*y

13 Grammar problems Because we try to generate a leftmost derivation by scanning the input from left to right, grammars of the form A  A x may cause endless recursion. Such grammars are called left-recursive and they must be transformed if we want to use a top-down parser.

14 Left recursion A grammar is left recursive if for a non- terminal A, there is a derivation A  + A  There are three types of left recursion: –direct (A  A x) –indirect (A  B C, B  A ) –hidden (A  B A, B   )

15 Left recursion To eliminate direct left recursion replace A  A  1 | A  2 |... | A  m |  1 |  2 |... |  n with A   1 B |  2 B |... |  n B B   1 B |  2 B |... |  m B | 

16 Left recursion How about this: S  E E  E+T E  T T  E-T T  id There is direct recursion: E  E+T There is indirect recursion: T  E+T, E  T Algorithm for eliminating indirect recursion List the nonterminals in some order A 1, A 2,...,A n for i=1 to n for j=1 to i-1 if there is a production A i  A j , replace A j with its rhs eliminate any direct left recursion on A i

17 Eliminating indirect left recursion S  E E  E+T E  T T  E-T T  F F  E*F F  id i=Sordering: S, E, T, F S  E E  E+T E  T T  E-T T  F F  E*F F  id i=E S  E E  TE' E'  +TE'|  T  E-T T  F F  E*F F  id i=T, j=E S  E E  TE' E'  +TE'|  T  TE'-T T  F F  E*F F  id S  E E  TE' E'  +TE'|  T  FT' T'  E'-TT'|  F  E*F F  id

18 Eliminating indirect left recursion i=F, j=E S  E E  TE' E'  +TE'|  T  FT' T'  E'-TT'|  F  TE'*F F  id i=F, j=T S  E E  TE' E'  +TE'|  T  FT' T'  E'-TT'|  F  FT'E'*F F  id S  E E  TE' E'  +TE'|  T  FT' T'  E'-TT'|  F  idF' F'  T'E'*FF'| 

19 Grammar problems Consider S  if E then S else S | if E then S –Which of the two productions should we use to expand non-terminal S when the next token is if? –We can solve this problem by factoring out the common part in these rules. This way, we are postponing the decision about which rule to choose until we have more information (namely, whether there is an else or not). –This is called left factoring

20 Left factoring A   1 |  2 |...|  n |  becomes A   B|  B   1 |  2 |...|  n

21 Grammar problems A symbol X  V is useless if –there is no derivation from X to any string in the language (non-terminating) –there is no derivation from S that reaches a sentential form containing X (non-reachable) Reduced grammar = a grammar that does not contain any useless symbols.

22 Useless symbols In order to remove useless symbols, apply two algorithms: –First, remove all non-terminating symbols –Then, remove all non-reachable symbols. The order is important! –For example, consider S  +  X  where  contains a non-terminating symbol. What will happen if we apply the algorithms in the wrong order? Concrete example: S  AB | a, A  a

23 Useless symbols Example Initial grammar: S  AB | CA A  a B  CB | AB C  cB | b D  aD | d Algorithm 1 (terminating symbols): A is in because of A  a C is in because of C  b D is in because of D  d S is in because A, C are in and S  AC

24 Useless symbols Example continued After algorithm 1: S  CA A  a C  b D  aD | d Algorithm 2 (reachable symbols): S is in because it is the start symbol C and A are in because S is in and S  CA Final grammar: S  CA A  a C  b

25 Predictive parsing Basic idea: –Given A   |  the parser should be able to choose between  and  How? –What if we do some "preprocessing" to answer the question: Given a non-terminal A and lookahead t, which (if any) production of A is guaranteed to start with a t?

26 Predictive parsing If we have two productions: A , we want a distinct way of choosing the correct one. Define: –for  G, x  FIRST(  ) iff   * x  If FIRST(  ) and FIRST(  ) contain no common symbols, we will know whether we should choose A  or A  by looking at the lookahead symbol. –Almost right...

27 Predictive parsing Compute FIRST(X) as follows: –if X is a terminal, then FIRST(X)={X} –if X  is a production, then add  to FIRST(X) –if X is a non-terminal and X  Y 1 Y 2...Y n is a production, add FIRST(Y i ) to FIRST(X) if the preceding Y j s contain  in their FIRSTs

28 Predictive parsing What if we have a "candidate" production A  where  =  or  *  ? We could expand if we knew that there is some sentential form where the current input symbol appears after A. Define: –for A  N, x  FOLLOW(A) iff  S  *  Ax 

29 Predictive parsing Compute FOLLOW as follows: –FOLLOW(S) contains EOF –For productions A  B , everything in FIRST(  ) except  goes into FOLLOW(B) –For productions A  B or A  B  where FIRST(  ) contains , FOLLOW(B) contains everything that is in FOLLOW(A)

30 Predictive parsing Armed with –FIRST –FOLLOW we can build parser where no backtracking is required!

31 Predictive parsing (w/table) For each production A  do: –For each terminal a  FIRST(  ) add A  to entry M[A,a] –If  FIRST(  ), add A  to entry M[A,b] for each terminal b  FOLLOW(A). –If  FIRST(  ) and EOF  FOLLOW(A), add A  to M[A,EOF] Use table and stack to simulate recursion.

32 Recursive Descent Parsing Basic idea: –Write a routine to recognize each lhs –This produces a parser with mutually recursive routines. –Good for hand-coded parsers.