CS 321 Programming Languages and Compilers VI. Parsing.


Parsing 2. Parsing calculates the grammatical structure of a program, much like diagramming sentences, where tokens = "words" and programs = "sentences". For further information, read: Aho, Sethi, Ullman, "Compilers: Principles, Techniques, and Tools" (a.k.a. the "Dragon Book").

Parsing 3. Outline of coverage:
- Context-free grammars
- Parsing
  - Tabular parsing methods
  - One pass
    - Top-down
    - Bottom-up
- Yacc

Parsing 4. What the parser does: it extracts the grammatical structure of the program. (Slide shows a parse tree for cout << "hello, world\n": a function-def node with children name, arguments, and stmt-list; the statement is an expression applying the operator << to a variable and a string.)

Parsing 5. Context-free languages. Grammatical structure is defined by a context-free grammar:

  statement → labeled-statement | expression-statement | compound-statement
  labeled-statement → ident : statement | case constant-expression : statement
  compound-statement → { declaration-list statement-list }

Here ident, case, :, {, and } are terminals; the hyphenated names are non-terminals. "Context-free" = only one non-terminal in the left part of each production.

Parsing 6. Parse trees. A parse tree is a tree labeled with grammar symbols, such that: if a node is labeled A and its children are labeled x1 … xn, then there is a production A → x1 … xn. "Parse tree from A" = root labeled with A. "Complete parse tree" = all leaves labeled with tokens.

Parsing 7. Parse trees and sentences. The frontier of a tree = the labels on its leaves, in left-to-right order. The frontier of a tree from S is a sentential form. The frontier of a complete tree from S is a sentence. (Slide shows a tree from L whose frontier is a ; E.)

Parsing 8. Example grammar G:
  L → L ; E | E
  E → a | b
Syntax trees from the start symbol (L) are shown for the sentential forms a, a ; E, and a ; b ; b (tree diagrams omitted).
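A grammar like G can be written down directly as data. A minimal Python sketch (the dict-of-alternatives encoding and the function name are illustrative choices, not from the slides):

```python
# Grammar G from this slide, encoded as: nonterminal -> list of alternatives,
# where each alternative is a tuple of grammar symbols.
GRAMMAR_G = {
    "L": [("L", ";", "E"), ("E",)],   # L -> L ; E | E
    "E": [("a",), ("b",)],            # E -> a | b
}

def is_terminal(symbol, grammar=GRAMMAR_G):
    """A symbol is a terminal iff it has no productions of its own."""
    return symbol not in grammar
```

Later examples in these notes reuse this encoding.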

Parsing 9. Derivations. An alternate definition of sentence: given α, β in V*, we say α ⇒ β is a derivation step if α = α′ A α″ and β = α′ γ α″, where A → γ is a production. β is a sentential form iff there exists a derivation (a sequence of derivation steps) S ⇒ … ⇒ β (alternatively, we write S ⇒* β). The two definitions are equivalent, but note that there are many derivations corresponding to each parse tree.

Parsing 10. Another example, grammar H:
  L → E ; L | E
  E → a | b
(Slide shows the corresponding syntax trees; diagrams omitted.)

Parsing 11. Ambiguity. For some purposes, it is important to know whether a sentence can have more than one parse tree. A grammar is ambiguous if there is a sentence with more than one parse tree. Example: E → E + E | E * E | id. (Slide shows two parse trees for the same sentence, one grouping around +, the other around *.)

Parsing 12. Ambiguity. Ambiguity is a property of the grammar rather than of the language: a given ambiguous grammar may have an equivalent unambiguous one.

Parsing 13. Grammar Transformations. Grammars can be transformed without affecting the language generated. Three transformations are discussed next:
- Eliminating ambiguity
- Eliminating left recursion (i.e. productions of the form A → A α)
- Left factoring

Parsing 14. Grammar Transformation 1: Eliminating Ambiguity. Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. For example, expressions involving additions and products can be written as follows:
  E → E + T | T
  T → T * id | id
The language generated by this grammar is the same as that generated by the grammar on transparency 11: both generate id ( + id | * id )*. However, this grammar is not ambiguous.

Parsing 15. Eliminating Ambiguity (cont.). One advantage of this grammar is that it encodes the precedence between operators: in the parse tree, products appear nested within additions. (Slide shows the parse tree for id + id * id under the unambiguous grammar.)

Parsing 16. Eliminating Ambiguity (cont.). The most famous example of ambiguity in a programming language is the dangling else. Consider
  S → if E then S else S | if E then S | other

Parsing 17. Eliminating Ambiguity (cont.). When there are two nested ifs and only one else, a sentence of the form if E then if E then S else S has two parse trees: the else can attach to the inner if or to the outer if.

Parsing 18. Eliminating Ambiguity (cont.). In most languages (including C++ and Java), each else is assumed to belong to the nearest if that is not already matched by an else. This association is expressed in the following (unambiguous) grammar:
  S → Matched | Unmatched
  Matched → if E then Matched else Matched | other
  Unmatched → if E then S | if E then Matched else Unmatched

Parsing 19. Eliminating Ambiguity (cont.). Ambiguity is a function of the grammar. It is undecidable whether a context-free grammar is ambiguous; the proof is by reduction from Post's correspondence problem. Although there is no general algorithm, it is possible to isolate certain constructs in productions which lead to ambiguous grammars.

Parsing 20. Eliminating Ambiguity (cont.). For example, a grammar containing the production A → A A | α would be ambiguous, because the substring α α α has two parses. This ambiguity disappears if we use instead the productions A → A B | B and B → α, or the productions A → B A | B and B → α.

Parsing 21. Eliminating Ambiguity (cont.). Three other examples of ambiguous productions are:
- A → A α A
- A → α A | A β
- A → α A | α A β A
A language generated by an ambiguous context-free grammar is inherently ambiguous if it has no unambiguous context-free grammar. (This can be proven formally.) An example of such a language is L = { a^i b^j c^m | i = j or j = m }, which can be generated by the grammar:
  S → A B | D C
  A → a A | ε    C → c C | ε
  B → b B c | ε    D → a D b | ε

Parsing 22. Grammar Transformations 2: Elimination of Left Recursion. A grammar is left recursive if it has a nonterminal A and a derivation A ⇒+ A α for some string α. Top-down parsing methods (to be discussed shortly) cannot handle left-recursive grammars, so a transformation to eliminate left recursion is needed. Immediate left recursion (productions of the form A → A α) can be easily eliminated. We group the A-productions as
  A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn
where no βi begins with A. Then we replace the A-productions by
  A → β1 A′ | β2 A′ | … | βn A′
  A′ → α1 A′ | α2 A′ | … | αm A′ | ε
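The rewrite above is mechanical, so it is easy to code. A minimal Python sketch (the function name and tuple encoding are mine, not the slides'); an empty tuple () stands for an ε-alternative:

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Split the A-productions into left-recursive ones (A -> A alpha) and
    the rest (A -> beta), then rewrite as A -> beta A' and
    A' -> alpha A' | epsilon. Alternatives are tuples of symbols."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    rest = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:                      # nothing to eliminate
        return {head: alternatives}
    new = head + "'"
    return {
        head: [beta + (new,) for beta in rest],
        new: [alpha + (new,) for alpha in recursive] + [()],
    }
```

For example, E → E + T | T becomes E → T E′ and E′ → + T E′ | ε.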

Parsing 23. Elimination of Left Recursion (cont.). The previous transformation, however, does not eliminate left recursion involving two or more steps. For example, consider the grammar
  S → A a | b
  A → A c | S d | ε
S is left-recursive because S ⇒ A a ⇒ S d a, but it is not immediately left recursive.

Parsing 24. Elimination of Left Recursion (cont.). Algorithm: eliminate left recursion.
  Arrange the nonterminals in some order A1, A2, …, An
  for i = 1 to n {
    for j = 1 to i-1 {
      replace each production of the form Ai → Aj γ by the productions
      Ai → δ1 γ | δ2 γ | … | δk γ, where Aj → δ1 | δ2 | … | δk
      are all the current Aj-productions
    }
    eliminate the immediate left recursion among the Ai-productions
  }

Parsing 25. Elimination of Left Recursion (cont.). To show that the previous algorithm actually works, all we need to notice is that iteration i only changes productions with Ai on the left-hand side, and that afterwards m > i in all productions of the form Ai → Am γ. This can be shown by induction:
- It is clearly true for i = 1.
- If it is true for all i < k, then when the outer loop is executed for i = k, the inner loop removes all productions Ai → Am γ with m < i.
- Finally, with the elimination of immediate (self) recursion, m in the Ai → Am γ productions is forced to be > i.
So at the end of the algorithm, all derivations of the form Ai ⇒ Am γ have m > i, and therefore left recursion is no longer possible.

Parsing 26. Grammar Transformations 3: Left Factoring. Left factoring helps transform a grammar for predictive parsing. For example, if we have the two productions
  S → if E then S else S | if E then S
then on seeing the input token if, we cannot immediately tell which production to choose to expand S. In general, if we have A → α β1 | α β2 and the input begins with α, we do not know (without looking further) which production to use to expand A.

Parsing 27. Left Factoring (cont.). However, we may defer the decision by expanding A to α A′. Then, after seeing the input derived from α, we expand A′ to β1 or to β2. That is, left-factored, the original productions become
  A → α A′
  A′ → β1 | β2
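Left factoring is likewise mechanical. A one-step sketch in Python (names are illustrative; alternatives are tuples of symbols, and () is an ε-alternative):

```python
from collections import defaultdict

def left_factor(head, alternatives):
    """One step of left factoring: group alternatives by their first symbol;
    a group of two or more sharing a prefix alpha is replaced by
    A -> alpha A', with A' -> beta1 | beta2 | ... for the leftover suffixes."""
    groups = defaultdict(list)
    for alt in alternatives:
        groups[alt[0] if alt else None].append(alt)
    result, counter = {head: []}, 0
    for first_sym, group in groups.items():
        if first_sym is None or len(group) == 1:
            result[head].extend(group)        # nothing to factor here
            continue
        # Longest common prefix of the group.
        prefix = group[0]
        for alt in group[1:]:
            i = 0
            while i < len(prefix) and i < len(alt) and prefix[i] == alt[i]:
                i += 1
            prefix = prefix[:i]
        counter += 1
        new = head + "'" * counter
        result[head].append(prefix + (new,))
        result[new] = [alt[len(prefix):] for alt in group]
    return result
```

Applied to the dangling-else pair above, it produces S → if E then S S′ and S′ → ε | else S.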

Parsing 28. Non-Context-Free Language Constructs. Examples of non-context-free languages:
- L1 = { w c w | w is of the form (a|b)* }
- L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
- L3 = { a^n b^n c^n | n ≥ 0 }
Languages similar to these that are context-free:
- L′1 = { w c w^R | w is of the form (a|b)* } (w^R stands for w reversed). This language is generated by the grammar S → a S a | b S b | c.
- L′2 = { a^n b^m c^m d^n | n ≥ 1 and m ≥ 1 }. This language is generated by the grammar S → a S d | a A d, A → b A c | b c.

Parsing 29. Non-Context-Free Language Constructs (cont.).
- L″2 = { a^n b^n c^m d^m | n ≥ 1 and m ≥ 1 }. This language is generated by the grammar S → A B, A → a A b | a b, B → c B d | c d.
- L′3 = { a^n b^n | n ≥ 1 }. This language is generated by the grammar S → a S b | a b. This language is not definable by any regular expression.

Parsing 30. Non-Context-Free Language Constructs (cont.). Suppose we could construct a DFSM D accepting L′3. D must have a finite number of states, say k. Consider the sequence of states s0, s1, s2, …, sk entered by D having read ε, a, aa, …, a^k. Since D only has k states, two of the states in the sequence have to be equal, say si = sj (i ≠ j). From si, a sequence of i b's leads to an accepting (final) state. Therefore, the same sequence of i b's will also lead to an accepting state from sj. Therefore D would accept a^j b^i, which means that the language accepted by D is not identical to L′3. A contradiction.

Parsing 31. Parsing. The parsing problem is: given a string of tokens w, find a parse tree whose frontier is w (equivalently, find a derivation of w). A parser for a grammar G reads a list of tokens and finds a parse tree if they form a sentence (or reports an error otherwise). Two classes of algorithms for parsing: top-down and bottom-up.

Parsing 32. Parser generators. A parser generator is a program that reads a grammar and produces a parser. The best-known parser generator is yacc (and its GNU re-implementation, bison); both produce bottom-up parsers. Most parser generators, including yacc, do not work for every CFG; they accept a restricted class of CFGs that can be parsed efficiently using the method employed by that parser generator.

Parsing 33. Top-down parsing. Starting from a parse tree containing just S, build the tree downward toward the input, always expanding the left-most non-terminal. Algorithm: next slide.

Parsing 34. Top-down parsing (cont.).
  Let input = a1 a2 … an
  current sentential form (csf) = S
  loop {
    suppose csf = t1 … tk A α
    if t1 … tk ≠ a1 … ak, it's an error
    based on ak+1 …, choose a production A → β
    csf becomes t1 … tk β α
  }

Parsing 35. Top-down parsing example. Grammar H: L → E ; L | E, E → a | b. Input: a;b.
  Sentential form   Input
  L                 a;b
  E;L               a;b
  a;L               a;b
(The slide shows the parse tree growing at each step.)

Parsing 36. Top-down parsing example (cont.).
  Sentential form   Input
  a;E               a;b
  a;b               a;b
(The completed parse tree matches the whole input.)

Parsing 37. LL(1) parsing. An efficient form of top-down parsing. Use only the first symbol of the remaining input (ak+1) to choose the next production. That is, employ a function M : Σ × N → P in the "choose production" step of the algorithm. When this works, the grammar is (usually) called LL(1). (A more precise definition follows.)

Parsing 38. LL(1) examples. Example 1, grammar H: L → E ; L | E, E → a | b. Given input a;b, the next symbol is a. Which L-production to use? Can't tell: both alternatives can begin with a. Therefore H is not LL(1).

Parsing 39. LL(1) examples. Example 2:
  Exp → Term Exp′
  Exp′ → $ | + Exp
  Term → id
(Use $ for the "end-of-input" symbol.) This grammar is LL(1): Exp and Term have only one production each; Exp′ has two productions, but only one is applicable at any time.
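Because each production can be chosen from a single lookahead token, this grammar has a direct recursive-descent implementation, one procedure per nonterminal. A Python sketch (token spellings and error handling are my assumptions):

```python
def parse(tokens):
    """Recursive-descent parser for
         Exp  -> Term Exp'
         Exp' -> $ | + Exp
         Term -> id
       tokens is a list such as ["id", "+", "id", "$"]; returns True on
       success and raises SyntaxError otherwise."""
    pos = 0

    def peek():
        return tokens[pos]

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def exp():                   # Exp -> Term Exp'
        term()
        exp_prime()

    def exp_prime():
        if peek() == "+":        # Exp' -> + Exp
            eat("+")
            exp()
        else:                    # Exp' -> $
            eat("$")

    def term():                  # Term -> id
        eat("id")

    exp()
    return True
```

Note how the one-token lookahead in exp_prime is exactly the M function of the previous slide, hard-coded as an if.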

Parsing 40. Nonrecursive predictive parsing. It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than implicitly via recursive calls. The key problem during predictive parsing is determining the production to be applied for a non-terminal.

Parsing 41. Nonrecursive predictive parsing (cont.). Algorithm:
  Set ip to point to the first symbol of w$.
  repeat
    let X be the top-of-stack symbol and a the symbol pointed to by ip
    if X is a terminal or $ then
      if X == a then pop X from the stack and advance ip
      else error()
    else  // X is a nonterminal
      if M[X,a] == X → Y1 Y2 … Yk then
        pop X from the stack
        push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top
        (push nothing if Y1 Y2 … Yk is ε)
        output the production X → Y1 Y2 … Yk
      else error()
  until X == $
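The algorithm above can be transcribed almost line for line. A Python sketch, driven by a hand-written table for a left-factored version of grammar H (the table and names are my assumptions; the slides construct such tables systematically later):

```python
# Left-factored version of H:  L -> E L'   L' -> ; L | ε   E -> a | b
NONTERMINALS = {"L", "L'", "E"}
TABLE = {  # (nonterminal, lookahead) -> right-hand side; () is ε
    ("L", "a"): ("E", "L'"), ("L", "b"): ("E", "L'"),
    ("L'", ";"): (";", "L"), ("L'", "$"): (),
    ("E", "a"): ("a",),      ("E", "b"): ("b",),
}

def predictive_parse(tokens, table=TABLE, nonterminals=NONTERMINALS, start="L"):
    """Explicit-stack predictive parser; tokens must end with "$".
    Returns the list of productions used, in leftmost-derivation order."""
    stack = ["$", start]
    pos, output = 0, []
    while stack:
        X = stack.pop()
        a = tokens[pos]
        if X not in nonterminals:          # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, got {a!r}")
            pos += 1                       # match: pop X and advance ip
        else:
            rhs = table.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no production for {X!r} on {a!r}")
            output.append((X, rhs))
            stack.extend(reversed(rhs))    # push Yk ... Y1, with Y1 on top
    return output
```

Parsing a;b replays the leftmost derivation L ⇒ E L′ ⇒ a L′ ⇒ a ; L ⇒ … ⇒ a ; b.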

Parsing 42. LL(1) grammars. No left recursion: with A → A α, if this production is chosen, the parse makes no progress. No common prefixes: A → α β | α γ. This can be fixed by left factoring: A → α A′, A′ → β | γ.

Parsing 43. LL(1) grammars (cont.). No ambiguity: the precise definition requires that the production to choose be unique (the "choose" function M is very hard to calculate otherwise).

Parsing 44. Top-down parsing. (Slide shows the start symbol as the root of the parse tree above the input tokens; from left to right, the parse tree is "grown" downwards toward the tokens.)

Parsing 45. Checking LL(1)-ness. For any sequence of grammar symbols α, define the set FIRST(α) ⊆ Σ to be those tokens a such that α ⇒ … ⇒ a β for some β. (Notation: write α ⇒* a β.)

Parsing 46. Checking LL(1)-ness. Definition: grammar G = (N, Σ, P, S) is LL(1) if whenever there are two left-most derivations (in which the leftmost non-terminal is always expanded first)
  S ⇒* w A γ ⇒ w α γ ⇒* w x
  S ⇒* w A γ ⇒ w β γ ⇒* w y
such that FIRST(x) = FIRST(y), it follows that α = β. In other words, given (1) a string w A γ in V* and (2) the first terminal symbol to be derived from A γ, say t, there is at most one production that can be applied to A to yield a derivation of any terminal string beginning with w t. FIRST sets can often be calculated by inspection.

Parsing 47. FIRST Sets.
  Exp → Term Exp′
  Exp′ → $ | + Exp
  Term → id
(Use $ for the "end-of-input" symbol.)
  FIRST(Term Exp′) = { id }
  FIRST($) = { $ } and FIRST(+ Exp) = { + }, so FIRST($) ∩ FIRST(+ Exp) = { }
  FIRST(id) = { id }
Therefore the grammar is LL(1).

Parsing 48. FIRST Sets. H: L → E ; L | E, E → a | b.
  FIRST(E ; L) = { a, b } = FIRST(E)
  FIRST(E ; L) ∩ FIRST(E) ≠ { }
Therefore H is not LL(1).

Parsing 49. How to compute FIRST Sets of Vocabulary Symbols. Algorithm: compute FIRST(X) for all grammar symbols X.
  forall X ∈ V do FIRST(X) = {}
  forall X ∈ Σ (X is a terminal) do FIRST(X) = {X}
  forall productions X → ε do FIRST(X) = FIRST(X) ∪ {ε}
  repeat
    forall productions X → Y1 Y2 … Yk do
      forall i ∈ [1,k] do
        FIRST(X) = FIRST(X) ∪ (FIRST(Yi) - {ε})
        if ε ∉ FIRST(Yi) then continue outer loop
      FIRST(X) = FIRST(X) ∪ {ε}
  until no more terminals or ε are added to any FIRST set

Parsing 50. How to compute FIRST Sets of Strings of Symbols. FIRST(X1 X2 … Xn) is the union of FIRST(X1) and all FIRST(Xi) such that ε ∈ FIRST(Xk) for k = 1, 2, …, i-1. FIRST(X1 X2 … Xn) contains ε iff ε ∈ FIRST(Xk) for k = 1, 2, …, n.
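Both computations, the fixed point over symbols (slide 49) and the helper for strings (slide 50), can be sketched in Python (the encoding and names are mine):

```python
EPS = "ε"

def first_of_string(symbols, first):
    """FIRST(X1 X2 ... Xn): union the FIRST(Xi) while every earlier symbol
    is nullable; the result contains ε only if all of them are."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result

def compute_first(grammar, terminals):
    """FIRST(X) for every symbol, by iterating the rules to a fixed point.
    grammar: nonterminal -> list of alternatives (tuples of symbols);
    an empty tuple is an ε-production."""
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[head])
                first[head] |= first_of_string(alt, first)
                changed |= len(first[head]) != before
    return first
```

Running it on the Exp grammar of slide 47 reproduces the sets computed there by inspection.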

Parsing 51. FIRST Sets do not Suffice. Given the productions
  A → T x
  A → T y
  T → w
  T → ε
T → w should be applied when the next input token is w; T → ε should be applied whenever the next terminal (the one pointed to by ip) is either x or y.

Parsing 52. FOLLOW Sets. For any nonterminal X, define the set FOLLOW(X) ⊆ Σ to be those tokens a such that S ⇒* α X a β for some α and β.

Parsing 53. How to compute the FOLLOW Sets. Algorithm: compute FOLLOW(X) for all nonterminals X.
  FOLLOW(S) = {$}
  forall productions A → α B β do FOLLOW(B) = FOLLOW(B) ∪ (FIRST(β) - {ε})
  repeat
    forall productions A → α B, or A → α B β with ε ∈ FIRST(β), do
      FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A)
  until all FOLLOW sets remain the same
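A Python sketch of the FOLLOW computation (FIRST is assumed precomputed, for instance by the algorithm on slide 49; the encoding and names are mine):

```python
EPS = "ε"

def compute_follow(grammar, first, start):
    """FOLLOW(X) for all nonterminals, iterated to a fixed point.
    grammar: nonterminal -> list of alternatives (tuples of symbols);
    `first` maps every symbol to its FIRST set, with ε marking nullability."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:        # terminals have no FOLLOW
                        continue
                    before = len(follow[sym])
                    rest_nullable = True
                    for nxt in alt[i + 1:]:       # add FIRST(β) - {ε}
                        follow[sym] |= first[nxt] - {EPS}
                        if EPS not in first[nxt]:
                            rest_nullable = False
                            break
                    if rest_nullable:             # A -> α B β with β =>* ε
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return follow
```

As a sanity check, for the left-factored variant of H (L → E L′, L′ → ; L | ε, E → a | b), FOLLOW(E) comes out as { ; , $ }.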

Parsing 54. Construction of a predictive parsing table. Algorithm:
  M[:,:] = {}
  forall productions A → α do
    forall a ∈ FIRST(α), a ≠ ε, do M[A,a] = M[A,a] ∪ {A → α}
    if ε ∈ FIRST(α) then
      forall b ∈ FOLLOW(A) do M[A,b] = M[A,b] ∪ {A → α}
  Make all empty entries of M be error
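Given FIRST and FOLLOW (written out by hand below for the left-factored variant of grammar H, L → E L′, L′ → ; L | ε, E → a | b — my example, not the slides'), the table-construction algorithm is a few lines of Python:

```python
EPS = "ε"

# Hand-computed FIRST and FOLLOW for:  L -> E L'   L' -> ; L | ε   E -> a | b
GRAMMAR = {"L": [("E", "L'")], "L'": [(";", "L"), ()], "E": [("a",), ("b",)]}
FIRST = {"a": {"a"}, "b": {"b"}, ";": {";"},
         "L": {"a", "b"}, "L'": {";", EPS}, "E": {"a", "b"}}
FOLLOW = {"L": {"$"}, "L'": {"$"}, "E": {";", "$"}}

def first_of(seq):
    """FIRST of a string of symbols, as on slide 50."""
    out = set()
    for sym in seq:
        out |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return out
    out.add(EPS)
    return out

def build_table(grammar, follow):
    """Fill M[A, a] per the slide's algorithm; any cell holding more than
    one production signals that the grammar is not LL(1)."""
    table = {}
    for head, alternatives in grammar.items():
        for alt in alternatives:
            fs = first_of(alt)
            for a in fs - {EPS}:
                table.setdefault((head, a), set()).add(alt)
            if EPS in fs:                      # use FOLLOW for ε-productions
                for b in follow[head]:
                    table.setdefault((head, b), set()).add(alt)
    return table
```

Every cell ends up with exactly one production here, confirming that the left-factored grammar is LL(1); the table produced is the one the nonrecursive parser of slide 41 consumes.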

Parsing 55. Another Definition of LL(1). Definition: grammar G is LL(1) if for every A ∈ N with productions A → α1 | … | αn,
  FIRST(αi FOLLOW(A)) ∩ FIRST(αj FOLLOW(A)) = {} for all i ≠ j.