C Chuen-Liang Chen, NTUCS&IE / 51 CONTEXT-FREE GRAMMARS Chuen-Liang Chen Department of Computer Science and Information Engineering National Taiwan University.

Presentation transcript:

Slide 51: CONTEXT-FREE GRAMMARS
© Chuen-Liang Chen, NTUCS&IE
Chuen-Liang Chen, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, TAIWAN

Slide 52: Parsing
function: checking the syntactic validity of the input string; producing the structure of the corresponding parse tree
callee: scanner (when a token is needed); semantic routine (when a production rule is matched)
theoretical basis: context-free grammar
executor: parser, syntax analyzer
- top-down parsing
  - begins at the start symbol and expands nonterminals in depth-first manner (predictive in nature)
  - left-most derivation
  - pre-order traversal of the parse tree
  - e.g. LL(k) [read from Left; Left-most derivation; k lookaheads], recursive descent parsing
- bottom-up parsing
  - begins from the terminal string and determines the production used to generate the leaves
  - right-most derivation in reverse order
  - post-order traversal of the parse tree
  - e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]

Slide 53: Definitions about context-free grammar (1/2)
context-free grammar -- G = (V_t, V_n, S, P)
- V_t -- set of terminal symbols
- V_n -- set of nonterminal symbols
  - a, b, c, ... ∈ V_t
  - A, B, C, ... ∈ V_n
  - U, V, W, ... ∈ V = V_t ∪ V_n
  - u, v, w, ... ∈ V_t*
  - α, β, γ, ... ∈ V*
- S -- start symbol, goal symbol; S ∈ V_n
- P -- set of production rules of the form A → α
derivation by production rule A → γ
- one step derivation: αAβ ⇒ αγβ
- left-most derivation: uAβ ⇒lm uγβ
- right-most derivation: αAv ⇒rm αγv
- one or more steps derivation: ⇒+, ⇒+lm, ⇒+rm
- zero or more steps derivation: ⇒*, ⇒*lm, ⇒*rm

Slide 54: Definitions about context-free grammar (2/2)
set of sentential forms -- SF(G) = { α | S ⇒* α }
- left-most sentential form -- the α such that S ⇒*lm α
- right-most sentential form -- the α such that S ⇒*rm α
context-free language -- L(G) = SF(G) ∩ V_t*
parse tree, derivation tree --
- graphic representation of derivations
- root -- start symbol
- leaf nodes -- grammar symbols or λ
- interior nodes -- nonterminals
- offspring of a nonterminal -- a production
for a given sentential form --
- phrase -- a sequence of symbols derived from a single nonterminal
- simple phrase, prime phrase -- minimal phrase
- handle -- left-most simple phrase

Slide 55: Example of context-free grammar
grammar G0 --
    E → Prefix ( E ) | V Tail
    Prefix → F | λ
    Tail → + E | λ
left-most derivation --
    E ⇒lm Prefix ( E )
      ⇒lm F ( E )
      ⇒lm F ( V Tail )
      ⇒lm F ( V + E )
      ⇒lm F ( V + V Tail )
      ⇒lm F ( V + V )
right-most derivation --
    E ⇒rm Prefix ( E )
      ⇒rm Prefix ( V Tail )
      ⇒rm Prefix ( V + E )
      ⇒rm Prefix ( V + V Tail )
      ⇒rm Prefix ( V + V )
      ⇒rm F ( V + V )
right-most sentential forms
    1. E
    2. Prefix ( E )
    3. Prefix ( V Tail )
    4. Prefix ( V + E )
    5. Prefix ( V + V Tail )
    6. Prefix ( V + V )
    7. F ( V + V )
    8. and so on
L(G0) ⊇ { F ( V + V ) }

Slide 56: Example of left-most derivation
parse trees of left-most derivations
- blue symbols: left-most sentential forms
[Figure: sequence of parse trees, one per step of the left-most derivation of F ( V + V ) in G0.]

Slide 58: Example of top-down parsing
trace of top-down parsing (left-most derivation)
- orange: just derived (predicted)
- blue: just read (matched)
- black: derived or read
- green: un-processed (parse stack)
[Figure: sequence of parse trees tracing the top-down parse of F ( V + V ) in G0.]
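The top-down trace can be mimicked by a small recursive-descent recognizer. The sketch below is not from the slides: it assumes the grammar G0 (E → Prefix ( E ) | V Tail, Prefix → F | λ, Tail → + E | λ), treats every character of the input as one token, and uses one token of lookahead to pick a production. The names `accepts_G0`, `parse_E`, `parse_Prefix`, `parse_Tail` and `match` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cursor over the token string; each character is one token:
 * 'F', 'V', '(', ')' or '+'.  The '\0' terminator marks end of input. */
static const char *cursor;

static int parse_E(void);

/* Consume the expected token; return 1 on success, 0 on mismatch. */
static int match(char expected)
{
    if (*cursor != expected)
        return 0;
    cursor++;
    return 1;
}

/* Prefix -> F | lambda */
static int parse_Prefix(void)
{
    if (*cursor == 'F')
        return match('F');
    return *cursor == '(';          /* lambda is predicted only before '(' */
}

/* Tail -> + E | lambda */
static int parse_Tail(void)
{
    if (*cursor == '+')
        return match('+') && parse_E();
    return *cursor == ')' || *cursor == '\0';   /* lambda: next token may follow Tail */
}

/* E -> Prefix ( E ) | V Tail */
static int parse_E(void)
{
    if (*cursor == 'F' || *cursor == '(')
        return parse_Prefix() && match('(') && parse_E() && match(')');
    if (*cursor == 'V')
        return match('V') && parse_Tail();
    return 0;
}

/* Return 1 if the whole token string is a sentence of G0, else 0. */
int accepts_G0(const char *tokens)
{
    assert(tokens != NULL);
    cursor = tokens;
    return parse_E() && *cursor == '\0';
}
```

Each function expands one nonterminal, so the call sequence for `accepts_G0("F(V+V)")` visits the parse tree in pre-order, matching the left-most derivation traced on the slide.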

Slide 60: Example of right-most derivation (1/2)
parse trees of right-most derivations and the corresponding sentential form, phrases, simple phrases, and handle
- blue symbols: sentential form; separate markers indicate each phrase, simple phrase, and the handle
[Figure: parse trees for the right-most sentential forms E, Prefix ( E ), Prefix ( V Tail ), and Prefix ( V + E ).]

Slide 61: Example of right-most derivation (2/2)
[Figure: parse trees for the right-most sentential forms Prefix ( V + V Tail ), Prefix ( V + V ), and F ( V + V ).]

Slide 63: Example of bottom-up parsing
trace of bottom-up parsing (inverse order of right-most derivation)
- blue: just read (shifted)
- orange: just derived (reduced to)
- pink: not read
- green: derived or read (parse stack)
[Figure: sequence of partial parse trees tracing the bottom-up parse of F ( V + V ) in G0.]

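Slide 64: Examples - pork chop noodle combo (排骨麵特餐)
example 1
    combo → iced-tea pork-chop-noodles orange-slices
    pork-chop-noodles → fried-pork-chop noodle-soup
- lookahead is unnecessary
example 2
    combo → iced-tea pork-chop-noodles service orange-slices
    pork-chop-noodles → fried-pork-chop noodle-soup | noodle-soup fried-pork-chop
    service → taro-ice | "forget it" (λ)
- lookahead is required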

Slide 65: Ambiguity of grammar
a string with two different parse trees (i.e., two different structures)
example: <expression> → <expression> - <expression> | id
[Figure: two different parse trees for id - id - id.]
for an unambiguous grammar, the parse trees of the left-most derivation and the right-most derivation are the same
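To see how quickly ambiguity grows, one can count the parse trees of a subtraction chain. The sketch below assumes an ambiguous grammar of the shape E → E - E | id (the nonterminal name E and the function name `count_parse_trees` are illustrative, not from the slides). Every tree for n > 1 identifiers chooses one '-' as its root, which splits the identifiers into a left group of k and a right group of n - k.

```c
#include <assert.h>

#define MAX_IDS 16

/* Number of distinct parse trees for "id - id - ... - id" with n ids
 * under the ambiguous grammar  E -> E - E | id. */
unsigned long count_parse_trees(int n)
{
    unsigned long trees[MAX_IDS + 1];
    int len, k;

    assert(n >= 1 && n <= MAX_IDS);
    trees[1] = 1;                       /* E -> id */
    for (len = 2; len <= n; len++) {
        trees[len] = 0;
        for (k = 1; k < len; k++)       /* E -> E - E, left part has k ids */
            trees[len] += trees[k] * trees[len - k];
    }
    return trees[n];
}
```

For three identifiers the function returns 2, the two structures (id - id) - id and id - (id - id); only the first gives the conventional left-associative meaning of subtraction.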

Slide 66: First set and Follow set (1/2)
First(α) = { a ∈ V_t | α ⇒* aβ } ∪ ( if α ⇒* λ then { λ } else ∅ )
- set of all terminals that can begin a sentential form derived from α
- First_k(α) -- set of k-symbol terminal strings that can begin a sentential form derived from α
- QUIZ: for what?
Follow(A) = { a ∈ V_t | S ⇒+ αAaβ } ∪ ( if S ⇒+ αA then { λ } else ∅ )
- set of all terminals that may follow A in some sentential form
- Follow_k(A) -- set of k-symbol terminal strings that may follow A in some sentential form
- QUIZ: for what?

Slide 67: First set and Follow set (2/2)
example 1 --
    E → Prefix ( E )
    E → V Tail
    Prefix → F | λ
    Tail → + E | λ
example 2 --
    S → a S e | B
    B → b B e | C
    C → c C e | d
example 3 --
    S → A B c
    A → a | λ
    B → b | λ
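For checking one's own answers, the sets below were worked by hand from the definitions of First and Follow for the three example grammars; they are not taken from the slides. As in the definition of Follow, λ in a Follow set means that the nonterminal can end a sentential form.

```latex
\begin{align*}
\text{example 1:}\quad
  & \mathrm{First}(E)=\{F,\ (,\ V\},\quad
    \mathrm{First}(\mathit{Prefix})=\{F,\ \lambda\},\quad
    \mathrm{First}(\mathit{Tail})=\{+,\ \lambda\} \\
  & \mathrm{Follow}(E)=\mathrm{Follow}(\mathit{Tail})=\{),\ \lambda\},\quad
    \mathrm{Follow}(\mathit{Prefix})=\{(\} \\
\text{example 2:}\quad
  & \mathrm{First}(S)=\{a,b,c,d\},\quad
    \mathrm{First}(B)=\{b,c,d\},\quad
    \mathrm{First}(C)=\{c,d\} \\
  & \mathrm{Follow}(S)=\mathrm{Follow}(B)=\mathrm{Follow}(C)=\{e,\ \lambda\} \\
\text{example 3:}\quad
  & \mathrm{First}(S)=\{a,b,c\},\quad
    \mathrm{First}(A)=\{a,\ \lambda\},\quad
    \mathrm{First}(B)=\{b,\ \lambda\} \\
  & \mathrm{Follow}(S)=\{\lambda\},\quad
    \mathrm{Follow}(A)=\{b,c\},\quad
    \mathrm{Follow}(B)=\{c\}
\end{align*}
```

In example 3, c enters First(S) only because both A and B can derive λ, which is why the algorithms first determine which symbols derive λ.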

Slide 68: Algorithms for First & Follow sets (1/6)

    typedef int symbol; /* a symbol in the grammar */

    /* The symbolic constants used below, NUM_TERMINALS,
     * NUM_NONTERMINALS, and NUM_PRODUCTIONS are
     * determined by the grammar.
     * MAX_RHS_LENGTH should simply be "big enough."
     */
    #define VOCABULARY (NUM_NONTERMINALS + NUM_TERMINALS)

    typedef struct gram {
        symbol terminals[NUM_TERMINALS];
        symbol nonterminals[NUM_NONTERMINALS];
        symbol start_symbol;
        int num_productions;
        struct prod {
            symbol lhs;
            int rhs_length;
            symbol rhs[MAX_RHS_LENGTH];
        } productions[NUM_PRODUCTIONS];
        symbol vocabulary[VOCABULARY];
    } grammar;

    typedef struct prod production;
    typedef symbol terminal;
    typedef symbol nonterminal;

Slide 69: Algorithms for First & Follow sets (2/6)

    typedef short boolean;
    #define FALSE 0
    #define TRUE  1
    typedef boolean marked_vocabulary[VOCABULARY];

    /*
     * Mark those vocabulary symbols found to derive λ (directly or indirectly).
     * A C function cannot return an array, so a pointer to the static
     * result array is returned.
     */
    boolean *mark_lambda(const grammar g)
    {
        static marked_vocabulary derives_lambda;
        boolean changes;            /* any changes during last iteration? */
        boolean rhs_derives_lambda; /* does the RHS derive λ? */
        symbol v;                   /* a word in the vocabulary */
        production p;               /* a production in the grammar */
        int i, j;                   /* loop variables */

        for (v = 0; v < VOCABULARY; v++)
            derives_lambda[v] = FALSE; /* initially, nothing is marked */

Slide 70: Algorithms for First & Follow sets (3/6)

        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                if (!derives_lambda[p.lhs]) {
                    if (p.rhs_length == 0) {
                        /* derives λ directly */
                        changes = derives_lambda[p.lhs] = TRUE;
                        continue;
                    }
                    /* does each part of RHS derive λ? */
                    rhs_derives_lambda = derives_lambda[p.rhs[0]];
                    for (j = 1; j < p.rhs_length; j++)
                        rhs_derives_lambda = rhs_derives_lambda
                                             && derives_lambda[p.rhs[j]];
                    if (rhs_derives_lambda)
                        changes = derives_lambda[p.lhs] = TRUE;
                }
            }
        } while (changes);
        return derives_lambda;
    }
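The marking loop can be tried on a concrete grammar. The following is a minimal standalone sketch, not the slide code: the `struct rule` layout, the symbol numbering, and the name `mark_nullable` are assumptions made for illustration, and the grammar is example 3 (S → A B c, A → a | λ, B → b | λ). A production with an empty right-hand side is handled by the same test as the general case, because "every RHS symbol derives λ" is trivially true for zero symbols.

```c
#include <assert.h>

#define MAX_RHS 4

/* One production: lhs -> rhs[0] ... rhs[rhs_length - 1]. */
struct rule {
    int lhs;
    int rhs_length;
    int rhs[MAX_RHS];
};

/* Hypothetical symbol numbering for example 3. */
enum { SYM_S, SYM_A, SYM_B, SYM_a, SYM_b, SYM_c, NUM_SYMBOLS };

/* S -> A B c,  A -> a | lambda,  B -> b | lambda */
static const struct rule example3[] = {
    { SYM_S, 3, { SYM_A, SYM_B, SYM_c } },
    { SYM_A, 1, { SYM_a } },
    { SYM_A, 0, { 0 } },
    { SYM_B, 1, { SYM_b } },
    { SYM_B, 0, { 0 } },
};

/* Set nullable[v] to 1 for every symbol v that derives lambda.
 * Repeats full passes over the rules until a pass marks nothing new. */
void mark_nullable(const struct rule *rules, int num_rules,
                   int nullable[], int num_symbols)
{
    int changed, i, j, all;

    assert(rules != 0 && nullable != 0);
    for (i = 0; i < num_symbols; i++)
        nullable[i] = 0;                /* initially, nothing is marked */
    do {
        changed = 0;
        for (i = 0; i < num_rules; i++) {
            if (nullable[rules[i].lhs])
                continue;               /* already marked */
            all = 1;
            for (j = 0; j < rules[i].rhs_length; j++)
                all = all && nullable[rules[i].rhs[j]];
            if (all) {
                nullable[rules[i].lhs] = 1;
                changed = 1;
            }
        }
    } while (changed);
}
```

Calling `mark_nullable(example3, 5, nullable, NUM_SYMBOLS)` marks A and B but not S, since the terminal c in S → A B c can never vanish.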

Slide 71: Algorithms for First & Follow sets (4/6)

    typedef set_of_terminal_or_lambda termset;
    termset follow_set[NUM_NONTERMINALS];
    termset first_set[SYMBOL];
    boolean *derives_lambda = mark_lambda(g); /* mark_lambda(g) as defined above */

    termset compute_first(string_of_symbols alpha)
    {
        int i, k;
        termset result;

        k = length(alpha);
        if (k == 0)
            result = SET_OF( λ );
        else {
            result = first_set[alpha[0]] - SET_OF( λ );
            for (i = 1; i < k && λ ∈ first_set[alpha[i-1]]; i++)
                result = result ∪ ( first_set[alpha[i]] - SET_OF( λ ) );
            if (i == k && λ ∈ first_set[alpha[k - 1]])
                result = result ∪ SET_OF( λ );
        }
        return result;
    }

Slide 72: Algorithms for First & Follow sets (5/6)

    extern grammar g;

    void fill_first_set(void)
    {
        nonterminal A;
        terminal a;
        production p;
        boolean changes;
        int i, j;

        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            if (derives_lambda[A])
                first_set[A] = SET_OF( λ );
            else
                first_set[A] = ∅;
        }
        for (i = 0; i < NUM_TERMINALS; i++) {
            a = g.terminals[i];
            first_set[a] = SET_OF( a );
            for (j = 0; j < NUM_NONTERMINALS; j++) {
                A = g.nonterminals[j];
                if (there exists a production A → aβ)
                    first_set[A] = first_set[A] ∪ SET_OF( a );
            }
        }
        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                first_set[p.lhs] = first_set[p.lhs] ∪ compute_first(p.rhs);
                if ( first_set changed )
                    changes = TRUE;
            }
        } while (changes);
    }

QUIZ: termination?
QUIZ: correctness?

Slide 73: Algorithms for First & Follow sets (6/6)

    void fill_follow_set(void)
    {
        nonterminal A, B;
        int i;
        boolean changes;

        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            follow_set[A] = ∅;
        }
        follow_set[g.start_symbol] = SET_OF( λ );
        do {
            changes = FALSE;
            for (each production A → αBβ) {
                /*
                 * I.e. for each production and each
                 * occurrence of a nonterminal in its
                 * right-hand side.
                 */
                follow_set[B] = follow_set[B] ∪ (compute_first(β) - SET_OF( λ ));
                if ( λ ∈ compute_first(β) )
                    follow_set[B] = follow_set[B] ∪ follow_set[A];
                if ( follow_set[B] changed )
                    changes = TRUE;
            }
        } while (changes);
    }

QUIZ: termination?
QUIZ: correctness?
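The set notation in the slide code is pseudocode. One way to make it compile, sketched below under stated assumptions, is to represent each set as a bit mask with one extra bit for λ. This is not the author's implementation: the symbol numbering, the `struct rule` layout, and the names `first_of`, `fill_first_sets` and `fill_follow_sets` are hypothetical, the grammar is hard-wired to example 3, and the separate initialisation from derives_lambda is dropped because the fixed-point loop finds λ membership on its own.

```c
#include <assert.h>

/* Hypothetical numbering: nonterminals first, then terminals. */
enum { S, A, B, NUM_NT, T_a = NUM_NT, T_b, T_c, NUM_SYM };
#define BIT(t)  (1u << (t))
#define LAMBDA  BIT(NUM_SYM)            /* extra bit standing for lambda */
#define MAX_RHS 4

struct rule { int lhs, len, rhs[MAX_RHS]; };

/* Example 3:  S -> A B c,  A -> a | lambda,  B -> b | lambda */
static const struct rule rules[] = {
    { S, 3, { A, B, T_c } }, { A, 1, { T_a } }, { A, 0, { 0 } },
    { B, 1, { T_b } },       { B, 0, { 0 } },
};
#define NUM_RULES ((int)(sizeof rules / sizeof rules[0]))

unsigned first_set[NUM_SYM];    /* indexed by symbol */
unsigned follow_set[NUM_NT];    /* indexed by nonterminal */

/* First of the symbol string x[0..n-1], using the current first_set. */
static unsigned first_of(const int *x, int n)
{
    unsigned result = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        result |= first_set[x[i]] & ~LAMBDA;
        if (!(first_set[x[i]] & LAMBDA))
            return result;              /* x[i] cannot vanish: stop here */
    }
    return result | LAMBDA;             /* every symbol (or none) can vanish */
}

void fill_first_sets(void)
{
    int i, changed;

    for (i = 0; i < NUM_SYM; i++)
        first_set[i] = (i >= NUM_NT) ? BIT(i) : 0;  /* First(a) = { a } */
    do {
        changed = 0;
        for (i = 0; i < NUM_RULES; i++) {
            unsigned add = first_of(rules[i].rhs, rules[i].len);
            if (add & ~first_set[rules[i].lhs]) {
                first_set[rules[i].lhs] |= add;
                changed = 1;
            }
        }
    } while (changed);
}

/* Requires fill_first_sets() to have run. */
void fill_follow_sets(void)
{
    int i, j, changed;

    for (i = 0; i < NUM_NT; i++)
        follow_set[i] = 0;
    follow_set[S] = LAMBDA;             /* lambda marks end of input */
    do {
        changed = 0;
        for (i = 0; i < NUM_RULES; i++) {
            for (j = 0; j < rules[i].len; j++) {
                int b = rules[i].rhs[j];
                unsigned rest, add;

                if (b >= NUM_NT)
                    continue;           /* only nonterminals get Follow sets */
                rest = first_of(rules[i].rhs + j + 1, rules[i].len - j - 1);
                add = rest & ~LAMBDA;
                if (rest & LAMBDA)      /* the rest can vanish */
                    add |= follow_set[rules[i].lhs];
                if (add & ~follow_set[b]) {
                    follow_set[b] |= add;
                    changed = 1;
                }
            }
        }
    } while (changed);
}
```

Both loops only ever add bits to a finite number of finite sets, so each must stop after finitely many passes, which is one line of attack on the termination quiz.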

Slide 74: Tracing examples
example 1 --
    E → Prefix ( E )
    E → V Tail
    Prefix → F | λ
    Tail → + E | λ
example 2 --
    S → a S e | B
    B → b B e | C
    C → c C e | d
example 3 --
    S → A B c
    A → a | λ
    B → b | λ
[Figure: each grammar annotated with numbered markers tracing the algorithms.]

Slide 75: From extended BNF to CFG
- extended BNF construct: { }
QUIZ: how, systematically?
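One standard answer is to replace each extended-BNF construct by a fresh nonterminal. The rules below are a sketch of that textbook transformation, not content recovered from the slide: N stands for a new nonterminal not used elsewhere in the grammar, and the optional form [ ] is included as an assumption, since only { } survives in the transcript.

```latex
\begin{align*}
A \to \alpha\,[\,\beta\,]\,\gamma
  &\quad\Longrightarrow\quad A \to \alpha\,N\,\gamma, \qquad N \to \beta \mid \lambda \\
A \to \alpha\,\{\,\beta\,\}\,\gamma
  &\quad\Longrightarrow\quad A \to \alpha\,N\,\gamma, \qquad N \to \beta\,N \mid \lambda
\end{align*}
```

Applying the rules repeatedly, innermost construct first, removes every bracket and brace and leaves an ordinary context-free grammar for the same language.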

Slide 76: Other types of grammars
regular grammar -- A → a B or C → λ
- QUIZ: how?
context-free grammar -- A → α
context-sensitive grammar -- αAβ → αγβ
type 0 grammar -- α → β
regular grammar: too simple, e.g., it cannot specify { [^i ]^i | i ≥ 1 }
- QUIZ: how to specify { [^i ]^i | i ≥ 1 } by a context-free grammar?
context-sensitive, type 0: without a sufficiently efficient parser
context-free grammar: a balance between generality and practicality
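As a hint for the last quiz, the language { [^i ]^i | i ≥ 1 } is generated by the context-free grammar S → [ S ] | [ ]. The recognizer below is a small sketch built directly on that grammar; the grammar itself and the names `parse_S` and `is_nested_brackets` are illustrative choices, not taken from the slides. The recursion depth records how many '[' are still unmatched, which is exactly the unbounded count a regular grammar cannot keep.

```c
#include <assert.h>
#include <stddef.h>

/* Recognise one S starting at *p, where  S -> [ S ] | [ ] .
 * On success, advance *p past the matched text and return 1. */
static int parse_S(const char **p)
{
    if (**p != '[')
        return 0;
    (*p)++;
    if (**p == '[' && !parse_S(p))  /* S -> [ S ]: chosen when another '[' follows */
        return 0;
    if (**p != ']')
        return 0;
    (*p)++;
    return 1;
}

/* Return 1 if text is i opening brackets followed by i closing ones, i >= 1. */
int is_nested_brackets(const char *text)
{
    assert(text != NULL);
    return parse_S(&text) && *text == '\0';
}
```

The function accepts "[[]]" and rejects "[][]", because the grammar nests brackets but never places two bracket groups side by side.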