1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016.

Presentation transcript:

1 Context Free Grammars Xiaoyin Wang CS 5363 Spring 2016

2 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm

3 Derivation Trees Illustrate the derivation of a certain sentence from a grammar Different derivation trees with different derivation orders

4 Derivation Order Consider the following example grammar with 5 productions:

5 Leftmost derivation order of string : At each step, we substitute the leftmost variable

6 Rightmost derivation order of string : At each step, we substitute the rightmost variable

7 Rightmost derivation of : Leftmost derivation of :

8 Derivation Trees Consider the same example grammar: And a derivation of :

9-13 Derivation Tree (parse tree), built up step by step [figure not preserved]. At each step, the yield is the string of leaf labels read from left to right.

14 Sometimes, derivation order doesn’t matter: the leftmost derivation and the rightmost derivation give the same derivation tree

15 Ambiguity A grammar can have multiple parse trees that derive a certain sentence Inherent ambiguity and non-inherent ambiguity

16 Grammar for mathematical expressions Example strings: Denotes any number

17 A leftmost derivation for

18 Another leftmost derivation for

19 Two derivation trees for

20 take

21 Good Tree / Bad Tree: compute the expression result using the tree

22 Two different derivation trees may cause problems in applications which use the derivation trees: Evaluating expressions In general, in compilers for programming languages

23 Ambiguous Grammar: A context-free grammar is ambiguous if there is a string which has two different derivation trees or, equivalently, two different leftmost derivations (two different derivation trees give two different leftmost derivations and vice versa)

24 Example: this grammar is ambiguous since the string has two derivation trees

25 This grammar is also ambiguous because the string has two leftmost derivations

26 Another ambiguous grammar: IF_STMT → if EXPR then STMT | if EXPR then STMT else STMT Variables: IF_STMT, EXPR, STMT Terminals: if, then, else Very common piece of grammar in programming languages

27 Two derivation trees for: if expr1 then if expr2 then stmt1 else stmt2 [figure not preserved]. In one tree the else branch (stmt2) belongs to the inner if (if expr2 then stmt1 else stmt2); in the other it belongs to the outer if (if expr1 then ... else stmt2).

28 In general, ambiguity is bad and we want to remove it Sometimes it is possible to find a non-ambiguous grammar for a language But, in general we cannot do so

29 A successful example: an ambiguous grammar and an equivalent non-ambiguous grammar (equivalent: generates the same language)

30 Unique derivation tree for

31 An unsuccessful example: this language is inherently ambiguous, that is, every grammar that generates it is ambiguous

32 Example (ambiguous) grammar for :

33 The string always has two different derivation trees (for any grammar) For example

34 Ambiguity: Summary A grammar can have multiple parse trees that derive a certain sentence Inherently ambiguous language –All grammars for it are ambiguous Non-inherently ambiguous language –There exists at least one grammar that is not ambiguous Checking ambiguity of a grammar or a language: undecidable problem

35 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm

36 Normal Forms Chomsky Normal Forms BNF Normal Forms

37 A context-free grammar is said to be in Chomsky Normal Form if all productions have one of the following forms: A → BC or A → α, where A, B and C are nonterminal symbols and α is a terminal symbol

38 There are three preliminary simplifications: 1. Eliminate useless symbols 2. Eliminate ε productions 3. Eliminate unit productions

39 Eliminate Useless Symbols We determine whether a symbol is useful by checking that it is both generating and reachable. X is generating if X ⇒* ω for some terminal string ω. X is reachable if there is a derivation S ⇒* αXβ for some α and β.

40 Example: Removing non-generating symbols. Initial CFG: S → AB | a; A → b. Identify generating symbols: S, A, a and b are generating; B is not. Remove non-generating: S → a; A → b

41 Example: Removing non-reachable symbols. Identify reachable symbols in S → a; A → b. Eliminate non-reachable: S → a

42 The order is important. Looking first for non-reachable symbols and then for non-generating symbols can still leave some useless symbols. Example: S → AB | a; A → b becomes S → a; A → b (A is no longer reachable, but it remains)

43 Finding generating symbols If there is a production A → α, and every symbol of α is already known to be generating, then A is generating. Example: S → AB | a; A → b. We cannot use S → AB because B has not been established to be generating

44 Finding reachable symbols S is surely reachable. All symbols in the body of a production with S in the head are reachable and, in turn, so are all symbols in the body of any production whose head is reachable. Example: S → AB | a; A → b. In this example the symbols {S, A, B, a, b} are reachable.
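
The two fixed-point computations above can be sketched in Python. This is a minimal illustration, not code from the lecture: the function names and the grammar encoding (a list of `(head, body)` pairs plus an explicit set of variables) are assumptions made for this example, and `reachable` tracks variables only, leaving terminals implicit.

```python
def generating(variables, productions):
    """Variables that derive some terminal string (fixed-point iteration)."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # Terminals always count as generating; variables must already be in gen.
            if head not in gen and all(s in gen or s not in variables for s in body):
                gen.add(head)
                changed = True
    return gen


def reachable(start, variables, productions):
    """Variables reachable from the start symbol (worklist search)."""
    seen, work = {start}, [start]
    while work:
        current = work.pop()
        for head, body in productions:
            if head == current:
                for s in body:
                    if s in variables and s not in seen:
                        seen.add(s)
                        work.append(s)
    return seen


def remove_useless(start, variables, productions):
    """Drop non-generating symbols first, then non-reachable ones."""
    gen = generating(variables, productions)
    kept = [(h, b) for h, b in productions
            if h in gen and all(s in gen or s not in variables for s in b)]
    reach = reachable(start, variables, kept)
    return [(h, b) for h, b in kept if h in reach]


# Example grammar from the slides: S -> AB | a, A -> b (B has no productions).
VARIABLES = {"S", "A", "B"}
PRODUCTIONS = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(remove_useless("S", VARIABLES, PRODUCTIONS))  # [('S', ('a',))]
```

Running the two passes in this order matters: removing S → AB (B is non-generating) is what makes A unreachable.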

45 There are three preliminary simplifications: 1. Eliminate useless symbols 2. Eliminate ε productions 3. Eliminate unit productions

46 Eliminate ε Productions In a grammar, ε productions are convenient but not essential. If L has a CFG, then L – {ε} has a CFG without ε productions. A variable A is nullable if A ⇒* ε

47 If A is a nullable variable: whenever A appears in the body of a production, A might or might not derive ε. Example: S → ASA | aB; A → B | S; B → b | ε. Nullable: {A, B}

48 Eliminate ε Productions Create two versions of the production, one with the nullable variable and one without it; then eliminate productions with ε bodies. Before: S → ASA | aB; A → B | S; B → b | ε. After: S → ASA | aB | AS | SA | S | a; A → B | S; B → b
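
A minimal Python sketch of this step, under the assumption that a grammar is encoded as a list of `(head, body)` pairs with the empty tuple standing for ε. The function names are illustrative only, not taken from the lecture.

```python
from itertools import product


def nullable_variables(productions):
    """Variables A with A =>* epsilon (fixed-point iteration)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # An empty body, or a body made only of nullable symbols, makes head nullable.
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    """Expand each production over its nullable symbols, then drop empty bodies."""
    nullable = nullable_variables(productions)
    result = set()
    for head, body in productions:
        # Each nullable symbol is either kept or dropped; other symbols are always kept.
        choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for combo in product(*choices):
            new_body = tuple(sym for part in combo for sym in part)
            if new_body:
                result.add((head, new_body))
    return result


# Example grammar from the slides: S -> ASA | aB, A -> B | S, B -> b | epsilon.
GRAMMAR = [
    ("S", ("A", "S", "A")), ("S", ("a", "B")),
    ("A", ("B",)), ("A", ("S",)),
    ("B", ("b",)), ("B", ()),
]
for head, body in sorted(eliminate_epsilon(GRAMMAR)):
    print(head, "->", " ".join(body))
```

The result still contains unit productions such as S → S and A → B; removing those is the next simplification.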

51 There are three preliminary simplifications: 1. Eliminate useless symbols 2. Eliminate ε productions 3. Eliminate unit productions

52 Eliminate unit productions A unit production is one of the form A → B where both A and B are variables. Identify unit pairs (A, B), where A ⇒* B. If A ⇒* B and B → ω, then add A → ω

53 Example:
Grammar, with terminals T = {*, +, (, ), a, b, 0, 1}:
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T * F
E → T | E + T
Basis: (A, A) is a unit pair for any variable A, since A ⇒* A in 0 steps.
Pairs     Productions
(E, E)    E → E + T
(E, T)    E → T * F
(E, F)    E → (E)
(E, I)    E → a | b | Ia | Ib | I0 | I1
(T, T)    T → T * F
(T, F)    T → (E)
(T, I)    T → a | b | Ia | Ib | I0 | I1
(F, F)    F → (E)
(F, I)    F → a | b | Ia | Ib | I0 | I1
(I, I)    I → a | b | Ia | Ib | I0 | I1

54 Example:
Pairs     Productions
…
(T, T)    T → T * F
(T, F)    T → (E)
(T, I)    T → a | b | Ia | Ib | I0 | I1
…
Resulting grammar:
I → a | b | Ia | Ib | I0 | I1
E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
T → T * F | (E) | a | b | Ia | Ib | I0 | I1
F → (E) | a | b | Ia | Ib | I0 | I1
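
The unit-pair construction can be sketched as follows. This is an illustrative Python version only: the `rule` helper, the function names and the one-character-per-symbol encoding are assumptions of this example.

```python
def rule(head, text):
    """Hypothetical helper: every character of text is one grammar symbol."""
    return head, tuple(text)


def unit_pairs(variables, productions):
    """All pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in variables}  # basis: A =>* A in 0 steps
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                is_unit = len(body) == 1 and body[0] in variables
                if head == b and is_unit and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))
                    changed = True
    return pairs


def eliminate_unit(variables, productions):
    """For each unit pair (A, B), give A every non-unit body of B."""
    result = set()
    for a, b in unit_pairs(variables, productions):
        for head, body in productions:
            is_unit = len(body) == 1 and body[0] in variables
            if head == b and not is_unit:
                result.add((a, body))
    return result


# Expression grammar from the slides.
VARIABLES = {"E", "T", "F", "I"}
PRODUCTIONS = (
    [rule("I", b) for b in ["a", "b", "Ia", "Ib", "I0", "I1"]]
    + [rule("F", "I"), rule("F", "(E)")]
    + [rule("T", "F"), rule("T", "T*F")]
    + [rule("E", "T"), rule("E", "E+T")]
)
PAIRS = unit_pairs(VARIABLES, PRODUCTIONS)
UNIT_FREE = eliminate_unit(VARIABLES, PRODUCTIONS)
for head in "ETFI":
    bodies = sorted("".join(b) for h, b in UNIT_FREE if h == head)
    print(head, "->", " | ".join(bodies))
```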

55 Chomsky Normal Form (CNF) Starting with a CFG with the preliminary simplifications performed: 1. Arrange that all bodies of length 2 or more consist only of variables. 2. Break bodies of length 3 or more into a cascade of productions, each with a body consisting of two variables.

56 Step 1: For every terminal α that appears in a body of length 2 or more, create a new variable that has only one production.
Before:
E → E + T | T * F | (E) | a | b | Ia | Ib | I0 | I1
T → T * F | (E) | a | b | Ia | Ib | I0 | I1
F → (E) | a | b | Ia | Ib | I0 | I1
I → a | b | Ia | Ib | I0 | I1
After:
E → EPT | TMF | LER | a | b | IA | IB | IZ | IO
T → TMF | LER | a | b | IA | IB | IZ | IO
F → LER | a | b | IA | IB | IZ | IO
I → a | b | IA | IB | IZ | IO
A → a   B → b   Z → 0   O → 1
P → +   M → *   L → (   R → )

57 Step 2: Break bodies of length 3 or more, adding more variables. The bodies EPT, TMF and LER are replaced using the new variables C1 → PT, C2 → MF and C3 → ER:
E → EC1 | TC2 | LC3 | a | b | IA | IB | IZ | IO
T → TC2 | LC3 | a | b | IA | IB | IZ | IO
F → LC3 | a | b | IA | IB | IZ | IO
I → a | b | IA | IB | IZ | IO
A → a   B → b   Z → 0   O → 1
P → +   M → *   L → (   R → )
C1 → PT   C2 → MF   C3 → ER
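
Both steps can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the lecture's own procedure: the generated names (`<+>` for the variable standing for terminal +, and `C1`, `C2`, ... for cascade variables) are invented here and are assumed not to clash with existing symbols, and the input is assumed to have no ε or unit productions.

```python
def to_cnf(variables, productions):
    """Convert a grammar without epsilon/unit productions to CNF."""
    term_var = {}  # terminal -> new variable standing for it
    tail_var = {}  # body tail -> cascade variable deriving that tail
    out = []

    def lift(symbol):
        # Step 1: inside long bodies, replace a terminal by its own variable.
        if symbol in variables:
            return symbol
        if symbol not in term_var:
            term_var[symbol] = "<" + symbol + ">"
            out.append((term_var[symbol], (symbol,)))
        return term_var[symbol]

    def emit(head, body):
        # Step 2: body holds only variables; split it until length is 2.
        if len(body) == 2:
            out.append((head, body))
            return
        tail = body[1:]
        if tail not in tail_var:
            tail_var[tail] = "C" + str(len(tail_var) + 1)
            emit(tail_var[tail], tail)
        out.append((head, (body[0], tail_var[tail])))

    for head, body in productions:
        if len(body) == 1:
            out.append((head, body))  # A -> a is already in CNF
        else:
            emit(head, tuple(lift(s) for s in body))
    return out


# Unit-free expression grammar from the slides, one character per symbol.
I_BODIES = ["a", "b", "Ia", "Ib", "I0", "I1"]
GRAMMAR = (
    [("E", tuple(b)) for b in ["E+T", "T*F", "(E)"] + I_BODIES]
    + [("T", tuple(b)) for b in ["T*F", "(E)"] + I_BODIES]
    + [("F", tuple(b)) for b in ["(E)"] + I_BODIES]
    + [("I", tuple(b)) for b in I_BODIES]
)
CNF = to_cnf({"E", "T", "F", "I"}, GRAMMAR)
for head, body in CNF:
    print(head, "->", " ".join(body))
```

As on the slide, cascade variables are shared: the tail of T * F gets a single variable that is reused by both E and T.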

58 Normal Forms Chomsky Normal Forms BNF Normal Forms

59 BNF BNF stands for either Backus-Naur Form or Backus Normal Form BNF is used to describe the grammar of a programming language BNF is formal and precise –BNF is a notation for context-free grammars BNF is essential in compiler construction

60 BNF < > indicate a nonterminal that needs to be further expanded Symbols not enclosed in < > are terminals; they represent themselves, e.g. if, while, ( The symbol ::= means is defined as The symbol | means or; it separates alternatives, e.g. ::= + | - This is all there is to “plain” BNF; but we will discuss extended BNF (EBNF) later in this lecture

61 BNF uses recursion ::= | or ::= | Recursion is all that is needed (at least, in a formal sense) "Extended BNF" allows repetition as well as recursion Repetition is usually better when using BNF to construct a compiler

62 BNF Examples I ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ::= if ( ) | if ( ) else

63 BNF Examples II ::= | ::= | + | -

64 BNF Examples III ::= | | ::= { } ::= |

65 BNF Examples IV ::= | |...

66 Extended BNF The following are pretty standard: –[ ] enclose an optional part of the rule Example: ::= if ( ) [ else ] –{ } mean the enclosed can be repeated any number of times (including zero) Example: ::= ( ) | ( {, } )

67 Variations The preceding notation is the original and most common notation –BNF was designed before we had boldface, color, more than one font, etc. –A typical modern variation might: – Use boldface to indicate multi-character terminals –Quote single-character terminals (because boldface isn’t so obvious in this case) Example: –if_statement ::= if "(" condition ")" statement [ else statement ]

68 Limitations of BNF No easy way to impose length limitations, such as maximum length of variable names No easy way to describe ranges, such as 1 to 31 No way at all to impose distributed requirements, such as, a variable must be declared before it is used Describes only syntax, not semantics

69 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm

70 The CYK Algorithm The membership problem: –Problem: Given a context-free grammar G and a string w –G = (V, ∑,P, S) where »V finite set of variables »∑ (the alphabet) finite set of terminal symbols »P finite set of rules »S start symbol (distinguished element of V) »V and ∑ are assumed to be disjoint –G is used to generate the string of a language –Question: Is w in L(G)?

71 The CYK Algorithm J. Cocke, D. Younger, T. Kasami –Independently developed an algorithm to answer this question.

72 The CYK Algorithm Basics –The Structure of the rules in a Chomsky Normal Form grammar –Uses a “dynamic programming” or “table-filling algorithm”

73 Chomsky Normal Form Normal Form is described by a set of conditions that each rule in the grammar must satisfy. A context-free grammar is in CNF if each rule has one of the following forms: –A → BC (at most 2 symbols on right side) –A → a (terminal symbol), or –S → ε (null string) where B, C ∈ V – {S}

74 Construct a Triangular Table Each row corresponds to one length of substrings –Bottom Row – Strings of length 1 –Second from Bottom Row – Strings of length 2. –Top Row – string ‘w’

75 Construct a Triangular Table X_{i,i} is the set of variables A such that A → w_i is a production of G. To compute X_{i,j}, compare at most n pairs of previously computed sets: (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}), …, (X_{i,j-1}, X_{j,j})

76 Construct a Triangular Table
X_{1,5}
X_{1,4}  X_{2,5}
X_{1,3}  X_{2,4}  X_{3,5}
X_{1,2}  X_{2,3}  X_{3,4}  X_{4,5}
X_{1,1}  X_{2,2}  X_{3,3}  X_{4,4}  X_{5,5}
w_1      w_2      w_3      w_4      w_5
Table for string ‘w’ that has length 5

77 Construct a Triangular Table: looking for pairs to compare
X_{1,5}
X_{1,4}  X_{2,5}
X_{1,3}  X_{2,4}  X_{3,5}
X_{1,2}  X_{2,3}  X_{3,4}  X_{4,5}
X_{1,1}  X_{2,2}  X_{3,3}  X_{4,4}  X_{5,5}
w_1      w_2      w_3      w_4      w_5

78 Example CYK Algorithm Show the CYK Algorithm with the following example: –CNF grammar G: S → AB | BC; A → BA | a; B → CC | b; C → AB | a –w is baaba –Question: Is baaba in L(G)?

79 Constructing The Triangular Table: calculating the bottom row
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

80 Constructing The Triangular Table
X_{1,2}: pair (X_{i,i}, X_{i+1,j}) = (X_{1,1}, X_{2,2}) → {B}{A, C} = {BA, BC}
Steps: –Look for production rules to generate BA or BC –There are two: S and A –X_{1,2} = {S, A}
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

81 Constructing The Triangular Table
{S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

82 Constructing The Triangular Table
X_{2,3}: pair (X_{i,i}, X_{i+1,j}) = (X_{2,2}, X_{3,3}) → {A, C}{A, C} = {AA, AC, CA, CC} = Y
Steps: –Look for production rules to generate Y –There is one: B –X_{2,3} = {B}
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

83 Constructing The Triangular Table
{S, A}  {B}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

84 Constructing The Triangular Table
X_{3,4}: pair (X_{i,i}, X_{i+1,j}) = (X_{3,3}, X_{4,4}) → {A, C}{B} = {AB, CB} = Y
Steps: –Look for production rules to generate Y –There are two: S and C –X_{3,4} = {S, C}
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

85 Constructing The Triangular Table
{S, A}  {B}  {S, C}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

86 Constructing The Triangular Table
X_{4,5}: pair (X_{i,i}, X_{i+1,j}) = (X_{4,4}, X_{5,5}) → {B}{A, C} = {BA, BC} = Y
Steps: –Look for production rules to generate Y –There are two: S and A –X_{4,5} = {S, A}
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

87 Constructing The Triangular Table
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

88 Constructing The Triangular Table
X_{1,3}: pairs (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}) = (X_{1,1}, X_{2,3}), (X_{1,2}, X_{3,3}) → {B}{B} ∪ {S, A}{A, C} = {BB, SA, SC, AA, AC} = Y
Steps: –Look for production rules to generate Y –There are none –X_{1,3} = Ø (the empty set)
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

89 Constructing The Triangular Table
Ø
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

90 Constructing The Triangular Table
X_{2,4}: pairs (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}) = (X_{2,2}, X_{3,4}), (X_{2,3}, X_{4,4}) → {A, C}{S, C} ∪ {B}{B} = {AS, AC, CS, CC, BB} = Y
Steps: –Look for production rules to generate Y –There is one: B –X_{2,4} = {B}
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

91 Constructing The Triangular Table
Ø  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

92 Constructing The Triangular Table
X_{3,5}: pairs (X_{i,i}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}) = (X_{3,3}, X_{4,5}), (X_{3,4}, X_{5,5}) → {A, C}{S, A} ∪ {S, C}{A, C} = {AS, AA, CS, CA, SA, SC, CC} = Y
Steps: –Look for production rules to generate Y –There is one: B –X_{3,5} = {B}
Grammar: S → AB | BC; A → BA | a; B → CC | b; C → AB | a

93 Constructing The Triangular Table
Ø  {B}  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a

94 Final Triangular Table
{S, A, C}   ← X_{1,5}
Ø  {S, A, C}
Ø  {B}  {B}
{S, A}  {B}  {S, C}  {S, A}
{B}  {A, C}  {A, C}  {B}  {A, C}
b    a       a       b    a
- Table for string ‘w’ that has length 5
- The algorithm populates the triangular table

95 Example (Result) Is baaba in L(G)? Yes. S is in the set X_{1,n}, where n = 5: the table cell X_{1,5} = {S, A, C}, and since S ∈ X_{1,5}, baaba ∈ L(G)

96 Theorem The CYK Algorithm correctly computes X_{i,j} for all i and j; thus w is in L(G) if and only if S is in X_{1,n}. The running time of the algorithm is O(n^3).
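
The table-filling procedure can be sketched compactly in Python. This is an illustrative version only: the names `cyk_table`, `accepts`, `UNARY` and `BINARY` are assumptions of this sketch, and the CNF grammar is assumed to be split into rules A → a and rules A → BC, with no S → ε rule.

```python
def cyk_table(word, unary, binary):
    """Fill the triangular table; table[i, j] is the set X_{i,j} (1-based)."""
    n = len(word)
    table = {}
    # Bottom row: X_{i,i} holds every A with a rule A -> w_i.
    for i in range(1, n + 1):
        table[i, i] = {a for a, t in unary if t == word[i - 1]}
    # Longer substrings, shortest first, so every needed cell already exists.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split w_i..w_j into w_i..w_k and w_{k+1}..w_j
                for a, (b, c) in binary:
                    if b in table[i, k] and c in table[k + 1, j]:
                        cell.add(a)
            table[i, j] = cell
    return table


def accepts(word, start, unary, binary):
    """w is in L(G) iff the start symbol is in X_{1,n} (empty word rejected)."""
    if not word:
        return False
    return start in cyk_table(word, unary, binary)[1, len(word)]


# Grammar from the example: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a.
UNARY = [("A", "a"), ("B", "b"), ("C", "a")]
BINARY = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")),
    ("B", ("C", "C")),
    ("C", ("A", "B")),
]
TABLE = cyk_table("baaba", UNARY, BINARY)
print(sorted(TABLE[1, 5]))                   # ['A', 'C', 'S']
print(accepts("baaba", "S", UNARY, BINARY))  # True
```

The three nested loops over length, start position and split point are where the O(n^3) bound comes from, with the grammar size treated as a constant.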

97 Today’s Class Derivation Trees Ambiguity Normal Forms CYK Algorithm