Transforming Context-Free Grammars to Chomsky Normal Form. Roger L. Costello, April 12, 2014.



Objective This mini-tutorial will answer these questions:
1. What is Chomsky Normal Form?
2. Why is Chomsky Normal Form useful/relevant?
3. How can arbitrary context-free grammars be converted to Chomsky Normal Form?
4. Can we determine a priori how many steps it will take for a grammar to generate a string?
5. Is there a procedure for determining if a string is in the set of strings generated by a grammar?

But first, binary trees Before defining Chomsky Normal Form, let’s talk a bit about binary trees. Each node in a binary tree has zero, one, or two children.

Sample binary tree [Figure: a binary tree with the nodes S, A, a, B, C, c, D, d. Successive slides highlight a node with two children, a node with one child, and a node with no children.]

Well studied Binary trees have been well studied, and a great deal is known about them.

Specialized binary trees There are specialized binary trees. One such specialized binary tree requires each node to have either zero or two children (no nodes with one child).

Sample specialized binary tree [Figure: a binary tree with the nodes S, A, B, C, D.] Each node has either zero children or two children.

Full binary tree Definition: A full binary tree is a binary tree in which each node has exactly zero or two children.

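Number of nodes in a full binary tree A full binary tree with n leaves has 2n − 1 nodes.

Calculate number of nodes in this full binary tree [Figure: a full binary tree with the nodes S, A, B, C, D.] The tree has 3 leaves, so it has 2 × 3 − 1 = 5 nodes.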
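The node count can be checked mechanically. The following is a minimal Python sketch; the tuple representation of a tree and the exact shape of the S, A, B, C, D tree (S with children A and B, B with children C and D) are assumptions made for illustration, since the slide's drawing is not preserved in the transcript.

```python
# A tree is a (label, children) pair; a leaf has an empty list of children.
# Hypothetical shape of the sample tree: S -> A, B and B -> C, D.
sample = ("S", [("A", []), ("B", [("C", []), ("D", [])])])


def count_nodes(tree):
    """Count every node in the tree, including the root."""
    _label, children = tree
    return 1 + sum(count_nodes(child) for child in children)


def count_leaves(tree):
    """Count the nodes that have no children."""
    _label, children = tree
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


def is_full_binary(tree):
    """True if every node has exactly zero or two children."""
    _label, children = tree
    return len(children) in (0, 2) and all(is_full_binary(c) for c in children)


if __name__ == "__main__":
    leaves = count_leaves(sample)
    print(is_full_binary(sample), leaves, count_nodes(sample), 2 * leaves - 1)
```

For any tree that passes `is_full_binary`, `count_nodes(tree)` should equal `2 * count_leaves(tree) - 1`.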

Context-free grammar Here is a context-free grammar:
  S → AaBb
  A → aB
  B → b
Don’t know what a context-free grammar is? Check out my tutorial:

Production tree [Figure: the production tree for the grammar S → AaBb, A → aB, B → b. S has the children A, a, B, b; A has the children a and B; each B has the single child b.]

Number of child nodes In this tree, S has 4 child nodes, A has 2 child nodes, each B has 1 child node, and each terminal has 0 child nodes. So the nodes have 0, 1, 2, or 4 child nodes.

Terminology: arity Arity is the maximum number of child nodes that a node in the tree may have. The arity of the tree on the previous slide is 4. By contrast, the arity of a binary tree is 2.

Not well-studied Whereas binary trees are well studied, trees of arbitrary arity are not. For trees of arbitrary arity it is hard to find nice, neat results.

Another context-free grammar Here is a context-free grammar:
  S → AB
  A → a
  B → b

Here is its production tree [Figure: S has the children A and B; A has the child a; B has the child b.] The production tree is a binary tree.

Arbitrary context-free grammars versus restricted context-free grammars Arbitrary context-free grammars can yield production trees that are not binary. Grammars whose rules have no more than 2 symbols on the right-hand side have production trees that are binary trees.

Benefit of restricted grammar rules Restricting each rule to no more than 2 symbols on the right-hand side has a benefit: the production trees are binary trees, which are well studied, so many useful research results can be applied to them.

Let’s recap what we’ve learned
- Binary trees consist of nodes that have 0, 1, or 2 child nodes.
- Binary trees are well studied.
- Context-free grammars with rules that have at most 2 symbols on the right-hand side yield production trees that are binary trees.
- Arbitrary context-free grammars can have production trees that are not binary trees.
- Non-binary trees are not so well studied.

Chomsky Normal Form A context-free grammar is in Chomsky Normal Form if each rule has one of these forms:
1. X → a
2. X → YZ
That is, the right-hand side is either a single terminal or two non-terminals.
Convention: uppercase letters denote non-terminal symbols and lowercase letters denote terminal symbols.
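The definition translates directly into a check. Below is a minimal Python sketch, assuming a grammar is represented as a list of `(lhs, rhs)` pairs where `rhs` is a tuple of symbols, and assuming that a symbol counts as a non-terminal exactly when it appears on some left-hand side. The function name `is_cnf` and this representation are illustrative choices, not part of the original slides.

```python
def is_cnf(rules):
    """Return True if every rule is X -> a or X -> Y Z."""
    nonterminals = {lhs for lhs, _rhs in rules}
    for _lhs, rhs in rules:
        single_terminal = len(rhs) == 1 and rhs[0] not in nonterminals
        two_nonterminals = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        if not (single_terminal or two_nonterminals):
            return False
    return True


if __name__ == "__main__":
    binary = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
    mixed = [("S", ("A", "a", "B", "b")), ("A", ("a", "B")), ("B", ("b",))]
    print(is_cnf(binary))  # every rule has one of the two allowed forms
    print(is_cnf(mixed))   # S -> AaBb has four symbols on the right-hand side
```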

Objective This mini-tutorial will answer these questions:
1. What is Chomsky Normal Form? A context-free grammar is in Chomsky Normal Form if each rule has one of these forms: X → a or X → YZ.
2. Why is Chomsky Normal Form useful/relevant? The production trees for grammars in Chomsky Normal Form are binary trees. Binary trees are well studied. The results from research on binary trees can be applied to grammars in Chomsky Normal Form.

ε-rules, ε-free A grammar rule that has an empty right-hand side, e.g., A → ε, is called an ε-rule. Read that rule as: A may be replaced by the empty string (which we denote by ε). A grammar that contains no such rules is called ε-free.

Transform any context-free grammar to Chomsky Normal Form For every ε-free context-free grammar there is an equivalent grammar in Chomsky Normal Form. [Figure: a context-free grammar is transformed into a grammar in Chomsky Normal Form.]

Example of a grammar that is transformed to Chomsky Normal Form
Original grammar:
  S → A a B b
  A → a B
  B → b
Equivalent grammar in Chomsky Normal Form:
  S → A X1
  A → A1 B
  B → b
  A1 → a
  B1 → b
  X1 → A1 X2
  X2 → B B1

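3-step process The following slides show a 3-step process for transforming any ε-free context-free grammar into an equivalent grammar in Chomsky Normal Form.

Step 1: replace terminals mixed in with non-terminals Wherever a terminal appears on a right-hand side together with other symbols, replace it by a new non-terminal and add a rule that maps the new non-terminal to the terminal. For example, Q → a P becomes:
  Q → A1 P
  A1 → a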

Example
Before Step 1:
  S → A B
  A → a C a
  A → a
  B → b B
  B → b
  C → D
  D → d
After Step 1:
  S → A B
  A → A1 C A1
  A → a
  B → B1 B
  B → b
  C → D
  D → d
  A1 → a
  B1 → b
Replace the right-hand side a C a by A1 C A1 and add a new rule A1 → a. Replace the right-hand side b B by B1 B and add a new rule B1 → b.
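Step 1 can be sketched in Python as follows. This is an illustrative sketch, not the author's implementation: it assumes the `(lhs, rhs)` tuple representation, treats a symbol as a non-terminal when it appears on a left-hand side, and names the new non-terminal for a terminal `a` as `A1` (upper-cased terminal plus `1`), on the assumption that this name is not already used in the grammar.

```python
def replace_mixed_terminals(rules):
    """Step 1: in right-hand sides of length >= 2, swap each terminal for a
    fresh non-terminal and add a rule mapping that non-terminal to the terminal."""
    nonterminals = {lhs for lhs, _rhs in rules}
    fresh = {}  # terminal -> new non-terminal (assumed not to clash with existing names)
    out = []
    for lhs, rhs in rules:
        if len(rhs) >= 2:
            new_rhs = []
            for symbol in rhs:
                if symbol in nonterminals:
                    new_rhs.append(symbol)
                else:
                    new_rhs.append(fresh.setdefault(symbol, symbol.upper() + "1"))
            out.append((lhs, tuple(new_rhs)))
        else:
            out.append((lhs, rhs))  # X -> a and X -> Y are left alone in this step
    out.extend((nt, (terminal,)) for terminal, nt in fresh.items())
    return out


if __name__ == "__main__":
    grammar = [
        ("S", ("A", "B")), ("A", ("a", "C", "a")), ("A", ("a",)),
        ("B", ("b", "B")), ("B", ("b",)), ("C", ("D",)), ("D", ("d",)),
    ]
    for lhs, rhs in replace_mixed_terminals(grammar):
        print(lhs, "→", " ".join(rhs))
```

Run on the example grammar above, the sketch rewrites A → a C a and B → b B and appends the rules for A1 and B1.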

Step 2: convert sequence of non-terminals to pairs of non-terminals For every rule with a right-hand side that contains 3 or more non-terminals, replace all non-terminals but the first by a new non-terminal Xi, and then add a new rule where Xi has as its right-hand side those non-terminals that were replaced by Xi. Repeatedly apply Step 2 until there are no rules with more than two non-terminals on the right-hand side. For example, Q → A B C D E becomes:
  Q → A X1
  X1 → B X2
  X2 → C X3
  X3 → D E

Repeatedly apply step 2
Start:
  Q → A B C D E
After the first application:
  Q → A X1
  X1 → B C D E
After the second application:
  Q → A X1
  X1 → B X2
  X2 → C D E
After the third application:
  Q → A X1
  X1 → B X2
  X2 → C X3
  X3 → D E

Applying step 2 to a grammar
Before Step 2:
  S → A B
  A → A1 C A1
  A → a
  B → B1 B
  B → b
  C → D
  D → d
  A1 → a
  B1 → b
After Step 2:
  S → A B
  A → A1 X1
  A → a
  B → B1 B
  B → b
  C → D
  D → d
  A1 → a
  B1 → b
  X1 → C A1
Replace the right-hand side A1 C A1 by A1 X1 and add a new rule X1 → C A1.
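A compact way to express Step 2 is a loop that peels off the first symbol of a long right-hand side. The Python sketch below is illustrative only: it assumes the `(lhs, rhs)` tuple representation, assumes Step 1 has already been applied (so long right-hand sides contain only non-terminals), and assumes the generated names `X1`, `X2`, ... are not already in use.

```python
def binarize(rules):
    """Step 2: split every right-hand side longer than 2 into a chain of pairs."""
    out = []
    counter = 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"  # hypothetical fresh name, assumed unused
            out.append((lhs, (rhs[0], fresh)))
            lhs, rhs = fresh, rhs[1:]  # continue with the remaining symbols
        out.append((lhs, rhs))
    return out


if __name__ == "__main__":
    for lhs, rhs in binarize([("Q", ("A", "B", "C", "D", "E"))]):
        print(lhs, "→", " ".join(rhs))
```

Unlike the slides, this sketch emits each new Xi rule immediately after the rule that introduced it rather than at the end of the grammar; the set of rules is the same.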

3 kinds of rules remain After performing steps 1 and 2, the resulting grammar has three kinds of rules:
1) X → a
2) X → Y
3) X → YZ
The grammar after Step 2 is:
  S → A B
  A → A1 X1
  A → a
  B → B1 B
  B → b
  C → D
  D → d
  A1 → a
  B1 → b
  X1 → C A1

Rules of the form: X → a These are A → a, B → b, D → d, A1 → a, and B1 → b.

Rules of the form: X → Y There is one: C → D.

Rules of the form: X → YZ These are S → A B, A → A1 X1, B → B1 B, and X1 → C A1.

Chain rules Rules with the form X → Y are called chain rules.

Chain rules aren’t in Chomsky Normal Form Recall the definition of Chomsky Normal Form: A context-free grammar is in Chomsky Normal Form if each rule has one of these forms:
1. X → a
2. X → YZ
Chain rules are of this form: X → Y. Clearly that is not Chomsky Normal Form. So we must transform chain rules into the desired form.

Step 3: remove chain rules Consider this chain rule: X → Y. From the previous few slides we know that each rule for Y must have one of these forms:
1. Y → a
2. Y → Z
3. Y → AB
If there is a rule Y → a, then replace X → Y by X → a.
If there is a rule Y → AB, then replace X → Y by X → AB.
If there is a rule Y → Z, then replace X → Y by the result of replacing Z (recursive definition, cool!).

Example
Before Step 3:
  S → A B
  A → A1 X1
  A → a
  B → B1 B
  B → b
  C → D
  D → d
  A1 → a
  B1 → b
  X1 → C A1
After Step 3 (Chomsky Normal Form):
  S → A B
  A → A1 X1
  A → a
  B → B1 B
  B → b
  C → d
  D → d
  A1 → a
  B1 → b
  X1 → C A1
There is one chain rule: C → D. D is defined by this rule: D → d. So, replace the chain rule with: C → d.

Another example
Before Step 3:
  S → A
  A → B
  B → b
After Step 3:
  S → b
  A → b
  B → b
This is a chain rule: S → A. A is defined by this chain rule: A → B. B is defined by this rule: B → b. So, replace the first chain rule with S → b, and replace the second chain rule with A → b.

Multiple rules may be generated Consider this rule: X → Y. The rule for Y may have alternatives: Y → a | Z | AB. So the rule for X must be replaced by:
  X → a
  X → AB
plus the rule(s) generated by replacing Z.
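The recursive replacement in Step 3 can be sketched in Python. This is an illustrative sketch under stated assumptions: the `(lhs, rhs)` tuple representation, a symbol being a non-terminal when it appears on a left-hand side, and a `seen` set (an addition not discussed in the slides) that guards against cycles such as X → Y, Y → X.

```python
def remove_chain_rules(rules):
    """Step 3: replace each chain rule X -> Y by the non-chain rules reachable from Y."""
    nonterminals = {lhs for lhs, _rhs in rules}

    def is_chain(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    def expand(symbol, seen):
        """Collect the non-chain right-hand sides of symbol, following chain rules."""
        results = []
        for lhs, rhs in rules:
            if lhs != symbol:
                continue
            if is_chain(rhs):
                if rhs[0] not in seen:  # skip symbols already visited (cycle guard)
                    results.extend(expand(rhs[0], seen | {rhs[0]}))
            else:
                results.append(rhs)
        return results

    out = []
    for lhs in dict.fromkeys(l for l, _rhs in rules):  # left-hand sides, first-seen order
        for rhs in expand(lhs, {lhs}):
            if (lhs, rhs) not in out:
                out.append((lhs, rhs))
    return out


if __name__ == "__main__":
    for lhs, rhs in remove_chain_rules([("S", ("A",)), ("A", ("B",)), ("B", ("b",))]):
        print(lhs, "→", " ".join(rhs))
```

The sketch keeps rules such as D → d even when nothing refers to D any more; removing unreachable rules is a separate clean-up that this tutorial does not cover.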

Recap Using the 3-step process we can transform any ε-free context-free grammar into an equivalent grammar in Chomsky Normal Form.

Grammars in Chomsky Normal Form produce binary trees Each production tree that is created from a grammar in Chomsky Normal Form is a binary tree. As we’ve discussed, lots is known about binary trees.

Objective This mini-tutorial will answer these questions:
1. What is Chomsky Normal Form? A context-free grammar is in Chomsky Normal Form if each rule has one of these forms: X → a or X → YZ.
2. Why is Chomsky Normal Form useful/relevant? The production trees for grammars in Chomsky Normal Form are binary trees. Binary trees are well studied. The results from research on binary trees can be applied to grammars in Chomsky Normal Form.
3. How can arbitrary context-free grammars be converted to Chomsky Normal Form? Use the 3-step process described in the previous slides.

Grammars generate languages A grammar generates a set of strings: string-1, string-2, …, string-n. The set of strings is called a language.

This grammar generates a^n b^n Here is a grammar in Chomsky Normal Form:
  S → A X
  S → A B
  A → a
  B → b
  X → S B
It generates ab, aabb, aaabbb, and so on. Each string consists of as followed by an equal number of bs.

[Figure: the production tree for aabb. S has the children A and X; A has the child a; X has the children S and B; the inner S has the children A and B; each A has the child a and each B has the child b.] Notice that the production tree is a binary tree.

Chomsky Normal Form enables powerful results Interesting questions about grammars can be answered when the grammars are in Chomsky Normal Form.

Interesting Question: Is a string a member of the language? Given a grammar G (in Chomsky Normal Form) and a string P, is P a member of the language generated by G? The answer is yes or no.

Is aabb a member of a^n b^n? Let G be the grammar S → A X, S → A B, A → a, B → b, X → S B. Is aabb a member of the language generated by G?

Is abb a member of a^n b^n? With the same grammar G, is abb a member of the language generated by G?

Another interesting question: Number of production steps needed? Given a grammar G (in Chomsky Normal Form) and a string P, how many steps are needed to generate P?

We will answer both questions But we will answer the latter question first: How many steps are needed to produce string P?

[Figure: the production tree for aabb under the grammar S → A X, S → A B, A → a, B → b, X → S B.]

S → AX → aX → aSB → aABB → aaBB → aabB → aabb
7 steps are needed to generate aabb.

Calculate the number of steps based on string length The following slides show how to calculate the number of production steps needed to generate a string. The calculation will be based on the length of the string.

Notation for “length of a string” The length of a string P is written |P|. For example, |aabb| = 4.

Generating 1 symbol takes 1 step The grammar S → a generates a in one step: S → a.

Generating 2 symbols takes 3 steps The grammar S → A B, A → a, B → b generates ab in three steps: S → AB → aB → ab.

One grammar The grammar S → A B, A → a, B → b generates only one string, ab, which has two symbols. How about the grammar S → X, X → A B, A → a, B → b? It also generates only ab, and it takes 4 steps rather than 3. True, but it is not in Chomsky Normal Form. Namely, the first rule (a chain rule) is not in Chomsky Normal Form.

Generating 3 symbols takes 5 steps The grammar S → A X, S → A B, A → a, B → b, X → A B generates aab in five steps: S → AX → aX → aAB → aaB → aab.

Generating 4 symbols takes 7 steps The grammar S → A X, S → A B, A → a, B → b, X → S B generates aabb in seven steps: S → AX → aX → aSB → aABB → aaBB → aabB → aabb.

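Every non-terminal node has one of these forms In a production tree for a grammar in Chomsky Normal Form, a non-terminal node either has a single terminal child (from a rule A → a) or has two non-terminal children (from a rule A → BC).

Remove the terminal symbols [Figure: the production tree for aabb with its terminal leaves removed, leaving the nodes S, A, X, S, A, B, B.]

The result is a full binary tree Every remaining node has either zero or two children.

Recall this: Number of nodes in a full binary tree A full binary tree with n leaves has 2n − 1 nodes.

Number of nodes The tree for aabb has 4 leaves (A, A, B, B), one for each symbol of the string, so it has 2 × 4 − 1 = 7 nodes. Each node corresponds to exactly one rule application: a leaf to a rule of the form X → a and an inner node to a rule of the form X → YZ. So a string P is generated in 2|P| − 1 steps.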


[Figure: the production tree for aabb.]
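The step count can be checked against the production tree for aabb. The following Python sketch is illustrative: the `(label, children)` tuple encoding of the tree is an assumption, and `count_steps` simply counts the non-terminal nodes, since each one corresponds to one rule application.

```python
def leaf(nonterminal, terminal):
    """A non-terminal node whose only child is a terminal (rule X -> a)."""
    return (nonterminal, [(terminal, [])])


# Production tree for aabb under S -> A X | A B, A -> a, B -> b, X -> S B.
tree = ("S", [
    leaf("A", "a"),
    ("X", [
        ("S", [leaf("A", "a"), leaf("B", "b")]),
        leaf("B", "b"),
    ]),
])


def count_steps(node):
    """Number of rule applications: one per node that has children."""
    _label, children = node
    if not children:  # terminal leaf, no rule applied here
        return 0
    return 1 + sum(count_steps(child) for child in children)


def yield_of(node):
    """Concatenate the terminal leaves from left to right."""
    label, children = node
    if not children:
        return label
    return "".join(yield_of(child) for child in children)


if __name__ == "__main__":
    string = yield_of(tree)
    print(string, count_steps(tree), 2 * len(string) - 1)
```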

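Objective 4. Can we determine a priori how many steps it will take for a grammar to generate a string? Yes: a grammar in Chomsky Normal Form generates a string P in 2|P| − 1 steps.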

A procedure exists for deciding if a string P is an element of G’s language! Given a context-free grammar G (in Chomsky Normal Form) and a string P:
1. Create a set w of all the strings that can be generated from G in 2|P| − 1 steps.
2. Is P an element of w? If yes, then P ∈ L(G). If no, then P ∉ L(G).
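The decision procedure can be sketched as a brute-force search in Python. This is a minimal sketch under stated assumptions, not an efficient parser: it uses the `(lhs, rhs)` tuple representation, expands only the leftmost non-terminal at each step (every derivable string has a leftmost derivation of the same length), and discards forms longer than P, which is safe because rules in Chomsky Normal Form never shorten a form. The function name `is_member` is illustrative.

```python
def is_member(rules, start, target):
    """Decide whether a CNF grammar generates target, using exactly 2|P| - 1 steps."""
    nonterminals = {lhs for lhs, _rhs in rules}
    goal = tuple(target)
    max_steps = 2 * len(goal) - 1
    forms = {(start,)}
    for _ in range(max_steps):
        next_forms = set()
        for form in forms:
            # Locate the leftmost non-terminal; fully terminal forms are dropped.
            idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
            if idx is None:
                continue
            for lhs, rhs in rules:
                if lhs == form[idx]:
                    new_form = form[:idx] + rhs + form[idx + 1:]
                    if len(new_form) <= len(goal):  # longer forms can never shrink back
                        next_forms.add(new_form)
        forms = next_forms
    return goal in forms


ANBN = [("S", ("A", "X")), ("S", ("A", "B")), ("A", ("a",)), ("B", ("b",)), ("X", ("S", "B"))]

if __name__ == "__main__":
    print(is_member(ANBN, "S", "aabb"))
    print(is_member(ANBN, "S", "abb"))
```

With the a^n b^n grammar, the sketch accepts aabb and rejects abb, which answers the two membership questions posed earlier.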

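We can systematically generate all strings using a queue. Take the grammar S → A B, A → a, B → b. Start with a queue that holds S. Remove S from the queue, substitute its right-hand side, and add the result, A B, to the queue. Repeat with each item taken from the queue.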
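The queue-based generation can be sketched with Python's `collections.deque`. The sketch is illustrative and makes assumptions beyond the slide: it uses the `(lhs, rhs)` tuple representation, always rewrites the leftmost non-terminal, and stops after `limit` strings because a grammar such as the one for a^n b^n generates infinitely many.

```python
from collections import deque


def generate(rules, start, limit):
    """Return up to `limit` terminal strings, found in breadth-first order."""
    nonterminals = {lhs for lhs, _rhs in rules}
    queue = deque([(start,)])
    found = []
    while queue and len(found) < limit:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:  # no non-terminals left: this is a generated string
            found.append("".join(form))
            continue
        for lhs, rhs in rules:
            if lhs == form[idx]:
                queue.append(form[:idx] + rhs + form[idx + 1:])
    return found


if __name__ == "__main__":
    tiny = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
    print(generate(tiny, "S", 5))
```

Because the queue is processed first-in, first-out, strings that need fewer production steps are found before strings that need more.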

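Objective 5. Is there a procedure for determining if a string is in the set of strings generated by a grammar? Yes: for a grammar in Chomsky Normal Form, generate every string that can be produced in 2|P| − 1 steps and check whether P is among them.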

Case Study We are tasked to generate data for Books in a BookStore. The Genre of a Book is either fiction or non-fiction. The Publisher of a Book is either Springer, MIT Press, or Harvard Press. The Title of a Book is either “The Wisdom of Crowds,” “Six Great Ideas,” or “Society of Mind.” Create a grammar that generates strings containing the title of a book, its genre, and its publisher.

BookStore Grammar
  Bookstore → Book Bookstore | Book
  Book → Title Genre Publisher
  Title → “Wisdom of Crowds” | “Six Great Ideas” | “Society of Mind”
  Genre → “fiction” | “non-fiction”
  Publisher → “Springer” | “MIT Press” | “Harvard Press”

Not in Chomsky Normal Form The rule Bookstore → Book is a chain rule. The rule Book → Title Genre Publisher has too many non-terminals on the right-hand side.

Transform to Chomsky Normal Form
  Bookstore → Book Bookstore | Title Other
  Book → Title Other
  Other → Genre Publisher
  Title → “Wisdom of Crowds” | “Six Great Ideas” | “Society of Mind”
  Genre → “fiction” | “non-fiction”
  Publisher → “Springer” | “MIT Press” | “Harvard Press”

How many production steps are needed to generate this data?
  “Wisdom of Crowds” “non-fiction” “Springer”
  “Society of Mind” “non-fiction” “Harvard Press”

Determine the length of the data Each quoted value is one terminal symbol, so the data consists of 6 symbols: |P| = 6.

Calculate the answer 2|P| − 1 = 2 × 6 − 1 = 11 production steps.

Check the results [Figure: the production tree for the data under the Chomsky Normal Form grammar. Bookstore has the children Book and Bookstore. Book has the children Title (“Wisdom of Crowds”) and Other; that Other has the children Genre (“non-fiction”) and Publisher (“Springer”). The inner Bookstore has the children Title (“Society of Mind”) and Other; that Other has the children Genre (“non-fiction”) and Publisher (“Harvard Press”).] The tree contains 11 non-terminal nodes, one per production step.
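The same check can be done in code. The Python sketch below is illustrative: it encodes the production tree described above as `(label, children)` tuples (an assumed representation), counts the rule applications, and compares the count with 2|P| − 1, treating each quoted value as a single terminal symbol.

```python
def leaf(nonterminal, text):
    """A non-terminal node whose only child is a terminal string."""
    return (nonterminal, [(text, [])])


def title_other(title, genre, publisher):
    """Children produced by the rule ... -> Title Other, with Other -> Genre Publisher."""
    return [leaf("Title", title),
            ("Other", [leaf("Genre", genre), leaf("Publisher", publisher)])]


tree = ("Bookstore", [
    ("Book", title_other("Wisdom of Crowds", "non-fiction", "Springer")),
    ("Bookstore", title_other("Society of Mind", "non-fiction", "Harvard Press")),
])


def count_steps(node):
    """One production step per node that has children."""
    _label, children = node
    if not children:
        return 0
    return 1 + sum(count_steps(child) for child in children)


def terminals(node):
    """The terminal symbols of the tree, left to right."""
    label, children = node
    if not children:
        return [label]
    return [t for child in children for t in terminals(child)]


if __name__ == "__main__":
    data = terminals(tree)
    print(len(data), count_steps(tree), 2 * len(data) - 1)
```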

XML, XML Schema If the Bookstore grammar is converted into an XML Schema, how many XML elements will be needed to mark up this data?
  “Wisdom of Crowds” “non-fiction” “Springer”
  “Society of Mind” “non-fiction” “Harvard Press”


Bookstore XML Schema [Figure: the Bookstore XML Schema, shown across two slides.]