Context-Free Grammars Normal Forms Chapter 11. Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid.

Slides:



Advertisements
Similar presentations
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Advertisements

Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
CFGs and PDAs Sipser 2 (pages ). Long long ago…
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
About Grammars CS 130 Theory of Computation HMU Textbook: Sec 7.1, 6.3, 5.4.
Fall 2004COMP 3351 Simplifications of Context-Free Grammars.
Prof. Busch - LSU1 Simplifications of Context-Free Grammars.
FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY
Western Michigan University CS6800 Advanced Theory of Computation Spring 2014 By Abduljaleel Alhasnawi & Rihab Almalki.
Pushdown Automata Part II: PDAs and CFG Chapter 12.
Chapter 4 Normal Forms for CFGs Chomsky Normal Form n Defn A CFG G = (V, , P, S) is in chomsky normal form if each rule in G has one of.
CS5371 Theory of Computation
Foundations of (Theoretical) Computer Science Chapter 2 Lecture Notes (Section 2.1: Context-Free Grammars) David Martin With some.
1 Normal Forms for Context-free Grammars. 2 Chomsky Normal Form All productions have form: variable and terminal.
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
1 Normal Forms for Context-free Grammars. 2 Chomsky Normal Form All productions have form: variable and terminal.
Normal forms for Context-Free Grammars
1 Module 32 Chomsky Normal Form (CNF) –4 step process.
How to Convert a Context-Free Grammar to Greibach Normal Form
January 15, 2014CS21 Lecture 61 CS21 Decidability and Tractability Lecture 6 January 16, 2015.
MA/CSSE 474 Theory of Computation
Context-Free Grammars Chapter 3. 2 Context-Free Grammars and Languages n Defn A context-free grammar is a quadruple (V, , P, S), where  V is.
Chapter 12: Context-Free Languages and Pushdown Automata
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
Lecture 21: Languages and Grammars. Natural Language vs. Formal Language.
Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Lecture 16 Oct 18 Context-Free Languages (CFL) - basic definitions Examples.
Pushdown Automata (PDA) Intro
Context-free Grammars Example : S   Shortened notation : S  aSaS   | aSa | bSb S  bSb Which strings can be generated from S ? [Section 6.1]
Context-Free Grammars Normal Forms Chapter 11. Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid.
CSCI 2670 Introduction to Theory of Computing September 20, 2005.
The Pumping Lemma for Context Free Grammars. Chomsky Normal Form Chomsky Normal Form (CNF) is a simple and useful form of a CFG Every rule of a CNF grammar.
CSCI 2670 Introduction to Theory of Computing September 21, 2004.
Context-Free Grammars Chapter 11. Languages and Machines.
Pushdown Automata (PDAs)
Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.
Introduction to Parsing
Section 12.4 Context-Free Language Topics
Chapter 6 Simplification of Context-free Grammars and Normal Forms These class notes are based on material from our textbook, An Introduction to Formal.
Context-Free and Noncontext-Free Languages Chapter 13 1.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
1 Simplification of Context-Free Grammars Some useful substitution rules. Removing useless productions. Removing -productions. Removing unit-productions.
Closure Properties Lemma: Let A 1 and A 2 be two CF languages, then the union A 1  A 2 is context free as well. Proof: Assume that the two grammars are.
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
1 Chapter 6 Simplification of CFGs and Normal Forms.
Chapter 3 Context-Free Grammars Dr. Frank Lee. 3.1 CFG Definition The next phase of compilation after lexical analysis is syntax analysis. This phase.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Structure and Ambiguity Removing Ambiguity Chomsky Normal Form Pushdown Automata Intro (who is he foolin', thinking that there will be time to get to this?)
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Exercises on Chomsky Normal Form and CYK parsing
Unrestricted Grammars
Chomsky Normal Form.
1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.
Context-Free Grammars Chapter 11. Languages and Machines.
Pushdown Automata Chapter 12. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no,
About Grammars Hopcroft, Motawi, Ullman, Chap 7.1, 6.3, 5.4.
Theory of Languages and Automata By: Mojtaba Khezrian.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Normal Forms (Chomsky and Greibach) Pushdown Automata (PDA) Intro PDA examples MA/CSSE 474 Theory of Computation.
Normal Forms for CFG’s Eliminating Useless Variables Removing Epsilon
Context-Free Grammars: an overview
Simplifications of Context-Free Grammars
Chapter 7 Regular Grammars
CHAPTER 2 Context-Free Languages
Structure and Ambiguity
Answer Questions about Exam2 problems
MA/CSSE 474 Theory of Computation
Presentation transcript:

Context-Free Grammars Normal Forms Chapter 11

Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid objects, with the following two properties: ● For every element c of C, except possibly a finite set of special cases, there exists some element f of F such that f is equivalent to c with respect to some set of tasks. ● F is simpler than the original form in which the elements of C are written. By “simpler” we mean that at least some tasks are easier to perform on elements of F than they would be on elements of C.

Normal Forms If you want to design algorithms, it is often useful to have a limited number of input forms that you have to deal with. Normal forms are designed to do just that. Various ones have been developed for various purposes. Examples: ● Clause form for logical expressions to be used in resolution theorem proving ● Disjunctive normal form for database queries so that they can be entered in a query by example grid. ● Various normal forms for grammars to support specific parsing techniques.

Clause Form for Logical Expressions Given: [1]  x ((Roman(x)  know(x, Marcus))  (hate(x, Caesar)   y (  z (hate(y, z)  thinkcrazy(x, y))))) [2] Roman(Paulus) [3]  hate(Paulus, Caesar) [4] hate(Flavius, Marcus) [5]  thinkcrazy(Paulus, Flavius) Prove:  know(Paulus, Marcus) Sentence [1] in clause form:  Roman(x)   know(x, Marcus)  hate(x, Caesar)   hate(y, z)  thinkcrazy(x, y)

Disjunctive Normal Form for Queries The Query by Example (QBE) grid: (category = fruit and supplier = Aabco) (category = fruit or category = vegetable) CategorySupplierPrice CategorySupplierPrice fruitAabco

Disjunctive Normal Form for Queries (category = fruit or category = vegetable) CategorySupplierPrice fruit vegetable

Disjunctive Normal Form for Queries (category = fruit and supplier = Aabco) or (category = vegetable and supplier = Botrexco) CategorySupplierPrice fruitAabco vegetableBotrexco

Disjunctive Normal Form for Queries But what about: (category = fruit or category = vegetable) and (supplier = A or supplier = B) This isn’t right: CategorySupplierPrice fruitAabco vegetableBotrexco

Disjunctive Normal Form for Queries (category = fruit or category = vegetable) and (supplier = Aabco or supplier = Botrexco) becomes (category = fruit and supplier = Aabco) or (category = fruit and supplier = Botrexco) or (category = vegetable and supplier = Aabco) or (category = vegetable and supplier = Botrexco) CategorySupplierPrice fruitAabco fruitBotrexco vegetableAabco vegetableBotrexco

Normal Forms for Grammars Chomsky Normal Form, in which all rules are of one of the following two forms: ● X  a, where a  , or ● X  BC, where B and C are elements of V - . Advantages: ● Parsers can use binary trees. ● Exact length of derivations is known: S AB AAB B aab BB b b

Normal Forms for Grammars Greibach Normal Form, in which all rules are of the following form: ● X  a , where a   and   (V -  )*. Advantages: ● Every derivation of a string s contains |s| rule applications. ● Greibach normal form grammars can easily be converted to pushdown automata with no  - transitions. This is useful because such PDAs are guaranteed to halt.

Normal Forms Exist Theorem: Given a CFG G, there exists an equivalent Chomsky normal form grammar G C such that: L(G C ) = L(G) – {  }. Proof: The proof is by construction. Theorem: Given a CFG G, there exists an equivalent Greibach normal form grammar G G such that: L(G G ) = L(G) – {  }. Proof: The proof is also by construction.

Converting to a Normal Form 1. Apply some transformation to G to get rid of undesirable property 1. Show that the language generated by G is unchanged. 2. Apply another transformation to G to get rid of undesirable property 2. Show that the language generated by G is unchanged and that undesirable property 1 has not been reintroduced. 3. Continue until the grammar is in the desired form.

Rule Substitution X  a Y c Y  b Y  ZZ We can replace the X rule with the rules: X  abc X  a ZZ c X  a Y c  a ZZ c

Rule Substitution Theorem: Let G contain the rules: X   Y  and Y   1 |  2 | … |  n, Replace X   Y  by: X   1 , X   2 , …, X   n . The new grammar G will be equivalent to G.

Rule Substitution Theorem: Let G contain the rules: X   Y  and Y   1 |  2 | … |  n Replace X   Y  by: X   1 , X   2 , …, X   n . The new grammar G will be equivalent to G.

Rule Substitution Replace X   Y  by: X   1 , X   2 , …, X   n . Proof: ● Every string in L(G) is also in L(G): If X   Y  is not used, then use same derivation. If it is used, then one derivation is: S  …   X    Y    k   …  w Use this one instead: S  …   X    k   …  w ● Every string in L(G) is also in L(G): Every new rule can be simulated by old rules.

Conversion to Chomsky Normal Form 1. Remove all  -rules, using the algorithm removeEps. 2. Remove all unit productions (rules of the form A  B). 3. Remove all rules whose right hand sides have length greater than 1 and include a terminal: (e.g., A  a B or A  B a C) 4. Remove all rules whose right hand sides have length greater than 2: (e.g., A  BCDE)

Remove all  productions: (1) If there is a rule P   Q  and Q is nullable, Then: Add the rule P  . (2) Delete all rules Q  . Removing  -Productions

Example: S  a A A  B | CDC B   B  a C  BD D  b D   Removing  -Productions

Unit Productions A unit production is a rule whose right-hand side consists of a single nonterminal symbol. Example: S  X Y X  A A  B | a B  b Y  T T  Y | c

removeUnits(G) = 1. Let G = G. 2. Until no unit productions remain in G do: 2.1 Choose some unit production X  Y. 2.2 Remove it from G. 2.3 Consider only rules that still remain. For every rule Y  , where   V*, do: Add to G the rule X   unless it is a rule that has already been removed once. 3. Return G. After removing epsilon productions and unit productions, all rules whose right hand sides have length 1 are in Chomsky Normal Form. Removing Unit Productions

removeUnits(G) = 1. Let G = G. 2. Until no unit productions remain in G do: 2.1 Choose some unit production X  Y. 2.2 Remove it from G. 2.3 Consider only rules that still remain. For every rule Y  , where   V*, do: Add to G the rule X   unless it is a rule that has already been removed once. 3. Return G. Removing Unit Productions Example:S  X Y X  A A  B | a B  b Y  T T  Y | c

removeUnits(G) = 1. Let G = G. 2. Until no unit productions remain in G do: 2.1 Choose some unit production X  Y. 2.2 Remove it from G. 2.3 Consider only rules that still remain. For every rule Y  , where   V*, do: Add to G the rule X   unless it is a rule that has already been removed once. 3. Return G. Removing Unit Productions Example:S  X Y X  A A  B | a B  b Y  T T  Y | c S  X Y A  a | b B  b T  c X  a | b Y  c

Mixed Rules removeMixed(G) = 1. Let G = G. 2. Create a new nonterminal T a for each terminal a in . 3. Modify each rule whose right-hand side has length greater than 1 and that contains a terminal symbol by substituting T a for each occurrence of the terminal a. 4. Add to G, for each T a, the rule T a  a. 5. Return G. Example: A  a A  a B A  B a C A  B b C

Mixed Rules removeMixed(G) = 1. Let G = G. 2. Create a new nonterminal T a for each terminal a in . 3. Modify each rule whose right-hand side has length greater than 1 and that contains a terminal symbol by substituting T a for each occurrence of the terminal a. 4. Add to G, for each T a, the rule T a  a. 5. Return G. Example: A  a A  a B A  B a C A  B b C A  a A  T a B A  BT a C A  BT b C T a  a T b  b

Long Rules removeLong(G) = 1. Let G = G. 2. For each rule r of the form: A  N 1 N 2 N 3 N 4 …N n, n > 2 create new nonterminals M 2, M 3, … M n Replace r with the rule A  N 1 M Add the rules: M 2  N 2 M 3, M 3  N 3 M 4, … M n-1  N n-1 N n. 5. Return G. Example: A  BCDEF

An Example S  a AC a A  B | a B  C | c C  c C |  removeEps returns: S  a AC a | a A a | a C a | aa A  B | a B  C | c C  c C | c

An Example Next we apply removeUnits: Remove A  B. Add A  C | c. Remove B  C. Add B  c C (B  c, already there). Remove A  C. Add A  c C (A  c, already there). So removeUnits returns: S  a AC a | a A a | a C a | aa A  a | c | c C B  c | c C C  c C | c S  a AC a | a A a | a C a | aa A  B | a B  C | c C  c C | c

An Example S  a AC a | a A a | a C a | aa A  a | c | c C B  c | c C C  c C | c Next we apply removeMixed, which returns: S  T a ACT a | T a AT a | T a CT a | T a T a A  a | c | T c C B  c | T c C C  T c C | c T a  a T c  c

An Example S  T a ACT a | T a AT a | T a CT a | T a T a A  a | c | T c C B  c | T c C C  T c C | c T a  a T c  c Finally, we apply removeLong, which returns: S  T a S 1 S  T a S 3 S  T a S 4 S  T a T a S 1  AS 2 S 3  AT a S 4  CT a S 2  CT a A  a | c | T c C B  c | T c C C  T c C | c T a  a T c  c

The Price of Normal Forms E  E + E E  (E) E  id Converting to Chomsky normal form: E  E E E  P E E  L E  E   E R E  id L  ( R  ) P  + Conversion doesn’t change weak generative capacity but it may change strong generative capacity.

Island Grammars Suppose that: ● Inputs are ill-formed: ● “Um, I uh need a copy of uh my bill for er Ap, no May, I think, or June, maybe all of them uh, I guess that would work.” ● bold italic ● Inputs are not well understood.

Island Grammars Suppose that: ● Inputs are a blend of multiple languages: ● Spanglish ● A Web page with HTML, Java, etc. ● Inputs contain parts we care about and parts we don’t: ● Extracting the top-level structure of an XML document and ignoring the content. A key application: reverse engineering.

Island Grammars An island grammar contains: ● A set of detailed rules that describe the fragments that we care about. We’ll call these fragments islands. ● A set of flexible rules that can match everything else. We’ll call everything else the water. We could use regular expressions to find islands. But what if we need to match delimiters? A context-free island grammar is a hybrid between a context-free grammar and a set of patterns described as regular expressions.

An Island Grammar [1]  * [2]  CALL ( ) {cons( CALL )} [3]  CALL ERROR ( ) {reject} [4]  [5]   * {avoid}

Stochastic Context-Free Grammars A stochastic context-free grammar G is a quintuple: (V, , R, S, D): ● V is the rule alphabet, ●  is a subset of V, ● R is a finite subset of (V -  )  V*, ● S can be any element of V - , ● D is a function from R to [0 - 1]. For every nonterminal symbol n, the sum of the probabilities associated with all rules whose left-hand side is n must be 1.

Stochastic Context-Free Example PalEven = {ww R : w  { a, b }*}. But now suppose: ● a ’s occur three times as often as b ’s do. G = ({S, a, b }, { a, b }, R, S, D): S  a S a [.72] S  b S b [.24] S   [.04]

Stochastic Context-Free Grammars The probability of a particular parse tree t: Let C be the collection (in which duplicates count) of rules r that were used to generate t. Then: Example: S  a S a [.72] S  b S b [.24] S   [.04] S  a S a  aa S aa  aab S baa  aabbaa =

Stochastic Context-Free Grammars Stochastic grammars can be used to answer two important kinds of questions: ● In an error-free environment: given s, find the most likely parse tree for it. ● In a noisy environment: given a set of possible true strings X and an observed string o, find the particular string s (and possibly also the most likely parse for it) that is most likely to have been the one that was actually generated.

Stochastic Context-Free Grammars ● The noisy environment problem: Finding the probability that string w was generated: If T is the set of possible parse trees for w, then the total probability of generating w is:. So the highest probability sentence s is:.

Three Dimensional RNA Structure Complementary base pairs (C – G and A – U) form hydrogen bonds that determine the 3-D structure of the molecule. Other pairs (e.g., G – U) can also bond, but with lower probability.

Three Dimensional RNA Structure  [1]  C G [.23]  G C [.23]  A U [.23]  U A [.23]  G U [.03]  U G [.03]  …

Stochastic Context-Free Grammars and Generation Stochastic CFGs can generate very realistic English texts. Applications: ● Fake papers: SCIgenSCIgen ● Spam obfuscation