Context-Free Grammars Chapter 11. Languages and Machines.


Context-Free Grammars Chapter 11

Languages and Machines

Background
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages. Developed by Noam Chomsky in the mid-1950s. Chomsky (born 1928):
● Professor emeritus at MIT
● Father of modern linguistics
● Still holds an office at MIT
● Controversial political critic
● Often receives undercover police protection

Rewrite Systems and Grammars
A rewrite system (or production system, or rule-based system) is:
● a list of rules, and
● an algorithm for applying them.
Each rule has a left-hand side and a right-hand side.
Example rules:
S → aSb
aS → ε
aSb → bSabSa

Simple-rewrite
simple-rewrite(R: rewrite system, w: initial string) =
1. Set working-string to w.
2. Until told by R to halt do:
   Match the lhs of some rule against some part of working-string.
   Replace the matched part of working-string with the rhs of the rule that was matched.
3. Return working-string.
If simple-rewrite(R, w) can return some string s, then we say that R can derive s from w.
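The simple-rewrite procedure can be sketched directly in code. This is a minimal illustration, not from the slides: rule order and occurrence choice are deliberately left open by the formalism, so this sketch just applies the first listed rule at its leftmost match.

```python
def simple_rewrite(rules, w, max_steps=100):
    """Apply rewrite rules to w until no rule matches (or we give up).

    rules: a list of (lhs, rhs) string pairs, tried in order.
    This sketch resolves the formalism's open choices by picking the
    first rule whose lhs occurs in the working string and replacing
    its leftmost occurrence.
    """
    working = w
    for _ in range(max_steps):
        for lhs, rhs in rules:
            if lhs in working:
                working = working.replace(lhs, rhs, 1)
                break
        else:
            return working          # no rule matched: halt
    return working                  # step budget exhausted

# With only S -> aSb, three steps rewrite S => aSb => aaSbb => aaaSbbb.
print(simple_rewrite([("S", "aSb")], "S", max_steps=3))  # aaaSbbb

# With S -> epsilon listed first, that rule always wins and we halt at once.
print(simple_rewrite([("S", ""), ("S", "aSb")], "S"))    # (empty string)
```

Different rule orders give different results, which is exactly why a rewrite-system formalism must say how rules are chosen.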

A Rewrite System Formalism
A rewrite system formalism specifies:
● the form of the rules, and
● how simple-rewrite works:
  ● how to choose rules
  ● when to quit

An Example
w = SaS
Rules:
S → aSb
aS → ε
● In what order should the rules be applied?
● When should we quit?

Rule-Based Systems
● Expert systems
● Cognitive modeling
● Business practice modeling
● General models of computation
● Grammars: a grammar G generates a language L(G)

Grammars Define Languages
A grammar has a set of rules and works with an alphabet that can be divided into two subsets:
● a terminal alphabet, Σ, that contains the symbols that make up the strings in L(G), and
● a nonterminal alphabet, whose elements function as working symbols while the grammar is operating. These symbols disappear by the time the grammar finishes its job and generates a string.
A grammar has a unique start symbol, often called S.

Using a Grammar to Derive a String
Simple-rewrite(G, S) will generate the strings in L(G). We will use the symbol ⇒ to indicate steps in a derivation. A derivation could begin with:
S ⇒ aSb ⇒ aaSbb ⇒ …

Generating Many Strings
Multiple rules may match. Given: S → aSb, S → bSa, and S → ε.
Derivation so far: S ⇒ aSb ⇒ aaSbb ⇒
Three choices at the next step:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb (using rule 1),
S ⇒ aSb ⇒ aaSbb ⇒ aabSabb (using rule 2),
S ⇒ aSb ⇒ aaSbb ⇒ aabb (using rule 3).
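A small brute-force enumerator makes the branching concrete. This is an illustrative sketch (the function name and the pruning bound are my own): a breadth-first search over sentential forms of the grammar S → aSb | bSa | ε, which conveniently keeps at most one nonterminal per form.

```python
from collections import deque

def generate(rules, start="S", max_len=4):
    """Enumerate the terminal strings of length <= max_len derivable
    from start, by breadth-first search over sentential forms.
    rules maps each nonterminal to its list of right-hand sides.
    The pruning bound below assumes each right-hand side carries at
    most one nonterminal, so every form has at most one nonterminal."""
    results, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        nts = [i for i, c in enumerate(form) if c in rules]
        if not nts:
            if len(form) <= max_len:
                results.add(form)
            continue
        i = nts[0]                      # expand the leftmost nonterminal
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i+1:]
            # a form longer than max_len + 1 cannot shrink to max_len
            if len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

rules = {"S": ["aSb", "bSa", ""]}
print(sorted(generate(rules)))  # ['', 'aabb', 'ab', 'abab', 'ba', 'baba', 'bbaa']
```

All seven strings of length up to 4 come out, one per distinct sequence of rule choices that terminates.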

Generating Many Strings
One rule may match in more than one way. Given: S → aTTb, T → bTa, and T → ε.
Derivation so far: S ⇒ aTTb ⇒
Two choices at the next step:
S ⇒ aTTb ⇒ abTaTb ⇒ …
S ⇒ aTTb ⇒ aTbTab ⇒ …

When to Stop
May stop when:
1. The working string no longer contains any nonterminal symbols (including the case when it is ε). In this case, we say that the working string is generated by the grammar.
Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb

When to Stop
May stop when:
2. There are nonterminal symbols in the working string but none of them appears on the left-hand side of any rule in the grammar. In this case, we have a blocked or non-terminated derivation but no generated string.
Example:
Rules: S → aSb, S → bTa, and S → ε
Derivation: S ⇒ aSb ⇒ abTab ⇒ [blocked]

When to Stop
It is possible that neither (1) nor (2) is achieved.
Example: G contains only the rules S → Ba and B → bB, with S the start symbol. Then all derivations proceed as:
S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ …
So the grammar generates the language ∅.

Context-free Grammars, Languages, and PDAs
A context-free grammar generates a context-free language; a pushdown automaton (PDA) recognizes (accepts) that same language.

Recall Regular Grammars
Regular grammar rules:
● have a left-hand side that is a single nonterminal, and
● have a right-hand side that is ε, a single terminal, or a single terminal followed by a single nonterminal.
Regular grammars must always produce strings one character at a time, moving left to right.
Example: L = {w ∈ {a, b}* : |w| is even}, i.e., ((aa) ∪ (ab) ∪ (ba) ∪ (bb))*
G: S → ε     T → aS
   S → aT    T → bS
   S → bT
M: [diagram of the corresponding FSM]
But it may be more natural to describe generation more flexibly.

Context-Free Grammars
No restrictions on the form of the right-hand sides:
S → abDeFGab
But, as in regular grammars, the left-hand side must be a single nonterminal:
S → ε   but not   ASB → ε

Context-Free Grammars
A context-free grammar G is a quadruple (V, Σ, R, S), where:
● V is the rule alphabet, which contains nonterminals and terminals,
● Σ (the set of terminals) is a subset of V,
● R (the set of rules) is a finite subset of (V − Σ) × V*, and
● S (the start symbol) is an element of V − Σ.
Example: ({S, a, b}, {a, b}, {S → aSb, S → ε}, S)
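The quadruple maps directly onto a small data structure. The sketch below is illustrative (the class layout is my own, not from the text); the well-formedness conditions of the definition become assertions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    """A context-free grammar (V, Sigma, R, S)."""
    V: frozenset      # rule alphabet: nonterminals and terminals
    sigma: frozenset  # terminal alphabet, a subset of V
    R: tuple          # rules: pairs (A, rhs) with A in V - sigma, rhs in V*
    S: str            # start symbol, an element of V - sigma

    def __post_init__(self):
        assert self.sigma <= self.V                    # Sigma subset of V
        assert self.S in self.V - self.sigma           # S is a nonterminal
        for lhs, rhs in self.R:
            assert lhs in self.V - self.sigma          # single nonterminal lhs
            assert all(sym in self.V for sym in rhs)   # rhs drawn from V*

# The example from the slide: ({S, a, b}, {a, b}, {S -> aSb, S -> eps}, S)
G = CFG(V=frozenset("Sab"), sigma=frozenset("ab"),
        R=(("S", "aSb"), ("S", "")), S="S")
print(G.S, sorted(G.sigma))
```

Constructing a grammar whose lhs is not a single nonterminal (e.g., a rule ("AB", "x")) fails the assertions, mirroring the definition.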

Derivations
x ⇒G y iff x = αAβ, A → γ is in R, and y = αγβ.
w0 ⇒G w1 ⇒G w2 ⇒G … ⇒G wn is a derivation in G.
Let ⇒G* be the reflexive, transitive closure of ⇒G. Then the language generated by G, denoted L(G), is:
{w ∈ Σ* : S ⇒G* w}

An Example Derivation
Example: Let G = ({S, a, b}, {a, b}, {S → aSb, S → ε}, S).
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
So S ⇒* aaabbb.

Context-Free
A language L is context-free iff it is generated by some context-free grammar G.
Why “context-free”? Using these rules, the decision to replace a nonterminal by some other sequence is made without looking at the context in which the nonterminal occurs. Note that, by definition, the lhs is a single nonterminal.
There are less restrictive grammar formalisms (context-sensitive, unrestricted) in which the lhs may contain several symbols. Context-sensitive grammar example: aSa → aTa, where S can be replaced by T only when it is surrounded by a's. Note that context is considered. An unrestricted grammar is even less restrictive.
Context-sensitive grammars correspond to LBAs and the context-sensitive languages; unrestricted grammars correspond to TMs and the SD languages.
Every regular language is also context-free.

Balanced Parentheses
We showed in Example 8.10 (p. 173) that Bal is not regular. So regular grammars cannot suffice to define programming languages. But a context-free grammar for Bal is easy:
S → ε
S → SS
S → (S)
Some example derivations in G:
S ⇒ (S) ⇒ ()
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
So S ⇒* () and S ⇒* (()()).
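Membership in Bal can also be tested directly via the prefix-count characterization of balanced strings: equal numbers of ('s and )'s, with no prefix containing more )'s than ('s. A sketch (it assumes the input contains only parentheses):

```python
def in_bal(s):
    """True iff s is balanced: equal numbers of ( and ), and no
    prefix of s has more )'s than ('s. Assumes s contains only
    the characters ( and )."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # some prefix closed more than it opened
            return False
    return depth == 0          # overall counts are equal

print(in_bal("(()())"))  # True
print(in_bal(")("))      # False
```

This linear scan decides membership, but it cannot assign structure; for that we need the grammar and its parse trees, as the following slides show.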

AnBn
We showed in Example 8.8 (p. 171) that AnBn = {a^n b^n : n ≥ 0} is not regular. But a context-free grammar generates it:
S → ε
S → aSb

Recursive and Self-Embedding Rules
A rule is recursive iff it is of the form X → w1Yw2, where Y ⇒* w3Xw4 for some w1, w2, w3, and w4 in V*. A grammar is recursive iff it contains at least one recursive rule. Recursive rules make it possible for a finite grammar to generate an infinite set of strings. Examples: S → (S), S → aS.
A rule in a grammar G is self-embedding iff it is of the form X → w1Yw2, where Y ⇒* w3Xw4 and both w1w3 and w4w2 are in Σ+. It allows X ⇒* w′Xw″ where neither w′ nor w″ is ε. A grammar is self-embedding iff it contains at least one self-embedding rule. Example: S → (S).

Where Context-Free Grammars Get Their Power If a grammar G is not self-embedding then L(G) is regular. If a language L has the property that every grammar that defines it is self-embedding, then L is not regular.

PalEven = {ww^R : w ∈ {a, b}*}
Even-length palindromes.
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSa
      S → bSb
      S → ε }.

Equal Numbers of a's and b's
Let L = {w ∈ {a, b}* : #a(w) = #b(w)}.
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSb
      S → bSa
      S → SS
      S → ε }.

BNF
Backus Naur Form: a notation for writing practical context-free grammars.
The symbol | should be read as “or”. Example: S → aSb | bSa | SS | ε.
BNF allows a nonterminal symbol to be any sequence of characters surrounded by angle brackets, e.g., <program> or <statement>.

BNF for a Java Fragment
<block> ::= { <stmt-list> } | {}
<stmt-list> ::= <stmt> | <stmt-list> <stmt>
<stmt> ::= <block> | while (<cond>) <stmt> | if (<cond>) <stmt> |
           do <stmt> while (<cond>); | <assignment-stmt>; |
           return | return <expression> | <method-invocation>;
Example string: {while (x < 12) { hippo.pretend(x); x = x + 2; }}
Many other kinds of practical languages are also context-free, e.g., HTML.

HTML
Example fragment: an unordered list (“Item 1, which will include a sublist” containing “First item in sublist” and “Second item in sublist”, followed by “Item 2”), written with nested <ul> and <li> elements.
A grammar:
/* Text is a sequence of elements.
HTMLtext → Element HTMLtext | ε
Element → UL | LI | … (and other kinds of elements that are allowed in the body of an HTML document)
/* The <ul> and </ul> tags must match.
UL → <ul> HTMLtext </ul>
/* The <li> and </li> tags must match.
LI → <li> HTMLtext </li>

Designing Context-Free Grammars
Several simple strategies:
● Related regions must be generated in tandem; otherwise there is no way to enforce the necessary constraint. Example: AnBn. To generate matched regions, generate outside-in: A → aAb.
● For independent regions, use concatenation: A → BC.

Concatenating Independent Sublanguages
Let L = {a^n b^n c^m : n, m ≥ 0}.
The c^m portion of any string in L is completely independent of the a^n b^n portion, so we should generate the two portions separately and concatenate them together.
G = ({S, N, C, a, b, c}, {a, b, c}, R, S), where:
R = { S → NC
      N → aNb
      N → ε
      C → cC
      C → ε }.

The Kleene Star of a Language
L = {a^n1 b^n1 a^n2 b^n2 … a^nk b^nk : k ≥ 0 and ∀i (ni ≥ 0)}
Examples of strings in L: ε, abab, aabbaaabbbabab
Note that L = {a^n b^n : n ≥ 0}*.
G = ({S, M, a, b}, {a, b}, R, S), where:
R = { S → MS   // each M generates one {a^n b^n : n ≥ 0} region
      S → ε
      M → aMb
      M → ε }.

Another Example: Equal a's and b's
L = {w ∈ {a, b}* : #a(w) = #b(w)}
G = ({S, a, b}, {a, b}, R, S), where:
R = { S → aSb
      S → bSa
      S → SS
      S → ε }.

Another Example: Unequal a's and b's
L = {a^n b^m : n ≠ m}
G = (V, Σ, R, S), where V = {a, b, S, A, B}, Σ = {a, b}, and
R = { S → A      /* more a's than b's
      S → B      /* more b's than a's
      A → a      /* at least one extra a generated
      A → aA
      A → aAb
      B → b      /* at least one extra b generated
      B → Bb
      B → aBb }.
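A brute-force check lends confidence in a grammar like this. The sketch below (the helper name and the length bound are my own) enumerates every string the grammar derives up to length 6 and asserts that each has the form a^n b^m with n ≠ m:

```python
from collections import deque

def strings_up_to(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start.
    Works for grammars whose right-hand sides carry at most one
    nonterminal, so every sentential form has at most one nonterminal."""
    out, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, c in enumerate(form) if c in rules), None)
        if idx is None:
            if len(form) <= max_len:
                out.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx+1:]
            # a form longer than max_len + 1 cannot shrink back down
            if len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return out

rules = {"S": ["A", "B"],
         "A": ["a", "aA", "aAb"],
         "B": ["b", "Bb", "aBb"]}
for w in strings_up_to(rules, "S", 6):
    n, m = w.count("a"), w.count("b")
    assert w == "a" * n + "b" * m and n != m   # all a's first, counts unequal
print("all derived strings have the a^n b^m, n != m shape")
```

This is only a spot check up to a bound, of course, not a proof; the next slide turns to real correctness proofs.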

Proving the Correctness of a Grammar
AnBn = {a^n b^n : n ≥ 0}
G = ({S, a, b}, {a, b}, R, S), R = { S → aSb, S → ε }
● Prove that G generates only strings in L.
● Prove that G generates all the strings in L.
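The two obligations can be sketched as induction arguments (a proof outline under the grammar above, not the book's worked proof):

```latex
% Only strings in L: by induction on derivation length, every sentential
% form reachable from S has the shape $a^k S b^k$ or $a^k b^k$, so any
% terminal string produced is $a^n b^n$ for some $n \geq 0$.
%
% All strings in L: by induction on $n$, using $S \rightarrow aSb$
% exactly $n$ times and then $S \rightarrow \varepsilon$:
\[
  S \Rightarrow a S b \Rightarrow a^2 S b^2 \Rightarrow \cdots
    \Rightarrow a^n S b^n \Rightarrow a^n b^n .
\]
```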

Derivations and Parse Trees
Context-free grammars do more than just describe the set of strings in a language, which is all we asked of regular grammars. They provide a way of assigning an internal structure to the strings that they derive. This structure is important because it, in turn, provides the starting point for assigning meanings to the strings.

Recognizing Strings Regular languages: We care about recognizing patterns and taking appropriate actions.

Context-free languages: We care about structure.
[parse tree for 3 + 5 * 7: the root E expands as E + E; the right-hand E expands as E * E; the id leaves carry the values 3, 5, and 7]

Derivations
To capture structure, we must capture the path we took through the grammar. Derivations do that.
Example: S → ε, S → SS, S → (S)
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()
But the order of rule application doesn't matter.

Parse Trees
A parse tree is an (ordered, rooted) tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by nonterminals of the grammar, while the leaf nodes are labeled by terminals. A program that produces such trees is called a parser. Parse trees capture the essential grammatical structure of a string.

S  SS  (S)S  ((S))S  (())S  (())(S)  (())() S  SS  (S)S  ((S))S  ((S))(S)  (())(S)  (())() Parse Trees S S S ( S ) ( S )   A parse tree may correspond to multiple derivations. Parse trees are useful precisely because they capture the important structural facts about a derivation but throw away the details of the order in which the nonterminals were expanded. The order has no bearing on the structure we wish to assign to a string.

Parse Trees
A parse tree, derived by a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which:
● every leaf node is labeled with an element of Σ ∪ {ε},
● the root node is labeled S,
● every other node is labeled with some element of V − Σ, and
● if m is a nonleaf node labeled X and the children of m are labeled x1, x2, …, xn, then R contains the rule X → x1x2…xn.

Structure in English
[parse tree for “the smart cat smells chocolate”: S splits into NP and VP; the NP is “the” plus a Nominal made of Adjs (Adj “smart”) and N (“cat”); the VP is V (“smells”) plus an NP whose Nominal is N (“chocolate”)]
It is clear from the tree that the sentence is not about cat smells or smart cat smells.

Generative Capacity Because parse trees matter, it makes sense, given a grammar G, to distinguish between: ● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and ● G’s strong generative capacity, defined to be the set of parse trees that G generates. Which set is bigger?

Another Example of Expansion Order
Look at the parse tree for “the smart cat smells chocolate”. From the parse tree, we cannot tell which of the following was used in the derivation:
S ⇒ NP VP ⇒ the Nominal VP ⇒ …
S ⇒ NP VP ⇒ NP V NP ⇒ …
Again, parse trees capture the important structural facts about a derivation but throw away the details of the nonterminal expansion order. The order has no bearing on the structure we wish to assign to a string.

Derivation Order
However, the expansion order is important for algorithms. Algorithms for generation and recognition must be systematic. They typically use either the leftmost derivation or the rightmost derivation. A leftmost derivation is one in which, at each step, the leftmost nonterminal in the working string is chosen for expansion. A rightmost derivation is one in which, at each step, the rightmost nonterminal in the working string is chosen for expansion.
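A single leftmost-derivation step is easy to mechanize; a sketch (the function name and encoding are my own):

```python
def leftmost_step(form, rule, nonterminals):
    """Apply rule (lhs, rhs) at the leftmost nonterminal of form and
    return the new sentential form; complain if the leftmost
    nonterminal is not the rule's lhs."""
    lhs, rhs = rule
    i = next(i for i, c in enumerate(form) if c in nonterminals)
    if form[i] != lhs:
        raise ValueError(f"leftmost nonterminal is {form[i]}, not {lhs}")
    return form[:i] + rhs + form[i+1:]

# A leftmost derivation with S -> aSb, S -> epsilon:
# S => aSb => aaSbb => aabb
form = "S"
for rule in [("S", "aSb"), ("S", "aSb"), ("S", "")]:
    form = leftmost_step(form, rule, {"S"})
print(form)  # aabb
```

A rightmost step is the mirror image: scan for the last nonterminal instead of the first.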

Derivations of The Smart Cat
the smart cat smells chocolate
A leftmost derivation is:
S ⇒ NP VP ⇒ the Nominal VP ⇒ the Adjs N VP ⇒ the Adj N VP ⇒ the smart N VP ⇒ the smart cat VP ⇒ the smart cat V NP ⇒ the smart cat smells NP ⇒ the smart cat smells Nominal ⇒ the smart cat smells N ⇒ the smart cat smells chocolate
A rightmost derivation is:
S ⇒ NP VP ⇒ NP V NP ⇒ NP V Nominal ⇒ NP V N ⇒ NP V chocolate ⇒ NP smells chocolate ⇒ the Nominal smells chocolate ⇒ the Adjs N smells chocolate ⇒ the Adjs cat smells chocolate ⇒ the Adj cat smells chocolate ⇒ the smart cat smells chocolate

Ambiguity
A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree. Even a very simple grammar can be highly ambiguous:
S → ε
S → SS
S → (S)

Regular Expressions and Grammars Can Be Ambiguous Too
Regular expression: (a ∪ b)* a (a ∪ b)*
Regular grammar:
S → a
S → bS
S → aS   (choose a from the first (a ∪ b))
S → aT   (choose the distinguished a)
T → a
T → b
T → aT   (choose a from the second (a ∪ b))
T → bT
A string such as aa can be matched, or generated, in more than one way, depending on which a is treated as the distinguished one.

Why Is Ambiguity a Problem? With regular languages, for most applications, we do not care about assigning internal structure to strings. With context-free languages, we usually do care about internal structure because, given a string w, we want to assign meaning to w. We almost always want to assign a unique such meaning. It is generally difficult, if not impossible, to assign a unique meaning without a unique parse tree.

An Arithmetic Expression Grammar
E → E + E
E → E * E
E → (E)
E → id

Inherent Ambiguity
Some languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.
Example: L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.
Every string in L has either (or both) the same number of a's and b's or the same number of b's and c's.

Inherent Ambiguity
L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}
One grammar for L has the rules:
S → S1 | S2
S1 → S1c | A     /* Generate all strings in {a^n b^n c^m}.
A → aAb | ε
S2 → aS2 | B     /* Generate all strings in {a^n b^m c^m}.
B → bBc | ε
Consider any string of the form a^n b^n c^n. It has two distinct derivations, one through S1 and the other through S2. It is possible to prove that L is inherently ambiguous: given any grammar G that generates L, there is at least one string with two derivations in G.

Inherent Ambiguity Both of the following problems are undecidable: Given a context-free grammar G, is G ambiguous? Given a context-free language L, is L inherently ambiguous?

But We Can Often Reduce Ambiguity
We can get rid of:
● ε rules like S → ε,
● rules with symmetric right-hand sides, e.g., S → SS or E → E + E (a rule that is both left and right recursive makes the grammar ambiguous; fix: remove the right recursion), and
● rule sets that lead to ambiguous attachment of optional postfixes, e.g., the dangling else problem (which if does the else go with?): if E then if E then S else S
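For the arithmetic-expression case, the standard repair is to layer the grammar by precedence: E → E + T | T, T → T * F | F, F → (E) | id. The sketch below (names are illustrative) parses by this layered grammar; the left-recursive rules are realized as loops, which preserves left associativity while removing ambiguity.

```python
def parse_expr(tokens):
    """Parse tokens by the unambiguous layered grammar
       E -> E + T | T,  T -> T * F | F,  F -> (E) | id
    returning a nested-tuple parse tree. Exactly one tree exists
    per string: * binds tighter than +, and both group to the left."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok!r}, got {peek()!r}"
        pos += 1

    def expr():                      # E -> T (+ T)*
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    def term():                      # T -> F (* F)*
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def factor():                    # F -> (E) | id
        if peek() == "(":
            eat("(")
            node = expr()
            eat(")")
            return node
        eat("id")
        return "id"

    tree = expr()
    assert peek() is None, "trailing input"
    return tree

print(parse_expr(["id", "+", "id", "*", "id"]))  # ('+', 'id', ('*', 'id', 'id'))
```

With the original grammar E → E + E | E * E, the same token string has two parse trees; the layered grammar forces the one in which * binds tighter.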

Proving that G is Unambiguous
G is unambiguous iff, for all strings w, at every point in a leftmost or rightmost derivation of w, only one rule in G can be applied. In other words, a grammar G is unambiguous iff every string derivable in G has a single leftmost (or rightmost) derivation.

Going Too Far
We want to get rid of ambiguity, but not at the expense of losing useful parse trees. In the arithmetic expression example and the dangling else case, we were willing to force one interpretation. Sometimes, this is not acceptable:
Chris likes the girl with a cat.
Chris shot the bear with a rifle.

Comparing Regular and Context-Free Languages
Regular languages:
● defined by regular expressions or regular grammars
● we recognize strings
Context-free languages:
● defined by context-free grammars
● we parse strings

A Testimonial
Also, you will be happy to know that I just made use of the context-free grammar skills I learned in your class! I am working on Firefox at IBM this summer and just found an inconsistency between how the native Firefox code and a plugin by Adobe parse SVG path data elements. In order to figure out which code base exhibits the correct behavior, I needed to trace through the grammar. Thanks to your class I was able to determine that the bug is in the Adobe plugin. Go OpenSource!

Context-Free Grammars Normal Forms

Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid objects, with the following two properties: ● For every element c of C, except possibly a finite set of special cases, there exists some element f of F such that f is equivalent to c with respect to some set of tasks. ● F is simpler than the original form in which the elements of C are written. By “simpler” we mean that at least some tasks are easier to perform on elements of F than they would be on elements of C.

Normal Forms If you want to design algorithms, it is often useful to have a limited number of input forms that you have to deal with. Normal forms are designed to do just that. Various ones have been developed for various purposes. Examples: ● Clause form for logical expressions to be used in resolution theorem proving ● Disjunctive normal form for database queries so that they can be entered in a query by example grid. ● Various normal forms for grammars to support specific parsing techniques.

Clause Form for Logical Expressions
Given:
[1] ∀x ((Roman(x) ∧ know(x, Marcus)) → (hate(x, Caesar) ∨ ∀y (∃z (hate(y, z)) → thinkcrazy(x, y))))
[2] Roman(Paulus)
[3] ¬hate(Paulus, Caesar)
[4] hate(Flavius, Marcus)
[5] ¬thinkcrazy(Paulus, Flavius)
Prove: ¬know(Paulus, Marcus)
Sentence [1] in clause form:
¬Roman(x) ∨ ¬know(x, Marcus) ∨ hate(x, Caesar) ∨ ¬hate(y, z) ∨ thinkcrazy(x, y)

Disjunctive Normal Form for Queries
The Query by Example (QBE) grid has columns Category, Supplier, and Price. A conjunctive query fills one row; e.g., (category = fruit and supplier = Aabco):
Category  | Supplier | Price
fruit     | Aabco    |

Disjunctive Normal Form for Queries
A disjunction uses one row per disjunct; e.g., (category = fruit or category = vegetable):
Category  | Supplier | Price
fruit     |          |
vegetable |          |

Disjunctive Normal Form for Queries
(category = fruit and supplier = Aabco) or (category = vegetable and supplier = Botrexco):
Category  | Supplier | Price
fruit     | Aabco    |
vegetable | Botrexco |

Disjunctive Normal Form for Queries
But what about: (category = fruit or category = vegetable) and (supplier = Aabco or supplier = Botrexco)?
This isn't right:
Category  | Supplier | Price
fruit     | Aabco    |
vegetable | Botrexco |

Disjunctive Normal Form for Queries
(category = fruit or category = vegetable) and (supplier = Aabco or supplier = Botrexco)
becomes
(category = fruit and supplier = Aabco) or (category = fruit and supplier = Botrexco) or (category = vegetable and supplier = Aabco) or (category = vegetable and supplier = Botrexco):
Category  | Supplier | Price
fruit     | Aabco    |
fruit     | Botrexco |
vegetable | Aabco    |
vegetable | Botrexco |
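The distribution step is mechanical: the DNF of an AND of OR-clauses is the cross product of the clauses. A sketch (the representation is my own: each clause is a list of atoms, and the whole expression is the AND of the clauses):

```python
from itertools import product

def to_dnf(cnf):
    """cnf: a list of OR-clauses, each a list of atom strings; the
    whole expression is their AND. Returns the DNF as a list of
    AND-terms, one per element of the cross product of the clauses."""
    return [list(term) for term in product(*cnf)]

query = [["category=fruit", "category=vegetable"],
         ["supplier=Aabco", "supplier=Botrexco"]]
for row in to_dnf(query):
    print(" and ".join(row))
```

The four printed rows are exactly the four rows of the expanded QBE grid.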

Normal Forms for Grammars
Chomsky Normal Form, in which all rules are of one of the following two forms:
● X → a, where a ∈ Σ, or
● X → BC, where B and C are elements of V − Σ.
Advantages:
● Parsers can use binary trees.
● The exact length of derivations is known: a string of length n is derived with exactly n − 1 applications of rules of the form X → BC and n applications of rules of the form X → a, i.e., 2n − 1 steps in all.
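Whether a grammar is in CNF is a purely syntactic check; a sketch (it assumes rules are (lhs, rhs) string pairs with an explicit terminal set, which is my own encoding):

```python
def is_cnf(rules, terminals):
    """True iff every rule is X -> a (a single terminal) or
    X -> BC (exactly two nonterminals)."""
    for lhs, rhs in rules:
        if len(rhs) == 1 and rhs in terminals:
            continue                                   # X -> a
        if len(rhs) == 2 and all(s not in terminals for s in rhs):
            continue                                   # X -> BC
        return False
    return True

# AnBn without epsilon, rewritten in CNF:
# S -> AT | AB, T -> SB, A -> a, B -> b
cnf_rules = [("S", "AT"), ("S", "AB"), ("T", "SB"), ("A", "a"), ("B", "b")]
print(is_cnf(cnf_rules, {"a", "b"}))                   # True
print(is_cnf([("S", "aSb"), ("S", "")], {"a", "b"}))   # False
```

The example CNF grammar generates {a^n b^n : n ≥ 1}: S ⇒ AB ⇒ ab, and S ⇒ AT ⇒ A SB ⇒* a a^k b^k b.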

Normal Forms for Grammars
Greibach Normal Form, in which all rules are of the following form:
● X → aβ, where a ∈ Σ and β ∈ (V − Σ)*.
Property: in every derivation produced by a GNF grammar, precisely one terminal is generated for each rule application. This property is useful in several ways:
● Every derivation of a string w contains |w| rule applications.
● It is straightforward to define a decision procedure to determine whether w can be generated by a GNF grammar.
● GNF grammars can easily be converted to pushdown automata with no ε-transitions. This is useful because such PDAs are guaranteed to halt.

Normal Forms Exist
Theorem: Given a CFG G, there exists an equivalent Chomsky normal form grammar GC such that L(GC) = L(G) − {ε}.
Proof: The proof is by construction.
Theorem: Given a CFG G, there exists an equivalent Greibach normal form grammar GG such that L(GG) = L(G) − {ε}.
Proof: The proof is also by construction.

Stochastic Context-Free Grammars
Recall that in Chapter 5 we introduced the idea of a stochastic FSM: an NDFSM whose transitions have been augmented with probabilities that describe some phenomenon that we want to model. We can apply the same idea to context-free grammars: we can add probabilities to grammar rules and create a stochastic context-free grammar, also called a probabilistic context-free grammar.

Stochastic Context-Free Grammars
A stochastic context-free grammar G is a quintuple (V, Σ, R, S, D), where:
● V is the rule alphabet,
● Σ is a subset of V,
● R is a finite subset of (V − Σ) × V*,
● S is an element of V − Σ, and
● D is a function from R to [0, 1].
D assigns a probability to each rule in R. D must satisfy the requirement that, for every nonterminal symbol X, the sum of the probabilities associated with all rules whose left-hand side is X is 1.

Stochastic Context-Free Example
PalEven = {ww^R : w ∈ {a, b}*}. But now suppose we want to describe a special case:
● a's occur three times as often as b's do.
G = ({S, a, b}, {a, b}, R, S, D), with:
S → aSa  [.72]
S → bSb  [.24]
S → ε    [.04]

Stochastic Context-Free Grammars
The probability of a particular parse tree t: let C be the collection (in which duplicates count) of rules r that were used to generate t. Then:
Pr(t) = ∏r ∈ C D(r)
Example:
S → aSa [.72], S → bSb [.24], S → ε [.04]
S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
Pr = .72 × .72 × .24 × .04 ≈ .005
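The product is easy to compute; a sketch reproducing the example above (the rule encoding is my own):

```python
from math import prod

def tree_probability(rule_probs, rules_used):
    """Product of D(r) over the multiset of rules used in a derivation.
    rule_probs maps (lhs, rhs) pairs to probabilities; rules_used is
    the list of rules applied, duplicates included."""
    return prod(rule_probs[r] for r in rules_used)

D = {("S", "aSa"): 0.72, ("S", "bSb"): 0.24, ("S", ""): 0.04}
# S => aSa => aaSaa => aabSbaa => aabbaa uses aSa, aSa, bSb, epsilon
p = tree_probability(D, [("S", "aSa"), ("S", "aSa"), ("S", "bSb"), ("S", "")])
print(round(p, 6))  # 0.004977
```

Because duplicates count, the same rule contributes its probability once per application, which is why the .72 factor appears twice.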