
SIMPLIFYING GRAMMARS

Definition: A useless symbol of a context-free grammar is one which does not occur in the derivation of any sentence of that grammar.

For example:
G → RT
R → Ra
T → b
Here R is useless, since no terminal string can be derived from it.

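Clearly a symbol is useless if and only if either:
a) we cannot derive any string containing it from the goal symbol, and/or
b) we cannot derive a terminal string from that symbol.

Notation:
a) is expressed by saying that the symbol is not reachable from the goal symbol.
b) is expressed by saying that the symbol does not derive a terminal string.

Strictly, a) has to be tested after the productions containing symbols that fail b) have been discarded: a symbol may be reachable only through productions that also contain a symbol deriving no terminal string.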

Algorithm: To find all those symbols that are not reachable from the goal symbol.
1) Make a list of all the grammar symbols, all initially unflagged.
2) Flag the goal symbol.
3) Go through the grammar from the 1st production to the last. If A → x1x2…xn is one of these productions and A is flagged, then flag x1, x2, …, xn (those not already flagged).
4) Were any new symbols flagged during the iteration of step 3? If so, repeat step 3; otherwise stop.
Any symbol that has not been flagged at this stage is not reachable from the goal symbol.
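The flagging procedure above can be sketched in Python as follows. This is a minimal sketch and not part of the original slides: the function name `reachable_symbols`, and the representation of a grammar as a list of `(lhs, rhs)` pairs whose symbols are single characters, are assumptions made for illustration. It is applied to Grammar 1 of the example that follows.

```python
def reachable_symbols(productions, goal):
    """Return the set of symbols reachable from the goal symbol.

    productions: list of (lhs, rhs) pairs; rhs is a string of
    single-character symbols (an assumption of this sketch).
    """
    flagged = {goal}                  # step 2: flag the goal symbol
    changed = True
    while changed:                    # step 4: repeat while new flags appear
        changed = False
        for lhs, rhs in productions:  # step 3: one pass over the productions
            if lhs in flagged:
                for symbol in rhs:
                    if symbol not in flagged:
                        flagged.add(symbol)
                        changed = True
    return flagged


# Grammar 1: z is the goal symbol.
grammar1 = [
    ("z", "be"),
    ("a", "ae"), ("a", "e"),
    ("b", "ce"), ("b", "af"),
    ("c", "cf"),
    ("d", "f"),
]
all_symbols = set("zabcdef")
unreachable = all_symbols - reachable_symbols(grammar1, "z")
print(sorted(unreachable))  # ['d']
```

The loop stops as soon as a full pass over the productions flags nothing new, which mirrors step 4.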

EXAMPLE

Grammar 1
z → b e
a → a e | e
b → c e | a f
c → c f
d → f
d is not reachable.

Grammar 2
Z → E + T
E → E | S + F | T
F → F | F P | P
P → G
G → G | G G | F
T → T * i | i
Q → E | E + F | T | S
S → i
Q is not reachable.

Grammar 3
G → A
Q → P R
P → Q
Q, P, R are not reachable.

Algorithm: To determine which symbols do not derive a terminal string.
1) Make a fresh list of all the symbols, initially unflagged.
2) Flag all the terminals.
3) Go through the grammar from the first production to the last. If A → x1x2…xn is any such production, then if x1, x2, …, xn are all flagged, flag A.
4) Were any new symbols flagged in step 3? If so, go back to step 3. If not, all symbols not flagged at this stage do not derive a terminal string.
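A minimal Python sketch of this second flagging procedure, under the same illustrative assumptions as before (single-character symbols, productions as `(lhs, rhs)` pairs; the name `generating_symbols` is hypothetical). It is run on the small grammar G → RT, R → Ra, T → b from the opening definition.

```python
def generating_symbols(productions, terminals):
    """Return the set of symbols that derive a terminal string.

    productions: list of (lhs, rhs) pairs; rhs is a string of
    single-character symbols (an assumption of this sketch).
    """
    flagged = set(terminals)          # step 2: flag all the terminals
    changed = True
    while changed:                    # step 4: repeat while new flags appear
        changed = False
        for lhs, rhs in productions:  # step 3: one pass over the productions
            # An empty rhs counts as "all flagged", so the left hand
            # side of an epsilon-production is flagged as well.
            if lhs not in flagged and all(s in flagged for s in rhs):
                flagged.add(lhs)
                changed = True
    return flagged


grammar = [("G", "RT"), ("R", "Ra"), ("T", "b")]
flagged = generating_symbols(grammar, terminals="ab")
not_generating = {"G", "R", "T"} - flagged
print(sorted(not_generating))  # ['G', 'R']
```

R is never flagged because its only production mentions R itself, and G is never flagged because its only production mentions R.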

TRY THE ABOVE ALGORITHM ON GRAMMARS 1-3 ABOVE
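The two algorithms can be combined to strip useless symbols from a grammar. The sketch below is an illustration only: the function names, the single-character symbol convention and the example grammar (S → AB | a, A → a, B → Bb) are all assumptions and do not come from the slides. It applies the terminal-string test first and the reachability test second.

```python
def generating_symbols(productions, terminals):
    """Symbols that derive a terminal string."""
    flagged = set(terminals)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in flagged and all(s in flagged for s in rhs):
                flagged.add(lhs)
                changed = True
    return flagged


def reachable_symbols(productions, goal):
    """Symbols reachable from the goal symbol."""
    flagged = {goal}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs in flagged and not set(rhs) <= flagged:
                flagged |= set(rhs)
                changed = True
    return flagged


def remove_useless(productions, terminals, goal):
    """Drop every production that mentions a useless symbol."""
    generating = generating_symbols(productions, terminals)
    # Keep only productions whose symbols all derive terminal strings.
    kept = [(lhs, rhs) for lhs, rhs in productions
            if lhs in generating and all(s in generating for s in rhs)]
    # Reachability is judged on what is left.
    reachable = reachable_symbols(kept, goal)
    return [(lhs, rhs) for lhs, rhs in kept if lhs in reachable]


# Hypothetical grammar: B derives no terminal string, which in turn
# makes A useless although A itself derives the terminal string "a".
grammar = [("S", "AB"), ("S", "a"), ("A", "a"), ("B", "Bb")]
cleaned = remove_useless(grammar, terminals="ab", goal="S")
print(cleaned)  # [('S', 'a')]
```

Running the two tests in the opposite order would leave A → a in the result, because A is reachable in the original grammar through S → AB.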

Definitions:
1) A ⇒* α means you can derive α from A, or α = A.
2) A symbol A is said to vanish if A ⇒* ε.
3) A production of the form X → ε is called an ε-production.
Note that the textbook uses λ to denote the empty string, whereas these slides employ ε for this purpose.

Algorithm: To determine which symbols of a grammar vanish.
1) Make a list of symbols, initially unflagged.
2) Flag all the left hand sides of ε-productions.
3) Go through the grammar from the 1st production to the last. If A → x1x2…xn is any such production, then if x1, x2, …, xn are all flagged, flag A.
4) Were any new symbols flagged in step 3? If so, go back to step 3; else stop.
The flagged symbols are those which vanish.
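As a hedged illustration, the vanishing-symbol algorithm might look like this in Python. The function name `vanishing_symbols` and the encoding (single-character symbols, an ε-production written as an empty right hand side) are assumptions of this sketch. The grammar used is the one from Example 1 later in these slides, for which the slides state that U, V and W vanish.

```python
def vanishing_symbols(productions):
    """Return the set of symbols A with A =>* epsilon.

    productions: list of (lhs, rhs) pairs; rhs is a string of
    single-character symbols and "" stands for epsilon.
    """
    # Step 2: flag the left hand sides of epsilon-productions.
    flagged = {lhs for lhs, rhs in productions if rhs == ""}
    changed = True
    while changed:                    # step 4: repeat while new flags appear
        changed = False
        for lhs, rhs in productions:  # step 3: one pass over the productions
            if lhs not in flagged and all(s in flagged for s in rhs):
                flagged.add(lhs)
                changed = True
    return flagged


example1 = [
    ("G", "AVw"),
    ("A", "aA"), ("A", "a"),
    ("V", "rUcW"), ("V", ""),
    ("U", ""),
    ("W", ""),
]
vanish = vanishing_symbols(example1)
print(sorted(vanish))  # ['U', 'V', 'W']
```

Terminals are never flagged, so any production with a terminal on its right hand side can never cause its left hand side to be flagged.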

Example: Try the algorithm on the following grammar.

Grammar G4
A → b Y D | A Y c
Y → E F | ε
D → g h i
F → N O | Y N
N → ε
O → Y N
E → Y O N Y

Defns: An ε-production is one of the form A -> ε. If A, in this case, is the goal symbol, the production is referred to as a null goal production.

Theorem: For every cfg G, there exists a cfg G’, such that L(G’) = L(G), and G’ has no ε-productions, with the exception that if ε ∈ L(G), then G’ contains a null goal production.

Proof. G’ can be formed from G as follows:
1. Discard all the ε-productions.
2. For each production of G, add to the grammar all possible productions that can be formed from it by omitting from its rhs some subset of those symbols (if any) that vanish in G, provided the rhs does not become empty.
3. Remove all productions with useless symbols.
4. If the goal symbol of G vanishes, add a null goal production.
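The construction in the proof can be sketched in Python as below. This is an illustration under stated assumptions, not the slides' own code: symbols are single characters, an ε-production has an empty right hand side, and the names `vanishing_symbols` and `eliminate_epsilon` are hypothetical. Step 3 (removing useless symbols) is deliberately left out of the sketch. The grammar used is the S → ABaC exercise that appears later in the slides.

```python
from itertools import product


def vanishing_symbols(productions):
    """Symbols that derive the empty string."""
    flagged = {lhs for lhs, rhs in productions if rhs == ""}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in flagged and all(s in flagged for s in rhs):
                flagged.add(lhs)
                changed = True
    return flagged


def eliminate_epsilon(productions, goal):
    """Steps 1, 2 and 4 of the proof; returns a set of (lhs, rhs) pairs."""
    vanish = vanishing_symbols(productions)
    result = set()
    for lhs, rhs in productions:
        # Each vanishing symbol may be kept or omitted; others are kept.
        choices = [(s, "") if s in vanish else (s,) for s in rhs]
        for combo in product(*choices):
            new_rhs = "".join(combo)
            if new_rhs:               # never emit an epsilon-production here
                result.add((lhs, new_rhs))
    if goal in vanish:                # step 4: null goal production
        result.add((goal, ""))
    return result


grammar = [
    ("S", "ABaC"),
    ("A", "BC"),
    ("B", "b"), ("B", ""),
    ("C", "D"), ("C", ""),
    ("D", "d"),
]
new_grammar = eliminate_epsilon(grammar, goal="S")
for lhs, rhs in sorted(new_grammar):
    print(lhs, "->", rhs)
```

The original ε-productions disappear without a separate step 1, because an empty right hand side yields only the empty variant, which is skipped.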

Example 1
G -> AVw
A -> aA | a
V -> rUcW | ε
U -> ε
W -> ε

First of all, determine which symbols vanish: U, V, W.

1) Removing the ε-productions gives:
G -> AVw
A -> aA | a
V -> rUcW

2) Considering G -> AVw in step 2 of the algorithm, we add to the grammar:
G -> Aw
Considering V -> rUcW, we add:
V -> rc
V -> rUc
V -> rcW

3) W, U are now useless symbols, so leaving out all productions with W, U, we get:
G -> AVw | Aw
A -> aA | a
V -> rc

EXAMPLE. Provide a grammar equivalent to the one below but without ε-productions.
S → ABaC
A → BC
B → b | ε
C → D | ε
D → d
Try working this out for yourself before consulting the answer on the next slide. Note carefully that the symbol A is one of those that vanish.

ANSWER
S → ABaC | ABa | AaC | Aa | BaC | Ba | aC | a
A → BC | B | C
B → b
C → D
D → d

Defn. A unit production of a grammar is one of the form A -> B where A, B are both non-terminals.

Theorem. For any context-free grammar G, there exists a cfg G’ s.t. L(G’) = L(G) and G’ does not contain any unit productions.

Proof. G’ can be formed from G as follows:
1. Eliminate ε-productions from G to form G* (with possibly a null goal production).
2. If A is the left hand side of a unit production and B is any symbol that can be derived from A, and B -> α is any production with B as left hand side where α is not a single non-terminal, then add to the grammar A -> α. By step 1, any derivation of B from A must consist entirely of a sequence of non-terminals. Do step 2 for all symbols which are the left hand side of a unit production.

To find all single symbols that can be derived from a symbol A, consider the derivation tree in which no symbol occurs more than once, e.g.:

[Figure: derivation tree with root A, then B, D, E on the next level and C, F, N, M below]

If, say, B can also be derived from M, we do not include it, as B already occurs in the tree. Hence the depth of the tree is <= the number of unit productions.

3. Now discard all unit productions
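Steps 2 and 3 can be sketched in Python as follows. The sketch assumes that ε-productions have already been eliminated (step 1), that symbols are single characters, and that every non-terminal appears as the left hand side of at least one production; the name `eliminate_unit_productions` is hypothetical. For simplicity it processes every non-terminal rather than only those with a unit production, which gives the same result. The grammar is the expression grammar used in the example that follows.

```python
def eliminate_unit_productions(productions):
    """Return the set of (lhs, rhs) pairs with unit productions removed.

    productions: list of (lhs, rhs) pairs; rhs is a non-empty string
    of single-character symbols (assumptions of this sketch).
    """
    nonterminals = {lhs for lhs, _ in productions}

    def is_unit(rhs):
        return len(rhs) == 1 and rhs in nonterminals

    result = set()
    for a in nonterminals:
        # Collect every non-terminal derivable from `a` through unit
        # productions alone; each symbol is visited at most once.
        derivable = {a}
        frontier = [a]
        while frontier:
            b = frontier.pop()
            for lhs, rhs in productions:
                if lhs == b and is_unit(rhs) and rhs not in derivable:
                    derivable.add(rhs)
                    frontier.append(rhs)
        # Step 2: copy the non-unit productions of each such symbol to `a`.
        # Step 3: unit productions are simply never copied.
        for lhs, rhs in productions:
            if lhs in derivable and not is_unit(rhs):
                result.add((a, rhs))
    return result


grammar = [
    ("E", "E+T"), ("E", "T"),
    ("T", "T*F"), ("T", "F"),
    ("F", "(E)"), ("F", "a"),
]
new_grammar = eliminate_unit_productions(grammar)
for lhs, rhs in sorted(new_grammar):
    print(lhs, "->", rhs)
```

Tracking the `derivable` set plays the role of the derivation tree in which no symbol occurs more than once.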

EXAMPLE
Consider the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | a

Since E => T and T → T * F, we add to the grammar E → T * F,
and since E => F and F → ( E ) | a, we add E → ( E ) and E → a.
Also, since T => F, we add T → ( E ) | a.

Discarding all unit productions then gives us:
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a

EXAMPLE 3. “Remove” unit productions from:
S → Aa | B
B → A | bb
A → a | bc | B

ANSWER
S → Aa | bb | a | bc    since S => B and S => A
B → bb | a | bc         since B => A
A → a | bc | bb         since A => B
But B is now a useless symbol (it is no longer reachable from S), so discard the productions involving B.

EXAMPLE 4. “Remove” unit productions from:
S → Aa | bb | a | bc | B
B → bb | a | bc
A → a | bc | bb

ANSWER
S → Aa | bb | bc | a
B → bb | a | bc
A → a | bc | bb
Again, B is a useless symbol, and so the productions involving it should be discarded.

Defn. A nice context free grammar is one:
a) without useless symbols,
b) without ε-productions, except possibly for a null goal production, and
c) without unit productions.

Notation. cfg stands for context free grammar, and ncfg stands for nice context free grammar.

Corollary. For every cfg G, there exists an ncfg G’ such that L(G’) = L(G).