Context-free grammars are a subset of context-sensitive grammars

Slides:

Advertisements

Similar presentations

How to convert a left linear grammar to a right linear grammar

Advertisements

C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.

Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.

SIMPLIFYING GRAMMARS Definition: A useless symbol of a context-free

How to find and remove unproductive rules in a grammar Roger L. Costello May 1, 2014 New! How to find and remove unreachable rules in a grammar.

FORMAL LANGUAGES, AUTOMATA, AND COMPUTABILITY

Chapter Chapter Summary Languages and Grammars Finite-State Machines with Output Finite-State Machines with No Output Language Recognition Turing.

Transforming Context-Free Grammars to Chomsky Normal Form 1 Roger L. Costello April 12, 2014.

Chapter 4 Normal Forms for CFGs Chomsky Normal Form n Defn A CFG G = (V, , P, S) is in chomsky normal form if each rule in G has one of.

CS5371 Theory of Computation

Foundations of (Theoretical) Computer Science Chapter 2 Lecture Notes (Section 2.1: Context-Free Grammars) David Martin With some.

Context-Free Grammars Lecture 7

Prof. Bodik CS 164 Lecture 61 Building a Parser II CS164 3:30-5:00 TT 10 Evans.

79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.

Normal forms for Context-Free Grammars

Chapter 3: Formal Translation Models

How to Convert a Context-Free Grammar to Greibach Normal Form

MA/CSSE 474 Theory of Computation

Context-Free Grammars Chapter 3. 2 Context-Free Grammars and Languages n Defn A context-free grammar is a quadruple (V, , P, S), where  V is.

Finite State Machines Data Structures and Algorithms for Information Processing 1.

Language Translation Principles Part 1: Language Specification.

Chapter 12: Context-Free Languages and Pushdown Automata

1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.

Lecture 21: Languages and Grammars. Natural Language vs. Formal Language.

Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.

Formal Grammars Denning, Sections 3.3 to 3.6. Formal Grammar, Defined A formal grammar G is a four-tuple G = (N,T,P,  ), where N is a finite nonempty.

::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.

Pushdown Automata (PDA) Intro

Context-Free Grammars Normal Forms Chapter 11. Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid.

The Pumping Lemma for Context Free Grammars. Chomsky Normal Form Chomsky Normal Form (CNF) is a simple and useful form of a CFG Every rule of a CNF grammar.

Languages, Grammars, and Regular Expressions Chuck Cusack Based partly on Chapter 11 of “Discrete Mathematics and its Applications,” 5 th edition, by Kenneth.

Context Free Grammars CIS 361. Introduction Finite Automata accept all regular languages and only regular languages Many simple languages are non regular:

PART I: overview material

Languages & Grammars. Grammars  A set of rules which govern the structure of a language Fritz Fritz The dog The dog ate ate left left.

So far... A language is a set of strings over an alphabet. We have defined languages by: (i) regular expressions (ii) finite state automata Both (i) and.

Discrete Structure Li Tak Sing( 李德成 ) Lectures

Lecture # 5 Pumping Lemma & Grammar

CMSC 330: Organization of Programming Languages Context-Free Grammars.

CS 321 Programming Languages and Compilers Lectures 16 & 17 Introduction to Formal Languages Regular Languages Lexical Analysis.

Introduction to Parsing

Section 12.4 Context-Free Language Topics

Lecture 11 Theory of AUTOMATA

1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.

1 Simplification of Context-Free Grammars Some useful substitution rules. Removing useless productions. Removing -productions. Removing unit-productions.

Closure Properties Lemma: Let A 1 and A 2 be two CF languages, then the union A 1  A 2 is context free as well. Proof: Assume that the two grammars are.

CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.

1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.

Re-enter Chomsky More about grammars. 2 Parse trees S  A B A  aA | a B  bB | b Consider L = { a m b n | m, n > 0 } (one/more a ’s followed by one/more.

CS 203: Introduction to Formal Languages and Automata

1 Chapter 6 Simplification of CFGs and Normal Forms.

Formal Languages and Grammars

Discrete Structures ICS252 Chapter 5 Lecture 2. Languages and Grammars prepared By sabiha begum.

LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.

CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.

1 A well-parenthesized string is a string with the same number of (‘s as )’s which has the property that every prefix of the string has at least as many.

Lecture # 10 Grammar Problems. Problems with grammar Ambiguity Left Recursion Left Factoring Removal of Useless Symbols These can create problems for.

Compiler Construction Lecture Five: Parsing - Part Two CSC 2103: Compiler Construction Lecture Five: Parsing - Part Two Joyce Nakatumba-Nabende 1.

Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.

Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.

Lecture 17: Theory of Automata:2014 Context Free Grammars.

20 G M aaba acba aaba.. What is it about? Models of Language Generation Models of Language Recognition.

Modeling Arithmetic, Computation, and Languages Mathematical Structures for Computer Science Chapter 8 Copyright © 2006 W.H. Freeman & Co.MSCS SlidesAlgebraic.

Describing Syntax and Semantics Chapter 3: Describing Syntax and Semantics Lectures # 6.

Context-Free Grammars: an overview

Chomsky Normal Form CYK Algorithm

Complexity and Computability Theory I

Context-Free Languages

Intro to Data Structures

CHAPTER 2 Context-Free Languages

Midterm (Models of Computation, Fall, 2000)

Presentation transcript:

Context-free grammars are a subset of context-sensitive grammars Roger L. Costello February 16, 2014

Objective: Show that Type 2 is a subset of Type 1

Grammars: a brief refresher A grammar is a concise way to specify a language. A language is a set of strings. Example: This is an (infinite) language: {a, aa, aaa, …} A grammar consists of a series of (rewrite) rules. Each rule has a left-hand side and a right-hand side. The two sides are separate by an arrow (→).

Sample Grammar The below grammar consists of five rules. The grammar generates the language: {ab, aab, abb, aaab, aaabb, aaabbb, …} S → AB A → aA A → a B → bB B → b

Generate a string from the grammar S → AB A → aA A → a B → bB B → b Here is a sequence of rules to generate: aab S → AB → aAB → aaB → aab

Rules with “alternates” Grammar S → AB A → aA A → a B → bB B → b Notice in the above grammar there are two rules for A. Ditto for B. The two rules may be combined: the right-hand side will consist of a series of alternatives, separated by a vertical bar ( | ): Grammar Equivalent Grammar S → AB A → aA A → a B → bB B → b S → AB A → aA | a B → bB | b combine A’s combine B’s

“Zero” or more a’s and b’s Grammar S → AB A → aA | a B → bB | b The above grammar requires every string in the language contain at least one a and at least one b. What grammar would generate the language: zero or more a’s followed by zero of more b’s?

Generate an empty string Question: What grammar would generate the language: zero or more a’s followed by zero of more b’s? Answer: Use rules that generate an empty string (a string of length zero). We denote an empty string by: ε This grammar generates the desired language: Grammar S → AB A → aA | ε B → bB | ε

Generate both empty and non-empty This rule for A generates both empty and non-empty: A → aA | ε empty non-empty

How to read a rule A → aA | ε Read as: A may be replaced by aA or by an empty string. The arrow (→) is read as: may be replaced by.

Terminal versus non-terminal symbols A → aA | ε Non-terminal symbols; these are symbols that may be replaced (further expanded). Terminal symbols; these are symbols that may not be replaced.

Notation Non-terminal symbols: denoted by uppercase letters. Example: Q1, Q2, A, P, S denote non-terminal symbols Terminal symbols: denoted by lowercase letters. Example: a, b, c denote terminal symbols

Context-sensitive grammars Every rule has this form: context context Q1AQ2 → Q1PQ2 A is replaced by P

Context-sensitive grammars Every rule has this form: Q1AQ2 → Q1PQ2 That is, some symbol A is rewritten to some symbol P while the surrounding (context) symbols Q1 and Q2 remain unchanged. Note: P can be multiple symbols.

Context-sensitive grammars Every rule has this form: Q1AQ2 → Q1PQ2 That is, some symbol A is rewritten to some symbol P while the surrounding (context) symbols Q1 and Q2 remain unchanged. Note: P can be multiple symbols. A must be a non-terminal. Q1, Q2, and P are either non-terminals or terminals.

Context-sensitive grammars Every rule has this form: Q1AQ2 → Q1PQ2 That is, some symbol A is rewritten to some symbol P while the surrounding (context) symbols Q1 and Q2 remain unchanged. Note: P can be multiple symbols. A must be a non-terminal. Q1, Q2, and P are either non-terminals or terminals. P must not be empty (ε).

Context-sensitive grammars Every rule has this form: Q1AQ2 → Q1PQ2 That is, some symbol A is rewritten to some symbol P while the surrounding (context) symbols Q1 and Q2 remain unchanged. Note: P can be multiple symbols. A must be a non-terminal. Q1, Q2, and P are either non-terminals or terminals. P must not be empty (ε). None of the rules lead to empty except possibly for a rule S → ε, in which case S does not occur on the right-hand side of any rules.

Sample context-sensitive rule empty context S → abc S is replaced by abc

Sample context-sensitive rule empty context S → aSQ S is replaced by aSQ

Sample context-sensitive rule bQc → bbcc Q is replaced by bc

Sample context-sensitive rule empty right context cQ → cc Q is replaced by c

Sample context-sensitive rule empty left context cc → Qc c is replaced by Q

Swap c and Q cQ → cc cc → Qc Collectively, the two rules swap c and Q.

Sample context-sensitive grammar The language generated by the below context-sensitive grammar is: anbncn Grammar for anbncn → abc | aSQ bbcc cc Qc S bQc cQ 1. 2. 3. 4.

Generate a string from the grammar Grammar for anbncn Derivation of a3b3c3 → abc | aSQ bbcc cc Qc S bQc cQ 1. 2. 3. 4. S (start) aSQ (rule 1) aaSQQ (rule 1) aaabcQQ (rule 1) aaabccQ (rule 3) aaabQcQ (rule 4) aaabbccQ (rule 2) aaabbccc (rule 3) aaabbQcc (rule 4) aaabbbccc (rule 2) generated string

Next on the agenda We have seen what context-sensitive grammars look like, and the restrictions imposed on them (e.g., the P in the right-hand side can’t be empty). Now let’s turn our attention to context-free grammars.

Context-free grammars Every rule has this form: empty context A → P A is replaced by P

Context-free grammars Every rule has this form: A → P That is, some symbol A is rewritten to some symbol P. A never has context – it is context-free! P can be multiple symbols

Context-free grammars Every rule has this form: A → P That is, some symbol A is rewritten to some symbol P. A never has context – it is context-free! P can be multiple symbols. A must be a non-terminal. P is any sequence of non-terminals and terminals.

Context-free grammars Every rule has this form: A → P That is, some symbol A is rewritten to some symbol P. A never has context – it is context-free! P can be multiple symbols. A must be a non-terminal. P is any sequence of non-terminals and terminals. P may be empty (ε).

Next on the agenda Now we have seen context-sensitive grammars and context-free grammars. Now it’s time to compare them.

Compare the two types of grammars Context-sensitive Context-free empty context context context Q1AQ2 → Q1PQ2 A → P A is replaced by P A is replaced by P A context-free rule is a context-sensitive rule without context, so context-free is a subset of context-sensitive; right?

Key Point The P in a context-sensitive rule cannot be empty whereas the P in a context-free rule can be empty. So it is not an apples-to-apples comparison and we cannot claim that context-free is a subset of context-sensitive.

Context-free has an additional value Context-sensitive Q1 A Q2 Q1 P Q2 Context-free A P ε

What is needed? What do we need to make the claim that a context-free rule is a special case (subset) of a context-sensitive rule?

Context-free without an empty P If we can show that, for every context-free grammar there is an equivalent grammar that doesn’t have an empty P, then we will have an apples-to-apples comparison.

Need to show this A P A P’ Context-free rule with ε ε transform to an equivalent grammar Equivalent context-free rule without ε A P’

2-step strategy Use a systematic procedure (i.e., algorithm) to find all the non-terminal symbols that generate empty (ε). Modify the grammar rules: eliminate the non-terminals found in step 1 and then modify the rules that use the eliminated non-terminals.

A generates empty A → ε

A generates empty and non-empty A → ε | a

B generates empty A → ε B → A

Procedure Find the non-terminals that directly generate empty, i.e., those of this form: X → ε Then find the non-terminals which have on their right-hand side exclusively symbols found in step 1, e.g., Y → X Then find the non-terminals which have on their right-hand side exclusively symbols found in step 1 or step 2 Repeat until no new non-terminals are found.

Closure algorithm The procedure described on the previous slide is called a closure algorithm. We will find all the non-terminal symbols that produce empty (ε) by using a closure algorithm.

2 steps to identify the non-terminals Our closure algorithm identifies non-terminals that generate empty using these two steps: Initialization: If a rule has ε on its right-hand side, then the rule’s left-hand side non-terminal generates empty. Inference rule: If all the right-hand side members of a rule produce empty, then the rule’s left-hand side non-terminal produces empty.

Which non-terminals generate empty? Let’s use the closure algorithm on the below grammar. The closure algorithm finds all the non-terminals that generate empty. → A B C ε a A A D d S B D Goal: Find the non-terminals that generate empty (ε)

Round 1 (Initialization) Rule Produces empty? S → A B S → C A → ε A produces empty A → a B → A C → A D D → d

Round 2 (inference) Rule Produces empty? S → A B S → C A → ε A produces empty A → a B → A B produces empty (because A produces empty) C → A D D → d

Round 3 (inference) Rule Produces empty? S → A B S produces empty (because A and B produce empty) S → C A → ε A produces empty A → a B → A B produces empty (because A produces empty) C → A D D → d

Round 4 Round 4 adds no additional members to the set. Rule Produces empty? S → A B S produces empty (because A and B produce empty) S → C A → ε A produces empty A → a B → A B produces empty (because A produces empty) C → A D D → d

Non-terminals that generate empty → A B C ε a A A D d S B D Non-terminals that generate empty: {A, B, S}

Make the grammar context-sensitive-compliant Our goal is to modify the grammar so that it is a context-sensitive grammar. It will be both context-sensitive and context-free Original Modified → A B C ε a A A D d S B D Grammar that conforms to the rules of context-sensitive grammars.

Remove rules with ε on the right-hand side Recall that context-sensitive grammars do not allow empty rules, except the start symbol may be empty. So we need to remove the empty rules: → A B C ε a A A D d S B D Remove this rule

Remove references to empty non-terminals Suppose a grammar has this empty rule: X → ε Remove it, per the previous slide. The following rule has X on its right-hand side: Y → X Z So we must remove the X: Y → Z

Non-terminal could have empty and non-empty rules Suppose X has an empty and non-empty rule: X → ε | x The X in the following rule could generate either empty or x: Y → X Z Recall that we will remove X → ε so there must be one rule for Y that omits X and one that does not: Y → Z | X Z X is non-empty X is empty

Recap Consider this rule: Q → V N Suppose the closure algorithm determines that V is in the set of non-terminals that generate empty. If V is empty then Q generates N, so we need this rule: Q → N Suppose V also has a non-empty rule. If V is non-empty then Q generates V N, so we need this rule: Q → V N Here is Q’s modified rule: Q → N | V N

Resume modifying our grammar Now that we understand how to modify the rules, let’s resume making context-sensitive-compliant our sample grammar.

Modify the rule for C → A B C ε a A A D d S B D On the right-hand side of this rule is A. A generates empty so we erase A. However, A also generates a so C could generate a D. Here is the modified rule: C → D | A D

Modify the rule for S → A B C ε a A A D d S B D Both symbols on the right-hand side of this rule generate empty. A generates empty and it also generates a. B generates A. So this rule is capable of generating ε, a and aa. Here is the modified rule: S → A | A B

Here is the modified grammar Original Modified → A B C ε a A A D d S B D → A | A B C a A D | A D d S B D

No empty rules Modified → A | A B C a A D | A D d S B D No empty rules, as required by context-sensitive grammars – Yea!

Lost the ability to generate empty The modified grammar does not generate empty → A | A B C a A D | A D d S B D But the original grammar does generate empty → A B C ε a A A D d S B D We need to add this rule: S → ε

Here’s the final, modified grammar → ε A | A B C a A D | A D d S B D

Equivalent grammars Original Modified → ε A | A B C a A D | A D d S B

It’s context-sensitive-compliant Modified → ε A | A B C a A D | A D d S B D There are no empty rules except for the start symbol (S). Therefore, it is a context-sensitive grammar. It’s also context-free-compliant

How we modified the grammar to be context-sensitive-compliant Using a closure algorithm, we found all the non-terminals that generate empty. We modified the rules so that none of them generated empty: If a rule’s right-hand side is ε, delete it. If a rule’s right-hand side contains a non-terminal that is in the set produced by the closure algorithm, create a rule without the non-terminal. If the non-terminal also has a non-empty rule, create a rule with the non-terminal. If the original grammar generates empty, add this rule: S → ε

Context-free is a subset of context-sensitive We now have a procedure for converting every context-free grammar into an equivalent context-free grammar that complies with the context-sensitive rules. Therefore, context-free grammars are a restricted form of context-sensitive grammars. Therefore, context-free grammars are a subset of context-sensitive grammars.

Type 2 is a subset of Type 1

Type 2 is a “proper” subset of Type 1 Not only is Type 2 a subset of Type 1, it is a proper subset. This means that there are grammars in Type 1 that are not in Type 2: anbncn

Language generated by a grammar A grammar generates a language; that is, a set of strings. For example, this simple grammar: S → ε | aS generates this set of strings: {ε, a, aa, aaa, …} That is the language generated by the grammar. Notice that ε is an element of the language (recall that ε is a string of length zero).

ε-detecting procedure It is useful to know if ε is an element of the language generated by a grammar. We need a procedure that can take any arbitrary grammar and determine if ε is an element of the language generated by the grammar: grammar procedure ε is (not) an element of the language generated by the grammar

Implementing the ε-detecting procedure grammar This can be implemented using the closure algorithm. procedure ε is (not) an element of the language generated by the grammar

Here’s the implementation grammar closure algorithm set of non-terminals that generate empty Is the start symbol in the set? ε is (not) an element of the language generated by the grammar

Recap of the implementation Recall the closure algorithm: it produces the set of non-terminals that generate empty. For our sample grammar it produced: {A, B, S} The start symbol (S) generates ε. Therefore, ε is an element of the language generated by the grammar.

Decision procedure We now have a procedure for deciding, for any arbitrary context-free grammar, if the empty string is a member of the language generated by the grammar. This procedure is called a decision procedure.

Big accomplishments In these slides we have accomplished much. shown that Type 2 (context-free) grammars are a subset of Type 1 (context-sensitive) grammars created a decision procedure that is capable of deciding, for any arbitrary grammar, if ε is an element of the language generated by the grammar.

Formalize the closure algorithm The next slide describes the closure algorithm very succinctly. I find great beauty and elegance in it. There’s no fluff in it; I call it “pure knowledge”.

Closure algorithm (formal) U1 is the set of all the empty non-terminals: U1 = {X | X → ε} U2 is the set of all the empty non-terminals (that is, U1) plus all the non-terminals that have a right-hand side containing exclusively non-terminals from U1: U2 = U1 ∪ {X | X → P for some P containing exclusively non-terminals from U1} Ui+1 is the set of all the non-terminals from Ui plus all the non-terminals that have a right-hand side containing exclusively non-terminals from Ui: Ui+1 = Ui ∪ {X | X → P for some P containing exclusively non-terminals from Ui} There is some index k for which Uk+1 = Uk. That is, additional rounds do not result in finding more non-terminals that produce empty. The set of non-terminals that generate empty is Uk.

Comments, questions I hope you found this mini-tutorial helpful. If you found any typos or errors in the material, please notify me. If you found any parts confusing, please notify me. Email me at: roger.costello@gmail.com Thanks!