CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Normal forms.

Presentation transcript:

CSC 3130: Automata theory and formal languages Andrej Bogdanov The Chinese University of Hong Kong Normal forms and parsing Fall 2009

Testing membership and parsing
Given a grammar, how can we know if a string x is in its language?
If so, can we obtain a parse tree for x?
Can we tell if the parse tree is unique?
S → 0S1 | 1S0S1 | T
T → S | ε

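First attempt
Maybe we can try all possible derivations:
S → 0S1 | 1S0S1 | T
T → S | ε
[Figure: tree of derivations from S. S expands to 0S1, 1S0S1 and T; 0S1 expands to 00S11, 01S0S11 and 0T1; 1S0S1 expands to 10S10S1, ...; T expands to S and ε.]
When do we stop?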

Problems
How do we know when to stop?
S → 0S1 | 1S0S1 | T
T → S | ε
[Figure: the same tree of derivations from S, with nodes 0S1, 1S0S1, 00S11, 01S0S11, 0T1, 10S10S1, ...]

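Problems
Idea: Stop a derivation when its length exceeds |x|
This is not right because of ε-productions: a sentential form can be longer than the string it eventually derives. For example, 01S0S11 has length 7, but erasing both occurrences of S leaves 01011, which has length 5.
We might want to eliminate ε-productions too
S → 0S1 | 1S0S1 | T
T → S | ε
S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ ...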

Problems
Loops among the variables (S → T → S) might make us go on forever
We want to eliminate such loops
S → 0S1 | 1S0S1 | T
T → S | ε
x = 00111

Removal of ε-productions
A variable N is nullable if there is a derivation N ⇒* ε
How to remove ε-productions (except from S):
– Find all nullable variables N_1, ..., N_k
– For every production of the form A → αN_iβ, add another production A → αβ
– If N_i → ε is a production, remove it
– If S is nullable, add the special production S → ε

Example
Step: find all nullable variables N_1, ..., N_k
Grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Nullable variables: B, C, D

Finding nullable variables
To find nullable variables, we work backwards
– First, mark all variables A such that A → ε is a production as nullable
– Then, as long as there is a production of the form A → A_1 … A_k where all of A_1, …, A_k are marked as nullable, mark A as nullable
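The marking procedure above can be sketched in Python as a fixed-point loop. This is a minimal illustration, not code from the lecture: the function name `nullable_variables` and the encoding of a production as a `(head, body)` pair, with `body` a tuple of symbols and `()` standing for ε, are assumptions made for the example.

```python
def nullable_variables(productions):
    """Return the set of nullable variables.

    productions: iterable of (head, body) pairs; body is a tuple of symbols,
    and the empty tuple () encodes an epsilon-production (assumed encoding).
    """
    productions = list(productions)
    # First, mark every variable A that has a production A -> epsilon.
    nullable = {head for head, body in productions if body == ()}
    # Then keep marking A whenever some A -> A1...Ak has all Ai marked.
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in nullable and body and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


# Grammar from the example slide: S -> ACD, A -> a, B -> eps,
# C -> ED | eps, D -> BC | b, E -> b
EXAMPLE = [
    ("S", ("A", "C", "D")),
    ("A", ("a",)),
    ("B", ()),
    ("C", ("E", "D")),
    ("C", ()),
    ("D", ("B", "C")),
    ("D", ("b",)),
    ("E", ("b",)),
]
print(sorted(nullable_variables(EXAMPLE)))
```

Terminals never enter the `nullable` set, so a body containing a terminal can never pass the `all(...)` test.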

Eliminating ε-productions
Grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Nullable variables: B, C, D
For every production of the form A → αN_iβ, add another production A → αβ:
– from D → BC: D → C, D → B, D → ε
– from S → ACD: S → AD, S → AC, S → A
– from C → ED: C → E
If N_i → ε is a production, remove it: B → ε, C → ε and D → ε are removed
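The elimination step can be sketched as follows: for every production, emit each variant obtained by keeping or dropping each nullable symbol, and discard empty bodies. This is a hedged sketch under assumptions: the names `eliminate_epsilon` and `_nullable` and the `(head, body)` tuple encoding (with `()` for ε) are illustrative choices, not part of the slides.

```python
from itertools import product


def _nullable(productions):
    # Fixed-point computation of the nullable variables.
    nullable = {h for h, b in productions if b == ()}
    changed = True
    while changed:
        changed = False
        for h, b in productions:
            if h not in nullable and b and all(s in nullable for s in b):
                nullable.add(h)
                changed = True
    return nullable


def eliminate_epsilon(productions, start):
    """Return an epsilon-free production set (except possibly start -> eps)."""
    productions = list(productions)
    nullable = _nullable(productions)
    result = set()
    for head, body in productions:
        # Each nullable symbol may be kept or dropped; other symbols are kept.
        options = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for choice in product(*options):
            new_body = tuple(sym for part in choice for sym in part)
            if new_body:  # never emit an epsilon-production here
                result.add((head, new_body))
    if start in nullable:
        result.add((start, ()))  # the special production S -> eps
    return result


EXAMPLE = [
    ("S", ("A", "C", "D")), ("A", ("a",)), ("B", ()),
    ("C", ("E", "D")), ("C", ()), ("D", ("B", "C")),
    ("D", ("b",)), ("E", ("b",)),
]
for head, body in sorted(eliminate_epsilon(EXAMPLE, "S")):
    print(head, "->", " ".join(body))
```

On the example grammar, B ends up with no productions at all, and the result still contains unit productions such as S → A and C → E, which are handled in the next step.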

Dealing with loops
A unit production is a production of the form A_1 → A_2 where A_1 and A_2 are both variables
Example
Grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
Unit productions: S → T, T → S, T → R

Removal of unit productions
If there is a cycle of unit productions A_1 → A_2 → ... → A_k → A_1, delete it and replace everything with A_1
Example
Before:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
After (T is replaced by S in the {S, T} cycle):
S → 0S1 | 1S0S1
S → R | ε
R → 0SR

Removal of unit productions
For other unit productions, replace every chain A_1 → A_2 → ... → A_k → α by productions A_1 → α, ..., A_k → α
Example: S → R → 0SR is replaced by S → 0SR, R → 0SR
Before:
S → 0S1 | 1S0S1 | R | ε
R → 0SR
After:
S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR
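One way to sketch unit-production removal in code is the standard closure method: compute, for each variable A, the set of variables reachable from A through unit productions, then give A every non-unit production of those variables. Note that this is a different formulation from the slides: it does not merge the variables of a cycle into one, it simply leaves them with identical productions. The function name and the `(head, body)` encoding are assumptions for illustration.

```python
def remove_unit_productions(productions, variables):
    """Return an equivalent production set without unit productions.

    productions: iterable of (head, body) pairs, body a tuple of symbols.
    variables: set of variable names (everything else is a terminal).
    """
    productions = list(productions)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    # reach[A] = variables reachable from A via unit productions (A included).
    reach = {A: {A} for A in variables}
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if is_unit(body):
                for A in variables:
                    if head in reach[A] and body[0] not in reach[A]:
                        reach[A].add(body[0])
                        changed = True

    # A inherits every non-unit production of every variable it can reach.
    result = set()
    for A in variables:
        for head, body in productions:
            if head in reach[A] and not is_unit(body):
                result.add((A, body))
    return result


# Grammar from the slide: S -> 0S1 | 1S0S1 | R | eps, R -> 0SR
GRAMMAR = [
    ("S", ("0", "S", "1")),
    ("S", ("1", "S", "0", "S", "1")),
    ("S", ("R",)),
    ("S", ()),
    ("R", ("0", "S", "R")),
]
for head, body in sorted(remove_unit_productions(GRAMMAR, {"S", "R"})):
    print(head, "->", "".join(body) or "eps")
```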

Recap
After eliminating ε-productions and unit productions, we know that every derivation S ⇒* a_1…a_k, where a_1, …, a_k are terminals, doesn't shrink in length and doesn't go into cycles
Exception: S → ε
– We will not use this rule at all, except to check if ε ∈ L
Note
– ε-productions must be eliminated before unit productions

Example: testing membership
Original grammar:
S → 0S1 | 1S0S1 | T
T → S | ε
After eliminating unit and ε-productions:
S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
[Figure: search tree of derivations from S for the input x; branches are cut off once they can only produce strings of length ≥ 6.]

Algorithm 1 for testing membership
How to check if a string x ≠ ε is in L(G):
Eliminate all ε-productions and unit productions
Let X := S
While some new rule R can be applied to X
    Apply R to X
    If X = x, you have found a derivation for x
    If |X| > |x|, backtrack
If no more rules can be applied to X, x is not in L
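Algorithm 1 can be sketched as a depth-first search over sentential forms that abandons any form longer than x. The sketch below makes several assumptions that are not in the slides: symbols are single characters, the grammar is a dict from variable to a list of right-hand-side strings, only the leftmost variable is expanded, and a `seen` set avoids revisiting forms. The grammar must already be free of ε-productions and unit productions.

```python
def derives(grammar, start, target):
    """Exhaustive search for a derivation of target (target must be non-empty).

    grammar: dict mapping a one-character variable to a list of right-hand
    sides given as strings (assumed encoding, no epsilon/unit productions).
    """
    seen = {start}
    stack = [start]
    while stack:
        form = stack.pop()
        if form == target:
            return True
        # Position of the leftmost variable, if any.
        idx = next((i for i, c in enumerate(form) if c in grammar), None)
        if idx is None:
            continue  # all terminals but not equal to target: dead end
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Backtrack as soon as the form is longer than the target.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                stack.append(new_form)
    return False


# S -> 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1 (the grammar after elimination,
# without the special rule S -> eps)
GRAMMAR = {"S": ["01", "101", "0S1", "10S1", "1S01", "1S0S1"]}
print(derives(GRAMMAR, "S", "01011"))
print(derives(GRAMMAR, "S", "00111"))
```

The search terminates because only finitely many sentential forms have length at most |x|, but the number of such forms can grow exponentially with |x|.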

Practical limitations of Algorithm 1
This method can be very slow if x is long
Example: if G is a CFG for the Java programming language and x is the code of a 200-line Java program, the algorithm might take an infeasibly large number of steps
There is a faster algorithm, but it requires that we do some more transformations on the grammar

Chomsky Normal Form
A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type A → BC or A → a
Conversion to Chomsky Normal Form is easy:
Start with: A → BcDE
Replace terminals with new variables: A → BCDE, C → c
Break up sequences with new variables: A → BX_1, X_1 → CX_2, X_2 → DE, C → c
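The two conversion steps can be sketched as below. This is an illustrative sketch under assumptions: ε-productions and unit productions have already been removed, productions are `(head, body)` pairs, and the fresh variable names (`T_c` for a terminal c, and `X1`, `X2`, ... for the new sequence variables) are hypothetical names that are assumed not to clash with existing symbols. The slide reuses the name C for the variable replacing c; the sketch uses `T_c` instead to avoid that kind of clash.

```python
def to_cnf_shape(productions, variables):
    """Rewrite productions so every body is two variables or one terminal.

    productions: list of (head, body) pairs, body a non-empty tuple.
    variables: set of existing variable names.
    """
    result = []
    term_var = {}  # terminal -> fresh variable standing for it
    counter = 0    # numbering for the fresh X variables

    def var_for(terminal):
        if terminal not in term_var:
            term_var[terminal] = f"T_{terminal}"
        return term_var[terminal]

    for head, body in productions:
        # Step 1: in bodies of length >= 2, replace terminals by variables.
        if len(body) >= 2:
            body = tuple(s if s in variables else var_for(s) for s in body)
        # Step 2: break long bodies into a chain of binary productions.
        while len(body) > 2:
            counter += 1
            fresh = f"X{counter}"
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, body))

    # Productions for the fresh terminal variables, e.g. T_c -> c.
    for terminal, var in term_var.items():
        result.append((var, (terminal,)))
    return result


# The slide's example: A -> BcDE
for head, body in to_cnf_shape([("A", ("B", "c", "D", "E"))], {"A", "B", "D", "E"}):
    print(head, "->", " ".join(body))
```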

Exercise
Convert this CFG into Chomsky Normal Form:
S → ε | ADDA
A → a
C → c
D → bCb

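Algorithm 2 for testing membership
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Idea: We generate each substring of x bottom up
Table (each cell lists the variables that derive the substring starting at that column with the given length; – marks an empty cell):
length 5:  SAC
length 4:  –    SAC
length 3:  –    B    B
length 2:  SA   B    SC   SA
length 1:  B    AC   AC   B    AC
x:         b    a    a    b    a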

Parse tree reconstruction
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
[Figure: the filled table for x = baaba, with S in the top cell and arrows pointing from each entry to the two entries it was built from.]
Tracing back the derivations, we obtain the parse tree
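Tracing back can be sketched by storing, for each table entry, one way it was produced (a back-pointer) and then rebuilding a tree recursively from the top cell. Everything below is an illustrative assumption rather than the lecture's code: the names `parse` and `leaves`, the 0-based indices, the nested-tuple tree shape, and the choice to keep only the first back-pointer found, so that only one of possibly several parse trees is returned.

```python
def parse(productions, start, x):
    """Return one parse tree for x as nested tuples, or None if x is not derived.

    productions: list of (head, body) pairs in Chomsky Normal Form.
    """
    n = len(x)
    back = {}  # (s, t, A) -> terminal, or (j, B, C) meaning A -> BC split after j
    for i, ch in enumerate(x):
        for head, body in productions:
            if body == (ch,):
                back[i, i, head] = ch
    for length in range(2, n + 1):
        for s in range(n - length + 1):
            t = s + length - 1
            for j in range(s, t):
                for head, body in productions:
                    if (len(body) == 2 and (s, j, body[0]) in back
                            and (j + 1, t, body[1]) in back):
                        # Keep only the first derivation found for this entry.
                        back.setdefault((s, t, head), (j, body[0], body[1]))

    def build(s, t, var):
        entry = back[s, t, var]
        if isinstance(entry, str):
            return (var, entry)
        j, left, right = entry
        return (var, build(s, j, left), build(j + 1, t, right))

    if n == 0 or (0, n - 1, start) not in back:
        return None
    return build(0, n - 1, start)


def leaves(tree):
    """Concatenate the terminals at the leaves of a tree built by parse."""
    if len(tree) == 2:
        return tree[1]
    return leaves(tree[1]) + leaves(tree[2])


GRAMMAR = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]
tree = parse(GRAMMAR, "S", "baaba")
print(tree)
```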

Cocke-Younger-Kasami algorithm
Input: Grammar G in CNF, string x = x_1…x_k
Cell ij remembers all variables that derive the substring x_i…x_j
For cells in the last row:
    If there is a production A → x_i, put A in table cell ii
For cells st in other rows:
    If there is a production A → BC where B is in cell sj and C is in cell (j+1)t for some s ≤ j < t, put A in cell st
[Figure: triangular table with cells 11, 22, ..., kk in the last row and cell 1k at the top; cell st is filled from pairs of cells sj and (j+1)t.]
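The table-filling procedure can be sketched in Python as follows. This is a minimal sketch, not the lecture's code: the names `cyk_table` and `accepts`, the 0-based indices (cell `(s, t)` covers `x[s..t]` inclusive), and the `(head, body)` encoding of productions are assumptions for illustration. The empty string is rejected here; whether ε is in the language is decided separately by checking for the rule S → ε.

```python
def cyk_table(productions, x):
    """Fill the CYK table for a grammar in Chomsky Normal Form.

    Returns a dict mapping (s, t) to the set of variables deriving x[s..t].
    """
    n = len(x)
    table = {}
    # Last row: substrings of length 1, using productions A -> x_i.
    for i in range(n):
        table[i, i] = {head for head, body in productions if body == (x[i],)}
    # Other rows: longer substrings, built from two shorter pieces.
    for length in range(2, n + 1):
        for s in range(n - length + 1):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # split into x[s..j] and x[j+1..t]
                for head, body in productions:
                    if (len(body) == 2 and body[0] in table[s, j]
                            and body[1] in table[j + 1, t]):
                        cell.add(head)
            table[s, t] = cell
    return table


def accepts(productions, start, x):
    """True if the non-empty string x is derived from start."""
    return bool(x) and start in cyk_table(productions, x)[0, len(x) - 1]


GRAMMAR = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]
print(accepts(GRAMMAR, "S", "baaba"))
```

With three nested loops over positions and one pass over the productions per split point, the running time is proportional to |x|^3 times the number of productions.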