Normal forms and parsing

Presentation transcript:

CSC 3130: Automata theory and formal languages
Normal forms and parsing
Andrej Bogdanov, The Chinese University of Hong Kong, Fall 2008
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

Testing membership and parsing

Given a grammar

    S → 0S1 | 1S0S1 | T
    T → S | ε

How can we know if a string x is in its language? If so, can we reconstruct a parse tree for x?

First attempt

    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 00111

Maybe we can try all possible derivations:

    S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1
    S ⇒ 1S0S1 ⇒ 10S10S1, ...
    S ⇒ T ⇒ S, ε

When do we stop?

Problems

    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 00111

How do we know when to stop?

    S ⇒ 0S1 ⇒ 00S11, 01S0S11, 0T1
    S ⇒ 1S0S1 ⇒ 10S10S1, ...

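Problems

    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 01011

Idea: Stop the derivation when its length exceeds |x|.
This is not right because of ε-productions: an intermediate string can be longer than x and shrink later.

    derivation:  S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011
    length:      1    3       7         6        5

(Each step that erases an S stands for S ⇒ T ⇒ ε.)
We might want to eliminate ε-productions too.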

Problems

    S → 0S1 | 1S0S1 | T
    T → S | ε
    x = 00111

Loops among the variables (S → T → S) might make us go forever.
We might want to eliminate such loops.

Unit productions

A unit production is a production of the form

    A1 → A2

where A1 and A2 are both variables.

Example grammar:

    S → 0S1 | 1S0S1 | T
    T → S | R | ε
    R → 0SR

Unit productions: S → T, T → S, T → R

Removal of unit productions

If there is a cycle of unit productions

    A1 → A2 → ... → Ak → A1

delete it and replace everything with A1.

Example:

    S → 0S1 | 1S0S1 | T
    T → S | R | ε
    R → 0SR

becomes

    S → 0S1 | 1S0S1
    S → R | ε
    R → 0SR

T is replaced by S in the {S, T} cycle.

Removal of unit productions

For other unit productions, replace every chain

    A1 → A2 → ... → Ak → α

by productions A1 → α, ..., Ak → α.

Example:

    S → 0S1 | 1S0S1 | R | ε
    R → 0SR

becomes

    S → 0S1 | 1S0S1 | 0SR | ε
    R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR.
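The two removal steps above can be sketched in Python. This is a minimal sketch, not the procedure from the slides: instead of merging cycles first, it uses the common "unit closure" formulation, which handles cycles and chains in one pass. The grammar encoding (a dict from variable to a list of right-hand-side tuples, with the empty tuple standing for ε) and the name remove_unit_productions are assumptions made for illustration.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar without unit productions.

    grammar: dict mapping each variable to a list of right-hand sides,
    each a tuple of symbols; the empty tuple () stands for epsilon.
    (This encoding is an assumption of the sketch.)
    """
    variables = set(grammar)
    result = {}
    for a in grammar:
        # Collect every variable reachable from a through unit productions only.
        reachable, stack = {a}, [a]
        while stack:
            v = stack.pop()
            for rhs in grammar[v]:
                if len(rhs) == 1 and rhs[0] in variables and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    stack.append(rhs[0])
        # Give a every non-unit production of every reachable variable.
        result[a] = {
            rhs
            for v in reachable
            for rhs in grammar[v]
            if not (len(rhs) == 1 and rhs[0] in variables)
        }
    return result


unit_example = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}
no_units = remove_unit_productions(unit_example)
```

On the example grammar, S ends up with 0S1, 1S0S1, 0SR and ε, which matches the result on the slide. Unlike the slide, the sketch keeps T as a separate variable; it is simply no longer reachable from S.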

Removal of ε-productions

A variable N is nullable if there is a derivation

    N ⇒* ε

How to remove ε-productions (except from S):

    1. Find all nullable variables N1, ..., Nk
    2. For i = 1 to k
           For every production of the form A → αNiβ, add another production A → αβ
           If Ni → ε is a production, remove it
    3. If S is nullable, add the special production S → ε

Example

Find the nullable variables.

Grammar:

    S → ACD
    A → a
    B → ε
    C → ED | ε
    D → BC | b
    E → b

Nullable variables: B, C, D

Finding nullable variables

To find nullable variables, we work backwards.
First, mark all variables A s.t. A → ε as nullable.
Then, as long as there are productions of the form

    A → A1…Ak

where all of A1, …, Ak are marked as nullable, mark A as nullable.
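The marking procedure is a small fixed-point computation. Below is a minimal Python sketch of it, assuming a grammar encoded as a dict from variable to a list of right-hand-side tuples, with the empty tuple standing for ε. The encoding and the name nullable_variables are illustrative choices, not part of the slides.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string."""
    # First, mark every variable A that has a production A -> epsilon.
    nullable = {a for a, rhss in grammar.items() if () in rhss}
    # Then keep marking until a full pass adds nothing new.
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            if a in nullable:
                continue
            # A is nullable if some right-hand side consists only of
            # symbols already marked (terminals are never marked).
            if any(all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(a)
                changed = True
    return nullable


nullable_example = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
print(sorted(nullable_variables(nullable_example)))  # ['B', 'C', 'D']
```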

Eliminating ε-productions

Grammar:

    S → ACD
    A → a
    B → ε
    C → ED | ε
    D → BC | b
    E → b

Nullable variables: B, C, D

    For i = 1 to k
        For every production of the form A → αNiβ, add another production A → αβ
        If Ni → ε is a production, remove it

Productions added, in order:

    for B:  D → C                    (then remove B → ε)
    for C:  S → AD, D → B, D → ε     (then remove C → ε)
    for D:  S → AC, S → A, C → E     (then remove D → ε)
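A minimal Python sketch of ε-elimination follows. It is an equivalent formulation rather than the loop on the slide: for each production it emits every variant obtained by dropping some of the nullable variables on the right-hand side, and never emits an empty right-hand side. The dict-of-tuples grammar encoding and the names eliminate_epsilon and epsilon_example are assumptions of the sketch.

```python
from itertools import product


def nullable_variables(grammar):
    """Fixed-point computation of the variables that derive epsilon."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            if a not in nullable and any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(a)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start):
    """Remove epsilon-productions, keeping start -> epsilon if start is nullable."""
    nullable = nullable_variables(grammar)
    result = {}
    for a, rhss in grammar.items():
        new_rhss = set()
        for rhs in rhss:
            # A nullable variable may be kept or dropped; other symbols are kept.
            choices = [((sym,), ()) if sym in nullable else ((sym,),) for sym in rhs]
            for pick in product(*choices):
                candidate = tuple(s for part in pick for s in part)
                if candidate:  # skip the empty right-hand side
                    new_rhss.add(candidate)
        result[a] = new_rhss
    if start in nullable:
        result[start].add(())  # the special production S -> epsilon
    return result


epsilon_example = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
without_epsilon = eliminate_epsilon(epsilon_example, "S")
```

For the example grammar this gives S → ACD | AD | AC | A, C → ED | E and D → BC | B | C | b, and leaves B with no productions at all.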

Recap

After eliminating ε-productions and unit productions, we know that every derivation

    S ⇒* a1…ak    where a1, …, ak are terminals

doesn't shrink in length and doesn't go into cycles.

Exception: S → ε
We will not use this rule at all, except to check if ε ∈ L.

Note: ε-productions must be eliminated before unit productions.

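Example: testing membership

    S → 0S1 | 1S0S1 | T
    T → S | ε

After eliminating unit productions and ε-productions:

    S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

    x = 00111

    S ⇒ 01, 101
    S ⇒ 0S1   ⇒ 0011, 01011, 00S11 (only strings of length ≥ 6), strings of length ≥ 6
    S ⇒ 10S1  ⇒ 10011, strings of length ≥ 6
    S ⇒ 1S01  ⇒ 10101, strings of length ≥ 6
    S ⇒ 1S0S1 ⇒ only strings of length ≥ 6

The only strings of length 5 that can be derived are 01011, 10011 and 10101, so x = 00111 is not in the language.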

Algorithm 1 for testing membership

We can now use the following algorithm to check if a string x is in the language of G:

    Eliminate all ε-productions and unit productions
    If x = ε and S → ε, accept; else delete S → ε
    Let X := S
    While some new production P can be applied to X
        Apply P to X
        If X = x, accept
        If |X| > |x|, backtrack
    If no more productions can be applied to X, reject
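The search part of Algorithm 1 can be sketched as a depth-first search over sentential forms. This is a minimal Python sketch under stated assumptions: the grammar has already had its ε- and unit productions removed, it is encoded as a dict from variable to a list of right-hand-side tuples, and the case x = ε is handled separately. The name derives and the choice to always expand the leftmost variable are illustrative.

```python
def derives(grammar, start, x):
    """Return True if start derives the nonempty string x.

    Assumes grammar has no epsilon-productions and no unit productions,
    so sentential forms never shrink and the search terminates.
    """
    target = tuple(x)
    if not target:
        return False  # x = epsilon is checked separately via S -> epsilon
    seen = set()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        if form == target:
            return True
        if form in seen or len(form) > len(target):
            continue  # backtrack: already explored, or too long
        seen.add(form)
        # Apply every production to the leftmost variable of the form.
        for i, sym in enumerate(form):
            if sym in grammar:
                for rhs in grammar[sym]:
                    stack.append(form[:i] + tuple(rhs) + form[i + 1:])
                break
    return False


membership_example = {
    "S": [
        ("0", "1"),
        ("1", "0", "1"),
        ("0", "S", "1"),
        ("1", "0", "S", "1"),
        ("1", "S", "0", "1"),
        ("1", "S", "0", "S", "1"),
    ],
}
print(derives(membership_example, "S", "00111"))  # False
print(derives(membership_example, "S", "01011"))  # True
```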

Practical limitations of Algorithm 1

The previous algorithm can be very slow if x is long.

    G = CFG of the Java programming language
    x = code for a 200-line Java program
    The algorithm might take about 10^200 steps!

There is a faster algorithm, but it requires that we do some more transformations on the grammar.

Chomsky Normal Form

A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type

    A → BC    or    A → a

Conversion to Chomsky Normal Form is easy:

    A → BcDE

Replace terminals with new variables:

    A → BCDE
    C → c

Break up sequences with new variables:

    A → BX1
    X1 → CX2
    X2 → DE
    C → c
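The two conversion steps can be sketched in Python as follows. This is a minimal sketch assuming the grammar already has no ε- or unit productions and is encoded as a dict from variable to a list of right-hand-side tuples. The fresh variable names T_c and X1, X2, ... are hypothetical and are assumed not to clash with existing variables; the productions given to B, D and E in the example are placeholders so that they count as variables.

```python
def to_cnf(grammar):
    """Convert a grammar without epsilon- and unit productions to CNF."""
    variables = set(grammar)
    result = {a: [] for a in grammar}
    terminal_var = {}  # terminal -> fresh variable that produces it
    counter = 0

    def var_for(terminal):
        # Create (once) a variable T_c with the single production T_c -> c.
        if terminal not in terminal_var:
            name = f"T_{terminal}"
            terminal_var[terminal] = name
            result[name] = [(terminal,)]
        return terminal_var[terminal]

    for a, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) <= 1:
                result[a].append(tuple(rhs))  # A -> a is already in CNF
                continue
            # Step 1: replace terminals inside long right-hand sides.
            symbols = [s if s in variables else var_for(s) for s in rhs]
            # Step 2: break up sequences longer than two with new variables.
            head = a
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result[head].append((symbols[0], fresh))
                result[fresh] = []
                head, symbols = fresh, symbols[1:]
            result[head].append(tuple(symbols))
    return result


cnf_example = {
    "A": [("B", "c", "D", "E")],
    "B": [("b",)],  # placeholder productions for B, D, E
    "D": [("d",)],
    "E": [("e",)],
}
cnf = to_cnf(cnf_example)
```

For A → BcDE this yields A → B X1, X1 → T_c X2, X2 → D E and T_c → c, the same shape as on the slide with T_c in place of C.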

Exercise

Convert this CFG into Chomsky Normal Form:

    S → ε | ADDA
    A → a
    C → c
    D → bCb

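Algorithm 2 for testing membership

    S → AB | BC
    A → BA | a
    B → CC | b
    C → AB | a

    x = baaba

Idea: We generate each substring of x bottom up.

    length 5:  SAC
    length 4:  -    SAC
    length 3:  -    B    B
    length 2:  SA   B    SC   SA
    length 1:  B    AC   AC   B    AC
    x:         b    a    a    b    a

Each column corresponds to a starting position in x; each cell lists the variables that derive the substring of that length starting there.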

Parse tree reconstruction

    S → AB | BC
    A → BA | a
    B → CC | b
    C → AB | a

    x = baaba

    length 5:  SAC
    length 4:  -    SAC
    length 3:  -    B    B
    length 2:  SA   B    SC   SA
    length 1:  B    AC   AC   B    AC
    x:         b    a    a    b    a

Tracing back the derivations, we obtain the parse tree.
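Tracing back can be made concrete by storing, for each variable in each cell, one way it was obtained. The Python sketch below does this under stated assumptions: the grammar is in CNF and encoded as a dict from variable to a list of right-hand-side tuples, x is nonempty, and positions are 1-indexed as in the table. The names cyk_parse and tree_yield, and the tuple encoding of trees, are illustrative choices.

```python
def cyk_parse(grammar, start, x):
    """Return one parse tree for x as nested tuples, or None if x is rejected."""
    k = len(x)
    # back[(s, t)][A] records one way to derive x_s...x_t from A.
    back = {}
    for i in range(1, k + 1):
        back[(i, i)] = {a: x[i - 1] for a, rhss in grammar.items() if (x[i - 1],) in rhss}
    for b in range(2, k + 1):
        for s in range(1, k - b + 2):
            t = s + b - 1
            cell = {}
            for j in range(s, t):
                for a, rhss in grammar.items():
                    for rhs in rhss:
                        if len(rhs) == 2 and rhs[0] in back[(s, j)] and rhs[1] in back[(j + 1, t)]:
                            # Remember the first split and production found for A.
                            cell.setdefault(a, (j, rhs[0], rhs[1]))
            back[(s, t)] = cell

    def build(a, s, t):
        entry = back[(s, t)][a]
        if s == t:
            return (a, entry)  # leaf: (variable, terminal)
        j, left, right = entry
        return (a, build(left, s, j), build(right, j + 1, t))

    return build(start, 1, k) if start in back[(1, k)] else None


def tree_yield(tree):
    """Concatenate the terminals at the leaves of a tree from cyk_parse."""
    if len(tree) == 2:
        return tree[1]
    return tree_yield(tree[1]) + tree_yield(tree[2])


parse_example = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
tree = cyk_parse(parse_example, "S", "baaba")
```

Because only the first derivation found is stored per variable, the sketch returns a single tree even when the string has several.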

Cocke-Younger-Kasami algorithm

Input: Grammar G in CNF, string x = x1…xk

    For i = 1 to k
        If there is a production A → xi
            Put A in table cell ii
    For b = 2 to k
        For s = 1 to k - b + 1
            Set t = s + b - 1
            For j = s to t - 1
                If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
                    Put A in cell st

Table cells: 11, 22, ..., kk form the bottom row (under x1, x2, ..., xk), followed by 12, 23, ..., up to 1k at the top. Here b is the length of the substring, s and t are its first and last positions, and j is the position where it is split.

Cell ij remembers all possible derivations of substring xi…xj.
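A minimal Python sketch of the table-filling loop follows. It assumes the grammar is in CNF and encoded as a dict from variable to a list of right-hand-side tuples, and that x is nonempty. Cells are keyed by (s, t) with 1-indexed, inclusive positions; the name cyk_table is illustrative.

```python
def cyk_table(grammar, x):
    """Fill the CYK table: table[(s, t)] is the set of variables deriving x_s...x_t."""
    k = len(x)
    table = {}
    # Substrings of length 1: productions A -> x_i.
    for i in range(1, k + 1):
        table[(i, i)] = {a for a, rhss in grammar.items() if (x[i - 1],) in rhss}
    # Substrings of length b = 2, ..., k.
    for b in range(2, k + 1):
        for s in range(1, k - b + 2):
            t = s + b - 1
            cell = set()
            # Split x_s...x_t into x_s...x_j and x_{j+1}...x_t.
            for j in range(s, t):
                for a, rhss in grammar.items():
                    for rhs in rhss:
                        if len(rhs) == 2 and rhs[0] in table[(s, j)] and rhs[1] in table[(j + 1, t)]:
                            cell.add(a)
            table[(s, t)] = cell
    return table


cyk_example = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
table = cyk_table(cyk_example, "baaba")
print("S" in table[(1, 5)])  # True: baaba is in the language
```

The three nested loops over b, s and j give O(k^3) cell updates, each scanning the productions of the grammar once.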