CYK (Cocke-Younger-Kasami) Parsing Algorithm

Amirkabir University of Technology, Department of Computer Engineering
Seyed Mohammad Hossein Moattar
Natural Language Processing

Parsing Algorithms
- CFGs are the basis for describing the (syntactic) structure of natural-language (NL) sentences.
- Thus, parsing algorithms are the core of NL analysis systems.
- Recognition vs. parsing:
  - Recognition: deciding membership in the language.
  - Parsing: recognition plus producing a parse tree for the input.
- Is parsing more "difficult" than recognition? (time complexity)
- Ambiguity: an input may have exponentially many parses.

Parsing Algorithms
- Parsing general CFLs vs. limited forms.
- Efficiency:
  - Deterministic (LR) languages can be parsed in linear time.
  - A number of parsing algorithms for general CFLs require O(n^3) time.
  - The asymptotically fastest known recognition algorithm for general CFLs (Valiant's reduction to Boolean matrix multiplication) runs in about O(n^2.37), but it is not practical.
- Utility: why parse general grammars and not just CNF?
  - The grammar is intended to reflect the actual structure of the language.
  - Conversion to CNF distorts that structure: the resulting parse trees are over the converted grammar, not the original one.

CYK (Cocke-Younger-Kasami)
- One of the earliest recognition and parsing algorithms.
- The standard version of CYK works only with context-free grammars in Chomsky Normal Form (CNF). Every context-free language L has a CNF grammar for L - {ε}, so after conversion CYK can decide membership for any CFL.
- It is also possible to extend CYK to handle some grammars that are not in CNF, but the extended version is harder to understand.
- Based on a "dynamic programming" approach:
  - Build solutions compositionally from sub-solutions.
  - Store sub-solutions and reuse them whenever necessary.
- Uses the grammar directly (no PDA is used).
- Recognition version: decide whether S ⇒* w.
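Because standard CYK needs a CNF grammar, it can help to check that requirement before running it. The following is a minimal sketch, assuming a hypothetical encoding in which a grammar is a list of `(head, body)` pairs, nonterminals are uppercase letters and terminals lowercase letters. It follows the common convention that allows S → ε only when S never appears on a right-hand side; some textbooks forbid ε-rules entirely.

```python
# Hypothetical encoding (an assumption, not from the slides):
# each rule is (head, body) with body a tuple of single-letter symbols;
# uppercase = nonterminal, lowercase = terminal, () = epsilon.

def is_cnf(rules, start):
    """Return True if every rule has the form A -> BC, A -> a, or start -> ε."""
    rhs_symbols = {s for _, body in rules for s in body}
    for head, body in rules:
        if len(body) == 2 and all(s.isupper() for s in body):
            continue  # A -> BC
        if len(body) == 1 and body[0].islower():
            continue  # A -> a
        if len(body) == 0 and head == start and start not in rhs_symbols:
            continue  # S -> ε, allowed only if S is never on a right side
        return False
    return True

# The exercise grammar: S -> ε | AB | XB, T -> AB | XB, X -> AT, A -> a, B -> b
G = [("S", ()), ("S", ("A", "B")), ("S", ("X", "B")),
     ("T", ("A", "B")), ("T", ("X", "B")), ("X", ("A", "T")),
     ("A", ("a",)), ("B", ("b",))]
```

With this encoding, `is_cnf(G, "S")` accepts the grammar, while adding a rule such as `("S", ("a", "B"))` makes it fail.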

CYK Algorithm
The CYK algorithm for the membership problem is as follows. Let the input string be a sequence of n letters a1 ... an. Let the grammar contain r nonterminal symbols R1 ... Rr, and let R1 be the start symbol. Let P[n,n,r] be an array of booleans, and initialize all elements of P to false.

    for each i = 1 to n
        for each terminal production Rj -> ai
            set P[i,1,j] = true
    for each i = 2 to n                  -- length of span
        for each j = 1 to n-i+1          -- start of span
            for each k = 1 to i-1        -- partition of span
                for each production RA -> RB RC
                    if P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true
    if P[1,n,1] is true
        then the string is a member of the language
        else the string is not a member of the language
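The boolean-array version above can be sketched in Python as follows. The function name `cyk_member` and the rule encoding (lists of `(head, terminal)` and `(A, B, C)` tuples) are assumptions made for illustration. Indices are kept 1-based so that `P[i][l][j]` corresponds directly to `P[i,l,j]` in the pseudocode.

```python
def cyk_member(word, nonterminals, terminal_rules, binary_rules):
    """Boolean-array CYK. nonterminals[0] plays the role of the start symbol R1.
    P[i][l][j] is True when nonterminal j derives the substring of length l
    starting at (1-based) position i."""
    n, r = len(word), len(nonterminals)
    idx = {name: j for j, name in enumerate(nonterminals)}
    # Index 0 of the first two dimensions is unused, so indices match the slide.
    P = [[[False] * r for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):
        for head, terminal in terminal_rules:  # productions R_j -> a_i
            if terminal == word[i - 1]:
                P[i][1][idx[head]] = True
    for length in range(2, n + 1):               # length of span
        for start in range(1, n - length + 2):   # start of span
            for k in range(1, length):           # partition of span
                for a, b, c in binary_rules:     # production A -> B C
                    if P[start][k][idx[b]] and P[start + k][length - k][idx[c]]:
                        P[start][length][idx[a]] = True
    return n > 0 and P[1][n][0]
```

For example, with `nonterminals = ["S", "T", "X", "A", "B"]`, `terminal_rules = [("A", "a"), ("B", "b")]` and the binary rules of the exercise grammar, `cyk_member("aaabbb", ...)` returns `True`. The empty string is rejected here; a rule S → ε would have to be checked separately.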

CYK Pseudocode
On input x = x1x2 … xn, where table[i][j] holds the variables that generate the substring xi+1 … xj:

    for (i = 1 to n)                               // first diagonal: length-1 substrings
        for (each variable A)
            if (A → xi) add A to table[i-1][i]
    for (d = 2 to n)                               // d-th diagonal: length-d substrings
        for (i = 0 to n-d)
            for (k = i+1 to i+d-1)                 // split point
                for (each variable B in table[i][k])
                    for (each variable C in table[k][i+d])
                        for (each rule A → BC)
                            add A to table[i][i+d]
    return (S ∈ table[0][n]) ? ACCEPT : REJECT
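The table-based pseudocode translates almost line for line into Python. In this sketch (the name `cyk_table` and the rule encoding are assumptions), the innermost loops iterate over rules A → BC and test whether B and C appear in the two sub-entries, which is equivalent to looping over B and C first. It returns the decision together with the table so that individual entries can be inspected.

```python
def cyk_table(x, start, terminal_rules, binary_rules):
    """table[i][j] = set of variables that generate x[i:j] (0-based slice),
    i.e. the substring x_{i+1} ... x_j of the pseudocode."""
    n = len(x)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                    # first diagonal
        for a, t in terminal_rules:              # rules A -> x_i
            if t == x[i - 1]:
                table[i - 1][i].add(a)
    for d in range(2, n + 1):                    # d-th diagonal
        for i in range(0, n - d + 1):
            for k in range(i + 1, i + d):        # split point
                for a, b, c in binary_rules:     # rules A -> B C
                    if b in table[i][k] and c in table[k][i + d]:
                        table[i][i + d].add(a)
    return n > 0 and start in table[0][n], table
```

For the exercise string aaabbb, the returned table has `{"X"}` at `table[1][4]` (the substring aab) and `{"S", "T"}` at `table[0][6]`, so the input is accepted.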

CYK Algorithm
This algorithm considers every contiguous substring of the input and sets P[i,j,k] to true if the substring starting at position i of length j can be generated from Rk. Once it has handled substrings of length 1, it goes on to substrings of length 2, and so on. For substrings of length 2 and greater, it considers every possible partition of the substring into two parts and checks whether there is some production A → B C such that B matches the first part and C matches the second part. If so, it records A as matching the whole substring. Once this process is completed, the string is recognized by the grammar if the substring consisting of the entire input is matched by the start symbol.
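The same recurrence (try every partition, check every production A → B C) can also be evaluated top-down, with memoization playing the role of the table: each sub-problem "can A derive w[i:j]?" is solved once and then reused. This is a sketch of that alternative formulation, not the bottom-up order described on the slide; the names `derives`, `BINARY` and `TERMINAL` are hypothetical, and the grammar is the exercise grammar from these slides without its S → ε rule.

```python
from functools import lru_cache

# Hypothetical encoding of the exercise grammar (S -> ε omitted).
BINARY = [("S", "A", "B"), ("S", "X", "B"), ("T", "A", "B"),
          ("T", "X", "B"), ("X", "A", "T")]
TERMINAL = [("A", "a"), ("B", "b")]

def derives(start: str, w: str) -> bool:
    """Decide start =>* w for a non-empty w using memoized sub-solutions."""

    @lru_cache(maxsize=None)
    def d(var: str, i: int, j: int) -> bool:
        # Sub-problem: can `var` derive w[i:j]? The answer is cached.
        if j - i == 1:
            return (var, w[i]) in TERMINAL
        # Try every production var -> left right and every partition point k.
        return any(
            d(left, i, k) and d(right, k, j)
            for head, left, right in BINARY if head == var
            for k in range(i + 1, j)
        )

    return len(w) > 0 and d(start, 0, len(w))
```

Both formulations fill the same set of (variable, span) facts; the bottom-up table simply fixes the order in advance, while the memoized version only visits sub-problems that the recursion actually reaches.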

CYK Algorithm for Deciding Context Free Languages
Q: Consider the grammar G given by

    S → ε | AB | XB
    T → AB | XB
    X → AT
    A → a
    B → b

Is x = aaabb in L(G)? Is x = aaabbb in L(G)?

The algorithm is "bottom-up": we start at the bottom of the derivation tree, with the input letters a a a b b.

1) Write variables for all length-1 substrings (a, a, a, b, b): A, A, A, B, B.

2) Write variables for all length-2 substrings (aa, aa, ab, bb): -, -, {S, T}, -. Only ab is derivable, via S → AB and T → AB.

3) Write variables for all length-3 substrings (aaa, aab, abb): -, {X}, -. The substring aab splits as a + ab, and X → AT.

4) Write variables for all length-4 substrings (aaab, aabb): -, {S, T}. The substring aabb splits as aab + b, and S → XB, T → XB.

5) Write variables for all length-5 substrings (aaabb): {X}, from a + aabb via X → AT. S is not included, so aaabb is REJECTED!

Now look at aaabbb (input letters a a a b b b).

1) Write variables for all length-1 substrings (a, a, a, b, b, b): A, A, A, B, B, B.

2) Write variables for all length-2 substrings (aa, aa, ab, bb, bb): -, -, {S, T}, -, -.

3) Write variables for all length-3 substrings (aaa, aab, abb, bbb): -, {X}, -, -.

4) Write variables for all length-4 substrings (aaab, aabb, abbb): -, {S, T}, -.

5) Write variables for all length-5 substrings (aaabb, aabbb): {X}, -.

6) Write variables for all length-6 substrings (aaabbb): {S, T}, from aaabb + b via S → XB and T → XB. S is included, so aaabbb is ACCEPTED!
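Both walkthroughs can be reproduced mechanically. This sketch builds the triangle one row per substring length, in the same order as the numbered steps. The names `rows_by_length` and `show`, and the dictionary encoding that maps a right-hand side to the set of left-hand sides, are assumptions for illustration; the input is assumed to be non-empty.

```python
# Hypothetical encoding of the exercise grammar: right-hand side -> heads.
BINARY = {("A", "B"): {"S", "T"}, ("X", "B"): {"S", "T"}, ("A", "T"): {"X"}}
TERMINAL = {"a": {"A"}, "b": {"B"}}

def rows_by_length(w):
    """rows[l-1][s] = set of variables deriving w[s:s+l] (one row per length)."""
    n = len(w)
    rows = [[set(TERMINAL.get(ch, ())) for ch in w]]
    for length in range(2, n + 1):
        row = []
        for s in range(n - length + 1):
            cell = set()
            for k in range(1, length):  # left part has length k
                for left in rows[k - 1][s]:
                    for right in rows[length - k - 1][s + k]:
                        cell |= BINARY.get((left, right), set())
            row.append(cell)
        rows.append(row)
    return rows

def show(w):
    rows = rows_by_length(w)
    for length, row in enumerate(rows, start=1):
        cells = [",".join(sorted(c)) or "-" for c in row]
        print(f"length {length}: " + "  ".join(cells))
    print("ACCEPT" if "S" in rows[-1][0] else "REJECT")

show("aaabb")   # last row "length 5: X", then REJECT
show("aaabbb")  # last row "length 6: S,T", then ACCEPT
```

Each row is computed only from rows already finished, which is exactly the bottom-up order of the slides.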

The same computation can also be recorded in a table. Rows are labeled "start at" i = 0, ..., 5 and columns "end at" j = 1, ..., 6; entry (i, j) holds the variables that generate the substring of aaabbb from position i+1 through position j, so only entries with i < j are used.

1. Variables for length-1 substrings: A at (0,1), (1,2), (2,3); B at (3,4), (4,5), (5,6).

2. Variables for length-2 substrings: {S, T} at (2,4) for ab; (0,2), (1,3), (3,5) and (4,6) are empty (-).

3. Variables for length-3 substrings: {X} at (1,4) for aab; (0,3), (2,5) and (3,6) are empty.

4. Variables for length-4 substrings: {S, T} at (1,5) for aabb; (0,4) and (2,6) are empty.

5. Variables for length-5 substrings: {X} at (0,5) for aaabb; (1,6) is empty.

6. Variables for aaabbb: {S, T} at (0,6). S is included, so aaabbb is ACCEPTED! The completed table:

    start\end  1    2    3    4    5    6
    0          A    -    -    -    X    S,T
    1               A    -    X    S,T  -
    2                    A    S,T  -    -
    3                         B    -    -
    4                              B    -
    5                                   B
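The "start at / end at" table can be generated and printed directly. In this sketch the names `start_end_table` and `render` and the dictionary encoding of the grammar are assumptions; the table is stored as a dictionary keyed by `(i, j)`, which makes it explicit that only entries with i < j exist.

```python
# Hypothetical encoding of the exercise grammar: right-hand side -> heads.
RULES2 = {("A", "B"): {"S", "T"}, ("X", "B"): {"S", "T"}, ("A", "T"): {"X"}}
RULES1 = {"a": {"A"}, "b": {"B"}}

def start_end_table(w):
    """t[(i, j)] = variables generating w[i:j]; row = 'start at' i,
    column = 'end at' j, defined for 0 <= i < j <= len(w)."""
    n = len(w)
    t = {(i, i + 1): set(RULES1.get(w[i], ())) for i in range(n)}
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            t[(i, j)] = {a for k in range(i + 1, j)
                         for b in t[(i, k)] for c in t[(k, j)]
                         for a in RULES2.get((b, c), ())}
    return t

def render(w):
    """Format the table like the slide: '-' for empty entries, blank below i < j."""
    n, t = len(w), start_end_table(w)
    lines = ["start\\end " + "".join(f"{j:<5}" for j in range(1, n + 1))]
    for i in range(n):
        row = f"{i:<10}"
        for j in range(1, n + 1):
            text = (",".join(sorted(t[(i, j)])) or "-") if j > i else ""
            row += f"{text:<5}"
        lines.append(row.rstrip())
    return "\n".join(line.rstrip() for line in lines)

print(render("aaabbb"))
```

The printed grid should match the completed table above, with S in the top-right entry (0,6).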

Parsing results
We keep the results for every w_ij (the substring from position i to position j) in a table. Note that we only need to fill in the entries on one side of the diagonal: the longest substring starting at position i has length n-i+1.
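A short count, not on the original slide, shows why this triangular table leads to cubic running time. There are n - ℓ + 1 substrings of length ℓ, and each one of length ℓ has ℓ - 1 split points:

```latex
\#\text{cells} = \sum_{\ell=1}^{n} (n-\ell+1) = \frac{n(n+1)}{2},
\qquad
\#\text{split points} = \sum_{\ell=1}^{n} (n-\ell+1)(\ell-1) = \frac{n^{3}-n}{6}.
```

For n = 6 (the string aaabbb) this gives 21 cells and 35 split points. Checking every binary rule at every split point therefore costs O(n^3 · |G|) time, where |G| is the number of rules.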

Constructing the parse tree
We need to construct parse trees for the string w.
- Idea: keep back-pointers to the table entries that we combine.
- At the end, reconstruct a parse from the back-pointers.
- Keeping every back-pointer, not only the first one found, allows us to find all parse trees.
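A minimal sketch of the back-pointer idea, assuming a hypothetical encoding of the exercise grammar and a tree format of nested tuples `(variable, child, ...)`. For brevity it keeps only the first back-pointer found for each (span, variable) entry, so it returns one parse tree; keeping a list of back-pointers per entry instead would allow all trees to be enumerated.

```python
# Hypothetical encoding of the exercise grammar: right-hand side -> heads.
RULES2 = {("A", "B"): {"S", "T"}, ("X", "B"): {"S", "T"}, ("A", "T"): {"X"}}
RULES1 = {"a": {"A"}, "b": {"B"}}

def parse(w, start="S"):
    """Return one parse tree for w as nested tuples, or None if w is rejected.
    back[(i, j, A)] is either the terminal A derives directly, or a
    back-pointer (k, B, C) meaning A -> B C with the split at position k."""
    n = len(w)
    back = {}
    for i, ch in enumerate(w):
        for a in RULES1.get(ch, ()):
            back[(i, i + 1, a)] = ch
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for (b, c), heads in RULES2.items():
                    if (i, k, b) in back and (k, j, c) in back:
                        for a in heads:
                            back.setdefault((i, j, a), (k, b, c))  # keep first only

    def build(i, j, a):
        # Follow the back-pointers recursively to rebuild the tree.
        entry = back[(i, j, a)]
        if isinstance(entry, str):  # leaf: A -> terminal
            return (a, entry)
        k, b, c = entry
        return (a, build(i, k, b), build(k, j, c))

    return build(0, n, start) if n and (0, n, start) in back else None
```

For example, `parse("aabb")` yields S → X B with X → A T and T → A B, while `parse("aab")` returns `None`.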

Ambiguity
Efficient representation of ambiguities:
- Local ambiguity packing:
  - A local ambiguity means there are multiple ways to derive the same substring from a nonterminal.
  - All possible ways to derive each nonterminal are stored together.
  - When creating back-pointers, create a single back-pointer to the "packed" representation.
- This allows a very large number of ambiguities (even exponentially many) to be represented efficiently.
- Unpacking: producing one or more of the packed parse trees by following the back-pointers.
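To see why packing pays off, consider a highly ambiguous CNF grammar, S → S S | a (a hypothetical example, not from the slides). This sketch stores, for each (span, nonterminal) entry, only the number of ways to derive it, a simplified stand-in for a packed forest that would store the list of alternatives `(k, B, C)` instead. The chart has O(n^2) entries and takes O(n^3) work to fill, yet it accounts for a number of parse trees that grows exponentially (the Catalan numbers). The name `count_parses` is an assumption.

```python
from collections import defaultdict

# Hypothetical ambiguous CNF grammar (not from the slides): S -> S S | a
BINARY = [("S", "S", "S")]
TERMINAL = {"a": ["S"]}

def count_parses(w, start="S"):
    """Count parse trees without building them: each chart entry keeps a
    packed summary (here, just a count) of all ways to derive its span."""
    n = len(w)
    count = defaultdict(int)  # (i, j, A) -> number of parse trees
    for i, ch in enumerate(w):
        for a in TERMINAL.get(ch, []):
            count[(i, i + 1, a)] += 1
    for d in range(2, n + 1):
        for i in range(n - d + 1):
            j = i + d
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    # Every (left tree, right tree) pair gives a distinct tree for A.
                    count[(i, j, a)] += count[(i, k, b)] * count[(k, j, c)]
    return count[(0, n, start)]
```

Here `count_parses("aaaa")` is 5 and `count_parses("a" * 10)` is 4862, even though the chart itself stays small.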

References
- Hopcroft and Ullman, "Introduction to Automata Theory, Languages, and Computation", Section 6.3, pp. 139-141.
- "CYK algorithm", Wikipedia, the free encyclopedia.
- A presentation by Zeph Grunschlag.