CKY Parsing
Ling 571: Deep Processing Techniques for NLP
January 12, 2011

Roadmap
- Motivation: parsing (in)efficiency
- Dynamic programming
- Cocke-Kasami-Younger (CKY) parsing:
  - Chomsky Normal Form conversion
  - The CKY algorithm
  - Parsing by tabulation

Repeated Work
- Top-down and bottom-up parsing both lead to repeated substructures
  - Globally bad parses can construct good subtrees, but the overall parse will fail
  - This requires reconstruction on the other branch
  - No static backtracking strategy can avoid this
- Efficient parsing techniques require storage of shared substructure
  - Typically with dynamic programming
- Example: a flight from Indianapolis to Houston on TWA

Bottom-Up Search

Dynamic Programming
- Challenge: repeated substructure -> repeated work
- Insight: a global parse is composed of parse substructures, so we can record parses of substructures
- Dynamic programming avoids repeated work by tabulating solutions to subproblems
  - Here, it stores subtrees

Parsing with Dynamic Programming
- Avoids repeated work
- Allows implementation of (relatively) efficient parsing algorithms
  - Polynomial time in input length: typically cubic (O(n³)) or less
- Several different implementations:
  - Cocke-Kasami-Younger (CKY) algorithm
  - Earley algorithm
  - Chart parsing

Chomsky Normal Form (CNF)
- CKY parsing requires grammars in CNF
- Chomsky Normal Form: all productions of the form A -> B C or A -> a
- However, most of our grammars are not of this form
  - E.g., S -> Wh-NP Aux NP VP
- Need a general conversion procedure
  - Any context-free grammar can be converted to CNF

Grammatical Equivalence
- Weak equivalence: recognizes the same language, but yields different structure
- Strong equivalence: recognizes the same language and yields the same structure
- CNF conversion is only weakly equivalent to the original grammar

CNF Conversion
Three kinds of non-conforming rules must be converted:
- Hybrid rules: INF-VP -> to VP
- Unit productions: A -> B
- Long productions: A -> B C D

CNF Conversion
- Hybrid rule conversion: replace all terminals in long right-hand sides with dummy non-terminals
  - E.g., INF-VP -> to VP becomes INF-VP -> TO VP; TO -> to
- Unit productions: rewrite the RHS with the RHSs of all derivable non-unit productions
  - If A =>* B and B -> w, then add A -> w

CNF Conversion
- Long productions: introduce new non-terminals and spread the RHS over multiple rules
  - E.g., S -> Aux NP VP becomes S -> X1 VP; X1 -> Aux NP
- For all non-conforming rules:
  - Convert terminals to dummy non-terminals
  - Convert unit productions
  - Binarize all resulting rules
  - (A code sketch of these three steps follows.)
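To make the three steps concrete, here is a minimal Python sketch of the whole conversion for a toy rule representation ((lhs, rhs-tuple) pairs, with the illustrative convention that lowercase symbols are terminals). It assumes no epsilon rules, and it is my sketch rather than the course's code:

```python
from itertools import count

def is_terminal(sym):
    # Illustrative convention: lowercase symbols are terminals.
    return sym.islower()

def to_cnf(rules):
    """Convert (lhs, rhs-tuple) rules to CNF. Assumes no epsilon rules."""
    # Step 1: hybrid rules -- in any RHS longer than 1, replace terminals
    # with dummy pre-terminals (INF-VP -> to VP becomes INF-VP -> TO VP, TO -> to).
    step1 = []
    for lhs, rhs in rules:
        if len(rhs) > 1:
            new_rhs = []
            for sym in rhs:
                if is_terminal(sym):
                    step1.append((sym.upper(), (sym,)))    # e.g. TO -> to
                    new_rhs.append(sym.upper())
                else:
                    new_rhs.append(sym)
            step1.append((lhs, tuple(new_rhs)))
        else:
            step1.append((lhs, rhs))

    # Step 2: unit productions -- if A =>* B through unit rules and B -> w
    # is non-unit, add A -> w; then drop the unit rules themselves.
    unit = {(l, r[0]) for l, r in step1 if len(r) == 1 and not is_terminal(r[0])}
    changed = True
    while changed:                                         # transitive closure
        changed = False
        for a, b in list(unit):
            for c, d in list(unit):
                if b == c and (a, d) not in unit:
                    unit.add((a, d))
                    changed = True
    non_unit = [(l, r) for l, r in step1 if len(r) > 1 or is_terminal(r[0])]
    step2 = non_unit + [(a, r) for a, b in unit
                        for l, r in non_unit if l == b]

    # Step 3: long productions -- binarize A -> B C D as X1 -> B C, A -> X1 D.
    fresh, cnf = count(1), []
    for lhs, rhs in step2:
        while len(rhs) > 2:
            x = f"X{next(fresh)}"
            cnf.append((x, rhs[:2]))
            rhs = (x,) + rhs[2:]
        cnf.append((lhs, rhs))
    return sorted(set(cnf))

# Illustrative run on the slides' two examples:
print(to_cnf([("S", ("Aux", "NP", "VP")), ("INF-VP", ("to", "VP"))]))
```

On these inputs the sketch produces S -> X1 VP, X1 -> Aux NP, INF-VP -> TO VP, and TO -> to, matching the slides' examples.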

CKY Parsing
- Cocke-Kasami-Younger parsing algorithm:
  - A (relatively) efficient bottom-up parsing algorithm
  - Based on tabulating substring parses to avoid repeated work
- Approach:
  - Use a CNF grammar
  - Build an (n+1) x (n+1) matrix to store subtrees
    - Only the upper-triangular portion is used
  - Incrementally build parses spanning the whole input string

Dynamic Programming in CKY
- Key idea: for a parse spanning the substring [i,j], there exists some k such that there are parses spanning [i,k] and [k,j]
  - We can construct parses for the whole sentence by building up from these stored partial parses
- So, to have a rule A -> B C in [i,j], we must have B in [i,k] and C in [k,j], for some i < k < j
  - The CNF grammar forces this for all j > i+1
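Written out as a recurrence (my notation, with table[i,j] standing for the set of non-terminals covering positions i through j):

```latex
% A non-terminal A spans [i,j] exactly when some rule A -> B C
% splits the span at a midpoint k.
\mathrm{table}[i,j] \;=\;
  \bigl\{\, A \;\big|\; (A \to B\,C) \in G,\;
  \exists k,\ i < k < j:\ B \in \mathrm{table}[i,k],\ C \in \mathrm{table}[k,j] \,\bigr\}
```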

CKY
- Given an input string S of length n, build an (n+1) x (n+1) table
- Indexes correspond to inter-word positions
  - E.g., 0 Book 1 That 2 Flight 3
- Cells [i,j] contain the sets of non-terminals of ALL constituents spanning i..j
  - Cells [j-1,j] contain pre-terminals
- If [0,n] contains the start symbol, the input is recognized

CKY Algorithm
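The slide presents the algorithm as a pseudocode figure. As an illustration only, here is a minimal Python sketch of that recognizer; the grammar encoding (separate dicts for lexical and binary rules) and all names are mine, not the lecture's:

```python
from collections import defaultdict

def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognizer for a CNF grammar.

    lexical: dict mapping a word to the set of pre-terminals A with A -> word
    binary:  dict mapping (B, C) to the set of parents A with A -> B C
    Cells are indexed by inter-word positions: table[(i, j)] holds the
    non-terminals spanning words i..j-1.
    """
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):              # columns left-to-right
        table[(j - 1, j)] = set(lexical.get(words[j - 1], ()))
        for i in range(j - 2, -1, -1):     # rows bottom-to-top within a column
            for k in range(i + 1, j):      # all split points
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary.get((B, C), set())
    return start in table[(0, n)]

# Toy grammar (illustrative only). The S entry for (Verb, NP) models the
# duplicate of the unit rule S -> VP introduced by CNF conversion.
lexical = {"book": {"Verb"}, "that": {"Det"}, "flight": {"Noun"}}
binary = {("Det", "Noun"): {"NP"}, ("Verb", "NP"): {"VP", "S"}}
print(cky_recognize("book that flight".split(), lexical, binary))  # True
```

Note the loop order: j advances column-by-column, left-to-right, and i walks each column bottom-to-top, which is exactly the fill order discussed below.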

Is this a parser?

CKY Parsing
- Table fills:
  - Column-by-column, left-to-right
  - Bottom-to-top within each column
- Why?
  - The necessary information is already available (below and to the left)
  - Allows online sentence analysis: works across the input string as it arrives

CKY Table: "Book the flight through Houston"

Filling a CKY Cell

From Recognition to Parsing
- Limitations of the current recognition algorithm:
  - Only stores non-terminals in each cell, not the rules or the cells corresponding to their RHS
  - Stores SETS of non-terminals, so it can't store multiple rules with the same LHS
- Parsing solution:
  - Store all repeated versions of non-terminals
  - Pair each non-terminal with pointers to the cells it was built from (backpointers)
- Last step: construct trees from the backpointers in [0,n]
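A sketch of how the recognizer above could be extended along these lines (same toy-grammar encoding and assumptions as before; one possible realization, not the lecture's code):

```python
from collections import defaultdict

def cky_parse(words, lexical, binary, start="S"):
    """CKY with backpointers: table[(i, j)] maps each non-terminal to the
    list of ways it was built, so multiple rules with the same LHS (and
    ambiguity generally) are preserved rather than collapsed into a set."""
    n = len(words)
    table = defaultdict(lambda: defaultdict(list))
    for j in range(1, n + 1):
        for A in lexical.get(words[j - 1], ()):
            table[(j - 1, j)][A].append(words[j - 1])      # leaf backpointer
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for B in list(table[(i, k)]):
                    for C in list(table[(k, j)]):
                        for A in binary.get((B, C), ()):
                            table[(i, j)][A].append((k, B, C))

    def trees(A, i, j):
        # Read out every tree for A over [i, j] by following backpointers.
        for bp in table[(i, j)][A]:
            if isinstance(bp, str):                        # lexical entry
                yield (A, bp)
            else:
                k, B, C = bp
                for left in trees(B, i, k):
                    for right in trees(C, k, j):
                        yield (A, left, right)

    return list(trees(start, 0, n))
```

Because each cell keeps every derivation of every non-terminal, an ambiguous input yields one tree per reading.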

Filling column 5

CKY Discussion
- Running time: O(n³), where n is the length of the input string
  - The inner loop grows as the square of the number of non-terminals
- Expressiveness:
  - As implemented, requires CNF
    - Weakly equivalent to the original grammar
    - Doesn't capture the full original structure
  - Back-conversion?
    - Binarization and terminal conversion can be undone
    - Unit non-terminals require a change in CKY itself

Parsing Efficiently
- With arbitrary grammars: the Earley algorithm
  - Top-down search
  - Dynamic programming: tabulated partial solutions
  - Some bottom-up constraints
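Earley is the next lecture's topic, but as a preview, here is a compressed recognizer sketch under my own encoding (a dict-based grammar, a dummy _GAMMA start state, states as (lhs, rhs, dot, origin) tuples; epsilon rules are assumed absent):

```python
def earley_recognize(words, rules, start="S"):
    """Minimal Earley recognizer. rules maps a non-terminal to a list of
    RHS tuples; any symbol not in rules is a terminal."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, i):
        if state not in chart[i]:
            chart[i].append(state)

    GAMMA = "_GAMMA"                               # dummy start (illustrative)
    add((GAMMA, (start,), 0, 0), 0)

    for i in range(n + 1):
        for lhs, rhs, dot, origin in chart[i]:     # list grows as we iterate
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in rules:                   # PREDICTOR: top-down expansion
                    for alt in rules[nxt]:
                        add((nxt, tuple(alt), 0, i), i)
                elif i < n and words[i] == nxt:    # SCANNER: match input word
                    add((lhs, rhs, dot + 1, origin), i + 1)
            else:                                  # COMPLETER: bottom-up confirmation
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, o2), i)

    return (GAMMA, (start,), 1, 0) in chart[n]

# Tiny demo (illustrative grammar; note that Earley needs no CNF):
rules = {"S": [("NP", "VP")], "NP": [("they",)], "VP": [("left",)]}
print(earley_recognize(["they", "left"], rules))   # True
```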