Coarse-to-Fine Efficient Viterbi Parsing Nathan Bodenstab OGI RPE Presentation May 8, 2006.

Presentation transcript:

Coarse-to-Fine Efficient Viterbi Parsing Nathan Bodenstab OGI RPE Presentation May 8, 2006

2 Outline What is Natural Language Parsing? Data Driven Parsing Hypergraphs and Parsing Algorithms High Accuracy Parsing Coarse-to-Fine Empirical Results

3 What is Natural Language Parsing? Provides a sentence with syntactic information by hierarchically clustering and labeling its constituents. A constituent is a group of one or more words that function together as a unit.

5 Why Parse Sentences? Syntactic structure is useful in:
– Speech Recognition
– Machine Translation
– Language Understanding
  - Word Sense Disambiguation (ex. “bottle”)
  - Question-Answering
  - Document Summarization

6 Outline What is Natural Language Parsing? Data Driven Parsing Hypergraphs and Parsing Algorithms High Accuracy Parsing Coarse-to-Fine Empirical Results

7 Data Driven Parsing Parsing = Grammar + Algorithm Probabilistic Context-Free Grammar P( children=[Determiner, Adjective, Noun] | parent=NounPhrase )

8 Data Driven Parsing Find the maximum likelihood parse tree from all grammatically valid candidates. The probability of a parse tree is the product of all its grammar rule (constituent) probabilities. The number of grammatically valid parse trees increases exponentially with the length of the sentence.
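To make the grammar and the scoring concrete, here is a minimal sketch in Python: a toy PCFG stored as rule probabilities keyed by the parent label, and a tree scored as the product of its rule probabilities (accumulated in log space). The grammar, rule names, and probabilities are invented for illustration.

```python
from math import log, exp

# A hypothetical toy PCFG: rule probabilities conditioned on the parent label,
# mirroring P(children = [Determiner, Adjective, Noun] | parent = NounPhrase).
pcfg = {
    ("S",  ("NP", "VP")): 0.9,
    ("S",  ("VP",)): 0.1,
    ("NP", ("Determiner", "Adjective", "Noun")): 0.2,
    ("NP", ("Determiner", "Noun")): 0.5,
    ("NP", ("Noun",)): 0.3,
    ("VP", ("Verb", "NP")): 0.6,
    ("VP", ("Verb",)): 0.4,
}

def tree_log_prob(tree):
    """Probability of a parse tree = product of its rule probabilities
    (summed in log space to avoid underflow on long sentences)."""
    label, children = tree[0], tree[1:]
    if all(isinstance(c, str) for c in children):   # pre-terminal over words
        return 0.0                                  # lexical probabilities omitted here
    rule = (label, tuple(c[0] for c in children))
    return log(pcfg[rule]) + sum(tree_log_prob(c) for c in children)

# Trees are nested tuples: (label, child, child, ...).
example = ("S",
           ("NP", ("Determiner", "the"), ("Noun", "bottle")),
           ("VP", ("Verb", "broke")))
print(exp(tree_log_prob(example)))   # 0.9 * 0.5 * 0.4 = 0.18
```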

9 Outline What is Natural Language Parsing? Data Driven Parsing Hypergraphs and Parsing Algorithms High Accuracy Parsing Coarse-to-Fine Empirical Results

10 Hypergraphs A directed hypergraph can facilitate dynamic programming (Klein and Manning, 2001). A hyperedge connects a set of tail nodes to a set of head nodes. (Figure: a standard edge vs. a hyperedge)

11 Hypergraphs

12 The CYK Algorithm Separates the hypergraph into “levels” Exhaustively traverses every hyperedge, level by level
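A minimal sketch of this exhaustive, level-by-level traversal, assuming a toy grammar in Chomsky normal form (all rule names and probabilities are invented): every chart cell is filled in order of span length, and each split point / rule combination corresponds to one hyperedge.

```python
from collections import defaultdict
from math import log

# Hypothetical toy grammar in Chomsky normal form: binary rules A -> B C and
# lexical rules A -> word, with log probabilities.
binary = {("S", "NP", "VP"): log(1.0),
          ("NP", "Det", "Noun"): log(1.0),
          ("VP", "Verb", "NP"): log(1.0)}
lexical = {("Det", "the"): log(1.0), ("Noun", "dog"): log(0.5),
           ("Noun", "cat"): log(0.5), ("Verb", "saw"): log(1.0)}

def cyk_viterbi(words):
    """Exhaustive bottom-up chart parsing: fill every cell [i, j), level by level,
    keeping the best (Viterbi) log probability for each non-terminal in each cell."""
    n = len(words)
    chart = defaultdict(dict)                     # chart[(i, j)][label] = best log prob
    for i, w in enumerate(words):                 # level 1: spans of length 1
        for (label, word), lp in lexical.items():
            if word == w:
                chart[(i, i + 1)][label] = lp
    for span in range(2, n + 1):                  # higher levels: longer spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # each split + rule = one hyperedge
                for (a, b, c), lp in binary.items():
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        score = lp + chart[(i, k)][b] + chart[(k, j)][c]
                        if score > chart[(i, j)].get(a, float("-inf")):
                            chart[(i, j)][a] = score
    return chart[(0, n)].get("S")

print(cyk_viterbi(["the", "dog", "saw", "the", "cat"]))   # log(0.25)
```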

13 The A* Algorithm Maintains a priority queue of traversable hyperedges. Traverses best-first until a complete parse tree is found. (Figure: the priority queue)
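A schematic sketch of best-first traversal over an explicitly listed toy hypergraph. The nodes, hyperedges, and weights below are invented; a real parser would derive hyperedges from the grammar, keep back-pointers, and could add an admissible outside estimate to the priority.

```python
import heapq

# Each hyperedge connects a set of tail nodes to one head node with a
# log-probability weight; the goal is the root node spanning the sentence.
hyperedges = [
    (("the[0,1]", "dog[1,2]"), "NP[0,2]", -0.7),
    (("saw[2,3]", "NP[3,5]"), "VP[2,5]", -0.9),
    (("the[3,4]", "cat[4,5]"), "NP[3,5]", -0.7),
    (("NP[0,2]", "VP[2,5]"), "S[0,5]", -0.1),
]
axioms = {"the[0,1]", "dog[1,2]", "saw[2,3]", "the[3,4]", "cat[4,5]"}   # the words

def best_first(goal):
    best = {a: 0.0 for a in axioms}          # best log prob found so far per node
    agenda = []                              # priority queue of (cost, head, score)
    def push_ready():                        # schematic: rescans all hyperedges
        for tails, head, w in hyperedges:
            if head not in best and all(t in best for t in tails):
                score = w + sum(best[t] for t in tails)
                heapq.heappush(agenda, (-score, head, score))
    push_ready()
    while agenda:
        _, head, score = heapq.heappop(agenda)
        if head in best:
            continue                         # already finalized with a better score
        best[head] = score
        if head == goal:                     # stop as soon as the root is finalized
            return score
        push_ready()
    return None

print(best_first("S[0,5]"))                  # -0.7 + -0.7 + -0.9 + -0.1 = -2.4
```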

14 Outline What is Natural Language Parsing? Data Driven Parsing Hypergraphs and Parsing Algorithms High Accuracy Parsing Coarse-to-Fine Empirical Results

15 High(er) Accuracy Parsing Modify the Grammar to include more context (Grand) Parent Annotation (Johnson, 1998) P( children=[Determiner, Adjective, Noun] | parent=NounPhrase, grandParent=Sentence )
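A minimal sketch of the annotation step: rewriting every node label to carry its parent's label means that a standard PCFG estimated from the transformed trees conditions on the grandparent, exactly as in the probability above. The tree encoding and labels here are illustrative.

```python
# Parent annotation in the style of Johnson (1998): each non-terminal label is
# augmented with its parent's label. Trees are nested lists: [label, child, ...].
def parent_annotate(tree, parent="TOP"):
    label, children = tree[0], tree[1:]
    if all(isinstance(c, str) for c in children):       # pre-terminal: leave words alone
        return [f"{label}^{parent}"] + list(children)
    return [f"{label}^{parent}"] + [parent_annotate(c, label) for c in children]

tree = ["S", ["NP", ["Det", "the"], ["Noun", "dog"]],
             ["VP", ["Verb", "barks"]]]
print(parent_annotate(tree))
# ['S^TOP', ['NP^S', ['Det^NP', 'the'], ['Noun^NP', 'dog']], ['VP^S', ['Verb^VP', 'barks']]]
```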

16 Increased Search Space Original Grammar Parent Annotated Grammar

21 Grammar Comparison Exact inference with the CYK algorithm becomes intractable. Most algorithms using lexicalized models resort to greedy search strategies. We want to efficiently find the globally optimal (Viterbi) parse tree for these high-accuracy models.

22 Outline What is Natural Language Parsing? Data Driven Parsing Hypergraphs and Parsing Algorithms High Accuracy Parsing Coarse-to-Fine Empirical Results

23 Coarse-to-Fine Efficiently find the optimal parse tree of a large, context-enriched model (Fine) by following hyperedges suggested by solutions of a simpler model (Coarse). To evaluate the feasibility of Coarse-to-Fine, we use:
– Coarse = the baseline WSJ grammar
– Fine = the parent-annotated grammar

24 Increased Search Space Coarse Grammar Fine Grammar

25 Coarse-to-Fine Build Coarse hypergraph

26 Coarse-to-Fine Choose a Coarse hyperedge

27 Coarse-to-Fine Replace the Coarse hyperedge with Fine hyperedge (modifies probability)

28 Coarse-to-Fine Propagate probability difference

29 Coarse-to-Fine Repeat until optimal parse tree has only Fine hyperedges
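A deliberately simplified, tree-level analogue of this loop, with all grammars, trees, and probabilities invented. The actual algorithm works on hyperedges packed in the chart rather than on explicitly enumerated trees, but the mechanics are the same: every candidate starts with an all-Coarse score, the best-scoring candidate has one Coarse rule score replaced by its Fine counterpart, the difference is propagated into the total, and the first candidate popped with only Fine scores is optimal, provided the Coarse scores are upper bounds (next slide).

```python
import heapq
from math import log

# Invented Coarse and Fine rule probabilities; each Coarse value upper-bounds
# the Fine rules that refine it.
coarse = {"NP -> Det Noun": log(0.6), "VP -> Verb NP": log(0.5),
          "S -> NP VP": log(0.9), "VP -> Verb": log(0.5)}
fine   = {"NP^S -> Det Noun": log(0.4), "NP^VP -> Det Noun": log(0.6),
          "VP^S -> Verb NP": log(0.3), "S^TOP -> NP VP": log(0.9),
          "VP^S -> Verb": log(0.2)}
# Each candidate tree = list of (coarse rule, corresponding fine rule) occurrences.
trees = {
    "t1": [("S -> NP VP", "S^TOP -> NP VP"), ("NP -> Det Noun", "NP^S -> Det Noun"),
           ("VP -> Verb NP", "VP^S -> Verb NP"), ("NP -> Det Noun", "NP^VP -> Det Noun")],
    "t2": [("S -> NP VP", "S^TOP -> NP VP"), ("NP -> Det Noun", "NP^S -> Det Noun"),
           ("VP -> Verb", "VP^S -> Verb")],
}

def coarse_to_fine(trees):
    agenda = []
    for name, rules in trees.items():
        score = sum(coarse[c] for c, _ in rules)          # all-Coarse upper bound
        heapq.heappush(agenda, (-score, name, 0, score))  # 0 rules refined so far
    while agenda:
        _, name, refined, score = heapq.heappop(agenda)
        rules = trees[name]
        if refined == len(rules):                         # only Fine scores left: optimal
            return name, score
        c, f = rules[refined]                             # refine one more rule and
        score += fine[f] - coarse[c]                      # propagate the difference
        heapq.heappush(agenda, (-score, name, refined + 1, score))

print(coarse_to_fine(trees))   # ('t2', log(0.9 * 0.4 * 0.2)): the optimal Fine tree
```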

30 Upper-Bound Grammar Replacing a Coarse hyperedge with a Fine hyperedge can increase or decrease its probability. Once we have found a parse tree with only Fine hyperedges, how can we be sure it is optimal? Modify the probability of each Coarse grammar rule to be an upper bound on the probabilities of the Fine grammar rules that map onto it:
P_Coarse(A → β) = max P_Fine(A′ → β′) over all Fine rules A′ → β′ that project to A → β,
where N is the set of non-terminals and A → β (A ∈ N) is a grammar rule.
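One way to realize this, sketched with invented Fine-grammar probabilities: project each Fine rule onto its Coarse rule by stripping the parent annotation, and keep the maximum probability seen for each Coarse rule. The resulting Coarse scores are no longer a normalized distribution; they only need to upper-bound the Fine scores.

```python
from collections import defaultdict

# Hypothetical parent-annotated (Fine) grammar.
fine_grammar = {
    ("NP^S",  ("Det^NP", "Noun^NP")): 0.4,
    ("NP^VP", ("Det^NP", "Noun^NP")): 0.6,
    ("VP^S",  ("Verb^VP", "NP^VP")): 0.3,
    ("S^TOP", ("NP^S", "VP^S")): 0.9,
}

def project(label):
    return label.split("^")[0]            # NP^S -> NP

# Coarse rule probability = max over all Fine rules that project onto it.
coarse_grammar = defaultdict(float)
for (parent, children), p in fine_grammar.items():
    key = (project(parent), tuple(project(c) for c in children))
    coarse_grammar[key] = max(coarse_grammar[key], p)

print(dict(coarse_grammar))
# {('NP', ('Det', 'Noun')): 0.6, ('VP', ('Verb', 'NP')): 0.3, ('S', ('NP', 'VP')): 0.9}
```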

31 Outline What is Natural Language Parsing? Data Driven Parsing Hypergraphs and Parsing Algorithms High Accuracy Parsing Coarse-to-Fine Empirical Results

32 Results

33 Summary & Future Research Coarse-to-Fine is a new exact inference algorithm that efficiently traverses a large hypergraph space by using the solutions of simpler models. Full probability propagation through the hypergraph hinders computational performance.
– Full propagation is not necessary; a lower bound of log2(n) operations.
Over 95% reduction in search space compared to the baseline CYK algorithm.
– Should prune even more search space with higher-accuracy (lexicalized) models.

34 Thanks

35 Choosing a Coarse Hyperedge Top-Down vs. Bottom-Up

36 Top-Down vs. Bottom-Up
Top-Down:
– Traverses more hyperedges
– Hyperedges are closer to the root
– Requires less propagation (1/2)
Bottom-Up:
– Traverses fewer hyperedges
– Hyperedges are near the leaves (words) and shared by many trees
– True probability of trees isn't known at the beginning of CTF

37 Coarse-to-Fine Motivation (Figure: the optimal Coarse tree vs. the optimal Fine tree)