Exponential Decay Pruning for Bottom-Up Beam-Search Parsing
Nathan Bodenstab, Brian Roark, Aaron Dunlop, and Keith Hall
April 2010
Talk Outline
Intro to Syntactic Parsing
–Why Parse?
Parsing Algorithms
–CYK
–Best-First
–Beam-Search
Exponential Decay Pruning
Results
Intro to Syntactic Parsing
Hierarchically cluster and label syntactic word groups (constituents)
Provides structure and meaning
Intro to Syntactic Parsing
Why Parse?
–Machine Translation
  Synchronous Grammars
–Language Understanding
  Semantic Role Labeling
  Word Sense Disambiguation
  Question-Answering
  Document Summarization
–Language Modeling
  Long-distance dependencies
–Because it’s fun
Intro to Syntactic Parsing
What you (usually) need to parse:
–Supervised data: a treebank of sentences with annotated parse structure
  WSJ treebank: 50k sentences
–A binarized Probabilistic Context-Free Grammar induced from the treebank
–A parsing algorithm
Example grammar rules:
–S → NP VP  prob=0.2
–NP → NP NN  prob=0.1
–NP → JJ NN  prob=0.06
–Binarize: VP → PP VB NN  (prob=0.5) becomes VP → PP @VP and @VP → VB NN
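To make the binarization step concrete, here is a minimal sketch (not the authors' code; the @-symbol naming scheme is an assumption) of right-binarizing an n-ary PCFG rule into binary rules:

```python
# A minimal sketch (assumption: @-prefixed intermediate symbols) of
# right-binarizing a PCFG rule. The original probability stays on the
# first binary rule; intermediate rules carry probability 1.0.

def binarize(lhs, rhs, prob):
    """Right-binarize an n-ary rule into a list of binary rules."""
    rules = []
    while len(rhs) > 2:
        mid = "@" + lhs + "_" + "_".join(rhs[1:])  # hypothetical naming scheme
        rules.append((lhs, (rhs[0], mid), prob))
        lhs, rhs, prob = mid, rhs[1:], 1.0         # probability kept on first rule
    rules.append((lhs, tuple(rhs), prob))
    return rules

print(binarize("VP", ["PP", "VB", "NN"], 0.5))
# [('VP', ('PP', '@VP_VB_NN'), 0.5), ('@VP_VB_NN', ('VB', 'NN'), 1.0)]
```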
Parsing Accuracy

                              Non-terminals   Grammar Size   Sec/Sent   F-Score
Baseline                      2,500           64,—           —          —
Parent Annotation (Johnson)   6,000           75,—           —          —
Manual Refinement (Klein)     15,000          —              —          86%
Latent Variable (Petrov)      1,100           4,000,—        —          —
Lexical (Collins, Charniak)   Lots            Implicit       —          89%

Accuracy improvements from grammar refinement:
–Split original non-terminal categories (Subject-NP vs. Object-NP)
–Accuracy at the cost of speed: the solution space becomes impractical to search exhaustively
Berkeley Grammar & Parser
Petrov et al. automatically split non-terminals using latent variables
Example grammar rules:
–S_3 → NP_12 VP_6  prob=0.2
–NP_12 → NP_9 NN_7  prob=0.1
–NN_7 → house  prob=0.06
The Berkeley Coarse-to-Fine parser uses six latent-variable grammars:
–Parse the input sentence once with each grammar
–Posterior probabilities from pass n are used to prune pass n+1
–Must know the mapping between non-terminals of different grammars:
  Grammar(2) { NP_1, NP_6 } → Grammar(3) { NP_2, NP_9, NP_14 }
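A small sketch of the coarse-to-fine pruning idea described above; the split_map contents, the threshold, and the function names are illustrative assumptions, not the Berkeley parser's API:

```python
# Sketch: each pass-n non-terminal maps to the pass-(n+1) symbols it was
# split into, so posteriors from pass n can prune pass n+1.
split_map = {"NP_1": ["NP_2", "NP_9"], "NP_6": ["NP_14"]}  # hypothetical splits

def allowed_fine_symbols(coarse_posteriors, threshold=1e-4):
    """Keep only fine symbols whose coarse parent survived posterior pruning."""
    allowed = set()
    for coarse, posterior in coarse_posteriors.items():
        if posterior >= threshold:
            allowed.update(split_map.get(coarse, []))
    return allowed

print(allowed_fine_symbols({"NP_1": 0.3, "NP_6": 1e-7}))  # {'NP_2', 'NP_9'}
```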
Research Goals
Our research goals:
–Find good solutions very quickly in this LARGE grammar space (not ML)
–Algorithms should be grammar agnostic
–Consider practical implications (speed, memory)
This talk: Exponential Decay Pruning
–Beam-search parsing for efficient search
–Searches the final grammar space directly
–Balances the overhead of targeted exploration (best-first) against the memory and cache benefits of local exploration (CYK)
Parsing Algorithms: CYK
Intro to Syntactic Parsing
–Why Parse?
Parsing Algorithms
–CYK
–Best-First
–Beam-Search
Exponential Decay Pruning
Results
Parsing Algorithms: CYK
Exhaustive population of all parse trees permitted by the grammar
The dynamic programming algorithm gives the maximum-likelihood solution
Parsing Algorithms: CYK
Fill in cells for SPAN = 1, 2, 3, 4, …
Grammar:
–S → NP VP  (p=0.7)
–NP → NP NP  (p=0.2)
–NP → NP VP  (p=0.1)
–NN → court  (p=0.4)
–VB → court  (p=0.1)
–…
Parsing Algorithms: CYK
Grammar:
–S → NP VP  (p=0.7)
–NP → NP NP  (p=0.2)
–NP → NP VP  (p=0.1)
–NN → court  (p=0.4)
–VB → court  (p=0.1)
–…
N iterations through the grammar at each chart cell to consider all possible midpoints
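A minimal Viterbi CYK sketch over a binarized PCFG, assuming log-probability grammar and lexicon lookups (illustrative, not the authors' implementation):

```python
import math
from collections import defaultdict

# grammar: dict mapping (B, C) -> list of (A, log_prob) binary rules
# lexicon: dict mapping word -> list of (A, log_prob) lexical rules

def cyk(words, grammar, lexicon):
    n = len(words)
    chart = defaultdict(lambda: defaultdict(lambda: -math.inf))
    for i, w in enumerate(words):                    # span = 1: lexical entries
        for nt, lp in lexicon[w]:
            chart[(i, i + 1)][nt] = lp
    for span in range(2, n + 1):                     # span = 2, 3, ..., n
        for start in range(n - span + 1):
            end = start + span
            cell = chart[(start, end)]
            for mid in range(start + 1, end):        # all possible midpoints
                for b, lp_b in chart[(start, mid)].items():
                    for c, lp_c in chart[(mid, end)].items():
                        for a, lp_rule in grammar.get((b, c), []):
                            score = lp_rule + lp_b + lp_c
                            if score > cell[a]:      # keep the Viterbi-best score
                                cell[a] = score
    return chart[(0, n)].get("S", -math.inf)         # log prob of best S parse
```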
Parsing Algorithms: Best-First
Intro to Syntactic Parsing
–Why Parse?
Parsing Algorithms
–CYK
–Best-First
–Beam-Search
Exponential Decay Pruning
Results
Parsing Algorithms: Best-First
Grammar:
–S → NP VP  (p=0.7)
–VB → court  (p=0.1)
–…
Frontier PQ:
–[try][shooting,defendant]  VP → VB NP  fom=28.1
–[try,shooting][defendant]  VP → VB NP  fom=14.7
–[Juvenile][court]  NP → ADJ NN  fom=13
The Frontier is a priority queue of all potentially buildable entries
Add the best entry from the Frontier to the chart; expand the Frontier with all possible chart + grammar extensions
Parsing Algorithms: Best-First
How do we rank Frontier entries?
–Figure-of-Merit (FOM)
–FOM = Inside (grammar) * Outside (heuristic)
–Caraballo and Charniak, 1997 (C&C)
–Problem: comparing entries that cover different spans
Grammar:
–S → NP VP  (p=0.7)
–VB → court  (p=0.1)
–…
Frontier PQ:
–[try][shooting,defendant]  VP → VB NP  fom=28.1
–[try,shooting][defendant]  VP → VB NP  fom=14.7
–[Juvenile][court]  NP → ADJ NN  fom=13
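A sketch of the agenda loop described on these slides; `fom` and `extensions` are assumed helpers standing in for the FOM model and the chart + grammar expansion step:

```python
import heapq

# Edges are (start, end, non_terminal) tuples. heapq is a min-heap, so FOM
# scores are negated to pop the best-scoring frontier entry first.

def best_first_parse(n, initial_edges, fom, extensions):
    frontier = [(-fom(e), e) for e in initial_edges]
    heapq.heapify(frontier)
    chart = set()
    while frontier:
        _, edge = heapq.heappop(frontier)            # best frontier entry
        if edge in chart:
            continue                                 # already built; skip
        chart.add(edge)
        if edge == (0, n, "S"):
            return chart                             # spanning S found; stop
        for new_edge in extensions(edge, chart):     # chart + grammar extensions
            heapq.heappush(frontier, (-fom(new_edge), new_edge))
    return chart
```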
Parsing Algorithms: Beam-Search
Intro to Syntactic Parsing
–Why Parse?
Parsing Algorithms
–CYK
–Best-First
–Beam-Search
Exponential Decay Pruning
Results
Parsing Algorithms: Beam-Search
Beam-Search: best of both worlds
CYK-style exhaustive traversal (bottom-up)
At each chart cell:
–Compute the FOM for all possible cell entries
–Rank entries in a (temporary) local priority queue
–Only populate the cell with the n-best entries (beam-width)
Less memory:
–Not storing all cell entries (CYK) nor bad frontier entries (Best-First)
Runs faster:
–Search space is pruned (unlike CYK) and no global priority queue to maintain (unlike Best-First)
Eliminates the problem of globally comparing cell entries
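The per-cell pruning step can be sketched as follows (illustrative only; a real implementation would also track backpointers and midpoints):

```python
import heapq

# candidates: iterable of (non_terminal, inside_score) pairs for one chart
# cell; fom scores each candidate; only the beam_width best survive.

def populate_cell(candidates, fom, beam_width):
    ranked = heapq.nlargest(beam_width, candidates, key=fom)  # local n-best
    return dict(ranked)  # the cell keeps only the surviving entries
```

Here `fom` would combine the rule's inside score with the C&C outside estimate, so the local ranking matches the Best-First ordering without a global queue.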
Exponential Decay Pruning
Intro to Syntactic Parsing
–Why Parse?
Parsing Algorithms
–CYK
–Best-First
–Beam-Search
Exponential Decay Pruning
Results
Exponential Decay Pruning
What is the optimal beam-width per chart cell?
–Common solutions:
  Relative score difference from the highest-ranking entry
  Global maximum number of candidates
Exponential Decay Pruning:
–Adaptive beam-width conditioned on chart-cell information
–How reliable is our Figure-of-Merit per chart cell?
–Plotted rank of the gold entry against span and sentence size:
  The FOM is more reliable for larger spans
  –Less dependent on the outside estimate
  The FOM is less reliable for short sentences
  –Atypical grammatical structure (in WSJ?)
Exponential Decay Pruning
Confidence in the FOM can be modeled with the exponential decay function:
–N_0 = global beam-width maximum
–n = sentence length
–s = span length (number of words covered)
–λ = tuning parameter
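The slide's equation itself did not survive in this transcript, so the formula below is a sketch only: one decay function consistent with the bullets above (the beam shrinks exponentially with span length s, and shrinks more slowly when sentence length n is small), with the exact combination of s and n an assumption rather than the authors' published form:

```latex
% Assumption: a plausible reconstruction, not necessarily the original formula.
\[
  \mathrm{beam}(n, s) \;=\; N_0 \,\exp\!\Bigl(-\lambda\,\frac{s\,n}{s+n}\Bigr)
\]
```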
Results
Intro to Syntactic Parsing
–Why Parse?
Parsing Algorithms
–CYK
–Best-First
–Beam-Search
Exponential Decay Pruning
Results
Results
Wall Street Journal treebank:
–Train: Sections 2-21 (40k sentences)
–Dev: Section 24 (1.3k sentences)
–Test: Section 23 (2.4k sentences)
Berkeley SM6 latent-variable grammar
Figure-of-Merit from Caraballo and Charniak, 1997 (C&C)
Also applied Cell Closing Constraints (Roark and Hollingshead, 2008)
External comparison with the Berkeley Coarse-to-Fine parser using the same grammar
Results: Dev

Algorithm    FOM     Beam-Width  Cell Closing  Sec/Sent  Chart Entries  F-Score
CYK          —       —           —             —         —              —
Best-First   Inside  —           —             —         —              —
Best-First   C&C     —           —             —         —              —
Beam-Search  Inside  Constant    —             —         —              —
Beam-Search  Inside  Decay       —             —         —              —
Beam-Search  C&C     Constant    —             —         —              —
Beam-Search  C&C     Decay       —             —         —              —
Beam-Search  C&C     Constant    Yes           —         —              —
Beam-Search  C&C     Decay       Yes           —         —              —

The Figure-of-Merit makes a big difference
Fast solution, but significant accuracy degradation
Results: Dev
Using the inside probability for the FOM:
–95% speed reduction with Beam-Search over Best-First
–Exponential Decay adds an additional 47% speed reduction
Results: Dev
Using the C&C FOM:
–Beam-Search is faster (57%) and more accurate than Best-First
–Exponential Decay adds an additional 40% speed reduction
Results: Test

Algorithm     FOM   Beam-Width  Cell Closing  Sec/Sent  F-Score
CYK           —     —           —             —         —
Beam-Search   C&C   Constant    —             —         —
Beam-Search   C&C   Decay       —             —         —
Beam-Search   C&C   Decay       Yes           —         —
Berkeley C2F  —     —           —             —         —

Large relative speed-up from the Decay vs. Constant beam-width
Decay pruning and Cell Closing Constraints are complementary
Same ballpark as Coarse-to-Fine (perhaps a bit faster)
Requires no knowledge of the grammar
Thanks
FOM Details
C&C FOM details:
–FOM(NT) = Outside_left * Inside * Outside_right
–Inside = constituent grammar score for NT
–Outside_left = max { POS forward prob * POS-to-NT transition prob }
–Outside_right = max { NT-to-POS transition prob * POS backward prob }
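In symbols, the bullets above amount to the following sketch (the notation here is assumed: β is the inside score of non-terminal N over words i..j, and fwd/bkwd are POS-level forward/backward probabilities from an HMM-style tagger):

```latex
% Boundary FOM sketch, reconstructed from the verbal description above.
\[
  \mathrm{FOM}(N_{i,j}) \;=\; \mathrm{Out}_L(N,i)\;\beta(N_{i,j})\;\mathrm{Out}_R(N,j)
\]
\[
  \mathrm{Out}_L(N,i) = \max_{t}\,\mathrm{fwd}(t,i)\,P(N \mid t),
  \qquad
  \mathrm{Out}_R(N,j) = \max_{t}\,P(t \mid N)\,\mathrm{bkwd}(t,j{+}1)
\]
```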
Research Goals
–Find good solutions very quickly in this LARGE grammar space (not ML)
–Algorithms should be grammar agnostic
–Consider practical implications (speed, memory)
Current projects towards these goals:
–Better FOM function
  Inside estimate (grammar refinement)
  Outside estimate (participation in a complete parse tree)
–Optimal chart traversal strategy
  Which areas of the search space are most promising?
  Cell Closing Constraints (Roark and Hollingshead, 2008)
–Balance between targeted and exhaustive exploration
  How much “work” should be done exploring the search space around these promising areas?
  Overhead of targeted exploration (best-first) vs. memory and cache benefits of local exploration (CYK)