Approximate Factoring for A* Search Aria Haghighi, John DeNero, and Dan Klein Computer Science Division University of California Berkeley.

Inference for NLP Tasks A* Search

Inference as Search [Figure labels: partial hypothesis y; a_1, a_2, a_3]

Bitext Parsing as Search: "translation is hard" / "la traducción es difícil". Weighted synchronous grammar parsing is O(n^6): modified CKY over bi-spans (X[i,j], X'[i',j']). [Figure: source and target parse trees with nodes S, NP, VP and the paired root (S, S')]

A* Search: a hypothesis y is ranked by Score So Far + Completion Score

A* Search: Heuristic Design
• Tight (small gap to the true completion score)
• Admissible
• Efficient to compute
[Cartoon: "A* Heuristic Man" pointing toward the optimal result: "This way, hypothesis!"]
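The search loop these slides refer to can be sketched as follows. This is a minimal illustration rather than the authors' parser: it assumes costs to be minimized (for example negative log probabilities), and the toy graph `edges` and heuristic table `h` are hypothetical.

```python
import heapq

def a_star(start, goal, edges, heuristic):
    """Return the cost of the cheapest path from start to goal, or None.

    edges: dict mapping a state to a list of (next_state, cost) pairs.
    heuristic: function returning a lower bound on the remaining cost.
    """
    # Queue priority = cost so far + estimated completion cost.
    frontier = [(heuristic(start), 0.0, start)]
    best = {start: 0.0}
    while frontier:
        _, cost_so_far, state = heapq.heappop(frontier)
        if state == goal:
            return cost_so_far
        if cost_so_far > best.get(state, float("inf")):
            continue  # stale queue entry, a cheaper path was found later
        for nxt, cost in edges.get(state, []):
            new_cost = cost_so_far + cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None

# Hypothetical toy problem: the heuristic never overestimates (admissible).
edges = {"x": [("y", 1.0), ("z", 4.0)], "y": [("z", 2.0)]}
h = {"x": 2.0, "y": 1.0, "z": 0.0}
print(a_star("x", "z", edges, h.get))  # 3.0
```

A tighter heuristic lets the loop pop fewer states before reaching the goal, and admissibility is what guarantees that the first goal popped is optimal.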

A* Example: Bitext Search. Cost so far: Viterbi inside score of the bi-span.

A* Bitext Search. Completion score: Viterbi outside score. Ideal heuristic: O(n^6).

Of Stately Projections [Figure: projection π applied to paired source/target tree states (S, NP, VP; S', NP', VP')]

A* Bitext Search. Suppose [equation missing in transcript], then [equation missing in transcript]. [Figure: source tree (S, NP, VP) and target tree (S', NP', VP')]

Projection Heuristic: O(n^3) for each of the two projections versus O(n^6) for the full bitext problem. Klein and Manning [2003]
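The complexities on this slide follow from the standard projection argument. The transcript lost the slide's own equations, so the following is a hedged reconstruction in which the names c_s, c_t, h_s^*, h_t^* and h^* are assumptions:

```latex
\begin{align*}
% Assumption: every edge cost factors over its source and target projections.
c(a) &= c_s(\pi_s(a)) + c_t(\pi_t(a)) \\
% Then the exact completion costs of the two projected problems,
% added together, never exceed the exact completion cost h^*(y).
h(y) &= h_s^*(\pi_s(y)) + h_t^*(\pi_t(y)) \;\le\; h^*(y)
\end{align*}
```

Each term on the right is a monolingual Viterbi outside computation, which is why the heuristic costs O(n^3) per side instead of O(n^6).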

When models don’t factorize

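Pointwise Admissibility [Figure: edge a between states x and y with cost c(a); its source projection between π_s(x) and π_s(y) with cost φ_s(a); its target projection between π_t(x) and π_t(y) with cost φ_t(a)]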
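Read as a condition on individual edges, pointwise admissibility asks that the factored costs never add up to more than the true cost of an edge. A small sketch of that check, assuming each factored cost depends on an edge only through its projection; the rule names and numbers below are hypothetical:

```python
def pointwise_gaps(cost, phi_s, phi_t, proj_s, proj_t):
    """Return the gap c(a) - phi_s(pi_s(a)) - phi_t(pi_t(a)) for every edge a."""
    return {a: c - phi_s[proj_s(a)] - phi_t[proj_t(a)] for a, c in cost.items()}

def is_pointwise_admissible(gaps, tol=1e-9):
    """Factored costs are pointwise admissible when no gap is negative."""
    return all(g >= -tol for g in gaps.values())

# Hypothetical synchronous rules, keyed by (source rule, target rule).
cost = {
    ("S -> NP VP", "S' -> NP' VP'"): 2.0,
    ("S -> NP VP", "S' -> VP' NP'"): 3.5,
}
phi_s = {"S -> NP VP": 1.0}
phi_t = {"S' -> NP' VP'": 1.0, "S' -> VP' NP'": 2.0}

# Projections simply pick out the source or target half of the rule pair.
gaps = pointwise_gaps(cost, phi_s, phi_t, lambda a: a[0], lambda a: a[1])
print(is_pointwise_admissible(gaps))  # True
```

Summing the condition along any path shows that the factored path cost is a lower bound on the true path cost, which is what A* needs from the heuristic.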

When models don't factorize: Admissibility [Figure: state y with its projections π_s(y) and π_t(y)]

Finding Factored Costs: Pointwise Gap. How to find φ_s and φ_t?

Finding Factored Costs Small gaps

Finding Factored Costs Pointwise Admissibility

Finding Factored Costs
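Taken together, the "small gaps" objective and the pointwise admissibility constraint suggest an optimization problem of roughly the following shape. The equations did not survive in the transcript, so this is only a sketch: the gap variable γ_a and the choice of a plain sum as the objective are assumptions.

```latex
\begin{align*}
\min_{\phi_s,\,\phi_t,\,\gamma} \quad & \sum_{a} \gamma_a
    && \text{(small gaps)} \\
\text{s.t.} \quad & \gamma_a = c(a) - \phi_s(\pi_s(a)) - \phi_t(\pi_t(a))
    && \forall a \\
& \gamma_a \ge 0
    && \forall a \quad \text{(pointwise admissibility)}
\end{align*}
```

With a linear objective like this one, the problem is a linear program over the factored costs, with one gap constraint per edge type in the grammar.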

Bitext Experiments
Synchronous Tree-to-Tree Transducer
• Trained on 40k sentences of English-Spanish Europarl [Galley et al., 2004]
• Rare words replaced with POS tags
• Tested on 1,200 sentences, max length 5-15
Optimization Problem
• Solved only once per grammar
• 206K variables
• 160K constraints
• 29 minutes

Bitext Experiments

Zhang and Gildea (2006)

Bitext Experiments Zhang and Gildea (2006)

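Lexicalized Parsing [Figure: lexicalized tree with nodes S-(is,VBZ), NP-(translation,NN), VP-(is,VBZ), shown with its projections onto the unlexicalized tree (S, NP, VP) and onto the lexical heads (is,VBZ), (translation,NN)] Klein and Manning [2003]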

Lexicalized Parsing

Too many constraints to solve efficiently! Over 64e13 possible lexicalized rules

Lexicalized Parsing

Lexicalized Model Experiments
Standard Setup
• Train on sections 2-21 of the treebank
• Test on section 23 (length ≤ 40)
Models Tested
• Factored model [Klein and Manning, 2003]
• Non-factored model

Lexicalized Parsing Factored Model [Klein and Manning, 2003]

Lexicalized Parsing Non-Factored Model

Conclusions
• General technique for generating A* estimates
• Can explicitly control the admissibility/tightness trade-off
• Future work: explore different objectives and applications

Thanks