Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea
Presented by Zhonghua Li. Mentor: Jun Lang. I2R SMT-Reading Group

Paper info
Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
ACL-08 long paper; cited 37 times.
Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea

Core Ideas
– Variational Bayes
– Tic-tac-toe pruning
– Word-to-phrase bootstrapping

Outline
Paper presentation
– Pipeline
– Model
– Training
– Parsing (pruning)
– Results
Shortcomings
Discussion

Summary of the Pipeline
1. Run IBM Model 1 on the sentence-aligned data.
2. Use tic-tac-toe pruning to prune the bitext space.
3. Train a word-based ITG with Variational Bayes; get the Viterbi word alignment.
4. Apply the non-compositional constraint to restrict the space of phrase pairs.
5. Train a phrasal ITG with VB; a final Viterbi pass gives the phrasal alignment.
(A sketch of these steps in code follows.)
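A minimal sketch of the pipeline in Python. All of the callables are hypothetical stand-ins for the paper's components, passed in as parameters so the sketch stays self-contained; nothing here is from a released implementation.

```python
def learn_phrase_pairs(bitext, model1, prune, train_vb, viterbi, nc_spans):
    """Pipeline sketch; every callable is an illustrative placeholder."""
    t_table = model1(bitext)                         # 1. IBM Model 1
    cells = prune(bitext, t_table)                   # 2. tic-tac-toe pruning
    word_itg = train_vb("word", bitext, cells)       # 3. word-based ITG + VB
    links = viterbi(word_itg, bitext, cells)         #    Viterbi word alignment
    spans = nc_spans(links)                          # 4. non-compositional constraint
    phrasal_itg = train_vb("phrase", bitext, spans)  # 5. phrasal ITG + VB
    return viterbi(phrasal_itg, bitext, spans)       #    phrasal alignment
```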

Phrasal Inversion Transduction Grammar
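The rule schema on this slide was an image. As a sketch, the phrasal ITG is Wu's ITG extended with phrasal terminal productions; these three alternatives are the s = 3 right-hand sides for X referenced later on the M-step slide:

$$X \to [X\,X] \;\mid\; \langle X\,X\rangle \;\mid\; C, \qquad C \to e/f$$

where [X X] concatenates the children in the same order in both languages, ⟨X X⟩ reverses the order on the target side, and C → e/f emits a (possibly multi-word) phrase pair.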

Dirichlet Prior for Phrasal ITG
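The prior itself was also an image. A sketch of symmetric Dirichlet priors over the two rule distributions, consistent with the hyperparameters that appear on the later slides (s and m on the M-step slide, α_C on the evaluation slide):

$$\theta_X \sim \operatorname{Dirichlet}(\alpha_X,\dots,\alpha_X), \qquad \theta_C \sim \operatorname{Dirichlet}(\alpha_C,\dots,\alpha_C)$$

with θ_X the distribution over the s right-hand sides of X and θ_C the distribution over the m phrase-pair expansions of C. A very small α_C (the experiments use 1e-9) biases θ_C toward sparsity, so rare non-compositional phrase pairs survive only when the data demand them.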

Review: Inside-Outside Algorithm
[Figure: an HMM lattice X_1 … X_{n-1}, Z_n, X_{n+1} … X_N beside an ITG chart running from the root cell 0/0-T/V down to cells s/u-t/v.]
The forward-backward algorithm is not only used for HMMs but for any state-space model, and it is itself a special case of the inside-outside algorithm.
(Slide adapted from Shujie Liu.)
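Concretely (a sketch): write the HMM as a right-linear grammar with rules (X_i = k) → x_i (X_{i+1} = k′). Then, up to where the emission at position i is counted, the inside and outside scores of position i in state k are the HMM backward and forward variables:

$$\text{inside}_i(k) = P(x_i,\dots,x_N \mid q_i = k), \qquad \text{outside}_i(k) = P(x_1,\dots,x_{i-1},\, q_i = k)$$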

VB Algorithm for Training SITGs - E1
Inside probabilities. A cell s/u-t/v pairs source span s..t with target span u..v; S and U are source and target split points.
Initialization: β(C; s,t,u,v) = P(C → e_{s..t}/f_{u..v})
Recursion:
β(X; s,t,u,v) = Σ_{S,U} P(X → [Y Z]) β(Y; s,S,u,U) β(Z; S,t,U,v) + Σ_{S,U} P(X → ⟨Y Z⟩) β(Y; s,S,U,v) β(Z; S,t,u,U)
(Slide adapted from Shujie Liu.)
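A minimal sketch of this inside pass in Python, for a collapsed one-nonterminal ITG with fixed rule probabilities p_straight and p_inverted and a p_phrase callable for terminal phrase pairs. The actual model distinguishes X from C, works in log space, handles null links, and only visits unpruned cells; this is an illustration, not the paper's implementation.

```python
from collections import defaultdict

def spans(n):
    """All (start, end) pairs with 0 <= start < end <= n, shortest first."""
    return [(s, s + w) for w in range(1, n + 1) for s in range(n - w + 1)]

def inside(e, f, p_straight, p_inverted, p_phrase):
    """beta[s, t, u, v]: inside score of source span e[s:t] paired with
    target span f[u:v] under a one-nonterminal ITG."""
    beta = defaultdict(float)
    for s, t in spans(len(e)):             # shorter source spans first
        for u, v in spans(len(f)):
            # Terminal case: phrase-pair rule C -> e[s:t]/f[u:v].
            beta[s, t, u, v] = p_phrase(tuple(e[s:t]), tuple(f[u:v]))
            for S in range(s + 1, t):      # source split point
                for U in range(u + 1, v):  # target split point
                    # Straight rule [Y Z]: target order preserved.
                    beta[s, t, u, v] += (p_straight *
                        beta[s, S, u, U] * beta[S, t, U, v])
                    # Inverted rule <Y Z>: target order swapped.
                    beta[s, t, u, v] += (p_inverted *
                        beta[s, S, U, v] * beta[S, t, u, U])
    return beta
```

Without pruning this chart has O(n^4) cells and O(n^6) total work, which is why the tic-tac-toe pruning discussed below matters.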

VB Algorithm for Training SITGs - E2
Outside probabilities.
Initialization: α(root; 0,T,0,V) = 1
Recursion: a cell receives outside mass as the left child or the right child of each straight or inverted rule. For the straight rule X → [Y Z]:
α(Y; s,S,u,U) += P(X → [Y Z]) α(X; s,t,u,v) β(Z; S,t,U,v)
α(Z; S,t,U,v) += P(X → [Y Z]) α(X; s,t,u,v) β(Y; s,S,u,U)
and analogously for the inverted rule X → ⟨Y Z⟩ with the target subspans swapped.
(Slide adapted from Shujie Liu.)

VB Algorithm for Training SITGs - M
s = 3 is the number of right-hand sides for X; m is the number of observed phrase pairs; ψ is the digamma function.
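A sketch of the M-step update itself, following the standard mean-field VB update with Dirichlet priors (Kurihara and Sato's VB for PCFGs) that this line of work uses; c(·) denotes expected counts from the E-step:

$$\hat\theta_{X\to r} = \frac{\exp\psi\big(c(X\to r)+\alpha_X\big)}{\exp\psi\big(c(X)+s\,\alpha_X\big)}, \qquad \hat\theta_{C\to e/f} = \frac{\exp\psi\big(c(C\to e/f)+\alpha_C\big)}{\exp\psi\big(c(C)+m\,\alpha_C\big)}$$

Because ψ(x) < log(x), these weights sum to less than one, which is what penalizes large rule inventories relative to plain EM.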

Pruning
– Tic-tac-toe pruning (Hao Zhang 2005)
– Fast tic-tac-toe pruning (Hao Zhang 2008)
– High-precision alignment pruning (Haghighi ACL 2009): prune all bitext cells that would invalidate more than 8 of the high-precision alignment links.
– 1-1 alignment posterior pruning (Haghighi ACL 2009): prune all 1-1 bitext cells whose posterior falls below a threshold in both HMM models.

Tic-tac-toe pruning (Hao Zhang 2005)
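The illustration on this slide was an image. The idea, sketched in code: score each bitext cell by the product of an inside score on the words inside the span pair and an outside score on the words outside it, and keep only high-scoring cells. The `model1_score` helper is a hypothetical stand-in for an IBM Model 1 score over two bags of words.

```python
def keep_cell(e, f, s, t, u, v, model1_score, threshold):
    """Tic-tac-toe-style pruning sketch for bitext cell (s..t, u..v)."""
    inside_sc = model1_score(e[s:t], f[u:v])
    # "Outside" = everything before and after the cell in both strings.
    outside_sc = model1_score(e[:s] + e[t:], f[:u] + f[v:])
    return inside_sc * outside_sc >= threshold
```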

Non-compositional Phrases Constraint
e(i,j): the number of alignment links emitted from the source substring spanning positions i..j
f(l,m): the number of alignment links emitted from the target substring spanning positions l..m
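In code, the two counts are read directly off the Viterbi word alignment (a sketch; the inequality the constraint actually imposes on these counts is in the paper, not on this slide):

```python
def link_counts(links, i, j, l, m):
    """links: set of (src_pos, tgt_pos) pairs from the word-based ITG's
    Viterbi alignment. Returns e(i,j), the number of links emitted from
    source positions i..j, and f(l,m), likewise for target positions l..m."""
    e_ij = sum(1 for a, b in links if i <= a <= j)
    f_lm = sum(1 for a, b in links if l <= b <= m)
    return e_ij, f_lm
```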

Word Alignment Evaluation
Both models were trained for 10 iterations.
EM: the lowest AER is achieved after the second iteration; by iteration 10, the AER for EM increases to 0.42.
VB: with α_C = 1e-9, VB gets an AER close to 0.35 at iteration 10.
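For reference, AER here is the standard alignment error rate of Och and Ney (2003), with A the predicted links, S the sure gold links, and P ⊇ S the possible gold links:

$$\mathrm{AER}(A;S,P) = 1 - \frac{|A\cap S| + |A\cap P|}{|A| + |S|}$$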

End-to-end Evaluation
– NIST Chinese-English training data
– NIST 2002 evaluation datasets for tuning and evaluation
– The 10-reference development set was used for MERT
– The 4-reference test set was used for evaluation

Shortcomings
– The grammar is not perfect
– ITG ordering is context-independent
– Phrase pairs are sparse

Grammar is not perfect: the over-counting problem
Alternative ITG parse trees can produce exactly the same word alignment; this is called the over-counting problem, and it means EM spreads probability mass over redundant derivations. [Figure: the ITG parse-tree space mapping many-to-one onto the word-alignment space, illustrated on the sentence "I am rich !".] For example, a monotone three-word alignment is derived by both the left-branching tree [[C C] C] and the right-branching tree [C [C C]].

A better-constrained grammar
A series of nested constituents with the same orientation will always have a left-heavy derivation, so the second parse tree of the previous example is not generated. [Example from the slide: terminal rules C → 1/3, C → 2/4, C → 3/2, C → 4/1, combined by the straight rule A → [C C] and the inverted rule B → ⟨C C⟩.]
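One standard way to get such a canonical form (a sketch in the spirit of Wu's 1997 normal form; the slide's exact rule set may differ) splits the nonterminal by orientation and forbids a straight node as the right child of a straight rule, and an inverted node as the right child of an inverted rule:

$$\begin{aligned} S &\to A \mid B \mid C\\ A &\to [A\,B] \mid [A\,C] \mid [B\,B] \mid [B\,C] \mid [C\,B] \mid [C\,C]\\ B &\to \langle A\,A\rangle \mid \langle A\,C\rangle \mid \langle B\,A\rangle \mid \langle B\,C\rangle \mid \langle C\,A\rangle \mid \langle C\,C\rangle\\ C &\to e/f \end{aligned}$$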

Thanks! Q&A