Bayesian Word Alignment for Statistical Machine Translation
Authors: Coskun Mermer, Murat Saraclar
Presented by Jun Lang, I2R SMT Reading Group
Paper info
Bayesian Word Alignment for Statistical Machine Translation
ACL 2011 (short paper), with source code in Perl (379 lines)
Authors
–Coskun Mermer
–Murat Saraclar
Core Idea
Proposes a Gibbs sampler for fully Bayesian inference in IBM Model 1
Results
–Outperforms classical EM by up to 2.99 BLEU points
–Effectively addresses the rare-word problem
–Produces a much smaller phrase table than EM
Mathematics
(E, F): parallel corpus
e_i (f_j): the i-th (j-th) word of source (target) sentence e (f), which has I (J) words; E (F) denotes the whole corpus side
e_0: the "null" word, prepended to every source sentence
V_E (V_F): size of the source (target) vocabulary
a (A): alignment of a sentence (of the corpus)
a_j: target word f_j is aligned to source word e_{a_j}
T: table of word-translation parameters, of size V_E x V_F, with t_{e,f} = P(f|e)
IBM Model 1
Treats the translation table T as a random variable (rather than a point estimate, as in EM)
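For reference, the standard IBM Model 1 likelihood of a target sentence f and alignment a given source sentence e, written in the notation defined above (this form is not spelled out on the slide):

P(f, a | e; T) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} t_{e_{a_j}, f_j}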
Dirichlet Distribution
T = {t_{e,f}} is a collection of multinomial distributions, an exponential-family model
We therefore choose the conjugate prior, which in this case is the Dirichlet distribution, for computational convenience
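Concretely, the Dirichlet prior places the following density over each row t_e of T (standard form, with hyperparameters θ_{e,f}; spelled out here only as a reference):

p(t_e | \theta_e) = \frac{\Gamma(\sum_f \theta_{e,f})}{\prod_f \Gamma(\theta_{e,f})} \prod_{f=1}^{V_F} t_{e,f}^{\theta_{e,f} - 1}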
Dirichlet Distribution
Each source word type e gets its own distribution t_e over the target vocabulary, with a Dirichlet prior
Small (sparse) hyperparameters keep rare words from acting as "garbage collectors"
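The "computational convenience" is conjugacy: given the link counts N_{e,f} implied by the current alignment A, the posterior over each t_e is again Dirichlet (a standard result, added here as a bridging step):

t_e | A, E, F \sim \mathrm{Dirichlet}(\theta_{e,1} + N_{e,1}, \ldots, \theta_{e,V_F} + N_{e,V_F})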
Dirichlet Distribution
Gibbs sampling: sample the unknowns A and T in turn, each conditioned on the current value of the other
¬j denotes exclusion of the current value of a_j from the counts
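If T is integrated out, which is what the count-based ¬j notation points to, each a_j is resampled from a simple ratio of counts; this conditional is my reconstruction from the definitions above:

P(a_j = i | E, F, A^{¬j}; \theta) \propto \frac{N^{¬j}_{e_i, f_j} + \theta_{e_i, f_j}}{\sum_{f=1}^{V_F} \left( N^{¬j}_{e_i, f} + \theta_{e_i, f} \right)}

where N^{¬j}_{e,f} counts the e–f alignment links in the rest of the corpus.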
Algorithm
The initial alignment A can be arbitrary, but initializing from the output of classical EM works better
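Below is a minimal Perl sketch of such a collapsed Gibbs sampler on a toy corpus. It is a hypothetical illustration, not the paper's bayesalign.pl: the symmetric hyperparameter value, iteration count, and data are assumed for the example.

    #!/usr/bin/perl
    # Minimal sketch of a collapsed Gibbs sampler for IBM Model 1 alignment.
    # Hypothetical illustration only -- not the paper's bayesalign.pl.
    use strict;
    use warnings;

    my $theta = 0.0001;   # symmetric Dirichlet hyperparameter (assumed value)
    my $iters = 100;      # number of Gibbs sweeps (assumed value)

    # Toy parallel corpus; element 0 of every source sentence is the null word e_0.
    my @E = ( [ 'NULL', 'das', 'haus' ], [ 'NULL', 'das', 'buch' ] );
    my @F = ( [ 'the', 'house' ],        [ 'the', 'book'  ] );

    # Target vocabulary size V_F.
    my %vf; $vf{$_}++ for map { @$_ } @F;
    my $VF = scalar keys %vf;

    # Initialize alignments at random and collect link counts N(e,f).
    my (@A, %N, %Ntot);
    for my $s (0 .. $#F) {
        for my $j (0 .. $#{ $F[$s] }) {
            my $i = int rand( scalar @{ $E[$s] } );
            $A[$s][$j] = $i;
            $N{ $E[$s][$i] }{ $F[$s][$j] }++;
            $Ntot{ $E[$s][$i] }++;
        }
    }

    for (1 .. $iters) {
        for my $s (0 .. $#F) {
            for my $j (0 .. $#{ $F[$s] }) {
                my $f   = $F[$s][$j];
                my $old = $E[$s][ $A[$s][$j] ];
                # Exclude the current link a_j from the counts (the "¬j" step).
                $N{$old}{$f}--;
                $Ntot{$old}--;
                # P(a_j = i | rest) proportional to (N(e_i,f)+theta) / (N(e_i)+V_F*theta).
                my @p;
                my $z = 0;
                for my $i (0 .. $#{ $E[$s] }) {
                    my $e = $E[$s][$i];
                    $p[$i] = ( ($N{$e}{$f} // 0) + $theta )
                           / ( ($Ntot{$e}  // 0) + $VF * $theta );
                    $z += $p[$i];
                }
                # Draw the new a_j from the normalized distribution.
                my $r = rand($z);
                my $new = 0;
                while ($new < $#p && $r > $p[$new]) { $r -= $p[$new]; $new++; }
                $A[$s][$j] = $new;
                $N{ $E[$s][$new] }{$f}++;
                $Ntot{ $E[$s][$new] }++;
            }
        }
    }

    # Print the final alignment sample as target-word/source-position pairs.
    for my $s (0 .. $#F) {
        print join(' ', map { "$F[$s][$_]-$A[$s][$_]" } 0 .. $#{ $F[$s] }), "\n";
    }

Each sweep removes a link from the counts, rescores every candidate source position (including the null word at index 0), and draws a replacement; in practice one would initialize @A from EM output, as the slide suggests, and aggregate over many samples rather than keep only the last one.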
Results
Code
View bayesalign.pl
Conclusions
Outperforms classical EM by up to 2.99 BLEU points
Effectively addresses the rare-word problem
Produces a much smaller phrase table than EM
Shortcomings
–Too slow: 100 sentence pairs take 18 minutes
–Could perhaps be sped up by parallel computing