Improving IBM Word-Alignment Model 1 (Robert C. Moore)

Presentation transcript:

Improving IBM Word-Alignment Model 1 (Robert C. Moore)
- Nonstructural problems with IBM Model 1 cause alignment errors:
  - Rare words in the source language tend to act as "garbage collectors".
  - The null source word is aligned with too few target words.
- Changing the parameter estimation addresses these problems (the smoothing step is sketched below):
  - Smoothing translation counts with "add-n" smoothing.
  - Adding extra null words to the source sentence.
  - Initializing Model 1 with log-likelihood-ratio (LLR) statistics.
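
A minimal sketch of the "add-n" idea in the Model 1 M-step, assuming a dictionary of expected counts collected in the E-step; the names (counts, smoothed_m_step, n=0.005) are illustrative, not Moore's:

```python
def smoothed_m_step(counts, source_vocab, target_vocab, n=0.005):
    """Re-estimate t(f|e) with add-n smoothing over the target vocabulary."""
    V = len(target_vocab)
    t = {}
    for e in source_vocab:
        total = sum(counts.get((f, e), 0.0) for f in target_vocab)
        for f in target_vocab:
            # Smoothing keeps rare source words from soaking up probability
            # mass for unrelated target words (the "garbage collector" effect).
            t[(f, e)] = (counts.get((f, e), 0.0) + n) / (total + n * V)
    return t
```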

Improving IBM Word-Alignment Model 1 (Robert C. Moore)
- Evaluation shows that the changed parameter estimation helps reduce alignment error rate (AER; Och and Ney, 2003):
  - Each method individually reduces AER.
  - The combined model reduces AER by 30%.
- Conclusions:
  - A 30% reduction in AER is achieved simply by changing the parameter estimation.
  - LLR, compared with the Dice coefficient as the initializing statistic, addresses the over-fitting problem for rare words (definitions below).
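
For reference, the standard definitions behind these bullets; the notation (proposed links A, sure links S, possible links P, co-occurrence counts C) is assumed here, not taken from the slides:

```latex
% AER (Och & Ney, 2003): A = proposed links, S = sure links, P = possible links.
\mathrm{AER}(A; S, P) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}

% Dice coefficient vs. log-likelihood ratio (Dunning, 1993), computed from
% sentence-pair co-occurrence counts C(.):
\mathrm{Dice}(f, e) = \frac{2\,C(f, e)}{C(f) + C(e)}
\qquad
\mathrm{LLR}(f, e) = 2 \sum_{f' \in \{f,\neg f\}} \sum_{e' \in \{e,\neg e\}} C(f', e')\,\log\frac{p(f' \mid e')}{p(f')}
```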

Multi-Engine Machine Translation with Voted Language Model (Tadashi Nomoto)
- Describes an approach that:
  - takes the reliability of each component model in multi-engine MT (MEMT) into account;
  - uses a voting scheme to pick a language model (LM).
- Nomoto (2003) uses support vector regression (SVR) to exploit bias in the performance of MT systems.
- Voting for a language model (V-by-M): experiments show that the choice of LM does influence performance, and LM perplexity is a good predictor (a sketch of the selection step follows).
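
A minimal sketch of the final selection step this implies, assuming a hypothetical lm_logprob function that returns the log-probability of a whole sentence under the voted-in LM; this is illustrative, not Nomoto's code:

```python
import math

def perplexity(sentence: str, lm_logprob) -> float:
    """Per-word perplexity of a sentence under the LM."""
    n = max(len(sentence.split()), 1)
    return math.exp(-lm_logprob(sentence) / n)

def select_output(candidates: list[str], lm_logprob) -> str:
    """Keep the engine output the LM finds most fluent (lowest perplexity)."""
    return min(candidates, key=lambda c: perplexity(c, lm_logprob))
```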

Multi-Engine Machine Translation with Voted Language Model (Tadashi Nomoto)
- Experimental results:
  - The V-by-M scheme significantly improves the performance of MEMT.
  - V-by-M does not influence regressive MEMT systems as much as it influences plain MEMT systems.
  - Both MEMT and regressive MEMT outperform single MT systems.

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora (Chris Callison-Burch, David Talbot, Miles Osborne)
- Significant improvement can be achieved by including word-aligned data during training.
- The modified parameter-estimation approach:
  - Some sentence pairs come with explicit word-level alignments.
  - A mixed likelihood function combines the expected (incomplete) information in sentence-aligned pairs with the complete information in word-aligned pairs, as written out below.
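
One plausible way to write the mixed objective described above, with a weight lambda trading off the two data sources; the exact parameterization in the paper may differ:

```latex
% S = sentence-aligned pairs (alignment a is hidden and summed out),
% W = word-aligned triples (alignment a is observed).
\log L(\theta) =
  \lambda \sum_{(\mathbf{f},\mathbf{e}) \in S} \log \sum_{\mathbf{a}} p(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; \theta)
  \;+\; (1 - \lambda) \sum_{(\mathbf{f},\mathbf{a},\mathbf{e}) \in W} \log p(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; \theta)
```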

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora (Chris Callison-Burch, David Talbot, Miles Osborne)
- Adding word-aligned data reduces AER:
  - For IBM Models 1, 3, and 4 and the HMM model, adding word-aligned sentence pairs helps reduce AER.
  - The best model with word-aligned information (IBM Model 4) achieves a 38% reduction in AER over the best model without it (the HMM model).
  - Using word-aligned data also improves translation quality.
  - Increasing either the weight given to word-aligned data or its proportion of the training data further decreases AER.
- Discussion and future work:
  - Having annotators word-align existing sentence pairs is a much cheaper and more accurate way to improve a parallel corpus than commissioning professional translators.
  - Open question: which sentences in the training corpus should be word-aligned?

Aligning Words Using Matrix Factorization (Cyril Goutte, Kenji Yamada, Eric Gaussier)
- The paper:
  - views aligning the words of a sentence pair as Orthogonal Non-negative Matrix Factorization (ONMF);
  - develops an algorithm that performs ONMF;
  - improves in several ways over state-of-the-art results.
- The algorithm performs ONMF in two steps (a toy sketch follows):
  1. Factorize the translation matrix M using Probabilistic Latent Semantic Analysis (PLSA).
  2. Orthogonalize the factors using a maximum a posteriori (MAP) assignment of words to cepts.
- The number of cepts is estimated by maximizing AIC or BIC over the range 1 to min(I, J).
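
A toy numpy sketch of the two steps, under assumed naming (M is the I x J association matrix, K the number of cepts; none of this is the authors' code): PLSA is fit with standard EM, and the MAP step hard-assigns each word to its most probable cept.

```python
import numpy as np

def plsa(M, K, iters=50, seed=0):
    """Factor M ~ sum_k P(k) P(i|k) P(j|k) with EM (standard PLSA)."""
    rng = np.random.default_rng(seed)
    Pi = rng.random((M.shape[0], K)); Pi /= Pi.sum(0)  # P(i|k), columns sum to 1
    Pj = rng.random((M.shape[1], K)); Pj /= Pj.sum(0)  # P(j|k)
    Pk = np.full(K, 1.0 / K)                           # P(k)
    for _ in range(iters):
        # E-step: responsibility of each cept k for cell (i, j).
        R = Pi[:, None, :] * Pj[None, :, :] * Pk       # shape (I, J, K)
        R /= R.sum(-1, keepdims=True) + 1e-12
        # M-step: re-estimate the factors from expected counts.
        C = M[:, :, None] * R
        Pi = C.sum(1); Pi /= Pi.sum(0) + 1e-12
        Pj = C.sum(0); Pj /= Pj.sum(0) + 1e-12
        Pk = C.sum((0, 1)); Pk /= Pk.sum()
    return Pi, Pj, Pk

def map_orthogonalize(Pi, Pj):
    """MAP step: hard-assign each source/target word to its best cept,
    which makes the factors orthogonal (every word belongs to one cept)."""
    return Pi.argmax(1), Pj.argmax(1)

# In the full method, K itself would be chosen by maximizing AIC or BIC
# over K = 1 .. min(I, J), as the slide above notes.
```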

Aligning Words Using Matrix Factorization (Cyril Goutte, Kenji Yamada, Eric Gaussier)
- Results:
  - On the HLT-NAACL French-English task, better recall and F-score are achieved, though at the cost of lower precision.
  - On the Romanian-English task, the matrix factorization approach increases recall but hurts precision and AER.
  - On both tasks, the approach provides 100% coverage: every word is aligned.
- Discussion and conclusion:
  - Open problems: local optima in PLSA, and other ways of obtaining the original translation matrix M.
  - Matrix factorization does not improve AER, but it guarantees both proper alignments and good coverage.