Improving IBM Word-Alignment Model 1 (Robert C. Moore)

Presentation transcript:

Improving IBM Word-Alignment Model 1 (Robert C. Moore)
- Nonstructural problems with IBM Model 1 cause alignment errors:
  - Rare words in the source language tend to act as "garbage collectors".
  - The null source word is aligned with too few target words.
- Changing the parameter estimation addresses these problems (the smoothing step is sketched below):
  - Smoothing translation counts with "add-n" smoothing.
  - Adding extra null words to the source sentence.
  - Initializing Model 1 with log-likelihood-ratio (LLR) statistics.
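
A minimal sketch of the "add-n" idea in the Model 1 M-step, assuming a dictionary of expected counts collected in the E-step; the names (counts, smoothed_m_step, n=0.005) are illustrative, not Moore's:

```python
def smoothed_m_step(counts, source_vocab, target_vocab, n=0.005):
    """Re-estimate t(f|e) with add-n smoothing over the target vocabulary."""
    V = len(target_vocab)
    t = {}
    for e in source_vocab:
        total = sum(counts.get((f, e), 0.0) for f in target_vocab)
        for f in target_vocab:
            # Smoothing keeps rare source words from soaking up probability
            # mass for unrelated target words (the "garbage collector" effect).
            t[(f, e)] = (counts.get((f, e), 0.0) + n) / (total + n * V)
    return t
```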

Improving IBM Word-Alignment Model 1 (Robert C. Moore)
- Evaluation shows that the changed parameter estimation helps reduce alignment error rate (AER; Och and Ney, 2003):
  - Each method individually reduces AER.
  - The combined model reduces AER by 30%.
- Conclusions:
  - A 30% reduction in AER is achieved simply by changing the parameter estimation.
  - LLR, compared with the Dice coefficient as the initializing statistic, addresses the over-fitting problem for rare words (definitions below).
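
For reference, the standard definitions behind these bullets; the notation (proposed links A, sure links S, possible links P, co-occurrence counts C) is assumed here, not taken from the slides:

```latex
% AER (Och & Ney, 2003): A = proposed links, S = sure links, P = possible links.
\mathrm{AER}(A; S, P) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}

% Dice coefficient vs. log-likelihood ratio (Dunning, 1993), computed from
% sentence-pair co-occurrence counts C(.):
\mathrm{Dice}(f, e) = \frac{2\,C(f, e)}{C(f) + C(e)}
\qquad
\mathrm{LLR}(f, e) = 2 \sum_{f' \in \{f,\neg f\}} \sum_{e' \in \{e,\neg e\}} C(f', e')\,\log\frac{p(f' \mid e')}{p(f')}
```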

Multi-Engine Machine Translation with Voted Language Model (Tadashi Nomoto)
- Describes an approach that:
  - takes the reliability of each component model in multi-engine MT (MEMT) into account;
  - uses a voting scheme to pick a language model (LM).
- Nomoto (2003) uses support vector regression (SVR) to exploit bias in the performance of MT systems.
- Voting for a language model (V-by-M): experiments show that the choice of LM does influence performance, and LM perplexity is a good predictor (a sketch of the selection step follows).
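
A minimal sketch of the final selection step this implies, assuming a hypothetical lm_logprob function that returns the log-probability of a whole sentence under the voted-in LM; this is illustrative, not Nomoto's code:

```python
import math

def perplexity(sentence: str, lm_logprob) -> float:
    """Per-word perplexity of a sentence under the LM."""
    n = max(len(sentence.split()), 1)
    return math.exp(-lm_logprob(sentence) / n)

def select_output(candidates: list[str], lm_logprob) -> str:
    """Keep the engine output the LM finds most fluent (lowest perplexity)."""
    return min(candidates, key=lambda c: perplexity(c, lm_logprob))
```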

Multi-Engine Machine Translation with Voted Language Model (Tadashi Nomoto)
- Experimental results:
  - The V-by-M scheme significantly improves the performance of MEMT.
  - V-by-M does not influence regressive MEMT systems as much as it influences plain MEMT systems.
  - Both MEMT and regressive MEMT outperform single MT systems.

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora (Chris Callison-Burch, David Talbot, Miles Osborne)
- Significant improvement can be achieved by including word-aligned data during training.
- The modified parameter-estimation approach:
  - Some sentence pairs come with explicit word-level alignments.
  - A mixed likelihood function combines the expected (incomplete) information in sentence-aligned pairs with the complete information in word-aligned pairs, as written out below.
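
One plausible way to write the mixed objective described above, with a weight lambda trading off the two data sources; the exact parameterization in the paper may differ:

```latex
% S = sentence-aligned pairs (alignment a is hidden and summed out),
% W = word-aligned triples (alignment a is observed).
\log L(\theta) =
  \lambda \sum_{(\mathbf{f},\mathbf{e}) \in S} \log \sum_{\mathbf{a}} p(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; \theta)
  \;+\; (1 - \lambda) \sum_{(\mathbf{f},\mathbf{a},\mathbf{e}) \in W} \log p(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; \theta)
```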

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora (Chris Callison-Burch, David Talbot, Miles Osborne)
- Adding word-aligned data reduces AER:
  - For IBM Models 1, 3, and 4 and the HMM model, adding word-aligned sentence pairs helps reduce AER.
  - The best model with word-aligned information (IBM Model 4) achieves a 38% reduction in AER over the best model without it (the HMM model).
  - Using word-aligned data also improves translation quality.
  - Increasing either the weight given to word-aligned data or its proportion of the training data further decreases AER.
- Discussion and future work:
  - Having annotators word-align existing sentence pairs is a much cheaper and more accurate way to improve a parallel corpus than commissioning professional translators.
  - Open question: which sentences in the training corpus should be word-aligned?

Aligning Words Using Matrix Factorization (Cyril Goutte, Kenji Yamada, Eric Gaussier)
- The paper:
  - views aligning the words of a sentence pair as Orthogonal Non-negative Matrix Factorization (ONMF);
  - develops an algorithm that performs ONMF;
  - improves in several ways over state-of-the-art results.
- The algorithm performs ONMF in two steps (a toy sketch follows):
  1. Factorize the translation matrix M using Probabilistic Latent Semantic Analysis (PLSA).
  2. Orthogonalize the factors using a maximum a posteriori (MAP) assignment of words to cepts.
- The number of cepts is estimated by maximizing AIC or BIC over the range 1 to min(I, J).
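
A toy numpy sketch of the two steps, under assumed naming (M is the I x J association matrix, K the number of cepts; none of this is the authors' code): PLSA is fit with standard EM, and the MAP step hard-assigns each word to its most probable cept.

```python
import numpy as np

def plsa(M, K, iters=50, seed=0):
    """Factor M ~ sum_k P(k) P(i|k) P(j|k) with EM (standard PLSA)."""
    rng = np.random.default_rng(seed)
    Pi = rng.random((M.shape[0], K)); Pi /= Pi.sum(0)  # P(i|k), columns sum to 1
    Pj = rng.random((M.shape[1], K)); Pj /= Pj.sum(0)  # P(j|k)
    Pk = np.full(K, 1.0 / K)                           # P(k)
    for _ in range(iters):
        # E-step: responsibility of each cept k for cell (i, j).
        R = Pi[:, None, :] * Pj[None, :, :] * Pk       # shape (I, J, K)
        R /= R.sum(-1, keepdims=True) + 1e-12
        # M-step: re-estimate the factors from expected counts.
        C = M[:, :, None] * R
        Pi = C.sum(1); Pi /= Pi.sum(0) + 1e-12
        Pj = C.sum(0); Pj /= Pj.sum(0) + 1e-12
        Pk = C.sum((0, 1)); Pk /= Pk.sum()
    return Pi, Pj, Pk

def map_orthogonalize(Pi, Pj):
    """MAP step: hard-assign each source/target word to its best cept,
    which makes the factors orthogonal (every word belongs to one cept)."""
    return Pi.argmax(1), Pj.argmax(1)

# In the full method, K itself would be chosen by maximizing AIC or BIC
# over K = 1 .. min(I, J), as the slide above notes.
```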

Aligning Words Using Matrix Factorization (Cyril Goutte, Kenji Yamada, Eric Gaussier)
- Results:
  - On the HLT-NAACL French-English task, better recall and F-score are achieved, though at the cost of lower precision.
  - On the Romanian-English task, the matrix factorization approach increases recall but hurts precision and AER.
  - On both tasks, the approach provides 100% coverage: every word is aligned.
- Discussion and conclusion:
  - Open problems: local optima in PLSA, and other ways of obtaining the original translation matrix M.
  - Matrix factorization does not improve AER, but it guarantees both proper alignments and good coverage.