LREC 2008, Marrakech, 29 May 2008. Caroline Lavecchia, Kamel Smaïli and David Langlois, LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France. Phrase-Based Machine Translation based on Simulated Annealing.

Presentation transcript:

Phrase-Based Machine Translation based on Simulated Annealing
Caroline Lavecchia, Kamel Smaïli and David Langlois
LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France
LREC 2008, Marrakech, 29 May 2008

Outline
- Statistical Machine Translation (SMT)
- Concept of inter-lingual triggers
- Our SMT system based on inter-lingual triggers
  - Word-based approach
  - Phrase-based approach using the Simulated Annealing (SA) algorithm
- Experiments
- Conclusion

Statistical Machine Translation: introduction
Given a source sentence S, find the best target sentence T* which maximizes the probability P(T|S):
$T^* = \arg\max_T P(T \mid S)$
→ Noisy channel approach:
$T^* = \arg\max_T P(T) \, P(S \mid T)$
where P(T) is the language model and P(S|T) is the translation model.
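The step between the two formulas is an application of Bayes' rule. A short derivation, using only the symbols from the slide:

```latex
\begin{align*}
T^* &= \arg\max_T P(T \mid S) \\
    &= \arg\max_T \frac{P(T)\, P(S \mid T)}{P(S)} \\
    &= \arg\max_T P(T)\, P(S \mid T)
\end{align*}
```

The denominator P(S) is the same for every candidate T, so it does not change which T attains the maximum and can be dropped.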

Statistical Machine Translation: approaches
Word-based approach
- The translation process is done word by word
- IBM models (Brown et al., 1993)
Phrase-based approach (Och et al., 1999), (Yamada and Knight, 2001), (Marcu and Wong, 2002)
- Better MT system quality
- Advantages:
  - Explicitly models lexical units. Ex: rat de bibliothèque → bookworm
  - Easily captures local reordering. Ex: Tour Eiffel → Eiffel tower

A new translation model based on inter-lingual triggers
- Current translation models are complex.
- Their estimation needs a lot of time and memory.
- We propose a new translation model based on a simple concept: triggers.

Concept of inter-lingual triggers: review of triggers
Triggers in statistical language modeling:
- A trigger is a set composed of a word and its best correlated words.
- Triggers are determined by computing Mutual Information (MI) between words on a monolingual corpus.
- Example: Gary Kasparov is a chess champion
- In statistical language modeling, triggers make it possible to increase the probability of triggered words given a triggering word.

Concept of inter-lingual triggers: inter-lingual triggers
- An inter-lingual trigger is a set composed of a source unit s and its best correlated target units t_1, …, t_n.
- Inter-lingual triggers are determined by computing Mutual Information (MI) between units on a bilingual aligned corpus.
- Example: Gary Kasparov is a chess champion / Gary Kasparov est un champion d’échecs
- We hope to find possible translations of s among the set of its triggered target units t_1, …, t_n.
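The slide does not spell out how MI is computed, so the following is only a minimal sketch under stated assumptions: probabilities are relative frequencies of sentence-level (co-)occurrence in the aligned corpus, and MI(s, t) = P(s, t) · log(P(s, t) / (P(s) · P(t))). The function name `interlingual_triggers` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def interlingual_triggers(pairs, k=2):
    """Return, for each source word, its k target words with the highest MI.

    pairs: list of (source_sentence, target_sentence) strings, aligned 1-to-1.
    Assumption (not from the slides): probabilities are relative frequencies
    of sentence-level (co-)occurrence, and
    MI(s, t) = P(s, t) * log(P(s, t) / (P(s) * P(t))).
    """
    n = len(pairs)
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint_count.update((s, t) for s in src_words for t in tgt_words)

    scored = {}
    for (s, t), c in joint_count.items():
        p_st = c / n
        p_s, p_t = src_count[s] / n, tgt_count[t] / n
        mi = p_st * math.log(p_st / (p_s * p_t))
        scored.setdefault(s, []).append((mi, t))

    # Sort by decreasing MI, breaking ties alphabetically for determinism.
    return {
        s: [t for _, t in sorted(cands, key=lambda x: (-x[0], x[1]))[:k]]
        for s, cands in scored.items()
    }


# Toy aligned corpus (made up for illustration).
toy_pairs = [("le chat", "the cat"), ("le chien", "the dog"), ("un chat", "a cat")]
toy_triggers = interlingual_triggers(toy_pairs, k=2)
```

On this toy corpus the best trigger of "chat" is "cat" and the best trigger of "chien" is "dog", which is the behaviour the slide hopes for: translations show up among the triggered target units.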

Concept of inter-lingual triggers: 1-To-1 and n-To-m triggers
Source: Gary Kasparov est un champion d’échecs
Target: Gary Kasparov is a chess champion
1-To-1 triggers: 1 source word triggers 1 target word.
- Recoverable examples: Kasparov → chess, échecs → chess (the other pairs of this figure were not captured)
n-To-m triggers: n source words trigger m target words.
- champion d’échecs → chess champion
- un champion → a champion
- Kasparov est un → is a chess
- échecs → chess champion
- champion → is a chess
- Gary Kasparov → (target not captured)

SMT based on inter-lingual triggers
How to make good use of inter-lingual triggers in order to estimate a translation model?
- Word-based translation model using 1-To-1 triggers
- Phrase-based translation model using n-To-m triggers

SMT based on inter-lingual triggers: word-based translation model using 1-To-1 triggers
- For each source word, we keep its k best 1-To-1 triggers. We hope these constitute its potential translations.
- Translation model: we assign to each inter-lingual trigger a probability (the formula shown on the slide was not captured in this transcript).
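Since the probability formula is missing from the transcript, the sketch below only illustrates one plausible choice: normalizing the MI of each kept trigger by the sum of MI over the k best triggers of the same source word. This normalization is an assumption, not the authors' stated formula, and the function name and MI values are hypothetical.

```python
def trigger_probabilities(mi_scores, k):
    """Turn MI scores of one source word into a distribution over its k best triggers.

    mi_scores: dict mapping target word -> MI with the source word.
    Assumption: the k best scores are positive, so the normalized values
    are valid probabilities.
    """
    best = sorted(mi_scores.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(score for _, score in best)
    return {word: score / total for word, score in best}


# Hypothetical MI values for the source word "échecs".
probs = trigger_probabilities(
    {"chess": 0.30, "champion": 0.10, "the": 0.02, "is": 0.08}, k=3
)
```

With k = 3 the low-scoring trigger "the" is discarded and the remaining three probabilities sum to 1.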

SMT based on inter-lingual triggers: phrase-based translation model using n-To-m triggers
Motivations:
- Most methods for learning phrase translations require word alignments.
- All phrase pairs that are consistent with this word alignment are collected
  → phrases with no linguistic motivation
  → noisy phrases

SMT based on inter-lingual triggers: method for learning phrase translation
1. Extract phrases from the source corpus
2. Determine potential translations of the source phrases by using n-To-m triggers
3. Start with 1-To-1 triggers to set a baseline MT system
4. Select an optimal subset of n-To-m triggers with the Simulated Annealing algorithm

Method for learning phrase translation: source phrase extraction
- Iterative process which selects phrases by grouping words with high Mutual Information (Zitouni et al., 2003).
- Only the phrases which improve the perplexity on the source corpus are kept.
  → pertinent source phrases

Method for learning phrase translation: determining potential phrase translations
- A source phrase can be translated by different target sequences of variable sizes.
- Assumption: each source phrase of l words can be translated by a sequence of j target words, where j ∈ [l-Δl, l+Δl].
- For each source phrase of length l, the potential translations are sets of n-To-m triggers with n = l and m ∈ [l-Δl, l+Δl].

Method for learning phrase translation: example
Source phrase: porter plainte (l=2)
We assume that porter plainte can be translated by sequences of 1, 2 or 3 target words (Δl=1).
Potential translations of porter plainte:
2-To-1      2-To-2           2-To-3
press       press charges    can press charges
charges     can press        not press charges
easy        not press        you can press
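The length window can be sketched in a few lines. In the slides the candidates are n-To-m triggers ranked by MI over the whole corpus; the sketch below only enumerates contiguous target word sequences of the allowed lengths from a single aligned sentence, which is a simplifying assumption. The function names and the target sentence "you can press charges" are made up for illustration.

```python
def target_lengths(l, delta):
    """Lengths m allowed for a source phrase of l words: m in [l - delta, l + delta], m >= 1."""
    return list(range(max(1, l - delta), l + delta + 1))


def candidate_targets(target_sentence, l, delta):
    """Enumerate contiguous target word sequences whose length is in the allowed window."""
    words = target_sentence.split()
    candidates = []
    for m in target_lengths(l, delta):
        candidates.extend(
            " ".join(words[i:i + m]) for i in range(len(words) - m + 1)
        )
    return candidates


# Source phrase "porter plainte" has l = 2; with delta = 1 the window is 1..3 words.
lengths = target_lengths(2, 1)
candidates = candidate_targets("you can press charges", l=2, delta=1)
```

Several entries of the table above ("press charges", "can press charges", "you can press") appear among the candidates generated from this one sentence; ranking them by MI is what separates pertinent translations from noisy ones.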

Method for learning phrase translation: general case
- All source phrases, each associated with its k potential translations, constitute the set of n-To-m triggers.
- We have to select the pertinent translations among the n-To-m triggers and discard the noisy ones.
- Our problem: find an optimal subset of phrase translations which leads to the best MT performance.
- It is unreasonable to try all possibilities!
  → Proposed method: use the Simulated Annealing algorithm
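To see why exhaustive search is out of reach, note that each candidate phrase translation is either kept or discarded. Writing N for the number of candidates (a symbol introduced here, not used on the slide), the number of subsets is:

```latex
\[
\#\text{subsets} = 2^{N}, \qquad \text{e.g. } N = 100 \;\Rightarrow\; 2^{100} \approx 1.27 \times 10^{30}
\]
```

Evaluating a single subset means building the system and measuring its translation performance, so even a hundred candidates already rule out trying every combination.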

Method for learning phrase translation: Simulated Annealing
Technique applied to find an optimal solution to a combinatorial problem.
[Flowchart, box labels as captured: Initial temperature; Initial configuration; Perturb the configuration; Accept new configuration? (yes/no); Update current configuration; Adjust temperature; Terminate search? (yes/no); Stop]

Method for learning phrase translation: SA algorithm applied to SMT
1. Start with a high temperature T and a baseline word-based MT system using 1-To-1 triggers.
2. Repeat:
  a) Perturb the system from state i to state j by randomly adding a subset of n-To-m triggers to the current SMT system.
  b) Evaluate the performance of the new system ($E_j$).
  c) If $E_j > E_i$, move from state i to state j. Otherwise, accept state j with a probability: draw a random P ∈ [0, 1] and accept if $P < e^{(E_i - E_j)/T}$.
  until the performance of the SMT system stops increasing.
3. Decrease the temperature and go to step 2, until the performance of the system stops increasing.
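The loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `evaluate` stands in for "decode the development set and compute Bleu", and the cooling factor, batch size, patience and stopping temperature are hypothetical parameters. The acceptance test uses exp((E_j - E_i)/T), the usual Metropolis form when a higher score is better, because the exponent must be non-positive for the acceptance probability to stay below 1; this reading of step 2c is an assumption.

```python
import math
import random


def simulated_annealing(candidates, evaluate, temperature=1.0, cooling=0.5,
                        batch_size=2, min_temperature=1e-3, patience=5, seed=0):
    """Select a subset of candidates that (approximately) maximizes evaluate(subset).

    All parameter names and default values are hypothetical.
    """
    rng = random.Random(seed)
    current = frozenset()              # baseline system: no n-To-m trigger added yet
    current_score = evaluate(current)
    best, best_score = current, current_score

    while temperature > min_temperature:
        stalled = 0
        while stalled < patience:      # inner loop: until the score stops increasing
            remaining = sorted(set(candidates) - current)
            if not remaining:
                break
            # Perturbation: randomly add a small batch of unused candidates.
            added = rng.sample(remaining, min(batch_size, len(remaining)))
            new = current | frozenset(added)
            new_score = evaluate(new)
            if new_score > current_score:
                accept = True
            else:
                # Worse (or equal) state: accept with probability exp((E_j - E_i) / T).
                accept = rng.random() < math.exp((new_score - current_score) / temperature)
            if accept:
                current, current_score = new, new_score
            if current_score > best_score:
                best, best_score = current, current_score
                stalled = 0
            else:
                stalled += 1
        temperature *= cooling         # decrease the temperature and start again
    return best, best_score


def toy_score(subset):
    """Toy stand-in for Bleu: reward two 'good' candidates, penalize the others."""
    good = {"a", "b"}
    return len(subset & good) - 0.5 * len(subset - good)


toy_best, toy_best_score = simulated_annealing(["a", "b", "x", "y"], toy_score)
```

Because the perturbation only adds triggers, as on the slide, a configuration can grow but never shrink; the sketch therefore keeps track of the best configuration seen so far and returns it rather than the last accepted one.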

[Figure: SA loop applied to the SMT system. The initial system (decoder with a translation model built from 1-To-1 triggers and a language model, from text input to text output) gives Bleu initial. The current system is perturbed by adding a subset of n-To-m triggers, which gives a new system scored as Bleu new; Bleu new is then compared with Bleu current (branches: Bleu new > Bleu current, Bleu new ≤ Bleu current).]

Experiments: subtitle corpora
Subtitle parallel corpora built using the Dynamic Time Warping algorithm (Lavecchia et al., 2007).
Corpus statistics table (French and English columns; most values were not captured in this transcript):
- Train: Sentences, Words, Singletons, Vocabulary
- Dev: Sentences 1959, Words
- Test: Sentences 756, Words

Experiments: tuning step, SA algorithm parameters
- 1-To-1 triggers: each source word is associated with its 50 best target words
- n-To-m triggers:
  - 15860 source phrases
  - each source phrase is associated with its 30 best n-To-1, n-To-2 and n-To-3 inter-lingual triggers
- Initial temperature: (value not captured)
- System perturbation: adding 10 potential translations of 10 source phrases

Experiments: tuning step, initial system
[Figure: text input → Pharaoh decoder → text output, using a word translation model and a language model (trigram model).]
Results table (columns: Translation model, tm, lm, d, w, Bleu; values not captured), with rows:
- 1-To-1 triggers
- IBM M3 (Brown et al., 1993)

Experiments: tuning step, final system
[Figure: text input → Pharaoh decoder → text output, using a phrase translation model and a language model (trigram model).]
Results table (columns: Translation model, tm, lm, d, w, Bleu; values not captured), with rows:
- optimal n-To-m triggers
- Reference (Och, 2002)

Experiments: evaluation of the final system
Results table (Dev and Test rows; values not captured), with columns:
- Inter-lingual triggers: 1-To-1, n-To-m
- State of the art: IBM3, Reference
The lead of n-To-m triggers over 1-To-1 triggers is not corroborated on the test corpus. Explanations:
- Over-fitting due to the small amount of data
- Corpora of different movie styles
The impact of over-fitting is greater on the state-of-the-art systems.

Conclusion and future work
A new method for learning phrase translations:
1. Extract source phrases
2. Find phrase translations using inter-lingual triggers
3. Select the pertinent ones using the SA algorithm
Advantages: no word alignment + more pertinent phrase translations
Experiments on movie subtitle corpora:
- More robust on sparse data than a state-of-the-art approach
- Better translation quality in terms of Bleu score (+7pts dev., +4pts test)
Improvement of our system:
- Classify movies
- Integrate linguistic knowledge in the translation process
- Consider inter-lingual triggers not only on word surface forms