A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages — Minh-Thang Luong, Preslav Nakov & Min-Yen Kan

Presentation transcript:

A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
Minh-Thang Luong, Preslav Nakov & Min-Yen Kan
EMNLP 2010, Massachusetts, USA

epäjärjestelmällisyydellistyttämättömyydellänsäkäänköhän
This is a Finnish word! 56 characters.
epä+ järjestelmä+ llisyy+ dellisty+ ttä+ mättö+ myy+ dellä+ nsä+ kään+ kö+ hän
"I wonder if it's not with his act of not having made something be seen as unsystematic"
First of all, does anyone know what this is? It looks like random characters, but it is actually a 56-character-long Finnish word. Here is how it breaks into morphemes, the minimal meaning-bearing units of language, together with the meaning of the individual morphemes. Such productive morphology yields a huge number of distinct word forms.

Morphologically Rich Languages Are Hard for MT
epäjärjestelmällisyydellistyttämättömyydellänsäkäänköhän
epä+ järjestelmä+ llisyy+ dellisty+ ttä+ mättö+ myy+ dellä+ nsä+ kään+ kö+ hän
"I wonder if it's not with his act of not having made something be seen as unsystematic"
Morphologically rich languages (Arabic, Basque, Turkish, Russian, Hungarian, etc.) make extensive use of affixes, which yields a huge number of distinct word forms. Morphologically rich languages are hard for MT, so analysis at the morpheme level is needed.

Morphological Analysis Helps – Translation into Morphologically Poor Languages
A morpheme representation alleviates data sparseness: Arabic → English (Lee, 2004), Czech → English (Goldwater & McClosky, 2005), Finnish → English (Yang & Kirchhoff, 2006). For large corpora, a word representation captures context better: Arabic → English (Sadat & Habash, 2006), Finnish → English (de Gispert et al., 2009).
So the effect of morpheme representations diminishes with more data. Note also that all of these systems translate from morphologically rich languages into morphologically poor ones like English; we are interested in the reverse direction.
Our approach: the basic unit of translation is the morpheme, but word boundaries are respected at all MT stages.

Morphological Analysis Helps – Translation into Morphologically Rich Languages
This is a much harder, challenging translation direction, with recent interest in: English → Arabic (Badr et al., 2008) and English → Turkish (Oflazer and El-Kahlout, 2007), which enhance performance for small bi-texts only; and English → Greek (Avramidis and Koehn, 2008) and English → Russian (Toutanova et al., 2008), which rely heavily on language-specific tools.
We want an unsupervised approach that works for large training bi-texts.

Methodology
(1) Morphological analysis: unsupervised.
(2)–(4) Morphological enhancements that respect word boundaries: phrase extraction, MERT tuning, decoding.
(5) Translation model enrichment: merging phrase tables (PTs).
Here is a sketch of our system. The morphological analysis breaks words into morphemes, and everything then operates at the morpheme level. After IBM Model 4 alignment training, we obtain the alignment. Next come our morphological enhancements: we impose word boundaries in phrase extraction, MERT tuning, and decoding. Last is the translation model enrichment, where we merge phrase tables at the word and morpheme levels; this step is orthogonal to the above enhancements.

1) Morphological Analysis
We use Morfessor (Creutz and Lagus, 2007), an unsupervised morphological analyzer, to segment words into morphemes tagged as prefix (PRE), stem (STM), or suffix (SUF):
un/PRE+ care/STM+ ful/SUF+ ly/SUF
The "+" sign is used to enforce word boundary constraints later.
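To make the representation concrete, here is a minimal sketch (our illustration, not the authors' code) of how the tagged morphemes can be joined back into surface words; a trailing "+" marks that the word continues with the next morpheme:

```python
def desegment(morpheme_tokens):
    """Join tagged morphemes (e.g. 'un/PRE+') back into surface words."""
    words, current = [], []
    for tok in morpheme_tokens:
        surface = tok.split('/')[0]    # drop the /PRE, /STM, /SUF tag
        current.append(surface)
        if not tok.endswith('+'):      # no '+' means word-final morpheme
            words.append(''.join(current))
            current = []
    return ' '.join(words)

print(desegment('un/PRE+ care/STM+ ful/SUF+ ly/SUF'.split()))  # -> 'uncarefully'
```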

2) Word Boundary-aware Phrase Extraction (Training)
SRC = theSTM newSTM , unPRE+ democraticSTM immigrationSTM policySTM
TGT = uusiSTM , epäPRE+ demokraatSTM+ tSUF+ iSUF+ sSUF+ enSUF maahanmuuttoPRE+ politiikanSTM
(uusi = new, epädemokraattisen = undemocratic, maahanmuuttopolitiikan = immigration policy)
Typical SMT extracts phrases from the alignments up to a maximum phrase length of n = 7 words. At the morpheme level this causes two problems: morpheme phrases of length n can span fewer than n words, and they may only partially span words. For example, the single Finnish word epädemokraattisen already spans 6 morphemes, so a phrase of 7 morphemes here spans fewer than 2 words; and a phrase that ends somewhere in the middle of a word is spurious. Both problems are severe for morphologically rich languages with many affixes.
Solution: morpheme phrases may span up to n words but must fully span words (see the sketch below). While we may miss the opportunity to generate new wordforms for known baseforms, we avoid proposing nonwords in the target language.
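A minimal sketch of the word-boundary constraint, under our reading of the slide (the function and constant names are illustrative, not the paper's implementation):

```python
MAX_WORDS = 7   # maximum phrase length, counted in words rather than morphemes

def continues(tok):
    """A trailing '+' means the word continues with the next morpheme."""
    return tok.endswith('+')

def valid_phrase(tokens, start, end):
    """Keep the candidate morpheme phrase tokens[start:end] only if it starts
    at a word beginning, ends at a word end, and spans at most MAX_WORDS words."""
    if start > 0 and continues(tokens[start - 1]):
        return False                     # starts mid-word
    if continues(tokens[end - 1]):
        return False                     # ends mid-word
    n_words = sum(1 for t in tokens[start:end] if not continues(t))
    return n_words <= MAX_WORDS

tgt = ('uusiSTM , epäPRE+ demokraatSTM+ tSUF+ iSUF+ sSUF+ enSUF '
       'maahanmuuttoPRE+ politiikanSTM').split()
print(valid_phrase(tgt, 2, 8))   # True: fully spans 'epädemokraattisen'
print(valid_phrase(tgt, 2, 7))   # False: ends in the middle of a word
```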

3) Morpheme MERT Optimizing Word BLEU (Tuning)
Morpheme systems naturally perform MERT at the morpheme level, essentially optimizing morpheme BLEU. Why is this suboptimal? BLEU's brevity penalty is influenced by sentence length, and the same number of words can span very different numbers of morphemes (e.g., the 6-morpheme Finnish word epäPRE+ demokraatSTM+ tSUF+ iSUF+ sSUF+ enSUF, while other words span 1, 2, or 3 morphemes). This leads to a suboptimal weight for the SMT word penalty feature.
Solution: optimize word BLEU. In each MERT iteration: decode at the morpheme level, convert the morpheme translation into a word sequence, compute word BLEU, and convert back to morphemes.
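In code, the tuning criterion amounts to de-segmenting before scoring; a hedged sketch (sacrebleu and the desegment helper defined above are our illustrative choices, not the original tooling):

```python
import sacrebleu  # any corpus-level BLEU implementation would do

def word_bleu(morpheme_hypotheses, word_references):
    """Score morpheme-level decoder output against word-level references."""
    word_hyps = [desegment(hyp.split()) for hyp in morpheme_hypotheses]
    return sacrebleu.corpus_bleu(word_hyps, [word_references]).score
```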

4) Decoding with Twin Language Models (Decoding)
A morpheme language model (LM) alleviates data sparseness (pro), but its n-grams span fewer words (con). We therefore introduce a second LM at the word level: in the log-linear model it is added as a separate feature, and in the Moses decoder it adds a word-level "view" on the morpheme-level hypotheses.
Example (previous hypotheses + current hypothesis):
uusiSTM , epäPRE+ demokraatSTM+ tSUF+ iSUF+ sSUF+ enSUF maahanmuuttoPRE+ politiikanSTM
Morpheme LM (3-gram): "sSUF+ enSUF maahanmuuttoPRE+" ; "enSUF maahanmuuttoPRE+ politiikanSTM"
Word LM: uusi , epädemokraattisen maahanmuuttopolitiikan
Scoring with multiple LMs, e.g. in Moses, requires the same token granularity; the factored translation model (Koehn & Hoang, 2007) requires a 1-1 correspondence among factors; and n-best rescoring with a word LM (Oflazer & El-Kahlout, 2007) is inferior. Our method is (1) different from scoring with two word-level LMs and (2) superior to n-best rescoring.
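A minimal sketch of the word-level "view" (our illustration of the idea, assuming the tag-and-'+' encoding from the earlier slides): completed words are rebuilt from the morpheme hypothesis and handed to the word LM, while the still-open word at the end is held back until it is finished:

```python
import re

TAG = re.compile(r'(PRE|STM|SUF)\+?$')   # strip the tag and optional '+'

def word_view(morpheme_tokens):
    """Return (completed_words, pending_morphemes) for a partial hypothesis."""
    words, current = [], []
    for tok in morpheme_tokens:
        current.append(TAG.sub('', tok))
        if not tok.endswith('+'):        # word-final morpheme closes the word
            words.append(''.join(current))
            current = []
    return words, current

hyp = ('uusiSTM , epäPRE+ demokraatSTM+ tSUF+ iSUF+ sSUF+ enSUF '
       'maahanmuuttoPRE+').split()
print(word_view(hyp))
# -> (['uusi', ',', 'epädemokraattisen'], ['maahanmuutto'])
```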

5) Building Twin Translation Models (Enriching)
Our translation model enrichment is independent of and orthogonal to the previous morphological enhancements. From the same parallel corpus we build two phrase tables in the typical SMT way (GIZA++ alignment, then phrase extraction): one at the word level (PTw) and one at the morpheme level (PTm), e.g.:
PTw: problems ||| vaikeuksista
PTm: problemSTM+ sSUF ||| vaikeuSTM+ ksiSUF+ staSUF and problemSTM+ sSUF ||| ongelmaSTM+ tSUF
Here vaikeuksista means "difficulties", whereas ongelmat means exactly "problems". To obtain translation evidence from both tables, we segment PTw into morphemes, yielding PTw→m, and merge it with PTm for decoding.

Phrase Table (PT) Merging
Each PT entry carries phrase translation probabilities, lexicalized translation probabilities, and a phrase penalty:
problemSTM+ sSUF ||| vaikeuSTM+ ksiSUF+ staSUF ||| 0.07 0.11 0.01 0.01 2.7
problemSTM+ sSUF ||| ongelmaSTM+ tSUF ||| 0.37 0.60 0.11 0.14 2.7
Existing merging strategies: add-feature methods (e.g., Chen et al., 2009) are heuristic-driven, marking each entry with indicator features for the table it came from:
problemSTM+ sSUF ||| vaikeuSTM+ ksiSUF+ staSUF ||| 0.07 0.11 0.01 0.01 2.7 2.7 1   (in the 1st PT)
problemSTM+ sSUF ||| ongelmaSTM+ tSUF ||| 0.37 0.60 0.11 0.14 2.7 1 2.7   (in the 2nd PT)
Interpolation-based methods (e.g., Wu & Wang, 2007) linearly interpolate the phrase and lexicalized translation probabilities; they suit two PTs originating from different sources. In contrast, our twin translation models come from the same source and are of equal quality, which we take into account.

Our Method: Phrase Translation Probabilities
We preserve the normalized ML estimations (Koehn et al., 2003): we use the raw counts of both models, i.e., the number of times each phrase pair was extracted from the training dataset, pooling the counts of PTm and PTw→m. As much as possible, we refrain from interpolation.
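A hedged sketch of this count pooling, under our reading of "preserve the normalized ML estimations from raw counts" (the exact bookkeeping in the paper may differ):

```python
from collections import defaultdict

def merge_phrase_probs(counts_m, counts_wm):
    """counts_*: dicts mapping (src, tgt) phrase pairs to extraction counts,
    one per phrase table (PTm and PTw->m). Returns p(tgt | src) as the
    ML estimate over the pooled counts."""
    joint = defaultdict(float)
    marginal = defaultdict(float)
    for counts in (counts_m, counts_wm):
        for (src, tgt), c in counts.items():
            joint[(src, tgt)] += c
            marginal[src] += c
    return {pair: c / marginal[pair[0]] for pair, c in joint.items()}
```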

Lexicalized Translation Probabilities
Here we use linear interpolation. What if a phrase pair belongs to one PT only? Previous methods interpolate with 0, which might cause good phrases to be penalized just because they were not extracted from both tables; we want to prevent that. Our method induces all scores before interpolation: we use the lexical model of one PT to score the phrase pairs of the other (recall that our PTs are generated from the same source). For example, the pair problemSTM+ sSUF ||| ongelmaSTM+ tSUF is scored by the morpheme lexical model of PTm directly, and by the word lexical model of PTw as (problems | ongelmat).
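The key point in sketch form (score_morpheme, score_word, and the equal weights are assumptions of ours, standing in for the two lexical models): every pair receives a score from both models, so nothing is ever interpolated with zero:

```python
def merged_lex_score(pair, score_morpheme, score_word, lam=0.5):
    """pair: a morpheme-level phrase pair; score_* are callables backed by
    the morpheme and word lexical models. Missing scores are induced by
    the other table's model rather than set to 0 before interpolating."""
    return lam * score_morpheme(pair) + (1.0 - lam) * score_word(pair)
```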

Dataset & Settings (Experiments)
Dataset: the past shared task WPT05 (en/fi), 714K sentence pairs, with subsets T1, T2, T3, and T4 of sizes 40K, 80K, 160K, and 320K.
Standard phrase-based SMT settings: Moses, IBM Model 4, case-insensitive BLEU.

SMT Baseline Systems
        w-system          m-system
        BLEU    m-BLEU    BLEU    m-BLEU
T1      11.56   45.57     11.07   49.15
T2      12.95   48.63     12.68   53.78
T3      13.64   50.30     13.32   54.40
T4      14.20   50.85     13.57   54.70
Full    14.58   53.05     14.08   55.26
w-system: word level; m-system: morpheme level; m-BLEU: morpheme version of BLEU.
We have two baseline systems, one at the word level and one at the morpheme level. The w-system outperforms the m-system on BLEU. This is not very surprising, but it is more extreme than in previous work, where morpheme-level systems are competitive at small training data sizes; here, for a highly inflected language, the m-system struggles. Looking at m-BLEU, the picture reverses: either the m-system genuinely does not perform as well as the w-system, or BLEU is not capable of measuring morpheme-level improvements.

Morphological Enhancements: Individual
System     T1 (40K)        Full (714K)
w-system   11.56           14.58
m-system   11.07           14.08
m+phr      11.44 (+0.37)   14.43 (+0.35)
m+tune     11.73 (+0.66)   14.55 (+0.47)
m+lm       11.58 (+0.51)   14.53 (+0.45)
phr: boundary-aware phrase extraction (allows longer morpheme phrases, as long as they span no more than 7 words); tune: MERT over morpheme translations while optimizing word BLEU; lm: decoding with twin LMs (5-gram morpheme LM & 4-gram word LM). Gains in parentheses are over the m-system.
The individual enhancements yield improvements for both small and large corpora.

Morphological Enhancements: Combined
System          T1 (40K)        Full (714K)
w-system        11.56           14.58
m-system        11.07           14.08
m+phr+lm        11.77 (+0.70)   14.58 (+0.50)
m+phr+lm+tune   11.90 (+0.83)   14.39 (+0.31)
The combined morphological enhancements are on par with the w-system and yield sizable improvements over the m-system.

Translation Model Enrichment
Merging method   Full (714K)
m-system         14.08
w-system         14.58
add-1            14.25 (+0.17)
add-2            13.89 (-0.19)
interpolation    14.63 (+0.55)
ourMethod        14.82 (+0.74)
add-1: one extra feature; add-2: two extra features; interpolation: linear interpolation. Gains in parentheses are over the m-system.
Our method outperforms the w-system baseline.

Results Significance (Analysis)
BLEU: m-system 14.08; w-system 14.58; ourSystem 14.82 (+0.74).
We obtain an absolute improvement of 0.74 BLEU over the m-system, a non-trivial relative improvement of 5.6%, and outperform the w-system by 0.24 points (1.56% relative). The results are statistically significant with p < 0.01 (Collins' sign test). The gains might look modest, but note that the baseline BLEU from which our system evolved is only 14.08.

Translation Proximity Match
We automatically extract phrase triples (src, out, ref), using src as a pivot to pair the translation output (out) with the reference (ref), and require high character-level similarity between out and ref, measured by the longest common subsequence ratio, e.g.:
(economic and social, taloudellisia ja sosiaalisia, taloudellisten ja sosiaalisten)
Of 16,262 extracted triples, 6,758 match the references exactly; the remaining triples are close wordforms.
Hypothesis: our approach yields translations close to the reference wordforms, but it is unjustly penalized by BLEU.
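For reference, a small sketch of the longest common subsequence ratio (the standard definition; our illustration, not the paper's script):

```python
def lcsr(a, b):
    """Longest common subsequence length divided by the longer length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb \
                else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)] / max(len(a), len(b))

print(lcsr('taloudellisia ja sosiaalisia',
           'taloudellisten ja sosiaalisten'))  # high similarity (~0.8)
```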

Human Evaluation (Analysis)
4 native Finnish speakers judged 50 random test sentences. Following the WMT'09 evaluation, we provided the judges with the source sentence, its reference translation, and the outputs of the m-system, w-system, and ourSystem, shown in random order, and asked for three pairwise judgments per sentence. Totals across judges:
ourSystem vs. m-system: 101‡ vs. 66
ourSystem vs. w-system: 81‡ vs. 50
w-system vs. m-system: 95† vs. 70
The preferences are statistically significant as found by the sign test. The judges consistently preferred: (1) ourSystem to the m-system, (2) ourSystem to the w-system, and (3) the w-system to the m-system.

Sample Translations
src: we were very constructive and we negotiated until the last minute of these talks in the hague .
ref: olimme erittäin rakentavia ja neuvottelimme haagissa viime hetkeen saakka .
our: olemme olleet hyvin rakentavia ja olemme neuvotelleet viime hetkeen saakka näiden neuvottelujen haagissa . (matches the reference, apart from a wrong case)
w: olemme olleet hyvin rakentavia ja olemme neuvotelleet viime tippaan niin näiden neuvottelujen haagissa . (confusing meaning)
m: olimme erittäin rakentavan ja neuvottelimme viime hetkeen saakka näiden neuvotteluiden haagissa . (wrong case)
Rank: our > m ≥ w
src: it would be a very dangerous situation if the europeans were to become logistically reliant on russia .
ref: olisi erittäin vaarallinen tilanne , jos eurooppalaiset tulisivat logistisesti riippuvaisiksi venäjästä .
our: olisi erittäin vaarallinen tilanne , jos eurooppalaiset tulee logistisesti riippuvaisia venäjän . (matches the reference, apart from a wrong case)
w: se olisi erittäin vaarallinen tilanne , jos eurooppalaisten tulisi logistically riippuvaisia venäjän . (OOV: "logistically")
m: se olisi hyvin vaarallinen tilanne , jos eurooppalaiset haluavat tulla logistisesti riippuvaisia venäjän . (changes the meaning)
Rank: our > w ≥ m
ourSystem consistently outperforms the m-system and the w-system, and seems to blend translations from both baselines well.

Conclusion
Our approach: the basic unit of translation is the morpheme, but word boundaries are respected at all MT stages. It is an unsupervised method that works for large training bi-texts.
Future work, in the quest towards a morphology-aware SMT that uses only unannotated data: extend the morpheme-level framework and incorporate morphological analysis directly into the translation process.
Thank you!