TURKALATOR
A Suite of Tools for English to Turkish MT
Siddharth Jonathan, Gorkem Ozbek
CS224n Final Project
June 14, 2006

English-Turkish MT

The challenge
  - Traditionally, statistical MT research has focused on language pairs with rich resources
  - Ambitious goal: a complete English-to-Turkish MT system on par with those on the Web (Google, Systran, etc.)
  - Realistic goal: outperform the general-purpose baseline

The focus
  - Address scarcity issues stemming from rich Turkish inflectional morphology

The strategy
  - Approximate a morphological analysis by exploiting certain aspects of Turkish morphology to get sub-lexical units
  - Customize the translation model building heuristics to deal correctly with these units
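
The idea of splitting words into sub-lexical units can be illustrated with a minimal sketch. It assumes a plain suffix-stripping approach, which relies only on the fact that Turkish inflection is suffixing; the slides do not say which aspects of the morphology the project actually exploited. The suffix list, the minimum stem length, and the function name `segment` are all hypothetical and chosen only for illustration.

```python
# Hypothetical, tiny suffix inventory; a real system would need far more
# suffixes and would have to handle vowel harmony and consonant changes.
SUFFIXES = ("ler", "lar", "den", "dan", "de", "da", "dü", "du", "di", "dı")
MIN_STEM_LEN = 2  # assumption: never strip a word below two characters


def segment(word):
    """Split a word into a stem followed by '+suffix' units.

    Repeatedly strips the longest known suffix from the end of the word
    as long as the remaining stem keeps at least MIN_STEM_LEN characters.
    """
    suffixes = []
    stem = word
    while True:
        match = None
        # Try longer suffixes first so that "den" wins over "de".
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
                match = suffix
                break
        if match is None:
            break
        suffixes.insert(0, "+" + match)
        stem = stem[: -len(match)]
    return [stem] + suffixes


if __name__ == "__main__":
    for w in ("evlerde", "düşündü", "ev"):
        print(w, "->", " ".join(segment(w)))
```

With units like these, inflected forms of the same stem share statistics during alignment, which is the scarcity problem the strategy targets.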

Baseline English to Turkish MT System

Translation model:
  Sentence-aligned English-Turkish -> GIZA++ (aligner) -> Word-aligned English-Turkish -> Phrase building heuristics -> Phrase translation table

Language model:
  Turkish corpus (training set) -> SRILM -> Turkish language model

Decoding:
  English sentences -> Pharaoh (decoder) -> Turkish translations

Corpus: approx. 22,000 aligned sentence pairs covering several genres
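
The "phrase building heuristics" step turns word alignments into phrase pairs. The slides do not spell out the heuristics used, so the following is a minimal sketch of the commonly used consistency criterion: a source span and a target span form a phrase pair only if no alignment link leaves the pair. The function name `extract_phrases`, the `max_len` limit, and the toy sentence pair are assumptions for illustration; the sketch also omits the usual extension over unaligned words.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Extract phrase pairs consistent with a word alignment.

    src, tgt  : lists of tokens
    alignment : set of (source_index, target_index) links
    max_len   : maximum phrase length on either side (assumed limit)
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: every link touching the target span must
            # originate inside the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.add(
                    (" ".join(src[s_start:s_end + 1]),
                     " ".join(tgt[t_start:t_end + 1]))
                )
    return pairs


if __name__ == "__main__":
    # Toy sentence pair and alignment, not taken from the project corpus.
    src = ["she", "thought"]
    tgt = ["o", "düşündü"]
    alignment = {(0, 0), (1, 1)}
    for pair in sorted(extract_phrases(src, tgt, alignment)):
        print(pair)
```

In a full system, the extracted pairs would then be counted and scored to fill the phrase translation table that the decoder consumes.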

The Turkalator Way…

Inputs: Turkish text, English text

Translation model:
  Segmentation -> Stem alignment -> General word alignment -> Phrase extraction and scoring -> Phrase translation table

Decoding:
  Phrase translation table + Turkish language model -> Pharaoh (decoder)

Evaluation

Quantitative results
  - BLEU score compared across three systems: Baseline, Turkalator 1, Turkalator 2

Qualitative results
  - Scarcity is greatly reduced: many more Turkish words are now translated
  - An example:
      English input: "She thought it over."
      Reference translation: "Julia bunu iyice düşündü."
      Baseline translation: "Başvuran düşünce bu over."
      Turkalator translation: "Julia onun üzerinde düşündü."
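
For readers unfamiliar with the metric, here is a minimal sketch of sentence-level BLEU with a single reference: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. The function names are illustrative, and the numbers it produces on the example sentences are not the project's reported scores, which are normally computed over a whole test set.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU: geometric mean of precisions times brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    reference = ["Julia", "bunu", "iyice", "düşündü"]
    baseline = ["Başvuran", "düşünce", "bu", "over"]
    turkalator = ["Julia", "onun", "üzerinde", "düşündü"]
    print("baseline unigram precision:", modified_precision(baseline, reference, 1))
    print("turkalator unigram precision:", modified_precision(turkalator, reference, 1))
```

On the example above, the baseline output shares no tokens with the reference, while the Turkalator output matches two of its four tokens. With `max_n=4` and no smoothing, both sentences score zero, which is one reason BLEU is usually reported at corpus level.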