Monolingual Semantic Text Alignment and its Applications in Machine Translation
Alon Lavie
March 29, 2012
Monolingual Semantic Text Alignment
Input: two (semantically comparable) text strings
Output: the most likely word (or multi-word) alignment correspondence between the two strings
Applications in MT:
  Automated MT Evaluation Metrics (METEOR)
  Automated MT Error Analysis (METEOR)
  MT System Combination (Multi-Engine MT)
  Automated MT Post-Editing
  MT System Optimization
March 29, 2012 Monolingual Text Alignment
Monolingual Semantic Text Alignment
The monolingual text aligner was originally developed as an integral component of our METEOR MT evaluation metric
Input: a sentence-level MT-generated translation and a human-generated reference translation
Goal: assess and assign a quality score to the MT-generated translation based on how similar (in meaning) it is to the reference
Address translation variability by explicitly identifying different types of semantically equivalent correspondences:
  Exact matches
  Morphological variants
  Synonyms
  Paraphrases
Explicit alignment provides information on differences in word ordering and other characteristics of the correspondence beyond short n-gram matches
Monolingual Semantic Text Alignment
Semantic-similarity information sources:
  Porter stemmers (morphological variants)
  WordNet (synonyms)
  Paraphrase tables (statistically derived, filtered)
Alignment algorithm:
  Identify all matches of all types
  Bipartite match: each word (or multi-word) can match at most once
  Single-pass approximate search for the maximal alignment with minimal crossing branches
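A toy sketch of the matching and selection steps above, for intuition only. It is not the METEOR aligner: a crude suffix stripper stands in for the Porter stemmer, a two-entry synonym set stands in for WordNet, paraphrases are left out, and the single-pass search is replaced by a greedy pass that prefers stronger match types and then links whose relative positions are closest (a rough proxy for fewer crossings). `SYNONYMS`, `crude_stem`, `match_type`, `align` and `count_chunks` are hypothetical names.

```python
from itertools import product

# Hypothetical stand-ins: a toy synonym set instead of WordNet,
# and a crude suffix stripper instead of the Porter stemmer.
SYNONYMS = {("big", "large"), ("car", "automobile")}

def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def match_type(h, r):
    """0 = exact, 1 = stem, 2 = synonym, None = no match (lower is stronger)."""
    if h == r:
        return 0
    if crude_stem(h) == crude_stem(r):
        return 1
    if (h, r) in SYNONYMS or (r, h) in SYNONYMS:
        return 2
    return None

def align(hyp, ref):
    """Greedy one-to-one alignment; returns (hyp_index, ref_index) links sorted by hyp_index."""
    candidates = []
    for i, j in product(range(len(hyp)), range(len(ref))):
        t = match_type(hyp[i].lower(), ref[j].lower())
        if t is not None:
            # Difference in relative position: a rough proxy for crossing links.
            dist = abs(i / len(hyp) - j / len(ref))
            candidates.append((t, dist, i, j))
    used_hyp, used_ref, links = set(), set(), []
    for t, dist, i, j in sorted(candidates):
        if i not in used_hyp and j not in used_ref:  # bipartite: each word matches at most once
            used_hyp.add(i)
            used_ref.add(j)
            links.append((i, j))
    return sorted(links)

def count_chunks(links):
    """Number of maximal runs of links that are contiguous in both strings."""
    chunks, prev = 0, None
    for i, j in links:
        if prev is None or (i, j) != (prev[0] + 1, prev[1] + 1):
            chunks += 1
        prev = (i, j)
    return chunks
```

For example, aligning "on saturday the authorities announced" with "the authorities announced on saturday" links every word but yields two chunks, which is the kind of word-ordering information that short n-gram matching does not make explicit.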
Monolingual Text Alignment: METEOR
METEOR = Metric for Evaluation of Translation with Explicit Ordering [Lavie and Denkowski, 2010] [Denkowski and Lavie, 2011]
Main ideas:
  Combine recall and precision as weighted score components
  Look only at unigram precision and recall
  Align the MT output with each reference individually and take the score of the best pairing
  Matching takes into account translation variability via word inflection variants, synonymy and paraphrase matches
  Address fluency via a direct penalty for word order: how fragmented is the matching of the MT output with the reference?
  Parameters of the metric components are tunable to maximize score correlation with human judgments for each language
METEOR has been shown to consistently outperform BLEU and other metrics in correlation with human judgments
Monolingual Text Alignment: METEOR Scoring
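The scoring formulas on this slide did not survive extraction. The sketch below assumes the standard parameterized METEOR formulation: a weighted harmonic mean F_mean = P·R / (α·P + (1 − α)·R), a fragmentation penalty Pen = γ·(ch/m)^β, where m is the number of matched unigrams and ch the number of chunks, and a final score (1 − Pen)·F_mean. It works on plain match counts, ignoring the per-match-type and content/function-word weights of version 1.3, and its default parameter values are illustrative rather than tuned.

```python
def meteor_score(matches, hyp_len, ref_len, chunks, alpha=0.9, beta=3.0, gamma=0.5):
    """Sentence-level METEOR-style score from alignment statistics.

    matches: number of aligned unigrams (m); chunks: number of contiguous
    aligned chunks (ch). Defaults are illustrative, not tuned values.
    """
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Weighted harmonic mean; alpha close to 1 weights recall more heavily.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: more chunks per match means more reordering.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * f_mean
```

With all 5 words of a 5-word hypothesis and 5-word reference matched in a single chunk, the penalty is 0.5·(1/5)^3 = 0.004 and the score is 0.996; the same matches spread over 2 chunks give a penalty of 0.032 and a score of 0.968.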
METEOR Parameter Tuning
METEOR has several “free” parameters that can be optimized to maximize correlation with different notions of human judgment:
  Alpha, Beta and Gamma control overall metric behavior
  Tunable weights for different types of matches
  The latest version (1.3) distinguishes and tunes separate weights for content and function words
Optimized for Adequacy, Fluency, A+F, Rankings, and Post-Editing effort for English on available development data
Optimized independently for different target languages
The limited number of parameters means that optimization can be done by exhaustive search of the parameter space
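Because only a handful of parameters are involved, the exhaustive search described above can be sketched as a plain grid search. This sketch assumes the standard METEOR formula on raw match counts (match-type and content/function-word weights left out) and uses Pearson correlation with segment-level human scores as the objective; the helper names, grid ranges, step sizes and the toy data at the bottom are hypothetical.

```python
import math
from itertools import product

def meteor(m, hyp_len, ref_len, chunks, alpha, beta, gamma):
    # Standard parameterized METEOR formula on raw match counts (assumed).
    if m == 0:
        return 0.0
    p, r = m / hyp_len, m / ref_len
    f_mean = p * r / (alpha * p + (1 - alpha) * r)
    return (1 - gamma * (chunks / m) ** beta) * f_mean

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def grid(lo, hi, step):
    # Inclusive, evenly spaced values; rounding avoids float drift such as 0.30000000000000004.
    n = int(round((hi - lo) / step))
    return [round(lo + k * step, 10) for k in range(n + 1)]

def tune(segments, human, alphas, betas, gammas):
    """Exhaustive search for the (alpha, beta, gamma) that maximizes correlation."""
    best = None
    for alpha, beta, gamma in product(alphas, betas, gammas):
        scores = [meteor(*seg, alpha, beta, gamma) for seg in segments]
        corr = pearson(scores, human)
        if best is None or corr > best[0]:
            best = (corr, alpha, beta, gamma)
    return best

# Toy segments: (matches, hyp_len, ref_len, chunks) with made-up human scores.
segments = [(5, 5, 5, 1), (4, 5, 6, 2), (2, 5, 5, 2), (3, 6, 4, 3)]
human = [4.0, 3.0, 1.5, 2.0]
print(tune(segments, human, grid(0.1, 0.9, 0.1), [1.0, 2.0, 3.0], grid(0.0, 0.5, 0.1)))
```

The search cost is the product of the grid sizes times the number of segments, which stays small only because the parameter count is small; finer grids or more parameters would call for a smarter optimizer.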
Monolingual Text Alignment: METEOR Analysis Tools
METEOR v1.3 comes with a suite of analysis and visualization tools called METEOR-XRAY
MT System Combination
Idea: apply several MT engines to each input in parallel and combine their output translations
Goal: leverage the strengths and diversity of different MT engines to generate improved translations
Particularly useful in assimilation scenarios, where input is uncontrolled and diverse in domain, genre, style or other characteristics
Can result in significant gains in translation quality
CMU’s Alignment-based Multi-Engine System Combination
Works with any MT engines
Assumes the original MT systems are “black boxes”: no internal information other than the translations themselves
Explores broader search spaces than other MT system combination approaches, using linguistically based and statistical features
Achieves state-of-the-art performance in competitive research evaluations of recent years
Developed over the last seven years under research funding from several government grants (DARPA, DoD and NSF)
Alignment-based MEMT
Two-stage approach:
  Align: identify and align equivalent words and phrases across the translations provided by the engines
  Decode: search the space of synthetic combinations of words/phrases and select the highest-scoring combined translation
Example:
  System 1: announced afghan authorities on saturday reconstituted four intergovernmental committees
  System 2: The Afghan authorities on Saturday the formation of the four committees of government
  MEMT: the afghan authorities announced on Saturday the formation of four intergovernmental committees
The MEMT Decoder Algorithm
The search space of system combination hypotheses is implicitly defined by the initial alignment stage and is only partially explored
The search space is controlled by linguistic similarity features
The algorithm builds collections of partial hypotheses of increasing length
Partial hypotheses are extended by selecting the “next available” word from one of the original systems
Extending a partial hypothesis with a word marks the word as “used” and also marks its aligned words as “used”
Partial hypotheses are scored and ranked
Pruning and recombination for efficiency
A hypothesis can end if any original system proposes an end of sentence as its next word
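A much-simplified sketch of the decoding loop described above, for intuition only. It assumes the alignment is given as pairs of (system, position) word links, uses each word's support (how many systems agree on it) as a hypothetical stand-in for the real linguistic similarity and language-model features, and ranks hypotheses by average support per word. `decode` and its scoring are illustrative, not CMU's implementation.

```python
def decode(systems, links, beam_size=5):
    """Toy MEMT decoder. systems: list of token lists.
    links: iterable of ((s, i), (t, j)) alignment pairs between words of different systems."""
    aligned = {}
    for a, b in links:
        aligned.setdefault(a, set()).add(b)
        aligned.setdefault(b, set()).add(a)

    def next_available(used, s):
        # Index of the first unused word in system s, or None at end of sentence.
        for i in range(len(systems[s])):
            if (s, i) not in used:
                return i
        return None

    def avg(hyp):
        total, words, _ = hyp
        return total / len(words)

    beam = [(0.0, (), frozenset())]  # (summed support, words, used positions)
    finished = []
    while beam:
        expanded = {}
        for total, words, used in beam:
            for s in range(len(systems)):
                i = next_available(used, s)
                if i is None:
                    continue
                partners = aligned.get((s, i), set())
                # Using a word also marks every word aligned to it as used.
                new_used = used | {(s, i)} | partners
                new_words = words + (systems[s][i],)
                new_total = total + 1 + len(partners)  # hypothetical score: agreeing systems
                key = (new_words, new_used)
                # Recombination: keep only the best copy of identical hypotheses.
                if key not in expanded or expanded[key][0] < new_total:
                    expanded[key] = (new_total, new_words, new_used)
        # Ranking and pruning to the beam size.
        beam = sorted(expanded.values(), key=avg, reverse=True)[:beam_size]
        for hyp in beam:
            # A hypothesis may end once any system's next word is end of sentence.
            if any(next_available(hyp[2], s) is None for s in range(len(systems))):
                finished.append(hyp)
    return list(max(finished, key=avg)[1])
```

With two systems, "the cat sat" and "the big cat sat", aligned on their shared words, the sketch returns ['the', 'cat', 'sat'], because the unaligned "big" is supported by only one system.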
Recent Performance Results: NIST-2009 and WMT-2009
Recent Performance Results: WMT-2011
Multi-Engine Human Translation
Translation crowdsourcing using untrained bilingual labor is error-prone
Idea: combine the output of multiple human translators into a consensus translation using MT system combination methods [Ambati 2011]
Smoothing MERT in SMT [Cettolo, Bertoldi and Federico 2011]
An interesting application of MT system combination to overcome the instability of MERT optimization in SMT:
  Perform MERT multiple times
  Use the CMU MEMT system to combine the outputs of the different tuned instances of the same MT system
MT Automated Post-Editing
Idea: a target-language-side adaptation and refinement method
Adapt and specialize the output of a single “baseline” cross-lingual MT system to specific data characteristics: genre, domain, dialect, etc.
Particularly useful in situations where each specialization has only limited amounts of data available
Approach: train a secondary “translation” system from the baseline MT system's target-language output to the specialized target output
Monolingual text alignment is a highly effective method for aligning the monolingual “parallel” training data
Used extensively by Safaba Translation Solutions for client-specific MT adaptation
Goal: minimize the amount of human post-editing required on client-specific MT output
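To make the monolingual “parallel” data concrete: each training pair is a baseline MT output and its client-adapted version. The sketch below uses Python's `difflib.SequenceMatcher` (exact token matching only, a swapped-in stand-in for the semantic aligner) to collect the phrase substitutions a post-editing system would need to learn. `extract_rewrites` and the example sentences are hypothetical; a real APE engine would instead be trained as a full translation system on the aligned corpus rather than built from raw substitution counts.

```python
from collections import Counter
from difflib import SequenceMatcher

def extract_rewrites(pairs):
    """Count (baseline phrase -> adapted phrase) substitutions over training pairs."""
    rewrites = Counter()
    for baseline, adapted in pairs:
        a, b = baseline.split(), adapted.split()
        matcher = SequenceMatcher(a=a, b=b, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":  # aligned region whose wording differs
                rewrites[(" ".join(a[i1:i2]), " ".join(b[j1:j2]))] += 1
    return rewrites

# Hypothetical baseline-output / client-adapted pairs.
pairs = [
    ("select the file menu", "select the File menu"),
    ("open the file menu", "open the File menu"),
]
print(extract_rewrites(pairs).most_common(1))  # [(('file', 'File'), 2)]
```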
Safaba Two-Stage MT Approach
Source Language Text -> Baseline MT Engine -> Baseline MT Target Language Text -> Automated Post-Editing (APE) Engine -> Client-Adapted Target Language Text
Baseline MT Engine: Safaba or Partner-developed (SMT or RBMT); Client + General Data
Automated Post-Editing (APE) Engine: Safaba-developed (Moses-based); Client Data
MT System Optimization
Goals:
  Optimize MT systems to minimize important types of translation errors and maximize the utility of the generated MT (i.e., who did what to whom)
  Minimize the amount of human post-editing effort required for correcting the MT output
Approach:
  Develop the capability to automatically identify and quantify different types of MT errors (important content words, Named Entities, etc.)
  Use this capability to analyze in depth the error characteristics of different types of MT systems (phrase-based, hierarchical, syntax-based, etc.) at segment-level granularity
  Develop an automated evaluation metric that can be used as an effective optimization function in advanced MT tuning optimizers such as PRO and MIRA
  Existing metrics such as HTER and METEOR 1.3 are starting points
  Monolingual semantic text alignment is a critical first step
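As a minimal illustration of quantifying one error type at segment level, the sketch below lists reference content words that the MT output fails to match. It assumes exact, case-insensitive matching and a tiny hypothetical function-word list; the approach outlined above would instead build on the semantic alignment (stems, synonyms, paraphrases) and would also cover categories such as Named Entities.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list.
FUNCTION_WORDS = {"a", "an", "the", "of", "on", "in", "to", "and"}

def missing_content_words(mt_output, reference):
    """Reference content words with no exact (case-insensitive) match in the MT output."""
    available = Counter(w.lower() for w in mt_output.split())
    missing = []
    for word in reference.split():
        key = word.lower()
        if key in FUNCTION_WORDS:
            continue
        if available[key] > 0:
            available[key] -= 1  # each MT word can cover at most one reference word
        else:
            missing.append(word)
    return missing

print(missing_content_words("the authorities formed four committees",
                            "the authorities announced four new committees"))
# ['announced', 'new']
```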
Post-Editing Pilot Study
Pilot study conducted with colleagues at Kent State IAL
Goals:
  Develop post-editing analysis infrastructure and identify procedures and issues
  Conduct a preliminary human post-editing study and analyze post-editing behavior, issues and outcomes
Pilot setup:
  English-to-Spanish
  Two MT systems (Microsoft and Safaba)
  Five short texts (~300 words), computer software documentation domain
  20 trials: 5 texts x 2 MT systems x 2 post-edited translations
  Seven bilingual translator post-editors, each post-editing three MT documents
  Collected all post-editing output, timing information and complete keystroke-logger information
  An expert translator generated segment-level human assessments on a 4-point scale [1 = no edit; 2 = minor edit; 3 = major edit; 4 = complete rewrite]
Analysis currently underway
Questions