MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching Faculty: Alon Lavie, Jaime Carbonell Students and Staff: Gregory Hanneman, Justin Merrill (Shyamsundar Jayaraman, Satanjeev Banerjee)

October 26, 2005
MEMT Goals and Approach
Scientific Challenge:
– How to combine the output of multiple MT engines into a synthetic output that outperforms the originals in translation quality
– Synthetic combination of the output from the original systems, NOT just selecting the best system
Engineering Challenge:
– How to integrate multiple distributed translation engines and the MEMT combination engine in a common framework that supports ongoing development and evaluation

Synthetic Combination
MEMT Approach:
– Original MT engines are treated as “black boxes”; each provides a single “best” translation
– Explicitly identify and align the words that are common between any pair of translations
– Use the alignments as reinforcement and as indicators of possible locations for the words in the combined output
– Each engine has a “confidence” that is used for the words it contributes
– A decoder searches for the synthetic combination of words and phrases that optimizes a scoring function combining the alignment confidence weights and an LM score

The Word Alignment Matcher
Developed by Satanjeev Banerjee as a component of our METEOR automatic MT evaluation metric
Finds the maximal alignment match with the minimal number of “crossing branches”
Allows alignment of:
– Identical words
– Morphological variants of words
– Synonymous words (based on WordNet synsets)
Implementation: an efficient search for the best match that prunes sub-optimal sub-solutions

Matcher Example
the sri lanka prime minister criticizes the leader of the country
President of Sri Lanka criticized by the country’s Prime Minister
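The matcher's behavior on this pair can be approximated with a short Python sketch. It is not the METEOR matcher: the pruned search for the alignment with the fewest crossing branches is replaced by a greedy heuristic (least ambiguous pairs first, then the pair that adds the fewest crossings), the hypothetical `crude_stem` stands in for a real stemmer such as Porter's, and the WordNet synonym stage is omitted. The tokenization (lowercasing, splitting off “'s”) is also an assumption.

```python
from collections import Counter


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer such as Porter's."""
    for suffix in ("izes", "ized", "ize", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def crossings(pair, links):
    """Number of existing links that cross `pair` (opposite order in the two sentences)."""
    i, j = pair
    return sum(1 for k, l in links if (i - k) * (j - l) < 0)


def match_stage(a, b, links, key):
    """Greedily add one-to-one links between still-unaligned words with equal `key` values."""
    while True:
        used_a = {i for i, _ in links}
        used_b = {j for _, j in links}
        cands = [(i, j)
                 for i, x in enumerate(a) if i not in used_a
                 for j, y in enumerate(b) if j not in used_b and key(x) == key(y)]
        if not cands:
            return links
        n_a = Counter(i for i, _ in cands)
        n_b = Counter(j for _, j in cands)
        # Least ambiguous pair first, then fewest new crossings, then leftmost.
        best = min(cands, key=lambda p: (n_a[p[0]] + n_b[p[1]], crossings(p, links), p))
        links.append(best)


def align(a, b):
    links = match_stage(a, b, [], key=lambda w: w)     # stage 1: identical words
    links = match_stage(a, b, links, key=crude_stem)   # stage 2: morphological variants
    return sorted(links)


A = "the sri lanka prime minister criticizes the leader of the country".split()
B = "president of sri lanka criticized by the country 's prime minister".split()
for i, j in align(A, B):
    print(f"{A[i]:>10} <-> {B[j]}")
```

On this pair the sketch links sri, lanka, prime, minister, of, country, criticizes/criticized, and the “the” directly before “country”; “leader” and “president” stay unaligned. Processing unambiguous words first is what lets the crossing count pick the right “the”.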

The MEMT Algorithm
The algorithm builds collections of partial hypotheses of increasing length
Partial hypotheses are extended by selecting the “next available” word from one of the original systems
Sentences are initially assumed to be synchronous:
– Each word is either aligned with another word or is an alternative of another word
Extending a partial hypothesis with a word “pulls” and “uses” its aligned words with it, and marks its alternatives as “used”; “vectors” keep track of this
Partial hypotheses are scored and ranked
Pruning and re-combination
A hypothesis can end if any original system proposes an end of sentence as its next word
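The extension step can be sketched in Python as follows. This is a simplified illustration, not the MEMT implementation: `Hyp`, `link_map`, `extend` and `can_end` are hypothetical names, alternatives are not marked as used, and scoring and pruning are left out. Per-system sets of consumed positions play the role of the slide's “vectors”, and merging identical hypotheses reached through different systems is a minimal form of re-combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyp:
    words: tuple  # output words chosen so far
    used: tuple   # one frozenset of consumed positions per system (the slide's "vectors")


def link_map(pairs):
    """Symmetric map (system, position) -> set of aligned (system, position)."""
    m = {}
    for x, y in pairs:
        m.setdefault(x, set()).add(y)
        m.setdefault(y, set()).add(x)
    return m


def next_available(used, n):
    """Leftmost position in a sentence of length n not yet consumed, or None."""
    return next((k for k in range(n) if k not in used), None)


def extend(hyp, systems, links):
    """All one-word extensions of `hyp`, taking the next available word of each system."""
    out = []
    for s, sent in enumerate(systems):
        k = next_available(hyp.used[s], len(sent))
        if k is None:
            continue
        used = [set(u) for u in hyp.used]
        used[s].add(k)
        for s2, k2 in links.get((s, k), ()):
            used[s2].add(k2)  # "pull" the aligned words along with this one
        out.append(Hyp(hyp.words + (sent[k],), tuple(frozenset(u) for u in used)))
    # Identical hypotheses reached via different systems are merged (simple re-combination).
    return list(dict.fromkeys(out))


def can_end(hyp, systems):
    """True once some system has no words left, i.e. its next word is end-of-sentence."""
    return any(next_available(hyp.used[s], len(sent)) is None
               for s, sent in enumerate(systems))


systems = ["the victims were russian".split(), "victims russians are".split()]
links = link_map([((0, 1), (1, 0)), ((0, 3), (1, 1))])  # victims~victims, russian~russians
start = Hyp((), (frozenset(), frozenset()))
step1 = extend(start, systems, links)
step2 = extend(step1[0], systems, links)  # extensions of the hypothesis "the"
print([h.words for h in step1], [h.words for h in step2])
```

Extending “the” with “victims” from either system consumes the same positions in both sentences, so the two candidates collapse into one hypothesis.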

Scoring MEMT Hypotheses
Scoring:
– Word confidence score in [0,1], based on engine confidence and reinforcement from alignments of the words
– LM score based on a trigram LM
– Log-linear combination: weighted sum of the logs of the confidence score and the LM score
– Select the best-scoring hypothesis based on:
   – Total score (biased towards shorter hypotheses)
   – Average score per word
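A small Python sketch of this scoring shows why the total score favors shorter hypotheses. The reinforcement rule in `reinforced_confidence` (the probability that at least one independent supporting engine is right) and the unit weights `w_conf` and `w_lm` are hypothetical choices, and per-word LM log-probabilities are taken as given rather than computed from a trigram model.

```python
import math


def reinforced_confidence(engine_confs):
    """Hypothetical rule: chance that at least one engine proposing this word (directly or
    via an alignment) is right, treating engines as independent; stays within [0, 1]."""
    p_all_wrong = 1.0
    for c in engine_confs:
        p_all_wrong *= 1.0 - c
    return 1.0 - p_all_wrong


def word_score(conf, lm_logprob, w_conf=1.0, w_lm=1.0):
    """Log-linear combination: weighted sum of log confidence (conf > 0) and LM log-probability."""
    return w_conf * math.log(conf) + w_lm * lm_logprob


def hypothesis_scores(words, w_conf=1.0, w_lm=1.0):
    """words: list of (confidence, lm_logprob) pairs. Returns (total, average per word)."""
    scores = [word_score(c, lp, w_conf, w_lm) for c, lp in words]
    total = sum(scores)
    return total, total / len(scores)


# Each word term is <= 0, so the total keeps dropping as a hypothesis grows.
short_total, short_avg = hypothesis_scores([(0.8, -2.0)] * 3)
long_total, long_avg = hypothesis_scores([(0.8, -2.0)] * 5)
print(short_total > long_total, math.isclose(short_avg, long_avg))  # True True
```

Two hypotheses built from equally good words get the same average, while the total prefers the shorter one; normalizing per word removes that length bias.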

Additional Parameters
Parameters:
– “Lingering word” horizon: how long is a word allowed to linger when words following it have already been used?
– “Lookahead” horizon: how far ahead can we look for an alternative for a word that is not aligned?
– “POS matching”: limit the search for an alternative to words of the same POS
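The three parameters can be read as simple checks. In this Python sketch the function names are hypothetical, the lookahead counts positions starting at the other system's next available word, and the lingering horizon is measured as the number of later words already used; the slides do not fix either definition.

```python
def find_alternative(tag, other_tags, other_blocked, start, lookahead, pos_matching=True):
    """Search another system's sentence for an alternative to an unaligned word.

    Examines at most `lookahead` positions beginning at `start` (that system's next
    available word), skipping positions already used or aligned (`other_blocked`).
    With `pos_matching`, only a word with the same POS tag qualifies."""
    for k in range(start, min(start + lookahead, len(other_tags))):
        if k in other_blocked:
            continue
        if pos_matching and other_tags[k] != tag:
            continue
        return k
    return None


def lingering_expired(pos, used, horizon):
    """True once more than `horizon` words after position `pos` have been used,
    i.e. the unused word at `pos` has lingered too long and should be dropped."""
    return sum(1 for k in used if k > pos) > horizon


tags = ["DT", "NN", "VBD", "JJ"]
print(find_alternative("JJ", tags, {0, 1}, start=1, lookahead=2),
      find_alternative("JJ", tags, {0, 1}, start=1, lookahead=3))
```

A larger lookahead finds more alternatives but lets a word pair with something far from its own position; POS matching trades recall for fewer implausible pairings.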

Example
IBM: victims russians are one man and his wife and abusing their eight year old daughter plus a ( 11 and 7 years ) man and his wife and driver, egyptian nationality.
ISI: The victims were Russian man and his wife, daughter of the most from the age of eight years in addition to the young girls ) 11 7 years ( and a man and his wife and the bus driver Egyptian nationality.
CMU: the victims Cruz man who wife and daughter both critical of the eight years old addition to two Orient ( 11 ) 7 years ) woman, wife of bus drivers Egyptian nationality.
MEMT Sentence:
Selected: the victims were russian man and his wife and daughter of the eight years from the age of a 11 and 7 years in addition to man and his wife and bus drivers egyptian nationality
Oracle: the victims were russian man and wife and his daughter of the eight years old from the age of a 11 and 7 years in addition to the man and his wife and bus drivers egyptian nationality young girls

Current System
Initial development tests were performed on TIDES 2003 Arabic-to-English MT data, using IBM, ISI and CMU SMT system output
Evaluation tests were performed on Arabic-to-English EBMT, Apptek and SYSTRAN system output, and on three Chinese-to-English COTS systems

Experimental Results: Arabic-to-English
System                               METEOR Score
Apptek                               0.4241
EBMT                                 0.4231
Systran                              0.4405
Choosing best online translation     0.4432
MEMT                                 0.5185
Best hypothesis generated by MEMT    0.5883

Experimental Results: Chinese-to-English
System                               METEOR Score
Online Translator A                  0.4917
Online Translator B                  0.4859
Online Translator C                  0.4910
Choosing best online translation     0.5381
MEMT                                 0.5301
Best hypothesis generated by MEMT    0.5840
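The two result tables can be compared directly with a few lines of arithmetic. The Python sketch below only reuses the METEOR scores reported in the tables; the field names are hypothetical.

```python
RESULTS = {  # METEOR scores from the two result tables
    "Arabic-to-English": {
        "singles": {"Apptek": 0.4241, "EBMT": 0.4231, "Systran": 0.4405},
        "selection": 0.4432, "memt": 0.5185, "oracle": 0.5883,
    },
    "Chinese-to-English": {
        "singles": {"Online Translator A": 0.4917, "Online Translator B": 0.4859,
                    "Online Translator C": 0.4910},
        "selection": 0.5381, "memt": 0.5301, "oracle": 0.5840,
    },
}


def summarize(r):
    """Compare MEMT with the best single engine, the selection row and the best MEMT hypothesis."""
    best_name, best = max(r["singles"].items(), key=lambda kv: kv[1])
    return {
        "best_single": best_name,
        "memt_minus_best_single": round(r["memt"] - best, 4),
        "relative_gain": round((r["memt"] - best) / best, 3),
        "memt_minus_selection": round(r["memt"] - r["selection"], 4),
        "oracle_minus_memt": round(r["oracle"] - r["memt"], 4),
    }


for pair, r in RESULTS.items():
    print(pair, summarize(r))
```

On these numbers MEMT beats the best single engine by 0.078 (about 17.7% relative) for Arabic-to-English and by 0.0384 (about 7.8% relative) for Chinese-to-English, but for Chinese-to-English it is 0.008 below the “choosing best online translation” row. The gaps to the best MEMT-generated hypothesis (0.0698 and 0.0539) are the room for improvement that the oracle scores point to.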

Demo

Architecture and Engineering
Challenge: How do we construct an effective architecture for running MEMT within large-scale distributed projects?
– Example: GALE Project
– Multiple MT engines running at different locations
– Input may be text or the output of speech recognizers; output may go downstream to other applications (IE, Summarization, TDT)
Approach: Using IBM’s UIMA (Unstructured Information Management Architecture)
– Provides support for building robust processing “workflows” with heterogeneous components
– Components act as “annotators” at the character level within documents

UIMA-based MEMT
MT engines and the MEMT engine are set up as distributed servers:
– Communication over socket connections
– Sentence-by-sentence translation
Java “wrappers” convert these into UIMA-style annotator components
UIMA-based “workflows” implement a variety of asynchronous tasks, with results stored in a common Annotations Database (ADB):
– Translation workflows
– MEMT workflow
– Evaluation/scoring workflow
ADB and ADB Collection Reader/Consumer components developed at CMU by Eric Nyberg’s group
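The engines are reached over sockets, one sentence at a time. The slides do not specify the wire protocol, so this Python sketch assumes a hypothetical newline-delimited UTF-8 exchange (one sentence in, one translation out) and uses `str.upper` as a stand-in “engine”; `serve_engine` and `EngineClient` are illustrative names, and the real system wrapped such servers in Java UIMA annotators.

```python
import socket
import threading


def serve_engine(conn, translate):
    """Toy MT engine server: reads one sentence per line, writes one translation per line."""
    with conn, conn.makefile("r", encoding="utf-8") as rfile, \
            conn.makefile("w", encoding="utf-8") as wfile:
        for line in rfile:  # loop ends when the client closes its side
            wfile.write(translate(line.rstrip("\n")) + "\n")
            wfile.flush()


class EngineClient:
    """Client side of the hypothetical sentence-by-sentence protocol."""

    def __init__(self, sock):
        self.sock = sock
        self.rfile = sock.makefile("r", encoding="utf-8")
        self.wfile = sock.makefile("w", encoding="utf-8")

    def translate(self, sentence):
        self.wfile.write(sentence + "\n")
        self.wfile.flush()
        return self.rfile.readline().rstrip("\n")

    def close(self):
        self.wfile.close()
        self.rfile.close()
        self.sock.close()


# socketpair() gives two connected sockets, standing in for a real network connection.
server_side, client_side = socket.socketpair()
worker = threading.Thread(target=serve_engine, args=(server_side, str.upper))
worker.start()
client = EngineClient(client_side)
result = client.translate("the victims were russian")
client.close()
worker.join(timeout=5)
print(result)
```

Keeping each engine behind a plain socket server means the wrapper only needs to know the protocol, not the engine's internals, which fits the “black box” treatment of the original engines.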

UIMA-based MEMT
Translation Workflow:
– Retrieve document from ADB
– “Annotate” document with translation annotator X
– Write back new “annotation” into ADB

UIMA-based MEMT
MEMT Workflow:
– Retrieve document translation annotations labeled by X, Y, Z from ADB
– “Annotate” the document with a new MEMT annotation
– Write back MEMT annotation into ADB
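The two workflows share one pattern: read from the ADB, annotate, write back. The Python sketch below uses a hypothetical in-memory `AnnotationStore` in place of the ADB and stores whole translated strings per document, whereas UIMA annotations are character-level spans; it does not use the UIMA API, and `combine` is a placeholder for the MEMT decoder.

```python
class AnnotationStore:
    """Hypothetical in-memory stand-in for the Annotations Database (ADB)."""

    def __init__(self, docs):
        self.docs = dict(docs)   # doc_id -> source text
        self.annotations = {}    # (doc_id, label) -> annotation text

    def get(self, doc_id, label):
        return self.annotations[(doc_id, label)]

    def put(self, doc_id, label, text):
        self.annotations[(doc_id, label)] = text


def translation_workflow(adb, doc_id, label, translate):
    source = adb.docs[doc_id]                   # retrieve document from the ADB
    adb.put(doc_id, label, translate(source))   # annotate with engine `label`, write back


def memt_workflow(adb, doc_id, labels, combine):
    outputs = [adb.get(doc_id, lab) for lab in labels]  # translations labeled X, Y, Z
    adb.put(doc_id, "MEMT", combine(outputs))            # new MEMT annotation, written back


adb = AnnotationStore({"doc1": "source sentence"})
translation_workflow(adb, "doc1", "X", str.upper)
translation_workflow(adb, "doc1", "Y", lambda s: s + "!")
memt_workflow(adb, "doc1", ["X", "Y"], combine=lambda outs: max(outs, key=len))
print(adb.get("doc1", "MEMT"))
```

Because the workflows only communicate through the store, translation, MEMT and scoring jobs can run asynchronously and in any order once their inputs exist.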

Conclusions
New sentence-level MEMT approach with promising performance: MEMT outperforms every individual engine in METEOR score on both the Arabic-to-English and Chinese-to-English tests
Easy to run on both research and COTS systems
UIMA-based architecture design for effective integration in large distributed systems/projects
– Pilot study has been very positive
– Can serve as a model for integration framework(s) under GALE

Open Research Issues
Main Open Research Issues:
– Improvements to the underlying algorithm: better word alignments, “artificial” word alignments
– Confidence scores at the sentence or word level
– Decoding is still suboptimal:
   Oracle scores show there is much room for improvement
   Need for additional discriminant features
– Extend the approach to Multi-Engine SR (speech recognition) combination
– Engineering issues: synchronization, human-friendly interfaces with workflows

References
Jayaraman, S. and A. Lavie. 2005. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Companion Volume of the Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), Ann Arbor, Michigan, June 2005.
Jayaraman, S. and A. Lavie. 2005. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT-2005), Budapest, Hungary, May 2005.
