July 24, 2007 GALE Update: Alon Lavie. Statistical Transfer and MEMT Activities

Presentation transcript:

Slide 1: Statistical Transfer and MEMT Activities
Multi-Engine Machine Translation (MEMT)
–MEMT service within the cross-GALE IOD
–MEMT system combination for GNG evaluation within Rosetta consortium
–Team: Greg Hanneman, Shilpa Aurora, Dave Svoboda, Alon Lavie and Eric Nyberg
Chinese-to-English Statistical Transfer MT system (Stat-XFER)
–Developed over past year
–Included in Rosetta Phase-II GNG evaluation
–Team: Erik Peterson and Alon Lavie

Slide 2: CMU Statistical-XFER System
Truly hybrid “rule-based”/statistical system
Scaled-up version of our XFER approach developed for low-resource languages (under NSF funding)
Large-coverage “clean” bilingual lexicon + syntactic transfer rules (human-written + extracted from data)
XFER formalism is a Synchronous CFG + feature unification constraints
Supports morphological analysis and generation as “plug-in” components
Two-stage translation process:
–Build lattice of translation constituents at all grammar levels “bottom-up”
–Monotonic decoder selects best combination of lattice edges
–Beam-search with multiple features at both stages
–Features include: LM, fragmentation, lexical probabilities, length, etc.
–Optimized log-linear combination of feature scores
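
The second stage described above (a monotonic decoder picking the best combination of lattice edges under a log-linear score) can be sketched roughly as follows. This is only an illustration, not the Stat-XFER decoder: the `Edge` class, the feature names `lex` and `frag`, the weights, and the toy lattice are all assumptions, exact dynamic programming stands in for beam search, and no LM feature is modeled.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    start: int        # first source position covered (inclusive)
    end: int          # one past the last source position covered
    target: str       # target-side text of this lattice edge
    features: dict    # hypothetical feature name -> value

def loglinear(features, weights):
    """Weighted sum of feature values (scores assumed to be in log space)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def decode_monotonic(edges, n, weights):
    """Best left-to-right sequence of edges covering source positions 0..n."""
    best = {0: (0.0, [])}   # source position -> (score, edges used so far)
    for pos in range(n):    # edges only move forward, so best[pos] is final here
        if pos not in best:
            continue
        score, path = best[pos]
        for edge in edges:
            if edge.start != pos or edge.end <= pos:
                continue
            cand = score + loglinear(edge.features, weights)
            if edge.end not in best or cand > best[edge.end][0]:
                best[edge.end] = (cand, path + [edge])
    return best.get(n)      # None if no edge sequence covers the whole input

# Toy lattice over a 3-word source; all values are made up for illustration.
edges = [
    Edge(0, 1, "the", {"lex": -0.5, "frag": 1}),
    Edge(1, 2, "scientists", {"lex": -1.0, "frag": 1}),
    Edge(2, 3, "completed", {"lex": -1.0, "frag": 1}),
    Edge(0, 2, "the scientists", {"lex": -1.2, "frag": 1}),
    Edge(1, 3, "scientists completed", {"lex": -2.5, "frag": 1}),
]
weights = {"lex": 1.0, "frag": -0.5}   # negative weight penalizes fragmentation

score, path = decode_monotonic(edges, 3, weights)
translation = " ".join(edge.target for edge in path)
```

With these toy weights the fragmentation penalty favors the path that uses the larger constituent "the scientists" over three single-word edges.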

Slide 3: Chinese-English S-XFER System
Bilingual lexicon: over 1.1 million entries (multiple resources, incl. ADSO, Wikipedia, extracted base NPs)
Manual syntactic XFER grammar: 76 rules! (mostly NPs, a few PPs, and reordering of NPs/PPs within VPs)
Multiple overlapping Chinese word segmentations
English morphology generation
Uses CMU SMT-group’s Suffix-Array LM toolkit for LM
Current Performance (GALE dev-test):
–NW XFER: 10.89(B)/0.4509(M) Best (UMD): 15.58(B)/0.4769(M)
–NG XFER: 8.92(B)/0.4229(M) Best (UMD): 12.96(B)/0.4455(M)
In Progress:
–Automatic extraction of “clean” base NPs from parallel data
–Automatic learning and extraction of xfer-rules from parallel data

Slide 4: Recent Performance Analysis
What fraction of the time does each MT system produce the best translation (sentence-by-sentence)?
Evaluated on Chinese GALE dev-test (text) data

System                        BLEU                 METEOR
CMU-PhraseSyntaxCombination   60 of 284 (21.1%)    41 of 284 (14.4%)
IBM-smt                       50 of 284 (17.6%)    49 of 284 (17.2%)
IBM-ylee                      64 of 284 (22.5%)    50 of 284 (17.6%)
maryland-jhu-combination      71 of 284 (25.0%)    77 of 284 (27.1%)
Stat-XFER                     32 of 284 (11.2%)    56 of 284 (19.7%)
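
A count of this kind can be produced by taking, for each sentence, the system with the highest sentence-level score. The sketch below shows one way to do that. The function name, the toy scores, and the tie rule (every tied system is credited) are assumptions, since the slides do not say how ties were handled.

```python
from collections import Counter

def best_system_counts(scores_by_system):
    """Count, per system, the sentences on which it has the top score.

    scores_by_system maps a system name to a list of sentence-level scores,
    one per sentence, in the same sentence order for every system.
    Ties credit every tied system (an assumption of this sketch).
    """
    systems = list(scores_by_system)
    n = len(scores_by_system[systems[0]])
    wins = Counter({name: 0 for name in systems})
    for i in range(n):
        top = max(scores_by_system[name][i] for name in systems)
        for name in systems:
            if scores_by_system[name][i] == top:
                wins[name] += 1
    return wins, n

# Hypothetical scores for two systems on four sentences.
toy_scores = {
    "system-A": [0.4, 0.2, 0.5, 0.1],
    "system-B": [0.3, 0.6, 0.5, 0.2],
}
wins, n = best_system_counts(toy_scores)
share = {name: f"{wins[name]} of {n} ({100 * wins[name] / n:.1f}%)" for name in wins}
```

The same formatting applied to a row of the table gives, for example, 71 of 284 as 25.0%.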

Slide 5: Translation Example
REFERENCE: When responding to whether it is possible to extend Russian fleet's stationing deadline at the Crimean peninsula, Yanukovych replied, "Without a doubt.
Stat-XFER (0.3989): In reply to whether the possibility to extend the Russian fleet stationed in Crimea Pen. left the deadline of the problem, Yanukovich replied : " of course.
IBM-ylee (0.2203): In response to the possibility to extend the deadline for the presence in Crimea peninsula, the Queen Vic said : " of course.
CMU-SMT (0.2067): In response to a possible extension of the fleet in the Crimean Peninsula stay on the issue, Yanukovych vetch replied : " of course.
maryland-hiero (0.1878): In response to the possibility of extending the mandate of the Crimean peninsula in, replied: "of course.
IBM-smt (0.1862): The answer is likely to be extended the Crimean peninsula of the presence of the problem, Yanukovych said: " Of course.
CMU-syntax (0.1639): In response to the possibility of extension of the presence in the Crimean Peninsula, replied : " of course.

Slide 6: MEMT – Main Activities
Preserving Source Alignments: target phrases that originate from same source word can be marked as unbreakable units (performance effects are currently under testing…)
LM experiments using CMU’s Suffix-Array LM toolkit and new LM features (work in progress)
Case Restoration: scheme for selecting the case of words in final MEMT output
Handling of varying number of MEMT input engines
Parameter Optimization for MEMT decoder
Upgrades to MEMT software infrastructure to support IOD-3 requirements: UIMA v2.0 and ActiveMQ
MEMT server is up 24/7 for ongoing IOD runs
Active participation in phase-II GNG evaluation
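
The "unbreakable units" idea in the first item can be illustrated with a small sketch: consecutive target words aligned to the same source word are grouped so that a later combination step would move or drop them together. The input format (one source index per target token, `None` for unaligned tokens) and the function name are assumptions. The slides do not describe how MEMT represents alignments.

```python
def unbreakable_units(target_tokens, source_index):
    """Group adjacent target tokens that share the same source word.

    source_index[i] is the (hypothetical) index of the source word that
    target_tokens[i] came from, or None if the token is unaligned.
    Unaligned tokens always form their own single-token unit.
    """
    units = []
    prev = None
    for token, src in zip(target_tokens, source_index):
        if src is not None and src == prev:
            units[-1].append(token)   # same source word: extend current unit
        else:
            units.append([token])     # new source word: start a new unit
        prev = src
    return units

# "initial stage" is assumed here to come from a single source word.
units = unbreakable_units(
    ["the", "initial", "stage", "dementia"],
    [0, 1, 1, 2],
)
```

Here `units` keeps "initial stage" together as one unit while the other words remain separate.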

Slide 7: Recent Evaluation Results

Slide 8: Future Plans and New Directions
Classifiers for Hypothesis Selection
–Simpler than MEMT, but perhaps more effective (given recent analysis results)
Constrained Search-spaces for MEMT
–Other groups use more constrained combination spaces with good results – can we also do better?
Discriminative feature-rich LMs for MT (and MEMT)
–Standard statistical LMs are not sufficiently discriminative for MT
–New NSF grant (with Rebecca Hwa) to explore novel feature-rich “occurrence-based” models
M-TER
–Create a fully automatic metric that approximates H-TER using the stemming and synonymy capabilities of METEOR to create “targeted” references
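
To give a feel for why flexible matching matters for an edit-based metric, here is a minimal sketch of a word-level edit rate with a pluggable match predicate. It is not M-TER and not TER proper: block shifts are omitted, synonymy is not modeled, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer. The example sentences are invented.

```python
def edit_rate(hypothesis, reference, match=lambda a, b: a == b):
    """Word-level Levenshtein distance divided by reference length.

    A simplification of TER without shifts; `match` decides whether two
    words count as the same.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(ref) + 1))          # distances for the empty hypothesis
    for i, hyp_word in enumerate(hyp, 1):
        cur = [i]
        for j, ref_word in enumerate(ref, 1):
            cost = 0 if match(hyp_word, ref_word) else 1
            cur.append(min(prev[j] + 1,        # delete hypothesis word
                           cur[j - 1] + 1,     # insert reference word
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1] / len(ref)

def crude_stem(word):
    """Hypothetical stemmer: strip 'ing' or 's' if at least 3 letters remain."""
    word = word.lower()
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_match(a, b):
    return crude_stem(a) == crude_stem(b)

ref = "there is a detail worth mentioning"
hyp = "there is a details worth mention"
exact = edit_rate(hyp, ref)                 # two substitutions out of six words
relaxed = edit_rate(hyp, ref, stem_match)   # both mismatches vanish after stemming
```

Under exact matching the hypothesis is charged for two edits. With stem-level matching it is charged for none, which is the kind of gap a "targeted" reference is meant to close.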

Slide 9: Translation Example
REFERENCE: I believe there is a detail that worth mentioning.
Stat-XFER (0.5799): There is a detail, I believe worth together.
maryland-hiero (0.3221): There is a one details, I think they deserve.
IBM-ylee (0.2525): There is one of the details, I think it is worth.
IBM-smt (0.2062): There is a details, I think is a way.
CMU-SMT (0.2000): One of the details, I think it is worth to join in.
CMU-syntax (0.1546): a detail, I think it is a cold front.
maryland-jhu (0.1031): One to details, I think they deserve.

Slide 10: Translation Example
REFERENCE: Xinhua News Agency, Zhuhai, Nov. 2 (reporters Wang Hongshan, Li Xuanliang) - Yang Liwei, Deputy Director of China Astronaut Research and Training Center and 'Space Hero', said in Zhuhai on the 1st that China has no plan to select female astronaut yet.
Stat-XFER (0.6349): The Xinhua News Agency and the Zhuhai November seconds ( reporters Wang Hongshan, li announced good ) Chinese astronaut research and training center deputy director, " aerospace hero " Yang Liwei and the first said in zhuhai, china now not yet to select woman astronaut plan.
IBM-ylee (0.5325): Zhuhai November 2 ( Xinhua News Agency, Chinese astronauts scientific training center, deputy director of the " space hero Yang Liwei said on July 1 in Zhuhai, China is still no plan and the flight, astronauts.
IBM-smt (0.5194): Xinhua Zhuhai, November 2nd (Xinhua) (Reporter,) Chinese astronauts scientific training, deputy director of the Center, " Space hero ' yang Liwei, 1st in Zhuhai, said that China has no plan of selecting astronaut.
maryland-hiero (0.5180): ZHUHAI, November 2 (Xinhua, ) deputy director of the Chinese astronauts scientific research and training center, "1" space hero Yang Liwei said in Zhuhai, China does not plan to make astronaut selection.
CMU-combination (0.4701): zhuhai, november 2nd ( xinhua ) deputy director of china's astronaut training centre, " space hero yang liwei, 1st in zhuhai, china currently has no plans of the selected astronaut.

Slide 11: Chinese-English Example - Before
THE SCIENTISTS IN ORDER TO 攸 TO CLOSE IN THE EARLY PERIOD TO GO THE THE KNOWLEDGE THE THE DISEASE IN THE CHROMOSOME HAS BEEN COMPLETED IS SCHEDULED TO ORDER
Overall: , Prob: , Rules: , Frag: 0.4, Length: , Words: 13,

Slide 12: Chinese-English Example - After
THE SCIENTISTS COMPLETED SEQUENCING FOR THE CHROMOSOMES WHICH RELATED TO THE INITIAL STAGE DEMENTIA
Overall: , Prob: , Rules: , Frag: 0, Length: , Words: 8,
< : 科学家 为 攸关 初期 失智症 的 染色体 完成 定序
(S,1 (NP,1 (LITERAL 'THE') (NB,1 (N,21601 'SCIENTISTS'))) (VP,4 (VP,1 (V,7513 'COMPLETED')(NP,2 (NB,1 (N, 'SEQUENCING')))) (PP,1 (PREP,5 'FOR')(NPRC,1 (NP,1 (LITERAL 'THE') (NB,1 (N, 'CHROMOSOMES'))) (LITERAL 'WHICH') (VP,1 (V,18 'RELATED TO') (NPASSOC,5 (NP,1 (LITERAL 'THE') (NB,1 (N,7637 'INITIAL STAGE'))) (NP,2 (NB,1 (N,445 'DEMENTIA')))))))))>
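
The bracketed derivation on this slide can be checked against the output string by reading off its quoted leaves in order. The sketch below does only that, with a regular expression. It assumes every leaf is a single-quoted string with no embedded quotes, and it makes no attempt to interpret the node labels or the numbers after them.

```python
import re

# Derivation tree copied from the slide, including the entries whose
# numbers are missing in the transcript.
tree = (
    "(S,1 (NP,1 (LITERAL 'THE') (NB,1 (N,21601 'SCIENTISTS'))) "
    "(VP,4 (VP,1 (V,7513 'COMPLETED')(NP,2 (NB,1 (N, 'SEQUENCING')))) "
    "(PP,1 (PREP,5 'FOR')(NPRC,1 (NP,1 (LITERAL 'THE') "
    "(NB,1 (N, 'CHROMOSOMES'))) (LITERAL 'WHICH') "
    "(VP,1 (V,18 'RELATED TO') (NPASSOC,5 (NP,1 (LITERAL 'THE') "
    "(NB,1 (N,7637 'INITIAL STAGE'))) (NP,2 (NB,1 (N,445 'DEMENTIA')))))))))"
)

def leaves(bracketed):
    """Return the quoted leaf strings of a bracketed tree, left to right."""
    return re.findall(r"'([^']*)'", bracketed)

leaf_list = leaves(tree)
yield_string = " ".join(leaf_list)
```

The yield reproduces the translation shown on the slide, with "RELATED TO" and "INITIAL STAGE" each coming from a single multi-word leaf.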

Slide 13: C/E Stat-XFER Lexicon
Lexical sources and their sizes:
–Named Entities from LDC lexical entries
–Filtered base NPs from Parallel Corpus lexical entries
–ADSO bilingual lexicon lexical entries
–LDC word bilingual glossary lexical entries
–Wikipedia extracted bilingual lexicon lexical entries
–Phrases from Parallel Corpus lexical entries
–Manual bilingual lexicon (high freq) 1149 lexical entries