CMU Statistical-XFER System

Presentation transcript:

CMU Statistical-XFER System
Hybrid “rule-based”/statistical system
Scaled-up version of our XFER approach developed for low-resource languages
Large-coverage “clean” bilingual lexicon + syntactic transfer rules (human-written + extracted from data)
XFER formalism is a Synchronous CFG + feature unification constraints
Supports morphological analysis and generation as “plug-in” components
Two-stage translation process:
– Build lattice of translation fragments at all levels “bottom-up”
– Monotonic decoder selects best combination of lattice edges
– Beam search with multiple features at both stages
– Features include: LM, fragmentation, length, …
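The second stage described above (a monotonic decoder choosing the best combination of lattice edges) can be sketched roughly as follows. This is a minimal illustration, not the Stat-XFER decoder itself: the edge format `(start, end, target_text, score)`, the function name `monotonic_decode`, and the `frag_penalty` and `beam_size` parameters are all assumptions made for the example, and the only features modelled are a per-edge score and a fragmentation penalty.

```python
from collections import defaultdict


def monotonic_decode(n_words, edges, frag_penalty=1.0, beam_size=5):
    """Pick the best left-to-right sequence of lattice edges covering the source.

    edges: iterable of (start, end, target_text, score), where the edge
    translates source words [start, end) and score is a log-score
    (higher is better). All names here are hypothetical.
    """
    by_start = defaultdict(list)
    for start, end, text, score in edges:
        by_start[start].append((end, text, score))

    # beams[i] holds hypotheses (score, fragments) covering source words [0, i).
    beams = [[] for _ in range(n_words + 1)]
    beams[0] = [(0.0, [])]

    for i in range(n_words):
        # Prune to the beam_size best hypotheses ending at position i.
        beams[i] = sorted(beams[i], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, frags in beams[i]:
            for end, text, edge_score in by_start[i]:
                # Every extra fragment is penalised, which favours larger units.
                new_score = score + edge_score - frag_penalty
                beams[end].append((new_score, frags + [text]))

    if not beams[n_words]:
        return None  # no sequence of edges covers the whole sentence
    best_score, best_frags = max(beams[n_words], key=lambda h: h[0])
    return best_score, " ".join(best_frags)


if __name__ == "__main__":
    toy_edges = [
        (0, 1, "the", -1.0),
        (1, 2, "house", -1.0),
        (2, 3, "red", -1.0),
        (1, 3, "red house", -1.5),   # a reordered fragment built by a transfer rule
        (0, 3, "the red house", -4.0),
    ]
    print(monotonic_decode(3, toy_edges, frag_penalty=0.5))
```

Because the decoder is monotonic, any reordering has to happen inside the fragments themselves, which is why the toy lattice contains a pre-reordered edge.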

Chinese-English S-XFER System
Bilingual lexicon: over 1.1 million entries (multiple resources, incl. ADSO)
Manual syntactic xfer grammar: 65 rules! (mostly NPs and reordering of NPs/PPs)
Multiple overlapping Chinese word segmentations
English morphology generation
Uses CMU’s Suffix-Array LM toolkit for LM
Current Performance (GALE dev-test):
– NW: 14.04(B)/0.4825(M); UMD: 30.29(B)
– NG: 7.92(B); UMD: 9.82(B)
– WL: 5.40(B)/0.3022(M); UMD: 6.30(B)
Integration: provides n-best lists (combination/rescoring)
In Progress:
– Additional features for decoding + MERT
– Automatic extraction of “clean” NPs from parallel data
– Automatic extraction of xfer-rules from parallel data
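One way to picture “multiple overlapping Chinese word segmentations” is as a set of lexicon-matched spans over the character string, all of which are kept as candidate edges instead of committing to a single segmentation. The sketch below is only an illustration under that assumption: the function name `segmentation_edges`, the `max_len` limit, and the toy lexicon are hypothetical and not taken from the system.

```python
def segmentation_edges(sentence, lexicon, max_len=4):
    """Return every (start, end, word) span whose characters form a lexicon entry.

    Overlapping spans are all kept, so later stages can choose among
    competing segmentations. Names and limits here are hypothetical.
    """
    edges = []
    for start in range(len(sentence)):
        longest = min(len(sentence), start + max_len)
        for end in range(start + 1, longest + 1):
            word = sentence[start:end]
            if word in lexicon:
                edges.append((start, end, word))
    return edges


if __name__ == "__main__":
    toy_lexicon = {"科学", "科学家", "家", "完成", "定序", "定", "序"}
    for edge in segmentation_edges("科学家完成定序", toy_lexicon):
        print(edge)
```

Here both 科学 + 家 and 科学家 survive as alternatives, and the choice between them is deferred to the scoring stages.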

Chinese-English Example - Before
THE SCIENTISTS IN ORDER TO 攸 TO CLOSE IN THE EARLY PERIOD TO GO THE THE KNOWLEDGE THE THE DISEASE IN THE CHROMOSOME HAS BEEN COMPLETED IS SCHEDULED TO ORDER
Overall: , Prob: , Rules: , Frag: 0.4, Length: , Words: 13,

Chinese-English Example - After
SrcSent 0 科学家为攸关初期失智症的染色体完成定序
0 0 THE SCIENTISTS COMPLETED SEQUENCING FOR THE CHROMOSOMES WHICH RELATED TO THE INITIAL STAGE DEMENTIA
Overall: , Prob: , Rules: , Frag: 0, Length: , Words: 8,
< : 科学家 为 攸关 初期 失智症 的 染色体 完成 定序
(S,1
  (NP,1 (LITERAL 'THE') (NB,1 (N,21601 'SCIENTISTS')))
  (VP,4
    (VP,1 (V,7513 'COMPLETED') (NP,2 (NB,1 (N, 'SEQUENCING'))))
    (PP,1 (PREP,5 'FOR')
      (NPRC,1
        (NP,1 (LITERAL 'THE') (NB,1 (N, 'CHROMOSOMES')))
        (LITERAL 'WHICH')
        (VP,1 (V,18 'RELATED TO')
          (NPASSOC,5
            (NP,1 (LITERAL 'THE') (NB,1 (N,7637 'INITIAL STAGE')))
            (NP,2 (NB,1 (N,445 'DEMENTIA')))))))))>
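The bracketed derivation in this example can be read mechanically: each node is a label (category plus rule or lexicon index) followed by children, and the quoted leaves, read left to right, give the English output. The following sketch parses that notation and recovers the output string. The tokenizer and the function names `parse_tree` and `tree_yield` are assumptions written for this illustration, not part of the system's tooling.

```python
import re

# Tokens: parentheses, quoted leaves, or bare labels such as "NP,1" or "N,".
TOKEN = re.compile(r"\(|\)|'[^']*'|[^\s()']+")

TREE = """(S,1 (NP,1 (LITERAL 'THE') (NB,1 (N,21601 'SCIENTISTS')))
 (VP,4 (VP,1 (V,7513 'COMPLETED')(NP,2 (NB,1 (N, 'SEQUENCING'))))
 (PP,1 (PREP,5 'FOR')(NPRC,1 (NP,1 (LITERAL 'THE') (NB,1 (N, 'CHROMOSOMES')))
 (LITERAL 'WHICH') (VP,1 (V,18 'RELATED TO') (NPASSOC,5
 (NP,1 (LITERAL 'THE') (NB,1 (N,7637 'INITIAL STAGE')))
 (NP,2 (NB,1 (N,445 'DEMENTIA')))))))))"""


def parse_tree(text):
    """Parse the bracketed notation into nested (label, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                # A quoted leaf: keep the surface string without the quotes.
                children.append(tokens[pos].strip("'"))
                pos += 1
        pos += 1  # consume the closing parenthesis
        return (label, children)

    return node()


def tree_yield(tree):
    """Collect the leaf strings of a parsed tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else tree_yield(child))
    return words


if __name__ == "__main__":
    print(" ".join(tree_yield(parse_tree(TREE))))
```

Reading the yield this way makes it easy to check that the tree and the translated string on the slide agree.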

MEMT – Main Activities
Preserving Source Alignments: target phrases that originate from the same source word can be marked as unbreakable units (performance effects under testing…)
LM experiments using CMU’s Suffix-Array LM toolkit and new features (work still in progress…)
Case Restoration: scheme for selecting the case of words in final MEMT output
Improved tokenization and handling of punctuation
Handling of varying number of MEMT input engines
Upgrades to MEMT software infrastructure to support IOD-2 requirements, GTS 1.0 and UIMA v1.4
MEMT server is up 24/7 for ongoing IOD runs