Microsoft Research Faculty Summit 2008. Robert Moore Principal Researcher Microsoft Research.

Slides:



Advertisements
Similar presentations
Statistical Machine Translation
Advertisements

Joint Parsing and Alignment with Weakly Synchronized Grammars David Burkett, John Blitzer, & Dan Klein TexPoint fonts used in EMF. Read the TexPoint manual.
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Lattices Segmentation and Minimum Bayes Risk Discriminative Training for Large Vocabulary Continuous Speech Recognition Vlasios Doumpiotis, William Byrne.
Chinese Word Segmentation Method for Domain-Special Machine Translation Su Chen; Zhang Yujie; Guo Zhen; Xu Jin’an Beijing Jiaotong University.
DP-based Search Algorithms for Statistical Machine Translation My name: Mauricio Zuluaga Based on “Christoph Tillmann Presentation” and “ Word Reordering.
Confidence Estimation for Machine Translation J. Blatz et.al, Coling 04 SSLI MTRG 11/17/2004 Takahiro Shinozaki.
1 A Tree Sequence Alignment- based Tree-to-Tree Translation Model Authors: Min Zhang, Hongfei Jiang, Aiti Aw, et al. Reporter: 江欣倩 Professor: 陳嘉平.
Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.
“Applying Morphology Generation Models to Machine Translation” By Kristina Toutanova, Hisami Suzuki, Achim Ruopp (Microsoft Research). UW Machine Translation.
1 LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.
Minimum Error Rate Training in Statistical Machine Translation By: Franz Och, 2003 Presented By: Anna Tinnemore, 2006.
1 Language Model Adaptation in Machine Translation from Speech Ivan Bulyko, Spyros Matsoukas, Richard Schwartz, Long Nguyen, and John Makhoul.
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
Machine Translation A Presentation by: Julie Conlonova, Rob Chase, and Eric Pomerleau.
C SC 620 Advanced Topics in Natural Language Processing Lecture 24 4/22.
Symmetric Probabilistic Alignment Jae Dong Kim Committee: Jaime G. Carbonell Ralf D. Brown Peter J. Jansen.
A Hierarchical Phrase-Based Model for Statistical Machine Translation Author: David Chiang Presented by Achim Ruopp Formulas/illustrations/numbers extracted.
Seven Lectures on Statistical Parsing Christopher Manning LSA Linguistic Institute 2007 LSA 354 Lecture 7.
1 The Web as a Parallel Corpus  Parallel corpora are useful  Training data for statistical MT  Lexical correspondences for cross-lingual IR  Early.
1 Statistical NLP: Lecture 13 Statistical Alignment and Machine Translation.
A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora Benjamin Arai Computer Science and Engineering Department.
Natural Language Processing Expectation Maximization.
Machine translation Context-based approach Lucia Otoyo.
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.
Statistical Machine Translation Part IV – Log-Linear Models Alex Fraser Institute for Natural Language Processing University of Stuttgart Seminar:
METEOR-Ranking & M-BLEU: Flexible Matching & Parameter Tuning for MT Evaluation Alon Lavie and Abhaya Agarwal Language Technologies Institute Carnegie.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
METEOR: Metric for Evaluation of Translation with Explicit Ordering An Automatic Metric for MT Evaluation with Improved Correlations with Human Judgments.
Statistical Machine Translation Part IV – Log-Linear Models Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
Recent Major MT Developments at CMU Briefing for Joe Olive February 5, 2008 Alon Lavie and Stephan Vogel Language Technologies Institute Carnegie Mellon.
Machine Translation  Machine translation is of one of the earliest uses of AI  Two approaches:  Traditional approach using grammars, rewrite rules,
The ICT Statistical Machine Translation Systems for IWSLT 2007 Zhongjun He, Haitao Mi, Yang Liu, Devi Xiong, Weihua Luo, Yun Huang, Zhixiang Ren, Yajuan.
Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation Kei Hashimoto, Hirohumi Yamamoto, Hideo Okuma, Eiichiro.
Advanced MT Seminar Spring 2008 Instructors: Alon Lavie and Stephan Vogel.
1 Boosting-based parse re-ranking with subtree features Taku Kudo Jun Suzuki Hideki Isozaki NTT Communication Science Labs.
MEMT: Multi-Engine Machine Translation Faculty: Alon Lavie, Robert Frederking, Ralf Brown, Jaime Carbonell Students: Shyamsundar Jayaraman, Satanjeev Banerjee.
What’s in a translation rule? Paper by Galley, Hopkins, Knight & Marcu Presentation By: Behrang Mohit.
INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese.
1 Modeling Long Distance Dependence in Language: Topic Mixtures Versus Dynamic Cache Models Rukmini.M Iyer, Mari Ostendorf.
LREC 2008 Marrakech 29 May Caroline Lavecchia, Kamel Smaïli and David Langlois LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France Phrase-Based Machine.
A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation Jun Sun ┼, Min Zhang ╪, Chew Lim Tan ┼ ┼╪
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
MEMT: Multi-Engine Machine Translation Faculty: Alon Lavie, Jaime Carbonell Students and Staff: Gregory Hanneman, Justin Merrill (Shyamsundar Jayaraman,
A New Approach for English- Chinese Named Entity Alignment Donghui Feng Yayuan Lv Ming Zhou USC MSR Asia EMNLP-04.
A Statistical Approach to Machine Translation ( Brown et al CL ) POSTECH, NLP lab 김 지 협.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Discriminative Modeling extraction Sets for Machine Translation Author John DeNero and Dan KleinUC Berkeley Presenter Justin Chiu.
Large Vocabulary Data Driven MT: New Developments in the CMU SMT System Stephan Vogel, Alex Waibel Work done in collaboration with: Ying Zhang, Alicia.
October 10, 2003BLTS Kickoff Meeting1 Transfer with Strong Decoding Learning Module Transfer Rules {PP,4894} ;;Score: PP::PP [NP POSTP] -> [PREP.
MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching Faculty: Alon Lavie, Jaime Carbonell Students and Staff: Gregory Hanneman, Justin.
The P YTHY Summarization System: Microsoft Research at DUC 2007 Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki,
MEMT: Multi-Engine Machine Translation Faculty: Alon Lavie, Robert Frederking, Ralf Brown, Jaime Carbonell Students: Shyamsundar Jayaraman, Satanjeev Banerjee.
LING 575 Lecture 5 Kristina Toutanova MSR & UW April 27, 2010 With materials borrowed from Philip Koehn, Chris Quirk, David Chiang, Dekai Wu, Aria Haghighi.
Spring 2010 Lecture 4 Kristina Toutanova MSR & UW With slides borrowed from Philipp Koehn and Hwee Tou Ng LING 575: Seminar on statistical machine translation.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Bayes Rule Mutual Information Conditional.
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
Multi-Engine Machine Translation
Monoligual Semantic Text Alignment and its Applications in Machine Translation Alon Lavie March 29, 2012.
Urdu-to-English Stat-XFER system for NIST MT Eval 2008
Statistical Machine Translation Part III – Phrase-based SMT / Decoding
Statistical Machine Translation Papers from COLING 2004
Improving IBM Word-Alignment Model 1(Robert C. MOORE)
Neural Machine Translation
Presentation transcript:

Microsoft Research Faculty Summit 2008

Robert Moore Principal Researcher Microsoft Research

1950s – mid 1960s MT based on ad-hoc, hand-written rules and lexicons 1966 ALPAC report kills MT research funding in U.S. for 20 years Late 1980s – early 1990s DARPA restarts US funding for MT research IBM speech recognition group lays foundation for statistical MT 2000s DARPA greatly expands funding for MT research (TIDES, GALE) Statistical MT becomes the hottest research topic in NLP

Sentence-align and word-align a parallel bilingual corpus Extract translation pairs of contiguous phrases Optimize weights of a linear model for scoring a source- string, target-string, phrase-alignment triple, combining Sum of logarithms of phrase-translation probabilities Logarithm of target language string probability Word reordering penalty Other features Translate a source language string by trying to find the corresponding target language string and phrase alignment that score the highest according to the linear model

Tendencies that can be leveraged in statistical alignment Lengths of sentences that are translations are highly correlated Words that are translations co-occur in sentence translation pairs far more often than chance N-to-M word or sentence translations are possible, but less probable the more N and M differ Word reordering is possible, but less probable the more extreme the reordering These tendencies can be reflected in the structure of statistical alignment models, whose parameters can be trained by machine learning methods

Exact search for the best translation is NP-hard, and even approximate search can be slow String-based reordering models are very weak, and work poorly for language pairs that have substantially different word orders Taking words as minimal units is problematic with heavily inflected languages, especially translation into such languages Variations in models that make some translations better usually make others worse; can we combine systems to get advantages of multiple approaches?

Translation speed – Moore and Quirk developed improved pruning methods for a widely-used beam search algorithm, producing up to order-of-magnitude speed-ups Word order – Quirk, Menezes, and Cherry developed a tree-to-string translation model that leverages a dependency parse of the source sentence to make more intelligent ordering decisions Heavily inflected languages – Toutanova et al. developed methods for explicitly modeling word inflection that significantly improve translation into such languages

Overall approach Take N-best translations from several systems Pick one translation as a “backbone” Word-align all other translations to the backbone Use alignments to determine a set of alternative words for each position in the combined output (in backbone order) Choose a word from each set of alternatives by weighted vote of system outputs, plus a statistical target language model Our main innovation He et al. developed a probabilistic alignment method that consistently out-performs previous edit-distance based methods

A combination of 8 systems from MSR Redmond MSR Asia SRI International Canadian National Research Council obtained the highest score in the NIST 2008 Open MT Evaluation on the constrained training track for Chinese-to- English (B LEU = 31.0 vs for best single system) Adding 7 more systems from ISI/LW and BBN substantially improved the system combination (B LEU = 34.9 vs for best single system)

Reference translation Relief officials said today that in total it had caused the deaths of 43 people 15-system combination Rescue officials said today that a total of 43 people were killed Single system 1 Rescue officials said today that killed 43 people Single system 2 Rescue officials said today that a total of 43 people were killed

Reference translation white house pushes for nuclear inspectors to be sent as soon as possible to monitor north korea's closure of its nuclear reactors 15-system combination the white house urges nuclear inspection and supervision north korea shut down the nuclear reaction as soon as possible to Single system 1 white house urges nuclear north korea as soon as possible to close the furnace Single system 2 the white house urged to send nuclear inspection to monitor north korea nuclear reactor shut down as soon as possible

Reference translation from june to september every year, strong winds and heavy rains brought by the monsoon often cause widespread flood disasters and even human casualties in india, a country with a population of 1.1 billion. 15-system combination from june to september every year, strong winds brought about by monsoon rains often lead to india which has a population of 1.1 billion, serious floods in the country and even causing casualties. Single system 1 every monsoon winds and rains often leads to india from june to september, causing serious floods in the country with a population of 1.1 billion, and even causing casualties. Single system 2 the flood swamped, causing casualties every year from june to september, strong winds and heavy rains brought by the monsoon often lead to india which has a population of 1.1 billion.

Continuing to work in most of the areas mentioned above Extending other approaches to syntax-based translation based on synchronous probabilistic context-free grammars, seeking Faster translation algorithms Better translation models Exploring “minimum Bayes risk” approaches to choosing the best translation Developing discriminative, syntax-based target language models for MT