Presentation is loading. Please wait.

Presentation is loading. Please wait.

Microsoft Research Faculty Summit 2008. Robert Moore Principal Researcher Microsoft Research.

Similar presentations


Presentation on theme: "Microsoft Research Faculty Summit 2008. Robert Moore Principal Researcher Microsoft Research."— Presentation transcript:

1 Microsoft Research Faculty Summit 2008

2 Robert Moore Principal Researcher Microsoft Research

3 1950s – mid 1960s MT based on ad-hoc, hand-written rules and lexicons 1966 ALPAC report kills MT research funding in U.S. for 20 years Late 1980s – early 1990s DARPA restarts US funding for MT research IBM speech recognition group lays foundation for statistical MT 2000s DARPA greatly expands funding for MT research (TIDES, GALE) Statistical MT becomes the hottest research topic in NLP

4 Sentence-align and word-align a parallel bilingual corpus Extract translation pairs of contiguous phrases Optimize weights of a linear model for scoring a source- string, target-string, phrase-alignment triple, combining Sum of logarithms of phrase-translation probabilities Logarithm of target language string probability Word reordering penalty Other features Translate a source language string by trying to find the corresponding target language string and phrase alignment that score the highest according to the linear model

5 Tendencies that can be leveraged in statistical alignment Lengths of sentences that are translations are highly correlated Words that are translations co-occur in sentence translation pairs far more often than chance N-to-M word or sentence translations are possible, but less probable the more N and M differ Word reordering is possible, but less probable the more extreme the reordering These tendencies can be reflected in the structure of statistical alignment models, whose parameters can be trained by machine learning methods

6 Exact search for the best translation is NP-hard, and even approximate search can be slow String-based reordering models are very weak, and work poorly for language pairs that have substantially different word orders Taking words as minimal units is problematic with heavily inflected languages, especially translation into such languages Variations in models that make some translations better usually make others worse; can we combine systems to get advantages of multiple approaches?

7 Translation speed – Moore and Quirk developed improved pruning methods for a widely-used beam search algorithm, producing up to order-of-magnitude speed-ups Word order – Quirk, Menezes, and Cherry developed a tree-to-string translation model that leverages a dependency parse of the source sentence to make more intelligent ordering decisions Heavily inflected languages – Toutanova et al. developed methods for explicitly modeling word inflection that significantly improve translation into such languages

8 Overall approach Take N-best translations from several systems Pick one translation as a “backbone” Word-align all other translations to the backbone Use alignments to determine a set of alternative words for each position in the combined output (in backbone order) Choose a word from each set of alternatives by weighted vote of system outputs, plus a statistical target language model Our main innovation He et al. developed a probabilistic alignment method that consistently out-performs previous edit-distance based methods

9 A combination of 8 systems from MSR Redmond MSR Asia SRI International Canadian National Research Council obtained the highest score in the NIST 2008 Open MT Evaluation on the constrained training track for Chinese-to- English (B LEU = 31.0 vs. 26.2 for best single system) Adding 7 more systems from ISI/LW and BBN substantially improved the system combination (B LEU = 34.9 vs. 29.9 for best single system)

10 Reference translation Relief officials said today that in total it had caused the deaths of 43 people 15-system combination Rescue officials said today that a total of 43 people were killed Single system 1 Rescue officials said today that killed 43 people Single system 2 Rescue officials said today that a total of 43 people were killed

11 Reference translation white house pushes for nuclear inspectors to be sent as soon as possible to monitor north korea's closure of its nuclear reactors 15-system combination the white house urges nuclear inspection and supervision north korea shut down the nuclear reaction as soon as possible to Single system 1 white house urges nuclear north korea as soon as possible to close the furnace Single system 2 the white house urged to send nuclear inspection to monitor north korea nuclear reactor shut down as soon as possible

12 Reference translation from june to september every year, strong winds and heavy rains brought by the monsoon often cause widespread flood disasters and even human casualties in india, a country with a population of 1.1 billion. 15-system combination from june to september every year, strong winds brought about by monsoon rains often lead to india which has a population of 1.1 billion, serious floods in the country and even causing casualties. Single system 1 every monsoon winds and rains often leads to india from june to september, causing serious floods in the country with a population of 1.1 billion, and even causing casualties. Single system 2 the flood swamped, causing casualties every year from june to september, strong winds and heavy rains brought by the monsoon often lead to india which has a population of 1.1 billion.

13 Continuing to work in most of the areas mentioned above Extending other approaches to syntax-based translation based on synchronous probabilistic context-free grammars, seeking Faster translation algorithms Better translation models Exploring “minimum Bayes risk” approaches to choosing the best translation Developing discriminative, syntax-based target language models for MT


Download ppt "Microsoft Research Faculty Summit 2008. Robert Moore Principal Researcher Microsoft Research."

Similar presentations


Ads by Google