Improving Machine Translation Quality via Hybrid Systems and Refined Evaluation Methods Andreas Eisele DFKI GmbH and Saarland University Helsinki, November.


1 Improving Machine Translation Quality via Hybrid Systems and Refined Evaluation Methods
Andreas Eisele, DFKI GmbH and Saarland University
Helsinki, November 22, 2006
Networking Session "Multilinguism and Language Technology - a Challenge for Europe"
EuroMatrix

2 © 2006 EuroMatrix Consortium EuroMatrix NLP will be everywhere
The vision: In the near future, we will
- obtain up-to-date information on any subject
- from any source, in any language
- adapted to our personal interests and needs
NLP will play an important role in the automatic acquisition of knowledge and its presentation via dialog and other user-centric modalities
BUT:

3 Will NLP be multilingual?
- Will the technology behind these scenarios be exclusively owned by US companies?
- Will it be seen as possible and worthwhile to develop resources for "smaller" languages?
- Will NLP technology hence be limited to a small number of "important" languages, or will it be useful for all Europeans?
- Will NLP help to alleviate the "linguistic divide", or could it even aggravate it?

4 Motivation for EuroMatrix
- Pressing need for MT functionality in a European context
- 20 official EU languages, more to come: 380 to 600 language pairs
- After a phase of stagnation, MT has gained new momentum; Europeans play an important role in the revival of MT in the US
- Driven by DARPA and now Google, US approaches focus on MT into English and on broad coverage, and accept low quality: MT for assimilation
- Europe needs more target languages and better quality: MT for dissemination
- Europe needs integration of data-driven and rule-based approaches
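The pair counts on this slide follow from counting ordered (source, target) pairs. A minimal sketch of the arithmetic; the function name is hypothetical, and the figure of 25 languages is only inferred from the upper bound of 600, since the slide does not state it:

```python
def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages."""
    return n_languages * (n_languages - 1)


# 20 official EU languages, as stated on the slide.
print(directed_pairs(20))  # 380

# Assumption: 25 languages, inferred from the slide's upper bound of 600.
print(directed_pairs(25))  # 600
```

Because the count grows quadratically, each added language adds twice the current number of languages in new translation directions.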

5 Language Pairs Covered Today (figure; source: Hutchins 2005)

6 Language Pairs Covered Today (figure; source: Hutchins 2005)

7 Hybrid architectures for MT
- For many language pairs, rule-based MT and SMT based on parallel corpora co-exist and show complementary strengths
- We can integrate these into hybrid, multi-engine systems and combine their advantages
- First step: shallow, black-box integration
- Integration at a finer granularity promises better results, but also requires more work
- The goal is to obtain a toolbox of useful modules that can be flexibly combined

8 Hybrid MT Architecture I: Multi-engine MT via black-box integration
[Diagram: Source Text -> several rule-based MT engines and SMT engine(s), each producing hypotheses -> Selection -> Target Text]
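The black-box integration on this slide could be sketched as follows. The slide does not say how the selection step decides between hypotheses; the bigram-coverage scorer, the toy corpus, and the two stand-in engines below are assumptions made purely for illustration.

```python
from typing import Callable, Iterable, Sequence

Engine = Callable[[str], str]  # black box: source sentence -> hypothesis


def make_bigram_scorer(corpus: Iterable[str]) -> Callable[[str], float]:
    """Build a toy scorer: fraction of hypothesis bigrams seen in the corpus."""
    seen = set()
    for sentence in corpus:
        tokens = sentence.split()
        seen.update(zip(tokens, tokens[1:]))

    def score(hypothesis: str) -> float:
        tokens = hypothesis.split()
        bigrams = list(zip(tokens, tokens[1:]))
        if not bigrams:
            return 0.0
        return sum(b in seen for b in bigrams) / len(bigrams)

    return score


def select_translation(source: str, engines: Sequence[Engine],
                       score: Callable[[str], float]) -> str:
    """Run every engine on the source and keep the best-scoring hypothesis."""
    hypotheses = [engine(source) for engine in engines]
    return max(hypotheses, key=score)


# Hypothetical stand-ins for real MT engines (fixed outputs).
rule_based = lambda src: "the house small is"
smt = lambda src: "the house is small"

score = make_bigram_scorer(["the house is small", "the garden is big"])
best = select_translation("das Haus ist klein", [rule_based, smt], score)
print(best)  # the house is small
```

The engines are only ever called, never inspected, which is what makes the integration "black-box": any system that maps a source sentence to a target sentence can be plugged in.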

9 Hybrid MT Architecture II: SMT has the last word
[Diagram: Parallel Corpus -> Alignment, Phrase Extraction -> Phrase Table; Monolingual Corpus -> Counting, Smoothing -> n-gram Model; Source Text -> several rule-based MT engines -> Hypotheses -> Dyn. PT; Phrase Table, n-gram Model, and Dyn. PT feed the SMT Decoder -> Target Text]

10 Hybrid MT Architecture III: SMT feeds rule-based MT
[Diagram: Parallel Corpus -> Alignment, Phrase Extraction -> Phrase Table -> Linguistic Processing, Validation -> MT Lexicon -> Rule-based MT engine; Source Text -> Rule-based MT engine -> Target Text]

11 Hybrid MT Architectures IV …
- As laid out by Philipp Koehn, rule-based modules dealing with morphology and syntax can be integrated into the statistical framework
- We can apply these approaches to practical tasks and learn from the comparison which way to go
- The best approach may depend on the specific properties of the task at hand with respect to languages, domain, text type, …

12 Refined Evaluation Methods
The evaluation dilemma:
- Manual evaluation is meaningful, but expensive, tedious, and error-prone, and not useful for regression testing
- Automatic evaluation is repeatable and objective, but not necessarily relevant; better systems may have worse BLEU scores
We need to lower the effort for manual evaluation, increase the quality of automatic evaluation, or do both
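To illustrate why an automatic score need not track quality, here is a minimal sketch of clipped (modified) n-gram precision, the building block of BLEU. It is not full BLEU: the brevity penalty and the geometric mean over n-gram orders are omitted, and the example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate: str, reference: str, n: int) -> float:
    """Clipped n-gram precision of a candidate against one reference."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


reference = "the cat is on the mat"
paraphrase = "a cat sits on the mat"   # acceptable translation, different wording
scrambled = "the the cat on is mat"    # reference words in a broken order

print(modified_precision(scrambled, reference, 1))   # 1.0
print(modified_precision(paraphrase, reference, 1))  # 4/6, about 0.667
print(modified_precision(scrambled, reference, 2))   # 0.2
print(modified_precision(paraphrase, reference, 2))  # 0.4
```

On unigrams alone the scrambled output beats the readable paraphrase; higher-order n-grams reduce this effect, but a metric based on surface overlap with one reference can still penalize valid rewordings.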

13 Steps Towards Better MT Evaluation
- Koehn/Monz 2006 distributed the burden of manual evaluation over the participants in the shared MT task
- We need to find integrated (semi-automatic) evaluation approaches, where limited manual effort is maximally exploited, e.g. by doing evaluation at a finer (phrase-level) granularity
- Integration of linguistic knowledge into the evaluation process is helpful, but requires more research
- Combine the MT evaluation campaign with an evaluation of evaluation methods

14 Open Questions
- Can we integrate feedback from end users about quality into the development cycle?
- Will MT be useful for professional translators?
- Is the effective US approach of "friendly competition" applicable to tasks that involve 600 language pairs? Is it compatible with the need to share linguistic resources and modules?
- Can the development of open source resources be funded?
- How could MT efforts in FP7 be scaled up to match the importance of the topic?

15 Thank You for Your Attention

