Published by Maximilian Eldred. Modified over 10 years ago.
1
Masaki Itagaki (Language Excellence)
Takako Aikawa (Machine Translation Incubation at MSR)
Microsoft
2
MSR-MT (Quirk et al., 2005)
A statistical machine translation (SMT) system
Trained on bilingual content from software, user guides, Web content, etc.
Used for localizing software and user content
Issues
SMT may not use product-specific translations.
Example: "contact list" in Windows, Office, etc. vs. Windows Live
3
Do not apply dictionary data BEFORE the MT system's input sentence analysis.
This could confuse treelet mapping (e.g. "access information").
Try to find a black-box solution.
Do not touch the MT engine itself: customizing mapping information per product is not realistic.
The solution should work for ANY MT system.
Correct translations in the MT output.
4
Step 1: Get a raw MT output
  [Source] Your contact list is empty.
  [Target]
Step 2: Identify noun terms
  [Source] Your contact list is empty.
  [Target]
Step 3: Find a match in the user dictionary
  [Source] Your contact list is empty.
  [Target]
  [Dict] contact list =
Step 4: Swap the translation
  [Dict] contact list =
  [Target]
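The four steps can be sketched as a small post-editing function. This is a minimal sketch under stated assumptions, not the presenters' implementation: the name `swap_terms`, the bracketed placeholder strings that stand in for target-language text, and the plain substring match used in place of real noun-term identification are all hypothetical.

```python
from typing import Dict, List


def swap_terms(
    source: str,
    raw_mt: str,
    user_dict: Dict[str, str],
    mt_candidates: Dict[str, List[str]],
) -> str:
    """Post-edit raw MT output so dictionary terms use the preferred translation.

    All names are hypothetical. `mt_candidates` maps a source term to the
    renderings the MT engine is known to produce for that term.
    """
    output = raw_mt
    source_lower = source.lower()
    for term, preferred in user_dict.items():
        # Steps 2-3 (simplified): a substring match stands in for noun-term
        # identification plus the user-dictionary lookup.
        if term.lower() not in source_lower:
            continue
        # Step 4: try longer candidates first so that a short candidate does
        # not partially overwrite a longer one.
        for candidate in sorted(mt_candidates.get(term, []), key=len, reverse=True):
            if candidate in output:
                output = output.replace(candidate, preferred)
                break
    return output


# Step 1 is assumed to have happened already: `raw_mt` is the engine's output.
# Bracketed strings are placeholders, not real translations.
result = swap_terms(
    source="Your contact list is empty.",
    raw_mt="[your] [contact list: generic] [is empty].",
    user_dict={"contact list": "[contact list: product]"},
    mt_candidates={"contact list": ["[contact list: generic]", "[contact list: other]"]},
)
print(result)
```

Because the function only reads the source sentence and the MT output, it treats the MT engine as a black box, which is the constraint stated on the previous slide.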
5
How does MT translate "contact list"?
Contact list
This is a contact list.
My contact list already exists.
(contact list)
6
Found 15 pattern sentences (or templates) that may generate most of the variations.

Template      Pattern        Description
SUBJ+V        X exists.      A term as the subject of an intransitive verb.
PREP_WITH     with X         A term following a common preposition, "with".
SUBJ+BE       X is a word.   A term as the subject of a copula.
OBJ_V         Select X.      A term as an object of a transitive verb.
PARENTHESIS   (X)            A term in parentheses.
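Filling the templates with a term is a simple substitution. The sketch below uses only the five templates listed on this slide (the deck mentions 15 in total); the names `TEMPLATES` and `instantiate` are hypothetical, and sentence-initial capitalization is deliberately left out.

```python
from typing import Dict

# The five templates shown on the slide; "X" marks the slot for the term.
TEMPLATES: Dict[str, str] = {
    "SUBJ+V": "X exists.",
    "PREP_WITH": "with X",
    "SUBJ+BE": "X is a word.",
    "OBJ_V": "Select X.",
    "PARENTHESIS": "(X)",
}


def instantiate(term: str, templates: Dict[str, str] = TEMPLATES) -> Dict[str, str]:
    """Return one probe sentence per template with the slot filled by `term`."""
    # Replace only the first "X" so the slot marker is substituted exactly once.
    return {name: pattern.replace("X", term, 1) for name, pattern in templates.items()}


probes = instantiate("contact list")
for name, sentence in probes.items():
    print(f"{name}: {sentence}")
```

Each probe sentence would then be sent to the MT system to see how the term is rendered in that syntactic context.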
7
How could "contact list" be translated?
contact list
Contact list
This is contact list
I have its contact list
Contact list damns
With contact list
Contact list exists.
MT candidates: etc.
Strip out all template text translations, e.g. the translations of "This is", "is a word", etc.
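Stripping the template text can be sketched as removing a known target-side frame (a prefix and a suffix) from each MT output and keeping what remains as a candidate. This is an assumption about how the stripping might work, not the method from the deck: `strip_frame`, `collect_candidates`, the `frames` table, and the uppercase placeholder strings in the toy translator are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def strip_frame(mt_output: str, frame: Tuple[str, str]) -> Optional[str]:
    """Remove the translated template text (prefix, suffix) around the term."""
    prefix, suffix = frame
    text = mt_output.strip()
    if len(text) < len(prefix) + len(suffix):
        return None
    if not (text.startswith(prefix) and text.endswith(suffix)):
        return None  # the MT output does not follow the expected frame
    core = text[len(prefix):len(text) - len(suffix)].strip()
    return core or None


def collect_candidates(
    term: str,
    templates: Dict[str, str],
    frames: Dict[str, Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[str]:
    """Translate each filled template and gather distinct term renderings."""
    candidates: List[str] = []
    for name, pattern in templates.items():
        rendering = strip_frame(translate(pattern.replace("X", term, 1)), frames[name])
        if rendering and rendering not in candidates:
            candidates.append(rendering)
    return candidates


# Toy stand-in for a black-box MT system; outputs are placeholders.
toy_mt = {
    "contact list exists.": "TERM_A EXISTS.",
    "with contact list": "WITH TERM_B",
    "(contact list)": "(TERM_A)",
}
templates = {"SUBJ+V": "X exists.", "PREP_WITH": "with X", "PARENTHESIS": "(X)"}
frames = {"SUBJ+V": ("", " EXISTS."), "PREP_WITH": ("WITH ", ""), "PARENTHESIS": ("(", ")")}

found = collect_candidates("contact list", templates, frames, toy_mt.__getitem__)
print(found)
```

The resulting list plays the role of the "MT candidates" for a term: the strings to look for in raw MT output when swapping in the dictionary translation.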
8
Step 1: Get a raw MT output
  [Source] Your contact list is empty.
  [Target]
Step 2: Identify noun terms
  [Source] Your contact list is empty.
  [Target]
Step 3: Find a match in the user dictionary
  [Source] Your contact list is empty.
  [Target]
  [Dict] contact list =
Step 4: Swap the translation
  [Dict] contact list =
  [Target]
9
Design
A dummy user dictionary: 634 nouns
Language pairs: English -> Japanese, Chinese, and Korean systems
Test data: 500 sentences from a game product (each sentence contains at least one candidate for DUMMY)
Coverage: 90.6% / 92% / 86%
10
Why not 100%?
11
Experiment Design
Metrics: BLEU, edit distance
Three MT systems
A real user dictionary with real entries (634 noun entries)
Language pair: English -> Japanese
500 sentences from a game domain (the same sentences used for the coverage experiment)
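The slides do not say how edit distance was computed or normalized. As one plausible reading, the sketch below computes Levenshtein distance over tokens and divides by the length of the longer sequence, giving a score between 0 and 1 where lower means closer to the reference. The function names and the normalization are assumptions.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(hyp: Sequence, ref: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence (assumed scheme)."""
    longest = max(len(hyp), len(ref))
    return 0.0 if longest == 0 else levenshtein(hyp, ref) / longest


hyp = "your contact list is empty".split()
ref = "your buddy list is empty".split()
print(normalized_edit_distance(hyp, ref))  # one substitution out of five tokens
```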
12
System     Metric          Without Term Swapper   With Term Swapper
MSR-MT     BLEU            12.43                  22.51
MSR-MT     Edit distance   0.63                   0.52
System A   BLEU            6.39                   13.80
System A   Edit distance   0.66                   0.6
System B   BLEU            5.93                   18.26
System B   Edit distance   0.66                   0.56
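The size of the change per system follows directly from the reported figures. The short calculation below only subtracts the numbers on this slide; the `RESULTS` layout and the `deltas` helper are hypothetical. It assumes the usual reading that a higher BLEU score and a lower edit distance are better.

```python
from typing import Dict, Tuple

# (without Term Swapper, with Term Swapper), copied from the slide.
RESULTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "MSR-MT": {"BLEU": (12.43, 22.51), "Edit distance": (0.63, 0.52)},
    "System A": {"BLEU": (6.39, 13.80), "Edit distance": (0.66, 0.60)},
    "System B": {"BLEU": (5.93, 18.26), "Edit distance": (0.66, 0.56)},
}


def deltas(results: Dict[str, Dict[str, Tuple[float, float]]]) -> Dict[str, Dict[str, float]]:
    """Absolute change (with minus without) per system and metric."""
    return {
        system: {
            metric: round(with_swap - without_swap, 2)
            for metric, (without_swap, with_swap) in metrics.items()
        }
        for system, metrics in results.items()
    }


changes = deltas(RESULTS)
for system, metrics in changes.items():
    print(system, metrics)
```

On these numbers, BLEU rises by 10.08, 7.41, and 12.33 points for MSR-MT, System A, and System B respectively, while edit distance falls by 0.11, 0.06, and 0.10.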
13
An automatic way to leverage our templates?
Term Swapper for languages with rich inflections/agreement?
Term Swapper for other types of lexical items (not just for nouns)?
15
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.