
1 Example-based Machine Translation using Structural Translation Examples
Eiji Aramaki*, Sadao Kurohashi* (*University of Tokyo)
The title of my talk is "Example-based Machine Translation using Structural Translation Examples." I am Eiji Aramaki from the University of Tokyo. As the title shows, we are developing an example-based machine translation system that uses structural translation examples.

2 Proposed System
First, I will show you a short demonstration of our system. (Demo command residue: "%%demo (sel 英語の新聞を下さい -best1)"; the input sentence means "Please give me an English newspaper.")

3 Proposed System
- Parses an input sentence
- Selects structural translation examples
- Combines them to generate an output tree
- Decides the word order
First, the system parses an input sentence. For each part of the input, the system selects structural translation examples, which are bilingual sub-tree pairs. Then the system combines them to generate an output dependency tree. Finally, the system decides the word order and outputs a translation.

4 Structural Translation Examples
[Figure: an aligned sub-tree pair, e.g. 日本語の (Japanese) → 新聞を (newspaper) → 下さい (give) on the source side, linked to Japanese → newspaper → Give me on the target side]
- Advantage: high usability
- But it requires many technologies: parsing and tree alignment, which are still being developed
- So a naive method without such technologies may be efficient in a limited domain
The point of our system is its handling of structural translation examples. A structural translation example has the potential advantage of high usability. However, building such translation examples requires many technologies, for example parsing and tree alignment, which are still being developed. So a naive method without such technologies may be efficient in a limited domain. In such a situation, we believe that a comparison of our system with such other approaches is meaningful. (A minimal sketch of such a data structure follows.)
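To make the idea concrete, here is a minimal Python sketch of what a structural translation example could look like as a data structure: a pair of aligned source and target sub-trees plus the surrounding phrases kept for the later selection step. All names here are illustrative assumptions, not the paper's actual representation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Phrase:
    words: List[str]                                         # surface words of the phrase
    children: List["Phrase"] = field(default_factory=list)   # dependents in the tree

@dataclass
class TranslationExample:
    source: Phrase             # e.g. 新聞を (newspaper) → 下さい (give)
    target: Phrase             # e.g. newspaper → Give me
    surrounding: List[Phrase]  # parent/child phrases of the correspondence,
                               # kept for the selection step (context similarity)

# e.g. the sub-tree pair from this slide:
example = TranslationExample(
    source=Phrase(["下さい"], [Phrase(["新聞を"])]),
    target=Phrase(["Give", "me"], [Phrase(["newspaper"])]),
    surrounding=[Phrase(["日本語の"])],
)
```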

5 Outline
- Algorithm (Alignment Module, Translation Module)
- Experimental Results
- Conclusion
This is the outline of this talk. First, I will explain our system's algorithm. Then I will report the experimental results. Finally, I will conclude my talk.

6 System Framework
[Figure: Bilingual Corpus → Alignment module → Translation Memory; Input → Translation module → Output]
- Alignment module: builds translation examples from the bilingual corpus
- Translation module: selects translation examples and combines them into a translation
Our system consists of two modules: the alignment module and the translation module. The alignment module builds translation examples from a bilingual corpus. The translation module, which I have already shown, selects translation examples and combines them into a translation.

7 Alignment Module (1/2)
- A sentence pair is analyzed by parsers [Kurohashi 1994][Charniak 2000]
- Correspondences are estimated by a dictionary-based alignment method [Aramaki 2001]
[Figure: the Japanese dependency tree 日本語の (Japanese) → 新聞を (newspaper) → 下さい (give) aligned with the English parse (S (VP (VBP Give) (NP (PRP me)) (NP (DT a) (JJ Japanese) (NN newspaper.))))]
First, the alignment module analyzes bilingual sentence pairs using these parsers. The Japanese parser outputs a phrasal dependency structure, which we use as it is. The English parser outputs phrase structures, which we convert into dependency structures using rules that decide the head word of each phrase. Then the system estimates correspondences using translation dictionaries; we used four dictionaries with about two million entries in total. (A toy sketch of the dictionary-based linking idea is shown below.)
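To illustrate the dictionary-based linking, here is a toy sketch assuming phrases are given as lists of content words. It only shows the idea of linking phrases through dictionary lookups; it is not the actual method of [Aramaki 2001], and the toy dictionary stands in for the four real dictionaries.

```python
# Toy bilingual dictionary; the real system used four dictionaries
# with about two million entries in total (per the talk).
DICTIONARY = {
    "日本語": {"japanese"},
    "新聞": {"newspaper", "paper"},
    "下さい": {"give"},
}

def estimate_correspondences(ja_phrases, en_phrases):
    """Link a Japanese phrase to an English phrase when a dictionary
    translation of one of its content words occurs in that phrase."""
    links = []
    for i, ja in enumerate(ja_phrases):
        translations = set().union(*(DICTIONARY.get(w, set()) for w in ja))
        for j, en in enumerate(en_phrases):
            if translations & {w.lower() for w in en}:
                links.append((i, j))
    return links

print(estimate_correspondences(
    [["日本語"], ["新聞"], ["下さい"]],
    [["Give"], ["me"], ["a", "Japanese", "newspaper"]],
))  # -> [(0, 2), (1, 2), (2, 0)]
```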

8 Alignment Module (2/2)
- Translation example = a combination of correspondences which are connected to each other
- Its surrounding phrases (the parent and child phrases of the correspondences) are kept for the selection of translation examples
Then the system generates all combinations of correspondences which are connected to each other. For example, six such combinations are generated from the previous sentence pair. We regard each such combination of correspondences as a translation example. In this operation, we also preserve its surrounding phrases, which are the parents and children of the correspondences; the surrounding phrases are used later for the selection of translation examples. (A brute-force sketch of the enumeration follows.)
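The enumeration can be viewed as listing connected subgraphs over the correspondences. The following brute-force Python sketch (fine at sentence scale) is an illustration, not the authors' implementation; the ↔ labels are readable stand-ins for correspondence objects, and the edges encode parent-child relations in the source tree.

```python
from itertools import combinations

def connected_subsets(nodes, edges):
    """Yield every subset of nodes that forms a connected subgraph."""
    adjacent = {n: set() for n in nodes}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)

    def is_connected(subset):
        members = set(subset)
        seen, stack = {subset[0]}, [subset[0]]
        while stack:                       # depth-first search inside the subset
            for nxt in adjacent[stack.pop()] & members:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return len(seen) == len(members)

    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if is_connected(subset):
                yield subset

# The correspondences of the example pair form a chain, so six connected
# combinations come out, matching the count mentioned on this slide:
corrs = ["日本語の↔Japanese", "新聞を↔newspaper", "下さい↔Give"]
edges = [(corrs[0], corrs[1]), (corrs[1], corrs[2])]
print(list(connected_subsets(corrs, edges)))  # 6 subsets
```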

9 System Framework
[Figure: the same framework diagram as slide 6]
Next, I will explain the translation module, which selects translation examples and combines them into a translation.

10 Translation Module (1/2)
[Figure: the input tree 中国語の (Chinese) → 新聞を (newspaper) → 下さい (give) matched against the translation example 新聞を (newspaper) → 下さい (give) / newspaper → Give me]
- Equality: the number of equal phrases
- Context similarity: calculated with a Japanese thesaurus
- Alignment confidence: the ratio of content words which can be found in dictionaries
First, an input sentence is analyzed by the parser. Then, for each phrase of the input, the system selects suitable translation examples using three measures: equality, context similarity, and alignment confidence.

11 Translation Module (1/2)
- Equality: the number of equal phrases
Equality is the number of phrases which are equal to the input. The system conducts this equality check on content words, so differences in function words are disregarded.

12 Context = Surrounding Phrases
- Context similarity: calculated with a Japanese thesaurus
[Figure: the same matching, now highlighting the surrounding phrases 中国語の (Chinese) in the input and 日本語の (Japanese) in the example]
Context is also an important clue for word selection. As I mentioned before, we regard the context as the surrounding phrases of the equal part. The context similarity between the surrounding phrases is calculated using a Japanese thesaurus.

13 Translation Module (1/2)
- Alignment confidence: the ratio of content words which can be found in dictionaries
We also take the alignment confidence into account. We define it as the ratio of content words which can be found in the dictionaries. (A hedged sketch of all three measures is shown below.)
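Here is a hedged sketch of the three measures, assuming phrases are represented as collections of content words and a thesaurus-based word similarity function is supplied. The exact definitions in the paper may differ; these functions only mirror the slide's one-line descriptions.

```python
def equality(input_phrases, example_phrases):
    """Number of input phrases equal to an example phrase, comparing
    content words only (function-word differences are disregarded)."""
    example_keys = {frozenset(p) for p in example_phrases}
    return sum(frozenset(p) in example_keys for p in input_phrases)

def context_similarity(input_context, example_context, word_sim):
    """Average similarity of the surrounding phrases, e.g. 中国語の (Chinese)
    vs. 日本語の (Japanese); word_sim stands in for the Japanese thesaurus."""
    pairs = list(zip(input_context, example_context))
    return sum(word_sim(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def alignment_confidence(example_words, dictionary):
    """Ratio of the example's content words found in the dictionaries."""
    return sum(w in dictionary for w in example_words) / len(example_words)
```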

14 Translation Module (2/2)
Selection score := (Equality + Similarity) × (λ + Confidence)
[Figure: two selected target sub-trees, "Give me" and "Japanese newspaper", combined into one output tree "Give me Japanese newspaper"]
- The dependency relations and the word order within each translation example are preserved
- The dependency relations and the word order between translation examples are decided by heuristic rules
Using these measures, we define the score as shown in this formula, and the system adopts the examples with the highest scores. Then the system combines them. In this operation, the dependency relations and the word order within the translation examples are preserved, while those between translation examples are decided by heuristic rules. (A one-line rendering of the score is shown below.)
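The formula on this slide can be rendered directly in code. λ (here lam) is a weighting constant whose value is not stated on the slide, so it is left as a parameter in this sketch.

```python
def selection_score(equality: float, similarity: float,
                    confidence: float, lam: float = 1.0) -> float:
    """Selection score := (Equality + Similarity) × (lambda + Confidence)."""
    return (equality + similarity) * (lam + confidence)

# For example, with equality 2, similarity 0.8, and confidence 0.9:
print(selection_score(2, 0.8, 0.9))  # (2 + 0.8) * (1.0 + 0.9) = 5.32
```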

15 Exception: Shortcut
- If a translation example is almost equal to the input, the system outputs its target part as it is
- "Almost equal" = character-based DP matching similarity > 90%
Finally, I will explain an exception. If a translation example is almost equal to the input, the system outputs its target part as it is. "Almost equal" means a character-based DP matching similarity above 90%. (A sketch of one such similarity follows.)
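A character-based DP matching similarity can be sketched with edit distance. The normalization below, 1 - distance / max(length), is an assumption; the slide only fixes the 90% threshold, not the exact formula.

```python
def dp_similarity(a: str, b: str) -> float:
    """Character-based similarity via a one-row Levenshtein DP."""
    m, n = len(a), len(b)
    dist = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dist[0] = dist[0], i
        for j in range(1, n + 1):
            cur = dist[j]
            dist[j] = min(dist[j] + 1,                      # deletion
                          dist[j - 1] + 1,                  # insertion
                          prev + (a[i - 1] != b[j - 1]))    # substitution
            prev = cur
    return 1.0 - dist[n] / max(m, n)

# Two characters out of ten differ, so similarity is 0.8:
# below the 90% threshold, so no shortcut is taken.
print(dp_similarity("日本語の新聞を下さい", "中国語の新聞を下さい"))  # 0.8
```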

16 Outline
- Algorithm (Alignment Module, Translation Module)
- Experimental Results
- Conclusion
Next, I will report the experimental results.

17 Experiments
We built translation examples from the training set, using only the training corpus given in IWSLT, and checked the system's performance on the development set and the test set.

Automatic evaluation results:
          BLEU  NIST  WER   PER   GTM
Dev-set   0.38  7.86  0.52  0.45  0.66
Test-set  0.39  7.89  0.49  0.42  0.67

The dev-set and test-set scores are similar because the system has no tuning metrics for the development set.

18 Corpus Size & Performance
[Figure: BLEU score as a function of corpus size; the x-axis is corpus size and the y-axis is BLEU]
- The system without a corpus can generate translations using only the translation dictionaries
- The score is not saturated, so the system should achieve a higher performance if we obtain more corpora
We investigated the relation between the corpus size and performance. Even without a corpus, the system can generate translations using only the translation dictionaries. The score is not saturated at the largest corpus size, so the system should achieve a higher performance if we obtain more corpora.

19 Subjective Evaluation
Subjective evaluation results: Fluency 3.650, Adequacy 3.316
(Scale: 5 "Flawless English", 4 "Good English", 3 "Non-native English", 2 "Disfluent English", 1 "Incomprehensible")
Error analysis: most of the errors are classified into three problems: (1) function words, (2) word order, (3) zero-pronouns.
The fluency of 3.6 and adequacy of 3.3 place the output above "non-native English" on the scale, so the system may be more fluent than me. Why is the output still not good English? Most of the errors are classified into the three problems above: function words, word order, and zero-pronouns.

20 Problem 1: Function Words
OUTPUT: i 'd like to contact my Japanese embassy
Translation example: I 'd like to contact my bank
Because the system selects translation examples mainly by content words, it sometimes generates unnatural function words, especially determiners and prepositions. For example, the system generated the output "i 'd like to contact my Japanese embassy" from the translation example "I 'd like to contact my bank"; the determiner "my" is carried over from the example where "the" would be natural. We have to handle such words more carefully.

21 Problem 2: Word Order / Problem 3: Zero-pronoun
Problem 2: Word Order
OUTPUT: is there anything a like local cuisine?
The word order between translation examples is decided by heuristic rules, and the lack of rules leads to wrong word order. (A target language model may be helpful for this problem.)
Problem 3: Zero-pronoun
OUTPUT: has a bad headache.
The input sentence sometimes includes a zero-pronoun, which leads to many outputs without a pronoun. In the future, we plan to incorporate zero-pronoun resolution technology.

22 Outline
- Algorithm (Alignment Module, Translation Module)
- Experimental Results
- Conclusion
Finally, let me conclude the talk.

23 Conclusions
- We described an EBMT system which handles structural translation examples
- The experimental results show the basic feasibility of this approach
- As the amount of corpora increases, the system should achieve a higher performance
In this research, we described an EBMT system which handles structural translation examples. The experimental results show the basic feasibility of this approach. In the future, as the amount of corpora increases, the system should achieve a higher performance.

24 (blank slide)

25 Alignment Module (1/2) (appendix)
[Figure: step-by-step dictionary-based alignment of 日本語の (Japanese), 新聞を (newspaper), 下さい (give) with Give, me, Japanese, newspaper. The slide also credits The National Institute for Japanese Language.]

