Example-based Machine Translation Pursuing Fully Structural NLP

1 Example-based Machine Translation Pursuing Fully Structural NLP
Kurohashi-lab M1 Toshiaki Nakazawa

2 Outline
History of Machine Translation
Introduction of recent MT systems
  Statistical Machine Translation (SMT)
  Example-based Machine Translation (EBMT)
Related work for EBMT
  Logical Form
  Efficient retrieval method
EBMT pursuing fully structural NLP
Conclusion

4 History of Machine Translation
Timeline, 1940 to 2010:
Beginning of Machine Translation: When I look at an article in Russian, I say: "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode." [Warren Weaver, 1947]
Doldrums of MT: MT quality did not improve despite much money being spent
“Machine Translation based on analogy” is proposed [Nagao, 1981]
“Mu project” started
MT quality improved because of the development of NLP
SMT became active [Brown et al., 1993]
Not enough quality yet…

6 Statistical Machine Translation (SMT)
Learns translation models statistically from a parallel corpus
Does not use any linguistic resources
Small translation unit (= “word”)
Requires a large parallel corpus for highly accurate translation

Parallel Corpus (example)
Japanese: 田植えフェスティバル石川県輪島市で外国の大使や一般の参加者など千人あまりが急な斜面の棚田で田植えを体験する催しが行われました。 輪島市白米町には(しろよねまち)千枚田と呼ばれる(せんまいだ)大小二千百枚の棚田が急な斜面から海に向かって拡がっています。 田植え体験は農作業を通して米作りの意義などを考えていこうという地球環境平和財団の呼び掛けで開かれたもので、海外三十四ヵ国の大使や書記官、それに一般の参加者ら合わせておよそ千人が集まりました。 田植えに使われた苗は去年の秋、天皇陛下が皇居で収穫された稲籾から育てたものです。 参加者たちは裸足になって水田に足を踏み入れ地元に伝わる田植え歌に合わせて慣れない手つきで苗を植えていました。 きょうの輪島市は雲が広がったもののまずまずの天気となり、出席された高円宮さまも海からの風に吹かれながら田植えに加わっていました。 地球環境平和財団では今年の夏休みに全国の子どもたちを対象に草刈りや生きものの観察会を開く他、秋には稲刈体験を行なう予定にしています。
English: Ambassadors and diplomats from 37 countries took part in a rice planting festival on Sunday in small paddies on steep hillsides in Wajima, central Japan. About one-thousand people gathered at the hill, where some two-thousand 100 miniature paddies, called Senmaida, stretch toward the Sea of Japan. The event was organized by the private Foundation for Global Peace and Environment. The rice seedlings are grown from grain harvested by the Emperor at the Imperial Palace in Tokyo last autumn. Barefoot participants waded into the paddies to plant the seedlings by hand while singing a local folk song about the practice of rice planting.

7 Basic Method for SMT
Translate by maximizing the probability, which is the product of two models:
Language Model
Translation Model (learned from a parallel corpus)
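The formula itself did not survive in this transcript. The maximization it names is usually written as the noisy-channel decision rule, sketched below. The symbols are assumptions: j is the Japanese input and e a candidate English output, following the J/E convention of the next slide, and the notation requires amsmath.

```latex
\hat{e}
  = \operatorname*{arg\,max}_{e} P(e \mid j)
  = \operatorname*{arg\,max}_{e}
    \underbrace{P(e)}_{\text{Language Model}}
    \,\underbrace{P(j \mid e)}_{\text{Translation Model}}
```

The second equality follows from Bayes' rule, because P(j) is constant for a given input and can be dropped from the maximization.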

8 Translation Model: IBM Model 4 [Brown et al., 93]
(J = Japanese, E = English)
Fertility model: # of J words which each E word generates
NULL generation model: Model for generating NULL to justify the # of words
Lexicon model: Probability of translation from one E word to one J word
Distortion model: Model for word order

9 Translation Model: IBM Model 4 [Brown et al., 93]
Translation Model =
  (# of Japanese words which each English word generates)
  × (Model for generating NULL to justify the # of words)
  × (Probability of translation from one E word to one J word)
  × (Model for word order)
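As a rough illustration of how the four component models multiply together, the sketch below scores one fixed word alignment. It is a simplification, not IBM Model 4 itself: the fertility and lexicon tables are toy dictionaries with made-up numbers, and the NULL generation and distortion models are reduced to single placeholder factors supplied by the caller. The function name and table layout are hypothetical.

```python
from collections import Counter


def score_alignment(e_words, j_words, alignment, fertility, lexicon,
                    null_prob=1.0, distortion_prob=1.0):
    """Score one alignment as a product of component probabilities.

    alignment[k] is the index of the English word that generated j_words[k],
    or None when the Japanese word is generated from NULL.
    null_prob and distortion_prob stand in for the NULL generation and
    distortion models, which are not modelled in this sketch.
    """
    # Fertility: how many Japanese words each English word generated.
    counts = Counter(a for a in alignment if a is not None)
    score = null_prob * distortion_prob
    for i, e in enumerate(e_words):
        score *= fertility.get((e, counts[i]), 0.0)
    # Lexicon: probability of each Japanese word given its source word.
    for k, j in enumerate(j_words):
        a = alignment[k]
        source = "NULL" if a is None else e_words[a]
        score *= lexicon.get((source, j), 0.0)
    return score


# Toy tables; every number here is invented for illustration.
FERTILITY = {("the", 0): 0.5, ("car", 1): 0.8}
LEXICON = {("car", "車"): 0.9, ("NULL", "が"): 0.5}

score = score_alignment(["the", "car"], ["車", "が"], [1, None],
                        FERTILITY, LEXICON)
print(score)
```

In practice such products are accumulated as sums of log probabilities to avoid underflow on long sentences.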

10 Overview of EBMT
[Figure: flow diagram with the labels Input, TMDB, Parallel Corpus, Alignment, Translation, Output, and Advanced NLP technologies. Example pair shown: 交差 点 で 、 / at the intersection]

11 Example-based Machine Translation (EBMT)
Divide the input sentence into a few parts
Find similar expressions (= examples, TMs) in the parallel corpus for each part
Combine the examples to generate the output translation
Use linguistic resources as much as possible
A larger translation unit (larger example) is better
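The divide, find, and combine steps on this slide can be sketched as a greedy longest-match lookup. This is a toy under stated assumptions, not the system described in this presentation: it matches examples exactly rather than by similarity, it keeps the source word order instead of reordering, and the memory entries and the function name are hypothetical.

```python
def translate_by_examples(tokens, memory):
    """Cover the input left to right with the longest stored examples."""
    output, i = [], 0
    while i < len(tokens):
        # Try the longest span starting at i first: larger examples are preferred.
        for length in range(len(tokens) - i, 0, -1):
            chunk = tuple(tokens[i:i + length])
            if chunk in memory:
                output.append(memory[chunk])
                i += length
                break
        else:
            # No example covers this word; pass it through marked as untranslated.
            output.append("<" + tokens[i] + ">")
            i += 1
    return output


# Toy translation memory (hypothetical entries).
MEMORY = {
    ("交差", "点"): "the intersection",
    ("交差", "点", "に", "入る", "時"): "when entering the intersection",
    ("私", "の", "信号"): "my traffic light",
}

print(translate_by_examples(
    ["交差", "点", "に", "入る", "時", "私", "の", "信号"], MEMORY))
# → ['when entering the intersection', 'my traffic light']
```

The five-word example wins over the two-word one for the same starting position, which mirrors the preference for larger translation units.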

12 Flow of EBMT

13 Furthermore...
The translation algorithm is implicit in EBMT
  → Probabilistic Model for EBMT [Aramaki et al., 05]
Recently, the number of studies handling bigger units has been increasing
  The difference between SMT and EBMT is becoming smaller
  Most active study = Phrase-based SMT
SMT and EBMT will be merged (?)

15 Alignment method using Logical Form [Arul et al., 01]
Logical Form
  Represents the relations among the content words of a sentence as an unordered graph
  Nodes are content words
  Branches indicate underlying semantic relations
  Abstracts away language-particular aspects of a sentence
    Ex. word order, inflectional morphology, function words
[Figure: Logical Form graphs for a Spanish sentence and its English counterpart, "Under Hyperlink Information, click the hyperlink address"]

16 Efficient Retrieval Method [Doi et al., 04]
Similarity between the input and the examples is calculated by word-based Edit Distance
Finding suitable examples in a large parallel corpus takes a long time
They address this problem by:
  Classifying sentences into groups according to the # of content words and function words
  Compressing all sentences in a group into a “directed word graph”
  Searching for the best example in a group with the A* algorithm
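A minimal sketch of word-based edit distance, the similarity measure named on this slide. It assumes unit costs for insertion, deletion, and substitution, which may differ from the weighting used by Doi et al. The retrieval helper does a plain linear scan, which is exactly the slow baseline that the word graph and A* search are meant to avoid. Both function names are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two token lists, with unit costs."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, word_a in enumerate(a, 1):
        curr = [i]
        for j, word_b in enumerate(b, 1):
            substitution = prev[j - 1] + (0 if word_a == word_b else 1)
            curr.append(min(prev[j] + 1,      # delete word_a
                            curr[j - 1] + 1,  # insert word_b
                            substitution))
        prev = curr
    return prev[-1]


def best_example(query, examples):
    """Return the stored sentence closest to the query (linear scan)."""
    return min(examples, key=lambda example: word_edit_distance(query, example))


examples = [
    "the car came at me".split(),
    "the light was green".split(),
]
print(best_example("the traffic light was green".split(), examples))
# → ['the', 'light', 'was', 'green']
```

Each call costs time proportional to the product of the two sentence lengths, so scanning every example in a large corpus is what makes naive retrieval slow.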

17 Detail of the Method
Similarity between the input and the examples is calculated by word-based Edit Distance
Need to reduce the # of calculations
Every path has the same # of content words and function words
Each path represents one original sentence
Find the best path with the A* algorithm

19 Why EBMT?
Pursuing structural NLP
  Improvement of basic analyses leads to improvement of MT, as an application of basic analyses
  Feedback from the application (MT) can be expected
Adequacy of problem settings
  Not a large corpus, but similar examples in a relatively close domain
  Ex. translation of: a new version of an instruction manual, a related patent document, ...

20 Overview of EBMT
[Figure: flow diagram with the labels Input, TMDB, Parallel Corpus, Alignment, Translation, EBMT, Output, and Advanced NLP technologies]

21 Alignment
Japanese: 交差点で、突然あの車が飛び出して来たのです。
English: The car came at me from the side at the intersection.
Segmented Japanese: 交差 点 で 、 突然 あの 車 が 飛び出して 来た のです 。
Segmented English: the car came at me from the side at the intersection
Transform into dependency structure
Word-based alignment using bilingual lexicon
Extend the correspondence of phrases
Extract Translation Examples
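The word-based alignment step can be sketched as a dictionary lookup over the two segmented sentences from this slide. This is only an illustration: the lexicon entries are hypothetical, each target word is linked at most once, and neither the dependency structure nor the later extension to phrase correspondences is modelled.

```python
def align_with_lexicon(source_tokens, target_tokens, lexicon):
    """Link each source token to the first unused target token the lexicon allows."""
    links, used = [], set()
    for i, source in enumerate(source_tokens):
        candidates = lexicon.get(source, ())
        for j, target in enumerate(target_tokens):
            if j not in used and target in candidates:
                links.append((i, j))
                used.add(j)
                break
    return links


# Hypothetical bilingual lexicon: Japanese token -> possible English words.
LEXICON = {
    "交差": {"intersection", "crossing"},
    "車": {"car"},
    "来た": {"came"},
}

japanese = ["交差", "点", "で", "、", "突然", "あの", "車", "が",
            "飛び出して", "来た", "のです", "。"]
english = "the car came at me from the side at the intersection".split()

print(align_with_lexicon(japanese, english, LEXICON))
# → [(0, 10), (6, 1), (9, 2)]
```

Words left unlinked, such as 突然 or "from the side", are the ones a structure-based extension step would need to attach to neighbouring correspondences.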

22 Translation
Input: 交差点に入る時 私の信号は青でした。
Input nodes: 交差 点 に 入る 私 の 信号 は でした 。
Input glosses: (cross) (point) (enter) (when) (my) (signal) (blue) (was)
Translation Examples:
  交差 点 で 、 突然 飛び出して 来た のです 。 / came at me from the side at the intersection
  家 に 入る 脱ぐ / to remove when entering a house
  私 の サイン / my signature
  信号 は でした 。 / The traffic light was green
Example glosses: (suddenly) (rush out) (house) (put off) (signal) (enter) (when) (cross) (point) (my) (blue) (was)
Combined examples: my traffic The light was green when entering the intersection
Language Model
Output: My traffic light was green when entering the intersection.

23 IWSLT2005
IWSLT
  International Workshop on Spoken Language Translation
  Aims at translation of ASR (Automatic Speech Recognition) output
Outline of campaign
  Training set: parallel corpus including 20K sentences
  Development set: two sets including 500 and 506 sentences
  Test set: manual transcription and ASR output (500 sentences each)

24 Evaluation Results
Manual Transcription (Supplied & Tools)

Name        BLEU
ATR-C3      0.4774
MICROSOFT   0.4057
ATR-SLR     0.3884
TUV         0.3718
NGKUT       0.3418
USC         0.2741

Name        NIST
ATR-C3      8.1720
MICROSOFT   8.0375
TUV         7.8472
NGKUT       7.7158
ATR-SLR     4.3928
USC         2.9648

25 Conclusion
In this presentation …
  History of Machine Translation
  SMT and EBMT
  Two related works for EBMT
  Introduction of our EBMT system
Future work
  Improve our EBMT system
  Resolve paraphrase problem
  Apply anaphora resolution

