Published by Peter Watkins. Modified over 9 years ago.
1
Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
A machine transliteration model based on correspondence between graphemes and phonemes
Authors: Jong-Hoon Oh, Key-Sun Choi, Hitoshi Isahara
Presenter: Chien-Hsing Chen
Source: ACM Transactions on Asian Language Information Processing (TALIP), 2007
2
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
Outline
- Motivation
- Objective
- Previous work
- Method
- Experiments
- Conclusions
- Opinion
3
Motivation
- Machine transliteration (MT) automatically converts words in one language into phonetically equivalent ones in another language, for example from English to Korean, Japanese, or Chinese.
- As a special case in CLIR, it is useful for query translation …
- Grapheme-based: source G → target G
- Phoneme-based: source G → source P → target G
- Hybrid: linear interpolation
- Dynamically handle source graphemes and phonemes
- Example: data (English) → deiteo (Korean); deta (Japanese) [`det#]
4
Objective
- Correspondence-based: use the correspondence between source G and P
- Dynamically handle source G and P based on the context
- An example: neomycin (G + P)
- Example: data (English) → deiteo (Korean); deta (Japanese) [`det#]
5
Previous work - Grapheme-Based 1/4
- G-based transliteration models are classified into: statistical translation, decision trees, transliteration networks, and joint source channels.
- Statistical translation [1998, 1999]: words are segmented into pronunciation units (PUs) by syllable. For board (/B AO R D/), the PUs are b, oa, r, d.
- E_i = epu_i1, …, epu_in
- K_i = kpu_i1, …, kpu_in
- E = b:oar:d, b:oa:r:d, b:o:a:r:d
- K = b:o:deu, b:o:reu:deu, b:o:a:reu:deu
- Drawbacks: erroneous PUs (incorrect PUs for each word); time-consuming
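The candidate segmentations listed for board can be illustrated with a small sketch. It simply enumerates every way to cut a word into contiguous units of bounded length; the function name `segmentations` and the `max_unit_len` limit are assumptions made for illustration, not part of the cited models, which score PUs statistically.

```python
from itertools import combinations

def segmentations(word, max_unit_len=3):
    """Yield every split of `word` into contiguous units of at most `max_unit_len` letters."""
    n = len(word)
    for k in range(n):  # k = number of cut points inside the word
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            units = [word[a:b] for a, b in zip(bounds, bounds[1:])]
            if all(len(u) <= max_unit_len for u in units):
                yield ":".join(units)

# Hypothetical usage: the three segmentations from the slide are among the candidates.
candidates = list(segmentations("board"))
print(candidates)
```

Even for a five-letter word the candidate set grows quickly, which is one way to read the slide's remark about incorrect PUs and time cost.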
6
Previous work - Grapheme-Based 2/4
- Decision trees [2000; 2001]
- English grapheme to Korean grapheme conversion
- Does not consider the phonetic aspect of transliteration
7
Previous work - Grapheme-Based 3/4
- Transliteration network [2000]
- Each node is composed of one or more English graphemes and the corresponding Korean graphemes.
- Each arc represents a possible link between nodes.
- The optimal path is the one with the highest total weight, found with the Viterbi and tree-trellis algorithms.
- Example nodes: ca/ka, ca/ki
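The highest-weight path search can be sketched as a Viterbi-style dynamic program over positions in the source word. This is a minimal sketch under stated assumptions: the `units` table, its romanized targets, and its weights are made-up toy values, and only the single best path is returned (the tree-trellis n-best search is not shown).

```python
import math

def best_path(word, units):
    """Return (score, path) of the highest-weight segmentation of `word`.

    `units` maps a source chunk to a list of (target chunk, weight) pairs;
    weights are treated as log-scores, so path weights add up.
    """
    n = len(word)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            score, path = best[start]
            if score == -math.inf:
                continue  # no path reaches `start`
            chunk = word[start:end]
            for target, weight in units.get(chunk, []):
                if score + weight > best[end][0]:
                    best[end] = (score + weight, path + [(chunk, target)])
    return best[n]

# Toy network (hypothetical weights): "ca" may map to "ka" or "ki".
units = {
    "ca": [("ka", -0.2), ("ki", -1.6)],
    "c": [("k", -0.5)],
    "a": [("a", -0.7)],
    "t": [("t", -0.1)],
}
score, path = best_path("cat", units)
print(score, path)
```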
8
Previous work - Grapheme-Based 4/4
- Network [2003]
- EN: actinium
- Jap: a ku chi ni u mu
9
Previous work - Phoneme-Based 1/3
- Source-language word → pronunciation → target-language word
- Weighted finite-state transducers (WFSTs), a basic framework for phoneme-based transliteration:
  - word sequence
  - word to English sound
  - English sound to Japanese sound
  - Japanese sound to katakana
  - katakana to OCR
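One common way to read such a cascade is as a product of stage probabilities, with the hidden intermediate sequences summed out. The symbols below are assumptions introduced here for illustration (w: word sequence, e: English sounds, j: Japanese sounds, k: katakana, o: observed OCR output); the slide itself only names the stages.

```latex
\hat{w} \;=\; \arg\max_{w} \sum_{e,\, j,\, k}
  P(w)\, P(e \mid w)\, P(j \mid e)\, P(k \mid j)\, P(o \mid k)
```

Each factor corresponds to one transducer in the list, so composing the WFSTs and searching for the best path approximates this maximization.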
10
Previous work - Phoneme-Based 2/3
- Two-step procedure:
  - English PUs → English phonemes [statistical translation model]
  - English phonemes → Korean PUs [EKSCRs, standard conversion rules]
- Two problems:
  - Error propagation: the English PU → English phoneme step is often wrong
  - Limitations of the EKSCRs
11
Previous work - Phoneme-Based 3/3
- Decision trees
- Phoneme-based English → Korean transliteration
- Depends on a pronunciation dictionary
12
Previous work - Hybrid Transliteration 1/1
- G-based and P-based models are combined through linear interpolation: 0.4 G-based + 0.6 P-based
- Does not consider the dependence between the source graphemes and phonemes in the combining process
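The interpolation itself is a weighted sum of the two models' scores for each candidate. The sketch below uses the 0.4 / 0.6 weights from the slide; the function name `interpolate` and the candidate scores are invented purely for illustration.

```python
def interpolate(p_grapheme, p_phoneme, alpha=0.4):
    """Combine two candidate->probability tables as alpha*G + (1 - alpha)*P."""
    candidates = set(p_grapheme) | set(p_phoneme)
    return {
        c: alpha * p_grapheme.get(c, 0.0) + (1 - alpha) * p_phoneme.get(c, 0.0)
        for c in candidates
    }

# Made-up scores for transliterations of "data" (not taken from the paper).
p_g = {"deiteo": 0.5, "data": 0.5}   # grapheme-based model
p_p = {"deiteo": 0.7, "deta": 0.3}   # phoneme-based model
combined = interpolate(p_g, p_p)
best = max(combined, key=combined.get)
print(best, combined[best])
```

Because the weights are fixed, the combination cannot favor graphemes in one context and phonemes in another, which is the limitation the slide points out.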
13
Summary
- G-based: source grapheme → target grapheme
- P-based: source grapheme → source phoneme → target grapheme
- Correspondence-based:
  - minimizes errors caused by error propagation by using the source grapheme corresponding to each source phoneme
  - dynamically uses source graphemes and source phonemes depending on the context to produce transliterations effectively
- Example: data (English) → deiteo (Korean); deta (Japanese) [`det#]
14
C-based
15
C-based
16
C-based: Producing Pronunciation
The most relevant source phoneme of b, /B/, can be produced from the context features f_S, f_Stype, and f_P at positions L1-L3, C0, and R1-R3.
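A context window of this kind can be sketched as follows. Several details are assumptions of this sketch rather than statements from the slide: f_Stype is taken to be a crude consonant/vowel type, f_P is only filled in for left positions whose phonemes have already been produced, one phoneme slot per grapheme is assumed, and "$" pads positions outside the word.

```python
VOWELS = set("aeiou")  # crude consonant/vowel typing, an assumption of this sketch

def position_name(offset):
    """Map a relative offset to the slide's position labels (L3..L1, C0, R1..R3)."""
    if offset == 0:
        return "C0"
    return f"L{-offset}" if offset < 0 else f"R{offset}"

def context_features(graphemes, phonemes_so_far, i, window=3):
    """Collect f_S, f_Stype and (left-only) f_P features around grapheme i."""
    feats = {}
    for offset in range(-window, window + 1):
        pos = position_name(offset)
        j = i + offset
        inside = 0 <= j < len(graphemes)
        g = graphemes[j] if inside else "$"  # "$" pads positions outside the word
        feats[f"fS@{pos}"] = g
        feats[f"fStype@{pos}"] = ("V" if g in VOWELS else "C") if inside else "$"
        if offset < 0:  # phonemes are only known for already-processed positions
            feats[f"fP@{pos}"] = phonemes_so_far[j] if inside else "$"
    return feats

# Features for the first letter of "board", before any phoneme has been produced.
features = context_features(list("board"), [], 0)
print(features)
```

A classifier trained on such feature dictionaries would then predict /B/ for the grapheme at C0.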
17
C-based: Producing Target Graphemes
18
C-based: Maximum Entropy Model 1/2 (learning method 1/3)
19
C-based: Maximum Entropy Model 2/2 (learning method 1/3)
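For reference, the general form of a conditional maximum entropy model is shown below. This is the standard textbook formulation, not a reconstruction of the slide's own equations; the symbols (x: context, y: output, f_i: feature functions, λ_i: learned weights) are assumptions of this note.

```latex
p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{i} \lambda_i f_i(x, y) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{i} \lambda_i f_i(x, y') \Big)
```

In a transliteration setting, x would be a context window of graphemes and phonemes and y the phoneme or target grapheme to be produced.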
20
C-based: Decision Tree 2/3
21
C-based: Memory-Based Learning 3/3
- k-nearest neighbor algorithm
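A k-nearest neighbor classifier over symbolic context features can be sketched as below. The unweighted overlap distance, the function names, and the tiny memory of (L1, C0, R1) grapheme contexts with phoneme labels are all assumptions for illustration; they are not the paper's feature set or training data.

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def knn_predict(memory, query, k=3):
    """Return the majority label among the k stored instances closest to `query`."""
    nearest = sorted(memory, key=lambda item: overlap_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy memory: (L1, C0, R1) grapheme context -> phoneme label ("-" means silent).
memory = [
    (("$", "b", "o"), "B"),
    (("$", "b", "a"), "B"),
    (("a", "b", "e"), "B"),
    (("m", "b", "$"), "-"),
    (("o", "a", "r"), "AO"),
]
prediction = knn_predict(memory, ("$", "b", "e"))
print(prediction)
```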
22
Experiments 1/2: P-based, G-based, C-based
23
Experiments 2/2
24
Discussion
25
Conclusion
The authors plan to apply the transliteration model to English-to-Chinese transliteration.
27
Opinion
- Advantage: combines graphemes and phonemes
- Drawback: lacks dynamic alignment
- Application: machine translation, CLIR, IR, and other NLP applications