Leveraging supplemental transcriptions and transliterations via re-ranking Aditya Bhargava April 19, 2011.


1 Leveraging supplemental transcriptions and transliterations via re-ranking Aditya Bhargava April 19, 2011

2 Outline ● Introduction ● Previous work ● Approach description ● Experimental results & analysis ● Conclusion & future work

3 Introduction You are narrating an article about Eyjafjallajökull for The Economist. How do you pronounce it? エイヤフィヤトラヨークトル Эйяфьядлайёкюдль

4 Introduction ● Computers have the same problem – Speech synthesis requires automatic pronunciation ● But can apply the same solution – Lots of data on the Web that can easily be mined

5 Grapheme-to-phoneme conversion Graphemes → Phonemes: Eyjafjallajökull → [ˈɛɪjaˌfjatl̥aˌjœkʰʏtl̥]; Aditya Bhargava → /əˈditjə ˌbaɹˈɡævə/ ● Important for speech synthesis ● Refer to the phoneme outputs as transcriptions

6 Machine transliteration Source language → Target language: Sudan → スーダン; DOS → ดอส; McGee → Макги ● “Phonetic translation” – Pronunciation preserved, not meaning ● Important for machine translation – Applied to named entities ● Inputs and outputs are graphemes

7 Idea: apply supplemental data ● G in English is ambiguous ● Is Gershwin pronounced with /ɡ/ (Gertrude) or /d͡ʒ/ (Gerald)? – (or even some rarer sounds like /ʒ/) ● Transliterations can help! – ジョージ・ガーシュウィン – Гершвин ● Can similarly help machine transliteration ● And can similarly apply transcriptions

8 Idea: apply supplemental data ● But it's hard – Can't follow transliterations exactly (differing phonemic inventories) ● So we need to use some complex methods ● My approach: re-order existing systems' output lists (n-best lists)

9 Existing G2P systems ● Festival – Decision trees – Popular end-to-end speech synthesis ● Sequitur – Joint n-grams – G2P only ● DirecTL+ – Discriminative phrasal decoding – G2P only

10 Existing machine transliteration systems ● 2009 and 2010 Named Entities Workshops (NEWS) had a shared task on machine transliteration – Intuitive way is phoneme-based: generate pronunciation first – Best (general) systems were based on Sequitur and DirecTL+; both grapheme-based (direct grapheme-to-grapheme)

11 Existing machine transliteration systems ● 2009 and 2010 Named Entities Workshops (NEWS) shared task on machine transliteration – Best (general) systems were based on Sequitur and DirecTL+

12 Previous combination methods ● Combine different systems for same task – Re-order based on linear combination of system scores – Hand-tuned linear weights ● Triangulation for machine translation – Refer to a third language when translation data for a pair is scarce ● Post-conversion – Convert a system's output post hoc

13 My approach: abstract description [Diagram labels: Input s; Outputs; Supplements; a, b, ...; t1, t2, ..., tn]

14 Tasks ● Four cases 1. Improving G2P with transliterations 2. Improving G2P with transcriptions (from another corpus) 3. Improving machine transliteration with transliterations from other languages 4. Improving machine transliteration with transcriptions

15 Leveraging similarity ● Compare the supplemental data to the outputs – Choose the most similar one – Smarter approach: linearly combine similarity with system score ● How do we measure similarity? – M2M-Aligner ● Unsupervised ● Script-agnostic
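The linear combination on this slide can be sketched roughly as follows. This is a minimal illustration, not the talk's implementation: `similarity` uses Python's `difflib` as a stand-in for M2M-Aligner scores, `alpha` is a hypothetical hand-tuned weight, and the ASCII phoneme strings and system scores are made up.

```python
from difflib import SequenceMatcher


def similarity(candidate: str, supplement: str) -> float:
    """Stand-in similarity in [0, 1]; the talk uses M2M-Aligner scores instead."""
    return SequenceMatcher(None, candidate, supplement).ratio()


def rerank(nbest, supplement, alpha=0.5, sim=similarity):
    """Sort (output, system_score) pairs by a linear mix of score and similarity.

    alpha is a hypothetical hand-tuned weight: 1.0 trusts only the base
    system, 0.0 trusts only similarity to the supplemental datum.
    """
    def combined(item):
        output, system_score = item
        return alpha * system_score + (1.0 - alpha) * sim(output, supplement)

    return sorted(nbest, key=combined, reverse=True)


# Toy n-best list for "Gershwin"; notation and scores are invented for illustration.
nbest = [("dZ3rSwIn", 0.55), ("g3rSwIn", 0.45)]
# A romanized stand-in for a transliteration that signals a hard /g/.
reranked = rerank(nbest, "gershvin", alpha=0.5)
```

With `alpha=1.0` the base system's ordering is kept unchanged, which makes the weight easy to sanity-check before tuning it.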

16 Specific example

17 Leveraging similarity ● But this simple method only allows one supplemental datum at a time – Multiple data are possible but hand-tuning the linear combination parameters becomes complicated ● And we can't use other types of information

18 SVM re-ranking ● Support Vector Machines: binary classification – Maximum margin ● Applied to re-ranking – Pairwise comparison ● Allow many features
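One common way to apply a binary SVM to re-ranking is to classify differences between feature vectors of candidate pairs. The sketch below assumes that reduction and trains a linear SVM with plain stochastic subgradient descent on the hinge loss. The feature layout (`[system_score, similarity]`), the hyperparameters, and the toy data are all illustrative assumptions, and the talk does not say which SVM implementation was used.

```python
import random


def pairwise_examples(candidates):
    """Turn one n-best list into labelled difference vectors.

    candidates: list of (feature_vector, is_correct). Each (correct, wrong)
    pair yields x_correct - x_wrong with label +1 and the mirrored vector
    with label -1.
    """
    good = [x for x, ok in candidates if ok]
    bad = [x for x, ok in candidates if not ok]
    examples = []
    for g in good:
        for b in bad:
            diff = [gi - bi for gi, bi in zip(g, b)]
            examples.append((diff, 1))
            examples.append(([-d for d in diff], -1))
    return examples


def train_linear_svm(examples, dim, epochs=200, lam=0.01, lr=0.1, seed=0):
    """Minimise L2-regularised hinge loss by stochastic subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * dim
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)
        for x, y in examples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for i in range(dim):
                # The hinge term contributes only when the margin is violated.
                grad = lam * w[i] - (y * x[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w


def rerank_by_weights(feature_vectors, w):
    """Order candidates by the learned linear score, best first."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return sorted(feature_vectors, key=score, reverse=True)


# Toy training lists; each feature vector is [system_score, similarity].
training_lists = [
    [([0.6, 0.2], False), ([0.4, 0.9], True)],
    [([0.7, 0.1], False), ([0.3, 0.8], True)],
]
examples = [ex for lst in training_lists for ex in pairwise_examples(lst)]
w = train_linear_svm(examples, dim=2)
best_first = rerank_by_weights([[0.6, 0.3], [0.5, 0.9]], w)
```

Because the learned model is just a weight vector, any number of extra features can be appended to each candidate without changing the training loop.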

19 SVM re-ranking features ● Score features – Derived from M2M-Aligner scores between outputs and supplemental data – Applied to each supplemental datum and each system output ● n-gram features based on DirecTL+ features – Binary features that indicate n-gram presence – Key point: the same features are applied across the supplemental data
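A rough sketch of how the two feature families might be assembled for one system output. The talk does not spell out the exact feature templates, so this is a deliberately crude assumption: `score_fn` is a hypothetical stand-in for an M2M-Aligner score, and the n-gram indicators simply pair every character n-gram of the output with every n-gram of the supplement, ignoring alignment. The same templates are reused for every supplemental datum, distinguished only by the supplement's name.

```python
def char_ngrams(text, n_max=2):
    """All character n-grams of length 1..n_max that occur in text."""
    return {
        text[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(text) - n + 1)
    }


def candidate_features(candidate, supplements, score_fn, n_max=2):
    """Build a sparse feature dict for one system output.

    supplements: dict mapping a supplement name (e.g. a language code)
                 to its string.
    score_fn:    hypothetical stand-in for an M2M-Aligner similarity score.
    """
    features = {}
    for name, supplement in supplements.items():
        # One real-valued score feature per supplemental datum.
        features[f"score:{name}"] = score_fn(candidate, supplement)
        # Binary indicators for co-occurring n-grams (unaligned simplification).
        for cand_gram in char_ngrams(candidate, n_max):
            for supp_gram in char_ngrams(supplement, n_max):
                features[f"ngram:{name}:{cand_gram}|{supp_gram}"] = 1.0
    return features


# Tiny usage example with a constant score function.
feats = candidate_features("ab", {"ja": "xy"}, lambda c, s: 0.5, n_max=1)
```

A sparse dict keeps the representation small, since only the n-gram pairs that actually occur are stored.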

20 Improving G2P with transliterations ● Scenario: need to pronounce a new name ● Use transliterations of the name to help ● Realistic – Names can be hard – Transliterations are plentiful on the Web, and are easier to mine than pronunciations ● G2P data come from Combilex ● Transliteration data come from NEWS 2009, 2010 (nine languages)

21 Improving G2P with transliterations

25 Improving G2P with transliterations: names only

26 Improving G2P with transliterations: core vocabulary only

27 Improving G2P with transcriptions from another corpus ● This scenario relies less on Web data – Transcriptions are harder to mine – And require specialized knowledge ● We have two (or more) G2P corpora ● Use one to improve the other ● Two simple methods: – Merge the corpora – Train the system to convert from one corpus to the other

28 Improving G2P with transcriptions from another corpus ● Use CELEX as main corpus ● Combilex as supplemental

29 Improving G2P with transcriptions from another corpus

32 Improving machine transliteration with other-language transliterations ● Like the G2P case, we can turn to the Web for transliterations ● We want to transliterate to one language but have data from other languages available ● I use English-to-Hindi transliteration with the remaining eight languages as supplements

33 Improving machine transliteration with other-language transliterations

34 Improving machine transliteration with transcriptions ● We are tasked with transliterating but also have G2P corpora available ● I use English-to-Japanese transliteration with CELEX and Combilex – (Japanese had larger overlap)

35 Improving machine transliteration with other-language transliterations

36 Improving machine transliteration with transcriptions ● Intuitive approach: transliterate from transcriptions directly – Phoneme-based approach – Learn a phoneme-to-Japanese converter

37 Improving machine transliteration with other-language transliterations

38 Analysis ● Overall, see improvements across the board – And always better than alternatives ● Festival and Sequitur get higher improvement – The better the base system, the harder it is to re-rank ● Festival is low-performing ● Sometimes Sequitur has higher oracle accuracy – n-gram features styled after DirecTL+ ● But score features usually help, which aren't related to DirecTL+ features
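For readers unfamiliar with the term, oracle accuracy is the upper bound a re-ranker can reach: the fraction of inputs whose correct answer appears anywhere in the n-best list. A small sketch with made-up data, assuming exactly one gold output per input:

```python
def top1_accuracy(nbest_lists, gold):
    """Fraction of inputs whose top-ranked candidate equals the gold output."""
    hits = sum(1 for cands, g in zip(nbest_lists, gold) if cands and cands[0] == g)
    return hits / len(gold)


def oracle_accuracy(nbest_lists, gold):
    """Fraction of inputs whose gold output appears anywhere in the n-best list."""
    hits = sum(1 for cands, g in zip(nbest_lists, gold) if g in cands)
    return hits / len(gold)


# Invented example: three inputs, two candidates each.
nbest_lists = [["a", "b"], ["c", "d"], ["e", "f"]]
gold = ["a", "d", "x"]
top1 = top1_accuracy(nbest_lists, gold)
oracle = oracle_accuracy(nbest_lists, gold)
```

The gap between the two numbers is the headroom available to re-ranking; a system with a higher oracle accuracy gives the re-ranker more to work with even if its top-1 accuracy is lower.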

39 Analysis ● Hard to draw conclusions from Festival and Sequitur – Since we're giving them DirecTL+-style information ● DirecTL+ shows no significant improvement for G2P of core vocabulary with transliterations – So we can conclude that supplemental transliterations are only useful for names

40 Analysis ● n-gram features more useful overall than scores – n-grams are more granular – Weights can be learned for individual character groups ● Some n-grams are more useful than others ● Some may be explicitly detrimental! – Scores are global indicators; just one number ● But still helpful, as results show

41 Future work ● Supplemental models rather than data ● Applying supplemental information directly into a model ● Web transcriptions – Both amateur (IPA on Wikipedia) and really amateur (ad hoc transcriptions, e.g. Trans-SKRIP-shuns) – Noisy, but transliteration data were noisy too

42 Conclusion ● First use of disparate tasks and data ● Improvements with SVMs using similar features on supplemental data – Suggests similar possibilities for other tasks

