Leveraging supplemental transcriptions and transliterations via re-ranking Aditya Bhargava April 19, 2011.

Outline ● Introduction ● Previous work ● Approach description ● Experimental results & analysis ● Conclusion & future work

Introduction ● You are narrating an article about Eyjafjallajökull for The Economist. How do you pronounce it? – Japanese: エイヤフィヤトラヨークトル – Russian: Эйяфьядлайёкюдль

Introduction ● Computers have the same problem – Speech synthesis requires automatic pronunciation ● But they can apply the same solution – There is plenty of data on the Web that can easily be mined

Grapheme-to-phoneme conversion ● Graphemes → phonemes – Eyjafjallajökull → [ˈɛɪjaˌfjatl̥aˌjœkʰʏtl̥] – Aditya Bhargava → /əˈditjə ˌbaɹˈɡævə/ ● Important for speech synthesis ● We refer to the phoneme outputs as transcriptions

Machine transliteration ● Source language → target language – Sudan → スーダン – DOS → ดอส – McGee → Макги ● “Phonetic translation” – Pronunciation preserved, not meaning ● Important for machine translation – Applied to named entities ● Inputs and outputs are graphemes

Idea: apply supplemental data ● G in English is ambiguous ● Is Gershwin pronounced with /ɡ/ (as in Gertrude) or /d͡ʒ/ (as in Gerald)? – (or even with a rarer sound such as /ʒ/) ● Transliterations can help! – ジョージ・ガーシュウィン – Гершвин – Both render the G of Gershwin as a /ɡ/ sound (ガ, Г) ● Supplemental data can similarly help machine transliteration ● And transcriptions can be applied in the same way

Idea: apply supplemental data ● But it's hard – We can't follow transliterations exactly (languages have differing phonemic inventories) ● So we need more sophisticated methods ● My approach: re-order the output lists (n-best lists) of existing systems

Existing G2P systems ● Festival – Decision trees – Popular end-to-end speech synthesis system ● Sequitur – Joint n-grams – G2P only ● DirecTL+ – Discriminative phrasal decoding – G2P only

Existing machine transliteration systems ● The 2009 and 2010 Named Entities Workshops (NEWS) had a shared task on machine transliteration – The intuitive approach is phoneme-based: generate the pronunciation first – But the best (general) systems were based on Sequitur and DirecTL+; both are grapheme-based (direct grapheme-to-grapheme)

Previous combination methods ● Combine different systems for the same task – Re-order based on a linear combination of system scores – Hand-tuned linear weights ● Triangulation for machine translation – Refer to a third language when translation data for a pair is scarce ● Post-conversion – Convert a system's output post hoc

My approach: abstract description (figure: an input s; the system's outputs t1, t2, ..., tn; supplemental data a, b, ...)

Tasks ● Four cases 1. Improving G2P with transliterations 2. Improving G2P with transcriptions (from another corpus) 3. Improving machine transliteration with transliterations from other languages 4. Improving machine transliteration with transcriptions

Leveraging similarity ● Compare the supplemental data to the outputs – Choose the most similar one – Smarter approach: linearly combine similarity with system score ● How do we measure similarity? – M2M-Aligner ● Unsupervised ● Script-agnostic
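A minimal sketch of the "smarter approach" above, assuming the similarity of each output to a single supplemental datum has already been computed. The talk uses M2M-Aligner for this step; here the scores are simply passed in as a dictionary. The function name `rerank_linear`, the weight `alpha`, and the ARPAbet-style transcriptions and numbers in the usage lines are hypothetical, for illustration only.

```python
from typing import Dict, List, Tuple


def rerank_linear(
    nbest: List[Tuple[str, float]],
    similarity: Dict[str, float],
    alpha: float = 0.5,
) -> List[str]:
    """Re-order an n-best list by a linear combination of the base system's
    score and each output's similarity to one supplemental datum.

    nbest: (output, system_score) pairs from the base system.
    similarity: output -> similarity score, assumed precomputed by an
        aligner (hypothetical input; no aligner is called here).
    alpha: hand-tuned weight on the similarity term (hypothetical default).
    """
    def combined(item: Tuple[str, float]) -> float:
        output, system_score = item
        return (1.0 - alpha) * system_score + alpha * similarity.get(output, 0.0)

    # sorted() is stable even with reverse=True, so ties keep the base order.
    return [output for output, _ in sorted(nbest, key=combined, reverse=True)]


# Hypothetical example: the base system prefers a /d͡ʒ/ onset for "Gershwin",
# but the /ɡ/ candidate is more similar to the Russian "Гершвин".
nbest = [("JH ER SH W IH N", 0.6), ("G ER SH W IH N", 0.4)]
similarity_to_russian = {"G ER SH W IH N": 0.9, "JH ER SH W IH N": 0.2}
print(rerank_linear(nbest, similarity_to_russian, alpha=0.5))
```

With `alpha=0.0` the base system's order comes back unchanged. Hand-tuning one weight for one supplement is easy; it is the multi-supplement case that makes this scheme unwieldy.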

Specific example

Leveraging similarity ● But this simple method only allows one supplemental datum at a time – Using multiple data is possible, but hand-tuning the linear-combination weights quickly becomes complicated ● It also cannot incorporate other types of information

SVM re-ranking ● Support Vector Machines: binary classification – Maximum margin ● Applied to re-ranking – Pairwise comparison ● Allows many features
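The pairwise idea can be sketched as follows. Each n-best list is turned into difference vectors between a correct and an incorrect candidate, a linear SVM learns to classify those differences, and candidates are then sorted by the learned weight vector. The talk does not say which SVM implementation was used, so this sketch swaps in a plain-Python, Pegasos-style subgradient solver for the hinge loss; all function names, feature values, and hyperparameters (`lam`, `epochs`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def pairwise_examples(
    candidates: Sequence[Vector], is_correct: Sequence[bool]
) -> List[Tuple[Vector, float]]:
    """Turn one n-best list into binary examples: for each (correct, incorrect)
    pair, x_correct - x_incorrect is labelled +1 and its negation -1."""
    examples: List[Tuple[Vector, float]] = []
    for xi, ci in zip(candidates, is_correct):
        for xj, cj in zip(candidates, is_correct):
            if ci and not cj:
                diff = [a - b for a, b in zip(xi, xj)]
                examples.append((diff, 1.0))
                examples.append(([-d for d in diff], -1.0))
    return examples


def train_linear_svm(
    examples: List[Tuple[Vector, float]],
    lam: float = 0.01,
    epochs: int = 50,
    seed: int = 0,
) -> Vector:
    """Pegasos-style stochastic subgradient descent on the L2-regularised
    hinge loss (no bias term, no projection step)."""
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for k in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = examples[k]
            violated = y * dot(w, x) < 1.0  # margin uses w before the update
            w = [(1.0 - eta * lam) * wi for wi in w]  # regulariser shrink
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def rerank(candidates: Sequence[Vector], w: Vector) -> List[int]:
    """Candidate indices sorted by learned score, best first."""
    return sorted(range(len(candidates)), key=lambda i: dot(w, candidates[i]), reverse=True)


# Hypothetical features per candidate: [system_score, similarity_to_supplement].
train = [
    ([[0.6, 0.2], [0.4, 0.9]], [False, True]),
    ([[0.7, 0.1], [0.3, 0.8], [0.2, 0.3]], [False, True, False]),
]
examples = [ex for cands, gold in train for ex in pairwise_examples(cands, gold)]
w = train_linear_svm(examples)
print(rerank([[0.8, 0.1], [0.5, 0.7]], w))  # → [1, 0]; system score alone ranks index 0 first
```

Because the model only sees differences between candidates of the same input, scores need not be comparable across inputs, which is what makes the pairwise formulation a natural fit for re-ranking.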

SVM re-ranking features ● Score features – Derived from M2M-Aligner scores between outputs and supplemental data – Applied to each supplemental datum and each system output ● n-gram features based on DirecTL+ features – Binary features that indicate n-gram presence – Key point: the same features are applied across the supplemental data
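A sketch of the two feature types, assuming each candidate is a sequence of phoneme (or character) symbols and that an alignment score against each supplemental datum is already available. This simplifies the description above: DirecTL+'s actual n-gram feature set is not reproduced, and the names `candidate_features`, the `score:<source>` keys, and the `<s>`/`</s>` boundary markers are hypothetical. The same templates apply whether the supplements are transliterations or transcriptions.

```python
from typing import Dict, Mapping, Sequence


def ngram_features(symbols: Sequence[str], max_n: int = 2) -> Dict[str, float]:
    """Binary indicators for every symbol n-gram (1 <= n <= max_n) in a
    candidate, with boundary markers so word-initial/final n-grams differ."""
    padded = ["<s>", *symbols, "</s>"]
    feats: Dict[str, float] = {}
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            feats["ng:" + "_".join(padded[i:i + n])] = 1.0
    return feats


def score_features(alignment_scores: Mapping[str, float]) -> Dict[str, float]:
    """One real-valued feature per supplemental datum (keyed by, e.g., its
    language), holding its precomputed alignment score with the candidate."""
    return {f"score:{src}": s for src, s in alignment_scores.items()}


def candidate_features(
    symbols: Sequence[str],
    system_score: float,
    alignment_scores: Mapping[str, float],
    max_n: int = 2,
) -> Dict[str, float]:
    """Sparse feature vector for one n-best candidate."""
    feats: Dict[str, float] = {"system_score": system_score}
    feats.update(score_features(alignment_scores))
    feats.update(ngram_features(symbols, max_n))
    return feats


# Hypothetical candidate (ARPAbet symbols G ER) with made-up alignment scores.
print(sorted(candidate_features(["G", "ER"], 0.4, {"ru": 0.9, "ja": 0.7})))
```

Feeding such sparse dictionaries to a linear re-ranker lets every n-gram receive its own weight, positive or negative, whereas each score feature contributes only a single number.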

Improving G2P with transliterations ● Scenario: need to pronounce a new name ● Use transliterations of the name to help ● Realistic – Names can be hard – Transliterations are plentiful on the Web, and are easier to mine than pronunciations ● G2P data come from Combilex ● Transliteration data come from NEWS 2009, 2010 (nine languages)

Improving G2P with transliterations

Improving G2P with transliterations: names only

Improving G2P with transliterations: core vocabulary only

Improving G2P with transcriptions from another corpus ● This scenario relies less on Web data – Transcriptions are harder to mine – And they require specialized knowledge to produce ● We have two (or more) G2P corpora ● Use one to improve the other ● Two simple alternatives: – Merge the corpora – Train a system to convert transcriptions from one corpus's conventions to the other's

Improving G2P with transcriptions from another corpus ● Use CELEX as the main corpus ● Use Combilex as the supplemental corpus

Improving G2P with transcriptions from another corpus

Improving machine transliteration with other-language transliterations ● As in the G2P case, we can turn to the Web for transliterations ● We want to transliterate into one language but have data from other languages available ● I use English-to-Hindi transliteration with the remaining eight languages as supplements

Improving machine transliteration with other-language transliterations

Improving machine transliteration with transcriptions ● We are tasked with transliteration but also have G2P corpora available ● I use English-to-Japanese transliteration with CELEX and Combilex – (Japanese had a larger overlap with the G2P corpora)

Improving machine transliteration with transcriptions ● Intuitive approach: transliterate from transcriptions directly – Phoneme-based approach – Learn a phoneme-to-Japanese converter

Analysis ● Overall, we see improvements across the board – And re-ranking is always better than the alternatives ● Festival and Sequitur see larger improvements than DirecTL+ – The better the base system, the harder it is to improve by re-ranking ● Festival is low-performing ● Sequitur sometimes has higher oracle accuracy – But the n-gram features are styled after DirecTL+ ● Score features, which aren't related to DirecTL+ features, usually help as well
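Oracle accuracy is the share of inputs whose n-best list contains the correct answer at any rank; it is an upper bound on the accuracy any re-ranker of that list can reach, which is why a base system with a higher oracle leaves more room for improvement. A minimal sketch, with hypothetical function names and toy data:

```python
from typing import Sequence


def _check(nbest_lists: Sequence[Sequence[str]], references: Sequence[str]) -> None:
    if len(nbest_lists) != len(references) or not references:
        raise ValueError("need one n-best list per reference, and at least one reference")


def top1_accuracy(nbest_lists: Sequence[Sequence[str]], references: Sequence[str]) -> float:
    """Fraction of inputs whose highest-ranked output is correct."""
    _check(nbest_lists, references)
    hits = sum(1 for cands, ref in zip(nbest_lists, references) if cands and cands[0] == ref)
    return hits / len(references)


def oracle_accuracy(nbest_lists: Sequence[Sequence[str]], references: Sequence[str]) -> float:
    """Fraction of inputs whose n-best list contains the correct output at any rank."""
    _check(nbest_lists, references)
    hits = sum(1 for cands, ref in zip(nbest_lists, references) if ref in cands)
    return hits / len(references)


toy_nbest = [["a", "b"], ["c", "d"], ["e", "f"]]
toy_refs = ["b", "c", "x"]
print(top1_accuracy(toy_nbest, toy_refs), oracle_accuracy(toy_nbest, toy_refs))
```

In this toy case one reference sits below the top of its list and another is missing entirely, so a perfect re-ranker could raise accuracy from 1/3 to 2/3 but no further.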

Analysis ● Hard to draw conclusions from Festival and Sequitur – Since we're giving them DirecTL+-style information ● DirecTL+ shows no significant improvement for G2P of core vocabulary with transliterations – So we can conclude that supplemental transliterations are only useful for names

Analysis ● n-gram features are more useful overall than score features – n-grams are more granular – Weights can be learned for individual character groups ● Some n-grams are more useful than others ● Some may even be detrimental! – Scores are global indicators: just one number each ● But they are still helpful, as the results show

Future work ● Supplemental models rather than data ● Incorporating supplemental information directly into a model ● Web transcriptions – Both amateur (IPA on Wikipedia) and really amateur (ad hoc transcriptions, e.g. Trans-SKRIP-shuns) – Noisy, but the transliteration data were noisy too

Conclusion ● First use of supplemental data from disparate tasks and sources ● Improvements with SVMs using similar features on the supplemental data – Suggests similar possibilities for other tasks
