Leveraging supplemental transcriptions and transliterations via re-ranking Aditya Bhargava April 19, 2011.

Outline ● Introduction ● Previous work ● Approach description ● Experimental results & analysis ● Conclusion & future work

Introduction You are narrating an article about Eyjafjallajökull for The Economist. How do you pronounce it? エイヤフィヤトラヨークトル Эйяфьядлайёкюдль

Introduction ● Computers have the same problem – Speech synthesis requires automatic pronunciation ● But can apply the same solution – Lots of data on the Web that can easily be mined

Grapheme-to-phoneme conversion Graphemes Eyjafjallajökull Aditya Bhargava Phonemes [ˈɛɪjaˌfjatl̥aˌjœkʰʏtl̥] /əˈditjəˌbaɹˈɡævə/ ● Important for speech synthesis ● Refer to the phoneme outputs as transcriptions

Machine transliteration Source language → Target language: Sudan → スーダン; DOS → ดอส; McGee → Макги ● “Phonetic translation” – Pronunciation preserved, not meaning ● Important for machine translation – Applied to named entities ● Inputs and outputs are graphemes

Idea: apply supplemental data ● G in English is ambiguous ● Is Gershwin pronounced with /ɡ/ (Gertrude) or /d͡ʒ/ (Gerald)? – (or even some rarer sounds like /ʒ/) ● Transliterations can help! – ジョージ・ガーシュウィン – Гершвин ● Can similarly help machine transliteration ● And can similarly apply transcriptions

Idea: apply supplemental data ● But it's hard – Can't follow transliterations exactly (differing phonemic inventories) ● So we need to use some complex methods ● My approach: re-order existing systems' output lists (n-best lists)

Existing G2P systems ● Festival – Decision trees – Popular end-to-end speech synthesis system ● Sequitur – Joint n-grams – G2P only ● DirecTL+ – Discriminative phrasal decoding – G2P only

Existing machine transliteration systems ● 2009 and 2010 Named Entities Workshops (NEWS) had a shared task on machine transliteration – Intuitive way is phoneme-based: generate pronunciation first – Best (general) systems were based on Sequitur and DirecTL+; both grapheme-based (direct grapheme-to- grapheme)

Previous combination methods ● Combine different systems for same task – Re-order based on linear combination of system scores – Hand-tuned linear weights ● Triangulation for machine translation – Refer to a third language when translation data for a pair is scarce ● Post-conversion – Convert a system's output post hoc
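
The hand-tuned linear combination described above can be sketched as follows; the system names, scores, and weights here are invented for illustration, not taken from the slides:

```python
# Hypothetical sketch of the linear-combination baseline: each candidate
# has a score from every system, and the n-best list is re-ordered by a
# weighted sum of those scores. Weights are hand-tuned (illustrative).

def linear_rerank(candidates, weights):
    """candidates: list of (output, {system_name: score}) pairs."""
    def combined(entry):
        _, scores = entry
        return sum(weights[name] * score for name, score in scores.items())
    return sorted(candidates, key=combined, reverse=True)

# Toy n-best list for "Gershwin" with two hypothetical systems
nbest = [
    ("G ER1 SH W IH0 N", {"sysA": -2.1, "sysB": -1.8}),
    ("JH ER1 SH W IH0 N", {"sysA": -2.4, "sysB": -1.2}),
]
weights = {"sysA": 0.4, "sysB": 0.6}  # hand-tuned, for illustration
best, _ = linear_rerank(nbest, weights)[0]
```

With these weights, the second system's preference outweighs the first's, so the /d͡ʒ/ candidate rises to the top.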

My approach: abstract description [Diagram: an input s is converted by an existing system into an n-best output list t1, t2, …, tn; the outputs are compared against supplemental data a, b, …]

Tasks ● Four cases 1. Improving G2P with transliterations 2. Improving G2P with transcriptions (from another corpus) 3. Improving machine transliteration with transliterations from other languages 4. Improving machine transliteration with transcriptions

Leveraging similarity ● Compare the supplemental data to the outputs – Choose the most similar one – Smarter approach: linearly combine similarity with system score ● How do we measure similarity? – M2M-Aligner ● Unsupervised ● Script-agnostic
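
A minimal sketch of the similarity-based re-ordering: the slides use M2M-Aligner alignment scores, so the normalized edit-distance similarity below is only a stand-in, and the candidates and scores are invented:

```python
# Combine each candidate's system score with its similarity to one
# supplemental datum; normalized edit distance stands in for M2M-Aligner.

def edit_distance(a, b):
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def rerank_by_similarity(nbest, supplement, alpha=0.5):
    # nbest: list of (candidate, system_score); higher score is better
    def combined(entry):
        cand, score = entry
        sim = 1 - edit_distance(cand, supplement) / max(len(cand), len(supplement))
        return alpha * score + (1 - alpha) * sim
    return sorted(nbest, key=combined, reverse=True)

top = rerank_by_similarity([("garshwin", 0.6), ("jarshwin", 0.5)], "jarshwin")[0][0]
```

Here the lower-scored candidate wins because it matches the supplemental datum exactly, illustrating how the linear combination trades system confidence against similarity.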

Specific example

Leveraging similarity ● But this simple method only allows one supplemental datum at a time – Multiple data are possible but hand-tuning the linear combination parameters becomes complicated ● And we can't use other types of information

SVM re-ranking ● Support Vector Machines: binary classification – Maximum margin ● Applied to re-ranking – Pairwise comparison ● Allow many features
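
The pairwise reduction behind SVM re-ranking can be sketched as below; a simple perceptron stands in for the max-margin SVM, and the feature vectors are toy values:

```python
# Pairwise reduction: each (better, worse) candidate pair becomes a
# difference vector labeled +1, and its negation labeled -1; a linear
# classifier trained on these induces a ranking function.

def pairwise_examples(ranked):
    # ranked: candidates' feature vectors, best first
    examples = []
    for i, better in enumerate(ranked):
        for worse in ranked[i + 1:]:
            diff = [b - w for b, w in zip(better, worse)]
            examples.append((diff, +1))
            examples.append(([-d for d in diff], -1))
    return examples

def train_linear(examples, epochs=20):
    # perceptron updates; an SVM would instead maximize the margin
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def rerank(candidates, w):
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return sorted(candidates, key=score, reverse=True)

w = train_linear(pairwise_examples([[1.0, 0.0], [0.0, 1.0]]))
reranked = rerank([[0.0, 1.0], [1.0, 0.0]], w)
```

At test time the learned weight vector scores each candidate directly, so re-ranking an n-best list is a single sort.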

SVM re-ranking features ● Score features – Derived from M2M-Aligner scores between outputs and supplemental data – Applied to each supplemental datum and each system output ● n-gram features based on DirecTL+ features – Binary features that indicate n-gram presence – Key point: the same features are applied across the supplemental data
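
The binary n-gram indicator features might be extracted as below; the choice of n = 1..3 is illustrative, and the key point from the slide is that the same extractor also runs over the supplemental data:

```python
# Binary n-gram indicator features: the set of all character n-grams
# present in a string, for n = 1..max_n; each member acts as a binary
# feature that fires when the n-gram is present.

def ngram_features(s, max_n=3):
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(s) - n + 1):
            feats.add(s[i:i + n])
    return feats

feats = ngram_features("abc")
```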

Improving G2P with transliterations ● Scenario: need to pronounce a new name ● Use transliterations of the name to help ● Realistic – Names can be hard – Transliterations are plentiful on the Web, and are easier to mine than pronunciations ● G2P data come from Combilex ● Transliteration data come from NEWS 2009, 2010 (nine languages)

Improving G2P with transliterations

Improving G2P with transliterations: names only

Improving G2P with transliterations: core vocabulary only

Improving G2P with transcriptions from another corpus ● This scenario relies less on Web data – Transcriptions are harder to mine – And require specialized knowledge ● We have two (or more) G2P corpora ● Use one to improve the other ● Two simple methods: – Merge the corpora – Train the system to convert from one corpus to the other
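
The first baseline above, merging the corpora, is simple enough to sketch directly; the entry format (word to phoneme string) is invented here, and the tie-breaking policy is an assumption:

```python
# Merge two G2P corpora as dictionaries; when a word appears in both,
# the main corpus's transcription is kept (an assumed policy).

def merge_corpora(main, supplemental):
    merged = dict(supplemental)
    merged.update(main)  # main-corpus entries take precedence
    return merged

merged = merge_corpora({"cat": "K AE T"}, {"cat": "K A T", "dog": "D O G"})
```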

Improving G2P with transcriptions from another corpus ● Use CELEX as main corpus ● Combilex as supplemental

Improving G2P with transcriptions from another corpus

Improving machine transliteration with other-language transliterations ● Like the G2P case, we can turn to the Web for transliterations ● We want to transliterate to one language but have data from other languages available ● I use English-to-Hindi transliteration with the remaining eight languages as supplements

Improving machine transliteration with other-language transliterations

Improving machine transliteration with transcriptions ● We are tasked with transliterating but also have G2P corpora available ● I use English-to-Japanese transliteration with CELEX and Combilex – (Japanese had larger overlap)

Improving machine transliteration with transcriptions

Improving machine transliteration with transcriptions ● Intuitive approach: transliterate from transcriptions directly – Phoneme-based approach – Learn a phoneme-to-Japanese converter

Improving machine transliteration with transcriptions

Analysis ● Overall, improvements across the board – And always better than the alternatives ● Festival and Sequitur show larger improvements – The better the base system, the harder it is to improve by re-ranking ● Festival is the weakest system ● Sequitur sometimes has higher oracle accuracy – The n-gram features are styled after DirecTL+'s features ● But the score features, which are unrelated to DirecTL+'s features, usually help too

Analysis ● Hard to draw conclusions from Festival and Sequitur – Since we're giving them DirecTL+-style information ● DirecTL+ shows no significant improvement for G2P of core vocabulary with transliterations – Suggesting that supplemental transliterations are mainly useful for names

Analysis ● n-gram features more useful overall than scores – n-grams are more granular – Weights can be learned for individual character groups ● Some n-grams are more useful than others ● Some may be explicitly detrimental! – Scores are global indicators; just one number ● But still helpful, as results show

Future work ● Supplemental models rather than data ● Applying supplemental information directly into a model ● Web transcriptions – Both amateur (IPA on Wikipedia) and really amateur (ad hoc transcriptions, e.g. Trans-SKRIP-shuns) – Noisy, but transliteration data were noisy too

Conclusion ● First use of supplemental data from disparate tasks ● Improvements from SVM re-ranking using the same features across the supplemental data – Suggests similar possibilities for other tasks