SLTU 2014 – 4th Workshop on Spoken Language Technologies for Under-resourced Languages, St. Petersburg, Russia
www.kit.edu
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association


Combining Grapheme-to-Phoneme Converter Outputs for Enhanced Pronunciation Generation in Low-Resource Scenarios
Tim Schlippe, Wolf Quaschningk, Tanja Schultz

2 | Outline (15-May-2014)
1. Motivation and Goals
2. Experimental Setup
   1. Grapheme-to-phoneme converters
   2. Data
3. Experiments and Results
   1. Single grapheme-to-phoneme converters' performance
   2. Phoneme-level combination scheme
   3. Adding web-driven grapheme-to-phoneme converters
   4. Automatic speech recognition experiments
4. Conclusion and Future Work

3 | Motivation
- About 7,000 languages exist in the world, but only a few have speech processing systems
- Pronunciation dictionaries are needed for text-to-speech and automatic speech recognition (ASR)
- Manual production of pronunciations is slow and costly: 19.2–30 s per word for Afrikaans (Davel and Barnard, 2004)
- Automatic grapheme-to-phoneme (G2P) conversion
- But: consistent pronunciations only from ~3.7k word-pronunciation pairs of training data (30k phoneme tokens)
→ Methods to reduce the manual effort

4 | Goals
- Common approaches use a single favorite G2P conversion tool
- Idea: exploit synergy effects of multiple G2P converters
  - They are close in performance but produce outputs that differ in their errors
  - This provides complementary information
→ Achieve pronunciations of higher quality by combining the G2P converter outputs
- Reduce the manual effort in semi-automatic methods
- Investigate the impact on ASR performance

5 | Grapheme-to-phoneme converters (taxonomy according to Bisani and Ney, 2008)
- Knowledge-based: manual rule-based (hand-crafted rules)
- Data-driven:
  - Local classification: CART-based "t2p" (Lenzo, 1998) [CART = classification and regression tree]
  - Probabilistic:
    - Graphone-based "Sequitur" (Bisani & Ney, 2008)
    - WFST-based "Phonetisaurus" (Novak, 2011) [WFST = weighted finite-state transducer]
    - SMT-based "Moses" (Koehn, 2005) [SMT = statistical machine translation]
Example: c a r s → K AX 9r S
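As a concrete instance of the data-driven end of this taxonomy, the simplest converter in the later comparison, "1:1 G2P (Rules)", maps each grapheme to exactly one phoneme. A minimal sketch of learning such a mapping from 1:1-aligned training pairs (the alignment is assumed given; function names are illustrative, and the paper's actual rule learner may differ):

```python
from collections import Counter, defaultdict

def train_1to1(aligned_pairs):
    # aligned_pairs: list of (grapheme_seq, phoneme_seq) of equal length,
    # e.g. produced by a 1:1 grapheme-phoneme alignment
    counts = defaultdict(Counter)
    for graphemes, phonemes in aligned_pairs:
        for g, p in zip(graphemes, phonemes):
            counts[g][p] += 1
    # map each grapheme to its most frequently aligned phoneme
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}

def apply_1to1(rules, word):
    # emit one phoneme per known grapheme
    return [rules[g] for g in word if g in rules]

# toy training data (invented for illustration)
aligned = [
    (["c", "a", "r", "s"], ["K", "AX", "9r", "S"]),
    (["s", "a", "r"],      ["S", "AX", "9r"]),
]
rules = train_1to1(aligned)
print(apply_1to1(rules, "cars"))  # ['K', 'AX', '9r', 'S']
```

With rules learned from this toy data, "cars" comes out as K AX 9r S, the same output the slide shows for the example word.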

6 | Data
- Languages: English, German, French, Spanish (different grades of G2P relationship)
- Dictionaries:
  - English: CMU dictionary
  - German, Spanish: GlobalPhone
  - French: Quaero Project
- Data sets (randomly chosen):
  - Training: 200, 500, 1k, 5k, 10k word-pronunciation pairs (different small training sizes to simulate low-resource conditions)
  - Development / test set: 10k word-pronunciation pairs each (disjoint)
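The data sets above can be reproduced by randomly partitioning a dictionary into disjoint training, development, and test sets; a minimal sketch (function name, default sizes, and seed are illustrative, not from the paper):

```python
import random

def split_lexicon(pairs, train_size, heldout_size=10_000, seed=0):
    # pairs: list of (word, pronunciation) entries from the dictionary
    shuffled = list(pairs)                 # leave the caller's list intact
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    train = shuffled[:train_size]
    dev = shuffled[train_size:train_size + heldout_size]
    test = shuffled[train_size + heldout_size:train_size + 2 * heldout_size]
    return train, dev, test               # mutually disjoint by construction
```

Because the three slices come from non-overlapping index ranges of one shuffled list, the disjointness the slide requires holds by construction.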

7 | Analysis of Single G2P Converter Outputs
- Edit distance to the reference pronunciations at the phoneme level (phoneme error rate, PER)
- Lower PERs with increasing amounts of training data
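The PER used throughout these experiments is the Levenshtein edit distance between hypothesis and reference phoneme sequences, normalized by the reference length; a minimal sketch (function names are illustrative):

```python
def edit_distance(hyp, ref):
    # dynamic-programming Levenshtein distance over phoneme tokens
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev = d[0]
        d[0] = i
        for j, r in enumerate(ref, 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (h != r))     # substitution (cost 0 on match)
            prev, d[j] = d[j], cur
    return d[-1]

def per(hyp, ref):
    # phoneme error rate: edit distance normalized by reference length
    return edit_distance(hyp, ref) / len(ref)
```

For the "cars" example later in the talk, `per("K EH 9r ZH".split(), "K AA 9r ZH".split())` gives 0.25, i.e. the 25% PER shown there.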

8 | Analysis of Single G2P Converter Outputs
- Lowest PERs are achieved with Sequitur and Phonetisaurus for all languages and data sizes; even Moses is very close for German (de)

9 | Analysis of Single G2P Converter Outputs
- For 200 English (en) and French (fr) W-P pairs, the rule-based converter outperforms Moses

10 | Phoneme-level combination scheme
- Based on ROVER (Recognizer Output Voting Error Reduction; Fiscus, 1997), traditionally applied at the word level
- Voting module decides by frequency of occurrence, since G2P confidence scores are not reliable

11 | Phoneme-level combination scheme
Example (trained with 200 W-P pairs); reference: cars → K AA 9r ZH

  Converter        | Output      | PER
  Sequitur         | K EH 9r ZH  | 25%
  Phonetisaurus    | K AA ZH     | 25%
  CART             | K AE ZH     | 50%
  Moses            | K AA 9r S   | 25%
  1:1 G2P (Rules)  | K AX 9r S   | 50%
  PLC output       | K AA 9r ZH  | 0%
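The PLC decision in this example can be reproduced with a ROVER-style voting step: align the converter outputs into slots (gaps marked "-"), then pick the most frequent symbol per slot. A minimal sketch, assuming the alignment has already been computed (the alignment itself is the harder part and is omitted here):

```python
from collections import Counter

def vote(aligned_outputs):
    # aligned_outputs: equal-length phoneme sequences, with "-"
    # marking a gap inserted by the alignment step
    result = []
    for slot in zip(*aligned_outputs):
        best, _ = Counter(slot).most_common(1)[0]
        if best != "-":          # a winning gap means "emit nothing"
            result.append(best)
    return result

# the five converter outputs for "cars", aligned against each other
# (reference pronunciation: K AA 9r ZH)
outputs = [
    ["K", "EH", "9r", "ZH"],  # Sequitur
    ["K", "AA", "-",  "ZH"],  # Phonetisaurus
    ["K", "AE", "-",  "ZH"],  # CART
    ["K", "AA", "9r", "S"],   # Moses
    ["K", "AX", "9r", "S"],   # 1:1 G2P (Rules)
]
print(vote(outputs))  # ['K', 'AA', '9r', 'ZH']
```

Per-slot majority voting recovers the reference pronunciation here even though every single converter gets at least one phoneme wrong, which is exactly the complementary-errors effect the combination exploits.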

12 | Phoneme-level combination
- Relative PER change compared to the best single converter output
- In 10 of 16 cases, the combination is equal or better

13 | Phoneme-level combination
- Most improvement for German (de) and English (en) → ASR experiments

14 | Phoneme-level combination
- Spanish (es), which has the most regular G2P relationship, never improves

15 | Wiktionary
- 39 Wiktionary editions with more than 1k IPA pronunciations (June 2012)
- Growth of Wiktionary entries over several years (meta.wikimedia.org/wiki/List of Wiktionaries)
- T. Schlippe, S. Ochs, T. Schultz: "Web-based tools and methods for rapid pronunciation dictionary creation", Speech Communication, vol. 56, pp. 101–118, January 2014

16 | Wiktionary
- Additional G2P converters trained on word-pronunciation pairs from Wiktionary
- Internal consistency (PER %) evaluated on 3.3k, 1.5k, 3.8k, and 4.6k W-P pairs per language

17 | Data: filtered web-derived pronunciations
- Fully automatic filtering methods from Schlippe et al. (2012a, 2012b, 2014)
- ~15% of the pairs are removed with each filtering method

  Language      | Best method | unfiltWDP | filtWDP | Rel. change
  English (en)  | M2NAlign    | 33.18%    | 26.13%  | +21.25%
  French (fr)   | Eps         | 14.96%    | 13.97%  | +6.62%
  German (de)   | G2P Len     | 16.74%    | 14.17%  | +15.35%
  Spanish (es)  | M2NAlign    | 10.25%    | 10.90%  | -6.34%
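As an illustration of how such an automatic filter can work, the sketch below rejects word-pronunciation pairs whose phoneme count deviates too far from a length predicted from the grapheme count. This is only loosely inspired by the length-based ("Len") method; the ratio and tolerance values are invented for illustration and are not the paper's:

```python
def length_filter(pairs, ratio=1.0, tolerance=0.4):
    # keep (word, phonemes) pairs whose phoneme count is within
    # `tolerance` (relative) of ratio * number of graphemes
    kept = []
    for word, phonemes in pairs:
        expected = ratio * len(word)
        if abs(len(phonemes) - expected) <= tolerance * expected:
            kept.append((word, phonemes))
    return kept
```

A web-derived entry like "cars" → K would be rejected (1 phoneme against ~4 expected), while "cars" → K AA 9r ZH passes; in practice `ratio` would be estimated per language from trusted data.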

18 | Phoneme-level combination
- Relative PER change compared to the best single converter output
- PLC with unfiltered web-derived pronunciations (PLC-unfiltWDP) is already better than without them (w/oWDP)

19 | Phoneme-level combination
- Filtering the web-derived pronunciations helps: 23.1% relative PER reduction

20 | ASR experiments
- Replace the dictionaries in the German and English recognizers with pronunciations generated by the G2P converters
- Train and decode the systems; evaluate with word error rate (WER)
- As in the PER evaluation: Sequitur and Phonetisaurus are very good in most cases
- However: the rule-based converter gives the lowest WERs in most scenarios

21 | ASR experiments
- In only 1 case is PLC-w/oWDP better than or equal to the best single converter

22 | ASR experiments
- Filtering the web-derived word-pronunciation pairs helps

23 | ASR experiments
- Confusion Network Combination (CNC) outperforms PLC

24 | ASR experiments
- In 9 cases, adding a system built with PLC pronunciations helps in CNC

25 | Conclusion and Future Work
- In most cases, PLC comes closer to the validated reference pronunciations than the single converters
- Web-derived word-pronunciation pairs can further improve quality (filtering the web data is helpful)
- Weighting the single G2P converters' outputs gave no improvement, neither according to performance on the dev set nor according to the converters' confidences
- Potential to enhance semi-automatic pronunciation dictionary creation by reducing the human editing effort

26 | Conclusion and Future Work
- The positive impact of the combination in terms of lower PERs had only little influence on the WERs of our ASR systems
- Including systems whose pronunciation dictionaries were built with PLC into CNC can lead to improvements
- Future work:
  - Embedding PLC and web-derived pronunciations into the semi-automatic pronunciation dictionary creation
  - Further languages and further G2P converters

27 | Thank you for your attention!

28 | References
- Pronunciation Extraction Through Cross-lingual Word-to-Phoneme Alignment