Bootstrapping pronunciation models: a South African case study
Presented at the CSIR Research and Innovation Conference
Marelie Davel & Etienne Barnard


Slide 1: Bootstrapping pronunciation models: a South African case study
CSIR Research and Innovation Conference
Marelie Davel & Etienne Barnard
27 February 2006

Slide 2 © CSIR: Agenda
- Background
  - HLT in the developing world
  - Why pronunciation models?
- Bootstrapping & pronunciation modeling
- A bootstrapping framework
  - Components
  - Efficiency
- Experimental approach
- Results
- Conclusions

Slide 3 © CSIR: Background: Human Language Technologies
- Speech processing: speech recognition, speech synthesis; spoken dialogue systems, telephony systems
- Text-based language processing: search, information analysis, machine translation
- Human factors in language-based systems: system usability, culturally appropriate interfaces, system localisation

Slide 4 © CSIR: Background: HLT in the developing world
- Free and natural access to information and to technology
- Reducing barriers: literacy, fluency in English, technological literacy, various types of disabilities
- Support for language diversity
- Support for service delivery

Slide 5 © CSIR: Background: HLT in the developing world
HLT requires extensive language resources, yet:
- Electronic resources for local languages are scarce
- Linguistic diversity is high
- Skilled computational linguists are scarce
- Language resource collection is expensive

Slide 6 © CSIR: Background: Pronunciation modeling
- The ability to predict the pronunciation of a word from its written form
- A core component in speech processing systems: automatic speech recognition, text-to-speech technology
- Example:
  bright: b r ay t
  girth: g er th
- Modeling pronunciations is language-specific; models can use large pronunciation lexicons and can learn from data
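A pronunciation model of this kind can be pictured as a lexicon lookup with a letter-to-sound fallback. The sketch below is a minimal illustration, not the authors' system; the lexicon entries and per-letter rules are hand-picked for the example.

```python
# Minimal illustration of a pronunciation model: exact lexicon lookup
# first, then a per-letter fallback. Real grapheme-to-phoneme rules are
# context-dependent and learned from data; these are hand-picked examples.
LEXICON = {
    "bright": ["b", "r", "ay", "t"],
    "girth": ["g", "er", "th"],
}

LETTER_RULES = {"b": "b", "r": "r", "i": "ih", "g": "g",
                "t": "t", "a": "ae", "e": "eh", "s": "s"}

def predict(word):
    """Return a phoneme list: lexicon match if available, else rules."""
    if word in LEXICON:
        return LEXICON[word]
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]

print(predict("bright"))  # ['b', 'r', 'ay', 't'] (lexicon)
print(predict("bat"))     # ['b', 'ae', 't'] (letter rules)
```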

Slide 7 © CSIR: Bootstrapping & pronunciation modeling
Bootstrapping: the model is improved iteratively, via a controlled series of increments, with the previous model used to generate the next.
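The iterative loop described above can be sketched as follows. The model class and the simulated "human" verifier are hypothetical stand-ins, not the authors' system; a real implementation would extract grapheme-to-phoneme rules from the growing lexicon.

```python
# Sketch of a bootstrapping loop: the current model predicts, a human
# verifies or corrects, and the corrections retrain the next increment.
class RuleModel:
    """Trivial stand-in model: remembers one pronunciation per first
    letter. A real system would learn g2p rules from the lexicon."""
    def __init__(self):
        self.rules = {}

    def train(self, lexicon):
        for word, phones in lexicon.items():
            self.rules[word[0]] = phones

    def predict(self, word):
        return self.rules.get(word[0], [])


def bootstrap(words, model, verify, increment=2):
    """Grow a lexicon word by word; retrain after each increment, so the
    previous model is used to generate predictions for the next batch."""
    lexicon = {}
    for i, word in enumerate(words, start=1):
        predicted = model.predict(word)          # current model's guess
        lexicon[word] = verify(word, predicted)  # human accepts or corrects
        if i % increment == 0:
            model.train(lexicon)                 # previous model -> next model
    model.train(lexicon)
    return lexicon


# Simulated human verifier: always returns the gold pronunciation.
gold = {"bat": ["b", "ae", "t"], "bit": ["b", "ih", "t"], "cat": ["k", "ae", "t"]}
lex = bootstrap(list(gold), RuleModel(), lambda w, p: gold[w])
print(lex == gold)  # True
```

In the real framework the increment size and the verification interface dominate the cost, which is why the efficiency slide below separates human factors from machine learning factors.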

Slide 8 © CSIR Bootstrapping in action (Demonstration)

Slide 9 © CSIR Bootstrapping framework: Components

Slide 10 © CSIR: Bootstrapping framework: Efficiency
Combine machine learning and human intervention, in order to minimise the amount of human effort required.
Human factors:
- Required user expertise
- User learning curve
- Cost of intervention
- Task difficulty
- Quality and cost of user verification mechanisms
- Difficulty of manual task
- Initial set-up cost
Machine learning factors:
- Accuracy of representation
- Conversion accuracy
- Set sampling ability
- System continuity
- Robustness to human error
- On-line conversion speed
- Quality and cost of automated verification mechanisms
- Validity of base data
- Effect of incorporating additional resources

Slide 11 © CSIR: Bootstrapping framework
Prior work:
- Demonstrated efficiency for small lexicons [1,2]
- Developed new algorithms for efficient rule extraction [3,4]
- Verified the human factors involved, including the linguistic sophistication of the user and the implications of audio assistance [5]
- Developed additional tools to support the process, including automated error detection [6]
This experiment: evaluate efficiency for a medium-sized lexicon, large enough for practical use.

Slide 12 © CSIR: Experimental approach
- Combine all prior results (each 1000 to 2000 words) to obtain a single 5000-word lexicon
- Bootstrap from 5000 to 8000 words, measuring actual effort
Bootstrapping parameters:
- Linguistically sophisticated user
- Incremental Default&Refine (synchronised every 50 words)
- Automated error detection performed at the end of the cycle
- Audio assistance optional
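Default&Refine, named in the parameters above, predicts a default phoneme per grapheme and overrides it with context-dependent refinement rules. The published algorithm [4] extracts such rules from data; the sketch below only illustrates the prediction side, with hand-written hypothetical rules.

```python
# Hedged sketch of Default&Refine-style prediction (cf. [4]); the rules
# here are hand-written for illustration, not extracted from data.
DEFAULTS = {"b": "b", "r": "r", "i": "ih", "g": "g", "h": "hh", "t": "t"}

# Refinement rules: (left context, grapheme, right context) -> phoneme,
# ordered most specific first; an empty phoneme means "silent".
REFINEMENTS = [
    (("", "i", "gh"), "ay"),  # 'i' before 'gh' -> 'ay', as in "bright"
    (("", "g", "h"), ""),     # silent 'g' before 'h'
    (("g", "h", ""), ""),     # silent 'h' after 'g'
]

def g2p(word, defaults, refinements):
    """Predict phonemes letter by letter: the first matching refinement
    rule wins, otherwise fall back to the grapheme's default phoneme."""
    phones = []
    for i, ch in enumerate(word):
        phone = defaults.get(ch, "")
        for (left, grapheme, right), p in refinements:
            if (grapheme == ch
                    and word[:i].endswith(left)
                    and word[i + 1:].startswith(right)):
                phone = p
                break
        if phone:
            phones.append(phone)
    return phones

print(g2p("bright", DEFAULTS, REFINEMENTS))  # ['b', 'r', 'ay', 't']
```

The appeal of this representation for bootstrapping is that a single corrected word typically adds or refines only a few rules, so the model can be resynchronised cheaply (here, every 50 words).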

Slides 13-17 © CSIR: Results (charts; figure content not preserved in the transcript)

Slide 18 © CSIR: Conclusions
- The dictionaries developed are usable in practice:
  - Afrikaans: general-purpose text-to-speech developed
  - isiZulu: general-purpose text-to-speech developed
  - Sepedi: automatic speech recognition system developed
- The approach is practical and efficient
Future work:
- Open-source release imminent
- Apply the approach to all 11 official languages
- Expand the meta-information to be bootstrapped (including tone and stress)
- Further algorithmic improvements
- Evaluate the implications of the framework for additional resources

Slide 19 © CSIR: References
[1] M. Davel and E. Barnard, "Bootstrapping for language resource generation," in Proceedings of the Symposium of the Pattern Recognition Association of South Africa, South Africa, 2003, pp. 97–100.
[2] S. Maskey, L. Tomokiyo, and A. Black, "Bootstrapping phonetic lexicons for new languages," in Proceedings of Interspeech, Jeju, Korea, October 2004, pp. 69–72.
[3] M. Davel and E. Barnard, "The efficient creation of pronunciation dictionaries: machine learning factors in bootstrapping," in Proceedings of Interspeech, Jeju, Korea, October 2004, pp. 2781–2784.
[4] M. Davel and E. Barnard, "A default-and-refinement approach to pronunciation prediction," in Proceedings of the Symposium of the Pattern Recognition Association of South Africa, South Africa, November 2004, pp. 119–123.
[5] M. Davel and E. Barnard, "The efficient creation of pronunciation dictionaries: human factors in bootstrapping," in Proceedings of Interspeech, Jeju, Korea, October 2004, pp. 2797–2800.
[6] M. Davel and E. Barnard, "Bootstrapping pronunciation dictionaries: practical issues," in Proceedings of Interspeech, Lisbon, Portugal, September 2005.