2014 Development of a Text-to-Speech Synthesis System for Yorùbá Language Olúòkun Adédayọ̀ Tolulope Department of Computer Science.


Development of a Text-to-Speech Synthesis System for Yorùbá Language
Olúòkun Adédayọ̀ Tolulope
Department of Computer Science and Engineering, Faculty of Technology, Ọbáfẹ́mi Awólọ́wọ̀ University, Ilé-Ifẹ̀, 2014

Introduction
Yorùbá is a dialect continuum of West Africa with over 22 million speakers (Wimbish, 1989). It is spoken in Nigeria, the Republic of Benin and Togo, and is also used as a language of active religious practice in Cuba, Brazil and parts of the Caribbean islands.

Introduction (contd)
Yorùbá is a tone language with three tones: low, mid and high.

Alphabet   Tone
à          low
a          mid
á          high

Mid tone: ta ba ba

Low tone: ta bà ba

High tone: ta bá ba
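The tone marking shown on these slides can be read off the orthography mechanically. The following is a minimal sketch, not the system's actual code: it assumes standard Yorùbá spelling, where a grave accent marks low tone, an acute accent marks high tone and an unmarked vowel is mid. The function names `vowel_tone` and `word_tones` are hypothetical, and tone-bearing syllabic nasals are deliberately ignored.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent: low tone
ACUTE = "\u0301"  # combining acute accent: high tone
VOWELS = set("aeiou")  # base letters; ẹ and ọ decompose to e/o plus a dot below


def vowel_tone(ch):
    """Return 'low', 'mid' or 'high' for one (possibly accented) vowel."""
    decomposed = unicodedata.normalize("NFD", ch)
    if GRAVE in decomposed:
        return "low"
    if ACUTE in decomposed:
        return "high"
    return "mid"


def word_tones(text):
    """Return the tone of every vowel letter in the text, in order.

    Simplification (an assumption of this sketch): every vowel letter is
    treated as one tone-bearing unit and syllabic nasals are skipped.
    """
    tones = []
    on_vowel = False
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            # A tone mark only counts if it sits on a vowel.
            if on_vowel and ch == GRAVE:
                tones[-1] = "low"
            elif on_vowel and ch == ACUTE:
                tones[-1] = "high"
        else:
            on_vowel = ch in VOWELS
            if on_vowel:
                tones.append("mid")
    return tones
```

Applied to the three slide examples, `word_tones` would distinguish them only by the tone of the middle syllable.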

Objectives
1. To collect and record speech samples for the purpose of extracting the Yorùbá phoneset.
2. To implement a text-to-speech synthesis system using the information from (1).
3. To evaluate the text-to-speech synthesis system for intelligibility and naturalness.

Methodology
1. Record the Yorùbá speech samples using Praat, a software tool for phonetic analysis of speech.
2. Analyse the speech samples, extract the relevant features and synthesize speech using the FESTIVAL synthesis platform. This involves linguistic analysis and waveform generation.
3. Evaluate the produced speech using the Mean Opinion Score (MOS) for intelligibility and naturalness.

Linguistic Analysis
This involves the following:
1. Syllabification
2. Tokenization
3. Letter-to-sound rules/phonological analysis
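As an illustration of the first two steps, here is a rough sketch of tokenization and syllabification. It is not the rule set used in the presented system. It assumes that Yorùbá syllables are V, CV or a syllabic nasal, that "gb" is a single consonant, and that an unmarked "n" not followed by a vowel nasalises the preceding vowel. All function names are hypothetical.

```python
import string
import unicodedata

VOWELS = set("aeiou")


def tokenize(text):
    """Split on whitespace and strip surrounding ASCII punctuation."""
    words = (w.strip(string.punctuation) for w in text.split())
    return [w for w in words if w]


def graphemes(word):
    """Split a word into base letters with their combining marks attached."""
    out = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def is_vowel(g):
    return g[0] in VOWELS


def syllabify(word):
    """Greedy V / CV / syllabic-nasal syllabification (simplified)."""
    gs = graphemes(word)
    syllables = []
    i = 0
    while i < len(gs):
        onset = ""
        if not is_vowel(gs[i]):
            # "gb" is one consonant written as a digraph.
            if gs[i] == "g" and i + 1 < len(gs) and gs[i + 1] == "b":
                onset, i = "gb", i + 2
            else:
                onset, i = gs[i], i + 1
        if i < len(gs) and is_vowel(gs[i]):
            syl = onset + gs[i]
            i += 1
            # Unmarked "n" with no vowel after it: nasalised vowel.
            nxt_is_vowel = i + 1 < len(gs) and is_vowel(gs[i + 1])
            if i < len(gs) and gs[i] == "n" and not nxt_is_vowel:
                syl += "n"
                i += 1
            syllables.append(syl)
        else:
            # A consonant with no vowel after it, e.g. a syllabic nasal.
            syllables.append(onset)
    return [unicodedata.normalize("NFC", s) for s in syllables]
```

Under these assumptions a word such as "Yorùbá" splits into three CV syllables, and a tone-marked initial nasal forms a syllable of its own. A production system would also need the letter-to-sound step, which this sketch does not attempt.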

Waveform Generation
This involves the following:
1. Diphone database design
2. Recording
3. Speech labeling
4. Pitch mark extraction
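Diphone database design starts from the phoneset: each diphone is a transition from one phone to the next, so the candidate inventory is every ordered pair of phones plus transitions to and from silence. The sketch below only enumerates that candidate list. The three-phone sample, the `pau` silence symbol and the function name are assumptions for illustration, and a real design would drop pairs that the language's phonotactics never produce.

```python
from itertools import product


def diphone_inventory(phones, silence="pau"):
    """Enumerate candidate diphones for a phoneset.

    Returns n*n phone-to-phone units plus 2*n units that start or end
    in silence, for n phones.
    """
    units = [f"{a}-{b}" for a, b in product(phones, repeat=2)]
    units += [f"{silence}-{p}" for p in phones]
    units += [f"{p}-{silence}" for p in phones]
    return units


# Hypothetical miniature phoneset, not the extracted Yorùbá phoneset.
sample = ["a", "b", "t"]
inventory = diphone_inventory(sample)
```

For the three-phone sample this gives 9 + 6 = 15 candidate units; the quadratic growth is why the recording prompts are normally generated automatically rather than written by hand.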

Evaluation
- Evaluation of speech synthesis is notoriously hard. This study evaluates the synthesis system on the intelligibility and naturalness perceived by first-language speakers of Yorùbá.
- Ten first-language speakers of Yorùbá were asked to rate 10 sentences. Perceived naturalness and intelligibility were reported as Mean Opinion Scores (MOS) on a scale of 1 to 5.
- Listeners were able to identify tone and phonetic errors in which the acoustics of a sound did not match its label.
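The MOS itself is simply the arithmetic mean of the listeners' ratings. A minimal sketch of that computation follows, assuming the ratings are held as one list per listener with one score per sentence. The function names and the toy numbers are illustrative only and are not the study's data.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Overall MOS: the mean of all ratings from all listeners."""
    flat = [r for listener in ratings for r in listener]
    if any(not 1 <= r <= 5 for r in flat):
        raise ValueError("ratings must lie on the 1 to 5 scale")
    return mean(flat)


def per_sentence_mos(ratings):
    """MOS for each sentence, averaged across listeners."""
    return [mean(column) for column in zip(*ratings)]


# Toy example: 3 listeners rating 2 sentences (made-up numbers).
toy = [[4, 3], [5, 4], [3, 5]]
overall = mean_opinion_score(toy)
by_sentence = per_sentence_mos(toy)
```

The same two functions would be run once on the naturalness ratings and once on the intelligibility ratings.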

Results

Results (contd)

Challenges/Future work
1. The Dynamic Time Warping (DTW) technique used for labelling the speech failed to align some prompts properly. Hidden Markov Model (HMM) based alignment, trained with the Baum-Welch algorithm, is expected to be a better technique and will be adopted in future work.
2. The tokenization did not handle Yorùbá numerals due to time limitations. This will be addressed in the future.
3. It is envisaged that the use of HMMs will improve the accuracy of tone realization.
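For readers unfamiliar with the labelling step, the sketch below shows the textbook form of DTW on two one-dimensional sequences. It is an illustration under assumptions, not the aligner used in this work: real labelling aligns frames of acoustic feature vectors rather than scalars, and the function name `dtw` and the absolute-difference distance are choices made here.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return (total cost, alignment path) between sequences a and b.

    The path is a list of (index_in_a, index_in_b) pairs. Both inputs
    are assumed to be non-empty.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


total, path = dtw([1, 2, 3], [1, 2, 2, 3])
```

Because the path is forced to cover both sequences end to end, a prompt whose recording contains an extra pause or a misread word can drag neighbouring labels out of place, which is one common way such alignments go wrong.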

Conclusion
In this work, we carried out an analysis of Yorùbá phonology with a focus on extracting the knowledge needed for speech synthesis. We also observed and discussed the specific challenges in building a Yorùbá TTS system.

References
1. Black, A. et al. Building Synthetic Voices.
2. Milan, S. (2009). Information mining from speech signal.
3. Odejobi, O. A. (2008). Text-to-Speech Synthesis for African Languages: Modern Techniques, Tools and Technologies. VDM Verlag Dr. Müller, Germany. ISBN:
4. Wimbish, J. (1989). Wordsurv: A program for analysing language survey word lists. Summer Institute of Linguistics.