Development of a Text-to-Speech Synthesis System for Yorùbá Language
Olúòkun Adédayọ̀ Tolulope
Department of Computer Science and Engineering, Faculty of Technology, Ọbáfẹ́mi Awólọ́wọ̀ University, Ilé-ifẹ̀, 2014
Introduction
The Yorùbá language is a dialect continuum of West Africa with over 22 million speakers (Wimbish, 1989). It is spoken in Nigeria, Benin Republic and Togo, and is also used as a language of active religious practice in Cuba, Brazil and parts of the Caribbean Islands.
Introduction (contd)
Yorùbá is a tone language that uses three tones: low, mid and high.
Alphabet  Tone
à         low
a         mid
á         high
Mid tone: ta ba ba
Low tone: ta bà ba
High tone: ta bá ba
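The three example utterances differ only in the tone mark on the middle syllable. As a minimal sketch (not the method used in this study), tone can be read off the orthography by decomposing each syllable with Unicode normalization and looking for a combining grave or acute accent. The function names `tone_of` and `tones` are hypothetical, and the sketch assumes each syllable carries at most one tone mark and that an unmarked syllable is mid.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent, written on low-tone syllables
ACUTE = "\u0301"  # combining acute accent, written on high-tone syllables


def tone_of(syllable):
    """Return 'low', 'mid' or 'high' for one orthographic syllable.

    Assumption: an unmarked syllable is mid tone.
    """
    # NFD splits a letter such as 'à' into 'a' plus a combining grave accent.
    decomposed = unicodedata.normalize("NFD", syllable)
    if GRAVE in decomposed:
        return "low"
    if ACUTE in decomposed:
        return "high"
    return "mid"


def tones(utterance):
    """Label every whitespace-separated syllable of an utterance."""
    return [tone_of(syllable) for syllable in utterance.split()]


if __name__ == "__main__":
    for text in ("ta ba ba", "ta bà ba", "ta bá ba"):
        print(text, tones(text))
```

Working on the decomposed form means the dot below in ẹ, ọ and ṣ, which is not a tone mark, does not interfere with the lookup.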
Objectives
1. To collect and record speech samples for the purpose of extracting the Yorùbá phoneset.
2. To implement a text-to-speech synthesis system using information from (1).
3. To evaluate the text-to-speech synthesis system based on intelligibility and naturalness.
Methodology
1. Record the Yorùbá speech samples using Praat, a software package for the phonetic analysis of speech.
2. Analyse the speech samples, extract relevant features and synthesize speech using the FESTIVAL synthesis platform. This involves:
   a. Linguistic analysis
   b. Waveform generation
3. Evaluate the produced speech using the Mean Opinion Score (MOS) for intelligibility and naturalness.
Linguistic Analysis
This involves the following:
1. Syllabification
2. Tokenization
3. Letter-to-sound rules/phonological analysis
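The first two steps above can be sketched as follows. This is an illustrative approximation, not the rules implemented in the study. It assumes the commonly described Yorùbá syllable shapes (a vowel, a consonant plus vowel, or a syllabic nasal), treats "gb" as one consonant, and treats a plain "n" after a vowel as nasalisation when no vowel follows it. The function names `tokenize`, `graphemes` and `syllabify` are hypothetical.

```python
import string
import unicodedata

VOWELS = set("aeiou")  # base letters; ẹ and ọ decompose to e/o plus a dot below


def tokenize(text):
    """Split on whitespace and strip surrounding ASCII punctuation."""
    tokens = (token.strip(string.punctuation) for token in text.split())
    return [token for token in tokens if token]


def graphemes(word):
    """Split a word into base letters with their combining marks attached."""
    units = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def syllabify(word):
    """Split a single word into V, CV(n) and syllabic-nasal syllables."""
    units = graphemes(word)
    syllables, onset, i = [], "", 0
    while i < len(units):
        unit = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if unit[0] in VOWELS:
            syllable, onset = onset + unit, ""
            after = units[i + 2] if i + 2 < len(units) else None
            # A plain 'n' not followed by a vowel nasalises this vowel.
            if nxt == "n" and (after is None or after[0] not in VOWELS):
                syllable += "n"
                i += 1
            syllables.append(syllable)
        elif unit[0] in "mn" and (
            len(unit) > 1 or nxt is None or nxt[0] not in VOWELS
        ):
            # Tone-marked or pre-consonantal m/n: treat as a syllabic nasal.
            syllables.append(onset + unit)
            onset = ""
        else:
            onset += unit  # consonant; 'g' then 'b' accumulate into 'gb'
        i += 1
    if onset:
        syllables.append(onset)  # stray final consonant, kept as-is
    return [unicodedata.normalize("NFC", s) for s in syllables]


if __name__ == "__main__":
    for word in tokenize("Yorùbá, gbogbo ìbàdàn."):
        print(word, syllabify(word))
```

Like the tokenization described in this work, the sketch does not expand numerals; a digit string would pass through `tokenize` unchanged.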
Waveform Generation
This involves the following:
1. Diphone database design
2. Recording
3. Speech labeling
4. Pitch mark extraction
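To give a feel for the size of the diphone database design step, the sketch below enumerates every ordered pair of phones from a phone list. The phone list shown is a hypothetical, deliberately tiny subset (with "pau" standing for silence), not the Yorùbá phoneset extracted in this study, and a real inventory would normally drop pairs that the phonotactics of the language rule out.

```python
from itertools import product

# Hypothetical toy phone subset for illustration only; "pau" marks silence.
PHONES = ["pau", "a", "b", "t"]


def diphone_inventory(phones):
    """Return every ordered phone pair as a 'left-right' diphone name."""
    return [f"{left}-{right}" for left, right in product(phones, repeat=2)]


if __name__ == "__main__":
    inventory = diphone_inventory(PHONES)
    # The unpruned inventory grows with the square of the phoneset size.
    print(len(inventory), inventory[:5])
```

Each diphone in the final list needs at least one recorded carrier prompt, which is why the inventory is usually fixed before recording starts.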
Evaluation
Evaluation of speech synthesis is notoriously hard. This study evaluates the synthesis system based on the intelligibility and naturalness perceived by first-language speakers of Yorùbá. Ten first-language speakers of Yorùbá were asked to rate 10 sentences. Perceived naturalness and intelligibility were measured as Mean Opinion Scores (MOS) on a scale of 1 to 5. Listeners were able to identify tone and phonetic errors in which the acoustics of a sound didn't match the label.
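A Mean Opinion Score is the arithmetic mean of the listeners' ratings. The sketch below shows the calculation for a listener-by-sentence table of scores. The function names are hypothetical and the ratings in the usage example are placeholders chosen for illustration, not data from this study.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Overall MOS, where ratings[listener][sentence] is a score from 1 to 5."""
    scores = [score for listener in ratings for score in listener]
    if any(not 1 <= score <= 5 for score in scores):
        raise ValueError("MOS ratings must lie on the 1 to 5 scale")
    return mean(scores)


def per_sentence_mos(ratings):
    """MOS of each sentence, averaged over listeners."""
    return [mean(column) for column in zip(*ratings)]


if __name__ == "__main__":
    # Placeholder ratings (3 listeners x 2 sentences), NOT results from the study.
    placeholder = [[4, 5], [3, 4], [5, 3]]
    print(mean_opinion_score(placeholder), per_sentence_mos(placeholder))
```

The same functions can be applied separately to the naturalness ratings and the intelligibility ratings, giving one MOS per criterion.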
Results
Challenges/Future work
1. The Dynamic Time Warping (DTW) technique used for labelling the speech failed to align some prompts properly. A Hidden Markov Model (HMM) approach, which makes use of the Baum-Welch algorithm, is expected to be a better technique and will be adopted in future work.
2. The tokenization did not consider Yorùbá numerals due to time limitations. This will be addressed in the future.
3. It is envisaged that the use of HMMs will improve the accuracy of tone realization.
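For readers unfamiliar with DTW, the sketch below shows the textbook dynamic-programming form of the alignment on two short sequences of numbers. It is a toy illustration, not the labelling procedure used in this work: in speech labelling the sequences would be frames of acoustic feature vectors and the local distance would be a vector distance rather than an absolute difference. The function name `dtw` is hypothetical.

```python
def dtw(a, b):
    """Align two non-empty numeric sequences.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] to b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )
    # Trace the cheapest path back from the end to the start.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, alignment = dtw([1, 2, 3], [1, 2, 2, 3])
    print(total, alignment)
```

Because the path is forced to run from the first frame to the last, a prompt whose recording contains extra or missing material can drag the whole alignment off, which is one way misaligned labels of the kind reported above can arise.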
Conclusion
In this work, we carried out an analysis of Yorùbá phonology with a focus on extracting the knowledge needed for speech synthesis. We also observed and discussed the specific challenges in building a Yorùbá TTS system.
References
1. Black, A. et al. Building Synthetic Voices.
2. Milan, S. (2009). Information mining from speech signal.
3. Odejobi, O. A. (2008). Text-to-Speech Synthesis for African Languages: Modern Techniques, Tools and Technologies. VDM Verlag Dr. Müller, Germany. ISBN:
4. Wimbish, J. (1989). Wordsurv: A program for analysing language survey word lists. Summer Institute of Linguistics.