Outlines  Objectives  Study of Thai tones  Construction of contextual factors  Design of decision-tree structures  Design of context clustering.

Presentation transcript:

Outline
- Objectives
- Study of Thai tones
  - Characteristics of Thai tones
  - Categorizations of Thai tones
- Tree-based context clustering
  - Construction of contextual factors
  - Design of decision-tree structures
  - Design of context clustering styles
- Experiments
  - Evaluation of overall tone correctness
  - Evaluation of tone correctness for each tone type
  - Evaluation of syllable duration distortion
- Conclusions

Objectives
- To implement an HMM-based speech synthesis system for the Thai language with the highest possible tone correctness.

Study of Thai tones: Characteristics of Thai tones
- Syllable structure [Nakasakul2002]
- Thai is a tonal language. Example syllables:
  - รัก r-a-k^-3 (love)
  - เรื่อย r-va-j^-2 (always)
  - เคร่ง khr-e-ng^-2 (strict)
  - เครียด khr-ia-t^-2 (stress)
  - และ l-x-3 (and)
  - เพลีย phl-iia-0 (exhausted)
  - เสีย s-iia-4 (spoil)
  - ปริ pr-i-1 (break)
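The transcriptions on this slide appear to follow the pattern initial-vowel-final^-tone, with the final consonant (marked by `^`) being optional. A minimal Python sketch of a parser for that pattern is given here. The field names and the `parse_syllable` helper are hypothetical, and the pattern itself is inferred only from the eight examples, not from a stated specification.

```python
from typing import NamedTuple, Optional


class Syllable(NamedTuple):
    initial: str          # initial consonant or cluster, e.g. "khr"
    vowel: str            # vowel symbol, e.g. "ia"
    final: Optional[str]  # final consonant without the "^" marker, or None
    tone: int             # tone number 0..4 as used on the slides


def parse_syllable(label: str) -> Syllable:
    """Parse a label such as 'r-a-k^-3' or 'l-x-3'.

    Assumed layout (inferred from the slide examples only):
    initial-vowel[-final^]-tone
    """
    parts = label.split("-")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected number of fields in {label!r}")
    tone = int(parts[-1])
    if not 0 <= tone <= 4:
        raise ValueError(f"tone must be 0..4, got {tone}")
    initial, vowel = parts[0], parts[1]
    final = None
    if len(parts) == 4:
        if not parts[2].endswith("^"):
            raise ValueError(f"final consonant must end with '^' in {label!r}")
        final = parts[2][:-1]
    return Syllable(initial, vowel, final, tone)


if __name__ == "__main__":
    for text in ["r-a-k^-3", "khr-e-ng^-2", "l-x-3", "phl-iia-0"]:
        print(text, "->", parse_syllable(text))
```

Keeping the tone as a separate integer field makes it straightforward to use as a contextual factor later on.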

Study of Thai tones: Characteristics of Thai tones
- F0 contours of Standard Thai tones (normalized duration) [Luksaneeyanawin1992]
  - สามัญ Middle (0)
  - เอก Low (1)
  - โท Falling (2)
  - ตรี High (3)
  - จัตวา Rising (4)

Study of Thai tones: Categorizations of Thai tones
- Abramson divided the tones into two groups:
  - static group
  - dynamic group
- According to the final trend of the contours:
  - upward trend group
  - downward trend group
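The two categorizations can be sketched as lookup tables in Python. The slide does not list which tones belong to which group, so the memberships below are assumptions: the static/dynamic split follows the usual reading of Abramson (mid, low, high are static; falling and rising are dynamic), and the upward/downward split is guessed from the tone names alone.

```python
# Tone numbers and names as given on the slides.
TONES = {0: "mid", 1: "low", 2: "falling", 3: "high", 4: "rising"}

# ASSUMPTION: group memberships are not stated on the slide.
STATIC = {0, 1, 3}      # mid, low, high
DYNAMIC = {2, 4}        # falling, rising
UPWARD = {3, 4}         # high, rising (guessed from the final F0 trend)
DOWNWARD = {0, 1, 2}    # mid, low, falling (guessed from the final F0 trend)


def constancy_group(tone: int) -> str:
    """Return 'static' or 'dynamic' for a tone number 0..4."""
    if tone in STATIC:
        return "static"
    if tone in DYNAMIC:
        return "dynamic"
    raise ValueError(f"unknown tone {tone}")


def trend_group(tone: int) -> str:
    """Return 'upward' or 'downward' for a tone number 0..4."""
    if tone in UPWARD:
        return "upward"
    if tone in DOWNWARD:
        return "downward"
    raise ValueError(f"unknown tone {tone}")


if __name__ == "__main__":
    for number, name in TONES.items():
        print(number, name, constancy_group(number), trend_group(number))
```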

HMM-based speech synthesizer
- Phoneme-based speech unit modeling
- Provides flexible models and efficient adaptation
  - Speaker adaptation
  - Speaking style conversion
- 1994: K. Tokuda et al. proposed an HMM-based speech synthesizer for Japanese

Tree-based context clustering: Construction of contextual factors
- Context clustering addresses the problem of limited training data.
- Phoneme level
  - {preceding, current, succeeding} phonetic type
  - {preceding, current, succeeding} part of syllable structure
- Syllable level
  - {preceding, current, succeeding} tone type
  - the number of phones in {preceding, current, succeeding} syllable
  - current phone position in current syllable
- Word level
  - current syllable position in current word
  - part of speech
  - the number of syllables in {preceding, current, succeeding} word
- Phrase level
  - current word position in current phrase
  - the number of syllables in {preceding, current, succeeding} phrase
- Utterance level
  - current phrase position in current sentence
  - the number of syllables in current sentence
  - the number of words in current sentence
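To illustrate how such contextual factors feed a decision tree, here is a minimal Python sketch. It models only a subset of the factors listed on the slide, and every name in it (`PhoneContext`, `QUESTIONS`, `split`) is hypothetical. Each question is a yes/no predicate over a context; a tree node applies one question and sends contexts down the "yes" or "no" branch, so that rare or unseen contexts end up sharing parameters with similar ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PhoneContext:
    # A subset of the slide's contextual factors (hypothetical field names).
    prev_phone_type: str
    cur_phone_type: str
    next_phone_type: str
    prev_tone: int
    cur_tone: int
    next_tone: int
    phone_pos_in_syllable: int
    syllable_pos_in_word: int
    part_of_speech: str


Question = Callable[[PhoneContext], bool]

# Example yes/no questions, including tone type questions.
QUESTIONS: Dict[str, Question] = {
    "cur_phone_is_vowel": lambda c: c.cur_phone_type == "vowel",
    "cur_tone_is_falling": lambda c: c.cur_tone == 2,
    "next_tone_is_rising": lambda c: c.next_tone == 4,
    "first_syllable_in_word": lambda c: c.syllable_pos_in_word == 1,
}


def split(
    contexts: List[PhoneContext], question: Question
) -> Tuple[List[PhoneContext], List[PhoneContext]]:
    """Partition contexts into (yes, no) lists for one tree node."""
    yes = [c for c in contexts if question(c)]
    no = [c for c in contexts if not question(c)]
    return yes, no


if __name__ == "__main__":
    data = [
        PhoneContext("consonant", "vowel", "consonant", 0, 2, 4, 2, 1, "noun"),
        PhoneContext("vowel", "consonant", "vowel", 2, 4, 0, 1, 2, "verb"),
    ]
    yes, no = split(data, QUESTIONS["cur_tone_is_falling"])
    print(len(yes), "contexts answer yes;", len(no), "answer no")
```

A real system would choose the question at each node by a likelihood or description-length criterion; that selection step is left out of this sketch.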

Tree-based context clustering: Design of decision-tree structures
- Problem: misshaped F0 contour
- Figure: F0 contours of (a) synthesized speech from the single-binary-tree clustering style without tone type questions and (b) natural speech.

Tree-based context clustering  Design of decision-tree structures

Tree-based context clustering: Design of 8 context clustering styles (a)-(h)
- Styles (e), (f), (g), and (h) add tone type questions to tree structures (a), (b), (c), and (d), respectively.
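The eight styles can be viewed as four tree structures, each used with or without tone type questions. The Python sketch below enumerates that grid. The structure names are taken from the conclusions of the deck, but the assignment of names to the letters (a)-(d) is an assumption, since the slide shows the mapping only as a figure.

```python
from typing import Dict, Tuple

# ASSUMPTION: the order of structures (a)-(d) is not stated in the transcript.
TREE_STRUCTURES = [
    "single binary tree",
    "simple tone-separated tree",
    "constancy-based tone-separated tree",
    "trend-based tone-separated tree",
]


def clustering_styles() -> Dict[str, Tuple[str, bool]]:
    """Map style letters to (tree structure, uses tone type questions).

    Letters a-d are the structures without tone type questions;
    letters e-h are the same structures with tone type questions added.
    """
    letters = "abcdefgh"
    styles: Dict[str, Tuple[str, bool]] = {}
    index = 0
    for with_tone_questions in (False, True):
        for structure in TREE_STRUCTURES:
            styles[letters[index]] = (structure, with_tone_questions)
            index += 1
    return styles


if __name__ == "__main__":
    for letter, (structure, with_q) in clustering_styles().items():
        suffix = " + tone type questions" if with_q else ""
        print(f"({letter}) {structure}{suffix}")
```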

System preparations
1. Sentence structure analysis
2. Word structure analysis
3. Full context labeling
4. Construction of question set for context clustering
5. Feature extraction

Diagram: the VAJA speech corpus and the ORCHID text corpus supply wav, label (.lab), and XML files. Full context labeling produces full context label files (.lab), and feature extraction (mcep, f0) produces parameter files (.cmp). Both feed HMM training and synthesis, which outputs synthetic speech.

Experiments: Evaluation of overall tone correctness
- Figure 5: F0 contours of synthesized speech from 8 different clustering styles, and F0 contour of natural speech.

Experiments: Evaluation of overall tone correctness
- Figure 6: Tone error percentages of synthesized speech from 4 different clustering styles.

Experiments: Evaluation of overall tone correctness
- Figure 7: Tone error percentages of synthesized speech from 8 different clustering styles.

Experiments: Evaluation of tone correctness for each tone type
- Figure 8: Tone error percentages of synthesized speech from 8 different clustering styles, categorized by tone type.
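The transcript does not say how the tone error percentages were scored. As a sketch only, the Python below assumes each synthesized syllable has an intended tone and a judged tone, and counts a mismatch as one error. Under that assumption it reports both the overall percentage and the percentage per intended tone type; the function name and input layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def tone_error_percentages(
    intended: Sequence[int], judged: Sequence[int]
) -> Tuple[float, Dict[int, float]]:
    """Return (overall error %, error % per intended tone).

    ASSUMPTION: one judgment per syllable; an error is any syllable
    whose judged tone differs from its intended tone.
    """
    if len(intended) != len(judged):
        raise ValueError("intended and judged must have the same length")
    if not intended:
        raise ValueError("at least one syllable is required")

    totals = Counter(intended)
    errors = Counter(t for t, j in zip(intended, judged) if t != j)

    overall = 100.0 * sum(errors.values()) / len(intended)
    per_tone = {tone: 100.0 * errors[tone] / count for tone, count in totals.items()}
    return overall, per_tone


if __name__ == "__main__":
    overall, per_tone = tone_error_percentages([0, 0, 2, 3], [0, 1, 2, 4])
    print(f"overall: {overall:.1f}%")
    for tone in sorted(per_tone):
        print(f"tone {tone}: {per_tone[tone]:.1f}%")
```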

Experiments: Evaluation of syllable duration distortion
- Figure 9: Scores of a paired-comparison test for natural duration among 4 different clustering styles.

Examples of synthesized speech (female voice)
- Methods compared, by corpus size (number of training utterances): HMM, VAJA (Unit Selection), analysis-synthesis speech
- HMM clustering styles: tree structures (a), (b), (c), (d); with the tone question set added: (e), (f), (g), (h)

Conclusions
- This paper analyzed tree-based context clustering in an HMM-based Thai speech synthesis system.
- Four decision-tree structures were designed according to tone groups and tone types to improve the tone correctness of synthesized speech.
- The results show that the tone-separated tree structures significantly reduce the tone error percentage of the synthesized speech compared with the single binary tree structure.
- Using contextual tone information at the syllable level improves tone correctness for all decision-tree structures.
- Simple tone-separated tree context clustering introduces some syllable duration distortion when the amount of training data is small; the distortion is relieved by constancy-based-tone-separated or trend-based-tone-separated tree context clustering.
- Future work: analysis of the tone correctness of the average-voice-based speech model, and intonation analysis.