The role of prosody in dialect synthesis and authentication Kyuchul Yoon Division of English Kyungnam University Spring 2008 Joint Conference of KSPS & KASS


The role of prosody in dialect synthesis and authentication Kyuchul Yoon Division of English Kyungnam University Spring 2008 Joint Conference of KSPS & KASS

2 Goals
1. Synthesize Masan utterances from matching Seoul utterances by prosody cloning
2. Examine the role of prosody in the authentication of synthetic Masan utterances (listening experiment)

3 Background
Differences among dialects
– Segmental differences
  Fricative differences in the time domain (Lee, 2002)
  – Busan fricatives have shorter frication/aspiration intervals than Seoul fricatives
  Fricative differences in the frequency domain (Kim et al., 2002)
  – The low cutoff frequency of Kyungsang fricatives was higher than that of Cholla fricatives (> 1,000 Hz)
– Non-segmental or prosodic differences
  Intonation or fundamental frequency (F0) contour difference
  Intensity contour difference
  Segment durational difference
  Voice quality difference

4 Synthesis
Simulating (by prosody cloning) Masan dialect from Seoul dialect
The simulated Masan utterances will have
– the speech segments of Seoul dialect
– the prosody of Masan dialect
  F0 contour
  Intensity contour
  Segmental duration

5 Evaluation
Through a listening experiment
Stimuli consist of
– #1. Authentic, but synthetic, Masan utterance
– #2. Seoul utterance with Masan segmental durations (D)
– #3. Seoul utterance with Masan F0 contour (F)
– #4. Seoul utterance with Masan intensity contour (I)
– #5. Seoul utterance with Masan durations and F0 contour (D+F)
– #6. Seoul utterance with Masan durations and intensity contour (D+I)
– #7. Seoul utterance with Masan F0 contour and intensity contour (F+I)
– #8. Seoul utterance with Masan durations, F0 contour and intensity contour (D+F+I)
(1) 동대구에 볼 일이 없습니다. ("I have no business in Dongdaegu.")
(2) 바다에 보물섬이 없다. ("There is no treasure island in the sea.")
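The seven manipulated conditions (#2 to #8) are exactly the non-empty subsets of the three prosodic features, plus the authentic Masan utterance as #1. A minimal sketch of that enumeration follows; the function name and the label strings are illustrative choices, not taken from the slides.

```python
from itertools import combinations

# D = segmental durations, F = F0 contour, I = intensity contour
FEATURES = ("D", "F", "I")


def stimulus_conditions():
    """Return the eight condition labels in the order listed on the slide."""
    conditions = ["Masan"]  # #1: the authentic (resynthesized) Masan utterance
    # #2 to #8: every non-empty subset of the three features,
    # single features first, then pairs, then all three.
    for size in (1, 2, 3):
        for combo in combinations(FEATURES, size):
            conditions.append("+".join(combo))
    return conditions


if __name__ == "__main__":
    for number, label in enumerate(stimulus_conditions(), start=1):
        print(f"#{number}: {label}")
```

With two sentences, eight conditions per sentence gives the 16 stimuli (8 + 8) used in the listening experiment.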

6 Prosody transfer (PSOLA algorithm)
Three aspects of the prosody
– Fundamental frequency (F0) contour
– Intensity contour
– Segmental durations
Pitch-Synchronous OverLap and Add (PSOLA) algorithm (Moulines & Charpentier, 1990)
– Implemented in Praat (Boersma, 2005)
– Use of a script for semi-automatic segment-by-segment manipulation (Yoon, 2007)

7 Prosody transfer (PSOLA algorithm)
Procedures for full prosody transfer
– Align segments between the Masan and Seoul utterances
– Make the segment durations of the two identical
– Make the two F0 contours identical
– Make the two intensity contours identical

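8 Prosody transfer (PSOLA algorithm)
Align segments between the Masan and Seoul utterances
Make the segment durations of the two utterances identical
[Figure: the segments ㅂㅏㄹㅏㅁ of “… 바람 …” ("wind") aligned between the Masan and Seoul utterances; each Seoul segment is stretched or shrunk to the duration of the matching Masan segment]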
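The stretch/shrink step can be illustrated by computing one scaling factor per aligned segment. This is only a sketch under assumptions: the slides do not give the script's internals, so the function name and the representation of the alignment as lists of boundary times in seconds are hypothetical.

```python
def duration_factors(seoul_bounds, masan_bounds):
    """Per-segment factors that map Seoul durations onto Masan durations.

    Both arguments are lists of segment boundary times in seconds for the
    same sequence of aligned segments (hypothetical representation).
    A factor above 1 means the Seoul segment is stretched, below 1 shrunk.
    """
    if len(seoul_bounds) != len(masan_bounds):
        raise ValueError("utterances must have the same number of segments")
    factors = []
    for i in range(len(seoul_bounds) - 1):
        seoul_dur = seoul_bounds[i + 1] - seoul_bounds[i]
        masan_dur = masan_bounds[i + 1] - masan_bounds[i]
        if seoul_dur <= 0 or masan_dur <= 0:
            raise ValueError("segment boundaries must be strictly increasing")
        factors.append(masan_dur / seoul_dur)
    return factors
```

Factors of this kind are what a duration manipulation needs as input; in Praat they would be applied through the PSOLA-based resynthesis mentioned on the earlier slide.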

9 Prosody transfer (PSOLA algorithm)
Make the two F0 contours identical
[Figure: Masan F0 and Seoul F0 contours over the aligned segments ㅂㅏㄹㅏㅁ of the Masan and Seoul utterances]
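When only the F0 contour is transferred and the durations are left untouched, the Masan pitch points have to be placed on the Seoul time axis. One plausible way to do this, sketched below, is piecewise-linear time warping within each aligned segment. The slides do not state how the script handles this, so the approach, the function names, and the (time, Hz) point representation are all assumptions.

```python
from bisect import bisect_right


def warp_time(t, src_bounds, dst_bounds):
    """Map time t from the source segment timeline to the destination one.

    The relative position of t inside its source segment is preserved
    inside the matching destination segment.
    """
    if len(src_bounds) != len(dst_bounds) or len(src_bounds) < 2:
        raise ValueError("timelines must share the same segment count")
    # Index of the segment containing t, clamped to a valid segment.
    i = bisect_right(src_bounds, t) - 1
    i = max(0, min(i, len(src_bounds) - 2))
    src_start, src_end = src_bounds[i], src_bounds[i + 1]
    dst_start, dst_end = dst_bounds[i], dst_bounds[i + 1]
    relative = (t - src_start) / (src_end - src_start)
    return dst_start + relative * (dst_end - dst_start)


def transfer_f0(masan_points, masan_bounds, seoul_bounds):
    """Re-time Masan (time, Hz) pitch points onto the Seoul segments."""
    return [(warp_time(t, masan_bounds, seoul_bounds), hz)
            for t, hz in masan_points]
```

If the durations have already been made identical, the two timelines coincide and the warp reduces to copying the Masan pitch points unchanged.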

10 Prosody transfer (PSOLA algorithm)
Make the two intensity contours identical
[Figure: Masan intensity and Seoul intensity contours over the aligned segments ㅂㅏㄹㅏㅁ of the Masan and Seoul utterances]
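Matching two intensity contours amounts to scaling the Seoul signal, frame by frame, by the difference between the contours. Because intensity is measured in dB, a difference of d dB corresponds to an amplitude factor of 10^(d/20). The sketch below assumes both contours are sampled at the same frame times after alignment; the function name and that input layout are hypothetical, not taken from the slides.

```python
def intensity_gains(seoul_db, masan_db):
    """Amplitude gain per frame that moves Seoul intensity to Masan intensity.

    Both contours are lists of dB values at the same (aligned) frame times.
    A dB difference d becomes the linear amplitude factor 10 ** (d / 20).
    """
    if len(seoul_db) != len(masan_db):
        raise ValueError("contours must have the same number of frames")
    return [10 ** ((masan - seoul) / 20.0)
            for seoul, masan in zip(seoul_db, masan_db)]
```

A frame where the Masan contour is 20 dB below the Seoul contour is therefore scaled by 0.1, and a frame where the two already agree is left at a gain of 1.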

11 Synthetic (simulated) Masan stimulus

12 Synthetic authentic Masan stimulus

13 Listening experiment
16 stimuli (8 + 8)
Presented to 13 Masan/Changwon listeners
– On a scale of 1 (worst) to 10 (best)
– Used Praat ExperimentMFC object
– Allowed repetition of stimulus: up to 10 times

14 Listening experiment

15 Results & Conclusion Histogram of listener responses

16 Results & Conclusion
[Figure: F0 contour transfer; listener responses on the scale from 1 to 10]

17 Results & Conclusion
[Figure: Seoul utterances with Masan prosody; conditions D, F, I, DF, DI, FI, DFI, and Masan]

18 Results & Conclusion
Main effects of
– Segmental durations; F(1,12)=11.53, p=0.005
– F0 contour; F(1,12)=141.12, p=
Regression analysis
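The degrees of freedom in F(1,12) are consistent with 13 listeners and a two-level within-listener factor (feature transferred or not). For such a factor, the repeated-measures F statistic equals the square of the paired t statistic computed on each listener's mean rating with and without the feature. The slides do not describe the actual ANOVA design, so the sketch below is only one reading of it; the function name and input layout are hypothetical, and no data from the study are reproduced.

```python
from math import sqrt
from statistics import mean, stdev


def main_effect_f(with_feature, without_feature):
    """F statistic for a two-level within-listener factor.

    Each list holds one mean rating per listener, in the same listener
    order. Returns (F, (df1, df2)) with df1 = 1 and df2 = n - 1, using
    the identity F = t**2 for the paired t statistic.
    """
    if len(with_feature) != len(without_feature):
        raise ValueError("need one value per listener in both lists")
    diffs = [a - b for a, b in zip(with_feature, without_feature)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two listeners")
    # Paired t statistic; raises ZeroDivisionError if all differences are equal.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)
```

With 13 listeners the second degree of freedom is 12, which matches the values reported on the slide.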

19 Results & Conclusion
Prosody cloning not sufficient for dialect simulation
– (Sub)Segmental differences may be at work
– Quality of synthetic stimuli
F0 contour transfer (from Masan to Seoul)
– Most influential on shifting perception from Seoul to Masan utterances

20 References
[1] Kyung-Hee Lee, “Comparison of acoustic characteristics between Seoul and Busan dialect on fricatives”, Speech Sciences, Vol.9/3, 2002.
[2] Hyun-Gi Kim, Eun-Young Lee, and Ki-Hwan Hong, “Experimental phonetic study of Kyungsang and Cholla dialect using power spectrum and laryngeal fiberscope”, Speech Sciences, Vol.9/2, pp.25-47, 2002.
[3] Kyuchul Yoon, “Imposing native speakers’ prosody on non-native speakers’ utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature, Vol.25(4), 2007.
[4] E. Moulines and F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, Vol.9/5-6, 1990.
[5] P. Boersma, “Praat, a system for doing phonetics by computer”, Glot International, Vol.5, 9/10, 2005.