Dialect Simulation through Prosody Transfer: A preliminary study on simulating Masan dialect with Seoul dialect Kyuchul Yoon Division of English, Kyungnam.

Presentation transcript:

Dialect Simulation through Prosody Transfer: A preliminary study on simulating Masan dialect with Seoul dialect Kyuchul Yoon Division of English, Kyungnam University The Autumn Conference of The Association of Modern British & American Language & Literature University of Ulsan,

2 Table of Contents
– Background & motivation
– Goal of the current work
– Prosody transfer (PSOLA algorithm)
– Preparation of stimuli
– Listening test & evaluation
– Future work

3 Background & motivation
Differences among dialects
– Segmental differences
  – Fricative differences in the time domain (Lee, 2002): Busan fricatives have shorter frication/aspiration intervals than Seoul fricatives
  – Fricative differences in the frequency domain (Kim et al., 2002): the low cutoff frequency of Kyungsang fricatives was higher than that of Cholla fricatives (> 1,000 Hz)
– Non-segmental or prosodic differences
  – Intonation or fundamental frequency (F0) contour difference
  – Intensity contour difference
  – Segment durational difference
  – Voice quality difference

4 Background & motivation
Concatenative text-to-speech (TTS) synthesizers
– Concatenation-based
– Concatenation units: e.g. diphones
– Concatenation units from pre-recorded utterances of a particular dialect
– No need for modeling segmental properties (cf. formant-based synthesizers)
Strength/Weakness
– Usually single dialect

5 Background & motivation
To build a multi-dialectal TTS synthesizer
– Concatenation units: multiple dialects
– User-selectable dialects
Question:
– Scenario A: A multi-dialectal TTS system containing multiple concatenation units from all the dialects involved
– Scenario B: Use the concatenation units from a single dialect and simulate the other dialects

6 Background & motivation
The answer has implications for the cost and complexity of building multi-dialect TTS systems.
Scenario B
– Simpler & cheaper
– Need for simulating the segmental/non-segmental aspects of the other dialects involved
– Scenario A may be the ultimate solution
Concatenative TTS systems
– Since modeling the segmental aspects of the concatenation units in the frequency domain can be difficult, the non-segmental or prosodic aspects should be manipulated.

7 Background & motivation
The imaginary TTS system (Scenario B)
[Diagram: concatenation units from dialect 1; dialects 2, 3, and 4 are obtained by simulating prosodic aspects]

8 Background & motivation
The questions are:
– Would the simulated dialects be good enough?
– In other words, would the segmental effects be negligible in perceiving the simulated dialects as authentic?

9 Goal of the current work
The goal is to test the viability of this scenario with an imaginary system:
– Simulate Masan dialect with Seoul dialect
The simulated Masan dialect will have
– the speech segments of Seoul dialect
– the prosody of Masan dialect (F0, intensity, duration)
– the voice source of Masan dialect (not tested)

10 Goal of the current work
The imaginary system would
– have the concatenation units from Seoul dialect,
– have a ‘near-perfect’ prosody-generating module, and
– have to simulate the other dialects, e.g. Masan dialect
The imaginary TTS system will be implemented with
– the recorded utterances of Seoul dialect
– the Masan prosody (F0, intensity, duration) from recorded Masan utterances
– the voice source of recorded Masan utterances (not tested)

11 Prosody transfer (PSOLA algorithm)
Three aspects of the prosody
– Fundamental frequency (F0) contour
– Intensity contour
– Segmental durations
Pitch-Synchronous OverLap and Add (PSOLA) algorithm (Moulines & Charpentier, 1990)
– Implemented in Praat (Boersma, 2005)
– Use of a script for semi-automatic segment-by-segment manipulation (Yoon, 2006)

12 Prosody transfer (PSOLA algorithm)
PSOLA algorithm
– Windowing pitch periods of the original signal
– Rearranging windowed pitch periods to
  – stretch/shrink the signal (involves adding/deleting windowed pitch periods)
  – change, i.e. increase/decrease, the F0 of the signal (involves changing the spacing between windowed pitch periods, and adding/deleting periods to keep the duration)
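
The windowing and rearranging steps can be illustrated with a toy time-domain PSOLA routine. This is only a minimal sketch under stated assumptions, not the Praat implementation used in the study: the function name psola, the stretch and pitch parameters, the symmetric Hann window of two local periods, and the synthetic test signal with hand-placed pitch marks are all invented for illustration. Real speech would first need a pitch-marking step.

```python
import math

def psola(signal, marks, stretch=1.0, pitch=1.0):
    """Toy TD-PSOLA: overlap-add Hann-windowed grains centred on pitch marks.

    signal  : list of float samples
    marks   : increasing sample indices, one per pitch period (assumed given)
    stretch : duration factor (>1 lengthens, so grains get repeated)
    pitch   : F0 factor (>1 raises F0, so grains are placed closer together)
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    out = [0.0] * int(round(len(signal) * stretch))
    t_syn = marks[0] * stretch             # current synthesis mark (output time)
    while t_syn < marks[-1] * stretch:
        # Pick the analysis mark nearest to the matching input time.
        t_ana = t_syn / stretch
        k = min(range(len(marks) - 1), key=lambda i: abs(marks[i] - t_ana))
        period = marks[k + 1] - marks[k]   # local pitch period in samples
        center = int(round(t_syn))
        for j in range(-period, period + 1):
            src, dst = marks[k] + j, center + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                w = 0.5 + 0.5 * math.cos(math.pi * j / period)  # Hann weight
                out[dst] += w * signal[src]
        t_syn += period / pitch            # grain spacing sets the new F0
    return out

# Synthetic "voiced" signal: one damped oscillation every 80 samples.
PERIOD = 80
signal = [math.exp(-(n % PERIOD) / 15) * math.sin(2 * math.pi * (n % PERIOD) / 10)
          for n in range(1600)]
marks = list(range(PERIOD, 1600, PERIOD))

longer = psola(signal, marks, stretch=2.0)  # twice as long, same F0
higher = psola(signal, marks, pitch=2.0)    # same length, F0 one octave up
```

With both factors left at 1.0 the overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged. The sketch does not renormalize amplitude when the grain overlap changes, which a production implementation would likely handle.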

13 Prosody transfer (PSOLA algorithm)
[Figure panels: original waveform; windowed waveform; shortened waveform; waveform with lower F0]

14 Prosody transfer (PSOLA algorithm)
Prosody transfer using the PSOLA algorithm
– Align segments between Masan and Seoul utterances
– Make the segment durations of the two identical
– Make the two F0 contours identical
– Make the two intensity contours identical

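15 Prosody transfer (PSOLA algorithm)
– Align segments between Masan and Seoul utterances
– Make the segment durations of the two utterances identical
[Figure: segments ㅂ ㅏ ㄹ ㅏ ㅁ of “… 바람 …” (“wind”) aligned between the Masan and Seoul utterances, with “stretch” and “shrink” marked on the segments]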
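
Once the segments are aligned, the amount of stretching or shrinking per segment is just a ratio of durations. The sketch below assumes one-to-one aligned segment labels. The Segment class, the helper names, and all duration values are invented for illustration and are not measurements from the study.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    label: str
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self):
        return self.end - self.start

def duration_factors(donor, recipient):
    """Factor by which each recipient segment must be stretched (>1) or shrunk (<1)."""
    if [s.label for s in donor] != [s.label for s in recipient]:
        raise ValueError("segments must align one-to-one with matching labels")
    return [(r.label, d.duration / r.duration) for d, r in zip(donor, recipient)]

def from_durations(labels, durations):
    """Build contiguous segments from a list of durations in seconds."""
    segs, t = [], 0.0
    for lab, dur in zip(labels, durations):
        segs.append(Segment(lab, t, t + dur))
        t += dur
    return segs

labels = ["ㅂ", "ㅏ", "ㄹ", "ㅏ", "ㅁ"]
masan = from_durations(labels, [0.05, 0.12, 0.04, 0.10, 0.08])  # invented values
seoul = from_durations(labels, [0.06, 0.08, 0.05, 0.14, 0.08])  # invented values

# Masan is the prosody donor, Seoul the recipient.
factors = duration_factors(masan, seoul)
```

In this made-up example the first vowel of the Seoul utterance would be stretched and the second one shrunk. Each factor could then be handed to a PSOLA-style duration manipulation for that segment.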

16 Prosody transfer (PSOLA algorithm)
Make the two F0 contours identical
[Figure: Masan F0 and Seoul F0 contours over the aligned segments ㅂ ㅏ ㄹ ㅏ ㅁ of the Masan and Seoul utterances]
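
One way to picture the F0 step is to move each donor pitch point onto the recipient's time axis, interpolating linearly inside each aligned segment. This is a sketch under assumptions: the function names, the boundary times, and the F0 values are hypothetical. If the segment durations have already been made identical, the two boundary lists coincide and the mapping is the identity.

```python
from bisect import bisect_right

def warp_time(t, src_bounds, dst_bounds):
    """Map time t from the source axis to the destination axis,
    piecewise linearly between aligned segment boundaries."""
    if len(src_bounds) != len(dst_bounds) or len(src_bounds) < 2:
        raise ValueError("boundary lists must have equal length >= 2")
    if not src_bounds[0] <= t <= src_bounds[-1]:
        raise ValueError("time outside the aligned stretch")
    # Index of the segment containing t (the last boundary maps into the last segment).
    i = min(bisect_right(src_bounds, t) - 1, len(src_bounds) - 2)
    s0, s1 = src_bounds[i], src_bounds[i + 1]
    d0, d1 = dst_bounds[i], dst_bounds[i + 1]
    return d0 + (t - s0) / (s1 - s0) * (d1 - d0)

def transfer_f0(donor_points, donor_bounds, recipient_bounds):
    """Re-time donor (time, Hz) pitch points onto the recipient utterance."""
    return [(warp_time(t, donor_bounds, recipient_bounds), hz)
            for t, hz in donor_points]

# Hypothetical boundaries (seconds) of two aligned segments in each utterance.
masan_bounds = [0.0, 0.1, 0.3]
seoul_bounds = [0.0, 0.2, 0.3]
masan_f0 = [(0.05, 180.0), (0.2, 220.0), (0.3, 150.0)]  # invented contour

seoul_targets = transfer_f0(masan_f0, masan_bounds, seoul_bounds)
```

The resulting points play the role of a replacement pitch contour for the recipient utterance. The resynthesis itself is left to a PSOLA-style routine.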

17 Prosody transfer (PSOLA algorithm)
Make the two intensity contours identical
[Figure: Masan intensity and Seoul intensity contours over the aligned segments ㅂ ㅏ ㄹ ㅏ ㅁ of the Masan and Seoul utterances]
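
Matching intensity can be thought of as applying a time-varying gain equal to the level difference between donor and recipient. The sketch below is a simplification that assumes the two signals already have the same length (durations matched) and that applies one constant gain per frame. The function names and test signals are invented. A smoother approach would interpolate the gain between frames to avoid steps at frame edges.

```python
import math

def rms_db(frame, floor=1e-10):
    """RMS level of a frame in dB (relative to amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(max(rms, floor))

def match_intensity(recipient, donor, frame_len):
    """Scale each recipient frame so its RMS level equals the donor's."""
    if len(recipient) != len(donor):
        raise ValueError("signals must have equal length (match durations first)")
    out = []
    for start in range(0, len(recipient), frame_len):
        r = recipient[start:start + frame_len]
        d = donor[start:start + frame_len]
        gain = 10 ** ((rms_db(d) - rms_db(r)) / 20)  # dB difference -> amplitude
        out.extend(x * gain for x in r)
    return out

# Invented test signals at 8 kHz: a steady recipient, a donor that gets quieter.
FS = 8000
recipient = [0.5 * math.sin(2 * math.pi * 100 * n / FS) for n in range(1600)]
donor = [(1.0 if n < 800 else 0.25) * math.sin(2 * math.pi * 100 * n / FS + 0.3)
         for n in range(1600)]

matched = match_intensity(recipient, donor, frame_len=800)
```

Near-silent recipient frames would receive very large gains in this sketch, so a practical version would probably cap the gain.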

18 Preparation of test stimuli

19 Preparation of control stimuli

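20 Preparation of experiment stimuli
Sentences:
– 바다에 보물섬이 없다 (“There is no treasure island in the sea”)
– 교수님 가시는 길이 구미로 … (“The road the professor is taking, to Gumi …”)
– 동대구에 볼 일이 없습니다 (“I have no business in Dongdaegu”)
– 쌀 사고 난 후에 와라 (“Come after you have bought rice”)
– 바람이 불어서 먼지가 많다 (“It is dusty because the wind is blowing”)
– 싸기는 해 보여도, 비싸기는 … (“It may look cheap, but expensive …”)
– 서울에 사는 삼촌이 왔다 (“My uncle who lives in Seoul came”)
[Diagram: Masan dialect: prosody-donor (A), prosody-recipient (B); Seoul dialect: prosody-recipient (C), prosody-recipient (D); 7 test stimuli (used); test stimuli (not used); 7 control stimuli (used)]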

21 Listening test & evaluation
– 14 test/control stimuli normalized & randomized
– Presented to 4 Masan listeners for magnitude estimation
  – On a scale of 1 (bad) to 10 (best)
  – Qualitatively assessed
  – Used Praat ExperimentMFC object
  – Repetition of each stimulus: up to 10 times (user can press “replay” button)
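
The randomization and the 1-to-10 rating scale could be handled along the following lines. This is a hypothetical bookkeeping sketch, not the ExperimentMFC setup used in the study: the function names, the stimulus identifiers, and the seed are assumptions, and no ratings from the actual experiment are reproduced here.

```python
import random
from statistics import mean

def presentation_order(stimuli, seed=None):
    """Return a shuffled copy of the stimulus list (one order per listener)."""
    order = list(stimuli)
    random.Random(seed).shuffle(order)
    return order

def mean_by_condition(responses):
    """Average (condition, rating) pairs per condition; ratings must be 1..10."""
    grouped = {}
    for condition, rating in responses:
        if not 1 <= rating <= 10:
            raise ValueError("rating outside the 1-10 scale")
        grouped.setdefault(condition, []).append(rating)
    return {condition: mean(values) for condition, values in grouped.items()}

# 7 test + 7 control stimuli, identified here by made-up (kind, index) pairs.
stimuli = [(kind, i) for kind in ("test", "control") for i in range(1, 8)]
order = presentation_order(stimuli, seed=0)
```

A call such as mean_by_condition([("test", 4), ("test", 6), ("control", 9)]) would average the ratings separately for test and control stimuli.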

22 Listening test & evaluation

23 Listening test & evaluation

24 Future work
– Carefully control the phonological, morphological, and syntactic aspects of the test sentences
– Try the voice source (as opposed to the filter) of Masan utterances

25 Future work
Compare spectra between Masan and Seoul /i/
– Window length: 50 ms
[Figure: spectra of /i/ in 바람이, with H1 & H2 marked]
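
A common way to compare the first two harmonics is the H1-H2 difference in dB, measured on a single windowed frame. The sketch below assumes the F0 of the frame is already known and evaluates the Hann-windowed spectrum only at F0 and 2*F0. The function names, the 16 kHz sampling rate, and the synthetic two-harmonic frame are assumptions for illustration, not data from the study.

```python
import math

def harmonic_db(frame, fs, freq):
    """Level (dB) of the Hann-windowed frame's spectrum at one frequency."""
    n = len(frame)
    re = im = 0.0
    for i, x in enumerate(frame):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))  # Hann window
        angle = 2 * math.pi * freq * i / fs
        re += w * x * math.cos(angle)
        im -= w * x * math.sin(angle)
    return 20 * math.log10(math.hypot(re, im) + 1e-12)

def h1_minus_h2(frame, fs, f0):
    """Difference between the first and second harmonic levels, in dB."""
    return harmonic_db(frame, fs, f0) - harmonic_db(frame, fs, 2 * f0)

# Synthetic 50 ms frame: second harmonic at half the amplitude of the first.
FS = 16000
F0 = 200.0
frame = [math.sin(2 * math.pi * F0 * n / FS)
         + 0.5 * math.sin(2 * math.pi * 2 * F0 * n / FS)
         for n in range(int(0.050 * FS))]

difference = h1_minus_h2(frame, FS, F0)  # close to 20*log10(2), about 6 dB
```

On real vowels the harmonic peaks would usually be searched for near F0 and 2*F0 rather than read off at exact frequencies, since the F0 estimate is never perfect.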

26
– Original Masan dialect
– Original Seoul dialect
– Simulated Masan dialect: Seoul segments + Masan prosody
– Simulated Masan dialect: Seoul segments + Masan prosody + Masan voice source

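27 Appendix
Sentences:
– 바다에 보물섬이 없다 (“There is no treasure island in the sea”)
– 교수님 가시는 길이 구미로 … (“The road the professor is taking, to Gumi …”)
– 동대구에 볼 일이 없습니다 (“I have no business in Dongdaegu”)
– 쌀 사고 난 후에 와라 (“Come after you have bought rice”)
– 바람이 불어서 먼지가 많다 (“It is dusty because the wind is blowing”)
– 싸기는 해 보여도, 비싸기는 … (“It may look cheap, but expensive …”)
– 서울에 사는 삼촌이 왔다 (“My uncle who lives in Seoul came”)
[Diagram: Seoul dialect: prosody-donor (A), prosody-recipient (B); Masan dialect: prosody-recipient (C), prosody-recipient (D); test stimuli; control stimuli]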

28 References
[1] Kyung-Hee Lee, “Comparison of acoustic characteristics between Seoul and Busan dialect on fricatives”, Speech Sciences, Vol. 9/3, 2002.
[2] Hyun-Gi Kim, Eun-Young Lee, and Ki-Hwan Hong, “Experimental phonetic study of Kyungsang and Cholla dialect using power spectrum and laryngeal fiberscope”, Speech Sciences, Vol. 9/2, pp. 25-47, 2002.
[3] Kyuchul Yoon, “Swapping native and non-native speakers' prosody using PSOLA algorithm”, Proceedings of the Korean Society of Phonetic Sciences and Speech Technology, Spring Conference, pp. 77-81, 2006.
[4] E. Moulines and F. Charpentier, “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, Vol. 9, No. 5-6, 1990.
[5] P. Boersma, “Praat, a system for doing phonetics by computer”, Glot International, Vol. 5, 9/10, 2005.