A prosodically sensitive diphone synthesis system for Korean
Kyuchul Yoon, 2005. 8
Linguistics Department, The Ohio State University



2 Allophonic variations
Defined mostly in terms of neighboring segments.
e.g. Allophones of /t/ in English:
  [t] “stop”
  [tʰ] “top”
  [ʔ] “kitten”
  [ɾ] “little”

3 Segmental positions
Determined in most cases within a word by its
  1. neighboring segments
  2. word boundaries, i.e. word-initial/final
  3. presence/absence of stress

4 Korean Tone & Break Indices (K-ToBI)
(Prosody labeling conventions)
  IP: Intonational Phrase
  AP: Accentual Phrase
  W: Prosodic Word (PW)
  σ: syllable
  H: high tone
  L: low tone
  T: tone (could be H or L)
  %: boundary tone (e.g. H%, L%, HL%, etc.)

5 Word-initial positions in K-ToBI

6 Conventional segmental positions
  word-initial
  word-final

7 Segmental positions in K-ToBI
  IP-initial (also AP-initial and PW-initial)
  AP-initial (also PW-initial)
  PW-initial
  PW-medial
Three types of word-initial positions in K-ToBI!
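
The four positions on this slide can be sketched as a small labeling routine. This is only an illustration: the nested-list representation (an IP as a list of APs, an AP as a list of PWs, a PW as a list of segments) and the function name are assumptions made for the example and do not come from the slides. Each segment is labeled with the highest domain it begins, and every non-initial segment is grouped as PW-medial.

```python
from typing import List, Tuple

# Hypothetical representation (not from the slides): an IP is a list of APs,
# an AP is a list of prosodic words (PWs), and a PW is a list of segments.
IP = List[List[List[str]]]


def label_positions(ip: IP) -> List[Tuple[str, str]]:
    """Label each segment with the highest prosodic domain it begins."""
    labelled = []
    for ap_index, ap in enumerate(ip):
        for pw_index, pw in enumerate(ap):
            for seg_index, seg in enumerate(pw):
                if seg_index > 0:
                    # Assumption: every non-initial segment counts as PW-medial.
                    position = "PW-medial"
                elif pw_index > 0:
                    position = "PW-initial"
                elif ap_index > 0:
                    # First segment of a non-first AP (also PW-initial).
                    position = "AP-initial"
                else:
                    # First segment of the whole IP (also AP- and PW-initial).
                    position = "IP-initial"
                labelled.append((seg, position))
    return labelled


# Made-up example: one IP with two APs; the first AP has two PWs.
example_ip = [[["p", "a"], ["t", "a"]], [["k", "a"]]]
positions = label_positions(example_ip)
```

In this made-up example, /p/ is labeled IP-initial, /t/ PW-initial and /k/ AP-initial, which gives the three kinds of word-initial position the slide refers to.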

8 Allophonic variations: an extended view
Defined mostly in terms of neighboring segments.
They also need to be examined with respect to their prosodic constituency in K-ToBI.

9 Production studies on Korean and other languages
Korean
  Jun (’93, ’98): lenis stop voicing, obstruent nasalization, VOT of /pʰ/
  Cho & Keating (’01): segmental properties of /t, tʰ, t*, n/
  Kim (’01): segmental properties of /sʰ, s*/
  Yoon (’03): subsegmental durations of /sʰ, s*/
Other languages
  Smith (’97): American English /z/
  Pierrehumbert & Talkin (’92), Pierrehumbert (’95): English /h/ and /ʔ/
  Fougeron (’01): French segments /t, k, s, l, n, i, a/
  Keating et al. (’98): /t, n/ of Korean, English, French & Taiwanese

10 Production studies on Korean and other languages: summary of results
Korean
  AP is the domain of lenis stop voicing and post-obstruent tensing (Jun).
  IP is the domain of obstruent nasalization (Jun).
  VOT of /pʰ/: AP-initial > PW-initial > PW-medial (Jun).
  Consonants initial to higher prosodic domains are ‘stronger’ (Cho, Keating, Kim).
  Non-uniform variations in durations of subsegmental units (Yoon).
Other languages
  American English /z/ is devoiced differently in different positions (Smith).
  English /h/ and /ʔ/ are produced differently under different word- and phrase-level prosody (Pierrehumbert & Talkin).
  Articulation of initial segments varies with the prosodic level of the constituent, i.e. initial to an IP, AP, W or syllable (Fougeron).
  There is phrasal/prosodic conditioning of articulation across the four languages (Keating et al.).

11 Need for a perception study, but how?
As the production studies show, Korean speakers seem to encode prosodic categories, i.e. IP, AP, PW, etc., in domain-initial segments.
  Do speakers decode the encodings?
  Are the encodings perceptible?
  How do we test it?
One way to test it is to use a concatenative TTS system (the Festival Speech Synthesis System), so that one can synthesize sentences by manipulating phone-sized units, i.e. diphones.

12 Need for a perception study, but how?
Key idea: synthesize a pair of sentences that differ only in the composition of their domain-initial segments (IP-initial, AP-initial, PW-initial, PW-medial).

13 Need for a perception study, but how?
Test stimuli:
1st set:
  good AP: composed of prosodically appropriate synthetic units
  bad AP: composed of prosodically inappropriate units (AP-initial segment replaced with a PW-medial segment)
2nd set:
  good PW: composed of prosodically appropriate synthetic units
  bad PW: composed of prosodically inappropriate units (PW-initial segment replaced with a PW-medial segment)

14 Prosodic diphones
  IP-initial: <p-a
  AP-initial: [p-a
  PW-initial: {p-a
  PW-medial: p-a
6,503 prosodic diphones are needed to synthesize any Korean utterance.
e.g. … #-<ㅂ, <ㅂ-ㅏ, ㅏ-ㄷ, ㄷ-ㅏ, ㅏ-ㄹ, ㄹ-ㅗ], ㅗ]-[ㅂ, [ㅂ-ㅏ, …
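
The naming scheme on this slide can be sketched in a few lines of Python. The prefixes ("<", "[", "{") are the ones shown on the slide; the helper names `mark` and `to_diphones` are hypothetical. The trailing "]" on "ㅗ]" is copied verbatim from the slide's example, and what it marks is not stated there, so the sketch treats it as part of the phone label.

```python
from typing import List

# Prefixes as shown on the slide; PW-medial phones carry no marker.
MARKERS = {
    "IP-initial": "<",
    "AP-initial": "[",
    "PW-initial": "{",
    "PW-medial": "",
}


def mark(phone: str, position: str) -> str:
    """Prefix a phone with the marker for its prosodic position."""
    return MARKERS[position] + phone


def to_diphones(phones: List[str]) -> List[str]:
    """Pair each phone with its right-hand neighbor to name the diphones."""
    return [f"{left}-{right}" for left, right in zip(phones, phones[1:])]


# Phone sequence reconstructed from the slide's example ("#" is silence).
# "ㅗ]" is kept exactly as on the slide; its closing bracket is not explained there.
phones = [
    "#",
    mark("ㅂ", "IP-initial"),
    "ㅏ", "ㄷ", "ㅏ", "ㄹ", "ㅗ]",
    mark("ㅂ", "AP-initial"),
    "ㅏ",
]
diphones = to_diphones(phones)
```

With this input, `diphones` reproduces the sequence listed on the slide, from "#-<ㅂ" through "[ㅂ-ㅏ".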

15 Design & synthesis of test stimuli
96 stimuli (phrases) synthesized with the Festival system (durations and F0 contours copied from natural utterances).
All were composed of either two APs or two PWs.
All contained one target site, where an AP/PW-initial segment was replaced with a PW-medial segment.
  24 good AP: phrases with intact diphones
  24 bad AP: phrases whose target site segment (AP-initial segment) was replaced with a PW-medial segment
  24 good PW: phrases with intact diphones
  24 bad PW: phrases whose target site segment (PW-initial segment) was replaced with a PW-medial segment
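
The manipulation at the target site can be sketched as follows. This is not the script used for the study: the function name, the marked-phone list and the example phrase are all hypothetical, and the "[" and "{" prefixes follow the prosodic diphone notation introduced earlier in the talk. A "bad" version is derived from a "good" one by swapping the domain-initial unit at the target site for its unmarked (PW-medial) counterpart.

```python
from typing import List

# Markers from the prosodic diphone notation: "[" AP-initial, "{" PW-initial.
INITIAL_MARKERS = {"[": "AP-initial", "{": "PW-initial"}


def make_bad_version(phones: List[str], target: int) -> List[str]:
    """Return a copy in which the domain-initial phone at `target`
    is replaced with its PW-medial (unmarked) counterpart."""
    marker = phones[target][:1]
    if marker not in INITIAL_MARKERS:
        raise ValueError("target site is not AP-initial or PW-initial")
    bad = list(phones)
    bad[target] = phones[target][1:]  # strip the prosodic marker
    return bad


# Hypothetical two-AP phrase; the target site is the second AP's first phone.
good_ap = ["<p", "a", "t", "a", "[p", "a"]
bad_ap = make_bad_version(good_ap, 4)

# Stimulus count as stated on the slide: four conditions of 24 phrases each.
conditions = ["good AP", "bad AP", "good PW", "bad PW"]
total_stimuli = len(conditions) * 24
```

Because only the marker at the target site changes, the good and bad versions differ in exactly the diphones that touch that segment.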

16 Design & synthesis of test stimuli
Synthesis of a sample stimulus (Praat script):
  natural utterance
  diphone sequences from Festival
  fundamental frequency (F0) contour and segmental durations copied from natural utterance
  intensity contour copied from natural utterance
The prototype system lacks a duration & F0 generation module, so these are taken from natural utterances.
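
The Praat script itself is not reproduced in the transcript. As a rough sketch of the duration-copying step only, one could compute, for each segment, the factor by which the synthetic segment must be stretched or compressed to match the natural one. The function name, the data layout and the durations below are made up for illustration, and the two utterances are assumed to share the same segment sequence.

```python
from typing import List, Tuple

Segment = Tuple[str, float]  # (label, duration in milliseconds)


def duration_factors(synthetic: List[Segment], natural: List[Segment]) -> List[float]:
    """Per-segment scale factors that map synthetic durations onto natural ones."""
    if [label for label, _ in synthetic] != [label for label, _ in natural]:
        raise ValueError("segment sequences do not match")
    return [
        natural_ms / synthetic_ms
        for (_, synthetic_ms), (_, natural_ms) in zip(synthetic, natural)
    ]


# Made-up durations for illustration only.
synthetic = [("p", 80.0), ("a", 100.0)]
natural = [("p", 60.0), ("a", 150.0)]
factors = duration_factors(synthetic, natural)
```

Once the segments have been rescaled in this way, the synthetic and natural utterances are time-aligned, which is what makes it possible to copy the natural F0 and intensity contours across without further warping.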

17 Design & synthesis of test stimuli
Sample stimuli
Target site segment: /p/

18 Design & synthesis of test stimuli
More sample stimuli, one row per target segment in four versions: good AP, bad AP, good PW, bad PW
Target segments: /p/, /t/, /k/, /pʰ/, /tʰ/, /t*/, /tʃ/, /tʃʰ/, /sʰ/

19 Results & conclusion
80 listeners (37 women and 43 men): native speakers of Korean, average age 30.6, who grew up in Korea until at least 18 years old.
Two types of tests in three tasks:
  Intelligibility: dictation task (listeners wrote down what they heard in hangul)
  Naturalness: rating & preference tasks (listeners rated one version with respect to the other and chose one over the other)
Statistical analyses showed that listeners performed better in the dictation task with the “good” versions of the stimuli. They also rated the “good” versions higher and preferred them.
Segmental encoding of prosodic domains/categories is perceptible to Korean listeners.