1 Current Interests 2007~2008 (Unfinished papers & Premature ideas) 1. Identifying frication & aspiration noise in the frequency domain: The case of Korean alveolar lax fricatives 2. The role of prosody in dialect synthesis and authentication 3. Synthesis & evaluation of prosodically exaggerated utterances 4. Determining the weights of prosodic components in prosody evaluation 5. Difference database of prosodic features for automatic prosody evaluation 6. Transforming Korean alveolar lax fricatives into tense 7. Gender transformation of utterances

1. Identifying frication & aspiration noise in the frequency domain: The case of Korean alveolar lax fricatives Kyuchul Yoon School of English Language & Literature Yeungnam University Spring 2008 Joint Conference of KSPS & KASS

3 Korean lax alveolar fricatives Two different types of noise

4 Algorithm

5 Change of energy distribution in the frequency domain over time. The energy distribution is computed on a frame-by-frame basis (e.g. 5 msec); the sums of band energy above and below the reference (e.g. low cutoff) frequency are compared, and the criterionValue variable determines the boundary. Assumption: the same criterionValue holds for the same speaker.
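The boundary-finding idea on this slide can be sketched in Python. This is a toy stand-in for the author's Praat script: the 5 msec frame size and the low-cutoff criterionValue follow the slide, but the naive DFT and the first-frame-where-the-low-band-dominates rule are illustrative assumptions.

```python
import math

def band_energies(frame, sample_rate, cutoff_hz):
    """Split a frame's spectral energy at cutoff_hz using a naive DFT."""
    n = len(frame)
    low = high = 0.0
    for k in range(n // 2):
        freq = k * sample_rate / n
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if freq < cutoff_hz:
            low += power
        else:
            high += power
    return low, high

def find_boundary(samples, sample_rate, cutoff_hz, frame_ms=5):
    """Return the time (sec) of the first frame where the low band dominates,
    i.e. where frication-like (high-band) noise gives way to aspiration."""
    hop = int(sample_rate * frame_ms / 1000)
    for start in range(0, len(samples) - hop, hop):
        low, high = band_energies(samples[start:start + hop],
                                  sample_rate, cutoff_hz)
        if low > high:
            return start / sample_rate
    return None
```

On a toy signal that switches from a 3000 Hz tone to a 200 Hz tone, the detected boundary falls at the switch point.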

6 How Praat script works See Demo

7 How Praat script works

8 Experiment The list of words used in the experiment. The words marked with * were also used in the repeated-series experiment. The numbers in parentheses represent the number of repetitions during the recording.

9 Results & Conclusion The histogram of differences between the manually inserted and automatically inserted boundaries for the repeated series experiment. X-axis in msec. Human 1 vs. Script 1 Repeated

10 Results & Conclusion The outlier from the histogram. The difference was 6.4 msec. The m and a represent the manual and automatic boundaries, respectively.

11 Results & Conclusion The histogram of differences between the manually inserted and automatically inserted boundaries for the non-repeated series experiment with 53 words. X-axis in msec. Human 1 vs. Script 1 Non-repeated Human 2 vs. Script 2 Non-repeated The same-speaker-same-criterionValue assumption holds!

12 Results & Conclusion The histogram of differences between the two phoneticians and the two automated scripts for the non-repeated series experiment with 53 words. X-axis in msec. Human 1 vs. Human 2 Non-repeated Script 1 vs. Script 2 Non-repeated

13 Results & Conclusion The summary of the means and the standard deviations of the differences from the two experiments. The numbers are given in msec.

14 Results & Conclusion The automated identification of the boundary (labeled auto) between /s/ and /h/ in the phrase Miss Henry produced by a female native speaker of English. The f and v represent the beginnings of /s/ and the vowel following /h/.

15 References [1] Boersma, Paul. Praat, a system for doing phonetics by computer. Glot International 5(9/10). pp. 341-345. [2] Yoon, Kyuchul. A production and perception experiment of Korean alveolar fricatives. Speech Sciences 9(3). [3] Yoon, Kyuchul. Durational correlates of prosodic categories: The case of two Korean voiceless coronal fricatives. Speech Sciences 12(1).

2. The role of prosody in dialect synthesis and authentication Kyuchul Yoon School of English Language & Literature Yeungnam University Spring 2008 Joint Conference of KSPS & KASS

17 Goals 1.Synthesize Masan utterances from matching Seoul utterances by prosody cloning 2.Examine the role of prosody in the authentication of synthetic Masan utterances (Listening experiment)

18 Background Differences among dialects –Segmental differences: in the time domain, Busan fricatives have shorter frication/aspiration intervals than Seoul fricatives (Lee, 2002); in the frequency domain, the low cutoff frequency of Kyungsang fricatives is higher than that of Cholla fricatives (> 1,000 Hz) (Kim et al., 2002) –Non-segmental or prosodic differences: intonation or fundamental frequency (F0) contour, intensity contour, segment duration, and voice quality

19 Synthesis Simulating (by prosody cloning) Masan dialect from Seoul dialect The simulated Masan utterances will have –the speech segments of Seoul dialect –the prosody of Masan dialect F0 contour Intensity contour Segmental duration

20 Evaluation Through a listening experiment Stimuli consist of –#1. Authentic, but synthetic, Masan utterance –#2. Seoul utterance with Masan segmental durations (D) –#3. Seoul utterance with Masan F0 contour (F) –#4. Seoul utterance with Masan intensity contour (I) –#5. Seoul utterance with Masan durations and F0 contour (D+F) –#6. Seoul utterance with Masan durations and intensity contour (D+I) –#7. Seoul utterance with Masan F0 contour and intensity contour (F+I) –#8. Seoul utterance with Masan durations, F0 contour and intensity contour (D+F+I) (1) 동대구에 볼 일이 없습니다 ('I have no business in Dongdaegu.') (2) 바다에 보물섬이 없다 ('There is no treasure island in the sea.') Listen to Stimuli

21 Prosody transfer (PSOLA algorithm) Three aspects of the prosody –Fundamental frequency (F0) contour –Intensity contour –Segmental durations Pitch-Synchronous OverLap and Add (PSOLA) algorithm (Moulines & Charpentier, 1990) –Implemented in Praat (Boersma, 2005) –Use of a script for semi-automatic segment-by-segment manipulation (Yoon, 2007)

22 Prosody transfer (PSOLA algorithm) Procedures for full prosody transfer –Align segments btw/ Masan and Seoul utterances –Make the segment durations of the two identical –Make the two F0 contours identical –Make the two intensity contours identical

23 Prosody transfer (PSOLA algorithm) Align segments btw/ Masan and Seoul utterances Make the segment durations of the two utterances identical ㅂㅏㄹㅏㅁ “… 바람 …” Masan ㅏㅏ Seoul stretch shrink ㅂㄹㅁ
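The stretch/shrink step in this slide is, in effect, a piecewise-linear time warp between matching segment boundaries. A minimal sketch with hypothetical helper names (the real manipulation is done with PSOLA in Praat; this only computes the mapping):

```python
def duration_ratios(seoul_durs, masan_durs):
    """Per-segment scale factors mapping Seoul segment durations onto the
    matching Masan ones (ratio > 1 stretches the segment, < 1 shrinks it)."""
    if len(seoul_durs) != len(masan_durs):
        raise ValueError("segment counts must match after alignment")
    return [m / s for s, m in zip(seoul_durs, masan_durs)]

def warp_time(t, seoul_bounds, masan_bounds):
    """Map a time in the Seoul utterance to the corresponding time in the
    Masan utterance, given matching segment boundary lists (both from 0.0)."""
    for i in range(len(seoul_bounds) - 1):
        s0, s1 = seoul_bounds[i], seoul_bounds[i + 1]
        if s0 <= t <= s1:
            m0, m1 = masan_bounds[i], masan_bounds[i + 1]
            # linear interpolation inside the matching segment
            return m0 + (t - s0) * (m1 - m0) / (s1 - s0)
    raise ValueError("time outside the utterance")
```

For example, `duration_ratios([0.08, 0.12], [0.10, 0.09])` stretches the first segment by 1.25 and shrinks the second to 0.75 of its length.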

24 Prosody transfer (PSOLA algorithm) ㅂㅏㄹㅏㅁ Masan Seoul ㅂㅏㄹㅏㅁ Masan F0 Seoul F0 Make the two F0 contours identical

25 Prosody transfer (PSOLA algorithm) Seoul intensity ㅂㅏㄹㅏㅁ Masan Seoul ㅂㅏㄹㅏㅁ Masan intensity Make the two intensity contours identical

26 Synthetic (simulated) Masan stimulus

27 Synthetic authentic Masan stimulus

28 Listening experiment 16 stimuli (8 + 8) Presented to 13 Masan/Changwon listeners –On a scale of 1 (worst) to 10 (best) –Used Praat ExperimentMFC object –Allowed repetition of stimulus: up to 10 times

29 Listening experiment See Demo

30 Results & Conclusion Histogram of listener responses

31 Results & Conclusion F0 contour transfer (listener responses on the 1-10 scale)

32 Results & Conclusion Seoul utterances with Masan prosody D F I DF DI FI DFI Masan

33 Results & Conclusion Main effects of –Segmental durations: F(1,12)=11.53, p=0.005 –F0 contour: F(1,12)=141.12, p<0.001 Regression analysis

34 Results & Conclusion Prosody cloning not sufficient for dialect simulation –(Sub)Segmental differences may be at work –Quality of synthetic stimuli F0 contour transfer (from Masan to Seoul) –Most influential on shifting perception from Seoul to Masan utterances

35 References [1] Kyung-Hee Lee, “Comparison of acoustic characteristics between Seoul and Busan dialect on fricatives”, Speech Sciences, Vol. 9(3), 2002. [2] Hyun-Gi Kim, Eun-Young Lee, and Ki-Hwan Hong, “Experimental phonetic study of Kyungsang and Cholla dialect using power spectrum and laryngeal fiberscope”, Speech Sciences, Vol. 9(2), pp. 25-47, 2002. [3] Kyuchul Yoon, “Imposing native speakers’ prosody on non-native speakers’ utterances: The technique of cloning prosody”, Journal of the Modern British & American Language & Literature, Vol. 25(4), 2007. [4] E. Moulines and F. Charpentier, “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones”, Speech Communication, Vol. 9(5-6), pp. 453-467, 1990. [5] P. Boersma, “Praat, a system for doing phonetics by computer”, Glot International, Vol. 5(9/10), pp. 341-345, 2005.

3. Synthesis & evaluation of prosodically exaggerated utterances: A preliminary study Kyuchul Yoon School of English Language & Literature Yeungnam University Spring 2008 Joint Conference of KSPS & KASS

37 Contents Synthesis & evaluation of human utterances with exaggerated prosody Synthesis of exaggerated prosody –Useful for presenting native utterances to students –The definition of prosody “exaggeration” –The algorithm Evaluation of exaggerated prosody –Useful for evaluating learner utterances –The algorithm & an experiment

38 Teaching & evaluating prosody Teaching language prosody –The need for “exaggeration” of native utterances –How to define “exaggeration” Evaluating language prosody –Given the native version of an utterance, evaluate learner’s atypical prosody –How to measure the differences btw/ the native and learner utterances

39 Exaggerating native prosody Exaggeration of the F0 contour –One way would be to make the pitch peaks/valleys higher/lower Exaggeration of the intensity contour –One way would be to manipulate the intensity contour at the pitch peaks (or valleys) Exaggeration of the segmental durations –One way would be to manipulate the segmental durations at the pitch peaks (or valleys) See Demo

40 Exaggerating native prosody The fundamental frequency (F0) contour of an utterance Marianna!. F0

41 Exaggerating native prosody Intensity The intensity contour of an utterance Marianna!.

42 Exaggerating native prosody Duration The segmental durations of an utterance Marianna! before and after the exaggeration.

43 Algorithm: prosody exaggeration Definition of prosody exaggeration –F0 contour: make pitch peaks/valleys higher/lower in Hz values –Intensity contour: make pitch peaks higher in dB values –Segmental durations: make pitch peaks longer, as multiples of the original durations
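The F0 part of this definition can be sketched as scaling each frame's deviation from a reference level. Using the contour median as the anchor is my assumption; the slide only says that peaks get higher and valleys lower.

```python
def exaggerate_f0(f0_values, factor):
    """Exaggerate an F0 contour (Hz per frame, None = unvoiced) by scaling
    each frame's deviation from the contour median: with factor > 1,
    peaks get higher and valleys get lower."""
    voiced = sorted(f for f in f0_values if f is not None)
    ref = voiced[len(voiced) // 2]  # median of the voiced frames as anchor
    return [None if f is None else ref + factor * (f - ref)
            for f in f0_values]
```

For example, `exaggerate_f0([100.0, 120.0, 80.0, None], 2.0)` yields `[100.0, 140.0, 60.0, None]`: the 120 Hz peak rises and the 80 Hz valley drops, while unvoiced frames are left alone.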

44 Algorithm: prosody exaggeration F0

45 Algorithm: prosody exaggeration Intensity

46 Algorithm: prosody exaggeration Durations

47 How Praat script works

48 How Praat script works F0 Intensity Durations

49 How Praat script works Original F0 Durations Intensity F0 Durations

50 Evaluating learner prosody Assumes the existence of the native version; evaluates the learner versions. Evaluation of the F0 & intensity contours –Is preceded by duration manipulation: the durations of the matching segments of the two utterances are made identical [3] –Is preceded by F0/intensity normalization & F0 smoothing: the mean difference is added to/subtracted from the learner utterance –Is followed by pitch/intensity point-to-point comparison Evaluation of segmental durations –Done without any duration manipulation; segment-to-segment comparison Evaluation measure: Euclidean distance metric
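The normalize-then-compare steps for the F0/intensity contours can be sketched as follows. This is a simplified stand-in for the author's Praat scripts: mean-shift normalization only, no smoothing, and the two contours are assumed to be sampled at the same frame times after duration manipulation.

```python
import math

def prosody_distance(native, learner):
    """Point-to-point Euclidean distance between two equal-length contours.
    The learner contour is first shifted so its mean matches the native mean
    (the normalization step); durations are assumed already equalized."""
    if len(native) != len(learner):
        raise ValueError("equalize durations before comparing contours")
    shift = sum(native) / len(native) - sum(learner) / len(learner)
    return math.sqrt(sum((n - (l + shift)) ** 2
                         for n, l in zip(native, learner)))
```

A learner contour that differs from the native one only by a constant offset scores a distance of 0, which is what the mean normalization is for: only the shape difference is penalized.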

51 Algorithm: prosody evaluation Before & after duration manipulation native learner before learner after

52 Algorithm: prosody evaluation F0 point-to-point comparison btw/ native and learner native learner after Normalization & smoothing were performed in prior steps

53 Algorithm: prosody evaluation Intensity point-to-point comparison btw/ native and learner native learner after Normalization was performed in prior steps

54 Algorithm: prosody evaluation Duration segment-to-segment comparison btw/ native and learner native learner before Euclidean distance metric for the evaluation measure: for P = (p1, p2, p3,..., pn) and Q = (q1, q2, q3,..., qn) in Euclidean n-space, d(P, Q) = sqrt((p1-q1)² + (p2-q2)² +... + (pn-qn)²)

55 A pilot experiment native learner after D/F/I cloning An ideal case: Three Euclidean distances (Ed) should be minimum Ed1: F0 contour Ed2: Intensity contour Ed3: Segment durations

56 Creation of Stimuli: F0 F0: -100 Hz to +100 Hz with a 10 Hz interval → 21 stimuli Evaluation of the stimuli against the F0 contour of the native utterance native learner after D cloning

57 Creation of Stimuli Intensity: -25 dB to +25 dB with a 5 dB interval → 11 stimuli Evaluation of the stimuli against the intensity contour of the native utterance native learner after D cloning

58 Creation of Stimuli Duration: 0.25, 0.50, 0.75, 1.00, 1.50, 2.00, 2.50, 3.00 times the original → 8 stimuli Evaluation of the stimuli against the segment durations of the native utterance native learner

59 Results & Conclusion

60 Results & Conclusion

61 Results & Conclusion

62 Results & Conclusion Prosody exaggeration –Can be a tool for teaching language prosody –Can be used to test measures for evaluating prosody Limitation of the current prosody evaluation –Native utterances should exist to yield measures TTS systems with advanced prosody models could be helpful to process any learner utterances –“Weights” of the three separate measures (F0/intensity/duration) need to be determined Experiments with human evaluators could provide the weights

63 References [1] Boersma, Paul. Praat, a system for doing phonetics by computer. Glot International 5(9/10). pp. 341-345. [2] Moulines, E. & F. Charpentier. 1990. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9(5-6). pp. 453-467. [3] Yoon, K. 2007. Imposing native speakers' prosody on non-native speakers' utterances: The technique of cloning prosody. Journal of the Modern British & American Language & Literature 25(4).

64 4. Determining the weights of prosodic components in prosody evaluation Problem –Raw components vs. abstracted concepts: F0, intensity, and duration vs. rhythm, tempo, etc. Determine the weights of prosodic components in prosody evaluation –Use raw units: F0, intensity, duration –Use cloning of prosody (problem of unequal number of segments) –Create an “other-things-being-equal” environment –Evaluate each raw prosodic component and the overall prosodic fluency –Compare & assess the weight of each component in prosody evaluation

65 (4) Determining the weights of prosodic components in prosody evaluation Given (a) model native utterance(s) and its learner version –A human evaluator evaluates the learner utterance in terms of its prosodic fluency = Overall Prosody Score (from the unmodified learner utterance) Manipulate the learner utterance to create an “other-things-being-equal” environment, so that the learner utterance is the same as its native version except for –(1) its F0 contour (learner utterance version 1) –(2) its intensity contour (learner utterance version 2) –(3) its segmental durations (learner utterance version 3) Evaluate the manipulated learner utterances –(1) F0 score (from learner version 1) –(2) Intensity score (from learner version 2) –(3) Duration score (from learner version 3) Hypothesis: Overall prosody score = α * (F0 score) + β * (Intensity score) + γ * (Duration score) Repeat the evaluation for other utterances from the same learner to solve the equation Verify the coefficients with unevaluated utterances from the same learner If the hypothesis holds, make the prosody evaluation process automatic
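Given Overall, F0, Intensity, and Duration scores for several utterances from the same learner, the coefficients of the hypothesized weighted sum can be estimated by least squares. A sketch (not part of the original proposal's tooling) that forms and solves the 3×3 normal equations:

```python
def solve_weights(component_scores, overall_scores):
    """Least-squares fit of overall = a*F0 + b*Intensity + c*Duration over
    several utterances: build the normal equations (A^T A) w = A^T y and
    solve the 3x3 system by Gaussian elimination with partial pivoting."""
    n = 3
    ata = [[sum(r[i] * r[j] for r in component_scores) for j in range(n)]
           for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(component_scores, overall_scores))
           for i in range(n)]
    m = [row[:] + [b] for row, b in zip(ata, aty)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c]
                              for c in range(r + 1, n))) / m[r][r]
    return w  # [alpha, beta, gamma]
```

At least three utterances with linearly independent component scores are needed; additional utterances make the system overdetermined, which is exactly the "repeat the evaluation for other utterances" step, and held-out utterances can then verify the fitted coefficients.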

66 Stimuli “The dancing queen likes only the apple pies” Native (5061_02) Learner (1047_02) Evaluate overall prosody with respect to the native version (Overall Prosody Score)

67 Stimuli “The dancing queen likes only the apple pies” Native Learner_DI Learner_DF Now has the native durations/intensity. Evaluate F0 contour (F0 Score) Now has the native durations/F0 contour. Evaluate intensity contour (Intensity Score)

68 Stimuli “The dancing queen likes only the apple pies” Native Learner_FI Now has the native F0/intensity. Evaluate segmental durations (Duration Score) Overall prosody score = α * (F0 score) + β * (Intensity score) + γ * (Duration score)

69 5. Difference database of prosodic features for automatic prosody evaluation Given (a) model native utterance(s) and its learner version, get difference values of –(1) F0 contour –(2) intensity contour –(3) segmental durations between the two utterances Use techniques & scripts used in –(3) Synthesis & evaluation of prosodically exaggerated utterances Store difference values of each prosodic feature for each learner utterance in a database Use the database to develop algorithms for automatic prosody scoring Pilot study: labeled sentences from KT_K-SEC corpus
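A difference database of this kind can be sketched with sqlite3. The square-root-of-sum-of-squares summary mirrors the pilot tables on the following slides, but the schema and names here are my own invention:

```python
import math
import sqlite3

def build_diff_db(path=":memory:"):
    """Create a tiny difference database: one row per frame (or segment)
    holding the native-minus-learner difference for one prosodic feature."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE diffs (
        native  TEXT,
        learner TEXT,
        feature TEXT,   -- 'f0' | 'intensity' | 'duration'
        idx     INTEGER,
        diff    REAL)""")
    return db

def feature_score(db, native, learner, feature):
    """Square root of the sum of squared differences for one feature,
    skipping undefined (NULL) frames, as in the pilot tables' summaries."""
    (total,) = db.execute(
        "SELECT SUM(diff * diff) FROM diffs "
        "WHERE native = ? AND learner = ? AND feature = ? "
        "AND diff IS NOT NULL",
        (native, learner, feature)).fetchone()
    return math.sqrt(total) if total is not None else None
```

Unvoiced frames can be stored as NULL and are simply excluded from the score, matching the "undefined" rows in the pilot F0 table.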

70 Pilot data (5) Difference database of prosodic features for automatic prosody evaluation

71 Intensity difference Pilot data (5) Difference database of prosodic features for automatic prosody evaluation [Table of per-frame intensity differences for native 5053_02.wav vs. learner 1044_02.wav, with columns native, learner, numFrames, frameNo, time, nativedB, learnerdB, diffdB; the per-frame values did not survive transcription.] The square root of the sum of squares of the diffdB's is 205.

72 Duration difference Pilot data (5) Difference database of prosodic features for automatic prosody evaluation [Table of per-segment duration differences for native 5053_02.TextGrid vs. learner 1044_02.TextGrid (33 segments; the first 11 shown: SIL, dh, ax, SIL, dd, ae, nn, ss, ih, ng, kk), with columns native, learner, numSegs, segNo, nativeSegID, learnerSegID, timeStart, nativeDur, normNativeDur, learnerDur, normDiffDur; the per-segment values did not survive transcription.] The square root of the sum of squares of the diffDur's is 243.

73 F0 difference Pilot data (5) Difference database of prosodic features for automatic prosody evaluation [Table of per-frame F0 differences for native 5053_02.wav vs. learner 1044_02.wav, with columns native, learner, numFrames, frameNo, time, nativeF0, learnerF0, diffF0; unvoiced frames are marked undefined; the per-frame values did not survive transcription.] The square root of the sum of squares of the diffF0's is 486.

74 6. Transforming Korean alveolar lax fricatives into tense Goal –Test factors that distinguish /ㅅ/ from /ㅆ/ Type of factors –Consonantal: noise durations, center of gravity –Vocalic: formant/bandwidth switching –Prosodic: clone F0/intensity/durations, switch source signals
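Center of gravity here is the power-weighted mean frequency of the fricative's noise spectrum (Praat computes it directly from a Spectrum object). A toy computation with a naive DFT, for illustration only:

```python
import math

def center_of_gravity(samples, sample_rate):
    """Spectral center of gravity of one frame: the mean frequency of the
    spectrum with each bin weighted by its power, via a naive DFT."""
    n = len(samples)
    num = den = 0.0
    for k in range(1, n // 2):  # skip DC
        freq = k * sample_rate / n
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n)
                 for t in range(n))
        im = -sum(samples[t] * math.sin(2 * math.pi * k * t / n)
                  for t in range(n))
        power = re * re + im * im
        num += freq * power
        den += power
    return num / den if den else 0.0
```

For a pure tone the measure returns the tone's frequency; for fricative noise it summarizes where the energy sits, which is the consonantal factor this slide proposes to manipulate.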

75 Pilot data (6) Transforming Korean alveolar lax fricatives into tense 사자 vs. 싸자

76 Pilot data (6) Transforming Korean alveolar lax fricatives into tense 사자 싸자 사자 Prosody: Durations F0 Intensity

77 Pilot data (6) Transforming Korean alveolar lax fricatives into tense 사자 싸자 사자 Prosody + Formants Bandwidths

78 Things to do –Try the reverse: manipulate /ㅆ/ to simulate /ㅅ/ –Try this with other lax/tense pairs of stops: 사→싸, 다→따, 바→빠, 가→까 –Try switching the source signal Listening experiments –[1] Render /ssa/ from /sha/ via (1) prosody, (2) formant/bandwidth, (3) source: (1)+(2): shift?, (1)+(3): shift?, (1)+(2)+(3): shift?, (1)+(2)+undo(1): see the effect of (2) only, (1)+(3)+undo(1): see the effect of (3) only, (1)+(2)+(3)+undo(1): see the effects of (2) and (3) only –[2] Render /sha/ from /ssa/ via (1) prosody, (2) formant/bandwidth, (3) source, with the same combinations –[3] Statistical analyses of formants/bandwidths: examine post-consonantal vowels in terms of their formants/bandwidths for any possible intra/inter-consonantal differences; identify the portion of the vowels that contributes to the distinction of lax/tense consonants, e.g. ½ or ¼ from the vowel onset Design (6) Transforming Korean alveolar lax fricatives into tense

79 7. Gender transformation of utterances Examine male vs. female utterances in terms of prosodic & segmental differences –Identify factors that differ –Refer to Praat’s Change gender… command under the Convert button –Verify by synthesis Prosody manipulation –F0/intensity/durations/source Segment manipulation –Formant frequencies & bandwidths