1
Thai Intonation in Four Emotions
Apiluck Tumtavitikul and Kanlayarat Thitikannara
Linguistics Dept., Kasetsart University, Bangkok, Thailand 10900
email: fhumalt@ku.ac.th
The ICSTLL-39, Seattle, Washington, September 15-17, 2006
2
Tumtavitikul & Thitikarnnara 2006
Introduction
Emotions are universally expressed both verbally and non-verbally. Emotional speech is usually coupled with facial expressions and may also include body gestures.
3
Emotion involves our entire being: the brain, heart rate, blood pressure, muscle movements, and so on. Emotions can be pleasant or unpleasant.
4
Our Quests
- How are emotions expressed in speech in a tonal language such as Thai?
- Apart from the words themselves, which features of speech serve as cues to the emotion expressed by the speaker and perceived by the listener?
- Are these acoustic cues universal or language-specific?
5
Assumptions
- Intonation is the cue for emotional speech.
- Each type of emotional speech has a unique overall intonation pattern.
6
Investigations
- Emotions: the four types most commonly found across 32 languages, namely anger, surprise, happiness, and sadness.
- Subjects: two radio performers, one male and one female, aged 30-40 years.
7
- Test tokens: 4 one-word and 3 multiple-word utterances, each spoken in neutral speech and in the four emotions, five times each. In total, 350 utterances (7 utterances × 5 speaking styles × 5 repetitions × 2 speakers) were investigated.
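As a quick check of the design arithmetic, the Python sketch below enumerates the fully crossed design. The utterance labels are hypothetical placeholders, not the actual test tokens. The count per emotion (7 × 5 × 2 = 70) matches the n = 70 reported in Tables 1 and 2, and the count per speaker per emotion (7 × 5 = 35) matches the n = 35 in Table 5.

```python
from itertools import product

# Hypothetical labels standing in for the 4 one-word and 3 multiple-word tokens.
utterances = [f"one-word-{i}" for i in range(1, 5)] + [f"multi-word-{i}" for i in range(1, 4)]
styles = ["neutral", "anger", "surprise", "happiness", "sadness"]
repetitions = range(1, 6)
speakers = ["male", "female"]

# Each combination of utterance x style x repetition x speaker is one recorded token.
design = list(product(utterances, styles, repetitions, speakers))

total = len(design)
per_condition = sum(1 for _, style, _, _ in design if style == "anger")
per_speaker_condition = sum(
    1 for _, style, _, spk in design if style == "anger" and spk == "male"
)

print(total, per_condition, per_speaker_condition)  # 350 70 35
```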
8
Acoustic measurements and calculations made for each type of utterance in each emotion:
- average F0 (Hz)
- F0 range (semitones)
- maximum and minimum amplitude (dB)
- average amplitude (dB)
- utterance duration (seconds)
- speaking rate (utterances/second)
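The slides do not give formulas for these measures. The sketch below assumes the standard definitions: the F0 range in semitones is 12 · log2(F0max / F0min), and the speaking rate of a single utterance is the reciprocal of its duration. It also assumes that unvoiced frames in an F0 track are coded as 0 Hz and excluded. The track values are hypothetical.

```python
import math

def voiced(f0_values):
    """Keep voiced frames only; unvoiced frames are assumed to be coded as 0 Hz."""
    return [f for f in f0_values if f > 0]

def average_f0(f0_values):
    """Arithmetic mean of the voiced F0 values (Hz)."""
    v = voiced(f0_values)
    return sum(v) / len(v)

def f0_range_semitones(f0_values):
    """F0 range in semitones: 12 * log2(max / min) over voiced frames."""
    v = voiced(f0_values)
    return 12 * math.log2(max(v) / min(v))

def speaking_rate(duration_s):
    """Rate for a single utterance: 1 / duration (utterances per second)."""
    return 1.0 / duration_s

# Hypothetical F0 track in Hz (0 = unvoiced); not data from this study.
track = [0.0, 200.0, 220.0, 250.0, 240.0, 0.0, 180.0, 0.0]
print(average_f0(track))                    # 218.0
print(round(f0_range_semitones(track), 2))  # 5.69
print(speaking_rate(0.5))                   # 2.0
```

Working in semitones makes ranges comparable across speakers, because a doubling of F0 is always 12 semitones regardless of the speaker's baseline.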
9
Figure 1: Stylized fundamental frequency (Hz) and normalized duration (%) for one-word utterances (a proper noun, CVVN).
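In these figures, duration is expressed as a percentage so that contours of different lengths can be overlaid. The exact procedure is not described in the slides. A common approach, assumed here, rescales each time point of the stylized contour to a percentage of the span from the first point (onset) to the last point (offset). The contour values are hypothetical.

```python
def normalize_time(points):
    """Map stylized (time_s, f0_hz) points to (percent_of_duration, f0_hz).

    Assumption: the first point marks utterance onset and the last marks offset.
    """
    start = points[0][0]
    end = points[-1][0]
    span = end - start
    return [(100.0 * (t - start) / span, f0) for t, f0 in points]

# Hypothetical stylized contour: (time in s, F0 in Hz).
contour = [(0.10, 120.0), (0.25, 180.0), (0.40, 150.0), (0.60, 110.0)]
for pct, f0 in normalize_time(contour):
    print(f"{pct:6.1f}%  {f0:.0f} Hz")
```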
10
Figure 2: Stylized fundamental frequency (Hz) and normalized duration (%) for one-word utterances (a proper noun, CVVN).
11
Figure 3: Stylized fundamental frequency (Hz) and normalized duration (%) for the three-word utterance “คุณแนนมา” ('Khun Naen comes').
12
Figure 4: Stylized fundamental frequency (Hz) and normalized duration (%) for the three-word utterance “คุณแนนมา” ('Khun Naen comes').
13
Results: Average Utterance Duration

|         | Neutral | Anger | Surprise | Happiness | Sadness |
|---------|---------|-------|----------|-----------|---------|
| Average | 0.59    | 0.52  | 0.49     | 0.73      | 0.61    |
| S.D.    | 0.36    | 0.35  | 0.32     | 0.35      | 0.42    |
| n       | 70      | 70    | 70       | 70        | 70      |

Table 1: Average utterance duration (seconds) for all types of utterances combined, in neutral speech and in each emotion (anger, surprise, happiness, sadness); male and female combined.
14
Figure 5: Average utterance duration (seconds) for all types of utterances combined, male and female combined (bar chart of the means in Table 1).
15
Speaking Rate
From slowest to fastest speaking rate (utterances/second): happiness, sadness, neutral, anger, surprise.
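The ranking can be read directly off the mean durations in Table 1, since a longer average utterance means a slower rate. The sketch below approximates each rate as 1 / mean duration. This is only a rough figure: the mean of per-utterance rates is generally not equal to the reciprocal of the mean duration, so these numbers are not the study's computed rates.

```python
# Mean utterance durations (s) from Table 1, male and female combined.
mean_duration = {
    "neutral": 0.59,
    "anger": 0.52,
    "surprise": 0.49,
    "happiness": 0.73,
    "sadness": 0.61,
}

# Approximate rate = 1 / mean duration (an approximation, see the note above).
approx_rate = {emotion: 1.0 / d for emotion, d in mean_duration.items()}

slowest_to_fastest = sorted(approx_rate, key=approx_rate.get)
print(slowest_to_fastest)
# ['happiness', 'sadness', 'neutral', 'anger', 'surprise']
for emotion in slowest_to_fastest:
    print(f"{emotion:10s} {approx_rate[emotion]:.2f} utterances/s")
```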
16
Pitch Range

|         | Neutral | Anger | Surprise | Happiness | Sadness |
|---------|---------|-------|----------|-----------|---------|
| Average | 5.40    | 7.11  | 5.48     | 7.49      | 5.49    |
| S.D.    | 2.51    | 2.20  | 2.12     | 3.40      | 1.75    |
| n       | 70      | 70    | 70       | 70        | 70      |

Table 2: Average pitch range (semitones) for all types of utterances combined, in neutral speech and in each emotion (anger, surprise, happiness, sadness); male and female combined.
17
Figure 6: Average pitch range (semitones) for all types of utterances combined, in neutral speech and in each emotion (anger, surprise, happiness, sadness); male and female combined (bar chart of the means in Table 2).
18
Pitch Range
From largest to smallest pitch range (semitones): happiness, anger, sadness ≈ surprise, neutral.
19
Average Pitch
Table 3: Average F0 (Hz) for one-word utterances (2 proper names and 2 verbs combined) and multiple-word utterances (3-, 5- and 6-word utterances combined), male and female speakers.

| Speaker | Utterances    | Neutral | Anger  | Surprise | Happiness | Sadness |
|---------|---------------|---------|--------|----------|-----------|---------|
| Male    | one-word      | 118.75  | 210.35 | 200.12   | 208.70    | 125.94  |
| Male    | multiple-word | 133.62  | 228.27 | 227.28   | 226.72    | 128.84  |
| Female  | one-word      | 205.67  | 260.04 | 204.67   | 189.74    | 193.53  |
| Female  | multiple-word | 207.47  | 235.72 | 209.11   | 217.22    | 202.21  |
20
Average Pitch
From highest to lowest average pitch (Hz): anger, surprise ≈ happiness, neutral ≈ sadness.
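Because the male and female baselines in Table 3 differ considerably, raw Hz values are hard to compare across speakers. One common speaker normalization, assumed here and not used in the slides, expresses each emotion's mean F0 in semitones relative to that speaker's own neutral mean. The sketch uses only the Table 3 values. It shows that anger is the highest in every row, and that the overall ranking is a simplification: for the female one-word utterances, happiness falls below neutral.

```python
import math

# Average F0 (Hz) from Table 3, keyed by (speaker, utterance type).
TABLE3 = {
    ("male", "one-word"): {"neutral": 118.75, "anger": 210.35, "surprise": 200.12,
                           "happiness": 208.70, "sadness": 125.94},
    ("male", "multiple-word"): {"neutral": 133.62, "anger": 228.27, "surprise": 227.28,
                                "happiness": 226.72, "sadness": 128.84},
    ("female", "one-word"): {"neutral": 205.67, "anger": 260.04, "surprise": 204.67,
                             "happiness": 189.74, "sadness": 193.53},
    ("female", "multiple-word"): {"neutral": 207.47, "anger": 235.72, "surprise": 209.11,
                                  "happiness": 217.22, "sadness": 202.21},
}

def semitones_from_neutral(row):
    """Shift of each emotion's mean F0 relative to the neutral mean, in semitones."""
    base = row["neutral"]
    return {emotion: 12 * math.log2(f0 / base)
            for emotion, f0 in row.items() if emotion != "neutral"}

for (speaker, utt), row in TABLE3.items():
    shifts = semitones_from_neutral(row)
    formatted = ", ".join(f"{e} {s:+.1f}" for e, s in shifts.items())
    print(f"{speaker:6s} {utt:13s} {formatted}")
```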
21
Maximum & Minimum Amplitude
Table 4: Maximum and minimum amplitude (dB) for all utterances combined, in each type of emotion (male and female).

| Speaker | Amplitude | Neutral | Anger | Surprise | Happiness | Sadness |
|---------|-----------|---------|-------|----------|-----------|---------|
| Male    | Maximum   | 72.84   | 83.60 | 75.46    | 79.07     | 71.54   |
| Male    | Minimum   | 46.51   | 50.78 | 50.45    | 44.47     | 43.10   |
| Female  | Maximum   | 67.30   | 75.85 | 72.97    | 70.73     | 67.80   |
| Female  | Minimum   | 35.93   | 46.55 | 42.42    | 38.79     | 33.14   |
22
Figure 7: Maximum and minimum amplitude (dB) for all utterances combined, in each type of emotion, male and female (bar chart of the values in Table 4).
23
Average Amplitude
Table 5: Average amplitude (dB) for all utterances combined, in each type of emotion (male and female); n = 35 per cell.

| Speaker | Statistic | Neutral | Anger | Surprise | Happiness | Sadness |
|---------|-----------|---------|-------|----------|-----------|---------|
| Male    | Mean      | 62.32   | 72.40 | 65.87    | 67.24     | 58.71   |
| Male    | S.D.      | 2.04    | 3.10  | 2.70     | 3.87      | 3.16    |
| Female  | Mean      | 57.80   | 66.58 | 62.89    | 61.88     | 54.19   |
| Female  | S.D.      | 1.82    | 3.70  | 3.09     | 2.72      | 4.08    |
24
Figure 8: Average amplitude (dB) for all utterances combined, in each type of emotion, male and female (bar chart of the means in Table 5).
25
Average Amplitude
From highest to lowest average amplitude (dB): anger, happiness ≈ surprise, neutral, sadness.
26
Discussion
Luksaneeyanawin (1983), in her study of Thai intonation, found that anger is expressed with an average pitch either higher or lower than that of neutral speech, a wider pitch range, either a longer or a shorter duration, and a very high degree of loudness. Surprise is expressed with a higher average pitch than neutral speech, a narrower pitch range, either a shorter or a longer duration, and either a higher or a lower degree of loudness.
27
Cahn (1988), in her study of emotions expressed in English intonation, found that anger is expressed with faster and louder speech, a larger pitch range, and a higher average pitch than neutral speech. The pitch contour fluctuates more, with the greatest energy in the higher frequencies. Grief, in contrast, is expressed with slow speech, low pitch, and weak high frequencies.
28
The data from our two Thai speakers, shown in Tables 1-5 and Figures 1-8 above, broadly agree with Luksaneeyanawin (1983) for anger and surprise, and with Cahn (1988) for anger and sadness. (Happiness has yet to be compared.)
29
Conclusion
In sum, for our two Thai speakers:
Anger is expressed with the highest amplitude and the highest average pitch of all speaking styles, a larger pitch range than neutral speech, and a faster speaking rate than neutral speech (only surprise is faster).
30
Surprise is expressed with a higher average pitch, higher amplitude, and shorter duration than neutral speech. Our data show a smaller pitch range for surprise than for the other emotions, though still larger than for neutral speech.
31
Sadness is expressed with the lowest average pitch, the lowest amplitude, a medium pitch range, and a slower speaking rate than neutral speech, anger, and surprise. The average pitch for sadness is comparable to that of neutral speech in Thai.
32
Happiness is expressed with the slowest speaking rate, the largest pitch range, and a higher average pitch and higher amplitude than neutral speech in our Thai speakers. This finding has yet to be compared with other data.
33
As for the F0 contour, the present data show no uniform pattern for any type of emotional speech in either the male or the female speaker.
34
This contrasts with the F0 contour of neutral declarative statements (Tumtavitikul and Thitikannara 2006), in which the Highs and Lows conform to the universal tendency toward F0 declination in declarative sentences (Hirst and Di Cristo 1998) and remain clearly rule-based even when focus is shifted.
35
The Highs and Lows in the present data do not form a unified pattern in any of the four emotions observed. The contour shapes vary greatly within the same emotion for both the male and the female speaker.
36
This is not surprising, since emotions are subject to mood and attitude as well as to the environment. Emotional pitch contours may vary greatly within and across individuals (Ladefoged 2006).
37
From the comparable features found in Luksaneeyanawin's Thai data, our own Thai data, and Cahn's English data for three types of emotional speech (anger, surprise, and sadness), it may be inferred that emotional speech may have a universal tendency of expression. This uniformity of expression may lie not in the intonation contour per se but in other acoustic cues, e.g., speaking rate, F0 range, average F0, and the amplitude contour.
38
This is readily explainable, since emotions, which affect humans mentally and psychologically, are impulses or natural responses to stimuli.
39
Implications
Ververidis, Kotropoulos and Pitas (2004), in their study of automatic classification of Danish emotional speech into five emotional states (anger, happiness, neutral, sadness, and surprise), reported classification accuracy rates between 20% and 52%.
40
The features used by Ververidis, Kotropoulos and Pitas (2004) were mainly extracted from the pitch and energy contours of natural speech.
41
Vroomen, Collier and Mozziconacci (1993), in their acoustic study of intonation in Dutch emotional speech, found that intonation and duration are sufficient to express emotions. They resynthesized intonation contours copied from natural speech, with proper time alignment, onto monotonous utterances, and found that rule-based manipulations of pitch and duration sufficed to represent emotions.
42
The parameters for the synthesis in Vroomen, Collier and Mozziconacci (1993) were the type of emotion targeted, the excursion size, the key in the register, and the optimal duration of the utterances.
43
Our findings may have implications for Thai speech technology. Utterances can be manipulated in time, average F0, F0 range, and amplitude to represent emotional speech according to the type of emotion targeted.
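A minimal sketch of such a manipulation, in the spirit of the parameter-based approach of Vroomen, Collier and Mozziconacci described above, but not the method of that study or of this one. It only computes target prosody for a stylized contour and does no signal resynthesis. It assumes a hypothetical neutral contour, a 100 Hz semitone reference, and an illustrative "anger" target that mixes speaker-combined values (Tables 1 and 2) with male-only values (Tables 3 and 5). Excursions are stretched around the log-domain mean to the target range, the mean is shifted, time is scaled, and a dB gain is converted to a linear amplitude factor.

```python
import math
from dataclasses import dataclass

@dataclass
class ProsodyTarget:
    duration_ratio: float  # emotional duration / neutral duration
    mean_shift_st: float   # shift of the log-domain mean F0, in semitones
    range_st: float        # target F0 range, in semitones
    gain_db: float         # overall amplitude change, in dB

def hz_to_st(f0, ref=100.0):
    return 12 * math.log2(f0 / ref)

def st_to_hz(st, ref=100.0):
    return ref * 2 ** (st / 12)

def apply_target(contour, target):
    """Map a stylized neutral contour [(time_s, f0_hz), ...] toward a target.

    Returns (new_contour, amplitude_scale); a flat contour keeps zero range.
    """
    st = [hz_to_st(f0) for _, f0 in contour]
    mean_st = sum(st) / len(st)
    current_range = max(st) - min(st)
    scale = target.range_st / current_range if current_range > 0 else 1.0
    new_contour = []
    for (t, _), s in zip(contour, st):
        # Stretch excursions around the mean to the target range, then shift the mean.
        s_new = mean_st + (s - mean_st) * scale + target.mean_shift_st
        new_contour.append((t * target.duration_ratio, st_to_hz(s_new)))
    amplitude_scale = 10 ** (target.gain_db / 20)  # dB -> linear amplitude factor
    return new_contour, amplitude_scale

# Illustrative "anger" target assembled from the reported means (an assumption).
anger = ProsodyTarget(
    duration_ratio=0.52 / 0.59,                     # Table 1
    mean_shift_st=12 * math.log2(228.27 / 133.62),  # Table 3, male multiple-word
    range_st=7.11,                                  # Table 2
    gain_db=72.40 - 62.32,                          # Table 5, male
)

neutral_contour = [(0.0, 130.0), (0.2, 150.0), (0.4, 125.0), (0.6, 110.0)]  # hypothetical
angry_contour, gain = apply_target(neutral_contour, anger)
for t, f0 in angry_contour:
    print(f"{t:.3f} s  {f0:.1f} Hz")
print(f"amplitude x{gain:.2f}")
```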
44
Conversely, these features can be used as parameters for automatic recognition or classification of emotional speech. Further studies are needed to derive the rules governing the alignment of these parameters with the utterances.
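As an illustration only, the sketch below uses a simple nearest-centroid classifier. This technique is swapped in for the example and is not the method of Ververidis, Kotropoulos and Pitas (2004) or of this study. The class centroids are the reported group means: duration from Table 1, pitch range from Table 2, and male average amplitude from Table 5. Each feature is scaled by the average of its reported S.D.s, so distances are roughly in S.D. units. A real classifier would need per-utterance features and held-out evaluation. The test utterance is hypothetical.

```python
import math

# Class centroids taken from the reported means (an illustrative assumption).
#               duration_s, range_st, amplitude_db (male)
CENTROIDS = {
    "neutral":   (0.59, 5.40, 62.32),
    "anger":     (0.52, 7.11, 72.40),
    "surprise":  (0.49, 5.48, 65.87),
    "happiness": (0.73, 7.49, 67.24),
    "sadness":   (0.61, 5.49, 58.71),
}

# Per-feature scale: mean of the reported S.D.s (Tables 1, 2 and male rows of Table 5).
SCALES = (
    sum([0.36, 0.35, 0.32, 0.35, 0.42]) / 5,
    sum([2.51, 2.20, 2.12, 3.40, 1.75]) / 5,
    sum([2.04, 3.10, 2.70, 3.87, 3.16]) / 5,
)

def distance(features, centroid):
    """Euclidean distance after dividing each feature difference by its scale."""
    return math.sqrt(sum(((x - c) / s) ** 2 for x, c, s in zip(features, centroid, SCALES)))

def classify(features):
    """Return the emotion whose centroid is nearest to the feature vector."""
    return min(CENTROIDS, key=lambda e: distance(features, CENTROIDS[e]))

# Hypothetical utterance: short, wide pitch range, loud.
print(classify((0.50, 7.0, 73.0)))  # anger
```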
45
Anger
Emoticon: >;-( (source: Emoticon for Angry, www.danwade.com)
46
Surprise
Emoticon: =-0 (source: Emoticon for Gasp, www.danwade.com)
47
Happiness
Emoticon: :-) (source: Emoticon for Smile, www.danwade.com)
48
Sadness
Our suggested emoticon for sadness: <:-( or :-(
49
Acknowledgement
This paper is part of the research project Thai Intonation in Different Emotions, funded by Kasetsart University Research and Development Institute grant no. ส-ข (มน.) 1.47 to Apiluck Tumtavitikul.
50
References
Cahn, Janet. 1988. From Sad to Glad: Emotional Computer Voices. Proceedings of Speech Tech '88, Voice Input/Output Applications Conference and Exhibition, New York City, April 1988, pp. 35-37.
Hirst, D. and Di Cristo, A. 1998. A Survey of Intonation Systems. In Intonation Systems, ed. by D. Hirst and A. Di Cristo. Cambridge: Cambridge University Press.
Ladefoged, Peter. 2006. A Course in Phonetics, 5th ed. Boston: Thomson Wadsworth.
Luksaneeyanawin, Sudaporn. 1983. Intonation in Thai. Unpublished doctoral dissertation, University of Edinburgh.
Thitikannara, Kanlayarat. 2006. Thai Intonation in Different Emotions. Unpublished MA thesis, Kasetsart University.
Tumtavitikul, Apiluck and Thitikannara, Kanlayarat. 2006. Stress and Intonation in Thai. Journal of Language and Linguistics 24.2: 59-76.
Ververidis, Dimitrios and Kotropoulos, Constantine. 2003. A State of the Art Review on Emotional Speech Databases. (http://poseidon.dsd.auth.gr/EN/index.html)
Ververidis, Dimitrios, Kotropoulos, Constantine, and Pitas, Ioannis. 2004. Automatic Emotional Speech Classification. Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 2004), Montreal, Canada.
Vroomen, Jean, Collier, René, and Mozziconacci, Sylvie. 1993. Duration and Intonation in Emotional Speech. Proceedings of Eurospeech 1993, Berlin, Germany, pp. 577-580.
52
Thai Intonation in Four Emotions
Apiluck Tumtavitikul and Kanlayarat Thitikannara
Linguistics Dept., Kasetsart University, Bangkok, Thailand 10900
email: fhumalt@ku.ac.th
http://pirun.ku.ac.th/~fhumalt
53
Figure 9: Stylized fundamental frequency (Hz) and normalized duration (%) for the five-word utterance “คุณแนนมาหาคุณ” ('Khun Naen comes to see you').
54
Figure 10: Stylized fundamental frequency (Hz) and normalized duration (%) for the five-word utterance “คุณแนนมาหาคุณ” ('Khun Naen comes to see you').
55
Figure 11: Stylized fundamental frequency (Hz) and normalized duration (%) for the six-word utterance “คุณแนนโทรมาหาคุณ” ('Khun Naen calls you').
56
Figure 12: Stylized fundamental frequency (Hz) and normalized duration (%) for the six-word utterance “คุณแนนโทรมาหาคุณ” ('Khun Naen calls you').