Published by Iris Melton. Modified over 9 years ago.
1
Kannada Text to Speech Synthesis Systems: Emotion Analysis By D.J. RAVI Research Scholar, JSS Research Foundation, S.J College of Engg, Mysore-06
2
Outline
Introduction
Phonetic Nature of Kannada Language
Prosodic Feature Values: Time Duration, Intensity, Pitch
Result Analysis
Conclusions
References
3
Introduction
Including emotional aspects in speech improves the naturalness of a speech synthesis system.
Emotions such as sadness, anger and happiness are manifested in speech through prosodic elements: time duration, pitch and intensity.
The prosodic values corresponding to different emotions are analyzed at the word level as well as the phonemic level, using the speech analysis and manipulation tool PRAAT.
This paper presents an emotional analysis of the prosodic features (time duration, pitch and intensity) of Kannada speech.
4
Our analysis shows that word-level time duration varies across emotions as Anger < Neutral < Happiness < Sadness: duration is least for anger and highest for sadness.
Intensity follows the order Anger > Happiness > Neutral > Sadness: it is highest for anger and least for sadness.
At the phonemic level, the time duration variation is larger for vowels than for consonants.
The pitch contour is almost flat for neutral speech, so emotional speech shows larger pitch variation by comparison.
5
Phonetic Nature of Kannada Language
Kannada is a Dravidian language and is phonetic in nature: its written form corresponds directly to its spoken form.
The phonemes are divided into two types: vowels (swaras) and consonants (vyanjanas).
Kannada has 13 vowels and 34 basic consonants.
6
Vowels (Swaras): independently existing letters.
Consonants (Vyanjanas): dependent on vowels to take an independent form.
7
Consonant (Vyanjana) + Vowel (Matra) => Letter (Akshara)
Kagunitha: the combination of a consonant phoneme and a vowel phoneme produces a syllable.
Consonant phoneme + Vowel phoneme => Syllable
8
Unicode
A universal character set.
Provides a unique number for each character in a language.
Supports all platforms and all languages.
9
Kannada Unicode
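The consonant + vowel composition and the "unique number for each character" idea can be illustrated together with a short Python sketch. The code points used are from the Kannada Unicode block; the helper names `make_akshara` and `describe` are hypothetical and exist only for this illustration.

```python
import unicodedata

# Code points from the Kannada Unicode block (U+0C80 to U+0CFF).
KA = "\u0C95"             # KANNADA LETTER KA (consonant, vyanjana)
VOWEL_SIGN_AA = "\u0CBE"  # KANNADA VOWEL SIGN AA (dependent vowel, matra)


def make_akshara(consonant: str, vowel_sign: str = "") -> str:
    """Join a consonant letter and an optional dependent vowel sign.

    Hypothetical helper: in Unicode text the akshara is simply the
    consonant code point followed by the vowel-sign code point.
    """
    return consonant + vowel_sign


def describe(text: str) -> list:
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


kaa = make_akshara(KA, VOWEL_SIGN_AA)
for code_point, name in describe(kaa):
    print(code_point, name)
```

The rendered syllable is a single visual unit, but it is stored as two code points, which is convenient when a text-to-speech front end has to split a word back into consonant and vowel phonemes.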
10
Basic units to Word (Pada)
11
Table 1: Phonemes categorized according to the method of speech production and articulation
Column groups: Bilabial, Labio-dental, Dental, Retroflex, Palatal, Velar, Glottal; columns are split into voiceless (vl) and voiced (vd).
Plosives, unaspirated: p b, t d, ṭ ḍ, k g
Plosives, aspirated: ph bh, th dh, ṭh ḍh, kh gh
Affricates, unaspirated: č j
Affricates, aspirated: čh jh
Nasals: m, n, ṇ, ṅ
Fricatives: s, ṣ, š, h
Liquids, laterals: l, ḷ
Liquids, trill: r
Semivowels: v, y
The columns are arranged by place of articulation, whereas the rows are arranged by the method of speech production (manner of articulation).
The phonetic nature of the language and the systematic categorization of the alphabet set can be used effectively for analysis and modeling.
12
Prosody
Prosody, as related to language, refers to aspects like rhythm, melody and stress.
The corresponding features are quantity (duration), stress (intensity) and intonation (pitch).
Phonemes need to be categorized into groups based on position and context.
Each syllable is broken down into combinations of vowels and consonants.
The durational patterns of the resulting phonemes at word-initial, medial and final positions are analyzed.
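The positional grouping can be sketched in a few lines of Python. This is only an illustration: the function name `label_positions` and the segmentation of /appa/ into three units are assumptions, not the procedure used in the study.

```python
def label_positions(phonemes):
    """Tag each phoneme of a word as initial, medial or final.

    A one-phoneme word is tagged "initial" (an arbitrary choice
    made for this sketch).
    """
    last = len(phonemes) - 1
    labels = []
    for index, phoneme in enumerate(phonemes):
        if index == 0:
            position = "initial"
        elif index == last:
            position = "final"
        else:
            position = "medial"
        labels.append((phoneme, position))
    return labels


# Hypothetical segmentation of the word /appa/ (father).
print(label_positions(["a", "pp", "a"]))
```

Grouping measurements by these labels makes it possible to compare, for example, the duration of the same vowel at the start and at the end of a word.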
13
Initial: 11 ms
Medial: 9 ms
Final: 8 ms
14
Prosodic Feature Values
Figure 1 shows the waveform, pitch contour, time duration and average intensity of the utterance /ba illi/ (come here), spoken by the same person in different emotions.
The plot shows that the prosodic features vary distinctly for the different emotions in comparison with neutral speech.
15
Time Duration
Figure 1 shows that, for the utterance /ba illi/ (come here), the time duration is least for anger and highest for sadness.
In comparison with neutral speech (606 ms), the duration increases for happiness (750 ms) and sadness (1106 ms), but reduces considerably for anger (447 ms): Anger < Neutral < Happiness < Sadness.
The duration pattern varies from person to person, but the emotions show the same general trends.
16
Table 2: Duration of words uttered by different speakers in different emotions, as a percentage of the neutral-speech duration
Word              Emotion     Speaker 1   Speaker 2   Speaker 3
/yelli/ (Where)   Anger          92          78          95
/yelli/ (Where)   Happiness     122         126         110
/yelli/ (Where)   Sadness       138         141         121
/appa/ (Father)   Anger          83          72          79
/appa/ (Father)   Happiness     112         101         102
/appa/ (Father)   Sadness       132         144         129
Table 2 gives the duration of two words, uttered by three speakers in different emotions, as a percentage of the neutral-speech duration.
Neutral speech is taken as 100%, and the duration for each emotion is expressed relative to it (% duration = duration with emotion x 100 / neutral duration).
Although the percentages differ for the three speakers, the general trend is the same for each emotion.
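The percentage formula can be checked with a small Python sketch. The input values are the /ba illi/ durations reported for Figure 1 (447, 606, 750 and 1106 ms); the function and variable names are hypothetical.

```python
def percent_of_neutral(emotion_ms: float, neutral_ms: float) -> float:
    """% duration = duration with emotion x 100 / neutral duration."""
    if neutral_ms <= 0:
        raise ValueError("neutral duration must be positive")
    return emotion_ms * 100.0 / neutral_ms


# Durations (ms) of /ba illi/ as reported for Figure 1.
durations_ms = {"anger": 447, "neutral": 606, "happiness": 750, "sadness": 1106}

percent = {
    emotion: round(percent_of_neutral(value, durations_ms["neutral"]), 1)
    for emotion, value in durations_ms.items()
}

# Emotions sorted from shortest to longest duration.
ranking = sorted(durations_ms, key=durations_ms.get)

print(percent)
print(" < ".join(ranking))
```

Expressing every measurement relative to the same speaker's neutral utterance is what allows speakers with different speaking rates to be compared in one table.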
17
Table 3: Duration of the words in a sentence for different emotions, as a percentage of the neutral-speech duration
Sentence: /ninna hesaru enu/ (What is your name)
Emotion      ninna     hesaru    enu
Anger         96.25     98        78
Happiness    121.56    112.56    105.8
Sadness      185.62    129.26    121.65
Table 3 gives the duration of each word in the sentence /ninna hesaru enu/ (What is your name) for different emotions, as a percentage of the neutral-speech duration.
Here also the emotions show the same general trends.
18
Table 4: Duration of phonemes (ms) in the word /appa/ (father) for different emotions
Emotion      /a/    /p/    /a/    Total duration (ms)
Anger         85    140    205    430
Neutral      132    163    221    516
Happiness    173    170    236    579
Sadness      233    196    256    685
Table 4 gives the duration values of the phonemes in the word /appa/ (vowels /a/ and consonant /p/).
The phonemes also follow the general trend of duration variation for different emotions.
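The phoneme-level figures can be put on the same percentage scale as the word-level tables. The sketch below uses the Table 4 values; the segment labels (initial /a/, /p/, final /a/) are an assumption based on the spelling of the word, and the helper names are hypothetical.

```python
# Segment order is assumed from the spelling of /appa/.
SEGMENTS = ("/a/ (initial)", "/p/", "/a/ (final)")

# Phoneme durations in ms, copied from Table 4.
DURATIONS_MS = {
    "anger": (85, 140, 205),
    "neutral": (132, 163, 221),
    "happiness": (173, 170, 236),
    "sadness": (233, 196, 256),
}


def totals(table):
    """Sum the phoneme durations of each emotion."""
    return {emotion: sum(values) for emotion, values in table.items()}


def percent_of_neutral(table, baseline="neutral"):
    """Express each phoneme duration as a percentage of the baseline row."""
    base = table[baseline]
    return {
        emotion: tuple(round(v * 100.0 / b, 1) for v, b in zip(values, base))
        for emotion, values in table.items()
    }


word_totals = totals(DURATIONS_MS)
relative = percent_of_neutral(DURATIONS_MS)

for emotion, row in relative.items():
    print(emotion, word_totals[emotion], dict(zip(SEGMENTS, row)))
```

The computed totals can be compared against the last column of Table 4 as a consistency check, and the per-segment percentages show how much each phoneme stretches or shrinks relative to neutral speech.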
19
Figure 2: Duration (ms) change of the word /appa/ (father) for different emotions
Figure 3: Duration (ms) change of the vowels /a/ and the consonant /p/ in the word /appa/ (father) for four different emotions
20
Intensity
Figure 1 shows that anger is articulated with maximum intensity, whereas sadness has minimum intensity, i.e. Anger > Happiness > Neutral > Sadness.
Table 5 confirms that the average intensity is least for sadness and maximum for anger.
Table 5: Average intensity (dB) for different emotions, as a percentage of neutral speech
Sample                                  Emotion      Intensity
/ba illi/ (come here)                   Anger        113.50
/ba illi/ (come here)                   Happiness    110.90
/ba illi/ (come here)                   Sadness       98.90
/basava bandidana/ (has basava come)    Anger        115.26
/basava bandidana/ (has basava come)    Happiness    100.32
/basava bandidana/ (has basava come)    Sadness       94.98
21
Pitch
Figures 4, 5 and 6 show that the pitch contour of neutral speech is almost flat and has the lowest values.
Each figure pairs the pitch contour of an emotional sentence with that of the corresponding emotionless sentence.
Table 6: Average pitch (Hz) for different emotions, as a percentage of neutral speech
Sample                                  Emotion      Pitch
/ba illi/ (come here)                   Anger        101.970
/ba illi/ (come here)                   Happiness    100.384
/ba illi/ (come here)                   Sadness      120.519
/basava bandidana/ (has basava come)    Anger        131.240
/basava bandidana/ (has basava come)    Happiness    140.320
/basava bandidana/ (has basava come)    Sadness      142.590
22
Figure 4: Pitch contours of the sentence "Why did you do this", emotionless and with anger
23
Figure 5: Pitch contours of the sentence "What a beautiful flower", emotionless and with happiness
24
Figure 6: Pitch contours of the sentence "I am extremely unhappy", emotionless and with sadness
25
Result Analysis
For instance, to simulate anger, duration has to be reduced while pitch and intensity are increased.
Similarly, to simulate sadness, duration and pitch have to be increased while intensity is reduced.
Because the alphabet set is phonetically categorized, rules need to be framed only for each category of phonemes.
The phonemes in each category share similar phonetic features.
This reduces the complexity of prosodic modeling as well as of framing rules for synthesis.
Rules for the prosodic modification of different phonemes can be framed from the phonemic-level analysis.
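A rule of this kind can be sketched as a table of scaling factors applied to neutral prosody. This is a minimal illustration, not the synthesis rules of the study: the factors are taken from the single /ba illi/ sample (durations from Figure 1, intensity and pitch percentages from Tables 5 and 6), the neutral pitch and intensity values are hypothetical, and scaling the dB value directly simply mirrors how Table 5 reports intensity as a percentage of neutral.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    duration_ms: float
    pitch_hz: float
    intensity_db: float


# (duration, pitch, intensity) multipliers relative to neutral speech,
# derived from the /ba illi/ sample only; other samples give other values.
RULES = {
    "neutral": (1.0, 1.0, 1.0),
    "anger": (447 / 606, 1.01970, 1.1350),
    "happiness": (750 / 606, 1.00384, 1.1090),
    "sadness": (1106 / 606, 1.20519, 0.9890),
}


def apply_emotion(neutral: Prosody, emotion: str) -> Prosody:
    """Scale neutral prosody by the multipliers of the given emotion."""
    duration_f, pitch_f, intensity_f = RULES[emotion]
    return Prosody(
        duration_ms=neutral.duration_ms * duration_f,
        pitch_hz=neutral.pitch_hz * pitch_f,
        intensity_db=neutral.intensity_db * intensity_f,
    )


# 606 ms is the reported neutral duration; 120 Hz and 70 dB are
# hypothetical placeholders for the neutral pitch and intensity.
neutral = Prosody(duration_ms=606.0, pitch_hz=120.0, intensity_db=70.0)
angry = apply_emotion(neutral, "anger")
sad = apply_emotion(neutral, "sadness")
print(angry)
print(sad)
```

A practical system would hold one such rule set per phoneme category rather than per word, which is the simplification the phonetic categorization makes possible.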
26
From the manner in which different emotions are articulated, it can be seen that rise time and fall time capture more emotion information than any other prosodic parameter.
For angry speech, duration is lowest and intensity is highest, whereas for sad speech, duration is highest and intensity is lowest.
27
Conclusions
The duration of different emotions as a percentage of neutral speech, calculated for different words spoken by different speakers, shows that word duration is highest for sadness, followed by happiness and neutral, and smallest for anger.
The pitch contour is almost flat for neutral speech.
The average pitch value for emotional speech is higher than for neutral speech.
The intensity level of a word is lowest for sadness and highest for anger.
The phoneme-level analysis of duration shows that vowels capture the emotional variation more than consonants do.
28
These findings can be used effectively for framing rules for emotional speech synthesis.
Incorporating these durational effects into a speech synthesis system will produce better speech than a system that does not use this knowledge.
29
References
I.R. Murray, M.D. Edgington, D. Campion, et al., "Rule-Based Emotion Synthesis Using Concatenated Speech," Proc. of ISCA Workshop on Speech and Emotion, Belfast, Northern Ireland, pp. 173-177, 2000.
X.J. Ma, W. Zhang, W.B. Zhu, et al., "Probability based Prosody Model for Unit Selection," Proc. of ICASSP'04, Montreal, Canada, pp. 649-652, May 2004.
Pascal van Lieshout, "PRAAT," Oral Dynamics Lab, V. 4.2.1, October 7, 2003.
D.J. Ravi and Sudarshan Patilkulkarni, "Kannada Text-To-Speech Systems: Duration Analysis," Proc. of ISCO 2009, Coimbatore, pp. 53.
D.J. Ravi and Sudarshan Patilkulkarni, "Speaker Dependent Duration Analysis of Vowels and Consonants for Kannada Text-To-Speech Systems," Proc. of NICE 2009, Bangalore, pp. 95-99.
D.J. Ravi and Sudarshan Patilkulkarni, "Time Duration Variation Analysis of Vowels and Consonants for Kannada Text to Speech Systems," Journal of Advance Research in Computer Engineering: An International Journal, July to December 2009.
Deepa P. Gopinath, Sheeba P.S. and Achuthsankar S. Nair, "Emotional Analysis for Malayalam Text to Speech Synthesis Systems," SETIT 2007.