Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition Thurid Vogt, Elisabeth André ICME 2005 Multimedia concepts.

1 Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. Thurid Vogt, Elisabeth André. ICME 2005. Multimedia concepts and applications, Augsburg University, Germany; Applied Computer Science, Bielefeld University, Germany

2 Emotion Recognition System: Input → Feature extraction → Classification → Result (the classifier is trained on training data)
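The pipeline on this slide can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' system: the two features (mean and maximum of a pitch contour) and the nearest-centroid classifier are hypothetical choices made for brevity.

```python
from statistics import mean

def extract_features(contour):
    # Hypothetical toy features: mean and maximum of a pitch contour (Hz per frame).
    return [mean(contour), max(contour)]

def train(training_data):
    """training_data: list of (contour, label) pairs.
    Returns one centroid (average feature vector) per emotion label."""
    sums, counts = {}, {}
    for contour, label in training_data:
        feats = extract_features(contour)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for j, f in enumerate(feats):
            acc[j] += f
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(model, contour):
    # Assign the label whose centroid is closest (squared Euclidean distance).
    feats = extract_features(contour)
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(feats, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Usage with made-up training data.
training_data = [([100, 110, 105], "neutral"), ([200, 260, 240], "anger")]
model = train(training_data)
print(classify(model, [210, 250, 230]))  # anger
```

Any real classifier can replace the nearest-centroid step; the point is only the separation into feature extraction, training and classification.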

3 Research questions 1. Does a large number of features provided to the selection algorithm enable the selection of a better feature set? 2. Which analysis units can be calculated automatically in an online system and still give good results? 3. How do feature sets for acted and realistic data differ?

4 Overview Feature extraction: –Segment length –Feature calculation –Feature selection Databases Results Conclusions

5 Feature extraction

6 Segment length Features are computed over signal segments. Difficulty: –Features can be computed more accurately for long segments –Emotions can be short and change quickly Possible segments: –Whole utterances –Larger pauses as segment borders –Words, syllables, words in context (1 or 2 words to the left and right) –Fixed length, e.g. 0.5, 1, 2 seconds
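Two of the segmentations listed above, fixed-length windows and pauses as segment borders, need no manual annotation. A minimal sketch follows; the slides do not say how pauses are detected, so the frame-energy threshold and minimum pause length are hypothetical parameters.

```python
def fixed_length_segments(n_samples, sample_rate, seconds):
    """(start, end) sample indices of consecutive windows of `seconds` length.
    The last window may be shorter."""
    step = int(round(seconds * sample_rate))
    if step <= 0:
        raise ValueError("segment length too short for this sample rate")
    return [(s, min(s + step, n_samples)) for s in range(0, n_samples, step)]

def pause_segments(frame_energy, threshold, min_pause_frames):
    """Split a sequence of frame energies at pauses.

    Hypothetical pause definition: a run of at least `min_pause_frames`
    frames with energy below `threshold`. Shorter dips stay inside a segment.
    Returns (start, end) frame indices (end exclusive) of the stretches between pauses.
    """
    segments = []
    start = None      # first frame of the current segment, None while in a pause
    silent_run = 0    # number of consecutive low-energy frames
    for i, e in enumerate(frame_energy):
        if e < threshold:
            silent_run += 1
            if start is not None and silent_run == min_pause_frames:
                # Pause is long enough: end the segment where the silence began.
                segments.append((start, i - min_pause_frames + 1))
                start = None
        else:
            if start is None:
                start = i
            silent_run = 0
    if start is not None:
        # Close the final segment, dropping a short trailing silence.
        segments.append((start, len(frame_energy) - silent_run))
    return segments

print(fixed_length_segments(16000, 16000, 0.5))
print(pause_segments([5, 5, 0, 5, 5, 0, 0, 0, 5, 5], threshold=1, min_pause_frames=2))
```

Both functions run on the incoming signal alone, which is what makes them usable in an online system.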

7 Feature calculation Features based on pitch and energy (+ 1st & 2nd derivatives) and 12 MFCCs (+ 1st & 2nd derivatives). Looking at basic values, only minima or maxima, as well as distances, differences and slopes between adjacent extrema. Mean, minimum, maximum,... of time segments. Some others, such as normalised pitch and pauses. Following Oudeyer, 2003
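A minimal sketch of statistics of the kind listed above, computed for one contour (e.g. pitch or energy per frame). Assumptions: derivatives are approximated by plain frame-to-frame differences, only mean, minimum and maximum are computed, and the feature names are hypothetical; the set used in the talk is much larger.

```python
from statistics import mean

def derivative(contour):
    # First-order difference between adjacent frames (simple stand-in for delta features).
    return [b - a for a, b in zip(contour, contour[1:])]

def local_extrema(contour):
    # Indices of strict interior local maxima and minima.
    maxima, minima = [], []
    for i in range(1, len(contour) - 1):
        if contour[i - 1] < contour[i] > contour[i + 1]:
            maxima.append(i)
        elif contour[i - 1] > contour[i] < contour[i + 1]:
            minima.append(i)
    return maxima, minima

def stats(values, prefix):
    # Mean, minimum and maximum; no entries if the list is empty.
    if not values:
        return {}
    return {f"{prefix}_mean": mean(values), f"{prefix}_min": min(values), f"{prefix}_max": max(values)}

def contour_features(contour, name):
    feats = {}
    d1 = derivative(contour)
    d2 = derivative(d1)
    feats.update(stats(contour, name))            # basic values
    feats.update(stats(d1, name + "_d1"))         # 1st derivative
    feats.update(stats(d2, name + "_d2"))         # 2nd derivative
    maxima, minima = local_extrema(contour)
    feats.update(stats([contour[i] for i in maxima], name + "_maxima"))
    feats.update(stats([contour[i] for i in minima], name + "_minima"))
    # Distances (in frames) and slopes between adjacent extrema of either type.
    ext = sorted(maxima + minima)
    pairs = list(zip(ext, ext[1:]))
    feats.update(stats([b - a for a, b in pairs], name + "_extrema_dist"))
    feats.update(stats([(contour[b] - contour[a]) / (b - a) for a, b in pairs], name + "_extrema_slope"))
    return feats

pitch = [120, 130, 125, 140, 135, 150]
features = contour_features(pitch, "pitch")
print(features["pitch_maxima_mean"], features["pitch_extrema_slope_max"])
```

Running `contour_features` on every contour of a segment (pitch, energy, each MFCC) and merging the dictionaries yields one feature vector per segment.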

8 Feature selection Correlation-based feature selection (CFS) from the Weka data mining software, University of Waikato, New Zealand (Witten & Frank, 2000). Reduction from 1280 to ~90-160 features
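CFS scores a feature subset S of k features with the merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, so it favours features that predict the class but are not redundant with each other. The sketch below is a simplified variant, not Weka's implementation: it uses absolute Pearson correlation with a numeric (here binary) class and greedy forward search, whereas Weka's CfsSubsetEval uses an information-theoretic correlation measure on discretised attributes, typically with best-first search. `min_gain` is a hypothetical stopping tolerance.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def merit(subset, columns, target):
    # CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(columns, target, min_gain=1e-9):
    """Greedy forward selection: add the feature that raises the merit most,
    stop when no addition improves it by more than min_gain."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        # max() returns the first maximal candidate, so ties go to the lowest index.
        cand = max(remaining, key=lambda i: merit(selected + [i], columns, target))
        score = merit(selected + [cand], columns, target)
        if score <= best + min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected

target = [0, 0, 0, 1, 1, 1]
f0 = [1, 2, 1, 5, 6, 5]      # informative feature
f1 = list(f0)                # exact duplicate: redundant
f2 = [3, 1, 2, 3, 1, 2]      # uncorrelated with the class
print(cfs_forward([f0, f1, f2], target))  # [0]
```

The duplicate is rejected because it adds no class information while raising the inter-feature correlation, which is how CFS shrinks a large candidate set.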

9 Databases

10 Acted speech database: Database from TU Berlin for emotional speech synthesis (Sendlmeier, 2001) –Recorded from actors –High quality –10 speakers; 20 min –7 emotions Spontaneous speech database: SmartKom database from U. of Munich (Steininger et al., 2002) –Wizard-of-Oz scenario –Mid quality –~80 speakers; 3 h 20 min net; few emotions exhibited –11 user states

11 Results

12 Which analysis units can be computed automatically in an online system and still give good results?

13 Does a large number of features provided to the selection algorithm yield a better feature set?

14 Does a large number of features provided to the selection algorithm yield a better feature set? (cont.) Reduced feature set almost always better than full feature set. Features perform comparably to Batliner et al., 2003, on SmartKom data, but our features are computed completely automatically, while some of theirs were determined manually. Selected features are not necessarily those one would expect

15 How do feature sets for acted and realistic data differ? Important features for acted emotions: –Basic pitch –Pauses (for sadness) Important features for WOZ emotions: –MFCCs (mainly low coefficients and 1st derivatives) –Extrema of pitch and energy

16 Conclusions Automatic segment extraction proved not to be a disadvantage. A large feature set provided to the selection algorithm might compensate for the disadvantages of completely automatically computed features. Feature sets for acted and WOZ emotions overlap little → looking at acted data when building an emotion recognizer for spontaneous emotions may not make sense