EMOTIONAL SPACE IMPROVES EMOTION RECOGNITION
Raquel Tato, Rocio Santos, Ralf Kompe
Man Machine Interface Lab, Advanced Technology Center Stuttgart, Sony International (Europe) GmbH
Contents
– State of the Art
– Motivation
– Goal
– Approach
– Results
– Conclusions
– Future Research
State of the Art
Database: professional actors
– Not really spontaneous speech.
– Exaggerated emotion following stereotypes.
Features: prosody features
– Easy to calculate.
– Represent only one dimension of the emotional space: arousal.
– The pleasure dimension is related to voice quality features.
Activation-Evaluation Theory
[Figure: two-dimensional emotional space. Horizontal axis: evaluation (very negative to very positive), linked to voice quality. Vertical axis: activation (very passive to very active), linked to prosody. Emotions such as happy, excited, delighted, exhilarated, blissful, pleased, interested, content, relaxed, serene, neutral, bored, sad, depressed, despairing, disgusted, afraid, terrified, furious, and angry are placed in this plane.]
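To make the geometry concrete, here is a minimal Python sketch of such a two-dimensional space. The coordinates are rough, purely illustrative placements (not values from the presentation), and `nearest_emotion` is a hypothetical helper showing how a point in the space could be mapped back to a discrete label.

```python
# Illustrative activation-evaluation space; coordinates are assumptions,
# not measurements from the slides.
EMOTION_SPACE = {
    #           (evaluation, activation), both in [-1, 1]
    "happy":    ( 0.7,  0.6),
    "excited":  ( 0.6,  0.8),
    "angry":    (-0.6,  0.8),
    "afraid":   (-0.5,  0.7),
    "sad":      (-0.6, -0.5),
    "bored":    (-0.3, -0.7),
    "relaxed":  ( 0.5, -0.5),
    "neutral":  ( 0.0,  0.0),
}

def nearest_emotion(evaluation, activation):
    """Map a point in the space to the closest labeled emotion."""
    return min(
        EMOTION_SPACE,
        key=lambda e: (EMOTION_SPACE[e][0] - evaluation) ** 2
                    + (EMOTION_SPACE[e][1] - activation) ** 2,
    )
```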
Prosody Features
Acoustic: based on the speech signal. E.g. rising or falling intonation, accents, stress.
Linguistic (lexical, syntactic, semantic). E.g. syllable accent, sentence structure.
Example: "Komm wir spielen" ("Come, let's play"), spoken bored vs. happy.
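Acoustic prosody rests largely on the F0 (pitch) contour. The sketch below is a minimal autocorrelation-based F0 estimator for a single frame, written in Python with NumPy; it is not the authors' feature extraction, and the frequency range and the 0.3 voicing threshold are illustrative assumptions.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate F0 of one speech frame (a few pitch periods long, e.g. 40 ms)
    via autocorrelation. Returns 0.0 for frames judged unvoiced."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:                       # silent frame
        return 0.0
    ac = ac / ac[0]                      # normalize so lag 0 == 1
    lo = int(sr / fmax)                  # shortest plausible pitch period
    hi = min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag if ac[lag] > 0.3 else 0.0   # 0.3: crude voicing threshold
```

Utterance-level prosody features (log F0 and its derivative, energy, durations; listed later under Feature Calculation) can then be built from the framewise F0 track and frame energies.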
Voice Quality Features
Phonatory quality: auditory qualities that arise from variation in the source signal. E.g. glottal spectrum.
Articulatory precision: vocal tract properties. E.g. formant structure.
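Formant structure is commonly estimated with linear prediction (LPC). The following is a minimal sketch of LPC-based formant estimation for one voiced frame, using NumPy and SciPy; it is not the authors' method, and the model-order rule of thumb and the 90 Hz cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def formants_lpc(frame, sr, order=None):
    """Rough formant frequencies (Hz) of one voiced frame via LPC root finding."""
    if order is None:
        order = 2 + sr // 1000                       # common rule of thumb
    # Pre-emphasis plus Hamming window before linear prediction.
    x = np.hamming(len(frame)) * lfilter([1.0, -0.97], [1.0], frame)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Autocorrelation method: solve the Toeplitz normal equations for the
    # LPC coefficients a_1..a_p.
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))    # roots of A(z)
    roots = roots[np.imag(roots) > 0]                # one of each conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[freqs > 90.0]                       # drop near-DC roots
```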
Goal
Spontaneous emotion recognizer:
– Language and speaker independent.
– Only acoustic information.
– No stereotyped speech.
New view of automatic emotion recognition:
– Need to take into account at least the second emotional dimension.
– Relation of the emotional dimensions to different types of features.
– Application: emotional space region recognition.
Approach
Database:
– Target scenario: the Sony entertainment robot AIBO, "One day with AIBO".
– How to provoke emotions? Context action; automatic labeling: happy, bored, sad, angry, and neutral.
– Data: 14 speakers, ~40 commands per emotion.
Feature Calculation:
– Prosody features: logarithmic F0 and its derivative, energy, durational aspects, jitter and tremor.
– Quality features: formants, harmonics-to-noise ratio, spectral energy distribution, voiced-to-unvoiced energy ratio, glottal flow.
Classification: sequential classifiers (see the sketch after this list).
– First classifier: arousal dimension, from prosody features → high (happy/angry), medium (neutral), low (sad/bored).
– Second classifier: pleasure dimension, from quality features → final decision.
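A minimal Python sketch of this sequential scheme follows, assuming scikit-learn. The slides do not name the classifier type, so the SVMs here are stand-ins, and the class and method names are hypothetical; only the arousal grouping and the two-stage flow come from the slide.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Arousal grouping from the slide: high = happy/angry, medium = neutral,
# low = sad/bored.
AROUSAL = {"happy": "high", "angry": "high", "neutral": "medium",
           "sad": "low", "bored": "low"}

class SequentialEmotionClassifier:
    """Two-stage scheme: prosody features decide the arousal level,
    quality features then resolve pleasure within the ambiguous levels."""

    def __init__(self):
        self.arousal_clf = make_pipeline(StandardScaler(), SVC())
        self.pleasure_clf = {
            "high": make_pipeline(StandardScaler(), SVC()),  # happy vs. angry
            "low":  make_pipeline(StandardScaler(), SVC()),  # sad vs. bored
        }

    def fit(self, prosody, quality, emotions):
        levels = np.array([AROUSAL[e] for e in emotions])
        self.arousal_clf.fit(prosody, levels)
        for level, clf in self.pleasure_clf.items():
            mask = levels == level
            clf.fit(quality[mask], np.asarray(emotions)[mask])
        return self

    def predict(self, prosody, quality):
        levels = self.arousal_clf.predict(prosody)
        labels = []
        for i, level in enumerate(levels):
            if level == "medium":
                labels.append("neutral")  # medium arousal maps directly
            else:
                labels.append(self.pleasure_clf[level].predict(quality[i:i + 1])[0])
        return labels
```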
Results: Speaker Dependent, Arousal
Speaker-dependent discrimination along the arousal axis, using prosody features. Emotion groups according to their position on the axis:
– High level: happy + angry
– Medium level: neutral
– Low level: bored + sad
Results: Speaker Dependent, Arousal (cont.)
Average recognition rate: 84%.
No confusion along the arousal dimension; confusability only with the neutral emotion, due to:
– Its intermediate position.
– Database properties.
Results: Speaker Dependent, Pleasure
Speaker-dependent happy/angry and sad/bored classification:
– Discrimination between happy and angry: 74%
– Discrimination between bored and sad: 66%
There is more distance between happy and angry than between sad and bored along the pleasure axis.
Results: Speaker Independent, Arousal
Average recognition rate: 59.3%.
Neutral recognition rate close to chance; "real" neutral data are needed.
Results: Speaker Independent, Arousal (training with new neutrals)
– Original test (emotional neutrals): 61%
– New test (new neutrals): 77%
Results: Speaker Independent, Pleasure
Average recognition rate: ~60%.
Quality features are very speaker dependent.
Discrimination between happy and angry is better than between bored and sad.
Conclusions
Prosody features → arousal, but not enough on their own.
Quality features → pleasure; further research needed.
Application: find a place in the emotional space + additional information = emotional state.
"Pure" neutral is very ambiguous; in general, emotional expression is very contingent on the environment.
An appropriate emotional database is crucial.
Future Research
Speaker-independent voice quality features:
– Improvement of estimation reliability.
– Different features in different vowels.
Pleasure dimension:
– Quality features, but also some prosody features.
– Classification design: speaker dependencies ("speaker identification"), specific models (age, gender, …), feature selection.