Occasion: HUMAINE / WP4 / Workshop "From Signals to Signs of Emotion and Vice Versa", Santorini / Fira, 18th–22nd September 2004
Talk: Ronald Müller
Speech Emotion Recognition Combining Acoustic and Semantic Analyses
Institute for Human-Machine Communication, Technische Universität München
Slide -2- Outline
- System Overview
- Emotional Speech Corpus
- Acoustic Analysis
- Semantic Analysis
- Stream Fusion
- Results
Slide -3- System Overview
Speech signal → Prosodic features → Classifier (SVM)
Speech signal → ASR unit → Semantic interpretation (Bayesian Networks)
Both confidence streams → Stream fusion (MLP) → Emotion
Slide -4- Emotional Speech Corpus
Emotion set: anger, disgust, fear, joy, neutrality, sadness, surprise
Corpus 1: Practical course
- 404 acted samples per emotion
- 13 speakers (1 female)
- Recorded within one year
Corpus 2: Driving simulator
- 500 spontaneous emotion samples
- 200 acted samples (disgust, sadness)
Slide -5- System Overview
Speech signal → Prosodic features → Classifier (SVM)
Speech signal → ASR unit → Semantic interpretation (Bayesian Networks)
Both confidence streams → Stream fusion (MLP) → Emotion
Slide -6- Acoustic Analysis
Low-level features (from the signal):
- Pitch contour (AMDF, low-pass filtering)
- Energy contour
- Spectrum
High-level features:
- Statistical analysis of the contours
- Elimination of mean, normalization to standard deviation
- Duration of one utterance: 1–5 seconds
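The pitch contour is the backbone of these features. Below is a minimal Python sketch of AMDF-based pitch extraction followed by the mean elimination and standard-deviation normalization named on the slide; the low-pass pre-filtering and voiced/unvoiced handling are omitted, and the function names and parameters (fmin, fmax) are illustrative assumptions, not the original implementation.

```python
import numpy as np

def amdf_pitch(frame, sr, fmin=60.0, fmax=400.0):
    # Average Magnitude Difference Function: D(k) dips at lags matching
    # the pitch period; the deepest dip in the plausible range wins.
    # Frames are assumed longer than the longest lag considered.
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    n = len(frame)
    d = np.array([np.abs(frame[:n - k] - frame[k:]).mean()
                  for k in range(lag_min, lag_max)])
    return sr / (lag_min + int(np.argmin(d)))    # best lag -> Hz

def normalized_pitch_contour(frames, sr):
    # Frame-wise pitch contour, then elimination of the mean and
    # normalization to standard deviation, as named on the slide.
    contour = np.array([amdf_pitch(f, sr) for f in frames])
    return (contour - contour.mean()) / contour.std()
```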
Slide -7- Acoustic Analysis: Feature Selection (1/2)
Initial set of 200 statistical features
Ranking 1: Single performance of each feature (nearest-mean classifier)
Ranking 2: Sequential Forward Floating Search, wrapped by a nearest-mean classifier
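As a concrete illustration of Ranking 2, here is a compact sketch of Sequential Forward Floating Search wrapped around a nearest-mean classifier. The held-out split used as criterion, the synthetic demo data, and all names are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def nearest_mean_accuracy(X_tr, y_tr, X_te, y_te):
    # Wrapper criterion: accuracy of a nearest-class-mean classifier.
    classes = np.unique(y_tr)
    means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((X_te[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] == y_te))

def sffs(X_tr, y_tr, X_te, y_te, target_size):
    def score(feats):
        return nearest_mean_accuracy(X_tr[:, feats], y_tr, X_te[:, feats], y_te)

    selected, best = [], {}          # best score seen per subset size
    while len(selected) < target_size:
        # Forward step: include the single most helpful remaining feature.
        pool = [f for f in range(X_tr.shape[1]) if f not in selected]
        gains = [score(selected + [f]) for f in pool]
        selected.append(pool[int(np.argmax(gains))])
        k = len(selected)
        best[k] = max(best.get(k, -1.0), max(gains))
        # Floating step: drop a feature again while that strictly
        # improves the best score known for the smaller subset size.
        while len(selected) > 2:
            trials = [score([f for f in selected if f != g]) for g in selected]
            i, s = int(np.argmax(trials)), max(trials)
            if s > best.get(len(selected) - 1, -1.0):
                selected.pop(i)
                best[len(selected)] = s
            else:
                break
    return selected

# Toy demo: feature 3 is made informative, so SFFS should pick it up.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 20)), rng.integers(0, 7, size=300)
X[:, 3] += y
print(sffs(X[:200], y[:200], X[200:], y[200:], target_size=5))
```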
Slide -8- Acoustic Analysis: Feature Selection (2/2)
Top 10 features:

Acoustic Feature                                                SFFS Rank   Single Perf.
Pitch, maximum gradient                                             1          31.5
Pitch, standard deviation of distance between reversal points      2          23.0
Pitch, mean value                                                   3          25.6
Signal, number of zero-crossings                                    4          16.9
Pitch, standard deviation                                           5          27.6
Duration of silences, mean value                                    6          17.5
Duration of voiced sounds, mean value                               7          18.5
Energy, median of fall-time                                         8          17.8
Energy, mean distance between reversal points                       9          19.0
Energy, mean of rise-time                                          10          17.6
Slide -9- Acoustic Analysis: Classification
Evaluation of various classification methods, 33 features

Classifier   Error % (speaker indep.)   Error % (speaker dep.)
kMeans               57.05                     27.38
kNN                  30.41                     17.39
GMM                  25.17                     10.88
MLP                  26.86                      9.36
SVM                  23.88                      7.05
ML-SVM               18.71                      9.05

Output: vector of (pseudo-)recognition confidences
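A modern equivalent of the single-layer SVM setup might look as follows. This is scikit-learn, not the original toolchain: the synthetic data stands in for the 33 selected prosodic features, and Platt-scaled probabilities are one possible realization of the pseudo-confidences the slide mentions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EMOTIONS = ["ang", "dis", "fea", "joy", "ntl", "sad", "sur"]

# Stand-in data: 33 selected prosodic features per utterance.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(280, 33)), rng.integers(0, 7, size=280)
X_test = rng.normal(size=(20, 33))

# probability=True makes the SVM emit Platt-scaled class probabilities,
# one way to obtain a (pseudo-)recognition-confidence vector.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X_train, y_train)
conf = clf.predict_proba(X_test)      # shape (20, 7), forwarded to fusion
print(dict(zip(EMOTIONS, conf[0].round(3))))
```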
Slide -10- Acoustic Analysis: Classification
Multi-Layer Support Vector Machines: a cascade of binary SVMs on the acoustic feature vector
- Root split: ang, ntl, fea, joy / dis, sur, sad
- Second level: ang, ntl / fea, joy and dis, sur / sad
- Third level: ang / ntl, fea / joy, dis / sur
- Leaves: ang, ntl, fea, joy, dis, sur, sad
Drawback: yields a hard decision only, so there is no confidence vector to forward to fusion
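The cascade can be sketched as a recursive tree of binary SVMs. The split hierarchy below is taken from the slide, while the code itself (names, kernel choice, synthetic data) is an illustrative reconstruction, not the original implementation.

```python
import numpy as np
from sklearn.svm import SVC

# The slide's split hierarchy as nested (left, right) pairs; a string is a leaf.
TREE = ((("ang", "ntl"), ("fea", "joy")),
        (("dis", "sur"), "sad"))

def leaves(node):
    return (node,) if isinstance(node, str) else leaves(node[0]) + leaves(node[1])

def train_tree(node, X, y):
    # One binary SVM per inner node, trained on the samples whose
    # labels fall into either of the node's two groups.
    if isinstance(node, str):
        return node
    left, right = node
    la = leaves(left)
    mask = np.isin(y, la + leaves(right))
    svm = SVC(kernel="rbf").fit(X[mask], np.isin(y[mask], la))
    return (svm, train_tree(left, X, y), train_tree(right, X, y))

def classify(trained, x):
    # Each SVM halves the candidate set until a single emotion remains,
    # so the cascade emits a hard label only: the reason the slide notes
    # that no confidence vector can be forwarded to fusion.
    if isinstance(trained, str):
        return trained
    svm, left, right = trained
    return classify(left if svm.predict(x[None, :])[0] else right, x)

# Toy demo on synthetic 33-dimensional feature vectors.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(140, 33)), rng.choice(np.array(leaves(TREE)), size=140)
print(classify(train_tree(TREE, X, y), X[0]))
```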
Slide -11- System Overview
Speech signal → Prosodic features → Classifier (SVM)
Speech signal → ASR unit → Semantic interpretation (Bayesian Networks)
Both confidence streams → Stream fusion (MLP) → Emotion
Slide -12- Semantic Analysis: ASR Unit
- HMM-based
- 1,300-word German vocabulary
- No language model
- 5-best phrase hypotheses
- Recognition confidences per word
Example output (first hypothesis):
  I      can't   stand   this    every   tray    traffic-jam
  69.3   34.6    72.1    20.0    36.1    15.9    55.8
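A simple container for this per-word output could look like the following sketch; the class name is invented for illustration, the confidences are the ones shown on the slide, and "every tray" is the recognizer's error for "nasty" (slide 15), kept deliberately as an example of erroneous ASR.

```python
from dataclasses import dataclass

@dataclass
class WordHyp:
    word: str
    confidence: float    # per-word recognition confidence, in percent

# The slide's first-best hypothesis as a list of word hypotheses.
hyp1 = [WordHyp(w, c) for w, c in zip(
    ["I", "can't", "stand", "this", "every", "tray", "traffic-jam"],
    [69.3, 34.6, 72.1, 20.0, 36.1, 15.9, 55.8])]
print(hyp1)
```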
Slide -13- Semantic Analysis: Conditions
- Natural language
- Erroneous speech recognition
- Uncertain knowledge
- Incomplete knowledge
- Superfluous knowledge
→ Probabilistic spotting approach using Bayesian Belief Networks
Slide -14- Semantic Analysis: Bayesian Belief Networks
- Acyclic graph of nodes and directed edges
- One state variable per node
- Node dependencies set via conditional probability matrices
- Initial probabilities set in the root nodes
- An observation causes evidence in a child node (i.e. its state is known)
- Inference propagates to the direct parent nodes and finally to the root nodes via Bayes' rule:
  P(A | B) = P(B | A) · P(A) / P(B)
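In the simplest two-node case the propagation step is just Bayes' rule applied once. The sketch below uses invented numbers (prior and conditional probabilities) purely to make the mechanics concrete; it is not the network from the talk.

```python
# Two-node network: root R = "utterance expresses disgust", child W =
# "a disgust keyword was spotted". Evidence on W is propagated back to
# R with Bayes' rule, as on the slide. All numbers are invented.
p_r = 1.0 / 7.0                 # initial root probability (7 emotions)
p_w_given_r = 0.60              # CPT entry: keyword spotted | disgust
p_w_given_not_r = 0.05          # CPT entry: keyword spotted | no disgust

p_w = p_w_given_r * p_r + p_w_given_not_r * (1.0 - p_r)
p_r_given_w = p_w_given_r * p_r / p_w            # Bayes' rule
print(f"P(disgust | keyword spotted) = {p_r_given_w:.2f}")   # -> 0.67
```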
Slide -15- Semantic Analysis: Emotion Modelling
[Diagram: a spotting hierarchy over four input levels (words, superwords, phrases, superphrases) connected by spotting, clustering and sequence-handling steps. Spotted words such as "I", "can't", "stand", "nasty" are clustered into superwords ("first_person", "cannot", "stand", "bad", "disgusting"); sequence handling forms phrases such as "I_hate" and "I_like"; these cluster into superphrases ("Abhorrence", "Bad", "Good"), which map to the interpretation level (Disgust, Anger, Joy, Negative, Positive).]
Example: "I can't stand this nasty traffic-jam", recognized as "I can't stand this every tray traffic-jam"
Output: vector of "real" recognition confidences
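The lowest level, spotting words into superwords, might be sketched as follows. The lexicon and the confidence handling here are invented for illustration and are far simpler than the Bayesian-network clustering the slide describes.

```python
# Hypothetical word -> superword lexicon, mirroring the slide's first
# clustering level; the real system uses Bayesian networks instead.
SUPERWORDS = {
    "I": "first_person",
    "cannot": "negation", "can't": "negation",
    "stand": "stand",
    "bad": "bad", "nasty": "bad", "disgusting": "bad",
}

def spot(hypothesis):
    # Probabilistic spotting: lexicon words survive with their ASR
    # confidences, superfluous words are simply ignored.
    return [(SUPERWORDS[w], c) for w, c in hypothesis if w in SUPERWORDS]

hyp = [("I", 69.3), ("can't", 34.6), ("stand", 72.1), ("this", 20.0),
       ("every", 36.1), ("tray", 15.9), ("traffic-jam", 55.8)]
print(spot(hyp))    # -> first_person, negation and stand are kept
```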
Slide -16- System Overview
Speech signal → Prosodic features → Classifier (SVM)
Speech signal → ASR unit → Semantic interpretation (Bayesian Networks)
Both confidence streams → Stream fusion (MLP) → Emotion
Slide -17- Stream Fusion
Variant 1: Pairwise mean of the two confidence vectors
Variant 2: Discriminative fusion applying an MLP
- Input layer: 2 x 7 confidences
- Hidden layer: 100 nodes
- Output layer: 7 recognition confidences
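Both fusion variants are easy to sketch. Below, scikit-learn stands in for the original MLP implementation, random vectors stand in for the two 7-dimensional confidence streams, and only the layer sizes are taken from the slide.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 300
acoustic = rng.random((n, 7))    # stand-in: SVM pseudo-confidences
semantic = rng.random((n, 7))    # stand-in: Bayesian-network confidences
y = rng.integers(0, 7, size=n)   # stand-in emotion labels

fused_in = np.hstack([acoustic, semantic])    # input layer: 2 x 7 confidences
mean_fused = (acoustic + semantic) / 2.0      # variant 1: fusion by means

# Variant 2, discriminative fusion: one hidden layer of 100 nodes,
# 7 recognition confidences at the output.
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0)
mlp.fit(fused_in, y)
fused_conf = mlp.predict_proba(fused_in)
```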
Slide -18- Results
Acoustic recognition rates (SVM):
Emotion   ang    dis    fea    joy    ntl    sad    sur    Mean
%         95.5   61.3   78.7   75.1   78.5   62.1   68.3   74.2

Semantic recognition rates:
Emotion   ang    dis    fea    joy    ntl    sad    sur    Mean
%         78.4   71.2   53.4   57.7   56.0   35.0   65.5   59.6
Slide -19- Results
Recognition rates after discriminative fusion:
Emotion   ang    dis    fea    joy    ntl    sad    sur    Mean
%         98.0   78.7   88.3   95.9   98.2   91.7   95.8   92.0

Overview (mean recognition rates):
          Acoustic info.   Language info.   Fusion by means   Fusion by MLP
%              74.2             59.6             83.1              92.0
Slide -20- Summary
- Acted emotions, 7 discrete emotion categories
- Prosodic feature selection via single-feature performance and Sequential Forward Floating Search
- Evaluative comparison of different classifiers; SVMs outperform the alternatives
- Semantic analysis applying Bayesian Networks
- Significant gain by discriminative stream fusion
Slide -21-