
Emotion Recognition from Speech: Stress Experiment
Stefan Scherer, Hansjörg Hofmann, Malte Lampmann, Martin Pfeil, Steffen Rhinow, Friedhelm Schwenker, Günther Palm
Stefan Scherer | 24.09.2007 | LREC 2008
Institute of Neural Information Processing, Ulm University
stefan.scherer@uni-ulm.de

Motivation
Why stress recognition from speech?
– Safety and usability
– More efficient and natural interfaces
– Several existing applications, such as call center applications, rely on speech only
Existing problems:
– Existing databases are limited
– Databases with stress induced by increasing workload are missing
– Choosing representative features is difficult

Experimental Setup

Experimental Setup – Summary
– Participants direct planes towards the corresponding exit
– Four types of questions are asked (personal, enumerations, general knowledge, Jeopardy)
– Difficulty levels differ in plane speed, number of planes, and exit size
– Points are earned or lost; the current score is color-coded
– One game lasts 10 minutes
– Participants self-assess their experienced stress three times

Evaluation and Labeling of Recordings
– Everybody reacts differently to stress
– No common labels are available for the recordings
→ A second labeling experiment provides fuzzy labels for each recording

Evaluation and Labeling of Recordings

Speaker  Mean  P25    P75  Self-Assess.  Crashes
1        35.8  24     47   1/2/4         0/4/13
2        41.9  25     59   2/4/?         0/4/30
3        45.2  29.5   61   7/6/8         1/10/37
4        31.0  20     40   1/1/2         0/2/16
5        43.2  25     61   7/8/9         0/3/28
6        43.0  23     60   4/4/6         0/3/26
7        31.2  21     37   1/3/7         0/1/23
8        33.2  21     41   1/1/3-4       0/0/8
9        38.0  23     51   1/1-2/5       0/6/31
10       35.7  22     49   1/2/5         0/3/11
11       49.6  31.75  65   7/9/10        5/9/17
12       49.1  32     65   4/4/?         0/5/27
13       43.4  26     62   1/3/4         6/22/38
14       32.1  22     41   2/5/8         1/1/26
15       41.6  26     56   2/3/7         0/2/19

Evaluation and Labeling of Recordings
Spearman correlation tests:
– Mean vs. self-assessment (M vs. SA)
– Mean vs. crashes (M vs. C)
– Self-assessment vs. crashes (C vs. SA)

Pair      ρ     p-value
M vs. SA  0.61  0.01
M vs. C   0.68  0.005
C vs. SA  0.40  0.13
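Spearman's ρ is the Pearson correlation computed on ranks, with tied values receiving the average of the ranks they span. The sketch below implements that definition in NumPy; it does not compute p-values (SciPy's `scipy.stats.spearmanr` returns both ρ and a p-value). The slides do not say which of the three self-assessment or crash values entered each test, so pairing the per-speaker Mean with the third crash count in the usage lines is an assumption for illustration, and the result need not equal the reported ρ = 0.68.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        # Sorted positions i..j (0-based) hold ranks i+1..j+1; assign their mean.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))


# Illustrative usage: per-speaker Mean vs. the third crash value (assumed pairing).
mean_label = [35.8, 41.9, 45.2, 31.0, 43.2, 43.0, 31.2, 33.2,
              38.0, 35.7, 49.6, 49.1, 43.4, 32.1, 41.6]
final_crashes = [13, 30, 37, 16, 28, 26, 23, 8, 31, 11, 17, 27, 38, 26, 19]
rho = spearman_rho(mean_label, final_crashes)
```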

Automatic Stress Recognition
Biologically motivated features
– Represent the rate of change of frequency
– Representative features
– Robust against noisy conditions
Echo state networks
– Easy to train using the direct pseudo-inverse method
– Exploit the sequential characteristics of the features
– Robust against noisy conditions

Utilized Features
Motivation
– Pitch is not always easy to extract
– Statistics of pitch may not suffice
– Preliminary experiments showed worse performance
– Goal: representative features that do not need to be aggregated over time
Modulation-spectrum-based features
– Represent the rate of change of frequency
– Extracted at 25 Hz

Modulation Spectrum Features
– Rate of change of frequency
– Standard procedures: FFT and Mel filtering
– The most prominent energies are observed between 2 and 16 Hz
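The slides name only the ingredients: FFT, Mel filtering, a 2 to 16 Hz modulation range, and a 25 Hz feature rate. A minimal sketch under assumed settings follows: 16 kHz audio, 512-point Hann-windowed FFT frames with a 10 ms hop, 8 triangular mel bands, log compression, and a second FFT over a 64-frame (0.64 s) window of each band's energy trajectory. All of these parameters and the helper names are hypothetical choices, not the authors' configuration.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters on the HTK mel scale, shape (n_filters, n_fft // 2 + 1)."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bins = np.fft.rfftfreq(n_fft, 1.0 / sr)
    fb = np.zeros((n_filters, len(bins)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_log_energies(signal, sr, n_fft=512, hop=160, n_filters=8):
    """Step 1: FFT of Hann-windowed frames, mel filtering, log compression."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[k * hop:k * hop + n_fft] * window for k in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)


def modulation_features(signal, sr, hop=160, mod_win=64, out_rate=25.0, band=(2.0, 16.0)):
    """Step 2: FFT over time of each band's log energy; keep 2-16 Hz modulation bins."""
    env = mel_log_energies(signal, sr, hop=hop)           # (n_frames, n_bands)
    frame_rate = sr / hop                                 # 100 frames/s at 16 kHz
    step = int(round(frame_rate / out_rate))              # 4 frames -> one vector per 40 ms
    mod_freqs = np.fft.rfftfreq(mod_win, 1.0 / frame_rate)
    keep = (mod_freqs >= band[0]) & (mod_freqs <= band[1])
    taper = np.hanning(mod_win)[:, None]
    feats = []
    for start in range(0, env.shape[0] - mod_win + 1, step):
        seg = env[start:start + mod_win]
        seg = seg - seg.mean(axis=0)                      # remove per-band DC
        spec = np.abs(np.fft.rfft(seg * taper, axis=0))   # (n_mod_bins, n_bands)
        feats.append(spec[keep].ravel())                  # frequency-major, then band
    return np.array(feats), mod_freqs[keep]
```

Each output row concatenates the 2 to 16 Hz modulation magnitudes of all bands, with one row every 40 ms (25 Hz). A 4 Hz amplitude-modulated signal should concentrate its modulation energy in the bins nearest 4 Hz.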

[Figure: Waveform, Spectrogram, and Modulation Spectrogram, plotted over Time]

Echo State Networks
– Recurrent artificial neural network
– A dynamic reservoir represents the input history → echo state property
– Only the output connections W_out are adapted, using the pseudo-inverse method
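The defining trait described here, a fixed random reservoir whose only trained part is the readout W_out, fits in a few lines of NumPy. This is a generic sketch, not the authors' network: the reservoir size, spectral radius, input scaling, washout length, and the class name `EchoStateNetwork` are assumptions. Scaling the recurrent matrix to a spectral radius below 1 is a common heuristic for obtaining the echo state property, not a guarantee.

```python
import numpy as np


class EchoStateNetwork:
    """Minimal ESN: fixed random reservoir, linear readout W_out fit by pseudo-inverse."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9,
                 input_scale=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Input and recurrent weights are random and never trained.
        self.W_in = rng.uniform(-input_scale, input_scale, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the largest |eigenvalue| equals spectral_radius (heuristic).
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_out = None

    def states(self, inputs):
        """Run the reservoir over a (T, n_inputs) sequence; return (T, 1 + n_reservoir)."""
        x = np.zeros(self.W.shape[0])
        collected = []
        for u in np.asarray(inputs, dtype=float):
            x = np.tanh(self.W_in @ u + self.W @ x)
            collected.append(np.concatenate(([1.0], x)))  # leading 1.0 acts as a bias
        return np.array(collected)

    def fit(self, inputs, targets, washout=50):
        """Train only W_out: least squares via the Moore-Penrose pseudo-inverse."""
        S = self.states(inputs)[washout:]          # discard the initial transient
        Y = np.asarray(targets, dtype=float)[washout:]
        self.W_out = np.linalg.pinv(S) @ Y
        return self

    def predict(self, inputs):
        return self.states(inputs) @ self.W_out
```

For utterance-level stress regression, one option (again an assumption) is to feed the 25 Hz feature sequence as `inputs`, repeat the utterance's target at every frame during `fit`, and average the per-frame outputs of `predict`.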

Experiments and Results
– No "true" label → for each utterance, the mean over all labelers is used as the target
– 10-fold cross-validation
– Human labelers vs. ESN: the ESN outperforms the labelers

           MSE    ME
Labeler 1  0.284  0.421
Labeler 2  0.151  0.281
Labeler 3  0.291  0.422
Labeler 4  0.241  0.384
Labeler 5  0.211  0.365
ESN        0.084  0.235
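The evaluation protocol on this slide can be sketched as follows: the per-utterance target is the mean over labelers, each labeler's ratings are scored against it, and utterances are split into 10 folds. The helper names are hypothetical, the ratings in the usage lines are made-up toy numbers, and reading "ME" as mean absolute error is an assumption because the slides do not define it. The slides also do not say whether a labeler's own rating was part of the mean it was compared against; this sketch includes it.

```python
import numpy as np


def consensus_target(ratings):
    """ratings: (n_labelers, n_utterances) -> per-utterance mean over labelers."""
    return np.asarray(ratings, dtype=float).mean(axis=0)


def mse(pred, target):
    return float(np.mean((np.asarray(pred, dtype=float) - target) ** 2))


def mean_abs_error(pred, target):
    # Assumption: "ME" read as mean absolute error.
    return float(np.mean(np.abs(np.asarray(pred, dtype=float) - target)))


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n items."""
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


# Hypothetical toy ratings: 3 labelers x 3 utterances.
ratings = np.array([[0.2, 0.6, 0.9], [0.4, 0.8, 0.7], [0.0, 0.7, 1.0]])
target = consensus_target(ratings)
for i, r in enumerate(ratings, 1):
    print(f"Labeler {i}: MSE={mse(r, target):.3f}, ME={mean_abs_error(r, target):.3f}")
```

A model such as an ESN would be trained on the `train` indices of each fold and scored with the same two metrics on the held-out `test` indices.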

Conclusions
– Experimental setup to record speech data with different levels of stress
– A large-vocabulary dataset is available (with additional video material and mouse movement data)
– Method for humans to label the individual stressed utterances
– Automatic stress recognizer based on recurrent neural networks → outperforms the human labelers in accuracy

Thank you for your attention!
