Emotion Recognition from Speech: Stress Experiment
Stefan Scherer, Hansjörg Hofmann, Malte Lampmann, Martin Pfeil, Steffen Rhinow, Friedhelm Schwenker, Günther Palm
Institute of Neural Information Processing, Ulm University
LREC 2008

Motivation
Why stress recognition from speech?
– Safety and usability purposes
– More efficient and natural interfaces
– Several existing applications are based on speech only (e.g. call center applications)
Existing problems:
– Existing databases are limited
– Stress induced by increasing workload is missing
– Choice of representative features is difficult

Experimental Setup

Experimental Setup – Summary
– Direct planes towards the corresponding exit
– Four types of questions (personal, enumerations, general knowledge, Jeopardy)
– Difficulty levels differ in plane speed, number of planes, and exit sizes
– Points are earned or lost; the current score is color coded
– One game lasts 10 minutes
– Self-assessment of experienced stress is queried three times

Evaluation and Labeling of Recordings
– Everybody reacts differently to stress
– No common labels are available for the recordings
→ Second labeling experiment to obtain fuzzy labels for each of the recordings

Evaluation and Labeling of Recordings
[Table: per-speaker results with columns Speaker, Mean, P25, P75, Self-Assessment, and Crashes]

Evaluation and Labeling of Recordings
Spearman correlation tests (ρ and p-value reported for each pair):
– Mean vs. self-assessment
– Mean vs. crashes
– Self-assessment vs. crashes
[Table: ρ and p-value for M vs. SA, M vs. C, and C vs. SA]
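
As an illustration of how such rank correlations could be computed (the original analysis script is not reproduced here), the sketch below runs the three Spearman tests with scipy; all arrays contain placeholder values, not the data from the study.

```python
# Minimal sketch: Spearman rank correlations between mean label,
# self-assessment, and crash counts. Values are placeholders.
import numpy as np
from scipy.stats import spearmanr

mean_label = np.array([3.1, 4.2, 6.5, 1.8, 8.3])    # mean stress label per speaker (illustrative)
self_assess = np.array([4.0, 4.0, 10.0, 2.0, 3.0])  # self-assessed stress (illustrative)
crashes = np.array([4, 4, 10, 2, 3])                 # number of crashes per speaker (illustrative)

tests = {
    "Mean vs. Self-Assessment": (mean_label, self_assess),
    "Mean vs. Crashes": (mean_label, crashes),
    "Crashes vs. Self-Assessment": (crashes, self_assess),
}
for name, (x, y) in tests.items():
    rho, p = spearmanr(x, y)
    print(f"{name}: rho = {rho:.2f}, p = {p:.3f}")
```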

Automatic Stress Recognition
Biologically motivated features
– Representing the rate of change of frequency
– Representative features
– Robust against noisy conditions
Echo state networks
– Easy to train using the direct pseudo-inverse method
– Exploit the sequential characteristics of the features
– Robust against noisy conditions

Utilized Features
Motivation
– Pitch is not always easy to extract
– Statistics of pitch may not suffice
– Preliminary experiments showed worse performance with such features
– Goal: representative features that do not need to be aggregated over time
Modulation spectrum based features
– Representing the rate of change of frequency
– Extracted at 25 Hz

Modulation Spectrum Features
– Rate of change of frequency
– Standard procedures: FFT and Mel filtering
– Most prominent energies are observed between 2 and 16 Hz
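
A hedged sketch of how such modulation spectrum features could be extracted is given below: a mel spectrogram is computed via FFT and mel filtering, and a second FFT along time over each mel-band trajectory yields the modulation spectrum, from which the 2–16 Hz bins are kept at an output rate of roughly 25 Hz. The function name, the use of librosa, and all window and band parameters are assumptions for illustration, not the authors' exact pipeline.

```python
# Sketch of modulation-spectrum feature extraction (illustrative parameters).
import numpy as np
import librosa

def modulation_spectrum(y, sr, n_mels=26, mod_win_s=0.5, out_rate=25.0):
    hop = int(0.010 * sr)                          # 10 ms frame hop -> 100 frames/s
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=int(0.025 * sr), hop_length=hop, n_mels=n_mels)
    frame_rate = sr / hop
    win = int(mod_win_s * frame_rate)              # frames per modulation window
    step = int(frame_rate / out_rate)              # hop so features come out at ~25 Hz
    mod_freqs = np.fft.rfftfreq(win, d=1.0 / frame_rate)
    keep = (mod_freqs >= 2.0) & (mod_freqs <= 16.0)  # most prominent energies: 2-16 Hz
    feats = []
    for start in range(0, mel.shape[1] - win + 1, step):
        seg = np.log(mel[:, start:start + win] + 1e-10)
        spec = np.abs(np.fft.rfft(seg, axis=1))      # FFT along time per mel band
        feats.append(spec[:, keep].flatten())
    return np.array(feats)                           # shape: (n_windows, n_mels * n_kept_bins)
```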

[Figure: waveform, spectrogram, and modulation spectrogram plotted over time]

Echo State Networks
– Recurrent artificial neural network
– Dynamic reservoir represents the input history → echo state property
– W_out are the only connections that need to be adapted, using the pseudo-inverse method
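
The following is a minimal sketch of such an echo state network, assuming a simple tanh reservoir and a pseudo-inverse readout; the class name, sizes, and the spectral radius of 0.9 are illustrative choices, not the configuration used in the experiments.

```python
# Minimal echo state network: fixed random reservoir, only W_out is trained
# via the pseudo-inverse of the collected reservoir states.
import numpy as np

class EchoStateNetwork:
    def __init__(self, n_in, n_res, n_out, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Rescale the reservoir so its largest eigenvalue magnitude is below 1,
        # which is the usual condition for the echo state property.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.W_out = np.zeros((n_out, n_res))

    def _states(self, X):
        # X: (T, n_in) feature sequence; returns reservoir states (T, n_res)
        x = np.zeros(self.W.shape[0])
        states = []
        for u in X:
            x = np.tanh(self.W_in @ u + self.W @ x)
            states.append(x.copy())
        return np.array(states)

    def fit(self, X, Y):
        # Direct pseudo-inverse solution for the output weights.
        S = self._states(X)
        self.W_out = (np.linalg.pinv(S) @ Y).T

    def predict(self, X):
        return self._states(X) @ self.W_out.T
```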

Experiments and Results
– No "true" label → mean over all labelers for each utterance serves as the target
– 10-fold cross validation
– Human labelers vs. ESN: the ESN outperforms the labelers
[Table: MSE and ME for Labelers 1–5 and the ESN]
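
As a hedged illustration of this scoring scheme (not the original evaluation code), the sketch below builds the target as the mean over labelers and scores each labeler and a hypothetical ESN output against it; all values are placeholders, and ME is taken here as mean absolute error, which is an assumption.

```python
# Illustrative evaluation: mean over labelers as target, MSE and ME per rater.
import numpy as np

# labels[i, j]: stress label from labeler j for utterance i (placeholder values)
labels = np.array([[3.0, 4.0, 2.5],
                   [7.0, 6.5, 8.0],
                   [1.0, 2.0, 1.5]])
target = labels.mean(axis=1)            # mean over labelers = target per utterance
esn_pred = np.array([3.2, 7.1, 1.6])    # hypothetical ESN outputs

def mse(pred, tgt):
    return np.mean((pred - tgt) ** 2)

def me(pred, tgt):
    # "ME" interpreted here as mean absolute error (assumption).
    return np.mean(np.abs(pred - tgt))

for j in range(labels.shape[1]):
    print(f"Labeler {j + 1}: MSE = {mse(labels[:, j], target):.3f}, "
          f"ME = {me(labels[:, j], target):.3f}")
print(f"ESN: MSE = {mse(esn_pred, target):.3f}, ME = {me(esn_pred, target):.3f}")
```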

Conclusions
– Experimental setup to record speech data under different levels of stress
– Large vocabulary dataset is available (with additional video material and mouse movement data)
– Method for labeling the individual stressed utterances by humans
– Automatic stress recognizer based on recurrent neural networks → outperforms the human labelers in accuracy

Thank you for your attention!