Emotion Recognition from Speech: Stress Experiment
Stefan Scherer, Hansjörg Hofmann, Malte Lampmann, Martin Pfeil, Steffen Rhinow, Friedhelm Schwenker, Günther Palm
Institute of Neural Information Processing, Ulm University
LREC 2008

Motivation
Why stress recognition from speech?
– Safety and usability purposes
– More efficient and natural interfaces
– Several existing applications are based on speech only (e.g. call center applications)
Existing problems:
– Existing databases are limited
– Databases with stress induced by increasing workload are missing
– The choice of representative features is difficult

Experimental Setup

Experimental Setup – Summary
– Direct planes towards their corresponding exits
– Four types of questions (personal, enumerations, general knowledge, Jeopardy)
– Difficulty levels differ in plane speed, number of planes, and exit sizes
– Points are earned or lost, and the current score is color coded
– One game lasts 10 minutes
– Self-assessment of experienced stress is queried three times

Evaluation and Labeling of Recordings
– Everybody reacts differently to stress
– No common labels are available for the recordings
→ A second labeling experiment was conducted to obtain fuzzy labels for each of the recordings (a sketch of how such labels can be aggregated follows below)
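A minimal sketch of one way to aggregate multiple human ratings into a fuzzy label, assuming each labeler rates an utterance on a fixed integer stress scale; the function name, scale, and example ratings are illustrative, not taken from the paper:

```python
import numpy as np

def fuzzy_label(ratings, scale=(0, 10)):
    """Aggregate per-labeler stress ratings for one utterance into a
    fuzzy label: a normalized histogram over the rating scale."""
    lo, hi = scale
    counts = np.bincount(np.asarray(ratings) - lo, minlength=hi - lo + 1)
    return counts / counts.sum()

# Example: five labelers rate one utterance on a 0-10 stress scale.
ratings = [3, 4, 4, 6, 5]
label = fuzzy_label(ratings)      # soft distribution over stress levels
target = float(np.mean(ratings))  # mean rating, used later as regression target
```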

Evaluation and Labeling of Recordings
[Table: per-speaker label statistics — mean rating, 25th and 75th percentiles (P25/P75), self-assessed stress, and number of crashes; one row per speaker]

Evaluation and Labeling of Recordings
Spearman correlation tests (computed as sketched below):
– Mean label vs. self-assessment
– Mean label vs. crashes
– Self-assessment vs. crashes
[Table: correlation coefficient ρ and p-value for each of the three pairs]
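These rank correlations can be computed with `scipy.stats.spearmanr`; a minimal sketch with placeholder per-speaker arrays, since the slide's actual values are not reproduced here:

```python
from scipy.stats import spearmanr

# Placeholder arrays, one entry per speaker: mean human label,
# self-assessed stress, and number of crashes (illustrative values).
mean_label = [4.2, 5.1, 7.3, 1.8, 8.0]
self_assess = [4, 5, 10, 2, 3]
crashes = [4, 4, 17, 2, 9]

pairs = {
    "mean vs. self-assessment": (mean_label, self_assess),
    "mean vs. crashes": (mean_label, crashes),
    "self-assessment vs. crashes": (self_assess, crashes),
}
for name, (a, b) in pairs.items():
    rho, p = spearmanr(a, b)  # rank correlation and its p-value
    print(f"{name}: rho={rho:.2f}, p={p:.3f}")
```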

Automatic Stress Recognition
Biologically motivated features
– Represent the rate of change of frequency
– Representative features
– Robust against noisy conditions
Echo state networks
– Easy to train using the direct pseudo-inverse method
– Use the sequential characteristics of the features
– Robust against noisy conditions

Utilized Features
Motivation
– Pitch is not always easy to extract
– Statistics of pitch may not suffice
– Preliminary experiments with such features showed worse performance
– Goal: representative features that do not need to be aggregated over time
Modulation spectrum based features
– Represent the rate of change of frequency
– Extracted at 25 Hz

Modulation Spectrum Features
– Capture the rate of change of frequency
– Standard procedures: FFT and Mel filtering, followed by a second spectral analysis along time (see the sketch below)
– The most prominent modulation energies are observed between 2 and 16 Hz
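A minimal sketch of one common way to compute such features with NumPy and SciPy; window length, hop size, filterbank shape, and frame rate are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from scipy.signal import stft

def modulation_spectrum(x, fs, n_mels=20, win=0.025, hop=0.010):
    """Mel-band energies over time, then a second FFT along the time
    axis of each band: the modulation spectrum."""
    f, t, Z = stft(x, fs, nperseg=int(win * fs), noverlap=int((win - hop) * fs))
    power = np.abs(Z) ** 2                       # spectrogram (freq x frames)

    # Crude triangular mel filterbank (illustrative).
    mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0), mel(fs / 2), n_mels + 2))
    fb = np.zeros((n_mels, f.size))
    for i in range(n_mels):
        lo, c, hi = edges[i], edges[i + 1], edges[i + 2]
        fb[i] = np.clip(np.minimum((f - lo) / (c - lo), (hi - f) / (hi - c)), 0, None)
    bands = fb @ power                           # mel-band trajectories

    # Second FFT along time: modulation frequencies per mel band.
    frame_rate = 1.0 / hop                       # frames per second
    mod = np.abs(np.fft.rfft(bands, axis=1))
    mod_freqs = np.fft.rfftfreq(bands.shape[1], d=1.0 / frame_rate)
    keep = (mod_freqs >= 2) & (mod_freqs <= 16)  # prominent 2-16 Hz region
    return mod[:, keep], mod_freqs[keep]
```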

[Figure: waveform, spectrogram, and modulation spectrogram over time]

Echo State Networks
– Recurrent artificial neural network
– The dynamic reservoir represents the input history → echo state property
– W_out are the only connections that need to be adapted, using the pseudo-inverse method (a minimal sketch follows below)
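A minimal sketch of an echo state network with a pseudo-inverse readout; reservoir size, spectral radius, and input scaling are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res=100, spectral_radius=0.9, input_scale=0.5):
    """Random, fixed reservoir weights rescaled into the echo state regime."""
    W = rng.standard_normal((n_res, n_res))
    W *= spectral_radius / np.abs(np.linalg.eigvals(W)).max()
    W_in = input_scale * rng.uniform(-1, 1, (n_res, n_in))
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Drive the reservoir with an input sequence; collect the states."""
    states = np.zeros((len(inputs), W.shape[0]))
    x = np.zeros(W.shape[0])
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ u + W @ x)
        states[t] = x
    return states

def train_readout(states, targets):
    """Only W_out is trained: a closed-form pseudo-inverse solution."""
    return np.linalg.pinv(states) @ targets

# Illustrative usage: feature sequence (T x n_feat) -> stress level per frame.
T, n_feat = 200, 20
features = rng.standard_normal((T, n_feat))  # placeholder features
targets = rng.uniform(0, 10, T)              # placeholder stress labels
W_in, W = make_reservoir(n_feat)
S = run_reservoir(W_in, W, features)
W_out = train_readout(S, targets)
predictions = S @ W_out
```

Only `train_readout` involves learning; the reservoir weights stay fixed, which is what makes the direct closed-form solution possible.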

Experiments and Results
– No "true" label exists → the mean over all labelers is used as the target for each utterance
– 10-fold cross validation
– Human labelers vs. ESN: the ESN outperforms the individual labelers
[Table: mean squared error (MSE) and mean error (ME) for each of the five labelers and for the ESN]
(An illustrative evaluation harness follows below.)
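A sketch of the comparison under the assumptions above: each labeler's MSE is measured against the consensus (mean) target, and a 10-fold cross-validation loop evaluates the automatic recognizer, with a trivial baseline predictor standing in for the trained ESN; all data here are placeholders:

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Placeholder data: ratings from 5 labelers for 150 utterances.
ratings = rng.uniform(0, 10, (150, 5))
target = ratings.mean(axis=1)  # consensus target per utterance

# Each labeler's MSE against the consensus target.
for j in range(ratings.shape[1]):
    mse = np.mean((ratings[:, j] - target) ** 2)
    print(f"Labeler {j + 1}: MSE = {mse:.3f}")

# 10-fold cross validation for the automatic recognizer; predict()
# is a trivial baseline standing in for the trained ESN above.
def predict(train_idx, test_idx):
    return np.full(len(test_idx), target[train_idx].mean())

fold_mse = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(ratings):
    pred = predict(train_idx, test_idx)
    fold_mse.append(np.mean((pred - target[test_idx]) ** 2))
print(f"Recognizer: MSE = {np.mean(fold_mse):.3f}")
```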

Conclusions
– Experimental setup to record speech data under different levels of stress
– A large vocabulary dataset is available (with additional video material and mouse movement data)
– Method for humans to label the individual stressed utterances
– Automatic stress recognizer based on recurrent neural networks → outperforms human labelers in accuracy

Thank you for your attention!