
Why predict emotions?

Feature granularity levels
- Turn level: previous work mostly uses features computed over the entire turn. This is efficient, but offers only a coarse approximation of the pitch contour.
- Word level: [1] uses pitch features computed at the word level, which offers a better approximation of the pitch contour (e.g., it captures the large pitch changes when uttering a single word such as "great").

The problem: classifying the overall turn emotion

Turn-level is simple:
- Labeling granularity = turn, and one set of features per turn.
- Example student turn: "The force of the truck". One turn-level feature set is extracted from the turn's speech, and the model makes one overall prediction for the turn (here: Uncertain).

Word-level is more complicated:
- Label granularity mismatch: labels are at the turn level, but features are at the word level.
- Variable number of feature sets per turn: "The force of the truck" yields five word-level feature sets ("the", "force", "of", "the", "truck"), yet only one turn-level prediction is needed.

Techniques to solve this problem

Technique 1: Word-level emotion model (WLEM)
- Train: a word-level model, assigning the turn's emotion label to every word in the turn.
- Predict: an emotion label for each word.
- Combine: majority voting over the word-level predictions. In the example, three of the five words are predicted Non-uncertain, so the overall turn prediction is Non-uncertain (3/5).
- Issues: the turn-to-word labeling assumption may not hold for every word, and majority voting is a very simple combination scheme.

Technique 2: Predefined subset of sub-turn units (PSSU)
- Combine: concatenate the features of three words (first, middle, and last) into one conglomerate feature set. In the example, the features of "the", "of", and "truck" are concatenated.
- Train & predict: a turn-level model on this conglomerate feature set, using the turn's emotion label.

Corpus
- ITSPOKE dialogues; domain: qualitative physics tutoring.
- Backend: WHY2-Atlas, with Sphinx2 speech recognition and Cepstral text-to-speech.
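The two combination schemes above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the toy feature vectors are invented, and a real system would extract acoustic pitch features per word and train a classifier on them.

```python
from collections import Counter

def wlem_combine(word_predictions):
    """Technique 1 (WLEM) combination step: majority voting over
    per-word emotion predictions yields one turn-level prediction."""
    counts = Counter(word_predictions)
    # most_common(1) returns [(label, count)] for the majority label
    return counts.most_common(1)[0][0]

def pssu_features(word_feature_sets):
    """Technique 2 (PSSU) combination step: concatenate the feature
    sets of the first, middle, and last words of a turn into one
    fixed-size conglomerate feature vector."""
    first = word_feature_sets[0]
    middle = word_feature_sets[len(word_feature_sets) // 2]
    last = word_feature_sets[-1]
    return first + middle + last

# Example turn: "The force of the truck" (five words).
word_preds = ["non-uncertain", "uncertain", "non-uncertain",
              "non-uncertain", "uncertain"]
print(wlem_combine(word_preds))  # -> non-uncertain (3/5 majority)

# Toy 2-dimensional feature vectors, one per word.
word_feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(pssu_features(word_feats))  # features of "the", "of", "truck"
```

Note that for the five-word example the middle index is 5 // 2 = 2, i.e. "of", matching the poster's first/middle/last example.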
Corpus comparison with the previous study [1]:

                          Previous study [1]         Current study
  Emotion classification  Emotional/Non-emotional    Uncertain/Non-uncertain
  Class distribution      129/91 (E/nE)              2189/7665 (U/nU)
  Majority baseline       58.64%                     77.79%

Related work:
- [1] showed that the WLEM method works better than turn-level prediction.
- A similar approach was used in [2] at the breath-group level, but not at the word level.

Experimental results

Overall prediction accuracy (baseline: 77.79%):

  Turn-level          81.97 (0.09)
  Word-level (WLEM)   82.53 (0.07)
  Word-level (PSSU)   84.11 (0.05)

Comparison of recall and precision for predicting uncertain turns:
- Turn-level: medium recall and medium precision.
- WLEM: best recall but lowest precision; it tends to over-generalize.
- PSSU: good recall and the best precision, with much less over-generalization; overall the best choice.

Accuracy improvements:
- WLEM at the word level slightly improves upon turn-level (+0.56%).
- PSSU at the word level shows a much larger improvement (+2.14%).
- Overall, PSSU is the best technique according to this metric as well.

Background: affective computing
- Affective computing is a promising direction for improving spoken dialogue systems, through emotion detection (prediction) and emotion handling.
- Detecting emotion: train a classifier on features extracted from user turns. Feature types include amplitude, pitch, lexical, and duration features. We concentrate on pitch features to detect uncertainty.

Future work
Many alterations could further improve these techniques:
- Annotate each individual word for certainty, instead of whole turns.
- Include other feature types (lexical, amplitude, etc.) in addition to pitch.
- Try prediction in a human-human dialogue context.
- Use better combination techniques (e.g., confidence weighting).
- Make more selective choices for the PSSU words than the middle word of the turn (e.g., the longest word in the turn, or ensuring the chosen word has domain-specific content).

References
[1] M. Rotaru and D. Litman, "Using Word-level Pitch Features to Better Predict Student Emotions during Spoken Tutoring Dialogues," Proceedings of Interspeech.
[2] J. Liscombe, J. Hirschberg, and J. J. Venditti, "Detecting Certainness in Spoken Tutorial Dialogues," Proceedings of Interspeech.
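As a sanity check on the reported baselines: a majority-class baseline follows directly from the class distributions above (129/91 emotional/non-emotional turns in the previous study, 2189/7665 uncertain/non-uncertain here). A small sketch, with a hypothetical helper name:

```python
def majority_baseline(class_counts):
    """Accuracy (in %) of always predicting the majority class,
    given per-class instance counts."""
    return 100.0 * max(class_counts) / sum(class_counts)

# Previous study [1]: 129 emotional / 91 non-emotional turns.
print(round(majority_baseline([129, 91]), 2))     # -> 58.64

# Current corpus: 2189 uncertain / 7665 non-uncertain.
print(round(majority_baseline([2189, 7665]), 2))  # -> 77.79
```

Both values match the baselines in the corpus table, which confirms they are simply the majority-class rates.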
Issue with PSSU: it might lose details from the words that are discarded.

Poster by Greg Nicholas. Adapted from a paper by Greg Nicholas, Mihai Rotaru, & Diane Litman.