Detecting Certainness in Spoken Tutorial Dialogues (Liscombe, Hirschberg & Venditti) and Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs (Ai, Litman, Forbes-Riley, Rotaru, Tetreault & Purandare)

Presentation transcript:

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg & Venditti. Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai, Litman, Forbes-Riley, Rotaru, Tetreault & Purandare. By Satyajeet Shaligram. Emotions in tutoring systems: confidence and confusion.

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Intelligent tutoring systems: an intelligent tutoring system (ITS) is any computer system that provides direct customized instruction or feedback to students, i.e. without the intervention of human beings, while they perform a task. There is a general trend of moving from text-based interactive systems to spoken dialogue systems, which provides an arena for applying emotion detection in speech! What emotions would be particularly interesting? AutoTutor online!
Non-basic emotions: amusement, contempt, contentment, embarrassment, excitement, guilt, pride in achievement, relief, satisfaction, sensory pleasure, shame.
Basic emotions: anger, disgust, fear, happiness, sadness, surprise.

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Important questions to consider:
1. Do human tutors use such (emotional) information when tutoring students?
2. Does detection of certainness aid in student learning?

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Corpus:
1. Human-human spoken dialogues collected during the development of ITSPOKE
2. Dialogues from 17 subjects (7 female, 10 male)
3. Student and tutor were each recorded with different microphones, and each channel was manually transcribed and segmented into turns
4. Student turns: about 400 per subject (roughly 6,800 in total, consistent with the 6,100/687 train/test split used later)
5. Turns average 2.3 seconds in length

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.

Annotation:
1. Labels used: uncertain, certain, neutral and mixed
2. Label distribution: 64.2% neutral, 18.4% certain, 13.6% uncertain, 3.8% mixed
3. Inter-labeler agreement: average kappa score = 0.52 (moderate agreement)
4. The labels used in this study are those from a single labeler (?)
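The reported 0.52 is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch of computing it with scikit-learn; the two label sequences below are invented toy data, not the paper's annotations:

```python
# Toy illustration of inter-labeler agreement via Cohen's kappa.
# These sequences are invented, not the annotations from the paper.
from sklearn.metrics import cohen_kappa_score

labeler_a = ["certain", "neutral", "uncertain", "neutral", "mixed", "neutral"]
labeler_b = ["certain", "neutral", "neutral",   "neutral", "mixed", "certain"]

print(cohen_kappa_score(labeler_a, labeler_b))  # 0.50 here; 0.4-0.6 is usually read as moderate
```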

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al. Sample annotation:

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Tutor responses to student certainness — dialogue acts:
1. Short answer question (ShortAnsQ)
2. Long answer question (LongAnsQ)
3. Deep answer question (DeepAnsQ)
4. Directives (RD)
5. Restatements or rewordings of student answers (Rst)
6. Tutor hints (Hint)
7. Tutor supplies the answer in the face of student failure ("bottom out", Bot)
8. Novel information (Exp)
9. Review of past arguments (Rcp)
10. Direct positive feedback (Pos)
11. Direct negative feedback (Neg)

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Features — turn features (57 acoustic-prosodic features, automatically extracted using Praat!):
 t_cur (15 features) – extracted from the current turn only: fundamental frequency, intensity, speaking rate, turn duration, etc.
 t_cxt (42 features) – contextual information provided by the dialogue history; tracks how student prosody changes over time:
   rate of change of t_cur features between the current and previous turn
   rate of change of t_cur features between the current and first turn
   whether t_cur features have been monotonically increasing over the last 3 turns
   total counts of dialogue turns, preceding student turns, etc.
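Since the slide notes the features were extracted automatically with Praat, here is a hedged sketch of computing a few t_cur-style features via the parselmouth Python bindings to Praat. The paper used Praat itself, and its exact 15-feature inventory is not reproduced here, so the feature set and the contextual delta below are illustrative assumptions:

```python
# Hedged sketch: a few t_cur-style acoustic-prosodic features for one student
# turn, via the parselmouth bindings to Praat. Illustrative only, not the
# authors' actual Praat script or full feature inventory.
import numpy as np
import parselmouth

def turn_features(wav_path: str) -> dict:
    snd = parselmouth.Sound(wav_path)
    f0 = snd.to_pitch().selected_array["frequency"]
    f0 = f0[f0 > 0]                                   # drop unvoiced frames (F0 == 0)
    intensity = snd.to_intensity().values.flatten()
    return {
        "f0_mean": float(f0.mean()) if f0.size else 0.0,
        "f0_range": float(f0.max() - f0.min()) if f0.size else 0.0,
        "intensity_mean": float(intensity.mean()),
        "duration_sec": snd.get_total_duration(),
    }

def rate_of_change(cur: dict, prev: dict) -> dict:
    # t_cxt-style contextual feature: change from the previous student turn
    return {f"delta_{k}": cur[k] - prev[k] for k in cur}
```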

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Features — breath group (BG) features:
 Extraction of a smaller, more prosodically coherent segmentation that roughly approximates intonational phrases
 Contiguous segments of speech bounded by silences with a minimum length of 200 ms
 Average of 2.5 BGs per student turn
 15 features extracted per BG (similar to those in the t_cur feature set)
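A hedged sketch of breath-group segmentation as described on the slide: speech runs bounded by silences of at least 200 ms, found here by thresholding frame energy. The paper's semi-automated algorithm is not specified in the transcript, so the frame size and energy threshold are assumptions:

```python
# Hedged sketch of breath-group segmentation: speech runs separated by
# silences of >= min_sil_ms. Frame size and energy threshold are illustrative.
import numpy as np

def breath_groups(samples: np.ndarray, sr: int,
                  frame_ms: int = 10, min_sil_ms: int = 200,
                  energy_thresh: float = 1e-4):
    frame = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame
    energy = np.array([np.mean(samples[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    voiced = energy > energy_thresh
    min_sil = min_sil_ms // frame_ms           # silent frames needed to split

    groups, start, silence = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i                      # a new breath group begins
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_sil:             # pause long enough: close the group
                groups.append((start * frame, (i - silence + 1) * frame))
                start, silence = None, 0
    if start is not None:                      # signal ended mid-group
        groups.append((start * frame, n_frames * frame))
    return groups                              # (start_sample, end_sample) pairs
```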

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Classification experiments:
 WEKA machine learning software package
 AdaBoost using the C4.5 decision tree learner
 Training set: 90% of turns (6,100); test set: 10% (687)
 Classification task: certain vs. neutral vs. uncertain
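A rough scikit-learn analogue of this setup, for orientation only. WEKA's C4.5 learner has no exact sklearn counterpart (DecisionTreeClassifier implements CART), and the data here is synthetic, so this is a stand-in rather than a replication:

```python
# Stand-in for the WEKA experiment: AdaBoost over a decision tree, ~90/10 split.
# Synthetic data; sklearn trees are CART, not C4.5, so results will not match.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=6787, n_features=57, n_informative=10,
                           n_classes=3, random_state=0)   # certain/neutral/uncertain stand-in
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)

clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                         n_estimators=50, random_state=0)  # estimator= needs sklearn >= 1.2
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```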

Detecting Certainness in Spoken Tutorial Dialogues – Liscombe, Hirschberg et al.
Conclusions:
 The addition of contextual features aids classification
 BGs can be reliably predicted using a semi-automated algorithm
 bg_cur performed better than turn_cur
 Both types of features contain useful information
Future research:
 Study the relationship between the two feature sets
 Annotate the corpus for certainness at the breath-group level
 Include non-acoustic-prosodic features, e.g. lexical features

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai, Litman, Forbes-Riley, Rotaru, Tetreault & Purandare.

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.
Key idea: "In an application-oriented spoken dialog system where the user and the system complete some specific task together, we believe that user emotions are not only impacted by the factors that come directly from the dialog, but also by the progress of the task, which can be measured by metrics representing system and user performance."
Features used:
1. Lexical
2. Prosodic
3. Identification features
4. Dialogue acts
5. Different levels of contextual features
— plus domain- or task-specific features!

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.
Which emotions to detect? Full-blown emotions (Oudeyer, P., "Novel useful features and algorithms for the recognition of emotions in human speech", in Proc. Speech Prosody, 2002). Is this always possible? Relevant? Is collapsing emotions into simpler categories useful? Easier?

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.
Corpus:
 100 dialogues
 2,252 student turns & 2,854 tutor turns
 20 students (distribution?)
 Collected using the ITSPOKE tutoring system
Annotation:
 4 tags: certain, uncertain, mixed & neutral
 Collapsed to binary: mixed + uncertain -> uncertain; certain + neutral -> "not-uncertain" (see the sketch below)
 Kappa for the binary distinction = 0.68
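The label collapse is mechanical; a minimal sketch of the mapping described above:

```python
# The binary collapse from the slide: mixed/uncertain -> uncertain,
# certain/neutral -> not-uncertain.
COLLAPSE = {"mixed": "uncertain", "uncertain": "uncertain",
            "certain": "not-uncertain", "neutral": "not-uncertain"}

turn_labels = ["certain", "mixed", "neutral", "uncertain"]
print([COLLAPSE[t] for t in turn_labels])
# ['not-uncertain', 'uncertain', 'not-uncertain', 'uncertain']
```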

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.
Classification:
 WEKA software toolkit
 AdaBoost with the J48 decision tree learner
 10-fold cross-validation
 Automatic feature extraction
Features:
 Student utterances (treated as a bag of words)
 Automatically extracted prosodic features for pitch, energy, duration, tempo, pausing, etc.
 12 raw features (from the above)
 12 normalized features
 24 running totals and averages (see the sketch below)
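A hedged sketch of the "running totals and averages" idea: for each raw per-turn prosodic feature, keep a cumulative sum and mean over the dialogue so far. The paper's exact normalization is not given in the transcript, so this is one plausible reading:

```python
# Running-total / running-average features over a dialogue, one dict of raw
# prosodic values per student turn in; one dict of derived features per turn out.
def running_features(turns: list[dict]) -> list[dict]:
    totals, out = {}, []
    for i, turn in enumerate(turns, start=1):
        feats = {}
        for k, v in turn.items():
            totals[k] = totals.get(k, 0.0) + v
            feats[f"{k}_total"] = totals[k]      # cumulative sum so far
            feats[f"{k}_avg"] = totals[k] / i    # cumulative mean so far
        out.append(feats)
    return out
```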

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.
System/user performance features (two are sketched below):
 Subtopics serve as student performance indicators
 Revisit counts for subtopics
 Nested subtopics, as in Grosz & Sidner's theory of discourse structure
 Depth of a tutoring session; average tutoring depth
 Essay revisions -> help model user satisfaction
 Quality of student answers (correct, incorrect, partially correct)
 Percentage of correct answers
 Keyword counts
 Student pretest scores
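A hedged sketch of two of these features: the running percentage of correct student answers and subtopic revisit counts. The per-turn record layout is an assumption for illustration, not the ITSPOKE log format:

```python
# Two performance features from the list above; the history-entry schema
# ({"subtopic": ..., "answer_quality": ...}) is a hypothetical stand-in.
from collections import Counter

def performance_features(history: list[dict]) -> dict:
    visits = Counter(h["subtopic"] for h in history)
    correct = sum(1 for h in history if h["answer_quality"] == "correct")
    return {
        "pct_correct": correct / len(history) if history else 0.0,
        "max_subtopic_revisits": max(visits.values()) - 1 if visits else 0,
    }

print(performance_features([
    {"subtopic": "net-force", "answer_quality": "correct"},
    {"subtopic": "net-force", "answer_quality": "incorrect"},
    {"subtopic": "gravity",   "answer_quality": "partial"},
]))  # {'pct_correct': 0.333..., 'max_subtopic_revisits': 1}
```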

Results – Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.

Results (comparison) – Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.: Ai, Litman, Forbes-Riley et al. (2006) vs. Liscombe, Hirschberg et al. (2005).

Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs – Ai et al.
Future directions:
 System/user performance features can be generalized to information-providing dialogue systems, e.g. flight booking: dialogue progress -> number of "slots" filled; prior student knowledge -> past experience; simple sentences -> low expectation (see the illustration below)
 Apply the features to human-human tutoring dialogues
 What is the best triggering mechanism for allowing a computer tutor to adapt its dialog?
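A tiny illustration of the suggested generalization: in a flight-booking system, "task progress" can be read directly off a slot-filling frame. The slot names are invented, not from any real system:

```python
# Illustrative only: dialogue progress as the fraction of filled slots in a
# hypothetical flight-booking frame.
frame = {"origin": "JFK", "destination": None, "date": "2006-03-01", "airline": None}
progress = sum(v is not None for v in frame.values()) / len(frame)
print(progress)  # 0.5 -> the task is about half done
```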

The end…well almost…

And finally…The end