Predicting Student Emotions in Computer-Human Tutoring Dialogues Diane J. Litman and Kate Forbes-Riley University of Pittsburgh Pittsburgh, PA 15260 USA.

Motivation
- Bridge the learning gap between human tutors and computer tutors
- Our approach: add emotion prediction and adaptation to ITSPOKE, our Intelligent Tutoring SPOKEn dialogue system

ITSPOKE architecture: the back-end is the Why2-Atlas system (VanLehn et al., 2002), with Sphinx2 speech recognition and Cepstral text-to-speech

Human-Computer Excerpt
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario.
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)

Outline
- Data and emotion annotation
- Machine learning experiments
  - extract linguistic features from student turns
  - use different feature sets to predict emotions
  - 19-36% relative reduction of baseline error
  - comparison with human tutoring

ITSPOKE Dialogue Corpus
- 100 spoken tutoring dialogues (physics problems) with ITSPOKE; on average, 19.4 minutes and 25 student turns per dialogue
- 20 subjects: university students who have never taken college physics and who are native speakers

Emotion Annotation Scheme (SIGdial'04)
- 'Emotion': emotions/attitudes that may impact learning
- Annotation of student turns
- Emotion classes:
  - negative, e.g. uncertain, bored, irritated, confused, sad
  - positive, e.g. confident, enthusiastic
  - neutral: no weak or strong expression of negative or positive emotion

Example Annotated Excerpt
ITSPOKE: What happens to the velocity of a body when there is no force acting on it?
Student: dammit (NEGATIVE) (ASR: it is)
ITSPOKE: Could you please repeat that?
Student: same (NEUTRAL) (ASR: i same)

Agreement Study
- 333 student turns, 15 dialogues
- 2 annotators (the authors)

Confusion matrix (counts; rows = annotator 1, columns = annotator 2):

            Negative  Neutral  Positive
Negative    89        30       6
Neutral
Positive    6         1        9

Emotion Classification Tasks
- Negative, Neutral, Positive: Kappa = .4, Weighted Kappa = .5 (focus of this talk)
- Negative, Non-Negative: Kappa = .5
- Emotional, Non-Emotional: Kappa = .3
- Results on par with the Kappas reported in prior research (Ang et al. 2002; Narayanan 2002; Shafran et al. 2003)
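The Kappa figures above correct raw inter-annotator agreement for chance. As a minimal sketch of how (unweighted) Cohen's Kappa is computed from a confusion matrix of annotation counts; the matrix below is invented for illustration, not the study's actual counts:

```python
import numpy as np

def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix of counts."""
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    observed = np.trace(m) / total  # raw (observed) agreement
    # chance agreement from the two annotators' marginal distributions
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / total ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts: rows = annotator 1, columns = annotator 2,
# classes ordered negative / neutral / positive
matrix = [[89, 30, 6],
          [25, 104, 30],
          [6, 1, 9]]
print(round(cohens_kappa(matrix), 2))  # → 0.44
```

Weighted Kappa additionally discounts near-miss disagreements (e.g. neutral vs. positive) less than full misses (negative vs. positive).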

Feature Extraction per Student Turn
- Three feature types:
  1. Acoustic-prosodic
  2. Lexical
  3. Identifiers
- Research questions:
  - relative utility of acoustic-prosodic, lexical, and identifier features
  - impact of speech recognition
  - comparison with human tutoring (HLT/NAACL, 2004)

Feature Types (1): Acoustic-Prosodic Features
- 4 pitch (f0): max, min, mean, standard deviation
- 4 energy (RMS): max, min, mean, standard deviation
- 4 temporal:
  - turn duration (seconds)
  - pause length preceding turn (seconds)
  - tempo (syllables/second)
  - internal silence in turn (zero f0 frames)
- available to ITSPOKE in real time
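The twelve features above can be sketched as follows, assuming a pitch tracker has already produced frame-level f0 and RMS values for the turn; the arrays and the helper name `turn_features` are made up for illustration:

```python
import numpy as np

def turn_features(f0, rms, duration_s, pause_before_s, n_syllables):
    """Turn-level acoustic-prosodic features from frame-level f0/RMS."""
    voiced = f0[f0 > 0]  # zero-f0 frames are unvoiced or silent
    return {
        # 4 pitch features, over voiced frames only
        "f0_max": voiced.max(), "f0_min": voiced.min(),
        "f0_mean": voiced.mean(), "f0_std": voiced.std(),
        # 4 energy features
        "rms_max": rms.max(), "rms_min": rms.min(),
        "rms_mean": rms.mean(), "rms_std": rms.std(),
        # 4 temporal features
        "duration": duration_s,
        "prior_pause": pause_before_s,
        "tempo": n_syllables / duration_s,     # syllables/second
        "internal_silence": np.mean(f0 == 0),  # fraction of zero-f0 frames
    }

f0 = np.array([0, 110.0, 115.0, 120.0, 0, 118.0])  # toy pitch track (Hz)
rms = np.array([0.01, 0.20, 0.25, 0.22, 0.02, 0.21])
print(turn_features(f0, rms, duration_s=1.5, pause_before_s=0.4, n_syllables=4))
```

All of these can be computed incrementally as the turn ends, which is what makes them usable by ITSPOKE in real time.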

Feature Types (2): Word Occurrence Vectors
- human-transcribed lexical items in the turn
- ITSPOKE-recognized lexical items
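A word occurrence vector marks, for each vocabulary word, whether it appeared in the turn. A minimal sketch (the vocabulary and example turns are invented); the same construction is applied once to the human transcriptions and once to ITSPOKE's ASR output:

```python
def word_vectors(turns):
    """Binary word-occurrence vectors over a list of turn transcripts."""
    vocab = sorted({w for t in turns for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for t in turns:
        v = [0] * len(vocab)
        for w in t.lower().split():
            v[index[w]] = 1  # binary occurrence, not counts
        vecs.append(v)
    return vocab, vecs

vocab, vecs = word_vectors(["the velocity is zero", "same velocity"])
print(vocab)    # → ['is', 'same', 'the', 'velocity', 'zero']
print(vecs[1])  # → [0, 1, 0, 1, 0]
```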

Feature Types (3): Identifier Features
- student id
- student gender
- problem id

Machine Learning Experiments
- Weka software: boosted decision trees (gave best results in pilot studies; ASRU 2003)
- Baseline: majority class (neutral)
- Methodology: 10 runs of 10-fold cross validation
- Evaluation metric: accuracy
- Datasets:
  - Agreed (202/333 turns where annotators agreed)
  - Consensus (all 333 turns, after annotators resolved disagreements)
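The evaluation setup above can be sketched for the baseline alone: under repeated 10-fold cross-validation, the majority-class predictor's accuracy converges to the majority class's proportion. This is a stdlib-only illustration with synthetic labels roughly matching the Agreed set's size, not the authors' Weka pipeline:

```python
import random
from collections import Counter

def cv_baseline_accuracy(labels, folds=10, runs=10, seed=0):
    """Mean accuracy of a majority-class baseline over repeated k-fold CV."""
    rng = random.Random(seed)
    accs = []
    for _ in range(runs):
        idx = list(range(len(labels)))
        rng.shuffle(idx)
        for f in range(folds):
            test = idx[f::folds]
            train = [i for i in idx if i not in set(test)]
            # predict the most frequent training label for every test turn
            majority = Counter(labels[i] for i in train).most_common(1)[0][0]
            accs.append(sum(labels[i] == majority for i in test) / len(test))
    return sum(accs) / len(accs)

# synthetic class distribution: 202 turns with neutral as majority class
labels = ["neutral"] * 94 + ["negative"] * 70 + ["positive"] * 38
print(cv_baseline_accuracy(labels))  # ≈ 0.465, i.e. 94/202
```

The learned models (boosted decision trees in the actual experiments) are then compared against this baseline accuracy on the same folds.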

Acoustic-Prosodic vs. Lexical Features (Agreed Turns)
Baseline = 46.52%

Feature Set      -ident
speech           55.49%
lexical          52.66%
speech+lexical   62.08%

- Both acoustic-prosodic ("speech") and lexical features significantly outperform the majority baseline
- Combining feature types yields an even higher accuracy

Adding Identifier Features (Agreed Turns)
Baseline = 46.52%

Feature Set      -ident   +ident
speech           55.49%   62.03%
lexical          52.66%   67.84%
speech+lexical   62.08%   63.52%

- Adding identifier features improves all results
- With identifier features, lexical information now yields the highest accuracy

Using Automatic Speech Recognition (Agreed Turns)
Baseline = 46.52%

Feature Set      -ident   +ident
lexical          52.66%   67.84%
ASR              57.95%   65.70%
speech+lexical   62.08%   63.52%
speech+ASR       61.22%   62.23%

- Surprisingly, using ASR output rather than human transcriptions does not particularly degrade accuracy
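The relative error reductions quoted in the outline are derived from accuracies like these: error is one minus accuracy, and the reduction is measured against the majority-class baseline's error. A small sketch, using the 46.52% baseline and one accuracy from the tables as the example inputs:

```python
def relative_error_reduction(baseline_acc, model_acc):
    """Fraction of the baseline's error eliminated by the model."""
    base_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (base_err - model_err) / base_err

# e.g. speech+lexical (62.08%) vs. the 46.52% majority baseline
print(round(relative_error_reduction(0.4652, 0.6208), 3))  # → 0.291
```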

Summary of Results (Agreed Turns)

Summary of Results (Consensus Turns)
- Using consensus rather than agreed data decreases predictive accuracy for all feature sets, but the other observations generally hold

Comparison with Human Tutoring (Agreed Turns)
- In human tutoring dialogues, emotion prediction (and annotation) is more accurate and based on somewhat different features

Related Research in Emotional Speech
- Elicited speech (Polzin & Waibel 1998; Oudeyer 2002; Liscombe et al. 2003)
- Naturally-occurring speech (Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003)
- Our work:
  - naturally-occurring tutoring data
  - analysis of comparable human and computer corpora

Current Directions
- Develop adaptive strategies for ITSPOKE
  - annotate human tutor turns
  - evaluate ITSPOKE with emotion adaptation
- Co-training to address the annotation bottleneck (Maeireizo, Litman, and Hwa: Saturday poster)

Summary
- Recognition of annotated student emotions in spoken computer tutoring dialogues
- Feature sets containing acoustic-prosodic, lexical, and/or identifier features yield significant improvements in predictive accuracy compared to majority class baselines
  - role of differing feature types and speech recognition errors
  - comparable analysis of human tutoring dialogues
  - paper contains details regarding two other emotion prediction tasks
- This research is a first step towards implementing emotion prediction and adaptation in ITSPOKE

Thank You! Questions?

Example Annotated Excerpt
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction (NEGATIVE, UNCERTAIN) (ASR: add directions)
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force (POSITIVE, CONFIDENT) (ASR: force)
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity (NEGATIVE, UNCERTAIN) (ASR: velocity)
ITSPOKE: Could you please repeat that?
Student: velocity (NEGATIVE, IRRITATED) (ASR: velocity)

Prior Research: Affective Computer Tutoring
- (Kort, Reilly, and Picard, 2001): propose a cyclical model of emotion change during learning; developing a non-dialog computer tutor that will use eye-tracking/facial features to predict emotion and support movement into positive emotions
- (Aist, Kort, Reilly, Mostow, and Picard, 2002): adding human-provided emotional scaffolding to an automated reading tutor increases student persistence
- (Evens et al., 2002): for CIRCSIM, a computer dialog tutor for physiology problems, hypothesize adaptive strategies for recognized student emotional states; e.g., if frustration is detected, the system should respond to hedges and self-deprecation by supplying praise and restructuring the problem
- (de Vicente and Pain, 2002): use human observations of student motivational states in videoed interactions with a non-dialog computer tutor to develop rules for detection
- (Ward and Tsukahara, 2003): spoken dialog computer "tutor-support" uses prosodic and contextual features of the user turn (e.g. "on a roll", "lively", "in trouble") to infer an appropriate response as users remember train stations; preferred over randomly chosen acknowledgments (e.g. "yes", "right", "that's it")
- (Conati and Zhou, 2004): use Dynamic Bayesian Networks to reason under uncertainty about abstracted student knowledge and emotional states through time, based on student moves in a non-dialog computer game, and to guide selection of "tutor" responses
- Most will be relevant to developing ITSPOKE adaptation techniques

Experimental Procedure
- Students take a physics pretest
- Students read background material
- Students use the web and voice interface to work through up to 10 problems with either ITSPOKE or a human tutor
- Students take a post-test