Annotating Student Emotional States in Spoken Tutoring Dialogues Diane Litman and Kate Forbes-Riley Learning Research and Development Center and Computer Science Department University of Pittsburgh

Overview
- Corpora and Emotion Annotation Scheme: student emotional states in spoken tutoring dialogues
- Analyses: our scheme is reliable in our domain; our emotion labels can be accurately predicted
- Motivation: incorporating emotional processing can decrease the performance gap between human and computer tutors (e.g. Coles, 1999; Aist et al., 2002)
- Goal: implementation of emotion prediction and adaptation in our computer tutoring spoken dialogue system to improve performance

Prior Research on Emotional Speech
- Actor- or Native-Read Speech Corpora (Polzin and Waibel 1998; Oudeyer 2002; Liscombe et al. 2003): many emotions, multiple dimensions; acoustic/prosodic predictors
- Naturally-Occurring Speech Corpora (Litman et al. 2001; Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003): Kappas around 0.6; fewer emotions (e.g. E / -E); acoustic/prosodic + additional predictors
- Few address the spoken tutoring domain

(Demo: Monday, 4:15pm!)

Spoken Tutoring Corpora
- ITSPOKE Computer Tutoring Corpus: 105 dialogs (physics problems), 21 subjects
- Corresponding Human Tutoring Corpus: 128 dialogs (physics problems), 14 subjects
- Experimental Procedure:
  1) Students take a physics pretest
  2) Students read background material
  3) Students use the web and voice interface to work up to 10 physics problems with ITSPOKE or a human tutor
  4) Students take a post-test

Emotion Annotation Scheme for Student Turns in Spoken Tutoring Dialogs
- 'Emotion': emotions/attitudes that may impact learning
- Perceived, intuitive expressions of emotion
- Relative to other turns in Context and tutoring Task
- 3 Main Emotion Classes:
  negative: strong expressions of, e.g., uncertain, bored, irritated, confused, sad; question turns
  positive: strong expressions of, e.g., confident, enthusiastic
  neutral: no strong expression of negative or positive emotion; grounding turns

Emotion Annotation Scheme for Student Turns in Spoken Tutoring Dialogs
- 3 Minor Classes:
  weak negative: weak expressions of negative emotions
  weak positive: weak expressions of positive emotions
  mixed: strong expressions of both positive and negative emotions (case 1: multi-utterance turns; case 2: simultaneous expressions)
- Specific Emotion Labels: uncertain, confused, confident, enthusiastic, …

Annotated Dialog Excerpt: Human Tutoring Corpus
Tutor: Suppose you apply equal force by pushing them. Then uh what will happen to their motion?
Student: Um, the one that’s heavier, uh, the acc-acceleration won’t be as great. (WEAK NEGATIVE, UNCERTAIN)
Tutor: The one which is…
Student: Heavier (WEAK NEGATIVE, UNCERTAIN)
Tutor: Well, uh, is that your common-
Student: Er I’m sorry, I’m sorry, the one with most mass. (POSITIVE, CONFIDENT)
Tutor: (lgh) Yeah, the one with more mass will- if you- if the mass is more and force is the same then which one will accelerate more?
Student: Which one will move more? (NEGATIVE, CONFUSED)

Analyses of Emotion Annotation Scheme
- 2 annotators: 10 human tutoring dialogs, 9 students, 453 student turns
- Machine-learning method from (Litman & Forbes, 2003) (HLT/NAACL’04: Tuesday, 2:20pm)
  learning algorithm: boosted decision trees
  predictors: acoustic, prosodic, lexical, dialogue, and contextual features
- Analyses optimize annotation for:
  inter-annotator reliability
  predictability
  use for constructing adaptive tutoring strategies to increase student learning
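The evaluation setup above (boosted decision trees scored with 10x10 cross-validation) can be sketched with scikit-learn. Note the sketch's assumptions: the feature matrix below is random noise standing in for the paper's per-turn acoustic-prosodic, lexical, dialogue, and contextual features, and AdaBoostClassifier is a generic stand-in for the authors' actual boosting toolkit.

```python
# Sketch of the prediction setup: boosted decision trees evaluated with
# repeated (10 x 10) stratified cross-validation. The data is synthetic,
# so the printed accuracy is only chance level.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_turns = 453                            # one feature vector per student turn
X = rng.normal(size=(n_turns, 8))        # e.g. pitch, energy, tempo, duration, ...
y = rng.choice([0, 1, 2], size=n_turns)  # negative / neutral / positive labels

clf = AdaBoostClassifier(n_estimators=50, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)  # 100 fold-level accuracies
print(f"mean accuracy over 10x10 CV: {scores.mean():.3f}")
```

Averaging over 10 repetitions of 10-fold cross-validation, as here, reduces the variance of the accuracy estimate relative to a single 10-fold run.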

6 Analyses of Emotion Annotation
- 3 Levels of Annotation Granularity:
  NPN: Negative, Positive, Neutral (Litman & Forbes, 2003)
  NnN: Negative, Non-Negative (Lee et al., 2001); positives and neutrals are conflated as Non-Negative
  EnE: Emotional, Non-Emotional (Batliner et al., 2000); negatives and positives are conflated as Emotional, neutrals are Non-Emotional
- 2 Possible Conflations of Minor Classes:
  Minor → Neutral: conflate minor and neutral classes
  Weak → Main: conflate weak with negative/positive; conflate mixed with neutral

Analysis 1a: NPN, Minor → Neutral
- 385/453 agreed turns (84.99%, Kappa 0.68)

              Negative  Neutral  Positive
  Negative       90        6        4
  Neutral        23      280       30
  Positive        0        5       15

- Predictive accuracy: 84.75% (10x10 cross-validation)
- Baseline (majority = neutral) accuracy: 72.74%
- Relative improvement: 44.06%
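The Kappa figures reported throughout are Cohen's kappa, agreement corrected for chance. A minimal dependency-free computation, run on the Analysis 1a annotator-by-annotator confusion matrix (the Neutral row counts are reconstructed from the agreement totals, since they were lost in transcription), recovers the reported value:

```python
# Cohen's kappa from a square confusion matrix (annotator 1 rows,
# annotator 2 columns): (p_observed - p_expected) / (1 - p_expected).
def cohens_kappa(matrix):
    n = sum(sum(row) for row in matrix)
    p_observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# NPN, minor -> neutral: 385/453 on the diagonal
npn = [[90, 6, 4],
       [23, 280, 30],
       [0, 5, 15]]
print(round(cohens_kappa(npn), 2))  # -> 0.68
```

The correction matters here because neutral dominates: raw agreement is 84.99%, but nearly half of that is expected by chance, leaving Kappa at 0.68.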

Analysis 2a: NnN, Minor → Neutral
- 420/453 agreed turns (92.72%, Kappa 0.80)

                 Negative  Non-Negative
  Negative          90          10
  Non-Negative      23         330

- Predictive accuracy: 86.83% (10x10 cross-validation)
- Baseline (majority = Non-Negative) accuracy: 78.57%
- Relative improvement: 38.54%

Analysis 3b: EnE, Weak → Main
- 350/453 agreed turns (77.26%, Kappa 0.55)

                  Emotional  Non-Emotional
  Emotional          169          19
  Non-Emotional       84         181

- Predictive accuracy: 86.14% (10x10 cross-validation)
- Baseline (majority = Non-Emotional) accuracy: 51.71%
- Relative improvement: 71.30%

Summary of the 6 Analyses

                     KAPPA   ACCURACY   BASELINE   REL. IMP.
  minor → neutral
    NPN               0.68    84.75%     72.74%     44.06%
    NnN               0.80    86.83%     78.57%     38.54%
    EnE               0.67    85.07%     71.98%     46.72%
  weak → main
    NPN               0.60    79.29%     53.24%     55.71%
    NnN               0.74    82.94%     72.21%     38.61%
    EnE               0.55    86.14%     51.71%     71.30%

- Tradeoff: reliability, predictability, annotation granularity
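The "REL. IMP." column is the improvement over the majority-class baseline as a fraction of the maximum possible improvement, i.e. the reduction in error over baseline. A one-line check against the Analysis 1a row:

```python
# Relative improvement over a majority-class baseline, in percent:
# what fraction of the baseline's error does the classifier eliminate?
def relative_improvement(accuracy_pct, baseline_pct):
    return 100 * (accuracy_pct - baseline_pct) / (100 - baseline_pct)

print(round(relative_improvement(84.75, 72.74), 2))  # -> 44.06
print(round(relative_improvement(86.14, 51.71), 2))  # -> 71.3
```

This metric explains why EnE (weak → main) scores the highest relative improvement despite middling raw accuracy: its baseline is the weakest (51.71%), so there is more error available to eliminate.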

Extensions to the 6 Analyses: Consensus Labeling
- Ang et al., 2002: consensus labeling increases the data set to include the difficult student turns
- Original annotators revisit disagreements and through discussion try to achieve a consensus label
- Consensus: 445/453 turns (99.12%; 8 discarded)
- Machine-learning results: predictive accuracy decreases across the 6 analyses, but remains better than baseline

Extensions to the 6 Analyses: Including Minor Emotion Classes
- Only the last 5 dialogs were fully annotated for the Minor Classes
- 142/211 agreed turns (67.30%, Kappa 0.54)
- [6x6 confusion matrix over negative, weak negative, neutral, weak positive, positive, and mixed; cell counts lost in transcription]

Extensions to the 6 Analyses: Specific Emotion Labels
- Only the last 5 dialogs were fully annotated
- 66 turns agreed negative (weak or strong); 45/66 agreed on the specific negative label (of 5): uncertain > confused > bored, sad, irritated (68.18%, Kappa 0.41)
- 13 turns agreed positive (weak or strong); 13/13 agreed on the specific positive label (of 2): confident > enthusiastic (100%, Kappa 1.0)

ITSPOKE Computer Tutoring Corpus
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction (NEGATIVE, UNCERTAIN)
ASR: add directions
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force (POSITIVE, CONFIDENT)
ASR: force
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity (NEGATIVE, UNCERTAIN)
ASR: velocity
ITSPOKE: Could you please repeat that?
Student: velocity (NEGATIVE, IRRITATED)
ASR: velocity

ITSPOKE Computer Tutoring Corpus
- Differences from the human tutoring corpus make annotation and prediction more difficult
- Computer inflexibility limits emotion expression and recognition: shorter student turns, no groundings, no questions, no problem references, no student initiative, …

ITSPOKE Computer Tutoring Corpus
- (Litman & Forbes-Riley, ACL`04): 333 turns, 15 dialogs, 10 subjects
- Best reliability and predictability: NnN, weak → main
  78% agreed turns (Kappa 0.5)
  73% accuracy (relative improvement 36%): subset of predictors
- Predictability: add log features, word-level features
- Reliability: strength disagreements across the 6 classes can often be viewed as shifted scales
  [Figure: annotator A's and B's labels for three example turns plotted on the scale Neg – weak Neg – Neutral – weak Pos – Pos, illustrating the shifted-scale pattern]

Conclusions and Current Directions
- The emotion annotation scheme is reliable and predictable in the human tutoring corpus
- Tradeoff between inter-annotator reliability, predictability, and annotation granularity
- The ITSPOKE corpus shows differences that make annotation and prediction more difficult
- Next steps: 1) label human tutor reactions to the 6+ analyses of emotional student turns, 2) determine which analyses best trigger adaptation and improve learning, 3) develop adaptive strategies for ITSPOKE

Affective Computing Systems
- Emotions play a large role in human interaction: how we say something is as important as what we say (Cowie et al., 2002; psychology, linguistics, biology)
  Affective Computing: add emotional processing to spoken dialog systems to improve performance
  Good adaptation requires good prediction: the focus of current work (read or annotated natural speech)
- Emotion impacts learning, e.g. poor learning → negative emotions; negative emotions → poor learning (Coles, 1999; psychology studies)
  Affective Tutoring: add emotional processing to computer tutoring systems (non-dialog, typed dialog, or spoken dialog) to improve performance
  Few yet annotate/predict/adapt to emotions in spoken dialogs
  Adaptive strategies: human tutor, AC research, AT hypotheses

Prior Research: Affective Computer Tutoring
- (Kort, Reilly and Picard, 2001): propose a cyclical model of emotion change during learning; develop a non-dialog computer tutor that uses eye-tracking/facial features to predict emotion and support change to positive emotions.
- (Aist, Kort, Reilly, Mostow & Picard, 2002): adding human emotional scaffolding to an automated reading spoken dialog tutor increases student persistence.
- (Evens et al., 2002): CIRCSIM, a computer typed-dialog tutor for physiology problems; hypothesize adaptive strategies for recognized student emotional states, e.g. if detecting frustration, the system should respond to hedges and self-deprecation by supplying praise and restructuring the problem.
- (de Vicente and Pain, 2002): use human observation of student motivation in videoed interaction with a non-dialog computer tutor to develop detection rules.
- (Ward and Tsukahara, 2003): a spoken dialog computer “tutor” uses prosodic and other features of the user turn (e.g. “on a roll”, “lively”, “in trouble”) to infer an appropriate response as users recall train stations. Preferred over randomly chosen acknowledgments (e.g. “yes”, “right”, “that’s it”).
- (Conati and Zhou, 2004): use Dynamic Bayesian Networks to reason under uncertainty about abstracted student knowledge and emotional states through time, based on student moves in a non-dialog computer game, and to guide selection of “tutor” responses.

Sub-Domain Emotion Annotation: Adaptation Information for ITSPOKE
- 3 Sub-Domains:
  PHYS: emotions pertaining to the physics material being learned, e.g. uncertain if “freefall” is the correct answer
  TUT: emotions pertaining to the tutoring process (attitudes towards the tutor or being tutored), e.g. tired, bored with the tutoring session
  NLP: emotions pertaining to ITSPOKE NLP processing, e.g. frustrated or amused by speech recognition errors
- PHYS = the main/common strong emotions in the human tutoring corpus

Example Adaptation Strategies in ITSPOKE
PHYS:
  EnE: if Emotional, ask for student contribution, e.g. “Are you ok so far?”
  NnN: respond only to negative emotions, e.g. engage in a sub-dialog to solidify
  NPN: respond to positives too, e.g. if positive and correct, move on
NLP: if negative, apologize; redo sound check
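The strategies above amount to a dispatch on (sub-domain, predicted emotion label, answer correctness). A minimal sketch of that rule table follows; the function name and response strings are illustrative paraphrases of the slide, not ITSPOKE's actual implementation.

```python
# Hypothetical rule table for the adaptation strategies listed above.
# Responses paraphrase the slide text; "continue" is the assumed default.
def choose_adaptation(subdomain, emotion, answer_correct=None):
    if subdomain == "NLP" and emotion == "negative":
        return "apologize; redo sound check"
    if subdomain == "PHYS":
        if emotion == "emotional":                    # EnE granularity
            return "ask for student contribution, e.g. 'Are you ok so far?'"
        if emotion == "negative":                     # NnN granularity
            return "engage in a sub-dialog to solidify"
        if emotion == "positive" and answer_correct:  # NPN granularity
            return "move on"
    return "continue with default tutoring"

print(choose_adaptation("PHYS", "negative"))
print(choose_adaptation("NLP", "negative"))
```

Note how the annotation granularity fixes which rules are even expressible: an NnN predictor can trigger the sub-dialog rule but never the "move on" rule, which needs the positive class.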

Excerpt: Annotated Human-Human Spoken Tutoring Dialogue
Tut: The only thing asked is about the force whether the force uh earth pulls equally on sun or not that's the only question
Stud: Well I think it does but I don't know why I d-don't I do they move in the same direction I do-don't… (NEGATIVE, CONFUSED)
Tut: You see again you see they don't have to move. If a force acts on a body-
Stud: It- (WEAK POSITIVE, ENTHUSIASTIC)
Tut: It does not mean that uh uh I mean it will um-
Stud: If two forces um apply if two forces react on each other then the force is equal it's the Newton’s third law (POSITIVE, CONFIDENT)
Tut: Um you see the uh actually in this case the motion is there but it is a little complicated motion this is orbital motion
Stud: Mm-hm (WEAK POSITIVE, ENTHUSIASTIC)
Tut: And uh just as-
Stud: This is the one where they don't touch each other that you were talking about before (MIXED, ENTHUSIASTIC + UNCERTAIN)
Tut: Yes just as earth orbits around sun
Stud: Mm-hm (NEUTRAL)

Wavesurfer (H-H Transcription & Annotation)

Perceived Emotion Cues (post-annotation)
- Negative cues: lexical expressions of uncertainty or confusion (questions, “I don’t know”), disfluencies (“um”, “I do-don’t”), pausing, rising intonation, slow tempo
- Positive cues: lexical expressions of certainty or confidence (“right”, “I know”), little pausing, loud speech, fast tempo
- Neutral cues: moderate tempo, loudness, pausing, etc., as well as lexical groundings (“mm-hm”, “ok”)

Analysis 1b: NPN, Weak → Main
- 340/453 agreed turns (75.06%, Kappa 0.60)

              Negative  Neutral  Positive
  Negative      112        9        9
  Neutral        32      181       58
  Positive        0        5       47

- Predictive accuracy: 79.29% (10x10 cross-validation)
- Baseline (majority = neutral) accuracy: 53.24%
- Relative improvement: 55.71%

Analysis 2b: NnN, Weak → Main
- 403/453 agreed turns (88.96%, Kappa 0.74)

                 Negative  Non-Negative
  Negative         112          18
  Non-Negative      32         291

- Predictive accuracy: 82.94% (10x10 cross-validation)
- Baseline (majority = Non-Negative) accuracy: 72.21%
- Relative improvement: 38.61%

Analysis 3a: EnE, Minor → Neutral
- 389/453 agreed turns (85.87%, Kappa 0.67)

                  Emotional  Non-Emotional
  Emotional          109          11
  Non-Emotional       53         280

- Predictive accuracy: 85.07% (10x10 cross-validation)
- Baseline (majority = Non-Emotional) accuracy: 71.98%
- Relative improvement: 46.72%

Analysis 5: Consensus Labeling
- 445/453 consensus turns (99.12%; 8 discarded)
- [Table: consensus label distributions under both conflations (minor → neutral and weak → main) for each granularity: NPN (neg/neu/pos), NnN (neg/non-neg), EnE (emo/non-emo); cell counts lost in transcription]

ITSPOKE: Intelligent Tutoring SPOKEn Dialogue System
- Back-end is the text-based Why2-Atlas tutorial dialogue system (VanLehn et al., 2002)
- Student speech digitized from microphone input; Sphinx2 speech recognizer
- Tutor speech played via headphones or speakers; Cepstral text-to-speech synthesizer

Annotated Dialog Excerpt: Human Tutoring Corpus
Tutor: Suppose you apply equal force by pushing them. Then uh what will happen to their motion?
Student: Um, the one that’s heavier, uh, the acc-acceleration won’t be as great. (NEGATIVE, UNCERTAIN)
Tutor: The one which is…
Student: Heavier (NEGATIVE, UNCERTAIN)
Tutor: Well, uh, is that your common-
Student: Er I’m sorry, I’m sorry, the one with most mass. (POSITIVE, CONFIDENT)
Tutor: (lgh) Yeah, the one with more mass will- if you- if the mass is more and force is the same then which one will accelerate more?
Student: Which one will move more? (NEGATIVE, CONFUSED)
Tutor: Mm which one will accelerate more?
Student: The- the one with the least amount of mass (NEGATIVE, UNCERTAIN)