Towards Emotion Prediction in Spoken Tutoring Dialogues Diane Litman, Kate Forbes, and Scott Silliman Learning Research and Development Center and Computer Science Department University of Pittsburgh Pittsburgh, PA 15260 USA
Outline Introduction System and Corpora Pilot Study Summary
Motivation
- Human tutors listen to both “what” is said and “how” it is said (e.g. “confident” vs. “uncertain”)
- Speech supplies acoustic-prosodic information about user state; some spoken dialogue applications already handle “problem” dialogues specially (Ang et al. 2002, Batliner et al. 2003, Litman et al. 2001)
- Can the effectiveness of computer dialogue tutors be increased by detecting and adapting to student emotional state (Evens 2001)?
ITSPOKE: Intelligent Tutoring SPOKEn Dialogue System
- “Back-end” is the (text-based) Why2-Atlas intelligent tutoring dialogue system (VanLehn et al., 2002)
- Speech input via the Sphinx2 speech recognizer
- Speech output via the Festival text-to-speech synthesizer
ITSPOKE Screen Shot
Parallel Human-Human Corpus
Target size:
- 20 subjects
- up to 10 dialogues per subject
Size on 5/20/03:
- 10 subjects
- 86 dialogues
- 62 dialogues transcribed
- 3066 manually segmented student turns
Human-Human Corpus Transcription and Annotation
Annotating Emotion
- 14 transcribed dialogues (n=553 student turns)
- Each student turn was annotated (based on the intuition of 1 coder) with one of 3 general categories:
  - negative (e.g. ‘uncertain’ or ‘frustrated’): n=141
  - positive (e.g. ‘confident’ or ‘certain’): n=167
  - neutral/indeterminate: n=248
Example Annotated Excerpt
…6.5 minutes after essay…
Tutor: Now this law that force is equal to mass times acceleration, what's this law called? This is uh since this it is a very important basic uh fact uh it is it is a law of physics. Um you have you have read it in the background material. Can you recall it?
Student: Um no it was one of Newton's laws but I don't- remember which one. (laugh) (EMOTION = NEGATIVE)
Tutor: Right, right- That- is Newton's second law of motion.
Student: he I- Ok, because I remember one, two, and three, but I didn't know if there was a different name (EMOTION = POSITIVE)
Tutor: Yeah that's right you know Newton was a genius-
Predicting Emotion
Ripper (machine learning program)
Input:
1) classes to be learned (our 3 emotion categories)
2) names and possible values for a set of features (next slide)
3) training examples with class and feature values (the annotated student turns)
Output: an ordered set of if…then rules for classifying future examples
Pilot Machine Learning Results
- Six turn features: Problem, Student, Duration, StartTime, Transcription, #Words
  - all features automatically available in real time
- Cross-validated error (33.03%) significantly lower than majority class baseline (55.69%)

if (duration ≥ 0.65) & (text has “I”) then negative
else if (duration ≥ 2.98) then negative
else if (duration ≥ 0.93) & (startTime ≥ 297.62) then positive
else if (text has “right”) then positive
else neutral
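The ordered ruleset above can be read as a first-match classifier. The following is a minimal Python sketch of that reading, not the Ripper output format itself. The `Turn` class, its field names, and the `has_word` helper are hypothetical names introduced here; the units of duration and start time are not stated on the slide, and "text has w" is assumed to mean that w occurs as a whitespace-delimited token of the transcription.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One student turn with the six features named on the slide (hypothetical container)."""
    problem: str
    student: str
    duration: float       # units as used by the slide's thresholds (not stated there)
    start_time: float     # units as used by the slide's thresholds (not stated there)
    transcription: str

    @property
    def num_words(self) -> int:
        # "#Words" feature, assumed to be the whitespace token count.
        return len(self.transcription.split())


def has_word(turn: Turn, word: str) -> bool:
    # Assumption: "text has w" is a case-sensitive whole-token match.
    return word in turn.transcription.split()


def classify(turn: Turn) -> str:
    """Apply the five rules in order; the first rule that fires decides the class."""
    if turn.duration >= 0.65 and has_word(turn, "I"):
        return "negative"
    if turn.duration >= 2.98:
        return "negative"
    if turn.duration >= 0.93 and turn.start_time >= 297.62:
        return "positive"
    if has_word(turn, "right"):
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Illustrative turn, not taken from the corpus.
    example = Turn("p1", "s1", 1.2, 10.0, "I think so")
    print(classify(example), example.num_words)
```

Because the rules are ordered, a long turn containing “right” is still classified as negative when its duration reaches 2.98: the earlier rule fires first.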
Summary and Current Directions
1) Pilot study suggests there are indeed features that can be used to automatically predict emotion in tutoring dialogues
- Wider variety of features from many knowledge sources (e.g., pitch, amplitude, timing, other acoustic/prosodic, syntactic, semantic, discourse)
- Reliable emotion annotation guidelines
- Analysis of human-computer corpus
2) Empirical comparisons with typed tutorial dialogues (Building Educational Applications Using NLP paper)
Text-Feature Ruleset
1 Feature: Text in Turn
Figure 2: Text-Feature Ruleset for Emotion Prediction (excerpt from 21 rules)

if (text has “the”) & (text has “don't”) then negative
else if (text has “I”) & (text has “don't”) then negative
…
else if (text has “um”) & (text has “<hn>”) then negative
else if (text has “the”) & (text has “<fs>”) then negative
else if (text has “right”) then positive
else if (text has “so”) then positive
else if (text has “(laugh)”) & (text has “that's”) then positive
else neutral

Estimated mean error and standard deviation: 39.03% +/- 2.40%, based on 25-fold cross-validation
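Since every rule in this excerpt has the same shape (a set of required tokens and a class), the ruleset can also be held as data and scanned in order. The sketch below covers only the rules shown in the excerpt; the rules elided by “…” are deliberately not reconstructed, so its output may differ from the full 21-rule set. The names `RULES`, `DEFAULT`, and `classify_text` are hypothetical, and whitespace tokenization is an assumption.

```python
# Ordered (required tokens, class) pairs copied from the excerpt.
# Rules hidden behind the ellipsis on the slide are NOT included here.
RULES = [
    (("the", "don't"), "negative"),
    (("I", "don't"), "negative"),
    (("um", "<hn>"), "negative"),
    (("the", "<fs>"), "negative"),
    (("right",), "positive"),
    (("so",), "positive"),
    (("(laugh)", "that's"), "positive"),
]
DEFAULT = "neutral"


def classify_text(text, rules=RULES, default=DEFAULT):
    """Return the class of the first rule whose tokens all occur in the turn."""
    # Assumption: tokens are whitespace-delimited and matched case-sensitively.
    tokens = set(text.split())
    for required, label in rules:
        if all(tok in tokens for tok in required):
            return label
    return default


if __name__ == "__main__":
    # Illustrative inputs, not taken from the corpus.
    for turn in ["I don't know", "so it falls", "ok"]:
        print(turn, "->", classify_text(turn))
```

Keeping the rules as a list makes the ordering explicit. Note that with plain whitespace splitting a token such as “right,” (with trailing punctuation) would not match “right”; how the original transcriptions were tokenized is not stated on the slide.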