Using Word-level Features to Better Predict Student Emotions during Spoken Tutoring Dialogues
Mihai Rotaru, Diane J. Litman
Graduate Research Competition

Presentation transcript:

Using Word-level Features to Better Predict Student Emotions during Spoken Tutoring Dialogues
Mihai Rotaru, Diane J. Litman
Graduate Research Competition

2 Breaking down the (long) title
- using WORD-LEVEL FEATURES
- STUDENT EMOTIONS
- SPOKEN TUTORING DIALOGUES
- TO PREDICT BETTER

3 Tutoring dialogues
- Education: classroom setting; human (one-on-one) tutoring; computer tutoring (ITS: Intelligent Tutoring Systems)
- Addressing the learning gap between human and computer tutoring:
  - Improve the language understanding module of the ITS
  - Incorporate affective reasoning
- Adding human-provided emotional scaffolding to a reading tutor increases student persistence (Aist et al., 2002)

4 Human-Computer Excerpt
Tutor 26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student 27: dammit (ASR: it is)
Tutor 28: Could you please repeat that?
Student 29: same (ASR: i same)
Tutor 30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student 31: zero (ASR: the zero)
Tutor 32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario...
Student 33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor 34: Fine. Are there any other forces acting on the apple as it falls?
Student 35: no why are you doing this again (ASR: no y and to it yes)
Tutor 36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student 37: downward you computer (ASR: downward you computer)

5 Affective reasoning
Prerequisites:
- Dialogue-based ITS: Why2
- Interaction via speech: ITSPOKE (Intelligent Tutoring SPOKEn dialogue system)
Affective reasoning:
- Detect student emotions
- Handle student emotions

6 Outline
- using WORD-LEVEL FEATURES
- STUDENT EMOTIONS
- SPOKEN TUTORING DIALOGUES
- TO PREDICT BETTER

7 Student emotions
Emotion annotation:
- 3 main emotion classes:
  - Negative, e.g. uncertain, bored, irritated, confused, sad
  - Positive, e.g. confident, enthusiastic
  - Neutral: no strong expression of negative or positive emotion
- Predict EnE (Emotional, Non-Emotional):
  - negatives and positives are conflated as Emotional
  - neutrals are Non-Emotional
  - useful for triggering system adaptation (HH corpus analysis)
Corpora:
- Human-Human (453 student turns from 10 dialogues)
- Human-Computer (333 student turns from 15 dialogues)

8 Annotation example
Tutor: Uh let us talk of one car first.
Student: ok. (EMOTION = NEUTRAL; Non-Emotional)
Tutor: If there is a car, what is it that exerts force on the car such that it accelerates forward?
Student: The engine. (EMOTION = POSITIVE; Emotional)
Tutor: Uh well engine is part of the car, so how can it exert force on itself?
Student: um… (EMOTION = NEGATIVE; Emotional)

9 Outline
- using WORD-LEVEL FEATURES
- STUDENT EMOTIONS
- SPOKEN TUTORING DIALOGUES
- TO PREDICT BETTER

10 Predicting student emotion
Features:
- Lexical (word choice)
- Prosodic
- Dialogue context
- Others
(figure: each student turn in the example dialogue is mapped to a feature vector)
Previous work:
- Mostly turn-level
- Very few word-level
- No comparison between turn-level and word-level

11 Why word-level features?
- Emotion might not be expressed over the entire turn
- Non-emotional words might wash out the effect of emotional words
(figure: the turn "This is great" spoken dispirited vs. happy)

12 Why word-level features? (2)
- Can approximate the pitch contour better at sub-turn levels
- Especially for longer turns
(figure: pitch contour of "This is great" at turn vs. word level)

13 Extended pitch feature set
Previous work:
- Min, Max
- Avg, Stdev
Extend with:
- Start, End
- Regression coefficient and regression error
- Quadratic regression coefficient (from Batliner et al., 2003)
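The extended set is easy to compute from an f0 contour. Below is a minimal sketch, assuming `f0` is a NumPy array of pitch values over the voiced frames of one unit (turn or word) with at least three frames; the function name `pitch_features` and the use of the frame index as the time axis are illustrative choices, not details from the paper.

```python
import numpy as np

def pitch_features(f0):
    """Extended pitch feature set for one unit (turn or word)."""
    t = np.arange(len(f0))  # frame index as the time axis

    feats = {
        # Classic statistics used in previous work
        "min": f0.min(), "max": f0.max(),
        "mean": f0.mean(), "stdev": f0.std(),
        # Extensions: contour endpoints
        "start": f0[0], "end": f0[-1],
    }

    # Linear regression coefficient (slope) and regression error (RMSE)
    slope, intercept = np.polyfit(t, f0, 1)
    fit = np.polyval([slope, intercept], t)
    feats["reg_coef"] = slope
    feats["reg_error"] = np.sqrt(np.mean((fit - f0) ** 2))

    # Quadratic regression coefficient (Batliner et al., 2003)
    feats["quad_coef"] = np.polyfit(t, f0, 2)[0]
    return feats
```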

14 But wait…
Turn-level: student turn → features → machine learning → turn emotional class
Word-level: word 1 … word n → features per word → machine learning → ? (Sönmez et al., 1998)

15 Word-level emotion model
Turn-level: student turn → features → machine learning → turn emotional class
Word-level: word 1 … word n → features per word → machine learning → word-level emotion per word → turn emotional class

16 Word-level emotion model
Training phase:
- Each word is labeled with its turn's class
- Extra features identify the position of the word in the turn (distance in words from the beginning and end of the turn)
- Learn the emotion model at the word level
Test phase:
- Predict each word's class with the learned model
- Use majority/weighted voting over the word classes to label the turn
- Ties are broken randomly
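A minimal sketch of this training/test scheme, assuming a word-level classifier has already been learned; the helper names and feature keys are hypothetical.

```python
import random
from collections import Counter

def word_instances(turn_words, turn_label):
    """Training view: every word inherits its turn's emotion label,
    plus positional features (distance from turn start and end)."""
    n = len(turn_words)
    return [({"word": w, "from_start": i, "from_end": n - 1 - i}, turn_label)
            for i, w in enumerate(turn_words)]

def predict_turn(turn_words, classify_word):
    """Test view: classify each word, then label the turn by majority
    vote over the word predictions; ties are broken randomly."""
    votes = Counter(classify_word(w) for w in turn_words)
    top = max(votes.values())
    return random.choice([label for label, c in votes.items() if c == top])
```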

17 Outline
- using WORD-LEVEL FEATURES
- STUDENT EMOTIONS
- SPOKEN TUTORING DIALOGUES
- TO PREDICT BETTER

18 Questions to answer
- Will word-level features work better than turn-level features for emotion prediction? Yes.
- If yes, where does the advantage come from? Better prediction of longer turns.
- Is there a feature set that offers robust performance? Yes: the combination of pitch and lexical features at the word level.

19 Experiment setup
- Two contrasting corpora
- Two contrasting learners (WEKA):
  - IB1 – nearest neighbor classifier
  - ADA – boosted decision trees

20 Feature sets
Only pitch and lexical features; 6 feature sets:
- Turn level:
  - Lex-Turn – only lexical
  - Pitch-Turn – only pitch
  - PitchLex-Turn – lexical and prosodic
- Word level:
  - Lex-Word – only lexical + positional
  - Pitch-Word – only pitch + positional
  - PitchLex-Word – lexical and prosodic + positional
- Baseline: majority class
10 x 10 cross-validation
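As a rough illustration of the setup (not the original WEKA pipeline), the two learners and the 10 x 10 cross-validation could be approximated in scikit-learn as below, assuming each feature set has already been extracted into a matrix `X` with EnE turn labels `y`.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Rough scikit-learn stand-ins for the two WEKA learners
learners = {
    "IB1": KNeighborsClassifier(n_neighbors=1),           # nearest neighbor
    "ADA": AdaBoostClassifier(DecisionTreeClassifier()),  # boosted decision trees
}

# 10 x 10 cross-validation: 10 repetitions of stratified 10-fold CV
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

def evaluate(X, y):
    """Mean accuracy of each learner on one feature set X with EnE labels y."""
    return {name: cross_val_score(clf, X, y, cv=cv).mean()
            for name, clf in learners.items()}
```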

21 Results – IB1 on HH
- Word-level features significantly outperform turn-level features
- Word-level better than turn-level on longer turns
- Best performers: Lex-Word, PitchLex-Word

22 Results – ADA on HH
- Turn-level performance increases considerably
- Word-level significantly better than turn-level on feature sets with pitch
- Word-level better than turn-level on longer turns, but the difference is smaller
- Best performers: Lex-Turn, Lex-Word, PitchLex-Word

23 Results – IB1 on HC
- Word-level features significantly outperform turn-level features
- Lexical information less helpful than on the HH corpus
- Word-level better than turn-level on longer turns
- Best performers: Pitch-Word, PitchLex-Word

24 Results – ADA on HC
- The word-level vs. turn-level difference is no longer significant
- IB1 better than ADA on word-level features
- ADA has higher variance on this corpus
- Word-level better than turn-level on longer turns, but the difference is smaller
- Best performers: Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word

25 Discussion
- Lexical features perform similarly at turn and word level; performance depends on corpus and learner
- Pitch features differ significantly: word-level better than turn-level (4/6)
- PitchLex-Word is a consistent best performer
- Our best accuracies are comparable with previous work

26 Conclusions & Future work
Conclusions:
- Word-level better than turn-level for emotion prediction, even under a very simple word-level emotion model
- Word-level better at predicting longer turns
- PitchLex-Word is a consistent best performer
Future work:
- More refined word-level emotion models (HMMs, co-training)
- Filter irrelevant words
- Use the prosodic information left out
- See whether our conclusions generalize to detecting student uncertainty
- Experiment with other sub-turn units (breath groups)

27 Acknowledgements
- ITSPOKE group: Diane Litman, Hua Ai, Kate Forbes-Riley, Beatriz Maeireizo-Tokeshi, Amruta Purandare, Scott Silliman, Joel Tetreault, Art Ward
- NLP group
- People who heard my presentation so many times

28 Back-end is the Why2-Atlas system (VanLehn et al., 2002). Sphinx2 speech recognition and Cepstral text-to-speech.

31 Feature Extraction per Student Turn
Five feature types:
(1) acoustic-prosodic
non acoustic-prosodic:
(2) lexical
(3) other automatic
(4) manual
(5) identifiers
Research questions:
- utility of different features
- speaker and task dependence

32 Feature Types (1)
Acoustic-Prosodic Features (normalized):
- 4 pitch (f0): max, min, mean, standard dev.
- 4 energy (RMS): max, min, mean, standard dev.
- 4 temporal:
  - turn duration (seconds)
  - pause length preceding turn (seconds)
  - tempo (syllables/second)
  - internal silence in turn (zero f0 frames)
All available to ITSPOKE in real time.
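A sketch of how these twelve features might be computed for one turn; the function signature is hypothetical, and the per-speaker normalization mentioned on the slide is omitted.

```python
import numpy as np

def turn_prosodic_features(f0, rms, duration_s, prior_pause_s, n_syllables):
    """The twelve turn-level acoustic-prosodic features listed above.

    f0:  per-frame pitch values, 0 for unvoiced frames
    rms: per-frame energy (RMS) values
    """
    voiced = f0[f0 > 0]
    return {
        # 4 pitch features, over voiced frames only
        "f0_max": voiced.max(), "f0_min": voiced.min(),
        "f0_mean": voiced.mean(), "f0_std": voiced.std(),
        # 4 energy features
        "rms_max": rms.max(), "rms_min": rms.min(),
        "rms_mean": rms.mean(), "rms_std": rms.std(),
        # 4 temporal features
        "duration": duration_s,             # turn duration (s)
        "prior_pause": prior_pause_s,       # pause preceding turn (s)
        "tempo": n_syllables / duration_s,  # syllables/second
        # fraction of zero-f0 frames (one plausible reading of the slide)
        "internal_silence": float(np.mean(f0 == 0)),
    }
```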

33 Feature Types (2)
Lexical Items:
- word occurrence vector
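For example, a binary word-occurrence vector can be built with a standard bag-of-words tool (a sketch; the paper does not specify the implementation):

```python
from sklearn.feature_extraction.text import CountVectorizer

turns = ["ok", "the engine", "um"]          # toy student turns
vectorizer = CountVectorizer(binary=True)   # 1 if the word occurs in the turn
X_lex = vectorizer.fit_transform(turns)     # turns x vocabulary matrix
```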

34 Feature Types (3)
Other Automatic Features: available from ITSPOKE logs
- Turn Begin Time (seconds from dialogue start)
- Turn End Time (seconds from dialogue start)
- Is Temporal Barge-in (student turn begins before tutor turn ends)
- Is Temporal Overlap (student turn begins and ends within the tutor turn)
- Number of Words in Turn
- Number of Syllables in Turn
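These can be derived directly from turn timestamps in the logs. A hedged sketch with a hypothetical signature (times in seconds from dialogue start; `t_end` is when the preceding tutor turn ends):

```python
def automatic_features(s_begin, s_end, t_end, n_words, n_syllables):
    """Other automatic features for one student turn, from log timestamps."""
    return {
        "turn_begin_time": s_begin,
        "turn_end_time": s_end,
        # student starts speaking before the tutor finishes
        "is_temporal_barge_in": s_begin < t_end,
        # student turn lies entirely within the tutor turn
        "is_temporal_overlap": s_begin < t_end and s_end <= t_end,
        "n_words": n_words,
        "n_syllables": n_syllables,
    }
```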

35 Feature Types (4)
Manual Features: (currently) available only from human transcription
- Is Prior Tutor Question (tutor turn contains "?")
- Is Student Question (student turn contains "?")
- Is Semantic Barge-in (student turn begins at a tutor word/pause boundary)
- Number of Hedging/Grounding Phrases (e.g. "mm-hm", "um")
- Is Grounding (canonical phrase turns not preceded by a tutor question)
- Number of False Starts in Turn (e.g. acc-acceleration)