Mihai Rotaru Diane J. Litman DoD Group Meeting Presentation

Presentation transcript:

Using Word-level Features to Better Predict Student Emotions during Spoken Tutoring Dialogues
Mihai Rotaru, Diane J. Litman
DoD Group Meeting Presentation

Introduction
- Why is it important to detect/handle emotions?
- Emotion annotation
- Classification task
- Previous work

(Spoken) Tutoring dialogues
- Education
  - Classroom setting
  - Human (one-on-one) tutoring
  - Computer tutoring (ITS – Intelligent Tutoring Systems)
- Addressing the learning gap between human and computer tutoring
  - Dialogue-based ITS (Ex: Why2)
  - Improve the language understanding module of ITS
  - Incorporate affective reasoning
    - Connection between learning and student emotional state
    - Adding human-provided emotional scaffolding to a reading tutor increases student persistence (Aist et al., 2002)
- Improve understanding not only of what the student says but also of how he says it

Human-Computer Excerpt
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)
KVL: This text will not be visible on the 9th floor setup. How about just including the student turns and not the tutor turns on the slide?

Affective reasoning
- Prerequisites
  - Dialogue-based ITS: Why2
  - Interaction via speech: ITSPOKE (Intelligent Tutoring SPOKEn dialogue system)
- Affective reasoning
  - Detect student emotions
  - Handle student emotions

Back-end is Why2-Atlas system (VanLehn et al., 2002) Sphinx2 speech recognition and Cepstral text-to-speech


Student emotions
- Emotion annotation
  - Perceived, intuitive expressions of emotion
  - Relative to other turns in context and tutoring task
  - 3 main emotion classes
    - Negative: e.g. uncertain, bored, irritated, confused, sad (question turns)
    - Positive: e.g. confident, enthusiastic
    - Neutral: no strong expression of negative or positive emotion (grounding turns)
- Corpora
  - Human-Human (453 student turns from 10 dialogues)
  - Human-Computer (333 student turns from 15 dialogues)

Annotation example
Tutor: Uh let us talk of one car first.
Student: ok. (EMOTION = NEUTRAL)
Tutor: If there is a car, what is it that exerts force on the car such that it accelerates forward?
Student: The engine. (EMOTION = POSITIVE)
Tutor: Uh well engine is part of the car, so how can it exert force on itself?
Student: um… (EMOTION = NEGATIVE)
Student: 34-36 Tutor: 27-29

Classification task
- 3 levels of annotation granularity
  - NPN: Negative, Positive, Neutral
  - NnN: Negative, Non-Negative
    - positives and neutrals are conflated as Non-Negative
  - EnE: Emotional, Non-Emotional
    - negatives and positives are conflated as Emotional
    - neutrals are Non-Emotional
    - useful for triggering system adaptation (HH corpus analysis)
- Agreed subset
- Predict the class of each student turn
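The conflation of the three-way labels into the two coarser schemes can be sketched as below. This is only an illustration: the label strings and function names are hypothetical, since the slide gives the class groupings but no encoding.

```python
# Hypothetical label strings; the slide only names the classes.
NPN_LABELS = ("negative", "positive", "neutral")


def to_nnn(label: str) -> str:
    """NnN: positives and neutrals are conflated as non-negative."""
    if label not in NPN_LABELS:
        raise ValueError(f"unknown NPN label: {label!r}")
    return "negative" if label == "negative" else "non-negative"


def to_ene(label: str) -> str:
    """EnE: negatives and positives are conflated as emotional."""
    if label not in NPN_LABELS:
        raise ValueError(f"unknown NPN label: {label!r}")
    return "non-emotional" if label == "neutral" else "emotional"


def conflate(labels, scheme):
    """Map a list of NPN labels to the requested scheme ('NPN', 'NnN' or 'EnE')."""
    mappers = {"NPN": lambda x: x, "NnN": to_nnn, "EnE": to_ene}
    return [mappers[scheme](label) for label in labels]
```

For the annotation example (neutral, positive, negative), `conflate(..., "EnE")` would mark only the first turn as non-emotional.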

Previous work - Features
Human-Human
- 5 feature types
  - Acoustic-prosodic: amplitude, pitch, duration
  - Lexical
  - Other automatic
  - Manual
  - Identifiers
- Combinations
- Current turn
- Contextual
  - Local: previous two turns
  - Global: all turns so far
Human-Computer
- 3 feature types
  - Acoustic-prosodic: amplitude, pitch, duration
  - Lexical
  - Other automatic
  - Manual
  - Identifiers
- Combinations

Previous work - Results Litman and Forbes, ACL 2004

How to improve?
- Use word-level features instead of turn-level features
- Extend the pitch feature set
- Simplified word-level emotion model

Why word-level features?
- Emotion might not be expressed over the entire turn
- Example: “This is great” (figure labels: Angry, Happy)

Why word-level features? (2)
- Can approximate the pitch contour better at sub-turn levels, especially for longer turns
- Example: “This is great”

Extended pitch feature set
- Previous work
  - Min, Max
  - Avg, Stdev
- Extend with
  - Start, End
  - Regression coefficient and regression error
  - Quadratic regression coefficient
- from Batliner et al. 2003
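A minimal sketch of how these nine pitch features might be computed from one pitch contour (a word or a turn). Several details are assumptions not stated on the slide: the time axis is normalised to [0, 1], the standard deviation is the population form, the regression error is taken as the RMS residual of the linear fit, and the function names are hypothetical.

```python
import math


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients lowest order first. Needs at least degree + 1
    distinct x values, otherwise the system is singular.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def pitch_features(f0):
    """f0: pitch values in time order (at least 3 frames)."""
    n = len(f0)
    xs = [i / (n - 1) for i in range(n)]  # assumed normalised time axis
    mean = sum(f0) / n
    stdev = math.sqrt(sum((v - mean) ** 2 for v in f0) / n)
    intercept, slope = fit_polynomial(xs, f0, 1)
    residuals = [v - (intercept + slope * x) for x, v in zip(xs, f0)]
    return {
        "min": min(f0), "max": max(f0), "avg": mean, "stdev": stdev,
        "start": f0[0], "end": f0[-1],
        "reg_coef": slope,
        "reg_error": math.sqrt(sum(r * r for r in residuals) / n),
        "quad_coef": fit_polynomial(xs, f0, 2)[2],
    }
```

A steadily rising contour yields a positive `reg_coef` with near-zero `reg_error`, while a rise-fall shape shows up mainly in `quad_coef`, which is the kind of contour information that min/max/avg/stdev alone cannot capture.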

But wait… ?
[Diagram. Turn-level: student turn -> features -> machine learning -> turn emotional class. Word-level: word 1 … word n -> features for each word -> machine learning -> ? -> turn emotional class.]
Sönmez et al., 1998

Word-level emotion model
[Diagram. Turn-level: student turn -> features -> machine learning -> turn emotional class. Word-level: word 1 … word n -> features for each word -> word-level emotion for each word -> turn emotional class.]

Word-level emotion model
- Training phase
  - Each word labeled with turn class
  - Extra features to identify the position of the word in the turn (distance in words from the beginning and end of the turn)
  - Learn emotion model at the word level
- Test phase
  - Predict each word class based on the learned model
  - Use majority/weighted voting to label the turn based on its word classes
  - Ties are broken randomly
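The two phases can be sketched as follows. The word-level classifier itself is left out; the sketch only shows how training rows inherit the turn label with positional features, and how word predictions are voted back into a turn label. Function names are hypothetical, and the source of the weights for weighted voting is an assumption (the slide does not say what the weights are).

```python
import random
from collections import defaultdict


def word_training_rows(words, turn_label):
    """Training phase: each word gets the turn's class plus its position."""
    n = len(words)
    return [
        {"word": w, "from_start": i, "from_end": n - 1 - i, "label": turn_label}
        for i, w in enumerate(words)
    ]


def vote_turn_label(word_labels, weights=None, rng=random):
    """Test phase: label the turn by majority (or weighted) vote.

    weights: optional per-word weights (assumed here to be something like
    classifier confidence). Ties are broken randomly.
    """
    if weights is None:
        weights = [1.0] * len(word_labels)
    totals = defaultdict(float)
    for label, weight in zip(word_labels, weights):
        totals[label] += weight
    best = max(totals.values())
    tied = sorted(label for label, score in totals.items() if score == best)
    return rng.choice(tied)
```

With unit weights this is plain majority voting, so a three-word turn predicted as emotional, emotional, non-emotional is labeled emotional.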

Questions to answer
- Will word-level features work better than turn-level features for emotion prediction?
  - Yes
- If yes, where does the advantage come from?
  - Better prediction of longer turns
- Is there a feature set that offers robust performance?
  - Yes. Combination of pitch and lexical features at word level.

Experiments
- EnE classification, agreed turns
- Two contrasting corpora
- Two contrasting learners (WEKA)
  - IB1: nearest neighbor classifier
  - ADA: boosted decision trees

Feature sets
- Only pitch and lexical features
- 6 sets of features
  - Turn level:
    - Lex-Turn: only lexical
    - Pitch-Turn: only pitch
    - PitchLex-Turn: lexical and prosodic
  - Word level:
    - Lex-Word: only lexical + positional
    - Pitch-Word: only pitch + positional
    - PitchLex-Word: lexical and prosodic + positional
- Baseline: majority class
- 10 x 10 cross validation
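A small sketch of the evaluation setup named above: the majority-class baseline and a 10 x 10 cross-validation split generator. It assumes that "10 x 10" means ten repetitions of 10-fold cross validation and that folds are not stratified; neither detail is stated on the slide, and the function names are hypothetical.

```python
import random
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def repeated_kfold(n_items, k=10, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for `repeats` runs of k-fold CV.

    Assumes n_items >= k so that no fold is empty.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_items))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test
```

When the unit of classification is the word, splits would presumably need to be made over turns rather than words, so that words from one turn never appear in both training and test data; that is an inference, not something the slides state.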

Results – IB1 on HH
- Word-level features significantly outperform turn-level features
- Word-level better than turn-level on longer turns
- Best performers: Lex-Word, PitchLex-Word

Results – ADA on HH
- Turn-level performance increases a lot
- Word-level significantly better than turn-level on feature sets with pitch
- Word-level better than turn-level on longer turns, but the difference is smaller
- Best performers: Lex-Turn, Lex-Word, PitchLex-Word

Results – IB1 on HC
- Word-level features significantly outperform turn-level features
- Lexical information less helpful than on HH corpus
- Word-level better than turn-level on longer turns
- Best performers: Pitch-Word, PitchLex-Word

Results – ADA on HC
- Difference not significant anymore
- IB1 better than ADA on word-level features
- ADA has bigger variance on this corpus
- Word-level better than turn-level on longer turns, but the difference is smaller
- Best performers: Pitch-Turn, Pitch-Word, PitchLex-Turn, PitchLex-Word

Discussion
- Lexical features at turn and word-level are similar
- Performance dependent on corpus and learner
- Pitch features differ significantly
- Word-level better than turn-level (4/6)
- PitchLex-Word a consistent best performer
- Our best accuracies comparable with previous work

Conclusions & Future work
- Word-level better than turn-level for emotion prediction
  - Even under a very simple word-level emotion model
  - Word-level better at predicting longer turns
- PitchLex-Word a consistent best performer
- Future work:
  - More refined word-level emotion models
    - HMMs
    - Co-training
  - Filter irrelevant words
  - Use the prosodic information left out
  - See if our conclusions generalize on detecting student uncertainty
  - Experiment with other sub-turn units (breath groups)

Feature Extraction per Student Turn
- Five feature types
  - acoustic-prosodic (1)
  - non acoustic-prosodic
    - lexical (2)
    - other automatic (3)
    - manual (4)
    - identifiers (5)
- Research questions
  - utility of different features
  - speaker and task dependence

Feature Types (1)
Acoustic-Prosodic Features (normalized)
- 4 pitch (f0): max, min, mean, standard dev.
- 4 energy (RMS): max, min, mean, standard dev.
- 4 temporal:
  - turn duration (seconds)
  - pause length preceding turn (seconds)
  - tempo (syllables/second)
  - internal silence in turn (zero f0 frames)
→ available to ITSPOKE in real time
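The four temporal features could be derived from turn timestamps and the f0 track roughly as below. This is a sketch under assumptions: the function and parameter names are hypothetical, internal silence is expressed as the fraction of frames with zero f0 (the slide does not say whether it is a count or a proportion), and the normalization step mentioned in the slide is not shown.

```python
def temporal_features(f0_frames, n_syllables, turn_start, turn_end, prev_turn_end):
    """Temporal features for one student turn.

    f0_frames: per-frame pitch values, 0 where no pitch was detected.
    Times are in seconds. prior_pause can be negative if the student
    starts speaking before the previous turn has ended.
    """
    duration = turn_end - turn_start
    zero_frames = sum(1 for value in f0_frames if value == 0)
    return {
        "duration": duration,
        "prior_pause": turn_start - prev_turn_end,
        "tempo": n_syllables / duration,                   # syllables/second
        "internal_silence": zero_frames / len(f0_frames),  # assumed: fraction
    }
```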

Feature Types (2)
Lexical Items
- word occurrence vector
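One plausible reading of a word occurrence vector is a binary bag-of-words over the corpus vocabulary. The sketch below assumes lowercasing and whitespace tokenization, neither of which is specified in the slide, and uses hypothetical function names.

```python
def build_vocabulary(turns):
    """Sorted list of all distinct (lowercased) words in the given turns."""
    return sorted({word for turn in turns for word in turn.lower().split()})


def occurrence_vector(turn, vocabulary):
    """1 if the vocabulary word occurs in the turn, else 0."""
    present = set(turn.lower().split())
    return [1 if word in present else 0 for word in vocabulary]
```

For a vocabulary built from the turns "the engine" and "um", the turn "The engine" maps to `[1, 1, 0]` (engine, the, um).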

Feature Types (3)
Other Automatic Features: available from ITSPOKE logs
- Turn Begin Time (seconds from dialog start)
- Turn End Time (seconds from dialog start)
- Is Temporal Barge-in (student turn begins before tutor turn ends)
- Is Temporal Overlap (student turn begins and ends in tutor turn)
- Number of Words in Turn
- Number of Syllables in Turn
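The two temporal flags follow directly from the definitions in parentheses. A minimal sketch, assuming all times are seconds from dialog start and using hypothetical function names:

```python
def is_temporal_barge_in(student_start, tutor_end):
    """Student turn begins before the tutor turn ends."""
    return student_start < tutor_end


def is_temporal_overlap(student_start, student_end, tutor_start, tutor_end):
    """Student turn both begins and ends within the tutor turn."""
    return tutor_start <= student_start and student_end <= tutor_end
```

Under these definitions every overlap that starts strictly before the tutor finishes is also a barge-in, but not the reverse: a student who starts early and keeps talking after the tutor stops is a barge-in only.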

Feature Types (4)
Manual Features: (currently) available only from human transcription
- Is Prior Tutor Question (tutor turn contains “?”)
- Is Student Question (student turn contains “?”)
- Is Semantic Barge-in (student turn begins at tutor word/pause boundary)
- Number of Hedging/Grounding Phrases (e.g. “mm-hm”, “um”)
- Is Grounding (canonical phrase turns not preceded by a tutor question)
- Number of False Starts in Turn (e.g. acc-acceleration)