Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues
Diane Litman, Heather Friedberg, Kate Forbes-Riley
University of Pittsburgh, Pittsburgh, PA USA

Slide 2: Outline
- Background
- ITSPOKE: A Spoken Dialogue System for STEM
- Extracting Prosodic Features from Affect Annotations
- Characterizing and Predicting Affect
- Conclusions & Current Directions

Slide 3: Background
- Tutorial dialogue systems for STEM domains may close the gap between human and computer tutors
  - Dialogue is a natural, hands-free interaction modality
  - Only a few computer tutors are dialogue-based (Forbes-Riley & Litman 2011; D'Mello et al. 2010; Pon-Barry et al. 2006)
- Performance can be further improved by responding to affect
  - Focus to date has been on affect common in customer-care and information-seeking applications (e.g., annoyance and frustration (Ang et al. 2002))
  - Less research on student affect that occurs in tutoring (e.g., boredom, confusion, flow (D'Mello et al. 2010))

Slide 4: This Paper
- Speech-based detection of student uncertainty and disengagement in qualitative physics tutorial dialogue
  - Both states correlate negatively with student learning and user satisfaction (Forbes-Riley & Litman 2012)
  - Both states are a focus of speech and language research (e.g., Pon-Barry & Shieber 2011; Schuller et al. 2010; Paek & Ju 2008)
- Compare and contrast the role of prosody in characterization and prediction
  - UNC (uncertain) versus CER (certain)
  - DISE (disengaged) versus ENG (engaged)

5 Outline Background ITSPOKE: A Spoken Dialogue System for STEM Extracting Prosodic Features from Affect Annotations Characterizing and Predicting Affect Conclusions & Current Directions

Slides 6-7: Spoken Dialogue Computer Tutor
ITSPOKE: a speech-enhanced and revised version of the Why2-Atlas qualitative physics tutor (VanLehn, Jordan, Rosé et al. 2002)

Slide 8: Spoken Dialogue Corpus
- Collected in a user study evaluating the utility of detecting and adapting to uncertainty (Forbes-Riley & Litman 2011)
- 7216 turns in 432 dialogues from 72 native English-speaking college students with no college physics (6 dialogues per student)
- Turns labeled by 1 annotator; agreement studies in ITSPOKE corpora are on par with prior work (cf. D'Mello et al., 2008)

  Turn Label              Total   Percent   Kappa
  Disengaged (DISE)        1170      16%      .55
  Uncertain (UNC)          1483      21%      .62
  Uncertain+Disengaged      373       5%      --
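
For reference, the kappa values above are chance-corrected agreement scores; in this literature the statistic is standardly Cohen's kappa, computed from observed agreement p_o and chance agreement p_e:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

By the usual Landis & Koch reading, .55 and .62 indicate moderate to substantial agreement, well above chance.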

Slide 9: Annotated Dialogue Example
ITSPOKE 1: Let's begin by looking at the motion of the man and his keys while he's holding them. How does his velocity compare to that of his keys?
Student 1: same same same [Disengaged, Certain]
…
ITSPOKE 12: What are the forces exerted on the man after he releases his keys?
Student 12: gravity?? [Engaged, Uncertain]

Slide 10: Annotation Distribution over Time
- UNC is highest at the beginning of each dialogue

Slide 11: Annotation Distribution over Time
- UNC is highest at the beginning of each dialogue
- DISE increases as the session progresses

Slide 12: Observations
- Student uncertainty (UNC) and disengagement (DISE) are common in ITSPOKE dialogues
- Different features and models will be needed to best characterize and predict UNC/CER and DISE/ENG

Slide 13: Outline
- Background
- ITSPOKE: A Spoken Dialogue System for STEM
- Extracting Prosodic Features from Affect Annotations
- Characterizing and Predicting Affect
- Conclusions & Current Directions

Slide 14: Prosodic Features
From the speech file of each student turn:

  Feature Type   Features (normalized)
  Temporal       turn duration, prior pause duration
  Pitch          max, min, mean, std. deviation
  Energy         max, min, mean, std. deviation

- Experiments with other real-time openSMILE toolkit features (cf. Interspeech Paralinguistic Challenge, 2011) have yielded no performance improvements to date
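
A minimal sketch of how such turn-level features could be computed, assuming librosa for the pitch and energy tracks; the slide does not name the extraction toolkit, and it reports normalized features (per-speaker normalization is omitted here), so this illustrates the feature definitions rather than the authors' pipeline:

```python
import numpy as np
import librosa

def prosodic_features(wav_path, prior_pause_sec):
    """Temporal, pitch, and energy features for one student turn.

    prior_pause_sec is assumed to come from the dialogue logs (time between
    the end of the tutor turn and the start of the student turn).
    """
    y, sr = librosa.load(wav_path, sr=None)

    # Temporal: turn duration in seconds, plus the pause before the turn.
    duration = len(y) / sr

    # Pitch (f0) track; unvoiced frames come back as NaN and are dropped.
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                            fmax=librosa.note_to_hz('C6'), sr=sr)
    f0 = f0[~np.isnan(f0)]

    # Energy as the frame-wise RMS of the waveform.
    rms = librosa.feature.rms(y=y)[0]

    return {
        'turn_duration': duration, 'prior_pause': prior_pause_sec,
        'f0_max': f0.max(), 'f0_min': f0.min(),
        'f0_mean': f0.mean(), 'f0_std': f0.std(),
        'rms_max': rms.max(), 'rms_min': rms.min(),
        'rms_mean': rms.mean(), 'rms_std': rms.std(),
    }
```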

Slide 15: Outline
- Background
- ITSPOKE: A Spoken Dialogue System for STEM
- Extracting Prosodic Features from Affect Annotations
- Characterizing and Predicting Affect
- Conclusions & Current Directions

Slide 16: Descriptive Analysis
Hypothesis: prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns
- For each student, for each feature: calculate the mean over UNC, CER, DISE, and ENG turns
- Paired t-tests across students on the feature means (see the sketch below):
  - UNC versus CER
  - DISE versus ENG
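
The per-student aggregation keeps the test paired: each student contributes one mean per label, and students missing either label are dropped. A minimal sketch, assuming the extracted features sit in a pandas DataFrame with illustrative columns 'student', 'label', and one column per feature:

```python
import pandas as pd
from scipy import stats

def paired_ttest(turns: pd.DataFrame, feature: str, label_a: str, label_b: str):
    """Paired t-test on per-student feature means for two affect labels."""
    # One mean per (student, label) pair, pivoted so labels become columns.
    means = (turns.groupby(['student', 'label'])[feature]
                  .mean()
                  .unstack('label'))
    # Keep only students who produced turns with both labels.
    paired = means[[label_a, label_b]].dropna()
    return stats.ttest_rel(paired[label_a], paired[label_b])

# e.g. t, p = paired_ttest(turns, 'prior_pause', 'CER', 'UNC')
```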

Slide 17: Temporal Descriptive Analysis
Significant (*) prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns

  Temporal Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
  Turn duration
  Prior pause              -1.66*                 -3.08**

Students take significantly longer to answer when DISE versus ENG, and when UNC versus CER

Slide 18: Temporal Descriptive Analysis
Significant (*) prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns

  Temporal Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC   Sig. Diff Across Affect
  Turn duration
  Prior pause              -1.66*                 -3.08**                    *

Difference significantly greater for UNC/CER than DISE/ENG

Slide 19: Pitch Descriptive Analysis

  Pitch Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
  max f0                 10.91*                  9.97*
  min f0
  mean f0                 4.76*                  4.91*
  std. dev. f0            2.89*                  5.18*

Students have lower max and mean pitch, and pitch is more constant, when DISE versus ENG, and when UNC versus CER

Slide 20: Pitch Descriptive Analysis

  Pitch Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC   Sig. Diff Across Affect
  max f0                 10.91*                  9.97*
  min f0
  mean f0                 4.76*                  4.91*
  std. dev. f0            2.89*                  5.18*                     *

Difference in pitch constancy is significantly greater for UNC/CER than DISE/ENG

Slide 21: Energy Descriptive Analysis

  Energy Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
  max RMS                                           *
  min RMS                <.001*                     *
  mean RMS                                          *
  std. dev. RMS           .001*                   .003*

Students have lower min energy, and energy is more constant, when DISE versus ENG, and when UNC versus CER

Slide 22: Energy Descriptive Analysis

  Energy Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
  max RMS                                           *
  min RMS                <.001*                     *
  mean RMS                                          *
  std. dev. RMS           .001*                   .003*

Only UNC turns are softer than CER turns

Slide 23: Energy Descriptive Analysis

  Energy Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC   Sig. Diff Across Affect
  max RMS                                           *
  min RMS                <.001*                     *
  mean RMS                                          *
  std. dev. RMS           .001*                   .003*                    *

Difference in energy constancy is greater for UNC/CER

Slide 24: Affect Prediction
Hypothesis: prosodic features have differing combined utility for predicting UNC/CER and DISE/ENG turns
- Machine learning of the manual labels with the WEKA software
  - J48 decision tree algorithm
  - Cost matrix penalizes labeling true DISE/UNC as false
- Analyses (see the sketch below)
  - Feature usage: % of decisions for which a feature is queried in the learned models
  - Unweighted avg precision / recall via 10-fold cross-validation
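
This setup could be approximated outside WEKA as follows; a sketch using scikit-learn's DecisionTreeClassifier, where class_weight stands in for J48's cost matrix (the weight of 4 on the rarer state is an illustrative assumption, not a value from the paper) and macro-averaged precision/recall corresponds to the slide's "unweighted avg":

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score

def evaluate(X, y, rare_label):
    """10-fold CV with a cost-sensitive tree; returns macro precision/recall."""
    # Up-weight the rarer affect state so missing true DISE/UNC turns is
    # penalized, mirroring the cost matrix described on the slide.
    weights = {lab: (4.0 if lab == rare_label else 1.0) for lab in np.unique(y)}
    clf = DecisionTreeClassifier(class_weight=weights)
    pred = cross_val_predict(clf, X, y, cv=10)
    return (precision_score(y, pred, average='macro'),
            recall_score(y, pred, average='macro'))

def feature_usage(clf, feature_names):
    """Share of internal (decision) nodes that query each feature -- a rough
    analogue of the slide's 'feature usage' statistic."""
    nodes = clf.tree_.feature              # leaf nodes are marked with -2
    internal = nodes[nodes >= 0]
    return {name: float(np.mean(internal == i))
            for i, name in enumerate(feature_names)}

# e.g. p, r = evaluate(X, y, 'UNC'); clf.fit(X, y); feature_usage(clf, names)
```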

Slide 25: Temporal Feature Usage

  Feature          Uncertainty   Disengaged
  Temporal             50%           72%
    turn duration       0%           23%
    prior pause        50%           49%

- In both models, temporal features are the most highly queried; prior pause is the root of both trees
- The DISE model uses temporal features more heavily than the UNC model
- Only the DISE model includes turn duration, which was not discriminative in isolation (for either state)

Slide 26: Pitch Feature Usage

  Feature          UNC    DISE
  Pitch            34%    16%
    max f0          9%     4%
    min f0         19%     4%
    mean f0         0%     8%
    std. dev. f0    6%     0%

- The UNC model uses pitch features more heavily than the DISE model, and in different relative proportions
- Min f0 isn't discriminative in isolation for either state, but is included in both predictive models
- Std. dev. and mean f0 are discriminative in isolation for both states, but are not included in both predictive models

Slide 27: Energy Feature Usage

  Feature           UNC    DISE
  Energy            15%    11%
    max RMS          0%     0%
    min RMS          6%     1%
    mean RMS         9%     4%
    std. dev. RMS    0%     6%

- The UNC and DISE models use energy features in different relative proportions
- Max RMS is discriminative in isolation for UNC, but isn't included in either predictive model
- Std. dev. RMS is discriminative in isolation for both states, but is only included in the DISE model

Slide 28: Quantitative Results
- UNC/CER: unweighted avg precision = 63%, recall = 61%
- DISE/ENG: unweighted avg precision = 61%, recall = 56%
- Majority class baselines:
  - unweighted avg precision = 40%, recall = 50% (CER)
  - unweighted avg precision = 42%, recall = 50% (ENG)
- NOTE: the deployed ITSPOKE model adds non-prosodic features, which further improves performance
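
As a sanity check, the baseline numbers follow from the label distribution on slide 8, under the usual convention that the precision of the never-predicted class counts as zero. A classifier that always predicts CER (79% of turns, since UNC is 21%) gets:

```latex
\text{precision} = \tfrac{1}{2}(0.79 + 0) \approx 40\%,
\qquad
\text{recall} = \tfrac{1}{2}(1.00 + 0) = 50\%
```

The ENG baseline works the same way: DISE is 16% of turns, so always predicting ENG yields precision (0.84 + 0)/2 = 42% and recall 50%.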

Slide 29: Outline
- Background
- ITSPOKE: A Spoken Dialogue System for STEM
- Extracting Prosodic Features from Affect Annotations
- Characterizing and Predicting Affect
- Conclusions & Current Directions

Slide 30: Conclusions
- Uncertain and disengaged turns differ prosodically from certain or engaged turns, but the differences depend somewhat on the affect dimension
  - Disengaged turns have longer response times, lower pitch values, and less pitch and energy variation than engaged turns
  - Uncertain turns are also not as loud as certain turns
- The best combination of prosodic features for affect prediction also depends on the affect dimension
  - Temporal features are most prominent in both models, but the DISE model uses them more heavily than the UNC model, while the UNC model uses pitch features more heavily than the DISE model

Slide 31: Current Directions
- Replicate the prosodic analyses on other corpora to explore whether our findings generalize to other domains
  - Level of Interest (Schuller et al. 2010)
  - Uncertainty (Pon-Barry and Shieber 2011)
- Implement the best predictive models in ITSPOKE to evaluate the utility of detecting and adapting to uncertainty and disengagement

Slide 32: Questions? Further Information? Thank You!