1 Prosodic Cues to Disengagement and Uncertainty in Physics Tutorial Dialogues
Diane Litman, Heather Friedberg, Kate Forbes-Riley
University of Pittsburgh, Pittsburgh, PA USA
2 Outline
Background
ITSPOKE: A Spoken Dialogue System for STEM
Extracting Prosodic Features from Affect Annotations
Characterizing and Predicting Affect
Conclusions & Current Directions
3 Background
Tutorial dialogue systems for STEM domains may close the gap between human and computer tutors
Dialogue is a natural and hands-free interaction modality
Only a few computer tutors are dialogue-based (Forbes-Riley & Litman 2011; D'Mello et al. 2010; Pon-Barry et al. 2006)
Performance can be further improved by responding to affect
Focus has been on affect common in customer-care and information-seeking applications (e.g., annoyance and frustration (Ang et al. 2002))
Less research on student affect that occurs in tutoring (e.g., boredom, confusion, flow (D'Mello et al. 2010))
4 This Paper
Speech-based detection of student uncertainty and disengagement in qualitative physics tutorial dialogue
Both states negatively correlate with student learning and user satisfaction (Forbes-Riley & Litman 2012)
Both states are a focus of speech and language research (e.g., Pon-Barry & Shieber 2011; Schuller et al. 2010; Paek & Ju 2008)
Compare and contrast the role of prosody in characterization and prediction
UNC (uncertain) versus CER (certain)
DISE (disengaged) versus ENG (engaged)
5 Outline
Background
ITSPOKE: A Spoken Dialogue System for STEM
Extracting Prosodic Features from Affect Annotations
Characterizing and Predicting Affect
Conclusions & Current Directions
6 Spoken Dialogue Computer Tutor
ITSPOKE: speech-enhanced and revised version of the Why2-Atlas qualitative physics tutor (VanLehn, Jordan, Rose et al. 2002)
8 Spoken Dialogue Corpus
Collected in a user study evaluating the utility of detecting and adapting to uncertainty (Forbes-Riley & Litman 2011)
7216 turns, 432 dialogues, 72 native English college students with no college physics (6 dialogues per student)
Turns labeled by 1 annotator; agreement studies in ITSPOKE corpora are on par with prior work (c.f. D'Mello et al., 2008)

Turn Label              Total   Percent   Kappa
Disengaged (DISE)       1170    16%       .55
Uncertain (UNC)         1483    21%       .62
Uncertain+Disengaged     373     5%       --
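The kappa values reported above can be reproduced with a short Cohen's kappa computation: observed agreement corrected for the agreement two annotators would reach by chance. The two label sequences below are invented for illustration, not the ITSPOKE annotations.

```python
def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(a) == len(b)
    n = len(a)
    labels = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    p_chance = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical labels from two annotators over six turns:
ann1 = ["DISE", "ENG", "ENG", "DISE", "ENG", "ENG"]
ann2 = ["DISE", "ENG", "DISE", "DISE", "ENG", "ENG"]
kappa = round(cohens_kappa(ann1, ann2), 2)
```

With 5/6 observed agreement and 0.5 chance agreement, this toy pair of annotators lands at kappa = 0.67, in the same range as the agreement figures in the table.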
9 Annotated Dialogue Example
ITSPOKE 1: Let's begin by looking at the motion of the man and his keys while he's holding them. How does his velocity compare to that of his keys?
Student 1: same same same [Disengaged, Certain]
…
ITSPOKE 12: What are the forces exerted on the man after he releases his keys?
Student 12: gravity?? [Engaged, Uncertain]
10 Annotation Distribution over Time
UNC is highest at beginning of each dialogue
DISE increases as session progresses
12 Observations
Student uncertainty (UNC) and disengagement (DISE) are common in ITSPOKE dialogues
Different features and models will be needed to best characterize and predict UNC/CER and DISE/ENG
13 Outline
Background
ITSPOKE: A Spoken Dialogue System for STEM
Extracting Prosodic Features from Affect Annotations
Characterizing and Predicting Affect
Conclusions & Current Directions
14 Prosodic Features
From the speech file of each student turn:

Feature Type   Features (normalized)
Temporal       turn duration, prior pause duration
Pitch          max, min, mean, std. deviation
Energy         max, min, mean, std. deviation

Experiments with other real-time openSMILE toolkit features (c.f. Interspeech Paralinguistic Challenge, 2011) have yielded no performance improvements to date
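A rough sketch of this kind of per-turn extraction, in NumPy, with a crude autocorrelation pitch tracker standing in for a production one. The 200 Hz sine below stands in for a recorded student turn; prior pause duration is omitted because it comes from dialogue timestamps, not the turn's audio.

```python
import numpy as np

SR = 16000      # sample rate (Hz)
FRAME = 512     # analysis frame size (samples)

def frame_signal(x, size=FRAME):
    """Chop a waveform into non-overlapping frames."""
    n = len(x) // size
    return x[: n * size].reshape(n, size)

def frame_rms(frames):
    """Per-frame energy as root-mean-square amplitude."""
    return np.sqrt((frames ** 2).mean(axis=1))

def frame_f0(frames, sr=SR, fmin=75.0, fmax=400.0):
    """Crude per-frame f0: pick the autocorrelation peak in the speech range."""
    lo, hi = int(sr / fmax), int(sr / fmin)
    f0s = []
    for fr in frames:
        fr = fr - fr.mean()
        ac = np.correlate(fr, fr, mode="full")[len(fr) - 1 :]
        lag = lo + np.argmax(ac[lo : hi + 1])
        f0s.append(sr / lag)
    return np.array(f0s)

def stats(v):
    """The max/min/mean/std summary used for pitch and energy."""
    return {"max": v.max(), "min": v.min(), "mean": v.mean(), "std": v.std()}

# Synthetic one-second, 200 Hz "turn" standing in for recorded speech:
t = np.arange(SR) / SR
signal = 0.5 * np.sin(2 * np.pi * 200.0 * t)
frames = frame_signal(signal)
features = {"duration": len(signal) / SR,
            "energy": stats(frame_rms(frames)),
            "pitch": stats(frame_f0(frames))}
```

On the sine input the tracker recovers a mean f0 of about 200 Hz and an RMS near 0.35, so the summary statistics behave as expected before being normalized per speaker.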
15 Outline
Background
ITSPOKE: A Spoken Dialogue System for STEM
Extracting Prosodic Features from Affect Annotations
Characterizing and Predicting Affect
Conclusions & Current Directions
16 Descriptive Analysis
Hypothesis: prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns
For each student, for each feature: calculate mean over UNC, CER, DISE, ENG turns
Paired t-tests across students for feature means: UNC versus CER; DISE versus ENG
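The per-student comparison can be sketched as follows. The numbers are hypothetical per-student means of one feature (prior pause, in seconds), not the paper's data, and a real analysis would also look up the p-value for the statistic (e.g. with scipy.stats.ttest_rel).

```python
import math

def paired_t(xs, ys):
    """Paired t statistic over matched per-student feature means."""
    n = len(xs)
    diffs = [x - y for x, y in zip(xs, ys)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Hypothetical per-student mean prior-pause durations (seconds):
eng_means = [1.2, 0.9, 1.5, 1.1, 1.3]    # mean over each student's ENG turns
dise_means = [2.8, 2.5, 3.1, 2.6, 3.0]   # mean over each student's DISE turns
t_stat = paired_t(eng_means, dise_means)  # large negative: DISE pauses longer
```

Pairing by student, rather than pooling turns, controls for the large between-speaker variation in prosody; a strongly negative t here would mirror the significant prior-pause differences reported on the next slides.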
17 Temporal Descriptive Analysis
Significant (*) prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns

Temporal Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
Turn duration        .08                   -.03
Prior pause        -1.66*                 -3.08**

Students take significantly longer to answer when DISE versus ENG, and when UNC versus CER
18 Temporal Descriptive Analysis
Significant (*) prosodic differences exist between UNC versus CER turns, and between DISE versus ENG turns

Temporal Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC   Sig. Diff Across Affect
Turn duration        .08                   -.03
Prior pause        -1.66*                 -3.08**                *

Difference significantly greater for UNC/CER than DISE/ENG
19 Pitch Descriptive Analysis

Pitch Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
max f0          10.91*                  9.97*
min f0           1.15                   1.25
mean f0          4.76*                  4.91*
std. dev. f0     2.89*                  5.18*

Students have lower max and mean pitch, and pitch is more constant, when DISE versus ENG, and when UNC versus CER
20 Pitch Descriptive Analysis

Pitch Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC   Sig. Diff Across Affect
max f0          10.91*                  9.97*
min f0           1.15                   1.25
mean f0          4.76*                  4.91*
std. dev. f0     2.89*                  5.18*                 *

Difference in pitch constancy is significantly greater for UNC/CER than DISE/ENG
21 Energy Descriptive Analysis

Energy Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
max RMS           .005                   .011*
min RMS          <.001*                 <.001*
mean RMS          .001                   .002*
std. dev. RMS     .001*                  .003*

Students have lower min energy, and energy is more constant, when DISE versus ENG, and when UNC versus CER
22 Energy Descriptive Analysis

Energy Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC
max RMS           .005                   .011*
min RMS          <.001*                 <.001*
mean RMS          .001                   .002*
std. dev. RMS     .001*                  .003*

Only UNC turns are softer than CER turns
23 Energy Descriptive Analysis

Energy Feature   Mean Diff ENG - DISE   Mean Diff CER - UNC   Sig. Diff Across Affect
max RMS           .005                   .011*
min RMS          <.001*                 <.001*
mean RMS          .001                   .002*
std. dev. RMS     .001*                  .003*                 *

Difference in energy constancy is greater for UNC/CER
24 Affect Prediction
Hypothesis: prosodic features have differing combined utility for predicting UNC/CER and DISE/ENG turns
Machine learning of manual labels with the WEKA software
J48 decision tree algorithm
Cost matrix penalizes mislabeling true DISE/UNC turns as false
Analyses:
Feature usage, as % of decisions for which a feature is queried in the learned models
Unweighted avg precision / recall via 10-fold cross validation
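The feature-usage statistic can be illustrated with a toy walk-through (the tree and turns below are invented, not the learned J48 models): classify each turn by walking the tree, tally which feature each visited node queries, and report the tallies as percentages of all queries.

```python
# A toy learned tree as nested tuples: (feature, threshold, left, right);
# leaves are plain class labels.
TREE = ("prior_pause", 2.0,
        ("mean_f0", 180.0, "UNC", "CER"),
        ("prior_pause", 5.0, "UNC", "UNC"))

def classify_and_count(tree, turn, counts):
    """Walk the tree for one turn, tallying each feature query on the way."""
    while isinstance(tree, tuple):
        feat, thresh, left, right = tree
        counts[feat] = counts.get(feat, 0) + 1
        tree = left if turn[feat] <= thresh else right
    return tree  # leaf label

# Hypothetical turns described by the two features the toy tree uses:
turns = [
    {"prior_pause": 1.0, "mean_f0": 150.0},
    {"prior_pause": 3.0, "mean_f0": 200.0},
    {"prior_pause": 6.0, "mean_f0": 120.0},
]
counts = {}
labels = [classify_and_count(TREE, t, counts) for t in turns]
total = sum(counts.values())
usage = {f: 100.0 * n / total for f, n in counts.items()}
```

Because prior_pause sits at the root (and recurs below it), it absorbs most of the queries, which is exactly the pattern the next slide reports for the real models.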
25 Temporal Feature Usage

Feature         Uncertainty   Disengaged
Temporal        50%           72%
turn duration    0%           23%
prior pause     50%           49%

In both models, temporal features are most highly queried; prior pause is the root of both trees
DISE model uses temporal features more heavily than UNC
Only DISE model includes turn duration, which was not discriminative in isolation (for either state)
26 Pitch Feature Usage

Feature        UNC   DISE
Pitch          34%   16%
max f0          9%    4%
min f0         19%    4%
mean f0         0%    8%
std. dev. f0    6%    0%

UNC model uses pitch features more heavily than DISE model, and in different relative proportions
Min f0 isn't discriminative in isolation for either state, but is included in both predictive models
Std. dev. and mean f0 are discriminative in isolation for both states, but are not included in both predictive models
27 Energy Feature Usage

Feature         UNC   DISE
Energy          15%   11%
max RMS          0%    0%
min RMS          6%    1%
mean RMS         9%    4%
std. dev. RMS    0%    6%

UNC and DISE models use energy features in different relative proportions
Max RMS is discriminative in isolation for UNC, but isn't included in either predictive model
Std. dev. RMS is discriminative in isolation for both states, but is only included in the DISE model
28 Quantitative Results
UNC/CER: unweighted avg precision = 63%, recall = 61%
DISE/ENG: unweighted avg precision = 61%, recall = 56%
Majority Class Baselines:
unweighted avg precision = 40%, recall = 50% (CER)
unweighted avg precision = 42%, recall = 50% (ENG)
NOTE: deployed ITSPOKE model adds non-prosodic features, which further improves performance
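The baseline figures follow directly from the corpus label rates on the earlier slide (21% UNC, 16% DISE), under the common convention that a class the baseline never predicts contributes precision and recall of 0 to the unweighted (macro) average. A quick check:

```python
def majority_baseline(minority_rate):
    """Macro-averaged precision/recall when always predicting the majority class."""
    maj = 1.0 - minority_rate
    # Majority class: precision = its base rate (that fraction of the
    # all-majority predictions is correct), recall = 1.0.
    # Minority class: never predicted, so precision = recall = 0.
    precision = (maj + 0.0) / 2
    recall = (1.0 + 0.0) / 2
    return precision, recall

p_cer, r_cer = majority_baseline(0.21)   # always predict CER
p_eng, r_eng = majority_baseline(0.16)   # always predict ENG
```

This reproduces the slide's numbers: about 40% precision / 50% recall for the CER baseline and 42% / 50% for ENG, so the learned models clearly beat chance on both dimensions.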
29 Outline
Background
ITSPOKE: A Spoken Dialogue System for STEM
Extracting Prosodic Features from Affect Annotations
Characterizing and Predicting Affect
Conclusions & Current Directions
30 Conclusions
Uncertain and disengaged turns differ prosodically from certain or engaged turns, but the differences depend somewhat on the affect dimension
Disengaged turns have longer response times, lower pitch values, and less pitch and energy variation than engaged turns
Uncertain turns additionally are not as loud as certain turns
The best combination of prosodic features for affect prediction also depends on the affect dimension
Temporal features are most prominent in both models, but the DISE model uses them more heavily than UNC, while the UNC model uses pitch features more heavily than DISE
31 Current Directions
Replicate prosodic analyses on other corpora to explore whether our findings generalize to other domains
Level of Interest (Schuller et al. 2010)
Uncertainty (Pon-Barry and Shieber 2011)
Implement best predictive models in ITSPOKE to evaluate the utility of detecting and adapting to uncertainty and disengagement
32 Questions? Further Information? www.cs.pitt.edu/~litman/itspoke.html Thank You!