Metacognition and Learning in Spoken Dialogue Computer Tutoring
Kate Forbes-Riley and Diane Litman
Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA USA

Slide 2: Outline
- Overview
- Spoken Dialogue Computer Tutoring Data
- Metacognitive Metrics based on Student Uncertainty and Correctness Labels
- Do Metacognitive Metrics Predict Learning?
- Conclusions, Future Work

Slide 3: Background
- Metacognition: important measure of performance and learning
- Uncertainty: metacognitive state in tutorial dialogue research
  - Signals learning impasses (e.g., VanLehn et al., 2003)
  - Correlates with learning (Litman & Forbes-Riley, 2009; Craig et al., 2004)
  - Computer tutor responses improve performance (Forbes-Riley & Litman, 2010; Aist et al., 2002; Tsukahara & Ward, 2001)
- Complex metrics: combine dimensions (uncertainty, correctness)
  - Learning impasse severity (Forbes-Riley et al., 2008)
  - Knowledge monitoring accuracy (Nietfeld et al., 2006)
  - Bias (Kelemen et al., 2000; Saadawi et al., 2009)
  - Discrimination (Kelemen et al., 2000; Saadawi et al., 2009)

Slide 4: Our Research
- Prior work: do metacognitive metrics predict learning in a wizarded spoken dialogue tutoring corpus? (Litman & Forbes-Riley, 2009)
  - Computed on manually labeled uncertainty and correctness
  - All four complex metrics predicted learning
- Current work: do the metrics also predict learning in a comparable, fully automated corpus?
  - One set computed on real-time automatic (noisy) labels
  - One set computed on post-experiment manual labels
  - Most complex metrics still predict learning (noisy or manual)
  - Worthwhile and feasible to remediate based on noisy metacognitive metrics

Slide 5: Spoken Dialogue Computer Tutoring Data
- ITSPOKE: speech-enhanced, modified version of the Why2-Atlas qualitative physics tutor (VanLehn, Jordan, Rosé et al., 2002)
- Two prior controlled experiments evaluated the utility of responding to uncertainty over and above correctness
  - Uncertainty and incorrectness are learning impasses, i.e., opportunities to learn (e.g., VanLehn et al., 2003)
- Enhanced ITSPOKE: response contingent on the student turn's combined uncertainty and correctness labels (impasse state)
  - Details in Forbes-Riley & Litman, 2010
- Procedure: reading, pretest, 5 problems, survey, posttest

Slide 6: Spoken Dialogue Computer Tutoring Data
- 1st experiment: ITSPOKE-WOZ corpus (wizarded)
  - 405 dialogues, 81 students
  - Speech recognition and uncertainty/correctness labeling performed by a human
- 2nd experiment: ITSPOKE-AUTO corpus (fully automated)
  - 360 dialogues, 72 students
  - Manually transcribed and labeled after the experiment
  - Speech recognition accuracy: 74.6% (Sphinx2)
  - Correctness accuracy: 84.7% (TuTalk; Jordan et al., 2007)
  - Uncertainty accuracy: 80.3% (logistic regression model built with speech/dialogue features, trained on the ITSPOKE-WOZ corpus; see the sketch below)
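As a rough illustration of the uncertainty detector in the last bullet, the sketch below trains a logistic regression classifier on per-turn features. It assumes scikit-learn and hypothetical feature matrices (X_woz, y_woz, X_auto, y_auto_manual); the actual feature set and training setup are described in the authors' papers, not on this slide.

```python
# Minimal sketch of the uncertainty detector: logistic regression over
# per-turn speech/dialogue features, trained on the human-labeled
# ITSPOKE-WOZ corpus. Feature matrices and names are hypothetical
# placeholders; the real feature set is not specified on this slide.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def train_uncertainty_model(X_woz, y_woz):
    """Fit a binary uncertain-vs-certain classifier on WOZ turns."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_woz, y_woz)  # y: 1 = uncertain, 0 = certain
    return model

# Evaluation against post-experiment manual labels on ITSPOKE-AUTO turns
# would then look like this (80.3% accuracy is the reported figure):
# model = train_uncertainty_model(X_woz, y_woz)
# print(accuracy_score(y_auto_manual, model.predict(X_auto)))
```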

Slide 7: ITSPOKE-AUTO Corpus Excerpt
t1: [...] How does the man's velocity compare to that of the keys?
  sAUTO: his also the is same as that of his keys [incorrect+certain]
  sMANU: his velocity is the same as that of his keys [correct+uncertain]
t2: [...] What forces are exerted on the man after he releases his keys?
  sAUTO: the only force is [incorrect+certain]
  sMANU: the only force is [incorrect+uncertain]
t3: [...] What's the direction of the force of gravity on the man?
  sAUTO: that in the pull in the man vertically down [correct+certain]
  sMANU: gravity will be pulling the man vertically down [correct+certain]

Slide 8: Metacognitive Performance Metrics
- Metrics computed using four equations that combine uncertainty and correctness labels in different ways
- Metrics computed per student (over all 5 dialogues)
- Two sets of metrics:
  - one set uses the real-time automatic (noisy) labels (-auto)
  - one set uses the post-experiment manual labels (-manu)
- Metrics represent inferred (tutor-perceived) values, because uncertainty is labeled by the system or a human judge
- For each metric, we computed a partial Pearson's correlation with posttest, controlling for pretest (see the sketch below)
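The partial correlation in the last bullet can be computed by residualizing both the metric and the posttest score on pretest and correlating the residuals; a minimal sketch, with hypothetical array names:

```python
# Partial Pearson correlation between a per-student metric and posttest,
# controlling for pretest: regress each on pretest, correlate the residuals.
import numpy as np
from scipy import stats

def partial_corr(metric, posttest, pretest):
    def residuals(y, x):
        slope, intercept = np.polyfit(x, y, 1)  # least-squares line
        return y - (slope * x + intercept)
    # Returns (R, p), as reported in the results tables.
    return stats.pearsonr(residuals(metric, pretest),
                          residuals(posttest, pretest))
```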

Slide 9: Metacognitive Performance Metrics
- Average Learning Impasse Severity (Forbes-Riley & Litman, 2008)
  - Uncertainty and incorrectness are learning impasses
  - We distinguish four impasse states: all combinations of binary uncertainty (UNC, CER) and correctness (INC, COR)
  - We rank impasse states by severity, based on impasse awareness
  - We label the state of each turn and compute average impasse severity (see the sketch below)

  State:    INC_CER   INC_UNC   COR_UNC    COR_CER
  Severity: most (3)  less (2)  least (1)  none (0)
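Per student, this metric is a direct lookup-and-average over the turn labels; a minimal sketch, assuming each turn is represented as a (correctness, uncertainty) label pair:

```python
# Average learning impasse severity: map each turn's impasse state to its
# severity rank (from the table above) and average over a student's turns.
SEVERITY = {
    ("INC", "CER"): 3,  # most severe: incorrect but unaware (certain)
    ("INC", "UNC"): 2,
    ("COR", "UNC"): 1,
    ("COR", "CER"): 0,  # no impasse
}

def avg_impasse_severity(turns):
    """turns: list of (correctness, uncertainty) pairs for one student."""
    return sum(SEVERITY[t] for t in turns) / len(turns)

# avg_impasse_severity([("COR", "CER"), ("INC", "UNC")])  # -> 1.0
```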

Slides 10-14: Metacognitive Performance Metrics
- Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)
  - Monitoring one's own knowledge ≈ one's certainty level ≈ one's Feeling of Knowing (FOK)
  - HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which one's certainty corresponds to correctness
  - Feeling of Another's Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995)
  - We use HC to measure FOAK accuracy (our (un)certainty labels are inferred)

  HC = [(COR_CER + INC_UNC) - (INC_CER + COR_UNC)] / [(COR_CER + INC_UNC) + (INC_CER + COR_UNC)]

  - The numerator's first term sums the cases where (un)certainty and (in)correctness agree; its second term sums the cases where they are at odds; the denominator sums over all cases
  - Scores range from -1 (no accuracy) to 1 (perfect accuracy); see the sketch below
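Given a student's four turn counts, HC is a direct translation of the equation above; a minimal sketch:

```python
# Knowledge monitoring accuracy (HC): agreements minus disagreements
# between (un)certainty and (in)correctness, over all cases.
def hc(cor_cer, inc_unc, inc_cer, cor_unc):
    agree = cor_cer + inc_unc      # certainty matches correctness
    disagree = inc_cer + cor_unc   # certainty at odds with correctness
    return (agree - disagree) / (agree + disagree)  # in [-1, 1]
```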

Slides 15-17: Metacognitive Performance Metrics
- Bias (Kelemen et al., 2000; Saadawi et al., 2009)
  - Measures how much more certainty than correctness there is
  - Scores below 0 indicate underconfidence; scores above 0 indicate overconfidence

  Bias = (COR_CER + INC_CER) / (COR_CER + INC_CER + COR_UNC + INC_UNC) - (COR_CER + COR_UNC) / (COR_CER + INC_CER + COR_UNC + INC_UNC)

  - The first term is the proportion of certain answers, the second the proportion of correct answers; both denominators sum over all cases (see the sketch below)
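A direct translation of the Bias equation, over the same four per-student counts:

```python
# Bias: proportion of certain answers minus proportion of correct answers.
# Below 0 indicates underconfidence; above 0 indicates overconfidence.
def bias(cor_cer, inc_cer, cor_unc, inc_unc):
    total = cor_cer + inc_cer + cor_unc + inc_unc
    return (cor_cer + inc_cer) / total - (cor_cer + cor_unc) / total
```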

Slides 18-20: Metacognitive Performance Metrics
- Discrimination (Kelemen et al., 2000; Saadawi et al., 2009)
  - Measures one's ability to discriminate whether one is correct
  - Scores greater than 0 indicate better performance

  Discrimination = COR_CER / (COR_CER + COR_UNC) - INC_CER / (INC_CER + INC_UNC)

  - The first term is the proportion of correct answers given with certainty, the second the proportion of incorrect answers given with certainty (see the sketch below)
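And likewise for Discrimination:

```python
# Discrimination: certainty rate on correct answers minus certainty rate
# on incorrect answers; above 0 means certainty tracks correctness.
def discrimination(cor_cer, cor_unc, inc_cer, inc_unc):
    return cor_cer / (cor_cer + cor_unc) - inc_cer / (inc_cer + inc_unc)
```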

Slide 21: Prior ITSPOKE-WOZ Corpus Results
[Table of Mean, SD, R, p per metric: AV Impasse Severity, HC, Bias, Discrimination, %Correct, %Uncertain (values not shown)]
- In ideal conditions, higher learning correlates with:
  - Less severe impasses (those that include uncertainty) or no impasses
  - Higher knowledge monitoring accuracy
  - Underconfidence about correctness
  - Better discrimination of when one is correct
  - Being correct

Slide 22: Current ITSPOKE-AUTO Corpus Results: -auto labels
[Table of Mean, SD, R, p per metric: AV Impasse Severity_auto, HC_auto, Bias_auto, Discrimination_auto, %Correct_auto, %Uncertain_auto (values not shown)]
- In noisy/realistic conditions, higher learning still correlates with:
  - Less severe or no impasses
  - Higher knowledge monitoring accuracy
  - Underconfidence about correctness
  - Being correct

Slide 23: Current ITSPOKE-AUTO Corpus Results: -manu labels
[Table of Mean, SD, R, p per metric: AV Impasse Severity_manu, HC_manu, Bias_manu, Discrimination_manu, %Correct_manu, %Uncertain_manu (values not shown)]
- In corrected noisy conditions, higher learning still correlates with:
  - Less severe or no impasses
  - Higher knowledge monitoring accuracy
  - Being correct

Slide 24: Discussion
- Does metacognition add value over correctness for predicting learning in ideal and realistic conditions?
- Recomputed correlations controlling for both pretest and %Correct:
  - ITSPOKE-WOZ: all complex metrics correlate with posttest
  - ITSPOKE-AUTO: no metrics correlate with posttest
  - Metacognition adds value in ideal conditions
- Stepwise linear regression greedily selects from all metrics plus pretest (see the sketch below):
  - ITSPOKE-WOZ: selects HC after %Correct and pretest
  - ITSPOKE-AUTO: selects Impasse Severity_auto after pretest
  - Metacognition adds value in realistic conditions too
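A greedy forward-selection loop of the kind described above might look like the following sketch; the slide does not give the selection or stopping criterion, so adjusted R^2 is assumed here purely for illustration:

```python
# Stepwise (forward) linear regression: starting from pretest, greedily add
# whichever remaining predictor most improves the fit to posttest.
# Adjusted R^2 as the criterion is an assumption, not the paper's method.
import numpy as np

def adj_r2(cols, y):
    """Adjusted R^2 of an OLS fit of y on the given predictor columns."""
    X = np.column_stack(cols + [np.ones(len(y))])  # add intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = ((y - X @ coef) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    n, k = len(y), len(cols)
    return 1 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))

def forward_select(features, y, start=("pretest",)):
    """features: dict of predictor name -> 1-D array (must include 'pretest')."""
    chosen = list(start)
    while True:
        base = adj_r2([features[n] for n in chosen], y)
        gains = {c: adj_r2([features[n] for n in chosen] + [features[c]], y) - base
                 for c in features if c not in chosen}
        if not gains or max(gains.values()) <= 0:
            return chosen  # no candidate improves the fit
        chosen.append(max(gains, key=gains.get))
```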

Slide 25: Conclusions
- Metacognitive performance metrics predict learning in a fully automated spoken dialogue computer tutoring corpus
  - Prior work: four metrics predict learning in a wizarded corpus
  - Three metrics still predict learning even with automated speech recognition and uncertainty/correctness labeling:
    - Average impasse severity, knowledge monitoring accuracy, Bias
- Metacognitive metrics add value over correctness for predicting learning in ideal and realistic conditions
  - At least some metrics do (e.g., noisy average impasse severity)

Slide 26: Current and Future Work
- Use results to inform system modifications aimed at improving metacognitive abilities (and therefore learning)
  - Feasible to use a fully automated system and noisy metacognitive metrics, rather than an expensive wizarded system
- Metacognitive metrics represent inferred values
  - Self-judged values differ from inferred values (Pon-Barry & Shieber, 2010); expert-judged values are most reliable (D'Mello et al., 2008)
  - FOK ratings in future system versions can help measure metacognitive improvement
- The "metacognition in ITS" literature will also inform system modification (e.g., the AIED'07 and ITS'08 workshops)

Slide 27: Questions/Comments?
Further information: web search "ITSPOKE"
Thank you!

Slide 28: Future Work (cont.)
- Why didn't Discrimination_auto, Discrimination_manu, and Bias_manu correlate with learning in ITSPOKE-AUTO?
- Due to NLP errors in ITSPOKE-AUTO?
  - Rerun correlations over students with few speech recognition, uncertainty, and correctness errors, to see whether results pattern like ITSPOKE-WOZ
- Due to different user populations?
  - Run ITSPOKE-AUTO on the ITSPOKE-WOZ corpus, then compute noisy metric correlations to see whether results pattern like the ITSPOKE-AUTO corpus

Slide 29: Simple Adaptation to Uncertainty
- For C+U, I+U, and I+nonU answers:
  - ITSPOKE gives the same content with the same dialogue act
  - ITSPOKE gives feedback on (in)correctness

Slide 30: Simple Adaptation Example
TUTOR1: By the same reasoning that we used for the car, what's the overall net force on the truck equal to?
STUDENT1: The force of the car hitting it?? [C+U]
TUTOR2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
- The same TUTOR2 subdialogue is given if the student was I+U or I+nonU