Improving (Meta)cognitive Tutoring by Detecting and Responding to Uncertainty Diane Litman & Kate Forbes-Riley University of Pittsburgh Pittsburgh, PA.

Improving (Meta)cognitive Tutoring by Detecting and Responding to Uncertainty Diane Litman & Kate Forbes-Riley University of Pittsburgh Pittsburgh, PA USA

Outline
- Motivation
- Metacognitive Measures
- System(s) and Corpus
- Evaluation Results
- Discussion

Background
- Speaker uncertainty is of interest in several research communities
  - Human Language Technologies (Liscombe et al. 2005; Dijkstra et al. 2006; Pon-Barry 2008)
  - Psycholinguistics (Brennan & Williams 1995)
  - AI & Education (Tsukahara & Ward 2001; Aist et al. 2002; Craig et al. 2004; Pon-Barry et al. 2006; Forbes-Riley & Litman 2009)

This Paper
- We show that remediating after student uncertainty has the potential to improve students' metacognitive (and cognitive) abilities
- Evaluations use a corpus of previously collected dialogues between students and several versions of a Wizard of Oz spoken tutorial dialogue system

Metacognitive Performance
- We measure metacognitive performance in dialogues annotated for student uncertainty and correctness
  - Impasse severity
  - Monitoring accuracy (Nietfeld et al. 2006)
  - Bias (Kelemen et al. 2000; Saadawi et al. 2009)
  - Discrimination (Kelemen et al. 2000; Saadawi et al. 2009)
- We then conduct two evaluations
  - Do the measures differ across experimental conditions?
  - Do metacognitive and cognitive performance correlate?

Impasse Severity
- Tutoring Theory: Uncertainty and Incorrectness both signal Learning Impasses (VanLehn et al., 2003)
- Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity

  Nominal State:  I+nonU  I+U   C+U    C+nonU
  Scalar State:
  Severity:       most    less  least  none

- This Paper: Use the scalar value associated with each student turn to compute an average impasse severity, per student

Not remediated in many systems
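The per-student average described on this slide can be sketched as follows. The numeric scale below is an assumption made only for illustration: the transcript preserves the ordering (most, less, least, none) but not the scalar values themselves, and `average_impasse_severity` is a hypothetical helper name.

```python
from statistics import mean

# ASSUMPTION: illustrative scalar values; the slide only preserves the
# ordering I+nonU (most) > I+U (less) > C+U (least) > C+nonU (none).
SEVERITY = {
    ("I", "nonU"): 3,
    ("I", "U"): 2,
    ("C", "U"): 1,
    ("C", "nonU"): 0,
}


def average_impasse_severity(turns):
    """Average severity over one student's turns.

    `turns` is a sequence of (correctness, uncertainty) labels,
    e.g. ("C", "U") for a correct but uncertain answer.
    """
    scores = [SEVERITY[turn] for turn in turns]
    if not scores:
        raise ValueError("need at least one annotated turn")
    return mean(scores)


# Made-up annotations for a single student.
example_turns = [("C", "nonU"), ("I", "U"), ("C", "U"), ("I", "nonU")]
print(average_impasse_severity(example_turns))
```

Under this assumed scale, lower averages mean a student spent fewer turns in severe impasse states.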

Measures from the Metacognitive Performance Literature

                Correct  Incorrect
  NonUncertain  CnonU    InonU
  Uncertain     CU       IU

- The wizard's annotations for each student are first represented in an array, where each cell represents a mutually exclusive option motivated by Feeling of Knowing (FOK) research, which is closely related to uncertainty (Dijkstra et al., 2006)
- The array is then used to compute various standard measures
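A minimal sketch of how such an array might be tallied from per-turn annotations. The input format (pairs of booleans) and the function name `contingency_counts` are assumptions for illustration, not the authors' tooling.

```python
def contingency_counts(turns):
    """Count one student's turns in each of the four mutually exclusive cells.

    `turns` is an iterable of (is_correct, is_uncertain) booleans,
    one pair per annotated student turn (hypothetical input format).
    """
    counts = {"CnonU": 0, "InonU": 0, "CU": 0, "IU": 0}
    for is_correct, is_uncertain in turns:
        cell = ("C" if is_correct else "I") + ("U" if is_uncertain else "nonU")
        counts[cell] += 1
    return counts


# Made-up annotations for a single student.
example = [(True, False), (True, True), (False, True), (True, False), (False, False)]
print(contingency_counts(example))
```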

Monitoring Accuracy

                Correct  Incorrect
  NonUncertain  CnonU    InonU
  Uncertain     CU       IU

- Ranges from -1 (no monitoring accuracy) to 1 (perfect monitoring accuracy)

Bias

                Correct  Incorrect
  NonUncertain  CnonU    InonU
  Uncertain     CU       IU

- Bias scores greater than zero indicate over-confidence and scores less than zero indicate under-confidence; zero indicates best performance

Discrimination

                Correct  Incorrect
  NonUncertain  CnonU    InonU
  Uncertain     CU       IU

- Discrimination scores greater than zero indicate higher metacognitive performance, in terms of certainty for correct responses and uncertainty for incorrect responses
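The formulas for these three measures did not survive in the transcript. The sketch below uses forms commonly given in the metacognitive monitoring literature (a Hamann-style agreement coefficient for monitoring accuracy, proportion certain minus proportion correct for bias, and the difference in certainty rates for discrimination). Treat them as assumptions chosen to match the ranges and sign conventions stated on the slides, not as a confirmed copy of the authors' definitions.

```python
def monitoring_accuracy(c_nonu, i_nonu, c_u, i_u):
    """Agreement between certainty and correctness, in [-1, 1] (assumed form)."""
    total = c_nonu + i_nonu + c_u + i_u
    if total == 0:
        raise ValueError("no annotated turns")
    # Matches (certain+correct, uncertain+incorrect) minus mismatches.
    return ((c_nonu + i_u) - (i_nonu + c_u)) / total


def bias(c_nonu, i_nonu, c_u, i_u):
    """Proportion certain minus proportion correct (assumed form).

    Positive means over-confident, negative under-confident, zero is best.
    """
    total = c_nonu + i_nonu + c_u + i_u
    if total == 0:
        raise ValueError("no annotated turns")
    return (c_nonu + i_nonu) / total - (c_nonu + c_u) / total


def discrimination(c_nonu, i_nonu, c_u, i_u):
    """Certainty rate on corrects minus certainty rate on incorrects (assumed form)."""
    n_correct = c_nonu + c_u
    n_incorrect = i_nonu + i_u
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("need at least one correct and one incorrect turn")
    return c_nonu / n_correct - i_nonu / n_incorrect


# Made-up cell counts for one student: CnonU=6, InonU=1, CU=2, IU=3.
cells = (6, 1, 2, 3)
print(monitoring_accuracy(*cells), bias(*cells), discrimination(*cells))
```

With these made-up counts the student is slightly under-confident (negative bias) while still discriminating well between correct and incorrect answers.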

Prior Experiment: Does Remediating after Student Uncertainty Impact Learning?
- ITSPOKE-WOZ
  - Speech-enabled Why2-Atlas (VanLehn, Jordan, Rosé et al. 2002), further parameterized to adapt to uncertainty and/or correctness
  - Speech recognition and correctness/uncertainty annotations performed in real-time by a human "Wizard"
- 4 Conditions
  - Normal Control: original system (no adaptation)
  - Simple Adaptation: same response for all impasses
  - Complex Adaptation: different responses for each impasse
  - Random Control: Simple Adaptation to random 20% corrects
- Results (Forbes-Riley & Litman 2009)
  - Learning: main effect (Simple > Normal, Complex)

Normal (non-adaptive) System
- ITSPOKE (Intelligent Tutoring Spoken Dialogue System)
- Dialogue Format: Question – Student Answer – Response
- Response Types:
  - to Corrects (C): positive feedback (e.g. “Fine”)
  - to Incorrects (I): negative feedback (e.g. “Well…”) and
    - Bottom Out: correct answer with reasoning
    - Subdialogue: questions walk through reasoning

Two Uncertainty Adaptations
- Simple Adaptation
  - Same response for all 3 impasses
  - Feedback on only (in)correctness
- Complex Adaptation
  - Different responses for the 3 impasses
  - Feedback on both uncertainty and (in)correctness
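One way to read the four experimental conditions is as a rule for deciding which student turns receive remediation (a bottom out or subdialogue). The sketch below is an interpretation of the slides, not the ITSPOKE implementation. The function name, the condition labels, and the treatment of Random Control as "all incorrects plus a random 20% of corrects" are assumptions.

```python
import random

# The three impasse states named on the slides.
IMPASSES = {("C", "U"), ("I", "U"), ("I", "nonU")}


def gets_remediation(condition, correctness, uncertainty, rng=random):
    """Return True if the turn would be remediated under `condition`.

    ASSUMPTION: hypothetical policy sketch. `rng` is any object with a
    random() method returning a float in [0, 1).
    """
    if condition == "normal":
        # Original system: responds to incorrectness only.
        return correctness == "I"
    if condition in ("simple", "complex"):
        # Both adaptations remediate every impasse; they differ in how.
        return (correctness, uncertainty) in IMPASSES
    if condition == "random":
        # Incorrects as usual, plus a random 20% of corrects.
        return correctness == "I" or rng.random() < 0.20
    raise ValueError(f"unknown condition: {condition!r}")


# Rough check of the random control on correct answers, with a fixed seed.
seeded = random.Random(0)
hits = sum(gets_remediation("random", "C", "nonU", rng=seeded) for _ in range(1000))
print(hits / 1000)
```

The sketch separates the decision to remediate from the content of the response, since the slides indicate that Simple and Complex differ only in the latter.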

Simple Adaptation Example: C+U
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of the car hitting it?? [C+U]
TUTOR2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]
- Same TUTOR2 subdialogue if student was I+U or I+nonU

Complex Adaptation Example: C+U
TUTOR2: That’s exactly right, but you seem unsure, so let’s sum up. [FEEDBACK] The net force on the truck is equal to the impact force on it. We can prove this just like we did for the car. First, we know that gravity and the normal force on the truck must cancel each other, otherwise the truck would not be at rest vertically. Second, we know that the impact force is the only horizontal force exerted on the truck. [NEW BOTTOM OUT]
- Different TUTOR2 subdialogue if student was I+U or I+nonU

Evaluation I: Does Remediating after Uncertainty Impact Metacognition?
- Although our experiment was designed to impact learning, we hypothesized that the experimental conditions might also improve metacognitive performance
- For each measure:
  - 1-way ANOVA with condition as between-subjects factor
  - Planned comparisons for each pair of conditions
  - Predicted ordering: complex adaptation > simple adaptation > random control > normal control
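For readers unfamiliar with the test, the F statistic of a one-way between-subjects ANOVA can be computed as in this sketch. The scores are made up, the function name is hypothetical, and the code does not reproduce the study's data or planned comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more scores than groups")
    grand_mean = mean([x for g in groups for x in g])
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three hypothetical conditions.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(toy_groups))
```

Turning F into a p-value requires the F distribution. If SciPy is available, `scipy.stats.f_oneway` returns both the statistic and the p-value in one call.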

Results I: Means across Conditions
Conditions (number of students): Complex Adaptation (20), Simple Adaptation (20), Random Control (20), Normal Control (21)
Metacognitive measures: Average Impasse Severity, Monitoring Accuracy, Bias, Discrimination
- Average Impasse Severity: both complex and simple reduced average impasse severity, compared to normal (p < .08 in paired contrasts)
- Monitoring Accuracy: simple (and random) increased monitoring accuracy, compared to normal (p < .06 in paired contrasts)
- Bias: no statistically significant differences or trends
- Discrimination: trend for differences overall (p = .09); however, contrary to our predictions, complex reduced discrimination ability, compared to random and simple (p < .04 in paired contrasts)

Evaluation II: Do metacognitive and cognitive performance correlate?
- We also hypothesized that stronger metacognitive abilities would be associated with greater learning
- For each measure:
  - Partial Pearson's correlation over all 81 students with posttest score, controlled for pretest score to measure learning gain
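A first-order partial correlation of this kind can be sketched with the standard formula that combines the three pairwise Pearson correlations. The variable names and toy numbers below are illustrative assumptions, not the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def partial_correlation(measure, posttest, pretest):
    """Correlation of measure and posttest with pretest partialled out."""
    r_mp = pearson(measure, posttest)
    r_mc = pearson(measure, pretest)
    r_pc = pearson(posttest, pretest)
    return (r_mp - r_mc * r_pc) / sqrt((1 - r_mc ** 2) * (1 - r_pc ** 2))


# Made-up values for four hypothetical students.
measure = [1, 2, 3, 4]
posttest = [1, 3, 2, 4]
pretest = [1, -1, -1, 1]
print(partial_correlation(measure, posttest, pretest))
```

In this toy data the control variable is uncorrelated with both other variables, so the partial correlation equals the plain Pearson correlation between measure and posttest.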

Results II: Significant Partial Correlations with Posttest (after controlling for Pretest)
Table columns: Metacognitive Measure, R, p. Rows: Average Impasse Severity, Monitoring Accuracy, Discrimination
- Average Impasse Severity (where smaller is better) is negatively correlated with learning
- Better monitoring accuracy and discrimination (where higher is better) are positively correlated with learning

Summary
- We analyzed metacognitive performance in tutorial dialogue
  - Evaluation uses both new and traditional measures
- Remediating after student uncertainty improved metacognitive performance
  - Impasse Severity
  - Monitoring Accuracy
- Metacognitive performance is correlated with learning
  - Impasse Severity (negative)
  - Monitoring Accuracy, Discrimination (positive)

Current and Future Work
- Use results to inform ITSPOKE modifications that improve both cognitive and metacognitive abilities
- Measure metacognition using Feeling of Knowing ratings
- Replicate analysis using fully automated ITSPOKE

Questions? Further Information? Thank You!

Background
- Speaker uncertainty is of interest in several research communities
  - Prosodic/lexical properties (Liscombe et al. 2005; Dijkstra et al., 2006; Pon-Barry 2008)
  - Feeling of (Another's) Knowing (Brennan and Williams, 1995)
  - Learning correlations (Craig et al., 2004; Forbes-Riley et al. 2008)
- Tutoring evaluations have investigated the impact of responding to student uncertainty, over and above correctness
  - Positive feedback responses improved satisfaction (Tsukahara and Ward, 2001) and persistence (Aist et al., 2002)
  - More substantive responses didn’t improve learning in a fully automated system (Pon-Barry et al., 2006), but did improve learning, efficiency, and satisfaction under semi-automated conditions (Forbes-Riley and Litman, 2009)

Discussion
- Why didn’t Simple Adaptation and Complex Adaptation outperform Random Control?
  - Random Control adapted to some C+U, diminishing differences
  - Adapting to C+nonU may increase certainty
- Why didn’t Complex Adaptation outperform Simple Adaptation?
  - Complex Adaptation’s human-based content responses were based on frequency, not effectiveness

Current and Future Work
- Use results to inform ITSPOKE modifications that improve both cognitive and metacognitive abilities
  - Better optimize remediations for each impasse type, particularly those of highest severity
  - Remediate metacognitive abilities, not just domain content (e.g., Roll and Aleven, 2008; Saadawi et al., 2009)
- Measure metacognition using Feeling of Knowing ratings
- Replicate analysis using recently collected data from fully automated ITSPOKE

Example Dialogue Excerpt
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of the car hitting it??
TUTOR2: Fine. [FEEDBACK]

Two Uncertainty Adaptations
- Simple Adaptation: For CU, IU, InonU answers:
  - ITSPOKE gives same content with same dialogue act
  - ITSPOKE gives feedback on (in)correctness
- Complex Adaptation: Depending on whether the answer is CU, IU, or InonU:
  - ITSPOKE gives same content but varies dialogue act
    - Based on human tutor responses significantly associated with CU, IU, InonU answers
  - ITSPOKE gives complex feedback on affect and (in)correctness
    - Based on empathetic computer tutor literature (Wang et al., 2005; Hall et al., 2004; Burleson et al., 2004)

Complex Adaptation Example: I+U
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: Zero?? [I+U]
TUTOR2: That’s not correct, but don’t worry. You seem to be aware of your mistake. Let’s resolve it. [FEEDBACK] The net force on the truck is equal to the impact force on it. Let’s walk through this answer step by step. [SHORT BOTTOM OUT] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]

Complex Adaptation Example: I+nonU
TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT1: The force of gravity. [I+nonU]
TUTOR2: I’m sorry, but I see an error in your answer. Let’s fix it. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]