A Tutorial Dialogue System that Adapts to Student Uncertainty Diane Litman Computer Science Department & Intelligent Systems Program & Learning Research and Development Center

Outline  Motivation  The ITSPOKE System and Corpora  Detecting and Adapting to Student Uncertainty (joint work with Kate Forbes-Riley) – Uncertainty Detection and Adaptation – Experimental Evaluation »Wizard-of-Oz »Fully-Automated  Summing Up

Tutorial Dialogue Systems  Why is one-on-one tutoring so effective? “...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001]  Goal: improve Intelligent Tutoring Systems using Natural Language Processing

More generally... Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Tutors Scoring Readability Processing Language Conversational Tutors / Peers CSCL Discourse Coding Lecture Retrieval Questioning & Answering

Outline  Motivation  The ITSPOKE System and Corpora  Detecting and Adapting to Student Uncertainty – Uncertainty Detection and Adaptation – Experimental Evaluation  Summing Up

ITSPOKE: Intelligent Tutoring Spoken Dialogue System  Back-end is Why2-Atlas [VanLehn, Jordan, Rose et al. 2002]  Speech Enhanced – Sphinx2 speech recognition – Cepstral text-to-speech  Reimplemented, other changes

ITSPOKE Corpora  Wizard Tutoring (ITSPOKE-WOZ) –81 students / 405 dialogues –human performs speech recognition, semantic analysis –computer performs dialogue management  Computer Tutoring (ITSPOKE-AUTO) –72 students / 360 dialogues

Experimental Procedure  College students without physics –Read a small background document –Took a multiple-choice Pretest –Worked 5 problems (dialogues) with ITSPOKE –Took an isomorphic Posttest  Goal was to optimize Learning Gain – e.g., Posttest – Pretest

Why Uncertainty?  Most frequent student state in our dialogue corpora [Litman and Forbes-Riley 2004]  Focus of other learning sciences, speech and language processing, and psycholinguistic studies [Craig et al. 2004; Liscombe et al. 2005; Pon-Barry et al. 2006; Dijkstra et al. 2006] .73 Kappa [Forbes-Riley et al. 2008]

Corpus-Based Detection Methodology  Learn detection models from training corpora –Use spoken language processing to automatically extract features from user turns –Use extracted features (e.g., prosodic, lexical) to predict uncertainty annotations  Evaluate learned models on testing corpora –Significant reduction of error compared to baselines [Litman and Forbes-Riley 2006; Litman et al. 2007]

System Adaptation: How to Respond?  Theory-based –[VanLehn et al. 2003; Craig et al. 2004]  Corpus-based –How do humans respond? e.g. [Forbes-Riley, Rotaru, Litman, and Tetreault 2007] * –What are optimal responses? e.g. [Chi, VanLehn and Litman 2010] * * Best paper awards

Theory-Based Adaptation: Uncertainty as Learning Opportunity  Uncertainty represents one type of learning impasse, and is also associated with cognitive disequilibrium – An impasse motivates a student to take an active role in constructing a better understanding of the principle. [VanLehn et al. 2003] –A state of failed expectations causing deliberation aimed at restoring equilibrium. [Craig et al. 2004]  Hypothesis: The system should adapt to uncertainty in the same way it responds to other impasses (e.g., incorrectness)

Adaptation to Student Uncertainty in ITSPOKE  Most systems respond only to (in)correctness  Literature suggests uncertain as well as incorrect student answers signal learning impasses  Experimentally manipulate tutor responses to student uncertainty, over and above correctness, and investigate impact on learning –Platform: Adaptive version(s) of ITSPOKE

Normal (non-adaptive) ITSPOKE  System Initiative Dialogue Format: –Tutor Question – Student Answer – Tutor Response  Tutor Response Types: –to Corrects (C): positive feedback (e.g. “Fine”) –to Incorrects (I): negative feedback (e.g. “Well…”) and »Bottom Out: correct answer with reasoning »Subdialogue: questions walk through reasoning

Adaptive ITSPOKE(s)
Our Prior Work: Rank correctness (C, I) + uncertainty (U, nonU) states in terms of impasse severity
State:      I+nonU   I+U    C+U     C+nonU
Severity:   most     less   least   none
Adaptation Hypothesis:
–ITSPOKE already resolves I impasses (I+nonU, I+U), but it ignores one type of U impasse (C+U)
–Performance improvement if ITSPOKE provides additional content to resolve all impasses

Two Uncertainty Adaptations  Simple Adaptation –Same response for all 3 impasses –Feedback on only (in)correctness  Complex Adaptation –Different responses for the 3 impasses –Feedback on both uncertainty and (in)correctness

Simple Adaptation Example: C+U TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to? STUDENT1: The force of the car hitting it?? [C+U] TUTOR2: Fine. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]  Same TUTOR2 subdialogue if student was I+U or I+nonU

Experiment 1: ITSPOKE-WOZ  Wizard of Oz version of ITSPOKE –Human recognizes speech, annotates correctness and uncertainty –Provides upper-bound language performance  Conditions –Simple Adaptation: used same response for all impasses –Complex Adaptation: used different responses for each impasse –Normal Control: used original system (no adaptation) –Random Control: gave Simple Adaptation to random 20% of correct answers (to control for additional tutoring)
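The conditions above differ only in when a correct answer receives extra tutoring. The following is a minimal sketch of that decision logic, not the ITSPOKE implementation. The function name, the condition labels, and the generic "remediation" move (standing in for the Bottom Out or Subdialogue content) are hypothetical, and Complex Adaptation is omitted because its responses vary per impasse.

```python
import random
from typing import List, Optional


def tutor_response(correct: bool, uncertain: bool, condition: str,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Return the tutor moves for one student answer (illustrative only).

    condition is one of "normal", "simple", "random" (hypothetical labels).
    """
    if rng is None:
        rng = random.Random()
    feedback = "positive feedback" if correct else "negative feedback"

    if not correct:
        # Every condition already remediates incorrect answers (I+U, I+nonU).
        return [feedback, "remediation"]
    if condition == "simple" and uncertain:
        # Simple Adaptation: C+U is treated like the other impasses.
        return [feedback, "remediation"]
    if condition == "random" and rng.random() < 0.20:
        # Random Control: extra tutoring after a random 20% of correct answers.
        return [feedback, "remediation"]
    # Normal Control, or a correct answer that is not selected for adaptation.
    return [feedback]
```

For example, `tutor_response(True, True, "simple")` adds remediation after a correct but uncertain answer, while the same answer under `"normal"` receives only positive feedback.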

Results I: Learning
Metric: Learning Gain (Posttest – Pretest)
Condition            N    Mean   Diff                  p
Normal Control       21   .183   < Simple Adaptation   .03
Random Control       20   .269   -
Simple Adaptation    20   .307   -
Complex Adaptation   20   .213   -
F(3, 77) = 3.275, p = 0.02
Simple Adaptation yields more student learning than Normal Control (original ITSPOKE) [Forbes-Riley and Litman 2010]
Similar results for learning efficiency [Forbes-Riley and Litman 2009]

Additional Evaluations - Metacognition  Do metacognitive performance measures differ across experimental conditions? –e.g., Monitoring Accuracy [Nietfeld et al. 2006]  Do metacognitive and cognitive performance measures (i.e. learning) correlate?

Metacognitive Results  Simple (and random) increased monitoring accuracy compared to normal (p <.06 in paired contrasts)  Monitoring Accuracy is positively correlated with learning [Litman and Forbes-Riley 2009]

Experiment 2: ITSPOKE-AUTO  Fully automated ITSPOKE –Sphinx2 speech recognizer / TuTalk semantic analyzer »Correctness Accuracy of 85% –Weka uncertainty model »Logistic regression (includes lexical, prosodic, dialogue features) »Uncertainty Accuracy of 80%  Only 3 Conditions –Simple Adaptation –Normal Control –Random Control

Preliminary Results: ITSPOKE-AUTO  Simple Adaptation yields more student learning than Normal and Random Controls  Differences only significant for a subset of students  Noisy uncertainty detection is the system bottleneck  3 of the 4 metacognitive metrics remain correlated with learning [Forbes-Riley and Litman, 2010]

Current and Future Research  More sophisticated ITSPOKE adaptations –User modeling (domain knowledge, gender) –Multiple student states (disengagement) –Motivation [Ward 2010]  Remediate metacognition, not just domain content

Summing Up  Spoken dialogue contributes to the success of human tutors  Using presently available technology, successful tutorial dialogue systems can also be built  Adapting to uncertainty can further improve performance –Learning gains, efficiency, metacognition  Tutors can serve as platforms for learning science studies

Related Projects Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Processing Language Conversational Tutors Tutor Abstraction and Specialization during Reflective Conversation [Katz/Jordan/Litman poster] Semantic Class Acquisition via Web-Learning [Lipschultz/Litman poster]

Related Projects Natural Language Processing and Tools for Learning Learning Language (reading, writing, speaking) Using Language (to teach everything else) Processing Language Computer-Supported Peer Review for Writing [Xiong/Litman/Schunn poster]

Acknowledgements  ITSPOKE group past and present –Hua Ai, Min Chi, Joanna Drummond, Kate Forbes-Riley, Heather Friedberg, Alison Huettner, Michael Lipschultz, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetreault, Art Ward, Wenting Xiong  NLP@Pitt –Jan Wiebe, Rebecca Hwa, Wendy Chapman  Why2-Atlas and Human Tutoring groups –Kurt Vanlehn, Pamela Jordan, Carolyn Rose –Micki Chi, Scotty Craig, Bob Hausmann, Margueritte Roy, Sandra Katz

Thank You!  Questions?  Further Information –http://www.cs.pitt.edu/~litman/itspoke.html

The End

Example Student States in ITSPOKE
ITSPOKE: What else do you need to know to find the box's acceleration?
Student: the direction [UNCERTAIN]
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force [CERTAIN]
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity [UNCERTAIN]
ITSPOKE: Could you please repeat that?
Student: velocity [ANNOYED]

WOZ-TUT Screenshot

Bigram Dependency Analysis: “Student Certainness – Tutor Positive Feedback” Bigrams
EXPECTED    Tutor Includes Pos   Tutor Omits Pos
neutral     439.46               2329.54
certain     175.21               928.79
uncertain   129.51               686.49
mixed       36.82                195.18
OBSERVED    Tutor Includes Pos   Tutor Omits Pos
neutral     252                  2517
certain     273                  832
uncertain   185                  631
mixed       71                   161
χ2 = 225.92 (critical χ2 value at p = .001 is 16.27)
- Less Tutor Positive Feedback after Student Neutral turns
- More Tutor Positive Feedback after “Emotional” turns
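The dependency test on this slide is a χ2 test of independence on a 4 x 2 table of bigram counts. As a rough sketch of the calculation, the snippet below derives expected counts from the row and column totals of the observed counts as transcribed here and sums the squared deviations. The function name is hypothetical. Because the counts come from a slide transcript, the statistic it produces should land close to, but not necessarily exactly on, the reported value.

```python
# Observed (Tutor Includes Pos, Tutor Omits Pos) counts per student state,
# copied from the slide transcript.
observed = {
    "neutral": (252, 2517),
    "certain": (273, 832),
    "uncertain": (185, 631),
    "mixed": (71, 161),
}


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, col_total in zip(row, col_totals):
            # Expected count under independence of student state and tutor move.
            expected = row_total * col_total / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square(observed)
print(f"chi-square = {statistic:.2f} with {dof} degrees of freedom")
```

With 3 degrees of freedom, any statistic above the critical value of 16.27 quoted on the slide indicates a dependency at p = .001.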

Survey Tutoring Uncertainty Spoken Dialogue

Learning Efficiency Results
Metric: Normalized learning gain / total tutoring time in minutes
Condition            N    Mean   Diff             p
Normal Control       21   .010   < Simple Adapt   .004
Random Control       20   .014   -
Simple Adaptation    20   .016   -
Complex Adaptation   20   .011   < Simple Adapt   .013
F(3, 77) = 3.56, p = 0.02
Given same amount of tutoring time, Simple Adaptation yields more student learning than either Normal Control or Complex Adaptation
Results also hold using raw learning gain, and total number of student turns
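The efficiency metric divides a learning gain by tutoring time. The sketch below shows one way to compute it for a single student. The slide does not define "normalized learning gain", so the common definition (posttest – pretest) / (1 – pretest) on scores scaled to 0..1 is an assumption here, and all function names are hypothetical.

```python
def raw_gain(pretest: float, posttest: float) -> float:
    """Raw learning gain: Posttest - Pretest (scores scaled to 0..1)."""
    return posttest - pretest


def normalized_gain(pretest: float, posttest: float) -> float:
    """ASSUMED definition: gain relative to the room left for improvement."""
    if pretest >= 1.0:
        raise ValueError("normalized gain is undefined for a perfect pretest")
    return (posttest - pretest) / (1.0 - pretest)


def learning_efficiency(pretest: float, posttest: float, minutes: float) -> float:
    """Normalized learning gain per minute of tutoring."""
    if minutes <= 0:
        raise ValueError("tutoring time must be positive")
    return normalized_gain(pretest, posttest) / minutes
```

For instance, a student who moves from 0.50 to 0.75 over 50 minutes of tutoring has a normalized gain of 0.5 and an efficiency of 0.01 per minute, the same order of magnitude as the condition means on the slide.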

Bias
               Correct   Incorrect
NonUncertain   CnonU     InonU
Uncertain      CU        IU
Bias scores greater than zero indicate over-confidence and scores less than zero indicate under-confidence; zero indicates best performance

Discrimination
               Correct   Incorrect
NonUncertain   CnonU     InonU
Uncertain      CU        IU
Discrimination scores greater than zero indicate higher metacognitive performance, in terms of certainty for correct responses and uncertainty for incorrect responses

Results I: Means across Conditions
Metacognitive Measure   Complex Adaptation (20)   Simple Adaptation (20)   Random Control (20)   Normal Control (21)
Bias                    -.01                      -.03                     -.01                  -.02
Discrimination          .34                       .46                      .48                   .41
Average Impasse Severity: .59, .60, .73 (only three values survive in the transcript; column assignment unclear)
Monitoring Accuracy: .58, .62, .52 (only three values survive in the transcript; column assignment unclear)
No statistically significant differences or trends for bias
Trend for discrimination differences overall (p = .09)
However, contrary to our predictions, complex reduced discrimination ability, compared to random and simple (p < .04 in paired contrasts)

Intelligent Tutoring

Corpus-Based Adaptation: How Do Human Tutors Respond?  An empirical method for designing dialogue systems adaptive to student state –extraction of “dialogue bigrams” from annotated human tutoring corpora –χ 2 analysis to identify dependent bigrams –generalizable to any domain with corpora labeled for user state and system response

Example Human Tutoring Excerpt S: So the- when you throw it up the acceleration will stay the same? [Uncertain] T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion] S: mm-k. [Neutral] T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question] S: It’s- the direction- it’s downward. [Certain] T: Yes, it’s vertically down. [Positive Feedback, Restatement]

Findings  Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor –After uncertain, tutor Bottoms Out and avoids expansions –After certain, tutor Restates –After any emotion, tutor increases Feedback  Dependencies suggest adaptive strategies for implementation in our computer tutor [Forbes-Riley and Litman 2010]

Complex Adaptation Example: C+U TUTOR2: That’s exactly right, but you seem unsure, so let’s sum up. [FEEDBACK] The net force on the truck is equal to the impact force on it. We can prove this just like we did for the car. First, we know that gravity and the normal force on the truck must cancel each other, otherwise the truck would not be at rest vertically. Second, we know that the impact force is the only horizontal force exerted on the truck. [LONG BOTTOM OUT]

Complex Adaptation Example: I+U TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to? STUDENT1: Zero?? [I+U] TUTOR2: That’s not correct, but don’t worry. You seem to be aware of your mistake. Let’s resolve it. [FEEDBACK] The net force on the truck is equal to the impact force on it. Let’s walk through this answer step by step. [SHORT BOTTOM OUT] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]

Complex Adaptation Example: I+nonU TUTOR1: By the same reasoning that we used for the car, what’s the overall net force on the truck equal to? STUDENT1: The force of gravity. [I+nonU] TUTOR2: I’m sorry, but I see an error in your answer. Let’s fix it. [FEEDBACK] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [SUBDIALOGUE]

Discussion
Predictions versus results:
–Complex Adaptation > Simple Adaptation > Random Control > Normal Control
Why didn’t Simple Adaptation and Complex Adaptation outperform Random Control?
–Random Control adapted to some C+U, diminishing differences
–Adapting to C+nonU may increase certainty
Why didn’t Complex Adaptation outperform Simple Adaptation?
–Complex Adaptation’s human-based content responses were based on frequency, not effectiveness

Complex Adaptation to Uncertainty  Depending on whether the answer is C+U, I+U, or I+nonU: –ITSPOKE gives same content but varies dialogue act »Based on human tutor responses significantly associated with C+U, I+U, I+nonU answers –ITSPOKE gives complex feedback on uncertainty and (in)correctness »Based on empathetic computer tutor literature (Wang et al., 2005; Hall et al., 2004; Burleson et al., 2004)

Impasse Severity
Use the scalar value associated with each student turn to compute an average impasse severity, per student
Nominal State:   I+nonU   I+U    C+U     C+nonU
Scalar State:    3        2      1       0
Severity:        most     less   least   none
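The per-student average described on this slide can be sketched as a lookup followed by a mean. The scalar values come from the slide. The function name and the tuple encoding of a turn are illustrative assumptions.

```python
from typing import List, Tuple

# Scalar severity per (correctness, uncertainty) state, as listed on the slide.
SEVERITY = {
    ("I", "nonU"): 3,  # most severe impasse
    ("I", "U"): 2,     # less severe
    ("C", "U"): 1,     # least severe
    ("C", "nonU"): 0,  # no impasse
}


def average_impasse_severity(turns: List[Tuple[str, str]]) -> float:
    """Mean severity over one student's annotated turns."""
    if not turns:
        raise ValueError("at least one annotated turn is required")
    return sum(SEVERITY[turn] for turn in turns) / len(turns)


# Hypothetical student with four annotated turns.
example_turns = [("I", "nonU"), ("C", "U"), ("C", "nonU"), ("I", "U")]
print(average_impasse_severity(example_turns))  # (3 + 1 + 0 + 2) / 4 = 1.5
```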

Results II: Correlations of Metacognitive Measures with Posttest, after controlling for Pretest
Metacognitive Measure (n=81)   R      p
Average Impasse Severity       -.56   .00
Monitoring Accuracy            .42    .00
Average Impasse Severity (where smaller is better) is negatively correlated with learning [Litman and Forbes-Riley 2009]
Monitoring Accuracy (where higher is better) is positively correlated with learning [Litman and Forbes-Riley 2009]
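A correlation with posttest "after controlling for pretest" is commonly computed as a first-order partial correlation. The slides do not state the exact procedure used, so the sketch below is an assumption about the method rather than a reproduction of the analysis. Function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(measure: Sequence[float],
                        posttest: Sequence[float],
                        pretest: Sequence[float]) -> float:
    """Correlation of measure with posttest, with pretest partialled out."""
    r_mp = pearson(measure, posttest)
    r_mz = pearson(measure, pretest)
    r_pz = pearson(posttest, pretest)
    return (r_mp - r_mz * r_pz) / math.sqrt((1 - r_mz ** 2) * (1 - r_pz ** 2))
```

When the control variable is uncorrelated with both other variables, the partial correlation reduces to the plain Pearson correlation, which is a quick sanity check for the function.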

Preliminary Results: ITSPOKE-AUTO
                           WOZ           AUTO
Metacognitive Measure      R      p      R      p
Average Impasse Severity   -.56   .00    -.40   .00
Monitoring Accuracy        .42    .00    .35    .00
Impasse Severity and Monitoring Accuracy remain correlated with learning in ITSPOKE-AUTO corpus [Forbes-Riley and Litman, submitted]

Monitoring Accuracy
               Correct   Incorrect
NonUncertain   CnonU     InonU
Uncertain      CU        IU
The wizard's annotations for each student are first represented in an array, where each cell represents a mutually exclusive option motivated by Feeling of (Another’s) Knowing [Smith and Clark 1993; Brennan and Williams 1995], which is closely related to uncertainty [Dijkstra et al. 2006]
The array is then used to compute monitoring accuracy
Ranges from -1 (no monitoring accuracy) to 1 (perfect monitoring accuracy)

Metacognitive Performance Metrics
Knowledge monitoring accuracy (HC) (Nietfeld et al., 2006)
Monitoring one’s own knowledge ≈ one’s Certainty level ≈ one’s Feeling of Knowing (FOK)
–HC has been used to measure FOK accuracy (Smith & Clark, 1993): the accuracy with which one’s certainty corresponds to correctness
Feeling of Another’s Knowing (FOAK): inferring the FOK of someone else (Brennan & Williams, 1995)
–We use HC to measure FOAK accuracy (our uncertainty is inferred)
\[ HC = \frac{(\mathrm{COR\_CER} + \mathrm{INC\_UNC}) - (\mathrm{INC\_CER} + \mathrm{COR\_UNC})}{(\mathrm{COR\_CER} + \mathrm{INC\_UNC}) + (\mathrm{INC\_CER} + \mathrm{COR\_UNC})} \]
–(COR_CER + INC_UNC): cases where (un)certainty and (in)correctness agree
–(INC_CER + COR_UNC): cases where (un)certainty and (in)correctness are at odds
–Denominator sums over all cases
–Scores range from -1 (no accuracy) to 1 (perfect accuracy)
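The HC formula on this slide can be sketched directly from the four cell counts. The function and parameter names below are illustrative, with the parameters mirroring the COR_CER, COR_UNC, INC_CER and INC_UNC terms.

```python
def monitoring_accuracy(cor_cer: int, cor_unc: int,
                        inc_cer: int, inc_unc: int) -> float:
    """HC: (agreeing cases - conflicting cases) / all cases, in [-1, 1]."""
    agree = cor_cer + inc_unc      # certainty matches correctness
    at_odds = inc_cer + cor_unc    # certainty conflicts with correctness
    total = agree + at_odds
    if total == 0:
        raise ValueError("HC is undefined when there are no annotated turns")
    return (agree - at_odds) / total


# Hypothetical student: 6 correct+certain, 2 correct+uncertain,
# 2 incorrect+certain, 6 incorrect+uncertain turns.
print(monitoring_accuracy(6, 2, 2, 6))  # (12 - 4) / 16 = 0.5
```

A student whose certainty always matches correctness scores 1, and one whose certainty always conflicts with correctness scores -1.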
