Diane Litman Learning Research & Development Center

Presentation on theme: "Diane Litman Learning Research & Development Center"— Presentation transcript:

1 Spoken Tutorial Dialogue Systems: Opportunities, Challenges and Results
Diane Litman Learning Research & Development Center Computer Science Department Intelligent Systems Program University of Pittsburgh

2 Outline
Motivation
The ITSPOKE System and Corpora
Detecting and Adapting to Student Uncertainty
- Uncertainty Detection
- System Adaptation
- Experimental Evaluation
Summing Up

3 What is Natural Language Processing?
“The goal of this new field is to get computers to perform useful tasks involving human language, tasks like enabling human-machine communication, improving human-human communication, or simply doing useful processing of text or speech.” [Jurafsky and Martin 2008]
Many names and facets
- Speech and Language Processing
- Human Language Technology
- Computational Linguistics

4 What is Tutoring?
“A one-on-one dialogue between a teacher and a student for the purpose of helping the student learn something.” [Evens and Michael 2006]
Human Tutoring Excerpt [Thanks to Natalie Person and Lindsay Sears, Rhodes College]

5 Intelligent Tutoring Systems
Students who receive one-on-one instruction perform as well as the top two percent of students who receive traditional classroom instruction [Bloom 1984]
Unfortunately, providing every student with a personal human tutor is infeasible
Develop computer tutors instead

6 Tutorial Dialogue Systems
Why is one-on-one tutoring so effective?
“...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001]
Currently only humans use full-fledged natural language dialogue
KVL: In 4 studies now, we have ties between human tutors and reading controls (Why2 Spring 2002, Why2 Fall 2003, Art’s master’s student, Rod Roscoe’s study). Thus, you should definitely not assert that the hypothesis in bullet one is true. At best it is a commonly believed working hypothesis that is still being tested.

7 Spoken Tutorial Dialogue Systems
Most human tutoring involves face-to-face spoken interaction, while most computer dialogue tutors are text-based
Can the effectiveness of dialogue tutorial systems be further increased by using spoken interactions?

8 Potential Benefits of Speech: I
Self-explanation correlates with learning [Chi et al ] and occurs more in speech [Hausmann and Chi 2002]
Tutor: The right side pumps blood to the lungs, and the left side pumps blood to the other parts of the body. Could you explain how that works?
Student 1 (self-explains): So the septum is a divider so that the blood doesn't get mixed up. So the right side is to the lungs, and the left side is to the body. So the septum is like a wall that divides the heart into two parts...it kind of like separates it so that the blood doesn't get mixed up...
Student 2 (doesn’t self-explain): right side pumps blood to lungs

11 Potential Benefits of Speech: II
Speech contains prosodic information, providing new sources of information about the student for dialogue adaptation [Fox 1993; Litman and Forbes-Riley 2003; Pon-Barry et al. 2005]
A correct but uncertain student turn:
ITSPOKE: How does his velocity compare to that of his keys?
STUDENT: his velocity is constant

12 Potential Benefits of Speech: III
Spoken computational environments may foster social relationships that may enhance learning
AutoTutor [Graesser et al. 2003]

13 Potential Benefits of Speech: IV
Some applications inherently involve spoken language
- Spoken Conversational Interface for Language Learning [Thanks to Stephanie Seneff, MIT and Cambridge]
- Reading Tutors [Mostow, Cole]
Others require hands-free interaction
- Circuit Fix-It Shop [Smith 1992]
- NASA

14 Why Should NLP Researchers Care?
Many reasons why tutoring researchers are interested in spoken dialogue
Why should spoken dialogue researchers become interested in tutoring?
Tutoring applications differ in many ways from typical spoken dialogue applications
Opportunities and Challenges!

15-18 More generally... NLP Applications to Education
Learning Language (reading, writing, speaking)
- Tutors
- Scoring
Using Language (to teach everything else)
- Conversational Tutors / Peers
- CSCL
Processing Language
- Readability
- Questioning & Answering
- Discourse Coding
- Lecture Retrieval

20 Back-end is Why2-Atlas system [VanLehn, Jordan, Rose et al. 2002]
Sphinx2 speech recognition and Cepstral text-to-speech

23 Two Types of Tutoring Corpora
Human Tutoring
- 14 students / 128 dialogues (physics problems)
- 5948 student turns, 5505 tutor turns
Computer Tutoring
- ITSPOKE v1: 20 students / 100 dialogues; 2445 student turns, 2967 tutor turns
- ITSPOKE v2: 57 students / 285 dialogues; both synthesized and pre-recorded tutor voices

24 ITSPOKE Experimental Procedure
College students without physics
- Read a small background document
- Took a multiple-choice Pretest
- Worked 5 problems (dialogues) with ITSPOKE
- Took an isomorphic Posttest
Goal was to optimize Learning Gain, e.g., Posttest – Pretest

26 Monitoring Student State (motivation)
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario <…omitted…>
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)
KVL: This text will not be visible on the 9th floor setup. How about just including the student turns and not the tutor turns on the slide?

27 Adaptive Spoken Dialogue Systems: Standard Methodology
Manual Annotation of User States (Affect, Attitudes, etc.)
- Naturally-occurring spoken dialogue data [Ang et al. 2002; Lee et al ; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003; Liscombe et al. 2005]
Prediction via Machine Learning
- Automatically extract features from user turns
- Use different feature sets (e.g. prosodic, lexical) to predict user state(s)
- Significant reduction of baseline error

28-29 What to Annotate?
Information-Access and Customer Care Systems [Ang et al. 2002; Shafran et al. 2003; Lee and Narayanan 2005; Liscombe et al. 2005]
- Negative: Angry, Annoyed, Frustrated, Tired
- Positive/Neutral: Amused, Cheerful, Delighted, Happy, Serious
Tutorial Dialogue Systems [Litman and Forbes-Riley 2006, D’Mello et al. 2006]
- Negative: Angry, Annoyed, Frustrated, Bored, Confused, Uncertain, Contempt, Disgusted, Sad
- Positive/Neutral: Certain, Curious, Enthusiastic, Eureka

30 Detecting Neg/Pos/Neu in ITSPOKE
Baseline Accuracy via Majority Class Prediction

31 Detecting Neg/Pos/Neu in ITSPOKE
Use of prosodic (sp), recognized (asr) and/or actual (lex) lexical features outperforms baseline

32 Detecting Neg/Pos/Neu in ITSPOKE
As with other applications, highest predictive accuracies are obtained by combining multiple feature types [Litman and Forbes-Riley, Speech Communication 2006]
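
The majority-class baseline and the "reduction of baseline error" mentioned on these slides can be sketched as follows. This is a minimal illustration, not the ITSPOKE evaluation code: the function names and the toy label list are hypothetical, and the relative-error-reduction formula is the usual one, assumed here rather than taken from the slides.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    _, majority_count = Counter(labels).most_common(1)[0]
    return majority_count / len(labels)


def relative_error_reduction(baseline_acc, model_acc):
    """Fraction of the baseline's error that the model removes."""
    baseline_err = 1.0 - baseline_acc
    if baseline_err == 0:
        raise ValueError("baseline already has zero error")
    return (model_acc - baseline_acc) / baseline_err


# Hypothetical toy annotation: 6 neutral, 3 negative, 1 positive turn.
toy_labels = ["neu"] * 6 + ["neg"] * 3 + ["pos"]
baseline = majority_baseline_accuracy(toy_labels)
print(baseline)
print(relative_error_reduction(baseline, 0.8))
```

A classifier is only interesting if its accuracy beats this baseline, which is why the slides report feature sets relative to majority-class prediction.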

34 System Adaptation: How to Respond?
Our initial focus: responding to student uncertainty
- Most frequent user state in our data
- Focus of other studies [VanLehn et al. 2003; Craig et al , Porayska-Pomsta et al. 2007; Pon-Barry et al ]
- .62 Kappa
Approaches to adaptive system design
- Theory-based
- Data-driven

35 Theory-Based Adaptation
In tutoring, not all negatively-valenced states are bad!
While frustration/anger/annoyance is often frustrating…
Frustration can also be an opportunity to learn
Example from AutoTutor: neutral → flow → confusion → frustration → neutral [Thanks to Sidney D’Mello and Arthur Graesser, University of Memphis]

36 Uncertainty is also a Learning Opportunity
Uncertainty represents one type of learning impasse
- An impasse motivates a student to take an active role in constructing a better understanding of the principle [VanLehn et al. 2003]
Uncertainty associated with cognitive disequilibrium
- A state of failed expectations causing deliberation aimed at restoring equilibrium. [Craig et al. 2004]
Hypothesis: The system should adapt to uncertainty in the same way it responds to other impasses
Three main areas are being addressed in the CS literature = Phase 1, 2, 3
Correlations have shown results – will get to in a few slides

37 Data-Driven Adaptation: How Do Human Tutors Respond?
An empirical method for designing dialogue systems adaptive to student state
- extraction of “dialogue bigrams” from annotated human tutoring corpora
- χ2 analysis to identify dependent bigrams
- generalizable to any domain with corpora labeled for user state and system response

38 Example Human Tutoring Excerpt
S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
S: mm-k. [Neutral]
T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
S: It’s- the direction- it’s downward. [Certain]
T: Yes, it’s vertically down. [Positive Feedback, Restatement]
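
The "dialogue bigram" extraction described on the previous slide can be sketched on this excerpt. The pairing rule used here (each student-state label paired with each act label of the tutor turn that immediately follows) is an assumption made for illustration; `turns` and `dialogue_bigrams` are hypothetical names, not part of the original annotation tooling.

```python
from collections import Counter

# (speaker, labels) pairs transcribed from the annotated excerpt.
turns = [
    ("S", ["Uncertain"]),
    ("T", ["Restatement", "Expansion"]),
    ("S", ["Neutral"]),
    ("T", ["Short Answer Question"]),
    ("S", ["Certain"]),
    ("T", ["Positive Feedback", "Restatement"]),
]


def dialogue_bigrams(turns):
    """Count (student state, tutor act) pairs for adjacent S -> T turns."""
    counts = Counter()
    for (spk_a, labels_a), (spk_b, labels_b) in zip(turns, turns[1:]):
        if spk_a == "S" and spk_b == "T":
            for state in labels_a:
                for act in labels_b:
                    counts[(state, act)] += 1
    return counts


for (state, act), n in sorted(dialogue_bigrams(turns).items()):
    print(f"{state} -> {act}: {n}")
```

Counts like these, accumulated over a whole corpus, form the contingency tables that the χ2 analysis is run on.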

39 Bigram Dependency Analysis
- “Student Certainness – Tutor Positive Feedback” Bigrams
χ2 = (critical χ2 value at p = .001 is 16.27)
OBSERVED    Tutor Includes Pos    Tutor Omits Pos
neutral     252                   2517
certain     273                   832
uncertain   185                   631
mixed       71                    161
EXPECTED    Tutor Includes Pos    Tutor Omits Pos
neutral     439.46
certain     175.21                928.79
uncertain   129.51                686.49
mixed       36.82                 195.18

40 Bigram Dependency Analysis (cont.)
- Less Tutor Positive Feedback after Student Neutral turns
OBSERVED    Includes Pos    Omits
neutral     252             2517
EXPECTED    Includes Pos    Omits
neutral     439.46

41 Bigram Dependency Analysis (cont.)
- Less Tutor Positive Feedback after Student Neutral turns
- More Tutor Positive Feedback after “Emotional” turns
OBSERVED    Includes Pos    Omits
neutral     252             2517
certain     273             832
uncertain   185             631
mixed       71              161
EXPECTED    Includes Pos    Omits
neutral     439.46
certain     175.21          928.79
uncertain   129.51          686.49
mixed       36.82           195.18
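
The dependency test behind these tables is a standard χ2 test of independence, sketched below in plain Python. The observed counts are copied from the slide as printed; the function name is hypothetical, and the expected counts it computes may differ slightly in the last digits from those shown on the slide. The critical value 16.27 quoted on the slide corresponds to 3 degrees of freedom at p = .001, which matches a 4 × 2 table.

```python
# Observed counts from the slide: (tutor includes positive feedback, tutor omits it).
observed = {
    "neutral": (252, 2517),
    "certain": (273, 832),
    "uncertain": (185, 631),
    "mixed": (71, 161),
}


def chi_square_independence(table):
    """Return (chi2, degrees of freedom, expected counts) for a contingency table."""
    rows = list(table)
    n_cols = len(table[rows[0]])
    row_totals = {r: sum(table[r]) for r in rows}
    col_totals = [sum(table[r][c] for r in rows) for c in range(n_cols)]
    grand_total = sum(row_totals.values())
    # Expected count under independence: row total * column total / grand total.
    expected = {
        r: tuple(row_totals[r] * col_totals[c] / grand_total for c in range(n_cols))
        for r in rows
    }
    chi2 = sum(
        (table[r][c] - expected[r][c]) ** 2 / expected[r][c]
        for r in rows
        for c in range(n_cols)
    )
    dof = (len(rows) - 1) * (n_cols - 1)
    return chi2, dof, expected


chi2, dof, expected = chi_square_independence(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, significant at p = .001: {chi2 > 16.27}")
for state, (obs_pos, _) in observed.items():
    direction = "more" if obs_pos > expected[state][0] else "less"
    print(f"{state}: {direction} positive feedback than expected")
```

Comparing each observed cell with its expected value gives the direction of the dependency, which is how the "less after neutral, more after emotional" reading on the slide is obtained.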

42 Findings
Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor
- After uncertain, tutor Bottoms Out and avoids expansions
- After certain, tutor Restates
- After mixed, tutor Hints
- After any emotion, tutor increases Feedback
Dependencies suggest adaptive strategies for implementation in computer tutoring systems

44 Adaptation to Student Uncertainty in ITSPOKE: A First Evaluation
Most systems respond only to (in)correctness
Recall that literature suggests uncertain as well as incorrect student answers signal learning impasses
Experimentally manipulate tutor responses to student uncertainty and investigate impact on learning
- Basic adaptation
- Data-driven adaptation

45 Platform: Adaptive WOZ-TUT System
Modified version of ITSPOKE
Dialogue manager adapts to uncertainty
- system responses based on combined uncertainty and correctness
Full automation replaced by Wizard of Oz (WOZ) components
- human wizard recognizes student speech
- human also annotates both uncertainty and correctness
1st Overview study... And the coding done on the student turns
Parameters extracted from corpora, used to build models
Then I’ll discuss our results – our predictive models

46 Experimental Design: 4 Conditions
Experimental-Basic: treat all uncertain turns as incorrect
Experimental-Empirical: for uncertain or incorrect turns
- provide original content, but vary dialogue act (human tutor analysis)
- provide additional feedback on uncertainty (beyond propositional content)
Control-Norm: ignore uncertainty (as in original system)
Control-Random: ignore uncertainty, but treat a percentage of random correct answers as incorrect (to control for additional tutoring)

47 Treatments in Different Conditions
TUTOR: Now let’s talk about the net force exerted on the truck. By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT: The force of the car hitting it? [uncertain+correct]
TUTOR (Control-Norm): Good [Feedback] … [moves on]
TUTOR (Experimental-Basic): Fine. [Feedback] We can derive the net force on the truck by summing the individual forces on it, just like we did for the car. First, what horizontal force is exerted on the truck during the collision? [Remediation Subdialogue]
Same tutor response if student had been incorrect
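
The four conditions can be read as a small decision rule over each annotated student turn. The sketch below is an interpretation of the slides, not the WOZ-TUT dialogue manager: the function name is hypothetical, `random_rate` is a placeholder because the slides do not give the percentage used in Control-Random, and it is assumed that every condition gives extra tutoring after an incorrect answer. How Experimental-Empirical varies the dialogue act is not modeled.

```python
import random

CONDITIONS = (
    "Experimental-Basic",
    "Experimental-Empirical",
    "Control-Norm",
    "Control-Random",
)


def gets_additional_tutoring(condition, correct, uncertain, rng=random, random_rate=0.2):
    """Decide whether a student turn receives the extra tutoring content.

    random_rate is a hypothetical value; the slides only say "a percentage".
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition}")
    if not correct:
        # Assumption: incorrect answers are followed up in every condition.
        return True
    if condition in ("Experimental-Basic", "Experimental-Empirical"):
        # Uncertain answers are handled like incorrect ones, even when correct.
        return uncertain
    if condition == "Control-Random":
        # Ignore uncertainty; follow up on a random share of correct answers.
        return rng.random() < random_rate
    return False  # Control-Norm ignores uncertainty entirely.


# The uncertain+correct turn from the example above:
print(gets_additional_tutoring("Control-Norm", correct=True, uncertain=True))
print(gets_additional_tutoring("Experimental-Basic", correct=True, uncertain=True))
```

Control-Random exists so that any learning difference in the experimental conditions cannot be explained simply by students receiving more tutoring.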

48 Experimental Procedure
20-21 subjects in each condition
Native English speakers with no college physics
Procedure: 1) read background material, 2) took pretest, 3) worked training problem with WOZ-TUT, 4) took user survey, 5) took posttest

49 Experimental Results
Two-way ANOVA indicated students learned (F(1,77) = , p = 0.000, MSe = )
Amount depended on condition (F(3,77) = 3.275, p = 0.025, MSe = 0.009)
One-way ANOVA with post-hoc Tukey tests determined which conditions learned more

51 In Addition… Learning Efficiency also improved
Two Efficiency Measures
- (Normalized Learning Gains) / (Total Student Turns)
- (Normalized Learning Gains) / (Total Time in Minutes)
Experimental-Basic > Control-Norm (p < .05)
Current Directions
- New evaluation of Experimental-Basic: fully-automated ITSPOKE
- New methods for designing Experimental-Empirical: educational data mining using reinforcement learning
- Other student states
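
The two efficiency measures can be written out directly. The slides do not define "normalized learning gain"; the sketch below assumes the common definition (posttest − pretest) / (1 − pretest) with scores expressed as fractions of the maximum, and all names and numbers are hypothetical.

```python
def normalized_learning_gain(pretest, posttest):
    """Gain as a share of the room left to improve; scores are fractions in [0, 1]."""
    if not 0.0 <= pretest < 1.0:
        raise ValueError("pretest must be in [0, 1)")
    return (posttest - pretest) / (1.0 - pretest)


def learning_efficiency(pretest, posttest, cost):
    """Normalized gain per unit of cost (student turns or minutes of tutoring)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return normalized_learning_gain(pretest, posttest) / cost


# Hypothetical student: 50% on the pretest, 75% on the posttest,
# after 50 student turns and 40 minutes of tutoring.
print(learning_efficiency(0.5, 0.75, cost=50))  # gain per student turn
print(learning_efficiency(0.5, 0.75, cost=40))  # gain per minute
```

Dividing by turns or by time separates "learned more" from "learned more for the same effort", which is the distinction this slide draws.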

53 Summing Up: I
Spoken Dialogue Systems are of great interest to researchers in Intelligent Tutoring
- One-on-one tutoring is a powerful technique for helping students learn
- Natural language dialogue contributes in a powerful way to the efficacy of one-on-one tutoring
- Using presently available NLP technology, computer tutors can be built and can serve as a valuable experimental platform to investigate student learning

54-56 Summing Up: II
Intelligent Tutoring in turn provides many opportunities and challenges for researchers in Spoken Dialogue Systems
- Adapting to Student States (Kate Forbes-Riley)
- and many more! Cohesion/Alignment (Arthur Ward, Sandra Katz), Reinforcement Learning (Min Chi), User Simulation (Hua Ai), Miscommunication (Pamela Jordan, Michael Lipschultz, Joanna Drummond)
- Your NLP educational application here!

57 Acknowledgements
ITSPOKE group past and present: Hua Ai, Min Chi, Joanna Drummond, Kate Forbes-Riley, Alison Huettner, Michael Lipschultz, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetreault, Art Ward
Columbia Collaborators: Julia Hirschberg, Jackson Liscombe, Jennifer Venditti
NLP@Pitt: Jan Wiebe, Rebecca Hwa, Wendy Chapman, Paul Hoffmann, Behrang Mohit, Carol Nichols, Swapna Somasundaran, Theresa Wilson, Chenhai Xi
Why2-Atlas and Human Tutoring groups: Kurt VanLehn, Pamela Jordan, Uma Pappuswamy, Carolyn Rose; Micki Chi, Scotty Craig, Bob Hausmann, Margueritte Roy

58 Thank You! Questions? Further Information

59 The End

60 Overview: Towards Adaptive Spoken Dialogue Systems
Detection
- Promising across user states and applications, e.g.: Craig et al., 2006; Litman & Forbes-Riley, 2006; Lee & Narayanan, 2005; Vidrascu & Devillers, 2005; Batliner et al., 2003
Adaptation
- Sparse, can be difficult to show adaptation improves performance
- Some used basic adaptations and showed likeability increases
- For other performance metrics, basic adaptations not clear a priori
Application          User State     System Adaptation
Health Assessment    Stress         Empathy [Liu & Picard 2005]
Gaming               Frustration    Apology [Klein et al. 2002]
Tutoring             ???????        ???????

61 Detecting and Responding to Student States
Opportunity
- Adaptive spoken dialogue system technology can improve student learning and other measures of performance [Aist et al. 2002; Pon-Barry et al. 2006]
Challenges
- What to detect?
- How to respond?
- Evaluation?

62 Example Student States in ITSPOKE
ITSPOKE: What else do you need to know to find the box’s acceleration?
Student: the direction [UNCERTAIN]
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force [CERTAIN]
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity [UNCERTAIN]
ITSPOKE: Could you please repeat that?
Student: velocity [ANNOYED]

63 Prediction via Machine Learning
Multiple feature types per student turn, e.g.
- Acoustic-prosodic
- Lexical
- Identifiers
- System and student performance
Sample research questions
- Relative utility of feature types
- Impact of speech recognition
- Speaker and task dependence
- Impact of learning algorithm, amount of training data

64 Detecting Neg/Pos/Neu in ITSPOKE
- However, relative feature utility differs in tutoring (e.g., for speech features: temporal > energy > pitch)

65 Example from ITSPOKE
ITSPOKE: What is the net force acting on the truck equal to?
Student: I don’t know [Uncertain, Frustrated]
ITSPOKE: Since net force is defined as the vector sum of all the forces exerted on an object, let's first find the forces acting on the truck. Try to name them.
Student: The impact force, the normal force, and gravity [Certain, Neutral]
ITSPOKE: Yep.
Here’s an annotated excerpt. You see T, S, ASR, C, FA
Also see the “correctness” label – ITSPOKE labels each student turn with a Correctness label, which I’ll talk about more shortly

66 Approaches to Evaluation
“Correlational” Studies, e.g.
- Student uncertainty positively correlates with learning [Craig et al. 2004]
- Adding uncertainty and frustration metrics to regression models increases model fit [Forbes-Riley et al. 2008]
“Causal” Studies, e.g.
- Adding human-provided emotional scaffolding to a reading tutor increases student persistence [Aist et al. 2002]
- Experimentally manipulate tutor responses to student uncertainty and investigate impact on learning [Pon-Barry et al. 2006]

67 WOZ-TUT Screenshot

68 Treatments in Different Conditions
TUTOR: Now let’s talk about the net force exerted on the truck. By the same reasoning that we used for the car, what’s the overall net force on the truck equal to?
STUDENT: The force of the car hitting it? [uncertain+correct]
TUTOR (Control-Norm): Good [Feedback] … [moves on]
TUTOR (Experimental-Empirical): That’s exactly right, but you seem unsure, so let’s sum up. [Feedback] The net force on the truck is equal to the impact force on it… [New Bottom Out]
New tutor responses for incorrect +/- uncertainty answers as well

69 In Closing
Synergy between Intelligent Tutoring and Spoken Dialogue Systems can provide
- Better scientific understanding of how dialogue facilitates learning
- Long-term benefit for scaling spoken dialogue systems to new and complex domains

