Spoken Dialogue for Intelligent Tutoring Systems: Opportunities and Challenges Diane Litman Computer Science Department & Learning Research & Development Center University of Pittsburgh Currently Leverhulme Visiting Professor School of Informatics University of Edinburgh
Outline Motivation and History The ITSPOKE System and Corpora Opportunities and Challenges – Performance Evaluation – Affective Reasoning – Discourse Analysis Summing Up
What is Tutoring? “A one-on-one dialogue between a teacher and a student for the purpose of helping the student learn something.” [Evens and Michael 2006] Human Tutoring Excerpt [Thanks to Natalie Person and Lindsay Sears, Rhodes College]
Intelligent Tutoring Systems Students who receive one-on-one instruction perform as well as the top two percent of students who receive traditional classroom instruction [Bloom 1984] Unfortunately, providing every student with a personal human tutor is infeasible – Develop computer tutors instead
Tutorial Dialogue Systems Why is one-on-one tutoring so effective? “...there is something about discourse and natural language (as opposed to sophisticated pedagogical strategies) that explains the effectiveness of unaccomplished human [tutors].” [Graesser, Person et al. 2001] Working hypothesis regarding learning gains –Human Dialogue > Computer Dialogue > Text
Spoken Tutorial Dialogue Systems Most human tutoring involves face-to-face spoken interaction, while most computer dialogue tutors are text-based Can the effectiveness of dialogue tutorial systems be further increased by using spoken interactions?
A Brief History
1970 – Mid 1980s
–SCHOLAR (Carbonell)
–WHY (Stevens and Collins)
–SOPHIE (Burton and Brown)
–Meno-Tutor (Woolf and McDonald) …
Late 1980s
–CIRCSIM-Tutor (Evens, Michael and Rovick)
–SHERLOCK II (Lesgold)
–Unix Consultant (Wilensky et al.)
–EDGE (Cawsey) …
Currently…
–Why2-AutoTutor (Graesser et al.) (speech synthesis)
–Why2-Atlas (VanLehn et al.)
–CyclePad (Rose et al.)
–Beetle (Moore et al.)
–DIAG-NLG (Di Eugenio)
–SCoT (Peters et al.) (spoken dialogue)
–ITSPOKE (Litman et al.) (spoken dialogue) …
Potential Benefits of Speech: I Self-explanation correlates with learning [Chi et al. 1994] and occurs more in speech [Hausmann and Chi 2002] –Tutor: The right side pumps blood to the lungs, and the left side pumps blood to the other parts of the body. Could you explain how that works? –Student 1 (self-explains): So the septum is a divider so that the blood doesn't get mixed up. So the right side is to the lungs, and the left side is to the body. So the septum is like a wall that divides the heart into two parts...it kind of like separates it so that the blood doesn't get mixed up... –Student 2 (doesn’t self-explain): right side pumps blood to lungs
Potential Benefits of Speech: II Speech contains prosodic information, providing new sources of information about the student for dialogue adaptation [Fox 1993; Litman and Forbes-Riley 2003; Pon-Barry et al. 2005] A correct but uncertain student turn –ITSPOKE: How does his velocity compare to that of his keys? –STUDENT: his velocity is constant
Potential Benefits of Speech: III Spoken computational environments may foster social relationships that may enhance learning –AutoTutor [Graesser et al. 2003]
Potential Benefits of Speech: IV Some applications inherently involve spoken language –Spoken Conversational Interface for Language Learning [MIT(Seneff,Glass,Wang),Cambridge (Young,He,Ye)] –Reading Tutors [Mostow, Cole] Others require hands-free interaction –Circuit Fix-It Shop [Smith 1992]
Why Should NLP Researchers Care? Many reasons why tutoring researchers are interested in spoken dialogue Why should researchers in computational linguistics become interested in tutoring? –Tutoring applications differ in many ways from typical spoken dialogue applications –Opportunities and Challenges!
Back-end is Why2-Atlas system [VanLehn, Jordan, Rose et al. 2002] Sphinx2 speech recognition and Cepstral text-to-speech
Two Types of Tutoring Corpora Human Tutoring –14 students / 128 dialogues (physics problems) –5948 student turns, 5505 tutor turns Computer Tutoring –ITSPOKE v1 »20 students / 100 dialogues »2445 student turns, 2967 tutor turns –ITSPOKE v2 » 57 students / 285 dialogues » both synthesized and pre-recorded tutor voices
ITSPOKE Experimental Procedure College students without physics –Read a small background document –Took a multiple-choice Pretest –Worked 5 problems (dialogues) with ITSPOKE –Took an isomorphic Posttest Goal was to optimize Learning Gain – e.g., Posttest – Pretest
Predictive Performance Modeling Opportunity –Spoken dialogue system evaluation methodologies can improve our understanding of how dialogue facilitates student learning [Forbes-Riley and Litman 2006] Challenges – How to measure system performance? – What are predictive interaction parameters?
Predictive Performance Modeling Understand why a spoken dialogue system fails or succeeds PARADISE [Walker et al. 1997] –Measure parameters (interaction costs and benefits) and performance in a system corpus –Train model via multiple linear regression over parameters, predicting performance: System Performance = $\sum_{i=1}^{n} w_i \, p_i$ –Test model on new corpus –Predict performance during future system design
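The train-then-predict loop described on this slide can be sketched in a few lines. This is a minimal illustration, not the PARADISE implementation: the parameter values and performance scores are hypothetical, the helper names `fit_linear_model` and `predict` are invented here, and the sketch assumes the parameters have already been put on a common scale so that no intercept term is needed.

```python
def fit_linear_model(rows, targets):
    """Least-squares weights w solving the normal equations (X^T X) w = X^T y."""
    n_params = len(rows[0])
    # Build X^T X and X^T y from the training corpus.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_params)]
           for i in range(n_params)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n_params)]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(n_params)]
    for col in range(n_params):
        pivot = max(range(col, n_params), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n_params):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n_params] / aug[i][i] for i in range(n_params)]


def predict(weights, params):
    """System Performance = sum over i of w_i * p_i."""
    return sum(w * p for w, p in zip(weights, params))


# Hypothetical training corpus: one row of interaction parameters per dialogue,
# plus the measured performance for that dialogue.
train_params = [[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 1, 3], [3, 2, 1]]
train_performance = [3.0, -0.5, 1.0, 4.5, 4.5]

weights = fit_linear_model(train_params, train_performance)

# Apply the trained model to a dialogue from a new (hypothetical) corpus.
new_dialogue = [1, 1, 1]
print(weights, predict(weights, new_dialogue))
```

The learned weights indicate how strongly, and in which direction, each parameter is associated with performance, which is what makes the model useful for diagnosis as well as prediction.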
Challenges System Performance –Prior evaluations used User Satisfaction –Is Student Learning more relevant for the tutoring domain? Interaction Parameters –Prior applications used Generic parameters –Are Task-Specific and Affective parameters also useful?
Findings Using PARADISE to predict Learning –Posttest = .86 * Time + .65 * Pretest - .54 * #Neutrals Traditional predictive parameters – e.g., Elapsed Time, Dialogue and Turn Length New parameters – e.g., Affect, Correctness Predictive power increases with the linguistic sophistication of the parameters – e.g., Semantic concepts rather than words
Contrasts with Non-Tutorial Dialogue User Satisfaction models are less useful –Tutoring systems are not designed to maximize User Satisfaction Interaction parameters for learning –Posttest = .86 * Time + .65 * Pretest - .54 * #Neutrals » longer dialogues are better » speech recognition problems don’t seem to matter » lack of some types of affect is bad
Detecting and Responding to Student Affective States Opportunity –Affective spoken dialogue system technology can improve student learning and other measures of performance [Aist et al. 2002; Pon-Barry et al. 2006] Challenges – What to detect? – How to respond?
Monitoring Student State (motivation)
Tutor26: Let me ask my question in a more general way. What happens to the velocity of a body when there is no force acting on it?
Student27: dammit (ASR: it is)
Tutor28: Could you please repeat that?
Student29: same (ASR: i same)
Tutor30: Right. And what happens to the HORIZONTAL velocity of a body when there is no HORIZONTAL force acting on it?
Student31: zero (ASR: the zero)
Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction in a different scenario
Student33: oh gravity you already said this (ASR: beats gravity you're exert less)
Tutor34: Fine. Are there any other forces acting on the apple as it falls?
Student35: no why are you doing this again (ASR: no y and to it yes)
Tutor36: Correct. The only force acting on the apple as it falls to the earth is the earth's gravitational force. In which direction does gravitational force act?
Student37: downward you computer (ASR: downward you computer)
Affective Spoken Dialogue Systems: Standard Methodology Manual Annotation of Affect and Attitudes –Naturally-occurring spoken dialogue data [Ang et al. 2002; Lee et al. 2002; Batliner et al. 2003; Devillers et al. 2003; Shafran et al. 2003; Liscombe et al. 2005] Prediction via Machine Learning –Automatically extract features from user turns –Use different feature sets (e.g. prosodic, lexical) to predict affect –Significant reduction of baseline error
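The "reduction of baseline error" on this slide can be made concrete with a small calculation. The sketch below assumes a majority-class baseline (always predict the most frequent label), which is a common choice but is an assumption here; the gold and predicted labels are hypothetical, and the function names are invented for illustration.

```python
from collections import Counter


def majority_baseline_error(labels):
    """Error rate of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 1 - most_common_count / len(labels)


def error_rate(gold, predicted):
    """Fraction of turns whose predicted label differs from the gold label."""
    wrong = sum(g != p for g, p in zip(gold, predicted))
    return wrong / len(gold)


def relative_error_reduction(baseline_error, model_error):
    """Share of the baseline error that the model removes."""
    return (baseline_error - model_error) / baseline_error


# Hypothetical annotated student turns and classifier output.
gold = ["neutral"] * 6 + ["negative"] * 3 + ["positive"]
predicted = ["neutral"] * 6 + ["negative", "negative", "neutral", "neutral"]

baseline = majority_baseline_error(gold)
model = error_rate(gold, predicted)
print(baseline, model, relative_error_reduction(baseline, model))
```

Running the same comparison once per feature set (for example prosodic only, lexical only, combined) gives a like-for-like view of how much each set contributes.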
Challenge 1: What “emotions” to detect? Communicator and Customer Care Systems –Negative: Angry, Annoyed, Frustrated, Tired –Positive/Neutral: Amused, Cheerful, Delighted, Happy, Serious [Ang et al. 2002; Shafran et al. 2003; Lee and Narayanan 2005; Liscombe et al. 2005] Tutorial Dialogue Systems –Negative: Angry, Annoyed, Frustrated, Bored, Confused, Uncertain, Contempt, Disgusted, Sad –Positive/Neutral: Certain, Curious, Enthusiastic, Eureka [Litman and Forbes-Riley 2006, D’Mello et al. 2006]
Example Student States in ITSPOKE
ITSPOKE: What else do you need to know to find the box’s acceleration?
Student: the direction [UNCERTAIN]
ITSPOKE: If you see a body accelerate, what caused that acceleration?
Student: force [CERTAIN]
ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
Student: velocity [UNCERTAIN]
ITSPOKE: Could you please repeat that?
Student: velocity [ANNOYED]
Challenge 2: How to respond? In tutoring, not all negatively-valenced emotions are bad! Instead, they often represent learning opportunities Uncertainty represents one type of learning impasse [VanLehn et al. 2003]: An impasse motivates a student to take an active role in constructing a better understanding of the principle. Uncertainty is also associated with cognitive disequilibrium [Craig et al. 2004]: A state of failed expectations causing deliberation aimed at restoring equilibrium –Uncertainty positively correlates with learning
Example from ITSPOKE
ITSPOKE: What is the net force acting on the truck equal to?
Student: I don’t know [Uncertain, Frustrated]
ITSPOKE: Since net force is defined as the vector sum of all the forces exerted on an object, let's first find the forces acting on the truck. Try to name them.
Student: The impact force, the normal force, and gravity [Certain, Neutral]
ITSPOKE: Yep.
Do Human Tutors Respond to Student Uncertainty? A data-driven method for designing dialogue systems adaptive to student state [Forbes-Riley and Litman 2005] –extraction of “dialogue bigrams” from annotated human tutoring corpora –χ² analysis to identify dependent bigrams –generalizable to any domain with corpora labeled for user state and system response
Example Human Tutoring Excerpt
S: So the- when you throw it up the acceleration will stay the same? [Uncertain]
T: Acceleration uh will always be the same because there is- that is being caused by force of gravity which is not changing. [Restatement, Expansion]
S: mm-k. [Neutral]
T: Acceleration is– it is in- what is the direction uh of this acceleration- acceleration due to gravity? [Short Answer Question]
S: It’s- the direction- it’s downward. [Certain]
T: Yes, it’s vertically down. [Positive Feedback, Restatement]
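The labels on the excerpt above are enough to illustrate what extracting "dialogue bigrams" might look like. This is a minimal sketch under one assumption: a bigram is taken to be a student-state label paired with a label on the immediately following tutor turn. The function name `dialogue_bigrams` and the data layout are invented for illustration and are not taken from the cited work.

```python
from collections import Counter

# (speaker, labels) pairs transcribed from the annotated excerpt above.
excerpt = [
    ("S", ["Uncertain"]),
    ("T", ["Restatement", "Expansion"]),
    ("S", ["Neutral"]),
    ("T", ["Short Answer Question"]),
    ("S", ["Certain"]),
    ("T", ["Positive Feedback", "Restatement"]),
]


def dialogue_bigrams(turns):
    """Count (student state, tutor act) pairs over adjacent student-tutor turns."""
    counts = Counter()
    for (spk_a, labels_a), (spk_b, labels_b) in zip(turns, turns[1:]):
        if spk_a == "S" and spk_b == "T":
            for state in labels_a:
                for act in labels_b:
                    counts[(state, act)] += 1
    return counts


bigrams = dialogue_bigrams(excerpt)
for pair, n in sorted(bigrams.items()):
    print(pair, n)
```

Accumulated over a whole corpus, these counts fill the contingency tables that a χ² test can then check for dependence.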
Bigram Dependency Analysis: “Student Certainness – Tutor Positive Feedback” Bigrams
[Table: EXPECTED and OBSERVED counts of Tutor Includes Pos vs. Tutor Omits Pos, by student state (neutral, certain, uncertain, mixed); surviving cell text: 71161]
χ² = (critical χ² value at p = .001 is 16.27)
Bigram Dependency Analysis (cont.)
[Table: EXPECTED and OBSERVED counts of Includes Pos vs. Omits Pos, by student state (neutral, certain, uncertain, mixed)]
–Less Tutor Positive Feedback after Student Neutral turns
–More Tutor Positive Feedback after “Emotional” turns
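The expected-versus-observed comparison behind these slides follows the standard χ² test of independence, sketched here. The counts are hypothetical placeholders, not the values from the study. With four student states and two tutor outcomes the table has (4 - 1) × (2 - 1) = 3 degrees of freedom, which is where the 16.27 critical value at p = .001 comes from.

```python
def expected_counts(observed):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical counts. Rows: neutral, certain, uncertain, mixed.
# Columns: tutor includes positive feedback, tutor omits it.
observed = [[20, 80], [30, 20], [25, 25], [15, 10]]

CRITICAL_P001_DF3 = 16.27  # critical value quoted on the slide
statistic = chi_square(observed)
print(statistic, statistic > CRITICAL_P001_DF3)
```

Comparing each observed cell with its expected count shows the direction of a dependency, for instance whether positive feedback follows neutral turns less often than independence would predict.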
Findings Statistically significant dependencies exist between students’ state of certainty and the responses of an expert human tutor –After uncertain, tutor Bottoms Out & avoids expansions –After certain, tutor Restates –After mixed, tutor Hints –After any emotion, tutor increases Feedback Dependencies suggest adaptive strategies for implementation in computer tutoring systems – Experiment in progress with adaptive ITSPOKE
Discourse Structure Opportunity –Dialogues with tutoring systems have more complex hierarchical discourse structures compared to many other types of dialogues Challenges –How can discourse structure be exploited in the context of spoken dialogue systems?
Exploiting Discourse Structure (Motivation) Average ITSPOKE dialogue is 20 minutes Student turns are hierarchically structured –Level 1 : 1350 (57.3%) –Level 2 : 643 (27.3%) –Level 3 : 248 (10.5%) –Levels 4-6 :113 (4.8%)
Discourse structure Annotation and Transitions Based on the Grosz & Sidner theory of discourse structure –Discourse segment Discourse segment purpose –Hierarchy of discourse segments Tutoring information encoded in a hierarchical structure –Human tutor manually authored dialogue paths for ITSPOKE –Automatic traversal of logs places utterances into the structure [Figure: discourse tree with top-level questions Q1, Q2, Q3 and sub-questions Q2.1, Q2.2]
ITSPOKE behavior & Discourse structure annotation [Figure: discourse tree with top-level questions Q1, Q2, Q3 and sub-questions Q2.1, Q2.2]
Discourse structure transitions [Figure: discourse tree with top-level questions Q1, Q2, Q3 and sub-questions Q2.1, Q2.2]
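One way to picture discourse structure transitions is to label each step between consecutive system questions by how the position in the hierarchy changes. The sketch below is a simplification built on assumptions: questions are encoded as tuples (Q2.1 becomes `(2, 1)`), and the rules used to tell Advance, Push, PopUp and PopUpAdvance apart are guesses made for illustration, not the annotation scheme used for ITSPOKE.

```python
def classify_transition(prev, cur):
    """Label the move from question `prev` to question `cur`.

    Assumed rules (illustrative only):
      deeper level                       -> Push
      shallower, back to an ancestor     -> PopUp
      shallower, on to another question  -> PopUpAdvance
      same level                         -> Advance
    """
    if len(cur) > len(prev):
        return "Push"
    if len(cur) < len(prev):
        return "PopUp" if prev[:len(cur)] == cur else "PopUpAdvance"
    return "Advance"


def label_transitions(questions):
    """Label every consecutive pair of questions in a dialogue path."""
    return [classify_transition(a, b) for a, b in zip(questions, questions[1:])]


# Path through the example tree: Q1, Q2, Q2.1, Q2.2, Q3.
path = [(1,), (2,), (2, 1), (2, 2), (3,)]
print(label_transitions(path))
```

Because the labels depend only on positions in the authored hierarchy, they can be computed automatically from system logs.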
Findings Student correctness is predictive of student learning, but only after particular discourse transitions [Rotaru and Litman 2006] –e.g., After Pops (PopUp, PopUpAdvance) » incorrect turns negatively predict learning » correct turns positively predict learning –Currently testing with experimental manipulation Student certainness is more predictive only after particular transitions
Findings (cont.) While single discourse transitions are not predictive of learning, patterns in the discourse structure are –e.g., Advance-Advance and Push-Push both positively correlate with learning Statistically significant dependencies exist between discourse transitions and speech recognition – e.g., after both Pushes and Pops, more misrecognitions Graphical display of discourse structure increases user satisfaction –e.g., easier for students to concentrate and to learn
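A pattern such as Advance-Advance can be tested against learning by counting its occurrences per student and correlating the counts with a learning measure. The sketch below uses Pearson's r and entirely hypothetical students, transition sequences and gains; the actual analysis may have used a different statistic or controlled for additional variables.

```python
from math import sqrt


def transition_bigram_count(transitions, pattern):
    """Number of adjacent transition pairs equal to `pattern`."""
    return sum(1 for pair in zip(transitions, transitions[1:]) if pair == pattern)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-student transition sequences and learning gains.
students = [
    (["Advance", "Advance", "Advance", "Push"], 0.30),
    (["Push", "Advance", "PopUp", "Advance"], 0.10),
    (["Advance", "Advance", "Push", "Push"], 0.25),
]

pattern = ("Advance", "Advance")
counts = [transition_bigram_count(seq, pattern) for seq, _ in students]
gains = [gain for _, gain in students]
print(counts, pearson(counts, gains))
```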
Summing Up: I Spoken Dialogue Systems are of great interest to researchers in Intelligent Tutoring –One-on-one tutoring is a powerful technique for helping students learn –Natural language dialogue contributes in a powerful way to the efficacy of one-on-one-tutoring –Using presently available NLP technology, computer tutors can be built and can serve as a valuable aid to student learning
Summing Up: II Intelligent Tutoring in turn provides many opportunities and challenges for researchers in Spoken Dialogue Systems –Performance Evaluation –Affective Reasoning –Discourse Analysis –and many more! »Cohesion/Coherence, Priming and Alignment, Dialogue Acts, Reinforcement Learning, User Simulation, Prosody and Dialogue
Acknowledgements ITSPOKE group –Hua Ai, Kate Forbes-Riley, Alison Huettner, Beatriz Maeireizo-Tokeshi, Greg Nicholas, Amruta Purandare, Mihai Rotaru, Scott Silliman, Joel Tetreault, Art Ward –Columbia Collaborators: Julia Hirschberg, Jackson Liscombe, Jennifer Venditti –Jan Wiebe, Rebecca Hwa, Wendy Chapman, Paul Hoffmann, Behrang Mohit, Carol Nichols, Swapna Somasundaran, Theresa Wilson, Chenhai Xi Why2-Atlas and Human Tutoring groups –Kurt VanLehn, Pam Jordan, Uma Pappuswamy, Carolyn Rose –Micki Chi, Scotty Craig, Bob Hausmann, Marguerite Roy Art Graesser, Natalie Person, Sidney D’Mello, Lindsay Sears
Thank You! Questions? Further Information – Annotated ITSPOKE Corpus –
The End
Detecting Neg/Pos/Neu in ITSPOKE - As with other applications, highest predictive accuracies are obtained by combining multiple feature types [Litman and Forbes-Riley 2006]
Detecting Neg/Pos/Neu in ITSPOKE - However, relative feature utility differs in tutoring (e.g., for speech features: temporal > energy > pitch)
In Closing Synergy between Intelligent Tutoring and Spoken Dialogue Systems can provide –Better scientific understanding of how dialogue facilitates learning –Long-term benefit for scaling spoken dialogue systems to new and complex domains