Using Natural Language Processing to Analyze Tutorial Dialogue Corpora Across Domains and Modalities. Diane Litman, University of Pittsburgh, Pittsburgh, PA


Using Natural Language Processing to Analyze Tutorial Dialogue Corpora Across Domains and Modalities
Diane Litman, University of Pittsburgh, Pittsburgh, PA
Johanna Moore, Myroslava Dzikovska, Elaine Farrow, University of Edinburgh, Edinburgh, Scotland

Outline  Introduction  Dialogue Data and Prior Results  A Common Research Framework  Predicting Learning from Student Dialogue  Summary and Future Work

Motivation  An empirical basis for designing tutorial dialogue systems  What aspects of dialogue are predictive of learning? –Student behaviors  Do results generalize across tutoring situations? –Domain (mechanics versus electricity in physics) –Modality (spoken versus typed) –Tutor (computer versus human)  Can natural language processing be used for automation?

Tutorial Dialogue Research  Many correlations between dialogue and learning –e.g. [Chi et al. 2001, Katz et al. 2003, Rose et al. 2003, Craig et al. 2004, Boyer et al. 2007, Ohlsson et al. 2007]  Difficult to generalize findings due to different –annotation schemes –learning measures –statistical approaches –software tools

Two Prior Tutorial Dialogue Corpora  ITSPOKE –Spoken dialogue with a computer tutor –Conceptual mechanics –100 dialogues, 20 students  BEETLE –Typed dialogue with human tutors –Basic electricity and electronics –60 dialogues, 30 students

Back-end is Why2-Atlas (VanLehn, Jordan, Rosé et al., 2002); Sphinx2 speech recognition and Cepstral text-to-speech

BEETLE Student Screen: lesson slides, simulation-based circuit workspace, chat window

Common Experimental Aspects  Data Collection –Students take a multiple choice pretest –Students work problems with dialogue tutor –Students take a (non-identical) posttest  Data Analysis –Dialogues are annotated with various tagsets –Quantitative measures of dialogue behavior are examined for correlations with learning

Prior Correlation Results
 Student domain content and novel dialogue content positively predict learning
–both ITSPOKE and BEETLE
 However, measures such as domain content are computed differently across systems
–ITSPOKE: # of student lexical items in an online physics dictionary
»Tutor: What is the definition of Newton’s second law?
»Student: an object in motion tends to stay in motion until its act by an outside force
–BEETLE: # of student segments containing information relevant to lesson topics
»Tutor: If bulb B is damaged, what do you think will happen to bulbs A and C?
»Student: A and C will not light up.
 Would other findings have generalized with a more uniform analysis?
–affect/attitudes (ITSPOKE only)
–words or turns, accuracy, impasses (BEETLE only)

A Common Research Framework: I  Map related but non-identical annotations to identical tagsets –Word tokenizer –Dictionary-based domain content tagger  Additional BEETLE experiments –Impact of domain dictionary –Impact of automated content tagging
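A dictionary-based domain content tagger of the kind described above can be sketched in a few lines of Python; the tokenizer and the sample dictionary entries below are illustrative stand-ins, not the project's actual resources:

```python
import re

def tokenize(text):
    """Lowercase an utterance and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def domain_word_count(utterance, domain_dict):
    """Count the tokens of a student utterance that appear in a domain dictionary."""
    return sum(1 for tok in tokenize(utterance) if tok in domain_dict)

# Illustrative entries only; the real physics dictionary is far larger.
physics_dict = {"force", "motion", "object", "acceleration", "mass"}

utt = "an object in motion tends to stay in motion until its act by an outside force"
print(domain_word_count(utt, physics_dict))  # object, motion, motion, force -> 4
```

A measure like StuPhysicsDictWords is then the sum of these per-utterance counts over a student's dialogue.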

Methods: I
 Extract quantitative measures of student dialogue behavior from tagged data (normalized by total words)
–StuWords / Words
–StuPhysicsDictWords / Words
–StuBeetleDictWords / Words
–StuContentSegmentWords / Words
 Correlate measures with learning gain
–partial correlations with Posttest, after regressing out Pretest
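The partial correlation used here (a dialogue measure vs. Posttest, controlling for Pretest) can be computed by regressing Pretest out of both variables and correlating the residuals. A minimal NumPy sketch; the student scores below are invented for illustration:

```python
import numpy as np

def residuals(y, x):
    """Residuals of y after a least-squares fit on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_corr(measure, posttest, pretest):
    """Correlation of measure and posttest after regressing out pretest."""
    return np.corrcoef(residuals(measure, pretest),
                       residuals(posttest, pretest))[0, 1]

# Invented scores for five students, for illustration only.
pre  = np.array([0.3, 0.5, 0.4, 0.7, 0.6])
post = np.array([0.4, 0.8, 0.5, 0.9, 0.7])
meas = np.array([0.2, 0.5, 0.3, 0.7, 0.4])
print(partial_corr(meas, post, pre))
```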

Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE (R, p)   ITSPOKE (R, p)
StuWords / Words
StuPhysicsDictWords / Words
StuBeetleDictWords / Words     .38, .04        NA
StuContentSegmWords / Words    .43, .02        NA

 Student talk is not significantly correlated with learning (although trend in BEETLE)
 Domain talk is significantly correlated with learning
 Domain dictionary matters
 Using natural language processing for domain tagging is a viable alternative to manual annotation of contentful discourse segments

A Common Research Framework: II  Map related but non-identical annotations to common higher level theoretical constructs –DeMAND coding scheme [Campbell et al. poster] –Dialogues uniformly represented/queried using NXT, NLTK

DeMAND: Coding Utterances for Significant Events
 Consider common theories of effective learning events:
–Constructivism / generative learning (Duffy & Jonassen, 1992): student produces a lot of new information → NOVELTY
–Impasses (VanLehn et al., 2003): student is incorrect, or correct with low confidence → ACCURACY & DOUBT
–Accountable talk (Wolf, Crosson & Resnick, 2006): student is both accurate & deep → ACCURACY & DEPTH
–Deep processing / cognitive effort (Nolen, 1988): student utterances are deep (regardless of accuracy) → DEPTH

Mapping Tag Dimensions to Constructs

Construct: Effort
–BEETLE: Depth=Yes
–ITSPOKE: Answer=Deep
Construct: Accountable
–BEETLE: Depth=Yes & Accuracy=(Correct V Missing)
–ITSPOKE: Answer=Deep & Accuracy=Correct
Construct: Impasses
–BEETLE: (Doubt=Yes & Accuracy=(Correct V Missing)) V Accuracy=(Incorrect V Errors)
–ITSPOKE: (Certainness=(Uncertain V Mixed) & Accuracy=Correct) V Accuracy=(Incorrect V Partial)

 Some learning event constructs map directly from the tagging dimensions (cognitive effort: student utterances that are deep)
 Other constructs combine tag values from multiple dimensions (accountable talk: utterances that are accurate and deep; student impasses: utterances that are correct with doubt, or incorrect)
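Because the constructs are boolean combinations of tag values, they can be implemented directly as predicates over a per-utterance tag record. A sketch of the BEETLE-side mappings (the utterance dictionary and key names are invented for illustration):

```python
def effort(tags):
    """Cognitive effort: the utterance is deep."""
    return tags["depth"] == "Yes"

def accountable(tags):
    """Accountable talk: deep, and accuracy is Correct or Missing."""
    return tags["depth"] == "Yes" and tags["accuracy"] in ("Correct", "Missing")

def impasse(tags):
    """Impasse: correct/missing with doubt, or incorrect/errors."""
    return (tags["doubt"] == "Yes" and tags["accuracy"] in ("Correct", "Missing")) \
        or tags["accuracy"] in ("Incorrect", "Errors")

# Invented example: a deep, incorrect utterance with no expressed doubt.
utt = {"depth": "Yes", "accuracy": "Incorrect", "doubt": "No"}
print(effort(utt), accountable(utt), impasse(utt))  # True False True
```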

Methods: II
 Extract quantitative measures of student dialogue behavior from tagged data (normalized as percentages)
–Tag Dimensions: % depth, % novelty, % accuracy, % doubt
–Learning Constructs: % effort, % knowledge construction, % impasses, % accountable
 Predict learning gain from dialogue measures
–multivariate linear regression
–dependent measure: posttest
–independent measures: pretest and sets of dialogue measures
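The regression step (predicting posttest from pretest plus a set of dialogue measures) is ordinary multivariate least squares. A NumPy sketch on synthetic data; the variable names and effect sizes are invented for illustration:

```python
import numpy as np

def fit_posttest_model(pretest, measures, posttest):
    """Least-squares fit of posttest on an intercept, pretest, and dialogue
    measures; returns the coefficients and the model R^2."""
    X = np.column_stack([np.ones(len(pretest)), pretest, measures])
    beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    pred = X @ beta
    ss_res = np.sum((posttest - pred) ** 2)
    ss_tot = np.sum((posttest - posttest.mean()) ** 2)
    return beta, 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
pre = rng.uniform(0, 1, 30)         # synthetic pretest scores
pct_right = rng.uniform(0, 1, 30)   # synthetic %Right measure
post = 0.4 * pre + 0.3 * pct_right + rng.normal(0, 0.05, 30)
beta, r2 = fit_posttest_model(pre, pct_right.reshape(-1, 1), post)
print(len(beta), round(r2, 2))
```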

Results II: Regressions with Posttest

Measures            BEETLE: predictors (R², p)       ITSPOKE: predictors (R², p)
4 Tags              %Right (.46, .00)                %Right
Constructs          %Impasses (.22, .01)             %Accountable, %Effort
ITSPOKE Constructs  %Accountable, %Effort (.18, .07)

 The same tag dimension (%Right) is selected as most predictive of learning across corpora, after unifying methods (BEETLE: Accuracy = Correct V Missing; ITSPOKE: Accuracy = Correct)
 When using stepwise regression, different learning constructs are selected as best predictors across corpora
 However, constructs trained on ITSPOKE predict learning when tested on BEETLE; both predictors individually significant (p <.03) for BEETLE
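The cross-corpus test can be reproduced in outline: fit the construct-based model on one corpus, then score its predictions on the other. A sketch with synthetic data standing in for the ITSPOKE (training) and BEETLE (testing) measures; the coefficients and noise level are invented:

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients for y on [1, X]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def r_squared(X, y, beta):
    """R^2 of a previously fitted model evaluated on (X, y)."""
    pred = np.column_stack([np.ones(len(X)), X]) @ beta
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
# Synthetic "ITSPOKE" corpus: posttest driven by %Accountable and %Effort.
X_train = rng.uniform(0, 1, (20, 2))
y_train = 0.5 * X_train[:, 0] + 0.3 * X_train[:, 1] + rng.normal(0, 0.05, 20)
# Synthetic "BEETLE" corpus with the same underlying relationship.
X_test = rng.uniform(0, 1, (30, 2))
y_test = 0.5 * X_test[:, 0] + 0.3 * X_test[:, 1] + rng.normal(0, 0.05, 30)

beta = fit(X_train, y_train)
print(round(r_squared(X_test, y_test, beta), 2))
```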

Summary  Methods for uniformly annotating and statistically analyzing previously collected dialogue corpora  Enhancement of original findings –Replication of positive correlations with student domain content –Impact of dictionary –Use of natural language processing for automated tagging  Emergence of new results across corpora –Positive correlations with student accuracy –Accountable talk and student effort together predict learning

Future Directions  Further generalization of prior results –tutor behaviors (e.g., questioning, restating) –additional corpora  More sophisticated natural language processing for content tagging

Thank You! Questions?

Graphical User Interface

Annotated Human-Human Excerpt
T: Which one will be faster? [Short Answer Question]
S: The feathers. [Novel/Single Answer]
T: The feathers - why? [Restatement, Deep Answer Question]
S: Because there’s less matter. [Deep Answer]
 All turns in both corpora were manually coded for dialogue acts (Kappa >.6)
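The Kappa reliability figure quoted here is presumably Cohen's kappa, which corrects raw agreement for chance. A minimal implementation for two annotators labeling the same turns (the example labels are invented):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["Deep", "Deep", "Shallow", "Deep", "Shallow", "Deep"]
b = ["Deep", "Shallow", "Shallow", "Deep", "Shallow", "Deep"]
print(round(cohens_kappa(a, b), 2))  # -> 0.67
```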

[Circuit diagram: bulbs A, B, C; battery; X]
Question: If bulb B is damaged, what do you think will happen to bulbs A and C?

Non-Accountable Talk:
utt69: student: A and C will not light up
(Accuracy = Correct; Cognitive Processing = Absent)

Cognitive Effort and Potential Impasse:
utt122a: student: bulb a will light but b and c won't since b is damaged and breaks the closed path circuit
(Accuracy = Incorrect; Cognitive Processing = Present)

Potential Impasse:
utt97: student: both would be either dim or not light I would think
(Accuracy = Partially Correct; Cognitive Processing = Absent; Signs of Low Confidence = Yes)

Accountable Talk:
utt83a: student: both bulbs A and C will go out because this scenario would act the same as if there was an open circuit
(Accuracy = Correct; Cognitive Processing = Present)