1
Using Natural Language Processing to Analyze Tutorial Dialogue Corpora Across Domains and Modalities
Diane Litman, University of Pittsburgh, Pittsburgh, PA
Johanna Moore, Myroslava Dzikovska, Elaine Farrow, University of Edinburgh, Edinburgh, Scotland
2
Outline
– Introduction
– Dialogue Data and Prior Results
– A Common Research Framework
– Predicting Learning from Student Dialogue
– Summary and Future Work
3
Motivation
An empirical basis for designing tutorial dialogue systems:
– What aspects of dialogue are predictive of learning?
  » Student behaviors
– Do results generalize across tutoring situations?
  » Domain (mechanics versus electricity in physics)
  » Modality (spoken versus typed)
  » Tutor (computer versus human)
– Can natural language processing be used for automation?
4
Tutorial Dialogue Research
Many correlations between dialogue and learning
– e.g., [Chi et al. 2001, Katz et al. 2003, Rose et al. 2003, Craig et al. 2004, Boyer et al. 2007, Ohlsson et al. 2007]
Difficult to generalize findings due to different
– annotation schemes
– learning measures
– statistical approaches
– software tools
5
Two Prior Tutorial Dialogue Corpora
ITSPOKE
– Spoken dialogue with a computer tutor
– Conceptual mechanics
– 100 dialogues, 20 students
BEETLE
– Typed dialogue with human tutors
– Basic electricity and electronics
– 60 dialogues, 30 students
6
Back-end is Why2-Atlas (VanLehn, Jordan, Rose et al., 2002)
Sphinx2 speech recognition and Cepstral text-to-speech
7
BEETLE Student Screen
– Lesson Slides
– Simulation-Based Circuit Workspace
– Chat Window
8
Common Experimental Aspects
Data Collection
– Students take a multiple-choice pretest
– Students work problems with the dialogue tutor
– Students take a (non-identical) posttest
Data Analysis
– Dialogues are annotated with various tagsets
– Quantitative measures of dialogue behavior are examined for correlations with learning
9
Prior Correlation Results
Student domain content and novel dialogue content positively predict learning
– in both ITSPOKE and BEETLE
However, measures such as domain content are computed differently across systems
– ITSPOKE: # of student lexical items in an online physics dictionary
  » Tutor: What is the definition of Newton's second law?
  » Student: an object in motion tends to stay in motion until its act by an outside force
– BEETLE: # of student segments containing information relevant to lesson topics
  » Tutor: If bulb B is damaged, what do you think will happen to bulbs A and C?
  » Student: A and C will not light up.
Would other findings have generalized with a more uniform analysis?
– affect/attitudes (ITSPOKE only)
– words or turns, accuracy, impasses (BEETLE only)
10
A Common Research Framework: I
Map related but non-identical annotations to identical tagsets
– Word tokenizer
– Dictionary-based domain content tagger
Additional BEETLE experiments
– Impact of domain dictionary
– Impact of automated content tagging
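The dictionary-based domain content tagger can be thought of as a token lookup over student turns. A minimal sketch, assuming a toy dictionary and a simple regex tokenizer (the actual dictionaries and the shared tokenizer are not specified here):

```python
import re

# Hypothetical mini-dictionary; the real systems used an online physics
# dictionary (ITSPOKE) and a BEETLE lesson-topic dictionary.
PHYSICS_DICT = {"force", "mass", "acceleration", "velocity", "newton", "motion"}

def tokenize(utterance):
    """Lowercase word tokenizer (a stand-in for the shared word tokenizer)."""
    return re.findall(r"[a-z]+", utterance.lower())

def domain_word_ratio(student_turns, dictionary):
    """Fraction of student word tokens found in the domain dictionary,
    i.e. a normalized measure like StuPhysicsDictWords / Words."""
    tokens = [t for turn in student_turns for t in tokenize(turn)]
    if not tokens:
        return 0.0
    return sum(t in dictionary for t in tokens) / len(tokens)

turns = ["an object in motion tends to stay in motion",
         "force equals mass times acceleration"]
print(domain_word_ratio(turns, PHYSICS_DICT))  # 5 of 14 tokens are domain words
```

Swapping in a different dictionary (e.g. one derived from the BEETLE lessons) is what the "impact of domain dictionary" experiment varies.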
11
Methods: I
Extract quantitative measures of student dialogue behavior from tagged data
– StuWords
– StuPhysicsDictWords
– StuBeetleDictWords
– StuContentSegmentWords
Correlate measures with learning gain
– partial correlations with Posttest, after regressing out Pretest
12
Methods: I
Extract quantitative measures of student dialogue behavior from tagged data (normalized)
– StuWords / Words
– StuPhysicsDictWords / Words
– StuBeetleDictWords / Words
– StuContentSegmentWords / Words
Correlate measures with learning gain
– partial correlations with Posttest, after regressing out Pretest
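A partial correlation with Posttest after regressing out Pretest can be computed by regressing Pretest out of both variables and correlating the residuals. A minimal NumPy sketch with synthetic data (illustrative only; it is not the statistics package used in the original analyses):

```python
import numpy as np

def partial_corr(measure, posttest, pretest):
    """Partial correlation of a dialogue measure with posttest,
    controlling for pretest: correlate the least-squares residuals
    left after regressing pretest out of each variable."""
    X = np.column_stack([np.ones_like(pretest), pretest])  # intercept + pretest
    def residuals(y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta
    return float(np.corrcoef(residuals(measure), residuals(posttest))[0, 1])

# Synthetic example: posttest = pretest + measure, so after controlling
# for pretest the measure should be (near-)perfectly correlated.
rng = np.random.default_rng(0)
pre = rng.normal(size=50)
m = rng.normal(size=50)
print(partial_corr(m, pre + m, pre))
```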
14
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Student talk is not significantly correlated with learning (although a trend in BEETLE)
15
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Domain talk is significantly correlated with learning
16
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Domain talk is significantly correlated with learning
Domain dictionary matters
17
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Using natural language processing for domain tagging is a viable alternative to manual annotation of contentful discourse segments
18
A Common Research Framework: II
Map related but non-identical annotations to common higher-level theoretical constructs
– DeMAND coding scheme [Campbell et al. poster]
– Dialogues uniformly represented/queried using NXT, NLTK
19
DeMAND: Coding Utterances for Significant Events
Consider common theories of effective learning events
– Constructivism / generative learning (Duffy & Jonassen, 1992): student produces a lot of new information → NOVELTY
– Impasses (VanLehn et al., 2003): student is incorrect, or correct with low confidence → ACCURACY & DOUBT
– Accountable talk (Wolf, Crosson & Resnick, 2006): student is both accurate & deep → ACCURACY & DEPTH
– Deep processing / cognitive effort (Nolen, 1988): student utterances are deep (regardless of accuracy) → DEPTH
20
Mapping Tag Dimensions to Constructs

Construct: Effort
– BEETLE: Depth=Yes
– ITSPOKE: Answer=Deep
Construct: Accountable
– BEETLE: Depth=Yes ∧ Accuracy=(Correct ∨ Missing)
– ITSPOKE: Answer=Deep ∧ Accuracy=Correct
Construct: Impasses
– BEETLE: (Doubt=Yes ∧ Accuracy=(Correct ∨ Missing)) ∨ Accuracy=(Incorrect ∨ Errors)
– ITSPOKE: (Certainness=(Uncertain ∨ Mixed) ∧ Accuracy=Correct) ∨ Accuracy=(Incorrect ∨ Partial)

Some learning event constructs map directly from the tagging dimensions
– cognitive effort: student utterances that are deep
21
Mapping Tag Dimensions to Constructs

Construct: Effort
– BEETLE: Depth=Yes
– ITSPOKE: Answer=Deep
Construct: Accountable
– BEETLE: Depth=Yes ∧ Accuracy=(Correct ∨ Missing)
– ITSPOKE: Answer=Deep ∧ Accuracy=Correct
Construct: Impasses
– BEETLE: (Doubt=Yes ∧ Accuracy=(Correct ∨ Missing)) ∨ Accuracy=(Incorrect ∨ Errors)
– ITSPOKE: (Certainness=(Uncertain ∨ Mixed) ∧ Accuracy=Correct) ∨ (Accuracy=(Incorrect ∨ Partial))

Other constructs map tag values from multiple dimensions
– accountable talk: utterances that are accurate and deep
22
Mapping Tag Dimensions to Constructs

Construct: Effort
– BEETLE: Depth=Yes
– ITSPOKE: Answer=Deep
Construct: Accountable
– BEETLE: Depth=Yes ∧ Accuracy=(Correct ∨ Missing)
– ITSPOKE: Answer=Deep ∧ Accuracy=Correct
Construct: Impasses
– BEETLE: (Doubt=Yes ∧ Accuracy=(Correct ∨ Missing)) ∨ Accuracy=(Incorrect ∨ Errors)
– ITSPOKE: (Certainness=(Uncertain ∨ Mixed) ∧ Accuracy=Correct) ∨ (Accuracy=(Incorrect ∨ Partial))

Other constructs map tag values from multiple dimensions
– accountable talk: utterances that are accurate and deep
– student impasses: utterances that are correct with doubt, or incorrect
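Reading the BEETLE side of the mapping, each construct is a boolean test over an utterance's tag dimensions. A minimal sketch; the dict keys and tag-value spellings here are assumptions for illustration, not the actual annotation schema:

```python
def beetle_constructs(utt):
    """Map one annotated BEETLE utterance (a dict of tag dimensions)
    to learning-event constructs, following the mapping table:
      effort      = Depth=Yes
      accountable = Depth=Yes AND Accuracy in {Correct, Missing}
      impasse     = (Doubt=Yes AND Accuracy in {Correct, Missing})
                    OR Accuracy in {Incorrect, Errors}"""
    deep = utt["depth"] == "yes"
    right = utt["accuracy"] in ("correct", "missing")
    return {
        "effort": deep,
        "accountable": deep and right,
        "impasse": (utt["doubt"] == "yes" and right)
                   or utt["accuracy"] in ("incorrect", "errors"),
    }

# A confident, correct, deep utterance counts as effort and accountable talk.
print(beetle_constructs({"depth": "yes", "accuracy": "correct", "doubt": "no"}))
```

The ITSPOKE mapping would look the same, with Answer=Deep in place of Depth=Yes and Certainness in place of Doubt.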
23
Methods: II
Extract quantitative measures of student dialogue behavior from tagged data
– Tag Dimensions
  » depth, novelty, accuracy, doubt
– Learning Constructs
  » effort, knowledge construction, impasses, accountable
Predict learning gain from dialogue measures
– multivariate linear regression
– dependent measure: posttest
– independent measures: pretest and sets of dialogue measures
24
Methods: II
Extract quantitative measures of student dialogue behavior from tagged data (normalized)
– Tag Dimensions
  » % depth, % novelty, % accuracy, % doubt
– Learning Constructs
  » % effort, % knowledge construction, % impasses, % accountable
Predict learning gain from dialogue measures
– multivariate linear regression
– dependent measure: posttest
– independent measures: pretest and sets of dialogue measures
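The regression setup (dependent measure: posttest; independent measures: pretest plus dialogue measures) can be sketched with ordinary least squares and a model R². A NumPy sketch on synthetic data; note the actual analysis used stepwise predictor selection, which this sketch omits:

```python
import numpy as np

def regression_r2(predictor_cols, posttest):
    """Fit posttest ~ intercept + predictors by least squares; return R^2,
    the fraction of posttest variance explained by the model."""
    X = np.column_stack([np.ones(len(posttest))] + list(predictor_cols))
    beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    resid = posttest - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((posttest - posttest.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Synthetic example: posttest is an exact linear function of pretest and
# one dialogue measure, so R^2 should be (numerically) 1.
pretest = np.arange(10.0)
measure = pretest ** 2  # hypothetical normalized dialogue measure
posttest = 1.0 + 2.0 * pretest + 3.0 * measure
print(regression_r2([pretest, measure], posttest))
```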
26
Results II: Regressions with Posttest

Measures        BEETLE Predictors (R², p)    ITSPOKE Predictors (R², p)
4 Tags          %Right (.46, .00)            %Right (.23, .03)
4 Constructs    %Impasses (.22, .01)         %Accountable, %Effort (.50, .01)

The same tag dimension is selected as most predictive of learning across corpora, after unifying methods (%Right)
– BEETLE: Accuracy = Correct ∨ Missing
– ITSPOKE: Accuracy = Correct
27
Results II: Regressions with Posttest

Measures        BEETLE Predictors (R², p)    ITSPOKE Predictors (R², p)
4 Tags          %Right (.46, .00)            %Right (.23, .03)
4 Constructs    %Impasses (.22, .01)         %Accountable, %Effort (.50, .01)

When using stepwise regression, different learning constructs are selected as best predictors across corpora
28
Results II: Regressions with Posttest

Measures             BEETLE Predictors (R², p)          ITSPOKE Predictors (R², p)
4 Tags               %Right (.46, .00)                  %Right (.23, .03)
ITSPOKE Constructs   %Accountable, %Effort (.18, .07)   %Accountable, %Effort (.50, .01)

However, constructs trained from ITSPOKE predict learning when tested on BEETLE
– both predictors individually significant (p < .03) for BEETLE
29
Summary
Methods for uniformly annotating and statistically analyzing previously collected dialogue corpora
Enhancement of original findings
– Replication of positive correlations with student domain content
– Impact of dictionary
– Use of natural language processing for automated tagging
Emergence of new results across corpora
– Positive correlations with student accuracy
– Accountable talk and student effort together predict learning
30
Future Directions
Further generalization of prior results
– tutor behaviors (e.g., questioning, restating)
– additional corpora
More sophisticated natural language processing for content tagging
31
Thank You! Questions?
32
Graphical User Interface
33
Annotated Human-Human Excerpt
T: Which one will be faster? [Short Answer Question]
S: The feathers. [Novel/Single Answer]
T: The feathers – why? [Restatement, Deep Answer Question]
S: Because there's less matter. [Deep Answer]
All turns in both corpora were manually coded for dialogue acts (Kappa > .6)
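Agreement figures of the kind reported above (Kappa > .6) are standard Cohen's kappa: chance-corrected agreement between two annotators' label sequences. A minimal sketch with hypothetical dialogue-act labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same turns:
    (observed agreement - chance agreement) / (1 - chance agreement).
    Assumes the annotators do not agree purely by always using one label."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - chance) / (1 - chance)

# Hypothetical codings of four turns by two annotators (q = question, a = answer).
print(cohens_kappa(["q", "a", "q", "a"], ["q", "a", "a", "a"]))
```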
34
[Circuit diagram: bulbs A, B, C; battery; X]
Question: If bulb B is damaged, what do you think will happen to bulbs A and C?

Non-Accountable Talk:
– utt69: student: A and C will not light up
  » Accuracy = Correct; Cognitive Processing = Absent
Cognitive Effort and Potential Impasse:
– utt122a: student: bulb a will light but b and c won't since b is damaged and breaks the closed path circuit
  » Accuracy = Incorrect; Cognitive Processing = Present
Potential Impasse:
– utt97: student: both would be either dim or not light I would think
  » Accuracy = Partially Correct; Cognitive Processing = Absent; Signs of Low Confidence = Yes
Accountable Talk:
– utt83a: student: both bulbs A and C will go out because this scenario would act the same as if there was an open circuit
  » Accuracy = Correct; Cognitive Processing = Present