1
Using Natural Language Processing to Analyze Tutorial Dialogue Corpora Across Domains and Modalities
Diane Litman, University of Pittsburgh, Pittsburgh, PA
Johanna Moore, Myroslava Dzikovska, Elaine Farrow, University of Edinburgh, Edinburgh, Scotland
2
Outline
– Introduction
– Dialogue Data and Prior Results
– A Common Research Framework
– Predicting Learning from Student Dialogue
– Summary and Future Work
3
Motivation
An empirical basis for designing tutorial dialogue systems:
– What aspects of dialogue are predictive of learning?
  » Student behaviors
– Do results generalize across tutoring situations?
  » Domain (mechanics versus electricity in physics)
  » Modality (spoken versus typed)
  » Tutor (computer versus human)
– Can natural language processing be used for automation?
4
Tutorial Dialogue Research
Many correlations between dialogue and learning
– e.g., [Chi et al. 2001, Katz et al. 2003, Rose et al. 2003, Craig et al. 2004, Boyer et al. 2007, Ohlsson et al. 2007]
Difficult to generalize findings due to different
– annotation schemes
– learning measures
– statistical approaches
– software tools
5
Two Prior Tutorial Dialogue Corpora
ITSPOKE
– Spoken dialogue with a computer tutor
– Conceptual mechanics
– 100 dialogues, 20 students
BEETLE
– Typed dialogue with human tutors
– Basic electricity and electronics
– 60 dialogues, 30 students
6
Back-end is Why2-Atlas (VanLehn, Jordan, Rose et al., 2002)
Sphinx2 speech recognition and Cepstral text-to-speech
7
BEETLE Student Screen
– Lesson Slides
– Simulation-Based Circuit Workspace
– Chat Window
8
Common Experimental Aspects
Data Collection
– Students take a multiple-choice pretest
– Students work problems with the dialogue tutor
– Students take a (non-identical) posttest
Data Analysis
– Dialogues are annotated with various tagsets
– Quantitative measures of dialogue behavior are examined for correlations with learning
9
Prior Correlation Results
Student domain content and novel dialogue content positively predict learning
– in both ITSPOKE and BEETLE
However, measures such as domain content are computed differently across systems
– ITSPOKE: # of student lexical items in an online physics dictionary
  » Tutor: What is the definition of Newton's second law?
  » Student: an object in motion tends to stay in motion until its act by an outside force
– BEETLE: # of student segments containing information relevant to lesson topics
  » Tutor: If bulb B is damaged, what do you think will happen to bulbs A and C?
  » Student: A and C will not light up.
Would other findings have generalized with a more uniform analysis?
– affect/attitudes (ITSPOKE only)
– words or turns, accuracy, impasses (BEETLE only)
10
A Common Research Framework: I
Map related but non-identical annotations to identical tagsets
– Word tokenizer
– Dictionary-based domain content tagger
Additional BEETLE experiments
– Impact of domain dictionary
– Impact of automated content tagging
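The dictionary-based domain content tagger can be thought of as a token lookup over student turns. A minimal sketch, assuming a toy dictionary and a simple regex tokenizer (the actual dictionaries and the shared tokenizer are not specified here):

```python
import re

# Hypothetical mini-dictionary; the real systems used an online physics
# dictionary (ITSPOKE) and a BEETLE lesson-topic dictionary.
PHYSICS_DICT = {"force", "mass", "acceleration", "velocity", "newton", "motion"}

def tokenize(utterance):
    """Lowercase word tokenizer (a stand-in for the shared word tokenizer)."""
    return re.findall(r"[a-z]+", utterance.lower())

def domain_word_ratio(student_turns, dictionary):
    """Fraction of student word tokens found in the domain dictionary,
    i.e. a normalized measure like StuPhysicsDictWords / Words."""
    tokens = [t for turn in student_turns for t in tokenize(turn)]
    if not tokens:
        return 0.0
    return sum(t in dictionary for t in tokens) / len(tokens)

turns = ["an object in motion tends to stay in motion",
         "force equals mass times acceleration"]
print(domain_word_ratio(turns, PHYSICS_DICT))  # 5 of 14 tokens are domain words
```

Swapping in a different dictionary (e.g. one derived from the BEETLE lessons) is what the "impact of domain dictionary" experiment varies.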
11
Methods: I
Extract quantitative measures of student dialogue behavior from tagged data
– StuWords
– StuPhysicsDictWords
– StuBeetleDictWords
– StuContentSegmentWords
Correlate measures with learning gain
– partial correlations with Posttest, after regressing out Pretest
12
Methods: I
Extract quantitative measures of student dialogue behavior from tagged data (normalized)
– StuWords / Words
– StuPhysicsDictWords / Words
– StuBeetleDictWords / Words
– StuContentSegmentWords / Words
Correlate measures with learning gain
– partial correlations with Posttest, after regressing out Pretest
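A partial correlation with Posttest after regressing out Pretest can be computed by regressing Pretest out of both variables and correlating the residuals. A minimal NumPy sketch with synthetic data (illustrative only; it is not the statistics package used in the original analyses):

```python
import numpy as np

def partial_corr(measure, posttest, pretest):
    """Partial correlation of a dialogue measure with posttest,
    controlling for pretest: correlate the least-squares residuals
    left after regressing pretest out of each variable."""
    X = np.column_stack([np.ones_like(pretest), pretest])  # intercept + pretest
    def residuals(y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta
    return float(np.corrcoef(residuals(measure), residuals(posttest))[0, 1])

# Synthetic example: posttest = pretest + measure, so after controlling
# for pretest the measure should be (near-)perfectly correlated.
rng = np.random.default_rng(0)
pre = rng.normal(size=50)
m = rng.normal(size=50)
print(partial_corr(m, pre + m, pre))
```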
14
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Student talk is not significantly correlated with learning (although a trend in BEETLE)
15
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Domain talk is significantly correlated with learning
16
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Domain talk is significantly correlated with learning
Domain dictionary matters
17
Results I: Correlations with Posttest (controlled for Pretest)

Measure                        BEETLE R   p     ITSPOKE R   p
StuWords / Words                 .34     .08      .17     .48
StuPhysicsDictWords / Words      .22     .26      .60     .01
StuBeetleDictWords / Words       .38     .04      NA
StuContentSegmWords / Words      .43     .02      NA

Using natural language processing for domain tagging is a viable alternative to manual annotation of contentful discourse segments
18
A Common Research Framework: II
Map related but non-identical annotations to common higher-level theoretical constructs
– DeMAND coding scheme [Campbell et al. poster]
– Dialogues uniformly represented/queried using NXT, NLTK
19
DeMAND: Coding Utterances for Significant Events
Consider common theories of effective learning events
– Constructivism / generative learning (Duffy & Jonassen, 1992): student produces a lot of new information → NOVELTY
– Impasses (VanLehn et al., 2003): student is incorrect, or correct with low confidence → ACCURACY & DOUBT
– Accountable talk (Wolf, Crosson & Resnick, 2006): student is both accurate & deep → ACCURACY & DEPTH
– Deep processing / cognitive effort (Nolen, 1988): student utterances are deep (regardless of accuracy) → DEPTH
20
Mapping Tag Dimensions to Constructs

Construct: Effort
– BEETLE: Depth=Yes
– ITSPOKE: Answer=Deep
Construct: Accountable
– BEETLE: Depth=Yes ∧ Accuracy=(Correct ∨ Missing)
– ITSPOKE: Answer=Deep ∧ Accuracy=Correct
Construct: Impasses
– BEETLE: (Doubt=Yes ∧ Accuracy=(Correct ∨ Missing)) ∨ Accuracy=(Incorrect ∨ Errors)
– ITSPOKE: (Certainness=(Uncertain ∨ Mixed) ∧ Accuracy=Correct) ∨ Accuracy=(Incorrect ∨ Partial)

Some learning event constructs map directly from the tagging dimensions
– cognitive effort: student utterances that are deep
21
Mapping Tag Dimensions to Constructs

Construct: Effort
– BEETLE: Depth=Yes
– ITSPOKE: Answer=Deep
Construct: Accountable
– BEETLE: Depth=Yes ∧ Accuracy=(Correct ∨ Missing)
– ITSPOKE: Answer=Deep ∧ Accuracy=Correct
Construct: Impasses
– BEETLE: (Doubt=Yes ∧ Accuracy=(Correct ∨ Missing)) ∨ Accuracy=(Incorrect ∨ Errors)
– ITSPOKE: (Certainness=(Uncertain ∨ Mixed) ∧ Accuracy=Correct) ∨ (Accuracy=(Incorrect ∨ Partial))

Other constructs map tag values from multiple dimensions
– accountable talk: utterances that are accurate and deep
22
Mapping Tag Dimensions to Constructs

Construct: Effort
– BEETLE: Depth=Yes
– ITSPOKE: Answer=Deep
Construct: Accountable
– BEETLE: Depth=Yes ∧ Accuracy=(Correct ∨ Missing)
– ITSPOKE: Answer=Deep ∧ Accuracy=Correct
Construct: Impasses
– BEETLE: (Doubt=Yes ∧ Accuracy=(Correct ∨ Missing)) ∨ Accuracy=(Incorrect ∨ Errors)
– ITSPOKE: (Certainness=(Uncertain ∨ Mixed) ∧ Accuracy=Correct) ∨ (Accuracy=(Incorrect ∨ Partial))

Other constructs map tag values from multiple dimensions
– accountable talk: utterances that are accurate and deep
– student impasses: utterances that are correct with doubt, or incorrect
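Reading the BEETLE side of the mapping, each construct is a boolean test over an utterance's tag dimensions. A minimal sketch; the dict keys and tag-value spellings here are assumptions for illustration, not the actual annotation schema:

```python
def beetle_constructs(utt):
    """Map one annotated BEETLE utterance (a dict of tag dimensions)
    to learning-event constructs, following the mapping table:
      effort      = Depth=Yes
      accountable = Depth=Yes AND Accuracy in {Correct, Missing}
      impasse     = (Doubt=Yes AND Accuracy in {Correct, Missing})
                    OR Accuracy in {Incorrect, Errors}"""
    deep = utt["depth"] == "yes"
    right = utt["accuracy"] in ("correct", "missing")
    return {
        "effort": deep,
        "accountable": deep and right,
        "impasse": (utt["doubt"] == "yes" and right)
                   or utt["accuracy"] in ("incorrect", "errors"),
    }

# A confident, correct, deep utterance counts as effort and accountable talk.
print(beetle_constructs({"depth": "yes", "accuracy": "correct", "doubt": "no"}))
```

The ITSPOKE mapping would look the same, with Answer=Deep in place of Depth=Yes and Certainness in place of Doubt.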
23
Methods: II
Extract quantitative measures of student dialogue behavior from tagged data
– Tag Dimensions
  » depth, novelty, accuracy, doubt
– Learning Constructs
  » effort, knowledge construction, impasses, accountable
Predict learning gain from dialogue measures
– multivariate linear regression
– dependent measure: posttest
– independent measures: pretest and sets of dialogue measures
24
Methods: II
Extract quantitative measures of student dialogue behavior from tagged data (normalized)
– Tag Dimensions
  » % depth, % novelty, % accuracy, % doubt
– Learning Constructs
  » % effort, % knowledge construction, % impasses, % accountable
Predict learning gain from dialogue measures
– multivariate linear regression
– dependent measure: posttest
– independent measures: pretest and sets of dialogue measures
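The regression setup (dependent measure: posttest; independent measures: pretest plus dialogue measures) can be sketched with ordinary least squares and a model R². A NumPy sketch on synthetic data; note the actual analysis used stepwise predictor selection, which this sketch omits:

```python
import numpy as np

def regression_r2(predictor_cols, posttest):
    """Fit posttest ~ intercept + predictors by least squares; return R^2,
    the fraction of posttest variance explained by the model."""
    X = np.column_stack([np.ones(len(posttest))] + list(predictor_cols))
    beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    resid = posttest - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((posttest - posttest.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# Synthetic example: posttest is an exact linear function of pretest and
# one dialogue measure, so R^2 should be (numerically) 1.
pretest = np.arange(10.0)
measure = pretest ** 2  # hypothetical normalized dialogue measure
posttest = 1.0 + 2.0 * pretest + 3.0 * measure
print(regression_r2([pretest, measure], posttest))
```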
26
Results II: Regressions with Posttest

Measures        BEETLE Predictors (R², p)    ITSPOKE Predictors (R², p)
4 Tags          %Right (.46, .00)            %Right (.23, .03)
4 Constructs    %Impasses (.22, .01)         %Accountable, %Effort (.50, .01)

The same tag dimension is selected as most predictive of learning across corpora, after unifying methods (%Right)
– BEETLE: Accuracy = Correct ∨ Missing
– ITSPOKE: Accuracy = Correct
27
Results II: Regressions with Posttest

Measures        BEETLE Predictors (R², p)    ITSPOKE Predictors (R², p)
4 Tags          %Right (.46, .00)            %Right (.23, .03)
4 Constructs    %Impasses (.22, .01)         %Accountable, %Effort (.50, .01)

When using stepwise regression, different learning constructs are selected as best predictors across corpora
28
Results II: Regressions with Posttest

Measures             BEETLE Predictors (R², p)          ITSPOKE Predictors (R², p)
4 Tags               %Right (.46, .00)                  %Right (.23, .03)
ITSPOKE Constructs   %Accountable, %Effort (.18, .07)   %Accountable, %Effort (.50, .01)

However, constructs trained from ITSPOKE predict learning when tested on BEETLE
– both predictors individually significant (p < .03) for BEETLE
29
Summary
Methods for uniformly annotating and statistically analyzing previously collected dialogue corpora
Enhancement of original findings
– Replication of positive correlations with student domain content
– Impact of dictionary
– Use of natural language processing for automated tagging
Emergence of new results across corpora
– Positive correlations with student accuracy
– Accountable talk and student effort together predict learning
30
Future Directions
Further generalization of prior results
– tutor behaviors (e.g., questioning, restating)
– additional corpora
More sophisticated natural language processing for content tagging
31
Thank You! Questions?
32
Graphical User Interface
33
Annotated Human-Human Excerpt
T: Which one will be faster? [Short Answer Question]
S: The feathers. [Novel/Single Answer]
T: The feathers – why? [Restatement, Deep Answer Question]
S: Because there's less matter. [Deep Answer]
All turns in both corpora were manually coded for dialogue acts (Kappa > .6)
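Agreement figures of the kind reported above (Kappa > .6) are standard Cohen's kappa: chance-corrected agreement between two annotators' label sequences. A minimal sketch with hypothetical dialogue-act labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same turns:
    (observed agreement - chance agreement) / (1 - chance agreement).
    Assumes the annotators do not agree purely by always using one label."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - chance) / (1 - chance)

# Hypothetical codings of four turns by two annotators (q = question, a = answer).
print(cohens_kappa(["q", "a", "q", "a"], ["q", "a", "a", "a"]))
```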
34
[Circuit diagram: bulbs A, B, C; battery; X]
Question: If bulb B is damaged, what do you think will happen to bulbs A and C?

Non-Accountable Talk:
– utt69: student: A and C will not light up
  » Accuracy = Correct; Cognitive Processing = Absent
Cognitive Effort and Potential Impasse:
– utt122a: student: bulb a will light but b and c won't since b is damaged and breaks the closed path circuit
  » Accuracy = Incorrect; Cognitive Processing = Present
Potential Impasse:
– utt97: student: both would be either dim or not light I would think
  » Accuracy = Partially Correct; Cognitive Processing = Absent; Signs of Low Confidence = Yes
Accountable Talk:
– utt83a: student: both bulbs A and C will go out because this scenario would act the same as if there was an open circuit
  » Accuracy = Correct; Cognitive Processing = Present