1 The interaction plateau
CPI 494, April 9, 2009
Kurt VanLehn
2 Schematic of a natural language tutoring system, AutoTutor
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: T: Hint or prompt (T: Tell only if out of hints)
3 Schematic of other natural language tutors, e.g., Atlas, Circsim-Tutor, Kermit-SE
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation (T: Tell only if out of hints):
  T: What is…? S: I don’t know. T: Well, what is…? S: … T: …
Often called a KCD: a knowledge construction dialogue
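To make the two schematics concrete: a minimal sketch, assuming a KCD can be modeled as a recursive script of sub-questions (the structure and names here are mine, not AutoTutor's or Atlas's). AutoTutor's remediation is a flat hint/prompt sequence; a KCD instead recurses into simpler questions ("Well, what is…?") and tells the answer only when it runs out of them.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    """One node of a knowledge construction dialogue (KCD): a question,
    its expected answer, and simpler sub-questions to fall back on."""
    text: str
    answer: str
    subquestions: list["Question"] = field(default_factory=list)

def run_kcd(q: Question, ask=input) -> None:
    """Elicit an answer; if it is wrong, recurse into each sub-question,
    re-ask the original, and tell the answer only as a last resort."""
    if ask("T: " + q.text + " ").strip().lower() == q.answer.lower():
        return                                    # S: Correct -> step end
    for sub in q.subquestions:                    # T: Well, what is…?
        run_kcd(sub, ask)
    if q.subquestions and \
       ask("T: So, again: " + q.text + " ").strip().lower() == q.answer.lower():
        return
    print("T: The answer is:", q.answer)          # T: Tell (out of moves)
```

For example, the pumpkin question on a later slide could be the root Question, with sub-questions about horizontal velocity and net force.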
4 Hypothesized ranking of tutoring, most effective first
A. Expert human tutors
B. Ordinary human tutors
C. Natural language tutoring systems
D. Step-based tutoring systems
E. Answer-based tutoring systems
F. No tutoring
5 Hypothesized effect sizes
6 Hypothesized effect sizes: Bloom’s (1984) 2-sigma, 4 weeks of human tutoring vs. classroom [chart; baseline: classroom]
7 Hypothesized effect sizes: Kulik (1984) meta-analysis of CAI vs. classroom, 0.4 sigma [chart; baseline: classroom]
8 Hypothesized effect sizes: many intelligent tutoring systems, e.g., Andes (VanLehn et al., 2005), Carnegie Learning’s tutors… [chart; baseline: classroom]
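As a reminder (the standard definition, not from the slides), the "sigma" units on these effect-size slides are standardized mean differences:

```latex
d \;=\; \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{classroom}}}{s_{\text{pooled}}}
```

So Bloom's 2-sigma result means the average tutored student scored about two pooled standard deviations above the classroom mean (roughly the 98th percentile of the classroom distribution), versus about 0.4 sigma for the CAI systems in Kulik's meta-analysis.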
9 My main claim: There is an interaction plateau
10 A problem and its steps
Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?
1. Initially, you and the pumpkin have the same horizontal velocity.
2. Your throw exerts a net force vertically on the pumpkin.
3. Thus causing a vertical acceleration.
4. Which leaves the horizontal velocity unaffected.
5. So when the pumpkin falls, it has traveled the same distance horizontally as you have.
6. Thus, it lands in your hands.
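Written out with standard projectile kinematics (notation mine, not from the slide): after release the pumpkin feels no horizontal force, so it keeps your horizontal velocity v while gravity acts only vertically:

```latex
x_{\text{you}}(t) = v\,t, \qquad
x_{\text{pumpkin}}(t) = v\,t, \qquad
y_{\text{pumpkin}}(t) = v_0\,t - \tfrac{1}{2} g t^2 .
```

At the landing time t = 2v_0/g the two horizontal positions are still equal, which is step 6: the pumpkin lands in your hands.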
11 A dialogue between a human tutor (T) and human student (S)
Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land?
S: Behind me.
T: Hmm. Let’s think about that. Before you toss the pumpkin and are just carrying it, do you and the pumpkin have the same speed?
S: Yes.
T: Good. When you toss it up, is the net force on it exactly vertical?
S: I’m not sure.
T: You exert a force on the pumpkin, right? Etc.
12 Schematic of dialogue about a single step
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: T: Hint, or prompt, or explain, or analogy, or …
13 Comparisons of expert to novice human tutors
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: T: Hint, or prompt, or explain, or analogy, or …
Novices vs. experts: experts may have a wider variety of remediation tactics
14 Schematic of how an ITS handles a single step
Step start → S: Correct → Step end
S: Incorrect → Remediation: T: Hint (T: Tell only if out of hints)
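A minimal sketch of the ITS step loop just schematized, assuming a fixed hint sequence with a final bottom-out tell (names and structure are mine, not any particular ITS's):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    prompt: str
    answer: str
    hints: list[str] = field(default_factory=list)  # ordered hint sequence
    bottom_out: str = ""                            # final "tell"

def handle_step(step: Step, ask=input) -> str:
    """Elicit the step; on each incorrect response give the next hint,
    and tell the answer only when the hint sequence is exhausted."""
    response = ask("T: " + step.prompt + " ")
    for hint in step.hints:                         # remediation loop
        if response.strip().lower() == step.answer.lower():
            return "student produced the step"      # S: Correct -> step end
        response = ask("T: " + hint + " ")          # T: Hint
    if response.strip().lower() == step.answer.lower():
        return "student produced the step"
    print("T:", step.bottom_out)                    # T: Tell (out of hints)
    return "tutor told the step"
```

Canned-text remediation (slide 20) would replace the hint loop with a single block of explanatory text; the KCD sketch earlier replaces it with recursive sub-questions.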
15 Major differences
- Low-interaction tutoring (e.g., CAI)
  - Remediation on the answer only
- Step-based interaction (e.g., ITS)
  - Remediation on each step
  - Hint sequence, with a final “bottom-out” hint
- Natural tutoring (e.g., human tutoring)
  - Remediation on each step, substep, inference…
  - Natural language dialogues
  - Many tutorial tactics
16 Conditions (VanLehn, Graesser et al., 2007)
- Natural tutoring
  - Expert human tutors
    - Typed
    - Spoken
  - Natural language dialogue computer tutors
    - Why2-AutoTutor (Graesser et al.)
    - Why2-Atlas (VanLehn et al.)
- Step-based interaction
  - Canned text remediation
- Low interaction
  - Textbook
17 Human tutors (a form of natural tutoring)
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: T: Hint, or prompt, or explain, or analogy, or …
18 Why2-Atlas (a form of natural tutoring)
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: a knowledge construction dialogue
19 Why2-AutoTutor (a form of natural tutoring)
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: T: Hint or prompt
20 Canned-text remediation (a form of step-based interaction)
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: T: canned text
21 Experiment 1 (intermediate students & instruction)
22 Experiment 1 (intermediate students & instruction): no reliable differences
23 Experiment 2: AutoTutor > Textbook = Nothing (reliably different)
24 Experiments 1 & 2 (VanLehn, Graesser et al., 2007): no significant differences
25 Experiment 3 (intermediate students & instruction): deeper assessments
26 Experiment 3 (intermediate students & instruction): no reliable differences
27 Experiment 4 (novice students & intermediate instruction): relearning
28 Experiment 4 (novice students & intermediate instruction): all differences reliable
29 Experiment 5 (novice students & intermediate, but shorter, instruction): relearning
30 Experiment 5 (novice students & intermediate instruction): no reliable differences
31 Experiment 5, low-pretest students only: an aptitude-treatment interaction?
32 Experiment 5, low-pretest students only: spoken human tutoring > canned text remediation
33 Experiments 6 and 7 (novice students & novice instruction): was the intermediate text over the novice students’ heads?
34 Experiments 6 and 7 (novice students & novice instruction): no reliable differences
35 Interpretation
A grid of experiment groups (Experiments 1 & 4; Experiments 3 & 5; Experiments 6 & 7) by student population (intermediates and novices, each split into high- and low-pretest), classified by content complexity:
- Student can follow the reasoning only with the tutor’s help (ZPD) → predict: Tutoring > Canned text remediation
- Student can follow the reasoning without any help → predict: Tutoring = Canned text remediation
36 Original research questions
- Can natural language tutorial dialog add pedagogical value?
  - Yes, when students must study content that is too complex to be understood by reading alone
- How feasible is a deep linguistic tutoring system?
  - We built it. It’s fast enough to use.
- Can deep linguistic and dialog techniques add pedagogical value?
37 When content is too complex to learn by reading alone, is deep > shallow? Why2-Atlas is not clearly better than Why2-AutoTutor.
38 When to use deep vs. shallow?
Sentence understanding: shallow = LSA, Rainbow, Rappel; deep = Carmel (parser, semantics…); use both
Essay/discourse understanding: shallow = LSA; deep = abduction, Bayesian nets; use deep
Dialog management: shallow = finite state networks; deep = reactive planning; use a locally smart FSA
Natural language generation: shallow = text; deep = plan-based; use equivalent texts
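To make the shallow/deep contrast concrete: a minimal sketch of shallow sentence understanding in the LSA style, scoring a student's response against an expected step by vector cosine (illustrative only: a raw bag-of-words space stands in for a trained LSA space, and the threshold is invented):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def covers_expectation(response: str, expectation: str,
                       threshold: float = 0.6) -> bool:
    """Shallow 'understanding': treat the response as covering the
    expected step if its similarity clears a threshold."""
    bow = lambda s: Counter(s.lower().split())
    return cosine(bow(response), bow(expectation)) >= threshold

# Example: does the student's sentence cover the expected step?
print(covers_expectation(
    "the pumpkin keeps the same horizontal velocity as the runner",
    "you and the pumpkin have the same horizontal velocity"))  # True
```

Deep understanding (Carmel-style parsing into semantics) would instead build a meaning representation, which is what lets a system like Why2-Atlas detect misconceptions rather than just matching words.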
39 Results from all 7 experiments (VanLehn, Graesser et al., 2007)
- Why2-Atlas = Why2-AutoTutor
- Why2 > Textbook
  - No essays
  - Content differences
- Human tutoring = Why2 = Canned text remediation
  - Except when novice students worked with instruction designed for intermediates; then human tutoring > canned text remediation
40 Other evidence for the interaction plateau (Evens & Michael, 2006): no significant differences
41 Other evidence for the interaction plateau (Reif & Scott, 1999): no significant differences
42 Other evidence for the interaction plateau (Chi, Roy & Hausmann, in press): no significant differences
43 Still more studies where natural tutoring = step-based interaction
- Human tutors
  1. Human tutoring = human tutoring with only content-free prompting for step remediation (Chi et al., 2001)
  2. Human tutoring = canned text during post-practice remediation (Katz et al., 2003)
  3. Socratic human tutoring = didactic human tutoring (Rosé et al., 2001a)
  4. Socratic human tutoring = didactic human tutoring (Johnson & Johnson, 1992)
  5. Expert human tutoring = novice human tutoring (Chae, Kim & Glass, 2005)
- Natural language tutoring systems
  1. Andes-Atlas = Andes with canned text (Rosé et al., 2001b)
  2. Kermit = Kermit with dialogue explanations (Weerasinghe & Mitrovic, 2006)
44 Hypothesis 1: Exactly how tutors remedy a step doesn’t matter much
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation: what’s in here doesn’t matter much
45 Main claim: There is an interaction plateau (Hypothesis 1)
46 Hypothesis 2: The step remediation loop cannot be eliminated
Step start: T: Tell, or T: Elicit
S: Correct → Step end
S: Incorrect → Remediation (skipping this loop is what must be avoided)
47 Main claim: There is an interaction plateau (Hypothesis 2)
48 Conclusions
- What does it take to make computer tutors as effective as human tutors?
  - Step-based interaction
  - Bloom’s 2-sigma results may have been due to weak control conditions (classroom instruction)
  - Other evaluations have also used weak controls
- When is natural language useful?
  - For the steps themselves (vs. menus, algebra…)
  - NOT for feedback & hints (remediation) on steps
49 Future directions for tutoring systems research
- Making step-based instruction ubiquitous
  - Authoring & customizing
  - Novel task domains
- Increasing engagement
50 Final thought
- Many people “just know” that more interaction produces more learning.
- “It ain’t so much the things we don’t know that get us into trouble. It’s the things we know that just ain’t so.” (Josh Billings, a.k.a. Henry Wheeler Shaw)