1
Empirical Methods for Dialogs, June 2002
Some Goals for Evaluating Dialogue Systems
Kenneth R. Koedinger
Human-Computer Interaction
Carnegie Mellon University
2
Ultimate goal of ITSs: Enhance student learning
Many things need to be right to achieve better student learning
– Natural language dialogue is one more tough item to get wrong!
Not all things we think we need to get right actually have to be right
– Our expectations for our systems are sometimes misplaced
– Example: Remember John Self's advice “Don't diagnose what you can't fix!”
Bottom line: Evaluate or die! (waste time)
3
Kinds of evaluation questions
Different dependent measures (outcome variables)
– Student
– System
Different independent measures (explanatory variables)
– Use natural language or not
– Use dialogs or not
– Have students explain or not
– Lots of others...
4
Student outcome measures
Goal: Measure system's effectiveness by measuring its effect on students
Kinds of measures
– % correct, latency, attitudes/beliefs
What is measured?
– Student learning in real settings (ultimate goal)
– Student learning in lab (proximal goal)
– Student performance (proximal goal?)
5
System outcome measures
Goal: Measure whether system achieves what designer thinks will be effective
Kinds of measures: % correct, efficiency
What is measured?
– Next slide...
6
System outcome measures 2
Natural language understanding
Is input “understood” (in correct category)?
– Compare to gold standard, human raters
Natural language generation
Are system responses “good”?
– Have humans rate responses
– Compare responses to human tutor’s
– (Observe subsequent student performance)
Dialog (not derivable from above)
Are turns coherent? Reflect a plan? Adaptive?
– Compare with human dialog
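One way to make "compare to gold standard, human raters" concrete is to score the system's category for each student input against a human-assigned category. The sketch below is only an illustration: the category labels and the tiny data set are hypothetical, and raw agreement plus Cohen's kappa (agreement corrected for chance) are one common choice of measure, not something the slides prescribe.

```python
from collections import Counter


def accuracy(system, gold):
    """Fraction of inputs where the system's category matches the gold one."""
    if not gold or len(system) != len(gold):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(s == g for s, g in zip(system, gold)) / len(gold)


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(rater_a)
    p_observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters labeled at random with their own
    # category frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels for four student inputs (not data from the talk).
system_labels = ["correct", "partial", "wrong", "correct"]
gold_labels = ["correct", "wrong", "wrong", "correct"]

print(accuracy(system_labels, gold_labels))
print(cohens_kappa(system_labels, gold_labels))
```

The same `cohens_kappa` function can be applied to two human raters first, to check that the gold standard itself is reliable before the system is judged against it.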
7
Toward Instructional Dialog “Design Theorems”
We need: “design theorems” (really heuristics) that relate intermediate measures with ultimate measures
Non-theorem: If system improves performance, then it improves learning
Theorem(?): If system improves “cognitively engaged” performance, then it improves learning
Example conjectures:
– If the dialog is better, then learning is improved
– If the NLU is better, then learning is improved
– If NLG is better, then better learning
8
“Proofs” of Instructional Dialog Effectiveness
Want smaller theorems that can be chained:
– If NLU is better, then better responses
– If better responses, then better cognitively engaged (CE) performance
– If better CE performance, then better learning
Need experiments to establish
1. Certainty values for each theorem
2. Applicability conditions (more if-parts)
Then: Compose them to create “proofs”
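The chaining idea can be sketched in a few lines of code. Everything numeric here is an assumption: the slides do not say how certainty values should be combined, so this sketch simply multiplies them (as if each link held independently), and the values 0.9, 0.8, and 0.5 are placeholders, not experimental results. The `Theorem` class and `compose` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Theorem:
    """A heuristic of the form: if `if_part` is better, then better `then_part`."""
    if_part: str
    then_part: str
    certainty: float  # assumed to lie in [0, 1]


def compose(chain):
    """Chain theorems into one "proof", multiplying certainties (an assumption)."""
    if not chain:
        raise ValueError("need at least one theorem")
    for prev, nxt in zip(chain, chain[1:]):
        if prev.then_part != nxt.if_part:
            raise ValueError(f"cannot chain {prev.then_part!r} to {nxt.if_part!r}")
    certainty = prod(t.certainty for t in chain)
    return Theorem(chain[0].if_part, chain[-1].then_part, certainty)


# Placeholder certainty values, not measured ones.
chain = [
    Theorem("NLU", "responses", 0.9),
    Theorem("responses", "CE performance", 0.8),
    Theorem("CE performance", "learning", 0.5),
]
proof = compose(chain)
print(proof.if_part, "->", proof.then_part, round(proof.certainty, 2))
```

Under the multiplication assumption, a long chain is only as convincing as the product of its links, which is one reason to test the weakest link first.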
9
What are your favorite conjectures?
Read “X -> Y” as “If X is better, then better Y”
Examples from above
– NLU -> responses
– NLG -> responses
– responses -> CE performance
– CE performance -> learning
– NLU -> dialog
– NLG -> dialog
– dialog -> learning
Others? [by audience!]
– Detailed, non-directive feedback -> self-explain
– Self-explain -> learning
– Dialog games -> learning
– % of info provided by student -> learning
– Coherent dialog -> learning
– Planning -> coherent dialog
– Collaboration -> learning
– Resolve peer disputes -> collaboration
– Student initiative (opening a new dialog game) -> learning
– Timely feedback -> learning
– Adaptive -> dialog
– More natural (e.g., speech, gesture) -> dialog
– Better affect -> Prob solved/unit time -> learning
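Because each conjecture has the form "X -> Y", the list can be treated as a directed graph and searched for chains that end in learning. The sketch below encodes only the "examples from above" edges; the function names `build_graph` and `chains` are hypothetical, and the search is a plain depth-first enumeration chosen for brevity.

```python
# Edges taken from the "Examples from above" list: (X, Y) means X -> Y.
conjectures = [
    ("NLU", "responses"),
    ("NLG", "responses"),
    ("responses", "CE performance"),
    ("CE performance", "learning"),
    ("NLU", "dialog"),
    ("NLG", "dialog"),
    ("dialog", "learning"),
]


def build_graph(edges):
    """Map each X to the list of Y values it is conjectured to improve."""
    graph = {}
    for x, y in edges:
        graph.setdefault(x, []).append(y)
    return graph


def chains(graph, start, goal, path=None):
    """Return every cycle-free chain of conjectures from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # skip nodes already on this chain
            found.extend(chains(graph, nxt, goal, path))
    return found


graph = build_graph(conjectures)
for chain in chains(graph, "NLU", "learning"):
    print(" -> ".join(chain))
```

Each printed chain is a candidate "proof" whose individual links still need empirical support.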
10
Empirical support for any of these conjectures?
Refs? Brainstorm if time permits
Here:
– Dialog -> learning: Heffernan, Siler
– Dialog -> persistence: Heffernan
– Emotional scaffolding -> motivation -> persistence: Aist
– Response -> CE performance: Aleven
Others:
– Self-explanation -> learning: Aleven, Conati, Renkl, …
– Less tutor explanation -> learning: Chi,...
11
The “mimic human tutoring” conjecture
If the system is more like a human tutor, it yields better student learning
– Always true? What are the counter-examples?
– What if-parts can be added to make more accurate conjectures?
– What features of human tutorial dialog lead to better learning?
– What features of human tutorial dialog do not lead to better learning?
12
Example features of human dialog
In natural language
Highly interactive
Responsive to content of student turns
– Deep or surface diagnosis?
Provide correctness feedback
– Immediate or delayed?
Provide detailed explanations
– Or is Socratic (pumping for student knowledge)?
– Or prompts for self-explanations?
Others?
Are these effective?
13
What are the key questions?
What questions should empirical methods for tutorial dialogue systems be addressing?
Where should we place our efforts?
What will give us the biggest “bang for the buck”?
Your ideas?
14
END