Some Goals for Evaluating Dialogue Systems
Kenneth R. Koedinger, Human-Computer Interaction, Carnegie Mellon University
CIRCLE Empirical Methods for Dialogs, June 2002

Presentation transcript:

Slide 1: Some Goals for Evaluating Dialogue Systems
Kenneth R. Koedinger, Human-Computer Interaction, Carnegie Mellon University

Slide 2: Ultimate goal of ITSs: Enhance student learning
Many things need to be right to achieve better student learning
– Natural language dialogue is one more tough thing to get right!
Not all the things we think we need to get right actually have to be right
– Our expectations for our systems are sometimes misplaced
– Example: Remember John Self's advice: "Don't diagnose what you can't fix!"
Bottom line: Evaluate or die! (that is, waste time)

Slide 3: Kinds of evaluation questions
Different dependent measures (outcome variables)
– Student
– System
Different independent measures (explanatory variables)
– Use natural language or not
– Use dialogs or not
– Have students explain or not
– Lots of others...

Slide 4: Student outcome measures
Goal: Measure the system's effectiveness by measuring its effect on students
Kinds of measures
– % correct, latency, attitudes/beliefs
What is measured?
– Student learning in real settings (ultimate goal)
– Student learning in the lab (proximal goal)
– Student performance (proximal goal?)
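The "% correct" learning measures above are typically taken as pre/post-test scores. One common way to summarize them, sketched below, is a normalized learning gain (the fraction of possible improvement a student actually achieved); this particular formula and the numbers are illustrative additions, not from the talk.

```python
# Illustrative sketch: normalized learning gain from pre/post-test
# proportions correct. Values are made up for the example.

def normalized_gain(pre: float, post: float) -> float:
    """Fraction of the possible improvement achieved.

    pre and post are proportions correct in [0, 1].
    """
    if pre >= 1.0:
        return 0.0  # no room left to improve
    return (post - pre) / (1.0 - pre)

# Example: a student moves from 40% to 70% correct on the post-test.
print(round(normalized_gain(0.40, 0.70), 2))  # 0.5
```

The same pre/post scores can of course feed other analyses (raw gain, effect sizes); the normalized form is just one way to make gains comparable across students who start at different levels.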

Slide 5: System outcome measures
Goal: Measure whether the system achieves what the designer thinks will be effective
Kinds of measures: % correct, efficiency
What is measured?
– Next slide...

Slide 6: System outcome measures (2)
Natural language understanding: Is the input "understood" (put in the correct category)?
– Compare to a gold standard; use human raters
Natural language generation: Are system responses "good"?
– Have humans rate the responses
– Compare responses to a human tutor's
– (Observe subsequent student performance)
Dialog (not derivable from the above): Are turns coherent? Do they reflect a plan? Are they adaptive?
– Compare with human dialog
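The NLU evaluation above has two standard quantitative forms: accuracy of system labels against a gold standard, and chance-corrected agreement (for example Cohen's kappa) between human raters. A minimal sketch, with invented label names and data:

```python
# Sketch of gold-standard accuracy and Cohen's kappa for NLU category
# labels. The label set and the example lists are hypothetical.

from collections import Counter

def accuracy(pred, gold):
    """Fraction of predicted labels matching the gold standard."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two label sequences."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

gold   = ["correct", "wrong", "partial", "correct", "wrong", "correct"]
system = ["correct", "wrong", "correct", "correct", "wrong", "partial"]
print(round(accuracy(system, gold), 3))      # 4 of 6 labels match: 0.667
print(round(cohens_kappa(gold, system), 3))  # agreement corrected for chance
```

Kappa is usually reported for the human raters themselves first: if two humans cannot agree on the categories, a gold standard for the system is not well defined.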

Slide 7: Toward Instructional Dialog "Design Theorems"
We need "design theorems" (really heuristics) that relate intermediate measures to ultimate measures
Non-theorem: If the system improves performance, then it improves learning
Theorem(?): If the system improves "cognitively engaged" performance, then it improves learning
Example conjectures:
– If the dialog is better, then learning is improved
– If the NLU is better, then learning is improved
– If the NLG is better, then learning is improved

Slide 8: "Proofs" of Instructional Dialog Effectiveness
We want smaller theorems that can be chained:
– If the NLU is better, then responses are better
– If responses are better, then cognitively engaged (CE) performance is better
– If CE performance is better, then learning is better
We need experiments to establish:
1. Certainty values for each theorem
2. Applicability conditions (more if-parts)
Then: compose them to create "proofs"
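The composition idea above can be made concrete in a toy way: give each link a certainty value and multiply certainties along a chain to get a rough certainty for the composed claim. The links and numbers below are invented for illustration; the slide itself does not commit to any particular calculus for combining certainties.

```python
# Toy sketch of composing "design theorems": each link X -> Y carries
# an experimentally established certainty, and a chained "proof"
# multiplies the certainties of its links. All values are hypothetical.

links = {
    ("NLU", "responses"): 0.9,
    ("responses", "CE performance"): 0.7,
    ("CE performance", "learning"): 0.8,
}

def chain_certainty(path):
    """Certainty of the composed claim path[0] -> ... -> path[-1]."""
    c = 1.0
    for a, b in zip(path, path[1:]):
        c *= links[(a, b)]
    return c

print(round(chain_certainty(["NLU", "responses", "CE performance", "learning"]), 3))
```

Multiplication makes the intended point: long chains of plausible-sounding links end up with low overall certainty, which is why each small theorem needs its own experimental support.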

Slide 9: What are your favorite conjectures?
Read "X -> Y" as "If X is better, then Y is better."
Examples from above:
– NLU -> responses
– NLG -> responses
– responses -> CE performance
– CE performance -> learning
– NLU -> dialog
– NLG -> dialog
– dialog -> learning
Others? [from the audience!]
– Detailed, non-directive feedback -> self-explanation
– Self-explanation -> learning
– Dialog games -> learning
– % of info provided by the student -> learning
– Coherent dialog -> learning
– Planning -> coherent dialog
– Collaboration -> learning
– Resolving peer disputes -> collaboration
– Student initiative (opening a new dialog game) -> learning
– Timely feedback -> learning
– Adaptive -> dialog
– More natural (e.g., speech, gesture) -> dialog
– Better affect -> problems solved per unit time -> learning
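The conjectures above form a directed graph, and chained "proofs" are paths through it that end at "learning". A small sketch of that view, using only a few of the slide's edges (the selection is mine, not the talk's):

```python
# Sketch: the conjectures as a directed graph; a candidate "proof" is
# a path from an intermediate measure to "learning". Edge choice is
# illustrative, taken from a subset of the slide's list.

edges = {
    "NLU": ["responses", "dialog"],
    "NLG": ["responses", "dialog"],
    "responses": ["CE performance"],
    "CE performance": ["learning"],
    "dialog": ["learning"],
}

def paths_to(start, goal):
    """All simple paths start -> ... -> goal (the graph is acyclic here)."""
    if start == goal:
        return [[goal]]
    out = []
    for nxt in edges.get(start, []):
        for p in paths_to(nxt, goal):
            out.append([start] + p)
    return out

for p in paths_to("NLU", "learning"):
    print(" -> ".join(p))
```

Enumerating paths this way makes the experimental agenda explicit: each edge on a path is a separate conjecture needing its own evidence.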

Slide 10: Empirical support for any of these conjectures? Refs?
Brainstorm if time permits
Here:
– Dialog -> learning: Heffernan, Siler
– Dialog -> persistence: Heffernan
– Emotional scaffolding -> motivation -> persistence: Aist
– Response -> CE performance: Aleven
Others:
– Self-explanation -> learning: Aleven, Conati, Renkl, ...
– Less tutor explanation -> learning: Chi, ...

Slide 11: The "mimic human tutoring" conjecture
If the system is more like a human tutor, it yields better student learning
– Always true? What are the counter-examples?
– What if-parts can be added to make more accurate conjectures?
– What features of human tutorial dialog lead to better learning?
– What features of human tutorial dialog do not lead to better learning?

Slide 12: Example features of human dialog
In natural language
Highly interactive
Responsive to the content of student turns
– Deep or surface diagnosis?
Provides correctness feedback
– Immediate or delayed?
Provides detailed explanations
– Or is Socratic (pumping for student knowledge)?
– Or prompts for self-explanations?
Others? Are these effective?

Slide 13: What are the key questions?
What questions should empirical methods for tutorial dialogue systems be addressing?
Where should we place our efforts? What will give us the biggest "bang for the buck"?
Your ideas?

Slide 14: END