Student simulation and evaluation DOD meeting Hua Ai 03/03/2006.

Student simulation and evaluation DOD meeting Hua Ai 03/03/2006

2 Outline  Motivations  Backgrounds  Corpus  Student Simulation Model  Comparisons  Conclusions & Future Work

3 Motivations  For a larger corpus  Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically  The best strategy may not even be present in a small dataset  For a cheaper corpus  Human subjects are expensive

4 [Figure: loop connecting Simulated User, Dialog Manager (Strategy), Reinforcement Learning, and Dialog Corpus (Simulation models)]  Strategy learning using a simulated user (Schatzmann et al., 2005)
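The strategy-learning loop of the figure can be sketched as a minimal single-state Q-learning routine. This is only an illustrative sketch: the action names, the reward scheme, and the toy simulated user are hypothetical, not part of ITSPOKE or the cited work.

```python
import random

def learn_policy(simulated_user, episodes=3000, alpha=0.2, eps=0.1, seed=0):
    """Single-state Q-learning sketch of the loop in the figure: the dialog
    manager tries system actions against a simulated user, updates action
    values from the rewards, and keeps the best action as its strategy.
    Action names and the reward scheme are illustrative assumptions."""
    rng = random.Random(seed)
    q = {"ask_easy": 0.0, "ask_hard": 0.0}
    for _ in range(episodes):
        # epsilon-greedy: mostly exploit the current best, sometimes explore
        action = (rng.choice(list(q)) if rng.random() < eps
                  else max(q, key=q.get))
        reward = simulated_user(action, rng)       # 1 if simulated answer correct
        q[action] += alpha * (reward - q[action])  # incremental value update
    return max(q, key=q.get), q

# Hypothetical simulated user: answers easy questions correctly 90% of the
# time and hard ones 40% of the time.
def toy_user(action, rng):
    p = 0.9 if action == "ask_easy" else 0.4
    return 1 if rng.random() < p else 0
```

With enough simulated episodes the learned strategy converges to the action the simulated user rewards most, which is the point of training against a simulation rather than expensive human subjects.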

5 Backgrounds (1)  Education community  Focusing on changes in the student's internal knowledge representation  Usually not dialogue based  Simulated students (VanLehn et al., 1994), used for  Tutor training  Collaborative learning

6 Backgrounds (2)  Dialogue community  Focusing on interactions and dialogue behaviors  Simulated users have a limited set of actions to take  (Schatzmann et al., 2005)  Simulating at the dialogue act (DA) level

7 Corpus (1)  Spoken dialogue physics tutor (ITSPOKE)

8 Corpus (2)  Tutoring procedure, repeated over 5 problems: (T) Question → (S) Answer → dialogue of further Q/A exchanges → essay revision, then on to the next dialogue

9 Corpus (3)  Tutor's behaviors  Defined in KCDs (Knowledge Construction Dialogues)  [Figure: the dialogue branches on whether the student's answer is Correct or Incorrect/Partially Correct]

10 Corpus (4)  [Table: avg and stdev of stuWord, stuTurn, tutorWord, tutorTurn for three corpora: f03 (100 dialogues, synthesized voice), syn (136 dialogues, synthesized voice), pre (135 dialogues, pre-recorded voice); cell values not preserved in the transcript]  f03 vs. s05: different groups of subjects

11 Simulation Models (1)  Simulating on the word level  Students have more complex behaviors  DA info alone isn't enough for the system  Two models (ProbCorrect, Random) trained on two corpora (f03, s05), giving four simulated users: 03ProbCorrect, 03Random, 05ProbCorrect, 05Random

12 Simulation Models (2)  ProbCorrect Model  Simulates the average knowledge level of real students  Simulates meaningful dialogue behaviors  Random Model  Intentionally nonsensical  Serves as a contrast

13 ProbCorrect Model  Real corpus: question1 → Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic); question2 → Answer2_1 (c), Answer2_2 (ic)  Candidate answers: for question1, c:ic = 1:2 (c: Answer1_1; ic: Answer1_2, Answer1_3); for question2, c:ic = 1:1 (c: Answer2_1; ic: Answer2_2)  ProbCorrect Model, given a question: 1) choose to give a correct/incorrect answer with the same average probability as real students 2) randomly choose one answer from the corresponding answer set
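As a hedged sketch (not the authors' implementation), the two-step procedure above could look like this in Python, with a toy corpus mirroring the slide's example; the class name and corpus format are assumptions:

```python
import random

class ProbCorrectModel:
    """Word-level student simulation: answer correctly with the empirical
    probability observed for each question in the real corpus, then emit
    a verbatim student answer from the matching (c or ic) answer set."""

    def __init__(self, corpus, seed=None):
        # corpus: question -> {"c": [correct answers], "ic": [incorrect answers]}
        self.corpus = corpus
        self.rng = random.Random(seed)

    def answer(self, question):
        ans = self.corpus[question]
        # Step 1: pick correct/incorrect with the empirical probability
        p_correct = len(ans["c"]) / (len(ans["c"]) + len(ans["ic"]))
        label = "c" if self.rng.random() < p_correct else "ic"
        # Step 2: pick uniformly from the corresponding answer set
        return label, self.rng.choice(ans[label])

# Toy corpus from the slide: question1 has c:ic = 1:2, question2 has c:ic = 1:1.
corpus = {
    "question1": {"c": ["Answer1_1"], "ic": ["Answer1_2", "Answer1_3"]},
    "question2": {"c": ["Answer2_1"], "ic": ["Answer2_2"]},
}
```

For question1, the simulated student answers correctly about one time in three, matching the real students' average, and every surface answer it produces was actually observed for that question.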

14 Random Model  Pooled corpus (HC03&05): Question1 → Answer1_1, Answer1_2, Answer1_3, Answer1_4; Question2 → Answer2_1, Answer2_2  Candidate answers: all 6 pooled together  Big Random Model, given any question: return any of the 6 answers with equal probability (regardless of the question!)
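A matching sketch of the Random baseline, again with hypothetical names and corpus format; note that the question argument is deliberately ignored:

```python
import random

class RandomModel:
    """Baseline student simulation: pool every candidate answer from every
    question and return one uniformly at random, regardless of the question."""

    def __init__(self, corpus, seed=None):
        # corpus: question -> list of candidate answers (correct and incorrect pooled)
        self.pool = [a for answers in corpus.values() for a in answers]
        self.rng = random.Random(seed)

    def answer(self, question):
        return self.rng.choice(self.pool)  # the question is ignored

# Toy pooled corpus from the slide: 6 candidate answers across 2 questions.
corpus = {
    "Question1": ["Answer1_1", "Answer1_2", "Answer1_3", "Answer1_4"],
    "Question2": ["Answer2_1", "Answer2_2"],
}
```

Because the pool is shared, answers to Question1 can surface in response to Question2, which is exactly the nonsensical behavior the baseline is meant to exhibit.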

15 Experiments  Comparisons between real corpora  Comparisons between real & simulated corpora  Comparisons between simulated corpora

16 Real Corpora Comparisons (1)  Evaluation metrics  High-level dialog features  Dialog style and cooperativeness  Dialog success rate and efficiency  Learning gains

17 Real corpora comparisons (2)  High-level dialog features
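For illustration, high-level features such as the average and standard deviation of words per student or tutor turn (the stuWord/tutorWord style of measurement) could be computed along these lines; the corpus format and function name are assumptions:

```python
from statistics import mean, stdev

def dialog_features(dialogs):
    """Per-corpus high-level features: mean and stdev of words per student
    turn and per tutor turn, comparable across real and simulated corpora."""
    stu = [len(t.split()) for d in dialogs for spk, t in d if spk == "S"]
    tut = [len(t.split()) for d in dialogs for spk, t in d if spk == "T"]
    return {
        "stuWord": (mean(stu), stdev(stu)),
        "tutorWord": (mean(tut), stdev(tut)),
    }

# Toy corpus: each dialogue is a list of (speaker, utterance) turns.
toy_corpus = [
    [("T", "what is the net force here"), ("S", "gravity pulls it down")],
    [("T", "so what is its acceleration"),
     ("S", "nine point eight meters per second squared")],
]
```

Running the same feature extractor over a real corpus and a simulated one gives directly comparable numbers, which is how the comparisons in the following slides are framed.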

18 Real corpora comparisons (3)  Dialogue style features

19 Real corpora comparisons (3)  Dialogue success rate

20 Real corpora comparisons (4)  Learning gains features

21 Results  The differences captured by these simple metrics cannot tell us whether a corpus is real or not (Schatzmann et al., 2005)  The differences could be due to the different user populations

22 Real vs. Simulated Corpora Comparisons

23 Results (1)  Most of the measurements are able to distinguish between the Random and ProbCorrect models  The ProbCorrect model generates more realistic behaviors  We can't draw conclusions about the power of these metrics, since the two simulated corpora are very different from each other

24 Results (2)  Differences between the real and Random corpora are captured clearly, but differences between the real and ProbCorrect corpora are not  We don't expect this simple model to produce a very realistic corpus, so it is surprising that the differences are small

25 Results (3)  s05 shows more variety than f03  05ProbCorrect shows more variety than 03ProbCorrect  However, we don't get significantly more variety in the simulated corpora than in the real ones  Possibly because the computer tutor is simple (it only distinguishes correct/incorrect)  And because we reuse the same candidate answer set

26 Results (4)  ProbCorrect models trained on different real corpora are quite different  The ProbCorrect model is more similar to the real corpus it is trained from than to the other real corpus

27 Comparisons between simulated dialogues with different dialogue structure

28 Results  Larger differences between the two simulated corpora in prob7 than in prob34  The dialogue structure of prob34 is more restricted  The power of these simple metrics is limited by the dialogue structure

29 Conclusions  The simple measurements can distinguish between  Real corpora  From different populations  Simulated and real corpora  To different extents  Simulated corpora  From different models  Trained on different corpora  Their power is limited by the dialogue structure

30 Future work  Explore “deep” evaluation metrics  Test the simulated corpora on policy learning  More simulation models  More human features  Emotion, learning  Special cases  Quick learners, slow learners