1
Student simulation and evaluation DOD meeting Hua Ai (hua@cs.pitt.edu) 03/03/2006
2
Outline
  Motivations
  Backgrounds
  Corpus
  Student Simulation Model
  Comparisons
  Conclusions & Future Work
3
Motivations
  For a larger corpus
    Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically
    The best strategy is often not even present in a small dataset
  For a cheaper corpus
    Human subjects are expensive
4
[Figure: Strategy learning using a simulated user (Schatzmann et al., 2005). Components shown: Dialog Corpus, Simulation models, Simulated User, Dialog Manager, Reinforcement Learning, Strategy.]
5
Backgrounds (1): Education community
  Focuses on changes in the form of the student’s internal knowledge representation
  Usually not dialogue based
  Simulated students (VanLehn et al., 1994) are used for
    Tutor training
    Collaborative learning
6
Backgrounds (2): Dialogue community
  Focuses on interactions and dialogue behaviors
  Simulated users have a limited set of actions to take (Schatzmann et al., 2005)
  Simulation is done at the dialogue act (DA) level
7
Corpus (1): Spoken dialogue physics tutor (ITSPOKE)
8
Corpus (2): Tutoring procedure
  Sequence: Dialogue, Essay revision, Dialogue, Essay revision, Dialogue, …
  Each Dialogue: (T) Question, (S) Answer, (T) Q, (S) A, …
  5 problems
9
Corpus (3): Tutor’s behaviors
  Defined in KCDs (Knowledge Construction Dialogues)
  Branches on the student answer: Correct vs. Incorrect/Partially Correct
10
Corpus (4)
  Corpus                #dialogues  Stat   stuWord   stuTurn   tutorWord  tutorTurn
  f03 (synthesized)     100         avg    57.16     23.35     1256.92    29.64
                                    stdev  45.57638  17.44334  849.8195   19.76351
  05syn (synthesized)   136         avg    91.0963   30.78519  1655.467   38.06667
                                    stdev  53.82931  14.42551  757.8744   16.32469
  05pre (pre-recorded)  135         avg    87.34559  30.11765  1597.206   37.33088
                                    stdev  55.48004  16.96972  832.9845   18.20096
  f03 vs. s05: different groups of subjects
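The avg and stdev rows of such a table could be computed roughly as follows. This is only a sketch under assumptions the slide does not state: that the four columns are per-dialogue counts, that words are counted by whitespace splitting, and that stdev is the sample standard deviation. The function name `corpus_stats`, the turn representation, and the toy dialogues are all hypothetical.

```python
from statistics import mean, stdev

def corpus_stats(dialogues):
    """Return {column: (avg, stdev)} over per-dialogue word and turn counts.

    dialogues: list of dialogues, each a list of (speaker, utterance) pairs
    with speaker "student" or "tutor" (an assumed representation).
    """
    per_dialogue = {"stuWord": [], "stuTurn": [], "tutorWord": [], "tutorTurn": []}
    for turns in dialogues:
        for speaker, prefix in (("student", "stu"), ("tutor", "tutor")):
            utterances = [text for who, text in turns if who == speaker]
            per_dialogue[prefix + "Turn"].append(len(utterances))
            # Assumption: a "word" is a whitespace-separated token.
            per_dialogue[prefix + "Word"].append(
                sum(len(u.split()) for u in utterances))
    # statistics.stdev is the sample standard deviation; it needs >= 2 dialogues.
    return {name: (mean(vals), stdev(vals)) for name, vals in per_dialogue.items()}

# Made-up toy dialogues, not taken from the ITSPOKE corpora.
TOY_DIALOGUES = [
    [("tutor", "a b c"), ("student", "x"), ("tutor", "d e"), ("student", "y z")],
    [("tutor", "a b c d"), ("student", "x y z")],
]
TOY_STATS = corpus_stats(TOY_DIALOGUES)
```

Calling `corpus_stats` once per corpus (f03, 05syn, 05pre) would give one avg/stdev pair per column, matching the layout of the table.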
11
Simulation Models (1)
  Simulating on the word level
    Students have more complex behaviors
    DA info alone isn’t enough for the system
  Two models trained on two corpora
           ProbCorrect    Random
    f03    03ProbCorrect  03Random
    s05    05ProbCorrect  05Random
12
Simulation Models (2)
  ProbCorrect Model
    Simulates the average knowledge level of real students
    Simulates meaningful dialogue behaviors
  Random Model
    Produces nonsense answers
    Serves as a contrast
13
ProbCorrect Model (c = correct, ic = incorrect)
  Real corpus
    question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)
    question2: Answer2_1 (c), Answer2_2 (ic)
  Candidate answers
    For question1, c:ic = 1:2
      c: Answer1_1
      ic: Answer1_2, Answer1_3
    For question2, c:ic = 1:1
      c: Answer2_1
      ic: Answer2_2
  ProbCorrect Model, given question1, answers by
    1) Choosing to give a c or ic answer with the same average probability as real students
    2) Randomly choosing one answer from the corresponding answer set
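The two-step procedure on this slide can be sketched in a few lines of Python. This is an illustration, not the presenter's implementation: the function names and the tuple layout are hypothetical, and it assumes the probability of a correct answer is estimated per question from the c:ic counts in the real corpus.

```python
import random
from collections import defaultdict

# The example corpus from the slide: (question, answer, is_correct).
SLIDE_CORPUS = [
    ("question1", "Answer1_1", True),
    ("question1", "Answer1_2", False),
    ("question1", "Answer1_3", False),
    ("question2", "Answer2_1", True),
    ("question2", "Answer2_2", False),
]

def train_prob_correct(corpus):
    """Group observed answers per question into correct/incorrect pools."""
    pools = defaultdict(lambda: {True: [], False: []})
    for question, answer, is_correct in corpus:
        pools[question][is_correct].append(answer)
    return dict(pools)  # plain dict: unseen questions raise KeyError

def p_correct(model, question):
    """Fraction of real answers to this question that were correct."""
    n_c = len(model[question][True])
    n_ic = len(model[question][False])
    return n_c / (n_c + n_ic)

def simulate_answer(model, question, rng=random):
    """Step 1: pick c or ic with the real students' probability.
    Step 2: pick one answer uniformly from the matching pool."""
    want_correct = rng.random() < p_correct(model, question)
    return rng.choice(model[question][want_correct]), want_correct

MODEL = train_prob_correct(SLIDE_CORPUS)
```

With the slide's corpus, `p_correct(MODEL, "question1")` is 1/3 (c:ic = 1:2) and `p_correct(MODEL, "question2")` is 1/2 (c:ic = 1:1). Passing a seeded `random.Random` instance as `rng` makes a simulated corpus reproducible.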
14
Random Model
  HC03&05
    Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4
    Question2: Answer2_1, Answer2_2
  Candidate answers
    1) Answer1_1
    2) Answer1_2
    3) Answer1_3
    4) Answer1_4
    5) Answer2_1
    6) Answer2_2
  Big random model, given question i, answers with
    any of the 6 answers, each with the same probability (regardless of the question!)
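For contrast, the Random model reduces to a uniform draw from one pooled answer list. The sketch below uses hypothetical names and assumes each observed answer enters the pool once, as in the slide's six-answer example.

```python
import random

# The example corpus from the slide: (question, answer).
SLIDE_ANSWERS = [
    ("Question1", "Answer1_1"),
    ("Question1", "Answer1_2"),
    ("Question1", "Answer1_3"),
    ("Question1", "Answer1_4"),
    ("Question2", "Answer2_1"),
    ("Question2", "Answer2_2"),
]

def build_answer_pool(corpus):
    """Pool every observed answer, discarding which question it answered."""
    return [answer for _question, answer in corpus]

def simulate_random_answer(pool, question, rng=random):
    """Return any pooled answer with equal probability.

    The question argument is deliberately ignored: that is what makes
    this model a nonsense baseline.
    """
    return rng.choice(pool)

POOL = build_answer_pool(SLIDE_ANSWERS)
```

Because the question is ignored, an answer recorded for Question2 is as likely as any other to be returned for Question1.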
15
Experiments
  Comparisons between real corpora
  Comparisons between real and simulated corpora
  Comparisons between simulated corpora
16
Real corpora comparisons (1): Evaluation metrics
  High-level dialog features
  Dialog style and cooperativeness
  Dialog success rate and efficiency
  Learning gains
17
Real corpora comparisons (2): High-level dialog features
18
Real corpora comparisons (3): Dialogue style features
19
Real corpora comparisons (3): Dialogue success rate
20
Real corpora comparisons (4): Learning gains features
21
Results
  Differences captured by these simple metrics are not enough to conclude whether a corpus is real or not (Schatzmann et al., 2005)
  The differences could be due to different user populations
22
Real vs. Simulated Corpora Comparisons
23
Results (1)
  Most of the measurements can distinguish between the Random and ProbCorrect models
    The ProbCorrect model generates more realistic behaviors
  We can’t draw conclusions about the power of these metrics, since the two simulated corpora are very different
24
Results (2)
  Differences between the real corpora and the Random model are captured clearly, but differences between the real corpora and the ProbCorrect model are not clear
    We don’t expect this simple model to produce a very realistic corpus
    It’s surprising that the differences are small
25
Results (3)
  s05 variety > f03 variety
  05ProbCorrect variety > 03ProbCorrect variety
  However, the simulated corpora do not show significantly more variety than the real ones
    The computer tutor may be too simple (c/ic)
    We’re using the same candidate answer set
26
Results (4)
  ProbCorrect models trained on different real corpora are quite different
  Each ProbCorrect model is more similar to the real corpus it was trained on than to the other real corpus
27
Comparisons between simulated dialogues with different dialogue structures
28
Results
  Larger differences between the two simulated corpora in prob7 than in prob34
    The dialogue structure of prob34 is more restricted
  The power of these simple metrics is restricted by the dialogue structure
29
Conclusions
  The simple measurements can distinguish between
    Real corpora (different populations)
    Simulated and real corpora (to different extents)
    Simulated corpora (different models; trained on different corpora)
  Their power is limited by the dialog structure
30
Future work
  Explore “deep” evaluation metrics
  Test simulated corpus on policy
  More simulation models
    More human features: emotion, learning
    Special cases: quick learners, slow learners