1
Evaluation Metrics February 12, 2010
2
A break in the usual order of things…
Today’s Probing Question will be discussed later in the class rather than at the beginning.
Your responses to this (those of you who responded) were the most thoughtful ones I’ve seen all semester.
– You really engaged with the implications, both at an educational level and a policy level
3
Today’s Class Evaluation Metrics Last Wednesday’s Probing Question Assignments
4
Starting from the simplest metric…
Pre-test and post-test of what the student (hopefully) learned during the learning intervention
5
Post-test What is “SQUIRREL” in Japanese? – People named Adam not allowed to answer
6
Why would you want to do a post-test?
7
Why would you want to do a pre-test?
8
Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?)
9
Al Corbett did not use pre-tests for some research on the LISP tutor. He just filtered out participants who had ever used LISP or Scheme before, under the logic that LISP was so different from other programming paradigms that there would be essentially no overlap.
What do you think?
10
Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?)
A dangerous decision, in my opinion.
Singley & Anderson (1989), and many others, find that there can be surprising and unexpected degrees of transfer.
11
Comments? Questions?
12
How can you mess up your tests?
I’m not asking about ways to do a better test
– E.g. Bransford & Schwartz would say PFL (Preparation for Future Learning) is better than a standard pre-test of knowledge
But about things you could do that will result in useless data
13
How can you mess up your tests?
Multiple choice with terrible alternatives
What is the capital of Tajikistan?
– Raise your hand if you know the answer
14
How can you mess up your tests?
Multiple choice with terrible alternatives
What is the capital of Tajikistan?
1. Boston
2. Worcester
3. Tokyo
4. Dushanbe
15
How can you mess up your tests? Using the same items for both pre-test and post-test for any given student “Gee, this looks familiar…”
16
How can you mess up your tests?
Using pre-tests and post-tests of different difficulty
Pre-test: What is the capital of Tajikistan?
Post-test: What is the capital of Japan?
Look how great my geography tutor is!
17
How can you mess up your tests?
Using pre-tests and post-tests of different difficulty
(Even worse if you put the easy items on the pre-test and the hard items on the post-test!)
The most common approach is to counterbalance the tests
– Half of students: Pre-test Form A, Post-test Form B
– Half of students: Pre-test Form B, Post-test Form A
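The counterbalancing scheme above can be sketched in a few lines. This is a minimal illustration, not the procedure used in any particular study: the function name `counterbalance`, the student IDs, and the fixed random seed are all assumptions introduced here.

```python
import random

def counterbalance(student_ids, seed=0):
    """Randomly give half the students Form A then B, and the rest Form B then A."""
    rng = random.Random(seed)          # fixed seed so the assignment is reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {}
    for i, sid in enumerate(shuffled):
        # Each value is (pre-test form, post-test form).
        assignment[sid] = ("A", "B") if i < half else ("B", "A")
    return assignment

# Hypothetical class of eight students.
students = [f"s{i:02d}" for i in range(1, 9)]
plan = counterbalance(students)
for sid in sorted(plan):
    print(sid, "pre:", plan[sid][0], "post:", plan[sid][1])
```

Because every student sees a different form at pre-test and post-test, and each form appears equally often in each position, a difference in form difficulty should not masquerade as a learning gain.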
18
How can you mess up your tests? Letting students “help” each other during the tests – Raise your hand if you’ve ever seen this
19
How can you mess up your tests? Letting the teacher give a student the answer during the post-test – Raise your hand if you’ve ever seen this
20
How can you mess up your tests? Not communicating that an online test is not a tutor – “Hey, how come this tutor doesn’t have any feedback?”
21
Comments? Questions?
22
Pre-Post Comparison (4 ways) t-test on Post-test - Pre-test for each group Advantages? Disadvantages?
23
Pre-Post Comparison (4 ways)
t-test on Post-test – Pre-test for each group
Advantages? Disadvantages?
– Vulnerable to ceiling effects
[Figure: test score (0% to 100%) at pre-test and post-test]
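To make the ceiling effect concrete, here is a small sketch using only the Python standard library. It assumes "t-test on Post-test – Pre-test for each group" means a paired (one-sample) t statistic on each group's gain scores; the scores below are invented for illustration, and no p-value is computed.

```python
from math import sqrt
from statistics import mean, stdev

def gains(pre, post):
    """Simple gain score per student: post-test minus pre-test."""
    return [b - a for a, b in zip(pre, post)]

def paired_t(pre, post):
    """t statistic for the mean gain (one-sample t on the gain scores)."""
    g = gains(pre, post)
    return mean(g) / (stdev(g) / sqrt(len(g)))

# Invented scores as proportions correct.
low_pre,  low_post  = [0.20, 0.30, 0.40, 0.30], [0.60, 0.80, 0.70, 0.90]
high_pre, high_post = [0.90, 0.95, 0.85, 0.90], [1.00, 1.00, 1.00, 1.00]

print("low group mean gain: ", mean(gains(low_pre, low_post)))
print("high group mean gain:", mean(gains(high_pre, high_post)))
print("low group t: ", paired_t(low_pre, low_post))
print("high group t:", paired_t(high_pre, high_post))
```

The high-pre-test group reaches a perfect post-test score, yet its gains cannot exceed 0.15 because there was so little room left to improve. That cap is the ceiling effect.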
24
Pre-Post Comparison (4 ways) t-test on (Post-test – Pre-test)/(1-Pre-test) for each group Advantages? Disadvantages?
25
Pre-Post Comparison (4 ways)
t-test on (Post-test – Pre-test)/(1 – Pre-test) for each group
– Accounts for high performers…
– But has weird effects if anyone does worse on post-test than pre-test
– Pre = 20%, Post = 10%, Res = -12.5%
– Pre = 100%, Post = 90%, Res = -∞% (division by zero)
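The normalized-gain formula and its edge cases can be checked directly. This is a sketch with scores expressed as proportions between 0 and 1; the function name `normalized_gain` and the choice to return `None` when the pre-test is already perfect are assumptions made here, not part of the lecture.

```python
def normalized_gain(pre, post):
    """(post - pre) / (1 - pre), with pre and post as proportions in [0, 1]."""
    if pre >= 1.0:
        # No room to improve: the denominator is zero, so the measure is undefined.
        return None
    return (post - pre) / (1.0 - pre)

print(normalized_gain(0.20, 0.60))  # student closed half of the remaining gap
print(normalized_gain(0.20, 0.10))  # student got worse: small negative value
print(normalized_gain(1.00, 0.90))  # undefined: pre-test was already perfect
```

A student who drops from 20% to 10% gets (0.10 - 0.20)/(1 - 0.20) = -12.5%, while a student who drops by the same ten points from 100% has no defined value at all, which is why negative gains make this measure awkward.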
26
Pre-Post Comparison (4 ways)
Regression set up as Post-test = Pre-test + Condition + error
– Allows you to find the mean difference between conditions while controlling for each student’s pre-test score
– Advantages? Disadvantages?
27
Pre-Post Comparison (4 ways)
Regression set up as Post-test = Pre-test + Condition + error
– Allows you to find the mean difference between conditions while controlling for each student’s pre-test score
You need to check that condition differences are not actually pre-test differences between conditions, using Pre-test = Condition + error
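One way to write the two regressions out in full is sketched below. The coefficient symbols and the coding of Condition as a 0/1 indicator (0 = control, 1 = experimental) are assumptions added here for illustration; the slides only give the model in words.

```latex
\begin{align}
\text{Post}_i &= \beta_0 + \beta_1\,\text{Pre}_i + \beta_2\,\text{Condition}_i + \varepsilon_i \\
\text{Pre}_i  &= \gamma_0 + \gamma_1\,\text{Condition}_i + \eta_i
\end{align}
```

Under that coding, $\beta_2$ is the difference in mean post-test score between conditions for students with the same pre-test score, and a $\gamma_1$ close to zero in the second model suggests the conditions did not already differ at pre-test.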
28
Pre-Post Comparison (4 ways) Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control Advantages? Disadvantages?
29
Pre-Post Comparison (4 ways) Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control – How big is the difference between groups? (not just how likely is it, if chance was all there was)
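The effect size formula can be sketched as follows. The slide does not say which standard deviation in the control group is meant; this sketch assumes the sample standard deviation of the control group's gain scores, and the gain values are invented.

```python
from statistics import mean, stdev

def effect_size(exp_gains, ctrl_gains):
    """(mean gain in experimental - mean gain in control) / SD of control gains."""
    return (mean(exp_gains) - mean(ctrl_gains)) / stdev(ctrl_gains)

# Invented gain scores (post-test minus pre-test) as proportions.
experimental = [0.30, 0.40, 0.50]
control      = [0.10, 0.20, 0.30]

print(effect_size(experimental, control))
```

Unlike a t statistic, this number does not grow just because more students were tested, so it answers "how big is the difference?" rather than "how unlikely is it under chance?".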
30
Comments? Questions?
31
(Some Types of) Contents of Tests Multiple-choice Fill-in-the-blank Essay Complete Problem-solving Decomposed Problem-solving
32
Types I believe you already know Multiple-choice Fill-in-the-blank Essay Complete Problem-solving Decomposed Problem-solving
33
Complete Problem-Solving
City                Population (in 1000s)   Number of Brazilian Restaurants
Worcester           155                     4
Fitchburg           65                      0
Boston              650                     6
Providence          150                     0
Springfield         70                      1
Manchester          130                     2
Hartford            220                     4
New Haven           120                     0
New Bedford         55                      3
Arapiraca, Brazil   140                     80
Draw a scatterplot of this fake data
34
Decomposed Problem-Solving
(Same city data table as on the Complete Problem-Solving slide)
What variables would you use to draw a scatterplot of this data?
35
Have them turn in their answer (Or go to the next webpage)
36
Decomposed Problem-Solving
(Same city data table as on the Complete Problem-Solving slide)
What is a good scale for Population?
37
Have them turn in their answer (Or go to the next webpage)
38
Decomposed Problem-Solving
(Same city data table as on the Complete Problem-Solving slide)
What is a good upper and lower bound for Population?
39
Have them turn in their answer (Or go to the next webpage)
40
Decomposed Problem-Solving
Label the axes with values
(Have Population go from 0 to 700 with scale of 50, and Number of Restaurants go from 0 to 80 with scale of 10)
[Figure: blank axes labeled Population and Number of Restaurants]
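The axis values asked for in this step can be generated mechanically. This is only a sketch: the helper name `axis_ticks` is hypothetical, and the largest data values used in the check (650 for Population, 80 for Number of Restaurants) are read from the city table on the earlier slides.

```python
def axis_ticks(lower, upper, step):
    """Tick values from lower to upper inclusive, spaced by step."""
    return list(range(lower, upper + 1, step))

population_ticks = axis_ticks(0, 700, 50)   # Population in thousands
restaurant_ticks = axis_ticks(0, 80, 10)    # Number of Brazilian restaurants

# The bounds should cover the largest values in the data.
largest_population, largest_restaurants = 650, 80
assert population_ticks[-1] >= largest_population
assert restaurant_ticks[-1] >= largest_restaurants

print(population_ticks)
print(restaurant_ticks)
```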
41
And so on…
42
Advantages/Disadvantages? Multiple-choice Fill-in-the-blank Essay Complete Problem-solving Decomposed Problem-solving
47
“Contingent Correctness” Grading
Some researchers try to deal with the issue of partial correctness in complete problem-solving by grading contingent correctness
– i.e. If step A is wrong, but step B is correct based on step A, count step B as correct
E.g. if the student used the wrong variable, but plotted the points correctly, the point plotting is contingently correct
– Time-consuming and tricky to do
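The grading rule above can be sketched in code: each step is checked against what would be correct given the student's own previous answer, not the ideal previous answer. Everything here is a hypothetical illustration (the function `grade_contingent`, the two-step decomposition, and the equation 3x + 2x = 10), not a grading scheme from the lecture.

```python
def grade_contingent(problem, student_answers, steps):
    """Mark each step correct if it follows from the student's own previous step."""
    results = []
    previous = None
    for answer, step in zip(student_answers, steps):
        expected = step(problem, previous)     # what this step should be, given prior work
        results.append(abs(answer - expected) < 1e-9)
        previous = answer                      # carry the student's value forward, right or wrong
    return results

# Hypothetical item: solve 3x + 2x = 10 in two steps.
problem = {"a": 3, "b": 2, "rhs": 10}
steps = [
    lambda p, prev: p["a"] + p["b"],   # step A: combine the coefficients
    lambda p, prev: p["rhs"] / prev,   # step B: divide by the coefficient from step A
]                                      # (a step A answer of 0 would need special handling)

# Student combines the coefficients wrongly (6) but then divides correctly.
print(grade_contingent(problem, [6, 10 / 6], steps))
```

Here step A is marked wrong and step B is marked contingently correct. Even in this tiny case the grader has to encode how each step depends on the one before, which hints at why the approach is time-consuming for real student work.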
48
Comments? Questions?
49
Other measures
50
Learning Efficiency Perhaps two conditions have equal learning, but one condition takes significantly more time than another condition Advantages? Disadvantages?
51
Inferential Challenge with Learning Efficiency
How do you know that the slower condition would not have been equally effective, if you’d just stopped at some earlier point?
Usually addressed by then running a time-controlled study of some sort
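The slides do not give a formula for learning efficiency. One simple operationalization, assumed here purely for illustration, is gain per minute of instruction; the function name and the numbers are invented.

```python
def learning_efficiency(pre, post, minutes):
    """Gain per minute of instruction (one possible definition, assumed here)."""
    if minutes <= 0:
        raise ValueError("time on task must be positive")
    return (post - pre) / minutes

# Two hypothetical conditions with equal gains but different time on task.
fast = learning_efficiency(pre=0.40, post=0.70, minutes=30)
slow = learning_efficiency(pre=0.40, post=0.70, minutes=60)
print(fast, slow)
```

The ratio makes the faster condition look twice as efficient, but it cannot say how much the slower group had already learned at the 30-minute mark, which is the inferential challenge raised above.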
52
Retention Re-doing the post-test some number of hours, days, weeks, or even years later
53
Retention Re-doing the post-test some number of hours, days, weeks, or even years later When might you want to do a retention test?
54
Retention
Re-doing the post-test some number of hours, days, weeks, or even years later
When might you want to do a retention test?
– Does improvement maintain?
– Some results may only manifest sometime after intervention (e.g. meta-cognitive training)
– Shallow learning may disappear more quickly than “robust learning”
– Different interventions may have different results at post-test and retention post-test (e.g. individual and collaborative learning)
55
Retention Re-doing the post-test some number of hours, days, weeks, or even years later What are some situations in which retention tests would not be beneficial?
56
Retention
Re-doing the post-test some number of hours, days, weeks, or even years later
What are some situations in which retention tests would not be beneficial?
– Waiting too long and getting a floor effect
– Other learning during time interval
– Time-consuming to conduct
57
Transfer
Using items that involve applying the skills or concepts learned in a different situation, or to a problem involving potentially different skills
58
Near Transfer vs. Far Transfer
A little fuzzy exactly where the line is
Some theoretical accounts (e.g. Royer, 1979) say that the difference is in how similar the performance situations/stimuli are
Singley & Anderson (1989) would ask how similar the productions are that govern successful performance in the two situations
– How many additional productions or modifications of productions are needed for the transfer task?
59
Example
60
Original Learning 3x + 2x = 5 7x + 4x = 22 4x + 5x = 3 9x – 5x = 16
61
Near Transfer 3x + x = 4 x + 5x = 18
62
Near Transfer 16 = 6x + 2x
63
Near Transfer 5x = 18 – 4x
64
Near Transfer 9h + 2h = 77
65
Far Transfer You bought 3 slices of pizza, and your brother bought 2 slices of pizza. The bill came to $10. How much does a slice of pizza cost?
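Worked out, the word problem reduces to the same form as the original learning items, which is what makes it a transfer item rather than a new topic. The sketch below assumes every slice costs the same price $x$ in dollars; that assumption is implicit in the problem rather than stated.

```latex
\begin{align}
3x + 2x &= 10 \\
5x &= 10 \\
x &= 2
\end{align}
```

So a slice costs 2 dollars. The algebra is identical to "3x + 2x = 5" from the original learning set; the extra work lies in translating the story into the equation.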
66
Far Transfer 3x + y = 8 8x + 4y = 20
67
Advantages/Disadvantages?
68
Advantages/Disadvantages
Tests conceptual knowledge as much/more than procedural knowledge (a good thing IMHO)
Can be used to study whether skills learned were over-generalized or under-generalized
Chance of floor effect if your transfer task is too far away
– Hard to come up with transfer tasks that are neither too near nor too far!
– Requires piloting
69
Preparation for Future Learning Can a student learn a new skill or concept better, based on their previous experience?
70
Preparation for Future Learning What might be some ways to measure the better learning on the new task?
71
Preparation for Future Learning What might be some ways to measure the better learning on the new task? – Better performance on new task – Faster learning on new task (“Accelerated future learning”)
72
Advantages/Disadvantages of PFL
73
Really gets at not just skill, but sophisticated conceptual understanding
High vulnerability to second learning task
– If the task is too easy or too hard, you won’t learn anything
– Really requires understanding your domain
Most people aren’t good at learning really fast
– Requires running longer, more complex study OR
– Picking relatively easy second learning tasks
74
Comments? Questions?
75
Which measure should you use? Easy to say “all of ‘em!” Hard to actually do “all of ‘em” and code the data, in a reasonable amount of time
76
“Robust Learning”
The “Robust Learning” movement argues that we should test “robust learning”, which is learning that
– is retained
– can transfer
– prepares students for future learning
(VanLehn, 2005; Corbett et al., in preparation)
77
“Robust Learning”
Other researchers believe that these are distinct ways that learning can be “robust”, and that there is no single “robust learning” construct
– E.g. you can remember something forever but be unable to transfer it
– E.g. you can understand something flexibly and be prepared for future learning, but only for a couple of weeks before you forget it
What do you think?
78
Hoping to find out…
Albert Corbett, a proponent of “robust learning”, and Ryan Baker, a disbeliever, have an ongoing grant to test which model is more accurate
79
Today’s Class Evaluation Metrics Last Wednesday’s Probing Question Assignments
80
Probing Question for Friday, February 12 Should state/national/international assessments of learning (like the MCAS) have Preparation for Future Learning items? Why or why not?
81
Today’s Class Evaluation Metrics Last Wednesday’s Probing Question Assignments
82
Assignment #3 Any questions?
83
Assignment #4 Will be handed out by Monday noon (when Assignment #3 is due)
84
Assignment #1 and #2 grading Will be completed by early next week Thanks for your patience