Evaluation Metrics February 12, 2010

A break in the usual order of things… Today’s Probing Question will be discussed later in the class rather than at the beginning. Your responses to this (those of you who responded) were the most thoughtful ones I’ve seen all semester. You really engaged with the implications, at both an educational level and a policy level.

Today’s Class Evaluation Metrics Last Wednesday’s Probing Question Assignments

Starting from the simplest metric… Pre-test Post-test Of what the student (hopefully) learned during the learning intervention

Post-test What is “SQUIRREL” in Japanese? – People named Adam not allowed to answer

Why would you want to do a post-test?

Why would you want to do a pre-test?

Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?)

Al Corbett did not use pre-tests for some research on the LISP tutor; he just filtered out participants who had ever used LISP or Scheme before, under the logic that LISP was so different from other programming paradigms that there would essentially be no overlap. What do you think?

Is there ever a case where you don’t need to do a pre-test? (or shouldn’t do one?) A dangerous decision, in my opinion. Singley & Anderson (1989), and many others, find that there can be surprising and unexpected degrees of transfer.

Comments? Questions?

How can you mess up your tests? I’m not asking about ways to do a better test
– E.g. Bransford & Schwartz would say PFL (Preparation for Future Learning) is better than a standard pre-test of knowledge
But about things you could do that will result in useless data

How can you mess up your tests? Multiple choice with terrible alternatives. What is the capital of Tajikistan? – Raise your hand if you know the answer

How can you mess up your tests? Multiple choice with terrible alternatives. What is the capital of Tajikistan?
1. Boston
2. Worcester
3. Tokyo
4. Dushanbe

How can you mess up your tests? Using the same items for both pre-test and post-test for any given student “Gee, this looks familiar…”

How can you mess up your tests? Using pre-tests and post-tests of different difficulty. Pre-test: What is the capital of Tajikistan? Post-test: What is the capital of Japan? Look how great my geography tutor is!

How can you mess up your tests? Using pre-tests and post-tests of different difficulty (Even worse if you put the easy items on the pre-test and the hard items on the post-test!) The most common approach is to counterbalance the tests
– Half of students: Pre-test Form A, Post-test Form B
– Half of students: Pre-test Form B, Post-test Form A
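
A minimal sketch of the counterbalancing scheme, assuming students are split at random into two equal halves. The function name, the seed parameter, and the student IDs are hypothetical and only serve the illustration.

```python
import random

def counterbalance(student_ids, seed=0):
    """Assign each student a (pre-test form, post-test form) pair.

    The first half of the shuffled list gets Form A then Form B;
    the rest get Form B then Form A. The seed only makes the
    shuffle repeatable.
    """
    ids = list(student_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    assignment = {}
    for i, sid in enumerate(ids):
        assignment[sid] = ("A", "B") if i < half else ("B", "A")
    return assignment

students = ["s01", "s02", "s03", "s04", "s05", "s06"]  # hypothetical IDs
plan = counterbalance(students)
for sid in sorted(plan):
    pre_form, post_form = plan[sid]
    print(sid, "pre:", pre_form, "post:", post_form)
```

Because every student sees a different form at pre-test and post-test, and each form appears equally often in each position, any difference in difficulty between Form A and Form B averages out across the group.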

How can you mess up your tests? Letting students “help” each other during the tests – Raise your hand if you’ve ever seen this

How can you mess up your tests? Letting the teacher give a student the answer during the post-test – Raise your hand if you’ve ever seen this

How can you mess up your tests? Not communicating that an online test is not a tutor – “Hey, how come this tutor doesn’t have any feedback?”

Comments? Questions?

Pre-Post Comparison (4 ways) t-test on Post-test - Pre-test for each group Advantages? Disadvantages?

Pre-Post Comparison (4 ways) t-test on Post-test – Pre-test for each group Advantages? Disadvantages? – Vulnerable to ceiling effects [Figure: test score from 0% to 100%, at pre-test and post-test]

Pre-Post Comparison (4 ways) t-test on (Post-test – Pre-test)/(1-Pre-test) for each group Advantages? Disadvantages?

Pre-Post Comparison (4 ways) t-test on (Post-test – Pre-test)/(1-Pre-test) for each group
– Accounts for high performers…
– But has weird effects if anyone does worse on post-test than pre-test
– Pre = 20%, Post = 10%, Res = -12.5%
– Pre = 100%, Post = 90%, Res = -∞% (division by zero)
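
The two gain measures can be compared side by side in a short sketch. It assumes scores are proportions between 0 and 1, and it reads the t-test as a comparison of gains between two conditions, which is only one possible reading of the slide. The data are made up, and `welch_t` returns only the t statistic, not a p-value.

```python
from math import sqrt
from statistics import mean, variance

def raw_gain(pre, post):
    """Post-test minus pre-test, both as proportions in [0, 1]."""
    return post - pre

def normalized_gain(pre, post):
    """(post - pre) / (1 - pre); None when pre is 100% (nothing left to gain)."""
    if pre >= 1.0:
        return None
    return (post - pre) / (1.0 - pre)

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (no p-value computed)."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

# Made-up (pre, post) scores for illustration only.
experimental = [(0.20, 0.60), (0.50, 0.80), (0.80, 0.95), (0.40, 0.70)]
control = [(0.20, 0.40), (0.50, 0.60), (0.80, 0.85), (0.40, 0.55)]

for name, gain in (("raw", raw_gain), ("normalized", normalized_gain)):
    exp_gains = [gain(pre, post) for pre, post in experimental]
    ctl_gains = [gain(pre, post) for pre, post in control]
    print(name, "t =", round(welch_t(exp_gains, ctl_gains), 2))

# A student who gets worse, and a student who starts at ceiling.
print(normalized_gain(0.20, 0.10))
print(normalized_gain(1.00, 0.90))
```

Returning `None` for a 100% pre-test is a design choice made here; an analysis would need to decide explicitly whether to drop such students or handle them some other way.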

Pre-Post Comparison (4 ways) Regression set up as $\text{Post-test} = \beta_1 \cdot \text{Pre-test} + \beta_2 \cdot \text{Condition} + \beta_0$
– allows you to find mean difference in conditions while controlling for each student’s pre-test score
– Advantages? Disadvantages?

Pre-Post Comparison (4 ways) Regression set up as $\text{Post-test} = \beta_1 \cdot \text{Pre-test} + \beta_2 \cdot \text{Condition} + \beta_0$
– allows you to find mean difference in conditions while controlling for each student’s pre-test score
You need to check that condition differences are not actually pre-test differences between conditions using $\text{Pre-test} = \gamma_1 \cdot \text{Condition} + \gamma_0$
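
A rough sketch of both regressions in plain Python, assuming Condition is coded 0 for control and 1 for experimental. The helper names are hypothetical, and the data are made up and noise-free so that the fitted coefficients are known in advance. A real analysis would use a statistics package that also reports standard errors.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients via the normal equations.

    Each row lists the predictor values for one student and must
    already include a trailing 1.0 for the intercept.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up data: post-test built exactly as 0.5 * pre + 0.2 * condition + 0.3.
pre = [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8]
condition = [0, 0, 0, 0, 1, 1, 1, 1]
post = [0.5 * p + 0.2 * c + 0.3 for p, c in zip(pre, condition)]

# Post-test regressed on pre-test and condition.
b_pre, b_cond, intercept = ols([[p, c, 1.0] for p, c in zip(pre, condition)], post)
print("condition effect, controlling for pre-test:", round(b_cond, 3))

# Check: pre-test regressed on condition alone.
pre_diff, pre_base = ols([[c, 1.0] for c in condition], pre)
print("pre-test difference between conditions:", round(pre_diff, 3))
```

In this toy data set the two conditions have identical pre-test scores, so the second regression finds no pre-test difference and the condition coefficient can be read as a difference in learning.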

Pre-Post Comparison (4 ways) Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control Advantages? Disadvantages?

Pre-Post Comparison (4 ways) Effect Size: (Mean Gain in Experimental – Mean Gain in Control)/ St Dev in Control – How big is the difference between groups? (not just how likely it would be if chance were all there was)
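
The effect size formula translates directly into code. This sketch assumes "St Dev in Control" means the sample standard deviation of the control group's gain scores, which the slide does not spell out. The gain values are made up.

```python
from statistics import mean, stdev

def effect_size(gains_experimental, gains_control):
    """(mean gain in experimental - mean gain in control) / SD of control gains."""
    sd_control = stdev(gains_control)
    if sd_control == 0:
        raise ValueError("control gains have zero variance; effect size is undefined")
    return (mean(gains_experimental) - mean(gains_control)) / sd_control

# Made-up gain scores (post-test minus pre-test) for illustration only.
gains_experimental = [0.30, 0.40, 0.50, 0.35]
gains_control = [0.10, 0.20, 0.30, 0.15]
print(round(effect_size(gains_experimental, gains_control), 2))
```

Unlike a t statistic, this number does not grow with sample size, which is why it answers "how big" rather than "how likely".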

Comments? Questions?

(Some Types of) Contents of Tests Multiple-choice Fill-in-the-blank Essay Complete Problem-solving Decomposed Problem-solving

Types I believe you already know Multiple-choice Fill-in-the-blank Essay Complete Problem-solving Decomposed Problem-solving

Complete Problem-Solving
City                Population (in 1000)   Number of Brazilian Restaurants
Worcester           155                    4
Fitchburg           65                     0
Boston              650                    6
Providence          150                    0
Springfield         70                     1
Manchester          130                    2
Hartford            220                    4
New Haven           120                    0
New Bedford         55                     3
Arapiraca, Brazil   140                    80
Draw a scatterplot of this fake data

Decomposed Problem-Solving [Same city table as on the Complete Problem-Solving slide] What variables would you use to draw a scatterplot of this data?

Have them turn in their answer (Or go to the next webpage)

Decomposed Problem-Solving [Same city table as on the Complete Problem-Solving slide] What is a good scale for Population?

Have them turn in their answer (Or go to the next webpage)

Decomposed Problem-Solving [Same city table as on the Complete Problem-Solving slide] What is a good upper and lower bound for Population?

Have them turn in their answer (Or go to the next webpage)

Decomposed Problem-Solving Label the axes with values (Have Population go from 0 to 700 with scale of 50, and Number of Restaurants go from 0 to 80 with scale of 10) [Figure: blank axes labeled Population and Number of Restaurants]

And so on…

Advantages/Disadvantages? Multiple-choice Fill-in-the-blank Essay Complete Problem-solving Decomposed Problem-solving

“Contingent Correctness” Grading Some researchers try to deal with the issue of partial correctness in complete problem-solving by grading contingent correctness
– i.e. If step A is wrong, but step B is correct based on step A, count step B as correct
E.g. if the student used the wrong variable, but plotted the points correctly, the point plotting is contingently correct
– Time-consuming and tricky to do
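
One way to picture contingent grading is to check each step against the answer implied by the student's own previous step, not against the fully correct solution. The sketch below does this for a hypothetical two-step numeric item; the function name and the item are assumptions for illustration, and real grading of open-ended work is far messier.

```python
def grade_contingent(student_steps, step_functions):
    """Grade each step against what the student's previous answer implies.

    step_functions[i] takes the student's answer to step i-1 (None for
    the first step) and returns the answer that would be correct for
    step i given that input.
    """
    results = []
    previous = None
    for answer, expected_from in zip(student_steps, step_functions):
        expected = expected_from(previous)
        results.append(abs(answer - expected) < 1e-9)
        previous = answer  # carry the student's answer forward, right or wrong
    return results

# Hypothetical two-step item: solve 3x + 2x = 5.
steps = [
    lambda _prev: 5,        # step A: combine like terms, 3 + 2
    lambda coef: 5 / coef,  # step B: divide 5 by the student's own coefficient
]

print(grade_contingent([6, 5 / 6], steps))  # step A wrong, step B follows from it
print(grade_contingent([5, 1], steps))      # both steps correct
```

A real grader would also need to handle answers that make a later step undefined (here, a coefficient of 0), which is part of why this kind of grading is tricky.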

Comments? Questions?

Other measures

Learning Efficiency Perhaps two conditions have equal learning, but one condition takes significantly more time than another condition Advantages? Disadvantages?

Inferential Challenge with Learning Efficiency How do you know that the slower condition would not have been equally effective, if you’d just stopped at some earlier point? Usually addressed by then running a time-controlled study of some sort
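
The slides do not give a formula for learning efficiency. One simple operationalization, assumed here purely for illustration, is gain per minute of instruction. The numbers are made up.

```python
def learning_efficiency(pre, post, minutes):
    """Gain per minute of instruction (an assumed definition, not from the slides)."""
    if minutes <= 0:
        raise ValueError("time on task must be positive")
    return (post - pre) / minutes

# Two made-up conditions with equal learning but different time on task.
fast = learning_efficiency(pre=0.40, post=0.70, minutes=30)
slow = learning_efficiency(pre=0.40, post=0.70, minutes=60)
print("fast condition:", round(fast, 4), "gain per minute")
print("slow condition:", round(slow, 4), "gain per minute")
```

Dividing by total time assumes learning accrued evenly over the session. That is exactly the inference the slide questions, since the slower condition might have reached the same gain well before it ended.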

Retention Re-doing the post-test some number of hours, days, weeks, or even years later

Retention Re-doing the post-test some number of hours, days, weeks, or even years later When might you want to do a retention test?

Retention Re-doing the post-test some number of hours, days, weeks, or even years later. When might you want to do a retention test?
– Does improvement maintain?
– Some results may only manifest sometime after intervention (e.g. Meta-cognitive training)
– Shallow learning may disappear more quickly than “robust learning”
– Different interventions may have different results at post-test and retention post-test (e.g. individual and collaborative learning)

Retention Re-doing the post-test some number of hours, days, weeks, or even years later What are some situations in which retention tests would not be beneficial?

Retention Re-doing the post-test some number of hours, days, weeks, or even years later What are some situations in which retention tests would not be beneficial? – Waiting too long and getting a floor effect – Other learning during time interval – Time-consuming to conduct

Transfer Using items that involve applying the skills or concepts learned in a different situation, or to a problem involving potentially different skills

Near Transfer vs. Far Transfer A little fuzzy exactly where the line is. Some theoretical accounts (e.g. Royer, 1979) say that the difference is in how similar the performance situations/stimuli are. Singley & Anderson (1989) would ask how similar the productions are that govern successful performance in the two situations – How many additional productions or modifications of productions are needed for the transfer task?

Example

Original Learning
3x + 2x = 5
7x + 4x = 22
4x + 5x = 3
9x – 5x = 16

Near Transfer
3x + x = 4
x + 5x = 18

Near Transfer 16 = 6x + 2x

Near Transfer 5x = 18 – 4x

Near Transfer 9h + 2h = 77

Far Transfer You bought 3 slices of pizza, and your brother bought 2 slices of pizza. The bill came to $10. How much does a slice of pizza cost?

Far Transfer
3x + y = 8
8x + 4y = 20

Advantages/Disadvantages?

Advantages/Disadvantages
Tests conceptual knowledge as much/more than procedural knowledge (a good thing IMHO)
Can be used to study whether skills learned were over-generalized or under-generalized
Chance of floor effect if your transfer task is too far away
– Hard to come up with transfer tasks that are neither too near nor too far!
– Requires piloting

Preparation for Future Learning Can a student learn a new skill or concept better, based on their previous experience?

Preparation for Future Learning What might be some ways to measure the better learning on the new task?

Preparation for Future Learning What might be some ways to measure the better learning on the new task? – Better performance on new task – Faster learning on new task (“Accelerated future learning”)

Advantages/Disadvantages of PFL

Really gets at not just skill, but sophisticated conceptual understanding
High vulnerability to second learning task
– If the task is too easy or too hard, you won’t learn anything
– Really requires understanding your domain
Most people aren’t good at learning really fast
– Requires running longer, more complex study OR
– Picking relatively easy second learning tasks

Comments? Questions?

Which measure should you use? Easy to say “all of ’em!” Hard to actually do “all of ’em” and code the data in a reasonable amount of time

“Robust Learning” The “Robust Learning” movement argues that we should test “robust learning”, which is learning that – is retained – can transfer – prepares students for future learning (VanLehn, 2005; Corbett et al, in preparation)

“Robust Learning” Other researchers believe that these are distinct ways that learning can be “robust”, and that there is no single “robust learning” construct
– E.g. you can remember something forever but be unable to transfer it
– E.g. you can understand something flexibly and be prepared for future learning, but only for a couple of weeks before you forget it
What do you think?

Hoping to find out… Albert Corbett, a proponent of “robust learning”, and Ryan Baker, a disbeliever, have an ongoing grant to test which model is more accurate

Today’s Class Evaluation Metrics Last Wednesday’s Probing Question Assignments

Probing Question for Friday, February 12 Should state/national/international assessments of learning (like the MCAS) have Preparation for Future Learning items? Why or why not?

Today’s Class Evaluation Metrics Last Wednesday’s Probing Question Assignments

Assignment #3 Any questions?

Assignment #4 Will be handed out by Monday noon (when Assignment #3 is due)

Assignment #1 and #2 grading Will be completed by early next week Thanks for your patience