Consistency in testing

Similar presentations
Reliability IOP 301-T Mr. Rajesh Gunesh Reliability  Reliability means repeatability or consistency  A measure is considered reliable if it would give.

Topics: Quality of Measurements
Some (Simplified) Steps for Creating a Personality Questionnaire Generate an item pool Administer the items to a sample of people Assess the uni-dimensionality.
Reliability Definition: The stability or consistency of a test. Assumption: True score = obtained score +/- error Domain Sampling Model Item Domain Test.
© McGraw-Hill Higher Education. All rights reserved. Chapter 3 Reliability and Objectivity.
Chapter 5 Reliability Robert J. Drummond and Karyn Dayle Jones Assessment Procedures for Counselors and Helping Professionals, 6 th edition Copyright ©2006.
The Department of Psychology
Chapter 4 – Reliability Observed Scores and True Scores Error
1 Reliability in Scales Reliability is a question of consistency do we get the same numbers on repeated measurements? Low reliability: reaction time High.
Assessment Procedures for Counselors and Helping Professionals, 7e © 2010 Pearson Education, Inc. All rights reserved. Chapter 5 Reliability.
VALIDITY AND RELIABILITY
Lesson Six Reliability.
1Reliability Introduction to Communication Research School of Communication Studies James Madison University Dr. Michael Smilowitz.
Reliability - The extent to which a test or instrument gives consistent measurement - The strength of the relation between observed scores and true scores.
 A description of the ways a researcher will observe and measure a variable, so called because it specifies the operations that will be taken into account.
Reliability Analysis. Overview of Reliability What is Reliability? Ways to Measure Reliability Interpreting Test-Retest and Parallel Forms Measuring and.
Reliability for Teachers Kansas State Department of Education ASSESSMENT LITERACY PROJECT1 Reliability = Consistency.
What is a Good Test Validity: Does test measure what it is supposed to measure? Reliability: Are the results consistent? Objectivity: Can two or more.
Explaining Cronbach’s Alpha
Item Analysis: A Crash Course Lou Ann Cooper, PhD Master Educator Fellowship Program January 10, 2008.
Biomedical Statistics Final Report: Reliability. Student: 劉佩昀. Instructor: 蔡章仁.
Reliability n Consistent n Dependable n Replicable n Stable.
Reliability Analysis. Overview of Reliability What is Reliability? Ways to Measure Reliability Interpreting Test-Retest and Parallel Forms Measuring and.
Reliability n Consistent n Dependable n Replicable n Stable.
Reliability and Validity
A quick introduction to the analysis of questionnaire data John Richardson.
Lesson Seven Reliability. Contents  Definition of reliability Definition of reliability  Indication of reliability: Reliability coefficient Reliability.
Session 3 Normal Distribution Scores Reliability.
2.3. Measures of Dispersion (Variation):
Multiple Choice Test Item Analysis Facilitator: Sophia Scott.
Research Methods in MIS
Reliability of Selection Measures. Reliability Defined The degree of dependability, consistency, or stability of scores on measures used in selection.
Classical Test Theory By ____________________. What is CTT?
Validity and Reliability
Foundations of Educational Measurement
Data Analysis. Quantitative data: Reliability & Validity Reliability: the degree of consistency with which it measures the attribute it is supposed to.
McMillan Educational Research: Fundamentals for the Consumer, 6e © 2012 Pearson Education, Inc. All rights reserved. Educational Research: Fundamentals.
Unanswered Questions in Typical Literature Review 1. Thoroughness – How thorough was the literature search? – Did it include a computer search and a hand.
Reliability Lesson Six
LECTURE 06B BEGINS HERE THIS IS WHERE MATERIAL FOR EXAM 3 BEGINS.
Test item analysis: When are statistics a good thing? Andrew Martin Purdue Pesticide Programs.
Reliability Chapter 3. Classical Test Theory Every observed score is a combination of true score plus error. Obs. = T + E.
Reliability Chapter 3.  Every observed score is a combination of true score and error Obs. = T + E  Reliability = Classical Test Theory.
Reliability: Introduction. Reliability Session 1.Definitions & Basic Concepts of Reliability 2.Theoretical Approaches 3.Empirical Assessments of Reliability.
1 Chapter 4 – Reliability 1. Observed Scores and True Scores 2. Error 3. How We Deal with Sources of Error: A. Domain sampling – test items B. Time sampling.
RELIABILITY Prepared by Marina Gvozdeva, Elena Onoprienko, Yulia Polshina, Nadezhda Shablikova.
Experimental Research Methods in Language Learning Chapter 12 Reliability and Reliability Analysis.
1 LANGUAGE TEST RELIABILITY. 2 What Is Reliability? Refers to a quality of test scores, and has to do with the consistency of measures across different.
Reliability n Consistent n Dependable n Replicable n Stable.
©2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Reliability: Introduction. Reliability Session 1.Definitions & Basic Concepts of Reliability 2.Theoretical Approaches 3.Empirical Assessments of Reliability.
Reliability a measure is reliable if it gives the same information every time it is used. reliability is assessed by a number – typically a correlation.
Reliability When a Measurement Procedure yields consistent scores when the phenomenon being measured is not changing. Degree to which scores are free of.
Chapter 6 Norm-Referenced Reliability and Validity.
Language Assessment Lecture 7 Validity & Reliability Instructor: Dr. Tung-hsien He
Chapter 6 Norm-Referenced Measurement. Topics for Discussion Reliability Consistency Repeatability Validity Truthfulness Objectivity Inter-rater reliability.
5. Evaluation of measuring tools: reliability Psychometrics. 2011/12. Group A (English)
Chapter 2 Norms and Reliability. The essential objective of test standardization is to determine the distribution of raw scores in the norm group so that.
1 Measurement Error All systematic effects acting to bias recorded results: -- Unclear Questions -- Ambiguous Questions -- Unclear Instructions -- Socially-acceptable.
Professor Jim Tognolini
Reliability Analysis.
Classical Test Theory Margaret Wu.
Reliability & Validity
Calculating Reliability of Quantitative Measures
PSY 614 Instructor: Emily Bullock, Ph.D.
Using statistics to evaluate your test Gerard Seinhorst
By ____________________
Reliability Analysis.
The first test of validity
Chapter 8 VALIDITY AND RELIABILITY
Presentation transcript:

Reliability: Consistency in testing

Types of variance
Meaningful variance: variance between test takers that reflects differences in the ability or skill being measured.
Error variance: variance between test takers that is caused by factors other than differences in the ability or skill being measured.
Test developers act as 'variance chasers': maximizing meaningful variance while minimizing error variance.

Sources of error variance
Measurement error can come from:
the environment
administration procedures
scoring procedures
examinee differences
the test and its items
Remember, OS = TS + E (observed score = true score + error).
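
Stated formally, the decomposition behind OS = TS + E, with reliability defined as the true-score share of observed variance (standard classical test theory definitions, restated here for reference):

\[
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}
\]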

Estimating reliability for NRTs (norm-referenced tests)
Are the test scores reliable over time? Would a student get the same score if tested tomorrow?
Are the test scores reliable over different forms of the same test? Would the student get the same score if given a different form of the test?
Is the test internally consistent?

Reliability coefficient (rxx)
Range: 0.0 (a totally unreliable test) to 1.0 (a perfectly reliable test).
Reliability coefficients estimate the proportion of systematic variance in the test scores.
A lower reliability coefficient means greater measurement error in the test scores.

Test-retest reliability
The same students take the test twice.
Calculate the correlation (Pearson's r) between the two administrations.
Interpret r as the reliability estimate (a conservative estimate).
Problems: logistically difficult, and learning might take place between the two tests.
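
As a minimal sketch of that calculation (standard-library Python; the two score lists are made-up illustration data, not scores from these slides):

from statistics import mean, stdev

def pearson_r(x, y):
    # Sample covariance divided by the product of sample standard deviations.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

time1 = [50, 47, 52, 38, 41, 45]   # first administration (hypothetical)
time2 = [49, 48, 51, 40, 42, 44]   # same students on the retest (hypothetical)
print(round(pearson_r(time1, time2), 2))   # interpreted directly as the reliability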

Equivalent forms reliability
The same students take parallel forms of the test.
Calculate the correlation between the two forms.
Problems: creating genuinely parallel forms can be tricky, and administering two forms is logistically difficult.

University of Michigan English Placement Test (University of Michigan English Placement Test Examiner’s Manual)

Internal consistency reliability
Calculated from a single administration of a test, and the most commonly reported type of reliability.
Common estimates: split-half, Cronbach's alpha, K-R20, K-R21.
Calculated automatically by many statistical software packages.

Split-half reliability
The test is split in half (e.g., odd items / even items), creating two "equivalent forms".
The two "forms" are correlated with each other.
The correlation coefficient is then adjusted to reflect the full test length, using the Spearman-Brown prophecy formula shown below.
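
For reference, the general form of the Spearman-Brown prophecy formula (a standard result; n is the factor by which the test length changes, so n = 2 when projecting from a half test to the full test):

\[
r_{\text{new}} = \frac{n\,r}{1 + (n - 1)\,r}
\]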

Calculating split-half reliability
Each examinee's item scores (Q1-Q6) are summed into an odd-item half score and an even-item half score:

ID   Odd   Even
1     2     1
2     1     3
3     3     2
4     2     0
5     2     2
6     1     0

Odd half: Mean = 1.83, SD = 0.75. Even half: Mean = 1.33, SD = 1.21.

Calculating split-half reliability (2)
Deviations of each half score from its mean, and the cross-products of the deviations:

Odd   Dev     Even   Dev     Product
2      0.17    1     -0.33   -0.056
1     -0.83    3      1.67   -1.386
3      1.17    2      0.67    0.784
2      0.17    0     -1.33   -0.226
2      0.17    2      0.67    0.114
1     -0.83    0     -1.33    1.104

Sum of cross-products = 0.334

Calculating split-half reliability (3)
Divide the sum of the cross-products by N times the two standard deviations:

r = 0.334 / ((6)(0.75)(1.21)) = 0.06

Then adjust for the full test length using the Spearman-Brown prophecy formula:

rxx = (2 x 0.06) / ((2 - 1)(0.06) + 1) = 0.11
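
A short script reproducing the slides' arithmetic from the reconstructed half scores (the SDs 0.75 and 1.21 and the N-divisor convention are taken from the slides; r is rounded to 0.06 before the Spearman-Brown step, as on the slide):

odd = [2, 1, 3, 2, 2, 1]    # odd-item half scores from the table above
even = [1, 3, 2, 0, 2, 0]   # even-item half scores
n = len(odd)

mean_o, mean_e = sum(odd) / n, sum(even) / n
sum_products = sum((o - mean_o) * (e - mean_e) for o, e in zip(odd, even))  # ~0.334

r_half = round(sum_products / (n * 0.75 * 1.21), 2)   # 0.06
r_full = (2 * r_half) / ((2 - 1) * r_half + 1)        # Spearman-Brown
print(r_half, round(r_full, 2))                       # 0.06 0.11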

Cronbach's alpha
Similar to split-half, but easier to calculate. Using the two half-test variances and the total score variance:

alpha = 2 x (1 - (0.75^2 + 1.21^2) / 1.47^2) = 0.12
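
For item-level data, the general form of Cronbach's alpha is (k / (k - 1)) x (1 - sum of item variances / total score variance); the slide's two-half formula is simply the k = 2 case, with the halves treated as "items". A minimal sketch with a hypothetical 0/1 response matrix:

from statistics import pvariance

def cronbach_alpha(rows):
    # rows: one list of item scores per examinee.
    k = len(rows[0])                                   # number of items
    item_vars = sum(pvariance(col) for col in zip(*rows))
    total_var = pvariance([sum(r) for r in rows])      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

scores = [[1, 0, 1, 1],
          [0, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 1, 0, 0]]
print(round(cronbach_alpha(scores), 2))   # 0.59 for this toy data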

K-R20
The “Rolls-Royce” of internal consistency reliability estimates.
Simulates calculating split-half reliability for every possible combination of items.

K-R20 formula
KR-20 = (k / (k - 1)) x (1 - (sum of item variances) / (total score variance))
where k is the number of items. Note that the denominator is the variance, not the standard deviation. For dichotomously scored items, the sum of item variances is the sum of IF(1 - IF), where IF is the item facility (the proportion of examinees answering the item correctly).
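
A sketch of the same computation for dichotomously scored items, using IF(1 - IF) as each item's variance (the response matrix is hypothetical; for 0/1 items the result matches Cronbach's alpha):

from statistics import pvariance

def kr20(rows):
    k = len(rows[0])
    facilities = [sum(col) / len(col) for col in zip(*rows)]   # IF per item
    sum_item_var = sum(p * (1 - p) for p in facilities)        # sum of IF(1 - IF)
    total_var = pvariance([sum(r) for r in rows])              # variance, not SD
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

responses = [[1, 0, 1, 1],
             [0, 0, 1, 0],
             [1, 1, 1, 1],
             [0, 1, 0, 0]]
print(round(kr20(responses), 2))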

K-R21
Slightly less accurate than K-R20, but can be calculated from descriptive statistics alone.
Tends to underestimate reliability.

K-R21 formula
KR-21 = (k / (k - 1)) x (1 - (M x (k - M)) / (k x s^2))
where k is the number of items, M is the mean total score, and s^2 is the variance of total scores. Note that this is the variance (standard deviation squared).
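
Because K-R21 needs only the item count, the mean, and the variance, it is a one-liner; the values below are hypothetical, chosen to land near the KR21 value in the TAP report on the next slide:

def kr21(k, mean_score, variance):
    return (k / (k - 1)) * (1 - (mean_score * (k - mean_score)) / (k * variance))

print(round(kr21(k=40, mean_score=24.0, variance=8.0 ** 2), 2))   # 0.87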

Test summary report (TAP)
Number of Items Excluded = 0
Number of Items Analyzed = 40
Mean Item Difficulty = 0.597
Mean Item Discrimination = 0.491
Mean Point Biserial = 0.417
Mean Adj. Point Biserial = 0.369
KR20 (Alpha) = 0.882
KR21 = 0.870
SEM (from KR20) = 2.733
# Potential Problem Items = 9
High Grp Min Score (n=15) = 31.000
Low Grp Max Score (n=14) = 17.000
Split-Half (1st/2nd) Reliability = 0.307 (with Spearman-Brown = 0.470)
Split-Half (Odd/Even) Reliability = 0.865 (with Spearman-Brown = 0.927)

Standard Error of Measurement
If we gave a student the same test repeatedly (test-retest), we would expect to see some variation in the scores: 50, 49, 52, 50, 51, 49, 48, 50, ...
With enough repetitions, these scores would form a normal distribution.
We would expect the student to score near the center of the distribution most often.

Standard Error of Measurement (2)
The greater the reliability of the test, the smaller the SEM.
We expect the student to score within one SEM of the obtained score approximately 68% of the time.
If a student has a score of 50 and the SEM is 3, we expect the student to score between 47 and 53 approximately 68% of the time on a retest.

Interpreting the SEM
For a score of 29 on a test with SEM = 3 (estimated from K-R21):
26 ~ 32 is within 1 SEM (expected on about 68% of retests)
23 ~ 35 is within 2 SEM (about 95%)
20 ~ 38 is within 3 SEM (about 99.7%)

Calculating the SEM
SEM = SD x sqrt(1 - rxx)
What is the SEM for a test with a reliability of r = .889 and a standard deviation of 8.124? SEM = 2.7
What if the same test had a reliability of r = .95? SEM = 1.8
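
A quick check of both answers, assuming the standard formula SEM = SD x sqrt(1 - rxx):

from math import sqrt

def sem(sd, reliability):
    return sd * sqrt(1 - reliability)

print(round(sem(8.124, 0.889), 1))   # 2.7
print(round(sem(8.124, 0.95), 1))    # 1.8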

Reliability for performance assessment
Traditional fixed-response assessment: test-taker → instrument (test) → score.
Performance assessment (e.g., writing, speaking): test-taker → task → performance; a rater/judge applies a scale to the performance to produce the score.

Interrater/intrarater reliability
Calculate the correlation between all pairs of raters.
Adjust using Spearman-Brown to account for the total number of raters contributing to the final score.
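
A minimal sketch of that procedure: average the Pearson correlations over all rater pairs, then apply Spearman-Brown with n set to the number of raters whose scores are combined. The rating data are hypothetical:

from itertools import combinations
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def interrater_reliability(raters):
    pairs = list(combinations(raters, 2))
    r_bar = sum(pearson_r(a, b) for a, b in pairs) / len(pairs)   # mean pairwise r
    n = len(raters)
    return (n * r_bar) / (1 + (n - 1) * r_bar)                    # Spearman-Brown

rater_a = [4, 3, 5, 2, 4]   # each list: one rater's scores for five performances
rater_b = [4, 2, 5, 3, 4]
rater_c = [3, 3, 4, 2, 5]
print(round(interrater_reliability([rater_a, rater_b, rater_c]), 2))   # 0.87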