Reliability & Validity


Chapter 4: Reliability & Validity

Reliability & Validity
Reliability and validity aid in determining a test's accuracy and dependability.
Reliability: the dependability or consistency of an instrument across time or items.
Validity: the degree to which an instrument measures what it was designed to measure.
Instruments should have both properties; an instrument can be reliable without being valid, but it cannot be valid without being reliable.

Correlation (r)
Correlation: the degree of relationship between two variables, such as two administrations of the same test or the administration of equivalent forms.
Correlation coefficients range from +1.00 to -1.00: a perfect positive correlation is +1.00, a perfect negative correlation is -1.00, and no correlation is 0.
Coefficients closer to +1.00 or -1.00 represent stronger relationships; the stronger the relationship, the more reliable the instrument.
The sign (+ or -) indicates direction, not strength.
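The coefficient described above can be computed directly from two sets of scores. A minimal Python sketch (the score lists below are hypothetical illustration data, not taken from the presentation):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores from two administrations of the same test
first = [85, 90, 78, 92, 70, 88]
second = [83, 91, 80, 94, 72, 86]
r = pearson_r(first, second)  # close to +1.0: strong positive relationship
```

Students who scored high on the first administration also scored high on the second, so r lands near +1.00, suggesting a reliable instrument.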

Scattergram Scattergrams provide a graphic representation of a data set and show a correlation. The more closely the dots on a scattergram approximate a straight line, the nearer to perfect the correlation.

Types of Correlation
Positive correlation: variables with a positive relationship move in the same direction; scores on the variables increase together.
Negative correlation: high scores on one variable are associated with low scores on the other variable.
No correlation: data from the two variables are not associated; there is no linear direction on a scattergram.

Methods of Measuring Reliability
Pearson's r (Pearson product-moment correlation): used with interval or ratio data.
Internal consistency: the consistency of items on an instrument in measuring a skill, trait, or domain.
Common methods: test-retest, equivalent forms, split-half, and the Kuder-Richardson formulas.

Test-Retest Reliability
Test-retest reliability assumes the trait being measured is stable over time. If the trait remains constant, re-administering the instrument will yield scores similar to the first administration.
It is important to conduct the retest shortly after the first test to control for intervening variables.
Difficulties:
Too soon: students may remember test items (practice effect) and score higher the second time.
Too long an interval: time-related variables (e.g., learning, maturation) exert greater influence.

Equivalent (Alternate) Forms Reliability
Two forms of the same instrument are used, with items matched for difficulty.
Advantage: two tests of the same difficulty level can be administered within a short time frame without the influence of practice effects.

Internal Consistency Measures
Split-half reliability: divides the available test items in half and correlates one half with the other. This establishes the reliability of only half the test; because reliability increases with the number of items, the Spearman-Brown formula is often applied to estimate full-test reliability.
Kuder-Richardson 20 (KR-20): checks consistency across the items of an instrument with right/wrong answers.
Coefficient alpha (Cronbach's alpha): checks consistency across the items of an instrument where credit varies across responses.
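Of these, KR-20 is straightforward to compute by hand. A minimal sketch, assuming the standard formula KR-20 = (k/(k-1)) * (1 - Σpq/σ²) with the population variance of total scores; the 0/1 response matrix is made-up illustration data:

```python
def kr20(item_matrix):
    """Kuder-Richardson 20 for dichotomous (right = 1 / wrong = 0) items.

    item_matrix: one row per examinee, one column per item.
    """
    n = len(item_matrix)      # number of examinees
    k = len(item_matrix[0])   # number of items
    # Sum of p*q over items, where p = proportion answering correctly
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        pq += p * (1 - p)
    # Population variance of the examinees' total scores
    totals = [sum(row) for row in item_matrix]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq / variance)

# Hypothetical responses: 5 examinees, 4 items
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
reliability = kr20(responses)  # (4/3) * (1 - 0.8/2) = 0.8
```

For items scored 0/1 like these, KR-20 and coefficient alpha give the same value; alpha generalizes the computation to items with partial credit.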

Interrater Reliability
The consistency of a test across examiners: one person administers and scores a test, and a second person rescores it. The two sets of scores are then correlated to determine how much variability exists between scorers.
Interrater reliability is especially important for subjectively scored tests.

Which Type of Reliability Is Best?
Three reliability types: consistency over time, consistency of items on a test, and consistency of scorers.
Guideline coefficients: r = .60 is adequate; r = .80 or above is very good (preferred).
Which type is chosen depends on the purpose of the assessment.
The reliability coefficient is a group statistic and can be influenced by the make-up of the norming group, so it is important to review the test manual to determine how that group was composed.

Standard Error of Measurement
A basic assumption of assessment: ERROR EXISTS.
Variables that affect scores arise for a variety of reasons: a poor testing environment, errors in the test itself, and student variables (e.g., hungry, tired).
This variability is called measurement error, and its estimate is the standard error of measurement (SEM). Instruments with a small standard error of measurement are preferred.
A single test score may not accurately reflect a student's true score.

Calculating Standard Error of Measurement
The SEM estimates the amount of error present in an obtained score:
SEM = SD × √(1 − r)
where SD is the standard deviation of the test and r is its reliability coefficient.
SEM is based on normal distribution theory.
Confidence interval: the range of scores around an obtained score, i.e., obtained score ± SEM (±1 SEM spans roughly 68% of the normal distribution).
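The formula above in a short Python sketch; the SD, reliability, and obtained score are hypothetical values chosen so the arithmetic comes out evenly:

```python
import math

def sem(sd, r):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - r)

def confidence_interval(obtained, sd, r):
    """Range of scores: obtained score +/- 1 SEM (~68% confidence)."""
    error = sem(sd, r)
    return obtained - error, obtained + error

# Hypothetical test: SD = 15, reliability r = .91, obtained score 100
# SEM = 15 * sqrt(0.09) = 4.5, so the band runs from 95.5 to 104.5
low, high = confidence_interval(100, 15, 0.91)
```

Note how a more reliable test shrinks the band: at r = .99 the SEM drops to 1.5, while at r = .60 it grows to about 9.5.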

Application of SEM The range of error and the range of a student’s score may vary substantially, which may change the interpretation of the score for placement purposes. SEM varies by age, grade and subtest. When SEM is applied to scores, discrepancies may not be significant.

Estimated True Scores
A method of estimating a student's true score that adjusts for error related to the distance of the obtained score from the group mean: the further a score is from the mean, the greater the chance for error, and the true score is always assumed to lie nearer the mean than the obtained score.
Estimated true scores can be used to establish a range of scores.
Estimated true score = M + r(X − M)
where M is the mean of the group, r is the reliability coefficient, and X is the obtained score.
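The estimated-true-score formula in a one-function Python sketch; the group mean, reliability, and obtained score are hypothetical:

```python
def estimated_true_score(obtained, group_mean, r):
    """Estimated true score = M + r * (X - M): shrinks the obtained
    score toward the group mean in proportion to reliability."""
    return group_mean + r * (obtained - group_mean)

# Hypothetical: group mean 100, reliability .80, obtained score 120
# 100 + 0.80 * (120 - 100) = 116 -- pulled toward the mean
estimate = estimated_true_score(120, 100, 0.80)
```

An obtained score below the mean is pulled upward by the same logic (e.g., 80 becomes 84), and a perfectly reliable test (r = 1.00) leaves the obtained score unchanged.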

Test Validity
Does the test actually measure what it is supposed to measure?
Criterion-related validity: comparing scores with other criteria known to be indicators of the same trait or skill.
Concurrent validity: two tests are given within a very short time frame (often the same day); if the scores correlate highly, the tests are said to measure the same trait.
Predictive validity: how well an instrument predicts performance on some other, later variable.

Content Validity
Content validity concerns whether the items on a test are representative of the content the test purports to measure.
PROBLEM: teachers often generalize and assume a test covers more than it does (e.g., the WRAT-3 reading subtest measures only word recognition, not phonemic awareness, phonics, vocabulary, or reading comprehension).
Some variables related to content validity may influence how results are obtained and can contribute to bias in testing:
Presentation format: the method by which items are presented to the student.
Response mode: the method by which the examinee answers items.

Construct Validity
A construct is a psychological trait, personality trait, psychological concept, attribute, or theoretical characteristic. Construct validity is the degree to which a test measures its intended construct. Constructs must be clearly defined, although they are often abstract.
Types of studies that can establish construct validity:
Developmental changes
Correlations with other tests
Factor analysis
Internal consistency
Convergent and discriminant validation
Experimental interventions

Validity of Test vs. Validity of Use
Tests may be used inappropriately even though they are valid instruments, and results may be interpreted in an invalid manner.
Tests may also be biased and/or discriminate against different groups:
Item bias: an item is answered incorrectly a disproportionate number of times by one group compared to another.
Predictive validity may predict accurately for one group but not another.