Reliability & Validity


Reliability: Having confidence in the consistency of test results. The reliability of a test refers to how well it provides a consistent (stable) set of results across similar test situations, time periods, and examiners, that is, the extent to which measurements are repeatable.

Correlation: A statistical method of observing the degree of relationship between two sets of data or two variables (e.g., hours spent studying and grades). Correlation coefficient: The numerical representation of the strength and direction of the relationship between two sets of variables or data, expressed as a real number between -1.00 and +1.00 (the stronger the correlation, the more reliable the measure). * In investigating how data are related, it is important to determine whether the two sets of data represent a positive correlation, a negative correlation, or no correlation.

Positive Correlation
* In a positive correlation, when a student scores high on the first variable or test, the student will also score high on the second measure (e.g., calories eaten and weight).
* The data below illustrate a positive correlation:

Student      Set 1   Set 2
Student 1     75      72
Student 2     88      89
Student 3     90      93
Student 4     63      64
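As a quick check, here is a minimal Python sketch (not part of the original slides) that computes Pearson's r for the Set 1 and Set 2 scores above; the result, roughly .99, confirms a strong positive correlation.

    # Pearson's r for the positive-correlation example above.
    set1 = [75, 88, 90, 63]
    set2 = [72, 89, 93, 64]

    def pearson_r(x, y):
        # Correlation = covariance divided by the product of the standard deviations.
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / (var_x * var_y) ** 0.5

    print(round(pearson_r(set1, set2), 2))  # 0.99: strong positive correlation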

Negative Correlation
In a negative correlation, when a student scores high on one variable or test, the student will score low on the other variable or test (e.g., days absent and test scores). The data below illustrate a negative correlation:

Student      Set 1   Set 2
Student 1     88      32
Student 2     99      12
Student 3     56      45
Student 4     97      15

(Note that Set 2 reverses the ordering of Set 1: the highest Set 1 scores pair with the lowest Set 2 scores.)

Two sets of data are presented below. Determine whether the data sets represent a positive, negative, or no relationship.

Data Set 1: 94 78 89 58 62 77 75 45 95
Data Set 2: 95 80 90 59 60 78 76 47 97

One way to determine the direction of the relationship is to plot the scores on a scatter plot. The data for Set 1 and Set 2 are plotted on the next slide.

[Scatter plot: Set 1 scores (45-100) on the vertical axis against Set 2 scores (45-100) on the horizontal axis; the points rise together along a line.]
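For readers who want to reproduce the plot, here is a brief sketch using matplotlib (the original slide showed the plot only as an image):

    # Scatter plot of the two data sets from the previous slide.
    import matplotlib.pyplot as plt

    data_set_1 = [94, 78, 89, 58, 62, 77, 75, 45, 95]
    data_set_2 = [95, 80, 90, 59, 60, 78, 76, 47, 97]

    plt.scatter(data_set_2, data_set_1)  # Set 2 on the x-axis, Set 1 on the y-axis
    plt.xlabel("Set 2")
    plt.ylabel("Set 1")
    plt.title("Set 1 vs. Set 2 scores")
    plt.show()  # the points rise together, indicating a positive relationship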

The direction of the line plotted on the scatter plot provides a clue about the relationship. If the relationship is positive, the line slopes upward from the lower left to the upper right.

If the data represent a negative correlation, the line in the scatter plot slopes downward from the upper left to the lower right.

No Correlation
When there is little or no relationship between scores, the scattergram does not indicate a distinct line (Figure 4.4; see the scattergrams on p. 121 in the Overton text).

Methods of Studying Reliability
Test-retest reliability: A study that employs the re-administration of a single instrument to check for consistency across time. (Sensitive to practice effects and effects of instruction!)
Equivalent forms reliability: Consistency of a test across like forms that measure the same skill, domain, or trait (e.g., Forms A and B of a test); also known as alternate forms reliability.

Methods of Studying Reliability, Continued
Internal consistency: Methods of studying reliability across the items of the test (within the test).
Split-half reliability: Studying reliability across items by comparing the data of the two halves of the test.
Kuder-Richardson 20 (K-R 20): A formula used to check consistency across the items of an instrument whose items are scored 1 or 0 (right/wrong).
Coefficient alpha: A formula used to check consistency across the items of instruments with responses of varying credit; for example, items may be scored as 0, 1, 2, or 3 points (e.g., with a rubric). Both formulas are sketched in code below.
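The two formulas translate directly into code. This minimal sketch (with assumed example data, not from the slides) implements K-R 20 for right/wrong items and coefficient alpha for varying-credit items:

    # Internal-consistency formulas; rows = examinees, columns = items.
    def variance(scores):
        m = sum(scores) / len(scores)
        return sum((s - m) ** 2 for s in scores) / len(scores)

    def kr20(items):
        # K-R 20 for dichotomous (1/0) items.
        k = len(items[0])
        totals = [sum(row) for row in items]
        pq = 0.0
        for j in range(k):
            p = sum(row[j] for row in items) / len(items)  # proportion correct on item j
            pq += p * (1 - p)
        return (k / (k - 1)) * (1 - pq / variance(totals))

    def coefficient_alpha(items):
        # Coefficient alpha for items scored with varying credit (e.g., 0-3 points).
        k = len(items[0])
        totals = [sum(row) for row in items]
        item_vars = sum(variance([row[j] for row in items]) for j in range(k))
        return (k / (k - 1)) * (1 - item_vars / variance(totals))

    responses = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]  # hypothetical data
    print(round(kr20(responses), 2))  # about .87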

Methods of Studying Reliability, Continued
Interrater reliability: The consistency of scores for a skill, trait, or domain across examiners. This type of reliability is most important when responses are subjective or open-ended. Note that reliability coefficients may vary across the age and grade levels of a specific instrument!
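The slides do not name a specific interrater index; one simple, commonly used option is percent agreement between two examiners scoring the same responses, sketched here with hypothetical ratings:

    # Percent agreement between two raters (hypothetical scores).
    rater_a = ["pass", "pass", "fail", "pass", "fail"]
    rater_b = ["pass", "fail", "fail", "pass", "fail"]

    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    print(agreements / len(rater_a))  # 0.8, i.e., 80% agreement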

Random Error
Random error describes random events that have nothing to do with the quality being measured but that affect the results. Because random error reduces the reliability of a test, all possible sources of random error must be addressed.

Standard Error of Measurement
Each test score is made up of two parts: true score and error (Obtained Score = True Score + Error). A student's true score may only be estimated. The standard error of measurement (SEM) is a method used to estimate the amount of error in a test; it represents the typical amount of error in any obtained score and is used to estimate a range of scores within which the student's true score exists.

Standard Error of Measurement, Continued
The standard error of measurement is calculated using the following formula:

SEM = SD × √(1 − r)

where SEM = the standard error of measurement, SD = the standard deviation of the norm-group scores obtained during the development of the instrument, and r = the reliability coefficient.

Example of Calculating SEM
For a specific test, the standard deviation is 4 and the reliability coefficient is .89. The SEM would be:

SEM = 4 × √(1 − .89) = 4 × √.11 ≈ 4 × .33 = 1.32

This represents the amount of error on this test instrument.
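The worked example can be checked with a small function; note that the slide rounds √.11 to .33 before multiplying, giving 1.32, while full precision gives about 1.33:

    # SEM = SD * sqrt(1 - r), reproducing the worked example above.
    def sem(sd, r):
        return sd * (1 - r) ** 0.5

    print(round(sem(4, 0.89), 2))  # 1.33 (1.32 on the slide, which rounds sqrt(.11) to .33)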

Applying the SEM
The SEM was 1.32 and a student's obtained score was 89. To determine the range of possible true scores, add and subtract the SEM (multiplied by a confidence factor, here 1) to and from the obtained score of 89:
89 + 1.32(1) = 90.32
89 − 1.32(1) = 87.68
The range of possible scores is 87.68 to 90.32.

Confidence Intervals Confidence intervals allow us to make statements about the proximity of the obtained score to the true score. A 68% confidence interval allows us to say that even though we don't know the true score, there is only a 32% chance that it is not within the range specified. A 95% confidence interval allows us to say that there is only a 5% chance that the person's true score is not within the range specified.

An Example
Milly's obtained score on the Spelling Accuracy Test was 53. The SEM of the test is 1.85. We wish to be very confident in our estimate of Milly's score and choose a 95% confidence interval. From the factor list, the factor to multiply the SEM by to achieve a 95% CI is 1.96, so 1.96 × 1.85 ≈ 3.63. Then 53 ± 3.63 gives a lower bound of 49.37 and an upper bound of 56.63. There is a 95% probability that Milly's true score on the Spelling Accuracy Test falls in the region between 49.37 and 56.63.
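Both the 68% and 95% examples use the same computation, sketched below (factor 1.0 for a 68% interval, 1.96 for 95%):

    # True-score range: obtained score plus/minus factor * SEM.
    def true_score_range(obtained, sem, factor=1.0):
        margin = factor * sem
        return obtained - margin, obtained + margin

    print(true_score_range(89, 1.32))        # 68% interval: about (87.68, 90.32)
    print(true_score_range(53, 1.85, 1.96))  # 95% interval: about (49.37, 56.63)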

Selecting the Best Test Instruments
When considering which tests will be the most reliable, it is important to select a test that has the highest reliability coefficient and the smallest standard error of measurement. This means the results obtained are more likely to be consistent with the student's true ability, and the obtained score will contain less error.

Validity: Having confidence that the test is measuring what it is supposed to measure. Validity indicates the degree of quality of the test instrument. The validity coefficient ranges between -1.0 and +1.0. Validity is a relative rather than an absolute notion.

Criterion-Related Validity
Validity established by comparing performance with an accepted standard or criterion. There are two ways to study criterion-related validity:
Concurrent validity: When a test is compared with a similar measure administered within a short period of time.
Predictive validity: When a test is compared with a measure taken in the future; for example, when college entrance exams are compared with later student performance in college (GPAs).

Content Validity
In order for a test to have good content validity, it must have items that are representative of the domain or skill being assessed. During the development of the test, items are selected after careful study of the items and the domain they represent. To ascertain content validity, submit the test to a panel of expert judges. The judges should examine the test in terms of:
Completeness
Appropriateness
Format
Bias

Construct Validity Construct validity means that the instrument has the ability to assess the psychological constructs it was meant to measure. A construct is a psychological trait or characteristic such as creativity or mathematical ability.

Studying Construct Validity
Construct validity can be studied through:
Developmental changes
Correlation with other tests
Factor analysis
Internal consistency
Convergent or discriminant validation
Experimental interventions
(Overton, pp. 138-139)

External, Internal, and Social Validity External validity is concerned with generalizability (e.g., Do the results obtained in one particular situation apply in other situations?) Internal validity is the fundamental basis for interpreting the results of an intervention (e.g., Did the intervention make a difference in this specific instance?) Social validity refers to the social value and acceptability of an educational intervention.