Evaluation of measuring tools: reliability

After we have evaluated the quality of the test items and eliminated those considered inadequate, we must evaluate the overall quality of the test. In this chapter we discuss the reliability, or accuracy, of the measure. We try to answer the question: "to what extent, and how much, are the scores obtained by subjects on the test affected by measurement errors?"

The problem of measurement error

Measurement error is the difference between the empirical score obtained by a subject on a test and his/her true score. Objective: to develop tests that produce the minimum possible measurement error, so that the obtained score gives as much real information as possible about the characteristics under study. Among these errors, random errors are the ones studied through reliability analysis.

Types of measurement errors

Measurement error: the difference between the empirical score of a subject and their true score ($E = X - V$). It gives an individual measure of the accuracy of the test.

The standard error of measurement: the standard deviation of the measurement errors. It is a group-level measure because it is calculated over all the subjects in the sample.

Estimation error of the true score: the difference between the true score of a subject and the true score predicted by the regression model.

The standard error of estimation of the true score: the standard deviation of the estimation errors.

Substitution error: the difference between the score obtained by a subject on a test and the score obtained on a parallel test. It is the error committed when the scores on test X1 are replaced by those obtained on a parallel test X2.

The standard error of substitution: the standard deviation of the substitution errors.

Prediction error: the difference between the score obtained by a subject on a test (X1) and the score predicted for the same test (X1') from a parallel test X2.

The standard error of prediction: the standard deviation of the prediction errors.
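
Each of these standard errors can be written in terms of the standard deviation of the empirical scores, $S_X$, and the reliability coefficient, $r_{XX}$. The sketch below uses the usual Classical Test Theory expressions. The slides do not spell these formulas out, so treat them as the standard textbook versions. The helper name `classical_standard_errors` is hypothetical. The values $S_X = 7$ and $r_{XX} = 0.73$ are borrowed from the numerical reasoning example later in this section.

```python
import math
from typing import Dict


def classical_standard_errors(s_x: float, r_xx: float) -> Dict[str, float]:
    """Standard errors of Classical Test Theory for a test with empirical
    standard deviation s_x and reliability coefficient r_xx (hypothetical helper)."""
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("r_xx must lie in [0, 1]")
    return {
        # Measurement error E = X - V
        "measurement": s_x * math.sqrt(1 - r_xx),
        # Estimation error V - V', where V' is predicted from X by regression
        "estimation": s_x * math.sqrt(r_xx * (1 - r_xx)),
        # Substitution error X1 - X2 (a parallel form used in place of the test)
        "substitution": s_x * math.sqrt(2 * (1 - r_xx)),
        # Prediction error X1 - X1', where X1' is predicted from the parallel form X2
        "prediction": s_x * math.sqrt(1 - r_xx ** 2),
    }


errors = classical_standard_errors(s_x=7, r_xx=0.73)
for name, value in errors.items():
    print(f"{name:>12}: {value:.3f}")
```

For any $0 < r_{XX} < 1$ the four values are ordered: estimation < measurement < prediction < substitution. This order reflects how much information each kind of estimate uses.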

The linear model of Spearman

$X = V + E$, where $X$ is the empirical score, $V$ the true level and $E$ the measurement error. This model helps us estimate the amount of error affecting the empirical scores, as well as the true level of subjects in the characteristic under study.

$$
\begin{aligned}
E &= X - V\\
\mathbb{E}(E) &= 0\\
\operatorname{Cov}(V, E) &= 0\\
\operatorname{Cov}(X, V) &= \sigma_V^2
\end{aligned}
$$
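
These properties of Spearman's model can be checked numerically. The sketch below simulates a large hypothetical population and generates true scores and errors independently. The distributions $V \sim N(50, 6)$ and $E \sim N(0, 3)$ are assumptions chosen only for illustration. The sample mean of the errors and $\operatorname{Cov}(V, E)$ come out close to 0, and $\operatorname{Cov}(X, V)$ comes out close to $\operatorname{Var}(V)$.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equally long sequences of numbers."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


random.seed(1)
n = 100_000
# Hypothetical population: true scores V ~ N(50, 6) and errors E ~ N(0, 3),
# generated independently of each other
v_scores = [random.gauss(50, 6) for _ in range(n)]
e_scores = [random.gauss(0, 3) for _ in range(n)]
x_scores = [v + e for v, e in zip(v_scores, e_scores)]   # X = V + E

print(f"mean of E = {mean(e_scores):.3f}")            # close to 0
print(f"Cov(V, E) = {cov(v_scores, e_scores):.3f}")   # close to 0
print(f"Cov(X, V) = {cov(x_scores, v_scores):.2f}")   # close to Var(V)
print(f"Var(V)    = {cov(v_scores, v_scores):.2f}")
```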

Interpretation of the reliability coefficient

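The reliability coefficient $r_{XX}$ can be interpreted in two ways. First, as the correlation between the empirical scores obtained by a sample of subjects on two parallel forms of the test. Second, as the ratio between the variance of the true scores and the variance of the empirical scores, $r_{XX} = S_V^2 / S_X^2$. As this ratio increases, the measurement error decreases. Reliability index: the correlation between empirical and true scores, $r_{XV} = \sqrt{r_{XX}}$.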
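
Both interpretations, together with the reliability index, can be illustrated with a simulation. The sketch builds two parallel forms that share the same true scores and have independent errors of equal variance. The standard deviations 6 (true scores) and 3 (errors) are hypothetical, which gives a theoretical reliability of $36 / 45 = 0.80$.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


random.seed(2)
n = 100_000
sd_true, sd_error = 6.0, 3.0   # hypothetical standard deviations
v_scores = [random.gauss(50, sd_true) for _ in range(n)]
# Two parallel forms: same true scores, independent errors with equal variance
form1 = [v + random.gauss(0, sd_error) for v in v_scores]
form2 = [v + random.gauss(0, sd_error) for v in v_scores]

theoretical = sd_true ** 2 / (sd_true ** 2 + sd_error ** 2)   # S_V^2 / S_X^2
r_parallel = pearson(form1, form2)   # correlation between parallel forms
r_index = pearson(form1, v_scores)   # reliability index r_XV
print(f"S_V^2 / S_X^2 = {theoretical:.3f}")
print(f"r(X1, X2)     = {r_parallel:.3f}")
print(f"r(X, V)       = {r_index:.3f}  vs sqrt(r_XX) = {math.sqrt(theoretical):.3f}")
```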

Factors that affect reliability

TEST LENGTH. If we increase the length of the test by adding parallel items, we obtain more information about the attribute under study and a smaller error when estimating a subject's true score. Therefore, reliability increases.
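
The effect of test length is usually quantified with the general Spearman-Brown formula, $R_{XX} = \dfrac{n\, r_{XX}}{1 + (n - 1)\, r_{XX}}$, where $n$ is the factor by which the length is multiplied. This slide does not state the formula, so treat the sketch as the standard textbook version. The inverse helper `length_factor_for` is a hypothetical name. The starting reliability of 0.73 is borrowed from the numerical reasoning example later in this section.

```python
def spearman_brown(r_xx: float, n: float) -> float:
    """Predicted reliability when the test length is multiplied by n,
    i.e. when parallel items are added (n > 1) or removed (n < 1)."""
    return n * r_xx / (1 + (n - 1) * r_xx)


def length_factor_for(r_xx: float, target: float) -> float:
    """Factor n by which the test must be lengthened to reach a target reliability."""
    return target * (1 - r_xx) / (r_xx * (1 - target))


for factor in (1, 2, 3, 5):
    print(factor, round(spearman_brown(0.73, factor), 3))
print("factor needed for 0.90:", round(length_factor_for(0.73, 0.90), 2))
```

The gain shrinks as the test grows. Doubling the test raises 0.73 to about 0.84, but each further set of parallel items adds less.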

SAMPLING VARIABILITY. The reliability coefficient can vary depending on the homogeneity of the group: the more homogeneous the group, the lower the reliability coefficient. * We assume that the standard error of measurement of a test remains constant regardless of the variability of the group in which it is applied.
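
Under the stated assumption of a constant standard error of measurement, $S_1^2 (1 - r_1) = S_2^2 (1 - r_2)$. This lets us predict the reliability in a group with a different variance. The sketch below is a minimal illustration. The function name and the example standard deviations (7, 5 and 10) are hypothetical.

```python
def reliability_in_new_group(r_1: float, s_1: float, s_2: float) -> float:
    """Reliability in a group with empirical SD s_2, given reliability r_1 in a
    group with SD s_1, assuming the standard error of measurement is constant:
    S_1^2 (1 - r_1) = S_2^2 (1 - r_2)."""
    sem_squared = s_1 ** 2 * (1 - r_1)   # squared standard error of measurement
    if sem_squared > s_2 ** 2:
        raise ValueError("s_2 too small: error variance would exceed total variance")
    return 1 - sem_squared / s_2 ** 2


print(round(reliability_in_new_group(0.73, 7, 5), 3))   # more homogeneous group: 0.471
print(round(reliability_in_new_group(0.73, 7, 10), 3))  # more heterogeneous group: 0.868
```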

Reliability as equivalence and stability of measures

Coefficient of reliability or equivalence

A test must meet two requirements. First, it should measure the characteristic that really needs to be measured (it must be valid). Second, the empirical scores obtained by applying it should be accurate (free of error) and stable. Stability means that when we evaluate a trait or characteristic with the same test at different times, under conditions as similar as possible, and the trait has not changed, we should obtain similar results: this is the reliability of the test.

a) Parallel forms method
1. Develop two parallel forms, X and X', of the same test.
2. Apply both forms to a sample of subjects representative of the population targeted by the test.
3. Calculate Pearson's correlation between X1 and X2, the scores obtained by the subjects on each form of the test.

Advantage: if both forms are applied at the same time, there is greater control over the conditions of application. Disadvantage: it is difficult to construct two parallel forms.

b) Test-retest method
1. Apply the same test to the same sample of subjects on two separate occasions.
2. Calculate the correlation between X1 and X2, the scores obtained by the subjects in each application of the test.

Advantage: it does not require different forms of the same test. Disadvantage: the results may be influenced by memory, by the time interval between the two applications, and by the attitude of the subject.
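
Both the parallel forms method and the test-retest method reduce to a Pearson correlation between two sets of scores from the same subjects. The sketch below computes it from raw sums, in the same way as a hand calculation. The scores of the five subjects are hypothetical and do not come from the slides.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation computed from raw sums, as in hand calculations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Hypothetical scores of 5 subjects on two occasions (or on forms X and X')
first = [12, 15, 9, 18, 11]
second = [13, 14, 10, 17, 12]
print(round(pearson_r(first, second), 3))   # 0.983
```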

Reliability as internal consistency

Methods to estimate the reliability of a test that require only one application fall into two groups: (A) those based on dividing the test into two parts (Spearman-Brown, Rulon, Guttman-Flanagan), and (B) those based on the covariation of items (Cronbach's alpha coefficient).

a) Methods based on dividing the test into two parts
These estimates of reliability are not affected by the factors discussed above (for example, memory or the time interval between applications), and they save time and effort.
1. Apply the test to a sample of subjects.
2. Once the scores are obtained, divide the test into two parts, calculate the correlation between the scores obtained by the subjects in both parts, and apply a correction formula. The two parts should be similar in difficulty and content.

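Spearman-Brown. The two parts must be parallel, so we should check the assumptions of parallelism: the true scores of the subjects are the same in both parts, and the variance of the measurement errors is the same in both parts. The corrected coefficient is $r_{XX} = \frac{2 r_{12}}{1 + r_{12}}$, where $r_{12}$ is the correlation between the two halves.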

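Spearman-Brown. Example: We have applied a numerical aptitude test of 20 items to a sample of 6 subjects. The table shows the scores obtained on the even items (X1) and on the odd items (X2). Calculate the reliability coefficient, assuming that the two parts of the test are parallel.

| Subject | X1 | X2 | X1² | X2² | X1X2 |
|---|---|---|---|---|---|
| 1 | 8 | 4 | 64 | 16 | 32 |
| 2 | 7 | 7 | 49 | 49 | 49 |
| 3 | 8 | 6 | 64 | 36 | 48 |
| 4 | 5 | 4 | 25 | 16 | 20 |
| 5 | 8 | 7 | 64 | 49 | 56 |
| 6 | 6 | 6 | 36 | 36 | 36 |
| Total | 42 | 34 | 302 | 202 | 241 |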
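
The calculation for this example can be sketched as follows. The six pairs of half-test scores are the ones consistent with the column totals of the table (42, 34, 302, 202, 241). The correlation uses the raw-sum formula, followed by the split-half Spearman-Brown correction $r_{XX} = 2 r_{12} / (1 + r_{12})$.

```python
import math

# Scores of the 6 subjects on the even items (X1) and the odd items (X2)
x1 = [8, 7, 8, 5, 8, 6]
x2 = [4, 7, 6, 4, 7, 6]

n = len(x1)
num = n * sum(a * b for a, b in zip(x1, x2)) - sum(x1) * sum(x2)
den = math.sqrt((n * sum(a * a for a in x1) - sum(x1) ** 2)
                * (n * sum(b * b for b in x2) - sum(x2) ** 2))
r_12 = num / den                  # correlation between the two halves
r_xx = 2 * r_12 / (1 + r_12)      # Spearman-Brown correction to full length
print(f"r_12 = {r_12:.3f}, r_xx = {r_xx:.3f}")   # r_12 = 0.347, r_xx = 0.515
```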

Rulon and Guttman-Flanagan. These methods are applied when the two parts, although not strictly parallel, can be considered tau-equivalent or essentially tau-equivalent. In tau-equivalent parts, the true scores of the subjects in the sample are the same in both parts, but the error variances are not necessarily equal. In essentially tau-equivalent parts, the true score of each subject in one part equals the true score in the other part plus a constant.

Rulon and Guttman-Flanagan
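
The sketch below applies the usual textbook formulas for these two coefficients. Rulon's coefficient is $1 - S_d^2 / S_X^2$, with $d = X_1 - X_2$ and $X = X_1 + X_2$. The Guttman-Flanagan coefficient is $2\left(1 - (S_1^2 + S_2^2) / S_X^2\right)$. As an illustrative assumption, the formulas are applied to the split-half scores of the Spearman-Brown example. The slides do not apply them to that data.

```python
from statistics import pvariance

x1 = [8, 7, 8, 5, 8, 6]   # even items
x2 = [4, 7, 6, 4, 7, 6]   # odd items
total = [a + b for a, b in zip(x1, x2)]   # X = X1 + X2
diff = [a - b for a, b in zip(x1, x2)]    # d = X1 - X2

# Rulon: 1 - S_d^2 / S_X^2
rulon = 1 - pvariance(diff) / pvariance(total)
# Guttman-Flanagan: 2 * (1 - (S_1^2 + S_2^2) / S_X^2)
guttman_flanagan = 2 * (1 - (pvariance(x1) + pvariance(x2)) / pvariance(total))
print(f"Rulon = {rulon:.3f}, Guttman-Flanagan = {guttman_flanagan:.3f}")
```

Both coefficients give the same value (18/35, about 0.514). Algebraically, each equals $4\operatorname{Cov}(X_1, X_2) / S_X^2$. The value is slightly below the Spearman-Brown result because the two halves here have unequal variances.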

b) Method based on the covariation of items. It requires analyzing the variances and covariances of the subjects' responses to the items, and it provides an estimate of the internal consistency of the test's items: Cronbach's alpha coefficient. It is based on the mean correlation among all the test's items.

Cronbach's alpha. Example: We have applied a test of visual perception to 6 subjects (A, B, C, D, E and F). The table shows each subject's score on each of the five test items (1 to 5). Calculate the reliability coefficient of the test.
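
The item scores of this example are not reproduced here. The sketch therefore uses hypothetical 0/1 responses of subjects A to F to items 1 to 5, to show the computation $\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum S_j^2}{S_X^2}\right)$. The function name `cronbach_alpha` is an illustrative choice. Population variances are used throughout, and the ratio is the same with sample variances.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects x items matrix (list of rows):
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))               # one tuple of scores per item
    totals = [sum(row) for row in scores]    # total score of each subject
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


# Hypothetical 0/1 responses of subjects A-F (rows) to items 1-5 (columns)
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 3))   # 0.833
```

With only two "items", alpha reduces to the Guttman-Flanagan coefficient. Passing the two half-test scores of the Spearman-Brown example as two items gives 18/35 again.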

Estimation of the true score of the subjects in the attribute of interest

We can make estimates about the value of a subject's true score on a test and about the error that affects the empirical scores obtained on that test. We cannot calculate the exact value of the subject's true score, but we can establish a confidence interval within which the true score will fall with a given confidence level.

a) Method based on Chebyshev's inequality. It is applied when no assumption is made about the distribution of the empirical scores or of the errors. The true score is bounded between two values, a lower limit and an upper limit.

Example: We have administered a numerical reasoning test to 200 subjects and obtained a mean of 52, $S_X = 7$ and $r_{XX} = 0.73$. Estimate the true score of a subject who obtained an empirical score of 65 points on the test, with a confidence level of 95%. The resulting interval is too wide, which gives a vague estimate. This may be due to a low reliability coefficient, or to the fact that this method does not take into account the type of distribution of the empirical scores.
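
A sketch of this calculation, assuming the usual form of Chebyshev's inequality, $P(|E| < k\,S_e) \ge 1 - 1/k^2$. For a 95% level this gives $k = 1/\sqrt{0.05} \approx 4.47$. The function name is hypothetical.

```python
import math


def chebyshev_interval(x, s_x, r_xx, confidence):
    """Interval for the true score with no distributional assumption.
    Chebyshev: P(|E| < k * S_e) >= 1 - 1/k^2, so k = 1 / sqrt(1 - confidence)."""
    s_e = s_x * math.sqrt(1 - r_xx)     # standard error of measurement
    k = 1 / math.sqrt(1 - confidence)
    max_error = k * s_e                  # maximum admitted measurement error
    return x - max_error, x + max_error


low, high = chebyshev_interval(x=65, s_x=7, r_xx=0.73, confidence=0.95)
print(f"{low:.2f} <= V <= {high:.2f}")   # 48.73 <= V <= 81.27
```

The interval spans about 32.5 points, which is why the estimate is described as vague.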

b) Estimation based on the normal distribution of errors. It assumes a normal distribution of both the measurement errors and the empirical scores.
1. Find the critical value Z in the normal distribution table for the desired confidence level.
2. Calculate the standard error of measurement.
3. Calculate the maximum measurement error that we are willing to admit.
4. Build the confidence interval in which we will find the true score.

Example: the same data as in the previous example. The confidence interval is considerably narrower than the one obtained with Chebyshev's inequality.
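
The four steps can be sketched as follows, using the same data (score 65, $S_X = 7$, $r_{XX} = 0.73$, 95% confidence). The function name is hypothetical. `statistics.NormalDist` replaces the printed normal table.

```python
import math
from statistics import NormalDist


def normal_interval(x, s_x, r_xx, confidence):
    """Interval for the true score assuming normally distributed errors."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # step 1: two-sided critical value
    s_e = s_x * math.sqrt(1 - r_xx)                      # step 2: standard error of measurement
    max_error = z * s_e                                  # step 3: maximum admitted error
    return x - max_error, x + max_error                  # step 4: confidence interval


low, high = normal_interval(x=65, s_x=7, r_xx=0.73, confidence=0.95)
print(f"{low:.2f} <= V <= {high:.2f}")   # 57.87 <= V <= 72.13
```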

c) Estimation based on the regression method. It is more convenient to build the confidence interval from the estimated true score rather than from the empirical scores, which are biased by measurement errors.
1. Obtain the regression equation of V on X. The regression equation is useful to describe concisely the relationship between the variables and to predict the values of one variable from the other.

2. Find the critical value Z in the normal distribution table for the given confidence level.
3. Calculate the standard error of estimation, $S_{vx}$.
4. Calculate the maximum error of estimation.
5. Calculate the confidence interval in which we will find the true score.

Example: the same data as in the previous examples. Estimate the true score (in raw, differential and typical scores) of a subject who scored 65 points.
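
A sketch of this example under the usual regression equations of Classical Test Theory. The raw-score equation is $V' = r_{XX}(X - \bar{X}) + \bar{X}$ and the differential (deviation) score is $v' = r_{XX}\,x$. The typical (standard) score is $z_{V'} = \sqrt{r_{XX}}\,z_X$, because the slope in standard scores is the reliability index. The standard error of estimation is $S_{vx} = S_X\sqrt{r_{XX}(1 - r_{XX})}$. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def regression_estimate(x, mean_x, s_x, r_xx, confidence):
    """Regression estimate of the true score and its confidence interval."""
    deviation = x - mean_x
    v_raw = r_xx * deviation + mean_x                # raw score V'
    v_diff = r_xx * deviation                        # differential (deviation) score
    v_typical = math.sqrt(r_xx) * deviation / s_x    # typical (z) score: r_XV * z_X
    s_vx = s_x * math.sqrt(r_xx * (1 - r_xx))        # standard error of estimation
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    max_error = z * s_vx                             # maximum error of estimation
    return v_raw, v_diff, v_typical, (v_raw - max_error, v_raw + max_error)


v_raw, v_diff, v_typical, (low, high) = regression_estimate(65, 52, 7, 0.73, 0.95)
print(f"V' = {v_raw:.2f}, v' = {v_diff:.2f}, z_V' = {v_typical:.3f}")
print(f"{low:.2f} <= V <= {high:.2f}")   # 55.40 <= V <= 67.58
```

The interval is centred on the estimated true score (61.49) rather than on the empirical score (65). The regression pulls the estimate towards the group mean, and it does so more strongly the lower the reliability.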