LECTURE 6 RELIABILITY

Presentation transcript:

LECTURE 6 RELIABILITY

Reliability is a proportion-of-variance measure (a squared quantity). It is defined as the proportion of observed score (x) variance due to true score ($\tau$) variance:
$\rho^2_{x\tau} = \rho_{xx'} = \sigma^2_\tau / \sigma^2_x$

[Figure: Venn diagram representation of reliability, showing Var($\tau$) and Var(e) as parts of Var(x).]

PARALLEL FORMS OF TESTS
If two items $x_1$ and $x_2$ are parallel, they have
- equal true scores: $\tau_1 = \tau_2$
- equal true score variance: Var($\tau_1$) = Var($\tau_2$)
- equal error variance: Var($e_1$) = Var($e_2$)
- uncorrelated errors: $\rho(e_1, e_2) = 0$

Reliability: 2 parallel forms
$x_1 = \tau + e_1$, $x_2 = \tau + e_2$
$\rho(x_1, x_2)$ = reliability = $\rho_{xx'}$ = correlation between parallel forms
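Why the correlation between parallel forms equals the reliability can be seen in a short derivation. It is a sketch that assumes, as in true score theory, that errors are uncorrelated with true scores and with each other, and that parallel forms have equal observed variance $\sigma^2_{x_1} = \sigma^2_{x_2} = \sigma^2_x$:

```latex
\begin{align*}
\operatorname{Cov}(x_1, x_2)
  &= \operatorname{Cov}(\tau + e_1,\ \tau + e_2)
   = \operatorname{Var}(\tau) \\
\rho(x_1, x_2)
  &= \frac{\operatorname{Cov}(x_1, x_2)}{\sigma_{x_1}\,\sigma_{x_2}}
   = \frac{\sigma^2_\tau}{\sigma^2_x}
   = \rho_{xx'}
\end{align*}
```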

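[Figure: path diagram for parallel forms. $\tau$ points to $x_1$ and $x_2$, each with path coefficient $\rho_{x\tau}$ and error e; $\rho_{xx'} = \rho_{x\tau} \cdot \rho_{x\tau}$.]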

Reliability: 3 or more parallel forms
For 3 or more items $x_i$, the same general form holds:
- the reliability of any pair is the correlation between them
- the reliability of the composite (sum of items) is based on the average inter-item correlation: the stepped-up reliability given by the Spearman-Brown formula

Reliability: 3 or more parallel forms
Spearman-Brown formula for reliability: $r_{xx} = k\,r(i,j) / [1 + (k-1)\,r(i,j)]$, where r(i,j) is the average inter-item correlation.
Example: 3 items; item 1 correlates .5 with item 2, item 1 correlates .6 with item 3, and item 2 correlates .7 with item 3; the average is .6.
$r_{xx} = 3(.6) / [1 + 2(.6)] = 1.8/2.2 = .82$

Reliability: tau equivalent scores
If two items $x_1$ and $x_2$ are tau equivalent, they have
- equal true scores: $\tau_1 = \tau_2$
- equal true score variance: Var($\tau_1$) = Var($\tau_2$)
- unequal error variance: Var($e_1$) $\ne$ Var($e_2$)
- uncorrelated errors: $\rho(e_1, e_2) = 0$

Reliability: tau equivalent scores
$x_1 = \tau + e_1$, $x_2 = \tau + e_2$
$\rho(x_1, x_2)$ = reliability = $\rho_{xx'}$ = correlation between tau equivalent forms (same computation as for parallel forms; the observed score variances are different)

Reliability: Spearman-Brown
It can be shown that the reliability of the parallel forms or tau equivalent composite is
$\rho_{kk'} = [k\,\rho_{xx'}] / [1 + (k-1)\,\rho_{xx'}]$, where k = number of times the test is lengthened.
Example: a test score has reliability .7; doubling the length produces reliability = 2(.7)/[1 + .7] = .824

Reliability: Spearman-Brown
Example: a test score has reliability .95. Halving the test (half length) produces
$\rho_{xx} = .5(.95) / [1 + (.5 - 1)(.95)] = .905$
Thus, a short form with a random sample of half the items will produce a test with adequate score reliability.
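The two Spearman-Brown examples above (doubling a test with reliability .7, halving one with reliability .95) can be reproduced with a few lines of Python. This is a minimal sketch; the function name `spearman_brown` is chosen here for illustration and is not from the lecture.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Projected reliability when test length is multiplied by k.

    k > 1 lengthens the test; 0 < k < 1 shortens it.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return k * reliability / (1 + (k - 1) * reliability)


# Doubling a test whose reliability is .70
print(round(spearman_brown(0.70, 2), 3))    # 0.824

# Halving a test whose reliability is .95
print(round(spearman_brown(0.95, 0.5), 3))  # 0.905
```

With k = 1 the formula returns the original reliability unchanged, which is a quick sanity check on any implementation.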

Reliability: KR-20 for parallel or tau equivalent items/scores
Items are scored as 0 or 1 (dichotomous scoring).
Kuder and Richardson (1937): special cases of Cronbach's more general equation for parallel tests.
KR-20 = $[k/(k-1)]\,[1 - \sum p_i q_i / \sigma^2_y]$, where $p_i$ = proportion of respondents obtaining a score of 1 on item i and $q_i = 1 - p_i$. $p_i$ is the item difficulty.

Reliability: KR-21 for parallel forms assumption
Items are scored as 0 or 1 (dichotomous scoring). Kuder and Richardson (1937).
KR-21 = $[k/(k-1)]\,[1 - k\,p_{\cdot}\,q_{\cdot} / \sigma^2_c]$, where $p_{\cdot}$ is the mean item difficulty and $q_{\cdot} = 1 - p_{\cdot}$.
KR-21 assumes that all items have the same difficulty (parallel forms); the item mean gives the best estimate of the population values.
KR-21 $\le$ KR-20.
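A small numerical sketch may help compare the two Kuder-Richardson formulas. The 6-respondent, 4-item response matrix below is made up purely for illustration (it is not from the lecture), and the function names are likewise hypothetical. The total-score variance is computed with divisor n so that it is on the same footing as the $p_i q_i$ item variances.

```python
def _total_variance(scores):
    """Population variance (divisor n) of the respondents' total scores."""
    totals = [sum(row) for row in scores]
    mean = sum(totals) / len(totals)
    return sum((t - mean) ** 2 for t in totals) / len(totals)


def kr20(scores):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]  # item difficulties
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return k / (k - 1) * (1 - sum_pq / _total_variance(scores))


def kr21(scores):
    """KR-21: same as KR-20 but uses only the mean item difficulty."""
    n, k = len(scores), len(scores[0])
    p_bar = sum(sum(row) for row in scores) / (n * k)
    return k / (k - 1) * (1 - k * p_bar * (1 - p_bar) / _total_variance(scores))


# Hypothetical 0/1 responses: rows are respondents, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

print(round(kr20(responses), 3), round(kr21(responses), 3))
```

Because the item difficulties in this made-up matrix differ, KR-21 comes out below KR-20; the two coincide only when every item has the same difficulty.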

Reliability: congeneric scores
If two items $x_1$ and $x_2$ are congeneric, they have
1. unequal true scores: $\tau_1 \ne \tau_2$
2. unequal true score variance: Var($\tau_1$) $\ne$ Var($\tau_2$)
3. unequal error variance: Var($e_1$) $\ne$ Var($e_2$)
4. uncorrelated errors: $\rho(e_1, e_2) = 0$

Reliability: congeneric scores
$x_1 = \tau_1 + e_1$, $x_2 = \tau_2 + e_2$
$\rho_{jj} = \mathrm{Cov}(\tau_1, \tau_2) / (\sigma_{x_1}\sigma_{x_2})$
This is the correlation between two separate measures that have a common latent variable.

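[Figure: congeneric measurement structure. $\tau_1$ points to $x_1$ (path $\rho_{x_1\tau_1}$, error $e_1$), $\tau_2$ points to $x_2$ (path $\rho_{x_2\tau_2}$, error $e_2$), and $\tau_1$ and $\tau_2$ are correlated $\rho_{12}$; $\rho_{xx'} = \rho_{x_1\tau_1}\,\rho_{12}\,\rho_{x_2\tau_2}$.]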

Reliability: Coefficient alpha Composite=sum of k parts, each with its own true score and variance C = x 1 + x 2 + …x k  ≤ 1 -  2 k /  2 c  est = k/(k-1)[1 -  s 2 k / s 2 c ]

Reliability: Coefficient alpha
Alpha equals
1. Spearman-Brown for parallel or tau equivalent tests
2. KR-20 for dichotomous items (tau equivalent)
3. Hoyt, even for $\sigma^2_{\tau \times item} \ne 0$ (congeneric)

Hoyt reliability
- Based on ANOVA concepts extended during the 1930s by Cyrus Hoyt at U. Minnesota
- Considers items and subjects as factors that are either random or fixed (different models with respect to expected mean squares)
- Presaged the more general coefficient alpha derivation

Reliability: Hoyt ANOVA

Source             df            Expected Mean Square
Person (random)    I-1           $\sigma^2_\varepsilon + \sigma^2_{\tau \times item} + K\sigma^2_\tau$
Items (random)     K-1           $\sigma^2_\varepsilon + \sigma^2_{\tau \times item} + I\sigma^2_{items}$
Error              (I-1)(K-1)    $\sigma^2_\varepsilon + \sigma^2_{\tau \times item}$

Parallel forms => $\sigma^2_{\tau \times item} = 0$
$\rho_{Hoyt} = \{E(MS_{persons}) - E(MS_{error})\} / E(MS_{persons})$
est $\rho_{Hoyt} = [MS_{persons} - MS_{error}] / MS_{persons}$
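The Hoyt estimate can be computed directly from the persons-by-items sums of squares. The sketch below uses a made-up 5-person, 3-item matrix of Likert-type scores (hypothetical data, not from the lecture) and also computes coefficient alpha from item and total variances, so that the equality of the two estimates stated earlier can be checked numerically. Function names are illustrative.

```python
def hoyt_reliability(scores):
    """Hoyt reliability: (MS_persons - MS_error) / MS_persons."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_error = ss_total - ss_persons - ss_items  # residual (person x item)

    ms_persons = ss_persons / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_persons - ms_error) / ms_persons


def coefficient_alpha(scores):
    """Coefficient alpha from sample item variances and total variance."""
    n, k = len(scores), len(scores[0])

    def var(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical Likert-type scores: rows are persons, columns are items.
likert = [
    [4, 3, 4],
    [2, 2, 3],
    [3, 3, 3],
    [1, 2, 1],
    [4, 4, 3],
]

print(hoyt_reliability(likert), coefficient_alpha(likert))
```

The two printed values agree up to floating-point rounding, since the ANOVA mean squares are algebraic rearrangements of the item and total variances.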

Reliability: Coefficient alpha
Composite = sum of k parts, each with its own true score and variance: $C = x_1 + x_2 + \dots + x_k$
Example: $s_{x_1} = 1$, $s_{x_2} = 2$, $s_{x_3} = 3$, $s_c = 5$
$\alpha_{est} = [3/(3-1)]\,[1 - (1 + 4 + 9)/25] = 1.5\,[1 - 14/25] = 16.5/25 = .66$

SPSS DATA FILE

JOE       1 1 1 0
SUZY      1 0 1 1
FRANK     0 0 1 0
JUAN      0 1 1 1
SHAMIKA   1 1 1 1
ERIN      0 0 0 1
MICHAEL   0 1 1 1
BRANDY    1 1 0 0
WALID     1 0 1 1
KURT      0 0 1 0
ERIC      1 1 1 0
MAY       1 0 0 0

SPSS RELIABILITY OUTPUT
RELIABILITY ANALYSIS - SCALE (ALPHA)
Reliability Coefficients
N of Cases = 12.0    N of Items = 4
Alpha = .1579

SPSS RELIABILITY OUTPUT
RELIABILITY ANALYSIS - SCALE (ALPHA)
Reliability Coefficients
N of Cases = 12.0    N of Items = 8
Alpha = .6391
Note: same items duplicated
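The two SPSS alpha values can be reproduced from the 12-person, 4-item data file shown above. The sketch below reads each name as being followed by four 0/1 item scores (an assumption about how the flattened data file should be parsed) and applies the coefficient alpha formula; the function names are illustrative rather than taken from SPSS.

```python
def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(scores):
    """alpha = k/(k-1) * [1 - sum(item variances) / variance of total]."""
    k = len(scores[0])
    item_vars = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# The SPSS data file from the slides: four dichotomous items per person.
data = {
    "JOE": [1, 1, 1, 0], "SUZY": [1, 0, 1, 1], "FRANK": [0, 0, 1, 0],
    "JUAN": [0, 1, 1, 1], "SHAMIKA": [1, 1, 1, 1], "ERIN": [0, 0, 0, 1],
    "MICHAEL": [0, 1, 1, 1], "BRANDY": [1, 1, 0, 0], "WALID": [1, 0, 1, 1],
    "KURT": [0, 0, 1, 0], "ERIC": [1, 1, 1, 0], "MAY": [1, 0, 0, 0],
}
scores = list(data.values())

alpha_4_items = cronbach_alpha(scores)
# Duplicate every item to get an 8-item test made of the same items twice.
alpha_8_items = cronbach_alpha([row + row for row in scores])

print(round(alpha_4_items, 4))  # 0.1579
print(round(alpha_8_items, 4))  # 0.6391
```

Duplicating the items adds no new information about the respondents, yet alpha rises sharply because duplicated items correlate perfectly with each other. This is a case of correlated errors inflating alpha.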

TRUE SCORE THEORY AND STRUCTURAL EQUATION MODELING
True score theory is consistent with the concepts of SEM:
- the latent score (true score) is called a factor in SEM
- error of measurement
- the path coefficient between observed score x and latent score $\tau$ is the same as the index of reliability

COMPOSITES AND FACTOR STRUCTURE
3 manifest (observed) variables are required for a unique identification of a single factor.
Parallel forms implies
- equal path coefficients (termed factor loadings) for the manifest variables
- equal error variances
- independence of errors

[Figure: parallel forms factor diagram. $\tau$ points to $x_1$, $x_2$, and $x_3$, each with path coefficient $\rho_{x\tau}$ and error e; $\rho_{x_i x_j} = \rho_{x_i\tau} \cdot \rho_{x_j\tau}$ = reliability between variables i and j.]

RELIABILITY FROM SEM
TRUE SCORE VARIANCE OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
$\Lambda = \sum_{i=1}^{k} \lambda_i^2$ = variance of factor, where k = # items or subtests
$\Lambda = k\,\rho^2_{x\tau}$ = k times pairwise average reliability of items

RELIABILITY FROM SEM
RELIABILITY OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
$\alpha = [k/(k-1)]\,[1 - 1/\Lambda]$
Example: $\rho^2_{x\tau} = .8$, k = 11, so $\Lambda = 8.8$ and $\alpha = (11/10)\,[1 - 1/8.8] = .975$

TAU EQUIVALENCE
ITEM TRUE SCORES DIFFER BY A CONSTANT: $\tau_i = \tau_j + c_k$
ERROR STRUCTURE UNCHANGED AS TO EQUAL VARIANCES, INDEPENDENCE

CONGENERIC MODEL
LESS RESTRICTIVE THAN PARALLEL FORMS OR TAU EQUIVALENCE:
- LOADINGS MAY DIFFER
- ERROR VARIANCES MAY DIFFER
MOST COMPLEX COMPOSITES ARE CONGENERIC:
- WAIS, WISC-III, K-ABC, MMPI, etc.

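[Figure: congeneric factor diagram. $\tau$ points to $x_1$, $x_2$, and $x_3$ with path coefficients $\rho_{x_1\tau}$, $\rho_{x_2\tau}$, $\rho_{x_3\tau}$ and errors $e_1$, $e_2$, $e_3$; $\rho(x_1, x_2) = \rho_{x_1\tau} \cdot \rho_{x_2\tau}$.]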

COEFFICIENT ALPHA
$\rho_{xx'} = 1 - \sigma^2_E / \sigma^2_X = 1 - [\sum \sigma^2_i (1 - \rho_{ii})] / \sigma^2_X$, since errors are uncorrelated
$\alpha = [k/(k-1)]\,[1 - \sum s^2_i / s^2_C]$, where
- $C = \sum x_i$ (composite score)
- $s^2_i$ = variance of subtest $x_i$
- $s^2_C$ = variance of composite
Does not assume knowledge of subtest $\rho_{ii}$

COEFFICIENT ALPHA - NUNNALLY'S COEFFICIENT
IF WE KNOW THE RELIABILITIES OF EACH SUBTEST, $\rho_i$:
$\rho_N = [K/(K-1)]\,[1 - \sum s^2_i (1 - r_{ii}) / s^2_X]$, where $r_{ii}$ = coefficient alpha of each subtest
Willson (1996) showed $\alpha \le \rho_N \le \rho_{xx'}$

[Figure: NUNNALLY'S RELIABILITY CASE. $\tau$ points to $x_1$, $x_2$, and $x_3$, each with an error term ($e_1$, $e_2$, $e_3$) and a specificity ($s_1$, $s_2$, $s_3$); $\rho_{X_i X_i} = \rho^2_{x_i\tau} + s^2_i$.]

Reliability Formula for SEM with Multiple factors (congeneric with subtests)
Single factor model: $\rho = \sum \lambda_i^2 / [\sum \lambda_i^2 + \sum \theta_{ii} + \sum\sum \theta_{ij}]$, with $\rho > \alpha$, where $\lambda_i$ are the factor loadings, $\theta_{ii}$ the error variances, and $\theta_{ij}$ the error covariances.
If the error covariances are 0, this reduces to $\rho = \sum \lambda_i^2 / [\sum \lambda_i^2 + \sum \theta_{ii}]$ = Sum(factor loadings on 1st factor) / Sum of observed variances.
This generalizes (Bentler, 2004) to the sum of factor loadings on the 1st factor divided by the sum of variances and covariances of the factors for multifactor congeneric tests.
Reference: Peter M. Bentler, Maximal Reliability for Unit-weighted Composites, University of California, Los Angeles, UCLA Statistics Preprint No. 405, October 7,

Multifactor models and specificity
- Specificity is the correlation between two observed items independent of the true score
- It can be considered another factor
- Cronbach's alpha can overestimate reliability if such factors are present
- Correlated errors can also result in alpha overestimating reliability

CORRELATED ERROR PROBLEMS
[Figure: factor diagram in which $\tau$ points to $x_1$, $x_2$, and $x_3$ with errors $e_1$, $e_2$, $e_3$ and a specificity term s.]
Specificities can be misinterpreted as a correlated error model if the specificities are correlated or are a second factor.

SPSS SCALE ANALYSIS
ITEM DATA EXAMPLE: (Likert items, 0-4 scale)
[Table: Mean, Std Dev, and Cases for each item; values not shown.]
1. CHLDIDEAL (0-8)
2. BIRTH CONTROL PILL OK
3. SEXED IN SCHOOL
4. POL. VIEWS (CONS-LIB)
5. SPANKING OK IN SCHOOL

CORRELATIONS
[Table: correlation matrix for CHLDIDEL, PILLOK, SEXEDUC, POLVIEWS, and SPANKING; values not shown.]

SCALE CHARACTERISTICS
[Table: scale statistics (Mean, Variance, Std Dev, number of Variables) and summaries (Mean, Minimum, Maximum, Range, Max/Min, Variance) for item means, item variances, and inter-item correlations; values not shown.]

ITEM-TOTAL STATS
[Table: item-total statistics for CHLDIDEAL, PILLOK, SEXEDUC, POLVIEWS, and SPANKING, with columns Scale Mean if Item Deleted, Scale Variance if Item Deleted, Corrected Item-Total Correlation, Squared Multiple R, and Alpha if Item Deleted; values not shown.]

ANOVA RESULTS
[Table: analysis of variance with sources Between People, Within People, Measures, Residual, and Total, and columns Sum of Sq., DF, Mean Square, F, and Prob.; values not shown.]

RELIABILITY ESTIMATE
Reliability Coefficients, 5 items
Alpha = .2625
Standardized item alpha = .3093
Standardized means all items parallel

RELIABILITY: APPLICATIONS

STANDARD ERRORS
$s_e$ = standard error of measurement = $s_x [1 - \rho_{xx}]^{1/2}$
- can be computed if $\rho_{xx}$ is estimable
- provides an error band around an observed score: $[x - 1.96 s_e,\ x + 1.96 s_e]$

[Figure: error band from $x - 1.96 s_e$ to $x + 1.96 s_e$ around an observed score x.]
ASSUMES ERRORS ARE NORMALLY DISTRIBUTED

TRUE SCORE ESTIMATE
$\tau_{est} = \rho_{xx}\,x + [1 - \rho_{xx}]\,x_{mean}$
Example: x = 90, mean = 100, reliability = .9
$\tau_{est} = .9(90) + [1 - .9](100) = 81 + 10 = 91$

STANDARD ERROR OF TRUE SCORE ESTIMATE
$s_\tau = s_x [\rho_{xx}]^{1/2} [1 - \rho_{xx}]^{1/2}$
Provides an estimate of the range of likely true scores for an estimated true score
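The standard error of measurement, the 95% error band, the true score estimate, and its standard error can be collected in a few helper functions. This is a minimal sketch: the function names are illustrative, and the standard deviation $s_x = 15$ used in the example is an assumed value (the slides give only x = 90, mean = 100, and reliability = .9).

```python
import math


def standard_error_of_measurement(s_x, reliability):
    """s_e = s_x * sqrt(1 - reliability)."""
    return s_x * math.sqrt(1 - reliability)


def error_band(x, s_x, reliability, z=1.96):
    """Band [x - z*s_e, x + z*s_e]; assumes normally distributed errors."""
    s_e = standard_error_of_measurement(s_x, reliability)
    return (x - z * s_e, x + z * s_e)


def true_score_estimate(x, mean, reliability):
    """Regress the observed score toward the mean."""
    return reliability * x + (1 - reliability) * mean


def se_true_score_estimate(s_x, reliability):
    """s_x * sqrt(reliability) * sqrt(1 - reliability)."""
    return s_x * math.sqrt(reliability) * math.sqrt(1 - reliability)


# Slide example: x = 90, mean = 100, reliability = .9; s_x = 15 is assumed.
x, mean, rel, s_x = 90, 100, 0.9, 15

print(round(true_score_estimate(x, mean, rel), 2))      # 91.0
print(round(standard_error_of_measurement(s_x, rel), 2))
print(tuple(round(b, 2) for b in error_band(x, s_x, rel)))
print(round(se_true_score_estimate(s_x, rel), 2))
```

The standard error of the true score estimate is always smaller than the standard error of measurement, by the factor $\rho_{xx}^{1/2}$.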

DIFFERENCE SCORES
Difference scores are widely used in education and psychology:
- Learning disability = Achievement - Predicted Achievement
- Gain score from beginning to end of school year
- Brain injury is detected by a large discrepancy in certain IQ scale scores

RELIABILITY OF D SCORES
$D = x - y$
$s^2_D = s^2_x + s^2_y - 2 r_{xy} s_x s_y$
$r_{DD} = [r_{xx} s^2_x + r_{yy} s^2_y - 2 r_{xy} s_x s_y] / [s^2_x + s^2_y - 2 r_{xy} s_x s_y]$
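A short numerical sketch shows how quickly the reliability of a difference score drops when the two measures are correlated. The input values (both reliabilities .9, correlation .7, both standard deviations 15) are hypothetical and chosen only for illustration, as is the function name.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, s_x, s_y):
    """Reliability of D = x - y from the formula for r_DD."""
    covariance = r_xy * s_x * s_y
    var_d = s_x ** 2 + s_y ** 2 - 2 * covariance  # observed variance of D
    if var_d <= 0:
        raise ValueError("difference score has no variance")
    true_var_d = r_xx * s_x ** 2 + r_yy * s_y ** 2 - 2 * covariance
    return true_var_d / var_d


# Hypothetical inputs: two tests with reliability .9 that correlate .7.
r_dd = difference_score_reliability(r_xx=0.9, r_yy=0.9, r_xy=0.7, s_x=15, s_y=15)
print(round(r_dd, 3))  # 0.667
```

Even though each test has reliability .9 in this example, the difference score has reliability of only about .67, because the shared true score variance cancels while the error variances add.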

REGRESSION DISCREPANCY
$D = y - y_{pred}$, where $y_{pred} = bx + b_0$
$s_{DD} = [(1 - r^2_{xy})(1 - r_{DD})]^{1/2}$, where
$r_{DD} = [r_{yy} + r_{xx} r^2_{xy} - 2 r^2_{xy}] / [1 - r^2_{xy}]$

TRUE DISCREPANCY
$D = b_{D\,y.x}(y - y_{mn}) + b_{D\,x.y}(x - x_{mn})$
$s_D = [\,b^2_{D\,y.x} + b^2_{D\,x.y} + 2\,b_{D\,y.x}\,b_{D\,x.y}\,r_{xy}\,]$
and
$r_{DD} = \{[2 - (r_{xx} - r_{yy})^2 + (r_{yy} - r_{xy})^2 - 2(r_{yy} - r_{xy})(r_{xx} - r_{xy})\,r^2_{xy}] / [(1 - r_{xy})(r_{yy} + r_{xx} - 2 r_{xy})]\}^{-1}$