All Hands Meeting 2005 The Family of Reliability Coefficients Gregory G. Brown VASDHS/UCSD.

Presentation transcript:

All Hands Meeting 2005 The Family of Reliability Coefficients Gregory G. Brown VASDHS/UCSD

Reliability Coefficients: The Problems
- Reliability coefficients were often developed in varying literatures without regard to a cohesive theory
- Cohesive theories of reliability available in the literature are not widely known
- Reliability terms are used inconsistently
- Different terms in the literature are at times used to represent the same reliability concept

Levels of the Family Tree
Level 1. Study Aim
Level 2. # Study Factors
Level 3. # Levels within Study Factors
Level 4. Score Standardization
Level 5. Nesting
Level 6. Level of Measurement

The Progenitor Coefficient: The Correlation Ratio (η²) (Winer et al., 1991)

Correlation ratios
- Vary between 0.0 and 1.0
- Typically measure the amount of variance accounted for by a factor in an analysis of variance design
- Index the strength of association between levels of a study factor and the dependent variable, regardless of whether the functional relationship between study factors and the dependent measure is linear or nonlinear

The two meanings of error
- Definition 1: the error term in analysis of variance models
- Definition 2: all relevant sources of variance in an analysis of variance design besides the source of interest
- The two definitions of error are associated with different reliability models and with different reliability coefficients

Levels of the Family Tree (Level 1. Study Aim)
Correlation Ratio branches by study aim:
- Determine Reliability -> Reliability Measures
- Establish Validity -> Effect Size Measures

Correlation Ratio and Reliability Measures Correlation ratios based on variance component estimates derived from random effects models are generally consistent measures of reliability (Olkin & Pratt, 1958).

The Correlation Ratio and Effect Size Measures (Winer et al., 1991)
Effect size: Cohen's f, where f² = η² / (1 − η²)
Parameter related to power: the noncentrality parameter of the F distribution, which is a function of f and the sample size

Cohen's f
Cohen's f is the standard deviation of the means across the various levels of a study factor scaled by the common within-group standard deviation (equivalently, f² is the variance of the level means divided by the common within-group variance).

Caveat: There Are Two Definitions of the Correlation Ratio
- Old definition: the correlation ratio is a ratio of sums of squares (Kerlinger, 1964; Cohen, 1965).
- Current definition: the correlation ratio is a ratio of variance component estimates and their fixed-effects analogues (e.g., Winer et al., 1991). This is the definition of the correlation ratio used in this talk.
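To make the caveat concrete, here is a minimal Python sketch (with made-up one-way data, not data from the talk) that computes the correlation ratio under both definitions: the old sums-of-squares ratio and the variance-components ratio used in this talk.

```python
# Sketch: the two definitions of the correlation ratio for a one-way
# random-effects layout. The data below are hypothetical.
import numpy as np

groups = [np.array([4.0, 5.0, 6.0, 5.5]),
          np.array([7.0, 8.0, 6.5, 7.5]),
          np.array([9.0, 10.0, 8.5, 9.5])]
k = len(groups)
n = len(groups[0])                     # equal group sizes assumed
scores = np.concatenate(groups)
grand = scores.mean()

ss_between = sum(n * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ss_between + ss_within

# Old definition: ratio of sums of squares.
eta2_ss = ss_between / ss_total

# Current definition: ratio of variance-component estimates.
ms_between = ss_between / (k - 1)
ms_within = ss_within / (k * (n - 1))
var_between = max((ms_between - ms_within) / n, 0.0)   # component for the study factor
eta2_vc = var_between / (var_between + ms_within)

print(f"SS-based eta^2 = {eta2_ss:.3f}, variance-components eta^2 = {eta2_vc:.3f}")
```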

Correspondence Among Effect Size Measures

Effect Size+    Cohen's f    η²      Power* (n = 20)
Small           .10          .01
Medium          .25          .06
Large           .40          .14

+Cohen (1988); *Winer et al., 1991: power of F(2, 57, f) at p = .05
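A hedged sketch of how such a correspondence can be computed. It assumes the usual fixed-effects one-way ANOVA setup with the noncentrality parameter taken as lambda = N * f²; the small/medium/large f conventions follow Cohen (1988), and the printed power values are illustrative computations, not figures taken from the slide.

```python
# Sketch: linking eta^2, Cohen's f, and power for a fixed-effects one-way ANOVA.
# Assumes k groups of n subjects each, so the test statistic is F(k-1, N-k).
import numpy as np
from scipy import stats

k, n = 3, 20                 # groups and subjects per group -> F(2, 57)
N = k * n
alpha = 0.05

for label, f in [("small", 0.10), ("medium", 0.25), ("large", 0.40)]:
    eta2 = f**2 / (1 + f**2)                    # eta^2 = f^2 / (1 + f^2)
    lam = N * f**2                              # assumed noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, k - 1, N - k)
    power = stats.ncf.sf(f_crit, k - 1, N - k, lam)
    print(f"{label:>6}: f = {f:.2f}  eta^2 = {eta2:.3f}  power = {power:.2f}")
```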

Shrout and Fleiss (1979) Example
[Table of ratings: raters (columns) by subjects (rows); entries are ratings on a scale of 1 to 10]

Correlation Ratios for the Shrout and Fleiss Example: Random Effects Model for Both Validity and Reliability Analyses
Model: X_ij = μ + β_i + ρ_j + ε_ij
β_i: effect of subject i, distributed N(0, σ²_β), assumed independent of ρ_j and ε_ij
ρ_j: effect of rater j, distributed N(0, σ²_ρ), assumed independent of β_i and ε_ij
Both β_i and ρ_j are random effects.
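A minimal sketch of how the variance components for this two-way random-effects model can be estimated from ANOVA mean squares. The ratings matrix here is hypothetical (it is not the Shrout & Fleiss table), and the expected-mean-square equations assume a fully crossed subjects x raters design with one observation per cell.

```python
# Sketch: ANOVA-based variance-component estimates for a crossed
# subjects x raters design with one observation per cell (random effects).
import numpy as np

X = np.array([            # rows = subjects, columns = raters (hypothetical)
    [7, 5, 6, 6],
    [4, 2, 3, 5],
    [8, 6, 7, 9],
    [5, 3, 4, 4],
    [9, 7, 8, 8],
], dtype=float)
n_s, n_r = X.shape

grand = X.mean()
ss_subj = n_r * ((X.mean(axis=1) - grand) ** 2).sum()
ss_rater = n_s * ((X.mean(axis=0) - grand) ** 2).sum()
ss_resid = ((X - grand) ** 2).sum() - ss_subj - ss_rater

ms_subj = ss_subj / (n_s - 1)
ms_rater = ss_rater / (n_r - 1)
ms_resid = ss_resid / ((n_s - 1) * (n_r - 1))

# Expected mean squares for the two-way random model (no replication):
#   MS_subj  = sigma2_e + n_r * sigma2_subj
#   MS_rater = sigma2_e + n_s * sigma2_rater
#   MS_resid = sigma2_e   (interaction confounded with error)
var_e = ms_resid
var_subj = max((ms_subj - ms_resid) / n_r, 0.0)
var_rater = max((ms_rater - ms_resid) / n_s, 0.0)

print(f"subjects = {var_subj:.3f}, raters = {var_rater:.3f}, residual = {var_e:.3f}")
```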

Results of the Shrout and Fleiss Random Effects Analysis

Effect      Mean Square    F-value    η²*    Power+
Raters                                       ~1.00
Subjects
Error       1.019

*Based on variance components estimates using total variance for the denominator of the correlation ratio.
+Based on the variance components definition of η² and the previously described relationship between η² and Cohen's f.

Claim 2
- The η² for subjects equals the ICC(2,1) for these data (see Shrout and Fleiss, 1979).
- Reliability and validity can both be investigated within an analysis of variance framework.

Levels of the Family Tree
Level 1. Study Aim
Level 2. # Study Factors

Levels of the Family Tree (Level 2. Number of Study Factors)
Reliability Measures:
- Single Factor Designs -> Intraclass Correlations
- Multifactorial Designs -> Generalizability Theory Coefficients

Examples
- A single-factor reliability design is one where there is only one source of variance besides subjects (e.g., raters judging all subjects).
- A multi-factor reliability design is one where there are several sources of variance besides subjects (e.g., raters judging all subjects on 2 days).

Intraclass correlations for single facet reliability studies: just reviewed by Lee Friedman

Generalizability Theory
- Measurement always involves some conditions (e.g., raters, items, ambient sound) that could be varied without changing the acceptability of the observations.
- The experimental design defines a universe of acceptable observations of which a particular measurement is a member.
- The question of reliability resolves into the question of how accurately the observer can generalize back to the universe of observations.

Generalizability Theory (continued)
- A reliable measure is one where the observed value closely estimates the expected score over all acceptable observations, i.e., the universe score.
- Generalizability coefficient (Cronbach, Gleser, Nanda, & Rajaratnam, 1972): the ratio of universe-score variance to universe-score variance plus relative error variance.

Basic Components of the Generalizability Coefficient
- Universe score variance: the estimated variance across the objects of measurement (e.g., people) in the sample at hand.
- Relative error: the sum of the variance components that involve interactions with people, each averaged over the number of observations; for the Shrout & Fleiss example, the person x rater interaction/error component divided by the number of raters.

Generalizability Theory (continued)
- Generalizability coefficient (Brennan, 2001): Eρ² = σ²_person / (σ²_person + σ²_relative error)
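A sketch of the generalizability coefficient for a one-facet subjects x raters design. It assumes relative error equals the residual component divided by the number of raters, and the variance-component values are hypothetical placeholders (in practice they come from an analysis like the one sketched above).

```python
# Sketch: generalizability coefficient for a one-facet (subjects x raters) design.
var_subj = 2.80    # universe-score (subject) variance component (hypothetical)
var_resid = 1.00   # residual (subject x rater interaction + error) component (hypothetical)
n_raters = 4       # number of raters averaged over

rel_error = var_resid / n_raters                   # relative error variance
gen_coef = var_subj / (var_subj + rel_error)       # E(rho^2)
print(f"Generalizability coefficient = {gen_coef:.3f}")
```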

The Generalizability Coefficient
- A large generalizability coefficient means that person variance can be estimated without large effects from other sources of variance that might affect the expected between-subject variation within raters.

Generalizability Theory and Measurement Precision
- Generalizability Theory provides a measurement standard: true variation among the objects of measurement, e.g., people.
- Generalizability Theory uses the concept of person variance to provide a clear and simple relationship between a reliability coefficient, C, and measurement precision: Standard error = sqrt[((1 − C)/C) · σ²_person].
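A small sketch of the precision relationship quoted above; both the reliability coefficient and the person standard deviation are hypothetical inputs.

```python
# Sketch: standard error of measurement from a reliability coefficient C
# and the person (universe-score) standard deviation, using
# SE = sigma_person * sqrt((1 - C) / C).
import math

C = 0.91            # reliability / generalizability coefficient (hypothetical)
sd_person = 1.67    # person standard deviation (hypothetical)

standard_error = sd_person * math.sqrt((1 - C) / C)
print(f"standard error of measurement = {standard_error:.3f}")
```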

Innovative Aspects of Generalizability Theory
- Generalizability Theory asserts there exist multiple sources of error rather than the single error term of classical reliability theory.
- Analysis of variance can be used to hunt down these sources of error.
- New definitions:
  - A reliable measure is one that is stable over unwanted sources of variance.
  - A valid measure is one that varies over wanted sources of variance.

Generalizability Coefficient for Shrout & Fleiss (1979) data

ICC(3,k) and the Generalizability Coefficient (continued)
- The generalizability coefficient is equivalent to ICC(3,k), and both are measures of rater consistency.
- ICC(3,1) can be calculated directly from variance component estimates and is equal to the traditional use of the Correlation Ratio as a measure of the amount of variance accounted for.

The Dependability Coefficient
- Absolute error = the sum of the variance components other than the person component, each averaged over its respective number of observations.
- Dependability coefficient: Φ = σ²_person / (σ²_person + σ²_absolute error)
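A companion sketch for the dependability coefficient. It assumes absolute error is the rater component plus the residual component, each divided by the number of raters, and the component values are again hypothetical.

```python
# Sketch: dependability coefficient for the one-facet crossed design.
var_subj = 2.80    # subject (universe-score) variance component (hypothetical)
var_rater = 1.20   # rater variance component (hypothetical)
var_resid = 1.00   # residual component (hypothetical)
n_raters = 4

abs_error = (var_rater + var_resid) / n_raters      # absolute error variance
dependability = var_subj / (var_subj + abs_error)   # Phi coefficient
print(f"Dependability coefficient = {dependability:.3f}")
```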

Dependability Coefficient for Shrout & Fleiss (1979) data

The Dependability Coefficient and ICC(2,k)
- The dependability coefficient of Generalizability Theory is equivalent to ICC(2,k), and both are measures of absolute agreement.
- ICC(2,1) can be calculated directly from variance component estimates and is equal to the traditional use of the Correlation Ratio as a measure of the amount of variance accounted for.

Summary of Results of ICC and Generalizability Theory Comparisons

                     Intraclass Coefficients*           Generalizability Theory Coefficients+
                     k = 1            k > 1             k = 1                         k > 1
Consistency          ICC(3,1) = .71   ICC(3,k) = .91    Variance components = .7148   Generalizability = .9093
Absolute Agreement   ICC(2,1) = .29   ICC(2,k) = .62    Variance components = .2897   Dependability = .6200

*Values taken from Shrout & Fleiss (1979). +Values calculated from GENOVA output.

Intraclass and Generalizability Coefficients
- Intraclass Correlation Coefficients are special cases of the one-facet generalizability study (Shrout & Fleiss, 1979).
- The ICC(2,1), ICC(2,k), ICC(3,1), and ICC(3,k) intraclass correlations discussed by Shrout and Fleiss can be calculated from generalizability software (e.g., GENOVA).
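Rather than generalizability software such as GENOVA, here is a plain Python sketch of the variance-components forms of these four intraclass correlations; the component values are hypothetical placeholders, and the formulas follow the consistency versus absolute-agreement distinction summarized above.

```python
# Sketch: Shrout & Fleiss-style ICCs written in terms of the variance
# components of a subjects x raters design.
var_subj = 2.80    # subject variance component (hypothetical)
var_rater = 1.20   # rater variance component (hypothetical)
var_resid = 1.00   # residual component (hypothetical)
k = 4              # number of raters

icc_2_1 = var_subj / (var_subj + var_rater + var_resid)        # absolute agreement, single rater
icc_2_k = var_subj / (var_subj + (var_rater + var_resid) / k)  # absolute agreement, mean of k raters
icc_3_1 = var_subj / (var_subj + var_resid)                    # consistency, single rater
icc_3_k = var_subj / (var_subj + var_resid / k)                # consistency, mean of k raters

print(f"ICC(2,1)={icc_2_1:.3f}  ICC(2,k)={icc_2_k:.3f}  "
      f"ICC(3,1)={icc_3_1:.3f}  ICC(3,k)={icc_3_k:.3f}")
```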

Levels of the Family Tree
Level 1. Study Aim
Level 2. # Study Factors
Level 3. # Levels within Study Factors

Levels of the Family Tree (Level 3. Number of Levels within Study Factors)
Intraclass Coefficients:
- Two Level Designs -> Co-dependency Correlations
- Multilevel Designs -> Multilevel ICCs
Generalizability Theory Coefficients:
- Two Level Designs / Multilevel Designs: historically no distinction made

Levels of the Family Tree
Level 1. Study Aim
Level 2. # Study Factors
Level 3. # Levels within Study Factors
Level 4. Score Standardization

Levels of the Family Tree (Level 4. Score Standardization)
Co-dependency Measures:
- Standardized Scores -> Pearson Product Moment Correlation
- Raw or Partially Standardized Scores -> Intraclass Correlations

Standardized Correlation Ratios
Pearson Correlation = the correlation ratio (η², total-variance definition) computed on scores standardized within rater

The Correlation Ratio and the Pearson Product Moment Correlation
- When subject scores are standardized within rater, the Pearson Product Moment Correlation is equal to the correlation ratio, when η² is defined in terms of total variance.
- A generalized Product Moment Correlation can be defined across all raters simultaneously using the variance components calculated on standard scores (a standardized correlation ratio, η²).

Product Moment Correlations
[Table of between-rater product moment correlations]
Variance components estimate (η²) of rater 1 vs. rater 3 reliability based on Z-scores = .7448

Multi-level Product Moment Correlation
Calculated by standardizing scores within judges and then computing η² using the total-variance components definition. For the Shrout & Fleiss data this value = .7602 and represents a global standardized consistency rating.
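A sketch of this multilevel product-moment correlation: z-score within each rater, estimate the variance components on the standardized scores, and form η² with the total-variance definition. The ratings matrix is hypothetical, not the Shrout & Fleiss data.

```python
# Sketch: a multilevel product-moment correlation via variance components
# computed on within-rater standardized (z) scores.
import numpy as np

X = np.array([            # rows = subjects, columns = raters (hypothetical)
    [7, 5, 6, 6],
    [4, 2, 3, 5],
    [8, 6, 7, 9],
    [5, 3, 4, 4],
    [9, 7, 8, 8],
], dtype=float)
n_s, n_r = X.shape

# Standardize within each rater (column z-scores).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

grand = Z.mean()
ss_subj = n_r * ((Z.mean(axis=1) - grand) ** 2).sum()
ss_rater = n_s * ((Z.mean(axis=0) - grand) ** 2).sum()
ss_resid = ((Z - grand) ** 2).sum() - ss_subj - ss_rater

ms_subj = ss_subj / (n_s - 1)
ms_rater = ss_rater / (n_r - 1)
ms_resid = ss_resid / ((n_s - 1) * (n_r - 1))

var_e = ms_resid
var_subj = max((ms_subj - ms_resid) / n_r, 0.0)
var_rater = max((ms_rater - ms_resid) / n_s, 0.0)

# eta^2 with the total-variance definition on standardized scores.
eta2_std = var_subj / (var_subj + var_rater + var_e)
print(f"multilevel standardized consistency = {eta2_std:.3f}")
# With only two raters, this quantity reduces to the ordinary Pearson r,
# per the equivalence stated on the slide.
```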

Conclusions
- The concept of a correlation ratio relates effect size measures to reliability measures.
- ICCs are Generalizability Theory coefficients for single-facet designs.
- ICC(3,1), ICC(3,k), and the Generalizability Coefficient are all measures of consistency.
- ICC(2,1), ICC(2,k), and the Dependability Coefficient are all measures of absolute agreement.

Conclusions (continued)
- The Pearson Product Moment Correlation is a single-facet, 2-level Correlation Ratio for standard scores and is, thus, a measure of consistency.
- A multilevel Product Moment Correlation is a single-facet, k-level Correlation Ratio for standard scores and is a measure of standardized consistency across all raters.

END