Alpha to Omega and Beyond! Presented by Michael Toland (Educational Psychology) & Dominique Zephyr (Applied Statistics Lab).

Presentation transcript:

Hypothetical Experiment
Suppose you measured a person's perceived self-efficacy with the General Self-Efficacy Scale (GSES) 1,000 times, and suppose the measurements you observe vary between 13 and 17 points. The person's perceived self-efficacy has presumably remained constant, yet the measurements fluctuate. The problem is that it is difficult to get at the true score because of random errors of measurement.

True Score vs. Observed Score
An observed score is made up of two components: Observed Score = True Score + Random Error. The true score is a person's average score over repeated (necessarily hypothetical) administrations. The 1,000 test scores above are observed scores.

Reliability Estimation: Classical Test Theory (CTT)
Across people, we decompose the variability among scores in the same way: S²_X = S²_T + S²_E. Reliability is the ratio of true-score variance to observed-score variance; according to CTT, ρ_xx' = σ²_T / σ²_X. Unfortunately, true-score variance cannot be measured directly, so other ways have been devised to estimate how much of an observed score is due to true score versus measurement error.
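The CTT variance decomposition above can be illustrated with a small simulation (a sketch, not part of the talk; the true-score and error variances below are made-up values):

```python
# Illustrative sketch of the CTT decomposition S^2_X = S^2_T + S^2_E and of
# reliability as the ratio of true-score variance to observed-score variance.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true = rng.normal(loc=15.0, scale=2.0, size=n)   # latent true scores T (hypothetical)
error = rng.normal(loc=0.0, scale=1.0, size=n)   # random measurement error E
observed = true + error                          # X = T + E

var_x = observed.var(ddof=1)
var_t = true.var(ddof=1)
var_e = error.var(ddof=1)

# Because T and E are independent, observed variance is (approximately) additive.
print(var_x, var_t + var_e)         # the two values are nearly identical

reliability = var_t / var_x         # rho_xx' = sigma^2_T / sigma^2_X
print(reliability)                  # close to 4 / (4 + 1) = .80
```

With these made-up variances (4 true, 1 error), the reliability works out to about .80: eighty percent of the observed-score variance reflects true differences between people.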

Concept of Reliability
Reliability is not an all-or-nothing concept; there are degrees of reliability. High reliability tells us that if people were retested, they would probably get similar scores on different versions of a test. Reliability is a property of a set of test scores, not of the test itself. And we are not interested only in performance on one particular set of items.

Why is reliability important?
Reliability affects not only observed-score interpretations: poor reliability can lead to deflated effect-size estimates and nonsignificant results. The more reliable a measure, the closer it is to the true score and the better it differentiates among individuals at different levels; the less reliable a measure, the further it is from the true score and the less it differentiates among individuals.

Measurement Models Under CTT

Assumption                       Parallel  Tau-Equivalent  Essentially Tau-Equivalent  Congeneric
Unidimensional                   Y         Y               Y                           Y
Equal item covariances           Y         Y               Y
Equal item-construct relations   Y         Y               Y
Equal item variance              Y         Y
Equal item error variance        Y

Alpha (α): Cronbach (1951). Omega (ω): McDonald (1970, 1999).

Coefficient Alpha
α = (k² × cov_avg) / ScaleV
cov_avg = average covariance among items; k = the number of items; ScaleV = scale-score variance.
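The alpha formula on this slide can be sketched in code (simulated item responses, not the GSES data from the talk), along with a sanity check against the equivalent textbook form α = k/(k−1)·(1 − Σσ²ᵢ/σ²_X):

```python
# Illustrative sketch: coefficient alpha from raw item responses, using the
# slide's formula alpha = k^2 * (average inter-item covariance) / (scale variance).
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 6
common = rng.normal(size=(n, 1))                     # shared factor (simulated)
items = common + rng.normal(size=(n, k))             # n respondents x k items

cov = np.cov(items, rowvar=False)                    # k x k item covariance matrix
cov_avg = cov[~np.eye(k, dtype=bool)].mean()         # average off-diagonal covariance
scale_var = items.sum(axis=1).var(ddof=1)            # variance of the total (scale) score

alpha = k**2 * cov_avg / scale_var
print(round(alpha, 3))

# Sanity check: the common textbook form is algebraically identical, because
# the scale variance equals the sum of all entries of the covariance matrix.
alpha_alt = k / (k - 1) * (1 - cov.trace() / scale_var)
```

Both expressions give the same number; the slide's version makes explicit that alpha grows with the average inter-item covariance and with the number of items.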

Why have we been using alpha for so long? It is a simple equation, easily calculated in standard software (often the default); there is tradition; it is easy to understand; and researchers are often unaware of other approaches.

Problems with alpha (α)
It is unrealistic to assume that all items have the same item-construct relation and that the item covariances are all equal. Alpha underestimates the population reliability coefficient when a congeneric model holds.

What do we gain with Omega?
Omega does not assume that all items have the same item-construct relations or equal item covariances (those assumptions are relaxed). It is a more consistent (precise) estimator of reliability, and it is not as difficult to estimate as many have come to believe.

One-Factor CFA Model
[Path diagram: a single latent factor, Perceived General Self-Efficacy, with loadings λ1–λ6 to Items 1–6; each item has a unique (error) variance θ11–θ66.]

Coefficient Omega
ω = (Σλᵢ)² / [(Σλᵢ)² + Σθᵢᵢ]
λᵢ = factor pattern loading for item i; k = the number of items; θᵢᵢ = unique variance of item i. Assumes the latent variance is fixed at 1 within the CFA framework.
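Given estimates from a fitted one-factor CFA, omega is straightforward to compute. A minimal sketch (the standardized loadings below are hypothetical, not the GSES estimates from the talk):

```python
# Illustrative sketch: coefficient omega from one-factor CFA estimates, with
# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of unique variances)
# and the latent variance fixed at 1.
import numpy as np

loadings = np.array([0.7, 0.6, 0.8, 0.5, 0.65, 0.75])  # lambda_i (hypothetical)
uniques = 1.0 - loadings**2                             # theta_ii for standardized items

common_var = loadings.sum() ** 2                        # true-score (common) variance
omega = common_var / (common_var + uniques.sum())
print(round(omega, 3))                                  # -> 0.83
```

The numerator is the variance the items share through the factor; the denominator adds the unique variances, so omega is again a true-variance-to-total-variance ratio, but one that lets each item have its own loading.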

Include a CI along with the Reliability Point Estimate
A confidence interval gives a range that estimates the population value of reliability while acknowledging the uncertainty in our reliability estimate. Reporting CIs is recommended by the APA and most peer-reviewed journals.

Mplus input for alpha (  )

Mplus output for alpha (  )

Mplus input for omega

Mplus output for omega

APA-style write-up for coefficients with CIs
α = .61, bias-corrected bootstrap (BC) 95% CI [.56, .66]
ω = .62, bias-corrected bootstrap (BC) 95% CI [.56, .66]
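The talk obtained bias-corrected bootstrap CIs from Mplus; the idea can be sketched with a simpler percentile bootstrap in Python (simulated item data, and a percentile rather than bias-corrected interval, so this is an illustration of the concept, not a reproduction of the Mplus procedure):

```python
# Sketch of a nonparametric bootstrap percentile CI for coefficient alpha.
import numpy as np

def coef_alpha(x):
    """Cronbach's alpha for an (n respondents x k items) array."""
    k = x.shape[1]
    cov = np.cov(x, rowvar=False)
    return k / (k - 1) * (1 - cov.trace() / cov.sum())

rng = np.random.default_rng(2)
n, k = 300, 6
data = rng.normal(size=(n, 1)) + rng.normal(size=(n, k))  # simulated item scores

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)        # resample respondents with replacement
    boot.append(coef_alpha(data[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"alpha = {coef_alpha(data):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Resampling respondents (not items) preserves the item covariance structure within each bootstrap sample, which is what the reliability estimate depends on.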

Limitations of CTT Reliability Coefficients and Future QIPSR Talks
Although omega is a better estimate of reliability than alpha, CTT still assumes a constant amount of reliability across the score continuum. It is well known in the measurement community, however, that reliability/precision is conditional on a person's location along that continuum. Modern measurement techniques such as Item Response Theory (IRT) do not make this assumption and focus on items rather than the total scale itself.

References
Revelle, W., & Zinbarg, R. E. (2008). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. Psychometrika, 74.
Gadermann, A. M., Guhn, M., & Zumbo, B. D. (2012). Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research and Evaluation, 17.
Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6.
Starkweather, J. (2012). Step out of the past: Stop using coefficient alpha; there are better ways to calculate reliability. Benchmarks RSS Matters.
Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74.
Peters, G.-J. Y. (2014). The alpha and the omega of scale reliability and validity: Why and how to abandon Cronbach's alpha and the route towards more comprehensive assessment of scale quality. The European Health Psychologist, 16, 56–69.
Crutzen, R. (2014). Time is a jailer: What do alpha and its alternatives tell us about reliability? The European Health Psychologist, 16.
Dunn, T., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105.
Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19.

Acknowledgements
APS Lab members: Angela Tobmbari, Zijia Li, Caihong Li, Abbey Love, Mikah Pritchard