The psychometrics of Likert surveys: Lessons learned from analyses of the 16pf Questionnaire Alan D. Mead.

Measures are just measures, right?
- One measurement theory (CTT, IRT, G theory, etc.) for all types of measures
- I thought so, before I started teaching at IIT
- But the "devil's in the details," and these details vary considerably:
  - "Getting started" and item writing
  - Item analysis
  - Implementation
- I will use the 16pf as an example; most of these points are true for all Likert instruments

16pf6 Planned Improvements
- Shorter, more reliable Likert scales
- Adaptive Reasoning/B scale
- Improved inattention validity index
- Updated norms
- Measuring the same factor structure as the 16pf5
- Gender-neutral reports
- Even better support for organizational applications

Everything we know about writing personality/Likert items

Get it? This page is BLANK! (A funny joke, but not actually a laughing matter!)

Seriously, what we know about writing personality/Likert items
- Most item-writing guidance is (explicitly) designed for ability testing:
  - Rule 1: Use either the best-answer format or the correct-answer format
  - Rule 11: Avoid cuing one item with another; keep items independent of one another
- Many guidelines are trivial/common sense:
  - Rule 4: Allow time for editing and other types of item revisions
  - Rule 5: Use good grammar, punctuation, and spelling
- Some guidelines are bad advice
- Finally, a very few guidelines actually apply
- Existing guidance either ignores Likert-specific issues…

Seriously, what we know about writing personality/Likert items (cont.)
- Most item-writing guidance is (explicitly) designed for ability testing
- Many guidelines are trivial/common sense
- Some guidelines are bad advice:
  - Rule 22: Word the stem positively; avoid negative phrasing
- Finally, a very few guidelines actually apply:
  - Rule 35: Avoid specific determiners, such as "never" and "always"
- Existing guidance either ignores Likert-specific issues or is based on weak empiricism ("So-and-so found that 4-option scales were optimal [by some unstated criteria], so use those.")

Scoring and Item Analysis
- Generally, Likert items have "directions" rather than correct answers
- Consider these Emotional Stability items:
  - "I rarely feel blue." (positively worded)
  - "I have days where I just cannot cope." (negatively worded)
- The "direction" would be reversed for Neuroticism
- These "are" both Emotional Stability and Neuroticism items
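The reverse-keying idea above can be sketched in a few lines of Python. This is a minimal illustration, not the 16pf's actual scoring rules; the item labels, the example responses, and the 5-point response scale are all assumptions:

```python
# Hypothetical sketch of reverse-keying a negatively worded Likert item.
# Assumes a 1..n_points response scale; item names are made up for illustration.

def reverse_key(response: int, n_points: int = 5) -> int:
    """Reflect a 1..n_points response so a negatively worded item
    scores in the same direction as the rest of its scale."""
    if not 1 <= response <= n_points:
        raise ValueError("response out of range")
    return n_points + 1 - response

# Emotional Stability example: "I rarely feel blue" is keyed positively,
# "I have days where I just cannot cope" is keyed negatively.
responses = {"rarely_blue": 4, "cannot_cope": 2}
scale_score = responses["rarely_blue"] + reverse_key(responses["cannot_cope"])
print(scale_score)  # 4 + (5 + 1 - 2) = 8
```

Keying the same two items for Neuroticism would simply swap which item gets reverse-keyed.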

Scoring and Item Analysis (cont.)
- Several unique aspects of item analysis:
  - Item mean is not very important
  - Corrected item-total correlation (CITC) is critical
  - CITC should be greater than cross-scale correlations
  - Factor analysis (item loadings) may be important
- Greater concern about item homogeneity
- Scales tend to be far shorter (3 items!?!)
  - But individual items are informative
  - But short scales may preclude analyses requiring long scales
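The CITC mentioned above is just the correlation between an item and the sum of the *other* items on its scale. A minimal sketch, using toy data (the responses below are invented, not 16pf data):

```python
# Sketch of corrected item-total correlation (CITC): correlate each item
# with the total of the remaining items so the item's own variance does
# not inflate the correlation. Toy data, stdlib only.
import statistics

def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def citc(items, index):
    """items: one response vector per item; index: the item to evaluate."""
    target = items[index]
    rest_total = [sum(vals) for vals in
                  zip(*(it for i, it in enumerate(items) if i != index))]
    return pearson(target, rest_total)

# Five respondents, three items on one (very short!) scale
scale = [
    [1, 2, 3, 4, 5],
    [2, 2, 4, 4, 5],
    [1, 3, 3, 5, 4],
]
print(round(citc(scale, 0), 3))
```

In practice one would also compute each item's correlation with *other* scales' totals and confirm the CITC is larger, per the rule above.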

IRT Analysis
- Much greater variety of statistical models:
  - Linear models (SEM) and non-linear models (IRT)
  - Various ordinal models (GRM, GPCM)
  - Nominal polytomous models (the nominal model)
  - Multidimensional models
  - Ideal-point models (the above are all dominance models)
- Calibration issues:
  - Larger N required for polytomous models
  - Some response options cannot be fit
  - Short scales preclude asymptotic chi-square tests of fit
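To make the ordinal models concrete, here is a sketch of the graded response model (GRM), where category probabilities are differences of adjacent boundary curves. The discrimination and threshold values are illustrative assumptions, not calibrated 16pf parameters:

```python
# Sketch of GRM category probabilities. For K+1 ordered categories with
# thresholds b_1 < ... < b_K, the boundary curve P*(X >= k) is a 2PL
# logistic, and P(X = k) = P*(X >= k) - P*(X >= k+1).
import math

def grm_probs(theta, a, thresholds):
    """Return P(X = k) for k = 0..len(thresholds) under the GRM."""
    stars = ([1.0]
             + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
             + [0.0])  # P*(X >= 0) = 1, P*(X >= K+1) = 0
    return [stars[k] - stars[k + 1] for k in range(len(thresholds) + 1)]

# Illustrative 5-point item: a = 1.5, thresholds spread around zero
probs = grm_probs(theta=0.0, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])  # five category probabilities, summing to 1
```

Fitting such a model requires estimating one discrimination and K thresholds per item, which is part of why polytomous calibration needs the larger samples noted above.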

IRT Analysis (cont.)
- Items have far greater information
  - Which increases with more response points (to some limit)
- Perversely, information is less important:
  - Items have wide information
  - Less motivation to use "IRT scoring"
  - Item "difficulty" is not a significant issue
  - CTT Likert scoring is generally good enough
- Short scales may damage theta-hat estimates
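The "more response points, more information" claim can be checked numerically. This sketch computes Fisher information, I(θ) = Σₖ (dPₖ/dθ)²/Pₖ, by finite differences for a GRM item, comparing a 5-point item to a dichotomous (2PL) item with the same discrimination; all parameter values are illustrative assumptions:

```python
# Numerical check that a polytomous item carries more information than a
# dichotomous item of equal discrimination. Fisher information is
# I(theta) = sum_k (dP_k/dtheta)^2 / P_k, with derivatives approximated
# by central differences. Illustrative parameters only.
import math

def grm_probs(theta, a, thresholds):
    stars = ([1.0]
             + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
             + [0.0])
    return [stars[k] - stars[k + 1] for k in range(len(thresholds) + 1)]

def information(theta, a, thresholds, h=1e-5):
    lo = grm_probs(theta - h, a, thresholds)
    hi = grm_probs(theta + h, a, thresholds)
    mid = grm_probs(theta, a, thresholds)
    return sum(((u - l) / (2 * h)) ** 2 / p for l, u, p in zip(lo, hi, mid))

# Same a; a 5-point item vs. a dichotomous item (one threshold = 2PL)
info_poly = information(0.0, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
info_dich = information(0.0, a=1.5, thresholds=[0.0])
print(info_poly > info_dich)  # True: more response points -> more information
```

For the dichotomous case this reduces to the familiar a²P(1-P), about 0.56 here, while the 5-point item yields roughly 0.70 at θ = 0.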

Administration
- Traditional CAT has far less value; MCAT makes more sense
- Strong preference for multiple items per screen
- More formatting choices:
  - Traditional horizontal "scale" format or vertical format
  - Technologically enhanced formats, like sliders
- Choice of response scales:
  - How many points? (4-7?)
  - Include a midpoint?
  - Forced choice ("team player" or "talkative"?)
  - Choice of anchors (agreement, frequency, etc.)

Usage and Validity
- Reporting issues:
  - "Score security" (differing threats)
  - Effects of bipolarity
- Validity:
  - Greater care must be taken to ensure content validity
  - Criterion-related validities are more modest
  - Construct validity may be more important

Thank You! Questions? amead@alanmead.org