Crash Course in Psychometric Theory David B. Flora SP Area Brownbag February 8, 2010.

 Research in social and personality psychology is about abstract concepts of theoretical importance, called “constructs.”  Examples include “prejudice,” “self-esteem,” “introversion,” “forgiveness,” and on and on…  The success of a research study depends on how well constructs of interest are measured.  The field of “Test Theory” or “Psychometrics” is concerned with the theory and accompanying research methods for the measurement of psychological constructs.

 Psychometric theory evolved from the tradition of intelligence, or “mental ability”, testing.  Spearman (1904) invented factor analysis to aid in the measurement of intelligence.  The psychophysics tradition is also foundational to psychometric theory, as per Thurstone’s (1928) law of comparative judgment for scaling of social stimuli.  A test question is a stimulus; the answer to the question is a behavioural response to the stimulus.

Classical True Score Model x_i = t_i + e_i x_i is the observed value for person i from an operationalization of a construct (e.g., a test score). t_i is that person’s true score on the construct. e_i is measurement error. The variable t is a latent variable: an unobservable variable that is measured by the observable variable x.

 Lord & Novick’s (1968) preferred definition of the true score (paraphrased): For a given person, there is a “propensity” distribution of possible outcomes of a measurement that reflects the operation of processes such as momentary fluctuations in memory and attention or in strength of an attitude. The person’s true score is the mean of this propensity distribution. Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Validity x_i = t_i + e_i, or t_i = x_i − e_i  Validity denotes the scientific utility of the scores, x, obtained with a measuring instrument (i.e., a test).  But there is more to it than just the size of e_i.  Validity is mostly concerned with whether x measures the t that we want it to…  Note that validity is a property of the scores obtained from a test, not the test itself.

Nunnally & Bernstein (1994), Psychometric Theory (3rd ed.), p. 84: “Validation always requires empirical investigations, with the nature of the measure and form of validity dictating the needed form of [empirical] evidence.” “Validation usually is a matter of degree rather than an all-or-none property, and validation is an unending process.” “Strictly speaking, one validates the use to which a measuring instrument is put rather than the instrument itself. Tests are often valid for one purpose but not another.”

You may have heard of  Internal validity  External validity  Face validity  Content validity  Construct validity  Criterion validity  Predictive validity  Postdictive validity  Concurrent validity  Factorial validity  Convergent validity  Discriminant validity  Incremental validity  Ecological validity

Standards  Standards for Educational and Psychological Testing (1966; 1974; 1985; 1999) is developed jointly by AERA, APA, and NCME.  The Standards view validity as a unitary concept.  Rather than there being separate types of validity, there are three main types of validity evidence. 1. Content-related evidence 2. Construct-related evidence 3. Criterion-related evidence

Content-related validity evidence  Content validity refers to the extent to which a set of items (or stimuli) adequately reflects a content domain.  E.g., selection of vocabulary words for a Grade 6 vocabulary test from the domain of all words taught to 6th graders.  Evidence is based on theoretical judgment.  Same as face validity? - e.g., a self-report judgment of overall health

Construct-related validity evidence  Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests.  Mainly concerned with associations between test scores and other variables that are dictated by theory.  Multitrait-multimethod correlation matrix (Campbell & Fiske, 1959): Is the test strongly correlated with other measures of the same construct? (convergent validity) Is the test less strongly correlated with measures of different constructs than with measures of the same construct? (discriminant validity)

Floyd & Widaman (1995), p. 287:  “Construct validity is supported if the factor structure of the [instrument] is consistent with the constructs the instrument purports to measure.”  “If the factor analysis fails to detect underlying constructs [i.e., factors] that explain sufficient variance in the [items] or if the constructs detected are inconsistent with expectations, the construct validity of the scale is compromised.” Floyd, F. J., & Widaman, K. F. (1995). Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment, 7,

Criterion-related validity evidence  Evidence is based on empirical association with some important “gold standard” criterion.  Encompasses predictive and concurrent validity.  Difficult to distinguish from construct validity - Theoretical reason for association is critical for construct validity, less important for criterion validity.  E.g., relationship between a stress measure and physical health?

Do we really need your new scale? Does it have incremental validity? “Incremental validity is defined as the degree to which a measure explains or predicts a phenomenon of interest, relative to other measures. Incremental validity can be evaluated on several dimensions, such as sensitivity to change, diagnostic efficacy, content validity, treatment design and outcome, and convergent validity.” Haynes, S. N., & Lench, H. (2003). Incremental validity of new clinical assessment measures. Psychological Assessment, 15,

Reliability  Reliability is necessary, but not sufficient, for construct validity.  Lack of reliability (i.e., measurement error) introduces bias in analyses and reduces statistical power.  What exactly is reliability? x i = t i + e i Reliability = Var(t i ) / Var(x i ) Reliability is the proportion of true score variance to total observed variance.

 Since we can’t directly observe Var(t_i), we must turn to other methods for estimating reliability…  Parallel-forms reliability  Split-half reliability  Internal consistency reliability (coefficient alpha)  Test-retest reliability  Inter-rater reliability Each is an estimate of the proportion of true score variability to total variability.

Coefficient alpha (α)  Original formula actually given by Guttman (1945), not Cronbach (1951)!  An average of all inter-item correlations, weighted by the number of items, k: α = k·r̄ / [1 + (k − 1)·r̄], where r̄ is the mean inter-item correlation.  The expected correlation of one test with an alternate form containing the same number of items.

Coefficient alpha (α)  The more items, the larger α.  A high α does NOT imply unidimensionality (i.e., that items all measure a single factor).  α is a lower-bound estimate of true reliability…
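As a sketch, α can also be computed from the equivalent item-variance form, α = (k / (k − 1)) · (1 − Σ Var(item) / Var(total)); the item scores below are invented for illustration:

```python
# Sketch: coefficient alpha from a matrix of item scores (invented data).
import statistics

def cronbach_alpha(items):
    """items: list of per-item score lists, all the same length (one score per person)."""
    k = len(items)
    persons = list(zip(*items))                            # rows = persons
    total_var = statistics.variance([sum(p) for p in persons])
    item_vars = sum(statistics.variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Four hypothetical 5-point items answered by six people
items = [
    [4, 3, 5, 2, 4, 3],
    [5, 3, 4, 2, 4, 2],
    [4, 2, 5, 1, 5, 3],
    [4, 3, 4, 2, 4, 3],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Because the items here are highly consistent across persons, α comes out high; adding more equally good items would push it higher still, per the slide above.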

How does factor analysis fit in? “Common factor model” for a “congeneric” set of items measuring a single construct: x_ij = λ_j f_i + u_ij x_ij is person i’s score on the jth item of a multi-item test. f_i is person i’s score on the common factor, or latent variable. λ_j is the factor loading of test item j. u_ij is the unique factor score for item j and person i. It represents a mixture of systematic influence and random error influence on item x_j: u_ij = (s_ij + e_ij)

 If we define t_ij = λ_j f_i and assume that the systematic unique influence is negligible, so that u_ij ≈ (0 + e_ij)…  …then the common factor model gives the Classical True Score model for scores on item j: x_ij = λ_j f_i + u_ij becomes x_ij = t_ij + e_ij  Coefficient α will underestimate reliability to the extent that the factor loadings, λ_j, vary across items.  More accurate reliability estimates can be calculated using the factor loadings. - Perspective shifts from internal consistency to latent variable relationship
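A minimal sketch of such a factor-based reliability estimate (in the spirit of McDonald's omega) for a sum score, assuming a single factor; the standardized loadings below are invented:

```python
# Sketch of a factor-based composite reliability (McDonald's omega style)
# for a single-factor test. Loadings and uniquenesses are invented.
def omega(loadings, unique_vars):
    """Reliability of a sum score under the common factor model:
    (sum of loadings)^2 / [(sum of loadings)^2 + sum of unique variances]."""
    common = sum(loadings) ** 2             # variance due to the common factor
    return common / (common + sum(unique_vars))

# Unequal loadings: this is exactly the case where alpha understates reliability
loadings = [0.9, 0.7, 0.5, 0.3]
unique_vars = [1 - l ** 2 for l in loadings]  # standardized items
print(round(omega(loadings, unique_vars), 3))
```

With equal loadings, this estimate and α coincide; the more the λ_j spread out, as here, the larger the gap.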

Tangential things you should know…  Principal components analysis (PCA) is NOT factor analysis. When you run a PCA, you are NOT estimating the common factor model.  Situations where PCA is appropriate are quite rare in social and personality psychology.  The Pearson product-moment correlation is often NOT adequate for describing the relationships among item-level categorical variables!  When factor analyzing items, we should usually use something other than product-moment correlations.  One approach is to analyze polychoric correlations.

Modern Psychometric Theory  Another approach that properly models item- level variables as categorical is Item Response Theory (IRT).  IRT represents a collection of models for relating individual items within a test or scale to the latent variable(s) they measure.  IRT leads to test scores with smaller measurement error than traditional item sums or means.

IRT  The properties of each item are summarized with an item characteristic curve (ICC).  The slope of the curve indicates item discrimination, i.e., the strength of relationship between the item and the latent construct.  The horizontal location of the curve indicates item difficulty or severity.

 X-axis, “theta,” represents latent trait or construct.  Y-axis represents probability of a positive item response. Item characteristic curves (ICCs) for four binary items with equal discrimination but varying “difficulty.”

Item characteristic curves (ICCs) for four binary items with varying discrimination and varying difficulty  Items 1 and 2 have stronger discrimination than 3 and 4.  Item 1 has the lowest difficulty, item 4 the highest.

 A “test information function”  Shows precision of measurement as a function of latent trait level

IRT scores  Scale scores constructed using IRT - take into account item discrimination, whereas simple sum (or mean) scores assume all items measure the construct equally well - have a proper interval scale of measurement, whereas simple sum scores are typically ordinal, strictly speaking - have measurement error that varies across the range of the construct, whereas simple sum scores assume a single reliability value for the whole range

The big picture  IRT was often presented as an alternative approach to test theory at odds with classical test theory (CTT).  Current perspective is that CTT and IRT complement and enhance each other. -For example, the mathematical link between IRT and factor analysis is now well understood.  A well-validated test will still produce scores with measurement error.  Ideas from CTT, IRT, and structural equation modeling can be implemented to produce powerful results that account for measurement error, thus modeling relationships among the constructs themselves rather than the operational variables.