Psychometrics
Timothy A. Steenbergh and Christopher J. Devers
Indiana Wesleyan University
Overview
A. Psychometrics
B. Classical Test Theory
C. Reliability
D. Validity
A. Psychometrics
Psychological measurement: reliability, validity, tests, and items (Jones & Thissen, 2007; Kaplan & Saccuzzo, 2012)
B. Classical Test Theory
The foundation for reliability: every observed score (X) is a true score (T) plus measurement error (E), i.e., X = T + E (Kline, 2005)
For those who like pictures…
[Figures: three diagrams using Beck Depression Inventory (BDI) scores as the running example. The first shows the proportion of true score to observed score: the BDI score (X) is the observed score and depression level is the true score. The second adds measurement error (E). The third, "Adding it up…", shows that depression level (true score) plus measurement error equals the observed score. A simulation sketch follows.]
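To make the diagrams concrete, here is a minimal simulation sketch of the X = T + E model (not from the slides; the BDI-style distributions are invented for illustration). Reliability falls out as the share of observed-score variance attributable to true scores.

```python
# A minimal classical test theory simulation (illustrative numbers only):
# observed scores are true scores plus random measurement error.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

true_score = rng.normal(loc=20, scale=8, size=n)  # hypothetical depression levels
error = rng.normal(loc=0, scale=4, size=n)        # measurement error, mean zero
observed = true_score + error                     # X = T + E

# Reliability = var(T) / var(X); here ~ 64 / (64 + 16) = .80
print(f"Simulated reliability: {true_score.var() / observed.var():.2f}")
```

With these made-up variances, reliability comes out near .80, the research threshold given on a later slide.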
C. Reliability
What does it mean to be reliable? Consistency of scores over time, across test forms, or across variable testing conditions.
Types of reliability:
- Test-Retest
- Inter-item (internal)
- Inter-rater
(Anastasi, 1988)
C.1. Test-Retest Reliability
Are test scores stable over time? Give the test to the same group at two points in time and correlate the scores.
Must consider the stability of the construct when:
- establishing the test-retest interval
- interpreting the test-retest correlation
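In practice this is just a Pearson correlation between the two administrations. A minimal sketch (the scores below are hypothetical):

```python
# Test-retest reliability: correlate the same group's scores at two time points.
from scipy.stats import pearsonr

time1 = [12, 18, 25, 9, 30, 14, 22, 11, 27, 16]   # made-up scores at time 1
time2 = [14, 17, 23, 11, 28, 15, 24, 10, 29, 18]  # same people, retested

r, p = pearsonr(time1, time2)
print(f"Test-retest reliability: r = {r:.2f}")
```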
C.2. Internal (Inter-item) Consistency
Assumption: a composite score must be made up of items that measure the same phenomenon. Heterogeneous items will produce a lower internal consistency reliability coefficient.
Measures of internal consistency (see the sketch below):
- Split-half
- Cronbach's Alpha (coefficient α)
- Kuder-Richardson 20 (KR-20; for dichotomous items)
(Pedhazur & Schmelkin, 1991)
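As a concrete sketch, Cronbach's alpha can be computed from scratch with the standard formula α = (k / (k - 1)) × (1 - Σ item variances / variance of the total score); the item-response matrix below is made up for illustration:

```python
# Cronbach's alpha from scratch (hypothetical Likert-type responses).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the composite score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

responses = np.array([  # 6 respondents x 4 items, invented data
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 4, 5],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```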
Interpreting Reliability Coefficients
What is a reasonable level of reliability?
- Research: ≥ .80
- Clinical: ≥ .90
Factors to consider when evaluating a reliability coefficient:
- Stability of the construct
- Dimensional nature of the construct (uni- vs. multi-dimensional)
- Number of items (short tests are less reliable; see the sketch below)
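One standard way to quantify the "number of items" point is the Spearman-Brown prophecy formula (not named on the slide): lengthening a test n-fold is projected to raise reliability r to n·r / (1 + (n - 1)·r). A minimal sketch:

```python
# Spearman-Brown prophecy formula: projected reliability of a lengthened test.
def spearman_brown(reliability: float, n: float) -> float:
    return (n * reliability) / (1 + (n - 1) * reliability)

# Doubling a test with reliability .70 projects to ~.82:
print(f"{spearman_brown(0.70, 2):.2f}")
```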
C.3. Inter-Rater Reliability
Accuracy (consistency) with which different raters arrive at the same scores. Extremely important for tests that require any rater judgment (e.g., WAIS vocabulary).
Agreement is computed with the kappa statistic, which ranges from -1.0 to +1.0:
- κ = 1.0: perfect agreement
- κ = 0: chance agreement
- κ = -1.0: less than chance agreement
Benchmarks: .40-.75 "fair"; > .75 "excellent" (Fleiss, 1981)
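For two raters, a common implementation is Cohen's kappa; Fleiss (1981), whom the slide cites, generalizes the statistic to three or more raters. A minimal sketch with scikit-learn (the ratings are made up):

```python
# Inter-rater agreement for two raters via Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical pass/fail judgments
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

print(f"Cohen's kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
```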
D. Validity
If something is valid, what does that mean? Validity: the degree to which a test measures what it purports to measure.
Types:
- Content
- Criterion-related
- Construct
D.1. Content Validity
How well does the instrument sample from the domain of interest? Lack of adequate item sampling can lead to invalid findings.
Examples: GBQ (see p. 144 of article); WAIS
Assess with expert raters (one way to quantify this is sketched below)
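One common way to turn expert ratings into a number is Lawshe's content validity ratio (an illustration; this particular index is not named on the slide): CVR = (n_e - N/2) / (N/2), where n_e experts out of N rate an item "essential".

```python
# Lawshe's content validity ratio (CVR) for a single item; illustrative only.
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    return (n_essential - n_experts / 2) / (n_experts / 2)

# 9 of 10 hypothetical experts rate an item essential:
print(f"CVR = {content_validity_ratio(9, 10):.2f}")  # 0.80
```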
D.2. Criterion-Related Validity
Does the test score correlate with other measures as we would expect?
- Concurrent validity: the test score relates to a criterion measured at the same time
- Predictive validity: the test score predicts a future criterion
Validity coefficient: the correlation between the test score and the criterion measure (sketched below)
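A minimal sketch of both coefficients, assuming hypothetical test and criterion scores:

```python
# Validity coefficients: correlate test scores with a concurrent criterion
# (measured at the same time) and a future criterion (measured later).
from scipy.stats import pearsonr

test_scores = [55, 62, 48, 70, 66, 51, 59, 73, 45, 68]
concurrent_criterion = [50, 60, 45, 72, 63, 49, 61, 70, 47, 65]
future_criterion = [52, 58, 50, 69, 60, 48, 57, 71, 44, 66]

r_conc, _ = pearsonr(test_scores, concurrent_criterion)
r_pred, _ = pearsonr(test_scores, future_criterion)
print(f"Concurrent validity: r = {r_conc:.2f}")
print(f"Predictive validity: r = {r_pred:.2f}")
```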
D.3. Construct Validity
Is there evidence that the measure adequately assesses the construct of interest?
- Do test scores change over time, or as a result of certain events, as theorized?
- Are items homogeneous, or do certain items "hang together"? (factor analysis)
Factor Analysis
Statistical method for examining underlying constructs (latent traits) within a test. Uses correlation matrices to identify underlying relationships among test items.
Example: GBQ (a generic sketch follows)
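A generic sketch with scikit-learn, assuming simulated data (the slides' GBQ data are not available here): six items driven by two latent traits should recover a clean two-factor loading pattern.

```python
# Exploratory factor analysis on simulated item responses.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# 200 respondents, 6 items, 2 latent traits.
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],   # items 1-3: trait 1
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])  # items 4-6: trait 2
items = latent @ loadings.T + rng.normal(scale=0.3, size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_, 2))  # estimated loadings show the 2-factor structure
```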
Overview
Psychometrics: psychological measurement
Classical Test Theory
Reliability: test-retest; inter-item (internal); inter-rater
Validity: content; criterion-related; construct
(Trochim, 2006)
Resources
Software: SPSS, PSPP, R
Videos: Educator.com; CLI: Research Seminars; Andy Field
Websites: Social Research Methods; Institute for Digital Research and Education; Statistics Help for Students; Stat Pages
References
Anastasi, A. (1988). Psychological testing (6th ed.). New York, NY: Macmillan.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York, NY: John Wiley & Sons.
Jones, L. V., & Thissen, D. (2007). A history and overview of psychometrics. Handbook of Statistics, 26, 1-28.
Kaplan, R., & Saccuzzo, D. (2012). Psychological testing: Principles, applications, and issues. Belmont, CA: Cengage Learning.
Kline, T. J. B. (2005). Classical test theory: Assumptions, equations, limitations, and item analyses. In T. J. B. Kline, Psychological testing: A practical approach to design and evaluation (pp. 91-106). Thousand Oaks, CA: Sage.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum.
Trochim, W. M. K. (2006). Reliability and validity. Retrieved from http://www.socialresearchmethods.net/kb/relandval.php
Questions
tim.steenbergh@indwes.edu
christopherdevers@gmail.com
EdProfessor.com