Slide 1
Session 3.3: Inter-rater reliability
Funded through the ESRC's Researcher Development Initiative, Department of Education, University of Oxford
Slide 3: Inter-rater reliability
- Aim of the co-judge procedure: to discern
  - consistency within a coder
  - consistency between coders
- Take care when making inferences based on little information
- Phenomena that are impossible to code become missing values
Slide 4: Inter-rater reliability
- Percent agreement: common but not recommended
- Cohen's kappa coefficient
  - Kappa is the proportion of the optimum improvement over chance attained by the coders: 1 = perfect agreement, 0 = agreement no better than expected by chance, -1 = perfect disagreement
  - Kappas over .40 are considered a moderate level of agreement (but there is no clear basis for this "guideline")
- Correlation between different raters
- Intraclass correlation: agreement among multiple raters, corrected for the number of raters using the Spearman-Brown formula (r); see the sketch after this list
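The slides do not work through the Spearman-Brown step itself. The following is a minimal Python sketch, assuming three coders and an average inter-rater correlation of .873 (the value reported on the continuous-IV slide below); the function name is illustrative.

# Spearman-Brown prophecy formula: reliability of the mean rating of k coders,
# given the average correlation between pairs of coders (mean_r).
def spearman_brown(mean_r, k):
    return k * mean_r / (1 + (k - 1) * mean_r)

# Three coders whose codings intercorrelate .873 on average:
print(spearman_brown(0.873, k=3))  # about 0.95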
Slide 5: Inter-rater reliability of categorical IV (1)
Percent exact agreement = (number of observations agreed on) / (total number of observations)
Example: a categorical IV with 3 discrete scale steps; 9 of the 12 ratings are the same, so percent exact agreement = 9/12 = .75 (a worked sketch follows).
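A minimal Python sketch of the same calculation; the codes below are hypothetical and chosen only to reproduce the 9-of-12 agreement on the slide.

# Percent exact agreement for two coders rating 12 objects on a 3-step scale.
rater1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3]
rater2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 1]

agreed = sum(a == b for a, b in zip(rater1, rater2))
print(agreed / len(rater1))  # 9/12 = 0.75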
Slide 6: Inter-rater reliability of categorical IV (2), unweighted kappa
- Kappa: positive values indicate how much the raters agree over and above chance alone
- Negative values indicate disagreement
- If the agreement matrix is irregular, kappa will either not be calculated or will be misleading (the underlying formula is sketched below)
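Unweighted kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the marginals. A minimal Python sketch, reusing the hypothetical codes above:

from collections import Counter

rater1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3]
rater2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 1]
n = len(rater1)

p_o = sum(a == b for a, b in zip(rater1, rater2)) / n       # observed agreement
c1, c2 = Counter(rater1), Counter(rater2)
p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2  # chance agreement
print((p_o - p_e) / (1 - p_e))                              # 0.625 for these data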
Slide 7: Inter-rater reliability of categorical IV (3), unweighted kappa in SPSS

CROSSTABS
  /TABLES=rater1 BY rater2
  /FORMAT=AVALUE TABLES
  /STATISTIC=KAPPA
  /CELLS=COUNT
  /COUNT ROUND CELL.
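For readers working outside SPSS, the same statistic can be cross-checked in Python, assuming scikit-learn is installed; this reproduces the hand calculation above.

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3]  # same hypothetical codes as above
rater2 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 2, 3, 1]
print(cohen_kappa_score(rater1, rater2))        # 0.625, matching the hand calculation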
Slide 8: Inter-rater reliability of categorical IV (4), kappas in irregular matrices
- If rater 2 is systematically "above" rater 1 when coding an ordinal scale, kappa will be misleading
- It is possible to "fill up" the agreement matrix with zeros
- The slide contrasts two agreement matrices, with K = .51 and K = -.16 (a sketch of the problem follows)
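A minimal Python sketch of that failure mode, using hypothetical ordinal codes in which rater 2 is always one step above rater 1: exact agreement is zero, so unweighted kappa turns negative even though the coders clearly track each other, while a weighted kappa is less harsh.

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 1, 2, 2, 3, 3, 4, 4]
rater2 = [2, 2, 3, 3, 4, 4, 5, 5]   # systematically one step "above"

print(cohen_kappa_score(rater1, rater2))                    # negative: apparent disagreement
print(cohen_kappa_score(rater1, rater2, weights="linear"))  # partial credit for near-misses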
Slide 9: Inter-rater reliability of categorical IV (5), kappas in irregular matrices
- If there are no observations in some row or column, kappa will not be calculated
- It is possible to "fill up" the agreement matrix with zeros
- On the slide, K cannot be estimated for the incomplete matrix; after filling up, K = .47
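The "fill up with zeros" fix amounts to making the agreement matrix square over the full category set. A minimal Python sketch with hypothetical codes (in SPSS the non-square table blocks the kappa computation; here, passing the full label set plays the same role):

from sklearn.metrics import cohen_kappa_score, confusion_matrix

rater1 = [1, 1, 2, 2, 3, 3]   # rater 1 uses all three categories
rater2 = [1, 1, 2, 2, 2, 2]   # rater 2 never uses category 3

labels = [1, 2, 3]            # full category set: the empty column is kept as zeros
print(confusion_matrix(rater1, rater2, labels=labels))
print(cohen_kappa_score(rater1, rater2, labels=labels))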
Slide 10: Inter-rater reliability of categorical IV (6), weighted kappa using a SAS macro

PROC FREQ DATA = int.interrater1;
  TABLES rater1 * rater2 / AGREE;
  TEST KAPPA;
RUN;

Papers and macros are available for estimating kappa when there are unequal or misaligned rows and columns, or multiple raters.
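Weighted kappa gives partial credit for near-misses on an ordinal scale (the AGREE option in the SAS step requests both the simple and the weighted statistic). As an alternative illustration, a Python sketch with hypothetical codes, assuming scikit-learn:

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 2, 2, 3, 3, 4, 4, 5]
rater2 = [1, 2, 3, 3, 4, 4, 5, 5]

print(cohen_kappa_score(rater1, rater2, weights="linear"))     # linear disagreement weights
print(cohen_kappa_score(rater1, rater2, weights="quadratic"))  # quadratic disagreement weights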
Slide 11: Inter-rater reliability of continuous IV (1)
- Average correlation: r = (.873 + .879 + .866) / 3 = .873
- The coders must code in the same direction! (a sketch of the calculation follows)
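A minimal Python sketch of the average inter-coder correlation for three coders; the ratings and coder names below are hypothetical.

from itertools import combinations
import numpy as np
from scipy.stats import pearsonr

ratings = {
    "coder1": [3.0, 4.5, 2.0, 5.0, 3.5, 4.0],
    "coder2": [2.5, 4.0, 2.5, 5.0, 3.0, 4.5],
    "coder3": [3.0, 4.0, 2.0, 4.5, 3.5, 4.0],
}

# Pearson correlation for each pair of coders, then the average of the three.
pairwise = [pearsonr(ratings[a], ratings[b])[0]
            for a, b in combinations(ratings, 2)]
print(np.mean(pairwise))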
Slide 12: Inter-rater reliability of continuous IV (2)
Slide 13: Inter-rater reliability of continuous IV (3) (an ICC sketch follows this list)
- Design 1, one-way random effects model: each study is rated by a different pair of coders
- Design 2, two-way random effects model: a random pair of coders rates all studies
- Design 3, two-way mixed effects model: ONE pair of coders rates all studies
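A minimal Python sketch of the three intraclass-correlation designs, assuming the pingouin package is installed; the data frame, column names, and values are hypothetical, and pingouin labels the corresponding estimates ICC1, ICC2, and ICC3.

import pandas as pd
import pingouin as pg

# Long-format data: each study rated by the same two coders (hypothetical values).
long_data = pd.DataFrame({
    "study":  [1, 1, 2, 2, 3, 3, 4, 4],
    "coder":  ["A", "B"] * 4,
    "rating": [3.0, 2.5, 4.5, 4.0, 2.0, 2.5, 5.0, 5.0],
})

icc = pg.intraclass_corr(data=long_data, targets="study",
                         raters="coder", ratings="rating")
# ICC1 ~ Design 1 (one-way random), ICC2 ~ Design 2 (two-way random),
# ICC3 ~ Design 3 (two-way mixed).
print(icc[["Type", "ICC"]])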
Slide 14: Comparison of methods (from Orwin, p. 153; in Cooper & Hedges, 1994)
- Kappa can be low even when the agreement rate (AR) is good, when there is little variability across items and the coders agree (illustrated in the sketch below)
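A minimal Python sketch of that situation with hypothetical codes: almost every item falls into the same category, so percent agreement is high but chance agreement is nearly as high, and kappa collapses.

from sklearn.metrics import cohen_kappa_score

rater1 = [1] * 18 + [2, 1]
rater2 = [1] * 18 + [1, 2]

agreement = sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)
print(agreement)                          # 0.90
print(cohen_kappa_score(rater1, rater2))  # slightly negative, despite 90% agreement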
Slide 15: Inter-rater reliability in meta-analysis and in primary studies
Slide 16: Inter-rater reliability in meta-analysis vs. in other contexts
Meta-analysis: coding of independent variables
- How many co-judges?
- How many objects to co-judge? (a sub-sample of studies versus a sub-sample of codings)
- Use of a "gold standard" (i.e., one "master coder")
- Coder drift (cf. observer drift): are coders consistent over time?
- Your qualitative analysis is only as good as the quality of your categorisation of the qualitative data