
1 Statistical Analysis of Scorer Interrater Reliability
Jenna Porter and David Jelinek, Sacramento State University

2 Background & Overview
- Piloted since 2003-2004; began with our own version of training and calibration
- Overarching question: scorer interrater reliability, i.e., are we confident that the TE scores our candidates receive are consistent from one scorer to the next?
- If not, what are the reasons? What can we do about it?
- Jenna: analysis of the data to get at these questions
- Then we'll open it up to "Now what?" "Is there anything we can/should do about it?" "Like what?"

3 Presentation Overview
- Is there interrater reliability among our PACT scorers? How do we know?
- Two methods of analysis
- Results
- What do we do about it? Implications

4 Data Collection
- 8 credential cohorts: 4 Multiple Subject and 4 Single Subject
- 181 Teaching Events total
  - 11 rubrics (excludes the pilot Feedback rubric)
  - 10% randomly selected for double scoring
  - 10% of TEs were failing and double scored
- 38 Teaching Events were double scored (20%)

5 Scoring Procedures
- Trained and calibrated scorers
  - University faculty
  - Calibrate once per academic year
- Followed the PACT calibration standard; scores must:
  - Result in the same pass/fail decision (overall)
  - Have exact agreement with the benchmark at least 6 times
  - Be within 1 point of the benchmark
- All TEs scored independently once
  - If failing, scored by a second scorer and the evidence reviewed by the chief trainer
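
The calibration standard above can be expressed as a simple check. The sketch below is not the official PACT tool; the function name, the 1-4 score scale, and the example scores are made up for illustration, and the overall pass/fail decisions are taken as given because the pass rule itself is not spelled out on the slide.

    def is_calibrated(scorer_scores, benchmark_scores,
                      scorer_pass, benchmark_pass,
                      min_exact=6, max_diff=1):
        """Return True if a scorer meets all three calibration criteria listed above."""
        pairs = list(zip(scorer_scores, benchmark_scores))
        same_decision = scorer_pass == benchmark_pass                # same overall pass/fail call
        exact_hits = sum(s == b for s, b in pairs)                   # exact matches with benchmark
        within_one = all(abs(s - b) <= max_diff for s, b in pairs)   # never off by more than 1 point
        return same_decision and exact_hits >= min_exact and within_one

    # Hypothetical scores on the 11 rubrics (1-4 scale assumed)
    benchmark = [3, 2, 3, 3, 2, 2, 3, 3, 2, 3, 2]
    scorer    = [3, 2, 3, 2, 2, 3, 3, 3, 2, 3, 2]
    print(is_calibrated(scorer, benchmark, scorer_pass=True, benchmark_pass=True))  # True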

6 Methods of Analysis
- Percent agreement
  - Exact agreement
  - Agreement within 1 point
  - Combined (exact and within 1 point)
- Cohen's kappa (Cohen, 1960)
  - Indicates the proportion of rater agreement above what would be expected by chance
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.

7 Percent Agreement
Benefits
- Easy to understand
Limitations
- Does not account for chance agreement
- Tends to overestimate true agreement (Berk, 1979; Grayson, 2001)
Berk, R. A. (1979). Generalizability of behavioral observations: A clarification of interobserver agreement and interobserver reliability. American Journal of Mental Deficiency, 83, 460-472.
Grayson, K. (2001). Interrater reliability. Journal of Consumer Psychology, 10, 71-73.
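
As a rough illustration of the three agreement measures named above, here is a minimal sketch with made-up paired rubric scores. It assumes "within 1 point" means a difference of exactly 1 and "combined" means a difference of at most 1.

    def percent_agreement(scores_a, scores_b):
        """Exact, adjacent (off by 1), and combined agreement for paired scores."""
        pairs = list(zip(scores_a, scores_b))
        n = len(pairs)
        exact = sum(a == b for a, b in pairs) / n
        adjacent = sum(abs(a - b) == 1 for a, b in pairs) / n
        return exact, adjacent, exact + adjacent   # combined = agreement within 1 point overall

    exact, adjacent, combined = percent_agreement([3, 2, 3, 2, 4], [3, 3, 3, 2, 2])
    print(f"exact={exact:.0%}, adjacent={adjacent:.0%}, combined={combined:.0%}")
    # exact=60%, adjacent=20%, combined=80%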

8 Cohen's Kappa
Benefits
- Accounts for chance agreement
- Can be used to compare across different conditions (Ciminero, Calhoun, & Adams, 1986)
Limitations
- Kappa may decrease when base rates are low, so at least 10 occurrences are needed (Nelson & Cicchetti, 1995)
Ciminero, A. R., Calhoun, K. S., & Adams, H. E. (Eds.). (1986). Handbook of behavioral assessment (2nd ed.). New York: Wiley.
Nelson, L. D., & Cicchetti, D. V. (1995). Assessment of emotional functioning in brain impaired individuals. Psychological Assessment, 7, 404-413.

9 Kappa Coefficient
Kappa = (proportion of observed agreement - chance agreement) / (1 - chance agreement)
The coefficient ranges from -1.0 (complete disagreement) to 1.0 (perfect agreement).
Altman, D. G. (1991). Practical statistics for medical research. London: Chapman and Hall.
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
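
The formula above can be computed directly for two raters scoring the same items. Below is a minimal from-scratch sketch with made-up scores, where chance agreement is estimated from each rater's marginal score frequencies; for real data, sklearn.metrics.cohen_kappa_score gives the same result.

    from collections import Counter

    def cohens_kappa(rater1, rater2):
        """Kappa = (p_o - p_e) / (1 - p_e) for two raters' categorical scores."""
        n = len(rater1)
        p_o = sum(a == b for a, b in zip(rater1, rater2)) / n          # observed agreement
        freq1, freq2 = Counter(rater1), Counter(rater2)
        categories = set(rater1) | set(rater2)
        p_e = sum((freq1[c] / n) * (freq2[c] / n) for c in categories)  # chance agreement
        return (p_o - p_e) / (1 - p_e)

    print(round(cohens_kappa([3, 2, 3, 2, 4, 3], [3, 3, 3, 2, 2, 3]), 2))  # 0.4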

10 Percent Agreement

11 (results chart; no transcript text available)

12 Pass/Fail Disagreement

13 Cohen’s Kappa

14 Interrater Reliability Compared

15 Implications
- Overall interrater reliability was poor to fair
  - Consider/reevaluate the protocol for calibration
  - The calibration protocol may be interpreted differently
  - Drifting scorers?
- Use multiple methods to calculate interrater reliability
- Other?

16 How Can We Increase Interrater Reliability? Your thoughts...
- Training protocol
- Adding "evidence-based" training (Jeanne Stone, UCI)
- More calibration

17 Contact Information
Jenna Porter: jmporter@csus.edu
David Jelinek: djelinek@csus.edu

