
1 Examining Rubric Design and Inter-rater Reliability: A Fun Grading Project
Presented at the Third Annual Association for the Assessment of Learning in Higher Education (AALHE) Conference, Lexington, Kentucky, June 3, 2013
Dr. Yan Zhang Cooksey, University of Maryland University College

2 Outline of Today's Presentation
Background and purposes of the full-day grading project
Procedural methods of the project
Results and decisions informed by the assessment findings
Lessons learned through the process

3 Purposes of the Full-day Grading Project
To simplify the current assessment process
To validate the newly developed common rubric measuring four core student learning areas (written communication, critical thinking, technology fluency, and information literacy)

4 UMUC Graduate School Previous Assessment Model: 3-3-3 Model

5 Previous Assessment Model: 3-3-3 Model (Cont.)

6 Strengths and Weaknesses of the 3-3-3 Model
Strengths:
Tested rubrics
Reasonable collection points
Larger samples - more data for analysis
Weaknesses:
Added faculty workload
Lack of consistency in assignments
Variability in applying scoring rubrics

7 C2 Model: Common activity & Combined rubric

8 Comparing the 3-3-3 Model to the (New) C2 Model
Current 3-3-3 Model vs. Combined Activity/Rubric (C2) Model:
Multiple rubrics, one for each of 4 SLEs vs. single rubric for all 4 SLEs
Multiple assignments across the Graduate School vs. single assignment across the Graduate School
One to multiple courses per 4 SLEs vs. single course per 4 SLEs
Multiple raters for the same assignment/course vs. same raters per assignment/course
Untrained raters vs. trained raters

9 Purposes of the Full-day Grading Project
To simplify the current assessment process
To validate the newly developed common rubric measuring four core student learning areas (written communication, critical thinking, technology fluency, and information literacy)

10 Procedural Methods of the Grading Project
Data source
Rubric
Experimental design for data collection
Inter-rater reliability

11 Procedural Methods of the Grading Project (Cont.)
Data source: student papers (redacted)
Course name / # of papers:
BTMN9040 - 27
BTMN9041 - 29
BTMN9080 - 7
DETC630 - 9
MSAF670 - 20
MSAS670 - 13
TMAN680 - 16
Total - 121

12 Procedural Methods of the Grading Project (Cont.)
Common assignment rubric (rubric design and refinement)
18 raters (faculty members)

13 Procedural Methods of the Grading Project (Cont.)
Experimental design for data collection:
Randomized trial (Groups A & B)
Raters' norming and training
Grading instruction
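
As a purely illustrative aside, the sketch below shows one way the random assignment of redacted papers to the two groups could be scripted. The paper IDs, the fixed seed, and the even split are assumptions for demonstration only, not the project's documented procedure.

```python
# Minimal sketch (hypothetical): randomly assign redacted papers to Group A and Group B.
import random

def assign_groups(paper_ids, seed=2013):
    """Shuffle paper IDs and split them into two roughly equal groups."""
    rng = random.Random(seed)          # fixed seed so the assignment is reproducible
    shuffled = list(paper_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {
        "Group A - Experiment": shuffled[:midpoint],
        "Group B - Control": shuffled[midpoint:],
    }

papers = [f"paper_{i:03d}" for i in range(1, 122)]   # 121 redacted papers (slide 11)
groups = assign_groups(papers)
print({name: len(ids) for name, ids in groups.items()})
```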

14 Procedural Methods of the Grading Project (Cont.)
Inter-rater reliability (literature):
Stemler (2004): in any situation that involves judges (raters), the degree of inter-rater reliability is worth investigating, because inter-rater reliability has significant implications for the validity of the subsequent study results.
Intraclass Correlation Coefficients (ICC) were used in this study.

15 Results and Findings
Two-sample t-test: Group Statistics for Differ_Rater1and2
Group A - Experiment Group: N = 483, Mean = .249, Std. Deviation = 1.0860, Std. Error Mean = .0494
Group B - Control Group: N = 540, Mean = .024, Std. Deviation = 1.2463, Std. Error Mean = .0536

16 Results and Findings (Cont.)
Independent Samples Test for Differ_Rater1and2
Levene's Test for Equality of Variances: F = 11.311, Sig. = .001
t-test for Equality of Means:
Equal variances assumed: t = 3.056, df = 1021, Sig. (2-tailed) = .002, Mean Difference = .2246, Std. Error Difference = .0735, 95% CI of the Difference [.0804, .3688]
Equal variances not assumed: t = 3.080, df = 1020.315, Sig. (2-tailed) = .002, Mean Difference = .2246, Std. Error Difference = .0729, 95% CI of the Difference [.0815, .3677]
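
The reported tables look like standard statistical-package output. For readers who want to reproduce this kind of analysis, here is a minimal Python sketch of a Levene test plus an independent-samples (Welch) t-test on rater-difference scores; the data are simulated to roughly match the group statistics on slide 15 and are not the study's dataset.

```python
# Minimal sketch: Levene's test plus independent-samples t-test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(0.249, 1.086, 483)   # simulated experiment-group rater differences
group_b = rng.normal(0.024, 1.246, 540)   # simulated control-group rater differences

# Levene's test for equality of variances (mean-centered form).
f_stat, f_p = stats.levene(group_a, group_b, center="mean")

# Independent-samples t-test; equal_var=False gives the unequal-variance (Welch) version.
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# 95% confidence interval for the mean difference, using Welch degrees of freedom.
mean_diff = group_a.mean() - group_b.mean()
var_a, var_b = group_a.var(ddof=1) / len(group_a), group_b.var(ddof=1) / len(group_b)
se_diff = np.sqrt(var_a + var_b)
df = (var_a + var_b) ** 2 / (var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1))
ci = mean_diff + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se_diff

print(f"Levene F={f_stat:.3f} (p={f_p:.3f}); t={t_stat:.3f}, p={t_p:.3f}; "
      f"mean diff={mean_diff:.4f}, 95% CI=[{ci[0]:.4f}, {ci[1]:.4f}]")
```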

17 Results and Findings (Cont.)
Inter-rater Reliability: Intraclass Correlation Coefficients (ICC)
Overall Intraclass Correlation Coefficient
                     Group A    Group B
Single Measures        .288       .132
Average Measures       .447       .233
One-way random effects model where people effects are random. Group A - Experiment Group; Group B - Control Group.
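
A minimal sketch of how single- and average-measures ICCs under a one-way random effects model can be computed, following the Shrout & Fleiss (1979) formulas cited in the resources; the ratings matrix below is simulated, not the study's data, and the two-rater, 121-paper shape is an assumption for illustration.

```python
# Minimal sketch: one-way random effects ICC (single and average measures).
import numpy as np

def icc_oneway(ratings):
    """Return ICC(1,1) (single measures) and ICC(1,k) (average measures)."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape                       # n targets (papers), k raters
    grand_mean = ratings.mean()
    target_means = ratings.mean(axis=1)
    # One-way ANOVA mean squares: between targets and within targets.
    ms_between = k * np.sum((target_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - target_means[:, None]) ** 2) / (n * (k - 1))
    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average

rng = np.random.default_rng(1)
true_quality = rng.normal(0, 1, 121)                             # latent paper quality
ratings = true_quality[:, None] + rng.normal(0, 1.2, (121, 2))   # two raters per paper
single, average = icc_oneway(ratings)
print(f"ICC single measures = {single:.3f}, average measures = {average:.3f}")
```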

18 Results and Findings (Cont.)
Intraclass Correlation Coefficient by Criterion (Average Measures, Group A)
1. Conceptualization/Content/Ideas [THIN]: .461
2. Analysis/Evaluation [THIN]: .372
3. Synthesis/Support [THIN]: .459
4. Conclusion/Implications [THIN]: .163
5. Selection/Retrieval [INFO]: .461
6. Organization [COMM]: .532
7. Writing Mechanics [COMM]: .648
8. APA Compliance [COMM]: .450
9. Technology Application [TECH]: .303

19 Results and Findings (Cont.)
Inter-Item Correlation for Group A
Reliability Statistics (Group# = Group A - Experiment):
Cronbach's Alpha = .895
Cronbach's Alpha Based on Standardized Items = .900
N of Items = 9
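
A minimal sketch of how raw Cronbach's alpha and the alpha based on standardized items can be computed from a papers-by-criteria score matrix; the scores and the 60-paper sample size below are simulated assumptions for illustration, not the study's data.

```python
# Minimal sketch: raw and standardized Cronbach's alpha from an items-in-columns matrix.
import numpy as np

def cronbach_alpha(scores):
    """Raw Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def standardized_alpha(scores):
    """Alpha based on standardized items, i.e. on the mean inter-item correlation."""
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    k = corr.shape[0]
    mean_r = (corr.sum() - k) / (k * (k - 1))   # average of the off-diagonal correlations
    return k * mean_r / (1 + (k - 1) * mean_r)

rng = np.random.default_rng(2)
common = rng.normal(0, 1, (60, 1))               # shared quality factor across criteria
scores = common + rng.normal(0, 0.7, (60, 9))    # 60 papers x 9 rubric criteria
print(f"alpha = {cronbach_alpha(scores):.3f}, "
      f"standardized alpha = {standardized_alpha(scores):.3f}")
```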

20 Results and Findings (Cont.)
Inter-Item Correlation Matrix (Group A - Experiment Group; C1-C9 = Criteria 1-9 as numbered on slide 18)
              C1      C2      C3      C4      C5      C6      C7      C8      C9
C1 [THIN]   1.000    .707    .575    .811    .296    .687    .518    .319    .397
C2 [THIN]    .707   1.000    .868    .788    .198    .788    .478    .325    .403
C3 [THIN]    .575    .868   1.000    .743    .344    .843    .494    .541    .424
C4 [THIN]    .811    .788    .743   1.000    .314    .820    .500    .344    .379
C5 [INFO]    .296    .198    .344    .314   1.000    .301    .444    .523    .241
C6 [COMM]    .687    .788    .843    .820    .301   1.000    .540    .555    .428
C7 [COMM]    .518    .478    .494    .500    .444    .540   1.000    .510    .081
C8 [COMM]    .319    .325    .541    .344    .523    .555    .510   1.000    .445
C9 [TECH]    .397    .403    .424    .379    .241    .428    .081    .445   1.000

21 Lessons Learned through the Process
Get faculty excited about assessment!
Strategies to improve inter-rater agreement:
More training
Clear rubric criteria
Map assignment instructions to rubric criteria
Make decisions based on the assessment results:
Further refine the rubric and the common assessment activity

22 Resources
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30-46 (Correction, 1(1), 390).
Nunnally, J. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). Retrieved from http://pareonline.net/getvn.asp?v=9&n=4
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428. Retrieved from http://www.hongik.edu/~ym480/Shrout-Fleiss-ICC.pdf

23 Stay Connected…
Dr. Yan Zhang Cooksey
Director for Outcomes Assessment
The Graduate School, University of Maryland University College
Email: yan.cooksey@umuc.edu
Homepage: http://assessment-matters.weebly.com
