Examining Rubric Design and Inter-rater Reliability: A Fun Grading Project
Presented at the Third Annual Association for the Assessment of Learning in Higher Education (AALHE) Conference, Lexington, Kentucky, June 3, 2013
Dr. Yan Zhang Cooksey, University of Maryland University College

Outline of Today's Presentation
- Background and purposes of the full-day grading project
- Procedural methods of the project
- Results and decisions informed by the assessment findings
- Lessons learned through the process

Purposes of the Full-day Grading Project
- To simplify the current assessment process
- To validate the newly developed common rubric measuring four core student learning areas (written communication, critical thinking, technology fluency, and information literacy)

UMUC Graduate School Previous Assessment Model [diagram]

Previous Assessment Model (Cont.) [diagram]

Strengths:
- Tested rubrics
- Reasonable collection points
- Larger samples (more data for analysis)

Weaknesses:
- Added faculty workload
- Lack of consistency in assignments
- Variability in applying scoring rubrics

C2 Model: Common Activity & Combined Rubric

Comparing the Current Model to the (New) C2 Model

Current Model                                   | Combined Activity/Rubric (C2) Model
Multiple rubrics: one for each of 4 SLEs        | Single rubric for all 4 SLEs
Multiple assignments across the graduate school | Single assignment across the graduate school
One to multiple courses per 4 SLEs              | Single course for 4 SLEs
Multiple raters for the same assignment/course  | Same raters for the same assignment/course
Untrained raters                                | Trained raters

Purposes of the Full-day Grading Project
- To simplify the current assessment process
- To validate the newly developed common rubric measuring four core student learning areas (written communication, critical thinking, technology fluency, and information literacy)

Procedural Methods of the Grading Project
- Data source
- Rubric
- Experimental design for data collection
- Inter-rater reliability

Procedural Methods of the Grading Project (Cont.)
Data Source (student papers, redacted)

Course name | # of Papers
BTMN —      | —
BTMN —      | —
BTMN 908    | 7
DETC 630    | 9
MSAF 670    | 20
MSAS 670    | 13
TMAN 680    | 16
Total       | 121

Procedural Methods of the Grading Project (Cont.)
- Common assignment
- Rubric (rubric design and refinement)
- 18 raters (faculty members)

Procedural Methods of the Grading Project (Cont.)
Experimental design for data collection:
- Randomized trial (Groups A & B)
- Raters' norming and training
- Grading instruction
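As a concrete illustration of the randomized trial above, here is a minimal sketch of how the redacted papers might be split at random between the two grading groups. The function name, seed, paper IDs, and the even 50/50 split are assumptions for illustration, not the project's documented procedure.

```python
import random

def assign_papers(paper_ids, seed=2013):
    """Randomly split redacted papers between the two grading groups."""
    rng = random.Random(seed)   # fixed seed makes the assignment reproducible
    shuffled = list(paper_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"Group A (experiment)": shuffled[:half],
            "Group B (control)": shuffled[half:]}

# 121 hypothetical paper IDs standing in for the redacted student papers
groups = assign_papers([f"paper_{i:03d}" for i in range(1, 122)])
print({name: len(ids) for name, ids in groups.items()})
```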

Procedural Methods of the Grading Project (Cont.)
Inter-rater reliability (literature)
- Stemler (2004): in any situation that involves judges (raters), the degree of inter-rater reliability is worth investigating, as it has significant implications for the validity of the subsequent study results.
- Intraclass Correlation Coefficients (ICC) were used in this study.

Results and Findings
Two-sample t-test: Group Statistics (Differ_Rater1and2)

Group                      | N | Mean | Std. Deviation | Std. Error Mean
Group A (Experiment Group) | — | —    | —              | —
Group B (Control Group)    | — | —    | —              | —

Results and Findings (Cont.)
Independent Samples Test (Differ_Rater1and2)

                            | Levene's F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper
Equal variances assumed     | —          | —    | — | —  | —               | —               | —                     | —            | —
Equal variances not assumed | —          | —    | — | —  | —               | —               | —                     | —            | —
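For readers who want to reproduce this kind of comparison, below is a sketch in Python/SciPy of a two-sample t-test on rater-pair score differences, with Levene's test choosing between the equal- and unequal-variance variants. The scores are made up, since the slide's numeric values are not recoverable from the transcript.

```python
import numpy as np
from scipy import stats

# Hypothetical per-paper differences between Rater 1 and Rater 2 scores
# (Differ_Rater1and2); the slide's actual values are not recoverable.
group_a = np.array([0.0, 1.0, 0.5, 0.0, 1.5, 0.5, 0.0, 1.0])  # experiment
group_b = np.array([1.0, 2.0, 0.5, 1.5, 2.5, 1.0, 2.0, 0.5])  # control

# Levene's test for equality of variances picks the t-test variant to report
lev_stat, lev_p = stats.levene(group_a, group_b)
equal_var = lev_p > 0.05

t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}; t = {t_stat:.3f}, p = {t_p:.3f} "
      f"(equal_var={equal_var})")
```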

Results and Findings (Cont.)
Inter-rater Reliability: Intraclass Correlation Coefficients (ICC)

Overall ICC      | Group A | Group B
Single Measures  | —       | —
Average Measures | —       | —

One-way random effects model where people effects are random. Group A = Experiment Group; Group B = Control Group.
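The slide specifies a one-way random effects model with people (papers) as random effects, i.e., ICC(1) in Shrout and Fleiss's (1979) notation. A minimal sketch of both single- and average-measures ICC under that model follows; the toy scores are invented for illustration.

```python
import numpy as np

def icc_one_way(ratings):
    """ICC(1): one-way random effects, targets (papers) random.

    ratings: (n_targets, k_raters) array; returns (single, average) measures.
    """
    n, k = ratings.shape
    row_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)          # between papers
    msw = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))  # within papers
    single = (msb - msw) / (msb + (k - 1) * msw)   # ICC(1,1): single measures
    average = (msb - msw) / msb                    # ICC(1,k): average measures
    return single, average

# Toy data: 5 papers scored by 2 raters (values are illustrative only)
scores = np.array([[3, 4], [2, 2], [4, 5], [1, 2], [3, 3]], dtype=float)
print(icc_one_way(scores))
```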

Results and Findings (Cont.)
Intraclass Correlation Coefficient by Criterion (Average Measures, Group A)

Criterion                                | ICC
1 Conceptualization/Content/Ideas [THIN] | —
2 Analysis/Evaluation [THIN]             | —
3 Synthesis/Support [THIN]               | —
4 Conclusion/Implications [THIN]         | —
5 Selection/Retrieval [INFO]             | —
6 Organization [COMM]                    | —
7 Writing Mechanics [COMM]               | —
8 APA Compliance [COMM]                  | —
9 Technology Application [TECH]          | .303

Results and Findings (Cont.)
Inter-Item Correlation for Group A

Reliability Statistics (Group # = Group A, Experiment)
Cronbach's Alpha | Cronbach's Alpha Based on Standardized Items | N of Items
—                | —                                            | 9

Results and Findings (Cont.)
Inter-Item Correlation Matrix for Group A: 9 × 9 pairwise correlations among Criteria 1–4 [THIN], 5 [INFO], 6–8 [COMM], and 9 [TECH].
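Cronbach's alpha and the inter-item correlation matrix reported above can both be computed directly from a papers-by-criteria score matrix. The sketch below assumes a small made-up matrix of 6 papers scored on the 9 rubric criteria; the values are illustrative only, not the study's data.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_papers, n_criteria) matrix of rubric scores."""
    item_vars = items.var(axis=0, ddof=1)       # variance of each criterion
    total_var = items.sum(axis=1).var(ddof=1)   # variance of paper totals
    k = items.shape[1]
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative scores: 6 papers x 9 criteria (the real values were lost)
scores = np.array([
    [3, 3, 2, 3, 4, 3, 3, 2, 3],
    [2, 2, 2, 1, 2, 2, 1, 2, 2],
    [4, 4, 3, 4, 4, 3, 4, 4, 3],
    [1, 2, 1, 1, 1, 2, 1, 1, 2],
    [3, 4, 4, 3, 3, 4, 3, 3, 4],
    [2, 1, 2, 2, 2, 1, 2, 2, 1],
], dtype=float)

print("alpha:", round(cronbach_alpha(scores), 3))
print("inter-item correlations:")
print(np.corrcoef(scores, rowvar=False).round(2))  # 9 x 9 matrix
```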

Lessons Learned through the Process
- Get faculty excited about assessment!
- Strategies to improve inter-rater agreement:
  - More training
  - Clear rubric criteria
  - Map assignment instructions to rubric criteria
- Make decisions based on the assessment results:
  - Further refined the rubric and common assessment activity

Resources
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46. (Correction: Psychological Methods, 1(4), 390.)
Nunnally, J. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4).

Stay Connected…
Dr. Yan Zhang Cooksey
Director for Outcomes Assessment
The Graduate School, University of Maryland University College
Homepage: