Inter-rater reliability in the KPG exams: The Writing Production and Mediation Module

Inter-rater reliability in KPG
AIM: To check the effectiveness of the instruments employed throughout the rating process:
- Rating Grid – Assessment Criteria
- Training Material & Training Seminars
- On-the-spot consultancy to raters

Script Raters' Profile
- Experienced teachers
- Underwent initial training in rating KPG scripts
- Undergo specialized training for every test administration

Script rater training
Specialized training on rating scripts based on expectations for every activity:
- Analysis of expected output
- Presentation of rated scripts
- Actual rating of selected samples
- Rating scripts under supervision

The rating procedure
- Each script is rated by two script raters randomly selected from a pool of trained raters (a pairing sketch follows below)
- Second ratings are independent of the first (no identifying information, no marks or symbols)
- Constant monitoring/consultancy during the process
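A minimal sketch of the pairing step described above, assuming Python; the script IDs, rater names, and function name are illustrative and not part of the actual KPG system:

```python
import random

def assign_rater_pairs(script_ids, rater_pool, seed=None):
    """Randomly draw two distinct raters from the pool for every script.

    Only the pairing is recorded here; the second rater never sees the
    first rating, marks, or any identifying information.
    """
    rng = random.Random(seed)
    return {script: tuple(rng.sample(rater_pool, 2)) for script in script_ids}

# Illustrative use: five anonymised scripts, a pool of six trained raters
pairs = assign_rater_pairs(["S001", "S002", "S003", "S004", "S005"],
                           ["R1", "R2", "R3", "R4", "R5", "R6"], seed=42)
print(pairs)
```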

METHODOLOGY OF THE STUDY
Computing inter-rater reliability

Sampling
- Random sample of at least 40% of the total number of scripts
- Periods: May 2005 to November 2007
- Levels: B1, B2 & C1

Intraclass Correlation Coefficient (ICC)
ICC vs. Pearson's r: The ICC improves on Pearson's r because it takes into account the differences between ratings as well as the correlation between raters.
ICC in SPSS: average-measure reliability analysis, one-way random effects model
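The slides report the SPSS average-measure, one-way random effects ICC. For readers without SPSS, a minimal NumPy sketch of that statistic (ICC(1,k) in Shrout & Fleiss's notation) is given below; the function name and toy scores are illustrative, not KPG data:

```python
import numpy as np

def icc_1k(ratings):
    """Average-measure ICC under a one-way random effects model, ICC(1,k).

    ratings: array of shape (n_scripts, k_raters), one row per script.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    row_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # One-way ANOVA mean squares: between scripts and within scripts
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

# Toy example: five scripts, each scored by two raters
scores = [[12, 13], [9, 10], [15, 15], [7, 9], [11, 12]]
print(round(icc_1k(scores), 2))  # close to 1, i.e. high agreement between the toy raters
```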

Interpretation of ICC
Fleiss (1981):
- r < 0.40: poor agreement
- 0.40 ≤ r ≤ 0.75: good agreement
- r > 0.75: excellent agreement
Landis & Koch (1977):
- r < 0.00: poor agreement
- 0.00 ≤ r ≤ 0.20: slight
- 0.21 ≤ r ≤ 0.40: fair
- 0.41 ≤ r ≤ 0.60: moderate
- 0.61 ≤ r ≤ 0.80: substantial
- 0.81 ≤ r ≤ 1.00: almost perfect
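The two benchmark scales above translate directly into a small lookup; a Python sketch with illustrative function names:

```python
def fleiss_label(icc):
    """Fleiss (1981) benchmarks for the ICC."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "good"
    return "excellent"

def landis_koch_label(icc):
    """Landis & Koch (1977) benchmarks for the ICC."""
    if icc < 0.00:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if icc <= upper:
            return label
    return "almost perfect"

print(fleiss_label(0.74), landis_koch_label(0.74))  # good, substantial
```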

KPG Module 2
- Free writing production
- Mediation

Findings – Free Writing Production (ICC per administration)

                                MAY 05  NOV 05  MAY 06  NOV 06  MAY 07  NOV 07
B2 - Free Writing Production     0.74    0.70    0.76    0.68    0.76    0.72
C1 - Free Writing Production     0.57    0.56    0.63    0.52    0.59    0.66
B1 - Free Writing Production       –       –       –       –     0.76    0.73

Findings

Findings – Mediation (ICC per administration)

                  MAY 05  NOV 05  MAY 06  NOV 06  MAY 07  NOV 07
B2 - Mediation     0.77    0.75    0.74    0.72    0.80    0.69
C1 - Mediation     0.62    0.60    0.68    0.53    0.69    0.71
B1 - Mediation       –       –       –       –     0.83    0.88

Findings

Totals – Descriptive Statistics

         N   Min.   Max.   Mean
MAY 05   4   0.57   0.77   0.67
NOV 05   4   0.56   0.75   0.65
MAY 06   4   0.63   0.76   0.70
NOV 06   4   0.52   0.72   0.61
MAY 07   6   0.59   0.83   0.73
NOV 07   6   0.66   0.88   0.73

Totals

Conclusion
- Correlations are high – positive impact of the instruments
- Trendlines are sloping upwards – experience in rating and training are directly related to rater agreement indices (see the sketch below)
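One way to make the "trendlines sloping upwards" claim concrete is to fit a least-squares line to the per-administration means from the Totals table; a minimal NumPy sketch, where the administration index 0–5 is an assumption used only for the fit:

```python
import numpy as np

# Mean ICC per administration, MAY 05 through NOV 07 (Totals table above)
means = np.array([0.67, 0.65, 0.70, 0.61, 0.73, 0.73])
admin_index = np.arange(len(means))

# Least-squares trendline: slope per administration
slope, intercept = np.polyfit(admin_index, means, 1)
print(f"trend: {slope:+.3f} ICC per administration")  # a positive slope supports the conclusion
```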

Further research
- Task analysis to investigate the correlation between item difficulty and the ICC
- In progress: a detailed task analysis project carried out by linguists and psychologists
  AIM: To determine the variables affecting the difficulty of a task