Presentation transcript:

1 Measuring Agreement

2 Introduction

Different types of agreement:
- Diagnosis by different methods: do both methods give the same results (disease absent or disease present)?
- Staging of carcinomas: will different methods lead to the same results? Will different raters lead to the same results?
- Measurements of blood pressure: how consistent are measurements made using different devices, with different observers, or at different times?

3 Investigating agreement

Need to consider:
- Data type: categorical or continuous
- How the data are repeated: measuring instrument(s), rater(s), time(s)
- The goal: are ratings consistent? Estimate the magnitude of differences between measurements; investigate factors that affect ratings
- Number of raters

4 Data type

Categorical
- Binary: disease absent / disease present
- Nominal: hepatitis (viral A, B, C, D, E, or autoimmune)
- Ordinal: severity of disease (mild, moderate, severe)

Continuous
- Size of tumour
- Blood pressure

5 How are data repeated?

Same person, same measuring instrument:
- Different observers: inter-rater reliability
- Same observer at different times: intra-rater reliability (repeatability)
- Internal consistency: do the items of a test measure the same attribute?

6 Measures of agreement

Categorical
- Kappa (weighted, Fleiss')

Continuous
- Limits of agreement
- Coefficient of variation (CV)
- Intraclass correlation (ICC)

Internal consistency
- Cronbach's α

7 Number of raters

- Two
- Three or more

8 Categorical data: two raters

Kappa: magnitude is quoted against published scales, two of which are in common use:
- Fleiss: ≥ 0.75 excellent; 0.40 to 0.75 fair to good; < 0.40 poor
- Landis & Koch: 0 to 0.20 slight; > 0.20 to 0.40 fair; > 0.40 to 0.60 moderate; > 0.60 to 0.80 substantial; > 0.80 almost perfect

The degree of disagreement can be included: weighted kappa (see the sketch below)
- Disagreements between categories that are close together are penalised less than disagreements between categories further apart
- Linear or quadratic weightings
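
As an illustration of the calculation, here is a minimal sketch of unweighted and weighted kappa for two raters. The formula and the linear/quadratic weighting schemes are standard; the rating data are hypothetical.

```python
import numpy as np

def cohens_kappa(r1, r2, categories, weights=None):
    """Cohen's kappa for two raters.

    weights: None (unweighted), "linear", or "quadratic".
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed proportions: cross-tabulate the two raters.
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[index[a], index[b]] += 1
    obs /= obs.sum()

    # Expected proportions under independence (chance agreement).
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))

    # Disagreement weights: 0 on the diagonal, growing with distance
    # between categories for the weighted variants.
    i, j = np.indices((k, k))
    if weights is None:
        w = (i != j).astype(float)
    elif weights == "linear":
        w = np.abs(i - j) / (k - 1)
    elif weights == "quadratic":
        w = ((i - j) / (k - 1)) ** 2
    else:
        raise ValueError(weights)

    # kappa = 1 - (weighted observed disagreement / weighted chance disagreement)
    return 1 - (w * obs).sum() / (w * exp).sum()

# Hypothetical scores 1 to 5 from two raters (cf. Example 1 below).
r1 = [1, 2, 3, 3, 4, 5, 2, 4, 5, 1]
r2 = [1, 2, 3, 4, 4, 5, 2, 3, 5, 2]
for wt in (None, "linear", "quadratic"):
    print(wt, round(cohens_kappa(r1, r2, categories=[1, 2, 3, 4, 5], weights=wt), 2))
```

If a library implementation is preferred, scikit-learn's cohen_kappa_score accepts weights="linear" or "quadratic" and gives the same statistic.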

9 Categorical data: more than two raters

- Different tests for binomial data and for data with more than two categories (e.g. Fleiss' kappa; a sketch follows)
- Online calculators are available
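
A sketch of Fleiss' kappa for more than two raters, assuming every subject is rated by the same number of raters; the counts matrix below is hypothetical.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa from an N x k matrix of category counts.

    counts[i, j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters n.
    """
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts[0].sum()  # raters per subject

    # Per-subject agreement P_i and its mean over subjects.
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()

    # Chance agreement from the overall category proportions.
    p_j = counts.sum(axis=0) / (N * n)
    P_e = np.square(p_j).sum()

    return (P_bar - P_e) / (1 - P_e)

# Hypothetical data: 5 subjects, 3 categories, 4 raters each.
counts = [[4, 0, 0],
          [2, 2, 0],
          [0, 3, 1],
          [0, 4, 0],
          [1, 1, 2]]
print(round(fleiss_kappa(counts), 2))
```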

10 Example 1

Two raters, scores 1 to 5:
- Unweighted kappa 0.79, 95% CI (0.62 to 0.96)
- Linear weighting 0.84, 95% CI (0.70 to 0.98)
- Quadratic weighting 0.90, 95% CI (0.77 to 1.00)

11 Example 2

- Binomial data
- Three raters, each giving two ratings (cf. the pairwise kappas on the next slide)
- Assess both inter-rater and intra-rater agreement

12 Example 2 ctd.

Inter-rater agreement
- Kappa 1,2 (P < 0.001)
- Kappa 1,3 (P = 0.765)
- Kappa 2,3 (P = 0.696)

Intra-rater agreement
- Kappa 1 (P < 0.001)
- Kappa 2 (P < 0.001)
- Kappa 3 (P = 1.000)

13 Continuous data

- Test for bias
- Check that the differences are not related to magnitude
- Calculate the mean and SD of the differences
- Limits of agreement
- Coefficient of variation
- ICC

14 Test for bias

- Student's paired t-test (mean)
- Wilcoxon matched-pairs test (median)
- If there is bias, agreement cannot be investigated further
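
Both tests are available in SciPy; a minimal sketch with hypothetical paired measurements:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g., two devices on the same subjects).
a = np.array([140, 128, 152, 118, 135, 149, 122, 131, 144, 126])
b = np.array([138, 131, 150, 121, 133, 151, 120, 134, 141, 129])

# Paired t-test: is the mean difference zero?
t, p_t = stats.ttest_rel(a, b)

# Wilcoxon matched-pairs test: is the median difference zero?
w, p_w = stats.wilcoxon(a, b)

print(f"paired t: P = {p_t:.3f};  Wilcoxon: P = {p_w:.3f}")
```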

15 Example 3: Test for bias

Paired t-test: P = 0.362, so no evidence of bias.

16 Check differences unrelated to magnitude

[Scatter plot of the differences, omitted from the transcript] Clearly no relationship between the differences and the magnitude of the measurements.

17 Calculate mean and SD of differences

[SPSS descriptives table, omitted from the transcript: row "Difference", Valid N (listwise) = 17; the Mean column gives the mean difference and the Std. Deviation column gives s]

18 Limits of agreement

Lower limit of agreement (LLA) = mean − 1.96 × s = −37.6
Upper limit of agreement (ULA) = mean + 1.96 × s = 47.5

95% of differences between a pair of measurements for an individual lie in (−37.6, 47.5).
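
A sketch of the calculation; the 17 paired measurements below are hypothetical, so they will not reproduce the slide's limits of (−37.6, 47.5).

```python
import numpy as np

# Hypothetical paired measurements from two methods (17 subjects).
a = np.array([430., 445, 412, 460, 438, 455, 420, 448, 433, 441,
              426, 452, 417, 444, 436, 429, 450])
b = np.array([425., 452, 430, 441, 420, 470, 415, 462, 410, 455,
              440, 430, 425, 460, 418, 447, 435])

d = a - b                       # paired differences
mean_d, s = d.mean(), d.std(ddof=1)

lla = mean_d - 1.96 * s         # lower limit of agreement
ula = mean_d + 1.96 * s         # upper limit of agreement
print(f"bias = {mean_d:.1f}, limits of agreement = ({lla:.1f}, {ula:.1f})")
```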

19 Coefficient of variation

- A measure of the variability of the differences, expressed as a proportion of the average measured value
- Suitable when the error (the differences between pairs) increases with the measured values; the other measures require this not to be the case
- CV = 100 × s ÷ mean of the measurements = 4.85%
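
A sketch of the CV as defined on this slide (SD of the paired differences over the mean of all measurements, as a percentage); note that some authors instead use the within-subject SD, s/√2, in the numerator. Data are hypothetical.

```python
import numpy as np

# Hypothetical paired measurements from two methods.
a = np.array([430., 445, 412, 460, 438, 455, 420, 448])
b = np.array([425., 452, 430, 441, 420, 470, 415, 462])

s = (a - b).std(ddof=1)                   # SD of the paired differences
mean_all = np.concatenate([a, b]).mean()  # mean of all the measurements

# CV: variability of the differences relative to the average measured value.
print(f"CV = {100 * s / mean_all:.2f}%")
```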

20 Intraclass correlation

- Continuous data, two or more sets of measurements
- A measure of correlation that adjusts for differences in scale
- Several models:
  - Absolute agreement or consistency
  - Raters chosen randomly or the same raters throughout
  - Single or average measures

21 Intraclass correlation

- ≥ 0.75 excellent
- 0.4 to 0.75 fair to good
- < 0.4 poor
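
A sketch of one common variant, the one-way random, single-measures ICC(1,1); the other models listed on the previous slide combine different mean squares. The ratings matrix is hypothetical.

```python
import numpy as np

def icc_1_1(x):
    """One-way random, single-measures ICC(1,1) for an n x k matrix
    (n targets, k ratings per target)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    target_means = x.mean(axis=1)
    grand = x.mean()

    # Between-targets and within-target mean squares (one-way ANOVA).
    msb = k * np.square(target_means - grand).sum() / (n - 1)
    msw = np.square(x - target_means[:, None]).sum() / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: 6 subjects each measured by 3 raters.
x = [[9, 2, 5], [6, 1, 3], [8, 4, 6], [7, 1, 2], [10, 5, 6], [6, 2, 4]]
print(round(icc_1_1(x), 2))
```

For the full family of models (two-way, absolute agreement vs consistency, average measures), the pingouin package's intraclass_corr function reports the common variants side by side.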

22 Cronbach's α

- Measures internal consistency: whether several component items can sensibly be combined into a total score
- α ≥ 0.8 good; α ≥ 0.7 adequate
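
Cronbach's α can be computed directly from a respondents-by-items matrix; a minimal sketch with hypothetical scores:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha from an n x k matrix (n respondents, k items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents answering 4 items.
scores = [[3, 4, 3, 4], [5, 5, 4, 5], [2, 2, 3, 2], [4, 3, 4, 4], [1, 2, 2, 1]]
print(round(cronbach_alpha(scores), 2))
```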

23 Investigating agreement

Data type
- Categorical: kappa (weighted, Fleiss')
- Continuous: limits of agreement, coefficient of variation, intraclass correlation

How are the data repeated? Measuring instrument(s), rater(s), time(s)

Number of raters
- Two: straightforward
- Three or more: help!