Emergency Medicine Milestones: Longitudinal Interrater Agreement

Alan H. Breaud, MPH (1); Andrew L. Chu, BS (2); Lauren Sigman, MD (3); Kerrie P. Nelson, PhD (4); Kerry M. McCabe, MD (1,2)
(1) Boston Medical Center, Boston, MA; (2) Boston University School of Medicine, Boston, MA; (3) LAC+USC Medical Center, Los Angeles, CA; (4) Boston University School of Public Health, Boston, MA

INTRODUCTION
- The EM milestones were developed by EM experts for the Accreditation Council for Graduate Medical Education
- They are used to recurrently assess competency-based developmental outcomes of postgraduate trainees
- Little is known regarding how closely resident self-evaluations compare to the faculty evaluations determined by the training program's Core Competency Committee
- Statistics such as Cohen's kappa may be useful for measuring agreement between resident and faculty evaluation scores
- Interrater agreement statistics measure agreement between two or more independent raters [1]; they take into account agreement due to chance and range from poorer than chance (-1) to better than chance (+1) agreement [2] (see Table 1)

Table 1. General guideline for interpreting the quadratically weighted kappa

  Kappa statistic   Strength of agreement
  < 0.00            Poor
  0.00 – 0.20       Slight
  0.21 – 0.40       Fair
  0.41 – 0.60       Moderate
  0.61 – 0.80       Substantial
  0.81 – 1.00       Almost perfect

  Source: Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.

OBJECTIVE
To determine whether resident self-evaluation scores were consistent with their corresponding Core Competency Committee faculty evaluation scores in semiannual EM milestones assessments over time

METHODS
- We collected milestone scores of postgraduate year (PGY) 1 through 4 EM residents training at one urban, academic medical center in spring 2014, fall 2014, and spring 2015
- Residents self-assessed using the milestones at each time point, and their scores were matched to the corresponding faculty evaluation scores, which were determined by Core Competency Committee consensus
- Quadratically weighted kappa statistics (with 95% CIs) were calculated to assess the degree of chance-corrected association between the self-evaluations and faculty evaluations at each time point
- The quadratically weighted kappa allows "partial credit" for ordered data [3,4]: disagreements are penalized according to their degree of separation, with weights that decrease as the distance between categories increases

Figure 1. Application of quadratic weighting. The diagonal (dark purple) reflects perfect agreement; quadratic weighting provides "partial credit" to the off-diagonals (light purple).
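For context, the quadratically weighted kappa described above has a standard closed form; this is the general textbook definition [3], not a formula reproduced from the poster:

\[
\kappa_w \;=\; 1 \;-\; \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, o_{ij}}{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, e_{ij}},
\qquad
w_{ij} \;=\; \frac{(i-j)^2}{(k-1)^2}
\]

where \(o_{ij}\) is the observed proportion of rating pairs in cell \((i,j)\), \(e_{ij}\) is the proportion expected by chance from the two raters' marginal distributions, and \(k\) is the number of ordered categories. The diagonal weights \(w_{ii}\) are zero, so exact agreement carries no penalty, and the penalty grows quadratically with the distance between categories, which is the "partial credit" behavior illustrated in Figure 1.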

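A minimal computational sketch of the agreement calculation described in METHODS. The rating vectors are invented for illustration (they are not the study data), and scikit-learn's cohen_kappa_score is used as one readily available implementation of quadratic weighting; the transcript does not state which software the authors used.

# Minimal sketch: quadratically weighted kappa for paired ordinal ratings.
# The scores below are hypothetical examples, NOT the study data.
from sklearn.metrics import cohen_kappa_score

resident_self = [4, 5, 6, 5, 7, 3, 6, 8]  # hypothetical resident self-evaluations
faculty_ccc   = [4, 6, 6, 5, 6, 4, 7, 8]  # hypothetical committee consensus scores

# weights="quadratic" applies the (i - j)^2 penalty described above,
# so near-miss disagreements still earn partial credit.
kappa = cohen_kappa_score(resident_self, faculty_ccc, weights="quadratic")
print(f"Quadratically weighted kappa: {kappa:.2f}")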
RESULTS
- A weighted kappa range of , , and was found for spring 2014, fall 2014, and spring 2015, respectively
- Moderate to strong chance-corrected association was observed for nearly all milestone assessments
- The milestone assessing competence with vascular access had the highest chance-corrected association, in spring 2014: kappa 0.88 (95% CI 0.81–0.94)
- The milestone assessing ultrasound showed consistently moderate chance-corrected association: kappa range 0.43–0.59
- Sample sizes for each self-assessed milestone ranged from 32 to 45 responses

Table 2. Quadratically weighted chance-corrected associations for milestones at each time point

CONCLUSIONS
- Residents' self-evaluation of their own competency-based development, as defined by the milestones assessment tool, is on average in alignment with the corresponding faculty Core Competency Committee evaluations
- Future directions include collecting more time points in order to examine postgraduate-year data

LIMITATIONS
- A limited number of time points was collected
- Sample size varied among time points as well as within each time point, so individual PGY trends could not be analyzed
- Weighted kappa can be high even when exact agreement is low [4]

REFERENCES
1. Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
2. Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613–619.
3. Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.
4. Graham, P., & Jackson, R. (1993). The analysis of ordinal agreement data: Beyond weighted kappa. Journal of Clinical Epidemiology, 46(9), 1055–1062.
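The RESULTS quote a 95% CI alongside each kappa. The transcript does not say how those intervals were computed; a percentile bootstrap over rating pairs, sketched below with invented data, is one common approach.

# Sketch: percentile-bootstrap 95% CI for a quadratically weighted kappa.
# Data are invented for illustration; this is not the study's CI procedure.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(seed=0)
resident = np.array([4, 5, 6, 5, 7, 3, 6, 8, 5, 6])  # hypothetical scores
faculty  = np.array([4, 6, 6, 5, 6, 4, 7, 8, 5, 5])

n = len(resident)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)  # resample rating pairs with replacement
    boot.append(cohen_kappa_score(resident[idx], faculty[idx], weights="quadratic"))

low, high = np.percentile(boot, [2.5, 97.5])
point = cohen_kappa_score(resident, faculty, weights="quadratic")
print(f"kappa = {point:.2f}, 95% CI ({low:.2f}, {high:.2f})")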