Reliability in assessment. Cees van der Vleuten, Maastricht University (www.ceesvandervleuten.com). Certificate Course on Assessment, 6 May 2015.

Overview
- What is reliability, conceptually?
- What is the evidence from the literature?
- How to improve reliability?

What is reliability? Correlation (r_xy).

What is reliability? High correlation (r_xy -> 1.0); low correlation (r_xy -> 0.0).

Measurement influence

Reliability in achievement tests: the test is split into two halves of items; the correlation r between the two half-test scores is the split-half reliability coefficient.

Reliability in achievement tests: averaging r across all possible splits of the items (the coloured halves on the slide) gives Cronbach's alpha.
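As a rough illustration of the two coefficients named above, the sketch below computes a split-half coefficient and Cronbach's alpha from a synthetic persons-by-items score matrix; the data and dimensions are assumptions, not taken from the slides.

```python
# Illustrative sketch (not from the slides): split-half reliability and
# Cronbach's alpha for a persons-by-items matrix of dichotomous item scores.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 examinees answering 40 items of varying difficulty.
n_persons, n_items = 200, 40
ability = rng.normal(size=(n_persons, 1))
difficulty = rng.normal(size=(1, n_items))
scores = (ability - difficulty + rng.normal(size=(n_persons, n_items)) > 0).astype(float)

# Split-half: correlate odd- and even-item half scores, then step the
# correlation up to full test length with the Spearman-Brown formula (k = 2).
half1 = scores[:, ::2].sum(axis=1)
half2 = scores[:, 1::2].sum(axis=1)
r_halves = np.corrcoef(half1, half2)[0, 1]
split_half = 2 * r_halves / (1 + r_halves)

# Cronbach's alpha, computed from item variances and total-score variance
# (equivalent to averaging over all possible splits, as on the slide).
k = n_items
item_var = scores.var(axis=0, ddof=1).sum()
total_var = scores.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_var / total_var)

print(f"split-half (Spearman-Brown corrected): {split_half:.2f}")
print(f"Cronbach's alpha:                      {alpha:.2f}")
```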

Reliability and test length (plot): actual vs. predicted reliability as a function of test length, with predictions from the Spearman-Brown prophecy formula.
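The Spearman-Brown prophecy formula predicts the reliability r_k of a test lengthened by a factor k from the reliability r of the current test: r_k = k·r / (1 + (k − 1)·r). A minimal sketch, with illustrative starting values that are not taken from the plot:

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by factor k,
    given reliability r of the current test."""
    return k * r / (1 + (k - 1) * r)

# Example (illustrative values): a 1-hour test with reliability 0.60.
r_one_hour = 0.60
for hours in (1, 2, 4, 8):
    print(f"{hours} h of testing -> predicted reliability "
          f"{spearman_brown(r_one_hour, hours):.2f}")
```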

Three reliability theories: classical test theory, generalizability theory, item-response theory.
Further reading:
- De Champlain, A. F. (2010). A primer on classical test theory and item response theory for assessments in medical education. Medical Education, 44(1).
- Bloch, R., & Norman, G. (2012). Generalizability theory for the perplexed: A practical introduction and guide: AMEE Guide No. 68. Medical Teacher, 34(11).
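Generalizability theory returns later in these slides; item-response theory, by contrast, models the probability of each item response directly. A minimal, hypothetical Rasch (one-parameter logistic) sketch, with ability and difficulty values chosen purely for illustration:

```python
# Illustrative item-response theory sketch (not from the slides): the Rasch
# model gives the probability of a correct answer as a function of examinee
# ability theta and item difficulty b.
import math

def rasch_p_correct(theta: float, b: float) -> float:
    """P(correct | ability theta, item difficulty b) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Example (assumed values): an average examinee on an easy, average and hard item.
for b in (-1.0, 0.0, 1.5):
    print(f"item difficulty {b:+.1f} -> P(correct) = {rasch_p_correct(0.0, b):.2f}")
```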

Overview
- What is reliability, conceptually?
- What is the evidence from the literature?
- How to improve reliability?

Reliabilities across methods, as a function of testing time in hours, for: MCQ, case-based short essay, PMP, oral examination, long case, OSCE, practice video assessment, incognito SPs and mini-CEX (sources: Norcini et al.; Stalenhoef-Halling et al.; Swanson; Wass et al.; Van der Vleuten; Norcini et al., 1999; Ram et al.; Gorter, 2002). This table has been published in: Van der Vleuten, C. P., & Schuwirth, L. W. (2005). Assessing professional competence: from methods to programmes. Medical Education, 39(3).

Reliability of the oral examination (Swanson, 1987): reliability as a function of testing time in hours and number of cases, comparing two new examiners for each case, a new examiner for each case, and the same examiner for all cases. Here multiple sources of error (cases, examiners) are combined in a single reliability estimate. This is the strength of generalizability theory.
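Generalizability theory handles this by estimating a separate variance component for each source of error and recombining them for any intended design. The sketch below is a minimal, hypothetical example (synthetic ratings, a single persons-by-examiners facet, numbers chosen by me, not Swanson's data):

```python
# Illustrative generalizability-theory sketch: a fully crossed persons x
# examiners design, variance components from a two-way ANOVA without
# replication, and the G coefficient for varying numbers of examiners.
import numpy as np

rng = np.random.default_rng(1)
n_p, n_e = 50, 6                      # persons, examiners (assumed numbers)

# Synthetic ratings: person effect + examiner leniency + residual error.
person = rng.normal(0, 1.0, size=(n_p, 1))
examiner = rng.normal(0, 0.5, size=(1, n_e))
error = rng.normal(0, 1.0, size=(n_p, n_e))
ratings = 5 + person + examiner + error

# Sums of squares and mean squares for the p x e design.
grand = ratings.mean()
ss_p = n_e * ((ratings.mean(axis=1) - grand) ** 2).sum()
ss_e = n_p * ((ratings.mean(axis=0) - grand) ** 2).sum()
ss_res = ((ratings - grand) ** 2).sum() - ss_p - ss_e
ms_p = ss_p / (n_p - 1)
ms_e = ss_e / (n_e - 1)
ms_res = ss_res / ((n_p - 1) * (n_e - 1))

# Variance components (persons, examiners, residual) from expected mean squares.
var_res = ms_res
var_p = max((ms_p - ms_res) / n_e, 0.0)
var_e = max((ms_e - ms_res) / n_p, 0.0)

def g_coefficient(n_examiners: int) -> float:
    """G coefficient (relative decisions) for a design with n_examiners per candidate."""
    return var_p / (var_p + var_res / n_examiners)

for n in (1, 2, 4, 8):
    print(f"{n} examiner(s) per candidate -> G = {g_coefficient(n):.2f}")
```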

Reliabilities across methods: the same table as shown earlier (Van der Vleuten & Schuwirth, 2005), repeated here.

Checklist or rating scale reliability in OSCE (Van Luijk & van der Vleuten, 1990).

The literature clearly suggests:
- Reliability is a matter of sampling: across contexts, across assessors, and across any other factor influencing the assessment.
- Objectivity is NOT the same as reliability; many subjective judgements combined make a robust judgement.
- There are no intrinsically more reliable methods of assessment.
- Most of our assessments in actual practice are not very reliable!

Overview
- What is reliability, conceptually?
- What is the evidence from the literature?
- How to improve reliability?

Consequently…
- One single measure is no measure: combine information across time and across multiple measures.
- Be aware of substantial false-positive and false-negative errors in a single measure.

Reliability    Expected % false decisions
1.00           0
0.95           10
0.80           20
0.70           25
0.60           30
0.50           33
0.00           50
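Figures like these can be reproduced under a simple model (my assumption, not stated on the slide): if two parallel measurements of the same candidates correlate r and the pass/fail cut lies at the mean of a normal score distribution, the probability that they disagree about the decision is arccos(r)/π. A quick check in code:

```python
# Expected percentage of conflicting pass/fail decisions between two parallel
# measurements with correlation (reliability) r, for a cut score at the mean
# of a normal score distribution: P(disagree) = arccos(r) / pi.
# (Modelling assumption for illustration; the slide states only the figures.)
import math

for r in (1.00, 0.95, 0.80, 0.70, 0.60, 0.50, 0.00):
    pct = 100 * math.acos(r) / math.pi
    print(f"reliability {r:.2f} -> ~{pct:.0f}% conflicting decisions")
```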

Finally…
- Reliability and sampling are strongly related.
- Objectification and standardization do not intrinsically lead to more reliability.
- Do not objectify or standardize where it is not needed (e.g. when assessing complex skills in the real world).

This PowerPoint can be found at: