Statistics of EBO 2010 Examination. EBO General Assembly, Sunday June 21st, 2010 (Tallinn, Estonia). Danny G.P. Mathysen, MSc. Biomedical Sciences, EBOD Assessment.

Similar presentations
Item Analysis.
FACULTY DEVELOPMENT PROFESSIONAL SERIES OFFICE OF MEDICAL EDUCATION TULANE UNIVERSITY SCHOOL OF MEDICINE Using Statistics to Evaluate Multiple Choice.
Consistency in testing
© McGraw-Hill Higher Education. All rights reserved. Chapter 3 Reliability and Objectivity.
Chapter 4 – Reliability Observed Scores and True Scores Error
Reliability for Teachers Kansas State Department of Education ASSESSMENT LITERACY PROJECT1 Reliability = Consistency.
MEQ Analysis. Outline Validity Validity Reliability Reliability Difficulty Index Difficulty Index Power of Discrimination Power of Discrimination.
Explaining Cronbach’s Alpha
Hypothesis Testing making decisions using sample data.
MSc Epidemiology Exams what, why, when, how. Paper 1 Covers extended epidemiology, STEPH and clinical trials Purpose of today’s talk: –Explain format.
Models for Measuring. What do the models have in common? They are all cases of a general model. How are people responding? What are your intentions in.
Confidence Intervals, Effect Size and Power
Statistical Techniques I EXST7005 Lets go Power and Types of Errors.
STATISTICAL INFERENCE PART V
CLEAR 2008 Annual Conference Anchorage, Alaska Fundamental Testing Assumptions Revisited: Examination Length and Number of Options Karine Georges & Kelly.
SETTING & MAINTAINING EXAM STANDARDS Raja C. Bandaranayake.
Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 9: Hypothesis Tests for Means: One Sample.
Chi-square Test of Independence
Item Analysis Prof. Trevor Gibbs. Item Analysis After you have set your assessment: How can you be sure that the test items are appropriate?—Not too easy.
Multiple Choice Test Item Analysis Facilitator: Sophia Scott.
Statistical Evaluation of Data
Chapter 9 Hypothesis Testing.
PSY 307 – Statistics for the Behavioral Sciences
Lessons Learned about Assessing Quantitative Literacy MAA PREP Workshop: Creating and Strengthening Interdisciplinary Programs in Quantitative Literacy.
Today Concepts underlying inferential statistics
Reliability of Selection Measures. Reliability Defined The degree of dependability, consistency, or stability of scores on measures used in selection.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Chapter 12 Inferential Statistics Gay, Mills, and Airasian
Statistical hypothesis testing – Inferential statistics I.
Οξούζογλου Λεωνίδας, Supervising Professor: Οικονομίδης Αναστάσιος, Examiner 1: Σατρατζέμη Μαρία, Examiner 2: Ξυνόγαλος Στυλιανός, Interdepartmental Postgraduate Programme.
Descriptive statistics Inferential statistics
1/2555 สมศักดิ์ ศิวดำรงพงศ์
Part #3 © 2014 Rollant Concepts, Inc.2 Assembling a Test #
STATISTICAL INFERENCE PART VII
Chapter 11Prepared by Samantha Gaies, M.A.1 –Power is based on the Alternative Hypothesis Distribution (AHD) –Usually, the Null Hypothesis Distribution.
Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)
Topic 5 Statistical inference: point and interval estimate
Chapter 8 Introduction to Hypothesis Testing
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
Inferential Statistics 2 Maarten Buis January 11, 2006.
Associate Professor Arthur Dryver, PhD School of Business Administration, NIDA url:
Group 2: 1. Miss. Duong Sochivy 2. Miss. Im Samphy 3. Miss. Lay Sreyleap 4. Miss. Seng Puthy 1 ROYAL UNIVERSITY OF PHNOM PENH INSTITUTE OF FOREIGN LANGUAGES.
1 Chapter 4 – Reliability 1. Observed Scores and True Scores 2. Error 3. How We Deal with Sources of Error: A. Domain sampling – test items B. Time sampling.
School of Information - The University of Texas at Austin LIS 397.1, Introduction to Research in Library and Information Science LIS Introduction.
Introduction to sample size and power calculations Afshin Ostovar Bushehr University of Medical Sciences.
Educational Research Chapter 13 Inferential Statistics Gay, Mills, and Airasian 10 th Edition.
11/16/2015Slide 1 We will use a two-sample test of proportions to test whether or not there are group differences in the proportions of cases that have.
Chapter 16 Data Analysis: Testing for Associations.
Introduction to Statistical Inference Jianan Hui 10/22/2014.
12/23/2015Slide 1 The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used.
1.What is Pearson’s coefficient of correlation? 2.What proportion of the variation in SAT scores is explained by variation in class sizes? 3.What is the.
Statistical Techniques
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Heriot Watt University 12th February 2003.
Dan Thompson Oklahoma State University Center for Health Science Evaluating Assessments: Utilizing ExamSoft’s item-analysis to better understand student.
Hypothesis Testing and Statistical Significance
Psychometrics: Exam Analysis David Hope
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
Introduction Dispersion 1 Central Tendency alone does not explain the observations fully as it does reveal the degree of spread or variability of individual.
Assessment and the Institutional Environment Context Institutiona l Mission vision and values Intended learning and Educational Experiences Impact Educational.
ESTIMATION.
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
Introduction of IELTS Test
Using EduStat© Software
Data Analysis and Standard Setting
Classical Test Theory Margaret Wu.
Partial Credit Scoring for Technology Enhanced Items
Using statistics to evaluate your test Gerard Seinhorst
15.1 The Role of Statistics in the Research Process
Tests are given for 4 primary reasons.
  Using the RUMM2030 outputs as feedback on learner performance in Communication in English for Adult learners Nthabeleng Lepota 13th SAAEA Conference.
Presentation transcript:

Statistics of EBO 2010 Examination
EBO General Assembly, Sunday June 21st, 2010 (Tallinn, Estonia)
Danny G.P. Mathysen, MSc. Biomedical Sciences, EBOD Assessment and Executive Officer
Antwerp University Hospital, Department of Ophthalmology, Wilrijkstraat 10, B-2650 Edegem, Belgium

Score calculation for Written Paper EBOD 2010: Candidate Population
Written examination (MCQs)
– 310 candidates
Oral examination
– 308 candidates
– 1 candidate did not show up for the Viva Voce
– 1 candidate showed up for only some of the Viva Voce topics

Score calculation for Written Paper EBOD 2010: Scoring Rules
Each question (1 → 52) consists of items A → E; for every item the candidate marks T (True), F (False) or D (Don't know).
Marks obtained per item:
– +1 in case ONLY the correct answer was completed
– 0 in case ONLY the D option was completed
– –0.5 in case ONLY the incorrect answer was completed
– Other cases: T AND F both completed; NOTHING completed (blank item); D COMBINED with T and/or F
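As a minimal Python sketch of the scoring rule above (the function name score_item is illustrative and not part of the SpeedWell software; the three cases for which the slide lists no explicit mark are assumed here to score 0):

```python
def score_item(marked, correct):
    """Mark one True/False item under the EBOD 2010 rules (sketch).

    marked  -- set of options the candidate ticked, a subset of {"T", "F", "D"}
    correct -- the keyed answer, "T" or "F"
    """
    if marked == {correct}:      # only the correct answer was completed
        return 1.0
    if marked == {"D"}:          # only the "Don't know" option was completed
        return 0.0
    wrong = "F" if correct == "T" else "T"
    if marked == {wrong}:        # only the incorrect answer was completed
        return -0.5
    # T and F both completed, blank item, or D combined with T and/or F:
    # no mark is listed on the slide; assumed here to score 0.
    return 0.0
```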

Score calculation for Written Paper EBOD 2010: Scoring Rules
Candidate score for MCQ-1 (simulation):
– A: True (correct answer: True) → +1
– B: False (correct answer: False) → +1
– C: True (correct answer: True) → +1
– D: Don't know (correct answer: True) → 0
– E: True (correct answer: False) → –0.5
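Applying the sketch above to the MCQ-1 simulation reproduces the marks listed on the slide; the +2.5 total is derived arithmetic rather than a figure stated on the slide:

```python
responses = {"A": {"T"}, "B": {"F"}, "C": {"T"}, "D": {"D"}, "E": {"T"}}
key       = {"A": "T",  "B": "F",  "C": "T",  "D": "T",  "E": "F"}

marks = {item: score_item(responses[item], key[item]) for item in sorted(key)}
print(marks)                 # {'A': 1.0, 'B': 1.0, 'C': 1.0, 'D': 0.0, 'E': -0.5}
print(sum(marks.values()))   # 2.5
```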

EBOD 2010 Negative Marking
Advantages for EBO candidates of T/F items
– Reliable in case of translation (English, French, German) → choice of language will not result in being (dis)advantaged
– Accessibility (e.g. dyslexia) → not too complicated for candidates
– Duration of the examination → stress level of candidates can be kept to a minimum
– Relatively easy to process → results can be presented on-site
Disadvantage for EBO candidates of T/F items
– Probability of guessing right = 50 % → level of the weakest candidates is overestimated (→ oral examination)

EBOD 2010 Negative Marking
Hypotheses on the influence of negative marking
– Average scores will drop (punishment of incorrect answers)
– Spread of candidate scores will enlarge (→ room for discrimination)
– Rit-value of individual items will increase
– Reliability of EBOD will increase
Argument against negative marking expressed by the European Board of Anaesthesiology
– Negative marking discriminates against female candidates

How to overcome the disadvantages of T/F items?
– Introduction of negative marking
  – Increase of the discriminative power of the examination
  – Reduction of the guess factor
    – wild guesses will be punished (weakest candidates)
    – guesses by reasoning (partial knowledge) will be rewarded
→ NEGATIVE MARKING AT EBOD 2010

EBOD 2010 Spread of Scores (table on slide: Min, Max, Mean, Stdev)

Score calculation for Written Paper EBOD 2010: Statistical Output (SpeedWell)

EBOD 2010 Degree of Difficulty
EBOD 2009
– Degree of Difficulty (P-value) of 0.79 (overestimated due to guessing)
– Estimation of a large proportion of candidates guessing (> 33 %)
EBOD 2010
– Introduction of the "Don't know" option → reduction of wild guesses → used on average for 15 % of items (or 39 items) per candidate
– Degree of Difficulty (P-value) of 0.66
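The slides do not spell out how SpeedWell computes the Degree of Difficulty (P-value); one common definition, used in the sketch below, is the mean mark obtained on an item expressed as a proportion of the maximum attainable mark (higher values mean an easier item). The function name and the example scores are illustrative:

```python
import numpy as np

def difficulty_index(item_scores, max_mark=1.0):
    """Mean mark obtained on one item as a proportion of the maximum mark (sketch)."""
    return float(np.mean(item_scores)) / max_mark

# Illustrative marks of six candidates on one item under the EBOD marking scheme.
print(difficulty_index([1, 1, 1, 0, -0.5, 1]))  # ~0.58
```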

EBOD 2010 Point Biserial Correlation
Point biserial correlation coefficient (Rit)
– Estimator of the correlation between the individual item scores Xi (either –0.5, 0 or +1) and the total MCQ scores Yi (ranging from 61.5 to 209) of the candidates
– Average Rit improved from 0.14 (EBOD 2009) to 0.18 (EBOD 2010) → positive correlation between item and total MCQ score
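Computed as described on the slide, Rit is the Pearson correlation between an item's scores and the candidates' total MCQ scores. A minimal sketch follows; the data are invented, and a "corrected" variant would exclude the item from the total before correlating:

```python
import numpy as np

def rit(item_scores, total_scores):
    """Item-total (point biserial) correlation: Pearson r between the scores on
    one item and the candidates' total MCQ scores."""
    return float(np.corrcoef(item_scores, total_scores)[0, 1])

# Illustrative data for five candidates.
print(rit([1, 1, 0, -0.5, 1], [180, 150, 120, 90, 200]))
```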

EBOD 2010 Internal Consistency
Cronbach's coefficient alpha (α) = 0.87 (2009: 0.78)
– Estimator of the lower bound of the internal consistency (the degree to which all MCQ items measure the same thing, i.e. knowledge of the candidates) of EBOD 2010 (95 % CI: 0.86–0.89)
→ internal consistency of the EBOD MCQ test is good
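For reference, Cronbach's alpha can be computed from the candidates-by-items matrix of item scores with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The sketch below is not the SpeedWell implementation and does not produce the confidence interval quoted above; the example matrix is invented:

```python
import numpy as np

def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a (candidates x items) matrix of item scores."""
    scores = np.asarray(score_matrix, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Illustrative 4-candidate x 3-item matrix of EBOD-style marks.
print(cronbach_alpha([[1, 0, 1], [1, 1, 1], [0, 0, -0.5], [1, 0, -0.5]]))
```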

EBOD 2010 Male vs. Female Candidates
EBOD 2010 Written Examination
– 310 candidates
– 168 male candidates
– 142 female candidates
Use of the "Don't know" option (average percentage of items per candidate)
– Male candidates: used on average for 13 % of items (34 items)
– Female candidates: used on average for 16 % of items (42 items)
– Difference is statistically significant (p = 0.02)
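The slide does not say which test produced p = 0.02; one plausible choice for comparing the number of "Don't know" responses per candidate between the two groups is a two-sample (Welch) t-test, sketched below with invented counts rather than the real EBOD 2010 data:

```python
from scipy import stats

# Invented per-candidate counts of "Don't know" responses (out of 260 items).
d_male = [30, 28, 40, 35, 33, 29]
d_female = [45, 38, 50, 41, 36, 44]

t, p = stats.ttest_ind(d_male, d_female, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.3f}")
```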

EBOD 2010 Male vs. Female Candidates
Average absolute candidate scores (values shown in chart on slide: Male vs. Female)
– Male candidates:
– Female candidates:
– NOT statistically significant (p > 0.05)
Distribution of converted candidate scores (1–10)
– NOT statistically significant (p > 0.05) when comparing all scores
– NOT statistically significant (p > 0.05) when comparing ≤ 5 versus ≥ 6
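Likewise, the ≤ 5 versus ≥ 6 comparison can be framed as a 2 x 2 contingency table (sex by score band) and tested with a chi-square test of independence; the slide does not name the test used, and the counts below are purely illustrative, with only the group totals (168 male, 142 female) taken from the slides:

```python
import numpy as np
from scipy import stats

# Rows: male, female; columns: converted score <= 5, converted score >= 6.
table = np.array([[40, 128],    # illustrative split of the 168 male candidates
                  [30, 112]])   # illustrative split of the 142 female candidates

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```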

EBOD 2010 Negative Marking: Conclusions
In general:
– Average scores dropped (… → …)
– Spread of results became larger (13.0 → 24.8)
– Internal consistency (Cronbach's α) improved (0.78 → 0.87)
– P-value was less overestimated due to the D option (0.79 → 0.66)
– Rit-value improved (0.14 → 0.18)
When comparing male and female candidates:
– Female candidates (D option ticked for 42 items on average) are more prudent where guessing is concerned than male candidates (D option ticked for 34 items on average) (p = 0.02)
– However, without a negative impact on their ability to pass EBOD 2010!