Psychometrics: Exam Analysis
David Hope

Background
- Our exams are information
- How much information is in a given exam? Or a given question?
- How can we get more information from our assessment?
- How can we ensure our information is correct?

Questions for Psychometrics
- What does the exam tell us about our students?
- Was the exam well designed?
- Do the questions ‘work’?
- Can we separate the good candidates from the bad?
- How do we know when to drop or change our questions?

More Questions
- Is it fair to sum up candidate performance with a single number?
- How many areas of ability are being tested?
- Can students compensate by being brilliant in one domain while being unsatisfactory in another?

Assessment in undergraduate medical education: advice supplementary to Tomorrow’s Doctors (2009)
- Emphasizes the need for regular psychometric analysis
- Requests key statistics on reliability
- Nearly all the suggested statistics require variability in student performance
- Recent GMC reports on medical schools show the GMC’s willingness to intervene if psychometric analyses show problems

Exam Analysis
- A step-by-step look at all key exam variables
- Problem questions are always highlighted in reports
- Suggestions on how to improve the exam’s psychometric performance are included

Distribution of Scores
- The shape of the distribution is very informative
  - Are candidates clumped together, or widely separated?
  - Are failing students separated from the main body of students?
  - Nearly all the statistics emphasised as important by

Frequencies
- These plots provide information on how many candidates achieved a given score.
- Note that in both cases a small number of candidates have achieved very high or very low marks that separate them from the rest of the group.

Density Plot
- These plots provide the same information as the frequency plots, but as a single smooth shape that is easier to interpret.
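
The slides do not say how the density plots are produced. As a rough sketch, one common way to turn scores into a single smooth curve is a Gaussian kernel density estimate. The scores below are hypothetical, and the default bandwidth (the normal-reference rule of thumb, 1.06 × SD × n^(-1/5)) is an assumption rather than the setting used in the reports.

```python
from math import exp, pi, sqrt
from statistics import stdev


def gaussian_kde(scores, bandwidth=None):
    """Return a function giving the estimated density at any score x."""
    n = len(scores)
    # Assumed default: normal-reference rule-of-thumb bandwidth.
    h = bandwidth if bandwidth is not None else 1.06 * stdev(scores) * n ** (-1 / 5)
    if h <= 0:
        raise ValueError("bandwidth must be positive (scores need some variability)")

    def density(x):
        # Average of Gaussian bumps, one centred on each observed score.
        return sum(exp(-0.5 * ((x - s) / h) ** 2) for s in scores) / (n * h * sqrt(2 * pi))

    return density


# Hypothetical exam scores (percent).
scores = [45, 52, 58, 61, 63, 66, 70, 74, 81, 90]
density = gaussian_kde(scores)
for x in range(40, 100, 10):
    print(f"{x:3d}  {'#' * round(density(x) * 1000)}")
```

A smaller bandwidth gives a bumpier curve that follows individual candidates; a larger one gives a smoother curve that may hide a separated group of failing students.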

Key statistics
- Mean
  - The mean is a measure of central tendency (the sum of scores divided by the number of scores)
  - A high mean indicates most candidates performed well on the exam or item
- Median
  - The score or case separating the bottom and top half of the scores
  - The median (unlike the mean) is not heavily influenced by a small number of significant outliers
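
To illustrate the point about outliers, here is a minimal sketch using Python's standard `statistics` module. The scores are hypothetical, with one very low mark added on purpose.

```python
from statistics import mean, median

# Hypothetical exam scores (percent); the last value is a low outlier.
scores = [62, 65, 66, 68, 70, 71, 73, 12]

print(f"mean   = {mean(scores):.1f}")
print(f"median = {median(scores):.1f}")

# Dropping the outlier moves the mean far more than the median.
trimmed = scores[:-1]
mean_shift = abs(mean(trimmed) - mean(scores))
median_shift = abs(median(trimmed) - median(scores))
print(f"shift in mean   = {mean_shift:.1f}")
print(f"shift in median = {median_shift:.1f}")
```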

Key statistics: Exam interpretation
- Mean
  - The mean is a very useful indicator of difficulty
  - Did candidates perform better than expected?
  - Or did they struggle on items predicted to be easy?
- Median
  - When there are lots of outliers the median may be more useful than the mean

Key Statistics
- Standard Deviation (SD)
  - A measure of dispersion
  - A high SD indicates greater dispersal around the mean
- Range
  - A measure of dispersion
  - Calculated by subtracting the lowest score from the highest score
- Min & Max
  - The lowest and highest scores in the exam respectively
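
A short sketch of these dispersion statistics, again with hypothetical scores. The slides do not say whether the reports use the sample SD (dividing by n - 1) or the population SD (dividing by n), so both are shown.

```python
from statistics import pstdev, stdev

# Hypothetical exam scores (percent).
scores = [48, 55, 61, 64, 67, 70, 72, 79]

lowest, highest = min(scores), max(scores)
score_range = highest - lowest      # highest score minus lowest score
sample_sd = stdev(scores)           # divides by n - 1
population_sd = pstdev(scores)      # divides by n

print(f"min = {lowest}, max = {highest}, range = {score_range}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```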

Key Statistics: Exam interpretation
- Standard Deviation
  - Where this is low, candidate performance is very homogeneous
  - This will likely cause many problems
  - A high SD is useful
- Range
  - Similar to SD
  - Less useful as it is heavily influenced by outliers
- Min/Max
  - Establishes how your best and worst performers did
  - What is the exam ‘floor’?

Cronbach’s alpha
- A measure of internal consistency
  - Are the questions on the exam testing the same area?
  - Typically we want the number to be in excess of 0.7
- Many have criticized this statistic with good reason, but it remains commonly used
- External bodies often evaluate exams this way, and some will require improvement where the statistic is poor

Key statistic: Exam interpretation
- Cronbach’s alpha
  - Where the number is low (<.7 and even more so <.6) this indicates exam-wide problems and a lack of consistency
  - Where the number is very high (>.9) this indicates the exam has oversampled from a single domain
  - Evaluating the individual questions is the best solution if the value is problematic
  - Low variability is a major cause of problems with alpha
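
For reference, Cronbach's alpha for an exam with k items is k / (k - 1) × (1 - sum of item variances / variance of total scores). The sketch below applies that standard formula to a hypothetical matrix of marks (one row per candidate, one column per question). The function name and the data are illustrative only.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a candidates-by-items matrix of marks."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) and of each candidate's total score.
    item_variance_sum = sum(pvariance(column) for column in zip(*responses))
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have no variability; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 0/1 marks: 6 candidates, 4 questions.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

The division by the variance of total scores shows why low variability is such a problem: when candidates' totals barely differ, the denominator shrinks and alpha becomes unstable or undefined.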

Discrimination
- Exams are intended to discriminate
  - Will the question separate the good from the bad?
  - Discrimination measures the capacity for an item to distinguish between high and low performers on a test
  - Items which do not discriminate well are not helping us determine candidate performance

Discrimination
- Higher numbers are better
  - A NEGATIVE number is especially worrisome
  - This means the question discriminates in the wrong direction
  - Candidates who do well overall perform worse on this question
  - Values below -0.1 usually indicate a serious problem with the question
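
The slides do not say which discrimination statistic the reports use. One common choice, assumed here purely for illustration, is the upper-lower index: the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group. The 27% group size is a widely used convention, not something stated in the source, and the data are hypothetical.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Upper-lower discrimination index for a 0/1-marked item."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Candidate indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data: total exam scores and marks on two questions.
totals = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
good_item = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # strong candidates get it right
bad_item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]    # strong candidates get it wrong

print(discrimination_index(good_item, totals))
print(discrimination_index(bad_item, totals))
```

Under this definition the index runs from -1 to 1, and a negative value means the bottom group outperformed the top group on the question.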

Facility
- Facility indicates the proportion of candidates who got the question correct
- Requires interpretation for multi-mark questions
- It is always worth looking at those questions which have very high or very low facility
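
A minimal sketch of the facility calculation. For multi-mark questions it assumes one reasonable interpretation, the mean mark as a proportion of the maximum mark; the slides do not specify this. The flagging thresholds (below 0.2 or above 0.9) are hypothetical defaults.

```python
def facility(marks, max_mark=1):
    """Proportion of available marks earned on one question."""
    if not marks:
        raise ValueError("no marks supplied")
    return sum(marks) / (len(marks) * max_mark)


def flag_extreme(facilities, low=0.2, high=0.9):
    """Names of questions whose facility is very low or very high."""
    return [name for name, f in facilities.items() if f < low or f > high]


# Hypothetical marks: Q1 is a single-mark item, Q2 is marked out of 4.
q1 = [1, 1, 0, 1]
q2 = [2, 4, 3, 1]
print(facility(q1))              # proportion of candidates correct
print(facility(q2, max_mark=4))  # mean mark as a share of the maximum
print(flag_extreme({"Q1": 0.95, "Q2": 0.5, "Q3": 0.1}))
```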

Item-total correlation
- Scores on a question are correlated with exam scores
  - Higher values mean the candidates who do well at this question do well overall
  - When values are close to zero the question tells us nothing about overall performance
  - When values are NEGATIVE this means candidates who do well on this question do badly overall
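
As a sketch, the item-total correlation can be computed as the Pearson correlation between marks on a question and exam totals. By default the version below correlates each item with the total of the remaining items (the "corrected" form, which stops the item correlating with itself). Whether the reports use the corrected or uncorrected form is not stated in the slides, so this is an assumption. The data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses, corrected=True):
    """One correlation per item for a candidates-by-items matrix."""
    totals = [sum(row) for row in responses]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Corrected form: remove the item's own marks from the total.
        reference = [t - s for t, s in zip(totals, item)] if corrected else totals
        results.append(pearson(item, reference))
    return results


# Hypothetical 0/1 marks: 4 candidates, 3 questions.
responses = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
for number, r in enumerate(item_total_correlations(responses), start=1):
    print(f"Q{number}: r = {r:.2f}")
```

A question that every candidate answers identically has no variance, so its correlation is undefined; the sketch returns NaN in that case rather than raising an error.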

Summarizing group performance
- We divide all candidates into four ability groups and rank them from worst to best
- Ideally there should be a steady increase in score from the worst to the best group
- If this does not occur there is likely a problem in the question
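
The check described above can be sketched as follows. The sketch assumes the four ability groups are formed by ranking candidates on their total exam score and splitting them into quarters; the slides do not say exactly how the groups are defined. Function names and data are hypothetical.

```python
def quartile_group_means(item_scores, total_scores):
    """Mean item score in each of four ability groups, worst to best."""
    n = len(total_scores)
    if n < 4:
        raise ValueError("need at least four candidates")
    # Candidate indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    means = []
    for g in range(4):
        group = order[g * n // 4:(g + 1) * n // 4]
        means.append(sum(item_scores[i] for i in group) / len(group))
    return means


def never_decreases(means):
    """True if the score never drops from one group to the next better one."""
    return all(a <= b for a, b in zip(means, means[1:]))


# Hypothetical data: 8 candidates, total scores and marks on one question.
totals = [1, 2, 3, 4, 5, 6, 7, 8]
item = [0, 0, 0, 1, 1, 1, 1, 1]
means = quartile_group_means(item, totals)
print(means, never_decreases(means))
```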

Testing Group Performance

Responses
- It is usually helpful to review which options candidates chose
- Ideally candidates who got the question wrong should be evenly distributed among all the incorrect options (distractors)
- In practice this is rare, BUT looking at the most popular distractors can provide information on why candidates answered in this way
- If most candidates pick the same (wrong) answer this suggests a misconception or even a miscoding
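
A small sketch of this kind of distractor review. It reports how the wrong answers split across the distractors and flags any wrong option chosen by more than half of all candidates, which is one hypothetical rule for spotting a possible misconception or miscoded key. The option labels and data are made up.

```python
from collections import Counter


def distractor_shares(choices, key):
    """Share of wrong answers taken by each distractor, most popular first."""
    wrong = [c for c in choices if c != key]
    if not wrong:
        return {}
    return {opt: count / len(wrong) for opt, count in Counter(wrong).most_common()}


def possible_miskey(choices, key):
    """Wrong options picked by more than half of all candidates (assumed rule)."""
    counts = Counter(choices)
    return [opt for opt, count in counts.items()
            if opt != key and count > len(choices) / 2]


# Hypothetical responses to one question whose keyed answer is "A".
choices = list("AABBBBCD")
print(distractor_shares(choices, key="A"))
print(possible_miskey(list("ABBBBBBC"), key="A"))
```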

Common Problems with Assessment
- For most assessment items, many of these statistics rely on variability in candidate performance
- Low internal consistency is usually the biggest issue
- Low discrimination or low item-total correlations indicate poorly performing questions
- Questions where every candidate scores 100% can reduce the effectiveness of the exam
- It is very common in our exams for 20% or more of the questions on a paper to be answered correctly by over 90% of an (often 250-strong) student body

New Routine Analysis
- Considerably shorter (generally less than 10 pages as opposed to over 100)
- Easily searchable
- Highlights good as well as bad questions
- Retains the same information as the earlier version

Summary Page
- Provides a histogram and density plot of scores along with key statistics about the exam
- Cronbach’s alpha is especially important

Exam Analysis
- This provides key statistics on each question
- Problematic items are highlighted
- By using the ‘filter’ buttons (the arrows on the first row) you can sort smallest to largest or largest to smallest
- This allows you to look at the best questions as well as the worst
- You can also search for keywords and so bring up only questions from a particular section of the exam

Responses
- This lists the options selected by the candidates
- It provides the number of questions (variables) and students (observations)
- Each question is named and appears in order
- ‘n’ is the total, ‘missing’ is the number of missing cases, and ‘unique’ is how many options were chosen
- The next set of numbers gives the option, followed by the raw number and the percentage in brackets
- For the first question in this example, 4 people selected 0 (6%) while 66 people selected 1 (94%)
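
The layout described above can be approximated with a short sketch. It assumes missing responses are recorded as `None` and that ‘n’ counts only non-missing responses; neither detail is confirmed by the slides. The function name is hypothetical, and the data reproduce the example of 4 candidates selecting 0 and 66 selecting 1.

```python
from collections import Counter


def summarise_responses(responses):
    """n, missing, unique and per-option (count, percent) for one question."""
    answered = [r for r in responses if r is not None]
    n = len(answered)
    counts = Counter(answered)
    return {
        "n": n,
        "missing": len(responses) - n,
        "unique": len(counts),
        # Percentages are of non-missing responses, rounded to whole numbers.
        "options": {opt: (count, round(100 * count / n))
                    for opt, count in sorted(counts.items())},
    }


# The example from the slide: 4 candidates chose 0 and 66 chose 1.
first_question = [0] * 4 + [1] * 66
summary = summarise_responses(first_question)
print(f"n={summary['n']} missing={summary['missing']} unique={summary['unique']}")
for option, (count, percent) in summary["options"].items():
    print(f"{option}: {count} ({percent}%)")
```

With these data the rounded percentages come out as 6% and 94%, matching the figures quoted in the slide.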

Post-exam support
- I can provide support in revising questions, interpreting examination scores and assessing the key statistics
- I attend post-exam assessment panels and exam boards and can attend any informal meeting to review exam performance
- If there are specific things not in a report that you would like to see done, this can be arranged