Using Rasch modeling to investigate the psychometric properties of the OSCE

Carole Steketee, Michele Gawlinski & Elina Tor
Medical Education Support Unit, School of Medicine Fremantle, The University of Notre Dame Australia

Aim
To present a prototype of a validated psychometric evaluation of an OSCE, grounded in Rasch unidimensional measurement, a unified theoretical conception of measurement.

Introduction
The Objective Structured Clinical Examination (OSCE) is an assessment tool for clinical skills and competence that is widely used in the health sciences, particularly in medical education. The goal of an OSCE is to make reproducible pass/fail decisions and to position candidates according to their demonstrated abilities. The instruments used, namely the OSCE stations, must measure candidate ability consistently; in other words, there should be invariant comparison and measurement.

Background: OSCE Design
- 4th-year postgraduate MBBS students (N = 80)
- 11 stations (20 minutes each): 10 clinical stations and 1 station on Personal & Professional Development, conducted in 2 sessions
- Clinical stations based on medical disciplines/surgical specialties
- Real and simulated patients
- 50 examiners (one examiner per station)
- Summed score across stations taken as each student's overall OSCE score

Objectives
1. To justify the validity of taking the summed score across stations as the overall OSCE score.
2. To investigate and account for examiner leniency/stringency in the OSCE scores.

Methods
The Polytomous Rasch Model (PRM). In its simplest form, when a candidate is rated for performance on a task, the log odds of the candidate being rated in category x rather than category x−1 is modelled in the PRM as a function of the candidate's ability and the task difficulty:

\ln\left(\frac{P_{vx}}{P_{v(x-1)}}\right) = \beta_v - \delta_m

where P_{vx} is the probability of examinee v being rated in category x, P_{v(x-1)} is the probability of examinee v being rated in category x−1, \beta_v is the ability of examinee v, and \delta_m is the difficulty of task m. (A computational sketch of the resulting category probabilities follows this section.)

The Multi-Facets Rasch Model (MFRM). The MFRM is an extension of the PRM that can be applied to partition the variance in ratings of examinee performance into facets such as examinee ability, task difficulty and examiner severity (S_j).

Limitation. The design of this OSCE, in which each student was rated by a single examiner at each station and a different set of examiners was used for each station, did not meet the data-collection design required for a Rasch analysis of examiner severity based on the MFRM.

Delimitation. The data were fitted to the PRM to investigate the validity of the summed score, and to confirm that the individual station data fit the PRM before the summed station scores were taken as the overall OSCE score (Objective 1). When the data fit the model, the total raw score across stations is a sufficient statistic that contains all the information about examinee ability and station difficulty. Examiner severity was investigated, and raw scores adjusted, at this stage (Objective 2): severity was examined using raw scores by comparing the mean rating given by one examiner against the mean of the average ratings received by the same group of students from all other examiners in all other stations. If the difference in mean ratings is > +2 SEM (significant leniency) or < −2 SEM (significant stringency), that examiner's ratings are adjusted by linear equating, taking the average ratings of the other examiners in all other stations as the reference (see the second sketch below).
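To make the PRM concrete, here is a minimal Python sketch of the category probabilities implied by the log-odds equation above. It is an illustration only, not the RUMM2020 implementation; the function name and threshold values are hypothetical, and task difficulty is expressed as a set of category thresholds.

```python
import numpy as np

def prm_category_probs(beta, thresholds):
    """Category probabilities under the polytomous Rasch model.

    beta       : examinee ability (logits)
    thresholds : category thresholds delta_1..delta_m for one station
                 (logits); category 0 has no threshold.
    Returns an array of probabilities for categories 0..m (sums to 1).
    """
    # Cumulative sums of (beta - delta_k); the empty sum for category 0 is 0.
    cum = np.concatenate(([0.0], np.cumsum(beta - np.asarray(thresholds))))
    exp_cum = np.exp(cum - cum.max())   # subtract the max for numerical stability
    return exp_cum / exp_cum.sum()

# Hypothetical thresholds for a 10-category station (categories 0..9).
probs = prm_category_probs(beta=0.5, thresholds=np.linspace(-2.0, 2.0, 9))
print(probs.round(3))
```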
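The raw-score screening for examiner severity can be sketched in the same way. The ±2 SEM flag and the linear-equating adjustment follow the description above, but the data layout (a students × stations score matrix, one examiner per station) and the exact SEM computation are assumptions made for illustration.

```python
import numpy as np

def adjust_for_severity(scores):
    """Flag lenient/stringent examiners and adjust their ratings.

    scores : (n_students, n_stations) raw-score matrix; column j holds
             the single examiner j's ratings of every student at station j.
    Returns the adjusted matrix and the indices of the flagged stations.
    """
    scores = np.asarray(scores, dtype=float)
    n_students, n_stations = scores.shape
    adjusted = scores.copy()
    flagged = []
    for j in range(n_stations):
        others = np.delete(scores, j, axis=1)        # same students, all other examiners
        ref = others.mean(axis=1)                    # reference rating per student
        diff = scores[:, j].mean() - ref.mean()
        sem = ref.std(ddof=1) / np.sqrt(n_students)  # assumed SEM of the reference mean
        if abs(diff) > 2 * sem:                      # > +2 SEM lenient, < -2 SEM stringent
            flagged.append(j)
            # Linear equating: match the examiner's mean and spread to the reference.
            z = (scores[:, j] - scores[:, j].mean()) / scores[:, j].std(ddof=1)
            adjusted[:, j] = ref.mean() + z * ref.std(ddof=1)
    return adjusted, flagged
```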
Uniqueness of the Rasch Model
- The ideal standards of construct validity (invariant comparison, unidimensionality, sufficiency) are embedded in a mathematical formula.
- Parameter estimates are separated through conditional probability.
- Examinee-free item difficulty estimation and item-free examinee ability estimation.

Data Analysis
- Each station (maximum mark 20) was analysed as one item.
- Raw scores for each station were collapsed into 10 categories (0 to 9) and fitted to the PRM (using RUMM2020).
- The observed rating patterns were compared with the patterns of ratings predicted by the PRM.
- Data-model fit was examined concurrently at the individual-item level and the overall test level.
- Misfit (at the test, item or examinee level) indicates an anomaly in the rating pattern and warrants further qualitative investigation of the conceptualisation of the construct, item quality, the physical conditions of the examination, or an examinee's unique circumstances that might have impacted the rating.

Results (Rasch Analysis)

Overall Test of Unidimensionality
Figure 1: Summary test-of-fit statistics for the overall OSCE exam.
The item–trait interaction fit statistics evaluate the suitability of the data for the construction of a variable (clinical competence) and its measures (Wright & Masters, 1982; Wright & Stone, 1979). They constitute a formal test of the unidimensionality of the clinical tasks across all 11 stations, and hence of the validity of the summed score as a measure of examinee clinical competence. A non-significant χ² probability for the test of fit of the data to the model (χ² = 17.06, df = 22, p = 0.76) indicates that the 11 stations in the OSCE exam map onto a common underlying latent construct, clinical competence. It is therefore justified to take the summed score across stations as an indicator of an examinee's level of clinical competence.

Individual Item Fit & Item Difficulty Estimates
Figure 2: Individual item fit to the PRM (in item difficulty order).
The χ² fit statistics in the last column of Figure 2 provide the statistical evidence of data fit to the Rasch model at the individual station level: each station separates the examinees consistently (invariantly) in terms of their clinical competence. Task difficulty was estimated for each station; Station 9 contained the least challenging tasks and Station 10 the most challenging.

Graphical Evidence of Item Fit

Individual Examinee Fit & Clinical Competence Estimates
When the data fit the Rasch model, the program transforms the ordinal raw scores onto a linear interval scale measured in logits. A modelled error variance is also estimated for each examinee ability estimate, quantifying its precision and describing the range (confidence interval) within which each person's true ability falls.
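This raw-score-to-logit transformation can be illustrated with a short sketch of maximum-likelihood ability estimation under the PRM, with the standard error of measurement (SEM) obtained from the test information. It reuses the hypothetical prm_category_probs function from the Methods sketch, assumes the station thresholds are already known and the total score is non-extreme, and shows the principle rather than RUMM2020's actual estimation routine.

```python
import numpy as np

def ability_estimate(obs, station_thresholds, n_iter=20):
    """ML ability estimate (logits) and SEM from one examinee's station scores.

    obs                : observed category score for each station (non-extreme total)
    station_thresholds : one array of category thresholds per station
    """
    beta = 0.0
    for _ in range(n_iter):
        expected = variance = 0.0
        for thresholds in station_thresholds:
            p = prm_category_probs(beta, thresholds)  # from the Methods sketch
            x = np.arange(len(p))
            e = (x * p).sum()                         # expected station score
            expected += e
            variance += (x**2 * p).sum() - e**2       # station score variance
        beta += (sum(obs) - expected) / variance      # Newton-Raphson step
    return beta, 1.0 / np.sqrt(variance)              # SEM = 1/sqrt(test information)

# Hypothetical example: 11 stations, 10 categories each (0..9).
thresholds = [np.linspace(-2.0, 2.0, 9)] * 11
beta, sem = ability_estimate(obs=[6] * 11, station_thresholds=thresholds)
print(f"ability = {beta:.2f} logits, 95% CI = ±{1.96 * sem:.2f}")
```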
Figure 4: Individual person fit and location estimates (excerpt), in location order.

Targeting & Item Map
Figures 5 and 6: Item map.
- The difficulty of the items and the distribution of examinee ability are represented visually on an item–person (targeting) map.
- Examinees and test items are placed on the same scale: examinee ability is tested in relation to the tasks, not relative to other examinees.
- The distribution of examinees on the continuum of clinical competence is skewed to the left compared with the distribution of the categories of task/item difficulty, as expected for an OSCE.

Local Independence between Stations: Evidence of Construct Validity
Figure 7: Residual correlation matrix (lower triangle).

Item    I0001  I0002  I0003  I0004  I0005  I0006  I0007  I0008  I0009  I0010  I0011
I0001   1.000
I0002   0.024  1.000
I0003  -0.115 -0.068  1.000
I0004   0.044 -0.016 -0.228  1.000
I0005   0.013  0.040 -0.267 -0.069  1.000
I0006  -0.124 -0.048 -0.089 -0.062 -0.052  1.000
I0007  -0.055 -0.017 -0.016 -0.179  0.123 -0.198  1.000
I0008   0.008 -0.176  0.064 -0.183 -0.249 -0.061 -0.432  1.000
I0009  -0.002 -0.186 -0.250  0.031  0.186  0.150 -0.189 -0.110  1.000
I0010  -0.361 -0.141 -0.102 -0.078 -0.197 -0.142 -0.160  0.054 -0.234  1.000
I0011  -0.281 -0.303  0.010 -0.076 -0.283 -0.182 -0.043  0.023 -0.043  0.011  1.000

The residual correlations between items are low; because these correlations reflect factors other than clinical competence, this provides further evidence of unidimensionality. (A sketch of how such a matrix can be computed follows the Suggestions below.)

Conclusion
Rasch modeling provides a formal test of invariant comparisons across the items in a test, and therefore of the unidimensionality of the latent construct across the multiple stations of an OSCE examination. It provides the evidence for the validity of the summed scores for the overall OSCE examination.

Suggestions for Assessment Practice in the SoM
- Rasch modeling is a practical quality-control and quality-assurance tool for OSCE examinations, as described above, complementary to classical test theory (CTT).
- Establish an item bank for the OSCE that includes the psychometric properties of individual stations/items based on linear measures of item/task difficulty, to enable the linking of OSCE stations across medical/surgical disciplines and across levels of training through the co-calibration of test items (test linking/equating).
- Base standard setting for the OSCE on Rasch measurement.
- Integrate Rasch modeling into scaling and item analysis for all assessment components: written exams (MCQ, EMQ, SAQ) and performance assessments such as the OSCE, Mini-CEX, professional portfolio, clinical audit, etc.
- Establish one common scale (one ruler) to link all the different test forms/formats (horizontal test linking); use the same scale to link assessment data across different stages/years of training (vertical test linking).
- A path towards realizing the vision of competency-based medical education: an arduous but not insurmountable task!
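As a computational footnote to Figure 7, here is a minimal sketch of how a residual correlation matrix between stations can be derived, assuming observed scores together with model-expected scores and variances such as those produced by the PRM sketches above; RUMM2020's exact computation may differ in detail.

```python
import numpy as np

def residual_correlations(observed, expected, variance):
    """Correlation matrix of standardized Rasch residuals between stations.

    observed, expected, variance : (n_students, n_stations) arrays of observed
    scores, model-expected scores and model score variances for each rating.
    Near-zero off-diagonal entries support local independence between stations.
    """
    z = (observed - expected) / np.sqrt(variance)  # standardized residuals
    return np.corrcoef(z, rowvar=False)            # treat stations as variables
```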
References
Andrich, D., Lyne, A., Sheridan, B., & Luo, G. (2006). RUMM 2020. Perth: RUMM Laboratory.
Fisher, W. P., Jr. (2001). Invariant thinking vs. invariant measurement. Rasch Measurement Transactions, 14(4), 778–781.
Linacre, J. M. (2009). A user's guide to Facets Rasch measurement computer program, version 3.66.0. Chicago: Winsteps.com.
Schumacker, R. E., & Smith, E. V. (2007). A Rasch perspective. Educational and Psychological Measurement, 67(3), 394–409.
Wright, B. D., & Masters, G. N. (1982). Rating Scale Analysis. Chicago: MESA Press.
Wright, B. D., & Stone, M. H. (1979). Best Test Design. Chicago: MESA Press.