Assessment Research Centre Online Testing System (ARCOTS). Monjurul Alom, Nafisa Awwal, Patrick Griffin, Daniel Jimenez Barrios, Masa Pavlovic, Pam Robertson, Hillary Slater
Background: Assessment and Learning Partnership (ALP), started in 2009 (2010). Aim: to investigate the effect on student outcomes of teachers using assessment data when planning instruction. Longitudinal study: 3 years, two testing periods per year (6 months apart). Over 100 000 students; over 250 000 assessments.
Current: Victorian DEECD, Independent and CEOM school students, grades 3 to 10. Over 200 000 students participated; over 600 000 assessments administered and reported. Available assessments: Problem solving (3), Interactive problem solving (2), Numeracy (9), Reading Comprehension (10).
Assessment framework: Development of the assessment tool was based on the integration of three theories (Glaser, Rasch and Vygotsky), as described by Griffin (2007). Glaser (1963): criterion-referenced interpretation of the scale; performance is described in terms of skills and their difficulties. Rasch (1960, 1980): link between item difficulty and student ability; same metric and 'interval' properties of the scale. Vygotsky: zone of proximal development (ZPD) as the point of intervention.
Assessment: Link between scale scores and student knowledge and skills. Information regarding the intervention point. Provides information that enables teachers to differentiate qualitatively between ability levels.
Authentication Protocol (diagram): ARCOTS; Students; Teachers; Authentication Protocol; DBMS. Access to the following: test control; student reports; student records/results; teacher instruments and individual reports. Access to tests in the domains: Numeracy; Reading Comprehension; Problem Solving.
ARCOTS Welcome
Students Page
Example item - Numeracy
Example item – Reading Comprehension
Assessment design: The spread of ability in any given classroom drove the need for an assessment that describes the skills students develop over the whole course of their education, rather than describing skills grade level by grade level.
Method used: vertical scaling, not equating. Vertical scaling provides the link between tests of different difficulty that are administered to students at different ability levels. When calibrated, the tests allow results to be compared across grade levels. There are a number of methods available. Methods used: scaling with the Rasch model; linking with the fixed common item parameter method.
Why the Rasch model? Unidimensional measurement model. The total score is a sufficient statistic. Invariance property: the probability of success is given by the difference between item difficulty and person ability, with equal discrimination assumed. Interval properties hold if the data fit: depending on how well the data fit the Rasch model, we can construct a sample-independent, interval-level measure.
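As an illustration of the invariance property, a minimal sketch of the dichotomous Rasch response function follows. The function name and the example values are illustrative assumptions, not part of ARCOTS.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits on the same scale; discrimination is
    fixed at 1 for every item, as the model assumes.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# The probability depends only on the difference (ability - difficulty),
# so shifting both by the same amount leaves it unchanged.
p_equal = rasch_probability(0.0, 0.0)    # ability equals difficulty
p_shifted = rasch_probability(1.5, 1.5)  # same difference, other location
p_above = rasch_probability(1.0, 0.0)    # ability one logit above the item
```

When ability equals item difficulty the probability of success is 0.5, wherever on the scale the pair sits.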
Fixed common items method: In the fixed common item parameter method, item parameters are treated as true values of the parameter. The difference in performance on common items is used to determine the difference between students at two adjacent tests. The amount of growth/learning is determined by the difference in student performance on common items between two testing times.
Factors influencing quality of common item set: common item set length; item placement; composition/content representativeness; statistical equivalence to the total test; item stability (statistical stability).
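The idea can be sketched as follows, as a simplification and not the ARCOTS calibration itself: common-item difficulties are held fixed, a student's ability is estimated at each testing time by maximum likelihood, and growth is the difference in logits. The difficulties, response patterns and function names are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, difficulties, iterations=50):
    """Maximum-likelihood ability with item difficulties held fixed.

    responses: list of 0/1 scores, one per item.
    Uses Newton-Raphson on the Rasch score equation
    (raw score = sum of success probabilities).
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("no finite estimate for zero or perfect scores")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw - sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < 1e-8:
            break
    return theta


# Hypothetical common items with difficulties fixed from the base calibration.
common_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
time1 = [1, 1, 0, 0, 0]  # same student, first testing period
time2 = [1, 1, 1, 1, 0]  # same student, six months later

theta1 = estimate_ability(time1, common_difficulties)
theta2 = estimate_ability(time2, common_difficulties)
growth = theta2 - theta1  # in logits, on the fixed scale
```

Because the item difficulties do not change between the two estimates, both abilities sit on the same scale and their difference can be read as growth.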
Common item set
Item position
Common Item set Spread
TCC for vertically linked tests: The result of vertical linking is a series of test characteristic curves (TCCs), each corresponding to a different location on the developmental scale. Note that each TCC is most accurate at a different location on the scale. That is, different test forms measure students who are at the same level of ability with different accuracy (test targeting).
Horizontal equating: Base scale established in 2010. New tests developed for every testing period. Fixed common items used to equate the tests.
Horizontal - Common items, yellow test
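A minimal sketch of why targeting matters under the Rasch model: two hypothetical forms centred at different points on the scale give different standard errors for the same student. The item difficulties and function names are illustrative assumptions.

```python
import math


def rasch_probability(ability, difficulty):
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_score(ability, difficulties):
    """Test characteristic curve: expected raw score at a given ability."""
    return sum(rasch_probability(ability, b) for b in difficulties)


def form_information(ability, difficulties):
    """Rasch test information: sum of p * (1 - p) over the items."""
    probs = [rasch_probability(ability, b) for b in difficulties]
    return sum(p * (1.0 - p) for p in probs)


def standard_error(ability, difficulties):
    """Approximate standard error of the ability estimate."""
    return 1.0 / math.sqrt(form_information(ability, difficulties))


# Two hypothetical vertically linked forms on the same logit scale.
easier_form = [-2.0, -1.5, -1.0, -0.5, 0.0]
harder_form = [0.0, 0.5, 1.0, 1.5, 2.0]

ability = 1.0  # a student located at the centre of the harder form
se_easier = standard_error(ability, easier_form)
se_harder = standard_error(ability, harder_form)
```

For this student the harder form yields the smaller standard error, which is the sense in which each form is most accurate at its own location on the scale.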
Test targeting
Test Targeting: ARCOTS provides some feedback on the accuracy of test targeting. In addition, online PD, facilitators and ARCOTS help provide ongoing support for teachers in targeting tests and interpreting student results.
Checking for drift: Items were checked for drift after each testing round. Where possible, drift was categorized as construct relevant or construct irrelevant. Methods: b-plots; displacement using Winsteps and ConQuest.
b-plots: Scatter plots with confidence bands. The judgment about items that have changed in difficulty is made by examining the confidence interval; items that fall outside it are identified as outliers. Line of best fit: the shift in mean (intercept) accounts for differences in the mean of the ability distribution; if the slope differs from one, there is a difference in the variability of the ability distribution. A simple and quick graphical method for evaluating the stability of the common item set. Centred estimates for different testing times: standardized differences are calculated and statistical tests are performed to decide whether the differences are significant.
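The standardized-difference check can be sketched as below. The specific statistic (a z-score on centred difficulty estimates with a 1.96 cut-off) is an assumption chosen for illustration; the slides do not state which test was used, and the item estimates are made up.

```python
import math


def flag_drift(estimates_t1, errors_t1, estimates_t2, errors_t2, critical=1.96):
    """Return indices of common items whose centred difficulty shifted.

    Estimates are centred on their own mean at each testing time, so a
    uniform shift in scale origin is not mistaken for item drift.
    """
    mean1 = sum(estimates_t1) / len(estimates_t1)
    mean2 = sum(estimates_t2) / len(estimates_t2)
    flagged = []
    items = zip(estimates_t1, errors_t1, estimates_t2, errors_t2)
    for index, (b1, se1, b2, se2) in enumerate(items):
        difference = (b2 - mean2) - (b1 - mean1)
        z = difference / math.sqrt(se1 ** 2 + se2 ** 2)
        if abs(z) > critical:
            flagged.append(index)
    return flagged


# Hypothetical difficulty estimates (logits) for five common items.
difficulty_t1 = [-1.0, -0.5, 0.0, 0.5, 1.0]
difficulty_t2 = [-0.8, -0.3, 0.2, 0.7, 2.2]  # last item became much harder
standard_errors = [0.1] * 5

drifting_items = flag_drift(
    difficulty_t1, standard_errors, difficulty_t2, standard_errors
)
```

Items flagged this way correspond to the points falling outside the confidence bands on the b-plot; whether to drop them from the common item set remains a judgment call.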
b-plots, yellow test
Teacher View
Reports – Rocket report (Numeracy)
Reports – class report (Reading comprehension)
© Copyright The University of Melbourne 2011