Assessment Research Centre Online Testing System (ARCOTS)


Assessment Research Centre Online Testing System (ARCOTS) Monjurul Alom, Nafisa Awwal, Patrick Griffin, Daniel Jimenez Barrios, Masa Pavlovic, Pam Robertson, Hillary Slater

Background
- Assessment and Learning Partnership (ALP) started in 2009 (2010)
- Aim: to investigate the effect on student outcomes of teachers using assessment data when planning instruction
- Longitudinal study over 3 years, with two testing periods per year (6 months apart)
- Over 100 000 students; over 250 000 assessments

Available assessments
- Current Victorian DEECD, Independent and CEOM school students, grades 3 to 10
- Over 200 000 students have participated; over 600 000 assessments administered and reported
- Available assessments: Problem solving (3), Interactive problem solving (2), Numeracy (9), Reading Comprehension (10)

Assessment framework
- Development of the assessment tool was based on the integration of three theories (Glaser, Rasch and Vygotsky), as described by Griffin (2007)
- Glaser (1963): criterion-referenced interpretation of the scale; performance described in terms of skills and their difficulties
- Rasch (1960, 1980): link between item difficulty and student ability; same metric and 'interval' properties of the scale
- Vygotsky: ZPD as the point of intervention

Assessment
- Link between scale scores and student knowledge and skills
- Information regarding the intervention point
- Provides information to enable teachers to qualitatively differentiate between ability levels

ARCOTS authentication protocol
- Students and teachers connect to ARCOTS through an authentication protocol backed by a DBMS
- Access to the following: test control, student reports, student records/results, teacher instruments & individual reports
- Access to tests on the domains: Numeracy, Reading Comprehension, Problem Solving

ARCOTS Welcome

Students Page

Example item - Numeracy

Example item – Reading Comprehension

Assessment design
The spread of ability in any given classroom drove the need for an assessment that describes the skills students develop across the whole of their schooling, rather than describing skills grade level by grade level.

Vertical scaling (not equating)
- Provides the link between tests of different difficulty that are administered to students at different ability levels
- Once calibrated, allows comparability of results across different grade levels
- There are a number of methods available; methods used: scaling with the Rasch model, linking with the fixed common item parameters method
- In the fixed common item parameter method, item parameters are treated as the true values of the parameters
- The amount of growth/learning is determined by the difference in student performance on common items between two testing times

Why the Rasch model?
- Unidimensional measurement model
- Total score is a sufficient statistic
- Invariance property: the probability of success is given by the difference between item difficulty and person ability
- Equal discrimination is assumed, so if the data fit, the scale has interval properties
- Depending on how well the data fit the Rasch model, we can construct a sample-independent, interval-level measure
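
As a concrete illustration of the invariance property above, here is a minimal sketch (not part of ARCOTS; the names are illustrative) of the Rasch item response function, in which the probability of success depends only on the difference between person ability and item difficulty in logits:

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model.

    The probability depends only on the difference between person
    ability (theta) and item difficulty (b), both in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A student 1 logit above an item's difficulty answers it correctly ~73% of the time.
print(rasch_probability(theta=0.5, b=-0.5))  # ~0.731
```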

Fixed common items method
- In the fixed common item parameter method, item parameters are treated as the true values of the parameters
- The difference in performance on common items is used to determine the difference between students on two adjacent tests
- The amount of growth/learning is determined by the difference in student performance on the common items between two testing times (see the sketch below)
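
The following is a hedged sketch of the fixed common item parameter idea, using hypothetical anchored difficulties and responses (not ARCOTS data): ability is estimated by maximum likelihood with the common-item difficulties held fixed at their base-calibration values, so growth can be read directly as the change in ability between the two testing times.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, anchored_difficulties, iters=20):
    """Maximum-likelihood ability estimate with item difficulties held fixed.

    Because the common-item difficulties are anchored at their values from the
    base calibration, abilities estimated at both testing times sit on the same
    scale. (Assumes a mixed response pattern, not all-correct or all-wrong.)
    """
    theta = 0.0
    for _ in range(iters):
        probs = [rasch_p(theta, b) for b in anchored_difficulties]
        grad = sum(x - p for x, p in zip(responses, probs))  # score function
        info = sum(p * (1 - p) for p in probs)               # Fisher information
        theta += grad / info                                 # Newton-Raphson step
    return theta

# Hypothetical anchored difficulties (logits) and one student's scored
# responses at two testing times six months apart.
anchors = [-1.2, -0.4, 0.3, 1.1, 1.8]
theta_time1 = estimate_ability([1, 1, 0, 0, 0], anchors)
theta_time2 = estimate_ability([1, 1, 1, 1, 0], anchors)
print(f"growth: {theta_time2 - theta_time1:.2f} logits")
```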

Factors influencing the quality of the common item set
- Common item set length
- Item placement
- Composition/content representativeness
- Statistical equivalence to the total test
- Item stability (statistical stability)

Common item set

Item position

Common Item set Spread

TCC for vertically linked tests
The result of vertical linking is a series of test characteristic curves, each corresponding to a different location on the developmental scale. Note that each TCC is most accurate at a different location on the scale; that is, different test forms measure students at the same level of ability with different accuracy (test targeting).
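
A small sketch, with hypothetical item difficulties rather than ARCOTS values, of how a test characteristic curve is computed under the Rasch model (the expected raw score at a given ability) and why differently targeted forms are informative at different locations on the scale:

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def test_characteristic_curve(theta, item_difficulties):
    """Expected raw score on a form for a student at ability theta
    (the sum of the Rasch item probabilities)."""
    return sum(rasch_p(theta, b) for b in item_difficulties)

# Hypothetical difficulties for two vertically linked forms:
# an easier form targeted at lower grades and a harder form for upper grades.
easy_form = [-2.0, -1.5, -1.0, -0.5, 0.0]
hard_form = [0.5, 1.0, 1.5, 2.0, 2.5]

for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(test_characteristic_curve(theta, easy_form), 2),
          round(test_characteristic_curve(theta, hard_form), 2))
```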

Horizontal equating
- Base scale established in 2010
- New tests developed for every testing period
- Fixed common items used to equate the tests

Horizontal equating: common items (yellow test)


Test targeting

Test targeting
ARCOTS provides some feedback on the accuracy of test targeting. In addition, online PD, facilitators and ARCOTS help provide ongoing support for teachers in targeting tests and interpreting student results.
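
One way to see test targeting quantitatively is through the test information function: under the Rasch model each item contributes most information where its difficulty matches the student's ability, so a form measures most precisely near the abilities it was targeted at. The sketch below uses hypothetical difficulties and is illustrative only, not the ARCOTS feedback itself.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def test_information(theta, item_difficulties):
    """Test information at ability theta under the Rasch model.

    Each item contributes p * (1 - p), which peaks when the item's
    difficulty matches the student's ability; a well-targeted test
    has maximum information (smallest measurement error) near the
    abilities of the students who sit it.
    """
    total = 0.0
    for b in item_difficulties:
        p = rasch_p(theta, b)
        total += p * (1 - p)
    return total

# Standard error of measurement is 1 / sqrt(information).
form = [-0.5, 0.0, 0.5, 1.0]          # hypothetical item difficulties
for theta in (-2.0, 0.25, 2.5):
    info = test_information(theta, form)
    print(f"theta={theta:+.2f}  SEM={1 / math.sqrt(info):.2f}")
```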

Checking for drift
- Items were checked for drift after each testing round
- Where possible, drift was categorised as construct relevant or construct irrelevant
- Methods: b-plots; displacement measures using Winsteps and ConQuest

b-plots
- Scatter plots with confidence bands: a simple and quick graphical method for evaluating the stability of the common item set
- The judgment about items that have changed in difficulty is made by examining the confidence interval; items that fall outside it are identified as outliers
- Line of best fit: the shift in the mean (intercept) accounts for differences in the mean of the ability distribution; if the slope differs from one, there is a difference in the variability of the ability distribution
- Estimates for the different testing times are centred, standardized differences are calculated, and statistical tests are performed to decide whether the differences are significant (see the sketch below)
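
A minimal sketch of the standardized-difference check described above, using hypothetical difficulty estimates and standard errors for the common items (not ARCOTS output): estimates from each calibration are centred to remove the overall mean shift, and items whose standardized difference exceeds a critical z value are flagged as drift candidates.

```python
import math

def flag_drifting_items(b_time1, b_time2, se_time1, se_time2, z_crit=1.96):
    """Flag common items whose difficulty has shifted between testing times.

    Estimates from each calibration are centred first (removing the overall
    mean shift), then the standardized difference for each item is compared
    against a critical z value; items outside the band are drift candidates.
    """
    mean1 = sum(b_time1) / len(b_time1)
    mean2 = sum(b_time2) / len(b_time2)
    flags = []
    for b1, b2, s1, s2 in zip(b_time1, b_time2, se_time1, se_time2):
        diff = (b1 - mean1) - (b2 - mean2)           # centred difference
        z = diff / math.sqrt(s1 ** 2 + s2 ** 2)      # standardized difference
        flags.append(abs(z) > z_crit)
    return flags

# Hypothetical difficulty estimates (logits) and standard errors for five
# common items calibrated at two testing times.
b1 = [-1.0, -0.3, 0.2, 0.9, 1.4]
b2 = [-0.9, -0.2, 0.3, 1.0, 2.3]       # the last item looks harder at time 2
se = [0.12, 0.10, 0.11, 0.13, 0.15]
print(flag_drifting_items(b1, b2, se, se))  # only the last item is flagged
```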

b-plots: yellow test

Teacher View

Reports – Rocket report (Numeracy)

Reports – class report (Reading comprehension)

© Copyright The University of Melbourne 2011