Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass, & Cindy M. Walker

Similar presentations
Item Response Theory in a Multi-level Framework Saralyn Miller Meg Oliphint EDU 7309.

How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-of-fit Statistics in Categorical Data Analysis Alberto Maydeu-Olivares.
Advanced Topics in Standard Setting. Methodology Implementation Validity of standard setting.
IRT Equating Kolen & Brennan, IRT If data used fit the assumptions of the IRT model and good parameter estimates are obtained, we can estimate person.
AN OVERVIEW OF THE FAMILY OF RASCH MODELS Elena Kardanova
Exploring the Full-Information Bifactor Model in Vertical Scaling With Construct Shift Ying Li and Robert W. Lissitz.
Test Construction Processes 1- Determining the function and the form 2- Planning (Content: table of specification) 3- Preparing (Knowledge and experience)
Chapter 4 Validity.
Language Testing Introduction. Aims of the Course The primary purpose of this course is to enable students to become competent in the design, development,
Using Growth Models for Accountability Pete Goldschmidt, Ph.D. Assistant Professor California State University Northridge Senior Researcher National Center.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
Challenge Question: Why is being organized on the first day of school important? Self-Test Questions: 1.How do I get my classroom ready? 2.How do I prepare.
Examining Rounding Rules in Angoff-Type Standard Setting Methods Adam E. Wyse Mark D. Reckase.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Chapter 7 Correlational Research Gay, Mills, and Airasian
Why Scale -- 1 Summarising data - allows description of developing competence. Construct validation - dealing with many items, rotated test forms - check how.
Classroom Assessment A Practical Guide for Educators by Craig A
Mathematics for Elementary School Teaching:What Is It and How Do Teachers Learn It? Raven McCrory, Michigan State University Deborah Ball, University of.
Item Analysis: Classical and Beyond SCROLLA Symposium Measurement Theory and Item Analysis Modified for EPE/EDP 711 by Kelly Bradley on January 8, 2013.
Now that you know what assessment is, you know that it begins with a test. Ch 4.
Identification of Misfit Item Using IRT Models Dr Muhammad Naveed Khalid.
Item Response Theory. What’s wrong with the old approach? Classical test theory –Sample dependent –Parallel test form issue Comparing examinee scores.
DEVELOPING ALGEBRA-READY STUDENTS FOR MIDDLE SCHOOL: EXPLORING THE IMPACT OF EARLY ALGEBRA PRINCIPAL INVESTIGATORS:Maria L. Blanton, University of Massachusetts.
Adventures in Equating Land: Facing the Intra-Individual Consistency Index Monster * *Louis Roussos retains all rights to the title.
Measuring Changes in Teachers’ Mathematics Content Knowledge Dr. Amy Germuth Compass Consulting Group, LLC.
Standardization and Test Development Nisrin Alqatarneh MSc. Occupational therapy.
Introduction: Philosophy: Naturalism; Design: Core design; Assessment Policy: Formative + Summative; Curriculum Studied: Oxford Curriculum, Textbook Board Peshawar,
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
© Copyright 2014 Milady, a part of Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly accessible.
1 EPSY 546: LECTURE 1 INTRODUCTION TO MEASUREMENT THEORY George Karabatsos.
Validity Is the Test Appropriate, Useful, and Meaningful?
Eloise Forster, Ed.D. Foundation for Educational Administration (FEA)
Potential Errors In Epidemiologic Studies Bias Dr. Sherine Shawky III.
1 Multiple Regression A single numerical response variable, Y. Multiple numerical explanatory variables, X1, X2, …, Xk.
University of Georgia – Chemistry Department JExam - A Method to Measure Outcomes Assessment Charles H. Atwood, Kimberly D. Schurmeier, and Carrie G. Shepler.
A COMPARISON METHOD OF EQUATING CLASSIC AND ITEM RESPONSE THEORY (IRT): A CASE OF IRANIAN STUDY IN THE UNIVERSITY ENTRANCE EXAM Ali Moghadamzadeh, Keyvan.
Validity Validity: A generic term used to define the degree to which the test measures what it claims to measure.
Pearson Copyright 2010 Some Perspectives on CAT for K-12 Assessments Denny Way, Ph.D. Presented at the 2010 National Conference on Student Assessment June.
Scaling and Equating Joe Willhoft Assistant Superintendent of Assessment and Student Information Yoonsun Lee Director of Assessment and Psychometrics Office.
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Validity and Item Analysis Chapter 4.  Concerns what instrument measures and how well it does so  Not something instrument “has” or “does not have”
The ABC’s of Pattern Scoring
University of Ostrava, Czech Republic, 26-31 March 2012.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Chapter 7 Measuring of data Reliability of measuring instruments The reliability* of instrument is the consistency with which it measures the target attribute.
FIT ANALYSIS IN RASCH MODEL University of Ostrava, Czech Republic, 26-31 March 2012.
VALIDITY, RELIABILITY & PRACTICALITY Prof. Rosynella Cardozo Prof. Jonathan Magdalena.
2. Main Test Theories: The Classical Test Theory (CTT) Psychometrics. 2011/12. Group A (English)
PARCC Field Test Study Comparability of High School Mathematics End-of- Course Assessments National Conference on Student Assessment San Diego June 2015.
Item Response Theory Dan Mungas, Ph.D. Department of Neurology University of California, Davis.
Internal Evaluation of MMP Cindy M. Walker Jacqueline Gosz Razia Azen University of Wisconsin Milwaukee.
Approaches to quantitative data analysis Lara Traeger, PhD Methods in Supportive Oncology Research.
Lesson 2 Main Test Theories: The Classical Test Theory (CTT)
IRT Equating Kolen & Brennan, 2004 & 2014 EPSY
Chapter 1 Assessment in Elementary and Secondary Classrooms
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
By: Beni Setiawan, Wahyu Budi Sabtiawan
Concept of Test Validity
Using Item Response Theory to Track Longitudinal Course Changes
Data Analysis and Standard Setting
Classical Test Theory Margaret Wu.
Reliability & Validity
Booklet Design and Equating
Workshop questionnaire.
Chapter Six Training Evaluation.
Partial Credit Scoring for Technology Enhanced Items
Criterion-referenced tests: construction and analysis (a comparative study)
By ____________________
Mohamed Dirir, Norma Sinclair, and Erin Strauts
Eloise Forster, Ed.D. Foundation for Educational Administration (FEA)
Presentation transcript:

Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments
DeAnn Huinker, Daniel A. Sass, & Cindy M. Walker
University of Wisconsin-Milwaukee

Introduction
- We have been using the Learning Mathematics for Teaching (LMT) assessments to evaluate the impact of the MMP on teacher content knowledge.
- We appreciate the assessments' strong theoretical foundation; however, several pragmatic challenges exist.
- The purpose of this presentation is to share our experiences, challenges, and concerns.

Item Response Theory (IRT) 101
- A mathematical function relates item parameters (i.e., difficulty and discrimination) to examinee characteristics (see the sketch below).
- Interpretation of IRT ability and item parameter estimates.
- IRT parameter estimates are invariant only up to a linear transformation (i.e., indeterminacy of scale).
- There are several competing models to choose from.
- How does IRT differ from classical test theory (CTT)?
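To make the model concrete, here is a minimal sketch of the 2-PL item response function in Python; the parameter values are illustrative, not taken from any LMT calibration.

```python
import numpy as np

def irf_2pl(theta, a, b):
    """2-PL item response function: probability of a correct response
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Illustrative values only: an average-ability examinee (theta = 0)
# on a moderately hard (b = 0.5), discriminating (a = 1.2) item.
print(irf_2pl(theta=0.0, a=1.2, b=0.5))  # ~0.35
```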

Issue 1: Lack of Item Equating
- Multiple sets of item parameters can arise when the same items are scaled from (1) different test compositions, (2) different groups of examinees, or (3) both.
- Which set of item parameters should be used? (A linking sketch follows below.)
- Will repeated measures be used?
- Is there a need to generalize to the population?
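One common way to resolve the scale indeterminacy across separate calibrations is a linear linking of parameters through anchor items. Below is a minimal mean-sigma sketch; the anchor difficulties are made up for illustration.

```python
import numpy as np

def mean_sigma(b_new, b_base):
    """Mean-sigma linking constants (A, B) that place a new calibration's
    parameter scale onto a base calibration's scale, estimated from the
    difficulties of anchor items common to both calibrations."""
    A = np.std(b_base, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_base) - A * np.mean(b_new)
    return A, B

# Made-up anchor-item difficulties from two separate calibrations.
b_new = np.array([-1.1, -0.2, 0.4, 1.3])
b_base = np.array([-0.8, 0.1, 0.7, 1.6])
A, B = mean_sigma(b_new, b_base)
b_linked = A * b_new + B  # difficulties rescaled to the base scale
# Discriminations rescale as a_linked = a_new / A.
```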

Issue 2: Scale Development
- In using the LMT measures, projects must decide whether to use the established LMT scales or to construct their own assessments by choosing problems from the item pool.
- Which method is best, and when?
  - Content validity
  - Need to generalize ability estimates
  - Test length
  - Matching the ability distribution to maximize test information
- Equating concern: Should the pre- and post-test measures be equated?
- IRT vs. CTT

Issue 2: Scale Development
- How do researchers decide which items to use in constructing assessments?
- We have found that the LMT item pool often contains too few items to achieve the preferred match to project goals and state standards for student learning.
- Item characteristics need to be matched to the expected ability distribution (see the sketch below).
- In some content areas there are too few items, and/or the item characteristics are not ideal.
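Matching item characteristics to the expected ability distribution usually means selecting items that are most informative near the abilities of interest. A small sketch under the 2-PL model follows; the (a, b) pool is hypothetical.

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a 2-PL item at ability theta:
    I(theta) = a**2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

# Hypothetical (a, b) pool; rank items by information at the ability
# level where precision matters most (here theta = 0), then keep the
# most informative items for the custom form.
pool = [(1.4, -0.3), (0.6, 1.8), (1.1, 0.1), (0.8, -1.5)]
ranked = sorted(pool, key=lambda ab: item_info_2pl(0.0, *ab), reverse=True)
print(ranked)  # most informative items near theta = 0 come first
```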

Issue 3: Model Selection
- Which IRT model should be selected, and how does it influence score interpretation?
- One issue when modeling dichotomous data using IRT is selecting the most appropriate, or best-fitting, model (i.e., 1-, 2-, or 3-PL).
- Why not use polytomous models?
- To date, items have been scored either with CTT (i.e., summing the number of correct items) or with the 2-PL model (contrasted in the sketch below).
  - Comparability of the models
  - Role of the item discrimination parameter
  - Score interpretation under CTT vs. the 2-PL model
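The practical difference between the two scoring approaches is that CTT weights all items equally, while the 2-PL weights items by discrimination, so examinees with the same number-correct score can receive different ability estimates. A grid-search sketch with hypothetical item parameters:

```python
import numpy as np

def theta_ml_2pl(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Maximum-likelihood ability estimate under the 2-PL model via a
    simple grid search; responses is a 0/1 vector aligned with a and b."""
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))  # grid x items
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

# Hypothetical parameters: two highly discriminating items, two weak ones.
a = np.array([1.8, 1.6, 0.5, 0.4])
b = np.zeros(4)
# Both examinees answer 2 of 4 correctly (identical CTT sum scores) ...
print(theta_ml_2pl(np.array([1, 1, 0, 0]), a, b))  # positive theta
print(theta_ml_2pl(np.array([0, 0, 1, 1]), a, b))  # negative theta
# ... but the 2-PL credits the more discriminating items differently.
```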

Table 1
Data taken from the Mathematical Explorations for Elementary Teachers course

            N    Mean    SD    t    DF    Sig.
Pre
Post
Post-Pre
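A pre/post comparison of this form is typically reported from a paired-samples t-test; a minimal sketch in Python, with placeholder scores rather than the actual Table 1 data:

```python
import numpy as np
from scipy import stats

# Placeholder paired pre/post scores for illustration only; these are
# not the course data reported in Table 1.
pre = np.array([0.42, 0.55, 0.31, 0.60, 0.48])
post = np.array([0.51, 0.62, 0.45, 0.66, 0.58])

t, p = stats.ttest_rel(post, pre)  # paired-samples t-test, df = n - 1
print(t, p)
```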

Conclusions
- There are two primary issues related to analyzing data from the Michigan measures that need to be addressed and improved upon:
  1) Item equating, to ensure that scores are on the same measurement scale. Equating preserves the benefit of the invariance property (i.e., independence from test length and item selection).
  2) Which IRT model is most appropriate for the data, and the degree to which fitting different models affects score interpretation.

Questions and Concerns
- How have you addressed some of these issues?
- What issues have you encountered when using this measure?
- Related measurement questions?