Exploring Value-Added Across Multiple Dimensions: A Bifactor Approach
Derek Briggs and Ben Domingue, University of Colorado
Maryland Assessment Conference, October 18



Outline
Motivation:
– Value-Added Across Different Outcomes
– Coming to Decisions in a High-Stakes Accountability Context
Longitudinal Item Response Data
A Bifactor Analysis
Value Added to What?

A Brief Aside on Vertical Scales
The original title of this talk was “Multidimensionality, Vertical Scales and Value-Added Models.” There is a simple bottom line on this: vertical scales are not needed for value-added modeling. It’s a non-issue. Even for models that focus on repeated measures and growth trajectories, the approach taken to create a vertical scale will rarely have an impact on teacher or school rankings.
– For details, see the working paper by Briggs & Domingue, “The Gains from Vertical Scaling.”
Vertical scales can play an important role in supporting inferences about student growth in absolute magnitudes.
– For a critique of current practice, see Briggs, “Measuring Growth with Vertical Scales” (in press, JEM).

Motivation

“Don’t measure yourself by what you have accomplished, but by what you should have accomplished with your ability.” — John Wooden, Basketball Coach

Our Theories and Intuition Tell Us Academic Success is Multidimensional
According to the Common Core State Standards, students who are college and career ready in reading, writing, speaking, and listening:
– Demonstrate independence, build strong content knowledge, comprehend as well as critique, value evidence, use technology and digital media, understand other perspectives and cultures.
And in mathematics:
– Make sense of problems and persevere in solving them, reason abstractly, construct viable arguments and critique the reasoning of others, model with mathematics, attend to precision, etc.

Previous Empirical Evidence
The variability in VA by outcome measure is greater than the variability by model specification.
– Lockwood et al., JEM (2007)
– MET Study, “Learning about Teaching” (2010)
– Papay, AERJ (2011)
These studies focused on correlations between VA based on different tests within the same content domain (math, reading).

Math vs. Reading

Data Source | Unit of Analysis | Sample Size | Model | r(Math, Reading)
Hawaii | Schools | 272 | Colorado Growth Model (MGPs) | 0.74
Wyoming | Schools | 214 | Colorado Growth Model (MGPs) | 0.53
Denver PSD | Teachers | 180 | Colorado Growth Model (MGPs) | 0.58
LAUSD | Teachers | 10,794 | Fixed effects regression, student demographics, no classroom or school covariates (“LAVAM”) | 0.60
LAUSD | Teachers | 3,306 | Fixed effects regression, with classroom and school covariates (“altVAM”) |

Making High-Stakes Decisions about Teachers/Schools
Categorical Outcomes (K):
4 = Highly Effective
3 = Effective
2 = Partially Effective
1 = Ineffective
Evidence of Value-Added in Student Outcomes
Direct Observations of Practice, Other Sources of Evidence

Combining Information about Value-Added: Two Approaches
Compensatory
– Take a simple or weighted average of the value-added indicator across test outcomes.
– Classify teachers/schools on the basis of quantiles of the distribution or confidence intervals.
Conjunctive
– Classify teachers/schools into i categories on the basis of j outcomes.
– Make rules that simplify the resulting i^j decision matrix to k categories.
– Ensure that no teacher/school is ineffective on a given outcome.
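A minimal sketch of the two combination rules described above. The cut scores and combination rules here are hypothetical illustrations, not the rules used in any actual evaluation system:

```python
def compensatory(va_math, va_read, cuts=(-0.1, 0.0, 0.1)):
    """Average the VA indicators across subjects, then classify 1-4
    by (hypothetical) fixed cut scores on the averaged value."""
    avg = (va_math + va_read) / 2
    return 1 + sum(avg > c for c in cuts)

def conjunctive(cat_math, cat_read):
    """Combine per-subject categories (1-4) with a rule that no unit
    rated ineffective (category 1) on either outcome can be rated
    higher overall."""
    if min(cat_math, cat_read) == 1:
        return 1
    return round((cat_math + cat_read) / 2)

# A school strong in math but weak in reading can land in different
# categories under the two approaches.
print(compensatory(0.15, -0.05))  # averaging masks the weak subject
print(conjunctive(4, 1))          # the conjunctive floor binds
```

The compensatory rule lets a strong subject offset a weak one; the conjunctive rule deliberately prevents that offset.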

What Are Tests Designed to “Measure”?

“Tests Measure Student Achievement”
An achievement score is a function of the content sampled from an instructional domain.
Teachers/schools may vary in their ability to teach different subject matter.
Agnostic about the underlying latent variable.
Observed achievement is an estimate of a true score or universe score (G Theory).
– Each achievement domain has a different hypothetical universe score.
Consistent with compensatory and conjunctive approaches?

“Tests Measure Student Ability (θ)”
This is a latent variable perspective. But math and reading “abilities” are poorly defined latent variables. What is distinct and what is the same about these variables?
What if reading and math items are really just measuring the same unidimensional latent variable? Spearman’s g? Should this be the focus of value-added inferences?

A Novel Application of a Bifactor IRT Model
[Path diagram: a common factor “g” alongside subject-specific Math Knowledge & Skills and Reading Knowledge & Skills factors; items 1–45 from a math test and items 1–54 from a reading test each load on both g and their subject-specific factor.]

Research Questions
1. Is “achievement” distinct from “ability”? If we remove the influence that is common to both math and reading test performance, what is left? Are the subject-specific variables substantively interpretable across grades? How do the three “theta” variables from the bifactor model compare to the “theta” variables from successive unidimensional IRT models?
2. What insights does a bifactor model give us about different approaches to combining estimates of value-added across test outcomes?

Exploratory Strategy
Leverage longitudinal item response data to estimate six “theta” variables:
UNIDIMENSIONAL
1. Math (2PL IRT)
2. Reading (2PL IRT)
3. Math + Reading (unidimensional 2PL IRT)
MULTIDIMENSIONAL
1. Bifactor math (bifactor 2PL)
2. Bifactor reading (bifactor 2PL)
3. Bifactor g (bifactor 2PL)
Examine the characteristics of each as a “measure.”
Compare the use of these different variables as the outcome in a (simple) value-added model.

Data & Methods

Bifactor Model
Notation: i = items, j = item-specific factors, g = general factor.
Technical Details
Software: IRTPro 2.1 (Cai, Thissen, du Toit)
Estimation method: Bock-Aitkin, 49 quadrature points
References: Cai, 2010; Cai, Yang & Hansen, 2011; Rijmen, 2009; Rijmen et al., 2008.
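As a rough illustration of the model, a sketch of the standard bifactor 2PL item response function, in which each item loads on the general factor g plus one subject-specific factor. The parameter values below are invented for illustration, not estimates from IRTPro:

```python
import math

def bifactor_2pl(theta_g, theta_j, a_g, a_j, c):
    """P(correct) under a bifactor 2PL: logistic of a weighted sum of the
    general factor, one item-specific factor, and an item intercept."""
    z = a_g * theta_g + a_j * theta_j + c
    return 1 / (1 + math.exp(-z))

# A math item loading strongly on g and weakly on the math-specific
# factor, consistent with the loading pattern reported later in the talk.
p = bifactor_2pl(theta_g=1.0, theta_j=0.2, a_g=1.4, a_j=0.3, c=-0.5)
print(round(p, 3))
```

Under this structure the g loading, not the subject-specific loading, drives most of the response probability for such an item.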

CSAP Tests in Math and Reading
Math, Grades 3–10 — Content Standards:
1. Number & Operation Sense
2. Algebra, Patterns and Functions
3. Statistics and Probability
4. Geometry
5. Measurement
6. Computational Techniques
– All 6 standards emphasize application of content for problem solving and communication.
– Mix of MC and CR items
Reading, Grades 3–10 — Content Standards:
1. Reading Comprehension
2. Thinking Skills
3. Use of Literary Information
4. Literature
– Subcontent: Fiction, Nonfiction, Vocabulary, Poetry
– Mix of MC and CR items

Longitudinal Item Response Structure: Students Nested in Schools
Source: Denver Public School District

Student & School Characteristics
Across grades 5–9:
About 62% of DPS students are eligible for free or reduced-price lunch services (FRL).
About 10% receive special education services (SPED).
Between 10–20% are English Language Learners (ELL).
Across DPS schools:
Variable | Mean | SD
FRL | 65% | 28%
SPED | 11% | 8%
ELL | 14% | 13%

Students per School
Min: 62 | 1st Qu.: 128 | Median: 210 | 3rd Qu.: 315.2

Value-Added Model
Fixed effects regression
– Pools Grade 6 estimates (middle school) and Grade 9 estimates (high school)
Outcome: one of the six “theta” variables created.
Covariates:
– Prior-grade achievement on the same outcome
– Free/reduced-price lunch status
– English Language Learner status
– Special education status
– Grade 9 dummy variable (Grade 6 omitted)
Empirical Bayes shrinkage estimators
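The empirical Bayes shrinkage step can be sketched with the usual normal-normal formula, which pulls noisy school effects toward the grand mean in proportion to their unreliability. The variance components below are invented for illustration:

```python
import numpy as np

def eb_shrink(raw_effects, se, tau2):
    """Shrink each raw (grand-mean-centered) school effect toward zero by
    the reliability ratio tau^2 / (tau^2 + se^2), where tau^2 is the
    between-school variance and se is each effect's standard error."""
    raw = np.asarray(raw_effects, float)
    se = np.asarray(se, float)
    weight = tau2 / (tau2 + se ** 2)
    return weight * raw

# Two schools with the same raw effect: the one estimated with more
# noise (larger se) is shrunk more aggressively toward the mean.
shrunk = eb_shrink([0.30, 0.30], se=[0.05, 0.25], tau2=0.04)
print(shrunk.round(3))
```

This is why the “shrunken VA estimates” noted on later slides have smaller spread than the raw fixed-effect estimates would.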

Caveats
This is a very simple VAM. Limited set of covariates, no school-level variables.
No teacher linkages, only schools.
Only a single longitudinal cohort of students.
No adjustment for measurement error.
– (Though we did examine possible adjustments. Results not shown here.)

Results

Correlational Patterns Across Grades for Unidimensional Math and Reading
[Correlation matrix: math in the lower triangle, reading in the upper triangle; the main diagonal gives within-grade math/reading correlations of .76, .78, .78, .76, and .74.]
Note how strong these correlations are even after 4 years.

Bifactor Loadings (Grade )
[Loading plot; horizontal blue line at a loading of .3]

Bifactor Loadings (Grade )

Bifactor Loadings (Grade )
Something is clearly amiss with the grade 7 data from 2005, so we omit this grade in the analyses that follow.

Marginal Reliabilities
The bifactor math and reading estimates are rather noisy: low reliability at the student level.
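For context, marginal (empirical) reliability of IRT scores is commonly approximated as the ratio of score variance to score variance plus mean squared standard error. A sketch with made-up scores and SEs (not the study's data):

```python
import numpy as np

def marginal_reliability(theta_hat, se):
    """Empirical reliability of a set of theta estimates:
    var(theta_hat) / (var(theta_hat) + mean(se^2))."""
    theta_hat = np.asarray(theta_hat, float)
    se = np.asarray(se, float)
    return theta_hat.var() / (theta_hat.var() + np.mean(se ** 2))

rng = np.random.default_rng(1)
scores = rng.normal(0, 1, 5000)

# The same score distribution looks far less reliable when each
# student-level estimate carries a large standard error.
print(round(marginal_reliability(scores, np.full(5000, 0.3)), 2))
print(round(marginal_reliability(scores, np.full(5000, 1.5)), 2))
```

By this logic, large per-student SEs on the bifactor subject-specific thetas translate directly into the low marginal reliabilities reported here.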

Correlational Patterns Across Grades for Bifactor Math & Reading
[Correlation matrix: math in the lower triangle, reading in the upper triangle; the main diagonal gives within-grade math/reading correlations of -.23, -.13, -.20, and -.18.]

Regression Results with Unidimensional Outcomes

Unidimensional Approach | Math | Reading | Combined
Prior Grade Theta | 0.79* | 0.81* | 0.86*
Free/Reduced-Price Lunch Eligible | | | -0.03
English Language Learner | | |
Student has an IEP | -0.11* | -0.15* | -0.10*
R² for model w/ school fixed effects | | |
R² for model w/ NO school fixed effects | | |
Increase in R² due to schools | | |
Note: Each outcome is standardized, so coefficients can be interpreted in an effect-size metric. * p < .05

Regression Results with Bifactor Outcomes

Multidimensional Approach | Math | Reading | g
Prior Grade Theta | 0.33* | 0.48* | 0.84*
Free/Reduced-Price Lunch Eligible | | | 0.00
English Language Learner | | | 0.01
Student has an IEP | | | -0.11*
R² for model w/ school fixed effects | | |
R² for model w/ NO school fixed effects | | |
Increase in R² due to schools | | |
Note: Each outcome is standardized, so coefficients can be interpreted in an effect-size metric. * p < .05

School “Effects” Distributions from Unidimensional vs. Bifactor Outcomes
Unidimensional: SD Math = 0.22, SD Read = 0.13, SD Comp = 0.18
Bifactor: SD Math = 0.11, SD Read = 0.11, SD g = 0.21
Note: These are shrunken VA estimates.

Unidimensional vs. Bifactor Math
SD Uni Math = 0.22, SD BF Math = 0.11
Note: These are shrunken VA estimates.

Unidimensional vs. Bifactor Reading
SD Uni Read = 0.13, SD BF Read = 0.11
Note: These are shrunken VA estimates.

VA Comparisons: Uni Math, Uni Reading vs. g
Value-added for math seems mostly redundant with value-added for g (r = .98), but looking at reading separately yields some unique information (r = .82).

VA for g is equivalent to VA from combined math and reading

Math vs. Reading: With and Without g

Math vs. Reading VA within Method
[Scatterplots: bifactor VA by subject; unidimensional VA by subject]

Relationship of VA with School-Level Status Variables
If low correlations with these variables were considered an indication of a VA indicator that successfully leveled the playing field, the school effects associated with bifactor math outcomes would “win.”

Discussion

Summary
When math and reading outcomes are quantitatively combined (VAcomp, or taking the average of VA across subjects), this is essentially equivalent to estimating VA for “g”.
Math and reading items tend to load strongly on g.
– Math items load weakly on the math-specific bifactor.
– Reading items have moderate loadings on the reading-specific bifactor.
Evidence that the math and reading bifactors are not just noise: school fixed effects explain more variability in the math/reading factors than in traditional unidimensional measures.
There is unique information about reading that would be missed if math and reading were combined.

Limitations & Next Steps
Limitations: No links to teachers. No access to the actual test forms (and items) that were administered.
Next steps: Examine loadings by content and process standards in test blueprints. Do results generalize to
– schools & districts throughout the state?
– multiple cohorts of students?
– other tests?
– more complex VAMs (control for unit-level aggregates)?

Tough Conceptual Questions
What is g?
– Is it sensitive to instruction?
– Is it what we want to hold teachers and schools accountable for increasing?
If a test measures something beyond g, what is that something? Can it be distinguished?
Value-added to what?

Claims from the Smarter Balanced Large-Scale Assessment Consortium
In the domain of mathematics:
Claim 1: Students can explain and apply mathematical concepts and interpret and carry out mathematical procedures with precision and fluency.
Claim 2: Students can solve a range of complex well-posed problems in pure and applied mathematics, making productive use of knowledge and problem-solving strategies.
Claim 3: Students can clearly and precisely construct viable arguments to support their own reasoning and to critique the reasoning of others.
Claim 4: Students can analyze complex, real-world scenarios and can construct and use mathematical models to interpret and solve problems.
Should each of these claims be measured with a unique score? Should we expect variability in teacher efficacy on each? Or are all of these claims wrapped up in g?