1 Causal Rasch Models and Individual Growth Trajectories National Center for the Improvement of Educational Assessment January 18, 2011 A.Jackson Stenner.

Presentation transcript:

1 Causal Rasch Models and Individual Growth Trajectories National Center for the Improvement of Educational Assessment January 18, 2011 A.Jackson Stenner Chairman & CEO, MetaMetrics

2 “Although adopting a probabilistic model for describing responses to an intelligence test, we have taken no sides in a possible argument about responses being ultimately explainable in causal terms.” (Rasch, 1960, p.90)

3 Three well-researched constructs
• Reader ability
• Text Complexity
• Comprehension

4 Reader Ability Temperature

5 Reading is a process in which information from the text and the knowledge possessed by the reader act together to produce meaning.
Anderson, R.C., Hiebert, E.H., Scott, J.A., & Wilkinson, I.A.G. (1985). Becoming a nation of readers: The report of the Commission on Reading. Urbana, IL: University of Illinois.

6 An Equation
Conceptual: Comprehension = Reader Ability − Text Complexity
Statistical: $\text{Raw Score} = \sum_i \dfrac{e^{(RA - TC_i)}}{1 + e^{(RA - TC_i)}}$
where RA = Reading Ability and TC_i = Text Calibration of item i
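The statistical form of the equation can be sketched in a few lines of Python. This is only an illustration: the function name and the example calibrations are hypothetical, and both RA and TC are assumed to be expressed in logits, since the slide does not state the scale.

```python
import math

def expected_raw_score(reader_ability, text_calibrations):
    """Sum, over items, of the Rasch success probability
    e^(RA - TC_i) / (1 + e^(RA - TC_i)), with RA and TC_i in logits (assumed)."""
    return sum(
        1.0 / (1.0 + math.exp(-(reader_ability - tc)))
        for tc in text_calibrations
    )

# Hypothetical three-item test with calibrations placed symmetrically around the reader.
calibrations = [-1.0, 0.0, 1.0]
score = expected_raw_score(0.0, calibrations)
print(round(score, 3))
```

When reader ability equals a text calibration, that item contributes 0.5 to the expected raw score; easier items contribute more and harder items less.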

7 Each of these thermometers is engineered to use the same correspondence table.
Each of these reading tests is engineered to use the same correspondence table.

8 Correspondence Table: °C and Lexile (repeated column groups: Raw Score, °C, Lexile; cell values not preserved in the transcript)

9 Anatomy of Two Measurement Procedures
Aspect/Construct | Temperature | Reader Ability
Object of measurement | Person | Person
Instrument | Thermometer | Reading test
Measurement outcome | Number of theory-calibrated cavities (0-45) that fail to reflect green light | Count correct on a collection of 45 theory-calibrated test items
Substantive theory | Thermodynamic theory | Lexile Theory
Unit of measurement | Degree Fahrenheit (°F) | Lexile (L)
Correspondence table/calibration equation | Exploits a chemical reaction and light absorption to table temperature as a function (Guttman Model) of a sufficient statistic | Exploits semantic and syntactic features of test items to table reader ability as a function (Rasch model) of a sufficient statistic
Measure/Quantity | Measurement outcome converted into a quantity via the substantive theory | Measurement outcome converted into a quantity via the substantive theory
Readable technology | NexTemp Thermometer™ | Oasis™
General objectivity | Point estimates of temperature are independent of the thermometer | Point estimates of reader ability are independent of the reading test

10 Ten Features of Causal Response Models – whether Guttman or Rasch
1. Both measurement procedures depend on within-person causal interpretations of how these two instruments work. NexTemp uses a causal Guttman Model; the Lexile Framework for Reading uses a causal Rasch Model.
2. In both cases the measurement mechanism is well specified and can be manipulated to produce predictable changes in measurement outcomes (e.g., percent correct or percent of cavities turning black).
3. Item parameters are supplied by substantive theory and, thus, person parameter estimates are generated without reference to or use of any data on other persons or populations. Therefore, effects of the examinee population have been completely eliminated from consideration in the estimation of person parameters for reader ability and temperature.

11 Ten Features of Causal Response Models – whether Guttman or Rasch, cont’d.
4. In both cases the quantitivity hypothesis can be experimentally tested by evaluating the trade-off property. A change in the person parameter can be offset, or traded off, by a compensating change in the measurement mechanism so that the measurement outcome is held constant.
5. When uncertainty in item difficulties is too large to ignore, individual item difficulties may be a poor choice to use as calibration parameters in causal models. As an alternative we recommend, when feasible, averaging over individual item difficulties to produce “ensemble” means. These means can be excellent dependent variables for testing causal theories.
6. Index models are not causal because manipulation of neither the indicators nor the person parameter produces a predictable change in the measurement outcome.
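The trade-off property in feature 4 can be illustrated numerically. The sketch below assumes a Rasch response function with reader ability and item calibrations in logits; the function name and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def expected_raw_score(reader_ability, text_calibrations):
    """Expected count correct under a Rasch model (all quantities in logits, assumed)."""
    return sum(
        1.0 / (1.0 + math.exp(tc - reader_ability))
        for tc in text_calibrations
    )

calibrations = [-0.5, 0.2, 1.1, 1.8]   # hypothetical item calibrations
ability = 0.7                          # hypothetical reader ability
shift = 0.9                            # size of the manipulation

baseline = expected_raw_score(ability, calibrations)

# Manipulate the instrument only: every item becomes harder, the outcome drops.
harder_items = [tc + shift for tc in calibrations]
harder_only = expected_raw_score(ability, harder_items)

# Trade-off: a matching increase in ability restores the original outcome.
traded_off = expected_raw_score(ability + shift, harder_items)

print(round(baseline, 4), round(harder_only, 4), round(traded_off, 4))
```

Because the model depends only on the difference between ability and calibration, equal shifts on both sides cancel. This is the compensation that an experimental test of the trade-off property would look for.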

12 Ten Features of Causal Response Models – whether Guttman or Rasch, cont’d.
7. Causal Rasch models are individual-centered and are explanatory at both within-subject and between-subject levels. The attribute on which I differ from myself a decade ago is the same attribute on which I differ from my brother today.
8. When data fit a Rasch model, differences between person measures are objective. When data fit a causal Rasch model, absolute person measures are objective (i.e., independent of instrument).
9. The case against an individual causal account, although popular, has been poorly made. Investigators need only experiment to isolate the causal mechanism in their instruments, test for the trade-off property and confirm invariance over individuals. This has been accomplished for a construct, reader ability, that has been described by scholars as the most complex cognitive activity that humans regularly engage in. Given the success with reading, we think it likely that other behavioral constructs can be similarly measured.
10. Causal Rasch models make possible the construction of generally objective growth trajectories. Each trajectory can be completely separated from the instruments used in its construction and from the performance of any other persons whatsoever.

13 To causally explain a phenomenon [a measurement outcome] is to provide information about the factors [person processes and instrument mechanisms] on which it depends and to exhibit how it depends on those factors. This is exactly what the provision of counterfactual information…accomplishes: we see what factors some explanandum M [measurement outcome, raw score] depends on (and how it depends on those factors) when we have identified one or more variables such that changes in these (when produced by interventions) are associated with changes in M (Woodward, 2003, p.204).

14 How Many Ways Can We Say X Causes Y?
X “elicited a greater” Y
X “impacts” Y
X “accounts for” Y
X “has been linked to” Y
Y “is the result of” X
X “didn’t diminish” Y
Y “because of” X
Y “depends on” X
X “has led to” Y
X “largely motivates” Y
Y “stemmed from” X
X “proved critical to” Y
X “fosters” Y
X “changes” Y
X “triggers” Y
X “affects” Y

15 Psychometrics vs. Metrology
Aspect | Psychometrics | Metrology
Interpretation of Probability | Group-centered: interpretation involves 100 people with the same ability answering a single item | Individual-centered: interpretation involves administering 100 items with the same calibration to a single person
Person Measures | A person’s response record is embedded in different samples, and each group-specific Rasch analysis produces a different measure | A person’s response record is evaluated against theory-referenced calibrations
Measurement Error | Traditional test theory uses a sample standard deviation and a sample correlation to compute an SEM, which is intended to characterize the individual | ISEM is the within-person standard deviation over replications of the measurement procedure
Data Fit to the Model | Varies with the locally constructed frame of reference; sample dependent | Fit is to a theory; thus, sample independent
Validity | Correlational; thus, sample dependent | Causal within person; thus, sample independent
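The two notions of measurement error contrasted on this slide can be put side by side in a short sketch. The slide does not give formulas: the group-based computation below uses the classical test theory expression SEM = SD × sqrt(1 − reliability), and the individual computation is simply the standard deviation of one person's replicated measures. Function names and all numbers are hypothetical.

```python
import math
import statistics

def group_sem(sample_sd, reliability):
    """Classical SEM from a sample SD and a sample reliability (a correlation)."""
    return sample_sd * math.sqrt(1.0 - reliability)

def individual_sem(replicated_measures):
    """Within-person SD over replications of the measurement procedure."""
    return statistics.stdev(replicated_measures)

# Hypothetical group statistics: SD of 200L and a reliability of 0.91.
sem = group_sem(200.0, 0.91)

# Hypothetical replications of the same procedure for one reader (in Lexiles).
isem = individual_sem([1000.0, 1040.0, 960.0, 1020.0, 980.0])

print(round(sem, 1), round(isem, 1))
```

The first number changes whenever the reference sample changes, while the second depends only on the one person's own replications, which is the contrast the slide draws.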

18 Figure 1: Plot of Theoretical Text Complexity versus Empirical Text Complexity for 475 articles “Pizza Problems”
Statistics shown on the plot: r, r″ and R²″ (values not preserved in the transcript); RMSE″ = 99.8L

19 What could account for the 8% unexplained variance?
• Missing Variables
• Improved Proxies/Operationalizations
• Expanded Error Model
• Rounding Error
• Interaction between Individual and Text
• Psychometric Uncertainty Principle

21 May 2016 (12th Grade); Text Demands for College and Career; May 2007 – Dec; Encounters; 117,484 Words; 2,894 Items; 848 Minutes; Student: th Grade, Male, Hispanic, Paid Lunch

22 Item-Based vs. Ensemble-Based Psychometrics

23 Reading Task-Complexity Plane for Dichotomous Items (figure labels: Native Lexile; Added Hardness; Added Easiness; Production Cloze; Auto-Generated Cloze; Unit Size Adjustment Applied to Logits)

24 Comparing Item-Based vs. Ensemble-Based Psychometrics
• Item-Based
  – Item statistics
  – Item characteristic curves
  – DIF for items
• Ensemble-Based
  – Ensemble statistics
  – Ensemble characteristic curves
  – DIF for ensembles

25 The Ensemble
• Objective: Correspondence Table
  – Raw score to Lexile measure
• What we think we know
  – Mean and spread of item distributions for a passage
• What is assumed to be unknown
  – Individual item difficulties
1300L (132L)

26 The Process – Iteration 1
STEP 1: Sample 45 Item Difficulties from Ensemble
STEP 2: Compute Lexile Measures for Each Raw Score (1 to 44)
STEP 3: Table Results
Sample 1 (Lexile Measure by Raw Score): 362L, 514L, 584L, …

27 The Process – Iteration 2
STEP 1: Sample 45 Item Difficulties from Ensemble
STEP 2: Compute Lexile Measures for Each Raw Score (1 to 44)
STEP 3: Table Results
Sample 1 (Lexile Measure by Raw Score): 362L, 514L, 584L, …
Sample 2 (Lexile Measure by Raw Score): 354L, 506L, 575L, …

28 The Process – Iteration 1,000
STEP 1: Sample 45 Item Difficulties from Ensemble
STEP 2: Compute Lexile Measures for Each Raw Score (1 to 44)
STEP 3: Table Results
Sample 1 (Lexile Measure by Raw Score): 362L, 514L, 584L, …
…
Sample 1,000 (Lexile Measure by Raw Score): 354L, 506L, 575L, …
Mean of 1,000 (Mean Lexile Measure by Raw Score): 378L, 509L, 589L, …
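The three steps, repeated and averaged, can be sketched as a small simulation. Several things here are assumptions rather than content from the slides: the ensemble is taken to be normal with mean 1300L and standard deviation 132L (one reading of "1300L (132L)"), the scale is taken to be 180 Lexiles per logit, and the measure for each raw score is found by solving "expected raw score = observed raw score" with bisection. All function and parameter names are hypothetical, and the output is not expected to reproduce the values tabled on the slides.

```python
import math
import random

LEXILES_PER_LOGIT = 180.0  # assumed scale factor, not stated on the slides

def expected_raw_score(ability, difficulties):
    """Expected count correct under a Rasch model, with inputs in Lexiles."""
    return sum(
        1.0 / (1.0 + math.exp((d - ability) / LEXILES_PER_LOGIT))
        for d in difficulties
    )

def measure_for_raw_score(raw_score, difficulties, lo=-3000.0, hi=5000.0):
    """STEP 2: bisection for the ability whose expected raw score equals raw_score."""
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if expected_raw_score(mid, difficulties) < raw_score:
            lo = mid   # ability too low to produce this raw score
        else:
            hi = mid
    return (lo + hi) / 2.0

def correspondence_table(mean=1300.0, sd=132.0, n_items=45,
                         n_iterations=1000, seed=0):
    """Average the raw-score-to-measure table over many sampled item sets."""
    rng = random.Random(seed)
    totals = [0.0] * (n_items - 1)          # raw scores 1 .. n_items - 1
    for _ in range(n_iterations):
        # STEP 1: sample item difficulties from the ensemble (normal, assumed).
        difficulties = [rng.gauss(mean, sd) for _ in range(n_items)]
        # STEP 2 and STEP 3: compute and accumulate a measure per raw score.
        for raw in range(1, n_items):
            totals[raw - 1] += measure_for_raw_score(raw, difficulties)
    return {raw: totals[raw - 1] / n_iterations for raw in range(1, n_items)}

# The slides use 1,000 iterations; a smaller number keeps this demo quick.
table = correspondence_table(n_iterations=50)
for raw in (1, 2, 3, 44):
    print(raw, f"{table[raw]:.0f}L")
```

Averaging over sampled item sets means the table depends only on the ensemble's mean and spread, not on any individual item difficulty, which matches what the slide on the ensemble says is assumed to be unknown.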

29 Closing
No matter how it is sliced and diced, analyses of joint and conditional probability distributions yield no more than patterns of association. Nothing in the response data, nor in Rasch analyses of these data, exposes the processes (features of the object of measurement) or mechanisms (features of the instrument) that are hypothesized to be conjointly causal on the measurement outcomes.

30 A. Jackson Stenner CEO, MetaMetrics Contact Info: