CILVR 2006 Slide 1 May 18, 2006 A Bayesian Perspective on Structured Mixtures of IRT Models Robert Mislevy, Roy Levy, Marc Kroopnick, and Daisy Wise University of Maryland Presented at the Conference “Mixture Models in Latent Variable Research,” May 18-19, 2006, Center for Integrated Latent Variable Research, University of Maryland

CILVR 2006 Slide 2 May 18, 2006 Probability is not really about numbers; it is about the structure of reasoning. Glenn Shafer, quoted in Pearl, 1988, p. 77

CILVR 2006 Slide 3 May 18, 2006 Where we are going
The structure of assessment arguments
Probability-based reasoning in assessment
Increasingly complex psychological narratives entail…
» Extended view of “data”
» More encompassing probability models, from classical test theory to mixtures of structured item response theory (IRT) models

CILVR 2006 Slide 4 May 18, 2006 The structure of assessment arguments A construct-centered approach would begin by asking what complex of knowledge, skills, or other attribute should be assessed, … Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors? Messick, 1992, p. 17

CILVR 2006 Slide 5 May 18, 2006 An Example: Mental Rotation Tasks
[Figure: the stimulus shape for a mental rotation task]

CILVR 2006 Slide 6 May 18, 2006 An Example: Mental Rotation Tasks
[Figure: the target shape; the examinee judges whether the target is a rotation of the stimulus]

CILVR 2006 Slide 7 May 18, 2006 Total Scores
Counts of multiple observations, all of which signify proficiency; more is better.
Everyone takes the same tasks.
Comparisons/decisions based on total scores X.
Nature of tasks outside the model.
No distinction between observation (X) and target of inference (proficiency at mental rotation); no probability-based model for characterizing evidence.
No notion of “measurement error” (except when your score is lower than you think it ought to be).

CILVR 2006 Slide 8 May 18, 2006 Enter probability-based reasoning
A properly-structured statistical model overlays a substantive model for the situation with a model for our knowledge of the situation, so that we may characterize and communicate what we come to believe—as to both content and conviction—and why we believe it—as to our assumptions, our conjectures, our evidence, and the structure of our reasoning. Mislevy & Gitomer, 1996

CILVR 2006 Slide 9 May 18, 2006 Defining variables
A frame of discernment is all the distinctions one can make within a particular model (Shafer, 1976).
» “To discern” means “to become aware of” and “to make distinctions among.”
In assessment, the variables relate to the claims we would like to make about students and the observations we need to make.
All are framed and understood in terms appropriate to the purpose, the context, and the psychological perspective that ground the application.

CILVR 2006 Slide 10 May 18, 2006 Conditional independence
In assessment, the statistical concept of conditional independence formalizes the working assumption that if the values of the student model variables were known, the details of the responses would carry no further information.
We use a model at a given grain size or with certain kinds of variables not because we think that it is somehow “true”, but rather because it adequately expresses patterns in the data in light of our perspective on knowledge/skill and the purpose of the assessment.

CILVR 2006 Slide 11 May 18, 2006 Classical Test Theory (CTT)
Still total scores, but with the idea of replication—multiple “parallel” tests X_j that may differ, but are all “noisy” versions of the same “true score” θ:
X_ij = θ_i + e_ij, where e_ij ~ N(0, σ_e²).
Details of cognition, observation task by task, and content of tasks lie outside the probability model.
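As a concrete illustration (a minimal sketch; every number here is hypothetical), parallel forms can be simulated as noisy versions of the same true scores, and the classical reliability coefficient falls out as the ratio of true-score variance to observed-score variance:

    import numpy as np

    rng = np.random.default_rng(0)
    n_students, n_forms = 100, 4
    sigma_theta, sigma_e = 10.0, 5.0                   # assumed true-score and error SDs
    theta = rng.normal(50.0, sigma_theta, n_students)  # true scores
    # Parallel forms: identical true score, independent errors
    X = theta[:, None] + rng.normal(0.0, sigma_e, (n_students, n_forms))
    # Reliability: true-score variance over observed-score variance
    print(sigma_theta**2 / (sigma_theta**2 + sigma_e**2))  # 0.8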

CILVR 2006 Slide 12 May 18, 2006 Classical Test Theory (CTT)
[Figure: directed graph with the true score θ as parent and the parallel test scores X_1, …, X_n as children]
Directed graph representation of the Bayesian probability model for multiple parallel tests.
Note direction of conditional probability in the model; contrast with inference about θ once the Xs are observed.

CILVR 2006 Slide 13 May 18, 2006 Classical Test Theory (CTT)
Posterior inference, via Bayes theorem:
p(θ_i | X_i1, …, X_in) ∝ p(θ_i) Π_j p(X_ij | θ_i).
The full probability model:
p(θ_i, X_i1, …, X_in) = p(θ_i) Π_j p(X_ij | θ_i).
Note what’s there, conditional independence, relationships, what’s not there.

CILVR 2006 Slide 14 May 18, 2006 Classical Test Theory (CTT)
For Student i, posterior inference is a probability distribution for θ_i; that is,
» Expected value for test score, along with standard deviation of distribution for θ_i.
» People can be seen as differing only as to propensity to make correct responses.
Inference bound to particular test form.
Also posterior distribution for mean & variance of θs, error variance, and reliability, or σ_θ² / (σ_θ² + σ_e²).
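A minimal sketch of this posterior computation, assuming the true-score and error variances are known (the slide's fuller treatment puts posterior distributions on those too); with a normal prior and normal errors the update is conjugate, and all numbers are hypothetical:

    import numpy as np

    def ctt_posterior(x, mu, sigma_theta, sigma_e):
        """Normal-normal update for one student's true score theta,
        given scores x on parallel forms with known variances."""
        x = np.asarray(x, dtype=float)
        prec = 1.0 / sigma_theta**2 + x.size / sigma_e**2  # posterior precision
        post_var = 1.0 / prec
        post_mean = post_var * (mu / sigma_theta**2 + x.sum() / sigma_e**2)
        return post_mean, np.sqrt(post_var)

    # Three parallel-form scores for one student (hypothetical)
    print(ctt_posterior([48.0, 55.0, 52.0], mu=50.0, sigma_theta=10.0, sigma_e=5.0))

The posterior mean shrinks the observed average toward the population mean μ, by an amount governed by the reliability of the test.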

CILVR 2006 Slide 15 May 18, 2006 Item Response Theory (IRT)
Modeling now at level of items
» Adaptive testing, matrix sampling, test assembly
Multiple non-parallel item responses X_ij that may differ, but all depend on θ_i:
p(X_ij | θ_i, β_j), where β_j is the possibly vector-valued parameter of Item j.
Conditional independence of item responses given θ and parameter(s) for each item.

CILVR 2006 Slide 16 May 18, 2006 Item Response Theory (IRT)
The Rasch IRT model for 0/1 items: Prob(X_ij = 1 | θ_i, β_j) = Ψ(θ_i − β_j), where Ψ(x) = exp(x)/[1 + exp(x)].
Same item ordering for all people. Content of tasks still outside the probability model.
[Figure: items and persons located along a single scale, items running from easier to harder and persons from less able to more able]
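A small sketch of the Rasch response function with hypothetical person and item values; note that the resulting probability table orders the items the same way for every person, which is the point of the figure above:

    import numpy as np

    def rasch_prob(theta, b):
        # Psi(theta - b): probability of a correct (X = 1) response
        return 1.0 / (1.0 + np.exp(-(np.asarray(theta) - np.asarray(b))))

    theta = np.array([-1.0, 0.0, 1.5])   # persons, less able -> more able
    b = np.array([-0.5, 0.3, 1.0, 2.0])  # items, easier -> harder
    P = rasch_prob(theta[:, None], b[None, :])  # persons x items
    print(P.round(2))  # each row decreases left to right in the same order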

CILVR 2006 Slide 17 May 18, 2006 A full Bayesian model for IRT
The measurement model: Item responses conditionally independent given person & item parameters
Distributions for person parameters
Distributions for item parameters
Distribution for parameter(s) of distributions for item parameters
Distribution for parameter(s) of distributions for person parameters
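One way to read this hierarchy is as a single joint density over data and parameters. The sketch below writes the unnormalized log joint for a Rasch version of the model; the specific normal priors and hyperpriors are hypothetical choices, and identification constraints (e.g., fixing the person mean) are glossed over. A generic MCMC routine could be pointed at this function.

    import numpy as np
    from scipy.stats import norm

    def log_joint(X, theta, b, mu_theta, sigma_theta, mu_b, sigma_b):
        """Unnormalized log of p(X | theta, b) p(theta | ...) p(b | ...) p(hyperparameters)
        for 0/1 responses X (persons x items) under the Rasch model."""
        eta = theta[:, None] - b[None, :]
        # Bernoulli log-likelihood; responses conditionally independent given theta, b
        ll = np.sum(X * eta - np.log1p(np.exp(eta)))
        lp = norm.logpdf(theta, mu_theta, sigma_theta).sum()  # person parameters
        lp += norm.logpdf(b, mu_b, sigma_b).sum()             # item parameters
        # Hyperpriors on the parameters of those distributions (hypothetical choices)
        lp += norm.logpdf(mu_theta, 0.0, 1.0) + norm.logpdf(mu_b, 0.0, 1.0)
        lp += norm.logpdf(np.log(sigma_theta), 0.0, 1.0) + norm.logpdf(np.log(sigma_b), 0.0, 1.0)
        return ll + lp

    # Evaluate at arbitrary hypothetical values
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(5, 4))
    print(log_joint(X, theta=np.zeros(5), b=np.zeros(4),
                    mu_theta=0.0, sigma_theta=1.0, mu_b=0.0, sigma_b=1.0))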

CILVR 2006 Slide 18 May 18, 2006 A full Bayesian model for IRT
Posterior inference:
For people, posterior distributions for θs, or propensity to make correct responses. How/why outside model.
For items, posterior distributions for βs. Some harder, some easier; how/why outside model. Presumed to be the same for all people.

CILVR 2006 Slide 19 May 18, 2006 The “cognitive revolution”
Summary test scores … have often been thought of as “signs” indicating the presence of underlying, latent traits. An alternative interpretation of test scores as samples of cognitive processes and contents, and of correlations as indicating the similarity or overlap of this sampling, is equally justifiable and could be theoretically more useful. The evidence from cognitive psychology suggests that test performances are comprised of complex assemblies of component information-processing actions that are adapted to task requirements during performance. (Snow & Lohman, 1989, p. 317)

CILVR 2006 Slide 20 May 18, 2006 Research on mental rotation tasks
Roger Shepard’s studies in the early 1970s show that the difficulty of mental rotation tasks depends mainly on how far the stimulus must be rotated.
Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171.
Cooper, L. A., & Shepard, R. N. (1973). Chronometric studies of the rotation of mental images. In W. G. Chase (Ed.), Visual information processing (pp. 75–176). New York: Academic Press.

CILVR 2006 Slide 21 May 18, 2006 A structured IRT model: The LLTM
The linear logistic test model (Fischer, 1973): a Rasch model, but with item parameters conditional on item features q_j:
β_j = Σ_k q_jk η_k, where η_k is the contribution to difficulty from feature k.
Now difficulty is modeled as a function of task features, as correlated with demands for aspects of knowledge or processing.
Conditional independence of item parameters given features: they “explain” item difficulty.
Task features bring psychological theory into the model. For mental rotation, can use degree of rotation for q_j.
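A minimal sketch of the LLTM difficulty structure for mental rotation, assuming a single feature (degrees of rotation, rescaled) and a hypothetical effect size η:

    import numpy as np

    # Feature matrix q_j: one column, degree of rotation scaled to [0, 1]
    Q = np.array([[30.0], [60.0], [90.0], [120.0], [150.0]]) / 180.0
    eta = np.array([2.5])   # hypothetical effect of rotation on difficulty
    beta = Q @ eta          # beta_j = sum_k q_jk * eta_k
    print(beta)             # more rotation -> harder item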

CILVR 2006 Slide 22 May 18, 2006 A structured IRT model: The LLTM
Content of tasks inside the probability model.
[Figure: the same item/person scale as before, but now the item ordering from easier to harder aligns with the task feature, from less rotation to more rotation]

CILVR 2006 Slide 23 May 18, 2006 A full Bayesian model for LLTM
The measurement model: Item responses conditionally independent given person & item parameters
Distributions for item parameters conditional on item features
Note on extensions: Rating scale, count, vector-valued observations. Multivariate student models for multiple ways people differ; conditional independence given vector θ. Different item features relevant to different components of θ.

CILVR 2006 Slide 24 May 18, 2006 A full Bayesian model for LLTM
Posterior inference:
For people, posterior distributions for θs, or propensity to make correct responses. How/why interpreted through psychological model.
For items, posterior distributions for βs. Still presumed to be the same for all people. Some harder, some easier; how/why inside model.
Hypothesized patterns can be checked statistically.

CILVR 2006 Slide 25 May 18, 2006 Structured mixtures of IRT models
What makes items hard may depend on solution strategy.
» John French (1965) on mental rotation.
» Siegler on the balance beam (development).
» Gentner & Gentner on electrical circuits.
» Tatsuoka on mixed-number subtraction.
Theory says what the relationship ought to be; the trick is putting it into a probability model.

CILVR 2006 Slide 26 May 18, 2006 Structured mixtures of IRT models
For mental rotation items:
Difficulty depends on angle of rotation under the mental rotation strategy.
Difficulty depends on acuteness of angle under the analytic strategy.
Can make inferences about group membership using items that are relatively hard under one strategy, relatively easy under the other.
Mislevy, R. J., Wingersky, M. S., Irvine, S. H., & Dann, P. L. (1991). Resolving mixtures of strategies in spatial visualization tasks. British Journal of Mathematical and Statistical Psychology, 44.

CILVR 2006 Slide 27 May 18, 2006 A structured IRT mixture
[Figure: two alternative item/person scales. Under the rotation strategy, items run from easier to harder as rotation increases; under the analytic strategy, the same items take a different ordering, with more acute angles easier and less acute angles harder. Persons are located on whichever scale matches their strategy.]

CILVR 2006 Slide 28 May 18, 2006 Structured mixtures of IRT models
Groups of people distinguished by the way they solve tasks. φ_ik = 1 if Person i is in Group k, 0 if not.
People differ as to knowledge, skills, proficiencies within group, expressed by θ_ik’s.
Items differ as to knowledge, skills, demands within group, expressed by q_jk’s.
Thus, LLTM models within groups.
Conditional independence of responses given person, item, and group parameters.

CILVR 2006 Slide 29 May 18, 2006 Structured mixtures of IRT models
Consider M strategies; each person applies one of them to all items, and item difficulty under strategy m depends on features of the task that are relevant under this strategy, in accordance with an LLTM structure.
The difficulty of item j under strategy m is
β_jm = Σ_k q_jmk η_mk.
The probability of a correct response is
Prob(X_ij = 1 | θ_im, φ_im = 1) = Ψ(θ_im − β_jm).
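A sketch of the inference this model supports, with hypothetical difficulties: items that are easy under one strategy but hard under the other make the posterior strategy probabilities informative, as Slide 26 noted.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def group_posterior(x, theta_i, B, pi):
        """Posterior probability that person i uses each of M strategies,
        given 0/1 responses x. theta_i: proficiency under each strategy (M,);
        B: M x J difficulties, B[m, j] = sum_k q_jmk * eta_mk; pi: prior (M,)."""
        P = sigmoid(theta_i[:, None] - B)                      # M x J success probs
        like = np.prod(np.where(x == 1, P, 1.0 - P), axis=1)   # p(x | strategy m)
        post = pi * like
        return post / post.sum()                               # Bayes theorem

    # Hypothetical two-strategy example: items 0-2 easy under rotation,
    # items 3-5 easy under the analytic strategy
    B = np.array([[-1.0, -1.0, -1.0,  1.0,  1.0,  1.0],   # rotation
                  [ 1.0,  1.0,  1.0, -1.0, -1.0, -1.0]])  # analytic
    x = np.array([1, 1, 1, 0, 0, 1])
    print(group_posterior(x, theta_i=np.zeros(2), B=B, pi=np.array([0.5, 0.5])))
    # -> roughly [0.98, 0.02]: this response pattern points to the rotation strategy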

CILVR 2006 Slide 30 May 18, 2006 Structured mixtures of IRT models
Item responses conditionally independent given person’s group and person & item parameters relevant to that group.
Distributions for item parameters conditional on item features and feature effects relevant to each group.

CILVR 2006 Slide 31 May 18, 2006 Structured mixtures of IRT models
Posterior inference:
For people, posterior probabilities for φs, or group memberships. Posterior distributions for θs within groups. How/why interpreted through psychological model.
For items, posterior distributions for βs for each group. Items differentially difficult for different people, based on theories about what makes items hard under different strategies.

CILVR 2006 Slide 32 May 18, 2006 Conclusion
Evolution of psychometric models joint with psychology.
Bayesian probability framework well suited for building models that correspond to “narratives”.
Can’t just “throw data over the wall” as was done with CTT; need to build a coordinated observational model and probability model, from a psychological foundation.