Inference & Culture Slide 1 October 21, 2004 Cognitive Diagnosis as Evidentiary Argument Robert J. Mislevy Department of Measurement, Statistics, & Evaluation.

Presentation transcript:

Inference & Culture Slide 1 October 21, 2004 Cognitive Diagnosis as Evidentiary Argument Robert J. Mislevy Department of Measurement, Statistics, & Evaluation University of Maryland, College Park, MD October 21, 2004 Presented at the Fourth Spearman Conference, Philadelphia, PA, Oct , Thanks to Russell Almond, Charles Davis, Chun-Wei Huang, Sandip Sinharay, Linda Steinberg, Kikumi Tatsuoka, David Williamson, and Duanli Yan.

Inference & Culture Slide 2 October 21, 2004 Introduction An assessment is a particular kind of evidentiary argument. Parsing a particular assessment in terms of the elements of an argument provides insights into more visible features such as tasks and statistical models. Will look at cognitive diagnosis from this perspective.

Inference & Culture Slide 3 October 21, 2004 Toulmin's (1958) structure for arguments Reasoning flows from data (D) to claim (C) by justification of a warrant (W), which in turn is supported by backing (B). The inference may need to be qualified by alternative explanations (A), which may have rebuttal evidence (R) to support them.

Inference & Culture Slide 4 October 21, 2004 Specialization to assessment The role of psychological theory: »Nature of claims & data »Warrant connecting claims and data: “If student were x, would probably do y” The role of probability-based inference: “Student does y; what is support for x’s?” Will look first at assessment under behavioral perspective, then see how cognitive diagnosis extends the ideas.

Inference & Culture Slide 5 October 21, 2004 Behaviorist Perspective The evaluation of the success of instruction and of the student’s learning becomes a matter of placing the student in a sample of situations in which the different learned behaviors may appropriately occur and noting the frequency and accuracy with which they do occur. D.R. Krathwohl & D.A. Payne, 1971, p

The claim addresses the expected value of performance of the targeted kind in the targeted situations.

The student data address the salient features of the responses.

The task data address the salient features of the stimulus situations (i.e., tasks).

The warrant encompasses definitions of the class of stimulus situations, response classifications, and sampling theory.
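The behaviorist argument above can be illustrated with a small calculation. The sketch below assumes a student's scored responses (1 = correct, 0 = incorrect) to tasks sampled from one class of situations, and uses simple binomial sampling theory as the warrant. The function name and the data are hypothetical, chosen only for illustration.

```python
import math

def estimate_success_rate(responses):
    """Return (p_hat, standard_error) for a list of 0/1 scored responses.

    Assumes the tasks are a random sample from one task class, so the
    binomial standard error applies (an illustrative assumption).
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one scored response")
    p_hat = sum(responses) / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, standard_error

# Made-up responses of one student to ten tasks from a single class.
sample_responses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
p_hat, standard_error = estimate_success_rate(sample_responses)
print(f"estimated success rate: {p_hat:.2f} (SE {standard_error:.3f})")
```

The claim is the estimated success rate, the data are the scored responses and the task features that define the class, and the standard error expresses how strongly the sample supports the claim.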

Inference & Culture Slide 10 October 21, 2004 Statistical Modeling of Assessment Data Claims in terms of values of unobservable variables in student model (SM)-- characterize student knowledge. Data modeled as depending probabilistically on SM vars. Estimate conditional distributions of data given SM vars. Bayes theorem to infer SM variables given data.
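The last step on this slide can be written out as a formula. In the sketch below, $\theta$ stands for the student-model variables and $x$ for the observed data; this notation is an assumption made here for illustration, and the sum applies to discrete student-model variables (it becomes an integral for continuous ones).

```latex
p(\theta \mid x)
  = \frac{p(x \mid \theta)\, p(\theta)}
         {\sum_{\theta'} p(x \mid \theta')\, p(\theta')}
```

Here $p(x \mid \theta)$ is the estimated conditional distribution of data given the SM variables, and $p(\theta)$ is the prior distribution over student states.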

Inference & Culture Slide 11 October 21, 2004 Specialization to cognitive diagnosis Information-processing perspective foregrounded in cognitive diagnosis Student model contains variables in terms of, e.g., »Production rules at some grain-size »Components / organization of knowledge »Possibly strategy availability / usage Importance of purpose

Inference & Culture Slide 12 October 21, 2004 Responses consistent with the "subtract smaller from larger" bug “Buggy arithmetic”: Brown & Burton (1978); VanLehn (1990)
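As a rough illustration of this bug, the sketch below subtracts column by column and, in each column, takes the smaller digit from the larger one instead of borrowing. The function name is hypothetical, and the sketch assumes non-negative integers with the top number at least as large as the bottom number. It is not code from Brown & Burton or VanLehn.

```python
def buggy_subtract(top: int, bottom: int) -> int:
    """Column subtraction with the 'subtract smaller from larger' bug.

    Assumes top >= bottom >= 0. In every column the smaller digit is
    subtracted from the larger one, so no borrowing ever happens.
    """
    top_digits = str(top)
    # Pad the bottom number with leading zeros so the columns line up.
    bottom_digits = str(bottom).rjust(len(top_digits), "0")
    columns = zip(top_digits, bottom_digits)
    result = "".join(str(abs(int(t) - int(b))) for t, b in columns)
    return int(result)

print(buggy_subtract(68, 23))    # no borrowing needed, so the answer is right
print(buggy_subtract(52, 17))    # borrowing needed: the bug gives 45, not 35
print(buggy_subtract(300, 128))  # the bug gives 228, not 172
```

Items that need no borrowing cannot separate a student with this bug from one who subtracts correctly; only the borrowing items produce diagnostic responses.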

Inference & Culture Slide 13 October 21, 2004 Some Illustrative Student Models in Cognitive Diagnosis Whole number subtraction: »~ 200 production rules (VanLehn, 1990) »Can model at level of bugs (Brown & Burton) or at the level of impasses (VanLehn) John Anderson’s ITSs in algebra, LISP »~ 1000 production rules »1-10 in play at a given time Reverse-engineered large-scale tests »~10-15 skills Mixed number subtraction (Tatsuoka) »~5-15 production rules / skills

Inference & Culture Slide 14 October 21, 2004 Mixed number subtraction Based on example from Prof. Kikumi Tatsuoka (1982). »Cognitive analysis & task design »Methods A & B »Overlapping sets of skills under methods Bayes nets described in Mislevy (1994): »Five “skills” required under Method B. »Conjunctive combination of skills »DINA stochastic model

Inference & Culture Slide 15 October 21, 2004
Skill 1: Basic fraction subtraction
Skill 2: Simplify/Reduce
Skill 3: Separate whole number from fraction
Skill 4: Borrow from whole number
Skill 5: Convert whole number to fractions

[Figure: Toulmin argument diagram for a student, Sue, with class-level arguments feeding an overall claim.]
Overall claim. C: Sue's configuration of production rules for operating in the domain (knowledge and skill) is K. W0: Theory about how persons with configurations {K1,...,Km} would be likely to respond to items with different salient features.
Class-level arguments, one for each of Class 1 through Class n. C: Sue's probability of answering a Class n subtraction problem with borrowing is p_n. W: Sampling theory for items with feature set defining Class n. D1nj: Sue's answer to Item j, Class n. D2nj: structure and contents of Item j, Class n.
Like behaviorist inference at level of behavior in classes of structurally similar tasks.

[Figure: the same Toulmin argument diagram for Sue, in which class-level claims about p_1, ..., p_n support the claim about her production-rule configuration K through warrant W0.]
Structural patterns among behaviorist claims are data for inferences about unobservable production rules that govern behavior.

Inference & Culture Slide 19 October 21, 2004 [Figure: the same Toulmin argument diagram for Sue, in which class-level claims about p_1, ..., p_n support the claim about her production-rule configuration K through warrant W0.] This level distinguishes cognitive diagnosis from subscores. A typical (but not necessary) difference is that cognitive diagnosis has many-to-many relationship between observable variables and student-model variables. As partitions, subscores have 1-1 relationships between scores and inferential targets.

Inference & Culture Slide 20 October 21, 2004 Structural and stochastic aspects of inferential models Structural model relates student model variables ($\theta$s) to observable variables ($x$s) »Conjunctive, disjunctive, mixture »Complete vs incomplete (e.g., fusion model) »The Q matrix (next slide) Stochastic model addresses uncertainty »Rule based; logical with noise »Probability-based inference (discrete Bayes nets, extended IRT models) »Hybrid (e.g., Rule Space)

Inference & Culture Slide 21 October 21, 2004 The Q-matrix (Fischer, Tatsuoka) [Figure: a matrix with axes labeled Items and Features.] $q_{jk}$ is the extent to which Feature k pertains to Item j. Special case: 0/1 entries and a 1-1 relationship between features and student-model variables.

Inference & Culture Slide 22 October 21, 2004 Conjunctive structural relationship Person i: $\theta_i = (\theta_{i1}, \theta_{i2}, \ldots, \theta_{iK})$ »Each $\theta_{ik} = 1$ if person possesses “skill”, 0 if not. Task j: $q_j = (q_{j1}, q_{j2}, \ldots, q_{jK})$ »$q_{jk} = 1$ if item j “requires skill k”, 0 if not. $I_{ij} = 1$ if $(q_{jk} = 1 \Rightarrow \theta_{ik} = 1)$ for all k, 0 if $(q_{jk} = 1$ but $\theta_{ik} = 0)$ for any k.
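The conjunctive rule on this slide can be sketched in a few lines. In the code below, the person's skill vector is called `theta`, each row of `Q` is one item's vector of required skills, and `ideal_response` returns the indicator the slide calls I. The Q-matrix rows and the skill profile are made up for illustration and are not the items of the mixed number subtraction test.

```python
def ideal_response(theta, q):
    """Return 1 if the person has every skill the item requires, else 0.

    theta[k] is 1 if the person possesses skill k+1, 0 otherwise.
    q[k] is 1 if the item requires skill k+1, 0 otherwise.
    """
    return int(all(has == 1 for has, needs in zip(theta, q) if needs == 1))

# Hypothetical 0/1 Q-matrix: rows are items, columns are Skills 1-5.
Q = [
    [1, 0, 0, 0, 0],  # requires Skill 1 only
    [1, 1, 0, 0, 0],  # requires Skills 1 & 2
    [1, 0, 1, 1, 0],  # requires Skills 1, 3, & 4
]

theta = [1, 0, 1, 1, 0]  # a person with Skills 1, 3, and 4 but not 2 or 5
pattern = [ideal_response(theta, q) for q in Q]
print(pattern)  # [1, 0, 1]
```

Only the skills an item requires matter: extra skills the person has, or skills the person lacks that the item does not need, leave the indicator unchanged.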

Inference & Culture Slide 23 October 21, 2004 Conjunctive structural relationship: No stochastic model $\Pr(x_{ij} = 1 \mid \theta_i, q_j) = I_{ij}$ No uncertainty about $x$ given $\theta$. There is uncertainty about $\theta$ given $x$, even if no stochastic part, due to competing explanations (Falmagne): $x_{ij} \in \{0, 1\}$ just gives you a partitioning into all $\theta$s that cover $q_j$, vs. those that miss with respect to at least one skill.

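Inference & Culture Slide 24 October 21, 2004 Conjunctive structural relationship: DINA stochastic model Now there is uncertainty about $x$ given $\theta$.
$\Pr(x_{ij} = 1 \mid I_{ij} = 0) = \pi_{j0}$ -- False positive
$\Pr(x_{ij} = 1 \mid I_{ij} = 1) = \pi_{j1}$ -- True positive
Likelihood over n items: $p(x_i \mid \theta_i) = \prod_{j=1}^{n} \pi_{j I_{ij}}^{x_{ij}} \, (1 - \pi_{j I_{ij}})^{1 - x_{ij}}$
Posterior: $p(\theta_i \mid x_i) \propto p(\theta_i) \, p(x_i \mid \theta_i)$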
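A minimal sketch of this calculation follows, assuming conditionally independent item responses and a uniform prior over skill profiles when none is supplied. The names `pi0` and `pi1` stand for the per-item false-positive and true-positive probabilities. The function names, the Q-matrix, and all numbers are hypothetical, chosen only to make the example runnable.

```python
from itertools import product

def ideal_response(theta, q):
    """Conjunctive indicator: 1 if theta covers every skill q requires."""
    return int(all(has == 1 for has, needs in zip(theta, q) if needs == 1))

def dina_posterior(x, Q, pi0, pi1, prior=None):
    """Posterior over all 0/1 skill profiles given scored responses x.

    pi0[j] is Pr(correct | a required skill is missing) for item j;
    pi1[j] is Pr(correct | all required skills are present).
    """
    n_skills = len(Q[0])
    profiles = list(product([0, 1], repeat=n_skills))
    if prior is None:  # assumed uniform prior, for illustration only
        prior = {profile: 1.0 / len(profiles) for profile in profiles}
    unnormalized = {}
    for profile in profiles:
        likelihood = 1.0
        for x_j, q_j, p0, p1 in zip(x, Q, pi0, pi1):
            p_correct = p1 if ideal_response(profile, q_j) else p0
            likelihood *= p_correct if x_j == 1 else 1.0 - p_correct
        unnormalized[profile] = prior[profile] * likelihood
    total = sum(unnormalized.values())
    return {profile: v / total for profile, v in unnormalized.items()}

# Two skills, three items: Skill 1 only, Skill 2 only, Skills 1 & 2.
Q = [[1, 0], [0, 1], [1, 1]]
x = [1, 0, 0]                      # right, wrong, wrong
posterior = dina_posterior(x, Q, pi0=[0.2] * 3, pi1=[0.9] * 3)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))  # (1, 0) 0.79
```

With these made-up numbers, the profile "has Skill 1, lacks Skill 2" gets most of the posterior mass, but the other three profiles keep some probability because guesses and slips are possible.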

Inference & Culture Slide 25 October 21, 2004 The particular challenge of competing explanations Triangulation »Different combinations of data fail to support some alternative explanations of responses, and reinforce others. »Why was an item requiring Skills 1 & 2 wrong? –Missing Skill 1? Missing Skill 2? A slip? –Try items requiring 1 & 3, 2 & 4, 1 & 2 again. Degree to which design supports inferences »Test design as experimental design
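The triangulation idea can be sketched as elimination of skill profiles. The code below assumes a noise-free conjunctive model (no slips or guesses, so it ignores the "A slip?" explanation) and four skills. The function name and the follow-up outcomes are hypothetical.

```python
from itertools import product

def consistent_profiles(observations, n_skills):
    """Skill profiles that reproduce every observation exactly.

    Each observation is (required_skills, correct), with skills numbered
    from 1. Assumes a conjunctive rule with no slips or guesses.
    """
    kept = []
    for theta in product([0, 1], repeat=n_skills):
        matches = all(
            all(theta[k - 1] == 1 for k in required) == bool(correct)
            for required, correct in observations
        )
        if matches:
            kept.append(theta)
    return kept

# One wrong answer on an item requiring Skills 1 & 2.
first = [((1, 2), 0)]
# Hypothetical follow-up: the 1 & 3 item is right, the 2 & 4 item is wrong.
follow_up = first + [((1, 3), 1), ((2, 4), 0)]

print(len(consistent_profiles(first, 4)))  # 12
print(consistent_profiles(follow_up, 4))   # [(1, 0, 1, 0), (1, 0, 1, 1)]
```

The single wrong answer leaves twelve of the sixteen profiles standing. After the follow-up items only two remain, and both lack Skill 2, so the design has removed "missing Skill 1" as a competing explanation.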

Bayes net for mixed number subtraction (Method B)

Bayes net for mixed number subtraction (Method B)
[Figure: Bayes net. Skill nodes: Basic fraction subtraction (Skill 1); Simplify/reduce (Skill 2); Separate whole number from fraction (Skill 3); Borrow from whole number (Skill 4); Convert whole number to fraction (Skill 5); Mixed number skills. Skill-combination nodes: Skills 1 & 2; Skills 1 & 3; Skills 1, 3, & 4; Skills 1, 2, 3, & 4; Skills 1, 3, 4, & 5; Skills 1, 2, 3, 4, & 5. Item nodes: Items 4, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, and 20. Legible item texts include 6/7 - 4/7, 2/3 - 2/3, and 11/8 - 1/8; the remaining item texts and the item-to-node links did not survive extraction.]
Structural aspects: The logical conjunctive relationships among skills, and which sets of skills an item requires. The latter is determined by its $q_j$ vector.

Bayes net for mixed number subtraction (Method B)
[Figure: the Method B Bayes net of skill nodes, skill-combination nodes, and item nodes.]
Stochastic aspects, Part 1: Empirical relationships among skills in population (red).

Bayes net for mixed number subtraction (Method B)
[Figure: the Method B Bayes net of skill nodes, skill-combination nodes, and item nodes.]
Stochastic aspects, Part 2: Measurement errors for each item (yellow).

Probabilities before observations Bayes net for mixed number subtraction

Probabilities after observations Bayes net for mixed number subtraction

For mixture of strategies across people Bayes net for mixed number subtraction

Inference & Culture Slide 33 October 21, 2004 Extensions (1) More general … »Student models (continuous vars, uses) »Observable variables (richer, times, multiple) »Structural relationships (e.g., disjuncts) »Stochastic relationships (e.g., NIDA, fusion) »Model-tracing temporary structures (VanLehn)

Inference & Culture Slide 34 October 21, 2004 Extensions (2) Strategy use »Single strategy (as discussed above) »Mixture across people (Rost, Mislevy) »Mixtures within people (Huang: MV Rasch) Huang’s example of last of these follows…

What are the forces at the instant of impact? [Figure: collision between a truck and a car; speed label: 20 mph.]
A. The truck exerts the same amount of force on the car as the car exerts on the truck.
B. The car exerts more force on the truck than the truck exerts on the car.
C. The truck exerts more force on the car than the car exerts on the truck.
D. There’s no force because they both stop.

What are the forces at the instant of impact? [Figure: collision between a truck and a car; speed labels: 10 mph and 20 mph.]
A. The truck exerts the same amount of force on the car as the car exerts on the truck.
B. The car exerts more force on the truck than the truck exerts on the car.
C. The truck exerts more force on the car than the car exerts on the truck.
D. There’s no force because they both stop.

What are the forces at the instant of impact? [Figure: collision between a truck and a fly; speed labels: 10 mph and 1 mph.]
A. The truck exerts the same amount of force on the fly as the fly exerts on the truck.
B. The fly exerts more force on the truck than the truck exerts on the fly.
C. The truck exerts more force on the fly than the fly exerts on the truck.
D. There’s no force because they both stop.

Inference & Culture Slide 38 October 21, 2004 The Andersen/Rasch Multidimensional Model for m strategy categories [Equation not preserved in the transcript.] where p is an integer between 1 and m, and the remaining symbols (lost in the transcript) denote, in order: the pth element of person i’s vector-valued parameter; the strategy person i uses for item j; and the pth element of item j’s vector-valued parameter.
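The equation on this slide did not survive extraction. One common way to write a multidimensional Rasch model of this kind is sketched below. The symbol names ($X_{ij}$ for the strategy person i uses on item j, $\theta_{ip}$ for the person parameter, $\beta_{jp}$ for the item parameter) and the sign convention are assumptions made here, not a recovery of the original slide.

```latex
P(X_{ij} = p \mid \boldsymbol{\theta}_i, \boldsymbol{\beta}_j)
  = \frac{\exp(\theta_{ip} - \beta_{jp})}
         {\sum_{r=1}^{m} \exp(\theta_{ir} - \beta_{jr})},
  \qquad p \in \{1, \dots, m\}
```

Under this form, each response falls into exactly one of the m strategy categories, and the category probabilities for a given person and item sum to one.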

Inference & Culture Slide 39 October 21, 2004 Conclusion: The Importance of Coordination… Among psychological model, task design, and analytic model »(KWSK “assessment triangle”) »Tatsuoka’s work is exemplary in this respect: –Grounded in psychological analyses –Grainsize & character tuned to learning model –Test design tuned to instructional options

Inference & Culture Slide 40 October 21, 2004 Conclusion: The Importance of Coordination… With purpose, constraints, resources »Lower expectations for retrofitting existing tests designed for different purposes, under different perspectives & warrants. »Information & Communication Technology (ICT) project at ETS –Simulation-based tasks –Large scale –Forward design