Inference & Culture Slide 1 April 29, 2003 Argument Substance and Argument Structure in Educational Assessment Robert J. Mislevy Department of Measurement,

Presentation transcript:

Inference & Culture Slide 1 April 29, 2003 Argument Substance and Argument Structure in Educational Assessment Robert J. Mislevy Department of Measurement, Statistics, & Evaluation University of Maryland, College Park, MD April 29, 2003 Presented at Conference on Inference, Culture, and Ordinary Thinking in Dispute Resolution, Benjamin N. Cardozo School of Law, Yeshiva University, New York, New York, April 27-29, 2003. This work builds on research with Linda Steinberg and Russell Almond at Educational Testing Service on the structure of educational assessments.

Inference & Culture Slide 2 April 29, 2003 Central Points Educational assessment has changed considerably over the last century. Why? Strikingly different psychological perspectives on the nature of learning and knowledge. These can be seen as elaborations of the same argument structure (Wigmore, Toulmin).

Inference & Culture Slide 3 April 29, 2003 Messick (1994) on assessment design: [B]egin by asking what complex of knowledge, skills, or other attributes should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society. Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors? Thus, the nature of the construct guides the selection or construction of relevant tasks as well as the rational development of construct-based scoring criteria and rubrics.

Inference & Culture Slide 4 April 29, 2003 Toulmin's (1958) structure for arguments Reasoning flows from data (D) to claim (C) by justification of a warrant (W), which in turn is supported by backing (B). The inference may need to be qualified by alternative explanations (A), which may have rebuttal evidence (R) to support them.
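As a rough illustration only, the elements of Toulmin's scheme can be held in a small record type. The class name `ToulminArgument`, its field names, and the sample content about Sue are assumptions made for this sketch; they are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToulminArgument:
    """One inference in Toulmin's scheme (field names are illustrative)."""
    data: List[str]                                        # D: what was observed
    claim: str                                             # C: what we want to conclude
    warrant: str                                           # W: why the data support the claim
    backing: str                                           # B: what supports the warrant
    alternatives: List[str] = field(default_factory=list)  # A: alternative explanations
    rebuttals: List[str] = field(default_factory=list)     # R: evidence bearing on A

    def summary(self) -> str:
        """Render the argument as one readable sentence."""
        text = (f"{'; '.join(self.data)} -> {self.claim} "
                f"(since {self.warrant}; on account of {self.backing})")
        if self.alternatives:
            text += f" unless {'; '.join(self.alternatives)}"
        return text


# Hypothetical content, invented for illustration.
arg = ToulminArgument(
    data=["Sue answered the item correctly"],
    claim="Sue can reason analytically",
    warrant="students with the skill tend to answer such items correctly",
    backing="validity studies of similar items",
    alternatives=["Sue guessed"],
)
print(arg.summary())
```

Keeping alternatives and rebuttals as lists reflects that one inference may face several competing explanations at once.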

Inference & Culture Slide 5 April 29, 2003 Perspectives on learning and knowledge Trait/differential (~ ) Behaviorist (~ ) Information-processing (~ ) Sociocultural (~ )

Inference & Culture Slide 6 April 29, 2003 Trait/Differential Perspective A relatively stable characteristic of a person— an attribute, enduring process, or disposition—which is consistently manifested to some degree when relevant, despite considerable variation in the range of settings and circumstances. (Messick, 1989) Interest in people's differential status on common traits Useful in selection, prediction, and educational decisions—not so much for instruction

Inference & Culture Slide 7 April 29, 2003 Spearman’s “Theorem of indifference of the indicator” This means that, for the purpose of indicating the amount of g possessed by a person, any test will do just as well as any other, provided only that its correlation with g is equally high.... Another consequence of the indifference of the indicator consists in the significance that should be attached to personal estimates of “intelligence” made by teachers and others. However unlike may be the kinds of observation from which these estimates may have been derived, still insofar as they have a sufficiently broad basis to make the influence of g dominate over that of the s’s [specific factors], they will tend to measure precisely the same thing.

Inference & Culture Slide 8 April 29, 2003 An Analytical Reasoning Item Pet Shop Display Arturo is planning the parakeet display for his pet shop. He has five parakeets, Alice, Bob, Carla, Diwakar, and Etria. Each is a different color; not necessarily in the same order, they are white, speckled, green, blue, and yellow. Arturo has two cages. The top cage holds three birds, and the bottom cage holds two. The display must meet the following additional conditions: Alice is in the bottom cage. Bob is in the top cage and is not speckled. Carla cannot be in the same cage as the blue parakeet. Etria is green. The green parakeet and the speckled parakeet are in the same cage. If Carla is in the top cage, which of the following must be true? a) The green parakeet is in the bottom cage. b) The speckled parakeet is in the bottom cage. c) Diwakar is in the top cage. d) Diwakar is in the bottom cage. e) The blue parakeet is in the top cage.
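The logical structure of this item can be checked by brute force. The sketch below enumerates every cage and color assignment that satisfies the stated conditions, keeps those with Carla in the top cage, and reports which answer options hold in all of them. It reads "Carla cannot be in the same cage as the blue parakeet" as also ruling out Carla being blue; that reading, and all function and variable names, are assumptions of this sketch.

```python
from itertools import combinations, permutations

BIRDS = ["Alice", "Bob", "Carla", "Diwakar", "Etria"]
COLORS = ["white", "speckled", "green", "blue", "yellow"]


def cage_of_color(cage, color, name):
    """Return the cage holding the bird of the given color."""
    return next(cage[b] for b in BIRDS if color[b] == name)


def valid_displays():
    """Yield (cage, color) dicts that satisfy every stated condition."""
    for top in combinations(BIRDS, 3):        # top cage holds three birds
        cage = {b: ("top" if b in top else "bottom") for b in BIRDS}
        for perm in permutations(COLORS):     # each bird has a different color
            color = dict(zip(BIRDS, perm))
            if cage["Alice"] != "bottom":
                continue
            if cage["Bob"] != "top" or color["Bob"] == "speckled":
                continue
            if cage["Carla"] == cage_of_color(cage, color, "blue"):
                continue
            if color["Etria"] != "green":
                continue
            if cage_of_color(cage, color, "green") != cage_of_color(cage, color, "speckled"):
                continue
            yield cage, color


# Keep only displays consistent with the stem: Carla is in the top cage.
displays = [(cg, cl) for cg, cl in valid_displays() if cg["Carla"] == "top"]

options = {
    "a": lambda cg, cl: cage_of_color(cg, cl, "green") == "bottom",
    "b": lambda cg, cl: cage_of_color(cg, cl, "speckled") == "bottom",
    "c": lambda cg, cl: cg["Diwakar"] == "top",
    "d": lambda cg, cl: cg["Diwakar"] == "bottom",
    "e": lambda cg, cl: cage_of_color(cg, cl, "blue") == "top",
}

# An option "must be true" if it holds in every remaining display.
must_be_true = [key for key, holds in options.items()
                if displays and all(holds(cg, cl) for cg, cl in displays)]
print(must_be_true)  # prints ['d'] under the reading described above
```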

Inference & Culture Slide 9 April 29, 2003 LSAT on AR Items LSAT's description of AR takes a trait perspective: "Analytical reasoning items are designed to measure the ability to understand a structure of relationships and to draw conclusions about the structure." AR items are in the LSAT not because either lawyers or law students routinely have to solve problems just like these in their jobs or their studies, but because there is evidence that students who can solve these kinds of puzzles tend to perform better in law school than students who can't.

1) Note that the warrant requires a conjunction of data about the nature of Sue's performance and the nature of the performance situation.

2) A closer look at the “data”: Must reason from unique work products and item materials, to aspects addressed in the general warrant.

Inference & Culture Slide 13 April 29, 2003 Multiple pieces of evidence of the same kind

Inference & Culture Slide 14 April 29, 2003 Multiple pieces of evidence of different kinds

Inference & Culture Slide 15 April 29, 2003 Statistical Modeling of Assessment Data Claims are framed in terms of values of unobservable variables in the student model (SM), which characterize student knowledge. Data are modeled as depending probabilistically on SM variables. Estimate conditional distributions of data given SM variables. Use Bayes' theorem to infer SM variables given data.
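A minimal sketch of the Bayesian step, assuming a single two-valued student-model variable (mastery vs. non-mastery) and item responses that are conditionally independent given that variable. The function name, the prior, and the conditional probabilities 0.8 and 0.3 are hypothetical values chosen for illustration, not estimates from any real assessment.

```python
def posterior_mastery(responses, prior=0.5,
                      p_correct_master=0.8, p_correct_nonmaster=0.3):
    """Posterior probability of mastery given scored responses (1 = correct).

    Assumes responses are conditionally independent given the student-model
    variable; all default probabilities are hypothetical.
    """
    joint_master = prior          # P(mastery) * P(data | mastery), built up below
    joint_nonmaster = 1 - prior   # same for non-mastery
    for x in responses:
        joint_master *= p_correct_master if x else 1 - p_correct_master
        joint_nonmaster *= p_correct_nonmaster if x else 1 - p_correct_nonmaster
    # Bayes' theorem: normalize over the two possible values of the SM variable.
    return joint_master / (joint_master + joint_nonmaster)


# Two correct responses and one incorrect response.
print(round(posterior_mastery([1, 1, 0]), 3))
```

Each additional response multiplies in one more conditional probability, which is how multiple pieces of evidence of the same kind accumulate in this simple setting.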

Inference & Culture Slide 16 April 29, 2003 Behaviorist Perspective The educational process consists of providing a series of environments that permit the student to learn new behaviors or modify or eliminate existing behaviors and to practice these behaviors to the point that he displays them at some reasonably satisfactory level of competence and regularity under appropriate circumstances. … The evaluation of the success of instruction and of the student’s learning becomes a matter of placing the student in a sample of situations in which the different learned behaviors may appropriately occur and noting the frequency and accuracy with which they do occur. D.R. Krathwohl & D.A. Payne, 1971, p

The warrant encompasses definitions of the class of stimulus situations, response classifications, and sampling theory.

The claim addresses the expected value of performance of the targeted kind in the targeted situations.

The task data address the salient features of the stimulus situations (i.e., tasks).

The student data address the salient features of the responses.
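The behaviorist argument above can be sketched in code: define a class of stimulus situations, sample tasks from it, classify each response, and estimate the expected level of performance in the domain. The domain (two-digit subtraction), the simulated student, and the binomial standard error under simple random sampling are all assumptions of this sketch.

```python
import math
import random

# Class of stimulus situations (hypothetical): two-digit subtraction a - b, a >= b.
domain = [(a, b) for a in range(10, 100) for b in range(10, a + 1)]


def classify(task, response):
    """Response classification: 1 if the response is correct, else 0."""
    a, b = task
    return int(response == a - b)


def estimate_domain_score(scores):
    """Estimated proportion correct in the domain, with a binomial standard error."""
    n = len(scores)
    p = sum(scores) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


rng = random.Random(0)            # fixed seed so the sketch is repeatable
tasks = rng.sample(domain, 20)    # sample of situations from the domain

# Hypothetical student: correct unless the task requires borrowing.
responses = [a - b if a % 10 >= b % 10 else None for a, b in tasks]
scores = [classify(t, r) for t, r in zip(tasks, responses)]

p, se = estimate_domain_score(scores)
print(f"estimated proportion correct: {p:.2f} (SE {se:.2f})")
```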

Inference & Culture Slide 21 April 29, 2003 The Information-Processing Perspective Epitomized in Newell and Simon’s (1972) Human Problem Solving Examines the procedures by which people acquire, store, and use knowledge to solve problems. Modeling problem-solving in terms of the capabilities and the limitations of human thought and memory. Importance of knowledge structures, relationships, procedures in learning domains. Use of rules, production systems, task decompositions, and means-ends analyses.

Inference & Culture Slide 22 April 29, 2003 Responses consistent with the "subtract smaller from larger" bug

Like behaviorist inference at level of behavior in classes of structurally similar tasks.

Patterns among behaviorist claims are data for inferences about unobservable production rules that govern behavior.
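To make the "subtract smaller from larger" bug concrete, the sketch below simulates it and checks whether a pattern of responses across several items matches the buggy procedure, the correct procedure, or neither. The function names and the three sample items are assumptions for illustration; the sketch assumes non-negative integers with the top number at least as large as the bottom.

```python
def subtract_smaller_from_larger(top, bottom):
    """Column subtraction with the bug: in every column the smaller digit is
    taken from the larger one, so borrowing never happens."""
    top_digits = str(top)
    bottom_digits = str(bottom).rjust(len(top_digits), "0")
    digits = [abs(int(t) - int(b)) for t, b in zip(top_digits, bottom_digits)]
    return int("".join(str(d) for d in digits))


def diagnose(items, responses):
    """Compare a response pattern with the buggy and the correct procedure."""
    matches_bug = all(r == subtract_smaller_from_larger(a, b)
                      for (a, b), r in zip(items, responses))
    matches_correct = all(r == a - b for (a, b), r in zip(items, responses))
    if matches_bug and matches_correct:
        return "uninformative"    # no item in the set required borrowing
    if matches_correct:
        return "correct procedure"
    if matches_bug:
        return "subtract-smaller-from-larger bug"
    return "other"


items = [(52, 17), (68, 25), (301, 129)]   # hypothetical items
responses = [45, 43, 228]                  # hypothetical student responses
print(diagnose(items, responses))
```

Note that the second item needs no borrowing, so the buggy and correct procedures agree on it; only the pattern across items separates the two explanations.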

Assessing inquiry processes: Time dependencies in a troubleshooting task. Past behavior & consequences become part of setting for next action.

Inference & Culture Slide 27 April 29, 2003 The Sociocultural Perspective Stresses how knowledge is conditioned and constrained by the technologies, information resources, representation systems, and social situations... Incorporates explanatory concepts that have proved useful in fields such as ethnography and sociocultural psychology to study collaborative work, … mutual understanding in conversation, and other characteristics of interaction that are relevant to the functional success of the participants’ activities. Greeno, Collins, & Resnick, 1997, p. 7.

AP Studio Art Portfolios

Claim concerns level of performance represented by unique project, in socially-determined general evaluation scheme.

AP Studio Art Portfolios Data from student are (1) works of art and (2) explanation of project goals, approach, rationale.

AP Studio Art Portfolios Student text helps assure performance conditions meet the requirements of the warrant.

AP Studio Art Portfolios Student text contributes to how raters apply general evaluation rubric to this student’s work.

Conversational Competence

Challenges: 1) Time dependencies. 2) Interlocutor’s behavior affects context, and is required by the warrant for evidence about certain aspects of competence. 3) How constrained? Naturalistic vs. interviewer.

Inference & Culture Slide 35 April 29, 2003 Conclusion What changes? Developments in psychology, technology, and social factors (e.g., accommodations) continually place demands on assessment that outstrip familiar forms. What doesn’t change? We want to draw inferences about what students know and can do as seen from some perspective; that perspective tells us what kinds of things we need to see them do, in what kinds of situations, to ground those inferences.

Inference & Culture Slide 36 April 29, 2003 Conclusion We see elaborations, extensions, and specializations of enduring principles of evidentiary reasoning. We find continued value in tools such as Toulmin diagrams, Wigmore charts, and Bayesian inference networks to understand yesterday's assessments, manage today's, and design the assessments of tomorrow.