DRK-12 Diagnostic Assessment Panel, Prof. André A. Rupp, Dec 3, 2010: Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments.

Presentation transcript:

Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education – A Modern Psychometric Perspective. André A. Rupp, EDMS Department, University of Maryland. DRK-12 Diagnostic Assessment Panel, Dec 3, 2010.

Toward a Definition of “Diagnostic Assessment Systems”

Proposed Panel Definition: The term "diagnostic" comes from the Greek roots dia, to split apart, and gnosis, to learn or to know. We use "diagnostic assessment (system)" to refer to assessment processes based on an explicit cognitive model of proficient reasoning in a particular domain, a model that is itself supported by empirical study. The cognitive model must support delineation of students' and/or teachers' strengths and weaknesses that can be traced as they move from less to more proficient reasoning in the domain. The principled assessment design process should specify how observed behaviors are used to make inferences about what students or teachers know as they progress. We believe that diagnostic assessment has the potential both to inform instruction and to assess its outcomes.

Conceptualization of Problem Space, from Stevens, Beal, & Sprang (2009).

Toward an Understanding of Frameworks & Models

The Evidence-Centered Design Framework, adapted from Mislevy, Steinberg, Almond, & Lukas (2006).

Frameworks vs. Models
A "principled assessment design framework" for diagnostic assessment, such as evidence-centered design, is NOT a "model": it does NOT prescribe a particular statistical modeling approach.
A "statistical / psychometric model" is a mathematical tool that plays a supporting role in generating evidence-based narratives about students' and/or teachers' strengths and weaknesses. Its parameters do NOT have inherent meanings.
A "cognitive model" for diagnostic assessment is a theory- and data-driven description of how emergent understandings and misconceptions in a domain develop and how they can be traced back to unobservable cognitive underpinnings. It does NOT prescribe a single assessment approach.
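To make the second of these distinctions concrete, the sketch below writes out one widely used parametric diagnostic classification model, the DINA model discussed in Rupp, Templin, and Henson (2010). It is offered only as an illustration of a statistical tool whose parameters gain diagnostic meaning from the surrounding cognitive model and design argument, not as a model endorsed or prescribed by the panel.

```latex
% DINA ("deterministic inputs, noisy and") item response function -- illustrative only.
% \alpha_{ik} = 1 if respondent i has mastered attribute k; q_{jk} = 1 if item j requires attribute k.
\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}},
\qquad
P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\eta_{ij}} \, g_j^{\,1 - \eta_{ij}}
% s_j = slip probability and g_j = guessing probability for item j; neither parameter
% carries diagnostic meaning until the Q-matrix is tied to a defensible cognitive model.
```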

Evidence-based Reasoning for “Traditional” Assessments

Traditional Construct Operationalization [diagram: a construct in the theoretical realm is operationalized in the empirical realm through items I1 ... Ik, which are aggregated into test scores].
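In statistical terms, this traditional operationalization is typically formalized with a unidimensional latent-variable model. The two-parameter logistic IRT model below is one standard choice, shown only as an assumed example; the slide itself does not commit to a particular model.

```latex
% Unidimensional 2PL IRT model -- an assumed, illustrative formalization of the diagram.
% Each item I_1, ..., I_k indicates a single construct \theta_i; the test score summarizes \theta_i.
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + \exp\left[-a_j\,(\theta_i - b_j)\right]}
% a_j = discrimination and b_j = difficulty of item j; all evidence is collapsed onto one continuous scale.
```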

Feedback Utility (Part I – Scoring Card).

Feedback Utility (Part II – Simple Progress Mapping) [figure: progress map with Level 3 and Level 4 markers].

Evidence-based Reasoning for “Modern” Assessments

Complex Assessment Tasks for Diagnosis (Part I), from Seeratan & Mislevy (2008).

Complex Assessment Tasks for Diagnosis (Part II), from Behrens et al. (2009).

Evidence Identification, Aggregation, & Synthesis, from Stevens, Beal, & Sprang (2009).

Proficiency Pathways, from Stevens, Beal, & Sprang (2009).

Interventional Pathways, from Stevens, Beal, & Sprang (2009).

Selected Statistical Tools for Evidence-based Reasoning

Selected Modeling Approaches for Diagnostic Assessments
Approaches resulting in continuous proficiency scales:
1. Unidimensional explanatory IRT or FA models (e.g., de Boeck & Wilson, 2004)
2. Multidimensional CTT sumscores (e.g., Henson, Templin, & Douglas, 2007)
3. Multidimensional explanatory IRT or FA models (e.g., Reckase, 2009)
4. Structural equation models (e.g., Kline, 2010)
Approaches resulting in classifications of respondents on discrete scales:
1. Bayesian inference networks (e.g., Almond, Williamson, Mislevy, & Yan, in press)
2. Parametric diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010)
3. Non-/semi-parametric classification approaches (e.g., Tatsuoka, 2009)
4. Adapted clustering algorithms (e.g., Nugent, Dean, & Ayers, 2010)
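To give a feel for the second family, here is a minimal numerical sketch of how a DINA-type diagnostic classification model turns item responses into a classification over attribute profiles. The Q-matrix, slip and guessing values, and response pattern are all made up for illustration; this is not software from any of the projects cited above.

```python
# Minimal sketch: classify one respondent's attribute profile under a DINA-type
# diagnostic classification model (cf. Rupp, Templin, & Henson, 2010).
# All numerical values are illustrative, not estimated from real data.
import itertools
import numpy as np

Q = np.array([[1, 0],      # item 1 requires attribute 1
              [0, 1],      # item 2 requires attribute 2
              [1, 1]])     # item 3 requires both attributes
slip  = np.array([0.10, 0.15, 0.20])   # P(incorrect | all required attributes mastered)
guess = np.array([0.20, 0.25, 0.10])   # P(correct | some required attribute missing)
x = np.array([1, 0, 1])                # observed item responses for one respondent

# Enumerate all 2^K attribute profiles and compute their posterior probabilities
# under a uniform prior, then report the resulting classification.
profiles = np.array(list(itertools.product([0, 1], repeat=Q.shape[1])))
weights = []
for alpha in profiles:
    eta = np.all(alpha >= Q, axis=1)            # 1 iff every required attribute is mastered
    p_correct = np.where(eta, 1 - slip, guess)  # DINA item response probabilities
    weights.append(np.prod(p_correct**x * (1 - p_correct)**(1 - x)))
posterior = np.array(weights) / np.sum(weights)

for alpha, p in zip(profiles, posterior):
    print(f"attribute profile {alpha}: posterior probability {p:.3f}")
```

The output is a discrete classification (a probability for each mastery profile) rather than a location on a continuous proficiency scale, which is exactly the contrast between the two halves of the list above.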

Psychometric Tools for Diagnostic Assessments
New frontiers of educational measurement:
1. Educational data mining for simulation- / games-based assessment (e.g., Rupp et al., 2010; Soller & Stevens, 2007; West et al., 2009)
2. Diagnostic multiple-choice / selected-response items (e.g., Briggs et al., 2006; de la Torre, 2009)
3. Computerized diagnostic adaptive assessment (e.g., Cheng, 2009; McGlohen & Chang, 2008)
Useful ideas from large-scale assessment:
1. Modeling dependencies in nested response data (e.g., Jiao, von Davier, & Wang, 2010; Wainer, Bradlow, & Wang, 2007)
2. Item families / task variants and automatic test / form assembly (e.g., Embretson & Daniel, 2008; Geerlings, Glas, & van der Linden, in press)
3. Survey designs using multiple test forms / booklets (e.g., Frey, Hartig, & Rupp, 2009; Rutkowski, Gonzalez, Joncas, & von Davier, 2010)
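As a small illustration of the third "new frontier" item, computerized adaptive assessment, the sketch below shows the core adaptive step: pick the next item that is most informative at the current ability estimate. For simplicity it uses Fisher information under a unidimensional 2PL model with hypothetical item parameters; diagnostic adaptive systems such as CD-CAT (Cheng, 2009) use classification-oriented selection criteria instead, so treat this only as a schematic of the adaptive loop.

```python
# Minimal sketch of one adaptive-testing step: select the unadministered item with
# maximum Fisher information at the provisional ability estimate (2PL model).
# Item parameters and the ability estimate are hypothetical.
import numpy as np

a = np.array([1.2, 0.8, 1.5, 1.0])    # item discriminations (illustrative)
b = np.array([-0.5, 0.0, 0.7, 1.2])   # item difficulties (illustrative)
administered = {0}                    # indices of items already given
theta_hat = 0.3                       # provisional ability estimate after those items

p = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))  # 2PL probabilities of a correct response
info = a**2 * p * (1.0 - p)                     # Fisher information of each item at theta_hat
info[list(administered)] = -np.inf              # never re-administer an item
next_item = int(np.argmax(info))
print(f"administer item {next_item} (information {info[next_item]:.3f})")
```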

Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education – A Modern Psychometric Perspective. André A. Rupp, EDMS Department, University of Maryland, 1230-A Benjamin Building, College Park, MD. Phone: (301) 405 –

References (Part I)
Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational assessment. New York: Springer.
Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17.
Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85–118). Cambridge, UK: Cambridge University Press.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11.
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74.
de Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer.
de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33.
Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem-solving items. Psychology Science Quarterly, 50.
Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large-scale assessments of student achievement. Educational Measurement: Issues and Practice, 28(3).
Geerlings, H., Glas, C. A. W., & van der Linden, W. (in press). Modeling rule-based item generation. Psychometrika.

References (Part II)
Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24.
Haberman, S., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75.
Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62.
Jiao, H., von Davier, M., & Wang, S. (2010, April). Polytomous mixture Rasch testlet model. Presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Portsmouth, NH: Greenwood.
Kline, R. (2010). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Leighton, J., & Gierl, M. (Eds.). (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge, UK: Cambridge University Press.
McGlohen, M., & Chang, H.-H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavior Research Methods, 40.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15–48). Mahwah, NJ: Erlbaum.

References (Part III)
Nugent, R., Dean, N., & Ayers, B. (2010, July). Skill set profile clustering: The empty K-means algorithm with automatic specification of starting cluster centers. Presented at the International Conference on Educational Data Mining, Pittsburgh, PA.
Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.
Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. Journal of Technology, Learning, and Assessment, 8(4). Available online at
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39.
Stevens, R., Beal, C., & Sprang, M. (2009, August). Developing versatile automated assessments of scientific problem-solving. Presented at the NSF conference on games- and simulation-based assessment, Washington, DC.
Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY: Routledge.
Templin, J., & Henson, R. (2009, April). Practical issues in using diagnostic estimates: Measuring the reliability and validity of diagnostic estimates. Presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., et al. (2009, June). A Bayes net approach to modeling learning progressions and task performances. Paper presented at the Learning Progressions in Science conference, Iowa City, IA.