On-demand learning-embedded benchmark assessment using classroom-accessible technology. Discussant Remarks: Mark Wilson, UC Berkeley.


Outline What does “Validity” look like for these papers? What is it that these papers are distinguishing themselves from? Where might one go from here?

Need for strong concern about validity Effect of NCLB requirements: –Schools are instituting frequent "benchmark" tests –Intended to guide teachers as to students' strengths and weaknesses –Often just little copies of the "State test" –Teachers are complaining that it puts a vice-like grip on the curriculum

The Triangle of Learning: standard interpretation

The “vicious” triangle

Validity 1999 AERA/APA/NCME Standards for Educational and Psychological Testing Five types of validity evidence: –Evidence based on test content –Evidence based on response processes –Evidence based on internal structure –Evidence based on external structure –Evidence based on consequences

Paper 1: Falmagne et al. (ALEKS) Reliability => Validity –"the collection of all the problems potentially used in any assessment represents a fully comprehensive coverage of a particular curriculum, … [hence] … [a]rguing that such an assessment, if it is reliable, is also automatically endowed with a corresponding amount of validity is plausible."

Paper 1: Falmagne et al. (ALEKS) Test content –Theory of the Learning Space: "inner fringe" and "outer fringe" –"the summary is meaningful for an instructor" –Database of Problems: "a consensus among educators that the database of problems is a comprehensive compendium for testing the mastery of a scholarly subject. This phase is relatively straightforward." Evidence: Who were the experts? / What did they do? / How much did they agree?
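To make the "inner fringe"/"outer fringe" idea concrete: in learning space theory the inner fringe of a knowledge state K is the set of items whose removal leaves another feasible state (the most recently masterable items), and the outer fringe is the set of items whose addition yields another feasible state (what the student is ready to learn next). A minimal sketch, assuming those standard definitions; this is illustrative only, not ALEKS code, and the toy structure is invented.

```python
# Minimal sketch (not ALEKS code): inner and outer fringes of a knowledge
# state K within a knowledge structure, using the standard learning-space
# definitions.

def fringes(K, structure):
    """Return (inner_fringe, outer_fringe) of state K.

    K         -- a set of mastered items
    structure -- the collection of all feasible knowledge states
    """
    states = {frozenset(s) for s in structure}
    K = frozenset(K)
    items = set().union(*states)
    inner = {q for q in K if K - {q} in states}            # just-mastered items
    outer = {q for q in items - K if K | {q} in states}    # ready-to-learn items
    return inner, outer

# Toy structure, invented for illustration.
structure = [set(), {"a"}, {"b"}, {"a", "b"}, {"a", "b", "c"}]
inner, outer = fringes({"a", "b"}, structure)
print(sorted(inner), sorted(outer))   # ['a', 'b'] ['c']
```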

Paper 1: Falmagne et al. (ALEKS) Evidence based on response processes –E.g., for a selected knowledge state K, do students in K say things that are consistent/inconsistent with that state? Evidence based on internal structure –E.g., for a selected K, do students in K have high/low success rates on "instances" of items in K? Evidence based on external structure –E.g., comparison with teacher judgments of student ability Evidence based on consequences –E.g., use of "fringes": does this help or hinder teacher interpretations?
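One way to read the internal-structure check suggested here: for students placed in state K, success rates should be high on instances of items in K and noticeably lower on items outside K. A minimal sketch under that reading; the data layout and values are hypothetical.

```python
# Illustrative only: compare success rates inside vs. outside the assigned
# knowledge state K. `responses` maps item -> list of 0/1 scores from
# students assigned to K; `K` is the set of items the model says they master.

def internal_structure_check(K, responses):
    def rate(items):
        scores = [s for q in items for s in responses.get(q, [])]
        return sum(scores) / len(scores) if scores else float("nan")
    in_K = rate([q for q in responses if q in K])
    out_K = rate([q for q in responses if q not in K])
    return in_K, out_K

responses = {
    "q1": [1, 1, 1, 0],   # q1, q2 are in K
    "q2": [1, 0, 1, 1],
    "q3": [0, 0, 1, 0],   # q3 is outside K
}
print(internal_structure_check({"q1", "q2"}, responses))  # (0.75, 0.25)
# A clearly higher rate on the in-K items than on q3 would count as
# internal-structure support for the state assignment.
```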

Paper 2: Shute et al. (ACED) Two "validity studies" Study 1: Evidence based on external structure –Prediction of residuals from an external post-test after controlling for the pre-test –Informative design of conditions: elaborated feedback better Study 2: Evidence based on response processes –"Usability" study for students with disabilities
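One plausible reading of the Study 1 analysis (an assumption for illustration, not necessarily the paper's exact procedure): regress the external post-test on the pre-test and ask whether the residuals are related to the embedded-assessment score. A sketch with simulated (fake) data:

```python
import numpy as np

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 200
pre = rng.normal(size=n)                                       # pre-test
aced = 0.7 * pre + rng.normal(scale=0.5, size=n)               # embedded-assessment score
post = 0.5 * pre + 0.4 * aced + rng.normal(scale=0.5, size=n)  # post-test

# Step 1: regress the post-test on the pre-test and keep the residuals.
X = np.column_stack([np.ones(n), pre])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
resid = post - X @ beta

# Step 2: ask whether the embedded-assessment score explains those residuals.
r = np.corrcoef(resid, aced)[0, 1]
print(f"correlation of assessment score with post-test residuals: {r:.2f}")
```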

Paper 2: Shute et al. (ACED) Evidence based on test content –reference to earlier paper Evidence based on internal structure –Could easily be investigated, as there is interesting internal structure (Fig. 1) Evidence based on consequences –Probably not any real consequences yet

Paper 3: Heffernan et al. (ASSISTment System) Evidence based on test content –Items coded by 2 experts, 7 hrs. –"skill of Venn Diagram" Evidence based on internal structure –Which skill model fits best: 1, 5, 39, or 106 skills? –Which number is different? 4.10, 4.11, 4.12, 4.10, 4.10 –1, 5, 39, 106 (twice)

Paper 3: Heffernan et al. (ASSISTment System) Evidence based on external structure –Prediction of MCAS: 23/38 = 61% don't fit well for the "best" model (WPI-39 (B)).

Paper 3: Heffernan et al. (ASSISTment System) Evidence based on response processes –? Evidence based on consequences –Probably are real consequences

Paper 4: Junker (ASSISTment System) Two "validity studies" Study 1: Evidence based on external structure –Prediction of MCAS scores Study 2: Evidence based on internal structure –4 internal structure patterns –2 questions Q1: Regarding how scaffolds get easier: what happens when you get a scaffold wrong? Q2: What about the gap?

Paper 4: Junker (ASSISTment System) Rest of the types of validity evidence: see Paper 3

Looking Beyond What does this group of papers have to offer? What should it be looking out for?

Paper 1: Falmagne et al. (ALEKS) Inner and Outer Fringe –What do teachers think of them? What do they do with them? "Standardized tests" and "psychometrics" as straw men –Alternative: compare one's work to the latest developments in item response modeling (e.g., EIRM)

Paper 2: Shute et al. (ACED) "Weight of Evidence" –Good alternative to Fisher information –Transparent, easily interpretable Models for people with disabilities –Most likely going to have different internal structure –Need to develop a broader view of internal structure criteria
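For reference, the weight-of-evidence measure referred to here is usually defined (following I. J. Good) as a log likelihood ratio; stated as a formula, assuming that standard definition:

```latex
% Weight of evidence for hypothesis H provided by evidence E (Good):
W(H : E) \;=\; \log \frac{P(E \mid H)}{P(E \mid \lnot H)}
% Positive weight favours H, negative weight favours its complement, and
% weights from pieces of evidence that are conditionally independent given H
% and given \lnot H simply add, which is what makes the measure transparent
% and easy to interpret for task selection.
```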

Paper 3: Heffernan et al. (ASSISTment System) MCAS as a starting point for diagnostic testing? –Using released items?!? What is "unidimensionality"?

Paper 3: Heffernan et al. (ASSISTment System) In a latent class model, the latent class looks like this: [figure not reproduced] In an item response model (e.g., the Rasch model), unidimensionality looks like this: [figure not reproduced] See: Karelitz, T. M., Wilson, M. R., & Draney, K. L. (2005). Diagnostic assessment using continuous vs. discrete ability models. Paper presented at the NCME Annual Meeting, San Francisco, CA.
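The slide's figures are not available in the transcript; as a stand-in, the standard forms of the two models being contrasted are:

```latex
% Latent class model: a few discrete classes c, with item response
% probabilities that depend only on class membership:
P(X_{ij} = 1 \mid \text{class } c) \;=\; \pi_{jc}

% Unidimensional item response (Rasch) model: a continuous ability \theta_i
% and item difficulty b_j on the same single dimension:
P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
```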

Paper 4: Junker (ASSISTment System) What is the effect of making MCAR/MAR assumptions when neither is true? –Relevant to all CAT –Or of assuming you know the response under NMAR? Is there a discrimination paradox in DINA models? Why do scaffold questions get easier?
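For readers outside the missing-data and cognitive-diagnosis literatures, the standard definitions behind these questions (Rubin's missingness taxonomy and the DINA item response function) are:

```latex
% Missing-data mechanisms (Rubin): Y = (Y_obs, Y_mis) the complete data,
% R the missing-data indicator.
\text{MCAR:}\quad P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R)
\text{MAR:}\quad  P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}})
\text{NMAR:}\quad \text{missingness depends on } Y_{\mathrm{mis}} \text{ even given } Y_{\mathrm{obs}}

% DINA model: examinee i with skill vector \alpha_i, item j with Q-matrix
% row q_j, slip parameter s_j and guessing parameter g_j:
\eta_{ij} = \prod_k \alpha_{ik}^{\,q_{jk}}, \qquad
P(X_{ij} = 1 \mid \alpha_i) = (1 - s_j)^{\eta_{ij}} \, g_j^{\,1 - \eta_{ij}}
```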

Future Directions What is a "Knowledge State" (KS)? How do we test if it's a unitary thing? What if it isn't? –Mixture models: structured KSs Do teachers (and other practitioners) find the KSs useful? –How to adjust if they don't? Finer/coarser-grained structures?