On-demand learning-embedded benchmark assessment using classroom-accessible technology
Discussant Remarks: Mark Wilson, UC Berkeley
Outline
–What does “Validity” look like for these papers?
–What are these papers distinguishing themselves from?
–Where might one go from here?
Need for strong concern about validity
Effect of NCLB requirements:
–Schools are instituting frequent “benchmark” tests
–Intended to guide teachers as to students’ strengths and weaknesses
–Often just little copies of the “State test”
–Teachers are complaining that it puts a vice-like grip on the curriculum
The Triangle of Learning: standard interpretation
The “vicious” triangle
Validity
1999 AERA/APA/NCME Standards for Educational and Psychological Testing
Five types of validity evidence:
–Evidence based on test content
–Evidence based on response processes
–Evidence based on internal structure
–Evidence based on external structure
–Evidence based on consequences
Paper 1: Falmagne et al. (ALEKS)
Reliability => Validity
–“the collection of all the problems potentially used in any assessment represents a fully comprehensive coverage of a particular curriculum... [hence]... [a]rguing that such an assessment, if it is reliable, is also automatically endowed with a corresponding amount of validity is plausible.”
Paper 1: Falmagne et al. (ALEKS)
Test content
–Theory of the Learning Space: “inner fringe” and “outer fringe” (see the sketch below); “the summary is meaningful for an instructor”
–Database of Problems: “a consensus among educators that the database of problems is a comprehensive compendium for testing the mastery of a scholarly subject. This phase is relatively straightforward.”
–Evidence: Who were the experts? What did they do? How much did they agree?
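A minimal sketch of the two fringes (my own Python illustration following the standard knowledge-space-theory definitions, not ALEKS code; the toy domain and structure are invented): the inner fringe of a knowledge state K holds the items just mastered, the outer fringe the items the student is ready to learn, which is why the pair can summarize K compactly for an instructor.

# Sketch only: fringes of a knowledge state K within a knowledge
# structure, represented as a set of frozensets of item labels.

def inner_fringe(K, structure):
    """Items just mastered: q in K such that K - {q} is also a state."""
    return {q for q in K if K - {q} in structure}

def outer_fringe(K, structure, domain):
    """Items ready to learn: q outside K such that K | {q} is a state."""
    return {q for q in domain - K if K | {q} in structure}

# Toy example: three items, a small learning space.
domain = frozenset("abc")
structure = {frozenset(), frozenset("a"), frozenset("ab"),
             frozenset("ac"), frozenset("abc")}
K = frozenset("ab")
print(inner_fringe(K, structure))          # {'b'}; 'a' is excluded because {'b'} is not a state
print(outer_fringe(K, structure, domain))  # {'c'}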
Paper 1: Falmagne et al. (ALEKS)
Evidence based on response processes
–E.g., for a selected knowledge state K, do students in K say things that are consistent/inconsistent with that state?
Evidence based on internal structure
–E.g., for a selected K, do students in K have high/low success rates on “instances” in K?
Evidence based on external structure
–E.g., comparison with teacher judgments of student ability
Evidence based on consequences
–E.g., use of “fringes”: does this help/hinder teacher interpretations?
Paper 2: Shute et al. (ACED)
Two “validity studies”
Study 1: Evidence based on external structure
–Prediction of residuals from an external post-test after controlling for the pre-test (see the sketch below)
–Informative design of conditions: elaborated feedback better
Study 2: Evidence based on response processes
–“Usability” study for students with disabilities
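A sketch of what “prediction of residuals” amounts to (my illustration with simulated data and invented parameter values, not the ACED analysis): regress the post-test on the pre-test, then ask how well the embedded assessment’s proficiency estimate predicts what is left over.

import numpy as np

# Sketch only: external-structure evidence via residual prediction.
rng = np.random.default_rng(0)
pre = rng.normal(size=200)                               # pre-test scores
ability = 0.7 * pre + rng.normal(scale=0.5, size=200)    # assessment's proficiency estimate
post = pre + ability + rng.normal(scale=0.5, size=200)   # post-test scores

slope, intercept = np.polyfit(pre, post, 1)
residuals = post - (slope * pre + intercept)             # post-test with pre-test removed
print(np.corrcoef(ability, residuals)[0, 1])             # ≈ 0.4 under these settings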
Paper 2: Shute et al. (ACED)
Evidence based on test content
–Reference to an earlier paper
Evidence based on internal structure
–Could easily be investigated, as there is interesting internal structure (Fig. 1)
Evidence based on consequences
–Probably no real consequences yet
Paper 3: Heffernan et al. (ASSISTment System)
Evidence based on test content
–Items coded by 2 experts in 7 hrs.
–“skill of Venn Diagram”
Evidence based on internal structure
–Which skill model fits best: 1, 5, 39, or 106 skills?
–Which number is different? 4.10, 4.11, 4.12, 4.10, 4.10
–(for the models with 1, 5, 39, and 106 (twice) skills)
Paper 3: Heffernan et al. (ASSISTment System)
Evidence based on external structure
–Prediction of MCAS: 23/38 = 61% don’t fit well for the “best” model (WPI-39 (B)); see the sketch below
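For concreteness, a sketch of the kind of fit comparison involved (my illustration; the scores and the 2-point “fits well” threshold are invented, not the paper’s): compare skill models by the mean absolute deviation (MAD) between predicted and actual MCAS scores, and count the students a model misses badly.

import numpy as np

# Sketch only: MAD as the fit statistic compared across skill models,
# plus the fraction of students whose scores are poorly predicted.
actual = np.array([22., 30., 18., 25., 27.])      # invented MCAS scores
predicted = np.array([26., 29., 23., 24., 21.])   # from some skill model

errors = np.abs(actual - predicted)
print(errors.mean())        # MAD: here 3.4 score points
print((errors > 2).mean())  # fraction not fitting well: here 0.6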
Paper 3: Heffernan et al. (ASSISTment System)
Evidence based on response processes
–?
Evidence based on consequences
–There probably are real consequences
Paper 4: Junker (ASSISTment System)
Two “validity studies”
Study 1: Evidence based on external structure
–Prediction of MCAS scores
Study 2: Evidence based on internal structure
–4 internal structure patterns
–2 questions:
Q1: Regarding how scaffolds get easier: what happens when you get a scaffold wrong?
Q2: What about the gap?
Paper 4: Junker (ASSISTment System)
For the rest of the types of validity evidence, see Paper 3
Looking Beyond
What does this group of papers have to offer?
What should it be looking out for?
Paper 1: Falmagne et al. (ALEKS)
Inner and Outer Fringe
–What do teachers think of them? What do they do with them?
“Standardized tests” and “psychometrics” as straw men
–Alternative: compare one’s work to the latest developments in item response modeling (e.g., explanatory item response models, EIRM)
Paper 2: Shute et al. (ACED)
“Weight of Evidence” (see the sketch below)
–Good alternative to Fisher information
–Transparent, easily interpretable
Models for people with disabilities
–Most likely going to have a different internal structure
–Need to develop a broader view of internal structure criteria
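A minimal sketch of weight of evidence in Good’s sense (my illustration, not ACED code): W(H : e) = log P(e | H) - log P(e | not-H), often reported in centibans; positive values favor the hypothesis, which is what makes it easy to interpret.

import math

# Sketch only: weight of evidence W(H : e) in centibans (100 * log10 units).
def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    return 100 * (math.log10(p_e_given_h) - math.log10(p_e_given_not_h))

# A correct answer that is likelier under "proficient" (H) than under not-H:
print(weight_of_evidence(0.8, 0.3))  # ≈ 42.6 centibans in favor of H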
Paper 3: Heffernan et al. (ASSISTment System)
MCAS as a starting point for diagnostic testing?
–Using released items?!?
What is “unidimensionality”?
Paper 3: Heffernan et al. (ASSISTment System)
[Figures: what the latent class looks like in a latent class model vs. what unidimensionality looks like in an item response model (e.g., the Rasch model); see the formulas below]
See: Karelitz, T.M., Wilson, M.R., & Draney, K.L. (2005). Diagnostic Assessment using Continuous vs. Discrete Ability Models. Paper presented at the NCME Annual Meeting, San Francisco, CA.
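One way to write down the contrast the figures were showing (my notation, not from the slides): the Rasch model posits a continuous unidimensional proficiency, while a latent class model replaces it with membership in one of finitely many classes.

% Rasch model: one continuous ability dimension \theta_i
P(X_{iq} = 1 \mid \theta_i) = \frac{\exp(\theta_i - \beta_q)}{1 + \exp(\theta_i - \beta_q)},
\qquad \theta_i \in \mathbb{R}

% Latent class model: ability replaced by class membership c_i
P(X_{iq} = 1 \mid c_i = c) = \pi_{qc},
\qquad c_i \in \{1, \dots, C\}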
Paper 4: Junker (ASSISTment System)
What is the effect of assuming MCAR/MAR when neither holds? (see the simulation sketch below)
–Relevant to all CAT
–Or of assuming you know the response under NMAR
Is there a discrimination paradox in DINA models?
Why do scaffold questions get easier?
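A small simulation of that worry (my illustration, with invented rates): if students skip items they would have answered incorrectly (an NMAR mechanism) and the skips are treated as ignorable (MAR), the observed success rate overstates proficiency.

import random

# Sketch only: NMAR skipping biases a naive (MAR-style) estimate upward.
random.seed(0)
true_p = 0.5             # true probability of a correct response
observed = []
for _ in range(100_000):
    correct = random.random() < true_p
    skipped = (not correct) and random.random() < 0.6  # wrong answers skipped 60% of the time
    if not skipped:
        observed.append(correct)

print(sum(observed) / len(observed))  # ≈ 0.71, well above the true 0.5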
Future Directions
What is a “Knowledge State” (KS)?
How do we test whether it is a unitary thing? What if it isn’t?
–Mixture models: structured KSs
Do teachers (and other practitioners) find the KSs useful?
–How to adjust if they don’t? Finer/coarser grained structures