Issues of Technical Adequacy in Measuring Student Growth for Educator Effectiveness
Stanley Rabinowitz, Ph.D.
Director, Assessment & Standards Development Services, WestEd
AACC Summit II: Designing Comprehensive Evaluation Systems
February 27, 2012
Presentation Purposes
- What are the characteristics of appropriate assessments to measure student growth?
- What technical criteria should be applied? How can we define “good enough”?
- What challenges do we face (general and growth-related)?
Demonstrating Technical Adequacy
- Lack of Bias
- Reliability
- Validity
- Test Administration Procedures
- Scoring and Reporting (interpretive guides)
Technical Criteria
- Purpose
  o Content and context
  o General factors relevant to all assessments
  o Additional factors specific to measuring growth
- Population
  o Ensure validity and fairness for all student populations
  o Conduct field tests and item reviews
- Content
  o Articulate the range of skills and concepts to be assessed
  o Specify appropriate item types
Sufficiency, Quality, and Adequacy
- Evidence Provided
  o Assertion? Summary? Detailed description? Data supported?
- Type of Data
  o Quantitative? Qualitative? Both?
- Sufficiency
  o Comprehensive? Context and interpretation provided?
- Quality
  o Statistical assumptions satisfied? Replicable? Accurate? Generalizable?
- Adequacy
  o Credible information?
Challenges: General
- Definition of “rigorous and comparable”
- Existing instrument
  o Use as available
  o Modify items/tasks (alignment, breadth/depth)
  o Collect additional evidence (who, how, when?)
- “Paper test” – performance assessments – modules – classroom work – grades
- Which type of evidence takes precedence (reliability vs. consequential validity)?
- Sliding scale for weighting assessment data (see the sketch after this list)
- Cost: house analogy
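One way to read the “sliding scale” bullet is as a weighted composite of evidence sources, with more technically adequate measures carrying more weight. The sketch below is purely hypothetical; the measure names, weights, and scores are illustrative assumptions, not values from the presentation.

```python
# Hypothetical sketch: combine multiple evidence sources into a single growth
# composite with a "sliding scale" of weights. Names and weights are
# illustrative assumptions only.

def weighted_composite(scores, weights):
    """Return the weighted average of standardized evidence sources."""
    total_weight = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_weight

# A more technically adequate source (e.g., an equated standardized test) gets
# more weight than a lower-reliability source such as classroom grades.
weights = {"standardized_test": 0.6, "performance_task": 0.3, "classroom_grades": 0.1}
scores = {"standardized_test": 0.45, "performance_task": 0.20, "classroom_grades": 0.80}  # z-scores

print(round(weighted_composite(scores, weights), 3))
```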
Challenges: Specific to Growth
- Vertical scale vs. scale-free models (e.g., vertically articulated achievement levels, SGPs)
- Non-contiguous grades and content
- True gains vs. error
- Multiple equated forms
- Recommended pre-post time frame
- Reliability at various points on the score scale, especially the extremes
- Evidence of gain score reliability (see the formula after this list)
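For reference, the gain score reliability evidence called for above has a standard classical test theory form. With pre-test X, post-test Y, their reliabilities \(\rho_{XX'}\) and \(\rho_{YY'}\), standard deviations \(\sigma_X\) and \(\sigma_Y\), and pre-post correlation \(\rho_{XY}\), the reliability of the simple difference score D = Y - X is

\[
\rho_{DD'} = \frac{\sigma_X^{2}\,\rho_{XX'} + \sigma_Y^{2}\,\rho_{YY'} - 2\,\rho_{XY}\,\sigma_X\sigma_Y}{\sigma_X^{2} + \sigma_Y^{2} - 2\,\rho_{XY}\,\sigma_X\sigma_Y}.
\]

Gain score reliability falls as the pre-post correlation rises, so it can be modest even when each test is reliable on its own; that is one reason to require explicit evidence of it rather than inferring it from single-form reliabilities.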
Criteria (Validity Cluster)
Criteria Cluster: Validity
- Criterion: Field Testing
  o Field Test Sampling Design: Representativeness and Norming
  o Field Test Sampling Design: Currency (at least, dates documented)
  o Field Test Sampling Design: Randomization
  o Fidelity (link of test to stated purpose of the test)
- Criterion: Design
  o Attrition of Persons (for Pre/Post Designs)
  o Test Blueprint
  o Scoring Rubric for OE Items: Construction and Validation
  o Accommodations
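A minimal sketch of how the sampling-design evidence above might be generated, assuming a student roster held in pandas; the column names, strata, and sampling rate are illustrative assumptions only.

```python
# Hypothetical sketch: draw a field-test sample that mirrors the target
# population by sampling randomly within demographic/geographic strata.
# Column names ("district_type", "grade") and the 10% rate are assumptions.
import pandas as pd

def stratified_field_test_sample(students: pd.DataFrame, rate: float = 0.10,
                                 strata=("district_type", "grade"),
                                 seed: int = 2012) -> pd.DataFrame:
    """Randomly sample `rate` of students within each stratum; the fixed seed
    makes the randomization documentable and replicable."""
    return students.groupby(list(strata)).sample(frac=rate, random_state=seed)

# Example: sample = stratified_field_test_sample(roster_df)
# Recording the draw date alongside the seed speaks to the currency and
# randomization evidence listed above.
```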
Criteria (Validity Cluster) cont.
Criteria Cluster: Validity
- Criterion: Content
  o Content Alignment Studies
  o Expert judgments
  o p-values
  o Discrimination (Item-test Correlations)
  o Bias/DIF analysis
  o IRT/Item fit (ICC)
  o Distractor Analysis
- Criterion: Construct
  o Factorial Validity (structural equation modeling)
  o Multi-Trait/Multi-Method
  o Equivalence/Comparability (construct the same regardless of examinee’s ability)
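The classical item statistics in this cluster (p-values, item-test correlations) come straight from a scored item-response matrix. A minimal sketch, assuming dichotomously scored items in a NumPy array; the data below are fabricated for illustration.

```python
# Minimal sketch of classical item analysis for dichotomous (0/1) items.
# `responses` is an examinees x items matrix of scored responses (a layout
# assumption, not something specified in the presentation).
import numpy as np

def item_analysis(responses: np.ndarray):
    p_values = responses.mean(axis=0)              # item difficulty: proportion correct
    total = responses.sum(axis=1)
    item_total = []
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]             # corrected (item-rest) discrimination
        item_total.append(np.corrcoef(responses[:, j], rest)[0, 1])
    return p_values, np.array(item_total)

rng = np.random.default_rng(0)
demo = (rng.random((200, 10)) > 0.4).astype(int)   # fabricated responses, p-values near .60
p, r = item_analysis(demo)
print(np.round(p, 2), np.round(r, 2))
```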
Criteria (Validity Cluster) cont.
Criteria Cluster: Validity
- Criterion: Criterion
  o Predictive validity: Validation to the Referent
  o Predictive validity: Individual and group scores
  o Concurrent validity: Validation to External Criteria
  o Concurrent validity: Validity of External Criteria
  o Concurrent validity: Individual and group scores
- Criterion: Consequential
  o Evaluation of Testing Consequences
  o Individual and group scores
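Criterion-related evidence of this kind typically reduces to a validity coefficient: the correlation between test scores and an external referent (a later outcome for predictive validity, a concurrent external measure for concurrent validity). A hedged sketch with simulated data:

```python
# Sketch: criterion-related validity as the correlation between test scores
# and an external criterion. All data below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
test_scores = rng.normal(500, 50, size=300)                                  # scale scores
criterion = 0.6 * (test_scores - 500) / 50 + rng.normal(0, 0.8, size=300)   # external measure

validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(f"criterion validity coefficient: {validity_coefficient:.2f}")
# The same correlation can be computed within subgroups to address the
# "individual and group scores" evidence listed above.
```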
Criteria (Validity Cluster) cont.
Criteria Cluster: Validity
- Criterion: Growth
  o Multiple equated forms
  o Recommended pre-post time frame
  o Reliability at various points of the score scale
  o Gain score reliability
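As one illustration of what “multiple equated forms” can involve (an example method only, not one the presentation specifies), linear equating under a random-groups design places a Form X raw score x on the Form Y scale as

\[
\hat{y} = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),
\]

so that equated scores reproduce Form Y's mean and standard deviation.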
Criteria (Reliability Cluster)
Criteria Cluster: Reliability
- Criterion: Reliability – Single Administration
  o Scale Internal Consistency
  o Split-half
  o Scorer / Hand-scoring
- Criterion: Reliability – Multiple Administrations
  o Test-retest
- Criterion: Reliability – Either Single or Multiple Administrations
  o Alternate form
  o Individual and group scores
  o Classification consistency
  o Generalizability
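Single-administration evidence such as internal consistency and split-half reliability can be illustrated briefly. A minimal sketch, assuming a scored examinees x items matrix, with fabricated data:

```python
# Minimal sketch: coefficient (Cronbach's) alpha plus a Spearman-Brown
# corrected odd/even split-half estimate for an examinees x items matrix.
import numpy as np

def cronbach_alpha(x: np.ndarray) -> float:
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

def split_half(x: np.ndarray) -> float:
    odd, even = x[:, ::2].sum(axis=1), x[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)                          # Spearman-Brown correction to full length

rng = np.random.default_rng(7)
ability = rng.normal(size=(500, 1))
items = (ability + rng.normal(scale=1.2, size=(500, 20)) > 0).astype(int)  # fabricated correlated items
print(round(cronbach_alpha(items), 2), round(split_half(items), 2))
```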
Criteria (Freedom from Bias Cluster)
Criteria Cluster: Freedom from Bias
- Criterion: Judgmental and Statistical (DIF) Reviews
  o Bias review panel
  o Content
  o Ethnicity
  o Cultural
  o Linguistic
  o Socio-economic
  o Geographic
  o Students with disabilities
  o Universal Design
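The statistical side of this cluster is often a Mantel-Haenszel DIF analysis; the judgmental side remains the bias review panel. A sketch for one dichotomous item, stratifying on total test score; the data layout and group labels are assumptions.

```python
# Sketch of a Mantel-Haenszel DIF statistic for one dichotomous item,
# stratifying on total test score. Inputs are NumPy arrays; labels are assumptions.
import numpy as np

def mantel_haenszel_dif(item, total, group, focal_label):
    """Return the MH common odds ratio and the ETS delta (MH D-DIF) value."""
    num = den = 0.0
    for s in np.unique(total):
        stratum = total == s
        ref = stratum & (group != focal_label)
        foc = stratum & (group == focal_label)
        n = stratum.sum()
        a, b = item[ref].sum(), (1 - item[ref]).sum()   # reference right / wrong
        c, d = item[foc].sum(), (1 - item[foc]).sum()   # focal right / wrong
        num += a * d / n
        den += b * c / n
    odds_ratio = num / den                              # this sketch assumes den > 0
    return odds_ratio, -2.35 * np.log(odds_ratio)

# Usage (hypothetical arrays):
# odds_ratio, delta = mantel_haenszel_dif(item_scores, total_scores, groups, focal_label="focal")
```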
Criteria (Testing System Cluster)
Criteria Cluster: Testing System (Superordinate) Criteria
- Criterion: Form-Level Analyses
  o N (sample size)
  o Central Tendency (Mean, Median, Mode)
  o Variation (Range, Variance, Standard Deviation)
  o Standard Error of Measurement
  o Bias
  o IRT fit (TCC)
  o Equating
  o Scaling
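The form-level statistics named here, including the standard error of measurement, follow from the classical relation SEM = SD x sqrt(1 - reliability). A brief sketch with fabricated scores; the reliability value is a placeholder assumption.

```python
# Sketch of form-level descriptives plus the classical standard error of
# measurement, SEM = SD * sqrt(1 - reliability). The scores and the
# reliability estimate (0.88) are placeholders for illustration.
import math
import statistics as stats

form_scores = [12, 15, 15, 18, 20, 21, 21, 21, 24, 27, 30, 33]
reliability = 0.88

n = len(form_scores)
mean, median, mode = stats.mean(form_scores), stats.median(form_scores), stats.mode(form_scores)
sd = stats.stdev(form_scores)
sem = sd * math.sqrt(1 - reliability)
print(n, round(mean, 2), median, mode, round(sd, 2), round(sem, 2))
```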
Criteria (Testing System Cluster) cont.
Criteria Cluster: Testing System (Superordinate) Criteria
- Criterion: Reporting
  o Student level
  o ESEA Subgroups
  o Class
  o District
  o State
  o Populations
  o Description of Standards Setting: Methods, Participants, Group Size
- Criterion: Report Format
  o Basic
  o Custom
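Reporting at the levels listed above is largely repeated aggregation of the same student-level growth file. A hypothetical pandas sketch; the column names are assumptions about the data layout.

```python
# Hypothetical sketch: aggregate student-level growth results to the reporting
# levels named above (class, district, state) and by ESEA subgroup.
import pandas as pd

def growth_reports(students: pd.DataFrame) -> dict:
    """Return count/mean growth summaries at each reporting level."""
    reports = {}
    for level in ("class_id", "district_id", "state"):
        reports[level] = students.groupby(level)["growth_score"].agg(["count", "mean"])
    reports["esea_subgroup"] = students.groupby("subgroup")["growth_score"].agg(["count", "mean"])
    return reports
```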