1
CRESST / Harvard
“Believe me, it’s not cheating, but some strange method”
(GRE/TOEFL prep teacher, Shanghai)
Daniel Koretz
Harvard Graduate School of Education
National Center for Research on Evaluation, Standards, and Student Testing
Annual CRESST Conference, September 11, 2002, Los Angeles, CA
2
Validating inferences in the age of NCLB (No Child Left Behind)
- Validity is a property of inferences, not of measures
- Key inferences are now about gains obtained under high-stakes conditions
- Traditional validation is insufficient:
  - Inappropriate framework
  - Insufficient methods
- The risk is false positives: inflated gains
3
Map of talk
- Will not show evidence of severe inflation (old hat by now)
- Will discuss an approach to validating gains
- Will illustrate possible leverage points for coaching and score inflation
- Will note possible directions for future research
4
Why traditional validation is insufficient
- Cross-sectional, so insensitive to changes in levels of performance
- Insufficient in high-stakes contexts:
  - Largely ignores behavioral responses to testing
  - Ignores inadvertent emphases in tests
  - Assumes stability in relationships among aspects of performance, both tested and untested
5
Why these limitations matter
- Scores can rise rapidly, and be inflated, without affecting correlations among tests
- Behavioral responses to testing (e.g., coaching) can make the sampled content unrepresentative of the domain after initial validation
- Inadvertent emphases in tests can provide leverage for coaching
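Why can scores rise without moving correlations? A Pearson correlation is unchanged when a constant is added to every value of one variable, so a cross-sectional check cannot see a uniform gain at all. The sketch below uses hypothetical scores for five students and assumes, for simplicity, that the inflated gain is identical for everyone and that audit scores do not change.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance terms, computed from deviations around each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five students (illustrative only, not real data)
high_stakes = [40.0, 45.0, 50.0, 55.0, 62.0]
audit = [38.0, 47.0, 49.0, 58.0, 60.0]

# Assume coaching adds 10 points to every high-stakes score while audit
# scores stay put: a purely inflated gain
coached = [score + 10.0 for score in high_stakes]

r_before = pearson_r(high_stakes, audit)
r_after = pearson_r(coached, audit)
mean_gain = sum(coached) / len(coached) - sum(high_stakes) / len(high_stakes)

print(f"correlation before: {r_before:.3f}, after: {r_after:.3f}")
print(f"mean high-stakes gain: {mean_gain:.1f} points; audit gain: 0.0 points")
```

Real inflation need not be uniform, and non-uniform gains can shift correlations somewhat; the point of the sketch is only that they need not shift at all.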
6
KY math trends, KIRIS and ACT
7
Correlations between ACT and KIRIS mathematics
8
Keys to validating gains
- Assess the generalizability of gains to other (audit) measures
- Determine how much generalizability should be expected
  - Based on users’ inferences (e.g., TAAS vs. NAEP)
- Examine behavioral responses to testing
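One common way to compare gains across tests with different scales is to express each in standard deviation units and ask what fraction of the high-stakes gain shows up on the audit test. The sketch below assumes that approach, scales each gain by its baseline sample standard deviation, and uses hypothetical scores; the function names `standardized_gain` and `generalization_ratio` are illustrative, not part of any CRESST method.

```python
from statistics import mean, stdev


def standardized_gain(before, after):
    """Mean gain expressed in units of the baseline (sample) standard deviation."""
    return (mean(after) - mean(before)) / stdev(before)


def generalization_ratio(hs_before, hs_after, audit_before, audit_after):
    """Audit-test gain as a fraction of the high-stakes gain (both standardized)."""
    audit_gain = standardized_gain(audit_before, audit_after)
    hs_gain = standardized_gain(hs_before, hs_after)
    return audit_gain / hs_gain


# Hypothetical cohort scores (illustrative only, not from KIRIS, TAAS, or NAEP)
hs_before = [40, 45, 50, 55, 60]
hs_after = [50, 55, 60, 65, 70]
audit_before = [40, 45, 50, 55, 60]
audit_after = [42, 47, 52, 57, 62]

ratio = generalization_ratio(hs_before, hs_after, audit_before, audit_after)
print(f"audit gain is {ratio:.0%} of the high-stakes gain")
```

The code only measures generalization; as the slide notes, how large a ratio should be expected depends on the inferences users draw, which no formula settles.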
9
CRESST work on the validation of gains
- Develop a framework for validation efforts (Tech Report 551)
- Explore teacher surveys and interviews as a means of obtaining information about behavioral responses to testing (ongoing)
- Develop statistical models for the analysis of gains (new)
10
Framework for validating gains
- Identify substantive and nonsubstantive performance elements (PEs) in the test and in the inferences based on it
- Determine the weights given to PEs in the test
  - May be unintended
  - May be trivial or zero
- Determine the weights given to PEs in key inferences about gains
- Validity hinges on whether changes in performance on PEs are consistent with the inference weights
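The framework can be made concrete by treating a gain as a weighted sum of changes on PEs and computing it twice: once with the weights the test actually gives, once with the weights the inference assumes. The linear weighting, the PE names, and all numbers below are assumptions for illustration; the slide does not specify a formula.

```python
# Hypothetical performance elements (PEs); names and numbers are illustrative only.
# Each weight set sums to 1. "item_format" is a nonsubstantive PE the test
# weights (perhaps unintentionally) but the inference ignores; "measurement"
# matters to the inference but gets zero weight on the test.
test_weights = {"number_sense": 0.5, "item_format": 0.3, "geometry": 0.2, "measurement": 0.0}
inference_weights = {"number_sense": 0.4, "item_format": 0.0, "geometry": 0.3, "measurement": 0.3}

# Assumed change in performance on each PE between two administrations
pe_changes = {"number_sense": 0.2, "item_format": 0.6, "geometry": 0.1, "measurement": 0.0}


def weighted_change(changes, weights):
    """Linear composite of PE changes under a given set of weights."""
    return sum(w * changes.get(pe, 0.0) for pe, w in weights.items())


gain_on_test = weighted_change(pe_changes, test_weights)
gain_for_inference = weighted_change(pe_changes, inference_weights)
print(f"gain implied by test weights: {gain_on_test:.2f}")
print(f"gain implied by inference weights: {gain_for_inference:.2f}")
```

Here most of the test gain comes from the nonsubstantive PE, so the gain implied by the test weights is far larger than the gain the inference calls for.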
11
Types of test preparation
- Teaching more
- Working harder
- Working more effectively
- Reallocation
- Alignment
- Coaching
- Cheating
12
Reallocation
- Refers to shifting limited instructional resources among substantive areas
  - Within a subject
  - Between subjects
- Results in reallocating achievement
- Can lead to either meaningful change or inflation
- Inflates scores by undermining the test's representation of the domain
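A toy model shows how reallocation can raise scores on a test that samples part of a domain while leaving the domain as a whole unchanged. The sketch assumes a hypothetical four-area domain, a test that samples two areas, and a fixed total of achievement standing in for limited instructional resources.

```python
def mean_over(scores, elements):
    """Average achievement across the listed elements."""
    return sum(scores[e] for e in elements) / len(elements)


def reallocate(scores, tested, shift):
    """Move `shift` total units of achievement from untested to tested elements.

    The total across the domain is held fixed, standing in for limited
    instructional resources (a simplifying assumption).
    """
    untested = [e for e in scores if e not in tested]
    new_scores = dict(scores)
    for e in tested:
        new_scores[e] += shift / len(tested)
    for e in untested:
        new_scores[e] -= shift / len(untested)
    return new_scores


# Hypothetical domain of four content areas, two of which the test samples
domain_before = {"fractions": 50.0, "decimals": 50.0, "geometry": 50.0, "probability": 50.0}
tested = ["fractions", "decimals"]
domain_after = reallocate(domain_before, tested, shift=20.0)

print("test score:", mean_over(domain_before, tested), "->", mean_over(domain_after, tested))
print("domain score:", mean_over(domain_before, domain_before), "->", mean_over(domain_after, domain_after))
```

This is the pure-inflation case. If the inference placed more weight on the emphasized areas than on the de-emphasized ones, the same shift could represent meaningful change, which is why the slide says reallocation can go either way.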
13
Alignment
- Sometimes presented as protection against inflation: emphasis on PEs deemed important
- But this is just a form of reallocation
- Whether gains are inflated depends on:
  - The importance of the emphasized material to the inference, and
  - The importance of the de-emphasized or omitted material to the inference
14
Coaching
- Focuses on details of the test
  - Substantive, including item style
  - Nonsubstantive, such as item formats and scoring rubrics
- Includes test-taking tricks (e.g., POE [process of elimination] and plug-in)
- Can inflate scores or simply waste time
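To make the two named tricks concrete, here is a sketch on a hypothetical multiple-choice item invented for illustration. POE discards options that fail a rough plausibility rule; plug-in substitutes the remaining options into the stem. Together they reach the keyed answer without carrying out the algebra the item is meant to measure. The item, the rule, and the function names are all assumptions.

```python
# Hypothetical multiple-choice item (invented for illustration, not from any real test):
#   "If 3x + 5 = 20, what is x?"   A) 3   B) 5   C) 7   D) 15
options = {"A": 3, "B": 5, "C": 7, "D": 15}


def eliminate(options, rules):
    """POE: keep only options that pass every quick plausibility rule."""
    return {label: v for label, v in options.items() if all(rule(v) for rule in rules)}


def plug_in(options, check):
    """Plug-in: substitute each option instead of solving the equation."""
    return [label for label, v in options.items() if check(v)]


# Rough rule of thumb: 3x plus something positive is 20, so 3x must be below 20
remaining = eliminate(options, [lambda x: 3 * x < 20])
answer = plug_in(remaining, lambda x: 3 * x + 5 == 20)
print("remaining after POE:", sorted(remaining), "| keyed answer:", answer)
```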
15
Possible levers for coaching
- Possibly inadvertent content overweighting
- Item style
  - Recurrent content detail
  - Recurrent form of presentation
- Inadvertent, recurrent construct underrepresentation
- Recurrent cognitive demand with limited construct relevance
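Most of these levers hinge on recurrence: a feature that appears form after form is one a coach can teach to. Assuming items have already been hand-coded for features (a judgment task the code does not perform), a simple tally across forms flags candidates. The feature names and forms below are hypothetical, and recurrence alone does not show that a feature is a lever; that still depends on its construct relevance.

```python
from collections import Counter

# Hypothetical, hand-coded item features for three test forms (illustrative only)
forms = {
    "form_1": [{"diagram_provided", "integer_answer"}, {"everyday_context"}],
    "form_2": [{"diagram_provided"}, {"integer_answer", "everyday_context"}],
    "form_3": [{"diagram_provided"}, {"multi_step"}],
}


def recurring_features(forms, min_forms):
    """Return features that appear on at least `min_forms` different forms."""
    counts = Counter()
    for items in forms.values():
        # Count each feature at most once per form
        counts.update(set().union(*items))
    return {feature: n for feature, n in counts.items() if n >= min_forms}


print(recurring_features(forms, min_forms=3))
```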
16
Eva has four sets of straws. The measurements of the straws are given below. Which set of straws could not be used to form a triangle?
A. Set 1: 4 cm, 4 cm, 7 cm
B. Set 2: 2 cm, 3 cm, 8 cm
C. Set 3: 3 cm, 4 cm, 5 cm
D. Set 4: 5 cm, 12 cm, 13 cm
Item from G8 MCAS
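The straw item turns on the triangle inequality: three lengths form a triangle only if the two shorter ones sum to more than the longest. A minimal check over the four sets as printed in the item:

```python
def can_form_triangle(a, b, c):
    """Triangle inequality: the two shorter sides must sum to more than the longest."""
    x, y, z = sorted((a, b, c))
    return x + y > z


# Side lengths in cm, as given in the item's answer options
straw_sets = {
    "A": (4, 4, 7),
    "B": (2, 3, 8),
    "C": (3, 4, 5),
    "D": (5, 12, 13),
}

not_triangles = [label for label, sides in straw_sets.items() if not can_form_triangle(*sides)]
print(not_triangles)
```

Only Set 2 fails, since 2 + 3 = 5 is less than 8, so option B is the keyed answer.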
17
Each arrangement in this pattern is made up of tiles. How many tiles will be in the 6th arrangement in the pattern?
Item from G8 MCAS
18
Prompt from G8 MCAS
Use the balance scales below to answer the question below.
19
Prompt from G10 NAEP
Use the unit of length below to estimate the perimeter of the figure shown. Between which two consecutive whole-number units does the perimeter lie?
20
Prompt from G10 MCAS
Use the map below to answer this question.
21
Prompt from a G8 KIRIS item
22
Prompt from G10 MCAS
Use the figure below to answer the next question.
23
Answers for G10 MCAS prompt
If the figure above is folded into a cube, which of the following solids will be formed?
24
Next steps for research
- Develop methods for ascertaining which levers teachers use to inflate scores
- Develop methods for systematically identifying the patterns in tests that facilitate or inhibit coaching and inappropriate reallocation
- Develop methods for ‘unpacking’ lack of generalization and for better distinguishing between meaningful gains and inflation