Developing the Tests for NCLB: No Item Left Behind
Steve Dunbar
Iowa Testing Programs, University of Iowa

Test Development: A Technical Concern
- Procedures are well-established – it’s sort of a ‘rocket-art’
- Aspects of ‘quality’ that seem distinct to an observer are inseparable to a developer
- Quality control requires resources – talent, time, and money – to do well
- TD is the grunt work of assessment

Best Practice in Test Development
- Interpret content standards; translate into test specifications
- Search for stimulus material; draft items
- Do the 3Rs: REVIEW-REVISE-REPLACE
- Prepare material for field testing
- Oops – we forgot about finding the kids to participate in field testing, many comparable samples of them

More Best Practice in TD
- Administer, retrieve, and score tryout materials; get item analysis results to TDers
- Do the 3Rs: REVIEW-REVISE-REPLACE
- Prepare more material for field testing
- Oops – more kids for field testing, more comparable samples
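The "item analysis results" handed back to test developers after a tryout usually include at least an item difficulty and an item discrimination index. The slides do not say which statistics are used, so the following is only a minimal sketch under assumed conventions: items are scored 0/1, difficulty is the proportion correct, and discrimination is the correlation between the item and the total score on the remaining items. The function name `item_analysis` and the data layout are hypothetical.

```python
from statistics import mean, pstdev


def item_analysis(responses):
    """Classical item statistics for dichotomously scored tryout data.

    responses: list of examinee rows, each a list of 0/1 item scores
    (an assumed layout, not one taken from the slides).
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # Difficulty: proportion of examinees answering the item correctly.
        p_value = mean(item)
        # Rest score: total score with the item itself removed, so the
        # item is not correlated with itself.
        rest = [t - x for t, x in zip(totals, item)]
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            # Undefined when everyone got the same score.
            discrimination = None
        else:
            cov = mean(x * y for x, y in zip(item, rest)) - mean(item) * mean(rest)
            discrimination = cov / (sd_item * sd_rest)
        results.append(
            {"item": j, "p_value": p_value, "discrimination": discrimination}
        )
    return results


if __name__ == "__main__":
    tryout = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    for row in item_analysis(tryout):
        print(row)
```

Items with very high or very low proportion correct, or with low or negative discrimination, are the usual candidates for the REVIEW-REVISE-REPLACE step.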

What do we get from Best Practice?
- Something elusive (important content, interesting materials, good questions, cognitive complexity, comparability)
- Something intangible (fairness, alignment with standards, intended consequences)
- Something concrete (coverage, rater reliability, a validity or generalizability coefficient, acceptable cost)

Some TD Half Truths
- Multiple Choice Items
  - Development is hard
  - Scoring is easy (and public)
  - Quality Control built in to TD process
- Open-ended Items
  - Development is easy
  - Scoring is hard (and private)
  - Quality Control elusive due to scoring

Comparability in Test Materials
- Test form as the unit for judging comparability
- Easy to achieve with many items on the test and many potential throwaways in the pool
- Experienced test development staff
- Good field testing and scoring needed

Group Differences and Fairness
- TD seeks a balance
- Tension is that balance requires questions, lots of them
- Instructional influences confounded with group effects
- DIF requires good matching questions
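The point that DIF "requires good matching questions" can be illustrated with one common DIF procedure, the Mantel-Haenszel statistic, which compares groups only within strata of examinees matched on a score from the other items. The slides do not name a specific DIF method, so this is a sketch under that assumption; the function name `mantel_haenszel` and the record layout are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel(records):
    """Mantel-Haenszel common odds ratio for one studied item.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is True/False (an assumed layout).
    Returns (alpha, delta), or None if the ratio cannot be computed.
    """
    # One 2x2 table per matching-score stratum: [n_correct, n_incorrect].
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        tables[score][group][0 if correct else 1] += 1

    num = den = 0.0
    for t in tables.values():
        a, b = t["ref"]      # reference group: correct, incorrect
        c, d = t["focal"]    # focal group: correct, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        return None
    alpha = num / den
    # Delta-scale transform often reported alongside alpha;
    # negative values indicate the item favors the reference group.
    delta = -2.35 * math.log(alpha)
    return alpha, delta


if __name__ == "__main__":
    data = (
        [("ref", 10, True)] * 3 + [("ref", 10, False)]
        + [("focal", 10, True)] + [("focal", 10, False)] * 3
    )
    print(mantel_haenszel(data))
```

The strata are only as good as the matching score: with few or poor matching items, examinees of different proficiency end up in the same stratum, and instructional differences can be mistaken for group effects.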

Cost Factors in Large-Scale Testing
- Development Costs
  - Recur with each test form
  - Are fixed by instrument design
- Scoring Costs
  - Recur with each test administration
  - May change because of ‘unexpected’ circumstances
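The two cost drivers above can be put into a simple arithmetic sketch: development cost scales with the number of forms, scoring cost with the number of administrations and examinees. The function `total_cost` and all parameter names are hypothetical, and the figures in the usage example are made up purely for illustration; they are not taken from the presentation.

```python
def total_cost(n_forms, dev_cost_per_form,
               n_administrations, examinees_per_administration,
               scoring_cost_per_examinee):
    """Rough cost split for a testing program (illustrative model only)."""
    # Development cost recurs with each new test form.
    development = n_forms * dev_cost_per_form
    # Scoring cost recurs with each administration and each examinee scored.
    scoring = (n_administrations
               * examinees_per_administration
               * scoring_cost_per_examinee)
    return {
        "development": development,
        "scoring": scoring,
        "total": development + scoring,
    }


if __name__ == "__main__":
    # Made-up numbers: 2 forms, 3 administrations of 1,000 examinees each.
    print(total_cost(2, 100_000, 3, 1_000, 5))
```

In this simplified model the per-form development figure is treated as a constant, while the per-examinee scoring figure is the term that would move under the "unexpected" circumstances the slide mentions.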

Validity in Test Development
- Best practice ensures content quality, balance, and alignment with standards – critical aspects of validity & reliability
- TD is predicated on anticipated use
- Other aspects of validity & reliability aren’t understood until it’s too late, i.e. when the test is operational

Validity & Capacity in NCLB
- NCLB is census testing
- Census testing places heavy demands on TD and other aspects of an accountability system
- Limit on capacity in TD means
  - only 1R, or 2Rs
  - fewer rounds of field testing
  - dwindling pools of test materials
- No item left behind
