On-Line Student Assessment
Richard Hill, Center for Assessment
Nov. 5, 2001
Speaking Points
- Current paper-and-pencil-based assessments
- Image Scoring
- Computer Administration
- Computer Scoring
Typical Current Paper-and-Pencil-Based Statewide Assessment
- 3 grades
- Reading, writing, math, science, social studies
- 30 MC and 6 OE questions in each of four areas; one essay for writing
- 50,000 students per grade
Materials Processed
- 150,000 test booklets
- 2 million sheets of paper
- 10 tons of paper; a stack 700 feet high (rough check below)
- 150,000 answer documents
- 1.5 million sheets of special paper
- 7.5 tons
- 600 boxes to store (per year)
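A back-of-the-envelope check of the stack and weight figures, sketched in Python; the per-sheet thickness and weight are assumed values for ordinary 20 lb letter paper, not numbers from the talk.

```python
# Rough check of the paper figures above. Per-sheet thickness and
# weight are assumptions (typical 20 lb letter paper), not data
# from the talk.
SHEETS = 2_000_000
THICKNESS_IN = 0.004          # inches per sheet (assumed)
WEIGHT_G = 4.5                # grams per sheet (assumed)

stack_feet = SHEETS * THICKNESS_IN / 12
us_tons = SHEETS * WEIGHT_G / 907_185   # 907,185 grams per US ton
print(f"{stack_feet:.0f} ft, {us_tons:.1f} tons")  # ~667 ft, ~9.9 tons
```

Both results land near the slide's "700 feet" and "10 tons".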
Process
- Materials shipped to schools
- Materials shipped back to contractor
- Materials logged in
- Count everything; resolve discrepancies
- Note that one misplaced school can stop the entire process
Process for Receiving Materials
- Separate answer booklets from test booklets
- Test booklets placed in temporary storage in original boxes, then destroyed after reporting is complete
- Answer sheets guillotined
- MC answer sheets scanned
- OE sheets packaged for scoring
Processing of OE Sheets
- Separated by content area
- Sorted by form, randomized across schools
- Scanned to capture ID numbers
- Scoring headers prepared, then merged with answer sheets
Scoring
- Hire, train, qualify
- Score
- Ongoing evaluation of scoring quality
- Determine papers that need adjudication, then rescore as necessary
- Scan scoring headers
- Merge MC, OE and writing scores
Scoring Time
- 20 seconds per OE question
- 5 minutes per essay (two scorings, plus adjudication if necessary)
- 13 minutes per student
- 32,500 hours
- 1,000 person-weeks, plus training, qualifying, quality control and equating (arithmetic worked below)
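The slide's numbers reconstruct as follows; the four OE content areas and the 40-hour person-week are assumptions. The raw total is about 812 person-weeks, so the slide's 1,000 evidently folds in the listed overhead.

```python
# Reconstruction of the scoring-time arithmetic. The 4 OE content
# areas and the 40-hour person-week are assumptions.
OE_PER_AREA, AREAS = 6, 4
SEC_PER_OE, ESSAY_MIN = 20, 5
STUDENTS = 150_000                    # 50,000 per grade x 3 grades

per_student_min = OE_PER_AREA * AREAS * SEC_PER_OE / 60 + ESSAY_MIN
total_hours = STUDENTS * per_student_min / 60
person_weeks = total_hours / 40       # 40-hour week (assumed)
print(per_student_min, total_hours, person_weeks)   # 13.0 32500.0 812.5
```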
Equating to Previous Year (one common method sketched below)
- MC
- OE
- Difficulty of items
- Changes in scoring
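The talk does not name an equating method; below is a minimal mean-sigma linear equating sketch under an equivalent-groups assumption, one common way to put this year's scores on last year's scale.

```python
import statistics

def mean_sigma_equate(new_scores, old_scores):
    """Rescale this year's scores so their mean and standard deviation
    match last year's (equivalent-groups assumption)."""
    m_new, s_new = statistics.mean(new_scores), statistics.stdev(new_scores)
    m_old, s_old = statistics.mean(old_scores), statistics.stdev(old_scores)
    return [m_old + (x - m_new) * s_old / s_new for x in new_scores]

# Example: this year's form ran harder, so raw scores sit lower.
print(mean_sigma_equate([10, 12, 14], [13, 15, 17]))   # [13.0, 15.0, 17.0]
```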
Count, Count, Count
- Initial log-in counts
- After packaging
- Every time a box is opened or closed
- Count boxes, too
Final Steps
- Ship reports back to schools
- Resolve problems
  - Missing or misplaced students
  - Challenges to scoring (requires finding answer sheets, perhaps all for one student)
- Destroy test materials
- Long-term storage for answer documents
Solution #1: Image Scoring
- High-speed scanners capture images of documents
- All processing is done on CRTs by looking at an electronic image of the original paper
Advantages: Control
- Scoring
  - Blind read-behinds
  - Real-time tracking of the accuracy of every scorer (sketch below)
  - Multiple sites
- Equating
  - Blind rescores from previous year
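A hypothetical sketch of how read-behind tracking might work: an expert silently rescores a sample of each scorer's papers and the system keeps a running exact-agreement rate per scorer. The function names and the agreement metric are assumptions for illustration.

```python
from collections import defaultdict

# Running exact-agreement tally per scorer, fed by blind read-behinds
# (an expert rescoring a sample of each scorer's papers unseen).
agreement = defaultdict(lambda: [0, 0])    # scorer_id -> [matches, total]

def record_read_behind(scorer_id, scorer_score, expert_score):
    tally = agreement[scorer_id]
    tally[0] += scorer_score == expert_score
    tally[1] += 1

def agreement_rate(scorer_id):
    matches, total = agreement[scorer_id]
    return matches / total if total else None

record_read_behind("s17", 4, 4)
record_read_behind("s17", 2, 3)
print(agreement_rate("s17"))   # 0.5
```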
Advantages (cont’d)
- Scoring speed
  - Next response is ready to be scored when the first is done
  - Scoring stops when rates decline
  - No fumbling for papers
  - Up to 1/3 faster
Advantages (cont’d)
- Tracking
  - No need for counting
  - Nothing is lost
  - Nothing is damaged
  - Records automatically linked
  - Special-request papers easy to obtain
    - Prep for next year's scoring
    - Challenged papers
    - Adjudication
Advantages (cont’d)
- Reporting: send a sample of work home to parents
- Storage
  - Permanent
  - Compact
Disadvantages
- Hardware and software costs
  - Costs have dropped dramatically (a $150,000 server from two years ago now sells for $16,000)
- Need to prove that scoring is the same
  - Writing vs. OE
- Connectivity
- Power outages
Computer-Administered Tests
- Web-based vs. CD
- Comparability
  - Standards, especially writing
  - Students who write on paper and then just type it in
  - Full use of computer capabilities
  - Underestimation of (some) students' abilities
Georgia’s Proposed System
- Huge item bank, three levels
- Teachers can create tests
- Capacity concerns for Level III tests
Advantages
- Elimination of paper
- Accommodations
- Adaptive testing (minimal sketch below)
  - Shorter tests
  - Diagnostic tests
- Lower frustration levels
- Real-time scoring
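A minimal adaptive-testing sketch, not Georgia's actual system: pick the unused item whose difficulty sits closest to the current ability estimate, then nudge the estimate after each response. The shrinking step size is an assumption to make the estimate settle.

```python
# Minimal adaptive item-selection loop (illustrative, not any
# vendor's algorithm). items maps item_id -> difficulty; answer
# is a callback returning True if the student got the item right.
def adaptive_test(items, answer, n_items=10):
    theta, step = 0.0, 1.0
    remaining = dict(items)
    for _ in range(min(n_items, len(remaining))):
        item = min(remaining, key=lambda i: abs(remaining[i] - theta))
        del remaining[item]
        theta += step if answer(item) else -step
        step *= 0.7                   # smaller moves as evidence accumulates
    return theta

bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 0.5, "q5": 1.0, "q6": 2.0}
# Toy student who answers correctly whenever item difficulty < 0.8:
print(adaptive_test(bank, lambda q: bank[q] < 0.8, n_items=5))  # ~0.69
```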
Issues
- Administration time
  - All schools have some computers, but how many?
- Transition
  - Recommendation is to test all schools the same way
  - Comparability
  - Logistics of operating two programs at the same time
Computer Scoring
- Major vendors (NCME Session N1, April 12, 2001)
  - ETS Technologies: E-rater (Princeton, NJ)
  - Vantage Learning: IntelliMetric (Yardley, PA)
  - TruJudge: Project Essay Grade (PEG) (Purdue)
  - Knowledge Analysis Technologies: Intelligent Essay Assessor (Boulder, CO)
Advantages
- Time
- Cost
- Objective (or at least impersonal)
Issues
- Accuracy rates (agreement-rate sketch below)
  - PA study: computers vs. humans
    - Computer more accurate than one human
    - Computer less accurate than two humans
  - Bias vs. random error
- Beating the system ("Stakes change everything")
- Capacity of contractors to deliver logistics
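A sketch of the kind of comparison presumably behind the PA-study bullet; the score vectors are invented for illustration, and the study's actual data and metric are not reproduced here.

```python
# Exact-agreement rate between two sets of essay scores: one common
# way to compare machine raters against human raters.
def exact_agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

human1  = [3, 4, 2, 5, 3, 4, 2, 3]   # invented scores
human2  = [3, 4, 3, 5, 3, 4, 2, 4]
machine = [3, 4, 2, 5, 3, 3, 2, 3]

print(exact_agreement(machine, human1))  # machine vs. one human
print(exact_agreement(human1, human2))   # human-vs-human baseline
```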
Alternate Testing Modes
- Listening
- Special education adaptations (see Tindel)
- Virtual reality