Published by Cornelius Miller. Modified over 9 years ago.
1
Multiple Perspectives on CAT for K-12 Assessments: Possibilities and Realities
Alan Nicewander
Pacific Metrics
2
The following are some putative advantages of CAT relative both to paper-based tests and to so-called linear tests delivered by computer. Some of these are listed below; comments follow.
– CAT allows a pool of items to be used for on-demand testing of students and/or repeated testing of the same student while, at the same time, preserving the security of the item bank.
– CAT presents test items at a level appropriate for a student’s ability.
– CAT eliminates test booklets, which can be stolen or lost and thereby compromise test security.
3
CATs are significantly shorter than linear tests and have higher measurement efficiency. They are shorter because no time is wasted on:
– presenting low-proficiency students with difficult items, which produces many incorrect responses from which little is learned about their proficiency;
– presenting highly proficient students with items so easy that the extent of their knowledge is not revealed.
Even though CATs are shorter than linear tests, they can measure more reliably at the extremes of the proficiency distribution than longer linear tests do.
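The efficiency argument rests on item information. The following minimal Python sketch assumes the three-parameter logistic (3PL) model (the a-, b-, and c-parameters quoted later for the mathematics pool suggest it) with an assumed scaling constant D = 1.7; the function names and item parameters are hypothetical. It compares, for a low-proficiency examinee at θ = -2, an item pitched at that level with a very hard item.

```python
import math

D = 1.7  # assumed logistic scaling constant; the slides do not state one


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of a single 3PL item at ability theta (Birnbaum's formula)."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


if __name__ == "__main__":
    theta = -2.0  # a low-proficiency examinee
    matched = info_3pl(theta, a=1.6, b=-2.0, c=0.15)  # hypothetical item with b near theta
    hard = info_3pl(theta, a=1.6, b=2.0, c=0.15)      # hypothetical very hard item
    print(f"matched item: {matched:.3f}   hard item: {hard:.2e}")
```

Under these assumptions the hard item contributes almost no information at θ = -2, which is the wasted testing time described above.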
4
Comments
Any type of on-demand or repeated testing carries the risk of item exposure. A crucial factor in that risk is how high the stakes of the test are: the higher the stakes, the greater the pressure toward exposure of the items in the pool. The prime example is the CAT-GRE, which was abandoned partly because of item-security problems.
5
To ensure reasonable levels of CAT security, two methods have been found to be most effective in simulations:
– stochastic exposure control using the Sympson-Hetter method (or a similar method);
– increasing the number of items in the pool.
These findings come from the initial R&D done for the development of the CAT-ASVAB.
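A minimal Python sketch of the administration step of the Sympson-Hetter method. It assumes the exposure-control parameters K_i have already been tuned (in practice by iterative simulation against a target maximum exposure rate); the function name and the K_i values in the demo are hypothetical. The most informative remaining item is given only with probability K_i; if it is passed over, it is set aside for that examinee and the next-best item is considered.

```python
import random
from typing import Dict, List, Optional, Set


def select_item_sympson_hetter(
    ranked_items: List[int],
    k: Dict[int, float],
    excluded: Set[int],
    rng: random.Random,
) -> Optional[int]:
    """Return the next item to administer, applying a Sympson-Hetter-style filter.

    ranked_items: candidate item ids, most informative at the current theta first.
    k: exposure-control parameter K_i per item, i.e. P(administer | selected).
       Items missing from k are treated as K_i = 1 (no control).
    excluded: items already given to, or already filtered out for, this examinee;
       the set is updated in place when an item fails the filter.
    """
    for item in ranked_items:
        if item in excluded:
            continue
        # Administer the selected item with probability K_i ...
        if rng.random() < k.get(item, 1.0):
            return item
        # ... otherwise set it aside for the rest of this test and try the next-best item.
        excluded.add(item)
    return None  # no eligible item left


if __name__ == "__main__":
    rng = random.Random(12345)
    k = {10: 0.3, 11: 0.5, 12: 1.0}  # hypothetical K_i values
    n = 10_000
    counts = {10: 0, 11: 0, 12: 0}
    for _ in range(n):
        item = select_item_sympson_hetter([10, 11, 12], k, set(), rng)
        counts[item] += 1
    # Long-run rates should be near 0.30, 0.70 * 0.50 = 0.35, and the remaining 0.35.
    print({i: c / n for i, c in counts.items()})
```

With K = 0.3 on the item that would otherwise always be chosen, its exposure rate is held near 0.30 and the remaining administrations are pushed to items 11 and 12, which is how the method spreads use across the pool.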
6
Comments
It is true that CATs can be considerably shorter. For example, the CAT-ASVAB is about one-third shorter than the paper-based version (129 vs. 200 items), and its reliability coefficients run about 15% higher.
– However, the CAT-ASVAB imposes only moderate exposure control and very little content balancing on optimal item selection.
– Raising the levels of exposure and content control can lead to longer tests and to BATs (barely adaptive tests).
– Greater exposure control and content balancing thus yield tests that are longer and, at the same time, less reliable.
7
Existing test forms can be used to produce item pools for CAT.
8
Comments
CAT item-pool development can be a daunting task. As an illustration, suppose a current paper-based testing program is administered with three forms of a 50-item test. (The item exposure rate for the current procedure is 1/3: each time a test is given, one-third of the total collection of 150 items is exposed.) If a CAT system is assumed to reduce test length to 35 items, how many items must be developed to form the needed pools? A general rule of thumb is for each pool to be five times the length of the CAT, which in this example means 175 items per pool.
9
Now further assume that students will be allowed to take the CAT three times during a year. How many item pools are needed to attain the same exposure rate as the 50-item paper-based test being replaced?
– Three pools will be needed to achieve the same theoretical exposure rate as the paper-based test.
– In addition, a statistical exposure control (such as Sympson-Hetter) will be needed, because within a pool an item-selection procedure that maximizes test information selects certain items very frequently.
10
So we are left with these numbers for item-pool development: 3 pools of 175 items each = 525 items. Using the general rule that one must write twice as many items as are needed, 1,050 items must be written for this rather modest CAT project. Perhaps more realistically, if all 150 paper-based items (3 forms × 50 items) are used in the pools, (525 − 150) × 2 = 750 new items will have to be written.
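The item-pool arithmetic of this example, written out as a short Python script; all numbers come from the example itself, and the variable names are only illustrative.

```python
# Item-pool arithmetic for the hypothetical program described above.
FORMS, ITEMS_PER_FORM = 3, 50   # current paper-based program
CAT_LENGTH = 35                 # assumed CAT length
POOL_MULTIPLIER = 5             # rule of thumb: pool size = 5 x CAT length
N_POOLS = 3                     # one pool per administration during the year
WRITE_FACTOR = 2                # write twice as many items as are needed

paper_items = FORMS * ITEMS_PER_FORM                    # 150 existing items
paper_exposure = ITEMS_PER_FORM / paper_items           # 1/3 of the items seen per administration
pool_size = POOL_MULTIPLIER * CAT_LENGTH                # 175 items per pool
total_pool_items = N_POOLS * pool_size                  # 525 items across the pools
write_all_new = WRITE_FACTOR * total_pool_items         # 1,050 items if written from scratch
write_reusing_paper = WRITE_FACTOR * (total_pool_items - paper_items)  # 750 new items

if __name__ == "__main__":
    print(f"pool size: {pool_size}, total pool items: {total_pool_items}")
    print(f"write from scratch: {write_all_new}, reusing paper items: {write_reusing_paper}")
```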
11
The bottom line is that CAT:
– can provide tests at a level appropriate for a student’s ability;
– can save testing time and increase test reliability;
– is unlikely to save money, because it can be a giant, item-eating machine;
– offers the possibility of greater protection of the items from compromise than computer administration of a current paper-based test would provide.
12
Evaluating a CAT Item Pool Using Optimal Adaptive Tests (OATs)
We now construct some adaptive tests in an optimal way, to illustrate some problems and to point to an interesting possibility for implementing CAT. If one knew a person’s standing on the latent trait, θ, it would be easy to choose a fixed number of items from an item pool that maximizes the test information at that θ. We call such a test an “optimal adaptive test” (OAT), because no other test of the same length from this item pool could exceed its measurement accuracy.
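A sketch of OAT construction in Python, assuming the 3PL model with D = 1.7 (an assumption; the slides give no scaling constant) and no content or exposure constraints. Because test information at a fixed θ is the sum of the item informations, picking the n items with the largest information at that θ yields the optimal test. The function names and the five-item pool are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

D = 1.7  # assumed logistic scaling constant

Item = Tuple[float, float, float]  # hypothetical (a, b, c) 3PL parameters


def info_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of one 3PL item at theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def build_oat(pool: Sequence[Item], theta: float, n_items: int) -> Tuple[List[int], float]:
    """Return (sorted item indices, test information) of the OAT at a known theta.

    Test information at a fixed theta is the sum of the item informations, so taking
    the n_items largest item informations maximizes it (no content or exposure rules).
    """
    infos = [info_3pl(theta, a, b, c) for (a, b, c) in pool]
    best = sorted(range(len(pool)), key=lambda i: infos[i], reverse=True)[:n_items]
    return sorted(best), sum(infos[i] for i in best)


if __name__ == "__main__":
    pool = [(1.5, -2.0, 0.0), (1.5, -0.9, 0.0), (1.5, 0.0, 0.0), (1.5, 1.1, 0.0), (1.5, 2.0, 0.0)]
    items, test_info = build_oat(pool, theta=0.0, n_items=2)
    print(items, round(test_info, 3))
```

Calling build_oat for each θ in -3, -2.5, ..., 3 with n_items=15 would give a collection of 13 OATs of the kind evaluated on the following slides.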
13
The use of OATs for evaluating an item pool is now illustrated with an operational mathematics item pool.
– This pool contains 84 items and is used to construct 15-item adaptive tests for various values of the latent trait.
– The items in the pool have an average a-value of 1.61 (SD = 0.51), an average b-value of -0.06 (SD = 1.10), and an average c-value of 0.15 (SD = 0.07).
– For its intended purpose, this is an excellent item bank.
14
Using a grid of θ values from -3 to 3 in steps of 0.5, 13 OATs were constructed from the 84-item bank. To illustrate the item overlap in this collection of OATs, three of them were designated as focal OATs.
– These focal OATs are those at θ = -1.5, 0, and 1.5.
– One might think of them as the optimal tests for three cut scores.
– The next three slides (one for each focal OAT) show the overlap with neighboring OATs.
– The accuracy of the OATs is indicated by test information and reliability coefficients.
15
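OAT at θ = -1.5 and Overlap with Neighboring OATs
Each column lists the 15 items (by item number) in the OAT for that θ, with each item’s information at that θ. Test info is the sum of the item information, Reliability is each OAT’s reliability at its own θ, and the OAT(-1.5) row gives the reliability of the focal θ = -1.5 OAT at each θ.
                θ = -3.0         θ = -2.5         θ = -2.0         θ = -1.5         θ = -1.0
                Item  Info       Item  Info       Item  Info       Item  Info       Item  Info
                   1  0.04641       1  0.1505        1  0.3655        1  0.63316       1  0.73234
                   2  0.07012       2  0.16719       5  0.66628       2  0.55005       2  0.66594
                   5  0.13762       5  0.40163      12  0.44439       5  0.61413      11  0.64113
                  12  0.13777      12  0.282        13  0.61745      12  0.51887      16  0.55656
                  13  0.15398      13  0.39049      15  0.36296      13  0.59487      46  0.8022
                  14  0.10668      14  0.22847      16  0.39786      16  0.55072      47  0.58048
                  15  0.07031      15  0.18718      46  0.37141      46  0.73021      49  0.5616
                  16  0.09615      16  0.2189       47  0.51934      47  0.67994      50  0.55776
                  47  0.09194      47  0.26129      74  0.59481      49  0.51411      51  1.56191
                  49  0.05738      49  0.15774      75  1.14763      75  1.05888      52  0.73292
                  74  0.29611      74  0.55697      76  0.59026      76  1.46162      76  1.10421
                  75  0.08371      75  0.471        78  0.47144      77  0.5133       77  1.25759
                  78  0.20636      78  0.38526      80  1.39435      79  0.56519      79  0.87044
                  80  0.05043      80  0.43173      81  0.47526      80  1.32425      81  0.65866
                  82  0.27267      82  0.38417      82  0.37743      81  0.79822      83  0.59127
Test info             1.8776           4.6745           8.7963           11.1075          11.8750
Reliability           0.6529           0.8237           0.8979           0.9174           0.9223
OAT(-1.5)             0.4965           0.7681           0.8878           0.9174           0.9089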
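The overlap on the θ = -1.5 slide can be counted directly. This Python sketch uses the item numbers as read from that slide’s flattened table (a reading of the extracted text, so small transcription errors are possible); as read, the θ = -3.0 and θ = -2.5 OATs contain the same 15 items. The variable names are illustrative.

```python
# Item numbers of the five OATs on the theta = -1.5 slide, as read from that table.
oats = {
    -3.0: {1, 2, 5, 12, 13, 14, 15, 16, 47, 49, 74, 75, 78, 80, 82},
    -2.5: {1, 2, 5, 12, 13, 14, 15, 16, 47, 49, 74, 75, 78, 80, 82},
    -2.0: {1, 5, 12, 13, 15, 16, 46, 47, 74, 75, 76, 78, 80, 81, 82},
    -1.5: {1, 2, 5, 12, 13, 16, 46, 47, 49, 75, 76, 77, 79, 80, 81},
    -1.0: {1, 2, 11, 16, 46, 47, 49, 50, 51, 52, 76, 77, 79, 81, 83},
}

focal = oats[-1.5]
overlap = {theta: len(items & focal) for theta, items in oats.items()}
distinct = set().union(*oats.values())     # items used by any of the five OATs
in_all = set.intersection(*oats.values())  # items used by every one of them

if __name__ == "__main__":
    print("overlap with focal OAT:", overlap)
    print("distinct items:", len(distinct), "of", sum(len(s) for s in oats.values()), "slots")
    print("items in all five OATs:", sorted(in_all))
```

As read, each neighboring OAT shares 10 or 11 of its 15 items with the focal OAT, and only 25 distinct items fill the 75 item slots of the five tests.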
16
OAT for θ = 0 and Neighboring OATs
Each column lists the 15 items in the OAT for that θ, with each item’s information at that θ; the OAT(0) row gives the reliability of the focal θ = 0 OAT at each θ.
                θ = -0.5         θ = 0.0          θ = 0.5
                Item  Info       Item  Info       Item  Info
                   6  0.64766       6  1.08164      28  1.92147
                  11  0.65243      24  1.47013      32  2.3066
                  19  0.69925      25  2.4096       33  2.30979
                  24  0.64689      26  1.37743      34  2.14517
                  25  1.19992      31  1.89917      38  2.04069
                  26  0.72767      32  1.67205      40  2.00697
                  44  1.58144      44  1.91164      41  1.90624
                  51  2.40629      52  1.23469      43  3.18838
                  52  1.7944       53  1.88879      53  2.32357
                  56  0.66504      54  1.11195      54  2.76355
                  57  1.06812      56  1.38887      61  1.88903
                  58  0.92946      57  1.13203      62  3.74146
                  59  0.68789      62  1.5257       63  2.08051
                  77  0.92247      64  2.9732       64  2.44572
                  79  0.67854      65  1.12879      70  2.06511
Test info             15.3075          24.2057          35.1343
Reliability           0.9387           0.9603           0.9723
OAT(0)                0.9121           0.9603           0.9571
17
OAT for θ = 1.5 and Neighboring OATs
Each column lists the 15 items in the OAT for that θ, with each item’s information at that θ; the OAT(1.5) row gives the reliability of the focal θ = 1.5 OAT at each θ.
                θ = 1.0          θ = 1.5          θ = 2.0          θ = 2.5          θ = 3.0
                Item  Info       Item  Info       Item  Info       Item  Info       Item  Info
                  28  1.53919       3  1.13983       3  0.45161       3  0.12694       3  0.03231
                  33  2.124        17  1.53923       4  0.33535       4  0.26237       4  0.16677
                  34  1.77975      20  0.97318      10  0.25826      10  0.1473       10  0.07545
                  35  3.35738      22  1.71518      17  3.40987      17  1.07741      17  0.16494
                  36  2.17064      23  0.77671      20  0.96636      20  0.48872      20  0.17674
                  37  2.32081      27  0.98121      21  0.48797      21  0.39597      21  0.24441
                  38  2.88038      35  1.39005      22  1.56596      22  0.55974      22  0.13937
                  39  3.38557      36  2.94918      23  1.24509      23  0.85198      23  0.34086
                  40  1.71708      39  1.80579      27  0.30236      27  0.06867      27  0.01453
                  42  2.03306      45  2.09891      36  0.62812      36  0.08192      36  0.00996
                  43  2.25752      60  0.97349      39  0.28148      45  0.13509      45  0.02322
                  60  2.48717      68  0.75244      45  0.70132      68  0.03909      68  0.0081
                  63  1.42694      69  1.66372      69  0.97469      69  0.28709      69  0.06823
                  68  1.85907      72  0.96606      72  0.4508       72  0.15591      72  0.0485
                  70  2.23114      73  1.8975       73  1.62414      73  0.45751      73  0.09133
Test info             33.5697          21.6224          13.6833          5.1357           1.6047
Reliability           0.9710           0.9558           0.9319           0.8370           0.6161
OAT(1.5)              0.9536           0.9558           0.9205           0.8159           0.5314
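The reliability rows in these tables appear to follow ρ = I/(1 + I), the reliability implied when the latent trait has unit variance and the error variance is taken as 1/I. The slides do not state this, so the Python sketch below treats it as an inferred assumption and checks it against the θ = 1.5 slide; the function name is hypothetical.

```python
def reliability_from_information(test_info: float) -> float:
    """Reliability implied by test information at a point, assuming Var(theta) = 1.

    The error variance is taken as 1 / I, so rho = 1 / (1 + 1/I) = I / (1 + I).
    """
    return test_info / (1.0 + test_info)


# (test information, tabled reliability) pairs from the theta = 1.5 slide
TABLED = [(33.5697, 0.9710), (21.6224, 0.9558), (13.6833, 0.9319), (5.1357, 0.8370), (1.6047, 0.6161)]

if __name__ == "__main__":
    for info, rel in TABLED:
        print(f"I = {info:8.4f}   I/(1+I) = {reliability_from_information(info):.4f}   tabled = {rel:.4f}")
```

Under this assumption the computed values agree with the tabled reliabilities to within about 0.001.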
18
Focal OATs Derived Using the Rasch Model
19
Conclusions
The previous slides indicate that there will be considerable overlap among the CATs constructed from this item bank, even though the difficulty of the items varies considerably.
– Hence, many of the items will be overexposed and subject to compromise. In actual use of this item bank, item exposure is controlled with the Sympson-Hetter exposure control method.
20
The previous slides also indicate that the three focal OATs, optimal for θ = -1.5, 0, and 1.5, do a remarkable job of providing accurate measurement across the θ continuum, even though each contains only 15 items. OATs (and, by implication, CATs in general) will differ depending on the IRT model used in development and implementation.
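A small Python illustration of why OATs depend on the IRT model. Under the Rasch model, item information is P(1 − P), which shrinks as |θ − b| grows, so the OAT is simply the items whose difficulties lie closest to θ; under the 3PL model, the a- and c-parameters also enter. The two-item pool, D = 1.7, and the function names are hypothetical.

```python
import math

D = 1.7  # assumed scaling constant for the 3PL model


def info_rasch(theta: float, b: float) -> float:
    """Rasch item information P(1 - P): depends only on the distance |theta - b|."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def info_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL item information: discrimination a and guessing c also matter."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# Two hypothetical items as (a, b, c); under Rasch only b is used.
POOL = [(0.6, 0.0, 0.0), (2.0, 0.4, 0.0)]


def best_item(theta: float, model: str) -> int:
    """Index of the most informative single item at theta under the given model."""
    if model == "rasch":
        scores = [info_rasch(theta, b) for (_, b, _) in POOL]
    else:
        scores = [info_3pl(theta, a, b, c) for (a, b, c) in POOL]
    return max(range(len(POOL)), key=lambda i: scores[i])


if __name__ == "__main__":
    print("Rasch picks item", best_item(0.0, "rasch"), "; 3PL picks item", best_item(0.0, "3pl"))
```

At θ = 0 the Rasch model prefers the item whose difficulty equals θ, while the 3PL model prefers the highly discriminating item slightly above θ, so the two models would assemble different optimal tests from the same pool.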
21
This also suggests that a two-stage CAT procedure would work quite well with this item bank.
– In a two-stage CAT, an initial Stage 1 test is administered either in CAT mode or as a fixed test of medium difficulty.
– Scores on the Stage 1 test are used to assign examinees to one of several Stage 2 tests that range in overall difficulty from easy to difficult, for example one of the three focal OATs described above.
– In this case (and perhaps in most cases), a pure CAT, in which items are selected “on the fly”, does not seem to have any advantage over the preselected, optimal Stage 2 tests.
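A minimal Python sketch of the routing step of such a two-stage design, using the three focal OATs as Stage 2 tests. The cut points (midpoints between the focal θ values, so each examinee goes to the nearest focal OAT) and the function name are hypothetical; the slides do not specify a routing rule, and an operational design would also need a Stage 1 scoring method.

```python
from bisect import bisect_right
from typing import Sequence

# The three focal OATs serve as the Stage 2 tests, ordered from easy to difficult.
STAGE2_TESTS = ("OAT(-1.5)", "OAT(0)", "OAT(1.5)")
# Hypothetical cut points: midpoints between the focal thetas -1.5, 0 and 1.5,
# so each examinee is routed to the focal OAT nearest the Stage 1 estimate.
CUTS = (-0.75, 0.75)


def route(stage1_theta: float,
          cuts: Sequence[float] = CUTS,
          tests: Sequence[str] = STAGE2_TESTS) -> str:
    """Assign a Stage 2 test from a Stage 1 ability estimate (ties at a cut go up)."""
    return tests[bisect_right(cuts, stage1_theta)]


if __name__ == "__main__":
    for est in (-2.1, -0.3, 0.9):
        print(est, "->", route(est))
```

Because the Stage 2 tests are fixed in advance, they can be reviewed for content and their exposure is known exactly, which is part of the appeal of this design over item-by-item selection.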