1. Kaizen – What Can I Do To Improve My Program?
F. Jay Breyer, Ph.D. (jay.breyer@thomson.com)
Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona
2. Test Development Process (Where we have been)
– Content: found to be important for the job, as determined by job analysis
– Sampling of content: how many items are needed in the test form to assess minimal competency?
– Importance of content domains: what is the emphasis on specific content domains?
– Based on the identified test specifications, select items that match the content domains
– Review and edit items to ensure correct grammatical structure and adherence to fairness and sensitivity guidelines
– Pretest new items and evaluate the total item bank
– Evaluate statistical parameters: verify appropriate performance of items
– Equate test forms following the standard setting to ensure comparability of test scores across different test forms
– Prepare test forms for administration: paper-and-pencil delivery or computer delivery
– Outcome: a valid and reliable test that is sound and defensible
– But wait! We can do something else … how can we change what we do to improve the testing program?
[Process diagram labels: Validity, Reliability & Defensibility; Content; Test Specifications; Item Type; Item Development; Item Writing; Statistical Analysis; Form Assembly; Edit & Fairness Review; Statistical Parameters; Test Modality]
3. After the Examination is Over…
It seems we would never get to this point, but here we are. Before the next test is created: What can we learn from this administration? What should we do to find out about the examination we just gave and reported?
Activities:
– What is the size and quality of my item bank?
– Do I have sufficient numbers of items in each content area for the next examination form?
– Can I assemble the next form to content and statistical specifications?
– How do I find out what my statistical specifications are?
– What is the reliability of my test?
4. Determining Appropriate Psychometric Approaches to Item and Test Development
Challenges – what do you do if your test is:
– Too long for the time allotted?
– Too hard or too easy for the population tested and the purpose?
– Not sufficiently reliable for the test's purpose?
Approaches – item analysis of the test before scores are reported helps ensure validity:
– Correct keys are used to grant points
– Items function as intended
Test analyses after the test is reported can also be useful for:
– Construction of new test forms
– Evaluation of item-creation techniques
– Changes that improve the testing program
A sketch of such a pre-reporting item analysis follows.
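The deck does not prescribe an implementation, but the pre-reporting checks above (key verification, item functioning) are typically done with a classical item analysis: proportion correct for difficulty, and an item-rest correlation for discrimination. A minimal sketch in Python, using a small hypothetical response matrix and answer key:

```python
import numpy as np

# Hypothetical data: 6 examinees x 4 items; entries are the selected options.
responses = np.array([
    ["A", "C", "B", "D"],
    ["A", "C", "B", "A"],
    ["A", "B", "B", "D"],
    ["B", "C", "A", "D"],
    ["A", "C", "B", "D"],
    ["C", "B", "A", "A"],
])
key = np.array(["A", "C", "B", "D"])                # keyed (correct) options

scored = (responses == key).astype(float)           # 1 = correct, 0 = incorrect
total = scored.sum(axis=1)                          # total raw score

for j in range(scored.shape[1]):
    p = scored[:, j].mean()                         # difficulty: proportion correct
    rest = total - scored[:, j]                     # rest score avoids part-whole inflation
    r_pb = np.corrcoef(scored[:, j], rest)[0, 1]    # point-biserial discrimination
    flag = "  <- review key/content" if p < 0.2 or r_pb < 0.1 else ""
    print(f"Item {j + 1}: p = {p:.2f}, r_pb = {r_pb:.2f}{flag}")
```

An item that is very easy, very hard, or negatively discriminating is exactly the kind flagged in the PIA screens on the following slides, often pointing to a miskeyed item.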
5. Test Analyses
Test analyses help ensure quality for testing programs that wish to verify that appropriate test development and psychometric procedures are being used. These analyses help verify that the program's test development activities are psychometrically sound, and they provide direction for possible continuous improvement.
Analyses should:
– Assure the public of meeting basic standards of quality, fairness, and reliability
– Answer the question "How are my test development activities doing?"
Analyses should not:
– Limit innovation or have a punitive function
– Be ignored
6. Item Analyses at Different Times
– PIA – Preliminary Item Analysis
– EIA – Early Item Analysis
– IA after PINS but before the equating or cut-score study
– FIA – Final Item Analysis
7. PIA: Only Bad Items
[Figure: sample preliminary item analysis output showing only flagged items]
8. PIA: Hard Item
[Figure: sample preliminary item analysis output for a hard item]
9. PIA: Key Issue
[Figure: sample preliminary item analysis output for an item with a key problem]
10. FIA: Everything
[Figure: sample final item analysis output]
11. Post-Test Administration Inquiry: A FAIR TEST
– Item/Task Information: quality of items/tasks from the past test – difficulty, discrimination, DIF
– Total Score Information: reliability, score distributions, descriptive information, speededness
– Subscore Information: reliability of reported subscores, score distributions, descriptive information
12. Score Information: Reliability and Validity
– Reliability – consistency and accuracy
– Validity – score inferences, score meaning, score interpretations: what we can say about people
13. Score Information: Reliability
Reliability means consistency and accuracy. In credential testing it refers to:
– Consistency of test scores across different test forms, given the content sampling – coefficient alpha, Kuder-Richardson (K-R 20)
– Consistency in passing and failing the same people, as if they were able to take the test twice – Subkoviak, P/F consistency, RELCLASS
A sketch of the first kind of index follows.
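As a concrete illustration, here is a minimal K-R 20 computation, assuming a hypothetical 0/1-scored response matrix (examinees × items); for dichotomous items K-R 20 equals coefficient alpha:

```python
import numpy as np

def kr20(scored):
    """Kuder-Richardson formula 20 for dichotomously (0/1) scored items."""
    k = scored.shape[1]                       # number of items
    p = scored.mean(axis=0)                   # proportion correct per item
    item_var = (p * (1.0 - p)).sum()          # sum of item variances (p * q)
    total_var = scored.sum(axis=1).var()      # variance of total scores (ddof=0,
                                              # matching the p*q item variances)
    return (k / (k - 1.0)) * (1.0 - item_var / total_var)

# Hypothetical 0/1 score matrix: 5 examinees x 4 items
scored = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 1, 1],
                   [0, 0, 0, 0],
                   [1, 1, 1, 1]])
print(f"K-R 20 = {kr20(scored):.3f}")         # ~0.70 for this toy data
```

The classification-consistency indices (Subkoviak, RELCLASS) work differently: they estimate the probability that the same examinee would land on the same side of the cut score on a retest.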
14. Score Information: Reliability
Measurement error refers to random fluctuations in a person's score due to factors not related to the content of the test. It is summarized by:
– SEM – the standard error of measurement
– CSEM – the conditional standard error of measurement at a given score level
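Neither formula appears in the deck, but the classical versions are standard: the SEM scales the score standard deviation by the unreliable share of the variance, and one common CSEM is Lord's binomial-error estimate at each raw score. A sketch with purely illustrative numbers:

```python
import math

def sem(sd_total, reliability):
    """Classical standard error of measurement: SD_x * sqrt(1 - reliability)."""
    return sd_total * math.sqrt(1.0 - reliability)

def csem_binomial(raw_score, n_items):
    """Lord's binomial-error CSEM at raw score x on an n-item test:
    sqrt(x * (n - x) / (n - 1)). Largest mid-range, zero at 0 and n."""
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1.0))

# Illustrative values only: a 100-item test with score SD 9.5 and reliability 0.88
print(f"SEM  = {sem(9.5, 0.88):.2f}")           # overall error band
print(f"CSEM = {csem_binomial(75, 100):.2f}")   # error near a raw score of 75
```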
15. Test Analyses: Score Information
[Figure: sample score-information summary; visible values include 0.88 and 75%]
16. Test Analyses: Score Information
Correlations can add to the understanding of score reliability (see the sketch below).
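The slide does not say which correlations; one classic example is the split-half correlation, stepped up with the Spearman-Brown formula to estimate full-test reliability. A minimal sketch, assuming the same kind of 0/1 score matrix as above:

```python
import numpy as np

def split_half_reliability(scored):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown step-up: rho = 2r / (1 + r)."""
    odd = scored[:, 0::2].sum(axis=1)    # score on items 1, 3, 5, ...
    even = scored[:, 1::2].sum(axis=1)   # score on items 2, 4, 6, ...
    r = np.corrcoef(odd, even)[0, 1]     # half-test correlation
    return 2.0 * r / (1.0 + r)
```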
17. Item Information: DIF & Sensitivity
Sensitivity – how questions appear:
– Review by a test development (TD) professional
– Removes words and phrases from a test that may be insulting, defamatory, or charged
Differential Item Functioning (DIF) – how questions behave:
– Searches for items with construct-irrelevant variance
– Tests differences in item difficulty for k groups when matched on proficiency
– Common statistic: Mantel-Haenszel (a sketch follows)
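A minimal Mantel-Haenszel sketch, assuming dichotomous items and the total score as the matching variable (in practice, sparse score levels are usually pooled into thicker strata):

```python
import numpy as np

def mantel_haenszel_dif(item, group, total):
    """Mantel-Haenszel DIF statistic for one dichotomously scored item.

    item:  0/1 scores on the studied item
    group: 'ref' / 'focal' label per examinee
    total: matching criterion, e.g. total test score
    """
    item = np.asarray(item)
    group = np.asarray(group)
    total = np.asarray(total)
    num = den = 0.0
    for k in np.unique(total):                            # one 2x2 table per score level
        m = total == k
        a = np.sum(m & (group == "ref") & (item == 1))    # reference, correct
        b = np.sum(m & (group == "ref") & (item == 0))    # reference, incorrect
        c = np.sum(m & (group == "focal") & (item == 1))  # focal, correct
        d = np.sum(m & (group == "focal") & (item == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha = num / den                      # common odds ratio (assumes den > 0)
    return alpha, -2.35 * np.log(alpha)    # ETS delta-scale MH D-DIF
```

Negative MH D-DIF values indicate the item is harder for the focal group than for matched reference-group members; that matching is exactly what separates DIF from impact, as the next slide emphasizes.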
18. DIF
Impact is not DIF:
– Impact is the assessment of group differences in test performance between unmatched focal- and reference-group members
– It confounds item performance differences between the focal and reference groups with overall group proficiency differences
19. DIF
How DIF is calculated:
– The criterion (matching variable) is the total test score, or construct
– The question DIF answers: is the meaning of the item the same for the focal group as it is for the reference group?
– If the interpretation of the scores – the meaning – is different for subgroups, then DIF is present
DIF has to do with improving validity.
20. In Summary
Statistical information following test administration can provide:
– Item information: difficulty and suitability of the items/tasks for your candidate samples
– DIF: potential sources of bias (invalidity)
– Decision and score information: distributions, descriptive statistics, reliability information
– Subscore information: reliability information, intercorrelations
These analyses help highlight areas for continuous improvement – Kaizen.