Field Test Analysis Report: SAS Macro and Item/Distractor/DIF Analyses

Field Test Analysis Report: SAS Macro and Item/Distractor/DIF Analyses
Prepared by Yi-Hsin Chen, Chunhua Cao, and Stephanie Green, College of Education at USF
Presented at the meeting of the Central Florida Assessment Collaborative (CFAC), May 20th, 2014, Orlando, Florida

Agenda of This Presentation
- SAS macro for CTT test/item analysis, IRT 2PL model, and Mantel-Haenszel differential item functioning (DIF) analysis
- Introduction of statistical concepts for test/item development
- Item Analyses: CTT and IRT
- Distractor Analysis
- DIF Analysis

SAS macro for test/item, 2PL, DIF analyses

SAS Macro Outputs
- A SAS macro was developed for this project
- There are six Excel outputs:
  1. Test score statistics
  2. Frequencies of options for each item
  3. Item analysis statistics
  4. Distractor analysis
  5. DIF
  6. 2PL item parameters
- Available upon request at ychen5@usf.edu

Test Score Statistics

Frequencies of Options

Item Analysis Statistics

Item Analysis Statistics

Distractor Analysis

DIF Analysis

Statistical Concepts of Test Scores

Sample size
- N: sample size
  Example: 85, 60, 70, 44, 59, 89, 99, 79, . , 100
- USED_N: sample size used for analysis, excluding missing data
  Example: one value is missing, so USED_N = 9
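As an illustration of the N versus USED_N distinction, here is a minimal Python sketch (not the SAS macro itself). It assumes the missing score, shown as "." on the slide, is represented by `None`; the function name `sample_sizes` is hypothetical.

```python
# Scores from the slide; None stands in for the missing value shown as "."
scores = [85, 60, 70, 44, 59, 89, 99, 79, None, 100]

def sample_sizes(values):
    """Return (N, USED_N): all records, and records with a non-missing score."""
    n = len(values)
    used_n = sum(1 for v in values if v is not None)
    return n, used_n

n, used_n = sample_sizes(scores)
print(f"N = {n}, USED_N = {used_n}")  # N = 10, USED_N = 9
```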

Central Tendency
- MEAN: arithmetic average
- Most frequently reported measure of central tendency
- Sum of scores divided by number of scores

Test Statistics: Central Tendency
- MEDIAN (Q2): the score at the 50th percentile
- Half of the examinees score above the median, and half score below it
- Even number of scores: the median is the average of the two middle scores, e.g. Median = (95 + 100) / 2 = 97.5
- Odd number of scores: 110, 105, 100, 95, 90, so Median = 100

Percentiles
- A percentile gives the percentage of scores that fall below a given point
- Percentiles are very useful for interpreting an individual student's performance
- Q1: the score at the 25th percentile
  Example: Q1 = 10 indicates that 25 percent of the students scored below 10 points
- Q3: the score at the 75th percentile

Variability: Range
- Subtract the lowest score (Minimum) from the highest score (Maximum)
- This is a rough measure of variability
- High score = 90, Low score = 50: Range = ? (40)
- High score = 100, Low score = 50: Range = ? (50)
- High score = 90, Low score = 30: Range = ? (60)
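The median and range described above can be checked with a short Python sketch using only the standard library. The five-score list comes from the median slide; the six-score list is a hypothetical example added here to show the even-count case, and `score_range` is an illustrative helper name.

```python
import statistics

def score_range(scores):
    """Range = highest score (Maximum) minus lowest score (Minimum)."""
    return max(scores) - min(scores)

odd_scores = [110, 105, 100, 95, 90]        # five scores from the slide
even_scores = [110, 105, 100, 95, 90, 85]   # hypothetical: one extra score

print(statistics.median(odd_scores))    # the middle score
print(statistics.median(even_scores))   # average of the two middle scores
print(score_range([90, 50]))            # high score 90, low score 50
```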

Variability: Standard Deviation (SD)
- Roughly, the average number of points by which scores deviate from the mean score
- A measure of the amount of variability in examinees' total scores
- Large SD = large variability (heterogeneity)
- Small SD = small variability (homogeneity): scores cluster closer to the mean

Variability: worked SD example
Scores: 100, 96, 94, 92, 90, 80 (N = 6, Mean = 92)

Score   Deviation        Squared
100     100 - 92 = 8     8^2 = 64
96      96 - 92 = 4      4^2 = 16
94      94 - 92 = 2      2^2 = 4
92      92 - 92 = 0      0^2 = 0
90      90 - 92 = -2     (-2)^2 = 4
80      80 - 92 = -12    (-12)^2 = 144

$$\sum (X - \text{Mean})^2 = 232$$

$$SD = \sqrt{\frac{\sum (X - \text{Mean})^2}{N}} = \sqrt{\frac{232}{6}} \approx 6.22$$
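The worked example can be reproduced with a few lines of Python. This sketch divides by N, as the slide does; a sample SD that divides by N - 1 (for example `statistics.stdev`) would give a slightly larger value. The function name `population_sd` is illustrative.

```python
import math

scores = [100, 96, 94, 92, 90, 80]

def population_sd(values):
    """Square root of the mean squared deviation from the mean (divides by N)."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))

mean = sum(scores) / len(scores)
sum_of_squares = sum((x - mean) ** 2 for x in scores)
print(mean, sum_of_squares, round(population_sd(scores), 2))  # 92.0 232.0 6.22
```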

Skewness and Kurtosis
- SKEWNESS: a measure of the shape of the score distribution, indicating positive skewness, negative skewness, or symmetry
- KURTOSIS: a measure of the "peakedness" of the score distribution

Skewness

Skewness a roughly negatively skewed distribution (bar chart)

Skewness

Skewness a roughly positively skewed distribution (bar chart)

Kurtosis: different kurtosis values (figure panels: K > 0, K = 0, K < 0)

Reliability: Cronbach's Alpha
- A measure of test reliability, indicating the internal consistency of the test
- Sample dependent: different samples may yield different reliability values for the same test
- Ranges from 0 to 1
- 0.7 and above: good internal consistency
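For readers who want to see how alpha is computed, the following Python sketch applies the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). It is not the SAS macro; the 5 x 4 matrix of 0/1 item scores is hypothetical, and missing responses are assumed to have been handled beforehand.

```python
import statistics

def cronbach_alpha(matrix):
    """matrix: one row per examinee, one 0/1 (or numeric) score per item."""
    k = len(matrix[0])                                   # number of items
    item_vars = [statistics.pvariance([row[j] for row in matrix])
                 for j in range(k)]
    total_var = statistics.pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical scored responses: 5 examinees, 4 items
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 3))  # 0.8
```

The same variance definition must be used for items and totals; here both use the population variance, so the ratio matches what sample variances would give.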

Standard Error of Measurement (SEM)

$$SEM = STD \times \sqrt{1 - \text{reliability}}$$

- A more reliable test yields a smaller SEM
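As a quick check of the SEM formula (STD times the square root of one minus reliability), this Python sketch plugs in the Precalculus values reported later in the field test results (STD = 2.978, ALPHA = 0.506). The function name `sem` is illustrative, and alpha is assumed to be the reliability estimate used.

```python
import math

def sem(std, reliability):
    """Standard error of measurement: STD * sqrt(1 - reliability)."""
    return std * math.sqrt(1.0 - reliability)

# Precalculus field test values: STD = 2.978, ALPHA = 0.506
print(round(sem(2.978, 0.506), 3))  # 2.093
```

Because the inputs are already rounded to three decimals, the other subjects may differ from the reported SEM in the last digit.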

Statistical Concepts of Item Analysis

Item Analysis: why care?
- Item analysis helps you identify problems with your items (or scoring)
- These problems can be corrected, resulting in a better test and better measurement

Item Analysis: when is it useful?
- Item analysis is most useful when you are developing a bank, or pool, of items that you will continue to use
- It can be used when evaluating standardized tests
- It is also a useful tool any time students have complained about an item
- It can be used to identify mis-keyed items

Item Difficulty (p-value)
- Item difficulty (proportion correct): the proportion of examinees tested who answered the item correctly

$$p = \frac{N_{\text{correct}}}{N_{\text{total}}} = \frac{\text{number of students who responded correctly}}{\text{total number of students who responded}}$$

Item Difficulty (p-value)
- p can range from 0 to 1.0
- A rough scale of item difficulty (p):
  .80 and above: moderately easy to very easy (mastery)
  .30 to .80: moderate
  .30 and below: moderately difficult to very difficult
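A minimal Python sketch of the p-value and the rough difficulty scale above. It assumes responses are already scored 0/1 and that `None` marks an examinee who did not respond (excluded from the denominator); both function names are hypothetical, and the handling of the exact boundary values .30 and .80 is a choice made here.

```python
def item_difficulty(responses):
    """Proportion correct among examinees who responded (None = no response)."""
    answered = [r for r in responses if r is not None]
    return sum(answered) / len(answered)

def difficulty_label(p):
    """Rough scale from the slide; boundary handling is an assumption."""
    if p >= 0.80:
        return "moderately easy to very easy"
    if p <= 0.30:
        return "moderately difficult to very difficult"
    return "moderate"

item = [1, 1, 0, 1, None]          # hypothetical scored responses
p = item_difficulty(item)
print(p, difficulty_label(p))      # 0.75 moderate
```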

Item Discrimination
- Discrimination can be computed using a correlation
- This shows the relationship between a single item and the total test
- It is expected that students with high total scores answer the item correctly
- r_pb = point-biserial correlation between item score and total score

Item Discrimination
- Corrected point-biserial correlation: a statistic similar to the point-biserial correlation
- The score on the individual item is removed from the total score, so the item's own contribution is excluded from the correlation
- This statistic represents item discrimination more accurately
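The difference between the two correlations can be sketched in Python as follows. The point-biserial is computed here as an ordinary Pearson correlation between the 0/1 item score and the total score; the corrected version correlates the item with the total minus that item. The data are hypothetical, and the sketch assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes both lists have nonzero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(item, totals):
    return pearson(item, totals)

def corrected_point_biserial(item, totals):
    # Remove the item's own score from each total before correlating
    rest = [t - i for i, t in zip(item, totals)]
    return pearson(item, rest)

item = [1, 1, 1, 1, 0]      # hypothetical 0/1 scores on one item
totals = [4, 3, 2, 1, 0]    # hypothetical total test scores
print(round(point_biserial(item, totals), 3))            # 0.707
print(round(corrected_point_biserial(item, totals), 3))  # 0.514
```

The corrected value is lower in this example because the item no longer correlates with itself inside the total.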

Item Discrimination: two ability groups (upper and lower) approach
- The median score is used to divide the students into two groups
- Discrimination coefficient (D-value) = proportion correct in the upper group - proportion correct in the lower group
- Ranges from -1 to 1
- A higher, positive D-value indicates a well-discriminating item
- A negative D-value means the lower achieving group did better on the item than the higher achieving group, indicating a poor item

Item Discrimination
- D can range from -1 to 1
- A rough scale of item discrimination (D):
  .30 and above: moderate to high discrimination
  0 to .30: little to no discrimination
  0 and below: negative discrimination (unwanted)
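A hedged Python sketch of the two-group D-value follows. It splits examinees into top and bottom halves by total score; dropping the middle examinee when the count is odd, and the arbitrary ordering of tied totals, are assumptions of this sketch rather than documented behavior of the SAS macro.

```python
def d_value(item_scores, total_scores):
    """Proportion correct in the upper half minus that in the lower half."""
    n = len(total_scores)
    half = n // 2
    if half == 0:
        raise ValueError("need at least two examinees")
    # Indices of examinees ordered from highest to lowest total score
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:half], order[-half:]
    p_upper = sum(item_scores[i] for i in upper) / half
    p_lower = sum(item_scores[i] for i in lower) / half
    return p_upper - p_lower

def discrimination_label(d):
    """Rough scale from the slide."""
    if d >= 0.30:
        return "moderate to high discrimination"
    if d > 0:
        return "little to no discrimination"
    return "negative discrimination (unwanted)"

totals = [4, 3, 2, 1, 0]    # hypothetical total scores
item = [1, 1, 0, 0, 0]      # hypothetical 0/1 scores on one item
d = d_value(item, totals)
print(d, discrimination_label(d))  # 1.0 moderate to high discrimination
```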

Item Difficulty and Discrimination
Relationship between item difficulty and discrimination:
- There can be little discrimination if nearly everyone gets the item right, or if nearly everyone gets the item wrong
- There can be maximum discrimination if about half the people get the item right and about half get it wrong

Item Difficulty and Discrimination (figure: relationship between item difficulty and potential discrimination; axes: Item Difficulty from 0 to 1.0, Max Discrimination from 0 to 1.0)

Alpha If an Item Is Deleted
- "Alpha If Deleted" shows what would happen to the internal consistency of the test if the item were deleted
- When the test_alpha_deleted coefficient goes up compared with the original test alpha, the test would be more reliable without that item (the item can be removed from the test)
- When the test_alpha_deleted coefficient goes down, deleting the item would hurt reliability, which indicates that the item is a good item
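The idea can be sketched in Python by recomputing Cronbach's alpha with one item's column removed. This is an illustration under the standard alpha formula, not the SAS macro; the data matrix and function names are hypothetical.

```python
import statistics

def cronbach_alpha(matrix):
    """Standard alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(matrix[0])
    item_vars = [statistics.pvariance([row[j] for row in matrix])
                 for j in range(k)]
    total_var = statistics.pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_deleted(matrix, item_index):
    """Alpha of the test after dropping the item in column item_index."""
    reduced = [row[:item_index] + row[item_index + 1:] for row in matrix]
    return cronbach_alpha(reduced)

# Hypothetical scored responses: 5 examinees, 4 items
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
original = cronbach_alpha(data)
for j in range(len(data[0])):
    diff = original - alpha_if_deleted(data, j)
    # A negative difference means alpha would rise without the item
    print(f"item {j + 1}: alpha difference = {diff:.4f}")
```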

Statistical Concepts of Distractor Analysis

Distractor Analysis
- Used to determine which distractors students find attractive
- Consider the proportion of (total) students choosing each option, or
- Compare the number of examinees selecting each option in the High and Low groups

Example: proportion of total examinees selecting each option
         a*     b      c      d
Total    .78    .11    .03    .08

Selecting upper and lower groups
Upper and Lower groups are needed:
- to hand-compute D-values, and
- for distractor analysis when comparing numbers of examinees
To select Upper and Lower groups:
- arrange the tests by total score
- separate out the tests for each group
- the top half becomes the Upper group, and the bottom half becomes the Lower group

Selecting upper and lower groups
Upper and Lower groups are needed:
- to hand-compute D-values, and
- for distractor analysis when comparing numbers of examinees
To select Upper and Lower groups:
- Upper group: top half (50%) or top 33%
- Lower group: bottom half (50%) or bottom 33%
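The group selection and option tally can be sketched in Python as below. The `fraction` parameter (0.5 or 0.33, matching the two choices on the slide), the function name, and the sample data are all hypothetical; ties in total score are ordered arbitrarily in this sketch.

```python
from collections import Counter

def option_counts_by_group(choices, totals, fraction=0.5):
    """Count the options chosen in the top and bottom `fraction` of examinees."""
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    size = int(len(order) * fraction)
    if size == 0:
        raise ValueError("group size is zero; use more examinees")
    upper = Counter(choices[i] for i in order[:size])
    lower = Counter(choices[i] for i in order[-size:])
    return upper, lower

# Hypothetical data: the option each examinee chose, and total test scores
choices = ["a", "a", "b", "a", "c", "d", "a", "b"]
totals = [20, 18, 17, 15, 9, 8, 7, 5]

upper, lower = option_counts_by_group(choices, totals)
for option in "abcd":
    print(option, upper[option], lower[option])

# Proportion of all examinees selecting each option
overall = {opt: n / len(choices) for opt, n in Counter(choices).items()}
print(overall["a"])  # 0.5
```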

Example 1: distractor analysis
1. The capital of Switzerland is
   a. Bern.
   b. Zurich.
   c. Lucerne.
   d. Geneva.
Numbers in the High and Low groups who selected each option (options: a*, b, c, d)
Upper: 13 1
Lower: 3 2 9

Example 2: distractor analysis
2. The most important part of test planning is creating:
   a. sound instruction.
   b. a test blueprint.
   c. an item analysis plan.
   d. the grading curve.
Numbers in the High and Low groups who selected each option (options: a, b*, c, d)
Upper: 1 8
Lower: 2

Example 3: distractor analysis
3. Which type of essay item contains the most explicit instructions to students?
   a. extended response
   b. fixed response
   c. explicit response
   d. restricted response
Numbers in the High and Low groups who selected each option (options: a, b, c*, d)
Upper: 3 1 2 14
Lower: 4 7 8

Statistical Concepts of 2PL IRT model Analysis

Two-Parameter Logistic Model
- Alpha represents item discrimination
  The value is positive
- Beta represents item difficulty, scaled with a mean of 0 and an SD of 1
  Items with negative values = easy items
  Items with positive values = hard items
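For reference, the usual 2PL item response function is P(correct) = 1 / (1 + exp(-alpha * (theta - beta))), where theta is examinee ability. The Python sketch below uses that standard form; whether the SAS macro applies an additional scaling constant (such as 1.7) is not stated in the slides, so none is assumed here, and the function name is hypothetical.

```python
import math

def prob_correct(theta, alpha, beta):
    """2PL probability of a correct response for an examinee of ability theta."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# For an average examinee (theta = 0): an easy item (negative beta)
# versus a hard item (positive beta), same discrimination
print(round(prob_correct(0.0, 1.2, -1.0), 3))  # easy item, above 0.5
print(round(prob_correct(0.0, 1.2, 1.0), 3))   # hard item, below 0.5
print(prob_correct(0.0, 1.2, 0.0))             # 0.5 when theta equals beta
```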

Statistical Concepts of DIF Analysis

Differential Item Functioning
- A major concern in using psychological measures is that they may "work differently" for, or be biased "for or against", a particular group of examinees (e.g., by gender or ethnicity)
- When a test item unfairly favors one group over another, it is said to show differential item functioning, or DIF

Uniform or consistent DIF

Non-uniform or crossing DIF

Mantel-Haenszel chi-square
2x2 contingency table at score level t (item score 0 or 1):

Group       0      1      Total
Reference   B_t    A_t    N_Rt
Focal       D_t    C_t    N_Ft
Total       M_0t   M_1t   T_t

Subscript t = individual raw score

Mantel-Haenszel chi-square
- Controlling for the observed score, we want to see whether the proportion correct for the focal group equals that for the reference group on an item
- The MH statistic is built from a series of 2x2 contingency tables
- MH = 1: no DIF
- MH < 1: DIF favoring the focal group (dummy = 0) if p < .05
- MH > 1: DIF favoring the reference group (dummy = 1) if p < .05
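The MH value interpreted above corresponds to the Mantel-Haenszel common odds ratio, which in its standard form is the sum over score levels of A_t * D_t / T_t divided by the sum of B_t * C_t / T_t. The Python sketch below assumes the usual cell meanings (A_t: reference correct, B_t: reference incorrect, C_t: focal correct, D_t: focal incorrect); it computes only the odds ratio, not the chi-square test or p-value, and the counts are hypothetical.

```python
def mh_common_odds_ratio(strata):
    """strata: one (A, B, C, D) tuple of counts per raw-score level t."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        if t == 0:
            continue  # skip score levels with no examinees
        numerator += a * d / t
        denominator += b * c / t
    # Raises ZeroDivisionError if no stratum has both B_t and C_t nonzero
    return numerator / denominator

# Hypothetical counts at two score levels
no_dif = [(10, 10, 5, 5)]                      # equal odds in both groups
favors_reference = [(15, 5, 10, 10), (8, 2, 6, 4)]

print(mh_common_odds_ratio(no_dif))                      # 1.0
print(round(mh_common_odds_ratio(favors_reference), 2))  # 2.89
```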

Field Test Analyses

Test Statistics for Three Subjects

STATISTIC   Precalculus   Anatomy   Phy-Sci
N           210           269       183
USED_N
MEAN        9.748         12.364    12.852
STD         2.978         3.337     4.141
MIN         2             4         4
Q1          8             10        10
MEDIAN      10            12        13
Q3          11            15        16
MAX         20            21        25
SKEWNESS    0.378         -0.102    0.088
KURTOSIS    0.679         -0.447    -0.658
ALPHA       0.506         0.533     0.626
SEM         2.093         2.281     2.531

Item Difficulty

Pre-calculus (21 items)
Item difficulty   Items
0-0.1
0.1-0.2           (2 items) 19, 3
0.2-0.3           (2 items) 14, 1
0.3-0.7           (14 items) 21, 10, 11, 18, 12, 20, 8, 16, 17, 15, 14, 6, 13, 2
0.7-0.8           (3 items) 7, 5, 9
0.8-0.9
0.9-1.0

Anatomy (27 items)
Item difficulty   Items
0-0.1             (0 items)
0.1-0.2           (2 items) 13, 2
0.2-0.3           (6 items) 16, 8, 27, 3, 10, 20
0.3-0.7           (14 items) 17, 4, 9, 14, 11, 5, 18, 26, 25, 7, 15, 22, 21, 24
0.7-0.8           (3 items) 12, 19, 1
0.8-0.9           (2 items) 23, 6
0.9-1.0

Physical Science (31 items)
Item difficulty   Items
0-0.10            (1 item) 22
0.11-0.20         (2 items) 11, 28
0.21-0.30         (5 items) 16, 27, 9, 6, 20
0.31-0.70         (22 items) 30, 18, 12, 25, 31, 15, 2, 19, 24, 13, 26, 29, 21, 10, 23, 8, 7, 4, 3, 17, 5, 14
0.71-0.80         (1 item) 1
0.81-0.90         (0 items)
0.90-1.00

Item Discrimination (Corrected point-biserial correlation)

Pre-calculus (21 items)
Value            Items
Negative value   (1 item) 19
0-0.10           (2 items) 11, 3
0.10-0.20        (13 items) 9, 10, 2, 18, 8, 5, 17, 21, 1, 13, 14, 4, 12
0.20-0.30        (5 items) 16, 15, 7, 6, 20
Above 0.30       (0 items)

Anatomy (27 items)
Value            Items
Negative value   (3 items) 3, 17, 13
0-0.10           (6 items) 16, 9, 20, 10, 27, 2
0.11-0.20        (9 items) 15, 4, 11, 26, 18, 12, 5, 14, 8
0.21-0.30        (9 items) 7, 1, 25, 22, 23, 21, 24, 6, 19
Above 0.30       (0 items)

Physical Science (31 items)
Value            Items
Negative value   (6 items) 11, 22, 20, 12, 31, 1
0-0.10           (2 items) 21, 5
0.10-0.20        (8 items) 23, 19, 6, 28, 25, 18, 10, 16
0.20-0.30        (6 items) 8, 3, 15, 2, 30, 27
Above 0.30       (9 items) 13, 7, 17, 24, 29, 9, 14, 26, 4

Item Discrimination (Two-Group Approach)

Pre-calculus (21 items)
Value            Items
Negative value   (0 items)
0-0.10           (3 items) 19, 3, 18
0.10-0.20        (7 items) 21, 14, 9, 17, 10, 2, 11
0.20-0.30        (7 items) 1, 5, 12, 13, 8, 7, 16
Above 0.30       (4 items) 15, 20, 4, 6

Anatomy (27 items)
Value            Items
Negative value   (0 items)
0-0.10           (5 items) 13, 3, 16, 17, 2
0.11-0.20        (7 items) 27, 9, 10, 20, 12, 6, 26
0.21-0.30        (11 items) 8, 4, 11, 1, 15, 18, 23, 15, 5, 7
Above 0.30       (5 items) 25, 24, 19, 22, 21

Physical Science (31 items)
Value            Items
Negative value   (2 items) 11, 22
0-0.10           (4 items) 20, 1, 12, 31
0.11-0.20        (8 items) 28, 21, 23, 5, 19, 25, 6, 16
0.21-0.30        (6 items) 10, 3, 8, 27, 18, 9
Above 0.30       (11 items) 15, 30, 2, 17, 13, 24, 14, 7, 29, 26, 4

Alpha Difference (Alpha and Alpha When Deleted)

Pre-Calculus (21 items)
Alpha difference   Items
Negative value     (2 items) 19, 11
0-0.005            (3 items) 3, 9, 10
0.005-0.01         (5 items) 2, 8, 18, 5, 17
Above 0.01         (11 items) 21, 1, 13, 14, 4, 12, 16, 7, 15, 6, 20

Anatomy (27 items)
Alpha difference   Items
Negative value     (7 items) 3, 17, 9, 16, 13, 20, 10
0-0.005            (3 items) 27, 2, 15
0.005-0.01         (4 items) 4, 11, 26, 18
Above 0.01         (13 items) 12, 5, 14, 8, 1, 7, 25, 23, 6, 22, 21, 19, 24

Physical Science (31 items)
Alpha difference   Items
Negative value     (8 items) 11, 20, 12, 31, 22, 1, 21, 5
0-0.005            (6 items) 23, 19, 6, 28, 25, 18
0.006-0.01         (3 items) 10, 16, 8
Above 0.01         (14 items) 3, 15, 2, 27, 30, 13, 7, 17, 24, 9, 29, 14, 26, 4

Item Analysis Summary
- A test with reliability (alpha) below .5 is a cause for concern
- Items that are too hard (e.g., p-value < 0.1 or 0.2) or too easy (e.g., p-value close to 1) may need to be revisited
- Items with a negative discrimination value should be revisited, especially when the two-group discrimination is negative
- Items with a negative alpha difference (original test alpha minus test alpha when the item is deleted) are not good, either

DIF Results: Precalculus Girls = 0 Boys = 1 Favor boys

DIF Results: Precalculus Girls = 0 Boys = 1 Favor girls

DIF Results: Anatomy Girls = 0 Boys = 1 Favor boys

DIF Results: Anatomy Girls = 0 Boys = 1 Favor girls

DIF Results: Anatomy Girls = 0 Boys = 1 Favor boys

DIF Results: Anatomy Girls = 0 Boys = 1 Favor girls

DIF Results: Anatomy Girls = 0 Boys = 1 Favor girls

DIF Results: Physical Science Girls = 0 Boys = 1 Favor boys

DIF Results: Physical Science Girls = 0 Boys = 1 Favor boys

DIF Results: Physical Science Girls = 0 Boys = 1 Favor girls

Distractor Analysis: Typical Problems and Solutions

Precalculus: Item 19
Table of groupB by r19 (frequency, with row percent in parentheses)

groupB        A            B            C            D           Total
LOWER GROUP   14 (27.45)   21 (41.18)   11 (21.57)   5 (9.80)    51
UPPER GROUP   47 (41.23)   32 (28.07)   27 (23.68)   8 (7.02)    114
Total         61           53           38           13          165
Frequency Missing = 76

Precalculus: Item 19
The item is a hard item (p = 0.18)

Precalculus: Item 3
Table of groupB by r3 (frequency, with row percent in parentheses)

groupB        A            B            C*           D          Total
LOWER GROUP   20 (32.26)   29 (46.77)   9 (14.52)    4 (6.45)   62
UPPER GROUP   40 (30.30)   56 (42.42)   30 (22.73)   6 (4.55)   132
Total         60           85           39           10         194
Frequency Missing = 47

The item is a hard item (p = 0.162)
Notes:
- Add items 3 and 1 for worse items
- Add item 21 as a good item with high discrimination and high difficulty
- Add two items with moderate difficulty and high discrimination

Precalculus: Item 3
The item is a hard item (p = 0.19)

Precalculus: Item 1
Table of groupB by r1 (frequency, with row percent in parentheses)

groupB        A            B            C          D*           Total
LOWER GROUP   14 (24.14)   27 (46.55)   5 (8.62)   12 (20.69)   58
UPPER GROUP   23 (17.83)   49 (37.98)   8 (6.20)   49 (37.98)   129
Total         37           76           13         61           187
Frequency Missing = 54

The item is a hard item (p = 0.253)

Precalculus: Item 1
The item is a hard item (p = 0.30)

Precalculus: Item 14
Table of groupB by r14 (frequency, with row percent in parentheses)

groupB        -          A          B          C            D*           Total
LOWER GROUP   5 (5.56)   8 (8.89)   3 (3.33)   57 (63.33)   17 (18.89)   90
UPPER GROUP   5 (4.55)   4 (3.64)   4 (3.64)   59 (53.64)   38 (34.55)   110
Total         10         12         7          116          55           200
Frequency Missing = 10

Precalculus: Item 14
- The item is challenging (p = 0.26)
- Option C may be the potential key
- Or students may have a misconception about this item

Precalculus: Good Item
Table of groupB by r21 (frequency, with row percent in parentheses)

groupB        -            A            B*           C            D            Total
LOWER GROUP   33 (33.33)   18 (18.18)   23 (23.23)   10 (10.10)   15 (15.15)   99
UPPER GROUP   23 (20.72)   16 (14.41)   43 (38.74)   18 (16.22)   11 (9.91)    111
Total         56           34           66           28           26           210

The item is challenging (p = 0.266)
Discriminating well

Precalculus: Good Item
- The item is challenging (p = 0.31)
- Discriminating well
- However, this item shows DIF and favors girls

Summary for Precalculus
- Some items need to be revisited: Items 19, 3, 1, and 14
- Develop some easy items (p = .70-.90)
- Two DIF items: Items 4 and 21

Anatomy: Hard Item
Table of groupB by r3 (frequency, with row percent in parentheses)

groupB        A            B            C*           D            Total
LOWER GROUP   20 (18.52)   24 (22.22)   25 (23.15)   39 (36.11)   108
UPPER GROUP   7 (4.38)     28 (17.50)   48 (30.00)   77 (48.13)   160
Total         27           52           73           116          268
Frequency Missing = 1

The item is a hard item (p = 0.271)
Not discriminating well

Anatomy: Hard Item
- The item is a hard item (p = 0.271)
- Not discriminating well

Anatomy: Potential Miskey
Table of groupB by r16 (frequency, with row percent in parentheses)

groupB        A            B            C           D*           Total
LOWER GROUP   52 (48.15)   27 (25.00)   10 (9.26)   19 (17.59)   108
UPPER GROUP   92 (57.50)   16 (10.00)   12 (7.50)   40 (25.00)   160
Total         144          43           22          59           268
Frequency Missing = 1

The item may have a miskey of Option D
The possible correct key is Option A (the majority of the upper group chose this option)

Anatomy: Potential Miskey
- The item may have a miskey of Option D
- The possible correct key is Option A (the majority of the upper group chose this option)
- Or students may have a misconception about this item
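The check behind this diagnosis can be sketched in Python: flag any distractor that more upper-group examinees chose than the keyed option. The function name is hypothetical and the rule is a simple screening heuristic, not the SAS macro's procedure; the counts are the upper-group frequencies for Anatomy item 16 shown above.

```python
def options_beating_key(upper_counts, key):
    """Return the non-key options chosen by more upper-group examinees than the key."""
    key_count = upper_counts[key]
    return [option for option, count in upper_counts.items()
            if option != key and count > key_count]

# Upper-group frequencies for Anatomy item 16 (keyed option: D)
upper_group = {"A": 92, "B": 16, "C": 12, "D": 40}
print(options_beating_key(upper_group, "D"))  # ['A']
```

A flagged option still needs content review, since the same pattern can come from a shared misconception rather than a wrong key.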

Anatomy: Good Item
Table of groupB by r25 (frequency, with row percent in parentheses)

groupB        A            B            C            D*           Total
LOWER GROUP   13 (11.93)   30 (27.52)   32 (29.36)   34 (31.19)   109
UPPER GROUP   8 (5.03)     22 (13.84)   31 (19.50)   98 (61.64)   159
Total         21           52           63           132          268
Frequency Missing = 1

The item has a moderate difficulty level (p = 0.491)
Discriminating well

Anatomy: Good Item
- The item has a moderate difficulty level (p = 0.491)
- Discriminating well

Summary for Anatomy
- The p-values of the items look good: half of the items are moderately difficult, almost one quarter are easy, and almost one quarter are difficult
- No items have negative discrimination under the two-group approach (a good sign)
- The test alpha is low (0.533)
- DIF: Items 14 and 19 (favoring boys) and Items 15, 22, and 26 (favoring girls)

Physical Science: Item too hard
Table of groupB by r28 (frequency, with row percent in parentheses)

groupB        -            A            B            C            D*           Total
LOWER GROUP   2 (2.33)     19 (22.09)   37 (43.02)   19 (22.09)   9 (10.47)    86
UPPER GROUP   20 (20.62)   4 (4.12)     44 (45.36)   8 (8.25)     21 (21.65)   97
Total         22           23           81           27           30           183

The item is a hard item (p = 0.164)

Physical Science: Item too hard
The item is a hard item (p = 0.164)

Physical Science: Potential Miskey
Table of groupB by r11 (frequency, with row percent in parentheses)

groupB        A            B            C*           D          Total
LOWER GROUP   35 (40.70)   26 (30.23)   17 (19.77)   8 (9.30)   86
UPPER GROUP   66 (68.04)   11 (11.34)   11 (11.34)   9 (9.28)   97
Total         101          37           28           17         183

The item may have a miskey of Option C
The possible correct key is Option A (the majority of the upper group chose this option)

Physical Science: Potential Miskey

Physical Science: Good Item
Table of groupB by r27 (frequency, with row percent in parentheses)

groupB        A            B            C            D*           Total
LOWER GROUP   13 (15.12)   30 (34.88)   34 (39.53)   9 (10.47)    86
UPPER GROUP   9 (9.28)     19 (19.59)   33 (34.02)   36 (37.11)   97
Total         22           49           67           45           183

The item has a moderate difficulty level (p = 0.491)
Discriminating well

Physical Science: Good Item

Summary for Physical Science
- Some items need to be revisited: Items 6, 11, 12, and 22
- Potential miskey item: 11
- Develop some easy items (p = .70-.85)
- DIF: Items 3 and 4 (favoring boys) and Item 7 (favoring girls)