
Chapter 7 Item Analysis "In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through a process known as item analysis." —Linda Crocker

Both the validity and the reliability of any test depend ultimately on the characteristics of its items.

Two Approaches to Item Analysis: Qualitative Analysis and Quantitative Analysis

Qualitative Analysis includes the consideration of content validity (content and form of items), as well as the evaluation of items in terms of effective item-writing procedures.

Quantitative Analysis includes principally the measurement of item difficulty and item discrimination.

§1 Item Difficulty 1. Definition The item difficulty for item i, p_i, is defined as the proportion of examinees who get that item correct.

Though the proportion of examinees passing an item has traditionally been called the item difficulty, this proportion logically should be called item easiness, because the proportion increases as the item becomes easier.

2. Estimation Methods
Method for Dichotomously Scored Items
Method for Polytomously Scored Items
Grouping Method

Method for Dichotomously Scored Items
p = R / N (7.1)
p is the difficulty of a certain item, R is the number of examinees who get that item correct, and N is the total number of examinees.

Example 1 There are 80 high school students taking a science achievement test; 61 students pass item 1 and 32 students pass item 10. Calculate the difficulty of items 1 and 10 separately. Key: .76 and .40
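Formula 7.1 can be sketched in a couple of lines of code; a minimal illustration (the function name is my own):

```python
def item_difficulty(n_correct, n_total):
    """Formula 7.1: p = R / N, the proportion of examinees answering correctly."""
    return n_correct / n_total

# Example 1: 80 students; 61 pass item 1 and 32 pass item 10.
p1 = item_difficulty(61, 80)    # .76 (rounded)
p10 = item_difficulty(32, 80)   # .40
```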

Method for Polytomously Scored Items
p = X̄ / W (7.2)
X̄ is the mean of all examinees' scores on the item; W is the perfect (maximum) score of the item.

Example 2 The perfect score of an open-ended item is 20 points, and the average score of all examinees on this item is 11 points. What is the item difficulty? Key: .55
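A one-line sketch of formula 7.2 (names are my own):

```python
def polytomous_difficulty(mean_score, perfect_score):
    """Formula 7.2: p = mean item score / perfect item score."""
    return mean_score / perfect_score

# Example 2: perfect score 20 points, mean score 11 points.
p = polytomous_difficulty(11, 20)   # .55
```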

Grouping Method (Use of Extreme Groups) Upper (U) and Lower (L) criterion groups are selected from the extremes of the distribution of test scores or job ratings. T. L. Kelley (1939) showed that using the upper and lower 27% is optimal when the total test scores are normally distributed.

p = (P_U + P_L) / 2 (7.3)
P_U is the proportion of examinees in the upper group who get the item correct; P_L is the proportion of examinees in the lower group who get the item correct.

Example 3 There are 370 examinees taking a language test. Given that 64 examinees in the upper 27% group pass item 5 and 33 examinees in the lower 27% group pass the same item, compute the difficulty of item 5. Key: .49
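The grouping method of formula 7.3 can be sketched as follows (the function name and the rounding of the group size are my own choices):

```python
def difficulty_extreme_groups(r_upper, r_lower, n_group):
    """Formula 7.3: p = (P_U + P_L) / 2."""
    return (r_upper / n_group + r_lower / n_group) / 2

# Example 3: 370 examinees; 27% extreme groups of about 100 each.
n_group = round(370 * 0.27)                        # 100 examinees per group
p5 = difficulty_extreme_groups(64, 33, n_group)    # (.64 + .33) / 2 = .485, i.e. about .49
```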

3. Correcting for Chance Effects on Item Difficulty for Multiple-Choice Items
CP = (K × P − 1) / (K − 1) (7.4)
CP is the corrected item difficulty, P is the uncorrected item difficulty, and K is the number of choices for the item.

Example 4 The difficulty of a five-choice item is .50, and the difficulty of a four-choice item is .53. Which item is more difficult?

ANSWER CP for the five-choice item: (5 × .50 − 1) / (5 − 1) = .375. CP for the four-choice item: (4 × .53 − 1) / (4 − 1) ≈ .373. Since .373 < .375, the four-choice item is more difficult.
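The chance correction can be checked in code; a sketch of formula 7.4 (the function name is mine):

```python
def corrected_difficulty(p, k):
    """Formula 7.4: CP = (K*P - 1) / (K - 1)."""
    return (k * p - 1) / (k - 1)

cp5 = corrected_difficulty(0.50, 5)   # .375
cp4 = corrected_difficulty(0.53, 4)   # about .373
# The item with the smaller corrected proportion correct is the harder one,
# so the four-choice item is more difficult.
```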

4. Item Difficulty and Discrimination [Figure: item discrimination plotted against item difficulty]

If there are 100 persons in the population, we can calculate the number of possible discriminations (pass × fail pairs) for various difficulty values as follows:
P = .01: 1 × 99 = 99
P = .02: 2 × 98 = 196
P = .30: 30 × 70 = 2100
P = .50: 50 × 50 = 2500
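The pairs count above can be reproduced with a short sketch (the function name is mine); it shows that the number of discriminations peaks at P = .50:

```python
def n_discriminations(p, n=100):
    """Number of pass/fail pairs an item can separate: n_pass * n_fail."""
    n_pass = round(p * n)
    return n_pass * (n - n_pass)

for p in (0.01, 0.02, 0.30, 0.50):
    print(p, n_discriminations(p))   # 99, 196, 2100, 2500 -- maximal at p = .50
```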

5. Test Difficulty and the Distribution of Test Scores How do we calculate the test difficulty? Two methods:
A. Calculate the mean of all item difficulties of the test.
B. Compute the ratio of the mean of the test scores to the perfect test score.

Test Difficulty and the Distribution of Test Scores [Figure: (a) positively skewed distribution; (b) negatively skewed distribution]

§2 Item Discrimination When the test as a whole is to be evaluated by means of criterion-related validation, the items may themselves be evaluated and selected on the basis of their relationships to the external criterion. When we identify an item for which high-scoring examinees have a high probability of answering correctly and low-scoring examinees have a low probability of answering correctly, we say that such an item discriminates or differentiates among the examinees.

1.Interpretation Item discrimination refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure.

2. Estimation Methods Index of Discrimination (used for dichotomously scored items)
D = P_H − P_L (7.5)
We set one or two cutting scores to divide the examinees into an upper scoring group and a lower scoring group. P_H is the proportion in the upper group who answer the item correctly, and P_L is the proportion in the lower group who answer the item correctly. Values of D may range from −1.00 to +1.00.

Example 1 There are 140 students taking a world history test. (1) If we use the ratio 27% to determine the upper and lower groups, how many examinees are there in each group? (2) If 18 examinees in the upper group answer item 5 correctly, and 6 examinees in the lower group answer it correctly, calculate the discrimination index for item 5. Key: 38 examinees per group; D ≈ .32
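Formula 7.5 is a one-line computation once the extreme groups are formed; a sketch (names are mine):

```python
def discrimination_index(r_upper, r_lower, n_group):
    """Formula 7.5: D = P_H - P_L."""
    return r_upper / n_group - r_lower / n_group

n_group = round(140 * 0.27)                 # 38 examinees in each extreme group
d5 = discrimination_index(18, 6, n_group)   # 12/38, about .32
```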

Example 2 [Table: P_H, P_L, and D for each item of an 8-item job-stress scale, based on 50 examinees' test data]

Guidelines for Interpretation of D Values
D ≥ .40: the item is functioning quite satisfactorily
.30 ≤ D ≤ .39: little or no revision is required
.20 ≤ D ≤ .29: the item is marginal and needs revision
D ≤ .19: the item should be eliminated or completely revised
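The cut-offs above can be wrapped in a small classifier (a sketch; the function name and return strings are my own):

```python
def interpret_d(d):
    """Classify an item by the D guidelines listed above."""
    if d >= 0.40:
        return "functioning quite satisfactorily"
    if d >= 0.30:
        return "little or no revision required"
    if d >= 0.20:
        return "marginal; needs revision"
    return "eliminate or completely revise"
```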

Correlation Indices of Item Discrimination (1) Pearson Product-Moment Correlation Coefficient This coefficient is commonly used to estimate the degree of relationship between item scores and criterion scores.

(2) Point-Biserial Correlation If we use the total test score as the criterion and the item is scored 0 or 1, we can use the following formula:
r_pb = ((X̄_p − X̄_t) / S_t) × sqrt(p / q) (7.6)
X̄_p is the mean test score of those who answer the item correctly; X̄_t is the mean test score of the entire group; S_t is the standard deviation of test scores for the entire group; p is the pass ratio of the item (its difficulty); q is the fail ratio of the item.

Example 3 [Table: test scores and item scores of 15 examinees]

Transformation of Formula 7.6
r_pb = ((X̄_p − X̄_q) / S_t) × sqrt(p × q) (7.7)
X̄_q is the mean test score of those who answer the item incorrectly.
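Formula 7.6 can be computed directly from a score list; a sketch using the standard library (the function name and toy data are my own):

```python
from statistics import mean, pstdev

def point_biserial(total_scores, item_scores):
    """Formula 7.6: r_pb = ((X_p - X_t) / S_t) * sqrt(p / q)."""
    m_t = mean(total_scores)
    s_t = pstdev(total_scores)            # population SD of total scores
    p = mean(item_scores)                 # pass ratio of the item
    q = 1 - p
    m_p = mean([t for t, i in zip(total_scores, item_scores) if i == 1])
    return (m_p - m_t) / s_t * (p / q) ** 0.5

# Toy data: six examinees' total scores and 0/1 item scores.
totals = [10, 9, 8, 7, 6, 5]
items = [1, 1, 1, 0, 0, 0]
r = point_biserial(totals, items)   # about .88
```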

(3) Biserial Correlation Coefficient
r_b = ((X̄_p − X̄_t) / S_t) × (p / y), or equivalently
r_b = ((X̄_p − X̄_q) / S_t) × (p × q / y)
where y is the ordinate of the standard normal curve at the point that divides the distribution into proportions p and q.

(4) Correlation Between Items a) Tetrachoric Correlation Coefficient Each variable is created by dichotomizing an underlying normal distribution. The joint responses to items i and j are laid out in a 2×2 table:

                Item i
                1       0
Item j   1      A       B       A+B
         0      C       D       C+D
                A+C     B+D

The tetrachoric correlation (7.8) is estimated from the four cell frequencies; a common approximation is r_tet ≈ cos(180° / (1 + sqrt(A×D / (B×C)))), with A and D the cells where the two items agree.

b) PHI Coefficient
φ = (A×D − B×C) / sqrt((A+B)(C+D)(A+C)(B+D)) (7.9)
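Formula 7.9 translates directly into code; a sketch (the function name is mine, and I label A and D as the agreement cells, matching the 2×2 table for items i and j):

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Formula 7.9: phi = (AD - BC) / sqrt((A+B)(C+D)(A+C)(B+D)),
    where A and D are the cells on which the two items agree."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Perfect agreement between two items gives phi = 1.0;
# cell counts proportional to the margins give phi = 0.0.
print(phi_coefficient(50, 0, 0, 50))
print(phi_coefficient(25, 25, 25, 25))
```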

Variance for item i
S_i² = p_i × q_i = p_i(1 − p_i) (7.10)

Difficulty and Discrimination [Figure: plot of discrimination D against item difficulty P]

§3 Application Case of Item Analysis 1. Procedures
● Select a representative sample of examinees and administer the test;
● Divide the examinees into an upper 27% (or 30%, etc.) group and a lower 27% group according to their test scores;
● Calculate P_U and P_L, then estimate P and D for each item;
● Compare the responses to the different choices for each item between the upper group and lower group;
● Revise items.

2. Analysis Case [Table: for items 1–4, the number of upper-group and lower-group examinees choosing each option A–D or omitting, together with P and D for each item; the keys are B, A, D, and C respectively]

Choice Analysis
● Whether more examinees choose the correct choice than the wrong choices
● Whether many examinees choose the wrong choices
● Whether more examinees in the upper group than in the lower group choose the correct choice
● Whether more examinees in the upper group than in the lower group choose a wrong choice
● Whether there is any choice that few examinees choose
● Whether there is any item that quite a number of examinees leave unanswered
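Several of the checks above can be automated; a rough sketch (the function, flag wording, and data layout are hypothetical):

```python
def flag_distractors(upper_counts, lower_counts, key):
    """Apply choice-analysis checks to one item.

    upper_counts / lower_counts: dict mapping each choice to the number of
    upper- or lower-group examinees selecting it; key: the correct choice.
    Returns a list of textual flags for choices that behave badly.
    """
    flags = []
    for choice in upper_counts:
        if choice == key:
            # The upper group should choose the key more often than the lower group.
            if upper_counts[choice] <= lower_counts[choice]:
                flags.append(f"key {choice}: upper group not ahead of lower")
        else:
            # A distractor should attract the lower group more than the upper group.
            if upper_counts[choice] > lower_counts[choice]:
                flags.append(f"distractor {choice}: attracts more upper than lower")
            # A distractor nobody picks contributes nothing.
            if upper_counts[choice] + lower_counts[choice] == 0:
                flags.append(f"distractor {choice}: chosen by no one")
    return flags
```

For a healthy item (key chosen mostly by the upper group, distractors mostly by the lower group) the function returns an empty list; any flag it returns points at a choice worth revising.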