Presentation is loading. Please wait.

Presentation is loading. Please wait.

Jeffrey M. Miller & Randall D. Penfield FERA November 19, 2003

Similar presentations


Presentation on theme: "Jeffrey M. Miller & Randall D. Penfield FERA November 19, 2003"— Presentation transcript:

1 Development of a Confidence Interval for Small Sample Expert Review of Item Content Validation
Jeffrey M. Miller & Randall D. Penfield FERA November 19, 2003 University of Florida &

2 INTRODUCING CONTENT VALIDITY
“Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests (AERA/APA/NCME, 1999) Content validity refers to the degree to which the content of the items reflects the content domain of interest (APA, 1954)

3 THE NEED FOR IMPROVED REPORTING
Content is a precursor to drawing a score-based inference. It is evidence-in-waiting (Shepard, 1993; Yalow & Popham, 1983) “Unfortunately, in many technical manuals, content representation is dealt with in a paragraph, indicating that selected panels of subject matter experts (SMEs) reviewed the test content, or mapped the items to the content standards – and all is well (Crocker, 2003)”

4 QUANTIFYING CONTENT VALIDITY
Several indices for quantifying expert agreement have been proposed For many, experts quantify the match of the item to an objective using a rating scale The mean rating across raters is often used in calculations Klein & Kosecoff’s Correlation (1975) Aiken’s V (1985) The mean, by itself, does not account for error and does not tell us how far it lies from the population mean. WE NEED A CONFIDENCE INTERVAL!

5 THE CONFIDENCE INTERVAL
The traditional confidence interval assumes a normal distribution for the sample mean of a rating scale. However, the assumption of population normality can not be justified when analyzing the mean of an individual scale item because 1.) the outcomes of the items are discrete 2.) the items are bounded by the limits of the Likert-scale. 3.) sample size for raters is too small even if the above were not problematic

6 SCORE CONFIDENCE INTERVAL
FOR RATING SCALES The Score confidence interval (Penfield, 2003) treats rating scale variables as outcomes of a binomial distribution. This interval is asymmetric Hence, it is based on the actual distribution for the item of concern. Further, the limits cannot extend below or above the actual limits of the categories.

7 1. Obtain values for n, k, and z
n = the number of raters k= the number of possible ratings The highest rating is scale starts with 0 The highest rating minus 1 if scale starts greater than 0 z = the standard normal variate associated with the confidence level (e.g., +/ at 95% confidence)

8 2. Calculate The sum of the ratings for an item divided by the number of raters

9 3. Calculate p Or if scale begins with 1 then

10 4. Use p to calculate the upper and lower limits for the population proportion (Wilson, 1927)

11 5. Calculate the upper and lower limits of the Score confidence interval

12 Shorthand Example (cont.)
Let n = 10, k = 4, z = 1.96, and let the sum of the items = 31 so, the mean equals 31/10 = 3.100 so, p = 31 / (10*4) = 0.775

13 Shorthand Example (cont.)
= – 1.96*sqrt(0.938/10) = 2.500 = *sqrt(0.421/10) = 3.507

14 = ( – ) / = 0.625 = ( ) / = 0.877

15 Conclusion We are 95% confident that the
population mean rating falls somewhere between and 3.507

16 Rating Frequency for 10 Raters
EXAMPLE WITH 4 ITEMS Rating Frequency for 10 Raters 95% Score CI Item 1 2 3 4 Mean Lower Upper 6 3.60 3.08 3.84 5 3.10 2.50 3.51 2.20 1.59 2.77 2.10 1.50 2.68


Download ppt "Jeffrey M. Miller & Randall D. Penfield FERA November 19, 2003"

Similar presentations


Ads by Google