Improving Content Validity: A Confidence Interval for Small Sample Expert Agreement Jeffrey M. Miller & Randall D. Penfield NCME, San Diego April 13, 2004.


Improving Content Validity: A Confidence Interval for Small Sample Expert Agreement Jeffrey M. Miller & Randall D. Penfield NCME, San Diego April 13, 2004 University of Florida &

INTRODUCING CONTENT VALIDITY
"Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (AERA/APA/NCME, 1999).
Content validity refers to the degree to which the content of the items reflects the content domain of interest (APA, 1954).

THE NEED FOR IMPROVED REPORTING
Content is a precursor to drawing a score-based inference. It is evidence-in-waiting (Shepard, 1993; Yalow & Popham, 1983).
"Unfortunately, in many technical manuals, content representation is dealt with in a paragraph, indicating that selected panels of subject matter experts (SMEs) reviewed the test content, or mapped the items to the content standards…" (Crocker, 2003)

QUANTIFYING CONTENT VALIDITY
Several indices for quantifying expert agreement have been proposed.
The mean rating across raters is often used in calculations.
However, the mean alone does not provide information regarding its proximity to the unknown population mean.
We need a usable inferential procedure to gain insight into the accuracy of the sample mean as an estimate of the population mean.

THE CONFIDENCE INTERVAL
A simple method is to calculate the traditional Wald confidence interval. However, this interval is inappropriate for rating scales:
1. There are too few raters and response categories to assume that population normality holds.
2. There is no reason to believe the distribution should be normal.
3. The rating scale is bounded, and its categories are discrete.
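The boundedness problem can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name wald_interval and the ratings are hypothetical, the scale is assumed to top out at 4, and the interval is taken as the mean plus or minus z times the sample standard deviation (n - 1 denominator) over the square root of n.

```python
import math
import statistics


def wald_interval(ratings, z=1.96):
    """Traditional Wald interval for a mean: mean +/- z * s / sqrt(n).

    Hypothetical helper for illustration; s is the sample standard
    deviation with an n - 1 denominator (an assumption).
    """
    n = len(ratings)
    mean = statistics.mean(ratings)
    standard_error = statistics.stdev(ratings) / math.sqrt(n)
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical ratings from 10 raters on a scale whose highest category is 4.
ratings = [4, 4, 4, 4, 4, 4, 4, 4, 4, 3]
lower, upper = wald_interval(ratings)
print(round(lower, 3), round(upper, 3))
```

For these ratings the upper Wald limit lands above 4, the highest rating a rater can give, which is the kind of impossible value a bounded scale should rule out.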

AN ALTERNATIVE IS THE SCORE CONFIDENCE INTERVAL FOR RATING SCALES
Penfield (2003) demonstrated that the Score method outperformed the Wald interval, especially when:
   the number of raters was small (e.g., ≤ 10)
   the number of categories was small (e.g., ≤ 5)
Furthermore, this interval is asymmetric. It is based on the actual distribution for the mean rating of concern.
Further, the limits cannot extend below or above the actual limits of the categories.

STEPS TO CALCULATING THE SCORE CONFIDENCE INTERVAL
1. Obtain values for n, k, and z
   n = the number of raters
   k = the highest possible rating
   z = the standard normal variate associated with the confidence level (e.g., ±1.96 at 95% confidence)

2. Calculate the mean item rating
   mean rating = (sum of the ratings for an item) / n, where n is the number of raters

3. Calculate p
   p = (sum of the ratings) / (n*k)
   Or, if the scale begins with 1, then p = [formula missing from the transcript]

4. Use p to calculate the upper and lower limits for a confidence interval for population proportion (Wilson, 1927)

5. Calculate the upper and lower limits of the Score confidence interval for the population mean rating
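Steps 1 to 5 can be sketched in Python as follows. This is a minimal sketch, not the authors' code. The function name score_interval is hypothetical, the ratings are assumed to run from 0 to k so that p = sum / (n*k), and the Wilson (1927) limits are written with n*k as the number of trials, an assumption that reproduces the 0.877 and 3.507 reported in the shorthand example.

```python
import math


def score_interval(ratings, k, z=1.96):
    """Score confidence interval for the mean rating of one item.

    Hypothetical helper; assumes ratings lie between 0 and k.
    Returns (mean, lower, upper).
    """
    n = len(ratings)                      # step 1: number of raters
    total = sum(ratings)
    mean = total / n                      # step 2: mean item rating
    p = total / (n * k)                   # step 3: proportion of the maximum

    # Step 4: Wilson limits for p, treating n*k as the number of trials
    # (an assumption, see the lead-in).
    nk = n * k
    root = z * math.sqrt(4 * nk * p * (1 - p) + z ** 2)
    denom = 2 * (nk + z ** 2)
    p_lower = (2 * nk * p + z ** 2 - root) / denom
    p_upper = (2 * nk * p + z ** 2 + root) / denom

    # Step 5: limits for the population mean rating.
    # max(0.0, ...) guards against tiny negative values from rounding.
    lower = mean - z * math.sqrt(max(0.0, k * p_lower * (1 - p_lower)) / n)
    upper = mean + z * math.sqrt(max(0.0, k * p_upper * (1 - p_upper)) / n)
    return mean, lower, upper


# Ratings from the shorthand example: nine raters chose 3, one chose 4.
mean, lower, upper = score_interval([3] * 9 + [4], k=4)
print(round(mean, 2), round(lower, 3), round(upper, 3))
```

Because each limit is solved from the Wilson limits for p, the lower and upper limits here equal k times the corresponding Wilson limit, so they stay inside the range from 0 to k.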

Shorthand Example
Item: 3 + ? = 8
The content of this item represents the ability to add single-digit numbers.
Response options: Strongly Disagree, Disagree, Agree, Strongly Agree
Suppose the expert review session includes 10 raters. The responses are 3, 3, 3, 3, 3, 3, 3, 3, 3, 4.

Shorthand Example
n = 10
k = 4
z = 1.96
Sum of the ratings = 31
Mean rating = 31/10 = 3.10
p = (sum of the ratings) / (n*k), so p = 31 / (10*4) = 0.775

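Shorthand Example (cont.)
Lower limit for p = (2nkp + z^2 - z*sqrt(4nkp(1 - p) + z^2)) / (2(nk + z^2)) = (65.842 - 11.043) / 87.683 = 0.625
Upper limit for p = (2nkp + z^2 + z*sqrt(4nkp(1 - p) + z^2)) / (2(nk + z^2)) = (65.842 + 11.043) / 87.683 = 0.877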

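Shorthand Example (cont.)
Lower limit for the mean = 3.10 - 1.96*sqrt(0.938/10) = 2.500
Upper limit for the mean = 3.10 + 1.96*sqrt(0.432/10) = 3.507
Here 0.938 = k*0.625*(1 - 0.625) and 0.432 = k*0.877*(1 - 0.877), computed from the unrounded limits for p.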

We are 95% confident that the population mean rating falls somewhere between 2.500 and 3.507.

Content Validation
1. Method 1: Retain only items with a Score interval of a particular width, based on
   a. An a priori determination of appropriateness
   b. An empirical standard (25th and 75th percentiles of all widths)
2. Method 2: Retain items based on a hypothesis test that the lower limit is above a particular value
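The two retention rules might be sketched in Python as below. This is a hedged sketch: the function names, the cutoff values, and the interval for item_2 are hypothetical, and only the a priori width rule of Method 1 is shown because the slides do not spell out how the percentile standard is applied. The interval for item_1 is the one from the shorthand example.

```python
def retain_by_width(intervals, max_width):
    """Method 1 (a priori version): keep items whose interval is narrow enough."""
    return [
        item
        for item, (lower, upper) in intervals.items()
        if upper - lower <= max_width
    ]


def retain_by_lower_limit(intervals, cutoff):
    """Method 2: keep items whose lower limit is above a chosen value."""
    return [item for item, (lower, _upper) in intervals.items() if lower > cutoff]


# 95% Score intervals per item: item_1 is the shorthand example,
# item_2 is a made-up interval for illustration only.
intervals = {
    "item_1": (2.500, 3.507),
    "item_2": (1.200, 2.900),
}

print(retain_by_width(intervals, max_width=1.1))     # hypothetical width limit
print(retain_by_lower_limit(intervals, cutoff=2.0))  # hypothetical cutoff
```

With these illustrative cutoffs, both rules keep item_1 and drop item_2. In practice the width limit and the cutoff would be set by the review panel.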

EXAMPLE WITH 4 ITEMS
Columns: Item; Rating Frequency for 10 Raters (ratings 0, 1, 2, 3, 4); Mean; 95% Score CI (Lower, Upper)

Conclusions
1. The Score method provides a confidence interval that does not depend on the normality assumption.
2. It outperforms the Wald interval when the number of raters and the number of scale categories are small.
3. It provides a decision-making method for the fate of items in expert review sessions.
4. Computational complexity can be eased through simple programming in Excel, SPSS, and SAS.

For further reading:
Penfield, R. D. (2003). A score method for constructing asymmetric confidence intervals for the mean of a rating scale item. Psychological Methods, 8.
Penfield, R. D., & Miller, J. M. (in press). Improving content validation studies using an asymmetric confidence interval for the mean of expert ratings. Applied Measurement in Education.