N. Ganesh, Adrijo Chakraborty, Vicki Pineau, and J. Michael Dennis

Slides:



Advertisements
Similar presentations
The Simple Linear Regression Model Specification and Estimation Hill et al Chs 3 and 4.
Advertisements

Chap 8-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 8 Estimation: Single Population Statistics for Business and Economics.
Linear regression models
Objectives (BPS chapter 24)
1 Confidence Interval for a Mean. 2 Given A random sample of size n from a Normal population or a non Normal population where n is sufficiently large.
Introduction to Formal Statistical Inference
Chapter 8 Estimation: Single Population
Ratio Estimation and Regression Estimation (Chapter 4, Textbook, Barnett, V., 1991) 2.1 Estimation of a population ratio:
Statistics 350 Lecture 17. Today Last Day: Introduction to Multiple Linear Regression Model Today: More Chapter 6.
AM Recitation 2/10/11.
Chapter 7 Estimation: Single Population
5-1 Introduction 5-2 Inference on the Means of Two Populations, Variances Known Assumptions.
Chapter Nine Copyright © 2006 McGraw-Hill/Irwin Sampling: Theory, Designs and Issues in Marketing Research.
PARAMETRIC STATISTICAL INFERENCE
Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.
1 Ratio estimation under SRS Assume Absence of nonsampling error SRS of size n from a pop of size N Ratio estimation is alternative to under SRS, uses.
Evaluating generalised calibration / Fay-Herriot model in CAPEX Tracy Jones, Angharad Walters, Ria Sanderson and Salah Merad (Office for National Statistics)
Eurostat Weighting and Estimation. Presented by Loredana Di Consiglio Istituto Nazionale di Statistica, ISTAT.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 14 Comparing Groups: Analysis of Variance Methods Section 14.3 Two-Way ANOVA.
Confidence Interval & Unbiased Estimator Review and Foreword.
1 Mean Analysis. 2 Introduction l If we use sample mean (the mean of the sample) to approximate the population mean (the mean of the population), errors.
Lecture 5 Introduction to Sampling Distributions.
Sampling Design and Analysis MTH 494 Lecture-21 Ossam Chohan Assistant Professor CIIT Abbottabad.
Investigating the Potential of Using Non-Probability Samples Debbie Cooper, ONS.
Lecture 4 Confidence Intervals. Lecture Summary Last lecture, we talked about summary statistics and how “good” they were in estimating the parameters.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 8-1 Chapter 8 Confidence Interval Estimation Business Statistics: A First Course 5 th Edition.
© 2001 Prentice-Hall, Inc.Chap 8-1 BA 201 Lecture 12 Confidence Interval Estimation.
The “Big Picture” (from Heath 1995). Simple Linear Regression.
Statistics for Business and Economics 7 th Edition Chapter 7 Estimation: Single Population Copyright © 2010 Pearson Education, Inc. Publishing as Prentice.
Stats Methods at IC Lecture 3: Regression.
Statistics Statistics is that field of science concerned with the collection, organization, presentation, and summarization of data, and the drawing of.
Statistical Estimation
This will help you understand the limitations of the data and the uses to which it can be put (and the confidence with which you can put it to those.
Application of the Bootstrap Estimating a Population Mean
Sampling Distributions
Chapter 6 Inferences Based on a Single Sample: Estimation with Confidence Intervals Slides for Optional Sections Section 7.5 Finite Population Correction.
Advanced Higher Statistics
Introduction to estimation: 2 cases
Chapter 7 Sampling Distributions.
Introduction to Survey Data Analysis
Ratio and regression estimation STAT262, Fall 2017
Impact evaluation: The quantitative methods with applications
Making Statistical Inferences
Stochastic Hydrology Hydrological Frequency Analysis (II) LMRD-based GOF tests Prof. Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering.
Elementary Statistics: Picturing The World
Methods of Economic Investigation Lecture 12
Chapter 7 Sampling Distributions
Chapter 7 Sampling Distributions.
BOOTSTRAPPING: LEARNING FROM THE SAMPLE
Confidence Interval Estimation
OVERVIEW OF LINEAR MODELS
Chapter 6 Confidence Intervals.
Sampling and Sample Size Calculations
Chapter 7 Sampling Distributions.
LESSON 18: CONFIDENCE INTERVAL ESTIMATION
OVERVIEW OF LINEAR MODELS
The European Statistical Training Programme (ESTP)
Chapter 7 Sampling Distributions.
Chapter: 9: Propensity scores
Chapter 3: Response models
Chapter 8: Estimating with Confidence
Parametric Methods Berlin Chen, 2005 References:
New Techniques and Technologies for Statistics 2017  Estimation of Response Propensities and Indicators of Representative Response Using Population-Level.
Estimating a Population Mean:  Known
Confidence Intervals for the Mean (Large Samples)
Sampling Distributions
Chapter 6 Confidence Intervals.
Chapter 7 Sampling Distributions.
Interval Estimation Download this presentation.
SMALL AREA ESTIMATION FOR CITY STATISTICS
Presentation transcript:

Combining Probability and Non- Probability Samples Using Small Area Estimation N. Ganesh, Adrijo Chakraborty, Vicki Pineau, and J. Michael Dennis August 2017, JSM

Outline Introduction to the Problem Approaches for Combining Probability and Non-Probability Samples Investigation of Two Small Area Models Data Application using an AmeriSpeak Panel Study Summary

Introduction For cost reasons, some studies use a combination of probability and non- probability samples Cost associated with obtaining a larger probability sample and/or Cost associated with obtaining sufficient sample size for low incidence target populations Given the unknown biases associated with a non-probability sample, what method(s) are best for combining a probability sample with a non-probability sample We want more reliable estimates (hence we use the non-probability sample) But we don’t want to introduce “too much” bias

Combining Probability and Non-Probability Samples Different methods to combining Propensity based pseudo weighting methods (Elliott) Model-based methods (Elliott & Valliant, Wang et. al.) Raking / calibration approaches (Fahimi et. al., DiSogra et. al.) We investigated approaches that use small area models (Elliott & Haviland) Assume that the probability sample generates unbiased estimates Assume that the non-probability sample estimates are biased Considered two small area models Model probability sample estimates with non-probability sample estimates as covariates Bivariate model for probability and non-probability sample estimates

Model 1: Fay-Herriot Model Domains are constructed using race, age, education, gender Direct estimates 𝑦 𝑑 𝑃 from probability sample for domain d are unbiased 𝑦 𝑑 𝑃 = 𝛼 𝑑 + 𝑥 𝑑 ′ 𝛾+ 𝑣 𝑑 + 𝑒 𝑑 𝑃 Fixed effect 𝛼 𝑑 is parametrized by main effects for race, age, gender, education 𝑥 𝑑 is a vector of domain-level covariates which includes the non-probability sample estimate 𝑣 𝑑 is a domain-level random effect 𝑒 𝑑 𝑃 are the sampling errors Model-based estimates for domains are derived using a standard prediction approach National-level estimates are obtained by aggregating (by population size) the model-based domain-level estimates

Model 2: Bi-Variate Fay-Herriot Model Direct estimates 𝑦 𝑑 𝑃 from probability sample for domain d are unbiased 𝑦 𝑑 𝑃 = 𝛼 𝑑 + 𝑥 𝑑 ′ 𝛾+ 𝑣 𝑑 + 𝑒 𝑑 𝑃 Direct estimates 𝑦 𝑑 𝑁𝑃 from non-probability sample for domain d are biased 𝑦 𝑑 𝑁𝑃 = 𝛼 𝑑 + 𝛽 𝑑 + 𝑥 𝑑 ′ 𝛾+𝑣 𝑑 + 𝑒 𝑑 𝑁𝑃 Fixed effect 𝛼 𝑑 is parametrized by main effects for race, age, gender, education Bias term 𝛽 𝑑 is parametrized by main effects for race, age, gender, education 𝑥 𝑑 is a vector of domain-level covariates 𝑣 𝑑 is a domain-level random effect 𝑒 𝑑 𝑃 and 𝑒 𝑑 𝑁𝑃 are sampling/non-sampling errors National-level estimates are obtained by aggregating (by population size) the model-based domain-level estimates

Data Application: Food Allergy Study of 18+ Adults ~7,200 probability sample completes Probability sample selected from NORC’s AmeriSpeak Panel ~33,300 non-probability sample completes Non-probability sample obtained from other sample vendors Analyzed 5 measures: Ever had a food allergy Peanut allergy Milk allergy Either biological parent has a food allergy Either biological parent has an environmental allergy Constructed 48 domains: Race by Age by Education by Gender

Model 1: Residual Plots (when modeling “Ever had a Food Allergy”)

Model 1: Ratio of Standard Errors Ratio of standard errors for direct and model estimates for “ever had a food allergy” Domains are ordered based on domain sample size Median ratio of standard error across all domains is 2.1 For 34 domains, the ratio of standard error was > 1.5

Model 1: Difference between Direct & Model Estimates Difference in direct and model estimates for “ever had a food allergy” Domains are ordered based on domain sample size Mean and median difference across all domains was approximately 0

Comparison of National Level Estimates & Confidence Intervals Variable Probability Sample Non-Probability Sample Probability + Non-Probability Sample Model 1 Model 2 Ever had a food allergy 21.6 ± 1.5 28.1 23.9 ± 1.0 21.1 ± 1.0 21.3 ± 0.6 Peanut allergy 1.7 ± 0.4 5.1 2.9 ± 0.3 1.4 ± 0.2 1.4 ± 0.1 Milk allergy 4.4 ± 0.5 6.5 5.2 ± 0.4 4.0 ± 0.4 4.1 ± 0.1 Biological parent has food allergy 11.9 ± 1.1 14.7 12.9 ± 0.7 11.5 ± 0.8 11.5 ± 0.5 Biological parent has environmental allergy 28.6 ± 1.9 29.6 28.9 ± 1.2 28.2 ± 1.2 27.9 ± 0.6

Summary Small area estimation models were used to combine probability and non- probability samples Models indicated reasonable reduction in standard error, especially for domains with smaller sample sizes At the domain-level, probability sample standard errors are … ~ 2x larger than Model 1 standard errors ~ 2.9x larger than Model 2 standard errors Under model assumptions, model estimates are unbiased Potential improvements Incorporate sampling error associated with the non-probability sample estimate in Model 1 Include auxiliary data at domain-level from other national surveys

N. Ganesh nada-ganesh@norc.org