Presenter: Phillip S. Kott

Slides:

Advertisements

Similar presentations

The Multiple Regression Model.

Advertisements

Simple Logistic Regression

NLSCY – Non-response. Non-response There are various reasons why there is non-response to a survey  Some related to the survey process Timing Poor frame.

Analysis of Complex Survey Data Day 5, Special topics: Developing weights and imputing data.

Dr. Chris L. S. Coryn Spring 2012

Beginning the Research Design

The Simple Regression Model

Why sample? Diversity in populations Practicality and cost.

Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides

FINAL REPORT: OUTLINE & OVERVIEW OF SURVEY ERRORS

Volunteer Angler Data Collection and Methods of Inference Kristen Olson University of Nebraska-Lincoln February 2,

Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.

Design Effects: What are they and how do they affect your analysis? David R. Johnson Population Research Institute & Department of Sociology The Pennsylvania.

Types of Data in FCS Survey Nominal Scale – Labels and categories (branch, farming operation) Ordinal Scale – Order and rank (expectations, future plans,

Statistical analysis Prepared and gathered by Alireza Yousefy(Ph.D)

1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.

Chap 1-1 Statistics for Managers Using Microsoft Excel ® 7 th Edition Chapter 1 Defining & Collecting Data Statistics for Managers Using Microsoft Excel.

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Copyright © 2011 Pearson Education, Inc. Analysis of Variance Chapter 26.

Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.

1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.

DATA PREPARATION: PROCESSING & MANAGEMENT Lu Ann Aday, Ph.D. The University of Texas School of Public Health.

A Comparison of Variance Estimates for Schools and Students Using Taylor Series and Replicate Weighting Ellen Scheib, Peter H. Siegel, and James R. Chromy.

Sampling Methods, Sample Size, and Study Power

Chapter 9: Hypothesis Tests Based on a Single Sample 1.

Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 26 Analysis of Variance.

Opinion polls and survey research. What do surveys do? Measure concepts –Provide indicators for concepts Allow us track opinion and test theories.

Weighting and imputation PHC 6716 July 13, 2011 Chris McCarty.

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 1-1 Statistics for Managers Using Microsoft ® Excel 4 th Edition Chapter.

Measurement & Data Collection

Measurements Statistics

Sampling From Populations

Marketing Research Aaker, Kumar, Leone and Day Eleventh Edition

Understanding Sampling Distributions: Statistics as Random Variables

Peter Linde, Interviewservice Statistics Denmark

Working with the ECLS-B Datasets Weights and other issues.

Sampling Why use sampling? Terms and definitions

The Language of Sampling

Graduate School of Business Leadership

Obtaining information on non-responders: a development of the basic question approach for surveys of individuals Patten Smith (Ipsos MORI) Richard Harry.

Meeting-6 SAMPLING DESIGN

Sampling: Theory and Methods

Introduction to Survey Data Analysis

Defining and Collecting Data

Chapter Eight: Quantitative Methods

Information from Samples

Nonresponse Bias in a Nationwide Dual-Mode Survey

CHAPTER 4 Designing Studies

Comparing Groups.

Variables and Measurement (2.1)

2.1 Simple Random Sampling

Chapter 8: Weighting adjustment

CHAPTER 4 Designing Studies

Inference for Relationships

Hindsight Bias Tendency to believe, after learning an outcome, that one would have foreseen it. “I knew.

CHAPTER 4 Designing Studies

Chapter 5: Producing Data

Business Statistics: A First Course (3rd Edition)

CHAPTER 4 Designing Studies

Thinking critically with psychological science

CHAPTER 4 Designing Studies

Field procedures and non-sampling errors

CHAPTER 4 Designing Studies

Defining and Collecting Data

Comparing Two Proportions

Mean vs Median Sampling Techniques

CHAPTER 4 Designing Studies

Defining and Collecting Data

Defining and Collecting Data

Hybrid Estimates for Rare Populations: Probability Surveys Augmented with Targeted Nonprobability Samples Jill A. Dever, PhD 2019 Joint Statistical Meetings.

Presentation transcript:

A Partially Successful Attempt to Integrate a Web-Recruited Cohort into an Address-Based Sample Presenter: Phillip S. Kott Collaborators: Matthew Farrelly and Kian Kamyab with an assist from Joe McMichael : R dk [1 + exp(mkTg)]ck = Tc    Model Calibration Calibration variables variables targets INPS 2017

Overview The Oregon Marijuana Study: an address-based sample (ABS) supplemented by Facebook recruits Adjusting the ABS respondent sample for selection bias At the same time, calibrating the Facebook recruits to the Internet respondents of the ABS sample Testing whether the previous step was appropriate Creating analysis weights and delete-a-group jackknife replicate weights to do estimation Some concluding remarks INPS 2017

The Oregon Marijuana Study An ABS of one adult per Oregon household was given a 20-minute questionnaire on marijuana use and attitudes. Roughly half responded via mail, half Internet More responses were recruited via Facebook. Poor response on race and household size questions. How can we weight the result to draw inferences? (Question was not asked until after the data was collected) INPS 2017

Potential Calibration Variables Sample Size – 1,989 (mail response – 722; mail-to-web – 640; recruit – 627) Missing number of adults in household – over 800 (745 for ABS respondents) Missing race = black – over 1,300 Used to calibrate the ABS sample to the population Missing Age group (six levels) – 3 Missing Sex – 76 Missing Education (three levels) – 173 Added to calibrate recruit cohort to mail-to-web cohort In politics TODAY, do you consider yourself …. Republican, Democrat, Independent, No preference, No or invalid answer (treated as a separate level) INPS 2017

The Selection Model The probability that an Oregon adult was sampled and then responded to the ABS survey is assumed to be a logistic function of three categorical variables: age group, sex, and education level. (Better would be to assume only a probability of response, if the probabilities of selection were known) The probability that an Oregon adult was recruited into the sample via Facebook is assumed to be a logistic function of the above three categorical variables and party affiliation. The population that would respond by Internet when given the chance (represented by the mail-to-web cohort) is assumed to be the same as the population that could be recruited via Facebook. An assumption that will be tested. INPS 2017

SAS/SUDAAN Code Recruit cohort: type = 1; x = 1; z = 1; abs = 0 Mail-to-web cohort: type = 2; x = 0; z = -1; abs = 1 Mail cohort: type = 3; x = 0; z = 0; abs = 1 Proc wtadjx data = d adjust = post design = wr; weight _one_; nest _one_; lowerbd 1; var [ ….]; Class sex age edu party; * after imputing missing values; Model _ONE_ = sex*abs age*abs edu*abs sex*x age*x edu*x party*x/noint; (NOINT = no intercept) Calvars sex*abs age*abs edu*abs sex*z age*z edu*z party*z/noint; Postwgt [population totals for the categories, 16 zeroes]; Vdiffvar type (1,2); INPS 2017

SAS/SUDAAN Code design = WR (with replacement); adjust = post (outside targets); WEIGHT _ONE_ (starts with weights = 1); NEST _ONE_ (no clusters or strata); lowerbd 1 (adjustment factor never less than 1); Find the g such that: R dk [1 + exp(mkTg)] ck = Tc      Starting lowerbd Model Calvar Postwgt weight variables targets (=1) Vdiffvar type (1,2) (difference between estimated means for TYPEs) Wtfinal is the calibrated weight Inverse of bracketed term is the estimated probability of selection. INPS 2017

Variables for VAR Statement Ordered response when item response; Whether there was an item response INPS 2017

Variables for VAR Statement INPS 2017

Variables for VAR Statement INPS 2017

Variables for VAR Statement INPS 2017

Why Party Affiliation? Before Calibration Weighting: Type Type Party Affiliation Facebook Recruit Mail-to-Web Mail Total No answer 14.83 4.06 6.09 Republican 13.88 17.97 22.02 Democrat 25.52 33.75 29.78 Independent 18.82 22.81 21.47 No preference 26.95 21.41 20.64 627 640 722 1989 INPS 2017

Party Affiliation After Calibration Weighting: Type Party Affiliation Type Party Affiliation Facebook Recruit Mail-to-Web Mail Total No answer 2.88 5.25 Republican 17.62 21.59 Democrat 27.98 26.42 Independent 24.05 20.81 No preference 27.47 25.94 1531798 1579221 4642817 INPS 2017

WTADJX’s linearization variance estimator INPS 2017

Holm-Bonferroni Procedure The conservative HB procedure is not only a overall multiple comparison test but also assesses each individual comparison. Sort the 20 (or 40) differences by their p-values. For HB20_.1 (as an example): Difference with lowest p-value out of 20 is significant at .1 level if p-value is less than HB20_.1 critical value (.1/20). Difference with second lowest p-value is significant at .1 level if p-value is less than HB20.1 critical value (.1/19). Continue until first not-significant difference. INPS 2017

Smallest p Values vs Critical Holm-Bonferroni Values VARIABLE Estimated difference p value HB40_.1 HB20_.05 HB20_.1 HB40_.05 More DUI? 0.11 0.00247 0.00250 0.00500 0.001000 Edible MJ in public? -0.23 0.00371 0.00256 0.00526 0.001026 How legal? 0.00658 0.00263 0.00556 0.001053 Adult frequency? -0.13 0.01619 0.00270 0.00588 0.001081 Is edible MJ safer? -0.17 0.02260 0.00278 0.00625 0.001111 Guest use in home? -0.18 0.04079 0.00286 0.00667 0.001143 Is vaping safer? 0.10 0.05260 0.00294 0.00714 0.001176 More teenage use? 0.12 0.08722 0.00303 0.00769 0.001212 Response to vaping Q 0.05 0.09704 0.00313 0.00833 0.001250 INPS 2017

Jackknife Weights (from Kott 2006) Randomly sort ABS and recruit respondent samples. Systematically assign respondents to one of 30 jackknife groups. Create the rth set of jackknife replicate weights by setting the replicate weights of respondents in the rth group to zero and multiply the calibrated weight for respondents outside the group by 30/29. Recalibrate each replicate without a lowerbd. Scale the calibrated and jackknife weights assigned to mail- to-web (by .65) and recruit (by .35) cohorts to eliminate double counting. INPS 2017

Standard-Error Results (ignoring fpc) Computing the standard errors of the 40 differences with jackknife weights (and diffvar) rather than through WTADJX increased SE measures by 4.8% on average (log(SEJK/SEWTADJX)); 6.0% median, interquartile range from 1.0% to 12.1%. This is consistent with theory (linearization tends to underestimate calibrated estimates’ SEs; replication to overestimate) Incorporating the recruit cohort into the ABS sample decreased SEs by 8.6% on average (comparing jackknife SE to jackknife SE); 7.5% median, interquartile range from 4.2% to 12.1%. Using a more traditional jackknife (which is more likely to fail to calibrate) returns nearly the same results. INPS 2017

Some Concluding Remarks Think about analysis before data are collected. Using nonprobability samples relies on assumptions, which need to be clearly stated and tested when possible. Selection modeling is analogous to nonresponse modeling. An estimated difference not being statistically significant does not mean the actual difference is 0. When appropriately calibrated (using WTADJX or an equivalent program in R) the decrease in SE from adding nonprobability samples is less than the sample-size increase implies. INPS 2017

Useful References Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 65–70. Kott, P. (2006). Using calibration weighting to adjust for nonresponse and coverage errors. Survey Methodology, 133–142. RTI International (2012). SUDAAN Language Manual, Release 11.0. Research Triangle Park, NC: RTI International. Singh, A., Dever, J., and Iannacchione, V. (2004). Efficient estimation for surveys with nonresponse follow-up using dual-frame calibration. Proceedings of the American Statistical Association, Section on Survey Research Methods, 3919–3930. Tille, Y. and Matei, A., (2013). Package ‘Sampling.’ A software routine available online at http://cran.r-project.org/web/packages/ sampling/sampling.pdf (procedure: gencalib). INPS 2017