Introduction to secondary analysis of complex survey data

Presentation transcript:

Introduction to secondary analysis of complex survey data Brandon Nakawaki, PhD November 8, 2017

Introduction
- Me
- What is secondary analysis
- What is “complex” survey data

Outline
- Secondary analysis
- Sampling concepts
- Use with statistical programs
Goal: Basic understanding of typical secondary datasets, why weighting must be used with complex survey data, and how to use those weights.
Why this talk?

Secondary analysis
Advantages
- Cost efficient
- Often representative
- Provides a potentially useful comparison
- Sometimes it’s the only data source
- Lots of variables
- Large unweighted sample size
Disadvantages
- Measures not ideal
- Statistical background required

Pooling datasets
- Cross-sectional designs
- Further boost stability of sample
- Different or adjusted weights
- Measures and major design elements must not change

Where to get data
- Interuniversity Consortium for Political and Social Research (ICPSR): www.icpsr.umich.edu
- Simple Online Data Archive for Population Studies (SODAPOP): http://sodapop.pop.psu.edu
- National Data Archive on Child Abuse and Neglect (NDACAN)
- Institutional websites (government, college, contractor, research)
- Project websites
- Other locations

Sampling
Simple random sampling (SRS)
- Equal probability of selection
- Observations are independent and identically distributed
- Basis for most statistics
- Near SRS
Image adapted from Koziol & Arthur (2011)

Sampling
Probability samples: stratification
- Divide into groups, randomly sample within those groups
- Ensures a “good” sample, increases precision
Image adapted from Koziol & Arthur (2011)
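To make the stratification idea concrete, here is a minimal Python sketch. The population, the stratum labels ("urban", "rural"), and the helper name `stratified_sample` are all hypothetical and not from the talk; the sketch only illustrates "divide into groups, randomly sample within those groups."

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample of n_per_stratum units inside each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for u in units:
        by_stratum.setdefault(stratum_of(u), []).append(u)
    sample = []
    for h in sorted(by_stratum):
        # Sampling without replacement within the stratum
        sample.extend(rng.sample(by_stratum[h], n_per_stratum))
    return sample

# Hypothetical population: 80 urban units and 20 rural units
units = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(units, lambda u: u[0], n_per_stratum=5)

# Design weight = stratum size / stratum sample size (inverse selection probability)
design_weight = {"urban": 80 / 5, "rural": 20 / 5}
```

Because rural units are selected at a higher rate than urban units in this sketch, they carry a smaller weight (4 versus 16), which is why an unweighted analysis of such a sample would over-represent them.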

Sampling
Probability samples: clustering
- Randomly sample entire groups
- Convenient, decreases precision
Image adapted from Koziol & Arthur (2011)
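A comparable sketch for clustering, again in Python with made-up data: the clusters ("schools" A through D) and the helper `cluster_sample` are hypothetical. The point is only that whole groups are selected, and every member of a selected group enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly select whole clusters; keep every member of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: clusters[c] for c in chosen}

# Hypothetical clusters (e.g., schools) and their members
schools = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
sampled = cluster_sample(schools, n_clusters=2)
```

Members of the same cluster tend to resemble each other, so the sample carries less independent information than an SRS of the same size, which is the loss of precision the slide refers to.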

Sampling
Probability samples
- Multiple stages of selection
- With or without replacement
- Poststratification
  - May be based on available accurate population totals (e.g., age, sex, geographic region)
  - Strong correlates of key survey variables
  - Predictors of noncoverage
- Oversampling
- May have fewer or more weighting options (strata, PSU, weights, replicates, etc.)
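As a rough illustration of poststratification, the Python sketch below rescales weights so that the weighted total in each group matches a known population total. The function name `poststratify`, the groups, and all numbers are hypothetical; real surveys often use more elaborate adjustments (e.g., raking across several variables).

```python
def poststratify(weights, groups, population_totals):
    """Scale each weight so weighted group totals equal known population totals."""
    weighted_sums = {}
    for w, g in zip(weights, groups):
        weighted_sums[g] = weighted_sums.get(g, 0.0) + w
    return [w * population_totals[g] / weighted_sums[g]
            for w, g in zip(weights, groups)]

# Hypothetical base weights, group membership, and known population totals
base_weights = [10, 10, 20]
groups = ["f", "f", "m"]
population_totals = {"f": 30, "m": 10}

adjusted = poststratify(base_weights, groups, population_totals)
```

After the adjustment, the weights in group "f" sum to 30 and the weight in group "m" equals 10, matching the assumed population totals.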

What to look for in documentation
- Generalizability
- Missing data
- Weights and design variables: “Weight,” “Stratum,” “Strata,” “Cluster,” “Primary sampling unit (PSU)”
- Variance estimation method: “Linearization,” “Taylorization,” “Replicate,” “Fay”

Common weights
Sampling weights
- Make the sample’s marginal distributions look like those of the population from which the sample was drawn
- Often include poststratification adjustment (e.g., person-level nonresponse)

NSDUH 2012

Age      Unweighted      Weighted
12            2,798     4,054,868
13            2,757     4,049,003
14            2,792     4,156,730
15            2,956     4,097,288
16            3,058     4,293,852
17            3,038     4,281,311
Total        17,399    24,933,052

2010 Census: 25,296,465
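The arithmetic behind this slide can be checked with a few lines of Python. The counts are copied from the slide; the interpretation of the Census figure as covering the same age range (12-17) is an assumption, since the slide does not say so explicitly.

```python
# Counts copied from the NSDUH 2012 slide (ages 12-17)
unweighted = {12: 2798, 13: 2757, 14: 2792, 15: 2956, 16: 3058, 17: 3038}
weighted = {12: 4054868, 13: 4049003, 14: 4156730,
            15: 4097288, 16: 4293852, 17: 4281311}
census_2010 = 25296465  # assumed to cover the same ages

n_total = sum(unweighted.values())   # matches the slide total of 17,399
w_total = sum(weighted.values())     # matches the slide total of 24,933,052

# On average, each respondent "stands in" for this many people
avg_weight = w_total / n_total

# How close the weighted total comes to the Census figure
ratio_to_census = w_total / census_2010

print(n_total, w_total)
print(round(avg_weight, 1), round(ratio_to_census, 3))
```

The weighted total lands within about 2% of the Census figure, which is the sense in which sampling weights make the sample "look like" the population.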

Common weights
Design weights/variables
- PSU
- Strata
- Replicate weights

Variance estimation in complex surveys
- Again, not SRS: must account for the sampling design with weights
- Standard errors usually change with weights
- Point estimates may also change
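A toy Python example of why point estimates can change: the outcome values and weights below are invented for illustration and are not from any dataset in the talk.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*y divided by the sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical binary outcome (1 = yes) and sampling weights.
# The two "yes" respondents come from an oversampled group, so their weights are small.
y = [1, 1, 0, 0, 0]
w = [100, 100, 400, 400, 500]

unweighted_estimate = sum(y) / len(y)     # 2 of 5 respondents
weighted_estimate = weighted_mean(y, w)   # 200 of 1,500 weighted persons

print(unweighted_estimate, round(weighted_estimate, 3))
```

Here the unweighted prevalence is 40%, while the weighted prevalence is roughly 13%, because the oversampled respondents represent fewer people in the population.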

Weighted example (n=17,062)

Unweighted example (n=17,062)

Especially notable differences

Variance estimation in complex surveys
- Again, not SRS: must account for the sampling design with weights
- Three common methods
  - Taylor series linearization
  - Replicate weights
  - Model-based estimation

Variance estimation in complex surveys
- Taylor series linearization: uses at least one clustering variable (PSU) and at least one stratification variable
- Replicate weights: uses many replicate weights, usually numbered sequentially (e.g., weight01-weight100)
- Model-based estimation: uses clustering and stratification variables in multilevel modeling

Variance estimation with linearization
Taylor series linearization
- Typically has stratum variable, cluster (PSU) variable, one sampling weight to use at a time
- Sometimes multiple sampling weights are available for use under different circumstances
- If stratum, cluster variables are available, assume Taylor linearization (or check with curator)
- Do not assume if only sampling weight available
- Subpopulation indicator needed
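For readers curious what the software does with the stratum, PSU, and weight variables, the following Python sketch computes a linearized standard error for a weighted mean. It assumes PSUs are treated as sampled with replacement within strata and ignores any finite population correction; the function name and the toy data are hypothetical, and production software handles many more cases (single-PSU strata, subpopulations, multiple stages).

```python
from collections import defaultdict
from math import sqrt

def linearized_se_mean(y, w, strata, psu):
    """Weighted mean and its Taylor-linearized SE (with-replacement PSUs)."""
    w_sum = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum

    # Sum the linearized values w*(y - ybar)/sum(w) within each PSU of each stratum
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, strata, psu):
        psu_totals[h][j] += wi * (yi - ybar) / w_sum

    # Between-PSU variance within each stratum, summed over strata
    variance = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return ybar, sqrt(variance)

# Toy data: one stratum, two PSUs, equal weights
ybar, se = linearized_se_mean(
    y=[1, 3, 5, 7],
    w=[1, 1, 1, 1],
    strata=["s1", "s1", "s1", "s1"],
    psu=["a", "a", "b", "b"],
)
print(ybar, se)
```

In the toy data the two PSU means are 2 and 6, so the variability that matters is between PSUs, not between the four individual observations; that is the core reason clustered data cannot be analyzed as if it were an SRS.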

What is a subpopulation indicator?
- Subpopulation: interested in a specific subgroup of your sample (e.g., adolescents 12-17 years of age in a general population study)
- Indicator: binary variable coded so that
  - 0 = do not include in analysis
  - 1 = include in analysis
- If you are looking at cigarette smokers aged 12-17, code so that
  - 1 = everyone aged 12-17 who smokes cigarettes
  - 0 = everyone 18+
  - 0 = 12-17 year olds who do not smoke cigarettes
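The coding rule in the smoking example can be sketched in Python as follows. The field names `age` and `smokes` and the four respondents are hypothetical.

```python
# Hypothetical respondents from a general population study
respondents = [
    {"age": 13, "smokes": True},
    {"age": 16, "smokes": False},
    {"age": 17, "smokes": True},
    {"age": 25, "smokes": True},
]

def subpop_indicator(r):
    """1 = aged 12-17 and smokes cigarettes; 0 = everyone else."""
    return 1 if 12 <= r["age"] <= 17 and r["smokes"] else 0

indicator = [subpop_indicator(r) for r in respondents]
print(indicator)  # [1, 0, 1, 0]
```

Note that all four respondents stay in the dataset; the indicator only flags who belongs to the subpopulation, so the full design information remains available for variance estimation.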

Variance estimation in complex surveys
- Taylor series linearization: subpopulation indicator necessary
- Replicate weights: subpopulation indicator not used
- Model-based estimation

Software for complex survey analysis
- Stata
- Mplus
- R (packages ‘survey,’ ‘lavaan.survey’)
- SAS
- SUDAAN
- LISREL
- EQS
- WesVar
- SPSS with Complex Samples module (Taylor linearization only)

Software NOT for complex survey analysis
- AMOS
- HLM

SPSS Taylor Linearization Example

Finite population correction

SPSS Taylor Linearization Example: skip the finite population correction unless the sampling fraction n/N is .05 or more
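To see why a small sampling fraction makes the finite population correction negligible, here is a short Python sketch. It uses the standard factor sqrt(1 - n/N) that multiplies a standard error under sampling without replacement; the sample and population sizes are hypothetical.

```python
from math import sqrt

def fpc_factor(n, N):
    """Multiplier applied to a standard error: sqrt(1 - n/N)."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return sqrt(1 - n / N)

# Hypothetical sizes: sampling fraction of 5% versus 0.01%
at_five_percent = fpc_factor(500, 10_000)
tiny_fraction = fpc_factor(100, 1_000_000)

print(round(at_five_percent, 4), round(tiny_fraction, 5))
```

At a 5% sampling fraction the standard error shrinks by only about 2.5%, and for the tiny fraction typical of national surveys the factor is essentially 1, so omitting the correction changes almost nothing.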

SPSS Taylor Linearization Example

/*Setting up sampling plan*/
* Analysis Preparation Wizard.
CSPLAN ANALYSIS
  /PLAN FILE='location\example plan.csaplan'
  /PLANVARS ANALYSISWEIGHT=weight
  /SRSESTIMATOR TYPE=WR
  /PRINT PLAN
  /DESIGN STRATA=stratavar CLUSTER=clustervar
  /ESTIMATOR TYPE=WR.

Don’t use Select Cases

Weighted SPSS Example (n=17,062)

Unweighted SPSS Example (n=17,062)

Especially notable differences

Stata Taylor linearization example

Mplus Taylor linearization example

Variance estimation with replicates
- Jackknife repeated replicates (jkn; jrr; jrrw)
  - Three types of jackknifed replication
  - Certain types may require the application of a multiplier file
  - Pay attention to documentation!
- Balanced repeated replicates (brr; brr-Fay; Fay)
  - If a Fay’s adjustment is needed, documentation should say
  - Fay’s input depends partly on program used
- Typically dozens of replicate weights
- Subpopulation indicator can be used, but not necessary
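The general recipe behind replicate-weight variance estimation is to re-estimate the statistic once per replicate weight and combine the squared deviations from the full-sample estimate. The Python sketch below shows the usual combining rules for a delete-one jackknife (JK1), BRR, and BRR with Fay's adjustment. The function name and the replicate estimates are hypothetical, the JKn case with per-replicate multipliers is deliberately left out, and the dataset documentation remains the authority on which rule applies.

```python
from math import sqrt

def replicate_se(theta_full, theta_reps, method="jk1", fay_k=0.0):
    """Standard error from replicate estimates, centered on the full-sample estimate."""
    R = len(theta_reps)
    ss = sum((t - theta_full) ** 2 for t in theta_reps)
    if method == "jk1":
        variance = (R - 1) / R * ss
    elif method == "brr":
        variance = ss / R
    elif method == "fay":
        # fay_k is Fay's coefficient (0 <= k < 1) given in the documentation
        variance = ss / (R * (1 - fay_k) ** 2)
    else:
        raise ValueError(f"unknown method: {method}")
    return sqrt(variance)

# Hypothetical full-sample estimate and four replicate estimates
theta_full = 10.0
theta_reps = [9.0, 11.0, 10.0, 10.0]

se_jk1 = replicate_se(theta_full, theta_reps, "jk1")
se_brr = replicate_se(theta_full, theta_reps, "brr")
se_fay = replicate_se(theta_full, theta_reps, "fay", fay_k=0.5)
```

The same replicate estimates give different standard errors under each rule, which is why choosing the replicate type that matches the documentation matters.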

Stata replicates example

Stata replicates example If indicated by the documentation

Basic Stata syntax

/*Taylor series linearization*/
svyset [pweight=wtvar], psu(clustervar) strata(stratavar) vce(linearized)

/*Jackknife replicates*/
svyset [pweight=wtvar], jkrw(repwt1-repwtn) vce(jack) mse

Note: Jackknife syntax varies by type (jk1, jk2, jkn). Additional syntax may be needed if more than 1 stratum per PSU. Stata cannot accommodate different numbers of strata in different PSUs.

/*Balanced repeated replicates*/
svyset [pweight=wtvar], brrweight(repwt1-repwtn) vce(brr) mse

Note: Additional syntax may be needed if documentation specifies that a Fay’s adjustment needs to be applied.

Some syntax adapted from Koziol & Arthur (2011)

Basic Mplus syntax

! Taylor series linearization
DATA:
  FILE = "filepath\filename.csv";
VARIABLE:
  NAMES ARE all variable names here in order of appearance in dataset;
  USEVARIABLES ARE stratavar clustervar weightvar outcome predictors and covariates;
  MISSING ARE ALL (missingdatacode);
  SUBPOPULATION IS (indicat eq 1);   ! only if subpopulation analysis
  WEIGHT = weightvar;
  STRATIFICATION = stratavar;
  CLUSTER = clustervar;
ANALYSIS:
  TYPE = COMPLEX;   ! can be combined with other analysis types
MODEL:
  outcome ON predictor;

Some syntax adapted from Koziol & Arthur (2011)

Basic Mplus syntax

! Replicate weights
DATA:
  FILE = "filepath\filename.csv";
VARIABLE:
  NAMES ARE all variable names here in order of appearance in dataset;
  USEVARIABLES ARE weightvar repwt1-repwtn outcome predictors and covariates;
  MISSING ARE ALL (missingdatacode);
  WEIGHT = weightvar;
  REPWEIGHTS = repwt1-repwtn;
ANALYSIS:
  TYPE = COMPLEX;   ! can be combined with other analysis types
  REPSE = JACKKNIFE1;   ! substitute with other replicate type as needed
MODEL:
  outcome ON predictor;

Some syntax adapted from Koziol & Arthur (2011)

SAS

SUDAAN

WesVar

R

Additional introductory resources
- Heeringa, S. G., West, B. T., & Berglund, P. A. (2017). Applied Survey Data Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
- Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: John Wiley & Sons, Inc.
Less novice friendly
- Lohr, S. L. (2009). Sampling: Design and Analysis (2nd ed.). Boston, MA: Brooks/Cole, Cengage Learning.