Introduction to secondary analysis of complex survey data

Brandon Nakawaki, PhD November 8, 2017

Introduction
  Me
  What is secondary analysis
  What is “complex” survey data

Outline
  Secondary analysis
  Sampling concepts
  Use with statistical programs
  Goal: Basic understanding of typical secondary datasets, why weighting must be used with complex survey data, and how to use those weights.
  Why this talk?

Secondary analysis
  Advantages
    Cost efficient
    Often representative
    Provides a potentially useful comparison
    Sometimes it’s the only data source
    Lots of variables
    Large unweighted sample size
  Disadvantages
    Measures not ideal
    Statistical background required

Pooling datasets
  Cross-sectional designs
  Further boost stability of sample
  Different or adjusted weights
  Measures and major design elements must not change

Where to get data
  Inter-university Consortium for Political and Social Research (ICPSR)
  Simple Online Data Archive for Population Studies (SODAPOP)
  National Data Archive on Child Abuse and Neglect (NDACAN)
  Institutional websites (government, college, contractor, research)
  Project websites
  Other locations

Image adapted from Koziol & Arthur (2011)
Sampling
  Simple random sampling (SRS)
    Equal probability of selection
    Observations are independent and identically distributed
    Basis for most statistics
  Near SRS

Sampling
  Probability samples
    Stratification
      Divide into groups, randomly sample within those groups
      Ensures a “good” sample, increases precision

Sampling
  Probability samples
    Clustering
      Randomly sample entire groups
      Convenient, decreases precision
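One common way to put a number on the precision lost to clustering is Kish's approximate design effect, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intraclass correlation. The sketch below is illustrative only: the function names, the cluster size of 20, and the intraclass correlation of 0.05 are assumptions, not values from the talk.

```python
def design_effect(avg_cluster_size, icc):
    """Kish's approximate design effect for roughly equal-sized clusters."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff


# Hypothetical design: 17,062 respondents, about 20 per cluster, ICC of 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(17062, deff)
print(f"deff = {deff:.2f}, effective n = {n_eff:.0f}")
```

A design effect above 1 means the clustered sample carries less information than a simple random sample of the same size, which is why standard errors computed as if the data were SRS tend to be too small.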

Sampling
  Probability samples
    Multiple stages of selection
    With or without replacement
    Poststratification
      May be based on available accurate population totals (e.g., age, sex, geographic region)
      Strong correlates of key survey variables
      Predictors of noncoverage
    Oversampling
    May have fewer or more weighting options (strata, PSU, weights, replicates, etc.)

What to look for in documentation
  Generalizability
  Missing data
  Weights and design variables
    “Weight” “Stratum” “Strata” “Cluster” “Primary sampling unit (PSU)”
  Variance estimation method
    “Linearization” “Taylorization” “Replicate” “Fay”

Common weights
  Sampling weights
    Make the sample marginals match those of the population from which the sample was drawn
    Often include poststratification adjustment (e.g., person-level nonresponse)

NSDUH 2012
  Age            Unweighted    Weighted
  12             2,…           4,054,868
  13             …             4,049,003
  14             …             4,156,730
  15             …             4,097,288
  16             …             4,293,852
  17             …,038         4,281,311
  Total          17,399        24,933,052
  2010 Census                  25,296,465
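The NSDUH figures above show the basic mechanics: each respondent's sampling weight is roughly the number of people he or she represents, so the weights sum to an estimate of the population size. A minimal Python sketch of the same idea follows, using made-up respondents and weights (none of these values come from NSDUH).

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical respondents: y = 1 if the respondent reports the behavior.
# Each weight is the number of people in the population that respondent represents.
y = [1, 1, 0, 0, 0]
w = [500.0, 500.0, 2000.0, 3000.0, 4000.0]

unweighted = sum(y) / len(y)      # treats every respondent equally
weighted = weighted_mean(y, w)    # accounts for unequal selection probabilities
population_size = sum(w)          # weights sum to the estimated population size

print(unweighted, weighted, population_size)
```

In this toy example the two respondents who report the behavior were oversampled (small weights), so the unweighted proportion overstates the prevalence relative to the weighted one.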

Common weights
  Design weights/variables
    PSU
    Strata
    Replicate weights

Variance estimation in complex surveys
  Again, not SRS – must account for the sampling design with weights
  Standard errors usually change with weights
  Point estimates may also change

Weighted example (n=17,062)

Unweighted example (n=17,062)

Especially notable differences

Again, not SRS – must account for the sampling design with weights
  Three common methods
    Taylor series linearization
    Replicate weights
    Model-based estimation

Taylor series linearization
  Uses at least one clustering variable (PSU) and at least one stratification variable
Replicate weights
  Uses many replicate weights, usually numbered sequentially (e.g., weight01-weight100)
Model-based estimation
  Uses clustering and stratification variables in multilevel modeling

Variance estimation with linearization
  Taylor series linearization
    Typically has a stratum variable, a cluster (PSU) variable, and one sampling weight to use at a time
    Sometimes multiple sampling weights are available for use under different circumstances
    If stratum and cluster variables are available, assume Taylor linearization (or check with curator)
    Do not assume if only a sampling weight is available
    Subpopulation indicator needed
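To make the roles of the stratum, PSU, and weight variables concrete, here is a rough Python sketch of a Taylor-linearized standard error for a weighted mean. It assumes the usual with-replacement approximation (variance comes from differences between PSU totals within each stratum) and uses hypothetical data and names; survey software does the same kind of calculation with many more safeguards.

```python
from collections import defaultdict
from math import sqrt


def linearized_se_of_mean(y, w, strata, psu):
    """Weighted mean and its Taylor-linearized standard error.

    Assumes PSUs are sampled with replacement within strata.
    """
    w_sum = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / w_sum

    # Linearized score of the ratio estimator, summed to PSU totals per stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, strata, psu):
        psu_totals[h][j] += wi * (yi - mean) / w_sum

    variance = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU")
        z_bar = sum(totals.values()) / n_h
        ss = sum((z - z_bar) ** 2 for z in totals.values())
        variance += n_h / (n_h - 1) * ss
    return mean, sqrt(variance)


# Hypothetical data: two strata, two PSUs per stratum, two respondents per PSU.
y = [1, 0, 1, 1, 0, 0, 1, 0]
w = [10, 10, 20, 20, 15, 15, 5, 5]
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

mean, se = linearized_se_of_mean(y, w, strata, psu)
print(f"mean = {mean:.3f}, SE = {se:.3f}")
```

With equal weights, a single stratum, and every respondent treated as its own PSU, this reduces to the familiar SRS standard error s / sqrt(n), which is a useful sanity check.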

What is a subpopulation indicator?
  Subpopulation – interested in a specific subgroup of your sample
    e.g., adolescents 12-17 years of age in a general population study
  Indicator – binary variable coded so that
    0 = do not include in analysis
    1 = include in analysis
  If you are looking at cigarette smokers aged 12-17, code so that
    1 = everyone aged 12-17 who smokes cigarettes
    0 = everyone 18+
    0 = 12-17 year olds who do not smoke cigarettes
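The coding rule above can be sketched in a few lines of Python. The records and the function name are hypothetical; the point is that every respondent keeps a row in the dataset and only the 0/1 flag changes, so the design information (strata, PSUs) for excluded cases remains available for variance estimation.

```python
def subpop_indicator(age, smokes):
    """Return 1 for cigarette smokers aged 12-17, otherwise 0."""
    return 1 if 12 <= age <= 17 and smokes else 0


# Hypothetical records: (age, smokes cigarettes)
records = [(15, True), (16, False), (25, True), (12, True), (40, False)]
indicator = [subpop_indicator(age, smokes) for age, smokes in records]
print(indicator)  # [1, 0, 0, 1, 0]
```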

Taylor series linearization
  Subpopulation indicator necessary
Replicate weights
  Subpopulation indicator not used
Model-based estimation

Software for complex survey analysis
  Stata
  Mplus
  R (packages ‘survey’, ‘lavaan.survey’)
  SAS
  SUDAAN
  LISREL
  EQS
  WesVar
  SPSS with Complex Samples module (Taylor linearization only)

Software NOT for complex survey analysis
  AMOS
  HLM

SPSS Taylor Linearization Example

Finite population correction

Skip unless n/N = .05 or more
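The rule of thumb is easy to check by hand. The sketch below computes the sampling fraction n/N and the usual correction factor sqrt(1 - n/N) that would multiply a standard error; the function names are hypothetical, and the example plugs in the NSDUH sample size (17,399) and weighted total (24,933,052) quoted earlier in the talk.

```python
from math import sqrt


def needs_fpc(n, N, threshold=0.05):
    """True when the sampling fraction n/N reaches the rule-of-thumb threshold."""
    return n / N >= threshold


def fpc_factor(n, N):
    """Multiplier applied to a standard error: sqrt(1 - n/N)."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return sqrt(1.0 - n / N)


# Sample size and weighted population total from the NSDUH 2012 slide.
n, N = 17_399, 24_933_052
print(f"n/N = {n / N:.5f}, apply FPC: {needs_fpc(n, N)}, factor = {fpc_factor(n, N):.5f}")
```

For a national survey like this one the sampling fraction is far below .05 and the factor is essentially 1, which is why the correction can usually be skipped.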

/*Setting up sampling plan*/
* Analysis Preparation Wizard.
CSPLAN ANALYSIS
  /PLAN FILE='location\example plan.csaplan'
  /PLANVARS ANALYSISWEIGHT=weight
  /SRSESTIMATOR TYPE=WR
  /PRINT PLAN
  /DESIGN STRATA=stratavar CLUSTER=clustervar
  /ESTIMATOR TYPE=WR.

Don’t use Select Cases

Weighted SPSS Example (n=17,062)

Unweighted SPSS Example (n=17,062)

Especially notable differences

Stata Taylor linearization example

Mplus Taylor linearization example

Variance estimation with replicates
  Jackknife repeated replicates (jkn; jrr; jrrw)
    Three types of jackknifed replication
    Certain types may require the application of a multiplier file
    Pay attention to documentation!
  Balanced repeated replicates (brr; brr-Fay; Fay)
    If a Fay’s adjustment is needed, documentation should say
    Fay’s input depends partly on program used
  Typically dozens of replicate weights
  Subpopulation indicator can be used, but not necessary
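The logic shared by these methods is to re-estimate the statistic once per replicate weight and measure how much the replicate estimates scatter around the full-sample estimate. The Python sketch below is a simplified illustration with hypothetical data and function names. It covers only the delete-one jackknife (JK1), BRR, and Fay multipliers; stratified jackknife variants use multipliers supplied in the documentation.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def replicate_variance(full_estimate, replicate_estimates, method="jk1", fay_k=0.0):
    """Variance from replicate estimates, taken around the full-sample estimate."""
    r = len(replicate_estimates)
    ss = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    if method == "jk1":
        return (r - 1) / r * ss
    if method == "brr":
        return ss / r
    if method == "fay":
        return ss / (r * (1.0 - fay_k) ** 2)
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical data: four respondents with equal full-sample weights.
y = [1, 0, 1, 0]
full_weight = [1.0, 1.0, 1.0, 1.0]

# Delete-one jackknife replicate weights: zero out one respondent and
# scale the remaining weights by n / (n - 1).
n = len(y)
replicate_weights = [
    [0.0 if i == dropped else n / (n - 1) for i in range(n)]
    for dropped in range(n)
]

full_estimate = weighted_mean(y, full_weight)
replicate_estimates = [weighted_mean(y, rw) for rw in replicate_weights]
variance = replicate_variance(full_estimate, replicate_estimates, method="jk1")
print(f"estimate = {full_estimate:.3f}, SE = {variance ** 0.5:.3f}")
```

In real datasets the replicate weights are supplied by the data producer, so the analyst's job is only to tell the software which columns they are and which method (and Fay coefficient, if any) the documentation specifies.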

Stata replicates example

Stata replicates example
If indicated by the documentation

Some syntax adapted from Koziol & Arthur (2011)
Basic Stata syntax
/*Taylor series linearization*/
svyset [pweight=wtvar], psu(clustervar) strata(stratavar) vce(linearized)
/*Jackknife replicates*/
svyset [pweight=wtvar], jkrw(repwt1-repwtn) vce(jack) mse
Note: Jackknife syntax varies by type (jk1, jk2, jkn). Additional syntax may be needed if more than 1 stratum per PSU. Stata cannot accommodate different numbers of strata in different PSUs.
/*Balanced repeated replicates*/
svyset [pweight=wtvar], brrweight(repwt1-repwtn) vce(brr) mse
Note: Additional syntax may be needed if documentation specifies that a Fay’s adjustment needs to be applied.

Basic Mplus syntax
! Taylor series linearization
DATA: FILE = "filepath\filename.csv";
VARIABLE: NAMES ARE all variable names here in order of appearance in dataset;
  USEVARIABLES ARE stratavar clustervar weightvar outcome predictors and covariates;
  MISSING ARE ALL (missingdatacode);
  SUBPOPULATION IS (indicat eq 1); ! only if subpopulation analysis
  WEIGHT = weightvar;
  STRATIFICATION = stratavar;
  CLUSTER = clustervar;
ANALYSIS: TYPE = COMPLEX; ! can be combined with other analysis types
MODEL: outcome ON predictor;

Basic Mplus syntax
! Replicate weights
DATA: FILE = "filepath\filename.csv";
VARIABLE: NAMES ARE all variable names here in order of appearance in dataset;
  USEVARIABLES ARE weightvar repwt1-repwtn outcome predictors and covariates;
  MISSING ARE ALL (missingdatacode);
  WEIGHT = weightvar;
  REPWEIGHTS = repwt1-repwtn;
ANALYSIS: TYPE = COMPLEX; ! can be combined with other analysis types
  REPSE = JACKKNIFE1; ! substitute with other replicate type as needed
MODEL: outcome ON predictor;

SUDAAN

WesVar

Additional introductory resources
  Heeringa, S. G., West, B. T., & Berglund, P. A. (2017). Applied Survey Data Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
  Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: John Wiley & Sons, Inc.
  Less novice friendly
    Lohr, S. L. (2009). Sampling: Design and Analysis (2nd ed.). Boston, MA: Brooks/Cole, Cengage Learning.
