1 Introduction to secondary analysis of complex survey data
Brandon Nakawaki, PhD
November 8, 2017

2 Introduction
Me; what is secondary analysis; what is “complex” survey data

3 Outline
Secondary analysis; sampling concepts; use with statistical programs.
Goal: a basic understanding of typical secondary datasets, why weights must be used with complex survey data, and how to use those weights.
Why this talk?

4 Secondary analysis
Advantages: cost efficient; often representative; provides a potentially useful comparison; sometimes it’s the only data source; lots of variables; large unweighted sample size.
Disadvantages: measures not ideal; statistical background required.

5 Pooling datasets
Applies to cross-sectional designs. Pooling further boosts the stability of the sample. It may call for different or adjusted weights. Measures and major design elements must not change across the pooled datasets.

6 Where to get data
Interuniversity Consortium for Political and Social Research (ICPSR); Simple Online Data Archive for Population Studies (SODAPOP); National Data Archive on Child Abuse and Neglect (NDACAN); institutional websites (government, college, contractor, research); project websites; other locations.

7 Sampling
Simple random sampling (SRS): equal probability of selection; observations are independent and identically distributed; basis for most statistics. Near SRS.
Image adapted from Koziol & Arthur (2011)

8 Sampling
Probability samples: stratification. Divide the population into groups, then randomly sample within those groups. Ensures a “good” sample and increases precision.
Image adapted from Koziol & Arthur (2011)

9 Sampling
Probability samples: clustering. Randomly sample entire groups. Convenient, but decreases precision.
Image adapted from Koziol & Arthur (2011)

10 Sampling
Probability samples: multiple stages of selection; with or without replacement.
Poststratification: may be based on available accurate population totals (e.g., age, sex, geographic region), strong correlates of key survey variables, or predictors of noncoverage.
Oversampling.
Datasets may have fewer or more weighting options (strata, PSU, weights, replicates, etc.).

11 What to look for in documentation
Generalizability. Missing data.
Weights and design variables: “Weight”, “Stratum”, “Strata”, “Cluster”, “Primary sampling unit (PSU)”.
Variance estimation method: “Linearization”, “Taylorization”, “Replicate”, “Fay”.

12 Common weights
Sampling weights: make the sample marginals look like those of the population from which the sample was drawn. They often include a poststratification adjustment (e.g., for person-level nonresponse).
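As a rough illustration of how a sampling weight restores population marginals, the Python sketch below assumes a hypothetical two-stratum design. The stratum names, population sizes, and sample sizes are invented for illustration and are not taken from NSDUH; the weight is simply the inverse of the selection probability, without the nonresponse or poststratification adjustments that real files usually add.

```python
# Hypothetical design: rural residents are oversampled relative to their
# share of the population. All numbers are invented for illustration.
strata = {
    "urban": {"population": 8000, "sample": 40},
    "rural": {"population": 2000, "sample": 40},
}


def base_weight(population, sample):
    """Inverse of the selection probability sample / population."""
    return population / sample


weights = {
    name: base_weight(s["population"], s["sample"]) for name, s in strata.items()
}

# Unweighted, rural respondents make up half of the sample.
total_sample = sum(s["sample"] for s in strata.values())
unweighted_rural_share = strata["rural"]["sample"] / total_sample

# Weighted, each respondent counts for the number of people they represent.
weighted_counts = {name: weights[name] * s["sample"] for name, s in strata.items()}
weighted_total = sum(weighted_counts.values())
weighted_rural_share = weighted_counts["rural"] / weighted_total

print(unweighted_rural_share, weighted_rural_share, weighted_total)
```

In this toy design the weighted rural share matches the population share, while the unweighted share reflects only how the sample was drawn.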

13 NSDUH 2012
Age           Unweighted    Weighted
12            2,…           4,054,868
13            …             4,049,003
14            …             4,156,730
15            …             4,097,288
16            …             4,293,852
17            …,038         4,281,311
Total         17,399        24,933,052
2010 Census                 25,296,465

14 Common weights
Design weights/variables: PSU; strata; replicate weights.

15 Variance estimation in complex surveys
Again, not SRS: the sampling design must be accounted for with weights. Standard errors usually change with weights. Point estimates may also change.

16 Weighted example (n=17,062)

17 Unweighted example (n=17,062)

18 Especially notable differences

19 Variance estimation in complex surveys
Again, not SRS: the sampling design must be accounted for with weights.
Three common methods: Taylor series linearization; replicate weights; model-based estimation.

20 Variance estimation in complex surveys
Taylor series linearization: uses at least one clustering variable (PSU) and at least one stratification variable.
Replicate weights: uses many replicate weights, usually numbered sequentially (e.g., weight01-weight100).
Model-based estimation: uses clustering and stratification variables in multilevel modeling.

21 Variance estimation with linearization
Taylor series linearization typically uses a stratum variable, a cluster (PSU) variable, and one sampling weight at a time. Sometimes multiple sampling weights are available for use under different circumstances. If stratum and cluster variables are available, assume Taylor linearization (or check with the curator). Do not assume it if only a sampling weight is available. A subpopulation indicator is needed for subgroup analyses.
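To make the roles of the stratum, PSU, and weight variables concrete, here is a minimal Python sketch of a linearized variance for a weighted mean. It assumes a with-replacement approximation and at least two PSUs in every stratum; the function name and the row layout are hypothetical, and survey software handles many cases (finite population corrections, single-PSU strata, subpopulations) that this sketch does not.

```python
from collections import defaultdict


def linearized_variance_of_mean(rows):
    """rows: iterable of (stratum, psu, weight, y). Returns (mean, variance)."""
    rows = list(rows)
    total_weight = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_weight

    # Linearized score of the weighted mean, summed to PSU totals per stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / total_weight

    # Between-PSU variability within each stratum, added across strata.
    variance = 0.0
    for stratum, totals in psu_totals.items():
        n_psu = len(totals)
        if n_psu < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two PSUs")
        z_bar = sum(totals.values()) / n_psu
        variance += n_psu / (n_psu - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return mean, variance


# Sanity check: one stratum, every observation its own PSU, equal weights.
# The result should reduce to the familiar s^2 / n of simple random sampling.
srs_like = [("all", i, 1.0, y) for i, y in enumerate([1, 2, 3, 4, 5])]
srs_mean, srs_variance = linearized_variance_of_mean(srs_like)
print(srs_mean, srs_variance)
```

The variance is built from differences between PSU totals inside each stratum, which is why a dataset that ships only a sampling weight cannot support this method.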

22 What is a subpopulation indicator?
Subpopulation: a specific subgroup of your sample that you are interested in, e.g., adolescents 12-17 years of age in a general population study.
Indicator: a binary variable coded so that 0 = do not include in analysis and 1 = include in analysis.
If you are looking at cigarette smokers aged 12-17, code so that 1 = everyone aged 12-17 who smokes cigarettes, 0 = everyone 18+, and 0 = 12-17 year olds who do not smoke cigarettes.
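The coding rule above can be sketched in a few lines of Python. The function name and the toy respondent list are hypothetical; the point is only that every respondent receives a 0 or a 1 and nobody is dropped from the file.

```python
def subpop_indicator(age, smokes_cigarettes):
    """Return 1 for cigarette smokers aged 12-17, otherwise 0."""
    return 1 if 12 <= age <= 17 and smokes_cigarettes else 0


# Hypothetical respondents: (age, smokes_cigarettes)
respondents = [(15, True), (16, False), (30, True), (12, True), (18, True)]
indicators = [subpop_indicator(age, smokes) for age, smokes in respondents]
print(indicators)  # [1, 0, 0, 1, 0]
```

The indicator is handed to the survey procedure instead of being used to delete cases, because the variance calculation still needs the design information carried by the excluded respondents.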

23 Variance estimation in complex surveys
Taylor series linearization: subpopulation indicator necessary.
Replicate weights: subpopulation indicator not used.
Model-based estimation.

24 Software for complex survey analysis
Stata; Mplus; R (packages ‘survey’, ‘lavaan.survey’); SAS; SUDAAN; LISREL; EQS; WesVar; SPSS with Complex Samples module (Taylor linearization only).

25 Software NOT for complex survey analysis
AMOS; HLM

26 SPSS Taylor Linearization Example

27 SPSS Taylor Linearization Example

28 SPSS Taylor Linearization Example

29 SPSS Taylor Linearization Example

30 SPSS Taylor Linearization Example

31 Finite population correction

32 Finite population correction

33 SPSS Taylor Linearization Example
Skip the finite population correction unless the sampling fraction n/N is .05 or more.
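A short Python sketch shows why a small sampling fraction makes the correction negligible. It assumes the common form in which the standard error is multiplied by sqrt(1 - n/N); the function name is hypothetical, and some texts write the factor slightly differently.

```python
import math


def fpc_factor(sample_size, population_size):
    """Multiplier applied to a standard error: sqrt(1 - n/N)."""
    return math.sqrt(1 - sample_size / population_size)


# At the .05 threshold the standard error shrinks by roughly 2.5%.
at_threshold = fpc_factor(50, 1000)

# With a sampling fraction far below .05 the factor is essentially 1.
tiny_fraction = fpc_factor(1_000, 10_000_000)

print(at_threshold, tiny_fraction)
```

For national surveys, where the sample is a tiny share of the population, leaving the correction out changes the standard errors very little.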

34 SPSS Taylor Linearization Example

35 SPSS Taylor Linearization Example
/* Setting up sampling plan */
* Analysis Preparation Wizard.
CSPLAN ANALYSIS
  /PLAN FILE='location\example plan.csaplan'
  /PLANVARS ANALYSISWEIGHT=weight
  /SRSESTIMATOR TYPE=WR
  /PRINT PLAN
  /DESIGN STRATA=stratavar CLUSTER=clustervar
  /ESTIMATOR TYPE=WR.

36 SPSS Taylor Linearization Example

37 SPSS Taylor Linearization Example

38 SPSS Taylor Linearization Example

39 SPSS Taylor Linearization Example

40 Don’t use Select Cases

41 Weighted SPSS Example (n=17,062)

42 Unweighted SPSS Example (n=17,062)

43 Especially notable differences

44 Stata Taylor linearization example

45 Stata Taylor linearization example

46 Stata Taylor linearization example

47 Stata Taylor linearization example

48 Stata Taylor linearization example

49 Stata Taylor linearization example

50 Stata Taylor linearization example

51 Stata Taylor linearization example

52 Stata Taylor linearization example

53 Mplus Taylor linearization example

54 Mplus Taylor linearization example

55 Variance estimation with replicates
Jackknife repeated replicates (jkn; jrr; jrrw): three types of jackknifed replication. Certain types may require the application of a multiplier file. Pay attention to documentation!
Balanced repeated replicates (brr; brr-Fay; Fay): if a Fay’s adjustment is needed, the documentation should say so. The Fay’s input depends partly on the program used.
Typically dozens of replicate weights. A subpopulation indicator can be used, but is not necessary.
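The idea behind replicate weights can be sketched in Python: the estimate is recomputed once per replicate weight, and the spread of those estimates around the full-sample estimate gives the variance. The sketch assumes the textbook multipliers for a delete-one jackknife (JK1) and for balanced repeated replication with an optional Fay coefficient; the function names and toy data are hypothetical, and the multiplier for a real dataset should come from its documentation.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_variance(full_estimate, replicate_estimates, method="jk1", fay=0.0):
    """Variance from squared deviations around the full-sample estimate."""
    r = len(replicate_estimates)
    squared = sum((e - full_estimate) ** 2 for e in replicate_estimates)
    if method == "jk1":
        return (r - 1) / r * squared
    if method == "brr":
        # fay=0.0 gives ordinary BRR; 0 < fay < 1 applies Fay's adjustment.
        return squared / (r * (1 - fay) ** 2)
    raise ValueError(f"unknown method: {method}")


# Toy data: five observations, each treated as its own PSU, equal weights.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
full_weights = [1.0] * len(y)
n = len(y)

# Delete-one replicate weights: drop unit r, rescale the rest by n / (n - 1).
replicate_weights = [
    [0.0 if i == r else w * n / (n - 1) for i, w in enumerate(full_weights)]
    for r in range(n)
]

full_estimate = weighted_mean(y, full_weights)
replicate_estimates = [weighted_mean(y, rw) for rw in replicate_weights]
jk1_variance = replicate_variance(full_estimate, replicate_estimates, "jk1")
print(full_estimate, jk1_variance)
```

Because the design is already folded into the replicate weights, no stratum or PSU variable appears anywhere in the calculation.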

56 Stata replicates example

57 Stata replicates example

58 Stata replicates example
If indicated by the documentation

59 Basic Stata syntax
/* Taylor series linearization */
svyset clustervar [pweight=wtvar], strata(stratavar) vce(linearized)
/* Jackknife replicates */
svyset [pweight=wtvar], jkrweight(repwt1-repwtn) vce(jackknife) mse
Note: Jackknife syntax varies by type (jk1, jk2, jkn). Additional syntax may be needed if more than 1 stratum per PSU. Stata cannot accommodate different numbers of strata in different PSUs.
/* Balanced repeated replicates */
svyset [pweight=wtvar], brrweight(repwt1-repwtn) vce(brr) mse
Note: Additional syntax may be needed if documentation specifies that a Fay’s adjustment needs to be applied.
Some syntax adapted from Koziol & Arthur (2011)

60 Basic Mplus syntax
! Taylor series linearization
DATA:
  FILE = "filepath\filename.csv";
VARIABLE:
  NAMES ARE all variable names here in order of appearance in dataset;
  USEVARIABLES ARE stratavar clustervar weightvar outcome predictors and covariates;
  MISSING ARE ALL (missingdatacode);
  SUBPOPULATION IS (indicat eq 1);   ! only if subpopulation analysis
  WEIGHT = weightvar;
  STRATIFICATION = stratavar;
  CLUSTER = clustervar;
ANALYSIS:
  TYPE = COMPLEX;   ! can be combined with other analysis types
MODEL:
  outcome ON predictor;
Some syntax adapted from Koziol & Arthur (2011)

61 Basic Mplus syntax
! Replicate weights
DATA:
  FILE = "filepath\filename.csv";
VARIABLE:
  NAMES ARE all variable names here in order of appearance in dataset;
  USEVARIABLES ARE weightvar repwt1-repwtn outcome predictors and covariates;
  MISSING ARE ALL (missingdatacode);
  WEIGHT = weightvar;
  REPWEIGHTS = repwt1-repwtn;
ANALYSIS:
  TYPE = COMPLEX;      ! can be combined with other analysis types
  REPSE = JACKKNIFE1;  ! substitute with other replicate type as needed
MODEL:
  outcome ON predictor;
Some syntax adapted from Koziol & Arthur (2011)

62 SAS

63 SUDAAN

64 WesVar

65 WesVar

66 WesVar

67 R

68 Additional introductory resources
Heeringa, S. G., West, B. T., & Berglund, P. A. (2017). Applied Survey Data Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R. Hoboken, NJ: John Wiley & Sons, Inc.
Less novice friendly:
Lohr, S. L. (2009). Sampling: Design and Analysis (2nd ed.). Boston, MA: Brooks/Cole, Cengage Learning.

