1
Melanie Dove, MPH, ScD, UC Davis
QSCERT-PC Postdoc, UCD
Surveys: National Health and Nutrition Examination Survey (NHANES), California Health Interview Survey (CHIS)
Previously: California Department of Public Health, CDC/NCHS
Katherine Heck, MPH, UC San Francisco
Research analyst, UCSF
Surveys: California Maternal and Infant Health Assessment (MIHA) survey, Listening to Mothers-CA
Previously: California Department of Public Health, CDC/NCHS
2
Survey data analysis made easy with SAS
Melanie Dove, MPH, ScD UC Davis Katherine Heck, MPH UC San Francisco
3
Overview
Background
Survey design factors (weight and variance)
How to analyze the data
We will spend some time on the sampling methods and survey design factors, because you need to know a little about the particular survey you’re using in order to analyze its data. We plan to use the California Health Interview Survey (CHIS) and the National Health and Nutrition Examination Survey (NHANES) as examples.
4
Surveys Representativeness: Using a sample of individuals to represent a population
5
Survey data
Different types: health, economic, marketing, sociology, psychology
Cross sectional
Data collection methods: in person, phone, mail, online
6
California Health Interview Survey (CHIS)
Health survey that represents California
Does anyone know the population of California?
California’s population: 39,809,693 (1/1/2018)
Source: State of California, Department of Finance, E-1 Population Estimates for Cities, Counties and the State with Annual Percent Change - January 1, 2017 and Sacramento, California, May 2018.
7
Sampling Convenience Simple random Stratified
8
Sampling
Cluster: sample within specified groups or geographic areas, sometimes called primary sampling units (PSUs). Example: schools within a district.
Stratification: select a specified number of individuals from a particular population group; can be used for oversampling. Example: age group or county.
Define oversampling?
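To make stratified selection concrete, here is a minimal sketch using Proc Surveyselect. The sampling frame, the variable names (county, person), the sample size, and the seed are all hypothetical and are not taken from CHIS or NHANES.

```sas
/* Hypothetical sampling frame: 3 counties (strata), 100 people in each */
data frame;
  do county = 1 to 3;
    do person = 1 to 100;
      output;
    end;
  end;
run;

/* Stratified simple random sample: 10 people drawn within each county.
   The frame is already ordered by county, as the STRATA statement requires.
   The output data set carries SelectionProb and SamplingWeight variables. */
proc surveyselect data=frame out=sample method=srs n=10 seed=2018;
  strata county;
run;
```

Drawing a larger n in one stratum than its population share would justify is one way to oversample a group; the sampling weight then shrinks for that group so the weighted estimates stay representative.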
9
Which one is the stratified sampling scheme?
[Figure: two sampling diagrams, labeled Stratified and Cluster]
10
Variance
Individuals within clusters are similar.
Ignoring that similarity usually underestimates the variance and so overstates significance.
Need to account for the sample design if any stratification, clustering, or weighting was used. If not, the result will be incorrect standard errors and confidence intervals.
SAS survey procedures allow you to indicate design variables in the syntax, and will correct for these effects.
For example, if counties were used as a cluster, individuals within the same county will be similar to each other.
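The effect of clustering on the standard error can be seen with simulated data. This is only a sketch: the data set, the variable names (psu, y), and the size of the shared cluster effect are invented for illustration.

```sas
/* Hypothetical data: 20 clusters of 25 people each.
   People in the same cluster share a cluster-level effect, so they are similar. */
data clustered;
  call streaminit(42);
  do psu = 1 to 20;
    cluster_effect = rand('normal', 0, 2);
    do person = 1 to 25;
      y = 50 + cluster_effect + rand('normal', 0, 1);
      output;
    end;
  end;
run;

/* Ignores the design: treats all 500 rows as independent observations */
proc means data=clustered mean stderr;
  var y;
run;

/* Accounts for the clustering through the CLUSTER statement */
proc surveymeans data=clustered mean stderr;
  cluster psu;
  var y;
run;
```

Both procedures report the same mean. With a shared cluster effect this strong, the standard error from Proc Surveymeans should come out noticeably larger than the one from Proc Means.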
11
Weighting
Weight: a value indicating the number of people the respondent represents
California population: 39,809,693; CHIS respondents: 24,031
When data are weighted, resulting estimates are representative of the population.
Corrects for: differing probability of sampling within clusters or strata; nonresponse
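As a rough piece of arithmetic on the two numbers above: if the weights summed to the whole state population (an assumption made only for illustration; the weights on a given survey file may target a narrower population, such as adults), the average respondent would stand in for about 1,657 people.

```sas
/* Average weight under the stated assumption:
   population total divided by number of respondents */
data _null_;
  population  = 39809693;   /* California population, 1/1/2018 (from the slide) */
  respondents = 24031;      /* CHIS sample size (from the slide) */
  avg_weight  = population / respondents;
  put avg_weight= comma12.1;
run;
```

Actual weights vary from respondent to respondent, because sampling probabilities and nonresponse differ across groups.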
12
Weights Single weight variable -or-
Replicate weights, a series of weight variables which must be used in combination to correctly weight the sample SAS survey syntax in the weighting statement differs slightly depending on the type of weight used
13
SAS survey procedures
Proc Surveyfreq: Frequencies, crosstabs
Proc Surveymeans: Means, medians
Proc Surveyreg: Linear regression
Proc Surveylogistic: Logistic regression
Proc Surveyphreg: Cox proportional hazards model
Proc Surveyselect: Sample selection
Procedures can produce standard errors and confidence intervals. Variables may be continuous or categorical. There are many test statistics you can produce with these, such as chi-square tests and t-tests.
14
Results with and without survey procedures: confidence intervals
Example: CHIS, 2016 adult survey
Weighted percent and confidence interval: ever diagnosed with asthma, age 30-34
Proc Freq results: % (13.85%-13.93%)
Proc Surveyfreq results: 13.89% (9.97%-17.80%)
What do you notice about the percentages and the 95% confidence intervals?
15
Survey components and syntax
Stratification: STRATA statement
Clustering: CLUSTER statement
Weighting: What do you think the weighting statement is called? WEIGHT statement (and REPWEIGHT if using replicate weights)
Subpopulation analyses: DOMAIN statement or “flag” variables. Do not use “where” to subset data. (Proc Surveyfreq has no DOMAIN statement; there, include the subpopulation variable in the TABLES request.)
This is a short workshop, but we want to let you know that there is also syntax to incorporate the Finite Population Correction, which makes a minor adjustment to the standard error based on the fraction of the population that is sampled; some survey data sets include this.
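A small sketch of a subpopulation analysis with the DOMAIN statement follows. The toy data and every variable name in it (stratum, psu, weightvar, female, y) are hypothetical.

```sas
/* Hypothetical toy data: 2 strata, 2 PSUs per stratum, 3 respondents per PSU */
data toy;
  input stratum psu weightvar female y;
  datalines;
1 1 10 1 3
1 1 12 0 5
1 1 11 1 4
1 2 9 0 6
1 2 10 1 2
1 2 13 0 7
2 1 20 1 1
2 1 18 0 4
2 1 22 1 3
2 2 19 0 8
2 2 21 1 2
2 2 20 0 5
;
run;

/* Means of y overall and within each level of female.
   The full sample stays in the analysis, so the design information is intact. */
proc surveymeans data=toy mean stderr;
  strata stratum;
  cluster psu;
  weight weightvar;
  var y;
  domain female;
run;
```

Subsetting with a WHERE statement instead would drop the other respondents before the variance is computed, which can leave strata or PSUs incomplete and give incorrect standard errors.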
16
Survey procedure examples
17
Proc Surveyfreq - stratum/cluster
proc surveyfreq data=dataset varmethod=taylor;
  strata stratum;
  cluster PSU;
  weight weightvar;
  tables agegrp;
run;

proc freq data=dataset;
  tables agegrp;
run;

Basic code. Varmethod=taylor is the default, so you don’t actually need this option. But you do need to know what the variance method is.
18
Proc Surveyfreq - stratum/cluster
Code with extras.
proc surveyfreq nomcar data=dataset total=c.sampfrac;  /* nomcar: missing data; total=: finite population correction */
  strata stratum;
  cluster PSU;
  weight weightvar;
  tables agegrp * disease / row col cl;  /* row %, col %, confidence limits */
  format agegrp agegrpf.;
run;
19
Proc Surveyfreq - replicate weights
proc surveyfreq data=dataset varmethod=jackknife;  /* variance estimation method */
  weight weightvar;                                /* two weighting statements: weight and repweight */
  repweight wtvar1-wtvar80 / JKCOEFS=1;
  tables agegrp * disease / row cl;
  format agegrp agegrpf.;
run;

Is there a statement that appears to be missing from this SAS code? Stratum information is contained in the replicate weights, so you don’t need a separate ‘Strata’ statement if you use the replicate weights. CHIS doesn’t use clustering, so you don’t need a ‘cluster’ statement.
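One way to see that replicate weights carry the design information is to have SAS build them from a design and then reuse them without the design statements. This is a sketch on simulated data; the data set and variable names (toy, toy_rw, jkc, stratum, psu, weightvar, y) are hypothetical, and real survey files such as CHIS ship with their replicate weights already made.

```sas
/* Hypothetical data: 2 strata, 3 PSUs per stratum, 4 people per PSU */
data toy;
  call streaminit(7);
  do stratum = 1 to 2;
    do psu = 1 to 3;
      do person = 1 to 4;
        weightvar = 10 * stratum;
        y = rand('normal', 50, 5);
        output;
      end;
    end;
  end;
run;

/* Step 1: jackknife from the design statements; save the replicate
   weights (RepWt_1, RepWt_2, ...) and the jackknife coefficients */
proc surveymeans data=toy varmethod=jackknife(outweights=toy_rw outjkcoefs=jkc) mean stderr;
  strata stratum;
  cluster psu;
  weight weightvar;
  var y;
run;

/* Step 2: same analysis from the saved replicate weights only;
   no STRATA or CLUSTER statement is given */
proc surveymeans data=toy_rw varmethod=jackknife mean stderr;
  weight weightvar;
  repweights RepWt_: / jkcoefs=jkc;
  var y;
run;
```

The two steps are expected to agree on the mean and its standard error, since the second run simply replays the replicates built in the first.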
20
Libname statement
libname CHIS 'C:\HOW\Heck';

data adult;
  set chis.adult;
run;

Example with NHANES using NHANES variables - actually run in SAS?
21
CHIS age variable
22
Proc Surveyfreq - age
proc surveyfreq data=adult varmethod=?????;
  weight ?????;
  repweight ????? / JKCOEFS=1;
  tables ?????;
run;

Fill in the blanks using:
Variance method = jackknife
Weight variable = rakedw0
Repweight = rakedw1-rakedw80
Age = srage_p1
23
Proc Surveyfreq - age
proc surveyfreq data=adult varmethod=jackknife;
  weight rakedw0;
  repweight rakedw1-rakedw80 / JKCOEFS=1;
  tables srage_p1 / cl;
run;
24
Proc Surveyfreq - Results
25
CHIS: Asthma variable
26
Proc Surveyfreq syntax
proc surveyfreq data=adult varmethod=jackknife;
  weight ?????;
  repweight ????? / JKCOEFS=1;
  tables ????? * ????? / row cl nototal;  /* category (age) * outcome (asthma); nototal: no row/col totals */
run;
27
Proc Surveyfreq syntax
proc surveyfreq data=c.adult varmethod=jackknife;
  weight rakedw0;
  repweight rakedw1-rakedw80 / JKCOEFS=1;
  tables srage_p1 * ab17 / row cl nototal;  /* category (age) * outcome (asthma); nototal: no row/col totals */
run;
28
Proc Surveyfreq output
29
Proc Surveyfreq with chi-square
proc surveyfreq data=c.adult varmethod=jackknife;
  weight rakedw0;                           /* CHIS weight names, as in the other CHIS examples */
  repweight rakedw1-rakedw80 / JKCOEFS=1;
  tables srsex * ab29 / row cl nototal chisq;  /* gender * hypertension; chisq: chi-square test */
run;
30
Proc Surveyfreq output
31
Proc Surveymeans example CHIS 2016, number of times walked for leisure, past 7 days, by family type
proc surveymeans data=c.adult varmethod=JACKKNIFE;
  weight rakedw0;
  repweight rakedw1-rakedw80 / JKCOEFS=1;
  var AD41W;      /* AD41W = how often walked */
  domain FAMT4;   /* domain = group(s) of interest; FAMT4 = family structure */
run;

Can also add the ‘class’ statement to proc surveymeans.
32
Results
33
Proc Surveylogistic example Usual source of care by uninsured, adults 18-64, CHIS 2016
proc surveylogistic data=adult varmethod=JACKKNIFE;
  weight rakedw0;
  repweight rakedw1-rakedw80 / JKCOEFS=1;
  class uninsured (ref='Insured');
  model nousual (descending) = uninsured;
  format uninsured unins.;
run;

We created a variable for having no usual place where the respondent obtained health care when they needed it. The logistic outcome variable is coded 1 if they had no usual source of care, or 0 if they had a usual source.
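The pieces this model relies on (a 0/1 outcome, a formatted class variable, and a reference level named by its formatted value) can be tried on simulated data. Everything below is a hypothetical sketch: the data are random, the percentages are invented, and only a single weight is used rather than CHIS replicate weights.

```sas
/* Format so that ref='Insured' can refer to the formatted value */
proc format;
  value unins 0 = 'Insured' 1 = 'Uninsured';
run;

/* Hypothetical data: 400 people, about 20% uninsured.
   Invented rates: 40% of uninsured and 15% of insured have no usual source. */
data toy;
  call streaminit(2016);
  do id = 1 to 400;
    uninsured = rand('bernoulli', 0.2);
    p = ifn(uninsured = 1, 0.40, 0.15);
    nousual = rand('bernoulli', p);   /* 1 = no usual source of care */
    wt = 1 + 2 * rand('uniform');     /* made-up sampling weight */
    output;
  end;
run;

/* Models the probability that nousual = 1, comparing Uninsured with Insured */
proc surveylogistic data=toy;
  weight wt;
  class uninsured (ref='Insured') / param=ref;
  model nousual (descending) = uninsured;
  format uninsured unins.;
run;
```

The param=ref option is an addition made here, not part of the workshop code. Without it the class variable uses effect coding by default, so the reported coefficient is not the log odds ratio, although the odds ratio table is still the place to read the Uninsured versus Insured comparison.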
34
Proc Surveylogistic results
35
Resources to analyze CHIS data
Analyze CHIS Data website: spx Webinar: chis-data-analysis-webinar-recording/
36
Thank you! Questions?
37
Contact Information Name: Melanie Dove Company: UC Davis City/State: Sacramento, CA Phone:
38
Contact Information Name: Katherine Heck Company: UCSF City/State: San Francisco, CA Phone: