Melanie Dove, MPH, ScD UC Davis


1 Melanie Dove, MPH, ScD, UC Davis
QSCERT-PC Postdoc, UCD
Surveys: National Health and Nutrition Examination Survey (NHANES), California Health Interview Survey (CHIS)
Previously: California Department of Public Health, CDC/NCHS

Katherine Heck, MPH, UC San Francisco
Research analyst, UCSF
Surveys: California Maternal and Infant Health Assessment (MIHA) survey, Listening to Mothers-CA
Previously: California Department of Public Health, CDC/NCHS

2 Survey data analysis made easy with SAS
Melanie Dove, MPH, ScD UC Davis Katherine Heck, MPH UC San Francisco

3 Overview
Background
Survey design factors (weight and variance)
How to analyze the data

We will spend some time on the sampling methods and survey design factors, because you need to know a little about the particular survey you are using in order to analyze its data. We plan to use the California Health Interview Survey and the National Health and Nutrition Examination Survey as examples.

4 Surveys
Representativeness: using a sample of individuals to represent a population

5 Survey data
Different types: health, economic, marketing, sociology, psychology
Cross-sectional
Data collection methods: in person, phone, mail, online

6 California Health Interview Survey (CHIS)
Health survey that represents California
Does anyone know the population of California?
California's population: 39,809,693 (1/1/2018)
Source: State of California, Department of Finance, E-1 Population Estimates for Cities, Counties and the State with Annual Percent Change — January 1, 2017 and Sacramento, California, May 2018.

7 Sampling
Convenience
Simple random
Stratified

8 Sampling
Cluster: sample within specified groups or geographic areas, sometimes called primary sampling units (PSUs)
Stratification: select a specified number of individuals from a particular population group; can be used for oversampling
Cluster example: schools within a district
Stratification example: age group or county
Define oversampling?

9 Which one is the stratified sampling scheme?
[Figure: two sampling diagrams, labeled "Stratified" and "Cluster"]

10 Variance
Individuals within clusters are similar. For example, if counties were used as clusters, individuals within the same county will be similar to each other.
Ignoring this similarity underestimates the variance and so overstates significance.
You need to account for the sample design if any stratification, clustering, or weighting was used. If you do not, the standard errors and confidence intervals will be incorrect.
SAS survey procedures let you indicate the design variables in the syntax, which corrects for these effects.
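The effect of within-cluster similarity can be seen in a small sketch. The data set, its name (clustered), and its values are made up for illustration and are not from CHIS or NHANES; respondents in the same cluster give identical answers, which is the extreme case.

```sas
/* Hypothetical toy data: 4 clusters (PSUs) of 3 people each.
   People in the same cluster give identical answers. */
data clustered;
  input psu y;
  datalines;
1 1
1 1
1 1
2 0
2 0
2 0
3 1
3 1
3 1
4 0
4 0
4 0
;
run;

/* Design ignored: the 12 answers are treated as independent. */
proc surveymeans data=clustered mean stderr;
  var y;
run;

/* Clusters declared: the standard error is based on 4 PSUs, not 12 people. */
proc surveymeans data=clustered mean stderr;
  cluster psu;
  var y;
run;
```

Both runs should report the same mean, but the second run should report a larger standard error, because the data contain only four independent pieces of information.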

11 Weighting
Weight: a value indicating the number of people the respondent represents
CA: 39,809,693; CHIS: 24,031
When data are weighted, the resulting estimates are representative of the population.
Corrects for:
differing probability of sampling within clusters or strata
nonresponse
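As a minimal sketch of what a weight does to a point estimate, the example below compares an unweighted and a weighted mean. The data set (toy), the variable names (asthma, wt), and all values are hypothetical and chosen only to keep the arithmetic easy to follow.

```sas
/* Hypothetical toy data: 6 respondents, each with a made-up weight
   (the number of people in the population the respondent represents). */
data toy;
  input id asthma wt;
  datalines;
1 1 500
2 0 1500
3 0 2000
4 1 500
5 0 1500
6 0 2000
;
run;

/* Unweighted: every respondent counts once, so the mean is 2/6 = 0.333. */
proc means data=toy mean;
  var asthma;
run;

/* Weighted: each respondent counts wt times, so the mean is
   (500 + 500) / 8000 = 0.125. SUMWGT prints the sum of weights (8000). */
proc means data=toy mean sumwgt;
  var asthma;
  weight wt;
run;
```

Here the two respondents with asthma carry small weights, so the weighted prevalence is lower than the unweighted one. PROC MEANS is used only to show the point estimate; its standard errors do not account for the survey design.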

12 Weights
Single weight variable
-or-
Replicate weights: a series of weight variables that must be used in combination to weight the sample correctly
The SAS survey syntax for the weighting statements differs slightly depending on the type of weight used.

13 SAS survey procedures
Proc Surveyfreq: Frequencies, crosstabs
Proc Surveymeans: Means, medians
Proc Surveyreg: Linear regression
Proc Surveylogistic: Logistic regression
Proc Surveyphreg: Cox proportional hazards model
Proc Surveyselect: Sample selection
Procedures can produce standard errors and confidence intervals.
Variables may be continuous or categorical. These procedures can produce many test statistics, such as chi-square tests and t-tests.

14 Results with and without survey procedures: confidence intervals
Example: CHIS, 2016 adult survey
Ever diagnosed with asthma, age 30-34: weighted percent and 95% confidence interval
Proc Freq results: % (13.85%-13.93%)
Proc Surveyfreq results: 13.89% (9.97%-17.80%)
What do you notice about the percentages and the 95% confidence intervals?

15 Survey components and syntax
What do you think the weighting statement is called?
Stratification: STRATA statement
Clustering: CLUSTER statement
Weighting: WEIGHT statement (and REPWEIGHT if using replicate weights)
Subpopulation analyses: DOMAIN statement or "flag" variables. Do not use "where" to subset data.

This is a short workshop, but there is also syntax for the Finite Population Correction, which makes a minor adjustment to the standard error based on the fraction of the population that is sampled; some survey data sets include this.
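A subpopulation analysis with the DOMAIN statement might look like the sketch below. The data set (toy), its variables (stratum, psu, wt, female, walks), and all values are hypothetical, made up only so the step runs on its own.

```sas
/* Hypothetical toy design: 2 strata, 2 PSUs per stratum, made-up weights. */
data toy;
  input stratum psu wt female walks;
  datalines;
1 1 100 1 3
1 1 100 0 5
1 2 150 1 2
1 2 150 0 4
2 1 200 1 6
2 1 200 0 1
2 2 250 1 0
2 2 250 0 7
;
run;

/* Keep every record in the data set and request estimates by subgroup.
   A WHERE statement (for example, where female = 1;) would drop records
   before the design is read, which can distort the variance estimate. */
proc surveymeans data=toy mean stderr clm;
  strata stratum;
  cluster psu;
  weight wt;
  var walks;
  domain female;
run;
```

The output should include the overall estimate plus one row per level of female. Proc Surveyfreq has no DOMAIN statement; there the "flag" variable goes into the TABLES statement as an extra crossing variable.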

16 Survey procedure examples

17 Proc Surveyfreq - stratum/cluster
Basic code.
  /* Survey version: declares strata, clusters, and the weight */
  proc surveyfreq data=dataset varmethod=taylor;
    strata stratum;
    cluster PSU;
    weight weightvar;
    tables agegrp;
  run;

  /* Ordinary version, for comparison: ignores the survey design */
  proc freq data=dataset;
    tables agegrp;
  run;
VARMETHOD=TAYLOR is the default, so this option is not actually required. You do, however, need to know which variance method your survey calls for.

18 Proc Surveyfreq - stratum/cluster
Code with extras.
  /* nomcar: missing data; total=: finite population correction */
  proc surveyfreq nomcar data=dataset total=c.sampfrac;
    strata stratum;
    cluster PSU;
    weight weightvar;
    /* row: row %; col: column %; cl: confidence limits */
    tables agegrp * disease / row col cl;
    format agegrp agegrpf.;
  run;

19 Proc Surveyfreq - replicate weights
  /* varmethod=: variance estimation method */
  proc surveyfreq data=dataset varmethod=jackknife;
    /* Two weighting statements: full-sample weight, then replicate weights */
    weight weightvar;
    repweight wtvar1-wtvar80 / JKCOEFS=1;
    tables agegrp * disease / row cl;
    format agegrp agegrpf.;
  run;
Is there a statement that appears to be missing from this SAS code?
Stratum information is contained in the replicate weights, so you don't need a separate STRATA statement if you use the replicate weights. CHIS doesn't use clustering, so you don't need a CLUSTER statement.
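As a hedged aside, the jackknife variance computed from replicate weights takes the general form below. This is the standard replicate-weight formula rather than something stated on the slides. The symbols are introduced here for illustration: \hat\theta is the estimate using the full-sample weight, \hat\theta_r is the same estimate recomputed with the r-th replicate weight, R is the number of replicate weights (80 in this code), and \alpha_r is the jackknife coefficient.

```latex
\hat{V}(\hat{\theta}) \;=\; \sum_{r=1}^{R} \alpha_r \,\bigl(\hat{\theta}_r - \hat{\theta}\bigr)^2 ,
\qquad \alpha_r = 1 \ \text{for all } r \ \text{when JKCOEFS=1}
```

If JKCOEFS= is omitted, SAS uses a default coefficient of (R-1)/R instead, so the option matters whenever the survey's documentation specifies a coefficient of 1.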

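20 Libname statement
  /* Point the libref CHIS at the folder that holds the data */
  libname CHIS 'C:\HOW\Heck';

  /* Copy the CHIS adult file into a temporary data set named adult */
  data adult;
    set chis.adult;
  run;
Example with NHANES using NHANES variables - actually run in SAS?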

21 CHIS age variable

22 Proc Surveyfreq - age
  proc surveyfreq data=adult varmethod= ?????;
    weight ?????;
    repweight ????? / JKCOEFS=1;
    tables ?????;
  run;
Jackknife variance method
Weight variable = rakedw0
Repweight = rakedw1-rakedw80
Age = srage_p1

23 Proc Surveyfreq - age
  proc surveyfreq data=adult varmethod=jackknife;
    weight rakedw0;
    repweight rakedw1-rakedw80 / JKCOEFS=1;
    tables srage_p1 / cl;
  run;

24 Proc Surveyfreq - Results

25 CHIS: Asthma variable

26 Proc Surveyfreq syntax
  proc surveyfreq data=adult varmethod=jackknife;
    weight ?????;
    repweight ????? / JKCOEFS=1;
    /* category (age) * outcome (asthma); nototal: no row/col totals */
    tables ????? * ????? / row cl nototal;
  run;

27 Proc Surveyfreq syntax
  proc surveyfreq data=c.adult varmethod=jackknife;
    weight rakedw0;
    repweight rakedw1-rakedw80 / JKCOEFS=1;
    /* srage_p1: category (age); ab17: outcome (asthma); nototal: no row/col totals */
    tables srage_p1 * ab17 / row cl nototal;
  run;

28 Proc Surveyfreq output

29 Proc Surveyfreq with chi-square
  proc surveyfreq data=c.adult varmethod=jackknife;
    weight weightvar;
    repweight wtvar1-wtvar80 / JKCOEFS=1;
    /* srsex: gender; ab29: hypertension; chisq: chi-square test */
    tables srsex * ab29 / row cl nototal chisq;
  run;

30 Proc Surveyfreq output

31 Proc Surveymeans example
CHIS 2016, number of times walked for leisure, past 7 days, by family type
  proc surveymeans data=c.adult varmethod=JACKKNIFE;
    weight rakedw0;
    repweight rakedw1-rakedw80 / JKCOEFS=1;
    var AD41W;      /* AD41W = how often walked */
    domain FAMT4;   /* domain = group(s) of interest; FAMT4 = family structure */
  run;
You can also add the CLASS statement to Proc Surveymeans.

32 Results

33 Proc Surveylogistic example
Usual source of care by insurance status, adults 18-64, CHIS 2016
  proc surveylogistic data=adult varmethod=JACKKNIFE;
    weight rakedw0;
    repweight rakedw1-rakedw80 / JKCOEFS=1;
    class uninsured (ref='Insured');
    /* descending: model the probability that nousual = 1 */
    model nousual (descending) = uninsured;
    format uninsured unins.;
  run;
We created a variable for having no usual place where the respondent obtains health care when they need it. The logistic outcome variable is coded 1 if they had no usual source of care, or 0 if they had a usual source.

34 Proc Surveylogistic results

35 Resources to analyze CHIS data
Analyze CHIS Data website: spx
Webinar: chis-data-analysis-webinar-recording/

36 Thank you! Questions?

37 Contact Information
Name: Melanie Dove
Company: UC Davis
City/State: Sacramento, CA
Phone:

38 Contact Information
Name: Katherine Heck
Company: UCSF
City/State: San Francisco, CA
Phone:

