JENNIFER SAYLOR, PHD, RN, ANCS-BC UNIVERSITY OF DELAWARE SEPTEMBER 14, 2012 Essentials of Complex Data Analysis Utilizing National Survey Data


1 JENNIFER SAYLOR, PHD, RN, ANCS-BC UNIVERSITY OF DELAWARE SEPTEMBER 14, 2012 JSAYLOR@UDEL.EDU Essentials of Complex Data Analysis Utilizing National Survey Data

2 National Surveys: Advantages
- Guide actions & policies to improve health
- Allow researchers to ask & answer population-level questions using previously collected data
- Make data available without any replication of effort
- Avoid the prohibitive cost of obtaining primary data
- Use probability-based complex sample designs

3 National Surveys: Challenges
- Locating a dataset that includes the variables to address the research question
- Additions or deletions of variables, and differences in methods of assessment of variables, in different waves (years) of each survey
- Merging multiple data files
- Analyzing data using complex sample design

4 Complex Sample Designs
- Used to acquire representation of an entire population using a sample of the population
- More efficient to carry out than simple random samples:
  - Do not require complete enumeration of the population
  - Allow researchers to visit compact areas to obtain in-person data (interview or laboratory)
  - Can oversample small or sensitive subgroups to adequately represent their variability
- Analyzing the data as if they came from a simple random sample underestimates variance, because it ignores the correlation among individuals within clusters, and so yields biased results
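The last point on this slide can be illustrated with Kish's approximate design effect, a standard textbook formula that is not part of the original slides. The sketch below assumes equal-sized clusters; the cluster size and intraclass correlation are hypothetical values chosen only for illustration.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect, assuming equal-sized clusters.

    cluster_size: average number of sampled people per cluster
    icc: intraclass correlation (how alike people in one cluster are)
    """
    return 1.0 + (cluster_size - 1.0) * icc


def adjusted_standard_error(srs_se: float, cluster_size: float, icc: float) -> float:
    """Inflate a simple-random-sample standard error by sqrt(design effect)."""
    return srs_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical values: 20 people per cluster, modest within-cluster similarity.
deff = design_effect(cluster_size=20, icc=0.05)
se = adjusted_standard_error(srs_se=1.0, cluster_size=20, icc=0.05)
print(f"design effect = {deff:.2f}, adjusted SE = {se:.2f}")
```

Even a small within-cluster correlation roughly doubles the variance in this example, which is why treating clustered data as a simple random sample gives standard errors that are too small.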

5 Complex Sample Analysis
- Accurately estimates population means and frequencies from the sample after taking into account over- or under-sampling of specific groups
- Statistical programs address sampling design elements such as stratification, clusters, and weights
  - SUDAAN®, Complex Samples analysis in SPSS®, & survey procedures in SAS®
- Weighting without complex samples analysis leads to grossly reduced estimates of population variability
  - Estimates are computed as if the measures were obtained from the number of cases in the entire population rather than the number of cases in the sample in the data set
- Clusters include participants who are more similar to one another than to those in another cluster

6 National Health and Nutrition Examination Survey
- Combination of health and nutrition questionnaires and physical examination to assess the health and nutritional status of adults and children in the United States [1]
- National Health and Nutrition Examination Survey (NHANES) data can be used to:
  - Produce estimates of personal health conditions
  - Vital statistics
  - Establish national standards for measurements (e.g., blood pressure)
  - Analyze risk factors for diseases
  - Examine disparities in health status
- 60-minute interview: demographic, socioeconomic, dietary, & health-related questions
- Physical examination: mostly performed in mobile examination centers; medical, dental, & physiological measurements and laboratory tests, depending on the participant's age & gender

7 NHANES Sampling Plan
- Probability-based complex sample design represents the civilian, non-institutionalized U.S. population
- Excludes individuals:
  - Residing in nursing homes
  - Serving in the armed forces
  - Institutionalized
  - U.S. nationals living outside the U.S.
- Stage 1: Fifteen Primary Sampling Units (PSUs) per year, each a county or small group of contiguous counties depending on the population of the counties; the total sample across the PSUs is approximately 5,000 examined participants per year
- Stage 2: Segments within the PSUs are selected; a segment is a cluster of households in a block or a group of blocks, depending on population density
- Stage 3: Households within the segments are selected
- Stage 4: One or more participants within the households are randomly selected
- Diagram: NHANES Sampling Plan [2]

8 NHANES 2007-2008 Sampling
- Oversamples small and sensitive subgroups:
  - Persons over 60
  - African Americans
  - Low income population
  - Entire Hispanic population (not only Mexican Americans)
- A representative sample of these groups by age, sex, and income
- Reliable and precise health status indicator estimates
- Each NHANES 2007-2008 participant represents approximately 50,000 other U.S. residents

9 NHANES Survey Weights
- Purpose: account for oversampling, survey non-response, & post-stratification
- Sample weights are assigned to each person based on the number of people they represent within the U.S. Census non-institutionalized civilian population
- NHANES provides three weights (in 2- & 4-year increments)
  - Interview weights: all people interviewed
  - Medical examination weights: interviewed & medical examination
  - Fasting laboratory weights: interviewed, medical examination, & fasting laboratory tests
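Because the three weights correspond to nested subsamples, a common rule of thumb is to use the weight for the most restrictive component an analysis draws on. The sketch below is one way to encode that rule. WTINT2YR, WTMEC2YR, and WTSAF2YR are the NHANES 2007-2008 2-year weight variable names; the component labels and the helper function are hypothetical and not part of the original slides.

```python
# NHANES 2007-2008 2-year weight variable for each survey component.
WEIGHT_BY_COMPONENT = {
    "interview": "WTINT2YR",
    "exam": "WTMEC2YR",
    "fasting_lab": "WTSAF2YR",
}

# Components ordered from least to most restrictive subsample.
RESTRICTIVENESS = ["interview", "exam", "fasting_lab"]


def choose_weight(components_used):
    """Return the weight variable for the most restrictive component used."""
    most_restrictive = max(components_used, key=RESTRICTIVENESS.index)
    return WEIGHT_BY_COMPONENT[most_restrictive]


# An analysis combining interview items with a fasting laboratory measure
# would use the fasting laboratory weight.
print(choose_weight({"interview", "fasting_lab"}))
```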

10 Creating Analytical File from NHANES
- Download data files & codebooks
  - Combination of 11 individual data files in NHANES 07-08
- Transfer text data files to a statistical package (SPSS, SAS)
- Screen each data file for unused variables & delete them
- If names of variables or responses changed between survey years, rename variables and recode responses before merging
- Merge the different data files by the sequence number assigned to each participant
  - Data collected varied by age; the files do not all have the same number of records
- Recode 07-08 variables to create the study variables
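The merge step can be sketched in a few lines. This is a minimal illustration using plain Python dictionaries rather than SPSS or SAS; SEQN is the NHANES participant sequence number, RIDAGEYR and LBXGLU are NHANES variable names, and the rows themselves are made-up values. The helper function is hypothetical.

```python
def merge_by_seqn(*files):
    """Outer-join lists of row dicts on the participant sequence number SEQN.

    Participants missing from a file simply lack that file's variables,
    which mirrors the fact that not every file has a record for everyone.
    """
    merged = {}
    for rows in files:
        for row in rows:
            merged.setdefault(row["SEQN"], {}).update(row)
    return [merged[seqn] for seqn in sorted(merged)]


# Made-up toy rows: a demographics file and a laboratory file.
demographics = [
    {"SEQN": 41475, "RIDAGEYR": 62},
    {"SEQN": 41476, "RIDAGEYR": 6},
]
fasting_lab = [
    {"SEQN": 41475, "LBXGLU": 101},  # the child has no fasting lab record
]

analytic_file = merge_by_seqn(demographics, fasting_lab)
for record in analytic_file:
    print(record)
```

Keeping unmatched participants (an outer join) makes it easy to see afterwards who lacks data from a given component, instead of silently dropping them.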

11 2007-2008 Files Merged to Create Analytical File
Data Analysis Prep
- In SPSS complex samples analysis, a complex sample plan file is created with the NHANES 2007-2008 2-year fasting laboratory weight (WTSAF2YR) and the design variables: strata (SDMVSTR) and cluster (SDMVPSU)
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

12 Comparison of Descriptive Statistics of Categorical Data
- Frequencies: weighting & complex sample results are the same because the sample size is the same
- Race/Ethnicity variable:
  - Unweighted: racial minorities account for 52% of the sample (oversampled)
  - Complex samples: racial minorities account for only 30% when estimated for the U.S. population (more representative of the U.S.)
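The gap between the unweighted and weighted percentages comes directly from the sample weights. A minimal sketch follows, with made-up flags and weights rather than NHANES data; the helper function is hypothetical.

```python
def proportion(flags, weights=None):
    """Share of cases where flag is True, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(flags)
    flagged = sum(w for flag, w in zip(flags, weights) if flag)
    return flagged / sum(weights)


# Made-up example: the oversampled group makes up half of the sample rows,
# but each of its members represents fewer people in the population.
in_oversampled_group = [True, True, False, False]
sample_weights = [10_000, 20_000, 60_000, 70_000]

unweighted = proportion(in_oversampled_group)
weighted = proportion(in_oversampled_group, sample_weights)
print(f"unweighted = {unweighted:.1%}, weighted = {weighted:.1%}")
```

Here the group is 50% of the sample but under 19% of the weighted population, the same direction of change as the 52% versus 30% reported on the slide.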

13 Comparison of Descriptive Statistics of Continuous Data
- Mean for each continuous variable changes when weighted
- Mean is the same under weighting alone and under complex samples analysis
  - The proportion of cases with each value remains constant
- Standard error of the mean with weighting alone
  - Almost non-existent, since the sample size appears to be the entire population
- Complex samples analysis mean and standard error
  - Accurate because the mean is estimated for the entire population while the calculations are based on the number of cases from which data were obtained
Note. Not all the variables from the metabolic syndrome study are presented in the table; * measured using Patient Health Questionnaire
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237
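The contrast between the two standard errors can be reproduced on toy data. The sketch below is an assumption-laden illustration, not the SPSS procedure used in the study: it computes a Taylor-linearized standard error for a weighted mean under a stratified cluster design (treating PSUs as sampled with replacement) and compares it with the naive figure obtained by treating weights as case counts. The function names and data values are made up; the strata and PSU lists play the roles of SDMVSTR and SDMVPSU.

```python
import math
from collections import defaultdict


def weighted_mean(y, w):
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def linearized_se(y, w, strata, psu):
    """Taylor-linearized SE of the weighted mean, PSUs nested in strata."""
    total_w = sum(w)
    mean = weighted_mean(y, w)
    # Sum the linearized values within each PSU of each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, strata, psu):
        psu_totals[h][c] += wi * (yi - mean) / total_w
    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return math.sqrt(variance)


def frequency_weight_se(y, w):
    """Naive SE that treats the sum of weights as the number of cases."""
    total_w = sum(w)
    mean = weighted_mean(y, w)
    var = sum(wi * (yi - mean) ** 2 for yi, wi in zip(y, w)) / (total_w - 1)
    return math.sqrt(var / total_w)


# Made-up data: 8 participants, 2 strata, 2 PSUs per stratum.
y = [100, 104, 96, 110, 90, 95, 105, 99]
w = [40_000, 60_000, 50_000, 50_000, 30_000, 70_000, 45_000, 55_000]
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

print("mean:", weighted_mean(y, w))
print("naive weighted SE:", frequency_weight_se(y, w))
print("design-based SE:", linearized_se(y, w, strata, psu))
```

Both approaches report the same weighted mean; only the standard error differs, and the naive one shrinks toward zero because it behaves as if hundreds of thousands of people had been measured.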

14 Comparison of Logistic Regression: Education Regressed on Metabolic Syndrome
- Logistic regression with two dichotomous variables
  - Education: high school education = 1
  - Metabolic syndrome is coded as absent = 0 and present = 1
- Chi-square is the underlying test statistic; it analyzes frequencies, which are not affected by dispersion, so the analyses show only small differences
- Unweighted analysis: those who have less than a high school education have 62% higher odds of having metabolic syndrome
- Weighted and complex samples analysis: the increase in odds rises to 74%
- The odds ratio is the same for the weighted and complex samples analyses, but the 95% CIs are unrealistically narrow for the weighted analysis
Notes. See below for coding.
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237
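Why the odds ratio can match while the confidence intervals differ follows from how both are derived from the regression coefficient and its standard error. In the sketch below, the coefficient is chosen to reproduce the odds ratio of 1.74 implied by the slide; the two standard errors are hypothetical values picked only to show the effect, not figures from the study.

```python
import math


def odds_ratio_ci(coef, se, z=1.96):
    """Odds ratio and approximate 95% CI from a logistic coefficient and SE."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


coef = math.log(1.74)  # same point estimate in both analyses

# Hypothetical standard errors: weighting alone behaves as if the whole
# population had been measured, so its SE is far too small.
weighted_only = odds_ratio_ci(coef, se=0.002)
complex_samples = odds_ratio_ci(coef, se=0.25)

for label, (odds_ratio, low, high) in [
    ("weighted only", weighted_only),
    ("complex samples", complex_samples),
]:
    print(f"{label}: OR = {odds_ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```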

15 Comparison of Linear Regression: Depressive Symptoms Regressed on Diet
- Linear regression with two continuous variables
  - Depressive symptoms: measured via the Patient Health Questionnaire (PHQ-9)
  - Diet: measured as the number of calories consumed per day
- Unweighted and weighted data: biased results
  - Depressive symptoms predict diet (p = .006 and p < .001, respectively)
- Complex samples analysis: depressive symptoms do not predict diet (p = .151)
Notes. See below for coding.
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237

16 Limitations: NHANES & Other Secondary Data
- Cross-sectional data: unable to determine causality
- Unable to control definitions of variables, measurement, & data collection
  - Unable to exclude subjects with a history of psychosis since NHANES does not collect these data
  - Definition of smoking limited by the NHANES 07-08 data collected for the metabolic syndrome study
  - Unable to choose how depressive symptoms were measured

17 Conclusion
- Use of national data sets allows use of extensive, expensive, well-documented survey data for exploratory questions but limits analysis to those variables included in the data set
- Large sample: examine multiple predictors & interactive relationships
- Challenges of national databases
  - Merging data files
  - Differentiating the availability of data in different waves of surveys
  - Using complex sampling techniques to provide a representative sample
- Complex samples data analysis programs allow inclusion of sampling design elements (stratification, clusters, & weights)
  - Provide unbiased population estimates of frequencies, means, & variability
  - Provide results representative of the U.S. population

18 FOR FURTHER INFORMATION, PLEASE CONTACT ME AT JSAYLOR@UDEL.EDU. THANK YOU Questions or Comments?

19 References
1. NCHS. (2009). NHANES 2007-2008 public data general release file documentation, from http://www.cdc.gov/nchs/nhanes/nhanes2007-2008/generaldoc_e.htm
2. NCHS. (2010). Continuous NHANES Web Tutorial: Sample Design, from http://www.cdc.gov/nchs/tutorials/Nhanes/SurveyDesign/SampleDesign/intro.htm
Labeled Tables: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

