JENNIFER SAYLOR, PHD, RN, ANCS-BC UNIVERSITY OF DELAWARE SEPTEMBER 14, 2012 Essentials of Complex Data Analysis Utilizing National Survey Data

National Surveys: Advantages
- Guide actions & policies to improve health
- Allow researchers to ask & answer questions on a population level from previously collected data
- Data are available without any replication of effort
- Avoid the prohibitive cost of obtaining primary data
- Probability-based complex sample designs

National Surveys: Challenges
- Locating a dataset that includes the variables to address the research question
- Additions or deletions of variables and differences in methods of assessment of variables in different waves (years) in each survey
- Merging multiple data files
- Analyzing data using complex sample design

Complex Sample Designs
- Used to acquire representation of an entire population using a sample of the population
- More efficient than simple random samples:
  - Do not require complete enumeration of the population
  - Allow researchers to visit compact areas to obtain in-person data (interview or laboratory)
  - Able to oversample small or sensitive subgroups to adequately represent their variability
- Analyzing the data as if they came from a simple random sample ignores the correlation among individuals within clusters, so the variance estimates are too low and the results are biased
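The last point can be made concrete with Kish's approximate design effect, deff = 1 + (m - 1) * rho, where m is the average number of participants per cluster and rho is the intraclass correlation. This is a minimal sketch: the formula is a standard approximation, not something taken from the slides, and the values of n, m, and rho below are hypothetical.

```python
import math


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish's approximation: variance inflation relative to simple random sampling."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, deff: float) -> float:
    """Number of simple-random-sample cases carrying the same information."""
    return n / deff


# Hypothetical values, chosen only for illustration.
n = 5000    # participants in the clustered sample
m = 25      # average participants per cluster
rho = 0.05  # intraclass correlation within clusters

deff = design_effect(m, rho)
n_eff = effective_sample_size(n, deff)
se_inflation = math.sqrt(deff)  # factor by which the SRS standard error is too small

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
print(f"standard errors understated by a factor of {se_inflation:.2f}")
```

Even a small within-cluster correlation shrinks the effective sample size noticeably, which is why the simple-random-sample variance is too optimistic.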

Complex Sample Analysis
- Accurately estimates population means and frequencies from the sample after taking into account over- or under-sampling of specific groups
- Statistical programs address sampling design elements such as stratification, clusters, and weights
  - SUDAAN®, Complex Samples analysis in SPSS®, & survey procedures in SAS®
- Weighting without complex samples analysis leads to grossly reduced estimates of population variability
  - Estimates are computed as if the measures were obtained from the number of cases in the entire population rather than the number of cases in the sample in the data set
- Clusters include participants who are more similar to one another than to those in another cluster
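The effect of treating weights as case counts can be sketched in a few lines. The data and weights below are hypothetical, and the second standard error is only a simple-random-sample calculation on the real number of cases; it still ignores strata and clusters, so it is not the design-based estimate a complex samples procedure would report.

```python
import math


def weighted_mean(values, weights):
    total_w = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_w


def weighted_variance(values, weights):
    mu = weighted_mean(values, weights)
    total_w = sum(weights)
    return sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / total_w


def se_mean(values, weights, n_assumed):
    """Standard error of the mean if the data are treated as n_assumed cases."""
    return math.sqrt(weighted_variance(values, weights) / n_assumed)


# Hypothetical measurements and sample weights for six participants.
values = [120.0, 135.0, 110.0, 142.0, 128.0, 131.0]
weights = [40000.0, 60000.0, 55000.0, 30000.0, 65000.0, 50000.0]

# Weights treated as frequencies: the software believes n = sum of weights.
se_as_population = se_mean(values, weights, sum(weights))
# Same weighted variance, but divided by the number of cases actually observed.
se_as_sample = se_mean(values, weights, len(values))

print(f"weighted mean: {weighted_mean(values, weights):.2f}")
print(f"SE with n = sum of weights: {se_as_population:.4f}")
print(f"SE with n = cases in file:  {se_as_sample:.4f}")
```

The mean is identical in both calculations; only the denominator of the standard error changes, which is what makes the weighted-only result look far more precise than it is.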

National Health and Nutrition Examination Survey
- Combination of health and nutrition questionnaires and physical examination to assess the health and nutritional status of adults and children in the United States (1)
- National Health and Nutrition Examination Survey (NHANES) data can be used to:
  - Produce estimates of personal health conditions
  - Produce vital statistics
  - Establish national standards for measurements (e.g., blood pressure)
  - Analyze risk factors for diseases
  - Examine disparities in health status
- 60-minute interview: demographic, socioeconomic, dietary, & health-related questions
- Physical examination: most performed in mobile examination centers; medical, dental, & physiological measurements and laboratory tests, depending on the participant's age & gender

NHANES Sampling Plan
- Probability-based complex sample design represents the civilian, non-institutionalized U.S. population
- Excludes individuals who are:
  - Residing in nursing homes
  - Members of the armed forces
  - Institutionalized
  - U.S. nationals living outside the U.S.
- Stage 1: Fifteen Primary Sampling Units (PSUs) are selected; these are counties or small groups of contiguous counties, depending on the population of the counties. Across the PSUs, the sample is approximately 5,000 examined participants per year.
- Stage 2: Segments within the PSUs are selected; a segment is a cluster of households in a block or a group of blocks, depending on population density
- Stage 3: Households within the segments are selected
- Stage 4: One or more participants within the households are randomly selected
Diagram: NHANES Sampling Plan (2)

NHANES Sampling
- Oversamples small and sensitive subgroups:
  - Persons over 60
  - African Americans
  - Low-income population
  - Entire Hispanic population (not only Mexican Americans)
- A representative sample of these groups by age, sex, and income
- Reliable and precise health status indicator estimates
- Each NHANES participant represents approximately 50,000 other U.S. residents

NHANES Survey Weights
- Purpose: account for oversampling, survey non-response, & post-stratification
- Sample weights are assigned to each person based on the number of people they represent within the U.S. Census non-institutionalized civilian population
- NHANES provides three weights (2 & 4 year increments):
  - Interview weights: all people interviewed
  - Medical examination weights: interviewed & medical examination
  - Fasting laboratory weights: interviewed, medical examination, & fasting laboratory tests
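Because the three weights correspond to nested groups of participants, a common rule of thumb is to use the weight belonging to the most restrictive component that contributes a variable to the analysis. The helper below is a hypothetical sketch of that rule, not an NCHS tool. WTSAF2YR is the fasting weight named later in this presentation; WTINT2YR and WTMEC2YR are the usual NHANES names for the 2-year interview and examination weights and are an addition here, so check them against the codebook for the survey years in use.

```python
# Components ordered from the full interviewed sample to the smallest subsample.
COMPONENT_ORDER = ["interview", "examination", "fasting_lab"]

# Assumed 2-year weight variable names; verify against the NHANES codebook.
WEIGHT_FOR_COMPONENT = {
    "interview": "WTINT2YR",
    "examination": "WTMEC2YR",
    "fasting_lab": "WTSAF2YR",
}


def choose_weight(components_used):
    """Return the weight variable for the most restrictive component used."""
    if not components_used:
        raise ValueError("at least one survey component is required")
    unknown = set(components_used) - set(COMPONENT_ORDER)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    most_restrictive = max(components_used, key=COMPONENT_ORDER.index)
    return WEIGHT_FOR_COMPONENT[most_restrictive]


# An analysis combining interview items with a fasting laboratory measure.
print(choose_weight(["interview", "fasting_lab"]))
```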

Creating Analytical File from NHANES
- Download data files & codebooks
  - Combination of 11 individual data files in NHANES
- Transfer text data files to a statistical package (SPSS, SAS)
- Screen each data file for unused variables & delete them
- If names of variables or responses changed between survey years, rename variables and recode responses before merging
- Merge different data files by the sequence number assigned to each participant
  - Data collected varied by age; the files do not all have the same number of records
- Recode variables to create the study variables
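The rename-then-merge steps can be sketched with plain dictionaries. This is an illustration only, not the authors' SPSS workflow: SEQN and RIDAGEYR are the usual NHANES names for the sequence number and age in years, GLUCOSE_OLD and GLUCOSE are hypothetical variable names, and the records are made up.

```python
def rename_variables(records, mapping):
    """Rename variables so files from different survey years share one name."""
    return [{mapping.get(key, key): value for key, value in rec.items()}
            for rec in records]


def merge_by_seqn(*files):
    """Outer-merge record lists on SEQN; participants keep whatever data exist."""
    merged = {}
    for records in files:
        for rec in records:
            row = merged.setdefault(rec["SEQN"], {"SEQN": rec["SEQN"]})
            row.update(rec)
    return [merged[seqn] for seqn in sorted(merged)]


# Made-up records: the laboratory file has fewer records than the demographics file.
demographics = [
    {"SEQN": 1, "RIDAGEYR": 34},
    {"SEQN": 2, "RIDAGEYR": 8},
    {"SEQN": 3, "RIDAGEYR": 61},
]
laboratory = [
    {"SEQN": 1, "GLUCOSE_OLD": 92},
    {"SEQN": 3, "GLUCOSE_OLD": 110},
]

laboratory = rename_variables(laboratory, {"GLUCOSE_OLD": "GLUCOSE"})
analytic = merge_by_seqn(demographics, laboratory)

for row in analytic:
    print(row)
```

Participant 2 stays in the merged file without a laboratory value, which mirrors the point that files differ in their number of records.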

Files Merged to Create Analytical File
Data Analysis Prep
- In SPSS complex samples analysis, a complex sample plan file is created with the NHANES 2-year fasting laboratory weight (WTSAF2YR) and the design variables: strata (SDMVSTRA) and cluster (SDMVPSU)
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.
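To show what a plan file built from a weight, a strata variable, and a cluster variable lets the software compute, here is a minimal sketch of a Taylor-linearized standard error for a weighted mean, treating PSUs as sampled with replacement within strata. It is an assumption-laden illustration, not the SPSS implementation; the keys "weight", "stratum", and "psu" stand in for the fasting weight and the two design variables, and the rows are made up.

```python
import math
from collections import defaultdict


def design_based_mean_se(rows):
    """Weighted mean and its linearized SE using strata and PSU identifiers."""
    total_w = sum(r["weight"] for r in rows)
    mean = sum(r["weight"] * r["y"] for r in rows) / total_w

    # Sum each person's linearized contribution within their PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r["weight"] * (r["y"] - mean) / total_w
        psu_totals[r["stratum"]][r["psu"]] += z

    # Between-PSU variability within each stratum drives the variance.
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, math.sqrt(variance)


# Made-up data: two strata, two PSUs per stratum, two participants per PSU.
rows = [
    {"stratum": 1, "psu": 1, "weight": 100.0, "y": 5.0},
    {"stratum": 1, "psu": 1, "weight": 100.0, "y": 5.4},
    {"stratum": 1, "psu": 2, "weight": 150.0, "y": 6.1},
    {"stratum": 1, "psu": 2, "weight": 150.0, "y": 6.3},
    {"stratum": 2, "psu": 1, "weight": 200.0, "y": 4.8},
    {"stratum": 2, "psu": 1, "weight": 200.0, "y": 5.0},
    {"stratum": 2, "psu": 2, "weight": 120.0, "y": 5.9},
    {"stratum": 2, "psu": 2, "weight": 120.0, "y": 6.2},
]

mean, se = design_based_mean_se(rows)
print(f"weighted mean = {mean:.3f}, design-based SE = {se:.3f}")
```

The variance here depends on how much PSU totals differ within a stratum, not on the sum of the weights, so the precision reflects the number of clusters actually sampled.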

Comparison of Descriptive Statistics of Categorical Data
- Frequencies: weighted and complex samples results are the same because the sample size is the same
- Race/Ethnicity variable:
  - Unweighted: racial minorities account for 52% of the sample (oversampled)
  - Complex samples: racial minorities account for only 30% when estimated for the U.S. population (more representative of the U.S.)
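A toy calculation shows how oversampling shifts an unweighted percentage and how the weights pull it back. The five records and their weights are hypothetical and are not the study data.

```python
def proportion(flags, weights=None):
    """Share of cases flagged True, optionally weighting each case."""
    if weights is None:
        weights = [1.0] * len(flags)
    flagged = sum(w for flag, w in zip(flags, weights) if flag)
    return flagged / sum(weights)


# Hypothetical sample: the oversampled group carries smaller weights.
is_minority = [True, True, True, False, False]
weights = [10000.0, 12000.0, 8000.0, 40000.0, 30000.0]

unweighted = proportion(is_minority)         # share of the sample
weighted = proportion(is_minority, weights)  # estimated share of the population

print(f"unweighted: {unweighted:.0%}, weighted: {weighted:.0%}")
```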

Comparison of Descriptive Statistics of Continuous Data
- The mean of each continuous variable changes when the data are weighted
- The mean is the same under weighting and complex samples analysis
  - The proportion of cases with each value remains constant
- Standard error of the mean under weighting alone
  - Almost non-existent, since the sample size appears to be the entire population
- Complex samples analysis mean and standard error
  - Accurate because the mean is estimated for the entire population while the calculations use the number of cases from which data were actually obtained
Note. Not all the variables from the metabolic syndrome study are presented in the table; * measured using Patient Health Questionnaire
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

Comparison of Logistic Regression: Education Regressed on Metabolic Syndrome
- Logistic regression with two dichotomous variables
  - Education: high school education = 1
  - Metabolic syndrome: coded as absent = 0 and present = 1
- Chi-square is the statistical methodology; it analyzes frequencies, which are not affected by dispersion (only small differences)
- Unweighted analysis: those who have less than a high school education have 62% higher odds of metabolic syndrome
- Weighted and complex samples analysis: the increase in odds rises to 74%
- The odds ratio is the same for the weighted and complex samples analyses, but the 95% CIs are unrealistically narrow for the weighted analysis
Notes. See below for coding.
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.
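Why the odds ratio survives weighting while the confidence interval collapses can be seen from a 2x2 table. The sketch below uses the standard Wald interval on the log odds ratio. The counts are hypothetical, and a single common weight is applied to every cell purely to isolate the effect; real survey weights vary by person.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical cell counts from a sample of 600 people.
sample_counts = (40, 160, 60, 340)
# The same table after each person is counted as 50,000 people (hypothetical weight).
weight = 50000
inflated_counts = tuple(count * weight for count in sample_counts)

or_sample, lo_sample, hi_sample = odds_ratio_ci(*sample_counts)
or_inflated, lo_inflated, hi_inflated = odds_ratio_ci(*inflated_counts)

print(f"sample counts:   OR = {or_sample:.2f}, CI width = {hi_sample - lo_sample:.4f}")
print(f"inflated counts: OR = {or_inflated:.2f}, CI width = {hi_inflated - lo_inflated:.4f}")
```

The point estimate depends only on ratios of the cells, so scaling leaves it unchanged, while the standard error shrinks with the apparent number of cases.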

Comparison of Linear Regression: Depressive Symptoms Regressed on Diet
- Linear regression with two continuous variables
  - Depressive symptoms: measured via the Patient Health Questionnaire (PHQ-9)
  - Diet: measured as the number of calories consumed per day
- Unweighted and weighted data: biased results
  - Depressive symptoms predict diet (p = .006, p < .001, respectively)
- Complex samples analysis: depressive symptoms do not predict diet (p = .151)
Notes. See below for coding.
From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

Limitations: NHANES & Other Secondary Data
- Cross-sectional data: unable to determine causality
- Unable to control definitions of variables, measurement, & data collection
  - Unable to exclude subjects with a history of psychosis since NHANES does not collect these data
  - Definition of smoking limited due to collected NHANES data in the metabolic syndrome study
  - Unable to choose how depressive symptoms were measured

Conclusion
- Use of national data sets allows use of extensive, expensive, well-documented survey data for exploratory questions but limits analysis to those variables included in the data set
- Large sample: examine multiple predictors & interactive relationships
- Challenges of national databases
  - Merging data files
  - Differentiating the availability of data in different waves of surveys
  - Using complex sampling techniques to provide a representative sample
- Complex samples data analysis programs allow inclusion of sampling design elements (stratification, clusters, & weights)
  - Provide unbiased population estimates of frequencies, means, & variability
  - Provide results representative of the U.S. population

THANK YOU
Questions or Comments?

References
1. NCHS. (2009). NHANES public data general release file documentation, from /generaldoc_e.htm
2. NCHS. (2010). Continuous NHANES Web Tutorial: Sample Design, from /SampleDesign/intro.htm
Labeled Tables: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.