NHANES 1999-2004 Analytic Strategies
Deanna Kruszon-Moran, MS
Centers for Disease Control and Prevention, National Center for Health Statistics

Analyzing Data NHANES: Preparing your data files
Downloading demographic, questionnaire, exam and lab files:
Files are no longer available as self-extracting zip files.
Documentation and procedure files are now in Adobe PDF format and can be viewed or accessed directly via the web link.
Clicking on the data link will allow you to store the data file or open it directly with SAS.
Data files are in SAS transport (.xpt) format.

Know your data: Read the documentation!

Preparing your data files: Merging
Merge all files to the demographic file by sequence number.
Verify the number of records merged and the final sample size against the published frequencies on the web.
Be sure they are what you expected and that all merges worked correctly.
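The merge-and-verify step can be sketched in a few lines. This is a minimal Python illustration, not the SAS merge the slides have in mind. It assumes the sequence number is stored as SEQN, and the records and the `lab_value` field are hypothetical placeholders.

```python
def merge_on_seqn(demo, component):
    """Left-merge a component file onto the demographic file by SEQN.

    Both inputs are lists of dicts. Returns the merged records, the number
    of demographic records that found a match, and the component SEQNs
    that had no demographic record (should normally be empty).
    """
    component_by_id = {}
    for rec in component:
        if rec["SEQN"] in component_by_id:
            raise ValueError(f"duplicate SEQN {rec['SEQN']} in component file")
        component_by_id[rec["SEQN"]] = rec

    demo_ids = {person["SEQN"] for person in demo}
    orphans = sorted(seqn for seqn in component_by_id if seqn not in demo_ids)

    merged, matched = [], 0
    for person in demo:
        extra = component_by_id.get(person["SEQN"])
        if extra is not None:
            matched += 1
        merged.append({**person, **(extra or {})})
    return merged, matched, orphans


# Hypothetical records, for illustration only.
demo = [
    {"SEQN": 1, "RIAGENDR": 2, "RIDAGEYR": 34},
    {"SEQN": 2, "RIAGENDR": 1, "RIDAGEYR": 51},
    {"SEQN": 3, "RIAGENDR": 2, "RIDAGEYR": 67},
]
lab = [{"SEQN": 2, "lab_value": 110}, {"SEQN": 3, "lab_value": 185}]

merged, matched, orphans = merge_on_seqn(demo, lab)
print(len(merged), matched, orphans)
```

The counts returned here are the ones to compare against the published frequencies: every demographic record is kept, and only the matched ones carry component data.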

Know your data:
Run basic frequencies.
Know your target population.
Understand how each item was measured (how the item is defined, topcoded, recoded).
Recode variables as necessary (for example: age groups, positive/negative lab tests, high/low BP, high/low cholesterol).
Recode unknowns and refusals as missing data (for example, recode 77 and 99 to missing).
Check your coding: run frequencies in SAS.
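The recoding steps above might look like the following. This is a small Python sketch under stated assumptions: the missing codes (77, 99) come from the slide, but the actual codes differ by variable and must be taken from the documentation, and the age-group cutpoints are hypothetical.

```python
from collections import Counter


def recode_missing(value, missing_codes=(77, 99)):
    """Return None for refused/unknown codes, otherwise the value itself."""
    return None if value in missing_codes else value


def age_group(age):
    """Assign an illustrative age group (cutpoints are an assumption)."""
    if age is None:
        return None
    if age < 20:
        return "<20"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    return "60+"


# Hypothetical responses to one questionnaire item.
raw = [1, 2, 77, 1, 99, 2]
recoded = [recode_missing(v) for v in raw]

# Equivalent of running a frequency table to check the recode.
print(Counter(recoded))
```

Comparing the frequency table before and after the recode is the quickest way to confirm that only the intended codes became missing.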

Know your data: Continuous outcome data
Look for outliers in your measure. Run PROC UNIVARIATE.
Look for outliers among the weights. Use PROC UNIVARIATE on the weight variable. Outlying values, especially those with large weights, can strongly influence your estimates.
Look at normality. Consider transformations: log, square root, power.

NHANES Sample Design: NHANES uses a complex, multistage, probability cluster design to sample the civilian, noninstitutionalized US population.

Sample Weights: To analyze NHANES data you must use the sample weights to account for:

1. The base probability of selection
Stage 1: Counties
Stage 2: Segments
Stage 3: Households
Stage 4: Individuals

2. Oversampling
NHANES oversampled:
African Americans
Mexican Americans
Persons with low income
Adolescents aged
Persons aged 60+

3. Non-response to the interview and exam
Sample persons age 20+:
Screening interview N=13312
Household interview N= % (interview non-response 22%)
MEC exam N= % (exam non-response 7%)

Non-response issues for NHANES:
Most components have some level of individual item or component non-response.
ONLY non-response to the interview and exam has already been accounted for in the weights.
Any additional non-response to the outcome measure of interest should be examined against all possible predictors.
Potential biases should be discussed.
If non-response is “high”, re-weighting should be considered.

Why weight?
Sample subdomain       % US population   % sample unweighted   % sample weighted
Non-Hispanic Blacks    13%               25%                   12%
Mexican Americans      9%                28%                   9%
year olds              12%               24%                   12%
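The effect shown in this table can be reproduced with a toy calculation. The Python sketch below uses hypothetical records in which one group is oversampled and therefore carries smaller weights. The numbers are invented for illustration and are not NHANES values.

```python
def unweighted_percent(records, in_group):
    """Percent of sample records that fall in the group."""
    return 100 * sum(1 for r in records if in_group(r)) / len(records)


def weighted_percent(records, in_group, weight_key="weight"):
    """Percent of the weighted (population) total that falls in the group."""
    total = sum(r[weight_key] for r in records)
    group = sum(r[weight_key] for r in records if in_group(r))
    return 100 * group / total


# Hypothetical sample: the oversampled group has half the records
# but each of its records represents fewer people.
sample = [
    {"group": "oversampled", "weight": 1000},
    {"group": "oversampled", "weight": 1000},
    {"group": "other", "weight": 3000},
    {"group": "other", "weight": 3000},
]


def is_oversampled(record):
    return record["group"] == "oversampled"


print(unweighted_percent(sample, is_oversampled))
print(weighted_percent(sample, is_oversampled))
```

The unweighted share reflects the sample design, while the weighted share reflects the population the sample is meant to represent.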

Sample weights: Which weights?
Weight variables to use:
                                 Household interview data ONLY   ANY data from exam/lab/MEC interview
Any 2 yrs of data ( or or )      WTINT2YR                        WTMEC2YR
4 yrs of data ( ) *              WTINT4YR                        WTMEC4YR
4 or 6 yrs of data ( ) or ( )    Combine appropriate 2 or 4 year weights as follows:

Two, Four, Six, Eight - How can we estimate?
For 4 years of data from
  MEC4YR = 1/2 * WTMEC2YR ;
For 6 years of data from
  if sddsrvyr=1 or sddsrvyr=2 then MEC6YR = 2/3 * WTMEC4YR ; /* for */
  if sddsrvyr=3 then MEC6YR = 1/3 * WTMEC2YR ; /* for */
* Only when analyzing years , you should not combine 2-year weights but use the 4-year weights provided.

Two, Four, Six, Eight - How can we estimate?
Future years of data will be combined similarly:
For 6 years of data from
  if sddsrvyr in (1,2,3) then MEC6YR = 1/3 * WTMEC2YR ;
For 8 years of data from
  if sddsrvyr=1 or sddsrvyr=2 then MEC8YR = 1/2 * WTMEC4YR ; /* for */
  if sddsrvyr=3 or sddsrvyr=4 then MEC8YR = 1/4 * WTMEC2YR ; /* for */
  etc.
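The pattern behind these fractions is that each two-year cycle contributes an equal share of the pooled weight, and the first two cycles enter through the provided 4-year weight. A minimal Python sketch of that arithmetic follows. It covers only the two cases where cycles 1 and 2 use the 4-year weight; the function name and the example weights are hypothetical.

```python
def combined_mec_weight(sddsrvyr, n_cycles, wtmec2yr=None, wtmec4yr=None):
    """Weight for an analysis pooling the first n_cycles two-year cycles.

    n_cycles is 3 (6 years) or 4 (8 years). Cycles 1 and 2 use the
    provided 4-year weight; later cycles use their own 2-year weight.
    """
    if n_cycles not in (3, 4):
        raise ValueError("this sketch covers 3 cycles (6 years) or 4 cycles (8 years)")
    if not 1 <= sddsrvyr <= n_cycles:
        raise ValueError("cycle is outside the pooled range")
    if sddsrvyr in (1, 2):
        # The 4-year weight already spans two cycles: share is 2 of n_cycles.
        return 2 * wtmec4yr / n_cycles
    # Each later cycle contributes 1 of n_cycles through its 2-year weight.
    return wtmec2yr / n_cycles


# Hypothetical weights, for illustration only.
print(combined_mec_weight(1, 3, wtmec4yr=3000))  # 6 years, cycle 1
print(combined_mec_weight(3, 3, wtmec2yr=3000))  # 6 years, cycle 3
print(combined_mec_weight(2, 4, wtmec4yr=3000))  # 8 years, cycle 2
print(combined_mec_weight(4, 4, wtmec2yr=4000))  # 8 years, cycle 4
```

With three cycles this reduces to the 2/3 and 1/3 factors on the slide, and with four cycles to 1/2 and 1/4.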

Sample Weights - Subsamples
Subsamples and appropriate weights:
Look at your primary variable of interest and the corresponding weight.
Look at all other variables you want to combine with it. Are all from the interview? The exam? A subsample (e.g. fasting, audiometry, dioxin, VOCs)?
Use the weight from the smallest subsample for your analysis.
Be consistent!

Sample Weights - Subsamples
Subsamples and appropriate weights:
Be careful about combining subsamples beyond MEC + VOCs, Interview + Dioxin, etc.
Combining subsamples such as Environmental + AM fasting could be problematic.
Some subsamples are mutually exclusive.
The weights were not designed for combining subsamples and may not produce good estimates.

Preparing for Analyses: Subsetting the data for SUDAAN
If using MEC exam weights, SUBSET the data on those MEC EXAMINED in SAS before using SUDAAN.
If using other subsample weights, subset the data on those in the subsample corresponding to the weights you are using.
Then use the SUBPOPN statement in the SUDAAN procedure to further subset your data by age, gender, etc. to reflect the target population you are interested in analyzing.

Sample Weights: Example
You are interested in examining the association of high triglycerides, blood pressure, and body mass index (BMI), controlling for race/ethnicity, among females age 20-59, using 6 years of data.

Sample Weights: Step 1
Determine the smallest sample population in the analysis to determine the correct weight to use.
Race/ethnicity, gender and age are in the interview.
Blood pressure and body weight come from the MEC exam, a subset of those interviewed.
Triglycerides were measured on a subsample of those MEC examined who fasted for 8 hours and came to the AM MEC exam.
Therefore, the fasting subsample is the smallest subsample in the analysis and you would use the AM fasting weights (WTSAF2YR and WTSAF4YR).

Sample Weights: Step 2
Combine weights in SAS prior to the SUDAAN procedure for the 6 years from :
  If sddsrvyr in (1,2) then WEIGHT6 = 2/3*WTSAF4YR ; /* */
  If sddsrvyr=3 then WEIGHT6 = 1/3*WTSAF2YR ; /* */

Sample Weights: Step 3
Subset your data set in SAS to reflect the weight being used (AM fasting weights WTSAF2YR or WTSAF4YR).
SAS code:
  IF WTSAF2YR ne . or WTSAF4YR ne . ;

Sample Weights: Step 4
Last, specify the correct weight using the WEIGHT statement in SUDAAN, and subset your data to the subpopulation of interest (females age 20-59) using the SUBPOPN statement in SUDAAN:
  WEIGHT WEIGHT6 ;
  SUBPOPN riagendr=2 and ridageyr > 19 and ridageyr < 60 ;
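Steps 2 to 4 can be traced end to end in a short Python sketch. It is an illustration of the logic only, not a replacement for the SAS and SUDAAN statements. The records are hypothetical, and the sketch checks the fasting weight that each cycle actually uses rather than either weight. Note that the subpopulation is kept as a flag rather than a filter, mirroring the role of SUBPOPN.

```python
def prepare_fasting_analysis(records):
    """Return fasting-subsample records with WEIGHT6 and an in_subpop flag."""
    prepared = []
    for rec in records:
        cycle = rec["SDDSRVYR"]
        if cycle in (1, 2):
            base, numerator = rec.get("WTSAF4YR"), 2  # Step 2: 2/3 of the 4-year weight
        elif cycle == 3:
            base, numerator = rec.get("WTSAF2YR"), 1  # Step 2: 1/3 of the 2-year weight
        else:
            continue  # outside the three cycles being combined
        if base is None:
            continue  # Step 3: not in the AM fasting subsample
        # Step 4: flag (do not drop) the subpopulation, females age 20-59.
        in_subpop = rec["RIAGENDR"] == 2 and 19 < rec["RIDAGEYR"] < 60
        prepared.append({**rec, "WEIGHT6": numerator * base / 3, "in_subpop": in_subpop})
    return prepared


# Hypothetical records, for illustration only.
people = [
    {"SDDSRVYR": 1, "RIAGENDR": 2, "RIDAGEYR": 30, "WTSAF4YR": 3000},
    {"SDDSRVYR": 3, "RIAGENDR": 1, "RIDAGEYR": 40, "WTSAF2YR": 900},
    {"SDDSRVYR": 3, "RIAGENDR": 2, "RIDAGEYR": 25, "WTSAF2YR": None},
]
prepared = prepare_fasting_analysis(people)
for row in prepared:
    print(row["WEIGHT6"], row["in_subpop"])
```

Keeping out-of-subpopulation records in the file, with a flag, is what lets the survey software see the full design when it estimates variances.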

NHANES Variance Estimation
Why must you use the sample design to estimate the variance?
NHANES is a cluster design.
Individuals within a cluster are more similar to each other than to those in other clusters.
This homogeneity, or clustering, reduces the effective sample size because individuals are chosen within clusters rather than randomly throughout the population.

NHANES Variance Estimation
Why must you use the sample design to estimate the variance?
Variance estimates that do not account for this intra-cluster correlation are biased and too low.
Survey software such as SUDAAN or the SAS survey procedures must be used to account for the complex design and produce unbiased variance estimates.
These procedures require information on the sample design (i.e. identification of the PSU and stratum) for each sample person.
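To make the effect of clustering concrete, the Python sketch below compares a naive simple-random-sample standard error with a Taylor series (linearization) standard error for a weighted mean, assuming PSUs are sampled with replacement within strata. It is a textbook-style illustration on hypothetical data, not a substitute for SUDAAN or the SAS survey procedures, and the field names are assumptions.

```python
import math
from collections import defaultdict


def weighted_mean(records):
    total_w = sum(r["w"] for r in records)
    return sum(r["w"] * r["y"] for r in records) / total_w


def naive_se(records):
    """Unweighted SE that treats the data as a simple random sample."""
    n = len(records)
    mean = sum(r["y"] for r in records) / n
    s2 = sum((r["y"] - mean) ** 2 for r in records) / (n - 1)
    return math.sqrt(s2 / n)


def linearized_se(records):
    """Taylor series SE of the weighted mean, PSUs with replacement in strata."""
    total_w = sum(r["w"] for r in records)
    mean = weighted_mean(records)
    # Sum the linearized values within each PSU.
    psu_totals = defaultdict(float)
    for r in records:
        psu_totals[(r["stratum"], r["psu"])] += r["w"] * (r["y"] - mean) / total_w
    by_stratum = defaultdict(list)
    for (stratum, _psu), z_total in psu_totals.items():
        by_stratum[stratum].append(z_total)
    # Between-PSU variation within each stratum drives the variance.
    variance = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals)
    return math.sqrt(variance)


# Hypothetical data: people in the same PSU share the same outcome,
# an extreme case of within-cluster homogeneity.
clustered = [
    {"stratum": s, "psu": p, "w": 1.0, "y": float(p == 1)}
    for s in (1, 2) for p in (1, 2) for _ in range(2)
]
print(naive_se(clustered), linearized_se(clustered))
```

In this deliberately extreme example the design-based standard error is larger than the naive one, which is the direction of bias the slide warns about.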

NHANES Variance Estimation
For the initial data release we recommended:
Using the JK-1 / jackknife / “leave-one-out” procedure.
Required 52 replicate weights for each of 52 groups created.
Only provided for
Can still be used if you have software that can produce the replicate weights.
Replicate weights for this procedure will no longer be created on the data set. Too cumbersome.

NHANES Variance Estimation
We now recommend:
Using the Taylor series (linearization) method, the same as that used in NHANES III.
We now provide “Masked Variance Units” (MVUs) in place of primary sampling units (PSUs) to maintain confidentiality.
The design variables are called SDMVSTRA and SDMVPSU.

Design Variables: SDMVSTRA and SDMVPSU
Found in the demographic file.
Found in all two-year data sets and can be combined for 4-year, 6-year, or longer data sets.
Can be used the same as the actual stratum and PSU variables.
Produce variance estimates close to those using the “true” design.
Data MUST be sorted by SDMVSTRA and SDMVPSU first, before using SUDAAN.

Sample SUDAAN Code
In SAS:
  IF WTMEC2YR NE . ;   /* include only those with weights */
  PROC SORT OUT=Datasort ; BY SDMVSTRA SDMVPSU ;   /* sort on design variables */
SUDAAN code:
  PROC DESCRIPT DATA=Datasort DESIGN=WR ;
  NEST SDMVSTRA SDMVPSU ;
  WEIGHT WTMEC2YR ;
  SUBPOPN RIDAGEYR > 11 AND RIDAGEYR < 50 AND TOXTEST=1 ;

Preparing for Analysis: Setting up the procedure in SAS Surveymeans
SAS code:
  PROC SURVEYMEANS DATA=data ;
  STRATA sdmvstra ;
  CLUSTER sdmvpsu ;
  WEIGHT WTMEC2YR ;
  WHERE RIDAGEYR > 11 AND RIDAGEYR < 50 AND TOXTEST=1 ;

Other data analysis issues from NHANES: Calculating population totals
Estimates of the number of persons in the U.S. population with a particular condition must be done carefully.
The recommended procedure is to:
First, estimate the proportion with the condition for each subdomain of interest.
Multiply that by the population control total for that subdomain.
Tables are available on the NCHS web site with the current March 2001 CPS control totals as part of the analytic guidelines.

Other data analysis issues from NHANES: Calculating population totals
Estimates of the number of persons with a condition can be obtained by summing the weights of those positive.
These estimates will be less reliable due to item non-response and sampling error.
Not the recommended method.
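The recommended calculation is a proportion multiplied by a control total, summed over subdomains. A minimal Python sketch follows. The subdomain labels, prevalences, and control totals are hypothetical placeholders, not CPS figures.

```python
def estimated_total(prevalence_by_domain, control_totals):
    """Sum of (estimated proportion x population control total) over subdomains."""
    missing = set(prevalence_by_domain) - set(control_totals)
    if missing:
        raise KeyError(f"no control total for subdomains: {sorted(missing)}")
    return sum(
        prevalence_by_domain[domain] * control_totals[domain]
        for domain in prevalence_by_domain
    )


# Hypothetical weighted prevalences and control totals, for illustration only.
prevalence = {"females 20-39": 0.10, "females 40-59": 0.20}
controls = {"females 20-39": 1000, "females 40-59": 500}

print(estimated_total(prevalence, controls))
```

Because the proportion is estimated among respondents and then scaled to a fixed control total, item non-response does not shrink the estimated count the way summing weights of positives would.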

Analyzing within NHANES: Things to consider
Data are released in two-year cycles.
We STRONGLY RECOMMEND using two or more cycles (4 or more years) to produce reliable estimates.
Verify that the data items collected were comparable in wording and methods.
When combining years, remember to use the correct combined weights.

Analyzing trends with NHANES: NHANES III to NHANES
Things to consider:
What is your sample from each survey - age?
How differently was the question worded, or how different were the interview methods?
How different were the lab or exam methodologies? Cutoffs used? Definitions?
For current NHANES, sample sizes may be smaller depending on the number of years measured, especially in subdomains.
Larger sampling variation.
May need to limit comparisons.

Race/Ethnicity NHANES: Two variables are available, RIDRETH1 and RIDRETH2.

Race/Ethnicity NHANES: RIDRETH1
Use for analyses of data alone.
1 = Mexican American
2 = other Hispanic
3 = non-Hispanic white
4 = non-Hispanic black
5 = other races, including multiracial
For 2 and 4 years of data we know there is insufficient sample size to analyze “other Hispanics” (group 2) alone or to analyze “all Hispanics”.
Analyses to evaluate whether 6 years of data ( ) are sufficient to analyze these Hispanic groups are ongoing.
Groups 2 and 5 can AND should continue to be combined to represent all other races.

Race/Ethnicity NHANES: RIDRETH2
Use for analyzing trends from NHANES III to NHANES
Most comparable to the race/ethnicity variable collected in NHANES III.
Coded as:
1 = non-Hispanic white
2 = non-Hispanic black
3 = Mexican American
4 = other, including multiracial
5 = other Hispanic

Analyzing data from NHANES: Crude versus age-standardized estimates
Age distributions within survey samples vary by racial/ethnic group.
Age distributions also vary by survey: NHANES III vs. NHANES
When comparing estimates across racial/ethnic groups or between surveys, you may need to age standardize.
Also present all age-specific estimates!

Analyzing data from NHANES: When age standardizing
Use the 2000 U.S. Census population for consistency for both NHANES III and all NHANES or above.
For guidelines and population proportions, see the website below for the Klein and Schoenborn HP2010 Statistical Notes on “Age Adjustment using the 2000 Projected U.S. Population”.

Analyzing data from NHANES: When age standardizing
In SUDAAN, use the STDVAR and STDWGT statements.
STDVAR: the variable name for the age groups.
STDWGT: the corresponding proportion of the 2000 U.S. Census population for each age subgroup.
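Direct age standardization is a weighted average of age-specific estimates, with the standard population proportions as the weights. The Python sketch below shows the arithmetic only. The age groups, prevalences, and standard proportions are hypothetical and are not the 2000 U.S. standard values.

```python
def age_standardized(estimate_by_age, standard_proportions):
    """Directly standardized estimate: sum of age-specific estimates
    weighted by the standard population proportions."""
    if set(estimate_by_age) != set(standard_proportions):
        raise ValueError("age groups must match the standard population groups")
    total = sum(standard_proportions.values())
    return sum(
        estimate_by_age[age] * standard_proportions[age] for age in estimate_by_age
    ) / total


# Hypothetical age-specific prevalences and standard proportions.
prevalence_by_age = {"20-39": 0.02, "40-59": 0.04, "60+": 0.10}
standard = {"20-39": 0.5, "40-59": 0.3, "60+": 0.2}

print(age_standardized(prevalence_by_age, standard))
```

Applying the same standard proportions to every group or survey is what makes the resulting estimates comparable across different age distributions.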

Age standardization for NHANES: Crude vs. age-standardized estimates
Example: Hepatitis B, NHANES III
                     Non-Hispanic White   Non-Hispanic Black   Mexican American
Crude prevalence     3.1 ( )              11.9 ( )             3.6 ( )
Age standardized     2.6 ( )              11.9 ( )             4.4 ( )

Analyzing Data from NHANES: Analytic guidelines
Detailed guidelines for working with NHANES data can be found at:
This document contains everything discussed today and will continue to grow to include guidelines for statistical tests, multivariate analyses, modeling and more.
A web-based tutorial is also currently being created. The target date for release is Dec 31st 2006.