MEPS WORKSHOP Household Component Survey Estimation Issues Household Component Survey Estimation Issues Steve Machlin, Agency for Healthcare Research and.

Slides:



Advertisements
Similar presentations
Calculation of Sampling Errors MICS3 Data Analysis and Report Writing Workshop.
Advertisements

Multiple Indicator Cluster Surveys Survey Design Workshop
The Impact of Survey Design Modifications on Health Care Utilization Estimates in a National Longitudinal Health Care Survey Steven B. Cohen, Ph.D. Trena.
Proxy Pattern-Mixture Analysis of Missing Health Expenditure Variables in the Medical Expenditure Panel Survey Proxy Pattern-Mixture Analysis of Missing.
SAMPLE DESIGN: HOW MANY WILL BE IN THE SAMPLE—DESCRIPTIVE STUDIES ?
An Assessment of the Impact of Two Distinct Survey Design Modifications on Health Insurance Coverage Estimates in a National Health Care Survey Steven.
9. Weighting and Weighted Standard Errors. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
National Center for Health Statistics DCC CENTERS FOR DISEASE CONTROL AND PREVENTION Changes in Race Differentials: The Impact of the New OMB Standards.
Estimates and sampling errors for Establishment Surveys International Workshop on Industrial Statistics Beijing, China, 8-10 July 2013.
Medical Expenditure Panel Survey Karen Beauregard Steve Machlin Jeffrey Rhoades.
NIMH Collaborative Psychiatric Epidemiology Surveys
© John M. Abowd 2005, all rights reserved Household Samples John M. Abowd March 2005.
Bridging the Gaps: Dealing with Major Survey Changes in Data Set Harmonization Joint Statistical Meetings Minneapolis, MN August 9, 2005 Presented by:
Why sample? Diversity in populations Practicality and cost.
A new sampling method: stratified sampling
Formalizing the Concepts: Simple Random Sampling.
Analysis of National Health Interview Survey Data
Chapter 5: Descriptive Research Describe patterns of behavior, thoughts, and emotions among a group of individuals. Provide information about characteristics.
Understanding and Using NAMCS and NHAMCS Data Data Tools and Basic Programming Techniques 2010 National Conference on Health Statistics August 16, 2010.
Sample Design.
1 Understanding and Using NAMCS and NHAMCS Data: A Hands-On Workshop Susan M. Schappert Donald K. Cherry.
Aspects of the National Health Interview Survey (NHIS) Chris Moriarity National Conference on Health Statistics August 16, 2010
Household Surveys ACS – CPS - AHS INFO 7470 / ECON 8500 Warren A. Brown University of Georgia February 22,
NHANES Analytic Strategies Deanna Kruszon-Moran, MS Centers for Disease Control and Prevention National Center for Health Statistics.
Complexities of Complex Survey Design Analysis. Why worry about this? Many government studies use these designs – CDC National Health Interview Survey.
Sampling : Error and bias. Sampling definitions  Sampling universe  Sampling frame  Sampling unit  Basic sampling unit or elementary unit  Sampling.
Major Areas of Health Research Topics Using MEPS Data Access Access Use Use Expenditures Expenditures Health insurance Health insurance Health status and.
Definitions Observation unit Target population Sample Sampled population Sampling unit Sampling frame.
DESIGN FEATURES OF NCHS SURVEYS By Iris Shimizu Mathematical Statistician Office of Research and Methodology, NCHS Disclaimer: The opinions in this presentation.
Multiple Indicator Cluster Surveys Survey Design Workshop Sampling: Overview MICS Survey Design Workshop.
Integrated Health Care Survey Designs: Analytical Enhancements Achieved Through Linkage of Surveys and Administrative Data 2008 European Conference on.
Medical Expenditure Panel Survey SURVEY OVERVIEW.
18b. PROC SURVEY Procedures in SAS ®. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
The 2006 National Health Interview Survey (NHIS) Paradata File: Overview And Applications Beth L. Taylor 2008 NCHS Data User’s Conference August 13 th,
Design Effects: What are they and how do they affect your analysis? David R. Johnson Population Research Institute & Department of Sociology The Pennsylvania.
Secondary Data Analysis Linda K. Owens, PhD Assistant Director for Sampling and Analysis Survey Research Laboratory University of Illinois.
1 Ratio estimation under SRS Assume Absence of nonsampling error SRS of size n from a pop of size N Ratio estimation is alternative to under SRS, uses.
National Health and Nutrition Examination Survey: A Very General Overview Taken from various NHANES sources and Lein’s comments.
Tobacco Use Supplement to the Current Population Survey User’s Workshop June 9, 2009 Tips and Tricks Analyzing TUS-CPS Data Lloyd Hicks Westat.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
JENNIFER SAYLOR, PHD, RN, ANCS-BC UNIVERSITY OF DELAWARE SEPTEMBER 14, 2012 Essentials of Complex Data Analysis Utilizing National Survey.
Panel Study of Entrepreneurial Dynamics Richard Curtin University of Michigan.
Robin A. Cohen, PhD National Center for Health Statistics National Conference on Health Statistics August 6, 2012 Analytic Uses of National Health Interview.
Current Population Survey Sponsor: Bureau of Labor Statistics Collector: Census Bureau Purpose: Monthly Data for Analysis of Labor Market Conditions –CPS.
Sampling Design and Analysis MTH 494 LECTURE-12 Ossam Chohan Assistant Professor CIIT Abbottabad.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
Building Wave Response Rates in a Longitudinal Survey: Essential for Nonsampling Error Reduction or Last In - First Out? Steven B. Cohen Fred Rohde and.
ICCS 2009 IDB Workshop, 18 th February 2010, Madrid 1 Training Workshop on the ICCS 2009 database Weighting and Variance Estimation picture.
Introduction to Secondary Data Analysis Young Ik Cho, PhD Research Associate Professor Survey Research Laboratory University of Illinois at Chicago Fall,
Chapter 6: 1 Sampling. Introduction Sampling - the process of selecting observations Often not possible to collect information from all persons or other.
Interviewer Effects on Paradata Predictors of Nonresponse Rachael Walsh, US Census Bureau James Dahlhamer, NCHS European Survey Research Association, 2015.
Introduction to Survey Sampling
Analytical Example Using NHIS Data Files John R. Pleis.
Medical Expenditure Panel Survey (MEPS), Health Care Expenditures for the Elderly with Chronic Conditions in 2012 Jeffrey Rhoades.
Statistics Canada Citizenship and Immigration Canada Methodological issues.
ICCS 2009 IDB Seminar – Nov 24-26, 2010 – IEA DPC, Hamburg, Germany Training Workshop on the ICCS 2009 database Weights and Variance Estimation picture.
CASE STUDY: NATIONAL SURVEY OF FAMILY GROWTH Karen E. Davis National Center for Health Statistics Coordinating Center for Health Information and Service.
1 of 22 INTRODUCTION TO SURVEY SAMPLING October 6, 2010 Linda Owens Survey Research Laboratory University of Illinois at Chicago
Using Data from the National Survey of Children with Special Health Care Needs Centers for Disease Control and Prevention National Center for Health Statistics.
NHANES Analytic Strategies Deanna Kruszon-Moran, MS Centers for Disease Control and Prevention National Center for Health Statistics.
Sample Design of the National Health Interview Survey (NHIS) Linda Tompkins Data Users Conference July 12, 2006 Centers for Disease Control and Prevention.
RESEARCH METHODS Lecture 28. TYPES OF PROBABILITY SAMPLING Requires more work than nonrandom sampling. Researcher must identify sampling elements. Necessary.
1 ANALYZING DATA FROM THE NATIONAL IMMUNIZATION SURVEY __________________________________________ Michael P. Battaglia Abt Associates Inc. Meena Khare.
Utility of an overlapping panel design in the MEPS Steven B. Cohen, Ph.D.
Working with the ECLS-B Datasets Weights and other issues.
Medical Expenditure Panel Survey
Trena M. Ezzati-Rice, Frederick Rohde, Robert Baskin
Introduction to Survey Data Analysis
Deanna Kruszon-Moran, MS
Presentation transcript:

MEPS WORKSHOP Household Component Survey Estimation Issues Household Component Survey Estimation Issues Steve Machlin, Agency for Healthcare Research and Quality Paul Gorrell, Social and Scientific Systems

Overview Annual person-level estimates – – Overlapping panels Estimation variables – – Weights – – Variance Pooling multiple years of annual data Longitudinal analysis of MEPS panels – – Two-year period Family-level estimation Other miscellaneous issues

Annual Person-Level Files

MEPS Annual Files YearPanel (96-97) Yr. 2 2 (97-98) Yr. 1 Yr. 2 3 (98-99) Yr. 1 Yr. 2 4 (99-00) Yr. 1 Yr. 2 5 (00-01) Yr. 1 Yr. 2 6 (01-02) Yr. 1

MEPS Annual Files YearPanel (01-02) 6 (01-02) Yr. 2 7 (02-03) 7 (02-03) Yr. 1 Yr. 2 8 (03-04) 8 (03-04) Yr. 1 Yr. 2 9 (04-05) 9 (04-05) Yr. 1

MEPS Annual Person Level Estimation File Number HC- 020 HC- 028 HC- 038 HC- 050 HC- 060 Persons with weight > 0 32,63622,95323,56523,83932,122 Weighted Persons: All million million million million million INSC1231=1 (in target pop. at end of year) million million million million million

MEPS Annual Person Level Estimation (continued) File Number HC-070HC-079HC-089 Persons with weight > 0 37,41832,68132,737 Weighted Persons: All million million 293.5million INSC1231=1 (in target pop. at end of year) million million 289.7million

Weights and Variance Estimation Variables

MEPS Sample Design Each panel is sub-sample of household respondents for the previous year’s National Health Interview Survey (NHIS) – – NHIS sponsor is National Center for Health Statistics NHIS sample based on complex stratified multi-stage probability design Civilian non-institutionalized population

NHIS Sample Design ( ) U.S. partitioned into 1,995 Primary Sampling Units (Counties or groups of adjacent counties) PSU’s grouped into 237 design strata – – 358 PSU’s sampled across strata Second Stage Units (SSU’s) – – Clusters of housing units – – Oversample of SSU’s with large Black/Hispanic populations MEPS based on subsample of about 200 PSU’s from NHIS

Oversampling in MEPS Every year: Blacks and Hispanics – – Carryover from NHIS 1997: Selected subpopulations – – Functionally impaired adults – – Children with activity limitations – – Adults predicted to have high medical expenditures – – Low income – – Adults with other impairments 2002 and beyond: – – Asians – – Low income – – Additional oversampling of blacks in 2004

Estimation from Complex Surveys Estimates need to be weighted to reflect sample design and survey nonresponse – – Unweighted estimates are biased Use appropriate method to compute standard errors to account for complex design – – Assuming simple random sampling usually underestimates sampling error

Development of Person Weights Base Weight (NHIS) Base Weight (NHIS) – Compensates for oversampling and nonresponse Adjustments for Adjustments for – Household nonresponse (MEPS Round 1) – Attrition of persons (Subsequent Rounds) – Poststratification (Census Population Estimates) – Trimming of extreme weights Final Person Weight Final Person Weight – Weight > 0: person selected and in-scope for survey – Weight = 0 (about 4% in 2002): person not in-scope for survey but living in household with in-scope person(s)

Distribution of MEPS Sample Person Final Weights Average8,31211,91711,73011,6798,849 Minimum Maximum68,51884,58780,06278,15767,537 Variable Name WTDPER97WTDPER98PERWT99FPERWT00FPERWT01F

Distribution of Sample Person Final Weights (continued) Average7,7028,8928,966 Minimum Maximum46,76660,27363,728 Variable Name PERWT02FPERWT03FPERWT04F

Types of Basic Point Estimates Means Means Proportions Proportions Totals Totals Differences between subgroups Differences between subgroups

Variance Estimation Basic software procedures assume simple random sampling (SRS) Basic software procedures assume simple random sampling (SRS) – MEPS not SRS – Point estimates correct (if weighted) – Standard errors usually too small Software to account for complex design using Taylor Series approach Software to account for complex design using Taylor Series approach – SUDAAN (stand-alone or callable within SAS) – STATA (svy commands) – SAS 8.2 (survey procedures) – SPSS (new complex survey features in 13.0)

Estimation Example: Average Total Expenditures, 2001 Weighted mean = $2,555 per capita Weighted mean = $2,555 per capita – Unweighted mean of $2,400 is biased SE based on Taylor Series = 55 SE based on Taylor Series = 55 – SAS V8.2: PROC SURVEYMEANS – SUDAAN: PROC DESCRIPT – Stata: svymean SE assuming SRS = 41 (too low) SE assuming SRS = 41 (too low) – SAS V8.2: PROC UNIVARIATE or MEANS

Computing Standard Errors for MEPS Estimates Document on MEPS website Document on MEPS website vey_comp/standard_errors.jsp vey_comp/standard_errors.jsp vey_comp/standard_errors.jsp vey_comp/standard_errors.jsp

Example (Point estimates and SEs): SAS V8.2 proc surveymeans data=work.h60 mean; stratum varstr01; cluster varpsu01; weight perwt01f; var totexp01; proc surveymeans data=work.h60 mean; stratum varstr01; cluster varpsu01; weight perwt01f; var totexp01;

Example (Point estimates and SEs): SUDAAN (SAS-callable) First need to sort file by varstr01 & varpsu01 First need to sort file by varstr01 & varpsu01 proc descript data=work.h60 filetype=SAS design=wr; nest varstr01 varpsu01; weight perwt01f; var totexp01; proc descript data=work.h60 filetype=SAS design=wr; nest varstr01 varpsu01; weight perwt01f; var totexp01;

Example (Point estimates and SEs): Stata svyset [pweight=perwt01f], strata(varstr01) psu (varpsu01) svymean(totexp01)

Analysis of Subpopulations Analyzing files that contain only a subset of MEPS sample may produce error messages or incorrect standard errors Analyzing files that contain only a subset of MEPS sample may produce error messages or incorrect standard errors Each software package has capability to produce subpopulation estimates from entire person-level file Each software package has capability to produce subpopulation estimates from entire person-level file See “Computing Standard Errors for MEPS Estimates” See “Computing Standard Errors for MEPS Estimates” – _comp/standard_errors.jsp _comp/standard_errors.jsp _comp/standard_errors.jsp

Assessing Precision/Reliability of Estimates Sample Sizes Sample Sizes Standard Errors/Confidence Intervals Standard Errors/Confidence Intervals Relative Standard Errors Relative Standard Errors – standard error of estimate  estimate

Example: Average total expenses per capita, 2001 Sample Size = 32,122 Sample Size = 32,122 Estimate = $2,555 Estimate = $2,555 Standard Error = 55 Standard Error = 55 95% Confidence Interval: (2447, 2663) 95% Confidence Interval: (2447, 2663) Relative Standard Error (RSE) or Coefficient of Variation (CV) = 55 ÷ 2555 =.021 = 2.1% Relative Standard Error (RSE) or Coefficient of Variation (CV) = 55 ÷ 2555 =.021 = 2.1%

Types of Basic Point Estimates: Examples Means Means – Annual per capita expenses in 2001 = $2,555 Proportions Proportions – Percent with some health expenses in 2001 = 85.4% – Two methods to generate estimates: percents obtained from frequency tables percents obtained from frequency tables means of dichotomous variable means of dichotomous variable Totals Totals – Total expenses in 2001 = $726.4 billion – Total number of persons (sum of weights) Differences between subgroups Differences between subgroups

Pooling Multiple Years of MEPS Data

Reasons for Pooling Reduce standard error of estimate(s) Reduce standard error of estimate(s) Stabilize trend analyzes Stabilize trend analyzes Enhance ability to analyze small subgroups Enhance ability to analyze small subgroups

Minimum Sample Sizes CFACT Standards CFACT Standards – Minimum unweighted sample of 100 – Flag estimates with RSE > 30% Confidence intervals become problematic with small samples and/or highly skewed data Confidence intervals become problematic with small samples and/or highly skewed data – Consider larger minimum sample sizes for highly skewed variables – Analysts may be comfortable with smaller minimums for less skewed variables – ASA Paper: Yu and Machlin (Skewness) ublications/workingpapers/wp_04002.pdf ublications/workingpapers/wp_04002.pdf

Example: Annual Sample Sizes (Unpooled) Year Total Population Children 0-5 Asian/PI Children* ,5712, ,6363, ,9532, ,5652,15693 * Sample sizes do not meet AHRQ minimum requirement (n=100) to publish estimates.

Pooled Sample Sizes YearsTotalSampleChildren0-5 Asian/PI Children ,2075, ,5184, ,7259,370311

Relative Standard Errors for Estimated Mean Expenditures: Asian/PI Children 0-5 Annual2 year4 year

Creating a Pooled File for Analysis ( ) Need to work with Pooled Estimation File (HC- 036) when 1+ years being pooled include any year from 1996 through 2001 Need to work with Pooled Estimation File (HC- 036) when 1+ years being pooled include any year from 1996 through 2001 – Stratum and PSU variables obtained from HC-036 for – Documentation for HC-036 provides instructions on how to properly create pooled analysis file Stratum and PSU variables properly standardized for pooling years from 2002 onward (i.e., do not need HC-036) Stratum and PSU variables properly standardized for pooling years from 2002 onward (i.e., do not need HC-036)

Creating Pooled Files: Summary of Important Steps Rename analytic and weight variables from different years to common names. Rename analytic and weight variables from different years to common names. – Expenditures: TOTEXP99 & TOTEXP00 = TOTEXP – Weights: PERWT99F & PERWT00F = POOLWT Divide weight variable by number of years pooled to produce estimates for “an average year” during the period. Divide weight variable by number of years pooled to produce estimates for “an average year” during the period. – Keep original weight value if estimating total for period Concatenate annual files Concatenate annual files Merge variance estimation variables from HC-036 onto file (only if 1+ years prior to 2002) Merge variance estimation variables from HC-036 onto file (only if 1+ years prior to 2002) – Strata variable: STRA9603 – PSU variable: PSU9603

Estimates from Pooled Files Produce estimates in analogous fashion as for individual years Produce estimates in analogous fashion as for individual years Estimates interpreted as “average annual” for pooled period Estimates interpreted as “average annual” for pooled period Example: Pooled data Example: Pooled data – The average annual total health care expenditures for Asian/Pacific Islander children under 6 years of age during the period from was $525 (SE=97).

Pooling Annual Data: Lack of Independence Across Years Legitimate to pool data for persons in consecutive years Legitimate to pool data for persons in consecutive years – Each yr. constitutes nationally representative sample – Pooling produces average annual estimates – Stratum & PSU variables sufficient to account for lack of independence between years Lack of independence actually begins with first stage of sample selection Lack of independence actually begins with first stage of sample selection – Same PSUs are used to select each MEPS panel See HC-036 documentation See HC-036 documentation

Longitudinal Analysis of MEPS Panels

MEPS Longitudinal Analysis: Panel 4: Panel 4: Round 2Round 3 Round 4Round /1/ Round 1 12/31/2000

MEPS Longitudinal Analysis National estimates of person-level changes over two-year period National estimates of person-level changes over two-year period – two-year period is relatively short Examine characteristics associated with changes Examine characteristics associated with changes – mainly round 1 data

Variables that may change between years or rounds Insurance coverage Insurance coverage – Monthly indicators (24 measures) – Annual summary (2 measures per person) Health status Health status – Each round (5 measures) Having a usual source of care Having a usual source of care – Rounds 2 & 4 (2 measures) Use and expenditures Use and expenditures – Annual (2 measures per person)

MEPS Longitudinal Weight Files Currently Available (Oct., 2006) MEPS Longitudinal Weight Files Currently Available (Oct., 2006) MEPS Panel Years Covered PUF Number HC HC HC HC HC HC HC-080

Creating Longitudinal Files (Panel 4) : Summary of Important Steps Select Panel 4 records from annual files Select Panel 4 records from annual files – 1999 (PUF HC-038) – 2000 (PUF HC-050) Obtain MEPS Longitudinal File (HC-058) Obtain MEPS Longitudinal File (HC-058) – Contains weight and variance estimation variables – Contains variable indicating whether complete data are available for 1 or both years of panel Link using DUPERSID Link using DUPERSID

Longitudinal Weight Longitudinal Weight Variable Name: LONGWTP# Variable Name: LONGWTP# Produces estimates for persons in civilian noninstitutionalized population in two consecutive years when applied to persons participating in both years of a given panel (YRINDP# = 1) Produces estimates for persons in civilian noninstitutionalized population in two consecutive years when applied to persons participating in both years of a given panel (YRINDP# = 1)

Examples: Longitudinal Estimates Examples: Longitudinal Estimates Of those without insurance at any time in 1999, estimated 76.9% (SE=1.6) also uninsured throughout 2000 Of those without insurance at any time in 1999, estimated 76.9% (SE=1.6) also uninsured throughout 2000 Estimated 8.2% (SE=0.4) of the population had no insurance throughout Estimated 8.2% (SE=0.4) of the population had no insurance throughout Of those with no expenses in 1999, estimated 47.6% (SE=1.3) had some expenses in 2000 Of those with no expenses in 1999, estimated 47.6% (SE=1.3) had some expenses in 2000 Of top 5% of spenders in 1996, 30% retain this position in Of top 5% of spenders in 1996, 30% retain this position in 1997.

Family-Level Estimation

Need to roll up persons to families Need to roll up persons to families – MEPS vs. CPS definitions – Any time during year or December 31 – Instructions in person file documentation Avg. number of persons per family = 2.4 Avg. number of persons per family = 2.4 Use appropriate family weight variable Use appropriate family weight variable – Family weight = 0 if full-year data not obtained for all in-scope family members (about 2% of cases in 2002)

MEPS Annual Files: Annualized Family Sample Sizes File Number HC- 020 HC- 028 HC- 038 HC- 050 HC- 060 Families(unwtd)13,0879,0239,3459,51512,852 Weighted million million million million million Family Weight Variable Name WTFAMF97WTFAMF98FAMWT99FFAMWT00FFAMWT01F

MEPS Annual Files: Annualized Family Sample Sizes (con.) File Number HC-070HC-079HC-089 Families(unwtd)14,82812,86013,018 Weighted million million million Family Weight Variable Name FAMWT02FFAMWT03FFAMWT04F

Family-Level Example 2001 average total expenses per family 2001 average total expenses per family Estimates based on families in scope at any time during year Estimates based on families in scope at any time during year Family size EstimateSE All$6, $4, $7, $6, $6, $7,518389

Other Miscellaneous Estimation Issues

Medical Event as Unit of Analysis Can use event files to estimate average expense per event Can use event files to estimate average expense per event Examples: In 2001, Examples: In 2001, – mean facility expense per inpatient stay was $6,629 (SE=263). – mean expense per office visit to a medical provider was $114 (SE=2)

Special Supplements Self Administered Questionnaire (SAQ) Self Administered Questionnaire (SAQ) – Use SAQ weight Parent Administered Questionnaire (PAQ) Parent Administered Questionnaire (PAQ) – 2000 only – Use PAQ weight Diabetes Care Survey (DCS) Diabetes Care Survey (DCS) – Use DCS weight Variables on person-level files Variables on person-level files – Consult documentation for appropriate weight