Tobacco Use Supplement to the Current Population Survey User’s Workshop June 9, 2009 Tips and Tricks Analyzing TUS-CPS Data Lloyd Hicks Westat.

Slides:



Advertisements
Similar presentations
1 State & County Characteristics: Overview The basics State –The general method –July 1, 2000 beginning population –Domestic migration IRS pre-processing.
Advertisements

Tobacco Use Supplement To The Current Population Survey Users’ Workshop June 2009 Tips and Tricks of Handling the TUS Data James “Todd” Gibson Information.
9. Weighting and Weighted Standard Errors. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
National Center for Health Statistics DCC CENTERS FOR DISEASE CONTROL AND PREVENTION Changes in Race Differentials: The Impact of the New OMB Standards.
Parameter Estimation Chapter 8 Homework: 1-7, 9, 10 Focus: when  is known (use z table)
U.S. Department of Health and Human Services Centers for Disease Control and Prevention National Center for Health Statistics Division of Vital Statistics.
11 ACS Public Use Microdata Samples of 2005 and 2006 – How to Use the Replicate Weights B. Dale Garrett and Michael Starsinic U.S. Census Bureau AAPOR.
Statistics II: An Overview of Statistics. Outline for Statistics II Lecture: SPSS Syntax – Some examples. Normal Distribution Curve. Sampling Distribution.
Statistics. Overview 1. Confidence interval for the mean 2. Comparing means of 2 sampled populations (or treatments): t-test 3. Determining the strength.
Bridging the Gaps: Dealing with Major Survey Changes in Data Set Harmonization Joint Statistical Meetings Minneapolis, MN August 9, 2005 Presented by:
7-2 Estimating a Population Proportion
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by.
How to calculate Confidence Intervals and Weighting Factors
GS/PPAL Section N Research Methods and Information Systems A QUANTITATIVE RESEARCH PROJECT - (1)DATA COLLECTION (2)DATA DESCRIPTION (3)DATA ANALYSIS.
Correlation Question 1 This question asks you to use the Pearson correlation coefficient to measure the association between [educ4] and [empstat]. However,
Objectives of Multiple Regression
Effects of Income Imputation on Traditional Poverty Estimates The views expressed here are the authors and do not represent the official positions.
Hypothesis Testing in Linear Regression Analysis
Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.
Liesl Eathington Iowa Community Indicators Program Iowa State University October 2014.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 14 Comparing Groups: Analysis of Variance Methods Section 14.2 Estimating Differences.
CPE 619 Simple Linear Regression Models Aleksandar Milenković The LaCASA Laboratory Electrical and Computer Engineering Department The University of Alabama.
Simple Linear Regression Models
Confidence Intervals and Two Proportions Presentation 9.4.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.
Using the ACS: Issues with studying small areas and change over time Presented to Association of Public Data Users January 20, 2011.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 10. Hypothesis Testing II: Single-Sample Hypothesis Tests: Establishing the Representativeness.
American Community Survey Getting the Most Out of ACS Jane Traynham Maryland State Data Center.
Bridged-Race Population Estimates for Chicago 6 June, 2004 NAPHSIS – Portland, OR Mark Flotow Illinois Center for Health Statistics.
Estimating the Value of a Parameter Using Confidence Intervals
American Community Survey (ACS) 1 Oregon State Data Center Meeting Portland State University April 14,
1 rules of engagement no computer or no power → no lesson no SPSS → no lesson no homework done → no lesson GE 5 Tutorial 5.
Random Group Variance Adjustments When Hot Deck Imputation Is Used to Compensate for Nonresponse Richard A. Moore Company Statistics Division US Census.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
Panel Study of Entrepreneurial Dynamics Richard Curtin University of Michigan.
Business Statistics for Managerial Decision Farideh Dehkordi-Vakil.
CLASS 5 Normal Distribution Hypothesis Testing Test between means of related groups Test between means of different groups Analysis of Variance (ANOVA)
Confidence intervals and hypothesis testing Petter Mostad
Assessing SES differences in life expectancy: Issues in using longitudinal data Elsie Pamuk, Kim Lochner, Nat Schenker, Van Parsons, Ellen Kramarow National.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 7-1 Review and Preview.
Reasoning in Psychology Using Statistics
The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition. Conducting post-hoc tests of compound coefficients using simple slopes for a categorical.
ICCS 2009 IDB Workshop, 18 th February 2010, Madrid 1 Training Workshop on the ICCS 2009 database Weighting and Variance Estimation picture.
Statistical planning and Sample size determination.
United Nations Workshop on Revision 3 of Principles and Recommendations for Population and Housing Censuses and Evaluation of Census Data, Amman 19 – 23.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 8. Parameter Estimation Using Confidence Intervals.
Sampling Fundamentals 2 Sampling Process Identify Target Population Select Sampling Procedure Determine Sampling Frame Determine Sample Size.
Analytical Example Using NHIS Data Files John R. Pleis.
U.S. Department of Commerce Economics and Statistics Administration U.S. CENSUS BUREAU Income, Poverty, and Health Insurance Coverage: 2008 September 2009.
Copyright © 2011, 2005, 1998, 1993 by Mosby, Inc., an affiliate of Elsevier Inc. Chapter 19: Statistical Analysis for Experimental-Type Research.
ICCS 2009 IDB Seminar – Nov 24-26, 2010 – IEA DPC, Hamburg, Germany Training Workshop on the ICCS 2009 database Weights and Variance Estimation picture.
1 Optimal Number of Replicates for Variance Estimation Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy Third International.
Statistical Weights and Methods for Analyzing HINTS Data HINTS Data Users Conference January 21, 2005 William W. Davis, Ph.D. Richard P. Moser, Ph.D. National.
1 SPSS MACROS FOR COMPUTING STANDARD ERRORS WITH PLAUSIBLE VALUES.
Using Data from the National Survey of Children with Special Health Care Needs Centers for Disease Control and Prevention National Center for Health Statistics.
Ex St 801 Statistical Methods Inference about a Single Population Mean (CI)
Quality of Race and Hispanic Origin Reporting on Death Certificates in the US Elizabeth Arias, Ph.D. Mortality Statistics Branch Division of Vital Statistics.
Sample Design of the National Health Interview Survey (NHIS) Linda Tompkins Data Users Conference July 12, 2006 Centers for Disease Control and Prevention.
Presented by: Khaleel S. Hussaini PhD Bureau Chief, Public Health Statistics Division of Public Health Preparedness Judy Bass Arizona’s BRFSS Coordinator.
Issues in the Classification of Race and Ethnicity Data Centers for Disease Control and Prevention National Center for Health Statistics.
Review Statistical inference and test of significance.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Statistics and probability Dr. Khaled Ismael Almghari Phone No:
Peter Linde, Interviewservice Statistics Denmark
LAMAS Working Group 7-8 December 2015
Statistical Inference about Regression
Narrative Reviews Limitations: Subjectivity inherent:
15.1 The Role of Statistics in the Research Process
STEPS Site Report.
Presentation transcript:

Tobacco Use Supplement to the Current Population Survey User’s Workshop June 9, 2009 Tips and Tricks Analyzing TUS-CPS Data Lloyd Hicks Westat

Talk Outline 1.Uses of Standard Errors in Analyzing Data 2.Methods to Compute Standard Errors for TUS- CPS estimates – Generalized variance functions (SE parameters) – BRR replication – Fay’s method (replicate weights) 3.Special Topics for Analysts – Change in Race/Ethnicity Questions – Overlap Sample – Replicate Weights when Data Sets Merged

Uses of Standard Errors Constructing confidence intervals - reflects the accuracy of survey estimates Hypothesis testing - compare estimates between subgroups (within same year) - compare estimates across time

Uses of SEs: Confidence Intervals Formula: X= estimate SE(X) = standard error z = confidence interval coefficient (e.g for 90% CI) Example: 90% CI for males 18+ smokers (20%) 20% ± × 0.15%= 20% ± 0.25% = [19.75%, 20.25%]

Uses of SEs: Hypothesis Testing Formula: statistical significance X is the estimate for the 1 st group Y is the estimate for the 2 nd group SE(X – Y) is the standard error of difference z = critical value threshold

Hypothesis Testing: Example 1 Note: difference is statistically significant (since 4.71 is greater than z where z =1.645 at 90% confidence level)

Hypothesis Testing – Example 2 Note: difference is not statistically significant (since 1.56 is less than z where z =1.645 at 90% confidence level)

Methods of Estimating Standard Errors for TUS-CPS 1. Generalized variance functions (GVF) – Fast, easy but only approximate – More practical for large number of survey items – Requires a and b parameters from source and accuracy statements – Standard errors formulas for means, totals, percentages and their differences – Standard errors for complex estimates not possible (e.g. regression)

GVF Example Standard error for a percentage p is the estimate of the percentage x is the estimate of the base of the percentage b is the b parameter obtained from S&A statement

GVF Example P = percentage of male smokers 18+ = 20.7% X = 101,244,000 b parameter = 1,575 (from S&A table) Note: Data from 2003 TUS-CPS

Methods of Estimating Standard Errors for TUS-CPS 2.Balanced repeated replication (BRR) based on replication weights – Replicate weights not on TUS-CPS public use file (available from NCI on request) – Requires special software (Sudaan, WesVar, etc.) – Provides a more accurate standard error than GVF – Standard errors for medians and other quantiles can be problematic

SE Formula for CPS-TUS Using BRR (Fay’s Method) X(r) = replicate estimate X(0) = full sample estimate R = number of replicates 48 for 1992 – 1993 files (1980 decennial based samples) 80 for 1995 – 2003 files (1990 decennial based samples) 160 for 2006 – 2007 files (2000 decennial based samples) 4 = Fay Adjustment Factor (required in Sudaan)

Special Topics for Analysts 1. Changes in Race/Ethnicity Data /2003 Overlap Sample 3. Merging Data Sets

Special Topics 1: Changes to CPS Race/ethnicity data starting in 2003  Respondents can now select more than one race when answering the survey.  Asian or Pacific Islander (API) category split: 1. Asian 2. Native Hawaiian or Other Pacific Islander (NHOPI)  The ethnicity question asked directly whether the respondent was Hispanic  Ordering of race and ethnicity reversed

Implication of Race/ethnicity Change On TUS-CPS data 1.No effect on estimates and trends for entire nation 2. Potential impact on estimates and trends by race/ethnicity

Issues when Analyzing TUS-CPS Data By Race/ethnicity 1.Can’t use race data for post-2003 data in same manner as pre-2003 Use single race = “only” category Use “any mention” category Neither group same as pre-2003 group 2. Analyzing Trends for single race groups spanning pre-2003 and post-2003 NCI developed “race bridge” approach to construct single-race estimates for post-2003 data

TUS-CPS Race bridging approach NCI developed model to predict pre-2003 race/ethnicities given post-2003 value (using May 2002 CPS data supplied by Census) Paper summarizing the approach on website ( bridging pdf). Paper summarizing application of approach on TUS-CPS data on website (

Special Topic 2: 2002/2003 Overlap Sample (for Limited Longitudinal Analysis) Persons in overlap sample (respondents in both) – TUS-CPS in Feb – TUSCS-CPS in Feb – Approximately 22,000 in overlap sample Responses from both studies can be analyzed as a longitudinal study New weights were developed for overlap sample

Development of Overlap Sample Weights New weights for the overlap sample developed from 2003 weights New weights were derived to reflect 2003 population for gender, race/ethnicity, age, and geography Overlap sample weight w* = r * w Overlap weights = (adjustment factor) * (2003 weights) Full sample and replicate weights using same approach

Overlap Sample Weights: Derivation of Adjustment factor Choose adjustment factor so that sums of overlap sample weights match sums of 2003 sample weights in groups defined by – Census region (4) – Gender (2) – Race/ethnicity (4) – Age categories (19) Details in cps/TUS-CPS_overlap.pdf

Special Topic 3: Replicate Weights for Merged Data Within Same Sample Design (Correlated) Blend replicates (no new replicate weights needed) Still Use Fay Factor of 4 Across Sample Design (Uncorrelated) Stack replicates (add replicate weights together) Ex = 240 Adjust replicate weights to account for stacking Change Fay Factor from 4 to 16

Talk Recap 1.Uses of Standard Errors in Analyzing Data 2.Methods to Compute Standard Errors for TUS- CPS estimates – Generalized variance functions (SE parameters) – BRR replication – Fay’s method (replicate weights) 3.Special Topics for Analysts – Change in Race/Ethnicity Questions – Overlap Sample – Merged Data Sets