Download presentation
Presentation is loading. Please wait.
Published byErik Ellis Modified over 8 years ago
1
An Introduction to the SIPP Synthetic Beta Holly Monti and Lori Reeder University of Michigan August 8, 2016 1
2
Outline Background Multiple Imputation Approach How to use the SIPP Synthetic Beta (SSB) SSB Research Future Plans SSB data sources (if time permits) 2
3
SSB Background February 2001: Federal Regulation published authorizing sharing of data items from W-2 tax forms for the purpose of improving survey products More demand to access survey data combined with administrative records Meyer, Mok, & Sullivan 2015, Household Surveys in Crisis Card, Chetty, Feldstein, & Saez 2012, Expanding Access to Administrative Data for Research in the United States Can we make such a product easier to access? 3
4
SSB Background Long histories of earnings and benefits data from IRS and SSA for all SIPP respondents with approval to link Several SIPP panels: currently 1984, 1990, 1991, 1992, 1993, 1996, 2001, 2004, and 2008 Stack SIPP panels and link to administrative records. Named this the SIPP Gold Standard File (GSF) 4
5
Data Sources: Census Bureau SIPP panels 1984, 1990, 1991, 1992, 1993, 1996, 2001, 2004, 2008 Crosswalks Links SIPP identifiers to PIK (privatized SSN) by panel Not all individuals link Match/PIK rate by SIPP panel: 1984 =75% 1990s = 78%-84% 2001 = 47% 2004 = 72% 2008 = 82% 2014 = 89% 5
6
Data Sources: Administrative Records Summary Earnings Record (SER) 1951-2011 Annual total FICA covered earnings Annual total covered quarters (i.e. credits needs to qualify for benefits) Detailed Earnings Record Extract (DER) 1978-2011 Uncapped earnings from the W-2, Form 1040 schedule C and SE SSB variables: sum total earnings across all jobs Annual total FICA wages from the DER Annual total non-FICA wages from the DER Annual total deferred FICA wages from the DER Annual total deferred non-FICA wages from the DER 6
7
Data Sources: Administrative Records Master Beneficiary Record (MBR) 1962-2012 Old Age, Survivor, Disability Insurance (OASDI) program MBR variables on the SSB Retirement, widow/spouse, and aged spouse benefits Whether a benefit was received Start date of benefit; Total monthly amount of benefit Payment History Update System (PHUS) 1984-2012 Records actual payments made to beneficiaries SSB variables: Dummy variable for receipt Benefit start date Total amount of benefit 7
8
Data Sources: Administrative Records Social Security Disability Insurance (SSDI) From MBR file Date of disability onset Disability adjudication date Date of entitlement to disability Date of disability benefits cessation Total benefit amount Disability diagnosis code Up to 4 disability applications: always kept first ever, last ever, first in SIPP panel, last in SIPP panel; if more applications than these, delete older rejections, then older accepts. 8
9
Data Sources: Administrative Records Supplemental Security Record (SSR): 1974-2012 Supplemental Security Income (SSI) program Slightly less complicated file with only 6500+ variables SSB variables: Dummies for applied, received, or ceased receiving benefits Application date Amount received Type of benefit First/last payments Diagnosis type 9
10
Data Sources Visualization 10
11
How to Make This Product Accessible? Limited number of internal users, and can be difficult for external users to get RDC access especially for projects with IRS data. Decision to pursue a new SIPP public use file that contains the administrative data linked to the survey data Challenge: Confidentiality Solution: Data Synthesis 11
12
Data Synthesis Underlying data presents two problems Missing Values Disclosure Risk We attempt to solve both problems with multiple imputation Estimate joint distribution of all variables Take multiple draws from estimated distribution to fill in missing values resulting in multiple, completed data sets (completed implicates) Take multiple draws from estimated distribution to replace sensitive values resulting in multiple, synthetic data sets (synthetic implicates) 12
13
Why Multiple Imputation? Easy for analyst to use On any given implicate, analyst can use any estimation technique that one would use on a dataset with no missing data Data user only needs to know how to generate the estimand of interest and its measure of uncertainty on a single dataset, and then plugs the multiple estimates into a simple formula (which we provide) Variance estimation in multiple imputation setting works with almost any conceivable analysis (is not tailored to a specific estimand) Provides a very natural way of uncovering true variance 13
14
Multiple Imputation Visual 14 Start with the GSF Could handle missing data by simply dropping records with missing values: list-wise deletion Potentially skewing sample and throwing away a lot of non- missing data
15
Multiple Imputation Visual 15 Single Imputation of missing data If user treats imputations the same as non-missing data – too confident in results: variance biased down No obvious way to correct variance
16
Multiple Imputation Visual 16 Multiple imputations Only difference in estimates between implicates comes from different imputations; thus revealing the uncertainty of the imputation This variation allows user to correctly adjust variance and still use all non-missing data
17
Multiple Imputation Visual 17 Synthesize each completed implicate once Two sources of uncertainty in variation between synthetic implicates: missing data imputation and synthesis Can not distinguish and estimates will overstate the variance (Reiter)
18
Multiple Imputation Visual 18 Multiple Synthetic implicates for each completed implicate Can measure uncertainty from synthesis AND uncertainty from missing data imputation Adjust variance without overstating uncertainty from missing data imputation
19
Release History Version 4: Spring 2007 1990, 1991, 1992, 1993, and 1996 panels People age 15+ Gender, spouse link, OASDI benefit not synthesized Version 5: Fall 2010 Add 2001, 2004 panels Version 5.1: May 2013 Corrections for error in underlying earnings data Add SIPP arrays, state Synthesize OASDI benefit Version 6.0: March 2015 Added 1984, 2008 panels All individuals regardless of age included Added disability application history variables and SSI history Base weight added 19
20
Estimating the joint distribution 20 Var1Var2Var3Var4 X XX XXX XXXX
21
Missing Data Pattern Usually Not Monotone Var1Var2Var3Var4 XXX XX XX XX XXX XX X XX XXX 21
22
Sequential Regression Multivariate Imputation (SRMI) 22
23
SRMI Visual for a Single Record Iteration #Var1Var2Var3Var4 1XX 2XX 3XX 23 Iteration #Var1Var2Var3Var4 1XXX 2XX 3XX Iteration #Var1Var2Var3Var4 1XXXX 2XX 3XX Iteration #Var1Var2Var3Var4 1XXXX 2XXX 3XX Iteration #Var1Var2Var3Var4 1XXXX 2XXXX 3XX Iteration #Var1Var2Var3Var4 1XXXX 2XXXX 3XXX Iteration #Var1Var2Var3Var4 1XXXX 2XXXX 3XXXX
24
Synthesize Data 24
25
Start with a completed implicate Var1Var2Var3Var4 XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX 25 Var1Var2Var3Var4 XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX Var1Var2Var3Var4 XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX Var1Var2Var3Var4 XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX Var1Var2Var3Var4 XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX Estimate p(x1) & impute Estimate p(x2|x1) & impute Estimate p(x3|x1,x2) & impute Estimate p(x4|x1,…,x3) & impute
26
Using the SSB: Getting Started Go to: http://www.census.gov/programs-surveys/sipp/guidance/sipp- synthetic-beta-data-product.html Codebook Technical Documentation Application Submit application Synthetic Data Server We support code written in SAS and STATA Other software is available, and may be used with understanding of less support 26
27
Using the SSB: Requesting Validation on Internal Data Census will run your programs on the confidential data and release these results to you Summary of steps Write and test code on synthetic data Prepare memo describing programs and output Email memo to Census SSB staff and request validation and disclosure review Can take as little as 2-3 weeks Only works as well as your code 27
28
Topics that can be studied Wage Inequality Retirement patterns Disability applications Immigrant outcomes Education outcomes Fertility history questions Spouse behavior 28
29
SSB Users 125 of SSB users since 2007; 40 new users in the last year Several dissertations Michael Carr and Emily Wiemers received a Russsel Sage Foundation grant for work with SSB on earnings instability and variability 2016 LERA/ASSA Session Title: Data Gold! Exploiting the rich research potential of lifetime earnings and Social Security benefits data from the U.S. Census Bureau’s SIPP Synthetic Beta and Gold Standard files 29
30
SSB 2016 LERA/ASSA Session Rutledge, Wu, & Vitagliano Studies whether catch-up provision tax incentives effectively increased earnings deferrals. Shore-Sheppard Studies patterns of earnings prior to birth of 1 st child and compares these patterns between varying levels of educational attainment of parents. Carr & Wiemers Study trends in earnings volatility by educational attainment and gender over the last thirty-five years. Wicks-Lim Studies prevalence and duration of low-wage careers across a changing labor market. 30
31
Bertrand, Kamenica, & Pan (QJE 2015) 31
32
Future Approach Issues Hard to add new variables to SSB SSB Users have no choice over missing data approach Solution: Synthesize incomplete data directly Missing data pattern is synthesized User can choose how to handle missing data Faster to add new variables to synthetic file Fewer implicates in total (r synthetic implicates, instead of r synthetic implicates for each of m completed implicates) 32
33
SSB Workshop Project Mincer Earnings Regression Select Sample of working age males from 1996 panel for whom we are likely to have observed full education Table of descriptive statistics on sample Simple Mincer Earnings Regression Add year fixed effects 33
34
SSB Workshop Project Start with one implicate Once working properly, loop over all 16 implicates storing results for each Combine results to get proper point and variance estimates for estimands of interest Test combination code for completed data on subset of 4 synthetic implicates 34
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.