An Introduction to the SIPP Synthetic Beta Holly Monti and Lori Reeder University of Michigan August 8,


Outline
- Background
- Multiple Imputation Approach
- How to use the SIPP Synthetic Beta (SSB)
- SSB Research
- Future Plans
- SSB data sources (if time permits)

SSB Background
- February 2001: Federal regulation published authorizing sharing of data items from W-2 tax forms for the purpose of improving survey products
- More demand to access survey data combined with administrative records
  - Meyer, Mok, & Sullivan 2015, Household Surveys in Crisis
  - Card, Chetty, Feldstein, & Saez 2012, Expanding Access to Administrative Data for Research in the United States
- Can we make such a product easier to access?

SSB Background
- Long histories of earnings and benefits data from IRS and SSA for all SIPP respondents with approval to link
- Several SIPP panels: currently 1984, 1990, 1991, 1992, 1993, 1996, 2001, 2004, and 2008
- Stack SIPP panels and link to administrative records. Named this the SIPP Gold Standard File (GSF)

Data Sources: Census Bureau
- SIPP panels
  - 1984, 1990, 1991, 1992, 1993, 1996, 2001, 2004, 2008
- Crosswalks
  - Links SIPP identifiers to PIK (privatized SSN) by panel
  - Not all individuals link
  - Match/PIK rate by SIPP panel:
    - 1984 = 75%
    - 1990s = 78%-84%
    - 2001 = 47%
    - 2004 = 72%
    - 2008 = 82%
    - 2014 = 89%

Data Sources: Administrative Records
- Summary Earnings Record (SER)
  - Annual total FICA-covered earnings
  - Annual total covered quarters (i.e., credits needed to qualify for benefits)
- Detailed Earnings Record Extract (DER)
  - Uncapped earnings from the W-2 and Form 1040 Schedules C and SE
  - SSB variables: sum total earnings across all jobs
    - Annual total FICA wages from the DER
    - Annual total non-FICA wages from the DER
    - Annual total deferred FICA wages from the DER
    - Annual total deferred non-FICA wages from the DER

Data Sources: Administrative Records
- Master Beneficiary Record (MBR)
  - Old Age, Survivor, Disability Insurance (OASDI) program
  - MBR variables on the SSB
    - Retirement, widow/spouse, and aged spouse benefits
    - Whether a benefit was received
    - Start date of benefit; total monthly amount of benefit
- Payment History Update System (PHUS)
  - Records actual payments made to beneficiaries
  - SSB variables:
    - Dummy variable for receipt
    - Benefit start date
    - Total amount of benefit

Data Sources: Administrative Records
- Social Security Disability Insurance (SSDI)
  - From MBR file
  - Date of disability onset
  - Disability adjudication date
  - Date of entitlement to disability
  - Date of disability benefits cessation
  - Total benefit amount
  - Disability diagnosis code
  - Up to 4 disability applications: always kept first ever, last ever, first in SIPP panel, last in SIPP panel; if more applications than these, delete older rejections, then older accepts.
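The retention rule for disability applications can be read as a small selection algorithm. The sketch below is one possible reading, not the Census Bureau's code: the `Application` record, the `in_panel` flag (taken here to mean "filed during the person's SIPP panel"), and the function name are all hypothetical, and the drop order (oldest rejections first, then oldest accepts, never touching the four protected applications) is an interpretation of the slide's wording.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Application:
    filed: date
    accepted: bool
    in_panel: bool  # hypothetical flag: filed during the person's SIPP panel


def keep_up_to_four(apps, limit=4):
    """Keep at most `limit` applications under one reading of the slide's rule."""
    apps = sorted(apps, key=lambda a: a.filed)
    if len(apps) <= limit:
        return apps
    # Always protected: first ever, last ever, first in panel, last in panel.
    protected = {0, len(apps) - 1}
    panel_idx = [i for i, a in enumerate(apps) if a.in_panel]
    if panel_idx:
        protected.update({panel_idx[0], panel_idx[-1]})
    removable = [i for i in range(len(apps)) if i not in protected]
    # Assumed drop order: oldest rejections first, then oldest accepts.
    drop_order = [i for i in removable if not apps[i].accepted]
    drop_order += [i for i in removable if apps[i].accepted]
    kept = set(range(len(apps)))
    for i in drop_order:
        if len(kept) <= limit:
            break
        kept.discard(i)
    return [apps[i] for i in sorted(kept)]


# Made-up application history for illustration only.
history = [
    Application(date(1990, 1, 1), False, False),
    Application(date(1992, 1, 1), False, False),
    Application(date(1995, 1, 1), True, False),
    Application(date(1997, 1, 1), False, True),
    Application(date(1999, 1, 1), False, False),
    Application(date(2003, 1, 1), True, False),
]
kept = keep_up_to_four(history)
print([a.filed.year for a in kept])
```

In this made-up history only one application falls inside the panel, so three slots are protected and the two unprotected rejections are dropped before the unprotected acceptance.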

Data Sources: Administrative Records
- Supplemental Security Record (SSR):
  - Supplemental Security Income (SSI) program
  - Slightly less complicated file with only variables
  - SSB variables:
    - Dummies for applied, received, or ceased receiving benefits
    - Application date
    - Amount received
    - Type of benefit
    - First/last payments
    - Diagnosis type

Data Sources Visualization (slide graphic not included in transcript)

How to Make This Product Accessible?
- Limited number of internal users, and it can be difficult for external users to get RDC access, especially for projects with IRS data
- Decision to pursue a new SIPP public use file that contains the administrative data linked to the survey data
- Challenge: Confidentiality
- Solution: Data Synthesis

Data Synthesis
- Underlying data presents two problems
  - Missing values
  - Disclosure risk
- We attempt to solve both problems with multiple imputation
  - Estimate joint distribution of all variables
  - Take multiple draws from estimated distribution to fill in missing values, resulting in multiple completed data sets (completed implicates)
  - Take multiple draws from estimated distribution to replace sensitive values, resulting in multiple synthetic data sets (synthetic implicates)

Why Multiple Imputation?
- Easy for analyst to use
  - On any given implicate, analyst can use any estimation technique that one would use on a dataset with no missing data
  - Data user only needs to know how to generate the estimand of interest and its measure of uncertainty on a single dataset, and then plugs the multiple estimates into a simple formula (which we provide)
- Variance estimation in multiple imputation setting works with almost any conceivable analysis (is not tailored to a specific estimand)
- Provides a very natural way of uncovering true variance

Multiple Imputation Visual
- Start with the GSF
- Could handle missing data by simply dropping records with missing values: list-wise deletion
- Potentially skewing sample and throwing away a lot of non-missing data

Multiple Imputation Visual
- Single imputation of missing data
- If the user treats imputations the same as non-missing data, results look too confident: variance is biased downward
- No obvious way to correct variance

Multiple Imputation Visual
- Multiple imputations
- Only difference in estimates between implicates comes from different imputations, thus revealing the uncertainty of the imputation
- This variation allows user to correctly adjust variance and still use all non-missing data
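For the missing-data-only case described above, the standard combining rule (Rubin's rules) averages the per-implicate estimates and adds the between-implicate variance, inflated by 1 + 1/m, to the average within-implicate variance. The following is a minimal sketch; the function name and the numbers are illustrative and do not come from the SSB.

```python
import math


def combine_completed(estimates, variances):
    """Combine m per-implicate estimates and sampling variances (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two implicates with matching variances")
    q_bar = sum(estimates) / m                 # combined point estimate
    u_bar = sum(variances) / m                 # average within-implicate variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-implicate variance
    total = u_bar + (1 + 1 / m) * b            # total variance
    return q_bar, total


# Illustrative numbers only: one coefficient estimated on four completed implicates.
q, t = combine_completed([0.10, 0.12, 0.11, 0.09],
                         [0.0004, 0.0005, 0.0004, 0.0005])
print(q, math.sqrt(t))
```

This rule applies to completed implicates only; once each completed implicate is also synthesized, a different adjustment is needed.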

Multiple Imputation Visual
- Synthesize each completed implicate once
- Two sources of uncertainty in variation between synthetic implicates: missing data imputation and synthesis
- Cannot distinguish the two sources, so estimates will overstate the variance (Reiter)

Multiple Imputation Visual
- Multiple synthetic implicates for each completed implicate
- Can measure uncertainty from synthesis AND uncertainty from missing data imputation
- Adjust variance without overstating uncertainty from missing data imputation
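As a hedged illustration of this two-level adjustment, the combining rule usually attributed to Reiter (2004) for m completed implicates, each synthesized r times, can be written as below. The notation is an assumption of this sketch: q^{(i,j)} and u^{(i,j)} are the point estimate and its variance from synthetic implicate j of completed implicate i. The formula distributed with the SSB documentation should be treated as authoritative.

```latex
\begin{aligned}
\bar{q}_M &= \frac{1}{mr}\sum_{i=1}^{m}\sum_{j=1}^{r} q^{(i,j)}, &
\bar{q}^{(i)} &= \frac{1}{r}\sum_{j=1}^{r} q^{(i,j)}, \\
b_M &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\bar{q}^{(i)}-\bar{q}_M\bigr)^2, &
\bar{w}_M &= \frac{1}{m}\sum_{i=1}^{m}\frac{1}{r-1}\sum_{j=1}^{r}\bigl(q^{(i,j)}-\bar{q}^{(i)}\bigr)^2, \\
\bar{u}_M &= \frac{1}{mr}\sum_{i=1}^{m}\sum_{j=1}^{r} u^{(i,j)}, &
T_M &= \Bigl(1+\frac{1}{m}\Bigr) b_M - \frac{\bar{w}_M}{r} + \bar{u}_M .
\end{aligned}
```

Here b_M captures variation between completed implicates, and the subtracted term removes the share of that variation that is due to synthesis alone. Because a term is subtracted, T_M can in principle come out negative in small samples.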

Release History
- Version 4: Spring , 1991, 1992, 1993, and 1996 panels
  - People age 15+
  - Gender, spouse link, OASDI benefit not synthesized
- Version 5: Fall 2010
  - Add 2001, 2004 panels
- Version 5.1: May 2013
  - Corrections for error in underlying earnings data
  - Add SIPP arrays, state
  - Synthesize OASDI benefit
- Version 6.0: March 2015
  - Added 1984, 2008 panels
  - All individuals regardless of age included
  - Added disability application history variables and SSI history
  - Base weight added

Estimating the joint distribution
Var1  Var2  Var3  Var4
X
X     X
X     X     X
X     X     X     X

Missing Data Pattern Usually Not Monotone (slide table of observed values for Var1 to Var4; column positions not recoverable from transcript)

Sequential Regression Multivariate Imputation (SRMI)

SRMI Visual for a Single Record (slide animation: a record observed on two of the four variables Var1 to Var4; within each of iterations 1 to 3 the missing variables are imputed one at a time until all four are filled in every iteration)
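The iteration pictured on this slide can be sketched in a few lines. This is a deliberately simplified illustration, not the SSB production method: it assumes every variable is continuous and modeled by a linear regression with normal errors, starts from column means, and skips the draw of regression parameters from their posterior that a proper implementation would include. The function name and the simulated data are hypothetical.

```python
import numpy as np


def srmi(data, n_iter=5, seed=0):
    """Fill NaNs by cycling through variables and regressing each on the others."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float).copy()
    missing = np.isnan(x)
    # Starting values: column means of the observed entries.
    col_means = np.nanmean(x, axis=0)
    x[missing] = np.take(col_means, np.where(missing)[1])
    n, p = x.shape
    for _ in range(n_iter):
        for j in range(p):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Regress variable j on all other variables (current fills included),
            # using only rows where variable j was actually observed.
            design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
            obs = ~miss_j
            beta, *_ = np.linalg.lstsq(design[obs], x[obs, j], rcond=None)
            resid = x[obs, j] - design[obs] @ beta
            dof = max(int(obs.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Replace the missing entries with a draw, not just the prediction.
            x[miss_j, j] = design[miss_j] @ beta + rng.normal(0.0, sigma, int(miss_j.sum()))
    return x


# Simulated example with a non-monotone missing data pattern.
rng = np.random.default_rng(1)
full = rng.normal(size=(200, 1)) + rng.normal(scale=0.5, size=(200, 4))
holes = rng.random(full.shape) < 0.2
with_gaps = np.where(holes, np.nan, full)
completed = srmi(with_gaps)
```

Calling `srmi` with different seeds would give different completed implicates, which is the variation the combining formulas rely on.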

Synthesize Data

Start with a completed implicate (slide animation: a fully observed grid of Var1 to Var4, replaced one variable at a time)
- Estimate p(x1) & impute
- Estimate p(x2|x1) & impute
- Estimate p(x3|x1,x2) & impute
- Estimate p(x4|x1,x2,x3) & impute
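The chain of conditional models above can be sketched as follows. This is a toy version under loud assumptions: every variable is treated as continuous, each conditional distribution is a normal linear regression fitted on the completed data, and all variables are replaced. The function name and the simulated input are hypothetical, and the models behind the actual SSB are not described in these slides.

```python
import numpy as np


def synthesize(completed, seed=0):
    """Replace every column with draws from p(x1), p(x2|x1), p(x3|x1,x2), ..."""
    rng = np.random.default_rng(seed)
    real = np.asarray(completed, dtype=float)
    n, p = real.shape
    synth = np.empty_like(real)
    # p(x1): a normal distribution fitted to the first column.
    synth[:, 0] = rng.normal(real[:, 0].mean(), real[:, 0].std(ddof=1), n)
    for j in range(1, p):
        # Fit p(xj | x1..x(j-1)) on the completed (confidential) data...
        design_real = np.column_stack([np.ones(n), real[:, :j]])
        beta, *_ = np.linalg.lstsq(design_real, real[:, j], rcond=None)
        resid = real[:, j] - design_real @ beta
        sigma = np.sqrt(resid @ resid / max(n - j - 1, 1))
        # ...then draw xj conditional on the already synthesized columns.
        design_synth = np.column_stack([np.ones(n), synth[:, :j]])
        synth[:, j] = design_synth @ beta + rng.normal(0.0, sigma, n)
    return synth


# Simulated stand-in for one completed implicate.
rng = np.random.default_rng(2)
real_data = rng.normal(size=(500, 1)) + rng.normal(scale=0.5, size=(500, 4))
fake = synthesize(real_data)
print(np.corrcoef(real_data, rowvar=False)[0, 1], np.corrcoef(fake, rowvar=False)[0, 1])
```

No synthetic row corresponds to a real row, yet the relationships between columns are approximately preserved, which is the property that makes analysis on synthetic implicates meaningful.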

Using the SSB: Getting Started
- Go to: synthetic-beta-data-product.html
  - Codebook
  - Technical Documentation
  - Application
- Submit application
- Synthetic Data Server
  - We support code written in SAS and Stata
  - Other software is available and may be used, with the understanding that it receives less support

Using the SSB: Requesting Validation on Internal Data
- Census will run your programs on the confidential data and release these results to you
- Summary of steps
  - Write and test code on synthetic data
  - Prepare memo describing programs and output
  - Send memo to Census SSB staff and request validation and disclosure review
- Can take as little as 2-3 weeks
- Only works as well as your code

Topics that can be studied
- Wage inequality
- Retirement patterns
- Disability applications
- Immigrant outcomes
- Education outcomes
- Fertility history questions
- Spouse behavior

SSB Users
- 125 SSB users since 2007; 40 new users in the last year
- Several dissertations
- Michael Carr and Emily Wiemers received a Russell Sage Foundation grant for work with SSB on earnings instability and variability
- 2016 LERA/ASSA session title: Data Gold! Exploiting the rich research potential of lifetime earnings and Social Security benefits data from the U.S. Census Bureau’s SIPP Synthetic Beta and Gold Standard files

SSB 2016 LERA/ASSA Session
- Rutledge, Wu, & Vitagliano
  - Study whether catch-up provision tax incentives effectively increased earnings deferrals.
- Shore-Sheppard
  - Studies patterns of earnings prior to the birth of a first child and compares these patterns across levels of parental educational attainment.
- Carr & Wiemers
  - Study trends in earnings volatility by educational attainment and gender over the last thirty-five years.
- Wicks-Lim
  - Studies prevalence and duration of low-wage careers across a changing labor market.

Bertrand, Kamenica, & Pan (QJE 2015)

Future Approach
- Issues
  - Hard to add new variables to SSB
  - SSB users have no choice over missing data approach
- Solution: Synthesize incomplete data directly
  - Missing data pattern is synthesized
  - User can choose how to handle missing data
  - Faster to add new variables to synthetic file
  - Fewer implicates in total (r synthetic implicates, instead of r synthetic implicates for each of m completed implicates)

SSB Workshop Project
- Mincer Earnings Regression
  - Select sample of working-age males from 1996 panel for whom we are likely to have observed full education
  - Table of descriptive statistics on sample
  - Simple Mincer Earnings Regression
  - Add year fixed effects
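For reference, a textbook form of the Mincer earnings regression with year fixed effects is written out below. The symbols are assumptions of this sketch rather than SSB variable names: y_{it} is annual earnings of person i in year t, S_i is years of schooling, X_{it} is potential experience, and \gamma_t is a year effect.

```latex
\ln y_{it} = \beta_0 + \beta_1 S_i + \beta_2 X_{it} + \beta_3 X_{it}^2 + \gamma_t + \varepsilon_{it},
\qquad X_{it} = \text{age}_{it} - S_i - 6
```

The "simple" regression in the workshop outline corresponds to dropping \gamma_t. The coefficient \beta_1 is usually read as the approximate proportional return to an additional year of schooling.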

SSB Workshop Project
- Start with one implicate
- Once working properly, loop over all 16 implicates, storing results for each
- Combine results to get proper point and variance estimates for estimands of interest
- Test combination code for completed data on subset of 4 synthetic implicates
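The loop-then-combine workflow might look like the sketch below. The server supports SAS and Stata; Python is used here only for illustration. Several things are assumptions: the 16 implicates are taken to be 4 completed implicates with 4 synthetic implicates each, `fake_implicate` is a hypothetical stand-in that simulates data instead of reading a file from the Synthetic Data Server, year fixed effects are omitted for brevity, and the combining step uses the two-stage rule commonly attributed to Reiter (2004) rather than the formula supplied with the SSB documentation.

```python
import numpy as np

M, R = 4, 4  # assumed layout: 4 completed implicates x 4 synthetic implicates each


def fake_implicate(i, j, n=500):
    """Hypothetical stand-in for loading implicate (i, j); simulates data instead."""
    rng = np.random.default_rng(100 * i + j)
    school = rng.integers(8, 21, n).astype(float)
    exper = rng.uniform(0, 40, n)
    log_earn = (9.0 + 0.08 * school + 0.04 * exper - 0.0007 * exper**2
                + rng.normal(0, 0.5, n))
    return school, exper, log_earn


def mincer_ols(school, exper, log_earn):
    """OLS of log earnings on schooling, experience, and experience squared."""
    x = np.column_stack([np.ones_like(school), school, exper, exper**2])
    beta, *_ = np.linalg.lstsq(x, log_earn, rcond=None)
    resid = log_earn - x @ beta
    sigma2 = resid @ resid / (len(log_earn) - x.shape[1])
    cov = sigma2 * np.linalg.inv(x.T @ x)
    return beta[1], cov[1, 1]  # schooling coefficient and its variance


# Step 1: run the same regression on every implicate and store the results.
q = np.empty((M, R))
u = np.empty((M, R))
for i in range(M):
    for j in range(R):
        q[i, j], u[i, j] = mincer_ols(*fake_implicate(i, j))

# Step 2: combine (assumed two-stage rule).
q_bar = q.mean()                        # overall point estimate
b_m = q.mean(axis=1).var(ddof=1)        # variance between completed implicates
w_bar = q.var(axis=1, ddof=1).mean()    # average variance within completed implicates
u_bar = u.mean()                        # average estimated sampling variance
total_var = (1 + 1 / M) * b_m - w_bar / R + u_bar
print(q_bar, np.sqrt(total_var))
```

Because the simulated implicates are independent draws, the numbers only demonstrate the mechanics; on real SSB files the between-implicate terms reflect imputation and synthesis uncertainty.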