INFO 7470 Statistical Tools: Edit and Imputation Examples of Multiple Imputation John M. Abowd and Lars Vilhuber April 18, 2016.

Slides:



Advertisements
Similar presentations
1 Local Employment Dynamics Powerful Analytic Tools: First Look at OnTheMap Version 3 (Beta) Colleen D. Flannery New Jersey State Data Center June 11,
Advertisements

Local Employment Dynamics Data: Advanced Topics C2ER Training Workshop June 4, 2012 Stephen Tibbets Erika McEntarfer LEHD Program US Census Bureau.
Local Employment Dynamics Training October 2014 Earlene Dowell Longitudinal Employer-Household Dynamics U.S. Census Bureau 1.
1 The Synthetic Longitudinal Business Database Based on presentations by Kinney/Reiter/Jarmin/Miranda/Reznek 2 /Abowd on July 31, 2009 at the Census-NSF-IRS.
INFO 7470/ECON 7400 Synthetic Data Creation and Use John M. Abowd and Lars Vilhuber with a big assist from Abigail Cooke, Javier Miranda, Martha Stinson,
Research on Improvements to Current SIPP Imputation Methods ASA-SRM SIPP Working Group September 16, 2008 Martha Stinson.
Presented to: Presented by: Transportation leadership you can trust. LEHD OnTheMap Data Planning Applications Conference, Session 2 Bruce Spear, Cambridge.
John M. Abowd Cornell University and Census Bureau
What are Wage Records? Wage records are an administrative database used to calculate Unemployment Insurance benefits for employees who have been laid-off.
INFO 7470/ILRLE 7400 Geographic Information Systems John M. Abowd and Lars Vilhuber March 29, 2011.
LEHD Longitudinal Employer Household Dynamics U.S. Census Bureau and U.S. Dept of Labor/ETA program
© 2007 John M. Abowd, Lars Vilhuber, all rights reserved Introduction to Record Linking John M. Abowd and Lars Vilhuber April 2007.
© John M. Abowd 2005, all rights reserved Household Samples John M. Abowd March 2005.
© John M. Abowd 2005, all rights reserved Analyzing Frames and Samples with Missing Data John M. Abowd March 2005.
INFO 4470/ILRLE 4470 Social and Economic Data Populations and Frames John M. Abowd and Lars Vilhuber February 7, 2011.
MCCORMICK SRI: GOING DEEP WITH CENSUS DEMOGRAPHIC AND ECONOMIC DATA EMPLOYMENT AND UNEMPLOYMENT ESTIMATES FROM THE U.S. DEPARTMENT OF LABOR, BUREAU OF.
Presented to: Presented by: Transportation leadership you can trust. LEHD OnTheMap Data 2011 GIS in Public Transportation Tampa, FL Bruce Spear September.
The American Community Survey (ACS) Lisa Neidert NPC Workshop: Analyzing Poverty and Socioeconomic Trends Using the American Community Survey June 22 –
© John M. Abowd 2005, all rights reserved Recent Advances In Confidentiality Protection John M. Abowd April 2005.
© John M. Abowd 2005, all rights reserved Sampling Frame Maintenance John M. Abowd February 2005.
© John M. Abowd 2005, all rights reserved Statistical Programs of the Federal Government John M. Abowd February 2005.
Recent Advances In Confidentiality Protection – Synthetic Data John M. Abowd April 2007.
Census Bureau Employment Data ACS, EC, and LED… And why you should use the data from one program vs. another… SDC/CIC Annual Training Conference Wednesday,
INFO 4470/ILRLE 4470 Register-based statistics by example: County Business Patterns John M. Abowd and Lars Vilhuber February 14, 2011.
INFO 7470/ILRLE 7400 Universes, Populations, Frames, and Sampling John M. Abowd and Lars Vilhuber February 1, 2011.
State Data Center Annual Affiliate Meeting New York State Department of Labor Earlene Dowell LEHD Program Center for Economic Studies U.S. Census Bureau.
“OnTheMap” The Census Bureau’s New Tool for Residence-Workplace Analysis Fredrik Andersson and Jeremy Wu May 7, 2007 Daytona Beach, FL.
INFO 7470/ILRLE 7400 Survey of Income and Program Participation (SIPP) Synthetic Beta File John M. Abowd and Lars Vilhuber April 26, 2011.
Improvements in the BLS Business Register Richard Clayton David Talan 12th Meeting of the Group of Experts on Business Registers Paris, France September.
Household Surveys ACS – CPS - AHS INFO 7470 / ECON 8500 Warren A. Brown University of Georgia February 22,
INFO 7470/ILRLE 7400 Statistical Tools: Missing Data Methods John M. Abowd and Lars Vilhuber March 15, 2011.
Population Estimates and Projections in the U. S. John F. Long
Local Employment Dynamics Jeff Matson CURA, University of Minnesota Oriane Casale Labor Market Information Office, MN Dept. of Employment and Economic.
0 presented to Model Task Force Meeting presented by Vidya Mysore, FDOT Central Office Krishnan Viswanathan, Cambridge Systematics, Inc. 12/12/06 LEHD.
Saadia GreenbergElena Fazio Office of Performance and Evaluation Administration on Aging US Department.
Introduction to Record Linking John M. Abowd and Lars Vilhuber April 2011 © 2011 John M. Abowd, Lars Vilhuber, all rights reserved.
OnTheMap and LODES Data Heath Hayward Geographer LEHD Program Center for Economic Studies.
12th Meeting of the Group of Experts on Business Registers
1 Supplementing ACS: The LEHD Program Jeremy S. Wu Marc Roemer U.S. Census Bureau May 12, 2005 Jeremy S. Wu Marc Roemer U.S. Census Bureau May 12, 2005.
INFO 7470/ILRLE 7400 Statistical Tools: Edit and Imputation John M. Abowd and Lars Vilhuber March 25, 2013.
Local Employment Dynamics (LED) & OnTheMap Nick Beleiciks Oregon Census State Data Center Meeting April 14, 2009.
Introduction to the Public Use Microdata Sample (PUMS) File from the American Community Survey Updated February 2013.
© John M. Abowd 2007, all rights reserved Analyzing Frames and Samples with Missing Data John M. Abowd March 2007.
1 Longitudinal Employer- Household Dynamics (LEHD) Program Jeremy S. Wu U.S. Census Bureau May 11, 2005 Jeremy S. Wu U.S. Census Bureau May 11, 2005.
Assessing Disclosure for a Longitudinal Linked File Sam Hawala – US Census Bureau November 9 th, 2005.
Using Census Data to Understand Things ​ OpenGovChicago March 26, 2014.
LEHD and OnTheMap From Jobs to Transportation Matthew Graham Geographer U.S. Census Bureau 1.
Estimation of preliminary unemployment rates by means of multiple imputation UN/ECE-Work Session on Data Editing Vienna, April 2008 Thomas Burg, Statistics.
© John M. Abowd 2005, all rights reserved Multiple Imputation, II John M. Abowd March 2005.
INFO 7470/ECON 7400/ILRLE 7400 Solutions to Lab 5 John M. Abowd and Lars Vilhuber March 25, 2013.
Household Surveys: American Community Survey & American Housing Survey Warren A. Brown February 8, 2007.
© John M. Abowd 2005, all rights reserved Assessing Data Quality John M. Abowd April 2005.
Local Employment Dynamics Getting in Touch with Your Local Workforce Earlene Dowell Longitudinal Employer-Household Dynamics Program Center for Economic.
INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011.
INFO 7470/ECON 7400/ILRLE 7400 Understanding Social and Economic Data John M. Abowd and Lars Vilhuber January 21, 2013.
INFO 7470/ECON 7400/ILRLE 7400 Statistics of Jobs John M. Abowd and Lars Vilhuber February 2013 with thanks to Stephen Tibbets, U.S. Census Bureau.
LED Local Employment Dynamics Bradley Keen Pennsylvania Department of Labor & Industry Center for Workforce Information & Analysis (CWIA)
INFO 7470 Statistical Tools: Edit and Imputation John M. Abowd and Lars Vilhuber April 11, 2016.
The LEHD Program and Employment Dynamics Estimates Ronald Prevost Director, LEHD Program US Bureau of the Census
Local Employment Dynamics: Partnership, Public-Use Data, and Innovative Web Tools Eric Coyle Data Dissemination Specialist U.S. Census Bureau 1.
LMI and You DWD – BLS. Federal – State Partnership DWD is charged to create the following monthly estimates of economic activity:DWD is charged to create.
INFO 7470/ECON 7400/ILRLE 7400 Register-based statistics John M. Abowd and Lars Vilhuber March 4, 2013 and April 4, 2016.
Developing job linkages for the Health and Retirement Study John Abowd, Margaret Levenstein, Kristin McCue, Dhiren Patki, Ann Rodgers, Matthew Shapiro,
Measuring Data Quality in the BLS Business Register Richard Clayton Sherry Konigsberg David Talan WiesbadenGroup on Business Registers Tallin, Estonia.
INFO 7470 Statistical Tools: Hierarchical Models and Network Analysis John M. Abowd and Lars Vilhuber May 2, 2016.
No Free Lunch: Working Within the Tradeoff Between Quality and Privacy
John M. Abowd and Lars Vilhuber February 16, 2011
INFO 7470 Statistical Tools: Edit and Imputation
Presentation transcript:

INFO 7470 Statistical Tools: Edit and Imputation Examples of Multiple Imputation John M. Abowd and Lars Vilhuber April 18, 2016

Applications to Complicated Data Computational formulas for MI data Examples of building Multiply-imputed data files – Survey of Consumer Finances – SIPP “gold standard” (harmonized files underlying the SIPP Synthetic Data) – Quarterly Workforce Indicators April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

Computational Formulas Assume that you want to estimate something as a function of the data Q(Y) Formulas account for missing data contribution to variance April 11, 20163© John M. Abowd and Lars Vilhuber 2016, all rights reserved

Survey of Consumer Finances Codebook description of missing data procedures Sensitive survey because of large very-wealthy oversample (based on IRS list of most wealthy households in the U.S.) The missing data and confidentiality protection procedures are both based on modeling the complex set of choices given to respondents about how much wealth information to reveal When a respondent does not want to provide an answer, bracketed interval choices are presented Missing data are multiply-imputed based on modeling the conditional distribution of the bracketed intervals, given covariates, and ignorability; see Kennickell (2001)2001 April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

“Gold Standard” for SIPP Synthetic Beta (SSB) We will discuss synthetic data in a later lecture, the synthetic SIPP data are described here: surveys/sipp/guidance/sipp-synthetic-beta-data- product.html surveys/sipp/guidance/sipp-synthetic-beta-data- product.html The harmonized variables from the SIPP panels have had all missing data imputed with a multiple imputation model that is the same one used to generate the synthetic data When a project from the SSB is validated (run on the confidential data by Census staff), the multiply-imputed completed harmonized files (completed Gold Standard Files) are used. These are the results released to the researcher April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 5

How are the QWIs Built? Raw input files: – UI wage records – QCEW/ES-202 report – Decennial census and ACS files – SSA-supplied administrative records – Census-derived administrative record household address files – LEHD geo-coding system Processed data files: – Individual characteristics – Employer characteristics – Employment history with earnings April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

Processing the Input Files Each quarter the complete history of every individual, every establishment, and every job is processed through the production system Missing data on the individuals are multiply imputed at the national level, posterior predictive distribution is stored Missing data on the employment history record are multiply imputed each quarter from fresh posterior predictive distribution Missing data on the employer characteristics are singly- imputed (explanation to follow) April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 7

Examples of Missing Data Problems Missing demographic data on the national individual file (birth date, sex, race, ethnicity, place of residence, and education) – Multiple imputations using information from the individual, establishment, and employment history files – Model estimation component updated irregularly – Imputations performed once for those in estimation universe, then once when a new PIK is encountered in the production system This process was used on the current QWI and for the S2011 snapshot April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

A Very Difficult Missing Data Problem The employment history records only code employer to the UI account level Establishment characteristics (industry, geo- codes) are missing for multi-unit establishments The establishment (within UI account) is multiply imputed using a dynamic multi-stage probability model Estimation of the posterior predictive distribution depends on the existence of a state with establishments coded on the UI wage record (MN) April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

How Is It Done? Every quarter the QWI processes over 7 billion employment histories (unique person-employer pair) covering 1990 to the current quarter less 9 months (2015:Q2, currently) Approximately 30-40% of these histories require multiple employer imputations to acquire workplace characteristics So, the system does more than 25 billion full information imputations every quarter The information used for the imputations is current, it includes all of the historical information for the person and every establishment associated with that person’s UI account April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

Does It Work? Full assessment of total jobs, beginning-of- quarter employment, full-quarter employment, monthly earnings of full-quarter employed, total payroll Older assessment using the state that codes both (MN) Summary slide follows April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 12

April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 13

April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 14

April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 15

April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 16

April 11, © John M. Abowd and Lars Vilhuber 2016, all rights reserved

Cumulative Effect of All QWI Edits and Imputations Average Z-scoresMissingness Rates Sample Size Entity Size* Average Employment Beginning Employment (b) Full-quarter Employment (f) Earnings (z_w3) Beginning Employment (b) Full-quarter Employment (f) Earnings (z_w3) all % 31.2%237, %33.3%43.6%95, % 25.7%84, %21.6%20.5%21, % 19.1%11, %20.6%17.9%8, %20.1%16.1%15,654 *Entity is county x NAICS sector x race x ethnicity for 2008:q3. Z-score is the ratio of the QWI estimate to the square root of its total variation (within and between implicate components) Missingness rate is the ratio of the between variance to the total variance April 11, 2016 © John M. Abowd and Lars Vilhuber 2016, all rights reserved 18