Published by Jukka-Pekka Yrjö Honkanen. Modified over 5 years ago.
1
The Administrative Income Statistics (AIS) Project: Research on the Use of Administrative Records to Improve Income and Resource Estimates
Jonathan Rothbaum, Ph.D.
Income Statistics Branch
Social, Economic, and Housing Statistics Division
Joint Statistical Meetings
July 29, 2019
This presentation is to inform interested parties of ongoing research and to encourage discussion. The views expressed in this presentation are those of the author and not necessarily those of the U.S. Census Bureau. The data in this presentation have been cleared by the Census Bureau's Disclosure Review Board, release authorization number CBDRB-FY19-ROSS-B0108.
2
Why should we use administrative records (Adrecs)?
Biases in survey responses
  Measurement error
    Misreporting
    Under-reporting
  Non-response/imputation bias
    Non-response increasing
    Imputations could bias results if the assumptions of the imputation model are not true
Reduce burden
3
Underreporting – Looking at the Aggregates
Source: Rothbaum (2015)
* ACS Transfers includes Transfers, Pension, and Retirement Income due to the lower level of detail in the questionnaire.
4
Aggregates Don’t Tell the Whole Story
Aggregates do not tell us where in the distribution the income is missing
  Top of the distribution
    Doesn’t affect: median household income, poverty
    Does affect: inequality
  Bottom of the distribution
    Affects everything
Need microdata to know who is affected by shortfalls
5
Examples of Misreporting with Microdata
Retirement income
  Only 45% of those with retirement income in administrative records report it in the CPS ASEC (Bee, 2013)
SNAP
  Fewer than 60% of SNAP recipients report it in the CPS ASEC (Meyer and Mittag, 2015; Stevens et al., 2018)
6
Under-reporting Example – 65+ Households
Source: Bee and Mitchell (2017) using 2013 CPS ASEC linked to W2, 1040, and 1099-R forms for persons 65+.
12
Earnings – Comparing Administrative Records to Survey
Every dollar is there!
Earnings are most of the money
  78% of income in BEA National Income and Product Accounts benchmark
  82% of income in CPS ASEC
Source: Bee, Mitchell, and Rothbaum, 2019. 2013 CPS ASEC linked with 2012 DER, 2012 W-2s, and 2012 LEHD.
13
Earnings – what the aggregates miss
But not in the same place
So not the same dollars…
Source: Bee, Mitchell, and Rothbaum, 2019. 2013 CPS ASEC linked with 2012 DER, 2012 W-2s, and 2012 LEHD.
14
Earnings – not necessarily the same people
[Figure: earners classified as In Both, In Neither, Only in Survey, or Only in DER]
Source: Bee, Mitchell, and Rothbaum, 2019. 2013 CPS ASEC linked with 2012 DER, 2012 W-2s, and 2012 LEHD.
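The four groups named on this slide can be sketched as a simple cross-classification of linked records. This is a minimal illustration, not the authors' code: the function name, the "positive earnings" rule, and the sample records are all hypothetical.

```python
from collections import Counter

def earner_category(survey_earnings, der_earnings):
    """Classify one linked person by where positive earnings appear.

    Assumption for illustration: a person counts as an earner in a
    source when that source shows earnings greater than zero.
    """
    in_survey = survey_earnings > 0
    in_der = der_earnings > 0
    if in_survey and in_der:
        return "In Both"
    if in_survey:
        return "Only in Survey"
    if in_der:
        return "Only in DER"
    return "In Neither"

# Hypothetical linked records: (survey earnings, DER earnings)
linked_records = [
    (42_000, 41_500),
    (0, 0),
    (18_000, 0),
    (0, 23_000),
    (55_000, 60_000),
]

counts = Counter(earner_category(s, d) for s, d in linked_records)
for category, count in sorted(counts.items()):
    print(category, count)
```

Even when the aggregate totals agree, the off-diagonal groups (only in survey, only in DER) show that the two sources need not describe the same people.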
15
Earnings – not necessarily the same people
Source: Bee, Mitchell, and Rothbaum, 2019. 2013 CPS ASEC linked with 2012 DER, 2012 W-2s, and 2012 LEHD.
16
Earnings – when the same people, not necessarily the same amount
Source: Bee, Mitchell, and Rothbaum, 2019. 2013 CPS ASEC linked with 2012 DER, 2012 W-2s, and 2012 LEHD.
17
Non-response over time
Share of Income Imputed in CPS ASEC
Source: Bee, Gathright, and Meyer (2016)
Source: Hokayem, Raghunathan, and Rothbaum (2019)
18
Non-response – is it random?
Trouble in the Tails (Bollinger et al., 2018)
  High and low earners are the most likely to be non-respondents
  Results in biased income distribution statistics
Source: Bollinger et al. (2018) from CPS ASEC linked to W2 records
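A small simulation can illustrate why non-response concentrated in the tails biases distributional statistics when imputation treats non-response as random. This is a sketch under stated assumptions: the lognormal income distribution, the response probabilities, and the simple random hot deck are all hypothetical and are not the CPS ASEC imputation procedure.

```python
import random

def bottom_share_after_hot_deck(n=20_000, p_tail=0.6, p_middle=0.9, seed=1):
    """Return (true share, completed-data share) of people below the
    true 10th percentile when tail earners respond less often."""
    rng = random.Random(seed)
    # Hypothetical "true" incomes
    incomes = sorted(rng.lognormvariate(10.5, 0.8) for _ in range(n))
    low_cut = incomes[n // 10]           # true 10th percentile
    high_cut = incomes[(9 * n) // 10]    # true 90th percentile

    respondents, n_missing = [], 0
    for y in incomes:
        in_tail = y < low_cut or y >= high_cut
        # Assumed response probabilities: lower in both tails
        p_respond = p_tail if in_tail else p_middle
        if rng.random() < p_respond:
            respondents.append(y)
        else:
            n_missing += 1

    # Random hot deck: each non-respondent gets a random respondent's
    # income, which implicitly assumes non-response is random
    imputed = [rng.choice(respondents) for _ in range(n_missing)]
    completed = respondents + imputed

    true_share = sum(y < low_cut for y in incomes) / n
    completed_share = sum(y < low_cut for y in completed) / n
    return true_share, completed_share

true_share, completed_share = bottom_share_after_hot_deck()
print(f"true share below cutoff:      {true_share:.3f}")
print(f"completed-data share below:   {completed_share:.3f}")
```

Under these assumptions the completed data understate the share of people at the bottom, which is the direction of bias one would expect if low earners are disproportionately non-respondents.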
19
Non-random Non-response – how does it affect income estimates?
Imputation with administrative records (Hokayem, Rothbaum, Raghunathan, 2019)
  Addresses trouble in the tails: non-random non-response/imputation bias
Results – correcting for bias
  Poverty ↑
  Median household income ↓
  Inequality ↑
Adrecs help with precision but are not necessary to correct non-response bias
Source: Hokayem, Raghunathan, and Rothbaum, CPS ASEC linked with administrative records.
20
How should we use them? General Model
Given an underlying true income $y_i^*$, there are two reports of income $k \in \{a, s\}$, where $a$ = administrative record and $s$ = survey report:
$$y_i^s = y_i^* + c_i^s + v_i^s$$
$$y_i^a = y_i^* + c_i^a + v_i^a$$
$c_i^k = f^k(y_i^*, X_i)$: expected over-/under-reporting given true income and other individual characteristics
$v_i^k = g^k(y_i^*, X_i)$: reporting noise given true income and other individual characteristics
  Assumed to be of mean zero, where the dispersion is a function of $y_i^*$ and $X_i$
Loosely based on Abowd and Stinson (2013)
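The model above can be made concrete with a short simulation: each report equals true income plus a systematic component c and a mean-zero noise component v. This is only a sketch. The functional forms standing in for f and g, the parameter values, and the function names are hypothetical, and individual characteristics X are omitted for brevity.

```python
import random
import statistics

def simulate_reports(n=10_000, seed=0):
    """Draw (true income, survey report, administrative report) triples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y_true = rng.lognormvariate(10.5, 0.8)   # hypothetical true income y*
        # c^k: systematic over-/under-reporting (assumed forms of f^k)
        c_s = -0.10 * y_true    # assumption: survey under-reports by 10%
        c_a = -0.02 * y_true    # assumption: adrec under-reports by 2%
        # v^k: mean-zero noise with dispersion depending on y* (assumed g^k)
        v_s = rng.gauss(0.0, 0.15 * y_true)
        v_a = rng.gauss(0.0, 0.03 * y_true)
        rows.append((y_true, y_true + c_s + v_s, y_true + c_a + v_a))
    return rows

def mean_relative_bias(rows, report_index):
    """Mean reporting error as a share of mean true income."""
    mean_true = statistics.mean(r[0] for r in rows)
    mean_error = statistics.mean(r[report_index] - r[0] for r in rows)
    return mean_error / mean_true

rows = simulate_reports()
survey_bias = mean_relative_bias(rows, 1)
adrec_bias = mean_relative_bias(rows, 2)
print(f"survey relative bias: {survey_bias:+.3f}")
print(f"adrec relative bias:  {adrec_bias:+.3f}")
```

Because v averages out to zero, the mean error in each source recovers the assumed systematic component c, while the noise only widens the spread of individual reports.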
21
General model, less math
$c_i^k$: do people like this person under-/over-report systematically (to the survey or the administrative agency)?
  This is a distribution and could be the probability of reporting zero given some true underlying income, or the expected reporting error
$v_i^k$: do people like this person report with random noise?
  Rounding
  Not knowing your exact income
  Proxy reporting
22
Possible Options
Direct Replacement – assume the adrec is correct, use the administrative record, and ignore the survey response ($c_i^a = 0$ and $v_i^a = 0$)
  For when adrecs are clearly superior to surveys
  Very little/no(?) additional information in survey response when adrec is present
Survey Only
  No adrec or surveys are clearly superior
  Very little/no(?) information available in any adrec source currently available or likely to be available in the short/medium term
Combining/modeling from both sources
  Both sources provide information about “true” underlying income, but neither is without error
  Take advantage of strengths of both to estimate true income, which may require income type-specific adjustments or modeling
23
Direct Replacement – Current data
Social Security (PHUS file from SSA) – unlikely SSA paid you if they have no record of doing so
SSI (SSR file from SSA) – ditto
Interest and dividends (1040 filings) – not at individual level and missing for non-filers, but likely much more accurate than survey reports
Retirement, survivor, and disability (from 1099-R)
24
Direct Replacement – Potential future data
IRS (if IRS gave us everything they have…)
  Interest and dividends (1099-INT, 1099-DIV, K1s)
  Unemployment compensation (1099-G)
  Educational assistance (1098-T)
  Rent/royalty income (1040-E)
  Transfer income/tax credits (1040) – EITC, child tax credit, etc.
  Other income (1040 and various information returns) – capital gains (although not necessarily reported accurately to IRS), gambling winnings (losses may not be visible for non-itemizers), alimony, Alaska dividends…
  Employer contributions to health insurance premiums (recent years only)
Veteran’s benefits (VA data)
25
Survey Only (given current data)
Veteran’s benefits
Worker’s compensation
Unemployment compensation
Educational assistance
Financial assistance
Other income items (alimony, child support, capital gains, etc.)
Taxes and transfers (modeled)
26
Combining/modeling from both sources Example – Wage and Salary Earnings
Challenges
  Both survey and adrecs have error
Error assumptions
  Survey under-reporting – $c_i^s < c_i^a$ for certain occupations and at certain earnings levels (or other characteristics in $X_i$)
  Adrec under-reporting – for others $c_i^s > c_i^a$: systematic under-reporting of earnings is present in the administrative records for particular occupations (tip-based, likely under-the-table earners, at various kink-points in the tax code, etc.)
  Particular observable characteristics predict likely tax/survey under-reporting, which can be evaluated systematically in the combined data
Misclassification of self-employment vs. wage and salary in survey vs. adrecs
27
Combining/modeling from both sources Example – Wage and Salary Earnings
How about survey < adrec here?
Does survey > adrec here indicate true income or survey reporting error? For some, for all, …?
Source: Bee, Mitchell, and Rothbaum, 2019. 2013 CPS ASEC linked with 2012 DER, 2012 W-2s, and 2012 LEHD.
28
Combining/modeling from both sources Example – Wage and Salary Earnings
Can we take the max of the two?
Source: O'Hara et al. (2017) using the 2011 ACS linked to 2010 W-2 records.
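One consideration when weighing the question above is that the maximum of two noisy reports is biased upward even when each report is unbiased on its own. The sketch below illustrates that statistical point only. It is not the analysis in O'Hara et al. (2017), and the true income, noise level, and function name are hypothetical.

```python
import random
import statistics

def mean_of_max(true_income=50_000.0, noise_sd=5_000.0, n=20_000, seed=2):
    """Average of max(survey, adrec) when both reports are unbiased.

    Assumption for illustration: each report is true income plus
    independent mean-zero normal noise with the same spread.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n):
        survey = true_income + rng.gauss(0.0, noise_sd)
        adrec = true_income + rng.gauss(0.0, noise_sd)
        maxima.append(max(survey, adrec))
    return statistics.mean(maxima)

average_max = mean_of_max()
print(f"true income:         {50_000.0:,.0f}")
print(f"mean of max reports: {average_max:,.0f}")
```

Taking the max is therefore most defensible where under-reporting, rather than random noise, is believed to dominate the errors in both sources.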
29
Combining/modeling from both sources Example – Self-Employment Earnings
Challenge
  Misalignment (Abraham et al., 2018)
    $p(\text{CPS SE} \neq 0 \mid \text{DER SE} > 0) = 0.49$
    $p(\text{DER SE} > 0 \mid \text{CPS SE} \neq 0) = 0.35$
  Mis-reporting
    Survey
      Harder to recall than regular wage or salary earnings
      Consumption of the self-employed implies ~25% more income than reported (Hurst, Li, and Pugsley, 2014)
    Adrecs
      Under-reporting to IRS causes BEA to adjust self-employment income up by ~45% of total (NIPA Table 7.14)
      Strong incentive to under-report given lack of third-party information returns on most income or expenses
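Reading the two misalignment figures above as conditional probabilities, with A the event "CPS SE ≠ 0" and B the event "DER SE > 0", a short worked step shows what they jointly imply. This assumes both probabilities are computed over the same linked population, which the slide does not state explicitly.

```latex
% A: CPS SE \neq 0, \quad B: DER SE > 0
P(A \mid B)\, P(B) = P(A \cap B) = P(B \mid A)\, P(A)
\quad\Longrightarrow\quad
\frac{P(A)}{P(B)} = \frac{P(A \mid B)}{P(B \mid A)} = \frac{0.49}{0.35} = 1.4
```

Under that assumption, about 40% more people report non-zero self-employment earnings in the CPS than have positive self-employment earnings in the DER, yet only about half of the DER group appears in the CPS group.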
30
Some challenges with administrative data
Geographic coverage
  Some states provide program data, but not all
  Examples – SNAP and WIC data
Solutions (at least for states without data)
  Microsimulation model – TRIM3 from Urban Institute is an example
    General approach to handling known shortfalls relative to aggregates without microdata (which is a problem for many resource items)
    Generally requires making unverifiable assumptions
    Still, sometimes it’s the best we can do…
  Impute across space – assume demographic/socioeconomic relationships and SNAP receipt/mis-reporting hold across space (across state lines)
    Currently being researched for SNAP benefits with encouraging results
31
SNAP Adrecs Example
Source: Fox, Rothbaum, and Shantz (2019). 2014 CPS ASEC linked to state SNAP records.
32
Some challenges with administrative data
Timeliness – data not always available in time
  Example – DER data used for much of earnings analysis is generally not available within a year of tax filing
Solutions
  Take longer to release data and wait for adrecs (not ideal)
  Impute across time – assume demographic/socioeconomic and survey-adrec income relationships hold across time (from year $t-1$ to year $t$) given known changes in adrec aggregates and survey response distributions
    Are these relationships constant? – tax law changes, program administration/rule changes
  Revise when adrecs become available and better estimates are possible?
33
Some challenges with administrative data
Income coverage – some income types have administrative records that do not cover all possible sources
Examples
  Cash welfare and TANF – TANF data available in some states, but that doesn’t cover all possible local, state, and federal sources of cash welfare
  Housing assistance and HUD data – federal HUD programs are not the only possible source of housing assistance
Solutions – this requires more research…
34
Some challenges with administrative data
Linkability – not everyone can be matched to adrecs
  Requires “PIK” (Protected Identification Key) – basically, whether an individual can be linked to their SSN
  Probabilistic match from identifying survey information
  91-94% (depending on survey and year) are linked
  Remaining 6-9% – no adrecs
Solutions
  Assume relationships of linked hold for unlinked – but linkage is probably not random
  Link by survey address to adrecs from that address
  Low-linkage groups may also be less likely to be in adrec data (less likely to file 1040, for example)
  Use survey responses – but we know those have errors for the linked
35
Some challenges with administrative data
Can change over time due to statutory/regulatory changes that affect programs and agencies
  Economic Stimulus Act of 2008 provided a tax credit for those with low income → many more 1040s filed for one year
Source: Rothbaum, 2018 ACS Linked to IRS records.
36
Some challenges with administrative data
Can change over time due to statutory/regulatory changes that affect programs and agencies
  Auten and Splinter (2018) argue that much of the inequality increase in tax data from 1960 to present in work by Piketty, Saez, and Zucman is due to changes in the tax code and the nature of tax reporting, not to changes in actual underlying income
  The differences in top 1% share are mostly explained by different methods of imputing under-reported income to individual tax units (38%) and including retirement income distributions (19%)
  For the after-tax 1% share, more than 100% of the remaining difference is accounted for by distributing government consumption on a per capita basis (AS) vs. by income (PSZ)
37
Research Agenda
Process Administrative Data and Compare to Survey Responses
  Clean data
  Compare to surveys
Resolve Practical Issues in Estimation using Multiple Data Sources
  Combining data with error
  Incomplete geographic coverage
  Incomplete income/resource coverage
Leverage Administrative Records to Improve Survey Operations
  Survey frame coverage
  Imputation bias
Produce Estimates in the Absence of Contemporaneous Data
Research Production and Implementation Issues
  Disclosure protection
  Best implementation timeline
Update Research Files and Estimates over the Course of the Research
38
Contact Information
Jonathan Rothbaum
Chief, Income Statistics Branch
Social, Economic, and Housing Statistics Division
(301)