The Survey of Income and Program Participation (SIPP) * Introduction to Data Quality * Accessing the Public Use Files H. Luke Shaefer University of Michigan.

The Survey of Income and Program Participation (SIPP)
* Introduction to Data Quality
* Accessing the Public Use Files
H. Luke Shaefer
University of Michigan School of Social Work
National Poverty Center
This presentation is part of the NSF-Census Research Network project of the Institute for Social Research at the University of Michigan. It is funded by National Science Foundation Grant No. SES

What Do We Know About SIPP Data Quality?
* Czajka & Denmead (2008) analyze income estimates for calendar year 2002 for the SIPP, CPS, ACS, MEPS, NHIS, PSID, HRS, and MCBS: mpr.com/publications/PDFs/incomedata.pdf
  * Earnings/income: reporting and distribution (full year)
  * Public program participation
* This is an excellent resource no matter which of these surveys you use
* Offers numerous estimates to use as benchmarks

A Few Key Findings About SIPP Data Quality
* The SIPP is at the low end in estimating total aggregate annual income (2002):
  * SIPP: $5.77 trillion
  * CPS: $6.47 trillion
* Where did that $700 billion go?
* Not a result of under-representing high-income families
* The SIPP finds the highest amounts of income at the bottom and the lowest amounts at the top
* The SIPP reports the least income inequality across surveys
* Income estimates from wave 1 of each panel look different from later waves (more poverty, less income)

Income Estimates by Survey, Calendar Year 2002 (Czajka & Denmead, 2008)

Estimate                         SIPP       CPS        ACS        MEPS
Total population (millions)
Earners (millions)
% with earnings: 54.8%
Average earnings per worker      $30,900    $35,600    $34,300    $32,800

Income Estimates by Survey, Calendar Year 2002 (Czajka & Denmead, 2008)

Estimate                              SIPP       CPS        ACS        MEPS
Average family income, per capita     $20,514    $22,893    $22,854    $22,089
Family income per capita by quintile
  Lowest                              $6,962     $6,513     $6,526     $6,352
  Highest                             $41,062    $49,316    $48,543    $43,855

Population Estimates by Survey, Calendar Year 2002 (Czajka & Denmead, 2008)

Estimate (in millions)       SIPP    CPS    ACS
Total population
< 100% poverty
< 200% poverty
Children < 100% poverty
Receiving TANF/SNAP

Possible Explanations for Income Estimate Differences (Czajka & Denmead, 2008)
* Perhaps the monthly, detailed income questions are good at capturing income among the poor and bad at capturing it among those with higher incomes
* The SIPP is much better, although not perfect, at capturing public program participation
* Perhaps the SIPP implementation, with its focus on program participation, is more focused on poor respondents
* Perhaps the seriousness of the difference shouldn't be overstated: the surveys have VERY different samples and methods, and the estimates still come pretty close

Under-reporting: The Scourge of Household Survey Data
* Meyer, Mok & Sullivan compare weighted totals of program participation from major household surveys to administrative data
* They compare aggregate amounts (not the participation of specific individuals)
* They compare dollar amounts and participants per month from administrative totals to SIPP estimates
* They find high levels of under-reporting across household surveys
* The comparison doesn't address false positives, so it may understate false negatives

TANF Participation Reporting Rates (Meyer, Mok & Sullivan, 2009)
Year    SIPP    CPS    PSID
%74.4%62.1% NA

SNAP Participation Reporting Rates (Meyer, Mok & Sullivan, 2009)
Year    SIPP    CPS    PSID
%67.2%69.7%
* The SIPP reporting rates are, on the whole, consistently better, and in many cases much better
* Under-reporting remains a limitation of any research conducted using the SIPP or any other household survey
* For many questions, the SIPP remains the best game in town

Accessing the Public Use SIPP Files
* Official FTP site for full wave files: surveys/sipp/data.html
  * These are in SAS format
  * Make sure you get your file path correct for inputs
* Savastata, a user-driven Stata command, saves SAS datasets as Stata datasets: to_stata/transfer-tools/savastata.html
  * A parallel command goes in the opposite direction

Accessing the Public Use SIPP Files
* Common source for pre-formatted files with data labels:
  * This is what I use
  * You can use NBER data labels with data extracted from the Census FTP site, with a little work
* The Center for Economic and Policy Research has some edited extracts, which can be nice: extracts/sipp-data/
* If you want to draw down a few variables, you can use DataFerrett
  * No reason to do this to pull down a full panel
  * You might use this to pull down a topical module

SIPP Panels: Dates and Sample Size
Panel    Dates    Wave 1, ref 4 Household Heads    Wave 1, ref 4 n
* Income Survey Development Program panel: data can be accessed, and we can help you get them, but it will take some work
* panels: harder to access, different file structure; still, they are available
,80058, ,20037, ,50051, ,79652, ,73095, ,10090, ,500110, ,000105,600
* Major redesign with the 1996 panel, so this week we will use that and the more recent panels

“The Early Years”: Challenges with the Panels
* Structured as person-wave observations (later SIPP panels are person-months)
  * To make monthly variables useful, you first need to “reshape long” into person-months
  * Complicated by the presence of a 5th month in some waves; you can usually ignore this
* Huge files with many, many variables
  * Input statements run up against variable limits when grabbing the full wave files
Thanks to Matt Rutledge for creating these slides
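The “reshape long” step can be sketched on toy data. This is a minimal illustration, not the layout of any actual SIPP file: the variable names (id, wave, inc1 to inc4) are hypothetical, and it assumes one income variable per reference month within each person-wave row.

```stata
* Toy person-wave data: one row per person per wave, with one income
* variable per reference month (all names here are hypothetical).
clear
input str4 id wave inc1 inc2 inc3 inc4
"0101" 1 3000 3250 0 0
"0102" 1 7000 7100 7232 7000
end

* Reshape to one row per person, wave, and reference month.
* The suffix of inc1-inc4 becomes the new variable refmon.
reshape long inc, i(id wave) j(refmon)

* Confirm the new unit of observation, then inspect the result.
isid id wave refmon
list, sepby(id)
```

With real files, every monthly variable stub would need to be listed in the reshape command, which is part of why this step takes some work.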

“The Early Years”: Challenges with the Panels
* Documentation is spotty
* Many variables have unhelpful names
  * Example: hours worked in job 1 is WS12025 instead of EJBHRS1
* Some variables even change names between waves
  * Example: hours worked in business 2 is SE22212 in wave 1 but SE22312 in waves 2-7 of the 1986 panel
* Some obvious variables are missing
  * 1984: no union status
  * 1989: no citizenship
* Panels overlap, but the 1988 panel has only 6 waves and the 1989 panel only 3
Thanks to Matt Rutledge for creating these slides

SIPP Waves
* Similar file structure to the later panels, organized in person-month observations
* Still used a paper instrument (the transition to a computer-assisted instrument came in 1996)
* Many variable names differ from those in the later panels, but often only slightly
* These panels are shorter and overlap
  * You can stack multiple panels for added statistical power for point-in-time estimates
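Stacking overlapping panels might look like the sketch below. It uses two tiny made-up datasets standing in for two panels; the variable names (suid, pnum, income, panel) and the panel years 1992 and 1993 are illustrative assumptions, not taken from the SIPP files.

```stata
* Toy stand-in for an earlier panel (variable names are hypothetical).
clear
input str9 suid pnum income
"100000001" 101 1200
"100000002" 101 2400
end
gen panel = 1992
tempfile p1992
save `p1992'

* Toy stand-in for the next panel, observed at the same point in time.
clear
input str9 suid pnum income
"100000001" 101 1500
"100000003" 102 900
end
gen panel = 1993

* Stack the two panels. The panel variable keeps reused IDs distinct.
append using `p1992'
isid panel suid pnum
tabulate panel
```

The first sample unit ID appears in both toy panels on purpose: without the panel variable, two different people would look like one.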

Memory Issues (Not just mine as a dad with young kiddos…)
* SIPP files have many variables for many observations
  * This can lead to serious memory limitations
* Check the capacity of your machine; it's worth investing in extra RAM
  * It will let you process faster and keep doing other things in the meantime
* This is also why it's good to build do-files for your analyses: you can make a change and set it running while you do something else
* When you load a dataset, keep only the observations and variables you need
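One way to keep only what you need is to subset while reading the file, rather than loading everything and dropping afterward. The sketch below builds a small stand-in dataset so that it runs on its own; the filler variables and the choice of reference month 4 are assumptions for illustration.

```stata
* Build a small stand-in file so the example is self-contained.
clear
set obs 8
gen str12 ssuid = "000000000001"
gen epppnum = 101
gen swave = ceil(_n / 4)
gen srefmon = mod(_n - 1, 4) + 1
gen thtotinc = 3000 + 50 * _n
gen filler1 = runiform()
gen filler2 = runiform()
tempfile wave
save `wave'

* Read only the variables and rows needed, not the whole file.
use ssuid epppnum swave srefmon thtotinc if srefmon == 4 using `wave', clear

* Two rows remain (one per wave) and the filler variables were never loaded.
describe, short
list
```

With a real wave file, the tempfile would be replaced by the path to the .dta file on disk.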

Technical Documentation
* SIPP User Guide: comprehensive source of information, with numerous updates: guide.html
* Data dictionaries: I like the SIPP FTP site for these: documentation/data-dictionaries.html
* Content of most variables stays the same across panels
* But there are some changes!
  * Coding of the main race variable changes in the 2004 panel
  * Metropolitan Statistical Areas are identified through the 2001 panel; this changes to metro area = 0,1 in 2004
  * Detailed ethnic origin is reduced to Hispanic origin = 0,1 in 2004

File Structure

Reference Month    Rot Grp 1    Rot Grp 2    Rot Grp 3    Rot Grp 4
12/95              W1 Ref1
1/96               W1 Ref2      W1 Ref1
2/96               W1 Ref3      W1 Ref2      W1 Ref1
3/96               W1 Ref4      W1 Ref3      W1 Ref2      W1 Ref1
4/96               W2 Ref1      W1 Ref4      W1 Ref3      W1 Ref2
5/96               W2 Ref2      W2 Ref1      W1 Ref4      W1 Ref3
6/96               W2 Ref3      W2 Ref2      W2 Ref1      W1 Ref4
7/96               W2 Ref4      W2 Ref3      W2 Ref2      W2 Ref1
8/96               W3 Ref1      W2 Ref4      W2 Ref3      W2 Ref2
9/96               W3 Ref2      W3 Ref1      W2 Ref4      W2 Ref3
10/96              W3 Ref3      W3 Ref2      W3 Ref1      W2 Ref4
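The staggered pattern in this table can be written as a small calculation. The sketch below assumes the layout shown above (rotation group 1 starts wave 1 in 12/95, each wave covers four months, and each rotation group starts one month after the previous one). The rotation group variable name srotaton is an assumption here; check the data dictionary for your panel.

```stata
* Toy rows: wave, rotation group, and reference month.
clear
input swave srotaton srefmon
1 1 1
1 2 1
1 4 4
2 1 1
3 2 1
end

* Months since 12/95 = 4 per completed wave + rotation offset + month offset.
gen calmon = ym(1995, 12) + 4 * (swave - 1) + (srotaton - 1) + (srefmon - 1)
format calmon %tm

* Check the results against the table above.
assert calmon == ym(1995, 12) if swave == 1 & srotaton == 1 & srefmon == 1
assert calmon == ym(1996, 1)  if swave == 1 & srotaton == 2 & srefmon == 1
assert calmon == ym(1996, 6)  if swave == 1 & srotaton == 4 & srefmon == 4
assert calmon == ym(1996, 4)  if swave == 2 & srotaton == 1 & srefmon == 1
assert calmon == ym(1996, 9)  if swave == 3 & srotaton == 2 & srefmon == 1
list
```

The starting month differs by panel, so the ym(1995, 12) anchor applies only to the layout in this table.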

SIPP Wave Data Structure

Identifier    Ref Month    Cal Month    Household Income    Education    Employed
Luke          1            Jan          $3,000              2            1
Luke          2            Feb          $3,250              2            1
Luke          3            Mar          $0                  2            0
Luke          4            Apr          $0                  2            0
Daphne        1            Feb          $7,000              3            1
Daphne        2            Mar          $7,100              4            1
Daphne        3            Apr          $7,232              4            1
Daphne        4            May          $7,000              4            1
Sheldon       3            Mar          $5,554              4            1
Sheldon       4            Apr          $5,250              4            1

Suggested Practice
* Keep your complete SIPP wave files in their original state: never make changes to them, never save over these files, and always clear without saving
* For any analysis, create a single do-file for dataset construction that pulls the variables you need from the panels and waves you need
* Save that new dataset, without all the SIPP variables you don't need, and work from that
* With this program created, it is easy to go back and reconstruct the dataset with added variables

Loading in Multiple Waves
Let's say you want to load multiple files. To reduce your syntax, you can write a loop in Stata that reads in each file and automatically keeps only the variables you want.

    /* This syntax loads in the first 4 waves of the 2008 panel,
       keeping just a few variables from each wave */
    set more off
    use "F:\SIPP Files\2008\sipp08w1.dta", clear
    keep ssuid epppnum swave srefmon thtotinc whfnwgt thfdstp erace
    * Wave 1 is already in memory, so the loop appends waves 2 through 4
    foreach j in 2 3 4 {
        append using "F:\SIPP Files\2008\sipp08w`j'.dta"
        keep ssuid epppnum swave srefmon thtotinc whfnwgt thfdstp erace
    }

Identifying Unique Respondents
* Because there are up to four observations per person per wave, you need a person identifier to identify unique individuals
* In the 1996 – 2008 panels, you only need the sample unit identifier (ssuid) + the person number (epppnum)
  * When stacking multiple panels, add the panel identifier
* In the 1990 – 1993 panels, you need the sample unit identifier + entry address identifier + person number
* Note: This is confusing in the Users' Guide. Don't freak out!
* Stata syntax to generate a unique person identifier:
    egen sippid = concat(spanel ssuid epppnum)
* Watch the form of epppnum across waves: is it “101” or is it “0101”? When you merge across waves, this has to match
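The “101” versus “0101” problem can be handled by padding the person number to a fixed width before building the identifier. The sketch below is one possible approach on toy data. It assumes epppnum is stored as a string and that four digits is the target width; the helper variables pnum and panel are hypothetical names.

```stata
* Toy rows: the same person appears with two spellings of epppnum.
clear
input str12 ssuid str4 epppnum spanel
"000000000001" "101"  2008
"000000000001" "0101" 2008
"000000000002" "0102" 2008
end

* Pad the person number to four digits so "101" and "0101" agree.
gen str4 pnum = string(real(epppnum), "%04.0f")

* Convert the panel year to a string, then build the identifier.
gen str4 panel = string(spanel)
egen sippid = concat(panel ssuid pnum)

* The first two rows now share one identifier; the third differs.
assert sippid[1] == sippid[2]
assert sippid[3] != sippid[1]
list
```

If epppnum is numeric in one wave and string in another, convert both to the same padded string form before merging.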