Administrative and Web-Based Survey Data for Improving Public Policy
Henry E. Brady, University of California, Berkeley

Types of Data
Survey Data: A sample is drawn and units (e.g., people, firms, organizations) are asked questions. The sample and questions are chosen by the researcher or policy analyst.
Administrative Data: Data produced as a result of administrative operations, often transactional data (e.g., sending checks, registering, charging, paying, enrolling). The sample and data are chosen by the administrative agency.

Example: What is the Experience of Immigrants with Welfare Programs?
Source: Henry E. Brady (UCB) and Jon Stiles (UCB)

Immigrants and Welfare
Question: What is the experience of immigrants with welfare programs? Does this explain why so many two-parent welfare households in California stay on welfare a long time?
Problem: Very few datasets have both:
- Immigration status (native, naturalized citizen, non-citizen)
- Immigrant welfare and job experience over time

Census Survey Data Can Provide
Nativity: Whether native or non-native and, for non-natives, date of entry to the US and citizenship status.
SES and Demographics: Household composition, education, sources of income, race/ethnicity, marital status, etc.
Cross-Sectional Population Samples: Description of both program participants and non-participants at a point in time.

Administrative Data Can Provide
Program Participation Over Time: Medi-Cal Eligibility Data System (MEDS)
- Monthly record of eligibility for welfare programs
- Programmatic basis for eligibility
Work History Over Time: Employment Development Department (EDD) Base Wage files
- Quarterly earnings as reported for UI/DI coverage from 1991 to 1999
- Identifies number of employers and total covered earnings

Census Surveys with Program Participation by Nativity (CPS and SIPP samples)
Samples are drawn each year, with household and personal characteristics measured at sampling. The CPS follows sampled housing units for 4 months in each of 2 consecutive years, while the SIPP follows households for 2.5 years with interviews every 4 months.

California Administrative Data: Program Participation by Year (MEDS)
MEDS DATA: Medi-Cal eligibility and program participation are identified monthly for the samples, following the initial survey interview, for the year of sampling.

MEDS DATA: Year of MEDS coverage
… and for each subsequent year through …, so individuals in each panel may potentially be tracked in the MEDS data for up to 13 years after initial sampling.

California Administrative Data: Wages by Year from UI Base Wage File
EDD DATA: Earnings in UI-covered employment are identified for each quarter from mid-1991 through 1999.

Census Survey Data
- SIPP: Attempts to obtain SSN for all in the household
- CPS: Requests SSN for all persons aged 15+ in the household
- CES and LEHD assign “Protected Identification Keys” (PIKs) based on SSN
- CES provides a “crosswalk” between PIKs and publicly available identifiers for CPS and SIPP
State Administrative Data
- MEDS: Obtains SSN for (almost) all Medi-Cal eligible persons
- EDD: Obtains SSN and wages from employers of UI/DI covered employees
- CES provides “anonymized” MEDS and EDD Base Wage records identified only by PIK
Final Linked File
- Survey records and administrative records are merged using the PIK to create a “matched” file
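The merge step can be sketched in a few lines of Python. This is only an illustration of joining on a shared anonymized key, not the Census Bureau's actual procedure. The record layouts, field names, and PIK values below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, made-up records. Only the idea of a shared "pik" key
# comes from the slides; every field name and value is an assumption.
survey = [
    {"pik": "A001", "nativity": "native", "survey_year": 1994},
    {"pik": "A002", "nativity": "non-citizen", "survey_year": 1994},
    {"pik": "A003", "nativity": "naturalized", "survey_year": 1995},
]
meds = [  # one row per person-month of eligibility
    {"pik": "A001", "month": "1994-03", "aid_code": "AFDC"},
    {"pik": "A002", "month": "1994-03", "aid_code": "AFDC"},
    {"pik": "A002", "month": "1994-04", "aid_code": "AFDC"},
]
wages = [  # one row per person-quarter of covered earnings
    {"pik": "A002", "quarter": "1994Q2", "earnings": 2100.0},
    {"pik": "A003", "quarter": "1995Q1", "earnings": 5400.0},
]


def group_by_pik(records):
    """Index administrative rows by PIK, dropping the key from each row."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["pik"]].append({k: v for k, v in rec.items() if k != "pik"})
    return grouped


def link_by_pik(survey, meds, wages):
    """Attach each respondent's MEDS months and wage quarters via the PIK."""
    meds_by_pik = group_by_pik(meds)
    wages_by_pik = group_by_pik(wages)
    linked = []
    for person in survey:
        pik = person["pik"]
        linked.append({
            **person,
            "meds_months": meds_by_pik.get(pik, []),      # empty if never on MEDS
            "wage_quarters": wages_by_pik.get(pik, []),   # empty if no covered wages
        })
    return linked


linked = link_by_pik(survey, meds, wages)
```

Every survey respondent is kept even when no administrative record matches, so non-participants stay in the file as the comparison group.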

Matched Survey, MEDS, and UI Earnings (EDD) data cover pre- and post-Welfare Reform periods, and weak and strong economies.

Some Issues
- Complex matching problems
- Data quality issues for administrative and survey data
- Confidentiality issues: in fact, state data in this case must be matched by the Census Bureau

Two Big Findings in California
Non-Citizen Elderly Immigrants on Welfare: Non-citizen (but legal) immigrants are more likely to eventually end up on welfare for the elderly (SSI/SSP), especially if they came at an older age (probably because they have less work covered by Social Security).
Non-Citizen Immigrant Women on Welfare: Non-citizen (but legal) immigrant women in two-parent families are less likely to get off welfare (probably because of fewer skills, less language competency, and perhaps cultural factors).

Figure: Percent of Adults on SSI/SSP at Some Point, among Adults in Surveys Who Were (or Became) 65 or Older During ’90-’02
Horizontal axis: percent ever observed on SSI/SSP (0% to 80%). Groups: Native, Naturalized, and Non-Citizen for the entire population, with non-citizens and naturalized citizens also shown by entry age to the US.
SSI/SSP = welfare program for the elderly poor or disabled.

Figure: Aid and Employment in Years after Sampling, Women Initially in 2-Parent AFDC/TANF Cases
Panels: Native Women and Non-Citizen Women. Horizontal axis: years since initial sampling. Series: Still on Welfare, Working, Welfare and Work.
(Total percentage declines over time as we lose track of people.)

Example: What is the Impact of Changing Polling Place Locations?
Source: Henry E. Brady (UCB) and John McNulty (Binghamton University)

Another Example: Voting in Los Angeles
Question: What is the impact of changing polling place locations on voter turnout?
Data Files
- Polling place locations and addresses
- Voter rolls and addresses
- Census data on blocks and tracts linked by geography
Address data were geocoded using GIS methods.
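Once voter and polling place addresses are geocoded, one quantity that can be derived is how far each voter's polling place moved between two elections. The slides do not list the variables actually built, so treat this as an assumed illustration. The sketch uses the standard haversine great-circle formula, and all coordinates are made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def change_in_distance_km(voter, old_place, new_place):
    """Distance to the new polling place minus distance to the old one."""
    old_km = haversine_km(voter[0], voter[1], old_place[0], old_place[1])
    new_km = haversine_km(voter[0], voter[1], new_place[0], new_place[1])
    return new_km - old_km


# Hypothetical coordinates (latitude, longitude), not real addresses.
voter = (34.050, -118.250)
old_polling_place = (34.052, -118.250)
new_polling_place = (34.065, -118.250)
extra_km = change_in_distance_km(voter, old_polling_place, new_polling_place)
```

A positive `extra_km` means the new polling place is farther from the voter's home than the old one was.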

What was done to get data (abbreviated)

Los Angeles County

2002 Polling Places

2003 Polling Places

Los Angeles County Voters

Thinking about Data: What are the dimensions of data?

Three Dimensions of Data: Quantity and Quality for Each
- Time: length of time and panel integrity
- Cases: number of cases and representativeness
- Variables: number of variables and item quality

Ideal Data Set
Variables
- As many variables as possible
- High item quality
Cases
- As many cases as possible
- Highly representative (e.g., random sample)
Time
- As long a period of time as possible
- Continuous observation, with no panel mortality

Why All this Information?
Internal Validity: Ensuring relationships are correct
- Descriptive: Describe how characteristics are related to one another. E.g., single mothers have longer welfare spells than those in two-parent families.
- Causal and Inferential: Make inferences about what causes what. E.g., being a single mother causes you to have a longer welfare spell.
External Validity: Relationships you find can be generalized
- Representative: As representative as possible so that it can reasonably tell us about the population.
- Theoretical: As theoretically rich as possible so that it can be generalized to other circumstances and situations.

Surveys: Rich in Variables; For Short Time Periods; Not Many Cases
(Figure: axes Time, Cases, and Variables; region labeled “Most Survey Data”)

Administrative Data: Weak in Variables; Rich in Cases; Rich in Time if Linked Over Time
(Figure: axes Time, Cases, and Variables; region labeled “Linked Administrative Data”)

Problems with Surveys
Designing/Implementing Good Sample Frames
- Telephone: cell phones, no phones, etc.
- Internet: choosing random sample, self-selection
Responses Hard to Get
- Interview response rates declining
- Item non-response problematic (e.g., income, race)
Costs High (In-person and Telephone Expensive)
- In-person: about $500 to $1,500/interview
- Telephone: about $50 to $150/interview
- Internet: about $5 to $50/interview
Confidentiality Concerns with Collected Data

Internet Surveys as Solution?
Virtues: An inexpensive way to collect data, but it requires addresses, hence it is hard to get random samples.
Basic Problems with Self-Selected Internet Surveys: Those who sign up are not typical. Why not?

Internet Surveys: Two Better Methods than Self-Selection
Start with a random sample and give them computers
- Very expensive initially
- Hard to maintain random sample because of panel mortality
“Matching Method”
- File of addresses: Collects large numbers of addresses and personal information from those willing to be interviewed on the web.
- File enumerating Americans: Chooses random samples from a file (like a phone book), constructed by a commercial firm, which is a nearly universal list of Americans with some demographic and SES information on each one of them.
- Matched Sample: Interviews the nearest match in its address file to each person in its random samples.
Is this Representative Enough? Still not sure but…
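A minimal sketch of the matching idea follows, under stated assumptions. The distance function, the covariates (age, sex, education), and every record are hypothetical. The slides do not say how "nearest" is defined, so a simple greedy nearest-neighbor rule without replacement stands in for whatever a survey firm actually uses.

```python
import random


def distance(a, b):
    """Hypothetical dissimilarity: scaled age gap plus categorical mismatches."""
    d = abs(a["age"] - b["age"]) / 10.0
    d += 0 if a["sex"] == b["sex"] else 1
    d += 0 if a["educ"] == b["educ"] else 1
    return d


def matched_sample(frame, panel, n, seed=0):
    """Draw n random targets from the frame; pair each with its nearest panelist."""
    if n > len(frame) or n > len(panel):
        raise ValueError("n cannot exceed the size of the frame or the panel")
    rng = random.Random(seed)
    targets = rng.sample(frame, n)   # random sample from the near-universal file
    available = list(panel)          # opt-in web panel ("file of addresses")
    pairs = []
    for target in targets:
        best = min(available, key=lambda p: distance(target, p))
        available.remove(best)       # match without replacement
        pairs.append((target, best))
    return pairs


# Made-up records for illustration only.
frame = [
    {"age": 30, "sex": "F", "educ": "hs"},
    {"age": 60, "sex": "M", "educ": "ba"},
    {"age": 45, "sex": "M", "educ": "hs"},
]
panel = [
    {"age": 61, "sex": "M", "educ": "ba"},
    {"age": 29, "sex": "F", "educ": "hs"},
    {"age": 47, "sex": "M", "educ": "hs"},
    {"age": 52, "sex": "F", "educ": "ba"},
]
pairs = matched_sample(frame, panel, n=2, seed=1)
```

The interviewed panelists then stand in for the randomly drawn targets. How well that works depends on how much of the self-selection the matching variables capture, which is the open question the slide raises.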

What about Administrative Data?
Types of Administrative Data
- Social welfare
- Voting, political contributions
- Taxes, assessments
- Consumer transactions
- Official statistics
- Educational records
- Health records
- Licensing information
- Criminal justice

Linked Social Services Data in American States, 1999

Administrative Data as Solution?
Virtues: An inexpensive way to collect data, but it requires linking data over time and across various datasets using fallible identifiers.
Problems
- Mixed quality data
- Confidentiality concerns and problems
- Incomplete coverage
- Change in computer systems over time

Administrative Data: Strengths and Limitations
Participation: Usually excellent for characterizing participants in programs because it measures “real” involvement and level of involvement.
- Low incidence phenomena
- Geographic incidence
- Administrative details of participation
Participation and Population: Not so good for describing how participants relate to populations (the selection and denominator problems).
- Selection: Participants are a self-selected group.
- Denominator: What fraction of people eligible for a program actually get it? How do participants compare to the overall population?
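The denominator problem comes down to simple arithmetic: administrative files supply the numerator (participants), while the denominator (eligible people) has to come from somewhere else, such as a population survey. The sketch below uses made-up counts and hypothetical group labels to show the take-up rate calculation.

```python
def take_up_rates(participants, eligible):
    """Share of eligible people who participate, by group.

    participants: counts from an administrative file (numerator).
    eligible: counts estimated from an outside source (denominator).
    Groups with no eligible people are skipped to avoid dividing by zero.
    """
    rates = {}
    for group, n_eligible in eligible.items():
        if n_eligible > 0:
            rates[group] = participants.get(group, 0) / n_eligible
    return rates


# Made-up counts for illustration only; not results from the study.
participants = {"group_a": 400, "group_b": 120, "group_c": 300}
eligible = {"group_a": 1000, "group_b": 400, "group_c": 600, "group_d": 0}
rates = take_up_rates(participants, eligible)
```

With administrative data alone, only the `participants` counts are known, so the rates cannot be computed without an outside source for `eligible`.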

Administrative Data: Strengths and Limitations
Data Quality: Good for those things related to business purpose; not good for other things.
Problems of “Legacy” Computer Systems and Poor Documentation: Data only available in old (e.g., COBOL) or proprietary data systems. Documentation very old or non-existent.

Examples
Date of birth on a:
- Voting dataset
- Job application dataset
Education level on a:
- Voting dataset
- Job application dataset
Income from job on a:
- Tax dataset
- Welfare dataset
- Social security dataset

Administrative Data: Data Quality
- Motivation for collecting data?
- System for auditing data?
- Data entered by frontline worker?
- Edit checks in the information system?
- Analyses done in past for system?
- Are items critical for agency mission?

Linked Social Services Data in American States, 1999

What was done to get data (abbreviated)

Census Survey Data
- SIPP: Attempts to obtain SSN for all in the household
- CPS: Requests SSN for all persons aged 15+ in the household
- CES and LEHD assign “Protected Identification Keys” (PIKs) based on SSN
- CES provides a “crosswalk” between PIKs and publicly available identifiers for CPS and SIPP
State Administrative Data
- MEDS: Obtains SSN for (almost) all Medi-Cal eligible persons
- EDD: Obtains SSN and wages from employers of UI/DI covered employees
- CES provides “anonymized” MEDS and EDD Base Wage records identified only by PIK
Final Linked File
- Survey records and administrative records are merged using the PIK to create a “matched” file

Administrative Data: Types of Data Linkage
- Linking over time
- Linking different information datasets across service areas
- Linking survey data to administrative data when the survey is drawn from the administrative dataset
- Linking sample data to administrative data when the sample is independent
- Linking to contextual data (e.g., by place, organization, etc.)

Methods of Linkage
- Probabilistic
- Deterministic
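The contrast between the two methods can be sketched as follows. Deterministic linkage requires exact agreement on a key such as SSN. Probabilistic linkage scores agreement across several fallible fields and accepts pairs above a threshold. The scoring here follows the general Fellegi-Sunter idea of log-likelihood-ratio weights, but the field list, the m and u probabilities, and the threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical parameters per field:
#   m = P(field agrees | records are the same person)
#   u = P(field agrees | records are different people)
FIELDS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "zip": (0.90, 0.05),
}


def deterministic_match(a, b, key="ssn"):
    """Link only when the key is present and identical on both records."""
    return a.get(key) is not None and a.get(key) == b.get(key)


def match_weight(a, b):
    """Sum of agreement/disagreement weights (log2 likelihood ratios).

    Assumes every field is present on both records; missing values are
    not handled specially in this sketch.
    """
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if a.get(field) == b.get(field):
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def probabilistic_match(a, b, threshold=5.0):
    """Link when the total weight clears a (hypothetical) threshold."""
    return match_weight(a, b) >= threshold


# Made-up records: rec_b is rec_a with a mistyped SSN and a new zip code.
rec_a = {"ssn": "000-00-0001", "last_name": "garcia", "birth_date": "1950-01-02", "zip": "90012"}
rec_b = {"ssn": "000-00-0010", "last_name": "garcia", "birth_date": "1950-01-02", "zip": "90013"}
rec_c = {"ssn": "000-00-0002", "last_name": "lee", "birth_date": "1971-05-09", "zip": "90012"}
```

With these records the deterministic rule misses the `rec_a`/`rec_b` pair because of the mistyped SSN, while the probabilistic score still links them on name and birth date. Sharing only a zip code, as `rec_a` and `rec_c` do, is not enough to link.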

Confidentiality
How much information do we need to identify someone? Consider:
- Name
- SSN
- Address
- Geographic area
- Race
- Date of birth; Age
- In combination: Gender, Race, Geographic area, Date of birth
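One way to make the question concrete is to count how many records become unique as fields are combined. The sketch below uses made-up records and hypothetical field names. It shows that fields which identify no one individually can single out most people when combined.

```python
from collections import Counter


def share_unique(records, fields):
    """Fraction of records whose combination of the given fields is unique."""
    keys = [tuple(rec[f] for f in fields) for rec in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


# Made-up records for illustration only.
people = [
    {"gender": "F", "race": "A", "area": "90012", "dob": "1950-01-02"},
    {"gender": "F", "race": "A", "area": "90012", "dob": "1950-01-02"},
    {"gender": "F", "race": "B", "area": "90012", "dob": "1961-07-15"},
    {"gender": "M", "race": "A", "area": "90013", "dob": "1950-01-02"},
    {"gender": "M", "race": "B", "area": "90013", "dob": "1972-11-30"},
    {"gender": "M", "race": "B", "area": "90012", "dob": "1972-11-30"},
]

by_gender = share_unique(people, ["gender"])
by_gender_race = share_unique(people, ["gender", "race"])
by_all_four = share_unique(people, ["gender", "race", "area", "dob"])
```

In this toy file nobody is unique on gender alone, two of six are unique on gender plus race, and four of six are unique once area and date of birth are added.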

Conclusions
Exciting Possibilities
- Internet interviewing
- Computerized administrative data
With Some Real Possibilities
- Descriptive inference; causal inference
- Populations of people
- Lots of linked information