Published by Warren Newman. Modified over 8 years ago.
1
Administrative and Web-Based Survey Data for Improving Public Policy
Henry E. Brady, University of California, Berkeley
2
Types of Data
Survey Data: A sample is drawn and units (e.g., people, firms, organizations) are asked questions. The sample and questions are chosen by the researcher or policy analyst.
Administrative Data: Data produced as a result of administrative operations, often transactional (e.g., sending checks, registering, charging, paying, enrolling). The sample and data are chosen by the administrative agency.
3
Example: What is the Experience of Immigrants with Welfare Programs?
Source: Henry E. Brady (UCB) and Jon Stiles (UCB)
4
Immigrants and Welfare
Question: What is the experience of immigrants with welfare programs? Does this explain why so many two-parent welfare households in California stay on welfare a long time?
Problem: Very few datasets have both:
–Immigration status (native, naturalized citizen, non-citizen)
–Immigrant welfare and job experience over time
5
Census Survey Data Can Provide
Nativity: Whether native or non-native, plus date of entry to the US and citizenship status for non-natives
SES and Demographics: Household composition, education, sources of income, race/ethnicity, marital status, etc.
Cross-Sectional Population Samples: Description of both program participants and non-participants at a point in time
6
Administrative Data Can Provide
Program Participation Over Time: Medi-Cal Eligibility Data System (MEDS)
–Monthly record of eligibility for welfare programs, 1988-2002
–Programmatic basis for eligibility
Work History Over Time: Employment Development Department (EDD) Base Wage files
–Quarterly earnings as reported for UI/DI coverage from 1991 to 1999
–Identifies number of employers, total covered earnings
7
Census Surveys with Program Participation by Nativity, 1990-2002
Samples are drawn each year, with household and personal characteristics measured at sampling. The CPS follows sampled housing units for 4 months in each of 2 consecutive years, while the SIPP follows households for 2.5 years with interviews every 4 months.
[Figure: timeline of CPS and SIPP samples by year, 1990-2002]
8
California Administrative Data: Program Participation by Year (MEDS)
Medi-Cal eligibility and program participation are identified monthly for each sample, following the initial survey interview, for the year of sampling.
[Figure: grid of sampling year (rows) by MEDS coverage year (columns), 1990-2002, labeled MEDS DATA]
9
… and each subsequent year through 2002. Individuals in each panel may therefore be tracked in the MEDS data for up to 13 years after initial sampling.
[Figure: grid of sampling year (rows) by year of MEDS coverage (columns), 1990-2002; the 1990 sample has 13 years of coverage and the 2000 sample has 3]
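The coverage arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes coverage is counted in calendar years from the sampling year through 2002, as the figure suggests.

```python
MEDS_LAST_YEAR = 2002  # last year of MEDS eligibility records, per the slides


def meds_years_available(sample_year: int, last_year: int = MEDS_LAST_YEAR) -> int:
    """Count calendar years of MEDS data from the sampling year through last_year."""
    if sample_year > last_year:
        return 0  # a panel sampled after the data end has no coverage
    return last_year - sample_year + 1


for year in (1990, 1995, 2000):
    print(year, meds_years_available(year))
```

Under these assumptions the 1990 panel gets 13 years of potential follow-up and later panels get progressively fewer, which is the triangular pattern in the figure.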
10
California Administrative Data: Wages by Year from UI Base Wage File
Earnings in UI-covered employment are identified for each quarter from mid-1991 through 1999.
[Figure: grid of sampling year (rows) by year of EDD wage coverage (columns), labeled EDD DATA]
11
Census Survey Data
–SIPP: Attempts to obtain SSN for all in the household
–CPS: Requests SSN for all persons aged 15+ in the household
–CES and LEHD assign “Protected Identification Keys” (PIKs) based on SSN
–CES provides “crosswalk” between PIKs and publicly available identifiers for CPS and SIPP
State Administrative Data
–MEDS: Obtains SSN for (almost) all Medi-Cal eligible persons
–EDD: Obtains SSN and wages from employers of UI/DI covered employees
–CES provides “anonymized” MEDS and EDD Base Wage records identified only by PIK
Final Linked File
–Survey records and administrative records are merged using the PIK to create a “matched” file
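The merge described on this slide can be sketched as a keyed join. Everything below is made up for illustration: the record layouts, the identifiers, and the `link_records` helper are assumptions, not the Census Bureau's actual procedure or file formats.

```python
# Hypothetical survey records keyed by a public survey identifier.
survey = [
    {"survey_id": "S1", "nativity": "native"},
    {"survey_id": "S2", "nativity": "non-citizen"},
    {"survey_id": "S3", "nativity": "naturalized"},
]

# Hypothetical crosswalk from survey identifier to PIK; S3 has no PIK
# (for example, because no SSN was obtained).
crosswalk = {"S1": "PIK001", "S2": "PIK002"}

# Hypothetical anonymized administrative records, identified only by PIK.
meds = {
    "PIK001": [("1995-01", "AFDC")],
    "PIK002": [("1995-01", "AFDC"), ("1995-02", "AFDC")],
}
edd = {"PIK002": [("1995Q1", 3200.0)]}


def link_records(survey, crosswalk, meds, edd):
    """Attach MEDS months and EDD quarters to each survey record that has a PIK."""
    linked = []
    for person in survey:
        pik = crosswalk.get(person["survey_id"])
        if pik is None:
            continue  # no PIK, so the record cannot be matched
        linked.append({
            **person,
            "pik": pik,
            "meds_months": meds.get(pik, []),
            "edd_quarters": edd.get(pik, []),
        })
    return linked


matched = link_records(survey, crosswalk, meds, edd)
print(len(matched), "of", len(survey), "survey records linked")
```

Records without a PIK drop out of the matched file, which is one source of the matching and data quality issues the deck raises later.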
12
Matched Survey, MEDS, and UI Earnings data cover pre- and post-Welfare Reform periods, and weak and strong economies.
[Figure: coverage grid combining EDD DATA and MEDS DATA]
13
Some Issues
Complex matching problems
Data quality issues for administrative and survey data
Confidentiality issues
–In fact, state data in this case must be matched by the Census Bureau
14
Two Big Findings in California
Non-Citizen Elderly Immigrants on Welfare: Non-citizen (but legal) immigrants are more likely to eventually end up on welfare for the elderly (SSI/SSP), especially if they came at an older age (probably because of less work covered by Social Security).
Non-Citizen Immigrant Women on Welfare: Non-citizen (but legal) immigrant women in two-parent families are less likely to get off welfare (probably because of fewer skills, less language competency, perhaps cultural factors).
15
Percent of Adults on SSI/SSP at Some Point, among Adults in Surveys who Were (or Became) 65 or Older During ’90-’02
SSI/SSP = Welfare program for the elderly poor or disabled
[Figure: bar chart of percent ever observed on SSI/SSP (axis 0% to 80%). Entire population by citizenship: Native, Naturalized, Non-Citizen. Non-citizens and naturalized by entry age to US: < 20, 20-30, 30-40, 40-50, 50-60, 60-70.]
16
Aid and Employment in Years after Sampling
Women initially in 2-Parent AFDC/TANF cases (total percentage declines over time as we lose track of people)
[Figure: two panels, Native Women and Non-Citizen Women, showing shares Still on Welfare, Working, and Welfare and Work, by Years Since Initial Sampling]
17
Example: What is the Impact of Changing Polling Place Locations?
Source: Henry E. Brady (UCB) and John McNulty (Binghamton University)
18
Another Example: Voting in Los Angeles
Question: What is the impact of changing polling place locations on voter turnout?
Data Files
–Polling place locations and addresses
–Voter rolls and addresses
–Census data on blocks and tracts linked by geography
Coded address data using GIS methods
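Once addresses are geocoded, one quantity such data files make available is how far each voter is from the old and new polling place. The sketch below uses the standard haversine great-circle formula; the coordinates are invented, and nothing here is taken from the study's actual GIS workflow.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical geocoded voter address and polling places in two elections.
voter = (34.0522, -118.2437)
old_poll = (34.0550, -118.2500)
new_poll = (34.0700, -118.2300)

change_km = haversine_km(*voter, *new_poll) - haversine_km(*voter, *old_poll)
print(f"Change in distance to polling place: {change_km:.2f} km")
```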
19
What was done to get data (abbreviated)
21
Los Angeles County
22
2002 Polling Places
23
2003 Polling Places
24
Los Angeles County Voters
26
Thinking about Data: What are the dimensions of data?
27
Three Dimensions of Data: Quantity and Quality for Each
Length of Time and Panel Integrity
Number of Cases and Representativeness
Number of Variables and Item Quality
28
Ideal Data Set
Variables
–As many variables as possible
–High item quality
Cases
–As many cases as possible
–Highly representative (e.g., random sample)
Time
–As long a period of time as possible
–Continuous observation with no panel mortality
29
Why All this Information?
Internal Validity: Ensuring relationships are correct
–Descriptive: Describe how characteristics are related to one another. E.g., single mothers have longer welfare spells than those in two-parent families
–Causal and Inferential: Make inferences about what causes what. E.g., being a single mother causes you to have a longer welfare spell
External Validity: Relationships you find can be generalized
–Representative: As representative as possible so that it can reasonably tell us about the population
–Theoretical: As theoretically rich as possible so that it can be generalized to other circumstances and situations
30
Surveys: Rich in Variables; For Short Time Periods; Not Many Cases
[Figure: “Most Survey Data” plotted on three axes: Time, Cases, Variables]
31
Administrative Data: Weak in Variables; Rich in Cases; Rich in Time if Linked Over Time
[Figure: “Linked Administrative Data” plotted on three axes: Time, Cases, Variables]
32
Problems with Surveys
Designing/Implementing Good Sample Frames
–Telephone: cell phones, no phones, etc.
–Internet: choosing random sample, self-selection
Responses Hard to Get
–Interview response rates declining
–Item non-responses problematic (e.g., income, race)
Costs High: In-person and Telephone Expensive
–In-person: about $500 to $1,500/interview
–Telephone: about $50 to $150/interview
–Internet: about $5 to $50/interview
Confidentiality Concerns with Collected Data
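The per-interview costs above translate directly into sample sizes for a fixed budget. The sketch below uses the slide's cost ranges; the budget figure and the function name are hypothetical.

```python
# Cost per completed interview in dollars (low, high), from the ranges on the slide.
COST_PER_INTERVIEW = {
    "in_person": (500, 1500),
    "telephone": (50, 150),
    "internet": (5, 50),
}


def interviews_for_budget(budget: int, mode: str) -> tuple[int, int]:
    """Return (fewest, most) interviews a budget buys at the high and low cost."""
    low_cost, high_cost = COST_PER_INTERVIEW[mode]
    return budget // high_cost, budget // low_cost


budget = 150_000  # hypothetical survey budget in dollars
for mode in COST_PER_INTERVIEW:
    fewest, most = interviews_for_budget(budget, mode)
    print(f"{mode}: {fewest} to {most} interviews")
```

With this illustrative budget, an in-person survey yields a few hundred cases at most while an Internet survey yields thousands, which is why the sampling problems of Internet surveys are worth trying to solve.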
33
Internet Surveys as Solution?
Virtues: Inexpensive way to collect data, but it requires e-mail addresses, so it is hard to get random samples
Basic Problems with Self-Selected Internet Surveys: Those who sign up are not typical. Why not?
34
Internet Surveys: Two Better Methods than Self-Selection
Start with a random sample and give them computers
–Very expensive initially
–Hard to maintain random sample because of panel mortality
“Matching Method”
–File of e-mail addresses: Collects large numbers of e-mail addresses and personal information from those willing to be interviewed on the web.
–File enumerating Americans: Chooses random samples from a file (like a phone book) constructed by a commercial firm, which contains a nearly universal list of Americans and some demographic and SES information on each of them.
–Matched Sample: Interviews the nearest match in the e-mail address file to each person in the random sample.
Is this Representative Enough? Still not sure but…
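The “nearest match” step of the matching method can be sketched as a nearest-neighbor search over shared covariates. The distance function, the three covariates, and all records below are assumptions made for illustration; an actual implementation would use many more variables and a carefully tuned distance.

```python
def distance(a: dict, b: dict) -> float:
    """Toy distance: mismatches on sex and education plus age gap in decades."""
    return (a["sex"] != b["sex"]) + (a["educ"] != b["educ"]) + abs(a["age"] - b["age"]) / 10.0


def match_sample(targets: list[dict], panel: list[dict]) -> list[tuple[str, str]]:
    """For each randomly sampled target, pick the closest unused panel member."""
    available = list(panel)
    matched = []
    for target in targets:
        best = min(available, key=lambda p: distance(target, p))
        matched.append((target["id"], best["id"]))
        available.remove(best)  # match without replacement
    return matched


# Hypothetical random sample drawn from the enumerating file.
targets = [
    {"id": "T1", "sex": "F", "educ": "college", "age": 34},
    {"id": "T2", "sex": "M", "educ": "hs", "age": 61},
]
# Hypothetical opt-in panel of people willing to be interviewed on the web.
panel = [
    {"id": "P1", "sex": "M", "educ": "hs", "age": 58},
    {"id": "P2", "sex": "F", "educ": "college", "age": 30},
    {"id": "P3", "sex": "F", "educ": "hs", "age": 35},
]

print(match_sample(targets, panel))
```

The matched panel members resemble the random sample only on the covariates used for matching, which is why representativeness remains an open question.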
35
What about Administrative Data?
Types of Administrative Data
–Social welfare
–Voting, political contributions
–Taxes, assessments
–Consumer transactions
–Official statistics
–Educational records
–Health records
–Licensing information
–Criminal justice
36
Linked Social Services Data in American States--1999
37
Administrative Data as Solution?
Virtues: Inexpensive way to collect data, but it requires linking of data over time and across various datasets using fallible identifiers
Problems
–Mixed quality data
–Confidentiality concerns and problems
–Incomplete coverage
–Change in computer systems over time
38
Administrative Data: Strengths and Limitations
Participation: Usually excellent for characterizing participants in programs because it measures “real” involvement and level of involvement.
–Low incidence phenomena
–Geographic incidence
–Administrative details of participation
Participation and Population: Not so good for describing how participants relate to populations, because of the selection and denominator problems:
–Selection: Participants are a self-selected group
–Denominator: What fraction of people eligible for a program actually get it? How do participants compare to the overall population?
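The denominator problem can be made concrete with a take-up rate: administrative data supply the numerator (participants), but the denominator (eligible people) has to come from somewhere else, such as a population survey. The counts below are hypothetical.

```python
def take_up_rate(participants: int, eligible: int) -> float:
    """Fraction of eligible people who actually participate in a program."""
    if eligible <= 0:
        raise ValueError("eligible population must be positive")
    return participants / eligible


# Hypothetical counts: participants from administrative records,
# eligible population estimated from a survey.
participants_from_admin = 42_000
eligible_from_survey = 60_000

print(f"Take-up rate: {take_up_rate(participants_from_admin, eligible_from_survey):.0%}")
```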
39
Administrative Data: Strengths and Limitations
Data Quality: Good for those things related to business purpose; not good for other things.
Problems of “Legacy” Computer Systems and Poor Documentation: Data only available in old (e.g., COBOL) or proprietary data systems. Documentation very old or non-existent.
40
Examples
Date of birth on a:
–Voting dataset
–Job application dataset
Education level on a:
–Voting dataset
–Job application dataset
Income from job on a:
–Tax dataset
–Welfare dataset
–Social security dataset
41
Administrative Data: Data Quality
Motivation for collecting data?
System for auditing data?
Data entered by frontline worker?
Edit checks in the information system?
Analyses done in past for system?
Are items critical for agency mission?
42
Linked Social Services Data in American States--1999
43
What was done to get data (abbreviated)
45
Administrative Data: Types of Data Linkage
Linking over time
Linking different data sets across service areas
Linking survey data to administrative data when survey is drawn from administrative dataset
Linking sample data to administrative data when sample is independent
Linking to contextual data (e.g., by place, organization, etc.)
46
Methods of Linkage
Probabilistic
Deterministic
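The two linkage methods can be contrasted in a short sketch. Deterministic linkage requires exact agreement on an identifier; probabilistic linkage scores agreement across several fallible fields. The scoring below follows the general Fellegi-Sunter idea of log-likelihood agreement weights, which the slide does not name; the field list, the m and u probabilities, and the records are all illustrative assumptions.

```python
import math

# m = P(field agrees | same person), u = P(field agrees | different people).
# These values are invented for illustration only.
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "zip": (0.90, 0.05),
}


def deterministic_match(a: dict, b: dict) -> bool:
    """Link only when both records carry the same non-missing SSN."""
    return a["ssn"] is not None and a["ssn"] == b["ssn"]


def probabilistic_score(a: dict, b: dict) -> float:
    """Sum log2 agreement or disagreement weights over the comparison fields."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


# Fake records: rec_a is missing its SSN, so deterministic linkage fails.
rec_a = {"ssn": None, "last_name": "Garcia", "birth_date": "1950-03-02", "zip": "94704"}
rec_b = {"ssn": "000-00-0001", "last_name": "Garcia", "birth_date": "1950-03-02", "zip": "94609"}
rec_c = {"ssn": "000-00-0002", "last_name": "Nguyen", "birth_date": "1972-11-30", "zip": "90012"}

print(deterministic_match(rec_a, rec_b))
print(round(probabilistic_score(rec_a, rec_b), 2))
print(round(probabilistic_score(rec_a, rec_c), 2))
```

A pair is declared a link when its score exceeds a chosen threshold, so records with a missing or mistyped identifier can still be linked, at the cost of some false matches.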
47
Confidentiality
How much information do we need to identify someone? Consider:
–Name
–SSN
–Address
–Geographic area
–Race
–Date of birth; Age
–Gender, Race, Geographic area, Date of Birth
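One way to explore this question is to count how many records become unique as identifying fields are combined. The sketch below does this on four invented records; the function name and data are hypothetical.

```python
from collections import Counter


def share_unique(records: list[dict], keys: tuple[str, ...]) -> float:
    """Share of records whose combination of values on `keys` appears only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Invented records with no direct identifiers such as name or SSN.
records = [
    {"gender": "F", "race": "A", "area": "94704", "dob": "1950-03-02"},
    {"gender": "F", "race": "A", "area": "94704", "dob": "1961-07-15"},
    {"gender": "M", "race": "B", "area": "94704", "dob": "1950-03-02"},
    {"gender": "M", "race": "B", "area": "94609", "dob": "1972-11-30"},
]

for keys in [("gender",), ("gender", "race", "area"), ("gender", "race", "area", "dob")]:
    print(keys, share_unique(records, keys))
```

In this toy file no one is unique on gender alone, but everyone is unique once gender, race, geographic area, and date of birth are combined, which is the concern the last item on the slide points to.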
48
Conclusions
Exciting Possibilities
–Internet interviewing
–Computerized administrative data
With Some Real Possibilities
–Descriptive inference; causal inference
–Populations of people
–Lots of linked information