CPS Linking & Validation

Slides:



Advertisements
Similar presentations
Statistics NZs experience in using Administrative Data in an Integrated Programme of Economic Vince Galvin General Manager Strategy & Communications.
Advertisements

Research on Improvements to Current SIPP Imputation Methods ASA-SRM SIPP Working Group September 16, 2008 Martha Stinson.
Tobacco Use Supplement To The Current Population Survey Users’ Workshop June 2009 Tips and Tricks of Handling the TUS Data James “Todd” Gibson Information.
The Characteristics of Employed Female Caregivers and their Work Experience History Sheri Sharareh Craig Alfred O. Gottschalck U.S. Census Bureau Housing.
Time Use Surveys in Canada Jodi-Anne Brzozowski Statistics Canada International Seminar on Time Use September , 2010.
An Assessment of the Cohort-Component-Based Demographic Analysis Estimates of the Population Aged 55 to 64 in 2010 Kirsten West U.S. Census Bureau Applied.
Aspects of the National Health Interview Survey (NHIS) Chris Moriarity National Conference on Health Statistics August 16, 2010
12th Meeting of the Group of Experts on Business Registers
Using IPUMS.org Katie Genadek Minnesota Population Center University of Minnesota The IPUMS projects are funded by the National Science.
1 POPULATION PROJECTIONS Session 8 - Projections for sub- national and sectoral populations Ben Jarabi Population Studies & Research Institute University.
Longitudinal Data Recent Experience and Future Direction August 2012.
1 Overview of the National Long Term Care Survey (NLTCS) Conference on Chinese Healthy Aging and Socioeconomic Development Durham, NC August, 2004 Nicholas.
Panel Study of Entrepreneurial Dynamics Richard Curtin University of Michigan.
Current Population Survey Sponsor: Bureau of Labor Statistics Collector: Census Bureau Purpose: Monthly Data for Analysis of Labor Market Conditions –CPS.
Supplement Your Reporting With CPS and SIPP Robert Gebeloff The New York Times McCormick SRI: Going Deep with Census Demographic.
American Community Survey “It Don’t Come Easy”, Ringo Starr Jane Traynham Maryland State Data Center March 15, 2011.
United Nations Workshop on Evaluation and Analysis of Census Data, 1-12 December 2014, Nay Pyi Taw, Myanmar DATA VALIDATION-II Consistency check.
2009 Survey of Disability, Ageing and Carers (SDAC) – emerging data Presentation to Carers NSW Biennial Conference 17 March 2011 Steve Gelsi Assistant.
INFO 7470 Statistical Tools: Edit and Imputation Examples of Multiple Imputation John M. Abowd and Lars Vilhuber April 18, 2016.
WP5.4 – Transnational transfer of experiences Overview of methodology for population projections Jana Suklan – Piran.
Utility of an overlapping panel design in the MEPS Steven B. Cohen, Ph.D.
Which socio-demographic living arrangement helps to reach 100? Michel POULAIN & Anne HERM Orlando 8 January 2014.
Man-Yee Kan, University of Oxford Heather Laurie, University of Essex Who is doing the housework in multicultural.
Measuring International Migration: An Example from the U. S
Annual Meeting of the Retirement Research Consortium
3 Research Design.
Session 15 Merging Data in SPSS
Sex and Money: exploring how sexual and financial stressors, perceptions and resources influence marital stability for men and women David B. Allsop,
Anne Case and Angus Deaton
Is retention on ART underestimated due to patient transfers
Rabia Khalaila, RN, MPH, PHD Director, Department of Nursing
Module 9 Designing and using EFGR-responsive evaluation indicators
The Four Techniques for Gathering Data
Statistics Netherlands Division Social and Spatial Statistics
Enrique Ramirez1, Julie Morita1
National Center for Education Statistics
Data Entry and HQ volunteers
Demographic Data in the RDC
Retirement Prospects for Millennials: What Is the Early Prognosis
Eliminating Reproductive Risk Factors and Reaping Female Education and Work Benefits: A Constructed Cohort Analysis of 50 Developing Countries Qingfeng.
LISA, Anticipating the Next Generation of Longitudinal Data
IPUMS CPS Summer Data Workshop June 4, 2018 Jose Pacas
Haksoon Ahn, PhD Associate Professor
Linking to ATUS.
Families, Poverty, and Other Features of IPUMS-CPS
Job Satisfaction What are today’s employees looking for in work?
IPUMS CPS Summer Data Workshop June 4, 2018 Kari Williams
LISA, Anticipating the Next Generation of Longitudinal Data
The European Statistical Training Programme (ESTP)
Central Statistics Organization
POPULATION PROJECTIONS
Tabulations and Statistics
Haksoon Ahn, PhD Associate Professor
Hannu Pääkkönen – Group coordinator
Section 1: What are longitudinal studies?
Demographic Analysis and Evaluation
WORKSHOP ON THE DATA COLLECTION OF OCCUPATIONAL DATA Luxembourg, 28 November 2008 Occupation as a core variable in social surveys Sylvain Jouhette
IPUMS-International Integration Process
Sub-regional workshop on integration of administrative data, big data
Weights What’s delivered with the original data and what to do about weighting linked data.
BMTRY 738: The Study Population
Lab 2 HRP223 – 2010 October 18, 2010 Copyright © Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected.
UT-Austin FSRDC Grand Opening December 13, 2017
The Implications of Misreporting for Longitudinal Studies of SNAP
Adult Education Survey progress report Point 6
Andrew Jenkins and Rosalind Levačić
Census topics selection
Quality assurance and assessment in the vital statistics system
Chapter 5: The analysis of nonresponse
Presentation transcript:

CPS Linking & Validation

Opportunities and Challenges Linking CPS Data Opportunities and Challenges

Opportunities Large sample sizes Good coverage of subpopulations Short-run changes in employment and families; reactions to births and losses (death, divorce) Combine rich sources of information about different topics Work and volunteering; veterans and employment

Same Month One Year Apart Useful for studying year-to-year change Earnings and employment dynamics, Geographic mobility, Movement into and out of labor unions For the 1994-forward period, researchers can expect: Half of respondents in any selected month in year 1 should also be in the CPS in year 2. March 1994-1995 March 2009-2010 Linked N 48,140 53,486 Retention Rate 69.4% 78.8%

Single Cohort, All 8 Months Useful for studying short-term dynamics Change in economic arrangements as a function of social and demographic characteristics Changes in these relationships over time For the 1994-forward period, researchers can expect: Theoretically, 100% of cases in a cohort can be linked across 8 months. But, migration, mortality, and non-response are factors contributing to a lower sample retention rate. Cohort Entering January 1994 Cohort Entering January 2009 Linked N 10,069 11,528 Retention Rate 59.7% 68.0%

Challenges Rotation pattern Linking keys Same variables with different codes Non-response Another layer of complexity if ASEC is part of design Data management

Challenges: Rotation Pattern CPS is NOT a longitudinal survey that follows one or several cohorts as they age CPS does have a panel component where individuals are observed up to 8 times over 16 months, but the way they move through the survey is in a 4-8-4 rotation

Challenges: Rotation Pattern Enter IPUMS CPS RoPES Motivation Easily see CPS rotation pattern Explore what topical supplements can be examined together Name it!

Challenges Rotation pattern Linking keys Same variables with different codes Non-response Another layer of complexity if ASEC is part of design Data management

Challenges: Linking Keys Changes in variables needed to link Duplicate and recycled identifiers Feng 2001 Suppressed in some years

Challenges: Linking Keys January 1989 to December 1993 January 1994 to April 2004 May 2004 + HRHHID PULINENO STATEFIPS HUHHNUM HRSAMPLE HRHHID2 HRSERSUF **Transformations detailed in Drew, Flood & Warren 2014

Challenges: Linking Keys CPSID(P) – an IPUMS-created unique identifier Bridges changes in variable names as well as the technical aspects of how to link Unique for 1976 forward Identifies “mechanical” matches Drew, Flood, & Warren, 2014 Journal of Economic and Social Measurement

CPSID(P) Limitations Mechanical Created solely based on linking keys Does NOT condition on AGE, SEX, RACE Linking keys should work, but it is always good to check

CPSID(P) Limitations Not available for all respondents in ASEC…only those who are also in the March Basic 1970s and 1980s issues Some files can’t link 1977 supplements and basics Duplicate identifiers (1976-1983) Duplicates *never* link to adjacent months in CPSID Children are in some supplements but not basics In these years, children never link

Future Linking Work ASEC to ASEC to pull in oversamples A version of CPSIDP conditional on AGE, SEX, RACE matches

Challenges Rotation pattern Linking keys Same variables with different codes Non-response Another layer of complexity if ASEC is part of design Data management

Challenges: Same variables, different codes IPUMS! Harmonized across time to deal with changes in codes Use it for good, never for evil!

Challenges Rotation pattern Linking keys Same variables with different codes Non-response Another layer of complexity if ASEC is part of design Data management

Challenges: Non-response Households that were eligible to participate in CPS but did not because of non-response, death, migration Be careful! Make sure you’re linking what you think you’re linking Double check MIS combinations Look at merge results Imputation

Challenges Rotation pattern Linking keys Same variables with different codes Non-response Another layer of complexity if ASEC is part of design Data management

Challenges: ASEC Differently named variables, including linking keys, than basic monthly MARBASECID (1989+) Variable to link ASEC and March basic monthly Allows us to put CPSIDP on ASEC Flood & Pacas 2017 Journal of Economic and Social Measurement Reference periods

Challenges Rotation pattern Linking keys Same variables with different codes Non-response Another layer of complexity if ASEC is part of design Data management

Challenges: Data Management Can quickly find yourself managing a large number of files if you’re: Not using IPUMS Performing merges for linking Strategies for dealing with this include: Loops Temporary files Long data format

Challenges Rotation pattern Linking keys Same variables with different codes Non-response Another layer of complexity if ASEC is part of design Data management

Approaches, Our Assumptions, and a Note on Data Structure Validation Approaches, Our Assumptions, and a Note on Data Structure

Validation: Approaches Because CPSIDP is mechanical, this is a step we recommend you do if you are linking CPS data across months No right or wrong way to do this, but there are a couple approaches in the literature Use AGE, SEX, RACE (Madrian & Lefgren) Incorporate DQ flags Use Bayesian approach (Feng) Argues that some people are lost with first approach; this yields higher linkage rates

Validation: AGE, SEX, RACE Rules If the interviews you are linking are both in MIS 1-4 or MIS 5-8, difference between AGE at two time points is 0 or 1 If the interviews you are linking come from both MIS 1-4 and MIS 5-8, difference between AGE at two time points is 0, 1, or 2 Beware of AGE codes 80 and 85! Allow for a greater AGE increase. Age 80=80-84. Age 85=85+. SEX & RACE No change allowed

Validation: Age Rules MIS AGE 1 2 3 4 5 6 7 8 18 19 20 50 51 79 80 84 85 Can age 0 or 5 years compared to MIS 1 Can age 0 or 5 years compared to MIS 1 Can age 0 or 1 years compared to MIS 1 Can age 1 or 2 years compared to MIS 1

Validation: Our Assumptions For however many observations you are linking SEX and RACE may not change AGE may only change in expected ways Changes in AGE are always compared to the first time point being linked

Validation: Our Assumptions AGE, SEX, and RACE must ALL match in expected ways – GOOD! MISH AGE SEX RACE Six 21 Female White Seven Eight

Validation: Our Assumptions AGE, SEX, and RACE must ALL match in expected ways – BAD! MISH AGE SEX RACE Six 21 Female White Seven Asian only Eight

Validation: Data Structure Validation gets complicated quickly We want you to try to write validation code in the lab to think through validation We will provide validation code for you based on our all or nothing assumptions about AGE, SEX, and RACE for data in both Long format Wide format Just to be sure that people are familiar with long versus wide format data, I’ll show you an example.

Validation: Long Format For each individual # records = # times observed MISH AGE SEX RACE 1 21 Female White 2 3

Validation: Wide Format For each individual # records = 1 regardless of # times observed AGE SEX RACE 1 2 3 21 Fem White

Long to Wide Format Data MISH AGE SEX RACE 1 21 Female White 2 3 Wide AGE SEX RACE 1 2 3 21 Fem White

Questions?