Integrated Public Use Microdata Series: IPUMS-International
www.ipums.org
Matt Sobek, Minnesota Population Center
IPUMS Overview
1. What is the IPUMS
2. Harmonization
3. Additional Data Enhancements
4. Access
5. Strengths and Limitations
6. Dissemination

1. What is the IPUMS
Brief History
IPUMS-USA (Steve Ruggles): all existing samples of the US census; data extraction system, 1998
IPUMS-International (Bob McCaa)
IPUMS-Latin America: 2004, 2005
IPUMS-Europe: 2005
NSF Expansion: 2005
World’s largest collection of census data: 200 million records and growing
70 countries have agreed to join the project
Datasets in IPUMS
May 2008 Data Release
Sample Sizes
African Datasets in IPUMS Archive
Further agreements: Ethiopia, Lesotho, Tanzania
Khartoum, CBS-Sudan
Dhaka, Bangladesh Bureau of Statistics
Non-African Countries in IPUMS Archive
IPUMS Global Coverage
Selected Variable Availability -- PERSON
Selected Variable Availability -- HOUSEHOLD
What Are Microdata?
Individual-level data: every record represents a separate person, and all of that person's individual characteristics are recorded
“Raw” data that must be analyzed
Different from aggregate (summary or tabular) data, such as a count of persons by municipality or an employment status table by sex from a published census volume
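To make the microdata/aggregate distinction concrete, here is a minimal sketch that builds both kinds of published table from person-level records. The records, variable names, and values are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical person-level records: one dict per person (illustrative values only).
persons = [
    {"municipality": "A", "sex": "female", "employed": True},
    {"municipality": "A", "sex": "male", "employed": False},
    {"municipality": "B", "sex": "female", "employed": False},
    {"municipality": "B", "sex": "male", "employed": True},
    {"municipality": "B", "sex": "male", "employed": True},
]

def count_by(records, *keys):
    """Aggregate microdata into a frequency table keyed by the given variables."""
    return Counter(tuple(r[k] for k in keys) for r in records)

# A count of persons by municipality.
persons_by_municipality = count_by(persons, "municipality")
# An employment status table by sex.
employment_by_sex = count_by(persons, "sex", "employed")
```

The aggregation only runs one way: both tables can be produced from the records, but neither table can be turned back into the individual records.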
Kenya 1999 Census Questionnaire
Raw Census Microdata from IPUMS
IPUMS Data Structure
A household record (shaded) followed by a person record for each member of the household
Variables shown: Relationship, Age, Sex, Race, Birthplace, Mother’s birthplace, Occupation
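A minimal sketch of reading this hierarchical layout, where each household record is followed by its person records. It assumes a hypothetical fixed-width layout in which the first character is the record type ("H" or "P"). The field names and column positions are invented for illustration; a real extract's codebook gives the actual positions. Because the reader is a generator, it can stream a large extract one household at a time.

```python
from typing import Iterable, Iterator

# Hypothetical layout: column 1 is the record type ("H" = household, "P" = person).
# (start, end) are Python slice positions: 0-based, end-exclusive.
HOUSEHOLD_FIELDS = {"serial": (1, 8)}
PERSON_FIELDS = {"pernum": (1, 3), "age": (3, 6), "sex": (6, 7)}

def parse_fields(line: str, layout: dict) -> dict:
    """Cut one fixed-width record into named fields."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}

def read_households(lines: Iterable[str]) -> Iterator[dict]:
    """Group a hierarchical file into households, each holding its person records."""
    household = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        rectype = line[0]
        if rectype == "H":
            # A new household record closes the previous household.
            if household is not None:
                yield household
            household = {**parse_fields(line, HOUSEHOLD_FIELDS), "persons": []}
        elif rectype == "P":
            if household is None:
                raise ValueError("person record appears before any household record")
            household["persons"].append(parse_fields(line, PERSON_FIELDS))
        else:
            raise ValueError(f"unknown record type: {rectype!r}")
    if household is not None:
        yield household

# Hypothetical sample: two households with two and one members.
sample = ["H0000001", "P01046M", "P02044F", "H0000002", "P01053F"]
households = list(read_households(sample))
```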
The Advantages of Microdata
Combination of all of a person’s characteristics
Characteristics of everyone with whom a person lived
Freedom to make any table you need
Freedom to make models examining multivariate relationships
Basically, you are limited only by the questions asked in the particular census
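One consequence of keeping persons grouped by household is that characteristics of co-residents can be attached to each person. The sketch below derives two household-context variables, household size and the number of children under 5. The household, the variable names `hh_size` and `hh_children_under5`, and the age cutoff are hypothetical choices for illustration.

```python
# Hypothetical household with its person records (illustrative values only).
household = {
    "serial": 1,
    "persons": [
        {"pernum": 1, "relate": "head", "age": 46},
        {"pernum": 2, "relate": "spouse", "age": 44},
        {"pernum": 3, "relate": "child", "age": 4},
    ],
}

def add_household_context(hh):
    """Return copies of the person records extended with characteristics
    of the people they live with."""
    size = len(hh["persons"])
    young_children = sum(1 for p in hh["persons"] if p["age"] < 5)
    return [
        {**p, "hh_size": size, "hh_children_under5": young_children}
        for p in hh["persons"]
    ]

people = add_household_context(household)
```

Each returned record can then be tabulated or modeled like any other person-level variable.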
2. Harmonization
Translation Table – Marital Status
Samples: China 1982, Colombia 1973, Kenya 1989, Mexico 1970, U.S.A. 1990
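A translation table maps each census's own codes onto one common set of codes. This is a minimal sketch of that lookup, not the IPUMS implementation. The sample names, source codes, and harmonized codes are hypothetical; the real ones are documented per sample.

```python
# Hypothetical harmonized marital status categories.
HARMONIZED_LABELS = {
    1: "single",
    2: "married/in union",
    3: "separated/divorced",
    4: "widowed",
    9: "unknown",
}

# Hypothetical translation table: (sample, source code) -> harmonized code.
TRANSLATION_TABLE = {
    ("sample_a", "S"): 1,
    ("sample_a", "M"): 2,
    ("sample_a", "D"): 3,
    ("sample_a", "W"): 4,
    # The same concept can carry a different code in each census, and two
    # source categories (e.g. formal marriage and consensual union) can map
    # to one harmonized category.
    ("sample_b", "1"): 2,
    ("sample_b", "2"): 2,
    ("sample_b", "3"): 1,
    ("sample_b", "4"): 4,
}

def harmonize(sample: str, source_code: str) -> int:
    """Map a census-specific code to the common code, falling back to 'unknown'."""
    return TRANSLATION_TABLE.get((sample, source_code), 9)
```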
General Codes
Variable Description: Literacy
3. Additional Data Enhancements
IPUMS “Pointer” Variables (Simple household)
Pernum  Relate  Age  Sex     Marst    Chborn
1       head    46   male    married  n/a
2       spouse  44   female  married  3
3       aunt    77   female  widow    7
4       child   15   female  single   0
5       child   13   female  single   n/a
6       child   11   male    single   n/a
Pointer variables: Spouse’s Location, Mother’s Location, Father’s Location
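The location columns can be read as the Pernum of the linked person. Below is a minimal sketch of simple rules that reproduce such links for the household above. The rules are a simplification and not the IPUMS algorithm. The field names `sploc`, `momloc`, and `poploc` are illustrative, 0 means "no link", and the rules assume that the head's children are also the spouse's children.

```python
# Person records from the simple household above.
household = [
    {"pernum": 1, "relate": "head",   "age": 46, "sex": "male"},
    {"pernum": 2, "relate": "spouse", "age": 44, "sex": "female"},
    {"pernum": 3, "relate": "aunt",   "age": 77, "sex": "female"},
    {"pernum": 4, "relate": "child",  "age": 15, "sex": "female"},
    {"pernum": 5, "relate": "child",  "age": 13, "sex": "female"},
    {"pernum": 6, "relate": "child",  "age": 11, "sex": "male"},
]

def add_pointers(persons):
    """Simplified rules: link head and spouse to each other, and link the head's
    children to the head and spouse as parents by sex. 0 means no link."""
    head = next(p for p in persons if p["relate"] == "head")
    spouse = next((p for p in persons if p["relate"] == "spouse"), None)
    parents = [p for p in (head, spouse) if p is not None]
    mother = next((p["pernum"] for p in parents if p["sex"] == "female"), 0)
    father = next((p["pernum"] for p in parents if p["sex"] == "male"), 0)
    for p in persons:
        p["sploc"] = p["momloc"] = p["poploc"] = 0
        if spouse is not None and p is head:
            p["sploc"] = spouse["pernum"]
        elif p is spouse:
            p["sploc"] = head["pernum"]
        elif p["relate"] == "child":
            p["momloc"], p["poploc"] = mother, father
    return persons

add_pointers(household)
```

The aunt gets no links, because her parents and spouse are not in the household.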
IPUMS “Pointer” Variables (Complex household)
Pernum  Relationship  Age  Sex     Marst      Chborn
1       head          53   female  separated  6
2       child         28   male    single     n/a
3       child         22   male    single     n/a
4       child         21   male    single     n/a
5       child         25   female  married    2
6       child-in-law  28   male    married    n/a
7       grandchild    3    male    single     n/a
8       grandchild    1    male    single     n/a
9       non-relative  32   female  separated  2
10      non-relative  10   male    single     n/a
11      non-relative  5    female  single     n/a
Pointer variables: Spouse’s Location, Father’s Location, Mother’s Location
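In a complex household, the relationship to the head does not identify every parent, so links have to be inferred from other characteristics. The sketch below uses a hypothetical heuristic, not the IPUMS linking rules. A person's mother must be a woman in a relationship group that could plausibly contain the mother, must have at least one child ever born, and must be older by an age gap in a plausible range (the 15 to 49 years used here is an assumption). A link is made only when exactly one candidate qualifies.

```python
# Complex household from the table above; chborn is None where not applicable.
household = [
    {"pernum": 1,  "relate": "head",         "age": 53, "sex": "female", "chborn": 6},
    {"pernum": 2,  "relate": "child",        "age": 28, "sex": "male",   "chborn": None},
    {"pernum": 3,  "relate": "child",        "age": 22, "sex": "male",   "chborn": None},
    {"pernum": 4,  "relate": "child",        "age": 21, "sex": "male",   "chborn": None},
    {"pernum": 5,  "relate": "child",        "age": 25, "sex": "female", "chborn": 2},
    {"pernum": 6,  "relate": "child-in-law", "age": 28, "sex": "male",   "chborn": None},
    {"pernum": 7,  "relate": "grandchild",   "age": 3,  "sex": "male",   "chborn": None},
    {"pernum": 8,  "relate": "grandchild",   "age": 1,  "sex": "male",   "chborn": None},
    {"pernum": 9,  "relate": "non-relative", "age": 32, "sex": "female", "chborn": 2},
    {"pernum": 10, "relate": "non-relative", "age": 10, "sex": "male",   "chborn": None},
    {"pernum": 11, "relate": "non-relative", "age": 5,  "sex": "female", "chborn": None},
]

# Hypothetical rule: relationship groups that may contain a person's mother.
MOTHER_GROUPS = {
    "child": {"head", "spouse"},
    "grandchild": {"child", "child-in-law"},
    "non-relative": {"non-relative"},
}
MIN_GAP, MAX_GAP = 15, 49  # hypothetical plausible range for a mother's age at birth

def find_mother(person, persons):
    """Return the pernum of the single plausible mother, or 0 if none or ambiguous."""
    groups = MOTHER_GROUPS.get(person["relate"], set())
    candidates = [
        m for m in persons
        if m is not person
        and m["relate"] in groups
        and m["sex"] == "female"
        and (m["chborn"] or 0) > 0
        and MIN_GAP <= m["age"] - person["age"] <= MAX_GAP
    ]
    return candidates[0]["pernum"] if len(candidates) == 1 else 0

momloc = {p["pernum"]: find_mother(p, household) for p in household}
```

On this household, the sketch links the grandchildren to person 5 and the two young non-relatives to person 9. A father link could then be taken from the mother's spouse, once spouse links are established.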
4. Access
IPUMS Access
Restricted access, for scholarly and educational purposes
Conditions of use: the key condition is not to redistribute the data
Serious vetting of applicants
5. Strengths and Limitations
Four Key Strengths of the Census Samples
National in scope: results are not subject to local peculiarities and provide context for local studies
Large: more cases than any comparable dataset, enabling study of relatively small populations
Temporal depth: provides historical perspective
Microdata: you can make your own tabulations and apply multivariate techniques
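To illustrate why the number of cases matters for small populations, here is a rough sketch. The expected number of sampled cases is the sampling fraction times the group size. Under simple random sampling, the standard error of an estimated proportion is sqrt(p(1 - p) / n). The group sizes and the 10% sampling fraction are hypothetical. The formula ignores the finite population correction and the design effects of real census samples.

```python
import math

def expected_cases(group_size: int, sampling_fraction: float) -> float:
    """Expected number of sampled records for a subpopulation."""
    return group_size * sampling_fraction

def proportion_se(p: float, n: float) -> float:
    """Standard error of a proportion under simple random sampling (no finite population correction)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical subpopulations and a hypothetical 10% sample.
for group_size in (2_000, 20_000, 200_000):
    n = expected_cases(group_size, 0.10)
    print(f"{group_size:>7} people -> {n:>6.0f} cases, SE of p=0.5: {proportion_se(0.5, n):.3f}")
```

Ten times as many cases cuts the standard error by a factor of about sqrt(10), roughly 3.2.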
Limitations of the Census Samples
Confidentiality: geography is identified only for units of 20,000 population or larger; sensitive variables, swapping, etc.
Samples: too small to answer some questions
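A minimal sketch of the geographic threshold: units with at least 20,000 people keep their own code, and smaller units are pooled. The unit names and populations are hypothetical. Pooling every small unit into one code is a simplification; a real disclosure-control process may instead combine small units with neighboring ones. Either way, the pooled group itself has to meet the threshold.

```python
# Hypothetical geographic units and their total populations.
unit_population = {"unit_01": 150_000, "unit_02": 18_500, "unit_03": 42_000, "unit_04": 7_200}
THRESHOLD = 20_000  # minimum population for an identified geographic unit

def public_geography(unit: str) -> str:
    """Keep large units as-is; pool units below the threshold into one code."""
    return unit if unit_population[unit] >= THRESHOLD else "pooled_small_units"

# The pooled group must itself be large enough to be released.
pooled_total = sum(pop for pop in unit_population.values() if pop < THRESHOLD)
assert pooled_total >= THRESHOLD, "pooled group is itself too small to identify"
```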
Other Issues and Limitations
Not annual: any temporal analysis will have gaps
Cross-sectional data: not longitudinal
User burden: requires knowledge of a statistical package; information overload; culturally specific knowledge needed; very large extracts
6. Dissemination
Web Dissemination System