Design and Use of the IPUMS-International Data Series http://international.ipums.org Matt Sobek Minnesota Population Center
Overview Processing Dissemination system Strengths and limitations Users Summation IPUMS-International
What is IPUMS-International? Census data samples Integrated Public Use Microdata Series – consistent codes and labels – anonymized – users download – individual-level – 1960 to present – pooled data
Background IPUMS 1992 – Steve Ruggles Bob McCaa IPUMS-International 1999 Latin America, Europe, Extension
Map of IPUMS Partners Dark green = disseminating data Light green = partners, not yet disseminating 83 countries
Current Countries in IPUMS: 35 countries, 111 samples, 263 million persons
Africa: Egypt, Ghana, Kenya, Rwanda, South Africa, Uganda
Asia: Cambodia, China, Iraq, Israel, Malaysia, Palestine, Philippines, Vietnam
Americas: Argentina, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, United States, Venezuela
Europe: Austria, Belarus, France, Greece, Hungary, Netherlands, Portugal, Romania, Spain, United Kingdom
Most countries have multiple samples
IPUMS Microdata Relation to head Marital status Literacy Occupation
Selected Variable Topics Basic demographics Marriage Family structure Fertility and mortality Migration Ethnicity, language, religion Education Work Income Housing characteristics 475 Integrated variables 9052 Unharmonized variables
User Access Application Scholarly and educational purposes Key: it must not be redistributed Once approved, access to all data Free
Making the IPUMS Pre-processing Integration Dissemination
Making the IPUMS Pre-processing: language translation, reformatting, error correction, sampling, confidentiality. Integration. Dissemination.
Making the IPUMS Pre-processing: language translation, reformatting, error correction, sampling, confidentiality. Integration: metadata, data harmonization, constructed variables. Dissemination.
Census Questionnaire (Mexico 2000) Water Access
Editable Census Questionnaire
Water access XML-Tagged Census Questionnaire
Data Integration – Marital Status China 1982 Colombia 1973 Kenya 1989 Mexico 1970 U.S.A. 1990
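A minimal sketch of what harmonizing marital status across samples could look like. The source codes below are hypothetical placeholders (the slides do not list the actual census codes), and `harmonize_marst` is an illustrative name, not part of any IPUMS tool. The idea is only that each sample carries its own coding scheme and a lookup table translates it into one common set of labels.

```python
# Hypothetical source codings per sample; the real census codes are NOT
# given in the slides, so these numbers are placeholders for illustration.
SOURCE_TO_HARMONIZED = {
    ("china", 1982): {1: "single", 2: "married", 3: "widowed", 4: "divorced"},
    ("mexico", 1970): {
        1: "married",
        2: "consensual union",
        3: "widowed",
        4: "divorced",
        5: "separated",
        6: "single",
    },
}


def harmonize_marst(country, year, source_code):
    """Translate a sample-specific code into the common label.

    Returns "unknown" when the sample or the code has no mapping.
    """
    table = SOURCE_TO_HARMONIZED.get((country, year), {})
    return table.get(source_code, "unknown")


print(harmonize_marst("china", 1982, 2))   # code 2 in the China table
print(harmonize_marst("mexico", 1970, 1))  # code 1 in the Mexico table
```

Keeping one table per sample means a sample-specific category (such as a consensual union) can be preserved in the common scheme rather than forced into another category.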
Family Interrelationship Variables (Simple household)
Pernum  Relate  Age  Sex     Marst    Chborn
1       head    46   male    married  n/a
2       spouse  44   female  married  3
3       aunt    77   female  widow    7
4       child   15   female  single   0
5       child   13   female  single   n/a
6       child   11   male    single   n/a
Location variables: Spouse's, Mother's, Father's
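The location variables for this simple household can be sketched as follows. This uses a deliberately simplified rule (head and spouse point to each other; each child of the head points to the head and spouse as parents) and is not the actual IPUMS procedure, which the slides do not describe. The field names `sploc`, `momloc` and `poploc` and the function `add_pointers` are chosen for illustration; a value of 0 means no such person is present in the household.

```python
# The simple household from the slide (Chborn omitted; it is not needed here).
HOUSEHOLD = [
    {"pernum": 1, "relate": "head", "age": 46, "sex": "male", "marst": "married"},
    {"pernum": 2, "relate": "spouse", "age": 44, "sex": "female", "marst": "married"},
    {"pernum": 3, "relate": "aunt", "age": 77, "sex": "female", "marst": "widow"},
    {"pernum": 4, "relate": "child", "age": 15, "sex": "female", "marst": "single"},
    {"pernum": 5, "relate": "child", "age": 13, "sex": "female", "marst": "single"},
    {"pernum": 6, "relate": "child", "age": 11, "sex": "male", "marst": "single"},
]


def add_pointers(household):
    """Add spouse, mother and father location pointers (0 = not present).

    Simplified assumption: only the head, the head's spouse and the head's
    children are linked; other relatives keep pointers of 0.
    """
    head = next((p for p in household if p["relate"] == "head"), None)
    spouse = next((p for p in household if p["relate"] == "spouse"), None)

    for p in household:
        p["sploc"] = p["momloc"] = p["poploc"] = 0

    if head and spouse:
        head["sploc"] = spouse["pernum"]
        spouse["sploc"] = head["pernum"]

    parents = [p for p in (head, spouse) if p is not None]
    for p in household:
        if p["relate"] == "child":
            for parent in parents:
                key = "momloc" if parent["sex"] == "female" else "poploc"
                p[key] = parent["pernum"]
    return household


for person in add_pointers(HOUSEHOLD):
    print(person["pernum"], person["relate"],
          person["sploc"], person["momloc"], person["poploc"])
```

Each pointer stores the person number (Pernum) of the relative within the same household, so a record can be joined to its spouse's or parent's record without leaving the household.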
Attached Characteristics Spouse’s age Mother’s location Employment status Mother’s Employment status Spouse’s location Age
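One way to picture attached characteristics: once each record carries a pointer to its spouse or mother, any variable of that person can be copied onto the record as a new variable. The sketch below assumes the pointers already exist; the field names (`sploc`, `momloc`, `empstat`), the employment values and the helper `attach` are all hypothetical and used only for illustration.

```python
# Toy household with pointers already filled in (0 = no such person present).
# Employment status values are invented for this example.
household = [
    {"pernum": 1, "age": 46, "empstat": "employed", "sploc": 2, "momloc": 0},
    {"pernum": 2, "age": 44, "empstat": "employed", "sploc": 1, "momloc": 0},
    {"pernum": 3, "age": 77, "empstat": "inactive", "sploc": 0, "momloc": 0},
    {"pernum": 4, "age": 15, "empstat": "inactive", "sploc": 0, "momloc": 2},
]


def attach(household, pointer, source_var, new_var):
    """Copy source_var from the person named by pointer into new_var.

    new_var is None when the pointer is 0 (the relative is not in the household).
    """
    by_pernum = {p["pernum"]: p for p in household}
    for p in household:
        other = by_pernum.get(p.get(pointer, 0))
        p[new_var] = other[source_var] if other else None
    return household


attach(household, "sploc", "age", "age_sp")            # spouse's age
attach(household, "momloc", "empstat", "empstat_mom")  # mother's employment status

for p in household:
    print(p["pernum"], p["age_sp"], p["empstat_mom"])
```

The same helper covers any combination of pointer and variable, which mirrors the slide's pairing of a relationship (spouse, mother) with a characteristic (age, employment status, location).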
IPUMS Home Page
Variables Page Variable browsing
Variables Page
Sample Filtering
Variables Page
Unharmonized Variables
Variables Page
Variable Description (Marital status)
Comparability Discussion (Marital status)
Variable Description (Marital status)
Enumeration Text (Marital status)
Enumeration Text (Marital status, Cambodia)
Variable Description (Marital status)
Variable Codes (Marital status)
Variable Codes (Marital status)
Variable Description (Marital status)
Unharmonized Input Variables (Marital status)
IPUMS Home Page
Extract Step 1 – Login
Extract Step 2 – Select Samples
Extract Step 3 – Select Variables
Extract Step 4 – Variable Options
Extract Step 4 – Select Cases
Extract Step 4 – Attach Characteristics Age of spouse Employment status of father Occupation of father
Extract Step 5 – Customize Sample Sizes
Extract Step 6 – Submit
Download or Revise Extract
Key Strengths of the Census Samples Internationally comparable: pool data across countries – integrated variables. Large: enable study of relatively small populations. Temporal depth: provide historical perspective.
Key Strengths of the Census Samples Microdata: all of a person’s characteristics – multivariate analysis. Hierarchical: characteristics of everyone a person resided with; cohabitation and family interrelationships.
Limitations Due to Confidentiality Geography: 20,000 population or larger. Sensitive variables, very small categories. Samples: too small to answer some questions.
Other Issues and Limitations Varying census years Cross-sectional data Not longitudinal User burden Information overload; culturally specific knowledge Variable labels are insufficient Very large data
IPUMS Users 2000 registered users 54% Graduate students Academic field (%): Economics 47, Demography 21, Sociology 10, Other 22
Samples Extracted 67% multiple samples 45% multiple countries 17% 5 or more countries
Decade of Extracted Sample [Chart: percent of extracted samples by decade, beginning with the 1960s]
Most Frequently Extracted Countries 1. Mexico 2. Brazil 3. United States 4. Colombia 5. France 6. Chile 7. Ecuador 8. Vietnam 9. Kenya 10. Argentina
Summation Living project Democratized access World’s largest collection of census data 200 samples in another 5 years Ongoing nature of project limits us in some respects Allows us to correct errors and improve Most data are not otherwise accessible New opportunities for comparative research Entire system is designed to encourage comparisons We welcome your feedback
Married Female Labor Force Participation in Latin America (age 18 to 65) [Chart: percent in labor force; series: Mexico, Costa Rica, Ecuador, Chile, Venezuela, Colombia, Brazil]
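A rough sketch of how a measure like this could be computed from pooled person records: restrict to married women age 18 to 65, then take the weighted share in the labor force within each country. The records, the field names (`weight`, `in_labor_force`) and the function `married_female_lfp` are hypothetical; the slides do not specify the variables or weights used for the chart.

```python
from collections import defaultdict

# Invented person records pooled across two countries, for illustration only.
RECORDS = [
    {"country": "Mexico", "sex": "female", "marst": "married", "age": 30,
     "in_labor_force": True, "weight": 10.0},
    {"country": "Mexico", "sex": "female", "marst": "married", "age": 50,
     "in_labor_force": False, "weight": 10.0},
    {"country": "Mexico", "sex": "male", "marst": "married", "age": 40,
     "in_labor_force": True, "weight": 10.0},   # excluded: not female
    {"country": "Brazil", "sex": "female", "marst": "married", "age": 25,
     "in_labor_force": True, "weight": 30.0},
    {"country": "Brazil", "sex": "female", "marst": "married", "age": 60,
     "in_labor_force": False, "weight": 10.0},
    {"country": "Brazil", "sex": "female", "marst": "single", "age": 22,
     "in_labor_force": True, "weight": 10.0},   # excluded: not married
    {"country": "Brazil", "sex": "female", "marst": "married", "age": 70,
     "in_labor_force": False, "weight": 10.0},  # excluded: over 65
]


def married_female_lfp(records):
    """Weighted percent in labor force among married women age 18 to 65."""
    in_lf = defaultdict(float)
    total = defaultdict(float)
    for r in records:
        if r["sex"] == "female" and r["marst"] == "married" and 18 <= r["age"] <= 65:
            total[r["country"]] += r["weight"]
            if r["in_labor_force"]:
                in_lf[r["country"]] += r["weight"]
    return {country: 100.0 * in_lf[country] / total[country] for country in total}


print(married_female_lfp(RECORDS))
```

Because the selection rule and the variable definitions are identical for every country, the resulting percentages are directly comparable, which is the point of working from integrated variables.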
Married Female Labor Force Participation: Latin America and U.S. (age 18 to 65) [Chart: percent in labor force; series: Latin America, United States]
Married Female Labor Force Participation: Latin America and U.S. (age 18 to 65) Compare Latin America to U.S. 40 years earlier [Chart: percent in labor force; series: United States, Mexico, Costa Rica, Ecuador, Chile, Venezuela, Colombia, Brazil]
Married Female Labor Force Participation: Mexican-born Women [Chart: percent in labor force; series: Mexican-born women in United States, women in Mexico]
Percent of elders in intergenerational families
Percent in elder-head intergenerational families
Percent in younger-head families