Disseminating integrated census microdata to academic researchers and policy makers at no cost (plus we pay US$1,000-5,000 per census to the NSO-owner)
Robert McCaa, Minnesota Population Center
IPUMS-International, 2009 (Mollweide projection)
Map legend: dark green = disseminating; medium green = integrating; lightest green = negotiating
Special thanks to: CSO-Vietnam, NIS-Cambodia, BPS-Indonesia, PCO-Pakistan, NBS-China, NSSO-India, BBS-Bangladesh, DOS-Malaysia, NSO-Mongolia, CBS-Nepal, NSO-Philippines, NSO-Thailand
Integrating Asian Census Microdata
Map legend: dark green = disseminating; medium green = integrating; lightest green = talking
Respectful invitation to the National Statistical Offices of: Afghanistan, Bhutan, Iran, DPR Korea, Lao PDR, Maldives, Sri Lanka, Timor Leste
Outline: Disseminating Census Microdata
1. What are census microdata? (3 slides)
2. Electronic archiving of census microdata: do it now! (4 slides)
3. Why are census microdata essential? (2 slides)
4. IPUMS-International: invitation to participate (10 slides)
   What is IPUMS? What are the benefits? How are the integrated metadata and microdata constructed and accessed?
5. Conclusions (3 slides)
1. What are census microdata? And how do they differ from “raw data”? (3 slides)
16th century Aztec census, written in Nahuatl on fig-bark paper, will survive another 500 years
Sources: Museo Nacional de Antropología e Historia (Mexico City), "Libro de Tributos," Colección Antigua, ms. 549 bis; Sarah Cline, The Book of Tributes. Early Sixteenth-Century Nahuatl Censuses from Morelos. Los Angeles:
Manuscript transcribed, translated and converted to microdata
What are “census microdata”?: anonymized, computerized census records of individuals, households & dwellings Easier to integrate than tables. Study any desired set of characteristics. Facilitates comparative research.
How do census microdata differ from “raw data”?: 1. detailed geography is suppressed and 2. strict measures are implemented to protect privacy of individuals, households, dwellings & other entities Note absence of detailed geography
2. Digital Archiving (4 slides) Census Tablet (digital image): Assyria, 2700 B.P. Library of King Ashurbanipal
Bangladesh Bureau of Statistics Tape Archive photo: April 14, : Census data on most of these tapes were recovered.
Archiving: no longer a problem for recent censuses (generally excellent in the Asian agencies I have visited)
» Documentation (forms, instructions, definitions, dictionaries, methodological reports, etc.): preserve at least two copies in at least two institutes
  Census docs: send one copy by pre-paid courier to MPC
  » Paper
  » .PDF
  » and one of the following: .HTML, .DOC, .XLS, or .TXT
» Data: preserve at least two copies in at least two institutes on the most stable media (CD and servers)
  Census microdata: send a copy by pre-paid courier to MPC
  » Un-edited "raw data" (ASCII)
  » Edited data (ASCII)
1981 census of Bangladesh: 3 tapes containing microdata. Even the moldy one was recovered!
IPUMSi recovers: Centro Latino Americano y Caribeño de Demografía (CELADE: Santiago, Chile)
~3000 microdata tapes recovered and fully documented (funded by NSF)
IPUMS now has the largest collection of census documentation in the world, having acquired paper/electronic archives from:
» United Nations Statistical Division
» United Nations Population Division
» CELADE (Latin America)
» East-West Center (Asia/Pacific)
» U.S. Census Bureau International Programs Center
Archived census microdata by region and decade: % of censuses conducted (inventory by IPUMS-International)
Region/continent  Countries  2000s  1990s  1980s  1970s  1960s
Latin America  21  100%  89%  81%  72%
North America  27  100%  72%  64%  24%  8%
Africa  58  100%  53%  46%  25%  2%
Asia  44  100%  54%  34%  30%  13%
Europe  46  100%  67%  55%  41%  13%
Pacific (pop > .5m)  7  100%  43%  29%
Note: cases confirmed by the corresponding official statistical institute. Some datasets remain to be certified. Some countries have not responded to the invitation to inventory their stocks of data.
Source: March 15, 2009 http://
What Asian census microdata and documentation still exist for the 1960s? 1970s? 1980s? 1990s?
» How much will be lost before they can be recovered, documented and archived?
» Help preserve these treasures now: IPUMS pays the costs of shipping and recovery.
3. Why is the dissemination of census microdata essential? (2 slides)
Julia Lane, European Statisticians Conference (2003) 6 benefits from disseminating microdata » 1. Analyze more realistic questions » 2. Acquire new constituencies and stakeholders » 3. Build trust; reduce suspicion » 4. Replicate findings » a. use standards of UNSD, Eurostat, ISCO, ISCED, etc. » b. facilitate comparative research in time and space » 5. Calculate marginal effects » 6. Assess data quality » …and much, much more….
UNSD Principles and Recommendations (Rev. 1, 1997) endorse dissemination of census microdata
» §1.218: “There are a range of methods…that can be used to make such microdata available while still protecting individuals’ rights to privacy.”
» 2006 Africa Symposium on Statistical Development (Cape Town, Jan 30-Feb. 2, 2006): “microdata may be disseminated provided that confidentiality is preserved”
» Most (all?) advanced statistical agencies make census microdata available (some more widely than others). Since the:
  » 1960s: USA, Finland, France, Korea, plus 18 Latin American countries
  » 1970s: Canada, Czechoslovakia, Japan, Malaysia, Norway, Philippines
  » 1980s: Australia, Italy, Spain, Thailand, plus many Asian countries
  » 1990s: Germany, Russia, Switzerland, UK, plus many other countries
» In four decades of distributing census microdata there is not a single allegation of violation of confidentiality or privacy.
4. Invitation to participate in IPUMS-International (10 slides)
What is IPUMS-International?
…a global collaboratory of National Statistical Institutes & Universities to:
» 1. Inventory the world’s census microdata
» 2. Archive census microdata and documentation
» 3. Integrate census microdata
  » a. use standards of UNSD, Eurostat, ISCO, ISCED, etc.
  » b. facilitate comparative research in time and space
» 4. Anonymize census microdata to preserve statistical confidentiality, using the highest standards
» 5. Disseminate restricted-access, custom extracts to approved researchers/research projects at no cost
IPUMS-International (2009): 130 high precision samples, 44 countries, million person records
Countries: Argentina, Armenia, Austria, Belarus, Bolivia, Brazil, Cambodia, Canada, Chile, China, Colombia, Costa Rica, Ecuador, Egypt, France, Ghana, Greece, Guinea (Conakry), Hungary, India, Iraq, Israel, Italy, Jordan, Kenya, Kyrgyz Republic, Malaysia, Mexico, Mongolia, Netherlands, Palestine, Panama, Philippines, Portugal, Romania, Rwanda, Slovenia, South Africa, Spain, Uganda, United Kingdom, United States, Venezuela, Vietnam
IPUMS-International strengths 1. Uniform legal authorization with each National Statistical Office 2. Access restricted to bona fide researchers with need 3. MPC Experienced integration teams 4. MPC Proven web-based distribution system 5. High user satisfaction 6. NSO: Improved research and empirically based policy-making 7. Sustainable: NSF, NIH funded through 2014
Legal: NSO (Austria) and U. of Minnesota
IPUMS microdata integration method: composite codes (multiple digits) retain significant distinctions while also integrating comparable concepts
0  NIU  X X X X
ACTIVE (In Labor Force)
100  EMPLOYED, not specified  · · · ·
110  At work  X X X X
111  At work, and 'student'  · · · X
112  At work, and 'housework'  · · · X
113  At work, and 'seeking work'  · · · X
114  At work, and 'retired'  · · · X
115  At work, and 'no work'  · · · X
116  At work, and 'other'  · · · X
117  At work, family holding, not specified  · · · ·
118  At work, family holding, not agricultural  · · · ·
119  At work, family holding, agricultural  · · · ·
120  Have job, not at work last week  X X X X
Metadata: Employment Status
EMPSTAT: Employment status
Description
EMPSTAT indicates whether or not the respondent was part of the labor force -- working or seeking work -- over a specified period of time. Depending on the sample, EMPSTAT can also convey further information. The first digit of EMPSTAT is fully comparable, and classifies the population into three groups: employed, unemployed, and inactive. The combination of employed and unemployed yields the total labor force. The second and third digits of EMPSTAT preserve additional information available for some countries and census years but not for others. Employment status is sometimes referred to in other sources as "activity status."
Comparability -- General
The age of persons to whom the question applies varies across the samples (see Universe). The reference period for the employment status question varies. For most samples, employment status was reported with respect to the day of the census or…
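The first-digit rule in the EMPSTAT description can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: it assumes detailed codes are three digits (as in 100, 110, 120) and that 0 means NIU. The labels attached to digits 2 and 3, and the function names, are assumptions made for the example.

```python
# Sketch: collapse a detailed EMPSTAT code to its comparable first digit.
# Assumptions: detailed codes are three digits; 0 is NIU; the labels for
# digits 2 and 3 are illustrative and not taken from the slides.
GENERAL_LABELS = {
    0: "NIU",
    1: "employed",
    2: "unemployed",  # assumed label
    3: "inactive",    # assumed label
}


def general_empstat(detailed_code: int) -> int:
    """Return the first digit of a three-digit detailed code."""
    if not 0 <= detailed_code <= 999:
        raise ValueError(f"expected a code between 0 and 999, got {detailed_code}")
    return detailed_code // 100


def general_label(detailed_code: int) -> str:
    """Return the label of the general group, or 'unknown' if unmapped."""
    return GENERAL_LABELS.get(general_empstat(detailed_code), "unknown")
```

For example, `general_label(111)` and `general_label(120)` both fall in the employed group, so an analysis that needs only the comparable first digit can ignore the extra detail that some samples provide.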
Comparability -- Mexico
The universe and reference period are fully comparable across the Mexico samples. The 1970 Census did not provide detail on the inactive population except for "houseworkers," while the later samples have numerous subcategories. In 1990, the employment status question refers to "Principal Activity" and therefore under-reports secondary economic activity by students, housewives, family-workers, the semi-retired, and others. The 2000 Census sought to overcome deficiencies in reporting work status for people whose primary activity was not work (students, housewives, retirees, etc.), but who in fact were working according to international definitions. A second question introduced for the first time in 2000 sought to capture this secondary economic activity. For strict comparability with earlier Mexican censuses, this recovered activity (coded “at work and …”) should be considered "inactive." …
Integrate: retain all significant detail, harmonize everything
Not standardize: force square pegs in round holes
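The Mexico comparability note can be turned into a small recode. The sketch below is hedged: it takes codes 111 to 116 as the "at work and …" categories from the composite-code slide, and it assumes that the general group for "inactive" is 3, which the slides do not state. The function name is hypothetical.

```python
# Sketch: strict-comparability recode for the Mexico samples.
# Codes 111-116 are the "At work, and ..." categories on the composite-code slide.
AT_WORK_AND_CODES = range(111, 117)
INACTIVE_GROUP = 3  # assumed general code for "inactive"


def strict_mexico_group(detailed_code: int) -> int:
    """Return the general group, treating recovered secondary activity as inactive."""
    if detailed_code in AT_WORK_AND_CODES:
        return INACTIVE_GROUP
    return detailed_code // 100
```

Because the detailed codes are retained in the integrated data, a researcher can choose either definition: the international one (first digit as coded) or the stricter one that matches earlier Mexican censuses.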
6 steps using 1. Logon w/ password 2a. Study documentation 2b. Design extract 3. Receive ; logon with p/word 4. Download extract (SSL encrypted) 5. UnZip data (also SAS, STATA) 6. Analyze
Asian initiative work plan (3 years):
1. Establish legal foundations: negotiate Memorandum of Understanding between National Statistical Institute (NSI) & Minnesota Population Center (MPC)
2. Entrust copies of microdata and documentation to project (NSI)
3. License microdata (MPC pays US$5,000 per census to NSI, upon receipt of microdata, documentation and invoice)
4. Design regional harmonization protocols census-by-census, concept-by-concept, code-by-code and write integrated metadata (MPC)
5. Impose confidentiality protections customized for each census
6. Disseminate microdata to licensed users (MPC, NSI) free of charge
7. Cooperate with regional partners in education and training
Project pays all costs, including:
» License fee to participating National Statistical Institute
» Producer/User workshop, Durban, South Africa, 2009 (ISI)
IPUMS-EurAsia: Will your statistical institute participate? » Formalities: 1. Sign Memorandum of understanding 2. Entrust Microdata and documentation to project 3. Collect license fee » 2009+: advise on technical details as needed; workshops as funding permits » 2011: ISI meeting, Dublin, Ireland Inauguration of integrated database with 180 census samples » 2013: ISI meeting, Hong Kong SAR, China Inauguration of integrated database with 220 census samples
5. Conclusions (3 slides)
Benefits from Disseminating Census Microdata » National Statistical Institutes 1. Manage statistics for more equitable societies 2. Increase trust, transparency and stakeholders 3. Increase usage, better science and policy 4. Enhance cost-benefit ratio 5. Little marginal cost (project pays $5,000 per census) » Citizens, Society and Government: 1. Who we are 2. What the future may bring 3. How policies might be improved
Integrating Asian Census Microdata
Map legend: dark green = disseminating; medium green = integrating; lightest green = talking
Respectful invitation to the National Statistical Offices of: Afghanistan, Bhutan, Iran, DPR Korea, Lao PDR, Maldives, Sri Lanka, Timor Leste
What needs to be done to participate?
» National Statistical Office:
  1. Endorse project Memorandum of Understanding (80+ countries)
  2. Entrust copies of documentation & microdata to MPC (75+ countries)
  3. Invoice for each census:
     $1,000 per census for fewer than one million person records
     $5,000 per census for one million or more person records
» MPC:
  1. Endorse project Memorandum of Understanding: Afghanistan?, Bhutan?, Iran?, DPR Korea?, Lao PDR?, Maldives?, Timor Leste?
  2. Pay license fee for microdata and documentation: Indonesia!!
  3. Digitize metadata and translate to English: Pakistan!!
  4. Harmonize microdata: Cambodia!!
  5. Disseminate microdata with copies on CDs to each NSO: Vietnam!!
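The invoicing rule above amounts to a single threshold at one million person records. A minimal Python sketch, with hypothetical function names, shows how an office might work out the amount to invoice for several censuses:

```python
# Sketch of the per-census license fee stated on the slide:
# US$1,000 below one million person records, US$5,000 at or above it.
ONE_MILLION = 1_000_000


def license_fee_usd(person_records: int) -> int:
    """Return the license fee in US dollars for one census."""
    if person_records < 0:
        raise ValueError("person_records must be non-negative")
    return 1_000 if person_records < ONE_MILLION else 5_000


def invoice_total_usd(censuses: list[int]) -> int:
    """Sum the fees for a list of censuses, each given as a person-record count."""
    return sum(license_fee_usd(n) for n in censuses)
```

Under this rule, an office entrusting one census with 500,000 person records and another with 2,000,000 would invoice `invoice_total_usd([500_000, 2_000_000])`, that is, one fee at each tier.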
Thank you!! additional information at: * * * * * * Contact: