Roundtable on Archiving and Disseminating official statistics with a focus on census microdata Example: IPUMS-International * * *


1 Roundtable on Archiving and Disseminating official statistics with a focus on census microdata. Example: IPUMS-International, http://www.ipums.org * * * Robert McCaa, Professor of Population History, and Wendy L. Thomas, Archivist, University of Minnesota Population Center. rmccaa@umn.edu. This .ppt, docs, and additional information at: www.hist.umn.edu/~rmccaa/ipums-africa

2 Our common fate on a crowded planet: new forms of global cooperation are required. We must engage interdisciplinary research combining theory and practice. --Jeffrey D. Sachs, Common Wealth (Penguin 2008)

3 A Census Microdata Revolution
1. Preserve all microdata and documentation (20 slides)
   » Product (tables and microdata)
   » Process (of conducting census and producing census microdata)
2. Integrate microdata and metadata (8 slides)
3. Disseminate to researchers world-wide (3 slides)
Conclusion: strengths, challenges, 7 golden rules (4 slides)

4 A Census Microdata Revolution
1. Preserve all census microdata and documentation, product and process:
   » 1960s to present
   » ~100 countries (80 have endorsed IPUMS MoU)
   » ~400 censuses (219 are entrusted to IPUMS)
2. Integrate: both microdata and metadata
3. Disseminate to researchers world-wide: “extracts” of database (countries, censuses, sub-populations, sample size, variables)

5 IPUMS-International Today dark green = already integrated: 35 countries, 111 censuses, 263 million person records green = to be integrated: 39 countries, 103 censuses, 150 mill. Mollweide projection

6 IPUMS dissemination calendar (see handout): samples for 35 countries available now, 74 soon
» Europe 10:4
  » Available (10): Austria, Belarus, France, Greece, Hungary, Netherlands, Portugal, Romania, Spain, UK
  » Soon (4): Germany, Czech Republic, Slovenia, Switzerland
» Americas (funding renewed July 1) 11:11
  » Available (11): Argentina, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, USA, Venezuela
  » Soon (11): Bolivia, Cuba, Dominican Republic, El Salvador, Guatemala, Honduras, Nicaragua, Paraguay, Peru, Puerto Rico, Uruguay
» Africa 6:11
  » Available (6): Egypt, Ghana, Kenya, Rwanda, South Africa, Uganda
  » Soon (11): Botswana, Ethiopia, Guinea (Conakry), Madagascar, Malawi, Mali, Mauritius, Sierra Leone, Sudan, Tanzania, Zambia
» Asia 8:13
  » Available (8): Cambodia, China, Iraq, Israel, Malaysia, Palestine, Philippines, Vietnam
  » Soon (13): Armenia, Bangladesh, Fiji, India, Indonesia, Jordan, Kyrgyz Republic, Mongolia, Nepal, Pakistan, Thailand, Turkmenistan

7 IPUMS timeline
» 1995: IPUMS-USA first release of integrated microdata. IPUMS-USA continues: 1850-2000 + ACS samples
» 1999: IPUMS-International funded
» 2002: 1st International release: 7 countries, including Colombia and Mexico
» 2006: 20 countries, 63 censuses
» 2008: 35 countries, 111 censuses
» ~263 million person records
» Two thousand users
» 2013: ~70 countries, ~200 censuses
» 214 sets of microdata are already entrusted to MPC
» Coming: Germany (8), Switzerland (4), Bangladesh (2), Cuba (1)...

8 1. Preserve (Archive) IPUMS Global workshop, ISI (Lisbon, Aug 2007)

9 Microdata: Archiving & Disseminating
The producer’s perspective (official statisticians):
– Archiving:
  » Comprehensive preservation of both data and documentation (metadata) with easily searchable indices
  » Continually updated with technological innovation: hardware, software (doc, pdf, txt, xls, jpg, etc.) and wet-ware
– Disseminating: the web revolution
The consumer’s perspective (researchers):
– Access: locate and use on the web without obstacles
– Disseminating: free access to anyone, anywhere, anytime (access postponed is access denied)
What are your interests?

10 Microdata: Archiving & Disseminating
Our perspective: “Archiving Census Microdata and Documentation: Preserving Memory, Increasing Stakeholders” (UNSD NYC, 2001); copy of paper at ~rmccaa/ipums-africa
– Long term, 7 keys: readable, intelligible, identifiable, encapsulated, understandable, reconstructable, authentic
– What to preserve: the product and the process
– How to assess future value: stakeholders, future impact, anticipated use, informing the future
– Challenges: archive, plan, trained staff, external repository

11 Preservation, the problem: 1973 census tapes of Sudan were at risk!

12 A Solution: Data recovery (by a specialized data recovery company)

13 Data recovery: >3,000 tapes recovered (Germany 1971, Mexico 1980, Mali 1976, Sudan 1973 and many more). Microdata on this tape were recovered!! Example: Bangladesh Bureau of Statistics (1981 census, 276 tapes, recovery in Aug. ‘08)

14 Census Microdata: 1950s few countries archived microdata (a country in green indicates microdata exist for the decade) see: www.hist.umn.edu/~rmccaa/IUMSI/country6.htm Mollweide projection

15 Census Microdata: 1960s The Americas: in the vanguard for preservation of microdata Mollweide projection

16 Census Microdata: 1970s the preservation of microdata was almost universal in the Americas and was becoming widespread in Europe, Africa and Asia Mollweide projection Mali, 1976: census microdata recovered from old Bernoulli boxes

17 Census Microdata: 1980s The preservation of microdata became generalized Mollweide projection Ghana, 1984: census microdata recovered from floppy discs!

18 Census Microdata: 1990s many countries preserved microdata (or are disposed to recover them) Mollweide projection

19 Census Microdata: 2000s many countries have microdata (or are disposed to make them available for research) Mollweide projection

20 Inventory of census microdata archived by region and decade (% of censuses conducted)
Note: cases confirmed by the corresponding official statistical institute. Some datasets remain to be certified. Some countries have not responded to the invitation to inventory their stocks of data. Source: http://www.hist.umn.edu/~rmccaa/IPUMS/country6.htm
Columns: Region/continent (number of countries): 2000s, 1990s, 1980s, 1970s, 1960s
Latin America (21): 100%, 89%, 81%, 72%
North America (27): 91%, 72%, 64%, 24%, 8%
Africa (58): 15%, 22%, 25%, 15%, 2%
Asia (44): ?%, 54%, 31%, 30%, 13%
Europe (46): ?%, 67%, 55%, 41%, 13%
Pacific (pop. > 0.5 m) (7): 100%, 43%, 29%

21 7 Essential Types of Metadata for Each Census (see IPUMS Documentation, “Table 1”)
1. Census Questionnaires (forms): dwellings, households, persons, mortality, migration, etc.
2. Enumerator instructions
3. Data Dictionaries (layouts)
4. Codebooks
   a. Geographic codes
   b. Occupation / Industry / Education codes
5. Data processing protocols
6. Official Statistics
7. Official Reports (Analytical, Technical, Methodological)

22 7 Essential Types of Metadata for Each Census Example: Ghana www.hist.umn.edu/~rmccaa/ipums-africa

23 7 Essential Types of Metadata for Each Census Example: Guinea (Conakry) www.hist.umn.edu/~rmccaa/ipums-africa

24 2. Integration: Microdata and Metadata

25 IPUMS integration of metadata and microdata » Comprehensive documentation, including » Data dictionaries and codebooks » Complete original source documentation in the official language: questionnaires, manuals, etc. » All translated to English (from the German--thanks again to Martin Podehl!!) and converted into metadatabase for each census » Integration ≠ standardization » Composite codes (11, 12, 21, 22…) ≠ serial codes (1, 2, 3, …) (see next slide)

26 IPUMS microdata integration method: composite codes (multiple digits) not only retain significant distinctions but also integrate comparable concepts

Code  Label                                       Chile 1992  Chile 2002  México 1990  México 2000
0     NIU                                         X           X           X            X
      ACTIVE (In Labor Force)
100   EMPLOYED, not specified                     ·           ·           ·            ·
110   At work                                     X           X           X            X
111   At work, and 'student'                      ·           ·           ·            X
112   At work, and 'housework'                    ·           ·           ·            X
113   At work, and 'seeking work'                 ·           ·           ·            X
114   At work, and 'retired'                      ·           ·           ·            X
115   At work, and 'no work'                      ·           ·           ·            X
116   At work, and 'other'                        ·           ·           ·            X
117   At work, family holding, not specified      ·           ·           ·            ·
118   At work, family holding, not agricultural   ·           ·           ·            ·
119   At work, family holding, agricultural       ·           ·           ·            ·
120   Have job, not at work last week             X           X           X            X

27 IPUMS microdata integration method (same table as slide 26). Goal of integration coding scheme: Assist each researcher in making informed decisions on comparability, not to attempt to make the one best decision for all researchers.

28 Metadata: Employment Status
EMPSTAT: Employment status
Description: EMPSTAT indicates whether or not the respondent was part of the labor force -- working or seeking work -- over a specified period of time. Depending on the sample, EMPSTAT can also convey further information. The first digit of EMPSTAT is fully comparable, and classifies the population into three groups: employed, unemployed, and inactive. The combination of employed and unemployed yields the total labor force. The second and third digits of EMPSTAT preserve additional information available for some countries and census years but not for others. Employment status is sometimes referred to in other sources as "activity status."
Comparability -- General: The age of persons to whom the question applies varies across the samples (see Universe). The reference period for the employment status question varies. For most samples, employment status was reported with respect to the day of the census or…
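The digit logic described in the EMPSTAT text can be sketched in a few lines of Python. This is a minimal illustration, not IPUMS code: it assumes three-digit detailed codes such as 110 or 111 (as in the table on slide 26), and the function names and label mapping are hypothetical. The slides confirm only that 0 is NIU and that codes beginning with 1 are employed; using 2 for unemployed and 3 for inactive is an assumption made for the example.

```python
# Hypothetical labels for the first (fully comparable) digit.
# Only 0 = NIU and 1 = employed are confirmed by the slides;
# 2 = unemployed and 3 = inactive are assumptions for illustration.
GENERAL_LABELS = {0: "NIU", 1: "employed", 2: "unemployed", 3: "inactive"}


def general_code(detailed: int) -> int:
    """Collapse a three-digit composite code to its first digit."""
    if not 0 <= detailed <= 999:
        raise ValueError(f"expected a code in 0..999, got {detailed}")
    return detailed // 100


def in_labor_force(detailed: int) -> bool:
    """Employed plus unemployed make up the labor force."""
    return general_code(detailed) in (1, 2)


if __name__ == "__main__":
    for code in (0, 110, 113, 120):
        print(code, GENERAL_LABELS[general_code(code)], in_labor_force(code))
```

A researcher who needs only the comparable classification can work with `general_code`; one who needs the country-specific detail (for example 113, "At work, and 'seeking work'") keeps the full code.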

29 Metadata: Employment Status, example: Mexico
Comparability -- Mexico: The universe and reference period are fully comparable across the Mexico samples. The 1970 Census did not provide detail on the inactive population except for "houseworkers," while the later samples have numerous subcategories. In 1990, the employment status question refers to "Principal Activity" and therefore under-reports secondary economic activity by students, housewives, family-workers, the semi-retired, and others. The 2000 Census sought to overcome deficiencies in reporting work status for people whose primary activity was not work (students, housewives, retirees, etc.), but who in fact were working according to international definitions. A second question introduced for the first time in 2000 sought to capture this secondary economic activity. For strict comparability with earlier Mexican censuses, this recovered activity (codes 1101-1106) should be considered "inactive." …
Integrate: retain all significant detail, harmonize everything
Not standardize: force square pegs in round holes
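The Mexico comparability rule can be expressed as a small recode. The sketch below is an illustration only, not IPUMS code: the function name and the string labels are hypothetical, and the code range 1101-1106 is taken exactly as quoted in the comparability note.

```python
# Codes for "recovered" secondary economic activity in the Mexico 2000
# sample, as quoted in the comparability note (1101 through 1106).
RECOVERED_ACTIVITY_CODES = frozenset(range(1101, 1107))


def strict_status(detailed_code: int, status: str) -> str:
    """Return the status to use when comparing with earlier Mexican censuses.

    `status` is the label the researcher would otherwise assign
    (hypothetical labels such as "employed" or "inactive").
    """
    if detailed_code in RECOVERED_ACTIVITY_CODES:
        # Earlier censuses did not capture this activity, so treat it
        # as inactive to keep the series comparable.
        return "inactive"
    return status


if __name__ == "__main__":
    print(strict_status(1101, "employed"))
    print(strict_status(110, "employed"))
```

The recode is left to the researcher rather than applied to the data, which matches the stated goal of the coding scheme: the detail is kept, and each analysis decides how to treat it.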

30 IPUMS integrated metadata: Instantly, compare text &/or image of enumeration forms and instructions for any combination of countries and censuses (example: educational attainment)

31 In addition… » Microdata: new high precision samples not only for contemporary censuses but also for historical ones (before the 90s) » Systematic metadata for all variables » Universes » Definitions » Comparability » Dynamic System—facilitates comparing the wording of questionnaires and instructions for any combination of countries and censuses

32 3. Dissemination

33 - Caution -
IPUMS microdata are anonymized samples.
– They are for advanced analysis and research.
– Use of statistical software is required.
– Statistical software provides great power.
– “With great power, comes great responsibility.”
IPUMS samples are for analysis.
IPUMS samples are not official statistics.

34 6 steps using https://international.ipums.org/international:
1. Log on with password
2a. Study documentation
2b. Design extract
3. Receive email; log on with password
4. Download extract (SSL encrypted)
5. Unzip data (also SAS, STATA)
6. Analyze

35 Conclusion: IPUMS Strengths and Challenges plus 7 golden rules for promoting microdata revolution

36 The IPUMS team (Feb. 2008) (Not present: computer gurus, some researchers, and others who were too busy for a photo!) Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center

37 IPUMS-International strengths
1. Uniform legal authorization with national statistical authorities
2. Access restricted to academics with need who agree to abide by stringent confidentiality protections
3. Sanctions against individual and institution: denial of access to all microdata for the entire institution
4. Experienced integration teams
5. Proven web-based distribution system
6. High user satisfaction with microdata & metadata
7. Sustainable funding: NSF, NIH

38 5 Challenges
1. Microdata to recover (30 countries), integrate (60 countries)
2. 2010 round of censuses (~100 countries)
3. Tabulator (research tool, not official stats)
4. GIS
5. High security laboratory for sensitive, comprehensive microdata

39 7 golden rules for the global microdata revolution
1. Respect “restricted-access” conditions of use:
   » protect confidentiality
   » “share” data only with registered users
2. Study both source documentation and metadata:
   » Original source: census forms, instructions to enumerators, etc.
   » Integrated metadata: samples, variables, comparability discussions
3. Construct extracts judiciously:
   » extract only needed countries, censuses, variables, sub-pops
   » use sample size &/or “subsamp” features to keep samples small
4. Use weights: either households or individuals (geographical strata = power)
5. Analyze carefully: proper statistical techniques, keeping in mind data quality, sample error
6. Cite properly: IPUMS and National Statistical Agencies
7. Share publications: IPUMS and National Statistical Agencies
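Rule 4 (use weights) can be illustrated with a weighted share. This is a minimal Python sketch under stated assumptions: each sample record carries a hypothetical `weight` field (the number of persons the record represents) and a hypothetical `employed` flag. Neither name is taken from IPUMS documentation.

```python
from typing import Callable, Iterable, Mapping


def weighted_share(
    records: Iterable[Mapping[str, float]],
    predicate: Callable[[Mapping[str, float]], bool],
) -> float:
    """Share of the weighted population for which `predicate` is true."""
    total = 0.0
    hits = 0.0
    for record in records:
        weight = record["weight"]  # hypothetical field name
        total += weight
        if predicate(record):
            hits += weight
    if total == 0:
        raise ValueError("total weight is zero")
    return hits / total


if __name__ == "__main__":
    # Two sampled persons who represent different numbers of people.
    sample = [
        {"weight": 10.0, "employed": 1},
        {"weight": 30.0, "employed": 0},
    ]
    print(weighted_share(sample, lambda r: r["employed"] == 1))
```

In this toy sample the unweighted share of employed persons would be one half, while the weighted share is one quarter, which is why ignoring weights can distort estimates when sampling fractions differ.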

40 Thank you!! rmccaa@umn.edu

