Nordic Demography Symposium, Tjøme 2001
The census in global perspective and the coming census microdata revolution
Robert McCaa & Steven Ruggles
Minnesota Population Center
http://www.ipums.org
IPUMS International, funded by the National Science Foundation

Subtext: Why should Nordic countries participate in a project to preserve the world’s census microdata and help make them usable?
- Longest historical series of census microdata in the world
- Cross-national research on a global scale requires representation of all cultural regions
- Intriguing demographic, historical laboratory
- Large pool of scientific talent with global concerns
- Persisting cultural, scientific ties with Minnesota (would, for example, U. of Texas be as interested?)

Globalization of the census & the coming census microdata revolution
1. Introduction: census & census microdata
2. The population census goes global: coverage, periodicity, and content
3. Liberating census microdata: preservation, anonymization, integration, & dissemination
4. Statistical confidentiality and census samples: a 36-year-long perfect record
5. International norms of statistical confidentiality
6. Harmonizing and disseminating scientifically anonymized census samples: IPUMSi

1. Introduction
- The census: what is it?
- Census microdata: what are they?
- How can they be made usable?
- Why should we care?

16th c. “census” of Mexico (Nahuatl, 1530s): “Here is the home of one...” (from the Museum of Anthropology, Mexico City)
Original ms.: transcribed, translated, digitized
When is a census a census? Goyer (1986):
1. National legal authority
2. Defined enumeration area
3. Complete coverage
4. Simultaneous enumeration
5. Individual enumeration
6. Periodic enumeration
7. Publication of results
8. Dissemination of results

An Aztec extended family, 1530: 5 conjugal units, 4 generations, 3 married brothers
[Diagram labels: married head of house; married male; married male (1 yr. ago), twice; married female, four times; simply an old widow; female, 20, not yet married; male, 10 years old, not married]

450 years later: an example of a patrilateral household from rural Morelos, 1990: 5 conjugal unions, 3 generations
[Diagram labels: married head, 50; married, 48; son, 15; daughter, 10; son, 22, free union; daughter, 22; daughter, 14, free union; free union, 21; free union, 25; free union, 25; free union, 29; daughter, 5; son, 2; daughter, months old; daughter, 2; free union, 19; free union, 16; (not kin)]

From examples to percentages: have there been changes in 4 1/2 centuries?
[Chart: distribution by relationship to head: head, spouse, child, kin, non-kin]

Census microdata of the late 20th century: What are they? Who bears preservation responsibility? Who will make them usable?
Sample records (fields marked on the slide: person number, age, sex):
12100102600700720000011210000104
22200202600700720000011210000104
32300100600700720000012123000000
42300200400700000000000000000000
52300200200700000000000000000000
62300200000700000000000000000000
Census microdata:
- Censuses are costly
- Public goods should be democratized
- Where microdata are available, they are used

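As a rough illustration of what "making them usable" involves, the fixed-width sample records above can be split into named fields. This is only a sketch: the slide marks person number, age, and sex but gives no column layout, so the positions below are assumptions inferred from the six sample records, and the meaning of the codes is left uninterpreted. A real codebook would supply both.

```python
# Hypothetical layout: (start, end) slices guessed from the sample records,
# not taken from any codebook.
LAYOUT = {
    "person_number": (0, 1),
    "sex": (5, 6),
    "age": (6, 9),
}

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into integer fields."""
    return {name: int(line[start:end]) for name, (start, end) in layout.items()}

records = [
    "12100102600700720000011210000104",
    "22200202600700720000011210000104",
    "32300100600700720000012123000000",
    "42300200400700000000000000000000",
    "52300200200700000000000000000000",
    "62300200000700000000000000000000",
]

people = [parse_record(r) for r in records]
for person in people:
    print(person)
```

Without documentation of this kind (the metadata), the digit strings cannot be read reliably, which is why the layout here has to be labeled a guess.
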
2. The population census goes global
- Coverage becomes universal (thanks to A.N. Kiær, Statistics Norway, who promoted globalization of the census at the beginning of the 20th c.)
- Content becomes uniform
- Decennial censuses become the norm

Population censuses became universal in the 20th century. Will census microdata ... in the 21st?
[Chart: 153 countries with 1 million+ population in 2000; figures for the 2000 round are provisional]

Content ... increasingly uniform; the census is the principal source of population information.
[Tables: social variables; education and migration variables; demographic and economic variables]

Decennial censuses are the rule (1945-2004)
Of 153 countries with 1 million+ population, totaling 6 billion people in 2000:
- At least one census per decade: 66 countries, 50% of world’s population
- Missed a single decennial enumeration: 43 countries, 38% of world’s population
- Missed 2 or 3 enumerations: 32 countries, 10% of world’s population
- Fewer than 3 enumerations: 12 countries, 2% of world’s population

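The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the population figures are approximations derived from the rounded 6 billion total.

```python
# (countries, percent of world population) per category, as quoted on the slide
groups = {
    "at least one census per decade": (66, 50),
    "missed a single decennial enumeration": (43, 38),
    "missed 2 or 3 enumerations": (32, 10),
    "fewer than 3 enumerations": (12, 2),
}
WORLD_POP_BILLIONS = 6  # rounded total given on the slide

total_countries = sum(countries for countries, _ in groups.values())
total_share = sum(share for _, share in groups.values())

# Countries that missed at most one decennial enumeration
regular = ["at least one census per decade", "missed a single decennial enumeration"]
regular_countries = sum(groups[g][0] for g in regular)
regular_share = sum(groups[g][1] for g in regular)

for label, (countries, share) in groups.items():
    people = WORLD_POP_BILLIONS * share / 100
    print(f"{label}: {countries} countries, about {people:.2f} billion people")
print(total_countries, total_share, regular_countries, regular_share)
```

The categories sum to the stated 153 countries and 100% of population, and countries that missed at most one enumeration account for 109 countries and 88% of the world's people.
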
On a millennial scale, censuses and census microdata survive for only a short but significant period

3. Liberating census microdata: preservation, anonymization, integration, & dissemination

“…official statistics that meet the test of practical utility are to be compiled and made available on an impartial basis by official statistical agencies to honor citizens’ entitlement to public information.”
-- UN Statistical Commission, 1994

IPUMSi (Integrated Public Use Microdata Series - International) helps five ways:
1. Inventory the world’s census microdata
2. Preserve endangered microdata and documentation
3. Anonymize census microdata to preserve statistical confidentiality, using the highest standards (Statistics Netherlands)
4. Integrate datasets of selected countries using UN, Eurostat and other standards
5. Disseminate the database free, with complete copies to all partners

IPUMSi inventories
Microdata for any population or administrative division: nation, province, district, city, ethnic group, etc.
Example: Latin America
- 20 countries
- 67 censuses inventoried
- 1% - 100% sample densities
- 100,000 to 150 million cases
Censuses by period: 19th century: 2; 1960s: 14; 1970s: 17; 1980s: 16; 1990s: 17
Found: complete census data for Colombia 1973 and 16 other countries

IPUMSi preserves
UN Demographic Center for Latin America (CELADE, Santiago, Chile): ~3000 microdata tapes and metadata (documentation) to be preserved

Preserve against accident, deterioration and technological obsolescence
Microdata:
- transfer to stable media
- use standard data storage protocols
- entrust copies with at least two depositories
Metadata: collect, catalogue, and reproduce
- Enumeration forms (preserve all versions used)
- Enumerator and data processing instructions
- Codebooks (photocopies and scanned images)
- Technical studies, evaluations, reports
UN Stat. Div.: entire archive deposited, to be scanned

4. Statistical confidentiality and census samples: a 36-year-long perfect record

How anonymized census samples became a standard statistical product:
US Census Bureau:
- 1960 census: 0.1% “public use microdata series”
- 1970 census: six 1% samples harmonized with 1960
- 1984: 1940, 1950 1% samples
- 1980, 1990: samples of varying densities, contents
CELADE: Latin America
- 1960s: 16 countries, densities 1-5%
- 1970s: 19 countries, 1-10%

How anonymized census samples became a standard statistical product (continued):
Canada:
- 1971, 1976, 1981, 1986, 1991, 1996: varying designs, densities
- 1996: Data Liberation Initiative led to an explosion of usage in research and teaching
UK:
- 1991: 2% individuals, 0.5% households; hundreds of publications, thousands of users
- 2001: double the densities, because confidentiality assessments were too conservative

Risk assessment of statistical confidentiality:
Take into account error, coding variability and changes in personal characteristics over time
Dale and Elliott, JRSS-A (forthcoming): “For a user of an outside database, attempting this sort of match with no opportunity for verification would prove fruitless. In the first place, the small degree of expected overlap would be a considerable deterrent to an intruder. However, if a match between the two files was attempted the large number of apparent matches would be highly confusing as an intruder would have no way of checking correct identification.”

Statistical confidentiality in the USA: a brief history
Before 1954:
- 1850: “exclusively for the use of the government, and not to be used...to the gratification of curiosity...”
- 1920s: deny access to data on individuals
- 1942: refused to supply War Dept. with addresses of Japanese-Americans
After 1954:
- census microdata do not reveal identities of individuals
- basic geographical identifiers, low sample densities, masking, swapping, top-coding, re-coding
In practice, not a single breach or allegation of a breach!

Heightened concerns about confidentiality in the USA
- Assault on privacy by businesses
- Distrust of “government”
Never a question of use of census microdata. Yet must avoid any possible perception of misuse to retain the confidence and cooperation of citizens.
Pro-active strategy:
- Publicize confidentiality safeguards
- Offer a variety of microdata products: higher risks, higher security
- Data enclaves: expensive, low usage, exceedingly detailed microdata

5. International norms of statistical confidentiality

‘statistical confidentiality’ shall mean the protection of data related to single statistical units which are obtained directly for statistical purposes or indirectly from administrative or other sources against any breach of the right to confidentiality. It implies the prevention of non-statistical utilization of the data obtained and unlawful disclosure.
-- COUNCIL REGULATION (EC) No 322/97 of 17 February 1997

Statistical confidentiality standards in Eurostat countries (* = in IPUMSi consortium)
Norway: Statistics Norway is prohibited from publishing or disclosing data from which information about individual persons or firms can be derived. Researchers may be given access to such information under strict rules and conditions. Guidelines provided by the Norwegian Data Inspectorate form the framework for internal management of data security.
Other countries with strict provisions: *Austria, Canada, Denmark, Finland, *France, Germany, Ireland, Netherlands, Sweden

Anonymized census microdata sample availability for European countries (* = in IPUMSi consortium, * = negotiating)
15 countries available via PAU, 1990 round (3 in IPUMSi): Belgium, Czech Republic, Estonia, Finland, *Hungary, *Italy, Latvia, Lithuania, Norway, Poland, *Spain, Sweden, Switzerland, Turkey, *UK
11 countries not available via PAU (2 in IPUMSi): *Austria, Croatia, Denmark, *France, Germany, Iceland, Ireland, Netherlands, Portugal, Slovak Republic, Slovenia

EUROSTAT statistical anonymity standards (Thorogood, 1999) -- all accepted by IPUMSi
1. small sample size
2. limited geographical detail
3. top and bottom coding of unique categories
4. signed non-disclosure agreement
5. prohibit redistribution of datasets to third parties
6. prohibit attempts to identify individuals or making any claim to that effect
7. require users to provide copies of publications

EUROSTAT statistical anonymity standards (Thorogood, 1999) -- all accepted by IPUMSi, and more
8. Age (constructed, where necessary)
9. Never identify date of birth
10. Never identify place of birth
11. Migration: timing and place not identified in detail
12. Place of residence identified by major civil division (pop>60k, 120k, 250k, 1 million--national rule)
13. Sensitivity analysis of variables by national experts
14. Confidentiality assessment by national experts

International Monetary Fund’s General Data Dissemination System
- 52 countries with uniform standards
- All embrace strict standards of statistical confidentiality
- Prohibit disclosure of information which may identify individuals or entities
- 37 countries distribute anonymized census microdata samples

6. Harmonizing and disseminating scientifically anonymized census samples: the case of IPUMSi

IPUMSi: Making the data usable ... and used.
IPUMSi, 1999-2004: ~20 countries, 1850-2000

IPUMSi pays
National experts in each country are contracted to:
- Assemble microdata and documentation
- Develop samples to minimize confidentiality risks and maximize robustness
- Design a national integration plan: census-by-census, concept-by-concept, code-by-code
- Write integrated documentation

IPUMSi integrates
Census documentation compiled for Colombian microdata
Standard: UN/Eurostat Principles & Recs...
[Photos from Colombia integration project, February-March 2000: 4 experts from DANE (census office) + 7 academics (3 universities)]

IPUMSi integration principles
1. Respect absolute anonymity
2. Preserve all original data, except adjustments to ensure privacy (top codes, blurring, masking, re-ordering, etc.)
3. Harmonize codes across countries
- occupation: ISCO, HISCO (detailed, general)
- education: ISCED (detailed, general)
- family: IPUMS, etc. (detailed, general)
4. Enhance with constructed variables

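Principles 2 and 3 can be pictured as a crosswalk that adds a harmonized code beside the original national code rather than overwriting it. The sketch below is illustrative only: the country labels, national codes, harmonized codes, and field names are all invented for the example and are not actual ISCED, ISCO, or IPUMSi values.

```python
# Hypothetical crosswalk: (country, national education code) -> harmonized code.
# Every value here is invented for illustration.
CROSSWALK = {
    ("country_a", 1): 10,   # e.g. "no schooling" in country A's scheme
    ("country_a", 2): 20,   # e.g. "primary"
    ("country_b", 7): 20,   # country B uses a different code for "primary"
    ("country_b", 9): 30,   # e.g. "secondary"
}
UNKNOWN = 99  # assumed code for national values with no mapping

def harmonize(record, crosswalk=CROSSWALK):
    """Return a copy of the record with a harmonized code added.

    The original national code is kept unchanged (principle 2); the
    harmonized code is stored in a new field (principle 3).
    """
    out = dict(record)
    key = (record["country"], record["educ_national"])
    out["educ_harmonized"] = crosswalk.get(key, UNKNOWN)
    return out

sample = [
    {"country": "country_a", "educ_national": 2},
    {"country": "country_b", "educ_national": 7},
    {"country": "country_b", "educ_national": 4},  # unmapped code
]
harmonized = [harmonize(r) for r in sample]
for row in harmonized:
    print(row)
```

Keeping both fields lets a user compare countries on the common code while still being able to return to the national detail.
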
IPUMSi integrates: first 18 months, 10 projects started
- USA: 1850-1880, 1900-2000
- France: 1962, 1968, 1975, 1982, 1990
- Norway: 1801, 1865, 1875, 1900; negotiating: 1960, 1970, 1980, 1990, 2001
- Canada: 1871, 1881, 1901; negotiating: 1961-2001
- United Kingdom: (1851, 1881), 1991; negotiating: 1961, 1971, 1981, 2001
- Argentina: 1869, 1895
- Colombia: 1964, 1973, 1985, 1993, 2003
- Vietnam: 1989, 1999
- Hungary: 1970, 1980, 1990, 2000

IPUMSi integrates
5 projects planned:
- Mexico: 1960, 1970, 1980, 1990, 2000
- Spain: 1981, 1991, 2001
- Brazil: 1960, 1970, 1980, 1991, 2001
- China: 1982, 1990, 2000
- Kenya: 1989, 1999
3 negotiations underway:
- Ghana: 1984, 2000
- Italy: 1981, 1991, 2001
- Austria: 1971, 1981, 1991, 2001

IPUMSi ? ? 7 future possibilities
Country (?) and census microdata:
a. 1860, 1870, 1880, 1950, 1960, 1970, 1980, 1990, 2000
b. 1961, 1971, 1981, 1991, 2001
c. 1961, 1971, 1976, 1981, 1986, 1991, 1996
d. 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995
e. 1960, 1966, 1970, 1975, 1980, 1985, 1990, 1995
f. 1971, 1981, 1991, 2001
g. 1970, 1980, 1990, 2000
and .... ???

IPUMSi anonymizes
Using the highest standards currently available:
- technical (Statistics Netherlands)
- administrative (license agreement)
Imagine a new statistical product: a scientifically anonymized census microdata sample made up of unidentifiable individuals...

IPUMSi preserves statistical confidentiality (in addition to NSO safeguards):
1. Construct small samples
2. Suppress geographical detail (minor civil divisions and others with fewer than 100,000 population), date of birth, 3-4 digit occupational codes, etc.
3. Blur codes for sensitive variables where identity might be compromised (income)
4. Top-code income, education, etc.
5. Swap a small fraction of records
6. Assess confidentiality risks for unique records for all defined geographical areas (“ARGUS”, Statistics Netherlands)

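Three of the steps above (suppressing small areas, top-coding, and flagging unique records) can be sketched on toy data. This is a simplified illustration, not the ARGUS software or the IPUMSi procedure: the age cap, field names, area populations, and records are all assumptions made up for the example. Only the 100,000 population threshold comes from the slide.

```python
from collections import Counter

AGE_CAP = 90            # assumed top code, for illustration only
MIN_AREA_POP = 100_000  # suppression threshold named on the slide

def anonymize(record, area_pop):
    """Return a copy with age top-coded and small areas suppressed."""
    out = dict(record)
    out["age"] = min(record["age"], AGE_CAP)
    if area_pop.get(record["area"], 0) < MIN_AREA_POP:
        out["area"] = None  # geographic detail withheld
    return out

def unique_combinations(records, key_vars):
    """List combinations of key variables that occur exactly once."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return [combo for combo, n in counts.items() if n == 1]

# Toy inputs, invented for the example
area_pop = {"A": 250_000, "B": 40_000}
sample = [
    {"area": "A", "age": 34, "sex": 1},
    {"area": "A", "age": 34, "sex": 1},
    {"area": "A", "age": 97, "sex": 2},
    {"area": "B", "age": 51, "sex": 2},
]

released = [anonymize(r, area_pop) for r in sample]
risky = unique_combinations(released, ["area", "age", "sex"])
print(released)
print(risky)
```

Records whose key combination is still unique after these adjustments are the ones a risk assessment would examine further, for example by coarsening categories or swapping.
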
Repositories of anonymized census microdata samples for scientific research
- ICPSR, University of Michigan
- ACAP, University of Pennsylvania
- CELADE, Centro Latino Americano de Demografía, Santiago, Chile
- ECE/PAU, Population Affairs Unit, Geneva, Switzerland
- EWC, East-West Center, U. of Hawaii
- IPUMSi, University of Minnesota
Will others (a Nordic institution?) join the effort?

IPUMSi disseminates
International web-based access system
- End-user license agreement protects privacy and confidentiality, assures proper use
- User selects countries, cases, variables, and samples, which makes cross-national research possible
- Open architecture software and mirror sites available to all partners

Why should Nordic countries participate now?
- Legal and scientific foundations are in place: EUROSTAT, France, Austria, UK, etc.
- The project is 18 months into its 5-year term; if resources are required, budget planning must begin soon.
- Historical census microdata projects are well advanced: 1801, 1865 (100% club), 1875, 1900.
- Time to turn to contemporary census microdata

Additional information at: http://www.ipums.org
Thank you

Work plan, part II: make census microdata usable
3. Integrate: March 2000-
National partners:
- integrate phase I countries using UN/Eurostat Principles & Recommendations
- help to design prototype
- analyze all concepts, variables and codes of census schedules for 30 target countries
- help to implement for phase I and II countries
4. Disseminate: -October 2004
- Design international data access engine
- Implement with phase I and II countries