Nordic Demography Symposium, Tjøme 2001

1 The census in global perspective and the coming census microdata revolution
Robert McCaa & Steven Ruggles
Minnesota Population Center
IPUMS International, funded by National Science Foundation
Nordic Demography Symposium, Tjøme 2001

2 Subtext: Why should Nordic countries participate in a project to preserve the world’s census microdata and help make them usable?
- Longest historical series of census microdata in the world
- Cross-national research on a global scale requires representation of all cultural regions
- Intriguing demographic, historical laboratory
- Large pool of scientific talent with global concerns
- Persisting cultural, scientific ties with Minnesota (would, for example, U. of Texas be as interested?)

3 Globalization of the census & the coming census microdata revolution
1. Introduction: census & census microdata
2. The population census goes global: coverage, periodicity, and content
3. Liberating census microdata: preservation, anonymization, integration, & dissemination
4. Statistical confidentiality and census samples: a 36-year perfect record
5. International norms of statistical confidentiality
6. Harmonizing and disseminating scientifically anonymized census samples: IPUMSi

4 1. Introduction
- The census: what is it?
- Census microdata: what are they?
- How can they be made usable?
- Why should we care?

5 16th c. “census” of Mexico (Nahuatl, 1530s)
“Here is the home of one...” (from Museum of Anthropology, Mexico City)
Original ms., transcribed, translated, digitized

6 When is a census a census?
16th c. “census” of Mexico (Nahuatl, 1530s). “Here is the home of one...” (from Museum of Anthropology, Mexico City). Original ms., transcribed, translated, digitized.
Goyer (1986):
1. National legal authority
2. Defined enumeration area
3. Complete coverage
4. Simultaneous enumeration
5. Individual enumeration
6. Periodic enumeration
7. Publication of results
8. Dissemination of results

7 An Aztec extended family, 1530
5 conjugal units, 4 generations, 3 married brothers
[Household diagram labels: Simply an old widow; Female, 20, not yet married; Married male; Married female; Married female; Married male (1 yr. ago); Married head of house; Married female; Married male (1 yr. ago); Married female; Male, 10 years old, not married]

8 450 years later: An example of a patrilateral household from rural Morelos, 1990
5 conjugal unions, 3 generations
[Household diagram labels: Married head, 50; Married, 48; Son, 15; Daughter, 10; Son, 22, free union; Daughter, 22; Daughter, 14, free union; Free union, 21; Free union, 25; Free union, 25 years old; Free union, 29; Daughter, 5; Son, 2; Daughter, months old; Daughter, 2; Free union, 19; Free union, 16; (not kin)]

9 Examples to percentages: Have there been changes in 4 1/2 centuries?
[Two charts with the categories: head, spouse, child, kin, non-kin]

10 Census microdata of the late 20th century
- What are they?
- Who bears preservation responsibility?
- Who will make them usable?
[Figure: example microdata records with columns Person number, Age, Sex]
Census microdata:
- Censuses are costly
- Public goods should be democratized
- Where microdata are available, they are used

11 Globalization of the census & the coming census microdata revolution
1. Introduction: census & census microdata
2. The population census goes global: coverage, periodicity, and content
3. Liberating census microdata: preservation, anonymization, integration, & dissemination
4. Statistical confidentiality and census samples: a 36-year perfect record
5. International norms of statistical confidentiality
6. Harmonizing and disseminating scientifically anonymized census samples: the case of IPUMSi

12 2. The population census goes global
- Coverage becomes universal (thanks to A.N. Kiær, Statistics Norway, who promoted globalization of the census at the beginning of the 20th c.)
- Content becomes uniform
- Decennial censuses become the norm

13 Population censuses became universal in the 20th century.
Will census microdata ... in the 21st?
153 countries with 1 million + pop. in 2000
2000 round figures are provisional

14 Content ... increasingly uniform, principal source of population information: social variables

15 Content ... increasingly uniform: education and migration variables

16 Content ... increasingly uniform: demographic and economic variables

17 Decennial censuses are the rule ( )
Of 153 countries with 1 million + pop., totaling 6 billion people in 2000:
- At least one census per decade: 66 countries, 50% of world’s population
- Missed a single decennial enumeration: 43 countries, 38% of world’s population
- Missed 2 or 3 enumerations: 32 countries, 10% of pop.
- Fewer than 3 enumerations: 12 countries, % of pop.
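As a rough check on the figures above, the four country counts can be summed and converted into shares of countries, which can then be set against the population shares given on the slide. This is only an illustrative sketch: the dictionary keys are shorthand labels introduced here, and the population share of the last group is omitted because it is missing from the transcript.

```python
# Country counts by census regularity, copied from the slide.
# The keys are shorthand labels introduced for this sketch.
country_counts = {
    "at_least_one_per_decade": 66,
    "missed_one": 43,
    "missed_two_or_three": 32,
    "fewer_than_three": 12,
}

total_countries = sum(country_counts.values())

# Share of countries (not of population) in each group, in percent.
country_shares = {
    group: round(100 * n / total_countries, 1)
    for group, n in country_counts.items()
}

print(total_countries)
print(country_shares)
```

The counts add up to the 153 countries named on the slide. Countries with at least one census per decade make up about 43% of countries, while the slide credits them with 50% of the world's population.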

18 On a millennial scale, censuses and census microdata survive for only a short but significant period

19 Globalization of the census & the coming census microdata revolution
1. Introduction: census & census microdata
2. The population census goes global: coverage, periodicity, and content
3. Liberating census microdata: preservation, anonymization, integration, & dissemination
4. Statistical confidentiality and census samples: a 36-year perfect record
5. International norms of statistical confidentiality
6. Harmonizing and disseminating scientifically anonymized census samples: the case of IPUMSi

20 “…official statistics that meet the test of practical utility are to be compiled and made available on an impartial basis by official statistical agencies to honor citizens’ entitlement to public information.”
-- UN Statistical Commission, 1994

21 IPUMSi (Integrated Public Use Microdata Series - International) helps five ways:
1. Inventory the world’s census microdata
2. Preserve endangered microdata and documentation
3. Anonymize census microdata to preserve statistical confidentiality, using highest standards (Statistics Netherlands)
4. Integrate datasets of selected countries using UN, Eurostat and other standards
5. Disseminate database free with complete copies to all partners

22 IPUMSi INVENTORIES
Microdata...for any population or administrative division: nation, province, district, city, ethnic group, etc.
Example: Latin America, countries
- 67 censuses inventoried
- 1% - 100% sample densities
- 100,000 to 150 million cases
19th century: censuses 1960s: s: s: s: 17
Found: complete census data for Colombia 1973 and 16 other countries

23 IPUMSi PRESERVES
UN Demographic Center for Latin America (CELADE, Santiago, Chile): ~3000 microdata tapes to be preserved, and metadata (documentation)

24 Preserve against accident, deterioration and technological obsolescence
Microdata:
- transfer to stable media
- use standard data storage protocols
- entrust copies with at least two depositories
Metadata: collect, catalogue, and reproduce
- Enumeration forms (preserve all versions used)
- Enumerator and data processing instructions
- Codebooks (photocopies and scanned images)
- Technical studies, evaluations, reports
UN Stat. Div.: entire archive deposited, to be scanned
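One way to guard copies held by two or more depositories against silent deterioration is to compare a cryptographic checksum of each copy with that of the master file. The slide does not mention checksums, so the sketch below is an assumption about how such a check could be done; the function names and the depository labels in the usage note are hypothetical.

```python
import hashlib
from typing import Dict


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(master: bytes, copies: Dict[str, bytes]) -> Dict[str, bool]:
    """Report, per depository, whether its copy is byte-identical to the master."""
    expected = sha256_of(master)
    return {name: sha256_of(blob) == expected for name, blob in copies.items()}
```

For example, `verify_copies(b"census file", {"depository_a": b"census file", "depository_b": b"census fil3"})` flags the second copy as no longer matching the master.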

25 Globalization of the census & the coming census microdata revolution
1. Introduction: census & census microdata
2. The population census goes global: coverage, periodicity, and content
3. Liberating census microdata: preservation, anonymization, integration, & dissemination
4. Statistical confidentiality and census samples: a 36-year perfect record
5. International norms of statistical confidentiality
6. Harmonizing and disseminating scientifically anonymized census samples: the case of IPUMSi

26 How anonymized census samples became a standard statistical product:
US Census Bureau: census 0.1% “public use microdata series” census: six 1% samples harmonized with 1960 - 1984: 1940, % samples - 1980, 1990 samples varying densities, contents
CELADE: Latin America
- 1960s: 16 countries, densities 1-5%
- 1970s: 19 countries, 1-10%

27 How anonymized census samples became a standard statistical product:
Canada:
- 1971, 1976, 1981, 1986, 1991, 1996: varying designs, densities
- 1996: Data Liberation Initiative led to an explosion of usage in research and teaching
UK:
- 1991: 2% individuals, 0.5% households; hundreds of publications, thousands of users
- 2001: double the densities because confidentiality assessments were too conservative

28 Risk assessment of statistical confidentiality:
Take into account error, coding variability and changes in personal characteristics over time
Dale and Elliott, JRSS-A (forthcoming): “For a user of an outside database, attempting this sort of match with no opportunity for verification would prove fruitless. In the first place, the small degree of expected overlap would be a considerable deterrent to an intruder. However, if a match between the two files was attempted the large number of apparent matches would be highly confusing as an intruder would have no way of checking correct identification.”

29 Statistical confidentiality in the USA: a brief history
Before 1954:
- 1850: “exclusively for the use of the government, and not to be used...to the gratification of curiosity...”
- 1920s: deny access to data on individuals
- 1942: refused to supply War Dept. w/ addresses of Japanese-Americans
After 1954:
- census microdata do not reveal identities of individuals
- basic geographical identifiers, low sample densities, masking, swapping, top-coding, re-coding
In practice, not a single breach or allegation of a breach!

30 Heightened concerns about confidentiality in USA
- Assault on privacy by businesses
- Distrust of “government”
Never a question of use of census microdata. Yet must avoid any possible perception of mis-use to retain confidence and cooperation of citizens.
Pro-active strategy:
- Publicize confidentiality safe-guards
- Offer a variety of microdata products: higher risks, higher security
- Data enclaves: expensive, low usage, exceedingly detailed microdata

31 Globalization of the census & the coming census microdata revolution
1. Introduction: census & census microdata
2. The population census goes global: coverage, periodicity, and content
3. Liberating census microdata: preservation, anonymization, integration, & dissemination
4. Statistical confidentiality and census samples: a 36-year perfect record
5. International norms of statistical confidentiality
6. Harmonizing and disseminating scientifically anonymized census samples: the case of IPUMSi

32 ‘statistical confidentiality’ shall mean the protection of data related to single statistical units which are obtained directly for statistical purposes or indirectly from administrative or other sources against any breach of the right to confidentiality. It implies the prevention of non-statistical utilization of the data obtained and unlawful disclosure.
-- Council Regulation (EC) No 322/97 of 17 February 1997

33 Statistical confidentiality standards in Eurostat countries (* = in IPUMSi consortium)
Norway: Statistics Norway is prohibited from publishing or disclosing data from which information about individual persons or firms can be derived. Researchers may be given access to such information under strict rules and conditions. Guidelines provided by the Norwegian Data Inspectorate form the framework for internal management of data security.
Other countries with strict provisions: *Austria, Canada, Denmark, Finland, *France, Germany, Ireland, Netherlands, Sweden

34 Anonymized census microdata sample availability for European countries (* = in IPUMSi consortium, * = negotiating)
15 countries available via PAU, 1990 round (3 in IPUMSi): Belgium, Czech Republic, Estonia, Finland, *Hungary, *Italy, Latvia, Lithuania, Norway, Poland, *Spain, Sweden, Switzerland, Turkey, *UK
11 countries not available via PAU (2 in IPUMSi): *Austria, Croatia, Denmark, *France, Germany, Iceland, Ireland, Netherlands, Portugal, Slovak Republic, Slovenia

35 EUROSTAT statistical anonymity standards (Thorogood, 1999), all accepted by IPUMSi
1. small sample size
2. limited geographical detail
3. top and bottom coding of unique categories
4. signed non-disclosure agreement
5. prohibit redistribution of datasets to third parties
6. prohibit attempts to identify individuals or making any claim to that effect
7. require users to provide copies of publications

36 EUROSTAT statistical anonymity standards (Thorogood, 1999), all accepted by IPUMSi, and more
8. Age (constructed, where necessary)
9. Never identify date of birth
10. Never identify place of birth
11. Migration: timing and place not identified in detail
12. Place of residence identified by major civil division (pop. > 60k, 120k, 250k, or 1 million, by national rule)
13. Sensitivity analysis of variables by national experts
14. Confidentiality assessment by national experts

37 International Monetary Fund’s General Data Dissemination System
- 52 countries with uniform standards
- All embrace strict standards of statistical confidentiality
- Prohibit disclosure of information which may identify individuals or entities
- 37 countries distribute anonymized census microdata samples

38 Globalization of the census & the coming census microdata revolution
1. Introduction: census & census microdata
2. The population census goes global: coverage, periodicity, and content
3. Liberating census microdata: preservation, anonymization, integration, & dissemination
4. Statistical confidentiality and census samples: a 36-year perfect record
5. International norms of statistical confidentiality
6. Harmonizing and disseminating scientifically anonymized census samples: the case of IPUMSi

39 IPUMSi: Making the data usable ... and used.
IPUMSi, ~20 countries

40 IPUMSi PAYS
National experts in each country are contracted to:
- Assemble microdata and documentation
- Develop samples to minimize confidentiality risks and maximize robustness
- Design national integration plan: census-by-census, concept-by-concept, code-by-code
- Write integrated documentation

41 IPUMSi INTEGRATES
Census documentation compiled for Colombian microdata
Standard: UN/Eurostat Principles & Recs...
Photos from Colombia integration project, February-March 2000: 4 experts from DANE (census office) + 7 academics (3 universities)

42 IPUMSi integration principles
1. Respect absolute anonymity
2. Preserve all original data, except adjustments to ensure privacy (top codes, blurring, masking, re-ordering, etc.)
3. Harmonize codes for countries
- occupation: ISCO, HISCO (detailed, general)
- education: ISCED (detailed, general)
- family: IPUMS, etc. (detailed, general)
4. Enhance with constructed variables
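Principles 2 and 3 can be pictured as a per-country crosswalk that adds a harmonized code next to the original one instead of overwriting it. The sketch below is hypothetical throughout: the country labels, variable names, and code values are invented for illustration and are not actual ISCED, ISCO, or IPUMSi codes.

```python
from typing import Dict, List

# Hypothetical crosswalks from national education codes to a made-up
# harmonized scheme. None of these values are real ISCED codes.
CROSSWALK: Dict[str, Dict[str, int]] = {
    "country_a": {"1": 10, "2": 20, "3": 20},
    "country_b": {"A": 10, "B": 30},
}
UNKNOWN = 99  # assumed catch-all for codes with no crosswalk entry


def harmonize(records: List[dict], country: str,
              source_var: str = "educ_orig",
              target_var: str = "educ_harmonized") -> List[dict]:
    """Return copies of the records with a harmonized code added.

    The original variable is kept untouched, in the spirit of
    'preserve all original data'.
    """
    table = CROSSWALK[country]
    harmonized = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input record is not modified
        new_rec[target_var] = table.get(rec[source_var], UNKNOWN)
        harmonized.append(new_rec)
    return harmonized
```

Keeping both variables lets a user fall back on the national detail whenever the harmonized category is too coarse.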

43 IPUMSi INTEGRATES: 10 projects started
First 18 months
- USA ,
- France , 1968, 1975, 1982, 1990
- Norway , 1865, 1875; negotiating: 1960, 1970, 1980, 1990, 2001
- Canada , 1881, 1901; negotiating: ;
- United Kingdom (1851, 1881), 1991; negotiating: 1961, 1971, 1981, 2001
- Argentina 1869, 1895
- Colombia 1964, 1973, 1985, 1993, 2003
- Vietnam 1989, 1999
- Hungary 1970, 1980, 1990, 2000

44 IPUMSi INTEGRATES
5 projects planned
- Mexico 1960, 1970, 1980, 1990, 2000
- Spain 1981, 1991, 2001
- Brazil 1960, 1970, 1980, 1991, 2001
- China 1982, 1990, 2000
- Kenya 1989, 1999
3 negotiations underway
- Ghana , 2000
- Italy , 1991, 2001
- Austria , 1981, 1991, 2001

45 IPUMSi ? ? 7 future possibilities
Country: Census microdata
a: , 1870, 1880, 1950, 1960, 1970, , 1990, 2000
b: , 1971, 1981, 1991, 2001
c: , 1971, 1976, 1981, 1986, , 1996
d: , 1965, 1970, 1975, 1980, , 1990, 1995
e: , 1966, 1970, 1975, 1980, , 1990, 1995
f: , 1981, 1991, 2001
g: , 1980, 1990, 2000
and .... ???

46 IPUMSi ANONYMIZES
Using the highest standards currently available:
- technical (Statistics Netherlands)
- administrative (license agreement)
Imagine a new statistical product: a scientifically anonymized census microdata sample made up of unidentifiable individuals...

47 IPUMSi preserves statistical confidentiality (in addition to NSO safe-guards):
1. Construct small samples
2. Suppress geographical detail (minor civil divisions and others with fewer than 100,000 population), date of birth, 3-4 digit occupational codes, etc.
3. Blur codes for sensitive variables where identity might be compromised (income)
4. Top-code income, education, etc.
5. Swap a small fraction of records
6. Assess confidentiality risks for unique records for all defined geographical areas (“ARGUS”, Statistics Netherlands)
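Steps 2, 4, and 6 above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ARGUS software or the IPUMSi procedure: the record layout, the field names, and the income ceiling are hypothetical, and only the 100,000 population threshold comes from the slide.

```python
from collections import Counter
from typing import Dict, List, Sequence

MIN_AREA_POPULATION = 100_000  # threshold named on the slide
INCOME_TOP_CODE = 200_000      # hypothetical ceiling, for illustration only


def anonymize(records: List[dict], area_population: Dict[str, int]) -> List[dict]:
    """Drop date of birth, suppress small areas, and top-code income."""
    cleaned = []
    for rec in records:
        new_rec = dict(rec)
        new_rec.pop("birth_date", None)  # step 2: no date of birth
        if area_population.get(new_rec["area"], 0) < MIN_AREA_POPULATION:
            new_rec["area"] = "suppressed"  # step 2: hide small areas
        new_rec["income"] = min(new_rec["income"], INCOME_TOP_CODE)  # step 4
        cleaned.append(new_rec)
    return cleaned


def sample_uniques(records: List[dict], keys: Sequence[str]) -> List[dict]:
    """Step 6, crudely: list records whose key combination occurs only once."""
    combos = Counter(tuple(rec[k] for k in keys) for rec in records)
    return [rec for rec in records if combos[tuple(rec[k] for k in keys)] == 1]
```

Records returned by `sample_uniques` are the ones a reviewer would inspect first, since a combination of characteristics that appears only once in the sample is the easiest to link to an outside source.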

48 Repositories of anonymized census microdata samples for scientific research
- ICPSR, University of Michigan
- ACAP, University of Pennsylvania
- CELADE, Centro Latino Americano de Demografía, Santiago, Chile
- ECE/PAU, Population Affairs Unit, Geneva, Switzerland
- EWC, East-West Center, U. of Hawaii
- IPUMSi, University of Minnesota
Will others (a Nordic institution?) join the effort?

49 IPUMSi DISSEMINATES
International web-based access system
- End-user license agreement protects privacy and confidentiality, assures proper use
- User selects countries, cases, variables, and samples, which makes cross-national research possible
- Open architecture software and mirror sites available to all partners
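The selection step (countries, cases, variables) amounts to filtering a pooled file by row and by column. The sketch below is a hypothetical illustration of that idea only: `make_extract`, the record layout, and the field names are invented here and do not describe the actual access system.

```python
from typing import Callable, Iterable, List


def make_extract(records: Iterable[dict],
                 countries: Iterable[str],
                 variables: Iterable[str],
                 case_filter: Callable[[dict], bool] = lambda rec: True) -> List[dict]:
    """Build an extract from pooled records.

    Keeps only the requested countries, the cases passing case_filter,
    and the requested variables. 'country' and 'year' are always kept
    so that every row stays attributable to its census.
    """
    wanted = set(countries)
    columns = ["country", "year"] + [
        v for v in variables if v not in ("country", "year")
    ]
    return [
        {name: rec[name] for name in columns}
        for rec in records
        if rec["country"] in wanted and case_filter(rec)
    ]
```

A call such as `make_extract(pool, ["Norway", "Vietnam"], ["age", "sex"], lambda rec: rec["age"] >= 65)` would return only persons aged 65 and over from those two countries, with age and sex alongside the country and census year.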

50 Why should Nordic countries participate now?
- Legal and scientific foundations in place: EUROSTAT, France, Austria, UK, etc.
- The project is 18 months into its 5 years; if resources are required, budget planning must begin soon.
- Historical census microdata projects are well advanced: 1801, 1865 (100% club), 1875, 1900.
- Time to turn to contemporary census microdata

51 Additional information at: http://www.ipums.org
Thank you

52 Work plan, part II: make census microdata usable
3. Integrate: March
National partners:
- integrate phase I countries using UN/Eurostat Principles & Recommendations
- help to design prototype
Analyze all concepts, variables and codes of census schedules for 30 target countries
- help to implement for phase I and II countries
4. Disseminate: -October 2004
- Design international data access engine
- Implement with phase I and II countries

