IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted-access census microdata extracts https://www.ipums.org/international.

Presentation transcript:

IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted-access census microdata extracts * * * Robert McCaa, Minnesota Population Center; Albert Esteve, Centre d’Estudis Demogràfics “Inadequate use of microdata has high costs” --Len Cook (2003)

Outline: IPUMS-International Confidentiality Measures 1. Introduction: What is IPUMS-International? (5 slides) 2. Disseminating anonymized, integrated extracts (3 slides) 3. IPUMS-International confidentiality protections » Legal (3 slides) » Administrative (3 slides) » Technical (6 slides) 4. Conclusions (2 slides)

1. Introduction: What is IPUMS-International? (7 slides)

IPUMS-International is… a global collaboratory of National Statistical Institutes & Universities to: » 1. Inventory the world’s census microdata » 2. Preserve endangered microdata and documentation » 3. Integrate census microdata » a. use standards of UNSD, Eurostat, ISCO, ISCED, etc. » b. facilitate comparative research in time and space » 4. Anonymize census microdata to preserve statistical confidentiality, using highest standards » 5. Disseminate restricted-access, custom extracts to approved researchers at no cost

[Map: IPUMS-International, October 2005 (Mollweide projection). Dark green = disseminating; green = harmonizing; lightest green = negotiating.]

Available now: (see Table 1)

Table 1. IPUMS-International consortium members, November 2005
Region | Official Statistics Authority
Africa | Cameroon, Egypt, The Gambia, Kenya, South Africa, Uganda
IPUMS-Latin America | Argentina, Bolivia, Brazil, Canada, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Uruguay, USA, Venezuela
IPUMS-Global (Asia) | Armenia, Bangladesh, Cambodia, China, Fiji, Indonesia, Iraq, Israel, Malaysia, Mongolia, Pakistan, Palestinian Authority, Philippines, Turkmenistan, Vietnam
IPUMS-Europe | Austria, Belarus, Bulgaria, Czech Republic, France, Germany, Greece, Hungary, Netherlands, Portugal, Romania, Slovenia, Spain, the United Kingdom (pending: Ireland, Italy, Poland, Russia, Switzerland and Turkey)

IPUMS-Europe: (Table 2). By July 2005, at the 1st workshop, 9 countries had entrusted 28 datasets to the project (bolded); 2nd workshop in 2006; first release in 2007

IPUMSi RECOVERS: Centro Latino Americano y Caribeño de Demografía (CELADE): ~3000 microdata tapes recovered, with corresponding documentation. What microdata still exist for the European region? » For the 1960 census round? » 1970s? 1980s? 1990s? Will they be recovered before it is too late?

History of IPUMS » : USA: SHRL common-format FORTRAN programs. Limitations: lost information, false cognates, poor documentation, expensive custom datasets » : IPUMS-USA was an attempt to do it right. Single harmonized database, comprehensive integrated documentation, no lost information. Beta release 1993, web-based interactive extraction in 1995 » 1995-present: IPUMS-USA internet dissemination. Microdata samples for each decennial census: » 1999-present: IPUMS-International

2. Disseminating Anonymized, Integrated Extracts (3 slides)

IPUMSi integration principles » 1. Respect absolute anonymity » 2. Preserve all original data, except adjustments to assure confidentiality (top codes, blurring, masking, re-ordering, etc.) » 3. Harmonize codes for countries: occupation: ISCO, HISCO (detailed, general); education: ISCED (detailed, general); family: IPUMS, etc. (detailed, general) » 4. Enhance with constructed variables

6 steps using 1. Log on with password 2a. Study documentation 2b. Design extract 3. Receive ; log on with password 4. Download extract (SSL encrypted) 5. Unzip data (also SAS, STATA) 6. Analyze

Data Dissemination: web-based extraction system » Password protected: to make and retrieve extracts » Researcher selects: » Countries, » Censuses, » Cases/sub-populations, » Variables, and » Sample densities » Extract engine queues request, generates extract » Researcher retrieves extract via web with SSL 128-bit encryption » NO: CDs, original codes, or complete datasets
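
The select-then-queue workflow on this slide can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `ExtractRequest` fields, the `ExtractQueue` class and the record layout are hypothetical names, not the actual IPUMS extract engine.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExtractRequest:
    """One researcher's custom extract definition (hypothetical fields)."""
    user: str
    samples: list            # (country, census year) pairs
    variables: list          # variable names to include
    density: float = 1.0     # fraction of each sample to keep, 0 < density <= 1
    case_filter: dict = field(default_factory=dict)  # e.g. {"sex": 2}


class ExtractQueue:
    """First-in, first-out queue of pending extract requests."""

    def __init__(self, approved_users):
        self.approved_users = set(approved_users)
        self.pending = deque()

    def submit(self, request):
        # Only approved (licensed) users may make extracts.
        if request.user not in self.approved_users:
            raise PermissionError(f"{request.user} is not an approved user")
        if not 0 < request.density <= 1:
            raise ValueError("density must be in (0, 1]")
        self.pending.append(request)

    def process_next(self, records):
        """Build the oldest pending extract from a list of record dicts."""
        req = self.pending.popleft()
        wanted = set(req.samples)
        # Keep only the requested samples and sub-population.
        selected = [
            r for r in records
            if (r["country"], r["year"]) in wanted
            and all(r.get(k) == v for k, v in req.case_filter.items())
        ]
        # Thin the sample systematically: keep every step-th case.
        step = max(1, round(1 / req.density))
        # Return only the requested variables, never the complete record.
        return [{v: r[v] for v in req.variables} for r in selected[::step]]
```

The sketch returns only the requested variables for the requested cases, which mirrors the slide's point that complete datasets are never handed out.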

3. Confidentiality Protections (15 slides) “There has been no known attempt at identification with the 1991 SARs [microdata samples of the UK]- nor in any other countries that disseminate samples of microdata” --Elliott and Dale, Journal of the Royal Statistical Society, 1999

‘statistical confidentiality’ shall mean the protection of data related to single statistical units which are obtained directly for statistical purposes or indirectly from administrative or other sources against any breach of the right to confidentiality. It implies the prevention of non-statistical utilization of the data obtained and unlawful disclosure. --COUNCIL REGULATION (EC) No 322/97 of 17 February 1997

3 kinds of confidentiality protections: 1. Legal: dissemination agreement between the University of Minnesota and each National Statistical Institute » Uniform 11-point Memorandum of Understanding regarding: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence 2. Administrative: conditional use license between the University of Minnesota and each researcher » Permission to use restricted-access microdata rests on 3 criteria: research need, research competence, and agreement to abide by the conditions of use license 3. Technical: data protection measures » Specific to each country

Legal: OSI and U. Minnesota

Legal: OSI and U. Minnesota (2001-4)

Legal: OSI and U. Minnesota (2005+)

IPUMSi DISSEMINATES: restricted-access web-based system. Legally-binding license agreement » protects privacy and confidentiality » assures proper use » forces snoopers to violate law. Access limited to: » Bona-fide researchers (credentials) » With a demonstrated scientific need » Who agree to abide by license restrictions: confidentiality, no redistribution, safely secured

User Conditions of Use License

Conditions of Use License (Appendix B)

IPUMSi ANONYMIZES » Suppress geographical detail » Blur/aggregate sensitive codes » Convert dates to ages (blur key vars.) » Swap cases between districts » Scramble records. Technical measures are also applied, in addition to the legal & administrative protections
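
Three of the measures listed on this slide (suppressing geographical detail, converting dates to ages, scrambling records) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's procedure: the field names (`district`, `locality`, `enumeration_area`, `birth_date`) and the function names are hypothetical.

```python
import random
from datetime import date


def age_at_census(birth_date, census_date):
    """Completed years of age on census day."""
    had_birthday = (census_date.month, census_date.day) >= (birth_date.month, birth_date.day)
    return census_date.year - birth_date.year - (0 if had_birthday else 1)


def anonymize(records, census_date, seed=0):
    """Return a scrambled copy of the records without fine geography or birth dates."""
    fine_geo = {"district", "locality", "enumeration_area"}  # hypothetical field names
    out = []
    for rec in records:
        # Suppress geographical detail and drop the exact date of birth.
        clean = {k: v for k, v in rec.items() if k not in fine_geo and k != "birth_date"}
        # Convert the date to an age in completed years.
        clean["age"] = age_at_census(rec["birth_date"], census_date)
        out.append(clean)
    # Scramble record order so file position reveals nothing about enumeration order.
    random.Random(seed).shuffle(out)
    return out
```

Swapping cases between districts and blurring sensitive codes are left out of the sketch, since the slides do not describe how they are carried out.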

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International » 1. Restrict access to samples » 2. Limit geographical detail » 3. Re-code unique categories--top and bottom » 4. Sign non-disclosure agreement » 5. Prohibit redistribution to third parties » 6. Prohibit attempts to identify individuals or making any claim to that effect » 7. Require users to provide copies of publications

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International » 8. Construct age from birthdate, if necessary » 9. Do not identify date of birth » 10. Do not identify precise place of birth » 11. Migration: timing/place not identified in detail » 12. Identify place of residence by major civil division (pop>20k, 60k, 100k, 1 million, i.e., national convention) » 13. Do sensitivity analysis » 14. Do confidentiality assessment
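
Two of the standards above lend themselves to a short sketch: top and bottom coding (item 3) and choosing the geographic level to publish by a population threshold (item 12). The function names, cut-offs and population figures are illustrative assumptions; each country applies its own conventions.

```python
def top_bottom_code(values, low=None, high=None):
    """Clamp values outside [low, high] to the boundary codes."""
    coded = []
    for v in values:
        if high is not None and v > high:
            v = high   # top code
        elif low is not None and v < low:
            v = low    # bottom code
        coded.append(v)
    return coded


def finest_safe_level(levels, threshold):
    """Pick the finest geographic level whose population exceeds the threshold.

    `levels` is a list of (name, population) pairs ordered from coarsest to
    finest. Returns None when no level is populous enough.
    """
    safe = None
    for name, population in levels:
        if population > threshold:
            safe = name
    return safe
```

For example, with hypothetical populations of 2,000,000 for a province, 150,000 for a district and 12,000 for a location, a 20,000 threshold identifies residence at district level, while a 1 million threshold falls back to the province.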

Anonymization example: Kenya, 1989
Kenya: Anonymization Based on Unique Characteristics Threshold (50,000 for geographic variables; 10,000 for other variables)
Type | Procedure | Variable Name
Key | Suppressed | Division, Location, Sublocation, Enumeration area, Tribe/Ethnicity
Key | Aggregated | 50,000 minimum: Province, District of Residence, Birth and Past Residence
Key | None | Sex, Marital Status, Relationship to Head, etc.
Sensitive | Aggregated | 10,000/1,000 minimum: Occupation, Employment Status
Transitory (information is considered too changeable to be used to identify individuals from microdata) | None | Age, Urban/Rural Residence, Literacy, Educational Status, Educational Level, Labor Activity, Children Everborn/Alive/Dead, Last Birth Year, Mortality variables
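
The threshold idea in the Kenya example can be sketched as a recode of any category whose estimated population falls below a minimum. This is a simplified assumption: the sketch collapses rare categories into a single residual code, whereas an actual anonymization would more likely merge them into broader meaningful groups. The function name, the `weights` argument and the "Other" label are hypothetical.

```python
from collections import Counter


def aggregate_rare(values, weights, minimum, other="Other"):
    """Recode categories whose weighted count is below `minimum` into `other`."""
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight   # estimated population in each category
    return [v if totals[v] >= minimum else other for v in values]
```

With a 10,000 minimum, an occupation held by an estimated 11,000 people keeps its own code, while one held by an estimated 20 people is folded into the residual category.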

IPUMS-International samples anonymized by: Census Agency (36 countries) or IPUMS (19 countries). Census Agency (n=36): Argentina, Armenia, Austria, Belarus, Brazil, Cambodia, Canada, China, *Czech Republic, Egypt, France, *Germany, Greece, Hungary, Indonesia, *Ireland, Israel, Malaysia, Mexico, Mongolia, Netherlands, Pakistan, Palestinian Authority, Philippines, *Poland, *Portugal, Puerto Rico, Romania, *Slovenia, South Africa, Spain, Turkmenistan, *Turkey, USA, UK, Vietnam. * Microdata for 7 countries not entrusted to project yet. IPUMS (n=19): Bolivia, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Fiji Islands, Guatemala, Honduras, Iraq, Kenya, Nicaragua, Panama, Paraguay, Peru, Uganda, Uruguay, Venezuela

Risk assessment of 1991 SARs: the risk is very low… » After taking into account errors in the data, coding variability and changing of personal characteristics in time » Dale and Elliott, JRSS-A (2003): “For a user of an outside database, attempting this sort of match with no opportunity for verification would prove fruitless. In the first place, the small degree of expected overlap would be a considerable deterrent to an intruder. However, if a match between the two files was attempted the large number of apparent matches would be highly confusing as an intruder would have no way of checking correct identification.”
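
The point about "apparent matches" can be illustrated with a toy count of how many sample records agree with an outside record on a set of key variables. This is only an illustration of the ambiguity argument, not the risk-assessment method used by Dale and Elliott; the function name and record layout are assumptions.

```python
from collections import Counter


def match_counts(outside_records, sample_records, keys):
    """For each outside record, count sample records with identical key values."""
    combos = Counter(tuple(r[k] for k in keys) for r in sample_records)
    return [combos[tuple(r[k] for k in keys)] for r in outside_records]
```

A count of 0 means the person is apparently absent from the sample, and a count above 1 leaves an intruder unable to tell which record, if any, is the right one. Even a count of 1 proves nothing without verification, because the true person may not have been sampled at all.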

4. Shifting the Risk-Utility curve rightward by restricting access to accredited users.

4. Conclusion

[Diagram: Joint integrated European census microdata projects. IPUMS-Europe (2004-2009); CIECM (2005), DIECM, EIECM; labels: Coordinate, Disseminate, Enhance.]

IPUMS-International strengths: 1. Uniform legal authorization with national statistical authorities 2. Access restricted to academics with need who agree to abide by stringent confidentiality protections 3. Experienced integration teams 4. Proven web-based distribution system 5. High user satisfaction 6. Sustainable: NSF, NIH, FP-6 funded (Europe only)

Thank you! additional information at: click: ipums-europe and: click: IECM * * * * * * Contacts:

Population Activities Unit 1990s census harmonization project: » Begun 1992: PAU, ECE, UNFPA, US-NIA » Microdata acquired for 15 countries » Progressive over-samples for the aged » Harmonized 26 core person variables plus 13 optional; 10 dwelling/household variables, 18 optional » Extensive metadata: questionnaires, nomenclatures, classifications

Problems with PAU effort: » Lacked legal authority » Inadequate funding » High institutional costs » Insufficient computing infrastructure and human resources » Antiquated distribution system: few users » Sustainability: problematic