Statistical confidentiality and privacy. 2. Case study: IPUMS-International www.ipums.org/international * * * Robert McCaa Minnesota Population Center.

Presentation transcript:

Statistical confidentiality and privacy. 2. Case study: IPUMS-International * * * Robert McCaa Minnesota Population Center “Inadequate use of microdata has high costs” --Len Cook (2003, registrar general, ONS)

MPC: largest provider of integrated microdata to trusted, non-commercial researchers International (census) USA (census) Employment History (19th c.) GIS Health Time-Use

IPUMS-Global (first 10 years) dark green = integrated and disseminating (44 countries, 130 censuses, 279 million person records); green = to be integrated (35 countries, 90 censuses, 150 million person records); Mollweide projection. Inventory: * = IPUMS confidentiality protocols used. See “Inventory” handout

Outline: IPUMS statistical confidentiality methods 1. IPUMS: A restricted-access, web-based microdata dissemination system 2. IPUMS: The trusted user/institution approach » A. Legal Disclosure Controls » B. Administrative Disclosure Controls » C. Technical Disclosure Controls » Example: Saint Lucia, IPUMS 3. Assessments (2007): » UN-ECE Case Study » Trewin on-site evaluation

1. IPUMS-International: Goals 1. Inventory census microdata and documentation, world-wide 2. Recover and preserve at-risk microdata 3. Integrate census microdata and documentation 4. Disseminate--without cost--extracts of samples to bona-fide researchers worldwide, regardless of country of birth, citizenship or residence. » Sustained funding —6 grants of 5 years duration: » National Science Foundation (USA): 3 successive grants » National Institutes of Health (USA): Latin America, Europe, Eur-Asia

IPUMS-International: a restricted-access, web-based microdata extraction system » Researcher licensed to access microdata: 1/3 rejected » NO: Public access, source files, or complete datasets » Licensed researcher selects: » Countries, » Censuses, » Cases/sub-populations, » Variables, and sample densities » Extract engine queues request, generates extract » Password protected: to make and retrieve extracts » Researcher retrieves extract via web with SSL 128-bit encryption and analyzes using own wares (soft/hard/wet)
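The selection-and-queue workflow on this slide can be sketched in a few lines. This is only an illustration of the idea, not the IPUMS extract engine or any IPUMS API: the class names `ExtractRequest` and `ExtractQueue`, their fields, and the variable names in the usage example are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExtractRequest:
    """One researcher's selection (all field names are hypothetical)."""
    user: str
    countries: list
    censuses: list
    variables: list
    sample_density: float = 1.0          # fraction of the available sample
    case_filter: dict = field(default_factory=dict)  # sub-population selection


class ExtractQueue:
    """Accepts requests from licensed users only and serves them in order."""

    def __init__(self, licensed_users):
        self.licensed_users = set(licensed_users)
        self.pending = deque()

    def submit(self, request):
        # Restricted access: unlicensed users cannot queue an extract.
        if request.user not in self.licensed_users:
            raise PermissionError(f"{request.user} is not a licensed researcher")
        if not 0 < request.sample_density <= 1:
            raise ValueError("sample_density must be in (0, 1]")
        self.pending.append(request)
        return len(self.pending)  # position in the queue

    def next_request(self):
        # First in, first out; None when the queue is empty.
        return self.pending.popleft() if self.pending else None
```

A licensed user would build a request such as `ExtractRequest(user="alice", countries=["Saint Lucia"], censuses=[1991], variables=["AGE", "SEX"], sample_density=0.5)` and pass it to `submit`; the point of the design is that the license check happens before anything is queued.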

6 steps using 1. Logon w/ password 2a. Study documentation 2b. Design extract 3. Receive ; logon with p/word 4. Download extract (SSL encrypted) 5. UnZip data (also SAS, STATA) 6. Analyze See “10 tips” handout

IPUMS-International: world’s largest disseminator of integrated microdata to trusted, non-commercial researchers » 1999: Founded by Steven Ruggles and Bob McCaa, to restrict access to trusted users and apply corresponding confidentiality techniques » 2002: 1st release of integrated samples for 7 countries; >200 users in first year » Big success! 80+ countries signed; 70+ entrusted microdata to IPUMS, datasets for more than 250 censuses, >180 entire datasets » 2006…

IPUMS-International: world’s largest disseminator of integrated microdata to trusted, non-commercial researchers » 1999: Founded » 2006, 3rd release: » data for 20 countries, samples for 63 censuses, » 185 million person records, » >1,000 users » 2010, 7th release: » data for ~50 countries, samples for ~160 censuses » ~300 million person records » >4,000 users » Note: data extracts are provided only to licensed users.

2. IPUMS-International The “trusted-user/institution” approach to disseminating integrated, anonymized microdata extracts Disclosure Controls: A. Legal: Memorandum with NSI B. Administrative: License with researchers C. Technical: Sample, Data modifications

3 kinds of confidentiality protections: A. Legal: Dissemination agreement between University of Minnesota and each National Statistical Institute » Uniform 11 point Memorandum of Understanding regarding: ownership, use, authorization, restrictions, confidentiality, security, publication, violations, sharing, arbitration, and order of precedence B. Administrative: conditional use license between the University of Minnesota and each researcher » Permission to use restricted access microdata, 3 criteria: research need, research competence, and agree to abide by conditions of use license C. Technical data protection measures » Specific to each country …/

A. NSI with U of Minnesota



B. License with researchers (restricted-access, web-based system) Legally-binding license agreement » forces a would-be intruder to violate the law, for which they can be fined and/or jailed » Researcher’s institution sanctioned » protects privacy and confidentiality » assures proper use Access limited to: » Bona-fide researchers (credentials) » with a demonstrated scientific need » who agree to abide by license restrictions » Confidentiality » No redistribution » Safely secured » Alleging that a person has been identified is prohibited

B. License with researchers (restricted-access, web-based system) Legally-binding license agreement » forces would-be snoopers to violate law » protects privacy and confidentiality » assures proper use Access limited to: » Bona-fide researchers (credentialed) » with demonstrated scientific need » who agree to abide by license restrictions » Confidentiality » No redistribution, no commercial use » Data safely secured » Alleging that a person can be or has been identified is a violation

“Apply for Access”

Must click acceptance of each restriction to gain access.
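The click-through rule on this slide amounts to an all-or-nothing check. A minimal sketch follows, assuming the restriction list from the license slide; the identifiers in `RESTRICTIONS` and the function name are hypothetical, and the real application form may word or group the conditions differently.

```python
# Hypothetical identifiers for the license conditions named on the license slide.
RESTRICTIONS = (
    "confidentiality",
    "no_redistribution",
    "no_commercial_use",
    "data_safely_secured",
    "no_allegation_of_identification",
)


def all_restrictions_accepted(accepted):
    """Return True only if every restriction was explicitly accepted.

    `accepted` maps a restriction identifier to True/False; a missing
    entry counts as not accepted.
    """
    return all(accepted.get(name, False) is True for name in RESTRICTIONS)
```

Treating a missing entry as a refusal keeps the default safe: an incomplete form never grants access.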

End of application

C. 9 Technical Disclosure Controls (Thorogood, 1999) 1. Restrict access to samples 2. Limit geographical detail 3. Recode sparse categories 4. Truncate top and bottom codes 5. Construct age from birthdate, if necessary 6. Suppress: date of birth, precise place of birth 7. Migration: timing/place not identified in detail 8. Identify place of residence by major civil division (pop>20k, 60k, 100k, 250k, 1 million—i.e., national convention) 9. Suppress any sensitive variable requested by NSI

C. Technical Disclosure Controls Example: Saint Lucia, 1991 Census 1. Restrict access to samples: 10% (13,405 persons) 2. Limit geographical detail (n<2,000): suppress region, district, town, settlement, enumeration district, school identification; retain urban-rural 3. Recode sparse categories (n<25) to “other”. » Type of dwelling: suppress townhouse, barracks » Land occupation: suppress sharecrop » Type of ownership: suppress squatted, leased » Type of roof: suppress 5 categories » Wall material: suppress 5 categories » Water supply: suppress pubwell » Type of lighting: suppress gas » Ethnic origin: suppress Chinese, Portuguese, Syrian-Lebanese » Religion: suppress 6 categories » School, work mode of transport: bicycle » Type of school: technical institute, university » Number of hours worked last week: 5-hour groups, 70+ » Pay period: suppress quarterly, annually » Occupation, industry, training code: reduce from 4 digits to 1
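Two of the controls on this slide are simple to illustrate: folding categories with fewer than 25 cases into “other”, and cutting a 4-digit classification code down to its first digit. The sketch below is an assumption about how such rules could be coded, not the procedure IPUMS actually runs; the function names and the zero-padding of codes to 4 digits are hypothetical choices.

```python
from collections import Counter


def recode_sparse(values, min_count=25, other="other"):
    """Replace every category observed fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def truncate_code(code, digits=1, width=4):
    """Keep the leading `digits` of a classification code.

    The code is zero-padded to `width` first, so the integer 123 is read
    as "0123" rather than losing its leading zero.
    """
    return str(code).zfill(width)[:digits]
```

For example, a lighting variable with 30 “electric” and 3 “gas” responses would come back with the 3 “gas” cases recoded to “other”, and an occupation code of `7231` would be reduced to `"7"`.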

C. Technical Disclosure Controls Example: Saint Lucia, Top-bottom code » Number of rooms: 10+ » Number of bedrooms: 7+ » Number of radios: 4+ » Number of tvs: 3+ » Number of videos: 2+ » Number of emigrants in dwelling: 2+ » Age: 81+ » Age at first child: <= 14 » Age at first union: <=14, 41+ » Age at last child: <=14, 45+ » Number of school subjects: =7 » Income categories: 8+
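Top and bottom coding collapses extreme values into an open-ended end category. The sketch below uses a handful of the thresholds listed on this slide; the variable names in `LIMITS` and both function names are hypothetical, and representing the collapsed category by its boundary value (for example 81 for “81+”) is an assumption made for illustration.

```python
def top_bottom_code(value, bottom=None, top=None):
    """Clamp `value` into [bottom, top]; None limits mean no clamp on that side."""
    if value is None:
        return None
    if bottom is not None and value <= bottom:
        return bottom  # stands for "<= bottom"
    if top is not None and value >= top:
        return top     # stands for "top+"
    return value


# (bottom, top) per variable, taken from the slide; names are hypothetical.
LIMITS = {
    "rooms": (None, 10),
    "bedrooms": (None, 7),
    "age": (None, 81),
    "age_first_child": (14, None),
    "age_first_union": (14, 41),
    "age_last_child": (14, 45),
}


def apply_limits(record, limits=LIMITS):
    """Return a copy of `record` with every listed variable top/bottom coded."""
    coded = dict(record)
    for name, (bottom, top) in limits.items():
        if name in coded:
            coded[name] = top_bottom_code(coded[name], bottom, top)
    return coded
```

Variables not named in `LIMITS` pass through unchanged, so the same function can be applied to full records.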

C. Technical Disclosure Controls Example: Saint Lucia, Suppress: » date of birth, precise place of birth, type of work wanted 6. Migration: timing/place not identified in detail » Country last lived: suppress 37 categories » Year of immigration: < […] 7. Identify place of residence by major civil division (pop>20k, 60k, 100k, 250k, 1 million, i.e., national convention) » all suppressed 8. Suppress any sensitive variable requested by NSI: » none (as yet)

3. Assessments: A. Why was IPUMS cited as “good practice” by the UN-ECE (2007, Annex 23, pp )?

UN-ECE Good practices (see annex 23): 1. High level of confidence and transparency between the researchers (users) and the national statistical institutes 2. The data are anonymized by highly efficient technical means 3. The conditions of use are well defined 4. Good use is assured by both juridical and administrative mechanisms to prevent violations 5. Sanctions for misuse are clearly spelled out 6. Sanctions are imposed not only against those who misuse the data but also against their institutions

B. The Trewin Report: “The security of the computing environment used by IPUMS-International is first class and appears to be of the standard of the best statistical offices.” --Dennis Trewin, former Australian Statistician; past President, International Statistical Institute; chair, UN-ECE Committee on Managing Statistical Confidentiality and Microdata Access (CES 2007). See “Trewin Report” handout

Statistical confidentiality and security: see the on-site review by Dennis Trewin (click “Trewin Report”) An Outsider’s view from inside IPUMS-International: » “The best practice for an international repository of microdata” » “The security of IPUMS is first class…the standard of the best national statistical offices” » “in full compliance with the principles and recommendations of the ECE”

IPUMS-International strengths: 1. Uniform legal authorization with national statistical authorities 2. Access restricted to academics with need who agree to abide by stringent confidentiality protections. Sanctions against individual and institution: denial of access to all microdata for the entire institution 3. Strong technical methods of microdata anonymization 4. Experienced integration teams 5. Proven web-based access management system 6. High producer and user satisfaction 7. Sustainable: MPC, NSF, NIH

Join us at the 58th ISI: Dublin, Aug 21-26, » IPUMS Workshop, Aug » Microdata sessions. » IPUMS Funding for delegates from developing countries. » IPUMS booth » Participate in ISI sessions. » Network with stat offices, international agencies, etc.

Thank you! More, see: Durban workshop (2009): Microdata recovery, Jamaica report; Lisbon workshop (2007): Saint Lucia report * * * Contact: this ppt is also available at: ipums-global (See “Port of Spain workshop”)