Optimizing the Use of Microdata
Julia Lane
Adapted from an ASA presentation in honor of Pat Doyle

Overview
● Benefits and costs of microdata access
● Example of consequences of current practice
● Current and future challenges
● Developing an economic framework
● Using the framework to shape a research agenda
● Next steps

Benefits of Microdata Access
● Permits analysis of complex questions
  ○ Tabular data answer predefined questions
  ○ Microdata "drill down" to the basic decision-making unit
  ○ Heterogeneous behavior of economic agents
● Ability to estimate marginal effects
● Scientific safeguard
● Data quality
● Development of a core constituency for statistical agencies

Costs of Microdata Access
● Different modalities
  ○ Research Data Centers: cost of safeguards
  ○ Licensing: cost of monitoring
  ○ Remote access: cost of developing and updating
  ○ Public use files: cost of developing and updating
● Reputation costs
  ○ "Official" statistics?
  ○ Role of work in progress
  ○ Authorized purpose?
● Disclosure
  ○ Legal liability
  ○ Ethical
  ○ Response rates

Example of Impact of One Approach: Public Use Files
● Reduce information
  ○ Variable deletion
  ○ Recoding categorical variables into larger categories
  ○ Recoding continuous variables into categories
  ○ Rounding continuous variables
  ○ Top and bottom coding
  ○ Local suppression
  ○ Enlarging geographic areas
● Perturb data
  ○ Noise addition
  ○ Record swapping
  ○ Rank swapping
  ○ Blanking and imputation
  ○ Micro-aggregation
  ○ Multiple imputation/modeling to generate synthetic data
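
Several of the techniques listed above are simple enough to sketch directly. The Python below is a minimal illustration, not the procedure any agency actually uses: it assumes a single numeric variable held in a NumPy array, and the function names, the noise scale, the swap window (measured in ranks), and the group size k are all hypothetical choices.

```python
import numpy as np


def top_bottom_code(x, lower, upper):
    """Top and bottom coding: values outside [lower, upper] are set to the nearest bound."""
    return np.clip(np.asarray(x, dtype=float), lower, upper)


def round_to(x, base):
    """Round continuous values to the nearest multiple of `base`."""
    return base * np.round(np.asarray(x, dtype=float) / base)


def add_noise(x, rel_sd, rng):
    """Noise addition: Gaussian noise with SD equal to rel_sd times the sample SD."""
    x = np.asarray(x, dtype=float)
    return x + rng.normal(0.0, rel_sd * x.std(), size=x.shape)


def rank_swap(x, window, rng):
    """Rank swapping: each value trades places with a value at most `window` ranks above it."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    vals = x[order]                      # values in ascending rank order (a copy)
    n = len(vals)
    used = np.zeros(n, dtype=bool)
    for i in range(n):
        if used[i]:
            continue
        partners = [j for j in range(i + 1, min(n, i + window + 1)) if not used[j]]
        if partners:
            j = int(rng.choice(partners))
            vals[i], vals[j] = vals[j], vals[i]
            used[i] = used[j] = True
    out = np.empty(n)
    out[order] = vals                    # put swapped values back in original record order
    return out


def microaggregate(x, k):
    """Univariate micro-aggregation: sort, form groups of k, replace each value by its group mean.

    The last group absorbs any remainder, so every group has at least k records
    (assuming len(x) >= k).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    n = len(x)
    n_groups = max(n // k, 1)
    out = np.empty(n)
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        idx = order[start:end]
        out[idx] = x[idx].mean()
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical earnings values, for illustration only.
    earnings = np.array([9_000, 21_000, 34_000, 47_000, 58_000, 72_000, 95_000, 180_000, 450_000])
    print(top_bottom_code(earnings, 10_000, 150_000))
    print(round_to(earnings, 5_000))
    print(add_noise(earnings, 0.05, rng))
    print(rank_swap(earnings, 2, rng))
    print(microaggregate(earnings, 3))
```

Top coding and rounding coarsen values, while noise, swapping, and micro-aggregation perturb them. Rank swapping only permutes values, so the marginal distribution is preserved exactly, and micro-aggregation preserves the overall mean.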

Consequences of Topcoding for Data Quality

Consequences of Topcoding for Decision-Making
● Earnings inequality increasing
  ○ Steadily?
  ○ Sharply?
  ○ When?
● Inference for policy makers?
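
A small simulation shows why a fixed topcode muddies the inequality question above. The sketch below is a hypothetical setup, not the data behind the slide: log earnings are drawn from normal distributions whose mean and spread both rise over five periods, and a single nominal topcode (50,000, an arbitrary choice) is applied in every period. Because capping values can never increase their variance, the topcoded series understates the level of inequality, and as more workers hit the cap it also understates the growth.

```python
import numpy as np


def var_log(earnings):
    """Variance of log earnings, a common summary of earnings inequality."""
    return float(np.var(np.log(earnings)))


def simulate(n=200_000, periods=5, topcode=50_000.0, seed=42):
    """Return per-period (share topcoded, true var-log, topcoded var-log).

    Hypothetical data-generating process: log earnings ~ Normal(mu_t, sigma_t),
    with mu_t and sigma_t both rising linearly over time.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for t in range(periods):
        mu = 9.6 + 0.1 * t
        sigma = 0.6 + 0.075 * t
        earnings = np.exp(rng.normal(mu, sigma, size=n))
        coded = np.minimum(earnings, topcode)   # what a public use file would release
        share = float(np.mean(earnings > topcode))
        rows.append((share, var_log(earnings), var_log(coded)))
    return rows


if __name__ == "__main__":
    rows = simulate()
    for t, (share, true_v, coded_v) in enumerate(rows):
        print(f"period {t}: {share:6.1%} topcoded  var-log true {true_v:.3f}  topcoded {coded_v:.3f}")
    print("growth, true:", round(rows[-1][1] - rows[0][1], 3))
    print("growth, topcoded:", round(rows[-1][2] - rows[0][2], 3))
```

Under these assumed parameters the topcoded series grows more slowly than the true one; how "steady" or "sharp" a real series looks depends on how quickly the topcode starts to bind.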

Consequences of Topcoding for Data Quality

Consequences of Topcoding for Decision-Making
● Standard censored regression problem
● Black/white earnings gap
  ○ Gap of .35 or .63 log points in 1963?
  ○ Change in gap between 1963 and …: … log points or .15 log points?
  ○ Policy maker: is the racial earnings gap closing rapidly, or closing slowly?
● Return to education
  ○ First column: dropped from 1% in 1963 to approximately zero in 1973?
  ○ Final column: consistent at 7%
  ○ Policy maker: stop investing in education, or should investment in education increase?
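
The "standard censored regression problem" can be illustrated the same way. In the hypothetical sketch below, log earnings depend linearly on years of education with a true return of 0.07 log points per year (a value borrowed from the slide purely for illustration), and log earnings above a fixed cap are topcoded. Ordinary least squares on the topcoded outcome understates the slope because workers with more education are censored more often. The usual remedy, a Tobit (censored-normal) likelihood, is not shown here.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


def simulate(n=50_000, true_return=0.07, log_cap=10.0, seed=7):
    """Hypothetical model: log earnings = 9 + true_return * educ + Normal(0, 0.5) noise.

    Returns (slope on uncensored data, slope on topcoded data, share topcoded).
    """
    rng = np.random.default_rng(seed)
    educ = rng.integers(8, 17, size=n).astype(float)   # 8 to 16 years of schooling
    log_earn = 9.0 + true_return * educ + rng.normal(0.0, 0.5, size=n)
    coded = np.minimum(log_earn, log_cap)               # topcoding at a fixed cap
    return ols_slope(educ, log_earn), ols_slope(educ, coded), float(np.mean(log_earn > log_cap))


if __name__ == "__main__":
    full, censored, share = simulate()
    print(f"return to education, uncensored data: {full:.3f}")
    print(f"return to education, topcoded data:   {censored:.3f}  ({share:.1%} of records topcoded)")
```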

New Challenges: The Basic Issue
"A recent book and conference on confidentiality and data access brought home the growing challenge facing the Census Bureau …. It is becoming clear that advances in technology and increased use of administrative records may, at some point in the future, render our current disclosure avoidance procedures inadequate. At the same time … the larger federal statistical system face increasing demands for more, better and more recent data to meet critically important public policy and research needs."
Pat Doyle, 2001

New Challenges: New Data Collection Modalities
● Surveys, censuses, and administrative data, plus:
  ○ Textual corpora
  ○ Videotapes
  ○ Wireless network embedded devices
  ○ Increasingly sophisticated phones
  ○ RFIDs
  ○ Sensor webs
  ○ Smart dust
  ○ Cognitive neuroimaging records

Uses for Analysis

June 29,

Proposed Approach
● Formalize the currently piecemeal approach to the core problem:
  ○ Optimize data quality
  ○ Protect confidentiality
● Respond to a changing world
● Exploit existing knowledge in other areas
● Develop an approach that is responsive to the overwhelming demand for information but recognizes constraints

Economic Framework
Maximize $U = u(Q, R, N)$, where
● $U$ is data utility
● $Q$ is data quality
● $R$ is researcher quality
● $N$ is the number of times the data are accessed
If $M_i$ denotes access modality $i$, then we can write $Q(M_i)$. $R$ and $N$ are both determined by the access costs $A_i$ imposed by the access modality, so we can write $R(A_i)$ and $N(A_i)$.

Economic Framework
Subject to $S = H \cdot D + C$, where
● $S$ is the social cost
● $H$ is harm
● $D$ is disclosure risk
● $C$ is the cost to government

Economic Framework
$D^{*} = z(E, I, Z, M_i)$, where
● $E$ is the existence and accessibility of other data sources that can be used for re-identification. The relationship between $E$ and re-identification is affected by technology $T$, so we write $E(T)$.
● $I$ is the existence of malevolent interlopers. This relationship is affected by technology, legal penalties $L$, and the characteristics of the population $X$, so we write $I(T, L, X)$.
● $Z$ is researcher error. It is affected by technology, legal penalties, training, and adoptable protocols $P$, so we write $Z(T, L, P)$.
● $M$, as before, is the set of access modalities $M_i$.

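Constrained Optimization
$\mathcal{L} = U - \lambda \left( H \, z(E, I, Z, M_i) + p_T T + \sum_i p_{A_i} M_i - S \right)$
Here the cost to government $C$ is written out as the cost of technology, $p_T T$, plus the cost of each access modality offered, $\sum_i p_{A_i} M_i$.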
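
To make the Lagrangian concrete, the sketch below solves a small discrete version of the problem by brute force: choose which access modalities to offer so as to maximize total data utility subject to a cap on social cost. Every functional form and number in it is a hypothetical assumption, not part of the framework: each modality contributes $Q \cdot R(A) \cdot \ln(1 + N(A))$ to an additive utility, usage $N$ falls and researcher quality $R$ rises with access cost $A$, disclosure risk $z$ is the probability that at least one modality leaks (treating leaks as independent), and the technology cost $p_T T$ is a fixed amount.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Modality:
    name: str
    quality: float      # Q(M_i): hypothetical data-quality score
    access_cost: float  # A_i: hypothetical cost imposed on researchers
    gov_cost: float     # p_{A_i}: hypothetical cost to government of offering M_i
    leak_prob: float    # hypothetical chance this modality leads to a disclosure


def researcher_quality(a):
    return 1.0 + 0.5 * a            # R(A): assumed to rise with access cost (screening)


def uses(a):
    return 100.0 * math.exp(-a)     # N(A): assumed to fall with access cost


def utility(portfolio):
    """U: assumed additive over modalities, each contributing Q * R(A) * ln(1 + N(A))."""
    return sum(m.quality * researcher_quality(m.access_cost) * math.log1p(uses(m.access_cost))
               for m in portfolio)


def social_cost(portfolio, harm, tech_cost):
    """H * z + p_T T + sum_i p_{A_i} M_i, with z = P(at least one modality leaks)."""
    z = 1.0 - math.prod(1.0 - m.leak_prob for m in portfolio)
    return harm * z + tech_cost + sum(m.gov_cost for m in portfolio)


def lagrangian(portfolio, lam, harm, tech_cost, budget):
    """U - lambda * (social cost - S), mirroring the slide's Lagrangian."""
    return utility(portfolio) - lam * (social_cost(portfolio, harm, tech_cost) - budget)


def best_portfolio(modalities, harm, tech_cost, budget):
    """Brute force over all subsets; returns the feasible subset with the highest U, or None."""
    best = None
    for r in range(len(modalities) + 1):
        for subset in combinations(modalities, r):
            if social_cost(subset, harm, tech_cost) <= budget:
                if best is None or utility(subset) > utility(best):
                    best = subset
    return best


# Hypothetical parameter values, for illustration only.
MODALITIES = [
    Modality("public use file", 0.4, 0.1, 1.0, 0.020),
    Modality("licensing", 0.7, 0.5, 2.0, 0.010),
    Modality("remote access", 0.8, 0.8, 3.0, 0.005),
    Modality("research data center", 1.0, 1.5, 5.0, 0.001),
]

if __name__ == "__main__":
    for budget in (5.0, 10.0, 20.0):
        chosen = best_portfolio(MODALITIES, harm=100.0, tech_cost=1.0, budget=budget)
        names = [m.name for m in chosen] if chosen is not None else "infeasible"
        print(budget, names)
```

Raising the social-cost cap admits more modalities at once, which is one way to read a portfolio approach. With a continuous choice of $M_i$, $\lambda$ would be the shadow price of the social-cost constraint.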

Using the Framework to Shape a Research Agenda
1. Developing metrics of data quality Q: Domingo-Ferrer/Torra/Winkler/Shlomo/Haworth
2. Quantifying the effect of the cost of access A on usage N and researcher quality R: Dunne/Seastrom
3. Measuring harm H: Madsen/Singer/Greenia (CDAC, 2005)
4. Quantifying the relationship between other data sources E and disclosure D: Winkler/Domingo-Ferrer/Torra
5. Modelling malevolent behavior I and researcher error Z: Feigenbaum/Agrawal/PORTIA project
6. Investigating alternative technological approaches T to providing new access modalities M: Cybertrust/Defense Department/RDCs/NSF-funded researchers

Next Steps
● Need active funding within the statistical community
● Consider a portfolio approach: multiple modalities, human AND physical infrastructure (PORTIA project)
● Consortium of agencies (Census, BLS, BEA, etc.) to fund the research agenda
● Leverage research outside the statistical community
● Conference of European Statisticians: Statistical Confidentiality and Microdata Access: Principles and Guidelines of Good Practice
● Engagement with other academic communities (e.g., Cybertrust/IIS (Information, Privacy and Security) initiatives at NSF; DARPA); IASSIST
● Role of supercomputer centers