The Fate of Empirical Economics When All Data are Private

Slides:



Advertisements
Similar presentations
Public Goods SLO- Describe, explain and analyse as appropriate;
Advertisements

CREATING CHANGE IN EUROPE : SPARC EUROPE AND SCHOLARLY PUBLISHING Frederick J. Friend SPARC Senior Consultant
1 Worst-Case Equilibria Elias Koutsoupias and Christos Papadimitriou Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science.
Data Analytics Program at Drake Brad C. Meyer, Chair Information Management and Business Analytics.
1 William Y. Arms Cornell University April 4, 2003 Free Access to Information Today Who Benefits? What are the Risks? Who Pays?
Chapter 5: Economics of Pollution. Forms of Pollution Air pollution Water pollution Land contamination Noise pollution.
ABIDE Anonymised BIg Data Environment, ABIDE. ABIDE is an Anonymized Data Science Sharing Platform for laboratories and big data users of all types. Uniquely,
Biases in papers by rational economists The case of clustering and parameter heterogeneity Martin Paldam Paper at: Papers/Meta-method/Rationality2.pdf.
Ch. 2: Economic Tools 1. Micro approach –A. Choices –B. Constraints –C. Maximization (best choice) –D. Comparative statics 2. Utility Function 3. Totals.
JSM, Boston, August 8, 2014 Privacy, Big Data and The Public Good: Statistical Framework Stefan Bender (IAB)
Theoretical Tools of Public Economics Math Review.
National Science Foundation Directorate for Computer & Information Science & Engineering (CISE) Trustworthy Computing and Transition to Practice Secure.
A Whirlwind Tour of Differential Privacy
Economic Tools 1. Graphs: review basics –A. Plotting points. –B. Computing slope. 2. Micro approach –A. Choices –B. Constraints –C. Maximization (best.
Alexei A.Gaivoronski IKT Økonomi1 Provision of mobile data services: portfolio analysis Norwegian University of Science and Technology, Trondheim, Norway.
INFO 7470 Session 12 Statistical Tools: Methods of Confidentiality Protection John M. Abowd and Lars Vilhuber April 25, 2016.
Criteria for Evaluating Post-2000 Estimates Chuck Coleman Population Estimates Branch Population Division U.S. Census Bureau Prepared for the Spring 2001.
Why getting it almost right is OK and Why scrambling the data may help Oops I made it again…
Expanding the Role of Synthetic Data at the U.S. Census Bureau 59 th ISI World Statistics Congress August 28 th, 2013 By Ron S. Jarmin U.S. Census Bureau.
CS & CS ST: Probabilistic Data Management Fall 2016 Xiang Lian Kent State University Kent, OH
Quantitative Methods for Business Studies
AC 1.2 present the survey methodology and sampling frame used
Conducting Market Research
Conducting Market Research Surveys
Business Economics (ECO 341) Fall Semester, 2012
No Free Lunch: Working Within the Tradeoff Between Quality and Privacy
Excel’s Solver Use Excel’s Solver as a tool to assist the decision maker in identifying the optimal solution for a business decision. Business decisions.
Business for Health Business Skills for Private Medical Practices
Understanding Social and Economic Data
Chapter 5 A Closed-Economy One-Period Macroeconomic Model.
Business Intelligence Minor
University of Texas at El Paso
Jonathan Eggleston and Lori Reeder
Data Science and Statistical Agencies
Prepared by:Dr.Hassan Sweillam
Assessing Disclosure Risk in Microdata
Four Challenges for Statistical Agencies
ECON 100 Lecture 16 Wednesday, November 12.
Big-Data Fundamentals
First order non linear pde’s
1d – Production Possibilities
Privacy-preserving Release of Statistics: Differential Privacy
eHealth in the Region of the Americas:
Privacy Preserving Data Publishing
Chapter One – Section Three Economic Choices and Decision Making
Differential Privacy in Practice
Differential Privacy in the Local Setting
The European Statistical Training Programme
Cambridge - The Importance of Communication within the Science Community Standard SC.7.N.1.7.
Low-Income Energy Assistance In FY 2009 & The Short Term Energy Outlook October, 2008.
2018 Digital Survey: Feedback & Analysis
Economic and Labor Foundations of Workforce Education
Classification Trees for Privacy in Sample Surveys
Research Infrastructures: Ensuring trust and quality of data
How Scientist Work Using Science Skills.
Trends and Variation in Health Insurance Coverage
Conducting Market Research Surveys
Privacy preserving cloud computing
Lucia Foster Chief Economist U.S. Census Bureau December 5-6, 2013
Field Experiments in Economics
Seminar in Economics Econ. 470
Published in: IEEE Transactions on Industrial Informatics
Field Experiments in Economics
Trustworthy Semantic Web
INTRODUCTION TO ECONOMICS
Psychology Chapter 2 Section 5: Ethical Issues
Differential Privacy (1)
Chapter 2 Sociologists Doing Research Section 1: Research Methods
Jerome Reiter Department of Statistical Science Duke University
Using paradata to explore users’ pathways through web surveys
Presentation transcript:

The Fate of Empirical Economics When All Data are Private John M. Abowd Cornell University and U.S. Census Bureau* Society of Government Economists May 13, 2016 *This work was performed as a Cornell faculty member before I joined the U.S. Census Bureau in an executive capacity. No confidential data from any source were used to develop this talk.

PLEASE DO NOT OPEN THE ENVELOPE!!!!!!!!!! First, a Live Demo! My assistants are now distributing sealed envelopes: CHOOSE YOUR OWN ENVELOPE DO NOT LET THE ASSISTANT CHOOSE PLEASE DO NOT OPEN THE ENVELOPE!!!!!!!!!!

Step 1 OPEN THE ENVELOPE CAREFULLY (IT IS RESEALABLE) REMOVE THE SHEET OF PAPER THAT SAYS “CIRCLE ONE” LEAVE THE OTHER SHEET OF PAPER IN THE ENVELOPE

Step 2 WITHOUT TAKING THE OTHER SHEET OF PAPER OUT OF THE ENVELOPE: READ THE QUESTION INSIDE YOUR ENVELOPE MEMORIZE YOUR ANSWER RESEAL THE ENVELOPE WITH THE QUESTION INSIDE

Step 3 ANSWER THE QUESTION ON THE SHEET OF PAPER THAT SAYS “CIRCLE ONE” HAND YOUR ANSWER TO ONE OF THE ASSISTANTS

Step 4 WHILE THE ASSISTANTS ARE TABULATING THE DATA, SHRED YOUR QUESTION ENVELOPE IN ONE OF SHREDDERS IN THE ROOM

Now, Let’s Analyze the Data

Randomized Response As a survey technique: Warner, Stanley L. (1965) “Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias” Journal of the American Statistical Association 60, no. 309: 63–69, DOI: 10.2307/2283137. As a privacy-preserving data analysis system Du and Zhan (2003) “Using Randomized Response Techniques for Privacy- Preserving Data Mining” SIGKDD ’03, August 24-27, 2003, Washington, DC, USA. DOI: 10.1145/956750.956810. Dwork and Roth (2014) “The Algorithmic Foundations of Differential Privacy.” Foundations and Trends in Theoretical Computer Science 9, nos. 3–4: 211– 407.

The Basic Economics Scientific data quality is a pure public good (non-rival, non- excludable) Quantifiable privacy protection is also a pure public good (or “bad,” when measured as “privacy loss”) when supplied using the methods I will discuss shortly Computer scientists have succeeded in providing feasible technology sets relating the public goods: data quality and privacy protection These technology sets generate a quantifiable production possibilities frontier between data quality and privacy protection

The Basic Economics II We can now estimate the marginal social cost of data quality as a function of privacy protection—a big step forward The CS models are silent (or, occasionally, just wrong) about how to choose a socially optimal location on the PPF because they ignore social preferences To solve the social choice problem, we need to understand how to quantify preferences for data quality v. privacy protection For this we use the Marginal Social Cost of data quality and the Marginal Social Benefit of data quality, both measured in terms of the required privacy loss

Where social scientists act like MSC = MSB Where computer scientists act like MSC = MSB

How Should We Measure MSB? Medical diagnosis example Consumer price index example Legislative apportionment example Generically: sum of all the marginal social benefits from every potential use Not: marginal social benefit of the highest-valued user (market solution)

Ideal Data Publication - Privacy Protection Systems To the maximum extent possible, scientific analysis should be performed on the original confidential data Publication of statistical results should respect a quantifiable privacy- loss budget constraint Data publication algorithms should provably compose Data publication algorithms should be provably robust to arbitrary ancillary information

Doing Data Analysis in This World Census Bureau already does this in some applications OnTheMap Survey of Income and Program Participation Synthetic Data Synthetic Longitudinal Business Database Google does this Randomized Aggregatable Privacy Preserving Ordinal Responses (tool for Cloud service providers to harvest browser data) Prototype systems allow medical record databases to do this Privacy-preserving deep learning Computational healthcare

Doing Data Analysis in This World See: Abowd and Schmutte “Economic Analysis and Statistical Disclosure Limitation” Brookings Papers on Economic Activity (Spring 2015), http://www.brookings.edu/~/media/Projects/BPEA/Spring-2015- Revised/AbowdText.pdf?la=en Erlingsson, Pihur and Korolova “RAPPOR: Randomized Aggregatable Privacy- Preserving Ordinal Response” CCS’14, November 3–7, 2014, Scottsdale, Arizona, USA. ACM 978-1-4503-2957-6/14/11, http://dx.doi.org/10.1145/2660267.2660348 Shokri and Shmatikov “Privacy-Preserving Deep Learning” CCS’15, October 12–16, 2015, Denver, Colorado, USA. ACM 978-1-4503-3832-5/15/10, http://dx.doi.org/10.1145/2810103.2813687

Thank you! Contacts: john.abowd@cornell.edu