A Whirlwind Tour of Differential Privacy


A Whirlwind Tour of Differential Privacy Aaron Roth February 14, 2015 <3

Protecting Privacy is Important Class action lawsuit accuses AOL of violating the Electronic Communications Privacy Act, seeks $5,000 in damages per user. AOL’s director of research is fired.

Protecting Privacy is Important Class action lawsuit (Doe v. Netflix) accuses Netflix of violating the Video Privacy Protection Act, seeks $2,000 in compensation for each of Netflix’s 2,000,000 subscribers. Settled for undisclosed sum, 2nd Netflix Challenge is cancelled.

Protecting Privacy is Important The National Human Genome Research Institute (NHGRI) immediately restricted pooled genomic data that had previously been publicly available.

But what is “privacy”?

But what is “privacy” not? Privacy is not hiding “personally identifiable information” (name, zip code, age, etc…)

But what is “privacy” not? Privacy is not releasing only “aggregate” statistics.

So what is privacy? Idea: Privacy is about promising people freedom from harm. Attempt 1: “An analysis of a dataset D is private if the data analyst knows no more about Alice after the analysis than he knew about Alice before the analysis.”

So what is privacy? Problem: Impossible to achieve with auxiliary information. Suppose an insurance company knows that Alice is a smoker. An analysis that reveals that smoking and lung cancer are correlated might cause them to raise her rates! Was her privacy violated? This is exactly the sort of information we want to be able to learn… This is a problem even if Alice was not in the database!

So what is privacy? Idea: Privacy is about promising people freedom from harm. Attempt 2: “An analysis of a dataset D is private if the data analyst knows almost no more about Alice after the analysis than he would have known had he conducted the same analysis on an identical database with Alice’s data removed.”

Differential Privacy [Dwork-McSherry-Nissim-Smith 06] [Figure: the records of individuals (Alice, Bob, Xavier, Chris, Donna, Ernie) feed into an algorithm; for every output r, the ratio of Pr[r] on neighboring inputs is bounded.]

Differential Privacy X: the data universe. D ⊂ X: the dataset (one element per person). Definition: Two datasets D, D′ ⊂ X are neighbors if they differ in the data of a single individual.

Differential Privacy X: the data universe. D ⊂ X: the dataset (one element per person). Definition: An algorithm M is ε-differentially private if for all pairs of neighboring datasets D, D′, and for all outputs x: Pr[M(D) = x] ≤ (1 + ε) · Pr[M(D′) = x].
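
Not from the talk: a minimal Python sketch of randomized response, the textbook mechanism satisfying this definition. The function name and parameters are illustrative.

    import math, random

    def randomized_response(true_bit: int, epsilon: float) -> int:
        """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

        Changing one person's bit changes the probability of any report by at
        most a factor of e^eps, so the definition above is satisfied.
        """
        p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
        return true_bit if random.random() < p_truth else 1 - true_bit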

Some Useful Properties Theorem (Postprocessing): If M(D) is ε-private, and f is any (randomized) function, then f(M(D)) is ε-private.
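
As an illustration (again a hypothetical sketch, not code from the talk): debiasing the average of randomized-response reports is pure post-processing, so by the theorem it consumes no additional privacy budget.

    import math, random

    def rr_report(bit: int, epsilon: float) -> int:
        # Randomized response: truthful with probability e^eps / (1 + e^eps).
        p = math.exp(epsilon) / (1 + math.exp(epsilon))
        return bit if random.random() < p else 1 - bit

    def estimate_fraction(reports, epsilon):
        # Invert the known flipping probability; as post-processing of
        # eps-private reports, this estimate is still eps-private.
        p = math.exp(epsilon) / (1 + math.exp(epsilon))
        observed = sum(reports) / len(reports)
        return (observed - (1 - p)) / (2 * p - 1)

    true_bits = [1] * 300 + [0] * 700                  # 30% of people hold the bit
    reports = [rr_report(b, 0.5) for b in true_bits]
    print(estimate_fraction(reports, 0.5))             # roughly 0.3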

So… Definition: An algorithm M is ε-differentially private if for all pairs of neighboring datasets D, D′, and for all outputs x: Pr[M(D) = x] ≤ (1 + ε) · Pr[M(D′) = x]. x = [three slides each instantiate x with a different example output shown as an image]

Some Useful Properties Theorem (Composition): If M_1, …, M_k are each ε-private, then M(D) ≡ (M_1(D), …, M_k(D)) is kε-private.

So… You can go about designing algorithms as you normally would. Just access the data using differentially private “subroutines”, and keep track of your “privacy budget” as a resource. Private algorithm design, like regular algorithm design, can be modular.
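
One might track the budget with a small helper like this (a toy sketch, not a real library API):

    class PrivacyBudget:
        """Toy accountant: by the composition theorem, k calls to
        eps-private subroutines cost at most k*eps in total."""

        def __init__(self, total_epsilon: float):
            self.remaining = total_epsilon

        def spend(self, epsilon: float) -> None:
            if epsilon > self.remaining:
                raise RuntimeError("privacy budget exhausted")
            self.remaining -= epsilon

    budget = PrivacyBudget(1.0)
    for _ in range(5):
        budget.spend(0.1)   # five eps=0.1 subroutine calls: 0.5 spent so far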

Some simple operations: Answering Numeric Queries Def: A numeric function f has sensitivity c if for all neighboring D, D′: |f(D) − f(D′)| ≤ c. Write s_f ≡ c. E.g. “How many PhDs are in the building?” has sensitivity 1. “What fraction of people in the building have PhDs?” has sensitivity 1/n.

Some simple operations: Answering Numeric Queries The Laplace Distribution: [Figure: the density of the Laplace distribution Lap(b), (1/2b)·e^(−|x|/b).]

Some simple operations: Answering Numeric Queries The Laplace Mechanism: M_Lap(D, f, ε) = f(D) + Lap(s_f/ε). Theorem: M_Lap(⋅, f, ε) is ε-private. Pf: For any output x, the ratio of densities Pr[M_Lap(D) = x] / Pr[M_Lap(D′) = x] equals exp(ε(|x − f(D′)| − |x − f(D)|)/s_f), which by the triangle inequality is at most exp(ε|f(D) − f(D′)|/s_f) ≤ e^ε ≈ 1 + ε for small ε.

Some simple operations: Answering Numeric Queries The Laplace Mechanism: M_Lap(D, f, ε) = f(D) + Lap(s_f/ε). Theorem: The expected error is s_f/ε. (Can answer “what fraction of people in the building have PhDs?” with error 0.01%.)
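
A minimal sketch of the Laplace mechanism in Python (the concrete answers and n below are invented; n and ε are chosen so the expected error matches the 0.01% figure on the slide):

    import numpy as np

    def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
        # Release f(D) + Lap(s_f / eps): eps-private, expected error s_f / eps.
        return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # A counting query ("how many PhDs are in the building?") has sensitivity 1:
    noisy_count = laplace_mechanism(true_answer=42.0, sensitivity=1.0, epsilon=0.1)

    # A fraction query has sensitivity 1/n; with n = 100,000 and eps = 0.1 the
    # expected error is (1/n)/eps = 0.0001, i.e. 0.01%:
    n = 100_000
    noisy_fraction = laplace_mechanism(true_answer=0.3, sensitivity=1.0 / n, epsilon=0.1)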

Some simple operations: Answering Non-numeric Queries “What is the modal eye color in the building?” R = {Blue, Green, Brown, Red}. If you can define a quality function q that determines how “good” each outcome is for a fixed input, e.g. q(D, Red) = “fraction of people in D with red eyes”, then the exponential mechanism M_Exp can privately select a nearly optimal outcome.

Some simple operations: Answering Non-numeric Queries Theorem: M_Exp(D, R, q, ε) is ε-private, and outputs r ∈ R such that: max_{r* ∈ R} q(D, r*) − E[q(D, r)] ≤ (2·s_q/ε)·ln|R|. (Can return an eye color that is within < 0.03% of the most common in this building.)
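
A sketch of the exponential mechanism M_Exp (the eye-color dataset below is invented for illustration): sample each outcome r with probability proportional to exp(ε·q(D, r)/(2·s_q)).

    import numpy as np

    def exponential_mechanism(data, outcomes, quality, sensitivity, epsilon):
        scores = np.array([quality(data, r) for r in outcomes], dtype=float)
        logits = epsilon * scores / (2 * sensitivity)
        logits -= logits.max()              # subtract max for numerical stability
        probs = np.exp(logits)
        probs /= probs.sum()
        return np.random.choice(outcomes, p=probs)

    colors = ["Blue", "Green", "Brown", "Red"]
    data = ["Brown"] * 60 + ["Blue"] * 30 + ["Green"] * 9 + ["Red"]
    q = lambda D, r: sum(1 for x in D if x == r) / len(D)   # sensitivity 1/n
    winner = exponential_mechanism(data, colors, q, 1 / len(data), epsilon=0.5)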

So what can we do with that? Machine Learning/Regression; Statistical Estimation; Graph Analysis (Next talk!); Combinatorial Optimization; Spectral Analysis of Matrices; Anomaly Detection/Analysis of Data Streams; Convex Optimization; Equilibrium Computation; Computation of Optimal 1-sided and 2-sided Matchings; Pareto Optimal Exchanges; …

A Synthetic Netflix Dataset 18,000 movies (2^18,000 possible data records). 3-way marginal: “What fraction of people watched movies A, B, and C?” Error and running time on 1,000,000 randomly selected 3-way marginals. [Plots: error and running time (seconds), each as a function of ε.]

Other Uses of “Privacy” as Stability Economic: a tool for developing new incentive-compatible mechanisms, for combinatorial auctions, stable matchings, and more. Analytic: to prevent over-fitting and improve the accuracy of data analysis and scientific discovery. (Third talk this session!)

Thanks! To learn more: