Differential Privacy Xintao Wu Oct 31, 2012

Sanitization approaches
Input perturbation
–Add noise to data
–Generalize data
Summary statistics
–Means, variances
–Marginal totals
–Model parameters
Output perturbation
–Add noise to summary statistics

Blending/hiding into a crowd
K-anonymity based approaches
Adversary may have various background knowledge to breach privacy
Privacy models often assume “the adversary’s background knowledge is given”

Classic intuition for privacy
Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database.
Security of encryption
–Anything about the plaintext that can be learned from a ciphertext can be learned without the ciphertext.
Prior and posterior views about an individual should not change much.

Motivation
Publicly release statistical information about a dataset without compromising the privacy of any individual.

Requirement
Anything that can be learned about a respondent from a statistical database should be learnable without access to the database.
Reduce the knowledge gained by joining the database.
Require that the probability distribution on the published results is essentially the same whether any individual opts in to, or opts out of, the dataset.

Definition
A randomized mechanism K gives ε-differential privacy if for all datasets D1 and D2 differing in at most one record, and for all sets of outputs S ⊆ Range(K):
Pr[K(D1) ∈ S] ≤ e^ε · Pr[K(D2) ∈ S]

Sensitivity function
Captures how great a difference must be hidden by the additive noise.
For a query f, the (global) sensitivity is Δf = max ||f(D1) − f(D2)||_1 over all pairs of datasets D1, D2 differing in one record.

Laplace distribution noise
Lap(b) has density p(x) = (1/2b) · exp(−|x|/b), with mean 0 and variance 2b².

Gaussian noise
Gaussian noise, calibrated to the L2 sensitivity, yields the relaxed (ε, δ)-differential privacy guarantee.

Adding Laplace noise
K(D) = f(D) + Lap(Δf/ε) satisfies ε-differential privacy.
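
A minimal sketch of the Laplace mechanism in Python (the value 42 for `true_answer` is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Return an epsilon-differentially private answer by adding
    Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# Example: a count query has sensitivity 1 (adding or removing one
# record changes the count by at most 1), so we add Lap(1/epsilon).
noisy_count = laplace_mechanism(true_answer=42, sensitivity=1.0, epsilon=0.1)
```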

Proof sketch
For any output t, the ratio of the densities of K(D1) and K(D2) at t is exp((|t − f(D2)| − |t − f(D1)|) · ε/Δf) ≤ exp(ε · |f(D1) − f(D2)|/Δf) ≤ e^ε, by the triangle inequality and the definition of Δf.

[Figures: probability density of the added Laplace noise for Δf = 1 with ε ∈ {0.01, 0.1, 1, 2, 10}, and for Δf = 2, 3, and 10000 with ε varying; the noise scale Δf/ε grows with Δf and shrinks as ε increases.]
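
The pattern in those plots follows directly from the noise scale b = Δf/ε; a quick sketch that tabulates the scale (and standard deviation √2·b) over the parameter grid shown above:

```python
import math

for delta_f in (1, 2, 3, 10000):
    for eps in (0.01, 0.1, 1, 2, 10):
        b = delta_f / eps  # Laplace scale parameter
        print(f"delta_f={delta_f:>5}  eps={eps:>5}  "
              f"scale={b:>10.1f}  std={math.sqrt(2) * b:>10.1f}")
```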

Composition
Sequential composition –for a sequence of analyses on the same data, the privacy budgets (ε values) add up.
Parallel composition –for disjoint sets, the ultimate privacy guarantee depends only on the worst of the guarantees of each analysis, not the sum.
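
A small sketch of how the two composition rules account for budget (the function names below are our own, not from any library):

```python
def sequential_budget(epsilons):
    # Analyses on the same data: privacy losses add up.
    return sum(epsilons)

def parallel_budget(epsilons):
    # Analyses on disjoint subsets: only the worst guarantee matters.
    return max(epsilons)

# Three queries at eps=0.1 each on the same table cost 0.3 in total,
# but the same three queries on disjoint partitions cost only 0.1.
print(sequential_budget([0.1, 0.1, 0.1]))  # ~0.3
print(parallel_budget([0.1, 0.1, 0.1]))    # 0.1
```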

Example
Assume a table with 1000 customers, where each record has attributes: name, gender, city, cancer, salary.
–For attribute city, the domain size is 10.
–For attribute cancer, we record only Yes or No for each customer.
–For attribute salary, the domain range is 0-10k.
–The privacy budget ε is a constant 0.1 set by the data owner.
Consider the single query “How many customers got cancer?” The adversary is allowed to ask this query three times.
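
Under sequential composition, three repetitions of the count query must share the overall budget ε = 0.1; one reasonable policy (assumed here, not stated on the slide) is an even split, so each answer uses ε/3. A sketch with a made-up true count:

```python
import numpy as np

rng = np.random.default_rng()

total_epsilon = 0.1           # budget set by the data owner
queries = 3                   # the adversary asks the count three times
eps_per_query = total_epsilon / queries

true_count = 137              # hypothetical number of customers with cancer
sensitivity = 1.0             # one customer changes the count by at most 1

# Each answer gets Laplace noise of scale 1 / (0.1/3) = 30.
answers = true_count + rng.laplace(0.0, sensitivity / eps_per_query, size=queries)
print(answers)
```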

Example (continued)
“How many customers got cancer in each city?” –the 10 cities partition the customers into disjoint sets, so parallel composition applies.
Consider the single query “What is the sum of salaries across all customers?” –one record can change the sum by up to 10k, so the sensitivity is 10,000.
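
A sketch of both queries under the same budget, with made-up data: each per-city count still needs only Lap(1/ε) noise because the cells are disjoint, while the salary sum needs noise scaled to its much larger sensitivity.

```python
import numpy as np

rng = np.random.default_rng()
epsilon = 0.1

# Per-city cancer counts: 10 disjoint cells, sensitivity 1 each,
# so the whole histogram costs only epsilon (parallel composition).
true_histogram = rng.integers(0, 100, size=10)   # hypothetical counts per city
noisy_histogram = true_histogram + rng.laplace(0.0, 1.0 / epsilon, size=10)

# Salary sum: one record can change the sum by up to 10,000,
# so the noise scale is 10000 / epsilon = 100,000.
true_sum = 5_000_000                             # hypothetical total salary
noisy_sum = true_sum + rng.laplace(0.0, 10_000 / epsilon)
```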

Type of computation (query)
–Some are very sensitive, others are not
–Single query vs. query sequence
–Query on disjoint sets or not
–Outcome expected: number vs. arbitrary
–Interactive vs. non-interactive

Sensitivity
–Global sensitivity: the maximum change of f over all pairs of neighboring datasets
–Local sensitivity: the maximum change of f over the neighbors of the actual input dataset
–Smooth sensitivity: a smooth upper bound on local sensitivity that can safely be used to calibrate noise

Different areas of DP
–PINQ
–Data mining with DP
–Optimizing linear counting queries under differential privacy: the matrix mechanism for answering a workload of predicate counting queries
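
A minimal numpy sketch of the matrix mechanism idea: answer a strategy matrix A with Laplace noise calibrated to A's sensitivity, then reconstruct the workload W from the noisy strategy answers (all names here are illustrative, not the paper's API):

```python
import numpy as np

def matrix_mechanism(x, W, A, epsilon, rng):
    """Answer workload W over histogram x via strategy A."""
    # L1 sensitivity of the strategy: max column L1 norm of A
    # (neighboring histograms differ by 1 in a single cell).
    delta_A = np.abs(A).sum(axis=0).max()
    noisy = A @ x + rng.laplace(0.0, delta_A / epsilon, size=A.shape[0])
    # Reconstruct the workload answers via the pseudoinverse of A.
    return W @ np.linalg.pinv(A) @ noisy
```

Choosing a good strategy A for a given workload W is the optimization problem the paper studies; with A = I this degenerates to answering each cell independently.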

PPDM interface –PINQ
–A programmable privacy-preserving layer
–Adds calibrated noise to each query
–Needs an assigned privacy cost budget
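
A toy Python sketch of a PINQ-style layer that tracks a privacy budget and charges each noisy query against it (PINQ itself is a C# LINQ wrapper; the class below is our own illustration, not its API):

```python
import numpy as np

class BudgetedData:
    """Toy PINQ-style wrapper: every query spends part of a fixed budget."""

    def __init__(self, data, budget):
        self.data = data
        self.budget = budget
        self.rng = np.random.default_rng()

    def noisy_count(self, predicate, epsilon):
        if epsilon > self.budget:
            raise ValueError("privacy budget exhausted")
        self.budget -= epsilon   # charge the query against the budget
        true = sum(1 for row in self.data if predicate(row))
        return true + self.rng.laplace(0.0, 1.0 / epsilon)  # count sensitivity = 1
```

Usage would look like `db = BudgetedData(rows, budget=0.1)` followed by `db.noisy_count(lambda r: r["cancer"] == "Yes", epsilon=0.05)`; a third such call at ε = 0.05 would be refused.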

Data mining with DP
–Previous study: a privacy-preserving interface ensures everything about DP
–Problem: inferior results if the interface is used naively during data mining
–Solution: consider privacy and mining together
DP ID3
–Uses noisy counts
–Evaluates all attributes in one exponential-mechanism query using the entire budget, instead of splitting the budget among multiple queries
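
The single-query attribute selection relies on the exponential mechanism: pick a candidate with probability proportional to exp(ε·q/(2Δq)). A generic sketch (the quality function, e.g. information gain, and its sensitivity are left to the caller):

```python
import numpy as np

def exponential_mechanism(candidates, quality, sensitivity, epsilon, rng):
    """Pick one candidate with probability proportional to
    exp(epsilon * quality / (2 * sensitivity))."""
    scores = np.array([quality(c) for c in candidates])
    # Subtract the max score for numerical stability before exponentiating.
    weights = np.exp(epsilon * (scores - scores.max()) / (2.0 * sensitivity))
    probs = weights / weights.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```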

DP in Social Networks
See the PAKDD’11 tutorial.