Differential Privacy
(Some content is borrowed from Adam Smith's slides.)


Outline
• Background
• Definition
• Applications

Background: Database Privacy
[Figure: individuals (you, Bob, Alice) contribute data, which undergoes collection and "sanitization" before release to users (government, researchers, marketers, …).]
The "census problem" involves two conflicting goals:
• Utility: users can extract "global" statistics
• Privacy: individual information stays hidden
How can these goals be formalized?

Database Privacy
Variations on this model have been studied in:
• Statistics
• Data mining
• Theoretical CS
• Cryptography
Each tradition has a different notion of what "privacy" means.

Background
• Interactive database query: a classical research problem for statistical databases. The goal is to prevent query inference, where malicious users submit multiple queries to deduce private information about some person. This problem has been studied for decades.
• Non-interactive: publish statistics, then destroy the data (e.g., micro-data publishing).

Basic Setting
[Figure: a database DB of rows x_1, …, x_n is held by a sanitizer "San" (using random coins); users (government, researchers, marketers, …) submit queries 1, …, T and receive answers 1, …, T.]
• Database DB = table of n rows, each in domain D
• D can be numbers, categories, tax forms, etc.
• This talk: D = {0,1}^d, e.g., Married?, Employed?, Over 18?, …

Examples of Sanitization Methods
• Input perturbation: change the data before processing, e.g., randomized response
• Summary statistics: means, variances, marginal totals (# of people with blue eyes and brown hair), regression coefficients
• Output perturbation: summary statistics with noise
• Interactive versions of the above: an auditor decides which queries are OK
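Randomized response, the input-perturbation example above, can be sketched as follows. This is a minimal illustration, not from the slides; the function names, the choice of truth probability, and the debiasing helper are my own.

```python
import random

def randomized_response(true_answer: bool, p_truth: float = 0.75) -> bool:
    # With probability p_truth report the truth; otherwise report a fair coin
    # flip. Each respondent thus has plausible deniability for any one answer.
    if random.random() < p_truth:
        return true_answer
    return random.random() < 0.5

def estimate_fraction(reports, p_truth: float = 0.75) -> float:
    # E[observed fraction] = p_truth * f + (1 - p_truth) / 2, so invert
    # that relation to recover an unbiased estimate of the true fraction f.
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth
```

Even though each individual report is noisy, the aggregate fraction can be estimated accurately from a large sample.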

Two Intuitions for Privacy
• "If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place." [Dalenius] Learning more about me should be hard.
• Privacy is "protection from being brought to the attention of others." [Gavison] Safety is blending into a crowd.

Why Not Use Crypto Definitions?
• Attempt #1. Definition: for every entry i, no information about x_i is leaked (as if encrypted). Problem: then no information at all is revealed! There is a tradeoff between privacy and utility.
• Attempt #2. Agree on summary statistics f(DB) that are safe. Definition: no information about DB is revealed except f(DB). Problem: how do we decide that f is safe? (Also: how do we figure out what f should be?)

Differential Privacy
The risk to my privacy should not substantially increase as a result of participating in a statistical database.

Differential Privacy
• No perceptible risk is incurred by joining the database.
• Any information the adversary can obtain with my data, it could (almost) obtain without it.
Formally, a randomized mechanism K gives ε-differential privacy if for all databases D1 and D2 differing in at most one row, and for every set of outputs S ⊆ Range(K):
Pr[K(D1) ∈ S] ≤ e^ε · Pr[K(D2) ∈ S]
[Figure: the output distributions Pr[K(D1) = t] and Pr[K(D2) = t] are pointwise close.]

Sensitivity of Functions
For f : D^n → R^k, the (global) sensitivity of f is
Δf = max ‖f(D1) − f(D2)‖_1, taken over all D1, D2 differing in one row.
For example, a counting query ("how many rows satisfy predicate P?") has sensitivity 1.

Design of the Randomization K
• K adds noise drawn from the Laplace distribution to the function output f(x): noise is added independently to each of the k dimensions.
• Other distributions can be used; the Laplace distribution is simply easier to manipulate.
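The Laplace mechanism described above can be sketched as follows. This is a minimal sketch, assuming the standard calibration of noise scale Δf/ε; the function name and signature are my own.

```python
import numpy as np

def laplace_mechanism(f_value, sensitivity: float, epsilon: float,
                      rng: np.random.Generator):
    # Add Laplace noise with scale Δf/ε independently to each of the k
    # output dimensions; this yields ε-differential privacy for f.
    f_value = np.asarray(f_value, dtype=float)
    return f_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                                 size=f_value.shape)
```

For a counting query (sensitivity 1) with ε = 1, each answer receives Laplace(1) noise, so the noise magnitude is a small constant regardless of database size.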

• For d functions f1, …, fd, the noise must be calibrated to the sum of their sensitivities: the quality of each answer deteriorates as the total sensitivity of the queries grows.
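The point above can be illustrated with naive sequential composition. This is a sketch under my own naming; it assumes the simple strategy of scaling every answer's noise to the sum of the sensitivities so the answers jointly satisfy ε-DP.

```python
import numpy as np

def answer_queries(f_values, sensitivities, total_epsilon: float,
                   rng: np.random.Generator):
    # Naive sequential composition: calibrating every answer's Laplace noise
    # to the SUM of the sensitivities makes the d answers jointly
    # total_epsilon-DP, so each individual answer gets noisier as more
    # queries are asked.
    scale = sum(sensitivities) / total_epsilon
    return [v + rng.laplace(scale=scale) for v in f_values]
```

With two sensitivity-1 queries and a total budget ε = 1, each answer gets Laplace(2) noise, i.e., twice the noise a single query would need.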

Typical Application
• Histogram query: partition the multidimensional database into cells and report the count of records in each cell. Each record affects only one cell's count, so the sensitivity stays low no matter how many cells there are.
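A private histogram along these lines can be sketched as follows (a minimal one-dimensional illustration; the function name and parameters are my own):

```python
import numpy as np

def private_histogram(data, bins: int, epsilon: float,
                      rng: np.random.Generator):
    # Adding or removing one record changes exactly one cell count by 1, so
    # the whole histogram has L1 sensitivity 1: Laplace(1/ε) noise per cell
    # suffices for ε-DP, independent of the number of cells.
    counts, edges = np.histogram(data, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    return noisy, edges
```

Note the contrast with answering each cell count as a separate query: because the cells partition the data, one noise draw per cell at scale 1/ε is enough.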

Application: Contingency Tables
• A contingency table for k-dimensional boolean data contains the count for each of the 2^k value combinations.
• It can be treated as a histogram: add Laplace noise to each entry.
• Drawback: the accumulated noise in marginals (sums over many entries) can be large.

Halfspace Queries
• We publish answers to a set of canonical halfspace queries; any non-canonical query can be mapped to canonical ones to obtain an approximate answer.

Applications
• Privacy Integrated Queries (PINQ): provides analysts with a programming interface to unscrubbed data through a SQL-like language.
• Airavat: a MapReduce-based system that provides strong security and privacy guarantees for distributed computations on sensitive data.