Wavelet and Matrix Mechanism
CompSci 590.03, Fall 12, Lecture 11
Instructor: Ashwin Machanavajjhala

Announcement
Project proposal submission deadline is Fri, Oct 12, noon.

Recap: Laplace Mechanism
Thm: If the sensitivity of the query is S, then adding Laplace noise with parameter λ guarantees ε-differential privacy when λ = S/ε.
Sensitivity: the smallest number S(q) such that, for any d, d’ differing in one entry, || q(d) - q(d’) || ≤ S(q)
Histogram query: Sensitivity = 2
– Variance / error on each entry = 2λ² = 2×4/ε²
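The recap above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not code from the lecture: the names `laplace_noise` and `laplace_mechanism` and the example counts are assumptions, and a Laplace sample is drawn as the difference of two exponential samples.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answers, sensitivity, epsilon, rng=None):
    """Add Laplace(lambda) noise, with lambda = sensitivity / epsilon, to every answer."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon          # lambda = S / epsilon
    return [a + laplace_noise(scale, rng) for a in true_answers]

# Hypothetical histogram counts; sensitivity 2 as stated on the slide.
counts = [12, 7, 30, 5]
epsilon = 1.0
noisy_counts = laplace_mechanism(counts, sensitivity=2, epsilon=epsilon,
                                 rng=random.Random(0))
per_entry_variance = 2 * (2 / epsilon) ** 2   # 2 * lambda^2 = 2 x 4 / epsilon^2
```

Passing a seeded `random.Random` only makes the sketch reproducible; a real release would not use a fixed seed.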

Laplace Mechanism is Suboptimal
Query 1: Number of cancer patients
Query 2: Number of cancer patients
If you answer both using the Laplace mechanism
– Sensitivity = 2
– Error in each answer: 2×4/ε²
– Average of the two answers gives an error of 4/ε²
If you just answer the first and return the same answer
– Sensitivity = 1
– Error in the answer: 2/ε²
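The arithmetic behind these numbers, written out as a worked equation. The symbols \tilde{y}_1, \tilde{y}_2 (the two noisy answers) are notation introduced here, and the two noise draws are assumed independent.

```latex
\begin{align*}
\operatorname{Var}(\tilde{y}_1) = \operatorname{Var}(\tilde{y}_2)
  &= 2\lambda^2 = 2\left(\frac{2}{\varepsilon}\right)^2 = \frac{8}{\varepsilon^2}
  && \text{(both answered, sensitivity 2)} \\
\operatorname{Var}\!\left(\frac{\tilde{y}_1 + \tilde{y}_2}{2}\right)
  &= \frac{1}{4}\left(\frac{8}{\varepsilon^2} + \frac{8}{\varepsilon^2}\right)
   = \frac{4}{\varepsilon^2}
  && \text{(average of the two)} \\
\operatorname{Var}(\tilde{y}_1)
  &= 2\left(\frac{1}{\varepsilon}\right)^2 = \frac{2}{\varepsilon^2}
  && \text{(only one answered, sensitivity 1)}
\end{align*}
```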

Outline
Constrained inference
– Ensure that the returned answers are consistent with each other.
Query Strategy
– Answer a different set of strategy queries A
– Answer original queries using A
– Universal Histograms
– Wavelet Mechanism [Xiao et al ICDE 09]
– Matrix Mechanism [Li et al PODS 10]

Note
The following solution ideas are useful whenever
– You want to answer a set of correlated queries.
– Queries are based on noisy measurements.
– Each measurement (x1 or x1+x2) has similar variance.

Range Queries
Given a set of values {v1, v2, …, vn}
Let xi = number of tuples with value vi.
Range query: q(j,k) = xj + … + xk
Q: Suppose we want to answer all range queries?

Range Queries
Q: Suppose we want to answer all range queries?
Strategy 1: Answer all range queries using Laplace mechanism
Sensitivity = O(n²)
O(n⁴/ε²) total error across all range queries.
May reduce using constrained optimization …

Range Queries
Q: Suppose we want to answer all range queries?
Strategy 2: Answer all xi queries using Laplace mechanism
Answer range queries using noisy xi values.
O(1/ε²) error for each xi.
Error(q(1,n)) = O(n/ε²)
Total error on all range queries: O(n³/ε²)
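To see where the O(n³/ε²) total comes from, the sketch below simply adds up the error of every range when each noisy xi carries the same independent variance. The function name and the unit variance used in the example are illustrative assumptions.

```python
def total_range_error(n, per_count_variance):
    """Sum the error of every range q(j, k), 1 <= j <= k <= n, under Strategy 2.

    A range covering (k - j + 1) noisy counts accumulates that many
    independent per-count variances.
    """
    total = 0.0
    for j in range(1, n + 1):
        for k in range(j, n + 1):
            total += (k - j + 1) * per_count_variance
    return total

# With unit variance per count, the sum equals n(n+1)(n+2)/6, which grows as n^3.
n = 8
total = total_range_error(n, per_count_variance=1.0)
closed_form = n * (n + 1) * (n + 2) / 6
```

For n = 8 both `total` and `closed_form` evaluate to 120.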

Universal Histograms for Range Queries [Hay et al VLDB 2010]
Strategy 3: Answer sufficient statistics using Laplace mechanism
Answer range queries using noisy sufficient statistics.
Tree of counts (leaves to root):
x1 x2 x3 x4 x5 x6 x7 x8
x12 x34 x56 x78
x1234 x5678
x1-8

Universal Histograms for Range Queries
Sensitivity: log n
q(2,6) = x2 + x3 + x4 + x5 + x6, Error = 2 × 5 log² n/ε²
q(2,6) = x2 + x34 + x56, Error = 2 × 3 log² n/ε²

Universal Histograms for Range Queries
Every range query can be answered by summing at most log n different noisy answers
Maximum error on any range query = O(log³ n / ε²)
Total error on all range queries = O(n² log³ n / ε²)
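The decomposition of a range into tree nodes can be sketched as a short recursion. This is an illustration under stated assumptions: n is a power of two, positions are 1-indexed, the function name `decompose` is hypothetical, and the per-node variance 2 log² n/ε² is the figure used on the slides (with log taken base 2).

```python
import math

def decompose(j, k, lo, hi):
    """Return the largest tree nodes (lo, hi) that exactly tile the range [j, k]."""
    if j > hi or k < lo:            # node lies entirely outside the range
        return []
    if j <= lo and hi <= k:         # node lies entirely inside: use its noisy count
        return [(lo, hi)]
    mid = (lo + hi) // 2            # partial overlap: recurse into both children
    return decompose(j, k, lo, mid) + decompose(j, k, mid + 1, hi)

n, epsilon = 8, 1.0
nodes = decompose(2, 6, 1, n)       # q(2,6) on the 8-leaf tree
per_node_variance = 2 * math.log2(n) ** 2 / epsilon ** 2
range_error = len(nodes) * per_node_variance
```

Here `nodes` is `[(2, 2), (3, 4), (5, 6)]`, i.e. x2 + x34 + x56, matching the three-term answer on the earlier slide.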

Outline
Constrained inference
– Ensure that the returned answers are consistent with each other.
Query Strategy
– Answer a different set of strategy queries A
– Answer original queries using A
– Universal Histograms
– Wavelet Mechanism
– Matrix Mechanism

Wavelet Mechanism
Step 1: Compute wavelet coefficients: x1, x2, x3, x4, x5, …, xn → C1, C2, C3, …, Cm
Step 2: Add noise to coefficients: C1+η1, C2+η2, C3+η3, …, Cm+ηm
Step 3: Reconstruct original counts: y1, y2, y3, y4, y5, …, yn

Haar Wavelet

Haar Wavelet
For an internal node,
Let a = average of leaves in left subtree
Let b = average of leaves in right subtree

Haar Wavelet Reconstruction
Sum of coefficients on root to leaf path
+ if xi is in the left subtree of the coefficient
- if xi is in the right subtree
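A small sketch of the transform and the reconstruction rule. The slide's formula for the coefficient did not survive in this transcript, so the sketch assumes the convention of Xiao et al.: an internal node stores c = (a - b)/2, and a base coefficient c0 stores the average of all leaves. Function names and the example counts are illustrative, and n is assumed to be a power of two.

```python
def haar_coefficients(x):
    """Return (c0, levels); levels[0] holds the root coefficient, levels[-1] the deepest."""
    averages = [float(v) for v in x]
    levels = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        levels.append([(a - b) / 2 for a, b in pairs])   # c = (a - b) / 2 (assumed)
        averages = [(a + b) / 2 for a, b in pairs]       # subtree averages one level up
    levels.reverse()
    return averages[0], levels

def reconstruct(c0, levels):
    """Rebuild the counts by walking root to leaf: +c going left, -c going right."""
    values = [c0]
    for detail in levels:
        next_values = []
        for v, c in zip(values, detail):
            next_values.extend([v + c, v - c])
        values = next_values
    return values

x = [9, 3, 6, 2, 8, 4, 5, 7]          # hypothetical counts
c0, levels = haar_coefficients(x)
x_back = reconstruct(c0, levels)
```

The round trip `reconstruct(*haar_coefficients(x))` returns the original counts; in the wavelet mechanism, noise would be added to `c0` and `levels` before reconstructing.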

Haar Wavelet: Range Queries
Range Query: number of tuples in a range S = [a,b]
Let α(c) be the number of values in the left subtree of c that are in S
Let β(c) be the number of values in the right subtree of c that are in S

Haar Wavelet: Range Queries
α(c) - β(c) = 0 when no leaves under c are contained in S
α(c) - β(c) = 0 when all leaves under c are contained in S
Only need to consider those coefficients with partial overlap with the range.
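The role of α(c) - β(c) follows from summing the reconstruction rule over the leaves in S. The identity below is a derivation under the assumed convention that each xi equals the base coefficient c0 plus the signed coefficients on its root-to-leaf path; |S| denotes the number of values in the range.

```latex
\begin{align*}
q(S) \;=\; \sum_{i \in S} x_i
     \;=\; \sum_{i \in S} \Bigl( c_0 + \sum_{c \,\in\, \mathrm{path}(i)} \pm\, c \Bigr)
     \;=\; |S|\, c_0 \;+\; \sum_{c} \bigl( \alpha(c) - \beta(c) \bigr)\, c
\end{align*}
```

Each coefficient c is counted +1 for every leaf of S in its left subtree and -1 for every leaf of S in its right subtree, which is why coefficients with no overlap or full overlap drop out.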

Haar Wavelet
For an internal node,
Let a = average of leaves in left subtree
Let b = average of leaves in right subtree

Adding noise to wavelet coefficients
Associate each coefficient c with a weight W(c)
level(c) = height of c in the tree.
Generalized sensitivity (ρ)

Adding noise to wavelet coefficients
Theorem: Adding noise to a coefficient c from Laplace(λ/W(c)) guarantees (2ρ/λ)-differential privacy.
Proof:

Generalized Sensitivity of Wavelet Mechanism
Proof: If some xi changes by 1, any affected coefficient changes by 1/m, where m is the number of values in its subtree, and m = W(c), so the weighted change of each affected coefficient is 1.
Only c0 and the coefficients on one root to leaf path change if some xi changes by 1.
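The proof idea can be checked numerically. The sketch below rests on assumptions labeled here: the weight W(c) is taken to be the number of leaves under c (the convention in Xiao et al., with W(c0) = n), the coefficient is c = (a - b)/2, and n is a power of two. Function names and counts are illustrative.

```python
import math

def haar_with_weights(x):
    """Return a list of (coefficient, weight) pairs; weight = leaves under the node (assumed)."""
    averages = [float(v) for v in x]
    coeffs = []
    subtree_size = 2
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        coeffs += [((a - b) / 2, subtree_size) for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        subtree_size *= 2
    coeffs.append((averages[0], len(x)))      # base coefficient c0 with weight n
    return coeffs

def weighted_change(x, i):
    """Sum of W(c) * |change in c| when x[i] is increased by 1."""
    bumped = list(x)
    bumped[i] += 1
    before, after = haar_with_weights(x), haar_with_weights(bumped)
    return sum(w * abs(a - b) for (a, w), (b, _) in zip(after, before))

x = [9, 3, 6, 2, 8, 4, 5, 7]                  # hypothetical counts
changes = [weighted_change(x, i) for i in range(len(x))]
bound = 1 + math.log2(len(x))                 # c0 plus one coefficient per level
```

Every entry of `changes` equals `bound`: c0 and the log n coefficients on one root-to-leaf path each contribute a weighted change of 1.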

Error in answering range queries
Range query depends on at most O(log n) coefficients.
Error in each coefficient is at most O(log² n/ε²)
Error in a range query is O(log³ n/ε²)

Summary of Wavelet Mechanism
Query Strategy: use wavelet coefficients
Can be computed in linear time
Noise in each range query: O(log³ n/ε²)

Outline
Constrained inference
– Ensure that the returned answers are consistent with each other.
Query Strategy
– Answer a different set of strategy queries A
– Answer original queries using A
– Universal Histograms
– Wavelet Mechanism
– Matrix Mechanism

Linear Queries
A set of linear queries can be represented by a matrix
X = [x1, x2, x3, x4] is a vector representing the counts of 4 values
H₄X represents the following 7 queries
– x1+x2+x3+x4
– x1+x2
– x3+x4
– x1
– x2
– x3
– x4

Query Matrices
Identity; Binary Index; Haar Wavelet

Sensitivity of a Query Matrix
How many queries are affected by a change in a single count?
Sensitivity = 1; Sensitivity = 3
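A sketch of the matrix view, using plain Python lists. The rows of `H4` are written out from the seven queries listed earlier; the helper names `answer` and `sensitivity` and the example counts are assumptions. Sensitivity is computed as the largest column L1 norm, which for a 0/1 matrix is the largest number of queries that touch a single count.

```python
H4 = [
    [1, 1, 1, 1],   # x1+x2+x3+x4
    [1, 1, 0, 0],   # x1+x2
    [0, 0, 1, 1],   # x3+x4
    [1, 0, 0, 0],   # x1
    [0, 1, 0, 0],   # x2
    [0, 0, 1, 0],   # x3
    [0, 0, 0, 1],   # x4
]
I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

def answer(matrix, x):
    """Evaluate every linear query (row) on the count vector x."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]

def sensitivity(matrix):
    """Largest column L1 norm: how much the answers move when one count changes by 1."""
    return max(sum(abs(row[j]) for row in matrix) for j in range(len(matrix[0])))

x = [3, 1, 4, 2]                      # hypothetical counts
answers = answer(H4, x)
sens_identity, sens_h4 = sensitivity(I4), sensitivity(H4)
```

For these two matrices the function returns 1 for the identity and 3 for `H4`.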

Laplace Mechanism
Labeled terms: Sensitivity; Noise; Vector of Laplace(1)
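The equation on this slide was not preserved in the transcript. A likely form, following the notation of Li et al. (PODS 2010) and offered here as a reconstruction rather than the slide's exact text, is:

```latex
\[
\mathcal{L}(W, x) \;=\; W x \;+\; \frac{\Delta_W}{\varepsilon}\,\tilde{b},
\qquad \tilde{b} \sim \mathrm{Laplace}(1)^m
\]
```

Here W is the m × n query matrix, x the vector of counts, Δ_W the sensitivity of W, and b̃ a vector of m independent Laplace(1) samples; the symbol b̃ is an assumption.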

Matrix Mechanism
Original Data → Noisy Representation → Reconstructed Data → Final query answer
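The four stages can be sketched with NumPy. This is an illustration under assumptions, not the lecture's code: the reconstruction step is taken to be ordinary least squares via the pseudo-inverse of A (as in Li et al.), the strategy A is assumed to have full column rank, and the workload `W`, the counts `x`, and the function name are hypothetical.

```python
import numpy as np

def matrix_mechanism(W, A, x, epsilon, rng):
    """Answer workload W by noisily answering strategy A and reconstructing."""
    W, A, x = (np.asarray(m, dtype=float) for m in (W, A, x))
    sens_A = np.abs(A).sum(axis=0).max()               # sensitivity of the strategy
    noisy = A @ x + rng.laplace(loc=0.0, scale=sens_A / epsilon, size=A.shape[0])
    x_hat = np.linalg.pinv(A) @ noisy                  # least-squares estimate of x
    return W @ x_hat                                   # final query answers

# Strategy: the 7-query tree over 4 counts; workload: two range queries.
H4 = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
      [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
W = [[1, 1, 1, 0],     # x1+x2+x3
     [0, 1, 1, 1]]     # x2+x3+x4
x = [3, 1, 4, 2]
answers = matrix_mechanism(W, H4, x, epsilon=1.0, rng=np.random.default_rng(0))
```

Because the noise is added to the strategy answers and then pushed through a fixed linear map, the workload answers are automatically consistent with a single reconstructed count vector.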

Reconstruction

Matrix Mechanism

Error analysis
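The formulas on this slide were not preserved. For reference, the error expressions given in Li et al. (PODS 2010) for a full-column-rank strategy A with sensitivity Δ_A are stated below; treating them as the content of the slide is an assumption.

```latex
\begin{align*}
\mathrm{Error}_A(w) &= \frac{2}{\varepsilon^2}\,\Delta_A^2\; w\,(A^{\mathsf T}A)^{-1}\,w^{\mathsf T}
  && \text{(single query } w\text{)} \\
\mathrm{TotalError}_A(W) &= \frac{2}{\varepsilon^2}\,\Delta_A^2\;
  \operatorname{trace}\!\bigl(W\,(A^{\mathsf T}A)^{-1}\,W^{\mathsf T}\bigr)
  && \text{(whole workload } W\text{)}
\end{align*}
```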

Extreme strategies
Strategy A = Iₙ
– Noisily answer each xi
– Answer queries using noisy counts
– Good when each query hits a few values.
Strategy A = W
– Add noise to all the query answers
– Good when sensitivity is small
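The two extremes can be compared on toy workloads with NumPy. The sketch assumes the total-error expression from Li et al., (2/ε²) Δ_A² trace(W (AᵀA)⁻¹ Wᵀ), and uses a pseudo-inverse so that a rank-deficient strategy is still handled when the workload lies in its row space. The function name and both workloads are illustrative.

```python
import numpy as np

def total_error(W, A, epsilon=1.0):
    """Total squared error of answering workload W through strategy A (assumed formula)."""
    W, A = np.asarray(W, dtype=float), np.asarray(A, dtype=float)
    sens_A = np.abs(A).sum(axis=0).max()          # largest column L1 norm
    gram_inv = np.linalg.pinv(A.T @ A)            # (A^T A)^-1, or pseudo-inverse
    return 2.0 / epsilon ** 2 * sens_A ** 2 * np.trace(W @ gram_inv @ W.T)

n = 8
identity = np.eye(n)
# Workload 1: all n(n+1)/2 range queries over n counts.
all_ranges = np.array([[1.0 if j <= i <= k else 0.0 for i in range(n)]
                       for j in range(n) for k in range(j, n)])
# Workload 2: the single query x1 + ... + xn.
total_only = np.ones((1, n))

ranges_via_identity = total_error(all_ranges, identity)
ranges_via_workload = total_error(all_ranges, all_ranges)
total_via_identity = total_error(total_only, identity)
total_via_workload = total_error(total_only, total_only)
```

On these toy inputs the identity strategy gives the lower total for the all-ranges workload (whose sensitivity is large), while A = W gives the lower total for the single sum query (whose sensitivity is 1).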

Finding the Optimal Strategy
Find A that minimizes TotalError_A(W)
– Reduces to solving a semi-definite program with rank constraints
– O(n⁶) running time.
See paper for approximations and an interesting discussion on geometry.

Summary
A linear query workload and strategy can be modeled using matrices
Previous techniques for finding a better strategy to answer a batch of queries are subsumed by the matrix mechanism
General mechanism to answer queries.
Noise depends on the sensitivity of the strategy and (AᵀA)⁻¹

Next Class
Sparse Vector Technique
– Answering a workload of “sparse” queries

References
X. Xiao, G. Wang, J. Gehrke, “Differential Privacy via Wavelet Transform”, ICDE 2009
C. Li, M. Hay, V. Rastogi, G. Miklau, A. McGregor, “Optimizing Linear Queries under Differential Privacy”, PODS 2010