1
Private Data Management with Verification
Yan Chen Duke University Advisor: Ashwin Machanavajjhala
2
Outline
Motivation
Private verification: differentially private regression diagnostics
Future work (ongoing): private verification on counting queries for data-dependent algorithms
Future work (idea): private data synthesis
Summary
4
Data Privacy
5
Differential Privacy
Definition 1: ε-Differential Privacy
A randomized algorithm M satisfies ε-differential privacy if, for any two neighboring datasets D1 and D2 and any set of outputs S,
Pr[M(D1) ∈ S] ≤ e^ε · Pr[M(D2) ∈ S]
[C. Dwork et al., ICALP 2006]
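As a small numeric illustration of the definition, the sketch below uses randomized response on a single private bit. Randomized response is not part of these slides; it is chosen here only because its output distribution is easy to write down. The two "neighboring datasets" are the bit values 0 and 1, and the function names are hypothetical.

```python
import math

def randomized_response_probs(true_bit, epsilon):
    """Output distribution of randomized response on one private bit.

    Assumption for illustration: the true bit is reported with
    probability e^eps / (1 + e^eps) and flipped otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}

def worst_case_ratio(epsilon):
    """Largest ratio Pr[M(D1) = s] / Pr[M(D2) = s] over outputs and orderings."""
    p_d1 = randomized_response_probs(0, epsilon)
    p_d2 = randomized_response_probs(1, epsilon)
    return max(max(p_d1[s] / p_d2[s], p_d2[s] / p_d1[s]) for s in (0, 1))

epsilon = 1.0
# The definition requires the ratio to stay at or below e^epsilon.
print(worst_case_ratio(epsilon) <= math.exp(epsilon) + 1e-12)
```

For a discrete output space it is enough to check single outputs s, since the bound for any set S follows by summing over its elements.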
6
Differential Privacy
Property 1 (Sequential Composition): If M1 and M2 satisfy ε1- and ε2-differential privacy respectively, then releasing both M1(D) and M2(D) satisfies (ε1+ε2)-differential privacy.
Property 2 (Parallel Composition): If D1 and D2 are subsets of D with D1 ∩ D2 = ∅, then releasing M1(D1) and M2(D2) satisfies max(ε1, ε2)-differential privacy.
Property 3 (Post-processing): If M3 is any algorithm that does not access D directly, then releasing M3(M1(D)) still satisfies ε1-differential privacy.
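The composition properties amount to simple budget arithmetic. The sketch below works through it with made-up budgets; the helper names are hypothetical and only restate the properties above.

```python
def sequential_budget(epsilons):
    """Total budget when every mechanism is run on the same dataset D."""
    return sum(epsilons)

def parallel_budget(epsilons):
    """Total budget when each mechanism is run on a disjoint subset of D."""
    return max(epsilons)

eps1, eps2 = 0.5, 0.25  # illustrative values only
print(sequential_budget([eps1, eps2]))  # M1(D) and M2(D) on the same data
print(parallel_budget([eps1, eps2]))    # M1(D1) and M2(D2) on disjoint data
# Post-processing M3(M1(D)) costs nothing extra: the budget stays eps1.
```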
7
Laplace Mechanism
Definition 2: Laplace Mechanism
For any function f: D → R^n, the Laplace Mechanism M is defined as M(D) = f(D) + η, where η is a vector of n independent random variables, each drawn from a Laplace distribution with scale Δ(f)/ε. Δ(f) is the global sensitivity of f. [C. Dwork et al., ICALP 2006]
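A minimal sketch of the mechanism using only the Python standard library. The function names are hypothetical, and the caller is assumed to supply the correct global sensitivity Δ(f) for the query being answered.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Return f(D) + eta, with eta_i ~ Laplace(sensitivity / epsilon)."""
    scale = sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in true_answer]

# Example: a 3-bin histogram of counts. Adding or removing one record
# changes one bin by 1, so the sensitivity is taken to be 1 here.
counts = [120.0, 45.0, 300.0]
print(laplace_mechanism(counts, sensitivity=1.0, epsilon=0.5))
```

Smaller ε gives a larger noise scale, so stronger privacy is paid for with less accurate answers.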
8
Private Data Management Framework
Data Curator, Data Synthesizer, Querier, Verifier
9
Framework - Open Questions
Differentially private algorithms for private verification on different tasks
Protection for data synthesis
11
Differentially Private Regression Diagnostics
Generate Model: algorithms exist for linear/logistic regression that ensure privacy.
Evaluate Model (Regression Diagnostics): no privacy-preserving techniques exist for regression diagnostics.
12
Differentially Private Regression Diagnostics
PriRP – Residual Plot (an error measure for linear regression)
PriROC – ROC curve (an error measure for logistic regression)
13
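Residual Plot
Linear regression models the outcome as y = Xβ + e, where e is a noise term.
Suppose b is the estimated model. The residual of each point i is r_i = y_i − x_i·b, i.e. the observed outcome minus the predicted value x_i·b.
Residual plot: residuals vs. predicted values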
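The points of a (non-private) residual plot can be computed as in the sketch below. It assumes each row of X already includes any intercept column and that b is a plain list of coefficients; the function names are hypothetical.

```python
def predict(x_row, b):
    """Predicted value for one point: the dot product of x_row and b."""
    return sum(x_j * b_j for x_j, b_j in zip(x_row, b))

def residual_plot_points(X, y, b):
    """Return (predicted value, residual) pairs, one per data point."""
    points = []
    for x_row, y_i in zip(X, y):
        y_hat = predict(x_row, b)
        points.append((y_hat, y_i - y_hat))
    return points

# Toy data: the first column of X is the intercept term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.5, 5.0]
b = [1.0, 2.0]
print(residual_plot_points(X, y, b))
```

Plotting the second coordinate against the first gives the residual plot; a well-specified model shows residuals scattered around zero with no visible trend.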
14
Residual Plot
15
Private Residual Plot - PriRP
1. Private Bounds Computation
2. Residual Plot Perturbation
16
Private Residual Plot - PriRP
Private Bounds Computation
The real bounds contain sensitive information about the data, and the sensitivity of the bounds is unbounded.
Q: How can we identify bounds (-b, b) such that, with high probability, at least a θ fraction of the points lie within (-b, b)?
SVT-based algorithm [C. Dwork 14], with queries q_i: how many points lie within the bounds (-u*2^i, u*2^i)?
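One way to read the SVT step is as the textbook AboveThreshold procedure applied to the doubling queries q_i. The sketch below follows that generic procedure and is not necessarily the exact PriRP algorithm. It assumes the dataset size n is public, that each q_i has sensitivity 1, and that `u`, `max_steps` and the function names are illustrative choices.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_bound(residuals, theta, epsilon, u=1.0, max_steps=60, rng=None):
    """Return the first bound b = u * 2^i whose noisy count passes theta * n.

    Generic AboveThreshold: the threshold gets Laplace(2/eps) noise and
    every query gets Laplace(4/eps) noise (sensitivity-1 queries).
    """
    rng = rng or random.Random()
    n = len(residuals)  # assumed public
    noisy_threshold = theta * n + laplace_noise(2.0 / epsilon, rng)
    for i in range(max_steps):
        b = u * 2 ** i
        q_i = sum(1 for r in residuals if -b < r < b)  # points inside (-b, b)
        if q_i + laplace_noise(4.0 / epsilon, rng) >= noisy_threshold:
            return b
    return u * 2 ** (max_steps - 1)  # no query passed: fall back to largest bound

residuals = [0.5 * k for k in range(-10, 11)]  # toy residuals in [-5, 5]
print(private_bound(residuals, theta=0.9, epsilon=1.0))
```

Because the candidate bounds double at each step, only a logarithmic number of queries is needed to cover a wide range of residual magnitudes.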
17
Private Residual Plot - PriRP
Residual Plot Perturbation
Q: How can we estimate a 2D probability density inside a bounded region?
1. Discretization
2. Perturbation
3. Sampling
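The three steps can be sketched as a noisy 2D histogram followed by resampling. This is a minimal illustration under stated assumptions: Laplace noise with sensitivity 1 per cell (add/remove-one-record neighbors), points outside the bounds are dropped, and the grid size and function names are hypothetical. The actual PriRP perturbation may differ.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_scatter(points, x_range, y_range, bins, epsilon, rng=None):
    rng = rng or random.Random()
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    dx, dy = (x_hi - x_lo) / bins, (y_hi - y_lo) / bins

    # 1. Discretization: count the points falling in each grid cell.
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            i = min(int((x - x_lo) / dx), bins - 1)
            j = min(int((y - y_lo) / dy), bins - 1)
            counts[i][j] += 1

    # 2. Perturbation: add Laplace noise to each cell, clamp negatives to 0.
    noisy = [[max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in row]
             for row in counts]

    # 3. Sampling: draw points uniformly inside each cell, one per noisy count.
    sampled = []
    for i in range(bins):
        for j in range(bins):
            for _ in range(round(noisy[i][j])):
                sampled.append((x_lo + (i + rng.random()) * dx,
                                y_lo + (j + rng.random()) * dy))
    return sampled

pts = [(0.1, 0.1)] * 50 + [(0.9, 0.9)] * 50
print(len(private_scatter(pts, (0.0, 1.0), (0.0, 1.0), bins=2, epsilon=1.0)))
```

Clamping, rounding and sampling only touch the noisy counts, so they are post-processing and do not consume additional privacy budget.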
18
Private Residual Plot - PriRP
Empirical Evaluation (data scale = 5000)
19
Private Residual Plot - PriRP
Empirical Evaluation
Define the similarity between the real RP and the perturbed RP:
1. Discretize the bounds of the real RP into 10×10 equal-width grid cells.
2. Compute the distribution of residuals over all grid cells c in the real RP and in the perturbed RP, denoted P(c) and P’(c).
20
Private Residual Plot - PriRP
Empirical Evaluation
21
ROC curve
22
ROC curve
ROC curve: TPR vs. FPR over all possible thresholds θ
AUC: area under the curve
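For reference, a non-private ROC curve and its AUC can be computed as in the sketch below. It assumes a point is predicted positive when its score is strictly greater than θ (the slides do not state the convention), that both classes are present, and it uses the trapezoid rule for the area. Function names are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return sorted (FPR, TPR) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for theta in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return sorted(points)

def auc(points):
    """Area under a curve given as sorted (FPR, TPR) points (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels, [0.0, 0.5, 0.65, 0.75, 1.0])
print(curve, auc(curve))
```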
23
Private ROC Curve - PriROC
1. Choosing Thresholds
2. Computing TPRs and FPRs
3. Ensuring Monotonicity
24
Private ROC Curve - PriROC
Choosing Thresholds
1. Data-independent strategy: fix |Θ| = N+1, Θ = {0, 1/N, …, (N-1)/N, 1}. Problem: performs poorly on skewed predictions.
2. Data-dependent strategy: iteratively choose thresholds that divide the data evenly, i.e. iteratively find medians to use as thresholds (using smooth sensitivity and handling invalid thresholds).
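The difference between the two strategies can be seen in a small sketch. The first function builds the data-independent grid exactly as defined above. The second is a deliberately NON-private illustration of recursive median splitting: it omits the smooth-sensitivity noise and the handling of invalid thresholds that the private version requires, and its name and `depth` parameter are hypothetical.

```python
import statistics

def uniform_thresholds(N):
    """Data-independent strategy: Θ = {0, 1/N, ..., (N-1)/N, 1}."""
    return [k / N for k in range(N + 1)]

def median_thresholds(scores, depth):
    """NON-private sketch: recursively split the scores at their median."""
    cuts = []

    def split(values, remaining):
        if remaining == 0 or len(values) < 2:
            return
        m = statistics.median(values)
        cuts.append(m)
        split([v for v in values if v < m], remaining - 1)
        split([v for v in values if v >= m], remaining - 1)

    split(sorted(scores), depth)
    return sorted(set([0.0, 1.0] + cuts))

skewed = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.09, 0.95]
print(uniform_thresholds(4))         # most scores land in the first interval
print(median_thresholds(skewed, 2))  # cuts concentrate where the scores are
```

On skewed predictions the uniform grid leaves most intervals empty, while median-based cuts place thresholds where the scores actually lie.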
25
Private ROC Curve - PriROC
Computing TPRs and FPRs
TPRs are computed from prefix range queries; FPRs are computed similarly.
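A minimal sketch of the prefix-query idea, under assumptions that go beyond the slide: scores lie in (0, 1], a point is predicted positive when its score exceeds θ, each record falls in exactly one (label, interval) bucket so bucket counts have sensitivity 1, and plain Laplace noise is added per bucket. The strategy PriROC actually uses to answer the prefix queries may be different; all names here are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_prefix_rates(scores, labels, thresholds, epsilon, rng=None):
    """Return (thresholds in descending order, noisy TPRs, noisy FPRs)."""
    rng = rng or random.Random()
    cuts = sorted(thresholds, reverse=True)  # e.g. 1, ..., 0
    m = len(cuts) - 1                        # number of intervals
    hist = {1: [0.0] * m, 0: [0.0] * m}
    for s, y in zip(scores, labels):
        for k in range(m):
            if cuts[k + 1] < s <= cuts[k]:   # interval (cuts[k+1], cuts[k]]
                hist[y][k] += 1
                break
    rates = {}
    for y in (1, 0):
        noisy = [c + laplace_noise(1.0 / epsilon, rng) for c in hist[y]]
        total = sum(noisy)
        running, out = 0.0, [0.0]            # rate at the highest threshold is 0
        for c in noisy:
            running += c                     # prefix sum = noisy count above cut
            out.append(running / total if total > 0 else 0.0)
        rates[y] = out
    return cuts, rates[1], rates[0]

cuts, tpr, fpr = noisy_prefix_rates([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0],
                                    [0.0, 0.5, 1.0], epsilon=1.0)
print(cuts, tpr, fpr)
```

With realistic ε the noisy rates can dip below 0 or fail to increase from one threshold to the next, which is why a monotonicity step is needed afterwards.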
26
Private ROC Curve - PriROC
Ensuring Monotonicity
To ensure monotonicity, apply the method from [Hay et al., VLDB 10].
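A standard way to enforce an ordering constraint on noisy values is least-squares isotonic regression, computed with the pool-adjacent-violators algorithm. The sketch below assumes that this is the kind of constrained inference meant by the cited method; it is a generic implementation, not code from that work, and the function name is hypothetical.

```python
def isotonic_nondecreasing(values):
    """Closest non-decreasing sequence to `values` in the least-squares sense.

    Pool adjacent violators: whenever a block's mean falls below the mean
    of the block before it, merge the two blocks and use the pooled mean.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    result = []
    for s, c in blocks:
        result.extend([s / c] * c)
    return result

noisy_tprs = [0.02, 0.41, 0.35, 0.80, 0.77, 1.01]  # made-up noisy values
print(isotonic_nondecreasing(noisy_tprs))
```

Since the projection only uses the noisy values, it is post-processing and leaves the privacy guarantee unchanged.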
27
Private ROC Curve - PriROC
Empirical Evaluation
28
Private ROC Curve - PriROC
Empirical Evaluation: AUC and Symmetric Difference
30
Future Work - Verification
Counting queries
1. Data-independent algorithms (easy), e.g. the Laplace Mechanism
2. Data-dependent algorithms (hard): the error is data dependent
31
Future Work - Verification
Definition: Sensitivity of a Randomized Algorithm
For any randomized algorithm A: D → R with random variable stream N, we say that A has sensitivity Δ if, for any two neighboring datasets D and D’ and any fixed value of N,
|A(D; N) − A(D’; N)| ≤ Δ
Theorem: If a randomized algorithm A has sensitivity Δ, then … satisfies ε-differential privacy and …
32
Future Work - Verification
Another interesting problem: given an error bound, release the output only when, with high probability, its error is within that bound.
34
Future Work - Data Synthesis
Queries on the synthetic data reveal information about the synthetic data.
Differentially private data synthesis: good for the privacy of the whole system, but adds too much noise.
Weaker privacy definition?
The data synthesis process should be protected.
35
Future Work - Data Synthesis
What kind of weaker privacy definition can we use for generating synthetic data?
Can the chosen weaker privacy definition be composed with differential privacy? How is the whole system protected?
Even if the weaker privacy definition can be composed with differential privacy, what is the tightest composition result?
More complex data synthesis algorithms: can we empirically evaluate what they protect?
37
Summary
We present a framework for private data management with verification and pose some open questions.
We start with query verification for differentially private regression diagnostics, proposing the first differentially private algorithms for it: PriRP (for linear regression) and PriROC (for logistic regression).
We present our initial work on verification of data-dependent algorithms for counting queries.
We briefly sketch private data synthesis as another future direction.