1 Limiting Privacy Breaches in Privacy Preserving Data Mining. In Proceedings of the 22nd ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems (PODS 2003), San Diego, CA, June 2003. Alexandre Evfimievski (Cornell University), Johannes Gehrke (Cornell University), Ramakrishnan Srikant (IBM Almaden Research Center)
2 Introduction Two broad approaches to privacy preserving data mining – the secure multi-party computation approach – the randomization approach: building classification models over randomized data, discovering association rules over randomized data
3 Introduction Privacy We must ensure that the randomization is sufficient for preserving privacy. E.g. randomize age x_i by adding r_i (drawn uniformly from the segment [-50, 50]). If the server receives age 120 from a user, then the server has learned that the real age of the user is >= 70.
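The age example above can be sketched as follows; the interval [-50, 50] comes from the slide, while the function name and the sample values are ours:

```python
import random

def randomize_age(x, spread=50):
    """Return x plus noise drawn uniformly from [-spread, spread]."""
    return x + random.uniform(-spread, spread)

# Whatever x was, the server knows y - spread <= x <= y + spread,
# so seeing y = 120 tells it the true age is at least 70.
y = 120
print(y - 50)
```

This is exactly the kind of inference the privacy-breach analysis below tries to bound.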
4 Introduction Two approaches for quantifying how privacy-preserving a randomization method is – information theory – privacy breaches
5 Overview The Model N clients C_1, …, C_N connected to one server; each C_i has private data x_i. To ensure privacy, each C_i sends a modified value y_i of x_i to the server. The server collects the modified information and recovers its statistical properties.
6 Overview Assumptions x_i ∈ V_X, where V_X is a finite set; each x_i is chosen independently at random according to the same fixed probability distribution p_X (which is not private).
7 Overview Randomization A randomization operator R(x); y_i is an instance of R(x_i) and is sent to the server. All possible outputs of R(x) lie in a finite set V_Y. For all x ∈ V_X and y ∈ V_Y, the probability that R(x) outputs y is denoted by p[x → y] = P[R(x) = y].
8 Outline Refined Definition of Privacy Breaches Amplification Itemset Randomization Compression of Randomized Transactions Worst-Case Information
9 Privacy breaches Each possible value x of C_i's private information has probability p_X(x). Define a random variable X such that P[X = x] = p_X(x). The randomized value y_i is an instance of a random variable Y such that P[Y = y | X = x] = p[x → y]. The joint distribution of X and Y is P[X = x, Y = y] = p_X(x) · p[x → y].
10 Privacy breaches Consider any property Q(x) of the private value, Q : V_X → {true, false}.
11 Privacy breaches Example: x ranges over 0 … 1000. 1. R_1(x) = x with probability 20%, otherwise a uniformly random value. 2. R_2(x) = x + r (mod 1001), with r uniform in {-100, …, 100}. 3. R_3(x) = R_2(x) with probability 50%, otherwise a uniformly random value.
12 Privacy breaches Example posteriors: a prior probability of 1% can rise to 71.6%; a prior of 40.5% can rise to 100%.
13 Privacy breaches Some property has a very low prior probability (1%) but becomes likely (71.6%) once we learn that R(X) = y. Some other property has a prior probability far from 100% (40.5%) but becomes almost 100%-probable.
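A minimal sketch of how such posteriors are computed, using an operator like R_1 (return x with probability 20%, otherwise a uniform value in 0..1000). The property Q used here, "X equals the observed value", is our illustration, so the numbers differ from the slide's 1% → 71.6% figures, which depend on the paper's choice of Q:

```python
N = 1001                      # domain {0, ..., 1000}, uniform prior
prior = 1.0 / N

def p_transition(x, y):
    """p[x -> y] for R1: keep x with probability 0.2, else uniform."""
    return (0.2 if x == y else 0.0) + 0.8 / N

y = 42
# P[Y = y] = sum_x prior * p[x -> y]  (equals 1/N by symmetry)
p_y = sum(prior * p_transition(x, y) for x in range(N))
# Posterior of the property Q(x): "x == y", by Bayes' rule
posterior = prior * p_transition(y, y) / p_y
print(round(posterior, 4))    # prior ~0.001 jumps to ~0.2008
```

The prior of about 0.1% jumps to about 20% once y is observed, the shape of breach the next slide formalizes.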
14 Privacy breaches Definition 1. Let ρ1, ρ2 be two probabilities, 0 < ρ1 < ρ2 < 1, such that ρ1 corresponds to our intuitive notion of "very unlikely" whereas ρ2 corresponds to "likely". Revealing R(X) = y causes a ρ1-to-ρ2 privacy breach with respect to property Q(x) if P[Q(X)] ≤ ρ1 and P[Q(X) | R(X) = y] ≥ ρ2.
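The "very unlikely becomes likely" notion can be phrased as a simple predicate; the function name is ours, the thresholds play the role of ρ1 and ρ2:

```python
def is_privacy_breach(prior, posterior, rho1, rho2):
    """A rho1-to-rho2 breach: Q was 'very unlikely' a priori (<= rho1)
    but is 'likely' a posteriori (>= rho2)."""
    return prior <= rho1 and posterior >= rho2

print(is_privacy_breach(0.01, 0.716, 0.05, 0.5))   # 1% -> 71.6% is a breach
print(is_privacy_breach(0.405, 1.0, 0.05, 0.95))   # prior 40.5% was not 'unlikely'
```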
15
15 outline Refined Definition of Privacy Breaches Amplification Itemset Randomization Compression of Randomized Transactions Worst- Case Information
16 Amplification Using Def. 1 to check for privacy breaches raises two problems: 1. There are 2^|V_X| possible properties; must we check them all? 2. Without the distribution p_X of X, how can we use Def. 1?
17 Amplification Definition. A randomization operator R(x) is at most γ-amplifying for y ∈ V_Y if for all x_1, x_2 ∈ V_X: p[x_1 → y] / p[x_2 → y] ≤ γ, where γ ≥ 1. R(x) is at most γ-amplifying if it is at most γ-amplifying for every y ∈ V_Y.
18 Amplification Theorem. Suppose R(x) is at most γ-amplifying for y. Then revealing R(X) = y cannot cause a ρ1-to-ρ2 privacy breach whenever ρ2 / ρ1 · (1 − ρ1) / (1 − ρ2) > γ. Note that this condition does not depend on p_X.
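The amplification condition bounds the ratio p[x1 → y] / p[x2 → y] (with p[x → y] = P[R(x) = y]) over all pairs x1, x2 for each output y. A sketch of checking it for a transition function; the two-value flip operator is a toy example of ours:

```python
def amplification_gamma(p, xs, ys):
    """Smallest gamma with p(x1, y) / p(x2, y) <= gamma for all x1, x2, y."""
    gamma = 1.0
    for y in ys:
        col = [p(x, y) for x in xs]
        gamma = max(gamma, max(col) / min(col))
    return gamma

# Toy operator on {0, 1}: flip the bit with probability 0.4
def p(x, y):
    return 0.6 if x == y else 0.4

print(amplification_gamma(p, [0, 1], [0, 1]))   # 1.5 = 0.6 / 0.4
```

By the paper's amplification result, revealing y cannot cause a ρ1-to-ρ2 breach for any property when ρ2/ρ1 · (1−ρ1)/(1−ρ2) exceeds this γ, regardless of the prior p_X.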
19 Amplification Proof: Assume that for some property Q(x) we have a ρ1-to-ρ2 privacy breach.
20 Amplification
21 Amplification
22 Outline Refined Definition of Privacy Breaches Amplification Itemset Randomization Compression of Randomized Transactions Worst-Case Information
23 Itemset Randomization Assume that all transactions have the same size m and each transaction is an independent instance. Select-a-size (with parameters 0 < ρ < 1 and a probability distribution p[0], …, p[m]) builds t' = R(t) in three steps: 1. Select an integer j at random from {0, 1, …, m}, where p[j] = P[j is chosen]. 2. Select j items from t uniformly at random and put them into t', so that |t ∩ t'| = j; each j-subset has probability 1/C(m, j). 3. For each item a ∉ t, toss a coin with P[head] = ρ; if head, add a to t'. This contributes a factor ρ^(m'−j) (1 − ρ)^(n−m−(m'−j)).
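A direct sketch of the three select-a-size steps; the distribution p[j] is passed in as a list, and the item universe argument and names are ours:

```python
import random

def select_a_size(t, universe, rho, p):
    """Select-a-size: t is a set of m items, p a distribution over {0..m}."""
    m = len(t)
    # Step 1: pick j with probability p[j]
    j = random.choices(range(m + 1), weights=p)[0]
    # Step 2: keep j items of t, chosen uniformly at random
    t_prime = set(random.sample(sorted(t), j))
    # Step 3: each item outside t enters independently with probability rho
    for a in universe - t:
        if random.random() < rho:
            t_prime.add(a)
    return t_prime

t_prime = select_a_size({1, 2, 3}, set(range(10)), 0.3, [0.25] * 4)
```

Because j is capped at m and outside items enter only with probability ρ, the server sees a transaction that overlaps t in a controlled way.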
24 Itemset Randomization Denote t' = R(t), m' = |t'|, j = |t ∩ t'|, n = |I| (the total number of items).
25 Itemset Randomization Combining the three steps, the transition probability is p[t → t'] = p[j] · (1 / C(m, j)) · ρ^(m'−j) · (1 − ρ)^(n−m−(m'−j)).
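The transition probability of select-a-size factors into the pieces annotating each step on the select-a-size slide; a sketch (the function name is ours):

```python
from math import comb

def p_t_to_tprime(m, m_prime, j, n, rho, p):
    """p[t -> t'] for select-a-size: pick size j (p[j]), a uniform j-subset
    of t (1/C(m,j)), then independent rho-coins for the n-m outside items."""
    return (p[j] * (1.0 / comb(m, j))
            * rho ** (m_prime - j)
            * (1 - rho) ** (n - m - (m_prime - j)))
```

As a sanity check, for m = 1, n = 2, ρ = 0.5 and p = [0.5, 0.5] the four possible outputs {}, {b}, {a}, {a, b} each get probability 0.25, summing to 1.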
26 Itemset Randomization How do itemsets stay frequent? We want t' to retain as many items of t as possible. Given ρ, focus on the p[j]'s, maximizing the expected number of items of t that survive in t'.
27 Itemset Randomization Selecting the parameters ρ and p[·] thus reduces to selecting ρ and a single size j* around which the p[j]'s are concentrated.
28 Outline Refined Definition of Privacy Breaches Amplification Itemset Randomization Compression of Randomized Transactions Worst-Case Information
29 Compressing randomized transactions Randomized transactions are large: they consume network resources and lots of memory.
30 Compressing randomized transactions A (Seed, n, q, ρ)-pseudorandom generator is a function G : Seed × {1, …, n} → {0, 1} with the following properties: – for every i: P[G(ξ, i) = 1 | ξ ∈_R Seed] = ρ – for all 1 ≤ i_1 < … < i_q ≤ n, the values G(ξ, i_1), G(ξ, i_2), …, G(ξ, i_q) are statistically independent.
31 Compressing randomized transactions We represent a randomized transaction by a seed ξ ∈ Seed. G(ξ, i) = 1 means that item i belongs to the randomized transaction. There is a mapping τ from seeds to transactions: τ(ξ) = { item i | G(ξ, i) = 1 }. The set Seed consists of Boolean strings {0, 1}^k with k << n.
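One way to sketch such a generator is to hash the seed together with the item index and threshold the result at ρ. This hash construction is our illustration, not the paper's; a real (Seed, n, q, ρ)-generator needs a proof of q-wise independence, which SHA-256 does not formally provide:

```python
import hashlib

RHO = 0.3

def G(seed: bytes, i: int) -> int:
    """Return 1 with (pseudo)probability RHO, determined by (seed, i)."""
    h = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
    u = int.from_bytes(h[:8], "big") / 2 ** 64   # roughly uniform in [0, 1)
    return 1 if u < RHO else 0

def tau(seed: bytes, n: int) -> set:
    """The transaction a seed represents: all items i with G(seed, i) = 1."""
    return {i for i in range(1, n + 1) if G(seed, i) == 1}
```

A few bytes of seed thus stand in for a transaction over thousands of items, which is the whole point of the compression.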
32 Compressing randomized transactions Another randomization operator, similar to select-a-size, has parameters 0 < ρ < 1 and a distribution p[0], …, p[m]. Given a transaction t and a (Seed, n, q, ρ)-pseudorandom generator with q ≥ m (the size of t), the operator generates the seed ξ = R'(t) in three steps.
33 Compressing randomized transactions 1. Select an integer j at random from {0, 1, …, m}, where p[j] = P[j is chosen]. 2. Select j items from t uniformly at random and put them into t'; w.l.o.g. assume t[1], t[2], …, t[j] are selected. 3. Select a random seed ξ ∈ Seed such that G(ξ, t[i]) = 1 for i = 1, …, j and G(ξ, t[i]) = 0 for i = j+1, …, m.
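The three steps can be sketched with rejection sampling over seeds: draw random seeds until one agrees with the kept/dropped decision on every item of t (items outside t then enter with probability ρ automatically, via G). A real implementation would sample the consistent seed more cleverly; the toy hash-based G and all names here are ours:

```python
import hashlib, os, random

def G(seed: bytes, i: int, rho: float = 0.3) -> int:
    """Toy pseudorandom generator: 1 with probability ~rho (our illustration)."""
    h = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
    return 1 if int.from_bytes(h[:8], "big") / 2 ** 64 < rho else 0

def r_prime(t, p, max_tries=200000):
    """R'(t): a seed realizing select-a-size on t (returns seed and kept items)."""
    items = sorted(t)
    m = len(items)
    j = random.choices(range(m + 1), weights=p)[0]     # step 1
    kept = set(random.sample(items, j))                # step 2
    for _ in range(max_tries):                         # step 3, by rejection
        seed = os.urandom(8)
        if all(G(seed, i) == (1 if i in kept else 0) for i in items):
            return seed, kept
    raise RuntimeError("no consistent seed found")
```

For small m the expected number of tries, roughly 1 / (ρ^j (1−ρ)^(m−j)), stays manageable.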
34 Outline Refined Definition of Privacy Breaches Amplification Itemset Randomization Compression of Randomized Transactions Worst-Case Information
35 Worst-Case Information Let X be a random variable and Y = R(X). The mutual information is I(X; Y) = Σ_{x,y} P[X = x, Y = y] · log2 ( P[X = x, Y = y] / (P[X = x] P[Y = y]) ). KL(p_1 || p_2) is the Kullback-Leibler distance between the distributions p_1(x) and p_2(x) of two random variables: KL(p_1 || p_2) = Σ_x p_1(x) · log2 ( p_1(x) / p_2(x) ).
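Both quantities are standard; a sketch over finite distributions represented as dicts (helper names are ours):

```python
from math import log2

def kl(p1, p2):
    """Kullback-Leibler distance between two distributions given as dicts."""
    return sum(p1[x] * log2(p1[x] / p2[x]) for x in p1 if p1[x] > 0)

def mutual_information(joint, px, py):
    """I(X; Y) for a joint distribution given as {(x, y): probability}."""
    return sum(pxy * log2(pxy / (px[x] * py[y]))
               for (x, y), pxy in joint.items() if pxy > 0)
```

For instance, a fair bit copied verbatim to Y gives exactly 1 bit of mutual information, and KL between identical distributions is 0.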
36 Worst-Case Information E.g. V_X = {0, 1}, P[X = 0] = P[X = 1] = ½. Let Y_1 = R_1(X) and Y_2 = R_2(X) with P[Y_1 = x | X = x] = 0.6, P[Y_1 = 1−x | X = x] = 0.4, and P[Y_2 = e | X = x] = 0.9999 (e is a special "no answer" output), P[Y_2 = x | X = x] = 99·10^−6, P[Y_2 = 1−x | X = x] = 1·10^−6. Then I(X; Y_2) << I(X; Y_1), yet observing Y_2 ∈ {0, 1} reveals X with 99% confidence. Is average information the right measure?
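Plugging the slide's numbers into the mutual-information formula shows the puzzle: Y_2 carries far less average information than Y_1, yet on the rare event Y_2 ≠ e it pins X down with 99% confidence. The symbol "e" stands for the blank output; helper names are ours:

```python
from math import log2

def mi(channel, px):
    """I(X; Y) from conditionals channel[x][y] = P[Y=y | X=x] and prior px."""
    py = {}
    for x, row in channel.items():
        for y, p in row.items():
            py[y] = py.get(y, 0.0) + px[x] * p
    return sum(px[x] * p * log2(p / py[y])
               for x, row in channel.items()
               for y, p in row.items() if p > 0)

px = {0: 0.5, 1: 0.5}
Y1 = {x: {x: 0.6, 1 - x: 0.4} for x in (0, 1)}
Y2 = {x: {"e": 0.9999, x: 99e-6, 1 - x: 1e-6} for x in (0, 1)}

i1, i2 = mi(Y1, px), mi(Y2, px)      # i1 is hundreds of times larger than i2
posterior = 99e-6 / (99e-6 + 1e-6)   # P[X = x | Y2 = x] = 0.99
```

Averaging over y hides the damage done by rare but highly revealing outputs, which motivates the worst-case measure.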
37 Worst-Case Information Define the worst-case information as the KL distance from prior to posterior for the worst output y: I_w(X; Y) = max_y KL( P[X = · | Y = y] || P[X = ·] ).
38 Worst-Case Information Revealing R(X) = y for some y causes a ρ1-to-ρ2 (upward) privacy breach. Revealing R(X) = y for some y causes a ρ2-to-ρ1 (downward) privacy breach.
39 Conclusion A new definition of privacy breaches. A general approach: amplification. Compressing long randomized transactions by using pseudorandom generators. Defined several new information-theoretic measures of privacy.
40 Future work Continuous distributions. The tradeoff between privacy and accuracy. Combining the randomization and secure multi-party computation approaches.