1 Limiting Privacy Breaches in Privacy Preserving Data Mining
In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2003), San Diego, CA, June 2003
Alexandre Evfimievski (Cornell University), Johannes Gehrke (Cornell University), Ramakrishnan Srikant (IBM Almaden Research Center)

2 Introduction
Two broad approaches to privacy preserving data mining:
– the secure multi-party computation approach
– the randomization approach
  – building classification models over randomized data
  – discovering association rules over randomized data

3 Introduction: Privacy
We must ensure that the randomization is sufficient for preserving privacy.
e.g., randomize age x_i by adding r_i drawn uniformly from the segment [-50, 50].
If the server receives age 120 from a user, then the server has learned that the user's real age is >= 70.
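
A minimal sketch of this additive randomization (illustrative only; the age domain and the [-50, 50] range are the slide's example):

```python
import random

def randomize_age(age, spread=50):
    """Additive randomization from the slide: add noise drawn
    uniformly from [-spread, spread] to the true age."""
    return age + random.randint(-spread, spread)

# If the server observes a randomized value of 120, the true age must
# lie in [120 - 50, 120 + 50], so the server learns the user is >= 70.
observed = 120
print("true age is at least", observed - 50)   # prints 70
```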

4 Introduction
Two approaches for quantifying how privacy preserving a randomization method is:
– information theory
– privacy breaches

5 Overview: The Model
N clients C_1, …, C_N are connected to one server; each C_i has private data x_i.
To ensure privacy, each C_i sends a modified value y_i of x_i to the server.
The server collects the modified values and recovers statistical properties of the original data.

6 Overview: Assumptions
x_i ∈ V_X, where V_X is a finite set.
Each x_i is chosen independently at random according to the same fixed probability distribution p_X (which is not private).

7 Overview: Randomization
Randomization operator R(x).
y_i is an instance of R(x_i) and is sent to the server.
The set of all possible outputs of R(x) is denoted by V_Y, a finite set.
For all x ∈ V_X and y ∈ V_Y, the probability that R(x) outputs y is denoted by p[x → y] = P[R(x) = y].
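
One way to picture the operator is as a row-stochastic transition matrix indexed by V_X and V_Y; a small illustrative sketch (the domain and the values here are made up, not from the slides):

```python
# Hypothetical 3-value domain; p[x][y] plays the role of p[x -> y].
V_X = ["a", "b", "c"]
V_Y = ["a", "b", "c"]

p = {
    "a": {"a": 0.6, "b": 0.2, "c": 0.2},
    "b": {"a": 0.2, "b": 0.6, "c": 0.2},
    "c": {"a": 0.2, "b": 0.2, "c": 0.6},
}

# Each row is a probability distribution over V_Y, so it must sum to 1.
for x in V_X:
    assert abs(sum(p[x][y] for y in V_Y) - 1.0) < 1e-9
```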

8 Outline
Refined Definition of Privacy Breaches
Amplification
Itemset Randomization
Compression of Randomized Transactions
Worst-Case Information

9 Privacy Breaches
Each possible value x of C_i's private information has probability p_X(x).
Define a random variable X such that P[X = x] = p_X(x).
The randomized value y_i is an instance of a random variable Y such that P[Y = y | X = x] = p[x → y].
The joint distribution of X and Y is P[X = x, Y = y] = p_X(x) · p[x → y].

10 Privacy Breaches
Consider any property Q(x) of the private data, where Q : V_X → {true, false}.

11 Privacy Breaches: Example
x is an integer between 0 and 1000.
1. R_1(x) = x with probability 20%; otherwise (with probability 80%) a value chosen uniformly at random from {0, …, 1000}.
2. R_2(x) = x + Δ (mod 1001), where Δ is chosen uniformly at random from {-100, …, 100}.
3. R_3(x) = R_2(x) with probability 50%; otherwise (with probability 50%) a value chosen uniformly at random from {0, …, 1000}.
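
A runnable sketch of these three operators as sampling procedures (the domain {0, …, 1000} and the mixing probabilities are as given on the slide):

```python
import random

DOMAIN = range(0, 1001)   # x is an integer between 0 and 1000

def R1(x):
    """Keep x with probability 20%, otherwise return a uniform value."""
    return x if random.random() < 0.2 else random.choice(DOMAIN)

def R2(x):
    """Shift x by a uniform delta in {-100, ..., 100}, modulo 1001."""
    return (x + random.randint(-100, 100)) % 1001

def R3(x):
    """Apply R2 with probability 50%, otherwise return a uniform value."""
    return R2(x) if random.random() < 0.5 else random.choice(DOMAIN)
```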

12 Privacy Breaches
1% → 71.6%
40.5% → 100%

13 Privacy Breaches
Some property has a very low prior probability but becomes likely once we learn that R(X) = y: 1% → 71.6%.
Some property has a prior probability far from 100% but becomes almost 100%-probable: 40.5% → 100%.

14 Privacy Breaches
Let ρ_1, ρ_2 be two probabilities such that ρ_1 corresponds to our intuitive notion of "very unlikely" whereas ρ_2 corresponds to "likely".
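
Definition 1 in the paper (restated here from the PODS 2003 text, since the slide shows it only as an image) says that revealing R(X) = y causes a ρ_1-to-ρ_2 privacy breach with respect to Q if P[Q(X)] ≤ ρ_1 while P[Q(X) | R(X) = y] ≥ ρ_2. A small sketch of a checker for this condition, assuming the prior p_X, the transition probabilities p[x → y], and the property Q are all known:

```python
def breach_values(p_x, p_trans, Q, rho1, rho2):
    """Return the outputs y for which revealing R(X) = y causes a
    rho1-to-rho2 privacy breach with respect to property Q.

    p_x:     dict x -> prior probability P[X = x]
    p_trans: dict x -> dict y -> p[x -> y]
    Q:       predicate on x
    """
    prior = sum(p_x[x] for x in p_x if Q(x))
    if prior > rho1:
        return []            # Q is not "very unlikely" a priori
    ys = {y for x in p_trans for y in p_trans[x]}
    breaches = []
    for y in ys:
        p_y = sum(p_x[x] * p_trans[x].get(y, 0.0) for x in p_x)
        if p_y == 0.0:
            continue
        posterior = sum(p_x[x] * p_trans[x].get(y, 0.0)
                        for x in p_x if Q(x)) / p_y
        if posterior >= rho2:
            breaches.append(y)
    return breaches
```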

15 Outline
Refined Definition of Privacy Breaches
Amplification
Itemset Randomization
Compression of Randomized Transactions
Worst-Case Information

16 Amplification
Using Definition 1 to check for privacy breaches raises two problems:
1. There are 2^|V_X| possible properties; must we check them all?
2. Without knowing the distribution p_X of X, how can we use Definition 1?
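
The next slides give the amplification definition and statement only as images. As a hedged reconstruction from the paper: R is at most γ-amplifying for y if p[x_1 → y] / p[x_2 → y] ≤ γ for all x_1, x_2 ∈ V_X with p[x_2 → y] > 0, and revealing R(X) = y then causes neither an upward ρ_1-to-ρ_2 nor a downward ρ_2-to-ρ_1 breach whenever ρ_2 (1 - ρ_1) / (ρ_1 (1 - ρ_2)) > γ. A sketch of that check, which needs neither p_X nor any particular property Q:

```python
def amplification_factor(p_trans, y):
    """gamma for output y: the largest ratio p[x1 -> y] / p[x2 -> y]
    over all x1, x2 with p[x2 -> y] > 0."""
    probs = [p_trans[x].get(y, 0.0) for x in p_trans]
    positive = [q for q in probs if q > 0.0]
    if not positive:
        return None                  # y is never produced
    if len(positive) < len(probs):
        return float("inf")          # some x can never produce y
    return max(positive) / min(positive)

def breach_prevented(gamma, rho1, rho2):
    """Breach condition (reconstructed from the paper): a rho1-to-rho2
    breach is impossible for a gamma-amplifying output whenever
    rho2 * (1 - rho1) / (rho1 * (1 - rho2)) > gamma."""
    return rho2 * (1 - rho1) / (rho1 * (1 - rho2)) > gamma
```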

17 Amplification

18 Amplification

19 Amplification
Proof: assume that for property Q(x) we have a ρ_1-to-ρ_2 privacy breach.

20 Amplification

21 Amplification

22 Outline
Refined Definition of Privacy Breaches
Amplification
Itemset Randomization
Compression of Randomized Transactions
Worst-Case Information

23 Itemset Randomization
Assume that all transactions have the same size m and each transaction is an independent instance.
Select-a-size (with parameters 0 < ρ < 1 and a size distribution p[0], …, p[m]) generates t' from t in three steps:
1. Select an integer j at random from {0, 1, …, m}, where p[j] = P[j is chosen] (probability p[j]).
2. Select j items from t uniformly at random and put them into t', so |t ∩ t'| = j (probability 1/C(m, j)).
3. For each item a ∉ t, toss a coin with P[head] = ρ; if head, add a to t' (contributing ρ^(m'-j) (1-ρ)^((n-m)-(m'-j))).
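
A runnable sketch of select-a-size under these assumptions (items are represented as integers 1..n; p_sizes stands for the distribution p[0], …, p[m]):

```python
import random

def select_a_size(t, n, rho, p_sizes):
    """Select-a-size randomization sketch.

    t:       the original transaction, a set of items from {1, ..., n}
    rho:     probability of adding each item not in t
    p_sizes: list of m+1 probabilities, p_sizes[j] = P[j is chosen]
    """
    items = sorted(t)
    m = len(items)
    # Step 1: pick j from {0, ..., m} according to p[j].
    j = random.choices(range(m + 1), weights=p_sizes, k=1)[0]
    # Step 2: keep j items of t, chosen uniformly at random.
    t_prime = set(random.sample(items, j))
    # Step 3: every item outside t joins independently with probability rho.
    for a in range(1, n + 1):
        if a not in t and random.random() < rho:
            t_prime.add(a)
    return t_prime
```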

24 Itemset Randomization
Denote t' = R(t), m' = |t'|, j = |t ∩ t'|, n = |I| (the total number of items).
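
Combining the three per-step factors listed on slide 23 (a reconstruction; the slide shows the formula only as scattered fragments), the probability that select-a-size maps t to a particular t' is:

```latex
p[t \to t'] \;=\; p[j] \cdot \binom{m}{j}^{-1} \cdot \rho^{\,m'-j}\,(1-\rho)^{\,(n-m)-(m'-j)}
```

where j = |t ∩ t'|, m' = |t'|, and n - m is the number of items outside t.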

25 Itemset Randomization

26 Itemset Randomization
Frequent? We try to keep more items of t in t'.
Given ρ, focus on the p[j]'s.
Maximize the following expectation (shown as a formula on the original slide).

27 Itemset Randomization
Select the parameters ρ and the p[j]'s, i.e., select ρ and j*.

28 Outline
Refined Definition of Privacy Breaches
Amplification
Itemset Randomization
Compression of Randomized Transactions
Worst-Case Information

29 Compressing Randomized Transactions
Randomized transactions are large:
– they consume network resources
– they take lots of memory

30 Compressing Randomized Transactions
A (Seed, n, q, ρ)-pseudorandom generator is a function G : Seed × {1, …, n} → {0, 1} with the following properties:
– ∀ i : P[G(ξ, i) = 1 | ξ ∈_R Seed] = ρ
– ∀ 1 ≤ i_1 < … < i_q ≤ n : G(ξ, i_1), G(ξ, i_2), …, G(ξ, i_q) are statistically independent

31 Compressing Randomized Transactions
We represent a randomized transaction by a seed ξ ∈ Seed.
G(ξ, i) = 1 means that item i belongs to the randomized transaction.
There is a mapping τ from seeds to transactions: τ(ξ) = { item i | G(ξ, i) = 1 }.
The set Seed consists of Boolean strings {0, 1}^k with k << n.
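
A toy sketch of such a generator and the mapping τ, using a hash of (ξ, i) to decide membership (illustrative only; it does not come with the q-wise independence guarantee that a real (Seed, n, q, ρ)-pseudorandom generator must provide):

```python
import hashlib

def G(seed: bytes, i: int, rho: float) -> int:
    """Return 1 with (approximately) probability rho, deterministically
    from the pair (seed, i)."""
    digest = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    return 1 if value < rho else 0

def tau(seed: bytes, n: int, rho: float) -> set:
    """Expand a seed into the randomized transaction it represents."""
    return {i for i in range(1, n + 1) if G(seed, i, rho) == 1}

# A short seed stands in for a whole transaction over n items.
example = tau(b"0123456789abcdef", n=1000, rho=0.01)
```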

32 Compressing Randomized Transactions
Another randomization operator R', similar to select-a-size, has parameters 0 < ρ < 1 and a size distribution p[0], …, p[m].
Given a transaction t and a (Seed, n, q, ρ)-pseudorandom generator with q ≥ m (the size of t), the operator generates the seed ξ = R'(t) in three steps.

33 Compressing Randomized Transactions
1. Select an integer j at random from {0, 1, …, m}, where p[j] = P[j is chosen].
2. Select j items from t uniformly at random; W.L.O.G. assume t[1], t[2], …, t[j] are the selected items.
3. Select a random seed ξ ∈ Seed such that G(ξ, t[i]) = 1 for the j selected items and G(ξ, t[i]) = 0 for the remaining items of t.
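
A sketch of R' under these assumptions, reusing a hash-based G like the toy one above and finding a suitable seed by rejection sampling (the exact condition enforced in step 3 is a reconstruction, and rejection sampling is only a stand-in for a direct seed construction):

```python
import hashlib
import os
import random

def G(seed: bytes, i: int, rho: float) -> int:
    digest = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
    return 1 if int.from_bytes(digest[:8], "big") / 2**64 < rho else 0

def R_prime(t, rho, p_sizes, seed_len=4):
    """Return a seed representing the randomized version of transaction t."""
    items = sorted(t)
    m = len(items)
    # Step 1: pick j according to p[j].
    j = random.choices(range(m + 1), weights=p_sizes, k=1)[0]
    # Step 2: pick the j items of t that must survive.
    keep = set(random.sample(items, j))
    # Step 3: rejection-sample a seed consistent with that choice.
    # (Practical only for small transactions; shown here for clarity.)
    while True:
        seed = os.urandom(seed_len)
        if all(G(seed, i, rho) == (1 if i in keep else 0) for i in items):
            return seed
```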

34 Outline
Refined Definition of Privacy Breaches
Amplification
Itemset Randomization
Compression of Randomized Transactions
Worst-Case Information

35 Worst-Case Information
X is a random variable; Y = R(X) is also a random variable.
The mutual information is I(X; Y) = Σ_{x,y} P[X = x, Y = y] log_2 ( P[X = x, Y = y] / (P[X = x] P[Y = y]) ).
The larger I(X; Y) is, the lower the privacy.
KL(p_1 || p_2) is the Kullback-Leibler distance between the distributions p_1(x) and p_2(x) of two random variables.

36 Worst-Case Information
e.g., V_X = {0, 1}, P[X = 0] = P[X = 1] = 1/2.
Y_1 = R_1(X), Y_2 = R_2(X).
P[Y_1 = x | X = x] = 0.6, P[Y_1 = 1-x | X = x] = 0.4.
P[Y_2 = e | X = x] = 0.9999, P[Y_2 = x | X = x] = 99·10^-6, P[Y_2 = 1-x | X = x] = 1·10^-6.
I(X; Y_2) << I(X; Y_1), so is R_2 really the more private operator?
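
A small sketch that computes the two mutual informations for this example (treating the symbol "e" as a third output value). It confirms I(X; Y_2) << I(X; Y_1), even though observing Y_2 = x pins X down with 99% confidence, which is exactly the tension the worst-case measure is meant to capture:

```python
from math import log2

def mutual_information(p_x, p_y_given_x):
    """I(X; Y) in bits, from a prior and conditional distributions."""
    p_y = {}
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            p_y[y] = p_y.get(y, 0.0) + px * pyx
    info = 0.0
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            if pyx > 0.0:
                info += px * pyx * log2(pyx / p_y[y])
    return info

p_x = {0: 0.5, 1: 0.5}
Y1 = {x: {x: 0.6, 1 - x: 0.4} for x in (0, 1)}
Y2 = {x: {"e": 0.9999, x: 99e-6, 1 - x: 1e-6} for x in (0, 1)}

print(mutual_information(p_x, Y1))   # about 0.029 bits
print(mutual_information(p_x, Y2))   # about 0.00009 bits
# Yet P[X = x | Y_2 = x] = 99e-6 / (99e-6 + 1e-6) = 0.99: seeing Y_2 = x
# is a far sharper disclosure than anything Y_1 can reveal.
```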

37 Worst-Case Information

38 Worst-Case Information
Revealing R(X) = y for some y causes a ρ_1-to-ρ_2 (upward) privacy breach.
Revealing R(X) = y for some y causes a ρ_2-to-ρ_1 (downward) privacy breach.

39 Conclusion
A new definition of privacy breaches.
A general approach: amplification.
Compression of long randomized transactions by using pseudorandom generators.
Several new information-theoretic measures (worst-case information).

40 Future Work
Continuous distributions.
The tradeoff between privacy and accuracy.
Combining the randomization and secure multi-party computation approaches.

