1 Publishing Naive Bayesian Classifiers: Privacy without Accuracy Loss
Authors: Barzan Mozafari and Carlo Zaniolo
Speaker: Hongwei Tian
2 Outline
Motivation
Brief background on NBC
Privacy breach for views
Transformation from unsafe views to safe views
Extension for arbitrary prior distributions
Experiments
Conclusion
3 Motivation
PPDM methods seek to achieve the benefits of data mining on the data without compromising the privacy of the individuals in the data. Privacy can be enforced in different phases:
–the data collection phase
–the data publishing phase
–the data mining phase
4 Motivation
Privacy breaches when publishing NBCs:
–Bob knows that Alice lives on Westwood and that she is in her 40s.
–Bob's prior belief that Alice earns 70K was 5/7 = 71%.
–After seeing the views, Bob infers that Alice earns a 70K salary with probability 1/10 × (4/5 + 4×3/4 + 5×1) = 88%.
5 Motivation
Publishing better views:
–Bob's posterior belief becomes 1/6 × (2/3 + 1/2 + 1/2 + 1 + 1 + 1) = 78%.
–A 71%-to-78% change in belief is safer than 71%-to-88%.
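To make the arithmetic concrete, here is a minimal Python sketch (mine, not the paper's) that recomputes Bob's posterior belief under the unsafe and the safer views, using the fractions quoted on the two slides above:

```python
from fractions import Fraction as F

# Posterior under the original (unsafe) views, as on slide 4:
posterior_unsafe = F(1, 10) * (F(4, 5) + 4 * F(3, 4) + 5 * 1)

# Posterior under the better (safer) views, as on slide 5:
posterior_safe = F(1, 6) * (F(2, 3) + F(1, 2) + F(1, 2) + 1 + 1 + 1)

prior = F(5, 7)  # Bob's prior belief that Alice earns 70K
print(f"prior        = {float(prior):.0%}")            # 71%
print(f"unsafe views -> {float(posterior_unsafe):.0%}")  # 88%
print(f"safe views   -> {float(posterior_safe):.0%}")    # 78%
```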
6 Motivation
Achieving the same classification results:
–For a given test input, the NBC built on V1 predicts the class label 50K, because 5/7 × 1/5 × 1/5 < 2/7 × 1/2 × 1/2.
–The prediction of the second classifier (built on V2) is again 50K, because 3/5 × 1/3 × 1/3 < 2/5 × 1/2 × 1/2.
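A hedged sketch of this equivalence check, using only the probabilities quoted above (the dictionary layout and label names are illustrative):

```python
from fractions import Fraction as F

def nbc_predict(scores):
    """Return the class label with the highest naive-Bayes score."""
    return max(scores, key=scores.get)

# Scores for the test input under the NBC built on V1 (slide 6):
v1_scores = {"70K": F(5, 7) * F(1, 5) * F(1, 5),
             "50K": F(2, 7) * F(1, 2) * F(1, 2)}

# Scores under the NBC built on the transformed views V2:
v2_scores = {"70K": F(3, 5) * F(1, 3) * F(1, 3),
             "50K": F(2, 5) * F(1, 2) * F(1, 2)}

# Both classifiers agree on the prediction even though the views differ.
assert nbc_predict(v1_scores) == nbc_predict(v2_scores) == "50K"
```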
7 Motivation
–NBC has proved to be one of the most effective classifiers, both in practice and in theory.
–Given an unsafe NBC, it is possible to find an equivalent one that is safer to publish.
–The objective is to determine whether a set of NBC-enabling views is safe to publish, and if not, how to find a secure database that produces the same NBC model while satisfying the privacy requirements.
8 Brief Background on NBC
The original database T is an instance of a relation R(A_1, ..., A_n, C), where C is the class attribute with values c_1, ..., c_m.
In order to build an NBC, the only views that need to be published are Pr[A_i = a | C = c_j] for all 1 ≤ i ≤ n, a ∈ Dom(A_i), 1 ≤ j ≤ m, and Pr[C = c_j] for all 1 ≤ j ≤ m.
Equivalent to publishing these views, one can instead publish the following counts. For 1 ≤ i ≤ n, a ∈ Dom(A_i), 1 ≤ j ≤ m:
P_{i,a,j} = |{t ∈ T : t.A_i = a and t.C = c_j}|, N_j = |{t ∈ T : t.C = c_j}|
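A minimal sketch of how these counts could be materialized, assuming a toy table whose tuples and attribute values are invented for illustration (the names P and N follow the notation above, not code from the paper):

```python
from collections import Counter

# Toy table: each tuple is (A_1, ..., A_n, C); here n = 2, class = salary.
T = [("Westwood", "40s", "70K"),
     ("Westwood", "30s", "70K"),
     ("Wilshire", "40s", "50K"),
     ("Wilshire", "30s", "50K")]

n = 2  # number of non-class attributes

# N_j: number of tuples with class c_j
N = Counter(t[-1] for t in T)

# P_{i,a,j}: number of tuples with A_i = a and class c_j
P = Counter((i, t[i], t[-1]) for t in T for i in range(n))

print(N)                          # Counter({'70K': 2, '50K': 2})
print(P[(0, "Westwood", "70K")])  # 2
```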
9 Brief Background on NBC
Using these counts, we can express the NBC's probability estimation as follows. For every test tuple t = (a_1, ..., a_n) and every class c_j, the estimate Pr[C = c_j | t] is proportional to (N_j / |T|) × Π_{i=1..n} (P_{i,a_i,j} / N_j), and the NBC's prediction is the class c_j that maximizes this product.
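Continuing the toy sketch above (it reuses P, N, and n from the previous block), a possible rendering of this prediction rule over the published counts; the function name predict is mine:

```python
from fractions import Fraction as F

def predict(test, P, N, n):
    """Naive-Bayes prediction from the published counts (P, N).

    Scores each class c_j by (N_j/|T|) * prod_i P_{i,a_i,j}/N_j,
    the count-based form of the NBC estimate given above.
    """
    total = sum(N.values())
    best, best_score = None, -1
    for c, Nc in N.items():
        score = F(Nc, total)
        for i in range(n):
            score *= F(P[(i, test[i], c)], Nc)  # missing counts read as 0
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(("Westwood", "40s"), P, N, n))  # '70K' on the toy table above
```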
10 Privacy Breach for Views
Prior and posterior knowledge: the attacker's prior belief is Pr[t.C = c], and the posterior belief after seeing the views is Pr[t.C = c | V(T) = V_0], where t is the victim's tuple.
Quasi-identifier: a subset I of the attributes {A_1, ..., A_n} whose value I_0 identifies the victim's tuple.
Family of all table instances: T; all instances satisfying the given views: {T ∈ T : V(T) = V_0}.
11 Privacy Breach for Views
For a given table T, publishing V(T) = V_0 causes a privacy breach with respect to a pair of given constants 0 < L_1 < L_2 < 1, if either of the following holds:
Pr[t.C = c] ≤ L_1 and Pr[t.C = c | V(T) = V_0] ≥ L_2, or
Pr[t.C = c] ≥ 1 − L_1 and Pr[t.C = c | V(T) = V_0] ≤ 1 − L_2.
For example, a 0.5-to-0.8 change in belief does not satisfy the privacy requirement L_1 = 0.51 and L_2 = 0.8, but 0.5-to-0.78 does.
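The definition translates directly into a small checker; the two-sided form below follows the reconstruction above:

```python
def causes_breach(prior, posterior, L1, L2):
    """L1-to-L2 privacy breach check (upward and downward directions)."""
    upward = prior <= L1 and posterior >= L2
    downward = prior >= 1 - L1 and posterior <= 1 - L2
    return upward or downward

assert causes_breach(0.5, 0.80, L1=0.51, L2=0.8)      # 0.5-to-0.8: breach
assert not causes_breach(0.5, 0.78, L1=0.51, L2=0.8)  # 0.5-to-0.78: safe
```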
12 Privacy Breach for Views
Two simplifying assumptions:
–assume a uniform distribution of the database instances;
–assume a uniform distribution of class values.
13 Privacy Breach for Views
Let I_0 be the value of a given quasi-identifier I, and let V_0 be the value of a given view V(T). If there exist some m_1, m_2 > 0 such that for all c ∈ Dom(C):
m_1 ≤ Pr[V(T) = V_0 | t.I = I_0 and t.C = c] ≤ m_2,
then for any c and any pair of L_1, L_2 > 0, publishing V_0 will not cause any privacy breaches w.r.t. L_1 and L_2, provided that the following amplification criterion holds:
m_2 / m_1 ≤ L_2(1 − L_1) / (L_1(1 − L_2)).
14 Privacy Breach for Views
For a given quasi-identifier I = I_0, a given view V(T) = V_0 is safe to publish against any L_1-to-L_2 privacy breaches, if there exists an amplification ratio γ ≥ 1 such that the following conditions hold:
–γ ≤ L_2(1 − L_1) / (L_1(1 − L_2)), and
–for all c_1, c_2 ∈ Dom(C): Pr[V(T) = V_0 | t.I = I_0 and t.C = c_1] ≤ γ × Pr[V(T) = V_0 | t.I = I_0 and t.C = c_2].
In practice: select the largest possible γ for a given (L_1, L_2), and recast the privacy goal as that of checking/enforcing the second condition.
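Assuming the amplification bound γ ≤ L_2(1 − L_1)/(L_1(1 − L_2)) reconstructed above, the largest safe γ can be computed directly (a sketch, not the paper's code):

```python
def max_amplification(L1, L2):
    """Largest amplification ratio gamma that still blocks any
    L1-to-L2 breach, per the bound gamma <= L2(1-L1)/(L1(1-L2))."""
    assert 0 < L1 < L2 < 1
    return (L2 * (1 - L1)) / (L1 * (1 - L2))

print(max_amplification(0.51, 0.8))  # ~3.84: count ratios up to ~3.84 are safe
```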
15 Privacy Breach for Views
With respect to a given value I_0 of a quasi-identifier I, and a given amplification ratio γ, the viewset (P, N) is safe to publish if, for every attribute A_i, every value a ∈ Dom(A_i), and every pair of class labels, conditions hold that bound the ratios of the published counts in terms of γ and |I|.
16 Privacy Breach for Views
Two observations:
–All quasi-identifiers that have the same cardinality (i.e., number of attributes) can be blocked at the same time, since the conditions are functions of |I|, and not of I or I_0.
–All privacy breaches for all quasi-identifiers of any cardinality can be blocked by simply blocking the one with the largest cardinality, namely n, because the conditions grow monotonically stricter in |I|, so enforcing them for |I| = n implies them for every smaller quasi-identifier.
17 Privacy Breach for Views
With respect to a given amplification ratio γ, the viewset (P, N) is safe to publish if, for every attribute A_i, every value a ∈ Dom(A_i), and every pair of class labels, the conditions of the previous slide hold with |I| set to n.
18 Transformation from Unsafe Views to Safe Views
NBC-Equivalence: Let f and f′ be two functions that map each element of a finite domain to a non-negative real number. We call f and f′ NBC-equivalent if they order the elements of the domain in the same way, so that any argmax-based prediction is identical under f and f′.
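A small sketch of the order-preservation check this definition suggests (the helper name nbc_equivalent is mine):

```python
from itertools import combinations

def nbc_equivalent(f, f_prime, domain):
    """Check that f and f' induce the same ordering on `domain`
    (a sufficient condition for identical argmax predictions)."""
    return all((f(x) <= f(y)) == (f_prime(x) <= f_prime(y))
               for x, y in combinations(domain, 2))

# Squaring preserves order on non-negative scores, so these two
# score functions are NBC-equivalent on this toy domain:
scores = {"50K": 10.0, "70K": 4.0}
assert nbc_equivalent(lambda c: scores[c],
                      lambda c: scores[c] ** 2,
                      list(scores))
```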
19 Transformation from Unsafe Views to Safe Views
Transformation algorithm:
–Input: the given viewset V, consisting of the counts P_{i,a,j} and N_j, and an amplification ratio γ.
–Description:
Step 1: Replace all counts that are 0 with non-zero values.
Step 2: Scale all counts down to new rational numbers that satisfy the given amplification ratio.
Step 3: Adjust the numbers so that they again form a consistent viewset.
Step 4: Normalize the numbers or turn them into integers.
–Output: a transformed viewset V′ that is safe to publish.
(1) Raising all the counts to the same power does not change the classification; (2) in other words, the set of NBC-equivalent viewsets is closed under exponentiation.
Example: taking square roots of 100 and 16 preserves their order (10 > 4) even though the gap shrinks (100 − 16 > 10 − 4).
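The closing note is the key mechanism: raising every count to the same power τ raises every NBC score to the power τ, which preserves the argmax while shrinking the ratios between counts toward the amplification bound. A hedged illustration (not the paper's algorithm; the function name and the choice of τ are mine):

```python
def powered(counts, tau):
    """Raise every published count to the same power tau.

    The NBC score of class j is N_j^(1-n) * prod_i P_{i,a_i,j}; raising
    all counts to the power tau raises every score to the power tau,
    a monotone map on non-negative reals, so the argmax is unchanged.
    """
    return {k: v ** tau for k, v in counts.items()}

counts = {"x": 100.0, "y": 16.0}
half = powered(counts, 0.5)  # {'x': 10.0, 'y': 4.0}

# Order is preserved (10 > 4) while the ratio shrinks (100/16 -> 10/4),
# which is how an unsafe viewset is nudged under the amplification bound.
assert (counts["x"] > counts["y"]) == (half["x"] > half["y"])
assert half["x"] / half["y"] < counts["x"] / counts["y"]
```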
20 Extension for Arbitrary Prior Distributions
See a tiny example.
21 Experiments
–Adult dataset containing 32,561 tuples.
–The attributes used were Age, Years of education, Work hours per week, and Salary.
–Comparison: an NBC trained on the k-anonymized data vs. an NBC trained on the output of the safe-views transformation.
22 Conclusion
–Reformulated privacy breach for view publishing.
–Presented sufficient conditions that are easy to check/enforce.
–Provided algorithms that guarantee the privacy of the individuals who provided the training data, and incur zero accuracy loss in terms of building an NBC.