
1 ONN: On the Use of Neural Networks for Data Privacy. Jordi Pont-Tuset, Pau Medrano Gracia, Jordi Nin, Josep Lluís Larriba Pey, Victor Muntés i Mulero. Instituto de Investigación en Inteligencia Artificial, Consejo Superior de Investigaciones Científicas.

2 SOFSEM 2008, Nový Smokovec, Slovakia. Presentation Schema: Motivation; Basic Concepts; Ordered Neural Networks (ONN); Experimental Results; Conclusions and Future Work.

3 Our Scenario: attribute classification. Attributes are classified as Identifiers (ID), Quasi-identifiers, Confidential (C), or Non-Confidential (NC). Example:

Name (ID)     ID number (ID)  Salary (C)  Postal code (NC)  Age (NC)
John Smith    53124566        20.000 €    17100             32
Michael Grom  34423312        25.000 €    08080             42
Anna Molina   18827364        15.000 €    36227             32

4 Data Privacy and Anonymization. Diagram: the original data set contains ID, NC, and C attributes; the released data keeps NC and C but drops the identifiers. An intruder holding an external data source with ID and NC attributes can apply record linkage between the two: confidential data disclosure!

5 Data Privacy and Anonymization. To prevent this, an anonymization process perturbs the quasi-identifiers (NC becomes NC') before release, so that record linkage against the external data source fails. Goal: ensure protection while preserving statistical usefulness. Trade-off: accuracy vs. privacy. Related areas: Privacy in Statistical Databases (PSD) and Privacy Preserving Data Mining (PPDM).

6 Presentation Schema: Motivation; Basic Concepts; Ordered Neural Networks (ONN); Experimental Results; Conclusions and Future Work.

7 Best Ranked Protection Methods [DT01]. Rank Swapping (RS-p) [Moore96]: sorts the values of each attribute and swaps each value randomly within a restricted range of size p. Microaggregation (MIC-vm-k) [DM02]: builds small clusters of at least k elements from v variables, then replaces each value by the centroid of the cluster it belongs to.
[DT01] Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, Elsevier Science (2001) 111-133.
[Moore96] Moore, R.: Controlled data swapping techniques for masking public use microdata sets. U.S. Bureau of the Census (unpublished manuscript) (1996).
[DM02] Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. on KDE 14 (2002) 189-201.

8 Rank Swapping (RS-p), p = 20%. Example: the attribute values 8, 6, 10, 7, 9, 2, 1, 4, 5, 3 are first sorted into ranks 1..10, and each value is then swapped within the restricted range.
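The restricted-range swap on this slide can be sketched in Python. This is a minimal illustration only: the exact pairing policy of RS-p in [Moore96] may differ, and the window size computation here is an assumption.

```python
import random

def rank_swap(values, p):
    """Rank Swapping (RS-p) sketch: sort the attribute values by rank,
    then swap each value with a randomly chosen partner at most
    p% of the ranks away (the 'restricted range')."""
    n = len(values)
    window = max(1, int(n * p / 100))              # restricted swap range
    order = sorted(range(n), key=lambda i: values[i])
    swapped = [values[i] for i in order]           # values in rank order
    for rank in range(n):
        partner = min(n - 1, rank + random.randint(0, window))
        swapped[rank], swapped[partner] = swapped[partner], swapped[rank]
    out = [0.0] * n                                # restore original row order
    for rank, idx in enumerate(order):
        out[idx] = swapped[rank]
    return out
```

Because values are only swapped, never altered, the protected attribute keeps exactly the same multiset of values as the original, which is why rank swapping preserves univariate statistics so well.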

9 MDAV Microaggregation, k = 2. MDAV (Maximum Distance to the Average Vector) is a heuristic microaggregation method. Diagram: original data set X and its protected version X'.
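A simplified version of this clustering step, for intuition only: the real MDAV heuristic builds two groups per iteration around the two mutually farthest records, while this sketch builds one group per iteration.

```python
import math

def mdav(records, k):
    """Simplified MDAV microaggregation sketch: repeatedly take the record
    farthest from the global centroid, group it with its k-1 nearest
    neighbours, and replace every group member by the group centroid."""
    def centroid(pts):
        d = len(pts[0])
        return [sum(p[i] for p in pts) / len(pts) for i in range(d)]

    remaining = list(range(len(records)))
    out = [None] * len(records)
    while len(remaining) >= k:
        c = centroid([records[i] for i in remaining])
        far = max(remaining, key=lambda i: math.dist(records[i], c))
        # the k-1 nearest neighbours of the farthest record form a cluster
        group = sorted(remaining,
                       key=lambda i: math.dist(records[i], records[far]))[:k]
        g_cent = centroid([records[i] for i in group])
        for i in group:
            out[i] = g_cent
            remaining.remove(i)
    if remaining:                      # fewer than k leftovers: own group
        g_cent = centroid([records[i] for i in remaining])
        for i in remaining:
            out[i] = g_cent
    return out
```

With k = 2, every protected record is shared by at least one other record, which is exactly the k-anonymity-style guarantee microaggregation provides.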

10 Our contribution. We propose a new perturbative protection method for numerical data based on the use of neural networks. Basic idea: learning a pseudo-identity function (quasi-learning ANNs) in order to anonymize numerical data sets.

11 Artificial Neural Networks (Sigmoid). Each neuron weights its inputs and applies an activation function. For our purpose, we assume ANNs without feedback connections and without layer-bypassing.
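A single sigmoid neuron of this kind might be written as follows; the slope parameter `c` mirrors the activation-function slope parameter C listed later in the experiments, and all names are illustrative.

```python
import math

def neuron(inputs, weights, bias, c=1.0):
    """One sigmoid neuron: weighted sum of the inputs plus a bias,
    passed through a sigmoid activation with slope parameter c."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-c * z))
```

With a zero weighted sum the sigmoid returns 0.5, its midpoint; large positive sums saturate towards 1 and large negative sums towards 0.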

12 Backpropagation Algorithm. It allows the ANN to learn from a predefined set of input-output example pairs by adjusting the weights of the ANN iteratively. In each iteration, the error at the output layer is computed as the sum of squared differences, and the weights are updated using an iterative steepest-descent method.
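These steps can be sketched for a toy network with one input, one hidden sigmoid neuron, and one sigmoid output; the network size, learning rate, and epoch count are arbitrary choices for illustration, not the parameterization used by ONN.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, epochs=5000, lr=0.5):
    """Minimal backpropagation sketch: squared-error loss minimised by
    steepest descent on a 1-input / 1-hidden / 1-output sigmoid network."""
    random.seed(1)
    w1, b1, w2, b2 = (random.uniform(-1, 1) for _ in range(4))
    for _ in range(epochs):
        for x, t in pairs:
            h = sigmoid(w1 * x + b1)           # forward pass
            y = sigmoid(w2 * h + b2)
            e = y - t                          # output-layer error
            d2 = e * y * (1 - y)               # output delta
            d1 = d2 * w2 * h * (1 - h)         # hidden delta (backpropagated)
            w2 -= lr * d2 * h                  # steepest-descent updates
            b2 -= lr * d2
            w1 -= lr * d1 * x
            b1 -= lr * d1
    return lambda x: sigmoid(w2 * sigmoid(w1 * x + b1) + b2)

f = train([(0.0, 0.2), (1.0, 0.8)])
```

After training, `f` approximates the two example pairs; ONN exploits exactly this ability, but deliberately stops short of perfect learning so the reproduced values stay noisy.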

13 Presentation Schema: Motivation; Basic Concepts; Ordered Neural Networks (ONN); Experimental Results; Conclusions and Future Work.

14 Ordered Neural Networks (ONN). Key idea: inaccurately learning the original data set, using ANNs, in order to reproduce a similar one: similar enough to preserve the properties of the original data set, yet different enough not to reveal the original confidential values.

15 How can we learn the original data set? First attempt: learn the original data set with a single neural network. Too complex.

16 How can we learn the original data set? We could sort each attribute independently in order to simplify the learning process, but the pattern to be learnt may still be too complex.

17 How can we learn the original data set? Reordering each attribute separately loses the concept of tuple anyway. So why are we so keen on preserving the attribute semantics?

18 Ordered Neural Networks (ONN): a different approach. We ignore the attribute semantics, mixing all the values in the database. We sort them to make the learning process easier. We partition the values into several blocks in order to simplify the learning process further.

19 ONN General Schema.

20 Vectorization. ONN ignores the attribute semantics to reduce the cost of the learning process.

21 Sorting. Objective: simplify the learning process and reduce learning time.

22 Partitioning. The data set may be very large, and a single ANN would make the learning process very difficult, so ONN uses a different ANN for each partition.

23 Normalization. To make the learning process possible, the input data must be normalized. We normalize the values so that their images fit in the range where the slope of the activation function is relevant [FS91].
[FS91] Freeman, J.A., Skapura, D.M.: Neural Networks: Algorithms, Applications and Programming Techniques. Addison-Wesley (1991) 1-106.
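The vectorization, sorting, partitioning, and normalization steps of slides 20-23 could be combined roughly as follows. The target range [0.2, 0.8] is an assumed stand-in for "the range where the slope of the sigmoid is relevant"; the slides do not give the exact bounds.

```python
def preprocess(table, num_partitions, lo=0.2, hi=0.8):
    """Sketch of ONN's preprocessing: flatten the table into one vector
    (ignoring attribute semantics), sort it, split it into blocks, and
    linearly rescale into [lo, hi] so the sigmoid operates on its
    high-slope region."""
    values = sorted(v for row in table for v in row)   # vectorize + sort
    size = -(-len(values) // num_partitions)           # ceiling division
    partitions = [values[i:i + size]
                  for i in range(0, len(values), size)]
    vmin, vmax = values[0], values[-1]
    scale = (hi - lo) / (vmax - vmin)
    normalized = [[lo + (v - vmin) * scale for v in p] for p in partitions]
    return partitions, normalized
```

Keeping `vmin`, `vmax`, `lo`, and `hi` around is essential, since the protection step must later invert this linear mapping.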

24 Learning Step. Given P partitions, we have one ANN per partition. Each ANN is fed with values coming from all P partitions in order to add noise. Diagram: the sorted values a..i are split into partitions p1..pP; a value a is propagated through its ANN, and the output a' is compared against a to drive backpropagation.

25 Learning Step (continued). Diagram: the same process applied to another value c, comparing the output c' against c.

26 Protection Step. First, we propagate the original data set through the trained ANNs. Finally, we de-normalize the generated values to obtain the protected data set.
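The protection step might look like this, assuming `anns` is a list of trained per-partition networks represented as plain callables, and that normalization was the linear rescaling sketched above (both are assumptions for illustration).

```python
def protect(normalized_partitions, anns, vmin, vmax, lo=0.2, hi=0.8):
    """Sketch of ONN's protection step: propagate each normalized value
    through its partition's trained network, then invert the linear
    normalization to map outputs back to the original value range."""
    scale = (vmax - vmin) / (hi - lo)
    return [[vmin + (ann(v) - lo) * scale for v in part]
            for part, ann in zip(normalized_partitions, anns)]
```

If a network had learnt the identity function perfectly, this would return the original data unchanged; the residual learning error is precisely the perturbation that protects the confidential values.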

27 Presentation Schema: Motivation; Basic Concepts; Ordered Neural Networks (ONN); Experimental Results; Conclusions and Future Work.

28 Experiments Setup. Data used in the CASC Project (http://neon.vb.cbs.nl/casc): data from the US Census Bureau, 1080 tuples x 13 attributes = 14040 values to be protected. We compare our algorithm with the best 5 parameterizations presented in the literature for Rank Swapping and Microaggregation. ONN is parameterized ad hoc.

29 Experiments Setup. ONN parameterization: P, number of partitions; B, normalization range size; E, learning rate parameter; C, activation function slope parameter; H, number of neurons in the hidden layer.

30 Protection Methods Evaluation. We need a protection quality score that measures both the difficulty for an intruder to reveal the original data and the information loss in the protected data set:
Score = 0.5 IL + 0.5 DR
IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5), where IL1 = mean absolute error, IL2 = mean variation of averages, IL3 = mean variation of variances, IL4 = mean variation of covariances, IL5 = mean variation of correlations.
DR = 0.5 DLD + 0.5 ID, where DLD = number of links found using distance-based record linkage (DBRL) and ID = number of protected values near the original ones.
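The score formula transcribes directly into code. How the five IL components and the DLD/ID counts are scaled is not specified on the slide, so the argument values here are placeholders.

```python
def score(il_components, dld, id_):
    """Protection-quality score from the slides: information loss (IL)
    as an equal-weight sum of five components, disclosure risk (DR) as
    the mean of distance-based linkage (DLD) and interval disclosure
    (ID), and the final score as the mean of IL and DR."""
    assert len(il_components) == 5           # IL1..IL5
    il = 100 * sum(0.2 * c for c in il_components)
    dr = 0.5 * dld + 0.5 * id_
    return 0.5 * il + 0.5 * dr
```

Lower scores are better: a method must keep both the information loss and the disclosure risk small at the same time, which is the accuracy-vs-privacy trade-off stated earlier.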

31 Results. Figures: score comparison for 7 variables and for 13 variables.

32 Presentation Schema: Motivation; Basic Concepts; Ordered Neural Networks (ONN); Experimental Results; Conclusions and Future Work.

33 Conclusions & Future Work. The use of ANNs combined with some preprocessing techniques is promising for protection methods. In our experiments, ONN is able to improve on the protection quality of the best-ranked protection methods in the literature. As future work, we would like to establish a set of criteria to automatically tune the parameters of ONN.

34 Any questions? Contact e-mail: vmuntes@ac.upc.edu. DAMA Group Web Site: http://www.dama.upc.edu

