Privacy Preservation for Data Streams

1 Privacy Preservation for Data Streams Feifei Li, Boston University Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana Stanoi (IBM T.J. Watson Research Center)

2 Application (1). Figure labels: Corp. A, Corp. B, Corp. C (each marked P); sensitive data; analytical services: finding trends, clusters, patterns, aggregations.

3 Application (2). Figure labels: Corp. A (marked P); information hub: publish data as a service; Client A, Client B: subscribe to data to identify trends, patterns, classes.

4 Target Application: identify trends. Figure: four value-vs-time plots (stream 1 through stream 4); cluster/classification.

5 Problem Formulation. Figure: streams A_1, A_2, ..., A_N over time; at time t, online-generated noise is added to the values A_1^t, ..., one vector at a time.

6 Problem Formulation (continued). Offline and online: given σ², obtain A* online such that D(A, A*) = σ², and, for a given R, D(A, Ã) is close to σ².

7 Data Perturbation: add random i.i.d. noise to the streams over time (i.i.d.: independent and identically distributed).
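A minimal Python sketch of this baseline perturbation. It assumes zero-mean Gaussian noise and takes the discrepancy D to be the mean squared difference per value; both choices are illustrative assumptions, since the slides do not define them. The names `perturb_iid` and `discrepancy` are hypothetical.

```python
import random

def perturb_iid(stream, sigma2, seed=0):
    """Add independent zero-mean Gaussian noise (variance sigma2) to every value."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in stream]

def discrepancy(a, b):
    """Assumed discrepancy: mean squared difference per value."""
    count = sum(len(row) for row in a)
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count

# Each row is one time tick; each column is one stream (toy data).
original = [[float(t), 2.0 * t] for t in range(1000)]
perturbed = perturb_iid(original, sigma2=4.0)

# The measured discrepancy approaches sigma2 as the stream grows.
print(round(discrepancy(original, perturbed), 2))
```

Every value receives its own independent draw, so the noise ignores any correlation between the two streams.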

8 Principal Component Analysis (PCA): i.i.d. noise

9 Principal Component Analysis (PCA): correlated noise

10 PCA-Based Data Reconstruction. A: original data; A*: perturbed data; Ã: reconstructed data. Figure labels: principal direction; added noise σ² (utility); removed noise; remaining noise (privacy); projection error.

11 PCA-Based Data Reconstruction. A: original data; A*: perturbed data; Ã: reconstructed data. Figure labels: principal direction; added noise σ² (utility); remaining noise (privacy); projection error. Correlated noise!

12 Data Perturbation: Main Idea
Observations:
– The amount of random noise controls the privacy/utility tradeoff.
– i.i.d. (independent and identically distributed) noise does not preserve privacy well enough.
Lesson learned:
– Noise should be correlated with the original data.
Reference: Z. Huang et al., SIGMOD 05.
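The observation about i.i.d. noise can be illustrated with a small Python sketch. It assumes two perfectly correlated toy streams and a rank-1 PCA reconstruction computed by power iteration; the data, the noise level, and the helper names (`top_direction`, `reconstruct`, `mse`) are all hypothetical and are not taken from the slides or the cited paper.

```python
import math
import random

def top_direction(rows, iters=100):
    """Power iteration on the second-moment matrix: first principal direction."""
    d = len(rows[0])
    c = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reconstruct(rows, v):
    """Project every row onto direction v (rank-1 PCA reconstruction)."""
    out = []
    for r in rows:
        score = sum(x * y for x, y in zip(r, v))
        out.append([score * y for y in v])
    return out

def mse(a, b):
    """Mean squared difference per value."""
    n = sum(len(r) for r in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

rng = random.Random(1)
# Two perfectly correlated toy streams: the second is twice the first.
original = []
for _ in range(500):
    x = rng.uniform(-5.0, 5.0)
    original.append([x, 2.0 * x])
perturbed = [[x + rng.gauss(0.0, 0.5) for x in row] for row in original]

recovered = reconstruct(perturbed, top_direction(perturbed))
added = mse(original, perturbed)       # noise the publisher added
remaining = mse(original, recovered)   # noise left after the PCA attack
print(added, remaining)
```

In this toy setting the projection discards the noise component orthogonal to the principal direction, so `remaining` comes out well below `added`: part of the intended privacy is lost.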

13 Challenge 1: Dynamic Correlation

14 Challenge 1: Dynamic Correlation

15 Challenge 2: Dynamic Autocorrelation

16 Challenge 2: Dynamic Autocorrelation

17 Online Random Noise for Autocorrelation: Stock

18 State of the Art
Privacy preservation:
– Given a utility requirement, maximize the privacy.
Existing work (Z. Huang et al., SIGMOD 05):
– Batch mode, static data.
– Many other works (see our paper for a detailed literature review).

19 Adding Dynamic Correlated Noise. Streams A_1, A_2, A_3; U (3x3): online estimation of principal components. For each arriving A_t: update U; generate noise E_t distributed along U; publish Ã_t. Reference: S. Papadimitriou et al., VLDB05.
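The "update U" step can be sketched in Python as follows. This is a simplified stand-in that tracks a single direction using an exponentially decayed second-moment matrix and one power-iteration step per arrival; it is not the tracking method cited on the slide. The function `update_direction` and the `decay` parameter are hypothetical.

```python
import math

def update_direction(c, u, x, decay=0.98):
    """One streaming step: decay the second-moment matrix c in place,
    add x x^T, then take one power-iteration step to refresh u."""
    d = len(x)
    for i in range(d):
        for j in range(d):
            c[i][j] = decay * c[i][j] + x[i] * x[j]
    w = [sum(c[i][j] * u[j] for j in range(d)) for i in range(d)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else u

d = 2
c = [[0.0] * d for _ in range(d)]
u = [1.0, 0.0]                      # arbitrary starting direction
for t in range(1, 201):
    a_t = [float(t), 2.0 * t]       # toy tick: stream 2 is twice stream 1
    u = update_direction(c, u, a_t)
print([round(v, 3) for v in u])     # [0.447, 0.894]
```

The decay factor lets older ticks fade, so the estimate can follow a correlation that drifts over time, at the cost of a noisier estimate when the window of effective history is short.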

20 Put It into the Algorithm: Distribute Noise. k = 3; U: eigenvectors; V: eigenvalues. Noise (σ²) is distributed in the principal components' subspace, rotated back to the data space, and added to A_t.
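A hedged Python sketch of the noise-distribution step. It assumes that each principal component receives a share of σ² proportional to its eigenvalue; the slide does not state the exact split, so that rule is an assumption. The example uses k = 2 directions in a 3-stream space to keep it short, and `correlated_noise` is a hypothetical name.

```python
import math
import random

def correlated_noise(u_cols, eigvals, sigma2, rng):
    """Draw noise inside span(U) and rotate it back to the data space.

    u_cols:  k orthonormal direction vectors, each of length d.
    eigvals: the k matching eigenvalues.
    Assumption: component i gets variance sigma2 * eigvals[i] / sum(eigvals).
    """
    total = sum(eigvals)
    coeffs = [rng.gauss(0.0, math.sqrt(sigma2 * lam / total)) for lam in eigvals]
    d = len(u_cols[0])
    # Rotate back: noise = sum_i coeffs[i] * u_i
    return [sum(c * u[j] for c, u in zip(coeffs, u_cols)) for j in range(d)]

rng = random.Random(7)
s = 1.0 / math.sqrt(2.0)
u_cols = [[s, s, 0.0], [s, -s, 0.0]]   # k = 2 directions in a d = 3 space
eigvals = [9.0, 1.0]
a_t = [4.0, 5.0, 6.0]                  # current tick of three toy streams
e_t = correlated_noise(u_cols, eigvals, sigma2=1.0, rng=rng)
published = [a + e for a, e in zip(a_t, e_t)]
print(published[2])  # 6.0: no noise leaves the principal subspace
```

Because the noise lies inside the subspace spanned by U, projecting the published data onto that subspace removes none of it, which is the property i.i.d. noise lacks.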

21 Why Is Our Algorithm Better (than the State of the Art)? Figure labels: local principal component; global principal component; noise added along the global PC (offline); noise removed by online reconstruction.

22 Online Reconstruction vs. Offline Reconstruction
Choice of adversary:
– Offline reconstruction based on global principal components.
– Online tracking of the principal components, then local reconstruction.
– Please see the details in the paper.

23 Tracking Autocorrelation. a = [1 2 3 4 5 6]^T; windows w_1, w_2, w_3, w_4 over time form W = [1 2 3; 2 3 4; 3 4 5; 4 5 6], treated as h streams.
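The window matrix from this slide can be built with a short Python sketch; `window_matrix` is a hypothetical helper name, and h = 3 is read off the slide's example.

```python
def window_matrix(a, h):
    """Stack every length-h sliding window of series a as one row."""
    return [a[i:i + h] for i in range(len(a) - h + 1)]

a = [1, 2, 3, 4, 5, 6]
for row in window_matrix(a, h=3):
    print(row)
# [1, 2, 3]
# [2, 3, 4]
# [3, 4, 5]
# [4, 5, 6]
```

Treating the h columns of W as separate streams turns autocorrelation within one series into correlation across columns, so the same principal-component tracking can be reused.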

24 Distribute Noise. W = [1 2 3; 2 3 4; 3 4 5; 4 5 6]. Avoid adding noise greater than the allowed threshold, while keeping it auto-correlated with the stream. Idea: constrain the next k noise values based on the previous h-k noise values and the current estimate of U; this becomes a linear system.

25 Experiments
Three real data streams:
– Sensor streams (Lab): light, humidity, volt, temperature; 7712x198
– Chlorine environmental streams: 4310x166
– Stock streams: 8000x2

26 Perturbation vs. Reconstruction
Perturbation methods:
– i.i.d-N
– Offline-N: noise correlated with global principal components
– Online-N: SCAN (streaming correlated additive noise) / SACAN (streaming auto-correlated additive noise)
Reconstruction methods:
– Baseline: take perturbed data as the reconstruction
– Offline-R: offline reconstruction based on global principal components
– Online-R: SCOR (streaming correlated online reconstruction) / SACOR (streaming auto-correlated online reconstruction)
Noise (discrepancy) is represented by the relative energy as a percentage of the original data streams, i.e., D(A, A*)/||A||.
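The relative-energy measure can be sketched in Python as below. It assumes squared Frobenius norms for both the difference and the original data, which the slide does not spell out; `relative_discrepancy` is a hypothetical name.

```python
def relative_discrepancy(original, perturbed):
    """Energy of the difference as a percentage of the original energy.
    Assumption: squared Frobenius norms are used for both terms."""
    diff = sum((x - y) ** 2 for ra, rb in zip(original, perturbed)
               for x, y in zip(ra, rb))
    energy = sum(x * x for row in original for x in row)
    return 100.0 * diff / energy

a = [[3.0, 4.0]]
a_star = [[3.0, 4.5]]
print(relative_discrepancy(a, a_star))  # 1.0
```

Expressing the noise as a percentage of the stream's own energy makes settings such as "10% noise" comparable across data sets with different scales.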

27 Reconstruction Error: Online-R vs. Offline-R (10% noise, k = 10). Online reconstruction achieves better accuracy because it minimizes the projection error.

28 Reconstruction Error: Varying k
1. Online reconstruction achieves better accuracy.
2. A large k reduces the projection error.

29 Privacy vs. Discrepancy, Online-R: Lab data

30 Privacy vs. Discrepancy, Online-R: Chlorine

31 Online Random Noise for Autocorrelation: Chlorine

32 Online Random Noise for Autocorrelation: Stock

33 Privacy vs. Discrepancy: Online-R (Chlorine)

34 Privacy vs. Discrepancy: Online-R (Stock)

35 Running Time Analysis

36 Running Time Analysis

37 Future Work
– Combining correlation and autocorrelation.
– Other types of data streams beyond numeric data, such as categorical data.

38 Questions. Thank you!

