Privacy Preservation for Data Streams Feifei Li, Boston University Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana Stanoi (IBM T.J. Watson Research Center)

Privacy Preservation for Data Streams 2 Application (1): Corporations A, B, and C each send perturbed (P) versions of their sensitive data to an analytical service that finds trends, clusters, patterns, and aggregations.

Privacy Preservation for Data Streams 3 Application (2): Corporation A publishes perturbed (P) data as a service through an information hub; clients A and B subscribe to the data to identify trends, patterns, and classes.

Privacy Preservation for Data Streams 4 Target Application: identify trends across multiple streams (streams 1 through 4, each a value-over-time series) and support clustering/classification.

Privacy Preservation for Data Streams 5 Problem Formulation: N streams A_1, A_2, ..., A_N evolve over time; at each time step t the tuple (A_1^t, ..., A_N^t) arrives, and online-generated noise is added to it, one vector at a time.

Privacy Preservation for Data Streams 6 Problem Formulation (continued): the adversary may apply an offline or online reconstruction R to the perturbed stream. Given σ², obtain A* online such that D(A, A*) = σ², and, for any given R, D(A, Ã) remains close to σ².
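As a concrete reading of the discrepancy D used throughout (and reported later as relative energy D(A, A*)/||A||), here is a minimal sketch; the squared-Frobenius-norm ratio and the function name are assumptions, since the exact normalization is not spelled out on the slide.

```python
import numpy as np

def discrepancy(A, A_star):
    """Relative energy of the perturbation: ||A - A*||_F^2 / ||A||_F^2.

    Assumed reading of D(A, A*); the experiments later report it as a
    percentage of the original stream energy.
    """
    return np.linalg.norm(A - A_star, "fro") ** 2 / np.linalg.norm(A, "fro") ** 2
```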

Privacy Preservation for Data Streams 7 Data Perturbation: add random i.i.d. noise to the streams over time (i.i.d.: independently and identically distributed).
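A minimal sketch of this baseline scheme, assuming Gaussian noise with per-entry variance sigma2 added to each arriving tuple; the function name and the calibration in the example are illustrative.

```python
import numpy as np

def perturb_iid(a_t, sigma2, rng):
    """Perturb one incoming length-N tuple a_t with i.i.d. Gaussian noise.

    sigma2 is the per-entry noise variance, i.e., the privacy/utility knob.
    """
    return a_t + rng.normal(scale=np.sqrt(sigma2), size=a_t.shape)

# Example: perturb a single 4-stream tuple, using 10% of its energy as noise.
rng = np.random.default_rng(0)
a_t = np.array([1.0, 2.0, 0.5, 3.0])
print(perturb_iid(a_t, sigma2=0.1 * np.mean(a_t ** 2), rng=rng))
```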

Privacy Preservation for Data Streams 8 Principal Component Analysis (PCA): i.i.d. noise.

Privacy Preservation for Data Streams 9 Principal Component Analysis (PCA): correlated noise.

Privacy Preservation for Data Streams 10 PCA-Based Data Reconstruction. A: original data, A*: perturbed data, Ã: reconstructed data. The added noise (energy σ²) determines utility. Projecting A* onto the principal direction removes part of the noise; the remaining noise between A and Ã (the projection error) determines privacy.

Privacy Preservation for Data Streams 11 PCA-Based Data Reconstruction with correlated noise. A: original data, A*: perturbed data, Ã: reconstructed data. Because the added noise (energy σ², utility) lies along the principal direction, the projection removes little of it, and the remaining noise (privacy) stays close to σ².
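A sketch of the reconstruction attack the diagram describes: estimate the principal subspace from the perturbed data and project onto it, which strips noise orthogonal to the principal directions (i.i.d. noise) but not noise aligned with them (correlated noise). The function name, the centering step, and the use of a plain SVD are illustrative assumptions.

```python
import numpy as np

def pca_reconstruct(A_star, k):
    """Project perturbed data A_star (T x N) onto its top-k principal subspace."""
    mean = A_star.mean(axis=0)
    X = A_star - mean
    # Right singular vectors are the principal directions of the perturbed data.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = Vt[:k].T                 # N x k basis of the principal subspace
    return X @ U_k @ U_k.T + mean  # reconstructed data (the attacker's A~)
```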

Privacy Preservation for Data Streams 12 Data Perturbation: main idea
 Observations:
 – The amount of random noise controls the privacy/utility tradeoff.
 – i.i.d. (independently and identically distributed) noise does not preserve privacy well enough.
 Lesson learned:
 – The noise should be correlated with the original data (Z. Huang et al., SIGMOD 2005).

Privacy Preservation for Data Streams 13 Challenge 1: Dynamic Correlation

Privacy Preservation for Data Streams 14 Challenge 1: Dynamic Correlation

Privacy Preservation for Data Streams 15 Challenge 2: Dynamic Autocorrelation

Privacy Preservation for Data Streams 16 Challenge 2: Dynamic Autocorrelation

Privacy Preservation for Data Streams 17 Online Random Noise for Autocorrelation: Stock

Privacy Preservation for Data Streams 18 State of the Art
 Privacy Preservation
 – Given a utility requirement, maximize the privacy.
 Existing Work (Z. Huang et al., SIGMOD 2005)
 – Batch mode, static data.
 – And many other works (see our paper for a detailed literature review).

Privacy Preservation for Data Streams 19 Adding Dynamic Correlated Noise. Streams A_1, A_2, A_3; U (3x3): online estimation of the principal components (S. Papadimitriou et al., VLDB 2005). For each arriving tuple A_t: update U, generate noise E_t distributed along U, and publish the perturbed tuple Ã_t = A_t + E_t.
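The scheme relies on an online estimate of the principal components (citing S. Papadimitriou et al., VLDB 2005). The sketch below uses an exponentially weighted covariance with a full eigendecomposition as a simple stand-in for that streaming algorithm; the class name, the decay parameter, and the update rule are assumptions made for illustration.

```python
import numpy as np

class OnlinePCTracker:
    """Track the top-k principal directions of an N-dimensional stream."""

    def __init__(self, n_streams, k, decay=0.96):
        self.k = k
        self.decay = decay
        self.cov = np.zeros((n_streams, n_streams))

    def update(self, a_t):
        """Fold one tuple into the covariance estimate; return (U, eigenvalues)."""
        self.cov = self.decay * self.cov + np.outer(a_t, a_t)
        eigvals, eigvecs = np.linalg.eigh(self.cov)
        top = np.argsort(eigvals)[::-1][: self.k]
        return eigvecs[:, top], np.maximum(eigvals[top], 0.0)
```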

Privacy Preservation for Data Streams 20 Putting it into an algorithm: distribute the noise. With k = 3, U: eigenvectors, V: eigenvalues. The noise (total energy σ²) is distributed in the principal components' subspace, rotated back to the data space, and added to A_t.
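A sketch of the noise-distribution step under stated assumptions: the budget σ² is split across the k tracked directions in proportion to their eigenvalues (the allocation rule is an assumption, not taken from the slides), a Gaussian coefficient is drawn per direction, and the result is rotated back to the data space before being added to A_t.

```python
import numpy as np

def correlated_noise(U, eigvals, sigma2, rng):
    """Draw one noise vector lying in the subspace spanned by the columns of U.

    U:       N x k matrix of tracked principal directions.
    eigvals: length-k energy estimates along each direction.
    sigma2:  total noise energy budget for this tuple.
    """
    # Assumed allocation: split the budget in proportion to the eigenvalues,
    # draw a Gaussian coefficient per direction, then rotate back to data space.
    weights = eigvals / eigvals.sum()
    coeffs = rng.normal(scale=np.sqrt(sigma2 * weights))
    return U @ coeffs
```

With the tracker from the previous sketch, the published tuple would be a_t plus correlated_noise(U, eigvals, sigma2, rng).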

Privacy Preservation for Data Streams 21 Why is our algorithm better than the state of the art? The state of the art adds noise along the global principal components, computed offline; an online reconstruction that tracks the local principal components can remove much of that noise. Our noise follows the locally tracked principal components, so it is much harder to remove by online reconstruction.

Privacy Preservation for Data Streams 22 Online Reconstruction vs. Offline Reconstruction
 Choice of adversary:
 – Offline reconstruction based on the global principal components.
 – Online tracking of the principal components with local reconstruction.
 – Please see the paper for details.
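To make the two adversaries concrete, here is a sketch of both reconstructions; it reuses pca_reconstruct and OnlinePCTracker from the earlier sketches, and all names remain illustrative assumptions rather than the paper's code.

```python
import numpy as np

# Assumes pca_reconstruct and OnlinePCTracker from the earlier sketches.

def offline_reconstruct(A_star, k):
    """Offline adversary: one global PCA over the whole perturbed matrix."""
    return pca_reconstruct(A_star, k)

def online_reconstruct(A_star, k):
    """Online adversary: track local principal directions and project each tuple."""
    tracker = OnlinePCTracker(A_star.shape[1], k)
    out = np.empty_like(A_star)
    for t, a_t in enumerate(A_star):
        U, _ = tracker.update(a_t)
        out[t] = U @ (U.T @ a_t)   # local projection onto the tracked subspace
    return out
```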

Privacy Preservation for Data Streams 23 Tracking Autocorrelation: for a single stream a = [a_1, a_2, ...]^T, slide a window of length h over time; the overlapping windows w_1, w_2, w_3, w_4, ... form the rows of a matrix W, whose principal components capture the stream's autocorrelation.
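A sketch of that window construction, assuming the usual reading of the diagram: overlapping length-h windows of one stream are stacked as the rows of W, and a PCA of W yields the dominant autocorrelation patterns. The batch SVD here stands in for the online tracking; the function names are illustrative.

```python
import numpy as np

def window_matrix(a, h):
    """Stack overlapping length-h windows of a single stream as the rows of W."""
    a = np.asarray(a, dtype=float)
    return np.array([a[i:i + h] for i in range(len(a) - h + 1)])

def autocorr_subspace(a, h, k):
    """Top-k principal directions of W: the stream's dominant autocorrelation
    patterns (an illustrative batch stand-in for the online tracking)."""
    W = window_matrix(a, h)
    _, _, Vt = np.linalg.svd(W - W.mean(axis=0), full_matrices=False)
    return Vt[:k].T                # h x k basis U
```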

Privacy Preservation for Data Streams 24 Distribute Noise: avoid adding noise above the allowed threshold while keeping it auto-correlated with the stream. Idea: constrain the next k noise values based on the previous h-k noise values and the current estimate of U; this becomes a linear system.
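One possible reading of that linear system, sketched below: choose the next k noise values so that the full length-h noise window stays as close as possible to the tracked autocorrelation subspace spanned by U, given the h-k values already published, then clip each new value to the allowed threshold. The least-squares formulation, the small added randomness, and the clipping rule are all assumptions about the intent, not the paper's equations.

```python
import numpy as np

def next_noise_values(prev_noise, U, k, sigma, rng):
    """Pick the next k noise values given the previous h-k published ones.

    prev_noise: length (h-k) vector of noise values already published.
    U:          h x m basis of the tracked autocorrelation subspace.
    sigma:      allowed per-value noise magnitude (the threshold).
    """
    h = U.shape[0]
    # Projector onto the orthogonal complement of span(U): we want the whole
    # length-h noise window to lie (nearly) inside span(U).
    P = np.eye(h) - U @ U.T
    P_fixed, P_free = P[:, : h - k], P[:, h - k:]
    # Linear system in the k unknowns: P_free @ w_free ~= -P_fixed @ prev_noise.
    w_free, *_ = np.linalg.lstsq(P_free, -P_fixed @ prev_noise, rcond=None)
    # Add a little randomness and enforce the threshold on each new value.
    w_free = w_free + rng.normal(scale=0.1 * sigma, size=k)
    return np.clip(w_free, -sigma, sigma)
```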

Privacy Preservation for Data Streams 25 Experiments
 Three real data stream sets:
 – Sensor streams (Lab): light, humidity, voltage, temperature; 7712 x 198.
 – Chlorine environmental streams: 4310 x 166.
 – Stock streams: 8000 x 2.

Privacy Preservation for Data Streams 26 Perturbation vs. Reconstruction
 Perturbation methods: i.i.d-N; Offline-N (noise correlated with the global principal components); Online-N: SCAN (streaming correlated additive noise) / SACAN (streaming auto-correlated additive noise).
 Reconstruction methods: Baseline (take the perturbed data as the reconstruction); Offline-R (offline reconstruction based on the global principal components); Online-R: SCOR (streaming correlated online reconstruction) / SACOR (streaming auto-correlated online reconstruction).
 Noise (discrepancy) is reported as relative energy, as a percentage of the original data streams, i.e., D(A, A*)/||A||.

Privacy Preservation for Data Streams 27 Reconstruction Error: Online-R vs. Offline-R. Online reconstruction achieves better accuracy because it minimizes the projection error. (10% noise, k = 10.)

Privacy Preservation for Data Streams 28 Reconstruction Error: vary k. (1) Online reconstruction achieves better accuracy; (2) a larger k reduces the projection error.

Privacy Preservation for Data Streams 29 Privacy vs. Discrepancy, online-R: Lab data

Privacy Preservation for Data Streams 30 Privacy vs. Discrepancy, online-R: Chlorine

Privacy Preservation for Data Streams 31 Online Random Noise for Autocorrelation: Chlorine

Privacy Preservation for Data Streams 32 Online Random Noise for Autocorrelation: Stock

Privacy Preservation for Data Streams 33 Privacy vs. Discrepancy: Online-R (Chlorine)

Privacy Preservation for Data Streams 34 Privacy vs. Discrepancy: Online-R (Stock)

Privacy Preservation for Data Streams 35 Running Time Analysis

Privacy Preservation for Data Streams 36 Running Time Analysis

Privacy Preservation for Data Streams 37 Future Work
 – Combining correlation and autocorrelation.
 – Other types of data streams beyond numeric data, such as categorical data.

Privacy Preservation for Data Streams 38 Questions  Thank you!