When Does Randomization Fail to Protect Privacy? Wenliang (Kevin) Du, Department of EECS, Syracuse University.

Presentation transcript:

1 When Does Randomization Fail to Protect Privacy? Wenliang (Kevin) Du Department of EECS, Syracuse University

2 Random Perturbation. Agrawal and Srikant's SIGMOD paper: Y = X + R, where X is the original data, R is the random noise, and Y is the disguised data.
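To make the model concrete, here is a minimal numpy sketch of additive perturbation; the attribute, sizes, and noise level are hypothetical choices for illustration, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# X: 1,000 records of a single numeric attribute (hypothetical salaries)
X = rng.normal(loc=50_000, scale=10_000, size=1_000)

# R: independent additive noise with a known distribution
sigma = 5_000
R = rng.normal(loc=0.0, scale=sigma, size=X.shape)

Y = X + R  # the disguised data that gets released
```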

3 Random Perturbation. Most security analysis methods based on randomization treat each attribute separately. Is that enough? Does the relationship among attributes affect privacy?

4 As we all know … We can't perturb the same number several times. If we do, the original data can be estimated: let t be the original data, and let the disguised data be t + R_1, t + R_2, …, t + R_m. Let Z = [(t + R_1) + … + (t + R_m)] / m. Mean: E(Z) = t. Variance: Var(Z) = Var(R) / m.
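A quick simulation of this averaging attack (the values of t, sigma, and m are made up; Z is exactly the sample mean from the slide):

```python
import numpy as np

rng = np.random.default_rng(1)

t = 42.0       # the original value, perturbed repeatedly
sigma = 10.0   # noise standard deviation
m = 10_000     # number of disguised releases

# Disguised copies: t + R_1, t + R_2, ..., t + R_m
disguised = t + rng.normal(0.0, sigma, size=m)

Z = disguised.mean()
print(Z)             # close to 42.0, since E(Z) = t
print(sigma**2 / m)  # Var(Z) = Var(R)/m, shrinking as m grows
```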

5 This looks familiar … This is the data set (x, x, x, x, x, x, x, x). Random perturbation: (x + r_1, x + r_2, …, x + r_m). We know this is NOT safe. Observation: the data set is highly correlated.

6 Let's Generalize! Data set: (x_1, x_2, x_3, …, x_m). If the correlation among data attributes is high, can we use that to improve our estimation from the disguised data?

7 Introduction. A heuristic approach toward privacy analysis. Principal Component Analysis (PCA). PCA-based data reconstruction. Experimental results. Conclusions and future work.

8 Privacy Quantification: A Heuristic Approach. Our goal: find a best-effort algorithm that reconstructs the original data based on the available information; how closely the reconstruction matches the original serves as the privacy measure.

9 How to use the correlation? High correlation implies data redundancy, and data redundancy enables compression. Our goal is lossy compression: we do want to lose information, but the loss should fall on the added noise, not on the actual data.

10 PCA Introduction. The main use of PCA is to reduce the dimensionality of the data while retaining as much information as possible. The 1st PC captures the greatest amount of variation; the 2nd PC captures the next largest amount.

11 Original Data

12 After Dimension Reduction

13 For the Original Data. They are correlated. If we remove 50% of the dimensions, the actual information loss might be less than 10%.

14 For the Random Noises. They are not correlated, and their variance is distributed evenly in every direction. If we remove 50% of the dimensions, the actual noise loss should be 50%.
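The contrast on slides 13 and 14 can be checked numerically. The sketch below builds a highly correlated data set (every attribute is a noisy copy of one hidden factor, a construction chosen just for illustration) and isotropic noise, then measures how much variance the top half of the principal components retains in each case:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5_000, 10  # records x attributes

# Highly correlated data: all attributes track one hidden factor
factor = rng.normal(0.0, 1.0, size=(n, 1))
X = factor + 0.1 * rng.normal(0.0, 1.0, size=(n, m))

# Uncorrelated noise: variance spread evenly in every direction
R = rng.normal(0.0, 1.0, size=(n, m))

def variance_kept(data, p):
    """Fraction of total variance captured by the top-p principal components."""
    eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
    return eigvals[:p].sum() / eigvals.sum()

print(variance_kept(X, m // 2))  # near 1.0: dropping half the dims loses little
print(variance_kept(R, m // 2))  # about 0.5: noise loses half its variance
```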

15 Data Reconstruction. Apply PCA: find the principal components via the eigendecomposition C = Q Λ Q^T. Let Q_p be the first p columns of Q. Reconstruct the data as X̂ = Y Q_p Q_p^T.
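A minimal sketch of this reconstruction step, assuming C is an estimate of the original data's covariance matrix (how to obtain C from the disguised data is the next slide's topic); the function pca_reconstruct and its signature are this writeup's invention, not code from the paper:

```python
import numpy as np

def pca_reconstruct(Y, C, p):
    """Project disguised data Y onto the top-p principal components of C."""
    # Eigendecomposition C = Q diag(lam) Q^T, eigenvalues sorted descending
    lam, Q = np.linalg.eigh(C)
    Qp = Q[:, np.argsort(lam)[::-1][:p]]  # first p principal components

    mean = Y.mean(axis=0)
    Yc = Y - mean                  # center the disguised data
    return Yc @ Qp @ Qp.T + mean   # X_hat = Y Q_p Q_p^T: keeps the correlated
                                   # signal, discards most of the isotropic noise
```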

16 Random Noise R. How does R affect the reconstruction accuracy? Theorem: [formula not preserved in the transcript].

17 How to Conduct PCA on Disguised Data? Estimating the covariance matrix of the original data from the disguised data.
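One standard way to do this, and a reasonable guess at what the slide's lost formula showed: if the noise R is independent of X with per-attribute variance sigma^2, then Cov(Y) = Cov(X) + sigma^2 I, so the data covariance can be estimated by subtraction:

```python
import numpy as np

def estimate_cov(Y, sigma):
    """Estimate Cov(X) from disguised data Y = X + R.

    Assumes R is independent of X with i.i.d. components of known
    variance sigma**2, so Cov(Y) = Cov(X) + sigma**2 * I.
    """
    C_Y = np.cov(Y, rowvar=False)
    return C_Y - sigma**2 * np.eye(C_Y.shape[0])
```

Chaining the two sketches gives the full attack: X_hat = pca_reconstruct(Y, estimate_cov(Y, sigma), p).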

18 Experiment 1: Increasing the Number of Attributes. [Figures: results for the normal and uniform distributions.]

19 Experiment 2: Increasing the Number of Principal Components. [Figures: results for the normal and uniform distributions.]

20 Experiment 3: Increasing the Standard Deviation of Noises. [Figures: results for the normal and uniform distributions.]

21 Conclusions. Privacy analysis based on individual attributes is not sufficient: correlation can disclose information, and PCA can filter out some of the randomness from a highly correlated data set. When does randomization fail? When the data correlation is high. Can it be cured?

22 Future Work. How can the randomization be improved to reduce information disclosure, e.g., by making the random noise correlated? How can PCA be combined with univariate data reconstruction?