Additive Data Perturbation: data reconstruction attacks

Outline (paper 15)
- Overview
- Data reconstruction methods: the PCA-based method and the Bayes method
- Comparison
- Summary

Overview
- Data reconstruction: Z = X + R.
- Problem: given Z and knowledge of the noise R (its distribution), estimate the value of X.
- Extend this to a matrix: X contains multiple dimensions, or fold the vector X into a matrix.
- Approach 1: apply matrix analysis techniques.
- Approach 2: Bayes estimation.
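Throughout, a minimal running sketch in Python/NumPy may help fix the notation. The data set, dimensions, and noise level below are invented for illustration; the only assumption carried over from the slides is that the noise is i.i.d. Gaussian with known variance σ².

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 1000, 4        # number of records and dimensions (arbitrary choices)
sigma = 0.5           # standard deviation of the additive noise R (assumed known)

# Synthetic correlated data standing in for the private original data X.
A = rng.normal(size=(m, m))
X = rng.normal(size=(n, m)) @ A

# Published, perturbed data: Z = X + R with i.i.d. Gaussian noise R.
R = rng.normal(scale=sigma, size=(n, m))
Z = X + R
```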

Two major approaches
- Principal component analysis (PCA) based approach
- Bayes analysis approach

Variance and covariance
- Definition: for a random variable x with mean μ,
  Var(x) = E[(x − μ)^2]
  Cov(xi, xj) = E[(xi − μi)(xj − μj)]
- For the multidimensional case X = (x1, x2, …, xm), these values form the covariance matrix.
- If each dimension xi has zero mean and the records are stacked as the rows of an n×m data matrix X, then Cov(X) = (1/n) X^T X, where n is the number of records.
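A quick numerical check of the zero-mean covariance formula, continuing the sketch above (X and n come from the previous snippet):

```python
# Center each dimension so the zero-mean formula applies.
Xc = X - X.mean(axis=0)

# Cov(X) = (1/n) X^T X for zero-mean columns.
C = Xc.T @ Xc / n

# Agrees with NumPy's estimator (bias=True uses the 1/n convention).
assert np.allclose(C, np.cov(Xc, rowvar=False, bias=True))
```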

PCA intuition
- A vector is expressed in terms of the basis vectors of the original space, E = {e1, e2, …, em}.
- Example: in 3-dimensional space, the x, y, z axes correspond to {(1 0 0), (0 1 0), (0 0 1)}.
- To represent the vectors with a different set of axes (the red axes in the slide figure), we choose new basis vectors U = (u1, u2).
- Transformation: matrix X → XU.
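A tiny, self-contained sketch of the change of basis X → XU; the basis vectors u1, u2 and the sample points here are made up for illustration:

```python
import numpy as np

# Hypothetical orthonormal basis vectors u1, u2 spanning a plane in 3-D space.
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])
U = np.column_stack([u1, u2])     # 3 x 2 change-of-basis matrix

pts = np.random.default_rng(1).normal(size=(5, 3))   # points as rows of X
coords = pts @ U                  # X -> XU: coordinates in the new basis
print(coords.shape)               # (5, 2)
```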

Why do we want to use different bases?
- The actual data distribution can often be described (approximately) with fewer dimensions.
- Example: projecting the points onto u1, a single dimension (u1) approximately describes all of them.
- The key problem is finding the directions that maximize the variance of the points; these directions are called principal components.

How to do PCA?
- Calculate the covariance matrix C = (1/n) X^T X, with X zero mean in each dimension.
- Apply "eigenvalue decomposition" to C. Since C is symmetric, we can always find an orthonormal matrix U (U U^T = I) such that C = U B U^T, where B is a diagonal matrix.
- Explanation: the diagonal entries di of B are the variances in the transformed space, and the columns of U are the new basis vectors.
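Continuing the sketch (C comes from the covariance snippet), the decomposition C = U B U^T can be computed with NumPy's symmetric eigensolver:

```python
# Eigenvalue decomposition of the symmetric covariance matrix C.
evals, U = np.linalg.eigh(C)              # C = U @ diag(evals) @ U.T

# Sort eigenvalues (variances in the transformed space) in decreasing order.
order = np.argsort(evals)[::-1]
evals, U = evals[order], U[:, order]

B = np.diag(evals)
assert np.allclose(C, U @ B @ U.T)        # C = U * B * U^T
assert np.allclose(U @ U.T, np.eye(m))    # U is orthonormal
```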

Approximation with the largest eigenvalues
- The diagonal matrix B holds the eigenvalues, i.e., the variance in each transformed direction.
- Select the largest ones (say k of them) that approximately account for the total variance.
- Select the corresponding k eigenvectors in U → U'.
- Transform A → AU'; AU' has only k dimensions.
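For example, keeping the top k eigenvectors (k = 2 is an arbitrary choice here) and projecting the centered data from the running sketch:

```python
# Keep the k eigenvectors with the largest eigenvalues.
k = 2
U_k = U[:, :k]                            # U' : m x k

explained = evals[:k].sum() / evals.sum()
print(f"top-{k} components explain {explained:.1%} of the total variance")

A_reduced = Xc @ U_k                      # A -> A U': only k dimensions remain
print(A_reduced.shape)                    # (n, k)
```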

PCA-based reconstruction
- Covariance matrix of Y = X + R, where the elements of R are i.i.d. with variance σ²:
  Cov(Xi + Ri, Xi + Ri) = Cov(Xi, Xi) + σ² on the diagonal (i = j), and Cov(Xi + Ri, Xj + Rj) = Cov(Xi, Xj) for i ≠ j.
- Therefore, removing σ² from the diagonal of Cov(Y), we get the covariance matrix of X.
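In the running sketch the attacker only sees the perturbed data (called Y here; the Z of the overview) and knows σ². A sketch of recovering Cov(X):

```python
# Attacker's view: only the perturbed data Y = X + R and the noise variance sigma^2.
Y = Z
Yc = Y - Y.mean(axis=0)
cov_Y = Yc.T @ Yc / n

# Cov(Y) = Cov(X) + sigma^2 * I, so subtract sigma^2 from the diagonal.
cov_X_est = cov_Y - (sigma ** 2) * np.eye(m)
```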

Reconstruct X
- We now have C = Cov(X). Apply PCA to the covariance matrix C: C = U B U^T.
- Select the major principal components and take the corresponding eigenvectors U'.
- Estimate X̂ = Y U' U'^T.
- Reasoning: for X' = XU we have X = X' U^{-1} = X' U^T; keeping only the top components gives X ≈ (X U') U'^T, and X U' is approximated by Y U'. The reconstruction error comes from these two approximations.
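Putting the pieces together, a sketch of the full PCA-based reconstruction (reusing cov_X_est, Y, and Yc from above; the derivation assumes zero-mean data, so the column means are added back at the end, and k = 2 is again an arbitrary choice):

```python
# PCA on the estimated covariance of X (not of Y).
evals_x, U_x = np.linalg.eigh(cov_X_est)
order = np.argsort(evals_x)[::-1]
evals_x, U_x = evals_x[order], U_x[:, order]

k = 2
U_p = U_x[:, :k]                          # U'

# X_hat = Y * U' * U'^T : project the noisy data onto the principal subspace.
X_hat = Yc @ U_p @ U_p.T + Y.mean(axis=0)

print("PCA reconstruction MSE:", np.mean((X_hat - X) ** 2))
```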

Bayes method
- Assumptions: the original data follows a multivariate normal distribution, and the noise is also normally distributed.
- The covariance matrix of X can be approximated with the method just discussed.

Data
- Each record is treated as one m-dimensional vector:
  (x11, x12, …, x1m) → vector
  (x21, x22, …, x2m) → vector
  …

Problem
- Given a vector yi, where yi = xi + ri, find the vector xi.
- Approach: maximize the posterior probability P(X | Y).

Again, applying Bayes' rule:
  f_X|Y(x|y) = f_Y|X(y|x) f_X(x) / f_Y(y)
- We maximize this over x; the denominator f_Y(y) is constant for all x, so only the numerator matters.
- With f_Y|X(y|x) = f_R(y − x), plug in the (normal) distributions f_X and f_R; we maximize f_R(y − x) f_X(x).

It’s equivalent to maximize the exponential part A function is maximized/minimized, when its derivative =0 i.e., Solving the above equation, we get

Reconstruction
For each vector y, plug in the covariance and mean of x and the noise variance to obtain the estimate of the corresponding x.
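A sketch of this estimator on the running example, reusing cov_X_est, Y, sigma, and m from the earlier snippets and approximating the mean of x by the mean of the perturbed data (the noise has zero mean):

```python
# Bayes (MAP) reconstruction under the Gaussian assumptions:
#   x_hat = mu + Sigma (Sigma + sigma^2 I)^{-1} (y - mu)
mu = Y.mean(axis=0)                       # mean of X, approximated from Y
Sigma = cov_X_est                         # Cov(X), estimated as above

W = Sigma @ np.linalg.inv(Sigma + (sigma ** 2) * np.eye(m))
X_hat_bayes = mu + (Y - mu) @ W.T

print("Bayes reconstruction MSE:", np.mean((X_hat_bayes - X) ** 2))
```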

Experiments
- Errors vs. number of dimensions. Conclusion: covariance between dimensions helps reduce reconstruction errors.

- Errors vs. number of principal components. Conclusion: the appropriate number of principal components is related to the amount of noise.

Discussion
- The key is finding the covariance matrix of the original data X: increasing the difficulty of estimating Cov(X) decreases the accuracy of data reconstruction.
- The Bayes method assumes a normal distribution; what about other distributions?