
Advanced Section #4: Methods of Dimensionality Reduction: Principal Component Analysis (PCA)
Marios Mattheakis and Pavlos Protopapas

Outline Introduction: Why Dimensionality Reduction? Linear Algebra (Recap). Statistics (Recap). Principal Component Analysis: Foundation. Assumptions & Limitations. Kernel PCA for nonlinear dimensionality reduction.

Dimensionality Reduction, why? It is the process of reducing the number of predictor variables under consideration. The goal is to find a more meaningful basis in which to express our data, filtering out the noise and revealing the hidden structure. C. Bishop, Pattern Recognition and Machine Learning, Springer (2008).

A simple example taken from physics: Consider an ideal spring-mass system oscillating along x. We seek the pressure Y that the spring exerts on the wall. One can fit a LASSO regression model and use LASSO variable selection among the measured predictors. J. Shlens, A Tutorial on Principal Component Analysis (2003).

Principal Component Analysis versus LASSO: LASSO simply selects one of the arbitrary directions, which is scientifically unsatisfactory. We want to use all the measurements to situate the position of the mass, that is, to find a lower-dimensional manifold of predictors on which the data lie. We will first review the Statistics and Linear Algebra concepts, since you have seen them in previous advanced sections. Principal Component Analysis (PCA) is a powerful statistical tool for analyzing data sets and is formulated in the context of Linear Algebra.

Linear Algebra (Recap)

Symmetric matrices: Suppose a design (or data) matrix $X \in \mathbb{R}^{n \times p}$ consisting of n observations and p predictors. Then $X^T X$ is a symmetric $p \times p$ matrix, since $(X^T X)^T = X^T (X^T)^T = X^T X$. Similarly, $X X^T$ is a symmetric $n \times n$ matrix.

Eigenvalues and Eigenvectors: Suppose a real and symmetric matrix $A \in \mathbb{R}^{p \times p}$. There exists a unique set of real eigenvalues $\lambda_1, \dots, \lambda_p$ and associated linearly independent eigenvectors $v_1, \dots, v_p$ such that $A v_i = \lambda_i v_i$, with $v_i^T v_j = 0$ for $i \neq j$ (orthogonal) and $v_i^T v_i = 1$ (normalized). Hence, the eigenvectors form an orthonormal basis.

Spectrum and Eigen-decomposition: Collect the eigenvectors as the columns of the matrix $V = [v_1, \dots, v_p]$, which is orthogonal (unitary in the real case): $V^T V = V V^T = I$. The eigen-decomposition of $A$ is then $A = V \Lambda V^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ is the spectrum of $A$.
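As a quick numerical illustration (not from the original slides), here is a minimal NumPy sketch that builds a symmetric matrix from random data and verifies the eigen-decomposition; the matrix sizes and random seed are arbitrary assumptions.

```python
import numpy as np

# Hypothetical small example: a symmetric Gram matrix built from random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
A = X.T @ X  # symmetric 4 x 4 matrix

# eigh handles symmetric (Hermitian) matrices: real eigenvalues in ascending
# order, orthonormal eigenvectors in the columns of V.
eigvals, V = np.linalg.eigh(A)
Lambda = np.diag(eigvals)

# Verify A = V Lambda V^T and the orthonormality V^T V = I.
print(np.allclose(A, V @ Lambda @ V.T))  # True
print(np.allclose(V.T @ V, np.eye(4)))   # True
```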

Numerical verification of the decomposition property. When eigenvalues are repeated, the Gram-Schmidt process can be used to obtain an orthonormal set of eigenvectors.
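A minimal sketch of the Gram-Schmidt process in NumPy; the function name gram_schmidt and the test vectors are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize the rows of `vectors` (assumed linearly independent)."""
    basis = []
    for v in vectors:
        # Subtract the projections onto the already-accepted basis vectors.
        w = v - sum(np.dot(v, b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

# Illustrative usage: three linearly independent vectors in R^3.
Q = gram_schmidt(np.array([[1.0, 1.0, 0.0],
                           [1.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0]]))
print(np.allclose(Q @ Q.T, np.eye(3)))  # rows are orthonormal -> True
```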

Real & Positive Eigenvalues: Gram Matrix. The eigenvalues of $X^T X$ are real and non-negative: from $X^T X v = \lambda v$, multiplying by $v^T$ and grouping the terms gives $\lambda\, v^T v = v^T X^T X v = (X v)^T (X v) = \|X v\|^2 \geq 0$, hence $\lambda \geq 0$. Similarly for $X X^T$. Hence, $X^T X$ and $X X^T$ are Gram matrices.

Same eigenvalues: $X^T X$ and $X X^T$ share the same nonzero eigenvalues. Starting from $X^T X v = \lambda v$ and multiplying both sides by $X$, grouping the terms gives $(X X^T)(X v) = \lambda (X v)$: the eigenvalues are unchanged, while the eigenvectors are transformed as $u = X v$.
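A small NumPy check of this property (the random matrix and seed are arbitrary assumptions): the nonzero eigenvalues of $X^T X$ and $X X^T$ coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))  # n = 6 observations, p = 3 predictors

# Eigenvalues of the p x p and n x n Gram matrices.
lam_small = np.linalg.eigvalsh(X.T @ X)  # 3 eigenvalues
lam_big = np.linalg.eigvalsh(X @ X.T)    # 6 eigenvalues (3 are ~0)

# The nonzero eigenvalues coincide.
print(np.allclose(np.sort(lam_small)[::-1],
                  np.sort(lam_big)[::-1][:3]))  # True
```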

The sum of the eigenvalues of $X^T X$ is equal to its trace. Cyclic property of the trace: for matrices of compatible dimensions, $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$. Applying it to the eigen-decomposition, $\mathrm{tr}(X^T X) = \mathrm{tr}(V \Lambda V^T) = \mathrm{tr}(\Lambda V^T V) = \mathrm{tr}(\Lambda) = \sum_i \lambda_i$. The trace of a Gram matrix is therefore the sum of its eigenvalues.
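A one-line numerical check (with an arbitrary random matrix) that the trace of a Gram matrix equals the sum of its eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 4))
G = X.T @ X  # Gram matrix

# The trace equals the sum of the eigenvalues.
print(np.isclose(np.trace(G), np.linalg.eigvalsh(G).sum()))  # True
```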

Statistics (Recap)

Centered Model Matrix: Suppose the model (data) matrix $X \in \mathbb{R}^{n \times p}$, where each of the n rows represents a different repetition of the experiment and each of the p columns gives a particular kind of feature. We center the predictors (so that each column has zero mean) by subtracting the sample mean of each column, obtaining the centered model matrix $\tilde{X}$ with entries $\tilde{X}_{ij} = X_{ij} - \bar{X}_j$, where $\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}$.
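A minimal centering sketch in NumPy, assuming an arbitrary data matrix with n = 100 rows and p = 3 columns:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # n = 100, p = 3

# Subtract the per-column sample mean so every predictor has zero mean.
X_centered = X - X.mean(axis=0)
print(np.allclose(X_centered.mean(axis=0), 0.0))  # True
```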

Sample Covariance Matrix: Consider the covariance matrix $S = \frac{1}{n-1} \tilde{X}^T \tilde{X}$. Inspecting the terms: the diagonal terms $S_{jj} = \frac{1}{n-1}\sum_{i=1}^{n} \tilde{X}_{ij}^2$ are the sample variances of the predictors, while the off-diagonal terms $S_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} \tilde{X}_{ij}\tilde{X}_{ik}$ are the sample covariances between pairs of predictors.
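A short sketch (continuing the hypothetical centered matrix from above) that forms the sample covariance matrix and checks it against NumPy's built-in estimator:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
X_centered = X - X.mean(axis=0)

# Sample covariance matrix S = X_c^T X_c / (n - 1).
n = X_centered.shape[0]
S = X_centered.T @ X_centered / (n - 1)

# Agrees with NumPy's estimator (rowvar=False: columns are the variables).
print(np.allclose(S, np.cov(X, rowvar=False)))  # True
```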

Principal Components Analysis (PCA) With the above preliminaries, the actual methodology of PCA is now quite simple. The idea is to remove as much redundancy in our predictors as possible. The redundancy is defined through the correlation between the predictors.

PCA: PCA tries to fit an ellipsoid to the data. PCA is a linear transformation that maps the data to a new coordinate system. The direction of greatest variance becomes the first axis (first principal component), and so on. Using linear algebra, PCA reduces the dimensionality by discarding the low-variance principal components. J. Jauregui (2012).

PCA foundation: Since $\tilde{X}^T \tilde{X}$ is a Gram matrix, the sample covariance $S = \frac{1}{n-1}\tilde{X}^T \tilde{X}$ is a Gram matrix too, hence it admits the eigen-decomposition $S = V \Lambda V^T$. The eigenvalues are sorted in $\Lambda$ as $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0$. The eigenvector $v_i$ is called the ith principal component of $\tilde{X}$.
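Putting the pieces together, here is a minimal PCA sketch via the eigen-decomposition of the sample covariance matrix; the data are random and the choice of keeping two components is an arbitrary assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
X_centered = X - X.mean(axis=0)

# Sample covariance and its eigen-decomposition.
n = X_centered.shape[0]
S = X_centered.T @ X_centered / (n - 1)
eigvals, V = np.linalg.eigh(S)  # ascending order

# Sort eigenvalues (and eigenvectors) in descending order of variance.
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Columns of V are the principal components; project onto the first two.
scores = X_centered @ V[:, :2]
print(eigvals)       # variances along the principal components
print(scores.shape)  # (200, 2)
```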

Measuring the importance of the principal components: The total sample variance of the predictors is $\mathrm{tr}(S) = \sum_{j=1}^{p} \lambda_j$. The fraction of the total sample variance that corresponds to the ith principal component is $\lambda_i / \sum_{j=1}^{p} \lambda_j$; this fraction indicates the "importance" of the ith principal component.
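A small sketch of this fraction; the eigenvalues below are assumed illustrative values, not results from the lecture.

```python
import numpy as np

# Illustrative eigenvalues of a sample covariance matrix (assumed values);
# in practice, use the eigvals computed from the data.
eigvals = np.array([4.2, 0.9, 0.1])

explained_fraction = eigvals / eigvals.sum()
print(explained_fraction)           # approximately [0.81, 0.17, 0.02]
print(explained_fraction.cumsum())  # cumulative variance explained
```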

Back to the spring-mass example: PCA finds a single dominant eigenvalue, revealing the one degree of freedom. Hence, PCA indicates that there may be fewer variables that are essentially responsible for the variability of the response.

PCA Dimensionality Reduction: The spectrum represents the dimensionality reduction achieved by PCA. Usually the first few eigenvalues are orders of magnitude greater than the others, because the data may have fewer degrees of freedom than the number of predictors.

PCA Dimensionality Reduction: There is no fixed rule for how many eigenvalues to keep; the choice is usually clear from the spectrum and is left to the analyst's discretion. C. Bishop, Pattern Recognition and Machine Learning, Springer (2008).
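Since there is no fixed rule, the sketch below illustrates one common heuristic (a cumulative-variance threshold); both the eigenvalue spectrum and the 95% threshold are assumptions for illustration, not a prescription from the slides.

```python
import numpy as np

# Illustrative eigenvalue spectrum (assumed values), sorted in descending order.
eigvals = np.array([12.0, 3.0, 0.4, 0.05, 0.01])

# Heuristic: keep enough components to explain, say, 95% of the total variance.
cumulative = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(cumulative)  # cumulative fraction of variance explained
print(k)           # number of principal components to keep (here: 2)
```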

Assumptions of PCA Although PCA is a powerful tool for dimension reduction, it is based on some strong assumptions. The assumptions are reasonable, but they must be checked in practice before drawing conclusions from PCA. When PCA assumptions fail, we need to use other Linear or Nonlinear dimension reduction methods.

Mean/variance are sufficient: In applying PCA, we assume that the means and the covariance matrix are sufficient for describing the distributions of the predictors. This is strictly true only if the predictors are drawn from a multivariate Normal distribution, but it works approximately in many situations. When a predictor deviates heavily from a Normal distribution, an appropriate nonlinear transformation may solve this problem.

High variance indicates importance: The eigenvalue $\lambda_i$ measures the "importance" of the ith principal component. It is intuitively reasonable that lower-variability components describe the data less well, but this is not always true.

Principal components are orthogonal: PCA assumes that the intrinsic dimensions are orthogonal, allowing us to use linear algebra techniques. When this assumption fails, we need non-orthogonal components, which are not compatible with PCA.

Linear change of basis: PCA assumes that the data lie on a lower-dimensional linear manifold, so a linear transformation yields an orthonormal basis. When the data lie on a nonlinear manifold in the predictor space, linear methods are doomed to fail.

Kernel PCA for Nonlinear Dimensionality Reduction: Applying a nonlinear map $\Phi$ (called the feature map) to the data yields the PCA kernel $K_{ij} = \Phi(x_i)^T \Phi(x_j)$, usually evaluated through a kernel function without computing $\Phi$ explicitly. The centered nonlinear representation is obtained by double-centering the kernel, $\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$, where $\mathbf{1}_n$ is the $n \times n$ matrix with all entries equal to $1/n$. We then apply PCA (eigen-decomposition) to the modified kernel $\tilde{K}$.
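A minimal kernel-PCA sketch in NumPy with a Gaussian (RBF) kernel; the kernel choice, gamma value, and random data are illustrative assumptions, since the slides do not specify a particular feature map.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 2))

# Kernel matrix and double-centering (so the mapped data have zero mean).
n = X.shape[0]
K = rbf_kernel(X, gamma=0.5)
one_n = np.ones((n, n)) / n
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Eigen-decomposition of the centered kernel, sorted by decreasing eigenvalue.
eigvals, V = np.linalg.eigh(K_centered)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Projections of the training points onto the top k kernel principal components.
k = 2
projections = V[:, :k] * np.sqrt(np.maximum(eigvals[:k], 0.0))
print(projections.shape)  # (50, 2)
```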

Summary: Dimensionality reduction methods are processes for reducing the number of predictor variables under consideration, finding a more meaningful basis to express our data, filtering out the noise and revealing the hidden structure. Principal Component Analysis is a powerful statistical tool for analyzing data sets, formulated in the context of Linear Algebra through the spectral decomposition of the sample covariance matrix. We reduce the dimension of the predictors by keeping only the leading principal components and their eigenvalues. PCA is based on strong assumptions that we need to check. Kernel PCA extends PCA to nonlinear dimensionality reduction.

Advanced Section 4: Dimensionality Reduction, PCA. Thank you. Office hours for the Advanced Sections: Monday 6:00-7:30 pm, Tuesday 6:30-8:00 pm.