Dimension reduction (1) Overview PCA Factor Analysis Projection pursuit ICA

Overview
The purpose of dimension reduction:
- Data simplification
- Data visualization
- Noise reduction (if we can assume only the dominating dimensions are signal)
- Variable selection for prediction

Overview
- Outcome variable y exists (learning the association rule): for data separation, classification and regression; for dimension reduction, SIR, class-preserving projection, Partial Least Squares.
- No outcome variable (learning intrinsic structure): for data separation, clustering; for dimension reduction, PCA, MDS, Factor Analysis, ICA, NCA, ...

PCA
- Explains the variance-covariance structure among a set of random variables by a few linear combinations of the variables.
- Does not require normality!

PCA

Reminder of some results for random vectors

Proof of the first (and second) point of the previous slide.

PCA
The eigenvalues are the variance components. Proportion of total variance explained by the kth PC: λ_k / (λ_1 + λ_2 + ... + λ_p).
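A minimal numpy sketch (not from the slides) that computes the PCs by eigendecomposition of the sample covariance matrix and the proportion of variance explained; the data and names are illustrative:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # toy data: 200 observations, 5 variables
S = np.cov(X, rowvar=False)             # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)    # eigendecomposition; eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder so PC1 comes first

# Proportion of total variance explained by the kth PC: lambda_k / (lambda_1 + ... + lambda_p)
prop = eigvals / eigvals.sum()
print(prop)            # per-PC proportion of variance
print(prop.cumsum())   # cumulative proportion, used below to choose the number of components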

PCA

The geometrical interpretation of PCA:

PCA
PCA can also be done using the correlation matrix instead of the covariance matrix. This is equivalent to first standardizing all the X variables.

PCA
Using the correlation matrix avoids domination by one X variable due to scaling (unit changes), for example using inches instead of feet.
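A quick numerical check (illustrative only) that the correlation matrix equals the covariance matrix of the standardized variables, so the two PCAs coincide:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])   # variables on very different scales

R = np.corrcoef(X, rowvar=False)                    # correlation matrix of X
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized variables
print(np.allclose(R, np.cov(Z, rowvar=False)))      # True: correlation-PCA == covariance-PCA on standardized X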

Selecting the number of components? Based on the eigenvalues (% of variation explained). Assumption: the small amount of variation explained by the low-ranked PCs is noise.

Sparse PCA
In high-dimensional data, the loadings of a single PC on 10,000 genes don't make much sense. The goal is to obtain sparse loadings, which make the interpretation easier and the model more robust. One approach: SCoTLASS.

Sparse PCA
Zou, Hastie and Tibshirani's SPCA obtains sparse loadings by recasting PCA as a penalized regression problem:
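As an illustration only (not necessarily the exact SPCA algorithm referenced above), scikit-learn ships a SparsePCA estimator based on an l1-penalized formulation; a usage sketch with made-up settings:

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 50))                     # stand-in for a high-dimensional data matrix

spca = SparsePCA(n_components=3, alpha=1.0, random_state=0)   # alpha controls the sparsity of the loadings
scores = spca.fit_transform(X)

print(spca.components_.shape)                      # (3, 50) sparse loading vectors
print((spca.components_ == 0).mean())              # fraction of loadings driven exactly to zero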

Factor Analysis
If we take the first several PCs that explain most of the variation in the data, we have one form of factor model. The factor model is X − μ = L F + ε, where
L: loading matrix
F: unobserved random vector (latent factors)
ε: unobserved random vector (noise)

Factor Analysis
The orthogonal factor model assumes no correlation between the factor RVs: E(F) = 0, Cov(F) = I, E(ε) = 0, and Cov(ε) = Ψ is a diagonal matrix.
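A minimal sketch of fitting this model with scikit-learn's FactorAnalysis on simulated data; all names, sizes, and constants are illustrative:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
F = rng.normal(size=(500, 2))               # latent factors, Cov(F) = I
L = rng.normal(size=(6, 2))                 # loading matrix
eps = rng.normal(size=(500, 6)) * 0.3       # noise with diagonal covariance Psi
X = F @ L.T + eps                           # observed data generated from the factor model

fa = FactorAnalysis(n_components=2)
fa.fit(X)
print(fa.components_)                       # estimated loadings (identified only up to rotation)
print(fa.noise_variance_)                   # estimated diagonal of Psi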

Factor Analysis

Rotations in the m-dimensional subspace defined by the factors make the solution non-unique. PCA gives one particular solution, since the vectors are selected sequentially. The maximum likelihood estimator is another solution.

Factor Analysis
As noted above, rotations within the m-dimensional subspace don't change the overall amount of variation explained. A rotation can therefore be chosen to make the results more interpretable:

Factor Analysis
Varimax criterion: find the rotation T such that V is maximized, where V is proportional to the sum of the variances of the squared loadings. Maximizing V makes the squared loadings as spread out as possible: some become very small and some become very large.
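A small helper (a sketch of the criterion only, not the rotation algorithm itself) that evaluates V for a given loading matrix:

import numpy as np

def varimax_criterion(L):
    """V proportional to the sum, over factors, of the variance of the squared loadings."""
    L2 = np.asarray(L) ** 2            # squared loadings; one column per factor
    return np.sum(np.var(L2, axis=0))

# Concentrated ("simple structure") loadings score higher than diffuse ones
simple  = np.array([[0.9, 0.0], [0.0, 0.9], [0.9, 0.0]])
diffuse = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
print(varimax_criterion(simple) > varimax_criterion(diffuse))   # True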

Factor Analysis
Orthogonal simple factor rotation: rotate the orthogonal factors around the origin until the system is maximally aligned with the separate clusters of variables.
Oblique simple structure rotation: allow the factors to become correlated. Each factor is rotated individually to fit a cluster.

MDS
Multidimensional scaling is a dimension reduction procedure that maps the distances between observations to a lower-dimensional space. It minimizes an objective function ("stress") of the form Σ_{i<j} (D_ij − d_ij)², where D is the distance in the original space and d is the distance in the reduced-dimension space. A numerical method is used for the minimization.
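A minimal sketch using scikit-learn's metric MDS on a precomputed distance matrix; parameters and data are illustrative:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 10))                              # points in the original 10-D space
D = squareform(pdist(X))                                   # pairwise distances D in the original space

mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
Y = mds.fit_transform(D)                                   # 2-D configuration found numerically

d = squareform(pdist(Y))                                   # distances d in the reduced space
print(np.sum((D - d) ** 2) / 2)                            # the (raw) stress that was minimized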

Projection pursuit
A very broad term: finding the most "interesting" direction of projection. How the projection is done depends on the definition of "interesting". If it is maximal variation, then PP leads to PCA.
In a narrower sense: finding non-Gaussian projections. For most high-dimensional clouds, most low-dimensional projections are close to Gaussian, so the important information in the data lies in the directions for which the projected data are far from Gaussian.

Projection pursuit
It boils down to objective functions: each kind of "interesting" has its own objective function to maximize.

Projection pursuit
[Figure: the PCA direction vs. the direction found by projection pursuit with multi-modality as the objective.]

One objective function to measure multi-modality uses the first three moments of the distribution. It can help find clusters through visualization. To find the projection direction w, the objective is maximized over w by gradient ascent, as sketched below.
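The exact index from the slide is not reproduced here, so the sketch below uses a moment-based stand-in (squared skewness plus a fraction of squared excess kurtosis of the projection) and maximizes it over the unit sphere by projected gradient ascent with a numerical gradient; every name and constant is illustrative:

import numpy as np

def pp_index(w, X):
    """Stand-in projection-pursuit index built from low-order moments of y = Xw."""
    y = X @ w
    y = (y - y.mean()) / y.std()                       # standardize the projection
    skew = np.mean(y ** 3)
    exkurt = np.mean(y ** 4) - 3.0
    return skew ** 2 + 0.25 * exkurt ** 2              # large for non-Gaussian / multimodal projections

def num_grad(f, w, eps=1e-5):
    """Central-difference numerical gradient."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-3, 1, size=(200, 2)),       # two well-separated clusters
               rng.normal(+3, 1, size=(200, 2))])
w = rng.normal(size=2)
w /= np.linalg.norm(w)

for _ in range(200):                                   # projected gradient ascent on the unit sphere
    w = w + 0.1 * num_grad(lambda v: pp_index(v, X), w)
    w /= np.linalg.norm(w)                             # re-project onto ||w|| = 1

print(w)   # should align (up to sign) with the cluster-separating direction, roughly (1, 1)/sqrt(2)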

Projection pursuit
PCA can be thought of as a special case of PP, with variance as the objective function: maximize Var(w'x) = w'Σw subject to ||w|| = 1. For the other PC directions, find the projection in the subspace orthogonal to the previously found PCs.

Projection pursuit
Some other objective functions (y is the RV generated by the projection w'x):
Kurtosis: kurt(y) = E{y^4} − 3(E{y^2})^2. The kurtosis as defined here has value 0 for the normal distribution. Higher kurtosis: peaked and fat-tailed.

Independent component analysis
Again, another view of dimension reduction is factorization into latent variables. ICA finds a unique solution by requiring the factors to be statistically independent, rather than just uncorrelated. Lack of correlation only determines the second-order cross-moments, while statistical independence means that for any functions g1() and g2(),
E[g1(y1) g2(y2)] = E[g1(y1)] E[g2(y2)].
For the multivariate Gaussian, uncorrelatedness = independence.

ICA
The multivariate Gaussian is determined by its second moments alone. Thus if the true hidden factors are Gaussian, they can still be determined only up to a rotation. In ICA, the latent variables are assumed to be independent and non-Gaussian. The mixing matrix A must have full column rank.

Independent component analysis
ICA is a special case of PP. The key is again that y is non-Gaussian. Several ways to measure non-Gaussianity:
(1) Kurtosis (zero for a Gaussian RV, but sensitive to outliers)
(2) Entropy (the Gaussian RV has the largest entropy given the first and second moments)
(3) Negentropy: J(y) = H(y_gauss) − H(y), where y_gauss is a Gaussian RV with the same covariance matrix as y.

ICA
To measure statistical independence, use the mutual information
I(y1, ..., ym) = Σ_i H(y_i) − H(y),
i.e., the sum of the marginal entropies minus the overall entropy. It is non-negative, and zero if and only if the components are independent.
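An illustrative check of these two properties on discretized variables, using scikit-learn's mutual_info_score as a crude plug-in estimate (bin counts, sample sizes, and variable names are arbitrary):

import numpy as np
from sklearn.metrics import mutual_info_score

def mi_estimate(a, b, bins=20):
    """Discretize a and b, then estimate I = sum of marginal entropies - joint entropy."""
    edges_a = np.quantile(a, np.linspace(0, 1, bins + 1)[1:-1])
    edges_b = np.quantile(b, np.linspace(0, 1, bins + 1)[1:-1])
    return mutual_info_score(np.digitize(a, edges_a), np.digitize(b, edges_b))

rng = np.random.default_rng(6)
y1 = rng.normal(size=5000)
y2_indep = rng.normal(size=5000)              # independent of y1
y2_dep = y1 + 0.3 * rng.normal(size=5000)     # strongly dependent on y1

print(mi_estimate(y1, y2_indep))   # close to 0 (small positive bias from binning)
print(mi_estimate(y1, y2_dep))     # clearly positive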

ICA
The computation: there is no closed-form solution, so gradient-based optimization is used. An approximation to negentropy (for less intensive computation and better resistance to outliers) is
J(y) ≈ [E{G(y)} − E{G(v)}]^2,
where v is a standard Gaussian RV and G() is some non-quadratic function. Two commonly used choices are G(u) = (1/a) log cosh(a u) and G(u) = −exp(−u^2/2). When G(x) = x^4, this corresponds to kurtosis.
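A numerical sketch of this approximation with G(u) = log cosh(u); sample sizes and seeds are arbitrary:

import numpy as np

def negentropy_approx(y, n_mc=100_000, seed=7):
    """J(y) ~ (E[G(y)] - E[G(v)])^2, with G(u) = log cosh(u) and v a standard Gaussian sample."""
    y = (y - y.mean()) / y.std()                   # compare at zero mean and unit variance
    v = np.random.default_rng(seed).normal(size=n_mc)
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(v).mean()) ** 2

rng = np.random.default_rng(8)
print(negentropy_approx(rng.normal(size=10000)))    # ~0 for Gaussian data
print(negentropy_approx(rng.laplace(size=10000)))   # larger for a non-Gaussian (super-Gaussian) source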

Normal density: f(x) = (2π)^(−p/2) |Σ|^(−1/2) exp(−(x − μ)' Σ^(−1) (x − μ) / 2).
Whitening transform: z = Σ^(−1/2) (x − μ), so that E(zz') = I.
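A minimal numpy sketch of this whitening step via eigendecomposition of the sample covariance (data and names are illustrative):

import numpy as np

def whiten(X):
    """Return z = Sigma^(-1/2) (x - mu) for each row x, so that Cov(z) = I."""
    Xc = X - X.mean(axis=0)                       # center
    Sigma = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(Sigma)            # Sigma = E D E'
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T     # Sigma^(-1/2)
    return Xc @ W

rng = np.random.default_rng(9)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))   # correlated toy data
print(np.round(np.cov(whiten(X), rowvar=False), 2))        # approximately the identity matrix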

ICA
FastICA:
1. Center the X vectors to mean zero.
2. Whiten the X vectors such that E(xx') = I. This is done through eigenvalue decomposition.
3. Initialize the weight vector w.
4. Iterate until convergence:
   w+ = E{x g(w'x)} − E{g'(w'x)} w
   w = w+ / ||w+||
Here g() is the derivative of the non-quadratic function G().
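A one-unit FastICA sketch following the update above, with g = tanh (the derivative of log cosh); the whitening is repeated inline so the example is self-contained, and all names and constants are illustrative:

import numpy as np

def fastica_one_unit(Z, max_iter=200, tol=1e-6, seed=0):
    """One FastICA component. Z: centered and whitened data (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = Z @ w                                        # projections w'x
        g = np.tanh(y)                                   # g() = tanh
        g_prime = 1.0 - g ** 2                           # g'()
        w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w   # w+ = E{x g(w'x)} - E{g'(w'x)} w
        w_new /= np.linalg.norm(w_new)                   # w = w+ / ||w+||
        if abs(abs(w_new @ w) - 1.0) < tol:              # converged (up to sign)
            return w_new
        w = w_new
    return w

# Example: unmix two independent uniform sources (cf. the figure on the last slide)
rng = np.random.default_rng(10)
S = rng.uniform(-1, 1, size=(2000, 2))                   # independent non-Gaussian sources
X = S @ np.array([[1.0, 0.4], [0.6, 1.0]])               # mixed observations
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ vecs @ np.diag(vals ** -0.5) @ vecs.T           # whitening, as on the previous slide
w = fastica_one_unit(Z)
y = Z @ w                                                # one recovered independent component
print(np.corrcoef(np.column_stack([y, S]), rowvar=False)[0, 1:])   # one entry should be close to +/- 1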

ICA
Figure 14.28: Mixtures of independent uniform random variables. The upper-left panel shows 500 realizations from the two independent uniform sources, the upper-right panel their mixed versions. The lower two panels show the PCA and ICA solutions, respectively.