Dimension reduction (2): Projection pursuit, ICA, NCA, Partial Least Squares
References: Blais, "The role of the environment in synaptic plasticity…" (1998); Liao et al., PNAS (2003); Barker & Rayens, J. Chemometrics (2013).

Projection pursuit A very broad term: finding the most "interesting" direction of projection. How the projection is found depends on the definition of "interesting"; if it means maximal variation, then projection pursuit leads to PCA. In a narrower sense, it means finding non-Gaussian projections: for most high-dimensional point clouds, most low-dimensional projections are close to Gaussian, so the important information in the data lies in the directions for which the projected data are far from Gaussian.

Projection pursuit It boils down to objective functions – each kind of “interesting” has its own objective function(s) to maximize.

Projection pursuit [Figure: the same data projected onto the leading PCA direction vs. a projection-pursuit direction chosen with multi-modality as the objective.]

One objective function to measure multi-modality uses the first three moments of the distribution. It can help find clusters through visualization. To find w, the objective is maximized over w by gradient ascent.

Projection pursuit PCA can be thought of as a special case of projection pursuit, with the projected variance as the objective: maximize Var(w'x) subject to ||w|| = 1. For the other PC directions, maximize the same objective over the subspace orthogonal to the previously found PCs.

Projection pursuit Some other objective functions are based on moments of y, the random variable generated by the projection y = w'x. One is the excess kurtosis, kurt(y) = E[y^4] - 3(E[y^2])^2 for centered y, which is 0 for a normal distribution; higher kurtosis indicates a peaked, fat-tailed distribution.
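As an illustration of this and the previous slide, here is a minimal Python sketch (not from the original slides) of projection pursuit by projected gradient ascent on the excess-kurtosis index; the function names and the fixed step size are arbitrary choices. On data with one heavy-tailed direction mixed into Gaussian noise, the returned w tends to align with that direction.

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis of a 1-D sample; approximately 0 for Gaussian data."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

def pp_kurtosis_direction(X, n_iter=500, lr=0.01, seed=0):
    """Projected gradient ascent on |excess kurtosis| of the projection X @ w,
    keeping ||w|| = 1.  A toy projection-pursuit routine, not an optimized one."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                       # center the data
    n, p = X.shape
    w = rng.standard_normal(p)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = X @ w
        # gradient of kurt(y) = E[y^4] - 3 E[y^2]^2 with respect to w
        grad = 4 * (X.T @ y ** 3) / n - 12 * np.mean(y ** 2) * (X.T @ y) / n
        w = w + lr * np.sign(excess_kurtosis(y)) * grad   # ascend |kurtosis|
        w /= np.linalg.norm(w)                            # project back onto the unit sphere
    return w
```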

Independent component analysis Again, another view of this kind of dimension reduction is factorization into latent variables. ICA finds a unique solution by requiring the factors to be statistically independent, rather than just uncorrelated. Lack of correlation only constrains the second-order cross-moment (E[y1 y2] = E[y1]E[y2]), while statistical independence requires E[g1(y1) g2(y2)] = E[g1(y1)] E[g2(y2)] for any functions g1() and g2(). For a multivariate Gaussian, uncorrelatedness is equivalent to independence.
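A small numerical illustration of this distinction (my own hypothetical example, not from the slides): with y2 = y1^2, the two variables are uncorrelated, yet the independence condition above fails for g1(u) = u^2 and g2(v) = v.

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.uniform(-1, 1, 100_000)
y2 = y1 ** 2                             # a deterministic, hence dependent, function of y1

# Uncorrelated: the second-order cross-moment gives (near) zero covariance.
print(np.cov(y1, y2)[0, 1])              # ~ 0

# Not independent: take g1(u) = u^2 and g2(v) = v.
print(np.mean(y1 ** 2 * y2))             # E[y1^4]       ~ 1/5
print(np.mean(y1 ** 2) * np.mean(y2))    # (E[y1^2])^2   ~ 1/9
```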

ICA A multivariate Gaussian is determined by its second moments alone, so if the true hidden factors are Gaussian they can still be determined only up to a rotation. In ICA, the latent variables are therefore assumed to be independent and non-Gaussian, and the mixing matrix A (in the model x = As) must have full column rank.
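A quick numerical illustration of the rotation ambiguity (my own sketch, not from the slides): independent Gaussian sources and an arbitrarily rotated version of them have the same covariance and the same standard-normal marginals, so nothing in the observed distribution can distinguish the two factorizations.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((100_000, 2))            # independent Gaussian "sources"

theta = np.pi / 4                                # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
s_rot = s @ R.T                                  # rotated sources

# Both have (near) identity covariance and standard-normal marginals.
print(np.cov(s, rowvar=False).round(2))
print(np.cov(s_rot, rowvar=False).round(2))
```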

Independent component analysis ICA is a special case of projection pursuit; the key is again that y is non-Gaussian. Several ways to measure non-Gaussianity:
(1) Kurtosis (zero for a Gaussian RV, but sensitive to outliers)
(2) Entropy (a Gaussian RV has the largest entropy among RVs with the same first and second moments)
(3) Negentropy: J(y) = H(y_gauss) - H(y), where y_gauss is a Gaussian RV with the same covariance matrix as y.

ICA To measure statistical independence, use the mutual information, I(y1, ..., ym) = Σ_i H(y_i) - H(y): the sum of the marginal entropies minus the joint entropy. It is non-negative, and zero if and only if the components are independent.

ICA The computation: there is no closed-form solution, so gradient-based optimization is used. For less intensive computation and better resistance to outliers, the negentropy is approximated as J(y) ≈ [E{G(y)} - E{G(v)}]^2, where v is a standard Gaussian RV and G() is some non-quadratic function. Two commonly used choices are G1(u) = (1/a1) log cosh(a1 u) and G2(u) = -exp(-u^2/2). When G(u) = u^4, the approximation reduces to the (squared) excess kurtosis.
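A minimal Python sketch of this negentropy approximation, assuming the log-cosh choice of G and a Monte Carlo estimate of E{G(v)}; the function name and constants are illustrative, not from the slides.

```python
import numpy as np

def negentropy_approx(y, a1=1.0, n_mc=200_000, seed=0):
    """Approximate negentropy J(y) ~ [E{G(y)} - E{G(v)}]^2 with
    G(u) = (1/a1) log cosh(a1 u) and v a standard Gaussian variable.
    Assumes y has been centered and scaled to unit variance."""
    def G(u):
        return np.log(np.cosh(a1 * u)) / a1
    v = np.random.default_rng(seed).standard_normal(n_mc)   # Monte Carlo E{G(v)}
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

# Example: a unit-variance uniform sample scores higher than a Gaussian one.
rng = np.random.default_rng(1)
print(negentropy_approx(rng.uniform(-np.sqrt(3), np.sqrt(3), 50_000)))
print(negentropy_approx(rng.standard_normal(50_000)))       # ~ 0
```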

ICA FastICA: center the x vectors to mean zero; whiten the x vectors such that E(xx') = I (done through an eigenvalue decomposition); initialize the weight vector w; then iterate
  w+ = E{x g(w'x)} - E{g'(w'x)} w
  w = w+ / ||w+||
until convergence, where g() is the derivative of the non-quadratic function G().
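The following is a Python sketch of the one-unit FastICA update above, assuming the common choice g(u) = tanh(u) (the derivative of G(u) = log cosh(u)); the helper names are mine, and X is taken to be an (n_samples x n_features) array.

```python
import numpy as np

def center_and_whiten(X):
    """Center X and whiten via eigen-decomposition so that E(xx') = I.
    Assumes the sample covariance is nonsingular."""
    X = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return X @ vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fastica_one_unit(X, max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA with g(u) = tanh(u), following the update
    w+ = E{x g(w'x)} - E{g'(w'x)} w;  w = w+/||w+||.
    X must already be centered and whitened."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = rng.standard_normal(p)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = X @ w
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = X.T @ g / n - g_prime.mean() * w      # w+ = E{x g(w'x)} - E{g'(w'x)} w
        w_new /= np.linalg.norm(w_new)                # w = w+ / ||w+||
        if abs(abs(w_new @ w) - 1.0) < tol:           # converged (up to sign)
            return w_new
        w = w_new
    return w

# Usage sketch: Xw = center_and_whiten(X); w = fastica_one_unit(Xw); source = Xw @ w
```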

ICA [Figure 14.28: Mixtures of independent uniform random variables. The upper-left panel shows 500 realizations from the two independent uniform sources, the upper-right panel their mixed versions. The lower two panels show the PCA and ICA solutions, respectively.]

Network component analysis Other than dimension reduction or a hidden-factor model, there is another way to understand a model like this: it can be seen as explaining the data by a bipartite network, a control layer and an output layer. Unlike PCA and ICA, NCA does not assume a fully connected loading matrix. Rather, the matrix is sparse, and the locations of its non-zero entries are pre-determined by biological knowledge about regulatory networks, for example which transcription factors (control layer) are known to regulate which genes (output layer).

Motivation: instead of blindly searching for a lower-dimensional space, a priori information is incorporated into the loading matrix.

NCA The model is X_{N×P} = A_{N×K} P_{K×P} + E_{N×P}. Conditions for the solution to be unique:
(1) A has full column rank;
(2) when a column of A is removed, together with all rows corresponding to non-zero values in that column, the remaining matrix still has full column rank;
(3) P has full row rank.
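A rough sketch (my own, not from the Liao et al. paper) of how conditions (1) and (2) could be checked for a given connectivity pattern: fill the allowed positions with random numbers so that the rank tests reflect a generic A with that sparsity; condition (3) concerns P and is not checked here.

```python
import numpy as np

def nca_identifiable(Z0, seed=0, tol=1e-8):
    """Heuristic check of NCA identifiability conditions (1) and (2) for a binary
    connectivity pattern Z0 (N x K, 1 = allowed nonzero entry of A)."""
    rng = np.random.default_rng(seed)
    Z0 = np.asarray(Z0)
    N, K = Z0.shape
    A = Z0 * rng.standard_normal((N, K))             # a "generic" A with this pattern
    if np.linalg.matrix_rank(A, tol) < K:            # (1) full column rank
        return False
    for k in range(K):                               # (2) drop column k and the rows it
        keep_rows = Z0[:, k] == 0                    #     regulates; the remainder must
        keep_cols = [j for j in range(K) if j != k]  #     still have full column rank
        sub = A[np.ix_(keep_rows, keep_cols)]
        if np.linalg.matrix_rank(sub, tol) < K - 1:
            return False
    return True
```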

NCA [Fig. 2: A completely identifiable network (a) and an unidentifiable network (b). Although the two initial A matrices describing the networks have an identical number of constraints (zero entries), the network in (b) does not satisfy the identifiability conditions because of the connectivity pattern of R3. The edges in red are the differences between the two networks.]

NCA Notice that both A and P are to be estimated, so the identifiability criteria are in fact untestable from the data alone. To compute NCA, minimize the squared loss ||X - AP||^2 subject to A ∈ Z0, where Z0 is the topology constraint matrix, i.e. the network connectivity matrix, based on prior knowledge, that specifies which positions of A may be non-zero.

NCA Solving NCA: this is a linear decomposition problem with the biconvex property. It is solved by iteratively solving for A and P, each while fixing the other; both steps are least-squares problems. Convergence is judged by the total least-squares error, which is non-increasing at each step. Optimality is guaranteed if the three identifiability conditions are satisfied; otherwise a local optimum may be found.
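A minimal Python sketch of this alternating least-squares scheme, assuming the sparsity pattern of A is given as a binary matrix Z0; the function name and stopping rule are illustrative rather than the authors' exact implementation.

```python
import numpy as np

def nca_als(X, Z0, n_iter=200, tol=1e-8, seed=0):
    """Alternating least squares for X ≈ A P with the sparsity of A fixed by Z0.
    X: (N x M) data; Z0: (N x K) binary connectivity pattern (1 = free entry of A).
    Returns A (zero wherever Z0 is zero) and P (K x M)."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    K = Z0.shape[1]
    A = Z0 * rng.standard_normal((N, K))                  # random start on the pattern
    prev_err = np.inf
    for _ in range(n_iter):
        P = np.linalg.lstsq(A, X, rcond=None)[0]          # fix A, least squares for P
        for i in range(N):                                # fix P, solve row i of A over
            idx = np.flatnonzero(Z0[i])                   # its allowed entries only
            if idx.size:
                A[i, idx] = np.linalg.lstsq(P[idx].T, X[i], rcond=None)[0]
        err = np.linalg.norm(X - A @ P) ** 2              # total squared error
        if prev_err - err < tol:                          # non-increasing; stop when flat
            break
        prev_err = err
    return A, P
```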

PLS: finding latent factors in X that can predict Y. X is multi-dimensional; Y can be either a random variable or a random vector. The model has the form Y = β0 + Σ_j β_j T_j + ε, where each T_j is a linear combination of the columns of X. PLS is suitable for handling the p >> N situation.

PLS Data: N samples (x_i, y_i), with x_i a p-dimensional predictor vector and y_i the response. Goal: find a small number of latent components of X that predict Y well.

PLS Solution: a_{k+1} is the (k+1)-th eigenvector of a matrix built from the cross-covariance between X and Y. Alternatively, the PLS components can be characterized as minimizers of a least-squares criterion and computed by iterative regression.
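As an illustration of the iterative-regression view, here is a minimal PLS sketch for a single response (a NIPALS-style algorithm in Python); this is a standard formulation and not necessarily the exact one on the slides.

```python
import numpy as np

def pls1(X, y, n_components=2):
    """PLS for a single response, computed by iterative simple regressions.
    X: (n x p), y: (n,).  Columns of X are only centered here; in practice they
    are usually standardized as well.  Returns the scores T and fitted values."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    Xk = X.copy()
    T = np.zeros((X.shape[0], n_components))
    y_hat = np.zeros_like(y, dtype=float)
    for k in range(n_components):
        w = Xk.T @ y                          # covariance of each (deflated) feature with y
        t = Xk @ w                            # latent component: a linear combination of X
        theta = (t @ y) / (t @ t)             # regress y on the component
        y_hat += theta * t
        T[:, k] = t
        Xk = Xk - np.outer(t, (t @ Xk) / (t @ t))   # deflate X: orthogonalize to t
    return T, y_hat
```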

PLS Example: PLS vs. PCA in regression. [Figure: a regression example in which Y is related to X1.]