
Kernel Independent Component Analysis
Francis Bach and Michael Jordan
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2003
Presented by Nagesh Adluru

Goal of the Paper
To perform Independent Component Analysis (ICA) in a novel way that is more accurate and more robust than existing techniques.

Concepts Involved
ICA – Independent Component Analysis
Mutual Information
F – Correlation
RKHS – Reproducing Kernel Hilbert Spaces
CCA – Canonical Correlation Analysis
KICA – Kernel ICA
KGV – Kernel Generalized Variance

ICA – Independent Component Analysis
ICA is an unsupervised learning problem: we have to estimate x given a set of observations of y, under the assumption that the components of x are independent. So we have to estimate a de-mixing matrix W such that x = Wy.
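To make the notation concrete, here is a minimal sketch (ours, not the paper's) of the data model in the slide's notation: x holds the independent sources, y the observations, and W the de-mixing matrix. The mixing matrix A and the Laplace/uniform sources are purely illustrative.

    # Toy setup in the slide's notation: x = sources, y = observations,
    # W = de-mixing matrix. A is a made-up mixing matrix used only to
    # generate data; a real ICA algorithm never sees it.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    # Two independent non-Gaussian sources (rows of x).
    x = np.vstack([rng.laplace(size=n), rng.uniform(-1.0, 1.0, size=n)])

    A = np.array([[1.0, 0.5],
                  [0.3, 1.0]])   # unknown mixing
    y = A @ x                    # what we actually observe

    # ICA searches for W such that the rows of W @ y are independent;
    # the ideal answer recovers x up to scaling and permutation.
    W_ideal = np.linalg.inv(A)
    x_hat = W_ideal @ y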

ICA – Independent Component Analysis
ICA is semi-parametric. Because we do not know anything about the distribution of x, that part of the problem is non-parametric. But we do know that y is a linear combination of the components of x, so the mixing itself is parametric. The problem is therefore semi-parametric, and kernel methods do well in such situations.

ICA – Independent Component Analysis
If we knew the distribution of x, we could assume the 'x-space' and find W using a gradient or fixed-point algorithm. But in practice we do not know it. So how do we proceed? Since we are looking for independent components, we need to maximize independence, that is, minimize mutual information.

Mutual Information
Mutual information measures the dependence among random variables: it is smallest when the dependence is weakest, and zero exactly at independence. So it looks promising to explore. Prior work has focused on approximations of it, because estimating it for real-valued variables from finite samples is difficult. Kernels offer a better way.

F – Correlation
The F-correlation is defined as the maximal correlation over a function space F:
ρ_F(x1, x2) = max over f1, f2 in F of corr(f1(x1), f2(x2)).
If x1 and x2 are independent then ρ_F is zero, but it is the converse that matters here.

F – Correlation
Converse: if ρ_F(x1, x2) is zero, are x1 and x2 independent? That is true if F is a very large space of functions. But it is also true if F is restricted to the reproducing kernel Hilbert space induced by a Gaussian kernel.

F – Correlation
Since the converse holds even when F is restricted to an RKHS, a mutual-information-like contrast can be defined that is zero only when the two variables are independent.

RKHS – Reproducing Kernel Hilbert Spaces
Operations with kernels can be treated as operations in a Hilbert space of functions. The reproducing property lets these function-space operations be carried out with ordinary computations on the sample in Euclidean space, which is what makes the approach computationally exploitable. So the correlation between the functions f can be interpreted as the correlation between the feature maps Φ, which is exactly a canonical correlation between the Φs.

CCA – Canonical Correlation Analysis
CCA vs. PCA: PCA maximizes the variance of the projection of the distribution of a single random vector, while CCA maximizes the correlation between projections of the distributions of two or more random vectors. The relevant covariance blocks are C_IJ = cov(x_I, x_J).

CCA – Canonical Correlation Analysis
While PCA leads to an eigenvector problem, CCA leads to a generalized eigenvector problem. (Eigenvector problem: Av = λv. Generalized eigenvector problem: Av = λBv.) CCA can easily be kernelized and also generalized to more than two random vectors, so the maximal correlation between the variables can be found efficiently, which is very nice.
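As a concrete illustration of the Av = λBv structure (a sketch with our own toy data, not code from the paper), classical two-view CCA can be written as a generalized eigenvalue problem built from the covariance blocks C_IJ of the previous slide:

    # Classical CCA for two random vectors posed as A v = rho B v.
    # Dimensions, data, and noise level are illustrative.
    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(1)
    n, p, q = 500, 3, 2
    z = rng.normal(size=n)                        # shared latent signal
    x1 = z[:, None] + 0.5 * rng.normal(size=(n, p))
    x2 = z[:, None] + 0.5 * rng.normal(size=(n, q))

    X = np.hstack([x1, x2])
    X = X - X.mean(axis=0)
    C = X.T @ X / n                               # joint covariance
    C11, C12 = C[:p, :p], C[:p, p:]
    C21, C22 = C[p:, :p], C[p:, p:]

    A = np.block([[np.zeros((p, p)), C12],
                  [C21, np.zeros((q, q))]])
    B = np.block([[C11, np.zeros((p, q))],
                  [np.zeros((q, p)), C22]])

    # Largest generalized eigenvalue = first canonical correlation.
    rho = eigh(A, B, eigvals_only=True)[-1]
    print("first canonical correlation:", rho)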

CCA – Canonical Correlation Analysis
Although this kernelization of CCA helps us, the generalization to more than two variables is not a precise mutual-independence measure in terms of the F-correlation. That is not a limitation in practice, both because of empirical results and because mutual independence can be captured through pairwise dependencies in this setting.

Kernel ICA
We saw that the F-correlation measures independence, and that it can be computed with kernelized CCA. So we now have Kernel ICA: not in the sense that the basic ICA algorithm is kernelized, but because the contrast is computed using kernelized CCA.

KICA – Kernel ICA
Algorithm. Input: the observations y and an initial W. Procedure: estimate the components x = Wy, compute the N x N Gram matrices K_1, ..., K_m, one for each component of the random vector, and minimize the resulting contrast C(W). (This is equivalent to generalized CCA in which each of the m random vectors is a single-element vector.)
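The sketch below is our reading of this procedure for m = 2 (the Gaussian kernel width and the regularization kappa are illustrative, not the paper's exact constants): centered Gaussian Gram matrices feed a regularized kernel-CCA generalized eigenproblem, and the resulting canonical correlation is turned into a contrast C(W).

    # Kernel-CCA-based independence contrast for two estimated components.
    import numpy as np
    from scipy.linalg import eigh

    def gaussian_gram(u, sigma=1.0):
        """Centered N x N Gaussian Gram matrix of a 1-D sample u."""
        d2 = (u[:, None] - u[None, :]) ** 2
        K = np.exp(-d2 / (2.0 * sigma ** 2))
        H = np.eye(len(u)) - np.ones((len(u), len(u))) / len(u)
        return H @ K @ H

    def kcca_first_correlation(u1, u2, kappa=1e-2):
        """First (regularized) kernel canonical correlation of two samples."""
        N = len(u1)
        K1, K2 = gaussian_gram(u1), gaussian_gram(u2)
        R1 = K1 + kappa * N * np.eye(N)           # regularized diagonal blocks
        R2 = K2 + kappa * N * np.eye(N)
        A = np.block([[np.zeros((N, N)), K1 @ K2],
                      [K2 @ K1, np.zeros((N, N))]])
        B = np.block([[R1 @ R1, np.zeros((N, N))],
                      [np.zeros((N, N)), R2 @ R2]])
        return eigh(A, B, eigvals_only=True)[-1]

    def kica_contrast(W, y, kappa=1e-2):
        """Dependence left between the two components of x_hat = W @ y."""
        x_hat = W @ y
        rho = kcca_first_correlation(x_hat[0], x_hat[1], kappa)
        return -0.5 * np.log(1.0 - rho ** 2)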

KICA – Kernel ICA
The computational complexity of finding the 'smallest' generalized eigenvalue of matrices of size mN is O(N^3), treating m as a constant. (Note: the eigenvalues are not directly related to the entries of W.) But this can be reduced to O(M^2 N), with M a constant smaller than N, by exploiting special properties of the Gram matrix spectrum (its eigenvalues decay rapidly).
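The reduction comes from a low-rank factorization of each Gram matrix. The paper uses an incomplete Cholesky decomposition for this; the simplified pivoted version below (ours) conveys the idea: K is approximated by G @ G.T with G of size N x M for a small M.

    # Pivoted (incomplete) Cholesky: low-rank approximation K ~ G @ G.T.
    import numpy as np

    def incomplete_cholesky(K, tol=1e-6, max_rank=None):
        N = K.shape[0]
        if max_rank is None:
            max_rank = N
        G = np.zeros((N, max_rank))
        diag = K.diagonal().astype(float).copy()   # residual diagonal
        for m in range(max_rank):
            j = int(np.argmax(diag))               # pivot on largest residual
            if diag[j] <= tol:                     # remaining error is tiny
                return G[:, :m]
            G[:, m] = (K[:, j] - G @ G[j]) / np.sqrt(diag[j])
            diag -= G[:, m] ** 2
            diag[diag < 0.0] = 0.0                 # guard against round-off
        return G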

KICA – Kernel ICA
The next crucial job is to find the W that minimizes C(W); that W is the de-mixing matrix. Preferably the data is first whitened (with PCA) and W is restricted to be orthogonal: independent components are necessarily uncorrelated, so after whitening the de-mixing matrix can be taken to be orthogonal. The search for W in this restricted space (the Stiefel manifold) can be carried out with a Riemannian metric, suggesting gradient-type algorithms.
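As a toy illustration of this restricted search (not the paper's gradient algorithm), one can whiten the data with PCA and, in two dimensions, parametrize every candidate orthogonal W by a single rotation angle; a coarse search over that angle then minimizes the contrast. This reuses kica_contrast from the earlier sketch and works best with a few hundred samples, since the Gram matrices are N x N.

    # Whitening plus a 1-parameter search over 2-D rotations.
    import numpy as np

    def whiten(y):
        """PCA whitening: returns z with (approximately) identity covariance."""
        y = y - y.mean(axis=1, keepdims=True)
        cov = y @ y.T / y.shape[1]
        vals, vecs = np.linalg.eigh(cov)
        return (vecs / np.sqrt(vals)).T @ y

    def rotation(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    def demix_2d(y, n_angles=60):
        z = whiten(y)
        thetas = np.linspace(0.0, np.pi / 2, n_angles)
        costs = [kica_contrast(rotation(t), z) for t in thetas]
        W = rotation(thetas[int(np.argmin(costs))])
        return W, W @ z    # de-mixing rotation and recovered components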

KICA – Kernel ICA
The problem of local minima can be addressed by using heuristics (instead of random choices) for selecting the initial W. It has also been shown empirically that a modest number of restarts solves this problem when a large number of samples is available.

KGV – Kernel Generalized Variance
The F-correlation corresponds to the 'smallest' generalized eigenvalue of the kernel CCA problem. The idea of KGV is to make use of the other eigenvalues as well. The mutual-information contrast function is defined as
J_KGV = -1/2 log( det Kκ / det Dκ ),
where Kκ is the matrix built from the regularized Gram-matrix blocks and Dκ is its block-diagonal part.
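Below is a hedged sketch of that contrast for two components, reusing gaussian_gram from the KICA sketch; the placement and value of the regularization constant are illustrative, so the numbers will differ from the paper's.

    # KGV-style contrast: -1/2 log( det(K_kappa) / det(D_kappa) ).
    import numpy as np

    def kgv_contrast(u1, u2, kappa=1e-2):
        N = len(u1)
        K1, K2 = gaussian_gram(u1), gaussian_gram(u2)
        R1 = K1 + kappa * N * np.eye(N)
        R2 = K2 + kappa * N * np.eye(N)
        Kk = np.block([[R1 @ R1, K1 @ K2],              # full matrix of blocks
                       [K2 @ K1, R2 @ R2]])
        Dk = np.block([[R1 @ R1, np.zeros((N, N))],     # block-diagonal part
                       [np.zeros((N, N)), R2 @ R2]])
        _, logdet_K = np.linalg.slogdet(Kk)
        _, logdet_D = np.linalg.slogdet(Dk)
        return -0.5 * (logdet_K - logdet_D)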

Simulation Results
The results on simulated data showed that KICA outperforms other ICA algorithms such as FastICA, JADE, and Infomax, especially for larger numbers of components. The simulated data were mixtures of a variety of source distributions: sub-Gaussian, super-Gaussian, and nearly Gaussian. KICA is also robust to outliers.

Simulation Results

Conclusions
This paper proposed novel kernel-based measures of independence. The approach is flexible but computationally demanding (because of the additional eigenvalue computations involved).

Questions!!