Independent Component Analysis: An Introduction
Zhen Wei, Li Jin, Yuxue Jin
Department of Statistics, Stanford University

Outline
- Introduction: history, motivation and problem formulation
- Algorithms: stochastic gradient algorithm, FastICA, ordering algorithm
- Applications
- Concluding remarks

Introduction
Independent Component Analysis (ICA) has been widely discussed in signal processing, neural computation and finance. It was first introduced as a novel tool to separate blind sources from a mixed signal. The basic idea of ICA is to reconstruct, from the observed sequences, the hypothesized independent original sequences.

ICA versus PCA
Similarities:
- Feature extraction
- Dimension reduction
Differences:
- PCA uses only up to second-order moments of the data to produce uncorrelated components
- ICA strives to generate components that are as statistically independent as possible

Motivation: Blind Source Separation
Suppose that there are k unknown independent sources s(t) = (s_1(t), ..., s_k(t))'. A data vector x(t) is observed at each time point t, such that
x(t) = A s(t),
where A is a full-rank scalar mixing matrix.

[Diagram: blind source separation. The independent components (blind sources) pass through the mixing process A to produce the observed sequences; the de-mixing process W maps the observed sequences back to the recovered independent components.]
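To make the mixing model concrete, here is a minimal sketch in Python/NumPy; the two toy sources (a sinusoid and uniform noise) and the 2 x 2 matrix A are made up for illustration and are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 1000)

# Two hypothetical independent sources: a sinusoid and uniform noise.
s = np.vstack([np.sin(3 * t), rng.uniform(-1, 1, size=t.size)])

# An arbitrary full-rank 2 x 2 mixing matrix A.
A = np.array([[1.0, 0.6],
              [0.5, 1.2]])

# Observed sequences x(t) = A s(t); each row of x is one observed channel.
x = A @ s

# ICA seeks a de-mixing matrix W such that u = W x recovers the rows of s,
# up to permutation and scaling.
```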

Problem Formulation
The goal of ICA is to find a linear mapping W such that the unmixed sequences u(t) = W x(t) are maximally statistically independent.
Find some W such that W A = P C, where C is a diagonal matrix and P is a permutation matrix; that is, the sources are recovered only up to scaling and permutation.

Principle of ICA: Nongaussianity
The fundamental restriction in ICA is that the independent components must be nongaussian for ICA to be possible. The joint distribution of gaussian independent components is invariant under orthogonal transformations, which makes the matrix A unidentifiable when the independent components are gaussian.

Measures of nongaussianity (1): Kurtosis
kurt(y) = E[y^4] - 3 (E[y^2])^2, which is zero for a gaussian variable.
Kurtosis can be very sensitive to outliers when its value has to be estimated from a measured sample.
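As a quick illustration, the sketch below estimates kurtosis from synthetic samples; the three distributions are chosen only to show positive, near-zero and negative values.

```python
import numpy as np

def kurt(y):
    """Sample kurtosis: kurt(y) = E[y^4] - 3 (E[y^2])^2 for centered y."""
    y = y - y.mean()
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

rng = np.random.default_rng(0)
print(kurt(rng.normal(size=100_000)))       # gaussian: close to 0
print(kurt(rng.laplace(size=100_000)))      # super-gaussian: positive
print(kurt(rng.uniform(-1, 1, 100_000)))    # sub-gaussian: negative
```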

Measures of nongaussianity (2): Negentropy
A gaussian variable has the largest entropy among all random variables of equal variance.
Definition: J(y) = H(y_gauss) - H(y), where H denotes the (differential) entropy and y_gauss is a gaussian random variable with the same covariance matrix as y.

Measures of nongaussianity (3): Mutual information
Definition: I(y_1, ..., y_m) = sum_i H(y_i) - H(y).
Mutual information is a natural measure of the dependence between random variables. It is always non-negative, and zero if and only if the variables are statistically independent.

Relation between negentropy and mutual information
If we constrain the y_i to be uncorrelated and of unit variance, then
I(y_1, ..., y_m) = C - sum_i J(y_i),
where C is a constant that does not depend on W. This shows that finding an invertible transformation W that minimizes the mutual information is equivalent to finding directions in which the negentropy is maximized.

Algorithms
- Maximum likelihood
- Bell and Sejnowski (1995): maximum entropy, minimum mutual information
- Low-Complexity Coding and Decoding (LOCOCODE), Hochreiter et al. (1998)
- Neuro-mimetic approach

Maximum Likelihood
The log-likelihood is
log L = sum_t sum_i log f_i(w_i' x(t)) + T log |det W|,
where the f_i are the density functions of the s_i.
Connection to mutual information: if the f_i were equal to the true distributions of the w_i' x, maximizing the likelihood would be equivalent, up to an additive constant, to minimizing the mutual information of the components.

Stochastic Gradient Algorithm
- Initialize the weight matrix W.
- Iteration: W <- W + eta (I - g(u) u') W with u = W x, where eta is the learning rate and g is a nonlinear function, e.g. g(u) = tanh(u).
- Repeat until W converges.
- The independent components are the components of u = W x.
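A minimal sketch of this update in Python/NumPy, using the natural-gradient form with g = tanh; the batch averaging, learning rate and iteration count are illustrative choices rather than the slides' exact settings.

```python
import numpy as np

def gradient_ica(x, eta=0.01, n_iter=2000, seed=0):
    """x: (n_components, n_samples) array of centered observations.
    Returns the recovered components u = W x and the de-mixing matrix W."""
    rng = np.random.default_rng(seed)
    n, T = x.shape
    W = rng.normal(size=(n, n))
    for _ in range(n_iter):
        u = W @ x
        g = np.tanh(u)                              # nonlinearity g(u)
        # Natural-gradient update: W <- W + eta * (I - E[g(u) u']) W
        W += eta * (np.eye(n) - (g @ u.T) / T) @ W
    return W @ x, W
```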

FastICA: Preprocessing
- Centering: subtract the mean so that the x's are zero-mean variables.
- Whitening: transform the observed vector x linearly so that its components are uncorrelated with unit variance, e.g. x_white = E D^(-1/2) E' x, where E D E' is the eigendecomposition of Cov(x). One can show that the mixing matrix of the whitened data, A_white = E D^(-1/2) E' A, is orthogonal.
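A sketch of the centering and whitening step via the eigendecomposition of the sample covariance (the function and variable names here are ours):

```python
import numpy as np

def whiten(x):
    """x: (n_features, n_samples). Returns whitened data with identity covariance
    and the whitening matrix V = E D^(-1/2) E'."""
    x = x - x.mean(axis=1, keepdims=True)       # centering
    cov = (x @ x.T) / x.shape[1]
    d, E = np.linalg.eigh(cov)                  # cov = E diag(d) E'
    V = E @ np.diag(d ** -0.5) @ E.T            # whitening matrix
    return V @ x, V
```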

FastICA algorithm
- Initialize the weight matrix W.
- Iteration: for each row w of W, update w <- E[x g(w' x)] - E[g'(w' x)] w, where g is a nonquadratic nonlinearity, e.g. g(u) = tanh(u); then orthonormalize the rows of W.
- Repeat until convergence.
- The independent components are the components of u = W x.
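The following sketch implements the symmetric variant of this fixed-point iteration on whitened data, with g = tanh and symmetric decorrelation; it is a simplified illustration, and in practice a library implementation such as sklearn.decomposition.FastICA would normally be used.

```python
import numpy as np

def sym_decorrelate(W):
    # W <- (W W')^(-1/2) W keeps the rows of W orthonormal.
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fastica(x, n_iter=200, tol=1e-6, seed=0):
    """x: whitened (n_components, n_samples) data. Returns u = W x and W."""
    rng = np.random.default_rng(seed)
    n, T = x.shape
    W = sym_decorrelate(rng.normal(size=(n, n)))
    for _ in range(n_iter):
        u = W @ x
        g, g_prime = np.tanh(u), 1 - np.tanh(u) ** 2
        # Fixed-point update for every row: w <- E[x g(w'x)] - E[g'(w'x)] w
        W_new = sym_decorrelate((g @ x.T) / T - np.diag(g_prime.mean(axis=1)) @ W)
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol:
            W = W_new
            break
        W = W_new
    return W @ x, W
```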

Ordering of the independent components
Unlike PCA, whose components have a well-defined and intuitive ordering given by the eigenvalues of the covariance matrix, ICA does not come with a natural ordering of its components, so this problem deserves further investigation.
We follow a heuristic scheme called testing-and-acceptance (TNA).

Ordering Algorithm

Applications (1): Feature extraction
Recognize the pattern of excess returns of mutual funds in the Chinese financial market.
Data: time series of the excess returns of four mutual funds in the Chinese financial market.

ICA components

ICA reconstruction

Applications (2): Image de-noising
- ICA
- Sparse code shrinkage
The example is taken from Hyvärinen (1999).

Image de-noising (1)
Suppose a noisy image model holds: z = x + n, where x is the noise-free image data and n is uncorrelated noise.
Transform the noisy data as y = W z, where W is an orthogonal matrix that is the best orthogonal approximation of the inverse of the ICA mixing matrix.

Image de-noising (2)
Apply the sparse code shrinkage transformation componentwise, s_hat = g(y), and reconstruct the de-noised image as x_hat = W' g(W z).
The function g(.) is zero close to the origin and linear beyond a cutting value that depends on the parameters of the Laplacian density and of the gaussian noise density.
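A sketch of one shrinkage nonlinearity with exactly this shape, the soft-threshold rule obtained under a Laplacian component density and gaussian noise; the threshold formula and the parameter names (sigma_noise, d) are our assumptions for illustration, not taken from the slides.

```python
import numpy as np

def soft_shrink(u, sigma_noise, d):
    """Zero near the origin, linear beyond the cut.
    sigma_noise: std of the gaussian noise; d: std of the Laplacian component density."""
    threshold = np.sqrt(2) * sigma_noise**2 / d
    return np.sign(u) * np.maximum(np.abs(u) - threshold, 0.0)

def denoise(z, W, sigma_noise, d):
    """De-noise observations z with an orthogonal sparse-coding matrix W."""
    y = W @ z                                    # project onto the sparse components
    return W.T @ soft_shrink(y, sigma_noise, d)  # shrink and map back (W' inverts orthogonal W)
```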

[Figure: 1. original image; 2. image corrupted with noise; 3. recovery by ICA and sparse code shrinkage; 4. recovery by classical Wiener filtering.]

Concluding Remarks
- ICA is a flexible and widely applicable tool that searches for a linear transformation of the observed data into statistically maximally independent components.
- The methods used to compute ICA (maximum negentropy, minimum mutual information, maximum likelihood) are equivalent to each other, at least in the statistical sense.
- There is also a resemblance between the forms of the gradient-descent (Newton-Raphson) algorithm and the FastICA algorithm.
- Other application prospects: audio and signal processing, image processing, telecommunications, finance, education.

References
[1] Amari, S., Cichocki, A., and Yang, H. (1996). A new learning algorithm for blind signal separation. Advances in Neural Information Processing Systems 8.
[2] Bell, A. J. and Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7.
[3] Cardoso, J.-F. and Souloumiac, A. (1993). Blind beamforming for non-Gaussian signals. IEE Proceedings-F, 140(6).
[4] Chatfield, C. (1989). Analysis of Time Series: An Introduction, Fourth Edition. London: Chapman and Hall.

References (continued)
[5] Moulines, E., Cardoso, J.-F., and Gassiat, E. (1997). Maximum likelihood for blind separation and deconvolution of noisy signals using mixture models. Proc. ICASSP '97, volume 5, Munich.
[6] Nadal, J.-P. and Parga, N. (1997). Redundancy reduction and independent component analysis: Conditions on cumulants and adaptive approaches. Neural Computation, 9.
[7] Xu, L., Cheung, C., Yang, H., and Amari, S. (1997). Maximum equalization by entropy maximization and mixture of cumulative distribution functions. Proc. of ICNN '97, Houston.
[8] Yang, H., Amari, S., and Cichocki, A. (1997). Information back-propagation for blind separation of sources from non-linear mixtures. Proc. of ICNN '97, Houston.