Independent Components Analysis with the JADE algorithm

Independent Components Analysis with the JADE algorithm
Douglas N. Rutledge, Delphine Jouan-Rimbaud Bouveresse
douglas.rutledge@agroparistech.fr, delphine.bouveresse@agroparistech.fr

Multivariate Data Analysis: the variables are the column vectors of the data matrix; the samples (objects, individuals) are its row vectors.

A matrix as points in space: each row of the data matrix, e.g.
[ 1.00  -1.00  -1.00 ]
[ -0.50  1.00  -1.00 ]
[ 1.00   1.00  -1.00 ]
[ 2.00  -0.50   1.00 ]
is plotted as a point in the space defined by the variables.

Principal Components Analysis (Pearson, 1901)

Principal Components Analysis: samples projected onto the space defined by the variables. If there is NO information → spherical distribution → NO preferred axes.

Principal Components Analysis: samples projected onto the space defined by the variables. If there IS information → non-spherical distribution → preferred axes → PCA.

“Blind Source Separation”: here the observed sensor signals are the row vectors of the data matrix.

Independent Components Analysis
Principal Components Analysis (PCA): the data matrix is seen as points in the multivariate space.
Independent Components Analysis (ICA): the data matrix is seen as a set of observed signals, where each observed sensor signal is a weighted sum of pure source signals; the weighting coefficients are the proportions of the source signals:
x1 = a11*s1 + a12*s2
x2 = a21*s1 + a22*s2
…
xn = an1*s1 + an2*s2
In matrix notation: X = A*S
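As a concrete illustration of this mixing model, here is a minimal numpy sketch (the signal shapes and mixing coefficients are invented for the example):

```python
import numpy as np

# Two hypothetical pure source signals: a sine wave and a sawtooth
t = np.linspace(0, 1, 1000)
s1 = np.sin(2 * np.pi * 5 * t)
s2 = 2 * ((3 * t) % 1) - 1
S = np.vstack([s1, s2])          # pure sources, one per row: shape (2, 1000)

# Mixing matrix A: row i holds the proportions a_i1, a_i2 of the sources
# in observed sensor signal x_i
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

X = A @ S                        # observed sensor signals: X = A*S
```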

Example: two independent sources and two mixtures. aij is the proportion of source signal sj in observed signal xi.

The objective of ICA is to find "physically significant" vectors. Hypotheses:
1) There is no reason for the variations in one pure signal to depend in any way on the variations in another pure signal; the pure signal sources should therefore be "independent".
2) The measured signals, being combinations of several independent sources, should be more Gaussian than the sources (Central Limit Theorem).

“Non-Gaussian is independent”: Central Limit Theorem. The measured signals, being combinations of several independent sources, should be more Gaussian than the sources. The idea of ICA is therefore to search for sources that are as non-Gaussian as possible. This requires a criterion to quantify the "gaussianity" of a signal: kurtosis, negentropy, mutual information, complexity, …

ICA principle (non-Gaussian is independent). The key to estimating A is non-gaussianity: the distribution of a sum of independent random variables tends toward a Gaussian distribution (by the CLT). Consider y = w^T x, where w is one of the rows of the demixing matrix W. Since x = A*s, y = w^T A s = z^T s (with z = A^T w) is a linear combination of the si, with weights given by the zi. A sum of independent random variables is more Gaussian than the individual variables, so z^T s is more Gaussian than any single si, and becomes least Gaussian when it is equal to one of the si. We therefore take w as a vector which maximizes the non-gaussianity of w^T x.

Measures of Non-Gaussianity: quantitative measures of non-gaussianity for ICA estimation:
- Kurtosis: 0 for a Gaussian (but sensitive to outliers)
- Entropy: largest for a Gaussian
- Negentropy: 0 for a Gaussian (but difficult to estimate)
In practice, negentropy is replaced by approximations of the form J(y) ≈ [E{G(y)} − E{G(v)}]^2, where v is a standard Gaussian random variable and G is a non-quadratic function such as G(u) = log cosh(u).
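A small numpy sketch of two of these criteria (the helper names are illustrative, not taken from any particular ICA package; the negentropy approximation uses the common log-cosh contrast):

```python
import numpy as np

def kurtosis(y):
    """Excess kurtosis of a signal: zero for a Gaussian, but sensitive to outliers."""
    y = (y - y.mean()) / y.std()
    return np.mean(y**4) - 3.0

def negentropy_approx(y, n_gauss=100_000, seed=0):
    """Approximate negentropy J(y) ~ (E{G(y)} - E{G(v)})^2 with G(u) = log cosh(u),
    where v is a standard Gaussian random variable (result is ~0 for a Gaussian signal)."""
    y = (y - y.mean()) / y.std()
    v = np.random.default_rng(seed).standard_normal(n_gauss)
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(v))) ** 2
```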

ICA: decomposition of a set of vectors into linear components that are “as independent as possible”. “Independence” goes beyond (second-order) decorrelation; it involves the non-Gaussianity of the data.

“Independent” versus “non-correlated”

ICA attempts to recover the original signals by estimating a linear transformation, using a criterion that measures statistical independence among the sources. This may be achieved by using higher-order information that can be extracted from the densities of the data. PCA does not really look for (and usually does not find) components with physical reality. Why not? PCA finds the “direction of greatest dispersion of the samples”, and this is the wrong direction for “signal separation”!

ICA demo: mixtures → whitened data → iteration steps 1 to 5 (end).

A simulated example: pure source signals and their histograms; observed sensor signals (mixtures + Gaussian noise) and their histograms.

Initial data matrix: matrix rows and their histograms.

PCA applied to the example: loadings and their histograms.

PCA applied to the example

ICA applied to the example: extracted source signals and their histograms.

ICA applied to the example

Procedure: ICA calculates a demixing matrix W. W approximates A^-1, the inverse mixing matrix. The pure component signals are recovered from the measured mixed signals by: S = W*X. The trick is to calculate W when we only have X!

Calculating the demixing matrix. Many algorithms are available:
- FastICA: numerical gradient search (a usage sketch is given below)
- Projection Pursuit: searches for “interestingly structured” vectors
- Complexity Pursuit: minimises common information
- JADE: based on “cumulants”
- …
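For example, FastICA is readily available in scikit-learn. A short, fully runnable usage sketch (this is not the JADE algorithm described below; the toy mixtures repeat the earlier mixing example):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Toy mixtures: two sources mixed as in the earlier sketch
t = np.linspace(0, 1, 1000)
S = np.vstack([np.sin(2 * np.pi * 5 * t), 2 * ((3 * t) % 1) - 1])
X = np.array([[0.7, 0.3], [0.4, 0.6]]) @ S

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X.T).T   # scikit-learn expects one sample per row
A_est = ica.mixing_                # estimated mixing matrix (up to order and scale)
```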

JADE (Joint Approximate Diagonalization of Eigenmatrices)
JADE was developed by Cardoso et al. in 1993 [1]. It is a blind source separation method to extract independent non-Gaussian sources from signal mixtures with Gaussian noise, based on the construction of a fourth-order cumulant array from the data. It is freely downloadable from http://perso.telecom-paristech.fr/~cardoso/Algo/Jade/jadeR.m
[1] Cardoso and Souloumiac, IEE Proceedings-F 140 (6) (1993), 362-370

Cumulants. For a set of values x:
2nd-order auto-cumulant: σ^2(x) = Cum2{x, x} = E{x^2}
4th-order auto-cumulant: k(x) = Cum4{x, x, x, x} = E{x^4} − 3·(E{x^2})^2

Cumulants can be expressed in tensor form. A tensor can be seen as a generalization of the notion of a matrix, and cumulant tensors are a generalization of covariance matrices. The elements on the main diagonal of a cumulant tensor are the auto-cumulants; the off-diagonal elements are the cross-cumulants. Statistically mutually independent vectors (components) give:
- null cross-cumulants
- maximum auto-cumulants

Cumulants. In PCA, the eigenvalue decomposition of a covariance matrix results in its diagonalization, i.e. the cancellation of its non-diagonal terms. In JADE, a generalization of this methodology was proposed by Cardoso and Souloumiac to diagonalise fourth-order cumulant tensors.

Cumulants. In the simple case of 3 observed signal vectors x1, x2, x3 and 2 pure source signals s1, s2, where:
x1 = a11·s1 + a12·s2
x2 = a21·s1 + a22·s2
x3 = a31·s1 + a32·s2
the 4th-order auto-cumulant of x1 is:
k(x1) = Cum4{x1, x1, x1, x1} = E{x1^4} − 3·(E{x1^2})^2
k(x1) = a11^4·[E{s1^4} − 3·(E{s1^2})^2] + a12^4·[E{s2^4} − 3·(E{s2^2})^2]
k(x1) = a11^4·k(s1) + a12^4·k(s2)
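A quick numerical check of this relation (the sources and coefficients are invented; the auto-cumulant helper matches the definition above):

```python
import numpy as np

def cum4(x):
    """4th-order auto-cumulant: E{x^4} - 3*(E{x^2})^2 for a zero-mean signal."""
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2) ** 2

rng = np.random.default_rng(0)
s1 = rng.uniform(-1, 1, 200_000)         # two independent non-Gaussian sources
s2 = rng.laplace(size=200_000)
a11, a12 = 0.7, 0.3
x1 = a11 * s1 + a12 * s2

print(cum4(x1))                                      # ~ a11^4*k(s1) + a12^4*k(s2)
print(a11**4 * cum4(s1) + a12**4 * cum4(s2))
```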

4th-order Cumulants. For the 3 observed signal vectors x1, x2, x3, create a four-way array with dimensions [3, 3, 3, 3]. Each of the 81 elements is computed from means of products of the vectors. For vi, vj, vk, vl ∈ {x1, x2, x3}:
Cum4{vi, vj, vk, vl} = E{vi·vj·vk·vl} − E{vi·vj}·E{vk·vl} − E{vi·vk}·E{vj·vl} − E{vi·vl}·E{vj·vk}
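A direct (unoptimised) numpy sketch of this four-way cumulant array, assuming the signals are stored as the rows of a matrix X and have already been centred:

```python
import numpy as np
from itertools import product

def cumulant_tensor(X):
    """Fourth-order cumulant tensor of the zero-mean signals stored as rows of X.
    For n signals the result has shape (n, n, n, n), i.e. 81 elements when n = 3."""
    n, T = X.shape
    C2 = X @ X.T / T                          # second-order moments E{vi*vj}
    C4 = np.empty((n, n, n, n))
    for i, j, k, l in product(range(n), repeat=4):
        C4[i, j, k, l] = (np.mean(X[i] * X[j] * X[k] * X[l])
                          - C2[i, j] * C2[k, l]
                          - C2[i, k] * C2[j, l]
                          - C2[i, l] * C2[j, k])
    return C4
```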

3 Auto-Cumulants (3 different values). For vi, vj, vk, vl = xp (p = 1:3):
Cum4{p, p, p, p} = E{xp^4} − E{xp^2}·E{xp^2} − E{xp^2}·E{xp^2} − E{xp^2}·E{xp^2}
Cum4{p, p, p, p} = mean(xp^4) − 3·mean(xp^2)^2
Since the signals have been whitened, mean(xp^2) = 1, so:
Cum4{p, p, p, p} = mean(xp^4) − 3

18 Even Cross-Cumulants (3 different values). For vi, vj = xp and vk, vl = xq (p = 1:3, q = 1:3, p ≠ q):
Cum4{p, p, q, q} = E{xp^2·xq^2} − E{xp^2}·E{xq^2} − E{xp·xq}·E{xp·xq} − E{xp·xq}·E{xp·xq}
Cum4{p, p, q, q} = mean(xp^2·xq^2) − mean(xp^2)·mean(xq^2) − 2·mean(xp·xq)^2
Since the whitened signals have unit variance and are mutually uncorrelated, mean(xp^2) = mean(xq^2) = 1 and mean(xp·xq) = 0, so:
Cum4{p, p, q, q} = mean(xp^2·xq^2) − 1

60 Odd Cross-Cumulants (9 different values). For vi = xp and vj, vk, vl = xq (p = 1:3, q = 1:3, p ≠ q):
Cum4{p, q, q, q} = E{xp·xq^3} − E{xp·xq}·E{xq^2} − E{xp·xq}·E{xq^2} − E{xp·xq}·E{xq^2}
Cum4{p, q, q, q} = mean(xp·xq^3) − 3·mean(xp·xq)·mean(xq^2)
Since the whitened signals are mutually uncorrelated, mean(xp·xq) = 0, so:
Cum4{p, q, q, q} = mean(xp·xq^3)

Whitening step (illustration): the initial X matrix is decomposed into the scores of its first PCs, and the scaled scores give a smaller, whitened X matrix.

4-way tensor of cumulants

Initial orthogonal matrices to project cumulants

Eigenmatrices of cumulants

JADE: joint diagonalization using the Jacobi algorithm, based on Givens rotation matrices. The aim is to minimize the sum of squares of the “off-diagonal” 4th-order cumulants (i.e. the cumulants between different signals).
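A compact sketch of such a Jacobi-style joint diagonalisation with Givens rotations (a simplified reading of the Cardoso-Souloumiac update, not the full jadeR code):

```python
import numpy as np

def joint_diagonalize(matrices, eps=1e-8, max_sweeps=100):
    """Approximately diagonalise a list of symmetric matrices with a single
    orthogonal matrix V, by sweeping over (p, q) pairs and applying the
    plane (Givens) rotation that reduces their joint off-diagonal energy."""
    M = [m.astype(float).copy() for m in matrices]
    n = M[0].shape[0]
    V = np.eye(n)
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Closed-form optimal rotation angle for this pair,
                # accumulated over all matrices (Cardoso & Souloumiac)
                g = np.array([[m[p, p] - m[q, q], m[p, q] + m[q, p]] for m in M])
                gg = g.T @ g
                ton, toff = gg[0, 0] - gg[1, 1], gg[0, 1] + gg[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) > eps:
                    rotated = True
                    G = np.eye(n)
                    G[p, p], G[p, q], G[q, p], G[q, q] = c, -s, s, c
                    M = [G.T @ m @ G for m in M]
                    V = V @ G
        if not rotated:
            break
    return V, M   # V.T @ matrices[k] @ V is approximately diagonal for every k
```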

Diagonalised matrices

The JADE algorithm: a multi-step procedure

Step 1: Whitening (sphering) of the original data matrix X
- Reduce the data dimensionality
- Orthogonalize the data
- Normalise the data on all dimensions
If X is very wide or tall, it can be replaced by the loadings obtained from Kernel-PCA, or by using (segmented-) PCT. (A minimal whitening sketch is given below.)
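A minimal sketch of this whitening step via an SVD (the function name and shapes are illustrative: X holds one signal per row):

```python
import numpy as np

def whiten(X, n_components):
    """Centre X (one signal per row), keep the first n_components directions,
    and rescale them so that the whitened data have unit variance in every
    retained dimension (Z @ Z.T / n_samples ~ identity)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_white = np.sqrt(Xc.shape[1]) * (U[:, :n_components] / s[:n_components]).T
    Z = W_white @ Xc
    return Z, W_white
```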

Step 2: Cumulants computation
Cumulants: a generalization of moments such as the mean (1st-order auto-cumulant) and the variance (2nd-order auto-cumulant) to statistics of order > 2.
Independent signal vectors have null 4th-order cross-cumulants and maximal auto-cumulants.
The aim is to find rotations of Pw so that the loadings vectors become independent.

Step 3: Decomposition of the cumulant tensor
A generalization to higher-order statistics of the eigenvalue decomposition of the variance-covariance matrix:
- Decompose the 4th-order tensor into a set of n(n+1)/2 orthogonal eigenmatrices Mi
- Project the cumulant tensor onto these planes

Step 4: Joint diagonalization of the eigenmatrices (based on the Jacobi algorithm)

Step 5: Calculate the vector of proportions

Summary of JADE (flow diagram): an SVD of X gives U, S and V; scaling by S^-1 gives a whitening matrix B; the cumulants of the whitened data (B x X) are calculated; the cumulant tensor is decomposed into eigenmatrices M; joint diagonalisation (rotation) gives the diagonalised eigenmatrices M* and the rotation matrix R.

JADE (flow diagram, continued): the rotation R is applied to the whitened scores B to give the demixing matrix W; the source signals are calculated as S = W x X; the proportions are calculated as A = X x S^-1.
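Under the same illustrative conventions as the earlier sketches (R from the joint diagonalisation, W_white from the whitening step, X with one observed signal per row), these last steps might be sketched as:

```python
import numpy as np

def recover(X, W_white, R):
    """Combine rotation and whitening into a demixing matrix, then recover
    the source signals and the matrix of proportions."""
    W = R.T @ W_white                   # demixing matrix; the transpose convention
                                        # depends on how the rotation was accumulated
    S_hat = W @ X                       # recovered source signals: S = W*X
    A_hat = X @ np.linalg.pinv(S_hat)   # proportions: A such that X ~ A*S
    return W, S_hat, A_hat
```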

Advantages of ICA:
- The extracted vectors have a physico-chemical meaning
- ICA can be used to eliminate artefacts
- ICA can be applied to all kinds of signals, including multi-way signals: unfold the signals, calculate the ICA model on the matrix of one-dimensional signals, then fold the ICs back (see the sketch below)
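A small sketch of the unfold / fold-back idea for a hypothetical three-way array (the dimensions are arbitrary; the ICA step itself is only indicated by a placeholder):

```python
import numpy as np

# Hypothetical three-way data: 20 samples, each a 50 x 30 signal landscape
X3 = np.random.default_rng(0).random((20, 50, 30))

# 1) Unfold: each landscape becomes a one-dimensional signal of length 1500
X_unfolded = X3.reshape(20, 50 * 30)

# 2) Calculate the ICA model on the matrix of one-dimensional signals
#    (placeholder: the first 3 rows stand in for 3 extracted ICs)
ICs = X_unfolded[:3]

# 3) Fold the ICs back into 50 x 30 landscapes for interpretation
ICs_folded = ICs.reshape(3, 50, 30)
```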

Difficulties with ICA:
- Different results may be obtained using different algorithms
- Different results may be produced depending on the number of Independent Components (ICs) extracted
- Determining the number of ICs to extract is itself difficult

Links 1
- Feature extraction (Images, Video), Aapo Hyvarinen: ICA (1999) http://hlab.phys.rug.nl/demos/ica/
- Aapo Hyvarinen: ICA (1999) http://www.cis.hut.fi/aapo/papers/NCS99web/node11.html
- ICA demo step-by-step http://www.cis.hut.fi/projects/ica/icademo/
- Lots of links http://sound.media.mit.edu/~paris/ica.html

Links 2
- Object-based audio capture demos http://www.media.mit.edu/~westner/sepdemo.html
- Demo for BSS with “CoBliSS” (wav-files) http://www.esp.ele.tue.nl/onderzoek/daniels/BSS.html
- Tomas Zeman's page on BSS research http://ica.fun-thom.misto.cz/page3.html

Links 3
- An efficient batch algorithm: JADE http://www-sig.enst.fr/~cardoso/guidesepsou.html
- Dr JV Stone: ICA and Temporal Predictability http://www.shef.ac.uk/~pc1jvs/

Links 4
- Information for scientists, engineers and industrialists http://www.cnl.salk.edu/~tewon/ica_cnl.html
- FastICA package for Matlab http://www.cis.hut.fi/projects/ica/fastica/fp.shtml
- Aapo Hyvärinen http://www.cis.hut.fi/~aapo/
- Erkki Oja http://www.cis.hut.fi/~oja/

Conclusion. PCA does not look for (and usually does not find) components with direct physical meaning. ICA tries to recover the original signals by estimating a linear transformation, using a criterion that measures statistical independence among the sources. This is done using higher-order information that can be extracted from the densities of the data. ICA can be applied to all types of data, including multi-way data.
Reference: D.N. Rutledge, D. Jouan-Rimbaud Bouveresse, Independent Components Analysis with the JADE algorithm, Trends in Analytical Chemistry 50 (2013) 22–32.