Introduction to Independent Component Analysis
Math 285 Project, Fall 2015
Jingmei Lu, Xixi Lu
12/10/2015
Agenda
- The "Cocktail Party" Problem
- ICA model
- Principle of ICA
- FastICA algorithm
- Separating mixed audio signals
- References
The "Cocktail Party" Problem
[Figure: two sources s1, s2 recorded as two microphone observations x1, x2]
Purpose: estimate the two original speech signals s1(t) and s2(t) using only the recorded signals x1(t) and x2(t).
Motivation
[Figure: independent source signals and the mixture signals observed from them]
Motivation
[Figure: the independent sources and the signals recovered by ICA]
What is ICA?
"Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multi-dimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent, and nongaussian." (A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis)
ICA Model
Observe n linear mixtures x_1, ..., x_n of n independent components:
x_j = a_j1 s_1 + a_j2 s_2 + ... + a_jn s_n, for all j
x_j: observed random variable
s_j: independent source variable
ICA model in matrix form: x = As, where a_ij is the (i, j) entry of the mixing matrix A.
Task: estimate A and s using only the observable random vector x (a small simulation sketch follows).
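As a minimal sketch of the model (the Laplace sources and the particular mixing matrix here are illustrative assumptions, not taken from the slides), x = As can be simulated in a few lines of NumPy:

import numpy as np

rng = np.random.default_rng(0)

n, T = 2, 10_000                      # 2 sources, 10,000 samples
S = rng.laplace(size=(n, T))          # non-Gaussian independent sources
A = np.array([[1.0, 0.5],             # hypothetical mixing matrix
              [0.3, 1.0]])
X = A @ S                             # observed mixtures, x = As

ICA then tries to recover S (up to order and scale) from X alone, without knowing A.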
ICA Model: Two assumptions
1. The components s_i are statistically independent.
2. The independent components must have non-Gaussian distributions (at most one component may be Gaussian).
Why non-Gaussian?
Assume: 1) s_1 and s_2 are Gaussian, and 2) the mixing matrix A is orthogonal. Then x_1 and x_2 are Gaussian, uncorrelated, and of unit variance. Their joint density is
p(x_1, x_2) = (1/2pi) exp(-(x_1^2 + x_2^2)/2)
Why non-Gaussian?
Since the density is rotationally symmetric, it does not contain any information on the directions of the columns of the mixing matrix A, so A cannot be estimated (the demo below makes this concrete).
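A quick numerical illustration of this ambiguity (an illustrative sketch, not from the slides): mixing Gaussian sources with any orthogonal matrix leaves the joint distribution unchanged, so no particular mixing matrix is preferred.

import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((2, 100_000))    # two independent Gaussian sources

def rotation(theta):                     # orthogonal 2x2 mixing matrix
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# two very different orthogonal mixing matrices give mixtures with the
# same covariance, hence (for Gaussians) the same joint density:
print(np.cov(rotation(0.3) @ S))   # ~ identity
print(np.cov(rotation(1.2) @ S))   # ~ identity

Since a zero-mean Gaussian is fully determined by its covariance, the two mixtures are statistically indistinguishable.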
Why non-Gaussian?
Assume s_1 and s_2 follow a uniform distribution with zero mean and unit variance, and mix them with some matrix A: x = As. The edges of the resulting parallelogram (the support of the joint density) are in the directions of the columns of A, so here A can be recovered from the shape of the data, as the sketch below illustrates.
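A sketch of this picture (the specific mixing matrix is a hypothetical choice for illustration; the slides do not give its entries):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# zero-mean, unit-variance uniform sources: U(-sqrt(3), sqrt(3))
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5_000))
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])          # hypothetical mixing matrix
X = A @ S                           # x = As

plt.scatter(X[0], X[1], s=1)
plt.title("Support is a parallelogram with edges along the columns of A")
plt.show()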
Principle of ICA
Consider y = w^T x = w^T As = z^T s, where z = A^T w. Then y is a linear combination of the s_i, with weights given by the z_i.
Central Limit Theorem: the distribution of a sum of independent random variables tends toward a Gaussian distribution, under certain conditions.
Hence z^T s is more Gaussian than any single s_i, and becomes least Gaussian when it equals one of the s_i. So we take w to be a vector that maximizes the non-Gaussianity of w^T x, as the short demo below illustrates.
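One way to see the CLT effect numerically (an illustrative sketch, not from the slides; excess kurtosis is 0 for a Gaussian and -1.2 for a uniform variable):

import numpy as np
from scipy.stats import kurtosis    # excess kurtosis; 0 for a Gaussian

rng = np.random.default_rng(3)
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100_000)
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100_000)
y = (s1 + s2) / np.sqrt(2)          # a unit-variance mixture of the two

print(kurtosis(s1))                 # ~ -1.2: strongly non-Gaussian
print(kurtosis(y))                  # ~ -0.6: the mixture is closer to Gaussian

The mixture's kurtosis moves toward the Gaussian value of 0, so maximizing non-Gaussianity of w^T x pulls the projection back toward a single source.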
Measures of Non-Gaussianity
Entropy H(y) = -int p(y) log p(y) dy: the degree of information that an observation gives.
A Gaussian variable has the largest entropy among all random variables of equal variance.
Negentropy: J(y) = H(y_gauss) - H(y), where y_gauss is a Gaussian variable with the same variance as y; J(y) >= 0, with equality iff y is Gaussian.
Negentropy is computationally difficult to evaluate directly, so approximations are used.
Negentropy approximations
The FastICA algorithm uses the approximation
J(y) ~ (E{G(y)} - E{G(v)})^2
where G is some non-quadratic function and v is a Gaussian variable of zero mean and unit variance. Maximizing J(y) maximizes non-Gaussianity. A sketch of this estimate follows.
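A minimal sketch of the estimate, assuming the common contrast G(u) = (1/a) log cosh(au) from Hyvärinen's FastICA papers, with E{G(v)} estimated by Monte Carlo for simplicity; y is assumed already centered with unit variance:

import numpy as np

def negentropy_approx(y, a=1.0, seed=0):
    # J(y) ~ (E{G(y)} - E{G(v)})^2 with G(u) = (1/a) log cosh(a u)
    G = lambda u: np.log(np.cosh(a * u)) / a
    v = np.random.default_rng(seed).standard_normal(1_000_000)  # N(0,1) reference
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

For a Gaussian y this is approximately 0; larger values indicate stronger non-Gaussianity.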
FastICA
Data preprocessing:
- Centering
- Whitening
FastICA algorithm:
- Maximize non-Gaussianity
Data Preprocessing
Centering: subtract the mean, x <- x - E{x}, so the data has zero mean.
Whitening: linearly transform the centered data so its components are uncorrelated with unit variance, E{xx^T} = I. Using the eigendecomposition E{xx^T} = EDE^T, the whitened data is z = ED^(-1/2)E^T x. A sketch is given below.
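A sketch of both steps in NumPy (eigendecomposition-based whitening, as described above):

import numpy as np

def preprocess(X):
    # X: array of shape (n_signals, n_samples)
    Xc = X - X.mean(axis=1, keepdims=True)       # centering: x <- x - E{x}
    d, E = np.linalg.eigh(np.cov(Xc))            # E{xx^T} = E D E^T
    V = E @ np.diag(d ** -0.5) @ E.T             # whitening matrix E D^(-1/2) E^T
    Z = V @ Xc                                   # whitened: E{zz^T} = I
    return Z, V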
FastICA Algorithm (one unit)
1. Choose an initial weight vector w.
2. Let w+ = E{x g(w^T x)} - E{g'(w^T x)} w, where g is the derivative of the contrast function G.
3. w = w+ / ||w+||. (Normalization step.)
4. If not converged, go back to step 2. Converged when ||w_new - w_old|| < ξ for some small tolerance ξ (allowing for the sign ambiguity of w).
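A one-unit NumPy sketch of these steps, assuming whitened input and G = log cosh (so g = tanh and g' = 1 - tanh^2); the convergence test accounts for the sign ambiguity of w:

import numpy as np

def fastica_one_unit(Z, tol=1e-6, max_iter=200, seed=0):
    # Z: whitened data, shape (n_signals, n_samples)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])          # step 1: initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ Z                               # projections w^T x
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w   # step 2: fixed point
        w_new /= np.linalg.norm(w_new)           # step 3: normalization
        if min(np.linalg.norm(w_new - w),        # step 4: converged up to sign?
               np.linalg.norm(w_new + w)) < tol:
            return w_new
        w = w_new
    return w

Estimating several components works the same way, with an extra decorrelation step after each iteration so the weight vectors do not all converge to the same source.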
Separating mixed audio signals: a sketch of the full pipeline follows.
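A sketch of the whole pipeline using scikit-learn's FastICA implementation (the file names are placeholders, the recordings are assumed mono with equal sample rates, and the recovered sources come back in arbitrary order and scale):

import numpy as np
from scipy.io import wavfile
from sklearn.decomposition import FastICA

rate, x1 = wavfile.read("mic1.wav")            # placeholder file names
_,    x2 = wavfile.read("mic2.wav")
X = np.c_[x1, x2].astype(float)                # shape (n_samples, n_mixtures)

S_est = FastICA(n_components=2, random_state=0).fit_transform(X)

for i in range(2):                             # normalize and write each source
    s = S_est[:, i] / np.abs(S_est[:, i]).max()
    wavfile.write(f"separated_{i}.wav", rate, (s * 32767).astype(np.int16))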
Mixed signals [figure: waveforms of the recorded mixtures]
Separated signals [figure: waveforms of the sources recovered by FastICA]
Separated signals by PCA [figure: signals recovered by PCA, for comparison]
Other applications
- Separation of artifacts in MEG data
- Finding hidden factors in financial data
- Reducing noise in natural images
- Telecommunications
References
Hyvärinen, A., Karhunen, J., Oja, E. (2001). Independent Component Analysis. Wiley, New York.
Särelä, J. "Cocktail Party Problem." Web.