Independent Component Analysis & Blind Source Separation


1 Independent Component Analysis & Blind Source Separation
Bob Durrant School of Computer Science University of Birmingham (Slides: Dr Ata Kabán)

2 Overview
Today we learn about:
- the cocktail party problem, also called 'blind source separation' (BSS)
- Independent Component Analysis (ICA) for solving BSS
- other applications of ICA / BSS
all at an intuitive, introductory and practical level.

3 A bit like… in the sense of having to find quantities that are not directly observable.

4 Signals, joint density
[Figure: two signal traces S1(t), S2(t) (amplitude vs. time), their marginal densities, and their joint density.]
Analogy with joint and marginal probabilities: in Bayes' theorem P(A|B) = P(B|A)P(A)/P(B), the product P(B|A)P(A) = P(A,B) is known as the joint probability of A and B (i.e. the probability of both). The marginal probability P(B) is the probability of B occurring by any means. Here the marginal densities are the distributions of the probability that each signal takes a certain amplitude value at time t, and the joint density is the probability that both do (which, if the signals are independent, is just the product of the marginals).

5 Original signals (hidden sources) s1(t), s2(t), s3(t), s4(t), t=1:T

6 The ICA model
[Diagram: sources s1…s4 connected to observations x1…x4 through weights a11, a12, a13, a14, …]
xi(t) = ai1·s1(t) + ai2·s2(t) + ai3·s3(t) + ai4·s4(t), for i = 1:4. In vector-matrix notation, and dropping the index t, this is x = A·s. That is, xi(t) is the dot product of row i of the matrix A with the column vector s: the signal received by microphone i is a linear combination of the sources sj, each source having the weighting aij.

7 This is recorded by the microphones: a linear mixture of the sources
The a_ij are parameters ("weightings") that depend on the distance of each microphone from the sound sources. Think of it this way: microphone 1 records a mixture of the different sound sources. How much of each? a11 of source 1, a12 of source 2, a13 of source 3, and a14 of source 4. So the aij are weightings that say how much each source contributes to the signal measured by microphone i, and the total signal measured by each microphone, at each time step t, can then be treated as just the sum of these individual weighted contributions. We call such a combination a linear combination or linear mixture: xi(t) = ai1·s1(t) + ai2·s2(t) + ai3·s3(t) + ai4·s4(t)
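As a concrete illustration of this mixing model, here is a minimal NumPy sketch; the four sources and the mixing matrix are invented purely for the example.

```python
# A minimal sketch of the mixing model x = A s. The four sources and
# the mixing matrix below are invented purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
t = np.linspace(0, 8, T)

# Four hidden sources s1..s4, stacked as the rows of a 4 x T matrix.
s = np.vstack([
    np.sin(2 * t),              # s1: sinusoid
    np.sign(np.sin(3 * t)),     # s2: square wave
    (t % 1.0) - 0.5,            # s3: sawtooth
    rng.uniform(-1, 1, T),      # s4: uniform noise
])

# Mixing matrix: a_ij says how much of source j reaches microphone i.
A = rng.uniform(0.5, 2.0, size=(4, 4))

# What the microphones record: a linear mixture of the sources.
x = A @ s                       # x[i, t] = sum_j A[i, j] * s[j, t]
```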

8 The Cocktail Party Problem
Also known as the Blind Source Separation (BSS) problem: determine the source signals, given only the mixtures. This is an ill-posed problem (i.e. one with infinitely many possible solutions) unless assumptions are made. The most common assumption is that the source signals are statistically independent, meaning that knowing the value of one of them gives no information about the others. Methods based on this assumption are called Independent Component Analysis (ICA) methods: statistical techniques for decomposing a complex data set into independent parts. It can be shown that, under some reasonable conditions, if the ICA assumption holds then the source signals can be recovered up to permutation and scaling.
"Permutation and scaling": permutation means we might get the signals back in a different order; scaling means we might think some of the signals are louder or quieter than they really are (e.g. because we have estimated a weight to be smaller or greater than it really is), or we might have reversed the sign of a signal (e.g. because we have taken a weight to be positive when it should be negative). A small demonstration of the scaling ambiguity follows.
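The scaling (and sign) ambiguity is easy to demonstrate: rescaling a source while dividing the corresponding column of A by the same factor leaves the mixtures unchanged. A tiny sketch, reusing the hypothetical A and s from the mixing example above:

```python
# Scaling/sign ambiguity: compensating changes to A and s leave x unchanged.
# Reuses the hypothetical A (4 x 4) and s (4 x T) from the sketch above.
import numpy as np

D = np.diag([2.0, -1.0, 0.5, 1.0])   # per-source scalings, incl. a sign flip
x_same = (A @ np.linalg.inv(D)) @ (D @ s)
assert np.allclose(x_same, A @ s)    # the observed mixtures are identical
```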

9 Recovered signals

10 Some further considerations
If we knew the mixing parameters aij, then we would just need to solve a linear system of equations; but we know neither aij nor si. ICA was initially developed to deal with problems closely related to the cocktail party problem. Later it became evident that ICA has many other applications too, e.g. recovering the underlying components of brain activity from electrical recordings at different locations on the scalp (EEG signals). A "linear system of equations" is the same as the "simultaneous equations" you did for GCSE, and there are efficient methods for solving such systems exactly.
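For instance (a sketch continuing the hypothetical A, s and x from the mixing example above): with A known, unmixing is a single call to a standard linear solver.

```python
# If the mixing matrix A were known, unmixing would be a routine linear
# solve. Continues the hypothetical A, s, x from the earlier sketch.
import numpy as np

s_recovered = np.linalg.solve(A, x)   # solves A @ s_recovered = x
assert np.allclose(s_recovered, s)    # exact up to floating-point error
```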

11 Illustration of ICA with 2 signals
[Figure: joint distribution of the uniformly distributed sources s1 and s2 (a square), and joint distribution of the observed mixtures x1 and x2 (a parallelogram with edges along the mixing directions a1, a2).]
We assume that both the independent components si and the mixture variables xi are zero mean. NB if this isn't the case, the data can always be centred by subtracting the sample mean of the observed data from each term, which makes the model zero-mean. We treat both the mixtures and the independent components as random variables, so we can interpret each observed value x as a sample from the corresponding random variable X. (Recall that a random variable is a function from the domain of the experiment, the sample space, to the real numbers: one that takes an observation and assigns it an output in R, such that the distribution of outputs is a function of the observed events.)
Also note, cf. slide 8, that (knowing neither A nor s) we cannot get away from the possible permutation and scaling errors, so we might as well make life easy for ourselves by assuming all of the signals have a uniform variance of 1 (i.e. E{si^2} = 1). This still leaves the sign ambiguity: we could multiply the ICs by -1 without affecting the model, but fortunately this ambiguity turns out not to be significant in most applications.
NB in the mixture the xi are not independent, since if e.g. x1 attains its maximum this completely determines the value of x2, which was not the case with the original signals. Also note that for uniformly distributed signals s1, s2 we can recover the mixing matrix by looking at the directions of the edges of the parallelogram (although this wouldn't work for non-uniform signals).
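A small sketch of this setup (the 2x2 mixing matrix is made up for the illustration): two independent, zero-mean, unit-variance uniform sources whose joint scatter fills a square, and their mixtures, which fill a parallelogram and are no longer uncorrelated.

```python
# Two independent uniform sources and a linear mixture of them.
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Zero-mean, unit-variance uniform sources: joint scatter is a square.
s2d = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

A2d = np.array([[2.0, 1.0],
                [1.0, 2.0]])           # made-up 2 x 2 mixing matrix
x2d = A2d @ s2d                        # joint scatter is a parallelogram

# The mixtures are correlated even though the sources were not.
print(np.corrcoef(s2d)[0, 1])          # close to 0
print(np.corrcoef(x2d)[0, 1])          # clearly nonzero
```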

12 Illustration of ICA with 2 signals
[Figure: the parallelogram of mixed signals (x1, x2), with edges along the mixing directions a1, a2. Step 1: Sphering; Step 2: Rotation.]

13 Illustration of ICA with 2 signals
[Figure: the same two steps shown against the original sources: sphering turns the mixed-signal parallelogram into a square, and rotation aligns it with the axes, recovering the joint distribution of the original s. Step 1: Sphering; Step 2: Rotation.]

14 Excluded case
There is one case in which rotation doesn't matter: when both densities are Gaussian. This case cannot be solved by basic ICA. [Figure: example of a non-Gaussian density (solid) vs. a Gaussian (dash-dot).]
We seek non-Gaussian sources for two reasons:
- identifiability
- interestingness: Gaussians are not interesting, since the superposition of independent sources tends to be Gaussian.
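One standard way to quantify non-Gaussianity, used here purely as an illustration (the slide itself does not commit to a particular measure), is excess kurtosis, which is approximately zero for a Gaussian:

```python
# Excess kurtosis as a simple non-Gaussianity measure: ~0 for a
# Gaussian, negative for a uniform, positive for heavy-tailed sources.
import numpy as np

rng = np.random.default_rng(2)

def excess_kurtosis(z):
    z = (z - z.mean()) / z.std()
    return np.mean(z ** 4) - 3.0

print(excess_kurtosis(rng.normal(size=100_000)))   # ~ 0
print(excess_kurtosis(rng.uniform(size=100_000)))  # ~ -1.2
print(excess_kurtosis(rng.laplace(size=100_000)))  # ~ +3
```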

15 Computing the pre-processing steps for ICA
0) Centring: make the signals zero mean, xi ← xi − E[xi] for each i.
1) Sphering: make the signals uncorrelated, i.e. apply a transform V to x such that Cov(Vx) = I, where Cov(y) = E[yy^T] denotes the covariance matrix. Take V = E[xx^T]^(−1/2) (this matrix square root can be computed with the 'sqrtm' function in MatLab), then set x ← Vx for all t (time indexes dropped here; bold lowercase refers to a column vector, bold uppercase to a matrix).
The scope of these steps is to make the remaining computations simpler. Independent variables must be uncorrelated, so this condition can be fulfilled before proceeding to the full ICA. Sphering gives rise to a new mixing matrix, and this new mixing matrix is orthogonal (i.e. the bases are orthogonal), which ensures that only up to n(n−1)/2 parameters now need to be estimated, as opposed to up to n^2 without whitening.
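A minimal sketch of both steps, assuming the signals are held as the rows of an n x T array x as in the earlier sketches; scipy.linalg.sqrtm plays the role of MatLab's 'sqrtm'.

```python
# Centring and sphering (whitening) of an n x T signal array x.
import numpy as np
from scipy.linalg import sqrtm

def centre(x):
    """Step 0: make each signal zero mean."""
    return x - x.mean(axis=1, keepdims=True)

def sphere(x):
    """Step 1: whiten, so that Cov(Vx) = I."""
    cov = x @ x.T / x.shape[1]                # E[x x^T]
    V = np.linalg.inv(np.real(sqrtm(cov)))    # V = E[x x^T]^(-1/2)
    return V @ x, V

z, V = sphere(centre(x))                      # x from the mixing sketch
print(np.round(z @ z.T / z.shape[1], 2))      # ~ identity matrix
```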

16 Computing the rotation step
Aapo Hyvarinen (1997). This step is based on the maximisation of an objective function G(·) which contains an approximate non-Gaussianity measure.
Fixed Point Algorithm
Input: X
Random init of W
Iterate until convergence: [the update equation appeared as an image on the original slide]
Output: W, S
Here g(·) is the derivative of G(·), W is the rotation transform sought, and Λ is a Lagrange multiplier enforcing that W is an orthogonal transform, i.e. a rotation. Solve by fixed-point iterations: the effect of Λ is an orthogonal de-correlation. The overall transform taking X back to S is then (W^T V). There are several options for g(·), each of which works best in special cases; see the FastICA software / tutorial for details. The idea is to maximise the non-Gaussianity of S = W^T X, since this then gives us one of the independent components (signals).
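A compact sketch of this fixed-point scheme on whitened data, with the standard FastICA choices assumed (the slide's own update equation is not in the transcript): nonlinearity g(u) = tanh(u), row-wise update w ← E[z g(wᵀz)] − E[g′(wᵀz)]w, and symmetric orthogonalisation standing in for the Λ term.

```python
# Fixed-point rotation step on whitened data z (n x T, from the
# sphering sketch). g = tanh is one of the usual FastICA choices.
import numpy as np
from scipy.linalg import sqrtm

def fixed_point_rotation(z, n_iter=200, tol=1e-6, seed=0):
    n, T = z.shape
    W = np.random.default_rng(seed).normal(size=(n, n))

    def decorrelate(W):
        # Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W
        return np.real(np.linalg.inv(sqrtm(W @ W.T))) @ W

    W = decorrelate(W)
    for _ in range(n_iter):
        Y = W @ z
        g, g_prime = np.tanh(Y), 1.0 - np.tanh(Y) ** 2
        # Row-wise update w_i <- E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i
        W_new = decorrelate((g @ z.T) / T - np.diag(g_prime.mean(axis=1)) @ W)
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol:
            return W_new                      # converged
        W = W_new
    return W

W = fixed_point_rotation(z)
S = W @ z    # estimated sources, up to permutation, scaling and sign
```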

17 Application domains of ICA
- Blind source separation (Bell & Sejnowski, Te-Won Lee, Girolami, Hyvarinen, etc.)
- Image denoising (Hyvarinen)
- Medical signal processing: fMRI, ECG, EEG (Makeig)
- Modelling of the hippocampus and visual cortex (Lorincz, Hyvarinen)
- Feature extraction, face recognition (Marni Bartlett)
- Compression, redundancy reduction
- Watermarking (D Lowe)
- Clustering (Girolami, Kolenda)
- Time series analysis (Back, Valpola)
- Topic extraction (Kolenda, Bingham, Kaban)
- Scientific data mining (Kaban, etc.)

18 Image denoising
[Figure: four panels comparing the original image, a noisy version, Wiener filtering, and ICA filtering.]

19 Clustering
In multivariate data, search for the direction along which the projection of the data is maximally non-Gaussian, i.e. has the most 'structure'.
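A projection-pursuit flavoured sketch of this idea, reusing the excess_kurtosis helper from the earlier sketch and any multivariate sample held as an n x T array (here the mixtures x stand in for such data):

```python
# Scan random unit directions and keep the one whose 1-D projection of
# the data is most non-Gaussian (largest |excess kurtosis|).
import numpy as np

rng = np.random.default_rng(3)
data = x                                     # any n x T multivariate sample
dirs = rng.normal(size=(500, data.shape[0]))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

scores = [abs(excess_kurtosis(w @ data)) for w in dirs]
best = dirs[int(np.argmax(scores))]          # most 'structured' direction
print(best, max(scores))
```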

20 Blind Separation of Information from Galaxy Spectra

21 Decomposition using Physical Models
Decomposition using ICA

22 Summing Up
Assumption: the data consists of unknown components, e.g.
- individual signals in a mix
- topics in a text corpus
- basis-galaxies
We try to solve the inverse problem: observing only the superposition, recover the components. The components often give a simpler, clearer view of the data.

23 Related resources
- Demo and links to further info on ICA
- ICA software in MatLab
- Comprehensive tutorial paper, slightly more technical

