An Introduction to Independent Component Analysis (ICA)
Yu-Te Wu
Institute of Radiological Sciences, National Yang-Ming University
Integrated Brain Function Laboratory, Taipei Veterans General Hospital
The Principle of ICA: a cocktail-party problem
$x_1(t) = a_{11}s_1(t) + a_{12}s_2(t) + a_{13}s_3(t)$
$x_2(t) = a_{21}s_1(t) + a_{22}s_2(t) + a_{23}s_3(t)$
$x_3(t) = a_{31}s_1(t) + a_{32}s_2(t) + a_{33}s_3(t)$
Each observed signal $x_i(t)$ is an unknown linear mixture of the independent source signals $s_j(t)$.
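As a quick illustration (not from the slides), here is a minimal NumPy sketch of this mixing model; the mixing matrix A and the three source waveforms are hypothetical choices:

```python
import numpy as np

# Hypothetical 3x3 mixing matrix [a_ij]; unknown in a real cocktail-party setting.
A = np.array([[0.9, 0.3, 0.2],
              [0.4, 0.8, 0.3],
              [0.2, 0.5, 0.7]])

t = np.linspace(0, 1, 1000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),           # s1(t): sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t)),  # s2(t): square wave
               np.random.laplace(size=t.size)])     # s3(t): noise-like source

# Each observed x_i(t) = sum_j a_ij * s_j(t); ICA must recover s from x alone.
x = A @ s
```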
Independent Component Analysis
Reference: A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
Central limit theorem
The distribution of a sum of independent random variables tends toward a Gaussian distribution.
Observed signal $= m_1 \cdot IC_1 + m_2 \cdot IC_2 + \dots + m_n \cdot IC_n$
Each independent component is non-Gaussian, but their weighted sum tends toward Gaussian.
Central Limit Theorem
Consider the partial sums of a sequence $\{z_i\}$ of independent and identically distributed random variables:
$x_k = \sum_{i=1}^{k} z_i$
Since the mean and variance of $x_k$ can grow without bound as $k \to \infty$, consider instead of $x_k$ the standardized variables
$y_k = \dfrac{x_k - k\mu}{\sqrt{k}\,\sigma}$
where $\mu$ and $\sigma^2$ are the mean and variance of $z_i$. The distribution of $y_k$ converges to a Gaussian distribution with zero mean and unit variance as $k \to \infty$.
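A minimal simulation of this statement (the uniform distribution and the sample sizes are assumed choices):

```python
import numpy as np

k, trials = 100, 10000
z = np.random.uniform(-1, 1, size=(trials, k))  # i.i.d. z_i: mean mu = 0, variance sigma^2 = 1/3
x_k = z.sum(axis=1)                             # partial sums x_k
y_k = (x_k - k * 0.0) / np.sqrt(k * (1 / 3))    # standardized y_k = (x_k - k*mu) / (sqrt(k)*sigma)
print(y_k.mean(), y_k.var())                    # approx. 0 and 1; the histogram of y_k is near-Gaussian
```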
How to estimate the ICA model
Principle for estimating the model of ICA: maximization of non-Gaussianity.
Measures of non-Gaussianity: kurtosis
$\mathrm{kurt}(x) = E\{(x-\mu)^4\} - 3\,[E\{(x-\mu)^2\}]^2$
Super-Gaussian: kurtosis > 0; Gaussian: kurtosis = 0; sub-Gaussian: kurtosis < 0.
For independent $x_1$ and $x_2$:
$\mathrm{kurt}(x_1 + x_2) = \mathrm{kurt}(x_1) + \mathrm{kurt}(x_2)$
$\mathrm{kurt}(\alpha x_1) = \alpha^4\,\mathrm{kurt}(x_1)$
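A minimal sketch checking these signs on sample data (the three distributions are illustrative choices):

```python
import numpy as np

def kurt(x):
    """Sample kurtosis: E{(x-mu)^4} - 3*[E{(x-mu)^2}]^2."""
    x = x - x.mean()
    return np.mean(x ** 4) - 3 * np.mean(x ** 2) ** 2

n = 100_000
print(kurt(np.random.laplace(size=n)))    # super-Gaussian: > 0 (about +3)
print(kurt(np.random.normal(size=n)))     # Gaussian: about 0
print(kurt(np.random.uniform(-1, 1, n)))  # sub-Gaussian: < 0 (about -1.2)
```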
Whitening process
Assume the measurement $x = As$ is zero mean and $E\{ss^T\} = I$.
Let $E$ and $D$ be the eigenvector and eigenvalue matrices of the covariance matrix of $x$, i.e. $E\{xx^T\} = EDE^T$.
Then $V = ED^{-1/2}E^T$ is a whitening matrix for $z = Vx$:
$E\{zz^T\} = V\,E\{xx^T\}\,V^T = ED^{-1/2}E^T \cdot EDE^T \cdot ED^{-1/2}E^T = I$
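A minimal sketch of this whitening step via eigendecomposition (the test covariance is an arbitrary example):

```python
import numpy as np

def whiten(x):
    """Whiten zero-mean data x (rows = channels): z = Vx with V = E D^(-1/2) E^T."""
    d, E = np.linalg.eigh(np.cov(x))  # D (eigenvalues) and E (eigenvectors) of E{xx^T}
    V = E @ np.diag(d ** -0.5) @ E.T  # whitening matrix
    return V @ x, V

# Usage: x must already be centered; E{zz^T} should come out as the identity.
x = np.random.multivariate_normal([0, 0], [[4, 1], [1, 2]], size=5000).T
z, V = whiten(x)
print(np.cov(z))  # approximately I
```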
Importance of whitening
For the whitened data $z$, find a vector $w$ such that the linear combination $y = w^Tz$ has maximum non-Gaussianity under the constraint $E\{y^2\} = 1$.
Since $z$ is white, $E\{(w^Tz)^2\} = \|w\|^2$, so we can maximize $|\mathrm{kurt}(w^Tz)|$ under the simpler constraint $\|w\| = 1$.
Constrained Optimization
$\max F(w)$ subject to $\|w\|^2 = 1$
Lagrangian: $L(w,\lambda) = F(w) + \lambda(1 - \|w\|^2)$
$\dfrac{\partial L(w,\lambda)}{\partial w} = \dfrac{\partial F(w)}{\partial w} - 2\lambda w = 0$
At the stable point, the gradient of $F(w)$ must point in the direction of $w$, i.e. equal to $w$ multiplied by a scalar: $\dfrac{\partial F(w)}{\partial w} = 2\lambda w$.
Gradient of kurtosis
$F(w) = \mathrm{kurt}(w^Tz) = E\{(w^Tz)^4\} - 3\,[E\{(w^Tz)^2\}]^2 = E\{(w^Tz)^4\} - 3\|w\|^4$
$\dfrac{\partial F(w)}{\partial w} = 4\,\mathrm{sign}(\mathrm{kurt}(w^Tz))\,[E\{z(w^Tz)^3\} - 3w\|w\|^2]$
In practice the expectation is replaced by the sample average, $E\{\cdot\} \approx \frac{1}{T}\sum_{t=1}^{T}(\cdot)$.
Fixed-point algorithm using kurtosis
Gradient step: $w_{k+1} = w_k + \mu\,\dfrac{\partial F(w_k)}{\partial w}$
At a stable point the gradient points in the direction of $w_k$, so adding it does not change the direction:
$w_{k+1} = w_k + \mu\,2\lambda w_k = (1 + 2\mu\lambda)\,w_k$
and the scaling is removed by normalization, $w \leftarrow w/\|w\|$. This suggests the fixed-point iteration (using $\|w\| = 1$):
$w \leftarrow E\{z(w^Tz)^3\} - 3w, \qquad w \leftarrow w/\|w\|$
Convergence: $|w_{k+1}^Tw_k| = 1$, since $w_k$ and $w_{k+1}$ are unit vectors.
Fixed-point algorithm using kurtosis: one-by-one estimation
1. Centering
2. Whitening
3. Choose m, the number of ICs to estimate. Set counter p ← 1.
4. Choose an initial guess of unit norm for w_p, e.g. randomly.
5. Fixed-point iteration: let $w_p \leftarrow E\{z(w_p^Tz)^3\} - 3w_p$
6. Do deflation decorrelation: $w_p \leftarrow w_p - \sum_{j=1}^{p-1}(w_p^Tw_j)\,w_j$
7. Let $w_p \leftarrow w_p/\|w_p\|$
8. If $w_p$ has not converged ($|w_p^Tw_p^{\mathrm{old}}| \ne 1$), go back to step 5.
9. Set p ← p + 1. If p ≤ m, go back to step 4.
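A minimal NumPy sketch of steps 4–9 (tolerance, iteration cap, and random initialization are assumed choices); z is the whitened channels-by-samples array:

```python
import numpy as np

def fastica_kurtosis(z, m, max_iter=200, tol=1e-6):
    """One-by-one fixed-point ICA on whitened z using the kurtosis update."""
    n = z.shape[0]
    W = np.zeros((m, n))
    for p in range(m):
        w = np.random.randn(n)
        w /= np.linalg.norm(w)                               # step 4
        for _ in range(max_iter):
            w_new = (z * (w @ z) ** 3).mean(axis=1) - 3 * w  # step 5: E{z (w'z)^3} - 3w
            w_new -= W[:p].T @ (W[:p] @ w_new)               # step 6: deflation decorrelation
            w_new /= np.linalg.norm(w_new)                   # step 7
            converged = abs(w_new @ w) > 1 - tol             # step 8: |w_new' w| ~ 1
            w = w_new
            if converged:
                break
        W[p] = w                                             # step 9
    return W
```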
Fixed-point algorithm using negentropy
Kurtosis ($\mathrm{kurt}(x) = E\{x^4\} - 3$ for zero-mean, unit-variance $x$) is very sensitive to outliers, which may be erroneous or irrelevant observations.
Example: a sample of size 1000 with mean 0 and variance 1 that contains one value equal to 10 has kurtosis at least $10^4/1000 - 3 = 7$.
We therefore need a more robust measure of non-Gaussianity.
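A quick numerical check of this example (the Gaussian background sample is an assumed setup; the slide's bound holds for any unit-variance sample containing the value 10):

```python
import numpy as np

x = np.random.normal(size=1000)
x[0] = 10.0                   # a single outlier among 1000 samples
x = (x - x.mean()) / x.std()  # enforce mean 0, variance 1
print(np.mean(x ** 4) - 3)    # roughly 7-10 here; bounded below by 10^4/1000 - 3 = 7
```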
Fixed-point algorithm using negentropy
Entropy: $H(y) = -\int f(y)\,\log f(y)\,dy$
Negentropy: $J(y) = H(y_{\mathrm{gauss}}) - H(y)$, where $y_{\mathrm{gauss}}$ is a Gaussian variable with the same variance as $y$.
Approximation of negentropy (detailed on the following slides).
Fixed-point algorithm using negentropy
Maximize $J(y)$ via the fixed-point update:
$w \leftarrow E\{z\,g(w^Tz)\} - E\{g'(w^Tz)\}\,w, \qquad w \leftarrow w/\|w\|$
Convergence: $|w_{k+1}^Tw_k| = 1$
Fixed-point algorithm using negentropy: one-by-one estimation
1. Centering
2. Whitening
3. Choose m, the number of ICs to estimate. Set counter p ← 1.
4. Choose an initial guess of unit norm for w_p, e.g. randomly.
5. Fixed-point iteration: let $w_p \leftarrow E\{z\,g(w_p^Tz)\} - E\{g'(w_p^Tz)\}\,w_p$
6. Do deflation decorrelation: $w_p \leftarrow w_p - \sum_{j=1}^{p-1}(w_p^Tw_j)\,w_j$
7. Let $w_p \leftarrow w_p/\|w_p\|$
8. If $w_p$ has not converged, go back to step 5.
9. Set p ← p + 1. If p ≤ m, go back to step 4.
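A minimal sketch of the same loop with the negentropy-based update, using the common choice $g(u) = \tanh(u)$, $g'(u) = 1 - \tanh^2(u)$ (tolerance and iteration cap are assumed):

```python
import numpy as np

def fastica_negentropy(z, m, max_iter=200, tol=1e-6):
    """One-by-one fixed-point ICA on whitened z, update w <- E{z g(w'z)} - E{g'(w'z)} w."""
    n = z.shape[0]
    W = np.zeros((m, n))
    for p in range(m):
        w = np.random.randn(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = w @ z
            w_new = (z * np.tanh(u)).mean(axis=1) - (1 - np.tanh(u) ** 2).mean() * w  # step 5
            w_new -= W[:p].T @ (W[:p] @ w_new)    # step 6: deflation decorrelation
            w_new /= np.linalg.norm(w_new)        # step 7
            converged = abs(w_new @ w) > 1 - tol  # step 8
            w = w_new
            if converged:
                break
        W[p] = w
    return W
```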
Implementations
The following slides showed figures for each step of a two-source demo:
- Create two uniform sources [figures]
- Two mixed observed signals [figures]
- Centering [figures]
- Whitening [figures]
- Fixed-point iteration using kurtosis [figures]
- Fixed-point iteration using negentropy [figures]
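Since the original figures are not reproducible here, a minimal end-to-end sketch of the same experiment (two uniform sources, a hypothetical mixing matrix, and the fastica_* sketches from the earlier slides):

```python
import numpy as np

# 1. Create two uniform sources (sub-Gaussian, zero mean, unit variance).
s = np.random.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))

# 2. Two mixed observed signals, using a hypothetical mixing matrix A.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# 3. Centering.
x = x - x.mean(axis=1, keepdims=True)

# 4. Whitening: z = E D^(-1/2) E^T x.
d, E = np.linalg.eigh(np.cov(x))
z = E @ np.diag(d ** -0.5) @ E.T @ x

# 5. Fixed-point iteration (kurtosis or negentropy variant sketched above).
W = fastica_kurtosis(z, m=2)
y = W @ z  # recovered sources, up to permutation, sign, and scale
```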
Fixed-point algorithm using negentropy
Entropy: $H(y) = -\int f(y)\,\log f(y)\,dy$. A Gaussian variable has the largest entropy among all random variables of equal variance.
Negentropy: $J(y) = H(y_{\mathrm{gauss}}) - H(y)$; $J(y) = 0$ for Gaussian $y$ and $J(y) > 0$ for non-Gaussian $y$.
Negentropy is the optimal estimator of non-Gaussianity, but it is computationally difficult since it requires an estimate of the pdf.
→ Approximation of negentropy.
Fixed-point algorithm using negentropy
Higher-order cumulant approximation:
$J(y) \approx \frac{1}{12}E\{y^3\}^2 + \frac{1}{48}\,\mathrm{kurt}(y)^2$
Since most random variables of interest have approximately symmetric distributions, the first term is usually zero, and the remaining kurtosis term inherits the sensitivity to outliers.
Replace the polynomial functions by nonpolynomial functions $G_i$:
$J(y) \approx k_1\,(E\{G_1(y)\})^2 + k_2\,(E\{G_2(y)\} - E\{G_2(\nu)\})^2, \qquad \nu \sim N(0,1)$
where $G_1$ is odd and $G_2$ is even.
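As an illustration, a minimal sketch of the one-term nonpolynomial approximation $J(y) \propto (E\{G(y)\} - E\{G(\nu)\})^2$ with the common choice $G(u) = \log\cosh(u)$ (sample sizes are assumed):

```python
import numpy as np

def negentropy_approx(y, n_gauss=1_000_000):
    """J(y) ~ (E{G(y)} - E{G(v)})^2 with G(u) = log cosh(u) and v ~ N(0,1)."""
    G = lambda u: np.log(np.cosh(u))
    v = np.random.normal(size=n_gauss)
    return (G(y).mean() - G(v).mean()) ** 2

print(negentropy_approx(np.random.normal(size=100_000)))                       # Gaussian: near 0
print(negentropy_approx(np.random.laplace(scale=1/np.sqrt(2), size=100_000)))  # non-Gaussian: > 0
```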
Fixed-point algorithm using negentropy
Maximize $E\{G(w^Tz)\}$ under the constraint $E\{(w^Tz)^2\} = \|w\|^2 = 1$.
According to the Lagrange multiplier method, the gradient must point in the direction of $w$:
$F(w) = E\{z\,g(w^Tz)\} - \beta w = 0$
where $g$ is the derivative of $G$ and $\beta$ is a constant.
Fixed-point algorithm using negentropy
A plain gradient iteration does not have good convergence properties here, because the nonpolynomial moments do not have the same nice algebraic properties as cumulants.
→ Solve $F(w) = 0$ by an approximate Newton method:
- A true Newton method converges in few steps, but it requires a matrix inversion at every step: a large computational load.
- The special properties of the ICA problem allow an approximate Newton method: no matrix inversion is needed, yet it converges in roughly the same number of steps as the true Newton method.
Fixed-point algorithm using negentropy
According to the Lagrange multiplier method the gradient must point in the direction of $w$:
$F(w) = E\{z\,g(w^Tz)\} - \beta w = 0$
Solve this equation by Newton's method: $J_F(w)\,\Delta w = -F(w)$, i.e. $\Delta w = -[J_F(w)]^{-1}F(w)$.
Jacobian: $J_F(w) = E\{zz^Tg'(w^Tz)\} - \beta I \approx E\{zz^T\}\,E\{g'(w^Tz)\} - \beta I = [E\{g'(w^Tz)\} - \beta]\,I$ (diagonalized, since $E\{zz^T\} = I$).
Newton step: $w \leftarrow w - [E\{z\,g(w^Tz)\} - \beta w]\,/\,[E\{g'(w^Tz)\} - \beta]$
Multiplying by $\beta - E\{g'(w^Tz)\}$ and simplifying gives
$w \leftarrow E\{z\,g(w^Tz)\} - E\{g'(w^Tz)\}\,w, \qquad w \leftarrow w/\|w\|$
Fixed-point algorithm using negentropy
$w \leftarrow E\{z\,g(w^Tz)\} - E\{g'(w^Tz)\}\,w, \qquad w \leftarrow w/\|w\|$
Convergence: $|w_{k+1}^Tw_k| = 1$ (the absolute value is used because the ICs can be defined only up to a multiplicative sign).