ICA and PCA. Student: 周節. Advisor: Prof. 王聖智
Outline Introduction PCA ICA Reference
Introduction Why use these methods? A: For computational and conceptual simplicity, and because the resulting representation is more convenient to analyze. What are these methods? A: The "representation" is often sought as a linear transformation of the original data. Well-known linear transformation methods include PCA, ICA, factor analysis, and projection pursuit.
What is PCA? Principal Component Analysis. It is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences, while reducing the number of dimensions.
Example: original data (X-Y scatter plot)
Example: (1) get some data and subtract the mean (X-Y scatter plot)
Example: (2) compute the covariance matrix, then (3) compute its eigenvectors and eigenvalues
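Steps (1)-(3) can be sketched in NumPy as follows. The slide's own plotted numbers are not reproduced in the text, so the data values below are the 2-D example set from the Smith tutorial cited at the end; any correlated 2-D data would do.

```python
import numpy as np

# Example data from the Smith PCA tutorial (the slide's plotted values
# are not reproduced here; any correlated 2-D data set would do).
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])

# (1) Subtract the mean of each dimension.
centered = data - data.mean(axis=0)

# (2) Covariance matrix (rowvar=False: columns are the variables).
cov = np.cov(centered, rowvar=False)

# (3) Eigenvectors and eigenvalues (eigh, since cov is symmetric;
#     eigenvalues are returned in ascending order).
eigvals, eigvecs = np.linalg.eigh(cov)
```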
Example: the eigenvectors plotted over the data
Example: (4) choose components and form a feature vector. Comparing the two eigenvalues A and B: B is the larger one.
Example: we then form two feature-vector sets: (a) both eigenvectors, A+B (feature_vector_1); (b) only B, the principal component (feature_vector_2). Modified_data = feature_vector * old_data
Example: (a) data transformed with feature_vector_1 (X-Y plot)
Example: (b) data transformed with feature_vector_2 (1-D plot along x)
Example: (5) derive the new data set from each feature vector: (a) feature_vector_1; (b) feature_vector_2. New_data = feature_vector_transpose * Modified_data
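The projection and reconstruction in steps (4)-(5) can be sketched as follows (data values again from the Smith tutorial; the helper name `pca_roundtrip` is mine). Keeping both eigenvectors reproduces the data exactly; keeping only the principal component loses the variance along the discarded direction.

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                 [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1],
                 [1.5, 1.6], [1.1, 0.9]])
mean = data.mean(axis=0)
centered = data - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))

# (a) feature_vector_1: both eigenvectors.
# (b) feature_vector_2: only the principal component (largest eigenvalue).
W1 = eigvecs
W2 = eigvecs[:, [-1]]

def pca_roundtrip(W, X, mean):
    # Modified_data: project the centered data onto the chosen eigenvectors.
    modified = W.T @ (X - mean).T
    # New_data: map back to the original space and re-add the mean.
    return (W @ modified).T + mean

new1 = pca_roundtrip(W1, data, mean)   # lossless round trip
new2 = pca_roundtrip(W2, data, mean)   # dimensionality-reduced round trip
```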
Example: new data derived from (a) feature_vector_1 (X-Y plot)
Example: new data derived from (b) feature_vector_2 (X-Y plot)
Sum Up PCA reduces the dimensionality of the data. It is most suitable when the data are correlated. Geometric meaning: projection onto the principal vectors.
What is ICA? Independent Component Analysis. It is used for separating blind or unknown sources. Start with "a cocktail-party problem".
ICA The principle of ICA: a cocktail-party problem
x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)
ICA: the sources S1, S2, S3 pass through a linear transformation to give the observations X1, X2, X3.
Math model Given x1(t), x2(t), x3(t), we want to find s1(t), s2(t), s3(t), where
x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)
In matrix form: X = AS.
Math model Because A and S are both unknown, we need some assumptions: (1) the components of S are statistically independent; (2) the components of S have non-Gaussian distributions. Goal: given X = AS, find a W such that S = WX.
Theorem By the central limit theorem, the distribution of a sum of independent random variables tends toward a Gaussian distribution. So the observed signal a1 s1 + a2 s2 + ... + an sn is closer to Gaussian than any individual non-Gaussian source si.
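A quick numerical illustration of this tendency (a sketch of mine, not from the slides): the excess kurtosis of a sum of independent uniform variables is much closer to the Gaussian value 0 than that of any single source.

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_kurt(y):
    # E{y^4} - 3 for y normalized to zero mean, unit variance (0 for a Gaussian).
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3

n = 200_000
s = rng.uniform(-1, 1, (8, n))   # 8 independent non-Gaussian sources
mix = s.sum(axis=0)              # their sum, as in the observed signal
```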
Theorem Given x = As, let y = w^T x and z = A^T w. Then y = w^T A s = z^T s = z1 s1 + z2 s2 + ... + zn sn. As a sum of the independent sources si, y is closer to Gaussian than any single si; it is least Gaussian when it equals one of the si, i.e. when only one zi is nonzero.
Theorem So we look for a w that maximizes the non-Gaussianity of y = w^T x = w1 x1 + w2 x2 + ... + wn xn. But how do we measure non-Gaussianity?
Theorem A measure of non-Gaussianity is kurtosis: F(y) = E{y^4} - 3 [E{y^2}]^2. As y tends toward Gaussian, F(y) approaches zero. Super-Gaussian: kurtosis > 0; Gaussian: kurtosis = 0; sub-Gaussian: kurtosis < 0.
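A sketch of this measure on synthetic data (the distribution choices are illustrative assumptions, not from the slides): uniform noise is sub-Gaussian, Laplacian noise is super-Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def kurt(y):
    # F(y) = E{y^4} - 3 * [E{y^2}]^2
    y = y - y.mean()
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

n = 100_000
sub = rng.uniform(-1, 1, n)      # sub-Gaussian:   F < 0
gauss = rng.normal(0, 1, n)      # Gaussian:       F ~ 0
supr = rng.laplace(0, 1, n)      # super-Gaussian: F > 0
```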
Steps (1) centering & whitening process (2) FastICA algorithm
Steps: the sources S1, S2, S3 are mixed by a linear transformation into X1, X2, X3 (correlated); centering & whitening gives Z1, Z2, Z3 (uncorrelated); FastICA then recovers S1, S2, S3 (independent).
Example: original data (scatter plot)
Example: (1) after the centering & whitening process
Example: (2) after the FastICA algorithm
Sum up ICA is a linear transformation that minimizes the statistical dependence between components. It can solve the problem of decomposing unknown signals (blind source separation).
Reference
Lindsay I. Smith, "A Tutorial on Principal Components Analysis," February 26, 2002.
Aapo Hyvärinen and Erkki Oja, "Independent Component Analysis: Algorithms and Applications," Neural Networks Research Centre, Helsinki University of Technology.
Centering & whitening process Let x be zero mean, and let D and E be the eigenvalue and eigenvector matrices of the covariance matrix of x, i.e. E{xx^T} = E D E^T. Then V = E D^(-1/2) E^T is a whitening matrix: for z = V x,
E{zz^T} = V E{xx^T} V^T = E D^(-1/2) E^T (E D E^T) E D^(-1/2) E^T = I.
If x = As, then z = V A s.
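A sketch of this whitening computation in NumPy (the mixing matrix below is an arbitrary assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture x = A s, centered to zero mean.
A = np.array([[2.0, 1.0], [1.0, 1.0]])
x = A @ rng.uniform(-1, 1, (2, 5000))
x = x - x.mean(axis=1, keepdims=True)

# Eigendecomposition of the (sample) covariance: E{xx^T} = E D E^T.
cov = x @ x.T / x.shape[1]
D, E = np.linalg.eigh(cov)

# Whitening matrix V = E D^(-1/2) E^T; whitened data z = V x.
V = E @ np.diag(D ** -0.5) @ E.T
z = V @ x
```

After this step the covariance of z is the identity, as derived above.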
Centering & whitening process For the whitened data z, find a vector w such that the linear combination y = w^T z has maximum non-Gaussianity under the constraint ||w|| = 1; that is, maximize |kurt(w^T z)| subject to ||w|| = 1.
FastICA
1. Centering.
2. Whitening.
3. Choose m, the number of ICs to estimate. Set counter p <- 1.
4. Choose an initial unit-norm guess for w_p, e.g. randomly.
5. Let w_p <- E{z (w_p^T z)^3} - 3 w_p (the kurtosis-based fixed-point update).
6. Do the deflation decorrelation: w_p <- w_p - sum_{j<p} (w_p^T w_j) w_j.
7. Let w_p <- w_p / ||w_p||.
8. If w_p has not converged (|w_p^T w_p_old| not close to 1), go back to step 5.
9. Set p <- p + 1. If p <= m, go back to step 4.
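The steps above can be sketched as follows, using the kurtosis-based fixed-point update from the Hyvärinen & Oja reference (function and variable names are mine, and the two-source demo mixture is an arbitrary assumption):

```python
import numpy as np

def fastica_kurtosis(z, m, max_iter=200, tol=1e-6, seed=0):
    # Deflationary FastICA on whitened data z (shape d x n), estimating
    # m components with the update w <- E{z (w^T z)^3} - 3 w.
    rng = np.random.default_rng(seed)
    d, n = z.shape
    W = np.zeros((m, d))
    for p in range(m):
        w = rng.normal(size=d)
        w /= np.linalg.norm(w)                          # step 4: random unit guess
        for _ in range(max_iter):
            w_old = w
            w = (z * (w @ z) ** 3).mean(axis=1) - 3 * w  # step 5: fixed-point update
            w -= W[:p].T @ (W[:p] @ w)                   # step 6: deflation
            w /= np.linalg.norm(w)                       # step 7: renormalize
            if abs(abs(w @ w_old) - 1) < tol:            # step 8: convergence check
                break
        W[p] = w                                          # step 9: next component
    return W

# Demo: two independent sub-Gaussian sources, mixed then whitened.
rng = np.random.default_rng(1)
s = rng.uniform(-1, 1, (2, 20000))
x = np.array([[2.0, 1.0], [1.0, 1.5]]) @ s
x = x - x.mean(axis=1, keepdims=True)
cov = x @ x.T / x.shape[1]
D, E = np.linalg.eigh(cov)
z = E @ np.diag(D ** -0.5) @ E.T @ x   # centering & whitening
W = fastica_kurtosis(z, 2)
y = W @ z                              # recovered sources, up to sign and order
```

The recovered components match the original sources only up to permutation and sign, which is the usual ICA indeterminacy.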