Principal Component Analysis and Independent Component Analysis in Neural Networks David Gleich CS 152 – Neural Networks 11 December 2003

Outline Review PCA and ICA Neural PCA Models Neural ICA Models Experiments and Results Implementations Summary/Conclusion Questions

Principal Component Analysis PCA identifies an m-dimensional explanation of n-dimensional data, where m < n. It originated as a statistical analysis technique. PCA attempts to minimize the reconstruction error under two restrictions: the reconstruction must be linear, and the factors must be orthogonal. Equivalently, PCA attempts to maximize the variance captured by the projection.
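
For reference, a minimal batch-PCA sketch in Matlab (not part of the original slides): it computes the principal directions from the eigendecomposition of the sample covariance, which is the projection the neural algorithms below approximate online. The function name pca_batch and the data layout (one column per sample) are illustrative assumptions.

    % Batch PCA sketch: project n-dimensional data onto its top m principal components.
    % X is n-by-N (one column per sample); m < n is the target dimension.
    function [Z, V] = pca_batch(X, m)
        N  = size(X, 2);
        Xc = X - repmat(mean(X, 2), 1, N);      % center each variable
        C  = (Xc * Xc') / (N - 1);              % n-by-n sample covariance
        [V, D] = eig(C);
        [~, idx] = sort(diag(D), 'descend');
        V = V(:, idx(1:m));                     % top m eigenvectors = principal directions
        Z = V' * Xc;                            % m-by-N reduced representation
    end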

Independent Component Analysis Also known as Blind Source Separation. Proposed for neuromimetic hardware in 1983 by Herault and Jutten. ICA seeks components that are independent in the statistical sense. Two variables x, y are statistically independent iff P(x ∩ y) = P(x)P(y). Equivalently, E{g(x)h(y)} – E{g(x)}E{h(y)} = 0 where g and h are any functions.
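
A quick numerical illustration of that criterion (a sketch added here, not from the talk): for independent samples the quantity E{g(x)h(y)} – E{g(x)}E{h(y)} is close to zero for any g and h, while a dependent pair makes it clearly nonzero. The particular g and h below are arbitrary choices.

    % Independence check: E{g(x)h(y)} - E{g(x)}E{h(y)} estimated from samples.
    N = 1e5;
    x = rand(N, 1);  y = rand(N, 1);            % independent uniform samples
    g = @(t) t.^2;   h = @(t) tanh(t);          % arbitrary nonlinear test functions
    c_indep = mean(g(x).*h(y)) - mean(g(x))*mean(h(y));   % approximately 0
    c_dep   = mean(g(x).*h(x)) - mean(g(x))*mean(h(x));   % y = x: clearly nonzero
    disp([c_indep c_dep])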

Independent Component Analysis Given m signals of length n, construct the m-by-n data matrix X whose rows are the signals. We assume that X is a mixture of m sources, X = AS, where A is an unknown m-by-m mixing matrix and S is an m-by-n matrix of independent source signals.

Independent Component Analysis ICA seeks to determine an m-by-m matrix B such that Y = BX is the set of independent source signals, i.e. the independent components. If B ≈ A^{-1}, then Y = BX = A^{-1}AS = S. Note that the components need not be orthogonal, but the reconstruction is still linear.
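
To make the X = AS model concrete, here is a small illustrative sketch (the signals and mixing matrix are made up, not from the talk): two sources are mixed linearly, and applying the true A^{-1} recovers them exactly; ICA must estimate such a B without knowing A.

    % Mixing-model sketch: X = A*S with m = 2 sources of length n.
    n = 1000;  t = linspace(0, 1, n);
    S = [sin(2*pi*5*t); sign(sin(2*pi*3*t))];   % 2-by-n roughly independent sources
    A = [0.8 0.3; 0.4 0.9];                     % "unknown" mixing matrix
    X = A * S;                                  % observed mixtures
    B = inv(A);                                 % ideal unmixing matrix; ICA estimates this
    Y = B * X;                                  % equals S up to numerical error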

PCA with Neural Networks Most PCA neural networks use some form of Hebbian learning: “Adjust the strength of the connection between units A and B in proportion to the product of their simultaneous activations.” w_{k+1} = w_k + η_k y_k x_k Applied directly, this equation is unstable: ||w_k||_2 → ∞ as k → ∞. Important Note: neural PCA algorithms are unsupervised.
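
The instability is easy to verify numerically; in this small sketch (step size and dimensions chosen arbitrarily) the plain Hebbian update drives ||w|| toward infinity, which motivates Oja's rule on the next slide.

    % Plain Hebbian rule on zero-mean Gaussian inputs: the weight norm diverges.
    n = 5;  eta = 0.01;  w = randn(n, 1);
    for k = 1:5000
        x = randn(n, 1);
        y = w' * x;
        w = w + eta * y * x;     % w_{k+1} = w_k + eta_k * y_k * x_k
    end
    disp(norm(w))                % grows without bound as k increases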

PCA with Neural Networks One fix is Oja’s rule, proposed in 1982 by Oja and Karhunen: w_{k+1} = w_k + η_k (y_k x_k – y_k^2 w_k). This is a linearized version of the normalized Hebbian rule. Convergence: as k → ∞, w_k → e_1, the principal eigenvector (up to sign). [Diagram: a single linear unit y = Σ w_i x_i with inputs x_1, …, x_n and weights w_1, …, w_n.]
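
A minimal single-neuron sketch of Oja's rule, written directly from the update above (it is in the spirit of the oja.m source listed under Implementation, but the function name, arguments, and stopping rule here are assumptions):

    % Oja's rule: w converges (up to sign) to the principal eigenvector of cov(X).
    % X is n-by-N and zero-mean, one column per sample.
    function w = oja_sketch(X, eta, epochs)
        [n, N] = size(X);
        w = randn(n, 1);  w = w / norm(w);
        for e = 1:epochs
            for k = 1:N
                x = X(:, k);
                y = w' * x;
                w = w + eta * (y*x - y^2*w);   % Hebbian term plus decay keeps ||w|| near 1
            end
        end
    end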

PCA with Neural Networks Generalized Hebbian Algorithm (GHA): y = Wx, Δw_{ij,k} = η_k ( y_{ik} x_{jk} – y_{ik} Σ_{l≤i} y_{lk} w_{lj,k} ), or in matrix form ΔW = η_k ( y x^T – LT(y y^T) W ), where LT(·) takes the lower-triangular part. [Diagram: a single-layer linear network mapping inputs x_1, …, x_n to outputs y_1, …, y_m.]
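
A corresponding sketch of the Generalized Hebbian Algorithm (Sanger's rule) in matrix form, using tril() for the LT(·) operator; again this is written from the update rule rather than copied from gha.m, so treat the interface as illustrative.

    % GHA / Sanger's rule: the rows of W converge to the top m principal directions.
    % X is n-by-N and zero-mean; W is m-by-n.
    function W = gha_sketch(X, m, eta, epochs)
        [n, N] = size(X);
        W = 0.1 * randn(m, n);
        for e = 1:epochs
            for k = 1:N
                x = X(:, k);
                y = W * x;                                  % m outputs
                W = W + eta * (y*x' - tril(y*y') * W);      % LT(y y^T) = tril(y y^T)
            end
        end
    end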

PCA with Neural Networks APEX Model: Kung and Diamantaras. y = Wx – Cy, so y = (I+C)^{-1} Wx ≈ (I–C) Wx. [Diagram: a single-layer network mapping inputs x_1, …, x_n to outputs y_1, …, y_m with lateral connections c_2, …, c_m among the outputs.]

PCA with Neural Networks APEX Learning Properties of the APEX model: – Exact principal components – Local updates: w_ab depends only on x_a, x_b, and w_ab – The “–Cy” term acts as an orthogonalization term
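
For completeness, a rough sketch of one common statement of the APEX updates (Oja-like feedforward weights plus lateral weights that, through the –Cy output term, orthogonalize the outputs). The exact updates used in apex.m may differ, so this is illustrative only.

    % APEX sketch: W is m-by-n feedforward, C is strictly lower-triangular m-by-m lateral.
    % Output: y = W*x - C*y, i.e. y = (I + C) \ (W*x).
    function [W, C] = apex_sketch(X, m, eta, epochs)
        [n, N] = size(X);
        W = 0.1 * randn(m, n);
        C = zeros(m, m);
        for e = 1:epochs
            for k = 1:N
                x = X(:, k);
                y = (eye(m) + C) \ (W * x);
                W = W + eta * (y*x' - diag(y.^2) * W);            % per-neuron Oja-like update
                C = C + eta * (tril(y*y', -1) - diag(y.^2) * C);  % lateral (orthogonalizing) update
            end
        end
    end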

Neural ICA ICA is typically posed as an optimization problem. Many iterative solutions to optimization problems can be cast into a neural network.

Feed-Forward Neural ICA General Network Structure: 1. Learn B such that y = Bx has independent components. 2. Learn Q which minimizes the mean squared reconstruction error. [Diagram: x → B → y → Q → x’.]

Neural ICA Herault–Jutten (local updates): B = (I+S)^{-1}, S_{k+1} = S_k + η_k g(y_k) h(y_k)^T, with g(t) = t, h(t) = t^3, or g = hardlim, h = tansig. Bell and Sejnowski (information theory): B_{k+1} = B_k + η_k [ B_k^{-T} + z_k x_k^T ], where z(i) = ∂/∂u(i) [ ∂u(i)/∂y(i) ], u = f(Bx), and f = tansig, etc.
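
A sketch of the Bell–Sejnowski update with f = tanh, for which z = –2·tanh(Bx); it follows the rule above rather than the bsica.m source, so the interface and the plain (non-natural-gradient) form are assumptions.

    % Bell-Sejnowski infomax ICA with a tanh nonlinearity, sample-by-sample updates.
    % X is m-by-N (the mixtures); B is the m-by-m unmixing matrix.
    function B = bsica_sketch(X, eta, epochs)
        [m, N] = size(X);
        B = eye(m);
        for e = 1:epochs
            for k = 1:N
                x = X(:, k);
                y = B * x;
                z = -2 * tanh(y);                    % z_i = f''(y_i)/f'(y_i) for f = tanh
                B = B + eta * (inv(B') + z * x');    % B <- B + eta*[B^{-T} + z*x^T]
            end
        end
    end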

Neural ICA EASI (Equivariant Adaptive Separation via Independence): Cardoso et al. B_{k+1} = B_k – η_k [ y_k y_k^T – I + g(y_k) h(y_k)^T – h(y_k) g(y_k)^T ] B_k, with g(t) = t, h(t) = tansig(t).
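
And a sketch of the EASI update with g(t) = t and h(t) = tanh(t), written directly from the rule above (easi.m may use different nonlinearities or a normalized step size):

    % EASI: B <- B - eta*[ y*y' - I + g(y)*h(y)' - h(y)*g(y)' ] * B
    % X is m-by-N (the mixtures); B is the m-by-m separating matrix.
    function B = easi_sketch(X, eta, epochs)
        [m, N] = size(X);
        B = eye(m);
        for e = 1:epochs
            for k = 1:N
                y = B * X(:, k);
                g = y;                  % g(t) = t
                h = tanh(y);            % h(t) = tansig(t) = tanh(t)
                B = B - eta * (y*y' - eye(m) + g*h' - h*g') * B;
            end
        end
    end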

Oja’s Rule in Action Matlab Demo!

PCA Error Plots – Close Eigenvalues Final APEX error: Final GHA error: Minimum error:

PCA Error Plots – Separate Eigenvalues Final APEX error: Final GHA error: Minimum error:

Observations Why just the PCA subspace? –Bad Oja’s example – Matlab demo! Why can APEX fail? –“-Cy” term continually orthogonalizes

ICA Image Example

Bell and Sejnowski EASI

Real World Data An 8-channel ECG recording from a pregnant mother.

Implementation Matlab sources for each neural algorithm, and supporting functions –PCA: oja.m; gha.m; apex.m –ICA: hj.m; easi.m; bsica.m –Evaluation: pcaerr.m –Demos: fetal_plots.m; pca_error_plots.m; ojademo.m; ica_image_demo.m –Data generators: evalpca.m; evalica.m

Future Work Investigate adaptive learning and changing separations – Briefly examined this: the published ICA approach adjusted the learning rate as described, but did not converge to a separation. Convergence of PCA algorithms based on eigenvalue distribution? Use APEX to compute “more” components than desired?

Summary Neural PCA: –Algorithms work “sometimes.” They tend to find the PCA subspace rather than the exact PCA components –GHA is better than APEX Neural ICA: –Algorithms converge to something reasonable. –Bell and Sejnowski’s ICA possibly superior to EASI.

Questions?

References
Diamantaras, K. I. and S. Y. Kung. Principal Component Neural Networks.
Comon, P. “Independent Component Analysis, a new concept?” Signal Processing, vol. 36.
FastICA.
Oursland, A., J. D. Paula, and N. Mahmood. “Case Studies in Independent Component Analysis.”
Weingessel, A. “An Analysis of Learning Algorithms in PCA and SVD Neural Networks.”
Karhunen, J. “Neural Approaches to Independent Component Analysis.”
Amari, S., A. Cichocki, and H. H. Yang. “Recurrent Neural Networks for Blind Separation of Sources.”