Bayesian belief networks 2. PCA and ICA

Presentation transcript:

Bayesian belief networks 2. PCA and ICA Peter Andras andrasp@ieee.org

Principal component analysis PCA 1. Idea: the high dimensional data might be situated on a lower dimensional surface.

PCA 2. How do we find the lower-dimensional surface? We look for linear surfaces, i.e., hyperplanes. We decompose the correlation matrix of the data according to its eigenvectors.

PCA 3. The eigenvectors are called principal component vectors. The new data vectors are formed by the projections of the original data vectors onto the principal component vectors.

PCA 4. $x_1, x_2, \ldots, x_N \in \mathbb{R}^d$ are the data vectors. The correlation matrix is: $R = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T$

PCA 5. The eigenvectors are determined by the equation: $R v = \lambda v$, where $\lambda$ is a real number. Example with two eigenvectors:
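The decomposition described in PCA 4 and PCA 5 can be sketched with NumPy. This is a minimal illustration, not the presenter's code: it assumes the correlation matrix is the uncentered second-moment matrix (the average of $x_i x_i^T$), stores the data vectors as rows of `X`, and uses made-up toy data. The function names are hypothetical.

```python
import numpy as np

def correlation_matrix(X):
    """Average of x_i x_i^T over the data vectors stored as rows of X."""
    X = np.asarray(X, dtype=float)
    return X.T @ X / X.shape[0]

def principal_components(X):
    """Eigenvalues (largest first) and matching eigenvectors (as columns)."""
    R = correlation_matrix(X)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
# Toy data (assumption): 2-D points stretched along the first axis
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
eigvals, eigvecs = principal_components(X)
```

Each column `eigvecs[:, k]` satisfies the eigenvector equation with `eigvals[k]`, and for this toy data the first column points almost exactly along the stretched axis.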

PCA 6. In principle we should find d eigenvectors if the dimensionality of the data vectors is d. If the data vectors are situated on a lower-dimensional linear surface, we find fewer than d eigenvectors with non-zero eigenvalues (i.e., the determinant of the correlation matrix is zero).
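A small numerical check of this degenerate case, as a sketch under stated assumptions: the data are made-up 3-D points that lie exactly in a 2-D plane through the origin, and the correlation matrix is taken as the uncentered average of $x_i x_i^T$. The tolerance used to decide that an eigenvalue is "zero" is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
coeffs = rng.normal(size=(200, 2))
# Two vectors spanning a plane in 3-D (illustrative choice)
basis = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
X = coeffs @ basis                  # every row lies in the plane

R = X.T @ X / len(X)                # uncentered correlation matrix
eigvals = np.linalg.eigvalsh(R)     # ascending order

# Count eigenvalues that are not numerically zero
n_nonzero = int(np.sum(eigvals > 1e-10 * eigvals.max()))
det_R = float(np.linalg.det(R))
```

Here only two eigenvalues are meaningfully different from zero, and the determinant vanishes up to rounding error.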

PCA 7. If $v_1, v_2, \ldots, v_m$, $m < d$, are the eigenvectors of $R$, then the new, transformed data vectors are calculated as: $z_i = (v_1^T x_i, v_2^T x_i, \ldots, v_m^T x_i)^T$
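The projection step can be written as a single matrix product. The sketch below assumes the data vectors are rows of `X` and the selected eigenvectors are columns of `V`; the helper name `project` and the toy data are illustrative only.

```python
import numpy as np

def project(X, V):
    """Project rows of X (N x d) onto the columns of V (d x m); returns N x m."""
    return np.asarray(X, dtype=float) @ np.asarray(V, dtype=float)

# Toy example: 2-D points lying on the line spanned by v
v = np.array([1.0, 1.0]) / np.sqrt(2.0)
t = np.linspace(-2.0, 2.0, 9)            # positions along the line
X = t[:, None] * v                       # rows are t_i * v
Z = project(X, v[:, None])               # one coordinate per data vector
```

Because the points lie exactly on the line, the single projected coordinate recovers the position `t` of each point along it.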

PCA 8. How do we calculate the eigenvectors of R? First method: use standard matrix algebra methods (this is very laborious). Second method: iterative calculation of the eigenvectors, inspired by artificial neural networks.

PCA 9. Iterative calculation of the eigenvectors. Let $w_1 \in \mathbb{R}^d$ be a randomly chosen vector such that $\|w_1\| = 1$. Iteratively perform the calculation: $w_1 \leftarrow w_1 + \eta\, y_i (x_i - y_i w_1)$, where $y_i = w_1^T x_i$ and $\eta$ is a learning constant. The algorithm converges to the eigenvector corresponding to the largest eigenvalue ($\lambda$).
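A minimal sketch of such an iteration, assuming the update is Oja's rule (a standard neural-network-inspired rule that matches the description, with $y_i = w_1^T x_i$ and a learning constant). The learning constant, number of passes, and toy data below are arbitrary choices, and the function name is hypothetical.

```python
import numpy as np

def oja_first_component(X, eta=0.005, epochs=20, seed=0):
    """Estimate the leading eigenvector of the correlation matrix by Oja's rule."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)                 # start from a random unit vector
    for _ in range(epochs):
        for x in X:
            y = w @ x                      # y_i = w^T x_i
            w += eta * y * (x - y * w)     # Oja's update (assumed form)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Toy data (assumption): most of the variance lies along the first axis
X = rng.normal(size=(500, 2)) * np.array([2.0, 0.5])
w1 = oja_first_component(X)
```

With a constant learning rate the estimate keeps fluctuating slightly around the true eigenvector, so it is usually compared with a direct eigen-decomposition up to sign and a small tolerance.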

PCA 10. To calculate the subsequent eigenvectors we modify the iterative algorithm. The modified calculation formula uses the projections $u_{ji} = w_j^T x_i$. This iterative algorithm converges to $w_k$, the k-th eigenvector.
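One well-known way to extend the iteration to several eigenvectors is Sanger's rule (the generalized Hebbian algorithm). The sketch below assumes that rule, which may differ in detail from the formula on the original slide; it uses the projections $u_{ji} = w_j^T x_i$ and subtracts from $x_i$ the part already explained by $w_1, \ldots, w_k$. All parameter values and the toy data are illustrative.

```python
import numpy as np

def sanger_components(X, m, eta=0.002, epochs=40, seed=0):
    """Estimate the first m eigenvectors (rows of W) with Sanger's rule."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(m, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(epochs):
        for x in X:
            u = W @ x                                  # u[j] = w_j^T x
            # Row k of the update: u_k * (x - sum_{j<=k} u_j w_j)
            W += eta * (np.outer(u, x) - np.tril(np.outer(u, u)) @ W)
    return W

rng = np.random.default_rng(0)
# Toy data (assumption): three axes with clearly different variances
X = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 0.3])
W = sanger_components(X, m=2)
```

The lower-triangular mask is what makes the k-th row ignore later rows, so the rows settle on the eigenvectors in order of decreasing eigenvalue.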

PCA 11. If the algorithm does not converge, there are two possible situations: a. the vector enters a cycle; b. the values do not form any cycle. If we have a cycle, all the vectors of the cycle are eigenvectors, and their corresponding eigenvalues are very close. If we have no convergence and no cycle, there are no more eigenvectors that can be determined.

PCA 12. How do we use PCA for dimension reduction? Select the important eigenvectors. Often all of the eigenvectors can be determined, but only some of them are important. The importance of an eigenvector is shown by its associated eigenvalue.

PCA 13. Selecting the important eigenvectors. 1. Graphical method:

PCA 14. Selecting the important eigenvectors. 2. Relative power: $\lambda_k / \sum_{j=1}^{d} \lambda_j$ 3. Cumulative power: $\sum_{j=1}^{k} \lambda_j / \sum_{j=1}^{d} \lambda_j$
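Both selection criteria are easy to compute once the eigenvalues are known. The sketch below assumes that relative power means each eigenvalue's share of the eigenvalue sum and that cumulative power is the running total of those shares, with eigenvalues sorted in decreasing order. The function names, example eigenvalues, and threshold are illustrative.

```python
import numpy as np

def relative_power(eigvals):
    """Share of the total eigenvalue sum carried by each eigenvalue."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / eigvals.sum()

def cumulative_power(eigvals):
    """Running total of the relative powers (eigenvalues sorted descending)."""
    return np.cumsum(relative_power(eigvals))

def n_components_for(eigvals, threshold):
    """Smallest number of leading eigenvectors whose cumulative power reaches threshold."""
    return int(np.searchsorted(cumulative_power(eigvals), threshold) + 1)

eigvals = [6.0, 2.0, 1.0, 1.0]            # made-up eigenvalues
rel = relative_power(eigvals)
k = n_components_for(eigvals, 0.85)
```

A typical use is to fix a threshold for the cumulative power and keep only the first `k` eigenvectors returned by `n_components_for`.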

PCA 15. Summary. PCA is used for dimensionality reduction. The data vectors are projected onto the eigenvectors of their correlation matrix to obtain the transformed data vectors. To calculate the PCA easily we can use the iterative algorithm. To reduce the data dimension we consider only the important eigenvectors.

Independent component analysis ICA 1. The idea: if the data vectors are linear combinations of statistically independent data components, they should be separable into their components. This is true if the component vectors have a non-Gaussian distribution, with a sharper or flatter peak.

ICA 2. Suppose $x_i = A s_i$, where $x_i$ are the data vectors and $s_i$ are the vectors of statistically independent components ($s_{ji}$). Our goal is to find the matrix $A$ (more precisely, the rows of its inverse, since these recover the components from the data). Example: the 'cocktail-party' effect: many independent voices are recorded together; the goal is to separate the independent voices; the recorded mixture is a linear mixture.
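The mixing model can be illustrated with synthetic data. Everything below is an assumption made for illustration: two made-up non-Gaussian sources stand in for the voices, and the mixing matrix `A` is chosen arbitrarily. In a real ICA problem `A` is unknown; the last two lines only show what a perfect unmixing would do.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two independent, non-Gaussian sources (stand-ins for two voices)
S = np.column_stack([
    rng.uniform(-1.0, 1.0, size=n),       # flatter peak than a Gaussian
    rng.laplace(0.0, 1.0, size=n),        # sharper peak than a Gaussian
])
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                # illustrative mixing matrix
X = S @ A.T                               # row i is x_i = A s_i

# If A were known, the rows of its inverse would unmix the recordings
W = np.linalg.inv(A)
S_recovered = X @ W.T
```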

ICA 3. How do we find the independent components? Optimize an objective function over the vectors $w$. All solution vectors ($w$) are local minimum solutions, and each corresponds to one of the independent components, i.e., to one of the components of the $s_i$ vectors.

ICA 4. How do we do it in practice? The FastICA algorithm (Hyvärinen and Oja) calculates the $w$ vectors iteratively using a fixed-point update formula. $w$ converges to one of the vectors corresponding to one of the independent components.
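A minimal one-unit sketch of a FastICA-style iteration, under stated assumptions: it uses the kurtosis-based fixed-point update on whitened data, which is one published variant of the algorithm and not necessarily the formula shown on the original slide. The helper names, the synthetic sources, and the mixing matrix are all illustrative.

```python
import numpy as np

def whiten(X):
    """Center the data and rescale it to identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return Xc @ eigvecs / np.sqrt(eigvals)

def fastica_one_unit(Z, n_iter=200, tol=1e-10, seed=0):
    """Kurtosis-based fixed-point iteration for one unit vector w (Z whitened)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=Z.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = Z @ w
        w_new = (Z * (y ** 3)[:, None]).mean(axis=0) - 3.0 * w
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < tol   # sign flips are allowed
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(0)
n = 5000
S = np.column_stack([rng.uniform(-1.0, 1.0, size=n),
                     rng.laplace(0.0, 1.0, size=n)])  # made-up sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])                # illustrative mixing
Z = whiten(S @ A.T)
w = fastica_one_unit(Z)
y = Z @ w                                             # one estimated component
```

The recovered series `y` matches one of the sources only up to sign and scale, which is an inherent ambiguity of ICA.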

ICA 5. In practice we have to calculate several $w$ vectors. To test whether the generated independent components are really independent we can use statistical tests. Let us consider $s_{1i} = w_1^T x_i$ and $s_{2i} = w_2^T x_i$. Then we can test the independence of $s_1$ and $s_2$ by calculating their correlation and by testing whether they have an identical origin with the F-test (they may not be strongly correlated and yet have an identical origin). If the tests accept the independence of the two series, we may accept $w_2$ as a new vector that corresponds to a separate independent component.
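The correlation part of this check can be sketched as follows. The example uses made-up series and only computes the sample correlation coefficient; the F-test mentioned in the slide is not included, and a low correlation on its own is a necessary but not sufficient sign of independence. The helper name is hypothetical.

```python
import numpy as np

def correlation(a, b):
    """Sample correlation coefficient between two series."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
s1 = rng.laplace(size=2000)
s2 = rng.laplace(size=2000)              # generated independently of s1
mixed = 0.8 * s1 + 0.2 * s2              # shares most of its content with s1

r_independent = correlation(s1, s2)      # expected to be close to zero
r_dependent = correlation(s1, mixed)     # expected to be close to one
```

A second vector whose component correlates as strongly as `mixed` does with `s1` would be rejected as a duplicate of an already found component.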

ICA 6. Remarks. By calculating the independent components we get a new representation of the data, which has the property that the components contain minimum mutual information. We can use ICA to select the independent non-Gaussian components, but we cannot separate mixtures of Gaussian components.