Computational Intelligence: Methods and Applications

Lecture 8: Projection Pursuit & Independent Component Analysis
Włodzisław Duch, Dept. of Informatics, UMK
Google: W Duch

Exploratory Projection Pursuit (PP)

PCA and FDA are linear; PP may be linear or non-linear. Find an interesting "criterion of fit", or "figure of merit" function, that allows for a low-dimensional (usually 2D or 3D) projection: a general transformation Y = Y(X; W) with parameters W, evaluated by an index of "interestingness" I(Y; W).

Interesting indices may use a priori knowledge about the problem:
1. mean nearest-neighbor distance – increase clustering of the projected vectors Y(j);
2. maximize mutual information between classes and features;
3. find projections that have non-Gaussian distributions.

The last index does not use a priori knowledge; it leads to Independent Component Analysis (ICA), an unsupervised method. ICA features are not only uncorrelated, but also statistically independent.
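To make the idea concrete, here is a minimal sketch (my own illustration, not part of the lecture; the helper name nn_index and the random-search loop are made up) of scoring candidate 2D projections with the first index above, the mean nearest-neighbor distance, assuming NumPy is available:

```python
import numpy as np

def nn_index(Y):
    """Mean nearest-neighbor distance of the projected points Y (n_samples x 2).
    Smaller values indicate stronger clustering of the projection."""
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # ignore self-distances
    return d.min(axis=1).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # toy data with 5 features

best = None
for _ in range(100):                               # crude random search over projections
    W = np.linalg.qr(rng.normal(size=(5, 2)))[0]   # random orthonormal 5x2 parameter matrix W
    score = nn_index(X @ W)                        # index of "interestingness" I(Y; W)
    if best is None or score < best[0]:
        best = (score, W)
print("best index value:", round(best[0], 4))
```

A real PP implementation would replace the random search with gradient-based optimization of the chosen index.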

Kurtosis

ICA is a special version of PP, recently very popular. A Gaussian distribution of a variable Y is characterized by 2 parameters:
mean value: μ = E{Y}
variance: σ^2 = E{(Y − μ)^2}
These are the first two moments of the distribution; all higher-order cumulants are 0 for G(Y).

One simple measure of non-Gaussianity of projections is the 4th-order cumulant of the distribution, called kurtosis, which measures the "concentration" of the distribution. If the mean E{Y} = 0, the kurtosis is:
k4(Y) = E{Y^4} − 3 (E{Y^2})^2
A super-Gaussian distribution has a long tail and a peak at zero, k4(Y) > 0, like binary image data; a sub-Gaussian distribution is more flat and has k4(Y) < 0, like speech signal data.
Find an interesting direction by looking for maxW |k4(Y(W))|.
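As an illustration (a sketch, not the lecture's code; the toy sources and the random search are my own choices), one can estimate k4 from samples and look for the direction W maximizing |k4(Y(W))| with NumPy:

```python
import numpy as np

def kurtosis(y):
    """Sample kurtosis k4 = E{y^4} - 3 (E{y^2})^2 after removing the mean."""
    y = y - y.mean()
    return np.mean(y**4) - 3.0 * np.mean(y**2)**2

rng = np.random.default_rng(1)
# Toy sources: a uniform (sub-Gaussian) and a Laplacian (super-Gaussian) signal, linearly mixed.
S = np.column_stack([rng.uniform(-1, 1, 2000), rng.laplace(size=2000)])
X = S @ rng.normal(size=(2, 2)).T            # observed linear mixtures

best_W, best_k = None, -np.inf
for _ in range(500):                         # random search over unit directions W
    W = rng.normal(size=2)
    W /= np.linalg.norm(W)
    k = abs(kurtosis(X @ W))                 # |k4(Y(W))| of the projection
    if k > best_k:
        best_W, best_k = W, k
print("direction with max |k4|:", best_W, "|k4| =", round(best_k, 3))
```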

Correlation and independence

Variables Yi are statistically independent if their joint probability distribution is a product of the probabilities of all variables:
P(Y1, Y2, ..., Ym) = P(Y1) P(Y2) ... P(Ym)
Features Yi, Yj are uncorrelated if the covariance matrix is diagonal, or:
E{Yi Yj} = E{Yi} E{Yj} for i ≠ j
Uncorrelated features are orthogonal, but may still have higher-order dependencies, while any functions of independent features Yi, Yj remain uncorrelated:
E{f1(Yi) f2(Yj)} = E{f1(Yi)} E{f2(Yj)}
This is a much stronger condition than lack of correlation; in particular the functions may be powers of the variables. Any non-Gaussian distribution will, after the PCA transformation, still have some feature dependencies.
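A small hedged demonstration (my own example, not from the slide) of the difference: y2 = y1^2 is uncorrelated with a symmetric y1, yet clearly dependent on it, which a higher-order test reveals.

```python
import numpy as np

rng = np.random.default_rng(2)
y1 = rng.normal(size=100_000)
y2 = y1**2                     # deterministic function of y1, so y1 and y2 are NOT independent

# The covariance is essentially zero, so the features look uncorrelated...
print("cov(y1, y2)   ~", round(np.cov(y1, y2)[0, 1], 3))
# ...but for independent variables E{f1(y1) f2(y2)} would equal E{f1(y1)} E{f2(y2)};
# here, with f1(y1) = y1^2 and f2(y2) = y2, the two sides differ clearly.
print("E{f1 f2}      ~", round(np.mean(y1**2 * y2), 3))
print("E{f1} E{f2}   ~", round(np.mean(y1**2) * np.mean(y2), 3))
```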

PP/ICA example

Example: PCA compared with PP based on maximal kurtosis; note the nice separation of the blue class in the PP projection.

Some remarks

Many algorithms for exploratory PP and ICA exist. PP is used for visualization, dimensionality reduction and regression. Nonlinear projections are frequently considered, but their solutions are more numerically intensive.

PCA may also be viewed as PP, maximizing (for standardized data) the variance of the projection:
I(Y; W) = E{Y(W)^2} = W^T C W, with ||W|| = 1
The index I(Y; W) is based here on maximum variance. Other components are found in the space orthogonal to W1; the same index is used, with the projection restricted to the space orthogonal to the first k−1 principal components.

PP/ICA description: Chap. 14.6, Friedman, Hastie & Tibshirani.
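A minimal sketch (my own illustration of this "PCA as projection pursuit" view, not the lecture's code) that finds each direction by maximizing the variance index W^T C W in the space orthogonal to the previously found components, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # correlated toy data
X = (X - X.mean(axis=0)) / X.std(axis=0)                  # standardized data, as on the slide
C = np.cov(X, rowvar=False)                               # covariance matrix C

components = []
P = np.eye(4)                          # projector onto the space orthogonal to found PCs
for k in range(2):
    Ck = P @ C @ P                     # restrict the variance index W^T C W to that space
    eigvals, eigvecs = np.linalg.eigh(Ck)
    W = eigvecs[:, -1]                 # unit direction maximizing the index
    components.append(W)
    P = P - np.outer(W, W)             # deflate: remove the found direction
print("first two principal directions (rows):\n", np.array(components))
```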

ICA demos

ICA has many applications in signal and image analysis. Finding independent signal sources allows for the separation of signals coming from different sources and for the removal of noise or artifacts.

Observations X are a linear mixture, with mixing matrix W, of unknown sources Y:
X = W Y
Both W and Y are unknown! This is the blind source separation problem. How can they be found? If the Y are independent components and W is a linear mixing matrix, the problem is similar to PCA, only the criterion function is different.

Play with ICA-Lab, PCA/ICA Matlab software for signal/image analysis: http://www.bsp.brain.riken.go.jp/page7.html

ICA examples

Mixing simple signals: a sine wave + a sawtooth ("chainsaw") wave. The vectors X are samples of the mixed signals in some time window.

From: Chap. 14.6, Friedman, Hastie & Tibshirani, The Elements of Statistical Learning.
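For readers who want to reproduce this kind of demo, here is a hedged sketch (not the book's or the lecture's code; the mixing matrix is an arbitrary choice) that mixes a sine and a sawtooth signal and separates them with FastICA, assuming SciPy and scikit-learn are installed:

```python
import numpy as np
from scipy.signal import sawtooth
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.column_stack([np.sin(2 * t),               # sine source
                     sawtooth(3 * t)])            # "chainsaw" (sawtooth) source
A = np.array([[1.0, 0.5],
              [0.7, 1.0]])                        # mixing matrix (W in the slide's notation)
X = S @ A.T                                       # observed mixtures, one row per time sample

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                      # recovered sources (up to order, scale, sign)
print("estimated mixing matrix:\n", ica.mixing_)
```

The recovered signals match the originals only up to permutation, sign and scale, which is the usual ICA ambiguity.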

ICA demo: images & audio

Play with ICA-Lab, PCA/ICA Matlab software for signal/image analysis from Cichocki's lab: http://www.bsp.brain.riken.go.jp/page7.html

X space for images: take the intensities of all pixels → one vector per image, or take smaller patches (ex: 64x64), increasing the number of vectors.
Demo with 5 images: originals, mixtures, and the convergence of the ICA iterations.
X space for signals: sample the signal for some time Δt.
Demo with 10 songs: mixed samples and separated samples.

A good survey paper on ICA: http://www.cis.hut.fi/aapo/papers/NCS99web/
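As a hedged illustration of how such an X space can be built from images (my own sketch; the patch size and the random stand-in image are illustrative), one can cut each image into patches and stack the flattened patches as rows of X, using NumPy:

```python
import numpy as np

def image_to_patches(img, patch=64):
    """Cut a 2-D grayscale image into non-overlapping patch x patch blocks,
    flattening each block into one row of the data matrix X."""
    h, w = img.shape
    rows = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            rows.append(img[i:i + patch, j:j + patch].ravel())
    return np.array(rows)

img = np.random.default_rng(4).random((256, 256))   # stand-in for a real grayscale image
X = image_to_patches(img, patch=64)
print(X.shape)   # (16, 4096): 16 patch vectors, each with 64*64 pixel intensities
```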

Further reading

Many other visualization and dimensionality reduction methods exist. See the links here:
http://www.is.umk.pl/~duch/CI.html#vis
http://www.is.umk.pl/software.html#Visual
Principal curves web page: http://www.iro.umontreal.ca/~kegl/research/pcurves/
A good page with research papers on manifold learning: http://www.cse.msu.edu/~lawhiu/manifold/
A. Webb, Chapter 6.3 on projection pursuit, Chapter 9.3 on PCA.
Duda, Hart & Stork, Chapter 3.8 on PCA and FDA.

Now we shall turn to non-linear methods inspired by the approach used by our brains.

Self-organization

PCA, FDA, ICA and PP are all inspired by statistics, although some neural-inspired methods have been proposed to find interesting solutions, especially for non-linear PP versions.

Brains learn to discover the structure of signals: visual, tactile, olfactory, auditory (speech and sounds). This is a good example of unsupervised learning: the spontaneous development of feature detectors, compressing the internal information that is needed to model environmental states (inputs).

Some simple stimuli lead to complex behavioral patterns in animals; brains use specialized microcircuits to derive vital information from signals – for example, amygdala nuclei in rats are sensitive to ultrasound signals signifying "cat around".

Models of self-organization

SOM or SOFM (Self-Organizing Feature Map) – the self-organizing feature map, one of the simplest models. How can such maps develop spontaneously?

Local neural connections: neurons interact strongly with those nearby, but weakly with those that are far away (in addition inhibiting some intermediate neurons).

History:
von der Malsburg and Willshaw (1976) – competitive learning, Hebbian mechanisms, "Mexican hat" interactions, models of visual systems.
Amari (1980) – models of continuous neural tissue.
Kohonen (1981) – simplification, no inhibition; leaving two essential factors: competition and cooperation.
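To preview the two factors Kohonen kept, here is a minimal, illustrative SOM update step in NumPy (a sketch under simplifying assumptions, not Kohonen's original algorithm; the map size, learning rate and Gaussian neighborhood are arbitrary choices; SOM is covered in the next lecture):

```python
import numpy as np

rng = np.random.default_rng(5)
grid = np.array([(i, j) for i in range(10) for j in range(10)])  # 10x10 map of neurons
W = rng.random((100, 3))                 # each neuron has a 3-D weight (codebook) vector

def som_step(x, W, lr=0.1, sigma=2.0):
    """One SOM update: competition picks the winner, cooperation spreads the update."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))            # competition
    d2 = np.sum((grid - grid[winner])**2, axis=1)                # map distances to the winner
    h = np.exp(-d2 / (2 * sigma**2))                             # cooperation (neighborhood)
    return W + lr * h[:, None] * (x - W)                         # move neighbors toward x

for x in rng.random((1000, 3)):          # adapt the map to random 3-D inputs
    W = som_step(x, W)
```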