Projection Pursuit

Projection Pursuit (PP)

PCA and FDA are linear; PP may be linear or non-linear. Find an interesting "criterion of fit", or "figure of merit" function, that allows for a low-dimensional (usually 2D or 3D) projection.
Interesting indices may use a priori knowledge about the problem:
1. mean nearest-neighbour distance – increase clustering of Y(j),
2. maximize mutual information between classes and features,
3. find projections that have non-Gaussian distributions.
The last index does not use a priori knowledge; it leads to Independent Component Analysis (ICA). ICA features are not only uncorrelated, but also independent.
General transformation with parameters W: Y(j) = W(X(j)); in the linear case Y(j) = W^T X(j). The index of "interestingness" I(Y;W) is then maximized over W. A minimal code sketch of this idea follows below.
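
As a concrete illustration (not from the original slides), the sketch below numerically maximizes one possible index of interestingness – the absolute excess kurtosis of the 1D projection Y = W^T X – over a unit-norm direction W; the toy data, the Nelder–Mead optimizer and the choice of index are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
# Toy data: two elongated Gaussian clusters in 5 dimensions (illustrative only).
X = np.vstack([rng.normal(-2.0, 1.0, size=(200, 5)),
               rng.normal(+2.0, 1.0, size=(200, 5))])
X = X - X.mean(axis=0)                      # center the data

def neg_index(w, X):
    w = w / np.linalg.norm(w)               # keep the projection direction unit-norm
    y = X @ w                               # 1D projection Y = W^T X
    return -abs(kurtosis(y))                # maximize |excess kurtosis| of the projection

res = minimize(neg_index, x0=rng.normal(size=5), args=(X,), method="Nelder-Mead")
w_best = res.x / np.linalg.norm(res.x)
print("interesting direction:", np.round(w_best, 3))
```

Any of the indices listed above (nearest-neighbour distance, mutual information, non-Gaussianity) could be plugged in instead of the kurtosis-based one.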

Kurtosis

ICA is a special version of PP, recently very popular. Gaussian distributions of a variable Y are characterized by 2 parameters:
mean value: E{Y},
variance: σ^2(Y) = E{(Y − E{Y})^2}.
These are the first 2 cumulants of the distribution; all higher cumulants are 0 for G(Y).
One simple measure of non-Gaussianity of projections is the 4th-order cumulant of the distribution, called kurtosis, which measures how peaked or flat the distribution is compared to a Gaussian. For E{Y} = 0 the kurtosis is:
κ4(Y) = E{Y^4} − 3 (E{Y^2})^2.
Super-Gaussian distribution: long tails, peak at zero, κ4(Y) > 0, like binary image data.
Sub-Gaussian distribution is more flat and has κ4(Y) < 0, like speech signal data.
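
A quick numerical check of these sign conventions (a sketch, assuming the standard zero-mean formula for κ4; the Laplace and uniform samples are illustrative stand-ins for super- and sub-Gaussian data):

```python
import numpy as np

def kappa4(y):
    """Excess kurtosis kappa4(Y) = E{Y^4} - 3*(E{Y^2})^2 for zero-mean Y."""
    y = y - y.mean()                       # enforce E{Y} = 0
    return np.mean(y**4) - 3 * np.mean(y**2)**2

rng = np.random.default_rng(1)
n = 200_000
print("Gaussian       :", round(kappa4(rng.normal(size=n)), 3))     # ~ 0
print("Laplace (super):", round(kappa4(rng.laplace(size=n)), 3))    # > 0
print("Uniform (sub)  :", round(kappa4(rng.uniform(-1, 1, n)), 3))  # < 0
```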

Correlation and independence

Features Yi, Yj are uncorrelated if the covariance matrix is diagonal, or:
E{Yi Yj} = E{Yi} E{Yj} for i ≠ j.
Uncorrelated features are orthogonal.
Variables are statistically independent if their joint probability distribution is a product of the probabilities of all variables:
P(Y1, Y2, ..., Ym) = P(Y1) P(Y2) ... P(Ym).
Statistically independent features Yi, Yj give, for any functions g1, g2:
E{g1(Yi) g2(Yj)} = E{g1(Yi)} E{g2(Yj)}.
This is a much stronger condition than lack of correlation; in particular the functions may be powers of the variables. Any non-Gaussian distribution will, after a PCA transformation, still have statistically dependent features: PCA removes only second-order correlations.
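
The sketch below illustrates the distinction on synthetic data (the Laplace sources and the mixing matrix are assumptions made for the example): after PCA whitening the two features are uncorrelated, yet a nonlinear test with g(y) = y^2 still reveals their statistical dependence.

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.laplace(size=(100_000, 2))          # independent non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])      # arbitrary mixing matrix (illustrative)
x = s @ A.T                                 # mixed, hence dependent, observations

# PCA whitening: rotate to principal axes and rescale to unit variance.
x = x - x.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(x.T))
y = (x @ vecs) / np.sqrt(vals)

corr = np.mean(y[:, 0] * y[:, 1])                       # ~ 0: uncorrelated
dep = (np.mean(y[:, 0]**2 * y[:, 1]**2)
       - np.mean(y[:, 0]**2) * np.mean(y[:, 1]**2))     # clearly nonzero: dependent
print("E{Y1 Y2}                      :", round(corr, 4))
print("E{Y1^2 Y2^2} - E{Y1^2}E{Y2^2} :", round(dep, 4))
```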

PP/ICA example

Example: PCA and PP based on maximal kurtosis; note the nice separation of the blue class.

Some remarks

Other components are found in the space orthogonal to W1^T X. Many formulations of PP and ICA methods exist. PP is used for data visualization and dimensionality reduction. Nonlinear projections are frequently considered, but their solutions are more numerically intensive.
PCA may also be viewed as PP, maximizing (for standardized data):
I(Y; W) = E{(W^T X)^2} = W^T C W, with ||W|| = 1.
The index I(Y;W) is based here on maximum variance; the same index is used for further components, with the projection restricted to the space orthogonal to the first k−1 PCs. A code sketch of this idea follows below.
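
A minimal sketch of this view of PCA (illustrative code, not from the lecture): each direction maximizes the variance index by power iteration, restricted via deflation to the space orthogonal to the directions already found.

```python
import numpy as np

def pca_by_projection_pursuit(X, k, n_iter=200):
    """Find k directions maximizing E{(W^T X)^2}, each orthogonal to the previous ones."""
    X = X - X.mean(axis=0)
    C = np.cov(X.T)
    d = X.shape[1]
    W, P = [], np.eye(d)                    # P projects onto the orthogonal complement
    for _ in range(k):
        w = np.random.default_rng(0).normal(size=d)
        for _ in range(n_iter):             # power iteration maximizes w^T C w, ||w|| = 1
            w = P @ (C @ (P @ w))           # stay inside the orthogonal subspace
            w /= np.linalg.norm(w)
        W.append(w)
        P = P - np.outer(w, w)              # deflation: remove the found direction
    return np.array(W)

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))    # correlated toy data
print(pca_by_projection_pursuit(X, k=2).round(3))          # ~ leading eigenvectors (up to sign)
```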

How do we find multiple projections?

The statistical approach is complicated:
– perform a transformation on the data to eliminate structure in the already-found direction,
– then perform PP again.
Neural computation approach: lateral inhibition.

High-dimensional data
→ dimension reduction, feature extraction
→ visualisation, classification, analysis.

Projection Pursuit – what:

An automated procedure that seeks interesting low-dimensional projections of a high-dimensional cloud by numerically maximizing an objective function or projection index. (Huber, 1985)

Projection Pursuit – why:

The curse of dimensionality brings
– less robustness,
– worse mean squared error,
– greater computational cost,
– slower convergence to limiting distributions,
– …
The required number of labelled samples increases with dimensionality.

What is an interesting projection?

In general: a projection that reveals more information about the structure.
In pattern recognition: a projection that maximises class separability in a low-dimensional subspace.

Projection pursuit for dimension reduction: find lower-dimensional projections of a high-dimensional point cloud to facilitate classification.
Exploratory projection pursuit: reduce the dimension of the problem to facilitate visualization.

Projection Pursuit

How many dimensions to use
– for visualization,
– for classification/analysis.
Which projection index to use
– measure of variation (principal components),
– departure from normality (negative entropy) – a sketch of such an index follows below,
– class separability (distance, Bhattacharyya, Mahalanobis, ...),
– …
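
For the "departure from normality" index, one common concrete choice (an assumption here, not necessarily the index meant in the lecture) is Hyvärinen's negentropy approximation J(y) ≈ (E{G(y)} − E{G(ν)})^2 with G(u) = log cosh(u) and ν a standard Gaussian variable; the Gaussian reference term is estimated below by Monte Carlo.

```python
import numpy as np

def negentropy_index(y, rng=np.random.default_rng(0)):
    """Approximate negentropy J(y) ~ (E{G(y)} - E{G(nu)})^2 with G(u) = log cosh(u)."""
    y = (y - y.mean()) / y.std()                      # standardize the projection
    G = lambda u: np.log(np.cosh(u))
    gauss_ref = G(rng.normal(size=100_000)).mean()    # Monte Carlo estimate of E{G(nu)}
    return (G(y).mean() - gauss_ref) ** 2

print(negentropy_index(np.random.default_rng(1).laplace(size=10_000)))  # > 0: non-Gaussian
print(negentropy_index(np.random.default_rng(2).normal(size=10_000)))   # ~ 0: Gaussian
```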

Projection Pursuit

Which optimization method to choose? We are trying to find the global optimum among many local ones:
– hill-climbing methods (simulated annealing),
– regular optimization routines with random starting points (a multi-start sketch follows below).
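
A minimal multi-start sketch of the second option (the helper name best_projection and the use of Nelder–Mead are illustrative assumptions; any projection index can be plugged in as neg_index, for example the kurtosis-based one sketched earlier):

```python
import numpy as np
from scipy.optimize import minimize

def best_projection(neg_index, X, n_starts=20, seed=0):
    """Run a local optimizer from several random directions and keep the best optimum."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        w0 = rng.normal(size=X.shape[1])                      # random starting point
        res = minimize(neg_index, w0, args=(X,), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best.x / np.linalg.norm(best.x), -best.fun         # direction and index value
```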

Timetable for dimensionality reduction

Begin: 16 April 1998
Report on the state of the art: 1 June 1998
Begin software implementation: 15 June 1998
Prototype software presentation: 1 November 1998

ICA demos

ICA has many applications in signal and image analysis. Finding independent signal sources allows for separation of signals from different sources and removal of noise or artifacts.
Observations X are a linear mixture, with mixing matrix W, of unknown sources Y:
X = W Y.
Both W and Y are unknown! This is a blind separation problem. How can they be found? If Y are independent components and W is a linear mixing, the problem is similar to FDA or PCA, only the criterion function is different.
Play with the ICALab PCA/ICA Matlab software for signal/image analysis.
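
A blind-separation sketch using scikit-learn's FastICA, one of many ICA implementations (ICALab itself is Matlab software); the synthetic sources and the mixing matrix below are purely illustrative, following the slide's notation X = W Y.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
Y = np.c_[np.sin(2 * t),                    # unknown sources Y (made up for the demo)
          np.sign(np.sin(3 * t)),
          rng.laplace(size=t.size)]
W = rng.uniform(0.5, 1.5, size=(3, 3))      # unknown mixing matrix W
X = Y @ W.T                                 # observed mixtures, X = W Y per sample

ica = FastICA(n_components=3, random_state=0)
Y_est = ica.fit_transform(X)                # estimated independent components
W_est = ica.mixing_                         # estimated mixing matrix
print(Y_est.shape, W_est.shape)
```

The sources are recovered only up to permutation, sign and scale – the usual ICA ambiguities.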

ICA demo: images & audio

Example from Cichocki's lab.
X space for images: take the intensity of all pixels → one vector per image, or take smaller patches (e.g. 64x64), increasing the number of vectors. 5 images: originals, mixed, convergence of the ICA iterations.
X space for signals: sample the signal for some time Δt. 10 songs: mixed samples and separated samples.

Self-organization

PCA, FDA, ICA and PP are all inspired by statistics, although some neurally inspired methods have been proposed to find interesting solutions, especially for their non-linear versions.
Brains learn to discover the structure of signals: visual, tactile, olfactory, auditory (speech and sounds). This is a good example of unsupervised learning: spontaneous development of feature detectors, compressing internal information that is needed to model environmental states (inputs).
Some simple stimuli lead to complex behavioral patterns in animals; brains use specialized microcircuits to derive vital information from signals – for example, amygdala nuclei in rats are sensitive to ultrasound signals signifying "cat around".

Models of self-organization

SOM or SOFM (Self-Organizing Feature Map) – one of the simplest models. How can such maps develop spontaneously?
Local neural connections: neurons interact strongly with those nearby, but weakly with those that are far away (in addition inhibiting some intermediate neurons).
History:
– von der Malsburg and Willshaw (1976): competitive learning, Hebbian mechanisms, "Mexican hat" interactions, models of visual systems.
– Amari (1980): models of continuous neural tissue.
– Kohonen (1981): simplification, no inhibition; leaving two essential factors, competition and cooperation.
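
A minimal Kohonen-style SOM sketch showing just the two factors kept in the 1981 simplification – competition (find the best-matching unit) and cooperation (the winner's neighbours on the map are updated as well); the grid size, learning-rate schedule and Gaussian neighbourhood are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(10) for j in range(10)])   # 10x10 map coordinates
codebook = rng.uniform(size=(100, 3))                             # weight vectors in R^3
data = rng.uniform(size=(5000, 3))                                # toy 3D inputs

for t, x in enumerate(data):
    lr = 0.5 * np.exp(-t / 2000)                                  # decaying learning rate
    sigma = 3.0 * np.exp(-t / 2000)                               # shrinking neighbourhood radius
    bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))            # competition: best-matching unit
    dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)                 # distances on the map
    h = np.exp(-dist2 / (2 * sigma ** 2))                         # cooperation: neighbourhood function
    codebook += lr * h[:, None] * (x - codebook)                  # move winner and its neighbours
print(codebook[:3].round(3))                                      # a few trained prototypes
```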

Computational Intelligence: Methods and Applications, Lecture 8: Projection Pursuit & Independent Component Analysis. Włodzisław Duch, SCE, NTU, Singapore. Google: Duch
