Object Oriented Data Analysis, Last Time: More on k-means clustering. Explored simple 2-d examples, with focus on local minima and random starts. Sometimes results were stable, sometimes very unstable. Breast cancer data: significant splits showed few local minima; insignificant splits showed many local minima.
Independent Component Analysis. Next time: review material from 11-Intro2ICA.ppt
Independent Component Analysis Personal viewpoint: directions that maximize independence (in the spirit of PCA and DWD, which also find informative directions). Motivating context: signal processing, "blind source separation".
Independent Component Analysis References: Cardoso (1989) (1st paper); Lee (1998) (early book, not recommended); Hyvärinen & Oja (1998) (excellent short tutorial); Hyvärinen, Karhunen & Oja (2001) (detailed monograph).
ICA, Motivating Example "Cocktail party problem": hear several simultaneous conversations; would like to separate them. Model for "conversations": time series $s_1(t)$ and $s_2(t)$.
ICA, Motivating Example [Figure: the two source time series $s_1(t)$ and $s_2(t)$]
ICA, Motivating Example Mixed version of signals: $x_1(t) = a_{11} s_1(t) + a_{12} s_2(t)$, and also a 2nd mixture: $x_2(t) = a_{21} s_1(t) + a_{22} s_2(t)$.
ICA, Motivating Example [Figure: the two mixed time series $x_1(t)$ and $x_2(t)$]
ICA, Motivating Example Goal: recover the signals $s(t) = (s_1(t), s_2(t))^T$ from the data $x(t) = (x_1(t), x_2(t))^T$, for an unknown mixing matrix $A$, where $x(t) = A\,s(t)$ for all $t$.
ICA, Motivating Example Goal is to find separating weights $W$ so that $s(t) = W\,x(t)$ for all $t$. Problem: $W = A^{-1}$ would be fine, but $A$ is unknown.
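A minimal numerical sketch of this mixing model (the particular signals and the mixing matrix A below are illustrative choices, not from the slides):

    import numpy as np

    t = np.linspace(0, 1, 1000)

    # Two independent "conversations" (source signals)
    s1 = np.sign(np.sin(2 * np.pi * 5 * t))   # square wave
    s2 = np.sin(2 * np.pi * 3 * t)            # sinusoid
    S = np.vstack([s1, s2])                   # 2 x n matrix of sources

    # Mixing matrix A (unknown in practice): each "microphone" hears a mixture
    A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
    X = A @ S                                 # observed data: x(t) = A s(t)

    # If A were known, W = A^{-1} would recover the sources exactly;
    # ICA has to estimate W from X alone.
    W = np.linalg.inv(A)
    print(np.allclose(W @ X, S))              # True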
ICA, Motivating Example Solutions for Cocktail Party Problem: PCA (on population of 2-d vectors)
ICA, Motivating Example [Figure: PCA applied to the scatterplot of the 2-d data vectors]
ICA, Motivating Example Solutions for Cocktail Party Problem: PCA (on population of 2-d vectors) [direction of maximal variation doesn’t solve this problem] ICA (will describe method later)
ICA, Motivating Example [Figure: ICA applied to the same data]
ICA, Motivating Example Solutions for Cocktail Party Problem: PCA (on population of 2-d vectors) [direction of maximal variation doesn’t solve this problem] ICA (will describe method later) [Independent Components do solve it] [modulo sign changes and identification]
ICA, Motivating Example Recall original time series: [Figure: comparison with the original time series]
ICA, Motivating Example Relation to FDA: recall the data matrix $X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}$. Signal processing: focus on the rows ($d$ time series, for $t = 1, \ldots, n$). Functional data analysis: focus on the columns ($n$ data vectors). Note: the 2 viewpoints are like "duals" for PCA.
ICA, Motivating Example Scatterplot view (signal processing): study the signals $(s_1(t), s_2(t))$.
ICA, Motivating Example [Figure: the two signal time series]
ICA, Motivating Example Scatterplot view (signal processing): study the signals and the corresponding scatterplot.
ICA, Motivating Example [Figure: scatterplot of the signals $(s_1(t), s_2(t))$]
ICA, Motivating Example Scatterplot view (signal processing): study the signals, the corresponding scatterplot, and the data $(x_1(t), x_2(t))$.
ICA, Motivating Example [Figure: the two data time series]
ICA, Motivating Example [Figure: scatterplot of the data $(x_1(t), x_2(t))$]
ICA, Motivating Example Scatterplot view (signal processing): plot the signals & scatterplot, and the data & scatterplot. The scatterplots give a hint of how ICA is possible: the affine transformation $x = A s$ stretches the independent signals into dependent data; "inversion" of this transformation is the key to ICA (even with $A$ unknown).
ICA, Motivating Example Why not PCA? Finds direction of greatest variation
ICA, Motivating Example [Figure: PC1, the direction of greatest variation, on the data scatterplot]
ICA, Motivating Example Why not PCA? Finds direction of greatest variation Which is wrong for signal separation
ICA, Motivating Example [Figure: the PCA directions, wrong for signal separation]
ICA, Algorithm ICA Step 1: "sphere the data" (shown on right in scatterplot view).
ICA, Algorithm [Figure: data scatterplot before and after sphering]
ICA, Algorithm ICA Step 1: "sphere the data", i.e. find a linear transformation making mean $= 0$ and covariance $= I$; i.e. work with $Z = \hat{\Sigma}^{-1/2}(X - \bar{X})$. Requires $\hat{\Sigma}$ of full rank (at least $d \le n$, i.e. no HDLSS). Then search for independence beyond linear and quadratic structure, which sphering has used up.
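A sketch of the sphering step, assuming the d x n data-matrix convention above (columns are the observations); the function name is mine:

    import numpy as np

    def sphere(X):
        """Whiten a d x n data matrix: output has mean 0, covariance I.
        Requires the sample covariance to be full rank (no HDLSS)."""
        Xc = X - X.mean(axis=1, keepdims=True)         # subtract the mean
        Sigma = np.cov(Xc)                             # d x d sample covariance
        # symmetric inverse square root via the eigendecomposition
        vals, vecs = np.linalg.eigh(Sigma)
        return (vecs @ np.diag(vals ** -0.5) @ vecs.T) @ Xc

    Z = sphere(np.random.default_rng(0).uniform(size=(2, 500)))
    print(np.cov(Z).round(6))                          # ~ identity matrix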
ICA, Algorithm ICA Step 2: find directions that make the (sphered) data as independent as possible. Recall "independence" means the joint distribution is the product of the marginals. In the cocktail party example:
ICA, Algorithm [Figure: joint distribution of the two signals in the cocktail party example]
ICA, Algorithm ICA Step 2 (continued): independence happens only when the support of the joint distribution is parallel to the axes; otherwise the joint distribution has blank areas where the marginals are nevertheless non-zero, so the joint cannot be the product of the marginals.
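A small numerical check of this criterion; the grid-based dependence gap and the mixing matrix below are illustrative constructions of mine:

    import numpy as np

    def dependence_gap(Y, bins=20):
        """Largest |joint - product of marginals| over a 2-d histogram grid;
        near 0 for independent coordinates."""
        joint, _, _ = np.histogram2d(Y[0], Y[1], bins=bins)
        joint /= joint.sum()
        prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        return np.abs(joint - prod).max()

    rng = np.random.default_rng(0)
    S = rng.uniform(-1, 1, size=(2, 100_000))          # independent sources
    X = np.array([[0.7, 0.3], [0.4, 0.6]]) @ S         # mixed (dependent) data
    print(dependence_gap(S))  # ~ 0: joint matches product of marginals
    print(dependence_gap(X))  # clearly larger: blank corners, non-zero marginals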
ICA, Algorithm Parallel idea (and key to the algorithm): find directions that maximize non-Gaussianity, based on the assumption that we start from independent coordinates. Note: "most" projections are Gaussian (since a projection is a "linear combination", the Central Limit Theorem applies). Mathematics behind this: Diaconis and Freedman (1984).
ICA, Algorithm Worst case for ICA: Gaussian. Then the sphered data are independent, and remain independent in all (orthogonal) directions, so no useful directions can be found. The Gaussian distribution is characterized by being both independent & spherically symmetric.
ICA, Algorithm Criteria for non-Gaussianity / independence: Kurtosis (4th order cumulant) Negative Entropy Mutual Information Nonparametric Maximum Likelihood “Infomax” in Neural Networks Interesting connections between these
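As a concrete instance of the first criterion, a sketch of excess kurtosis (the 4th-order cumulant, which vanishes for a Gaussian); the function name is mine:

    import numpy as np

    def excess_kurtosis(y):
        """4th-order cumulant of a standardized sample: 0 for Gaussian,
        negative for sub-Gaussian (e.g. uniform), positive for super-Gaussian."""
        y = (y - y.mean()) / y.std()
        return np.mean(y ** 4) - 3.0

    rng = np.random.default_rng(0)
    print(excess_kurtosis(rng.normal(size=100_000)))   # ~ 0
    print(excess_kurtosis(rng.uniform(size=100_000)))  # ~ -1.2
    print(excess_kurtosis(rng.laplace(size=100_000)))  # ~ +3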
ICA, Algorithm Matlab algorithm (optimizing any of the above): "FastICA". A numerical gradient search method; can find directions iteratively, or by simultaneous optimization (note: for PCA the two approaches give the same answer, but not for ICA). Appears fast, with good defaults. Be careful about local optima (recommendation: several random restarts).
ICA, Algorithm FastICA notational summary: First sphere the data: $Z = \hat{\Sigma}^{-1/2}(X - \bar{X})$. Apply ICA: find $W$ to make the rows of $S = W Z$ "independent". Can transform back to the original data scale: $\hat{A} = \hat{\Sigma}^{1/2} W^{-1}$.
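A minimal sketch of the fixed-point idea behind FastICA, for a single direction and the kurtosis nonlinearity; this illustrates the update rule, not the actual Matlab FastICA package:

    import numpy as np

    def fastica_one_unit(Z, n_iter=200, seed=0):
        """One-unit FastICA on sphered data Z (d x n): fixed-point iteration
        w <- E[z (w'z)^3] - 3w, normalized each step."""
        rng = np.random.default_rng(seed)
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wz = w @ Z                                  # projections w'z
            w_new = (Z * wz ** 3).mean(axis=1) - 3 * w  # fixed-point update
            w_new /= np.linalg.norm(w_new)
            if abs(w_new @ w) > 1 - 1e-10:              # converged (up to sign)
                return w_new
            w = w_new
        return w

    # Recover one independent direction from mixed uniform sources
    rng = np.random.default_rng(1)
    S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))  # unit variance
    X = np.array([[0.7, 0.3], [0.4, 0.6]]) @ S
    Xc = X - X.mean(axis=1, keepdims=True)              # sphere X (as above)
    vals, vecs = np.linalg.eigh(np.cov(Xc))
    Z = (vecs @ np.diag(vals ** -0.5) @ vecs.T) @ Xc
    w = fastica_one_unit(Z)
    print(np.abs(np.corrcoef(w @ Z, S)[0, 1:]))  # one entry near 1, one near 0

As the slide recommends, running this from several random starts (different seed values) guards against local optima.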
ICA, Algorithm A careful look at identifiability: seen already in the above example, where the "square" of data could be rotated in several ways, etc.
ICA, Algorithm [Figure: identifiability of the data "square"]
ICA, Algorithm [Figure: identifiability: swaps and flips of the recovered components]
ICA, Algorithm Identifiability problem 1: generally can't order the rows of $S$ (and of $W$), seen as the swap above. For $P$ a "permutation matrix" (pre-multiplication by $P$ swaps rows; post-multiplication by $P$ swaps columns), each column satisfies $x = A s = (A P^{t})(P s)$, so $P s$ and $A P^{t}$ are also ICA solutions. FastICA appears to order the components in terms of "how non-Gaussian" they are.
ICA, Algorithm Identifiability problem 2: can't find the scale of the elements of $s$, seen as the flips above. For $D$ a (full-rank) diagonal matrix (pre-multiplication by $D$ is scalar multiplication of rows; post-multiplication by $D$ is scalar multiplication of columns), each column satisfies $x = A s = (A D^{-1})(D s)$, so $D s$ and $A D^{-1}$ are also ICA solutions.
ICA, Algorithm Signal processing scale identification (Hyvärinen and Oja, 1999): choose the scale so that each signal has "unit average energy", $\frac{1}{n}\sum_{t=1}^{n} s_i(t)^2 = 1$ (this preserves energy along the rows of the data matrix). Explains the "same scales" in the cocktail party example.
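A quick numerical check of both ambiguities and of the unit-average-energy convention (the matrices A, P, D below are illustrative):

    import numpy as np

    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    S = np.random.default_rng(0).uniform(-1, 1, size=(2, 1000))
    X = A @ S

    P = np.array([[0, 1], [1, 0]])       # permutation matrix: swaps the 2 rows
    D = np.diag([2.0, -0.5])             # diagonal matrix: rescales / flips signs

    # Same X, so (A P^t, P S) and (A D^{-1}, D S) are equally valid solutions
    print(np.allclose(X, (A @ P.T) @ (P @ S)))               # True
    print(np.allclose(X, (A @ np.linalg.inv(D)) @ (D @ S)))  # True

    # Unit-average-energy convention fixes the scale (but not sign or order)
    S_d = D @ S
    S_d /= np.sqrt((S_d ** 2).mean(axis=1, keepdims=True))
    print((S_d ** 2).mean(axis=1))       # [1. 1.]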
ICA & Non-Gaussianity Explore the main ICA principle: for independent, non-Gaussian, standardized random variables $X_1, \ldots, X_d$, projections farther from the coordinate axes are more Gaussian.
ICA & Non-Gaussianity Explore the main ICA principle, "projections farther from coordinate axes are more Gaussian": for the direction vector $u = (u_1, \ldots, u_d)$ with $k$ of the entries equal to $\pm 1/\sqrt{k}$ and the rest $0$ (thus $\|u\| = 1$), have $u^t X \approx N(0, 1)$ for large $k$ and $d$, by the Central Limit Theorem (the projection is a standardized sum of $k$ independent terms).
ICA & Non-Gaussianity Illustrative examples ($d = 100$, $n = 500$): uniform marginals, exponential marginals, bimodal marginals; study over a range of values of $k$ (a simulation sketch follows below).
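A sketch of this simulation, following the slide's d = 100, n = 500 setup for the uniform-marginals case; the choice of the Shapiro-Wilk test for Gaussianity is mine:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    d, n = 100, 500

    # Independent, standardized, non-Gaussian coordinates (uniform marginals)
    X = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(d, n))

    for k in [1, 2, 4, 6, 25]:
        u = np.zeros(d)                  # direction with k entries +-1/sqrt(k)
        u[:k] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
        proj = u @ X                     # projection of each of the n vectors
        stat, p = stats.shapiro(proj)    # test of Gaussianity
        print(f"k = {k:2d}: Shapiro-Wilk p-value = {p:.4f}")
    # small k: tiny p-values (clearly non-Gaussian);
    # larger k: p-values consistent with Gaussianity, as the CLT takes over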
ICA & Non-Gaussianity [Figure: uniform marginals, projections over a range of $k$]
ICA & Non-Gaussianity Illustrative examples ($d = 100$, $n = 500$), uniform marginals: $k = 1$: very poor fit (the uniform is far from Gaussian); $k = 2$: much closer (the triangular density is closer to Gaussian); $k = 4$: very close, but still a statistically significant difference; $k \ge 6$: all differences could be sampling variation.
ICA & Non-Gaussianity [Figure: exponential marginals, projections over a range of $k$]
ICA & Non-Gaussianity Illustrative examples ($d = 100$, $n = 500$), exponential marginals: still have convergence to Gaussian, but slower ("skewness" has a stronger impact than "kurtosis"); now need $k \ge 25$ to see no significant difference.
ICA & Non-Gaussianity [Figure: bimodal marginals, projections over a range of $k$]
ICA & Non-Gaussianity Summary: for independent, non-Gaussian, standardized random variables $X_1, \ldots, X_d$, projections farther from the coordinate axes are more Gaussian.
ICA & Non-Gaussianity Projections farther from coordinate axes are more Gaussian. Conclusions: expect most projections to be Gaussian; non-Gaussian projections (the ICA target) are special; is a given sample really "random"? (could test?); high-dimensional space is a strange place.
More ICA Wish I had time for: more toy examples; exploring local minimizers & random starts; a deep look at kurtosis (as the "non-linearity"); the connection to Projection Pursuit; ICA to find outliers in HDLSS space.