Modeling and Estimation of Dependent Subspaces
J. A. Palmer¹, K. Kreutz-Delgado², B. D. Rao², Scott Makeig¹
¹Swartz Center for Computational Neuroscience, ²Department of Electrical and Computer Engineering, University of California San Diego
September 11, 2007

Outline
Previous work on adaptive source densities
Types of dependency
– Variance dependency
– Skew dependency
– Non-radially symmetric dependency
Normal Variance-Mean Mixtures
Examples from EEG

Independent Source Densities
A general classification of sources: sub- and super-Gaussian.
Super-Gaussian = more peaked than Gaussian, with heavier tails.
Sub-Gaussian = flatter, more uniform, with shorter tails than Gaussian.
[Figure: example super-Gaussian, Gaussian, and sub-Gaussian densities.]
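As a concrete illustration (not from the slides), the usual working criterion for this classification is the sign of the excess kurtosis; a minimal sketch using NumPy/SciPy, with the threshold `tol` an arbitrary choice:

```python
import numpy as np
from scipy.stats import kurtosis

def classify_source(s, tol=0.1):
    """Label a 1-D source as sub- or super-Gaussian by its excess kurtosis.

    Excess kurtosis > 0 suggests a peaked, heavy-tailed (super-Gaussian) density;
    < 0 suggests a flat, light-tailed (sub-Gaussian) density.
    """
    k = kurtosis(s, fisher=True)  # Gaussian -> 0
    if k > tol:
        return "super-Gaussian", k
    if k < -tol:
        return "sub-Gaussian", k
    return "approximately Gaussian", k

rng = np.random.default_rng(0)
print(classify_source(rng.laplace(size=100_000)))    # heavy-tailed -> super-Gaussian
print(classify_source(rng.uniform(-1, 1, 100_000)))  # flat -> sub-Gaussian
```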

Extended Infomax
The (independent) source models used in the Extended Infomax algorithm (Lee) are:
Super-Gaussian: logistic density
Sub-Gaussian: Gaussian mixture
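For context, a minimal sketch of the commonly cited extended Infomax natural-gradient update (Lee, Girolami & Sejnowski); the switching statistic, learning rate, and batch handling here are illustrative assumptions, not transcribed from the slides:

```python
import numpy as np

def extended_infomax_step(W, X, lr=1e-3):
    """One natural-gradient step of the (commonly cited) extended Infomax rule.

    X : (n_sources, n_samples) whitened data batch
    W : (n_sources, n_sources) current unmixing matrix
    The sign k_i switches each source between a super-Gaussian (+1)
    and sub-Gaussian (-1) nonlinearity.
    """
    n, T = X.shape
    U = W @ X  # current source estimates
    # switching statistic: positive for super-Gaussian, negative for sub-Gaussian
    k = np.sign(np.mean(1.0 / np.cosh(U)**2, axis=1) * np.mean(U**2, axis=1)
                - np.mean(U * np.tanh(U), axis=1))
    K = np.diag(k)
    grad = (np.eye(n) - (K @ np.tanh(U) @ U.T + U @ U.T) / T) @ W
    return W + lr * grad
```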

Scale Mixture Representation
Gaussian Scale Mixtures (GSMs) are mixtures of zero-mean Gaussian densities with different variances.
A random variable with a GSM density can be represented as the product of a standard Normal random variable Z and the square root of an arbitrary non-negative random variable ξ: X = ξ^{1/2} Z.
[Figure: individual Gaussians and the resulting Gaussian scale mixture.]
Sums of a random number of random variables lead to a GSM (Rényi).
Multivariate densities can be modeled by the product of a non-negative scalar and a Gaussian random vector.
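A small sketch (assuming NumPy; the exponential mixing density is chosen here only because it gives a closed-form check): drawing X = ξ^{1/2} Z with ξ exponentially distributed produces exactly Laplace-distributed samples, a classic GSM.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# GSM sampling: X = sqrt(xi) * Z, with Z standard normal and xi >= 0.
# With xi ~ Exponential(mean=2), X is exactly Laplace(scale=1).
xi = rng.exponential(scale=2.0, size=n)   # non-negative mixing variable
z = rng.standard_normal(n)
x = np.sqrt(xi) * z

lap = rng.laplace(scale=1.0, size=n)
print("GSM sample excess kurtosis     :", np.mean(x**4) / np.mean(x**2)**2 - 3)
print("Laplace sample excess kurtosis :", np.mean(lap**4) / np.mean(lap**2)**2 - 3)  # ~3
```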

Super-Gaussian Mixture Model
Generalization of the Gaussian mixture model to super-Gaussian mixtures.
The update rules are similar to those of the Gaussian mixture model, but include the variational parameters.
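The slide's update equations are not reproduced in this transcript; as a hedged stand-in, the sketch below runs EM for a mixture of Student-t components (each a Gaussian scale mixture, hence super-Gaussian), where the posterior scale weights E[τ | x] play the role of the extra variational parameters mentioned above. The fixed degrees of freedom, initialization, and iteration count are assumptions.

```python
import numpy as np
from scipy.stats import t as student_t

def t_mixture_em(x, K=2, nu=4.0, n_iter=50, seed=0):
    """EM for a 1-D mixture of Student-t components (super-Gaussian GSMs)."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = rng.choice(x, K)
    sig = np.full(K, x.std())

    for _ in range(n_iter):
        # E-step: component responsibilities r, shape (n, K)
        logp = np.stack([np.log(pi[k]) + student_t.logpdf(x, nu, mu[k], sig[k])
                         for k in range(K)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)

        # E-step: posterior scale weights E[tau | x, k] -- the extra parameters
        d2 = (x[:, None] - mu[None, :])**2 / sig[None, :]**2
        w = (nu + 1.0) / (nu + d2)

        # M-step: weighted updates, as in a GMM but with the extra weights w
        rw = r * w
        pi = r.mean(axis=0)
        mu = (rw * x[:, None]).sum(axis=0) / rw.sum(axis=0)
        sig = np.sqrt((rw * (x[:, None] - mu[None, :])**2).sum(axis=0) / r.sum(axis=0))
    return pi, mu, sig
```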

Gaussian Scale Mixture Examples 1
Generalized Gaussian, 0 < p < 2:
The mixing density is related to the positive alpha-stable density S_{p/2}.
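As a worked special case (not on the slide): for p = 1 the generalized Gaussian reduces to the Laplacian, whose scale-mixture representation has a simple closed form with an exponential mixing density.

```latex
% Laplacian (generalized Gaussian with p = 1) as a Gaussian scale mixture:
% the mixing density over the variance \xi is exponential with mean 2.
\frac{1}{2} e^{-|x|}
  \;=\; \int_0^{\infty} \mathcal{N}\!\left(x;\, 0,\, \xi\right)\,
        \tfrac{1}{2} e^{-\xi/2}\, d\xi .
```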

Gaussian Scale Mixture Examples 2
Generalized Logistic, with shape parameter > 0:
The mixing density is the Generalized Kolmogorov density.

Gaussian Scale Mixture Examples 3
Generalized Hyperbolic: the mixing density is the Generalized Inverse Gaussian.
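A hedged reconstruction of the standard representation (parameterization conventions vary; the (λ, δ, γ) form below is an assumption rather than a transcription of the slide): the symmetric generalized hyperbolic density arises by mixing a zero-mean Gaussian over a GIG-distributed variance.

```latex
% Generalized Inverse Gaussian mixing density (one common parameterization):
p(\xi) \;=\; \frac{(\gamma/\delta)^{\lambda}}{2 K_{\lambda}(\delta\gamma)}\,
             \xi^{\lambda-1}
             \exp\!\left[-\tfrac{1}{2}\left(\delta^{2}\xi^{-1} + \gamma^{2}\xi\right)\right],
             \qquad \xi > 0,
\qquad
p(x) \;=\; \int_0^{\infty} \mathcal{N}(x;\,0,\,\xi)\, p(\xi)\, d\xi .
```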

Dependent Subspaces
Dependent sources are modeled by a Gaussian scale mixture, i.e. a Gaussian vector with a common scalar multiplier, yielding "variance dependence":
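A minimal sketch (assuming NumPy) of what such variance dependence looks like: the components of x = ξ^{1/2} z are uncorrelated, but their squared amplitudes are strongly correlated because they share the scalar ξ.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100_000, 3

# Common scalar variance xi shared by all d components of the subspace.
xi = rng.exponential(scale=1.0, size=(n, 1))
z = rng.standard_normal((n, d))
x = np.sqrt(xi) * z   # dependent subspace: components uncorrelated, powers co-modulated

print("correlation of x1, x2      :", np.corrcoef(x[:, 0], x[:, 1])[0, 1])        # ~0
print("correlation of x1^2, x2^2  :", np.corrcoef(x[:, 0]**2, x[:, 1]**2)[0, 1])  # > 0
```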

Dependent Multivariate Densities
Multiply a Gaussian vector by a common scalar:
Taking derivatives of both sides:
For a GSM evaluated at this argument:
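A hedged reconstruction of the construction being described (the symbols ξ and z are carried over from the scale-mixture slide above; the slide's own equations are not in the transcript): multiplying a standard Gaussian vector by the square root of a shared scalar gives a multivariate GSM whose density depends on x only through its norm.

```latex
% x = \xi^{1/2} z with z \sim \mathcal{N}(0, I_d) and \xi \sim f(\xi) independent:
p(\mathbf{x})
  \;=\; \int_0^{\infty} (2\pi\xi)^{-d/2}
        \exp\!\left(-\frac{\lVert \mathbf{x}\rVert^{2}}{2\xi}\right) f(\xi)\, d\xi
% -- a radially symmetric density: the components of x are uncorrelated
%    but not independent, since they share the scale \xi.
```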

Dependent Multivariate Densities
Define the linear operator V:
Then we have:
Thus, given a univariate GSM, we can form a multivariate GSM:
Posterior moments can be calculated for EM:
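One identity of this kind, stated here as a hedged sketch (it follows directly from the scale-mixture form above and is not transcribed from the slide): the posterior expectation of the inverse scale, which is what the EM updates need, can be read off from the density and its derivative.

```latex
% For a univariate GSM  p(x) = \int_0^\infty \mathcal{N}(x; 0, \xi)\, f(\xi)\, d\xi :
\mathbb{E}\!\left[\xi^{-1} \mid x\right]
  \;=\; \frac{\int_0^{\infty} \xi^{-1}\, \mathcal{N}(x;0,\xi)\, f(\xi)\, d\xi}{p(x)}
  \;=\; -\,\frac{p'(x)}{x\, p(x)} .
```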

Examples in R^3
Given a univariate GSM p(x), a dependent multivariate density in R^3 is given by:
Example: Generalized Gaussian:
Example: Generalized Logistic:

Non-radial Symmetry Use Generalized Gaussian vectors to model non-radially symmetric dependence:

Generalized Hyperbolic
The Generalized Hyperbolic density (Barndorff-Nielsen, 1982) is a GSM:
The posterior is Generalized Inverse Gaussian:
For a Generalized Gaussian scale mixture,

Hypergeneralized Hyperbolic
The posterior moment for EM is given by:
This yields the "Hypergeneralized Hyperbolic" density.
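For reference, a hedged sketch of the kind of moment involved (assuming the posterior over the scale is GIG with parameters (λ, a, b) in the ξ^{λ-1} exp[−(aξ^{-1} + bξ)/2] parameterization; the slide's exact expression is not in the transcript):

```latex
% If  \xi \mid x \sim \mathrm{GIG}(\lambda, a, b),  i.e.
% p(\xi \mid x) \propto \xi^{\lambda-1} \exp\!\left[-\tfrac{1}{2}(a\xi^{-1} + b\xi)\right],  then
\mathbb{E}\!\left[\xi^{-1} \mid x\right]
  \;=\; \sqrt{\tfrac{b}{a}}\;
        \frac{K_{\lambda-1}\!\left(\sqrt{ab}\right)}{K_{\lambda}\!\left(\sqrt{ab}\right)},
% where K_\lambda is the modified Bessel function of the second kind.
```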

Generalized Gaussian Scale Mixtures
More generally, evaluating a multivariate GSM at x^{p/2}:
Integrating this over R^d, we get:
Thus, given a multivariate GSM, we can formulate a multivariate GGSM:

Skew Dependence Skew is modeled with “location-scale mixtures”:

Skew Models
Now, for any vector, we have:
This can be written in the form:
For a multivariate GSM:
This is equivalent to the following model:
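A hedged sketch of the standard normal variance-mean mixture (location-scale mixture) form referred to above; the symbols μ and β below follow the usual conventions and are not taken from the slide.

```latex
% Normal variance-mean mixture: the shared scale \xi shifts the mean along \beta
% as well as scaling the covariance, which produces skew.
\mathbf{x} \;=\; \boldsymbol{\mu} \;+\; \xi\,\boldsymbol{\beta}
            \;+\; \xi^{1/2}\,\mathbf{z},
\qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\quad \xi \sim f(\xi),\ \xi \ge 0,
% so that  x \mid \xi \sim \mathcal{N}(\mu + \xi\beta,\; \xi\,\Sigma).
```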

EEG source types: brain sources, ocular sources, scalp muscle sources, external EM sources, heartbeat.

Pairwise Mutual Information

Maximize Block Diagonality

Variance Dependency
Variance dependence can be estimated directly using 4th-order cross moments: find the covariance of source power.
This finds components whose activations are "active" at the same times, i.e. "co-modulated".
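A minimal sketch (assuming NumPy, with sources stored as rows) of this 4th-order statistic: the covariance of instantaneous source power, normalized here to a correlation for readability.

```python
import numpy as np

def power_correlation(S):
    """Correlation of source power across components.

    S : (n_components, n_samples) array of source activations.
    Returns the (n_components, n_components) correlation matrix of S**2,
    a 4th-order cross-moment statistic: large off-diagonal entries flag
    components whose activations are co-modulated (variance dependent).
    """
    P = S**2  # instantaneous power of each component
    return np.corrcoef(P)

# Example: a variance-dependent pair vs. an independent component.
rng = np.random.default_rng(3)
xi = rng.exponential(size=50_000)
S = np.vstack([np.sqrt(xi) * rng.standard_normal(50_000),  # shares xi
               np.sqrt(xi) * rng.standard_normal(50_000),  # shares xi
               rng.laplace(size=50_000)])                   # independent
print(np.round(power_correlation(S), 2))
```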

Mutual Information / Power Covariance
Most of the dependence in mutual information is captured by the covariance of power (summed over 50 lags).
Some pairs of sources are more sensitive to variance dependence than others.

Variance Dependent Sources

Marginal histograms are "sparse"; however, the joint density is approximately "radially symmetric".
Radially symmetric non-Gaussian densities are dependent.

Conclusion
We described a general framework for modeling dependent sources.
Estimation of the model parameters is carried out using the EM algorithm.
The models include variance dependence, non-radially symmetric dependence, and skew dependence.
Application to the analysis of EEG sources.