Newton Method for the ICA Mixture Model

Newton Method for the ICA Mixture Model. Jason A. Palmer¹, Scott Makeig¹, Ken Kreutz-Delgado², Bhaskar D. Rao². ¹Swartz Center for Computational Neuroscience, ²Dept. of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA

Introduction Want to model sensor array data with multiple independent sources — ICA. Non-stationary source activity — mixture model. Want the adaptation to be computationally efficient — Newton method.

Outline ICA mixture model. Basic Newton method. Positive definiteness of Hessian when model source densities are true source densities. Newton for ICA mixture model. Example applications to analysis of EEG.

ICA Mixture Model—toy example 3 models in two dimensions, 500 points per model. The Newton method converges in fewer than 200 iterations; natural gradient fails to converge and has difficulty on poorly conditioned models.

ICA Mixture Model Want to model observations x(t), t = 1,…,N, with different models “active” at different times. Bayesian linear mixture model, h = 1, . . . , M: Conditionally linear given the model: Samples are modeled as independent in time:
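A minimal sketch of the generative model in LaTeX, using one common notation (the symbols \(\gamma_h\), \(W_h\), \(c_h\), \(p_{hi}\) are assumed here rather than taken from the slide):

\[
p\big(x(t)\big) = \sum_{h=1}^{M} \gamma_h \, p\big(x(t)\mid h\big), \qquad
p\big(x(t)\mid h\big) = \lvert \det W_h \rvert \prod_{i=1}^{n} p_{hi}\big(s_{hi}(t)\big), \qquad
s_h(t) = W_h\big(x(t) - c_h\big),
\]
\[
p\big(x(1),\dots,x(N)\big) = \prod_{t=1}^{N} p\big(x(t)\big).
\]

Here \(\gamma_h\) is the prior probability that model \(h\) is active, \(W_h\) is its unmixing matrix, and \(c_h\) its center; the last line is the independence-in-time assumption.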

Source Density Mixture Model Each source density mixture component has unknown location, scale, and shape: Generalizes Gaussian mixture model, more peaked, heavier tails
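One concrete way to write such a source density, assuming generalized Gaussian components (the names \(\alpha\), \(\mu\), \(\beta\), \(\rho\) for mixture weight, location, scale, and shape are assumptions):

\[
p_{hi}(s) = \sum_{j=1}^{m} \alpha_{hij}\, \frac{1}{\beta_{hij}}\,
q_{\rho_{hij}}\!\left(\frac{s-\mu_{hij}}{\beta_{hij}}\right), \qquad
q_{\rho}(u) = \frac{\rho}{2\,\Gamma(1/\rho)} \exp\!\big(-\lvert u \rvert^{\rho}\big).
\]

With \(\rho = 2\) every component is Gaussian, recovering a Gaussian mixture; \(\rho < 2\) gives the more peaked, heavier-tailed behavior mentioned above.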

ICA Mixture Model—Invariances The complete set of parameters to be estimated is: h = 1, . . ., M, i = 1, . . ., n, j = 1, . . ., m Invariances: W row norm/source density scale and model centers/source density locations:
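The two invariances can be stated concretely in the assumed notation above: for any \(\lambda \neq 0\) and any vector \(d\), the reparameterizations

\[
w_{hi} \to \lambda\, w_{hi},\;\; \mu_{hij} \to \lambda\,\mu_{hij},\;\; \beta_{hij} \to \lambda\,\beta_{hij}
\qquad\text{and}\qquad
c_h \to c_h + W_h^{-1} d,\;\; \mu_{hij} \to \mu_{hij} - d_i
\]

both leave the likelihood unchanged (here \(w_{hi}\) is the \(i\)th row of \(W_h\)), so a normalization of the rows of \(W_h\) and of the source density locations is needed to pin the parameters down.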

Basic ICA Newton Method Transform gradient (1st derivative) of cost function using inverse Hessian (2nd derivative) Cost function is data log likelihood: Gradient: Natural gradient (positive definite transform):
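For a single model (dropping \(h\) and the center for brevity), the quantities referred to here take a standard form; a sketch in an assumed notation:

\[
L(W) = \sum_{t}\Big[\log\lvert\det W\rvert + \sum_{i}\log p_i\big(s_i(t)\big)\Big], \qquad s(t) = W x(t),
\]
\[
\nabla_W L = W^{-\mathsf{T}} - \hat{E}\big[f(s)\,x^{\mathsf{T}}\big], \qquad
f_i(s_i) = -\tfrac{d}{ds_i}\log p_i(s_i),
\]
\[
\Delta W_{\text{nat}} = (\nabla_W L)\, W^{\mathsf{T}} W = \big(I - \hat{E}[f(s)\,s^{\mathsf{T}}]\big)\,W,
\]

where \(\hat{E}\) denotes the sample average over \(t\) and \(f\) is the score function of the model source density.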

Newton Method – Hessian Take derivative of (i,j)th element of gradient with respect to (k,l)th element of W : This defines a linear transform : In matrix form, this is:
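Carrying out this differentiation of the gradient above gives, still in the assumed notation:

\[
\frac{\partial}{\partial W_{kl}}\Big(W^{-\mathsf{T}} - \hat{E}[f(s)\,x^{\mathsf{T}}]\Big)_{ij}
= -\,(W^{-1})_{jk}\,(W^{-1})_{li} \;-\; \hat{E}\big[f_i'(s_i)\,x_l\,x_j\big]\,\delta_{ik},
\]

so that, acting on a perturbation \(\Delta W\), the Hessian transform reads

\[
\mathcal{H}(\Delta W) = -\,W^{-\mathsf{T}}\,\Delta W^{\mathsf{T}}\,W^{-\mathsf{T}}
\;-\; \hat{E}\Big[\operatorname{diag}\big(f'(s)\big)\,\Delta W\, x\, x^{\mathsf{T}}\Big].
\]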

Newton Method – Hessian To invert: rewrite the Hessian transformation in terms of the source estimates: Define , , : Want to solve linear equation :
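One convenient set of definitions that makes the transform solvable in closed form (assumed here, not the slide's own symbols): write the update in relative coordinates, \(\Delta W = D\,W\), and set

\[
\eta_i = \hat{E}\big[f_i'(s_i)\big], \qquad
\sigma_i^2 = \hat{E}\big[s_i^2\big], \qquad
G = I - \hat{E}\big[f(s)\,s^{\mathsf{T}}\big].
\]

Substituting \(x = W^{-1}s\) and cancelling the common factor \(W^{-\mathsf{T}}\), the Newton equation \(\mathcal{H}(\Delta W) = -\nabla_W L\) reduces to the linear system

\[
D^{\mathsf{T}} + \hat{E}\big[\operatorname{diag}\big(f'(s)\big)\,D\,s\,s^{\mathsf{T}}\big] = G.
\]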

Newton Method – Hessian The Hessian transformation can be simplified using source independence and zero mean: This leads to 2x2 block diagonal form:
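Explicitly, with \(\hat{E}[s_i s_j] \approx 0\) for \(i \neq j\) (independence) and \(\hat{E}[s_i] \approx 0\) (zero mean), the linear system above decouples into 2x2 blocks over index pairs \((i,j)\), \(i \neq j\), plus scalar equations on the diagonal (same assumed notation):

\[
\begin{aligned}
\eta_i\,\sigma_j^2\, D_{ij} + D_{ji} &= G_{ij},\\
D_{ij} + \eta_j\,\sigma_i^2\, D_{ji} &= G_{ji},
\end{aligned}
\qquad\qquad
\Big(1 + \hat{E}\big[f_i'(s_i)\,s_i^2\big]\Big)\,D_{ii} = G_{ii}.
\]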

Newton Direction Invert Hessian transformation, evaluate at gradient: Leads to the following equations: Calculate the Newton direction:
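Solving each 2x2 block and the diagonal equation gives the Newton direction in closed form (a sketch, same assumed notation):

\[
D_{ij} = \frac{\eta_j\,\sigma_i^2\,G_{ij} - G_{ji}}{\eta_i\,\eta_j\,\sigma_i^2\,\sigma_j^2 - 1}, \qquad
D_{ji} = \frac{\eta_i\,\sigma_j^2\,G_{ji} - G_{ij}}{\eta_i\,\eta_j\,\sigma_i^2\,\sigma_j^2 - 1}, \qquad
D_{ii} = \frac{G_{ii}}{1 + \hat{E}\big[f_i'(s_i)\,s_i^2\big]},
\]

with the update \(W \leftarrow W + \mu\,D\,W\); \(\mu = 1\) is the pure Newton step referred to later.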

Positive Definiteness of Hessian Conditions for positive definiteness: always true when the model source densities match the true source densities: 1) 2) 3)
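In the assumed notation above, the three conditions have the standard form known from the ICA stability literature (Amari et al.); a sketch:

\[
1)\;\; \eta_i > 0, \qquad
2)\;\; 1 + \hat{E}\big[f_i'(s_i)\,s_i^2\big] > 0, \qquad
3)\;\; \eta_i\,\eta_j\,\sigma_i^2\,\sigma_j^2 > 1 \quad (i \neq j).
\]

When the model score \(f_i\) is the true score, integration by parts gives \(\hat{E}[f_i'(s_i)] = \hat{E}[f_i(s_i)^2]\) and \(\hat{E}[f_i(s_i)\,s_i] = 1\), and the Cauchy-Schwarz inequality then yields all three conditions (strictly, for non-Gaussian sources).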

Newton for ICA Mixture Model Similar derivation applies to ICA mixture model:
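A sketch of how the single-model derivation carries over (notation assumed): all sample averages are replaced by averages weighted with the posterior model responsibilities,

\[
\hat{E}_h[\,\cdot\,] = \frac{\sum_{t} p\big(h \mid x(t)\big)\,(\,\cdot\,)}{\sum_{t} p\big(h \mid x(t)\big)}, \qquad
p\big(h \mid x(t)\big) = \frac{\gamma_h\, p\big(x(t)\mid h\big)}{\sum_{h'} \gamma_{h'}\, p\big(x(t)\mid h'\big)},
\]

and the block-diagonal Newton solve above is applied to each \(W_h\) separately using these weighted statistics, with \(s_h(t) = W_h\big(x(t) - c_h\big)\).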

Convergence Rates Convergence is much faster than natural gradient. Works with step size 1! Needs a correct source density model. [Plots: log likelihood versus iteration for the Newton method and natural gradient.]

Segmentation of EEG experiment trials [Figures: segmentation results for 3 models and for 4 models; model log likelihood per trial over time, and log likelihood versus iteration.]

Applications to EEG—Epilepsy [Figures: log likelihood over time for 1 model and for 5 models; log likelihood difference from the single model over time.]

Conclusion We applied the method of Amari, Cardoso, and Laheld to formulate a Newton method for the ICA mixture model. Arbitrary source densities are modeled with a non-Gaussian source mixture model. Non-stationarity is modeled with the ICA mixture model (multiple mixing matrices learned). It works! The Newton method is substantially faster (superlinear convergence), and it can converge when natural gradient fails.

Code There is Matlab code available! It generates toy mixture model data for testing. The full method is implemented: mixture sources, mixture ICA, Newton. An extended version of the paper is in preparation, with the derivation of the mixture model Newton updates. Download from: http://sccn.ucsd.edu/~jason

Acknowledgements Thanks to Scott Makeig, Howard Poizner, Julie Onton, Ruey-Song Hwang, Rey Ramirez, Diane Whitmer, and Allen Gruber for collecting and consulting on EEG data. Thanks to Jerry Swartz for founding and providing ongoing support to the Swartz Center for Computational Neuroscience. Thanks for your attention!
