Presentation transcript:

Spatio-temporal modeling of EEG features for understanding working memory. Jinbo Bi, joint work with Tingyang Xu, Chi-Ming Chen, and Jason Johannesen. University of Connecticut and Yale University.

Outline: the main technical idea; the EEG data analysis problem; the proposed approach (GEE + regularization); our algorithm; preliminary experimental results; summary. Before I get to the details of our problem and the proposed approach, I would like to introduce our main technical idea, which is fairly straightforward. Then we introduce the problem. Our approach is a bit more complicated because it is built on top of generalized estimating equations. We briefly review the algorithm we used to solve this formulation, then show some preliminary results, and then give a summary.

Main idea: variables are observed/measured at different locations and at different time points. [Figure: the features laid out along a spatial line and a temporal line.]

Main idea: if we build a linear model using all the features, the coefficients in the model form another matrix, and we want that matrix to have sparsity patterns. [Figure: the feature matrix X and the coefficient matrix W, each with a spatial and a temporal dimension.]

Main idea: decompose the matrix W into a sum of two matrices of the same dimension, W = U + V, and then impose different sparsity-inducing regularizers on U and V.

Main idea: for instance, the widely used L1,2 matrix norm computes the sum of the L2 norms of the individual row vectors of a matrix, and thereby enforces row sparsity. [Figure: row-sparsity patterns of U and V.]
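To make the regularizer concrete, here is a minimal sketch (not from the slides) of the row-wise group norm and its proximal operator; the row-wise soft-thresholding formula is the standard proximal map of this norm and is the building block the algorithm later in the talk relies on.

```python
import numpy as np

def l12_norm(W):
    """Sum of the L2 norms of the rows of W; penalizing it zeroes out whole rows."""
    return np.linalg.norm(W, axis=1).sum()

def prox_l12(W, t):
    """Proximal operator of t * l12_norm: shrink each row's L2 norm by t,
    zeroing rows whose norm falls below t (row-wise soft-thresholding)."""
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return np.maximum(0.0, 1.0 - t / norms) * W

# Example: rows with small norms are removed entirely.
W = np.array([[0.1, -0.1], [2.0, 1.0], [0.0, 0.05]])
print(prox_l12(W, 0.5))  # first and third rows shrink to exactly zero
```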

EEG data analysis problem: EEG recording provides a powerful method for studying the neural dynamics of human cognition (e.g., working memory). [Figures: an EEG recording, an electrode montage, and an illustration of a BCI program.]

EEG data analysis problem: Sternberg tests. [Figure: a sample trial of the Sternberg experiment depicting the stages of information processing: baseline, encoding, retention, and retrieval.] Time courses are extracted for EEG analysis based on a memory span of 4 letters. The outcome is whether a person responded correctly or incorrectly.

EEG data analysis problem: our data. 37 patients with schizophrenia, 6 healthy controls. Each individual completed 90 Sternberg trials in each of 3 sessions, covering the baseline, encoding, retention, and retrieval stages. The features are amplitudes of EEG in 5 frequency bands (delta, theta, alpha, beta, and gamma) at the Fz, Cz, and Oz electrode sites.

The proposed approach: our method combines generalized estimating equations (GEE) with the proposed regularizer. GEE is a family of methods that extends generalized linear models by estimating both the expectation and the covariance of the outcome. The parameters are the coefficient matrix W and the correlation parameter α.
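For reference, the standard GEE estimating equations (not written out on the slide) take the following form: with outcome vector $y_i$ for subject $i$, mean $\mu_i$ depending on the coefficients, and working correlation matrix $R(\alpha)$,

$$\sum_{i=1}^{n} D_i^\top V_i^{-1} (y_i - \mu_i) = 0, \qquad V_i = A_i^{1/2} R(\alpha)\, A_i^{1/2},$$

where $D_i$ is the Jacobian of $\mu_i$ with respect to the coefficients and $A_i$ is the diagonal matrix of marginal variances.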

The proposed approach: the parameters W and α are estimated by minimizing the so-called deviance function, Deviance(W, α) — the difference between the likelihood of observing the actual y and the likelihood of observing the mean. The deviance function is not explicit for an arbitrary distribution, but its gradient can be computed for the exponential families. We propose to add the decomposition-based regularizer to this objective (see the sketch below).
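The proposed formula on the slide was an image and is lost; the following is a plausible reconstruction under stated assumptions (the λ weights, and which of U and V receives row versus column sparsity, are my assumptions, not confirmed by the source):

$$\min_{U, V, \alpha}\ \mathrm{Deviance}(U + V,\, \alpha) \;+\; \lambda_1 \sum_j \|u_{j\cdot}\|_2 \;+\; \lambda_2 \sum_k \|v_{\cdot k}\|_2,$$

so that the row penalty on U can zero out entire spatial locations while the column penalty on V can zero out entire time points.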

Our FISTA-based algorithm: we solve our problem using FISTA, the fast iterative shrinkage-thresholding algorithm. We alternate between (U, V) and α. FISTA is used to solve for (U, V): it is an accelerated proximal gradient method, and we update U and V alternately using their proximal operators. We use the original GEE updating formula to update α, because when U and V are fixed, the proposed formulation is exactly the same as the GEE formulation with W fixed.
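A minimal sketch of the accelerated proximal-gradient (FISTA) loop for a generic smooth loss plus a row-sparse penalty; the toy least-squares loss and the step-size choice are illustrative assumptions standing in for the deviance gradient and its Lipschitz constant, and `prox_l12` is the function from the earlier sketch.

```python
import numpy as np

def fista(grad_loss, prox, x0, step, n_iter=200):
    """FISTA (Beck & Teboulle, 2009): accelerated proximal gradient descent
    on loss(x) + penalty(x), where prox evaluates the penalty's proximal map."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox(y - step * grad_loss(y), step)    # proximal gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x

# Toy usage: least-squares loss with the row-sparse penalty prox_l12 from above.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((50, 8)), rng.standard_normal((50, 3))
grad = lambda W: X.T @ (X @ W - Y) / len(X)
step = 1.0 / np.linalg.norm(X.T @ X / len(X), 2)       # 1 / Lipschitz constant
W_hat = fista(grad, lambda W, s: prox_l12(W, 0.1 * s), np.zeros((8, 3)), step)
```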

Our FISTA-based algorithm: the algorithm globally converges to an optimal solution of the problem at the O(1/k²) rate typical of accelerated gradient methods. Under some regularity conditions, optimizing the proposed formulation yields an asymptotically consistent and asymptotically normal estimator.
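The O(1/k²) rate is the standard FISTA guarantee (Beck & Teboulle, 2009): for a convex objective F = f + g whose smooth part f has an L-Lipschitz gradient,

$$F(x_k) - F(x^\star) \;\le\; \frac{2 L \,\|x_0 - x^\star\|^2}{(k+1)^2}.$$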

Preliminary experimental results: we tested the algorithm on EEG feature analysis, predicting whether a person answers the Sternberg test correctly (0) or incorrectly (1) based on the EEG features. The data comprise 37 patients with schizophrenia and 6 healthy controls, with separate classifiers trained for the two groups. After data cleaning, each patient has on average 83 trials with an incorrect-answer rate of 27.2%; each healthy control has on average 87 trials with an incorrect-answer rate of 14.7%. We used repeated three-fold cross-validation to tune the regularization parameters λ.
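A sketch of what such a tuning loop could look like; `fit_model` and `predict_scores` are hypothetical stand-ins for the regularized-GEE trainer and scorer, not the authors' actual code.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import roc_auc_score

def tune_lambdas(X, y, fit_model, predict_scores, lam_grid, repeats=5):
    """Repeated 3-fold CV: pick the (lam1, lam2) pair with the best mean AUC."""
    cv = RepeatedKFold(n_splits=3, n_repeats=repeats, random_state=0)
    best, best_auc = None, -np.inf
    for lam1, lam2 in product(lam_grid, lam_grid):
        aucs = []
        for tr, te in cv.split(X):
            model = fit_model(X[tr], y[tr], lam1, lam2)   # hypothetical trainer
            aucs.append(roc_auc_score(y[te], predict_scores(model, X[te])))
        if np.mean(aucs) > best_auc:
            best, best_auc = (lam1, lam2), np.mean(aucs)
    return best, best_auc
```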

Preliminary experimental results: we first compared with the classic GEE method and report the area under the ROC curve (AUC). Our method outperformed GEE consistently under four different working-covariance assumptions.

Preliminary experimental results: we show the features and stages selected by the classifiers. The strongest feature is the engagement of frontal alpha during encoding and retention, which replicates an earlier report that used a separate sample. [Figure, schizophrenia group: rows are features; columns are stages of information processing.]

Preliminary experimental results: we show the features and stages selected by the classifiers. Based on these models, the two groups showed remarkably different patterns. EEG activity in higher frequency bands appears to be associated with incorrect trial responses for the schizophrenia group, but not for the healthy controls. For the healthy controls, engaging low-frequency activity, especially delta, was associated with incorrect responses. The same stages were identified as important for both schizophrenia patients and controls. [Figure, healthy controls: rows are features; columns are stages of information processing.]

Summary. The take-home message is the following: we used a new learning formulation to select EEG features along the temporal and spatial dimensions. This new method also simultaneously models the sample correlation via GEE. A new accelerated gradient descent algorithm can efficiently solve the resulting optimization problem. Preliminary results show that the EEG features selected for SZ and HC are rather different. Future work …

References. Chen et al., GABA level, gamma oscillation, and working memory performance in schizophrenia, NeuroImage: Clinical, 4:531-539, 2014. Beck et al., A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2(1):183-202, 2009. Xu et al., Longitudinal LASSO: jointly learning features and temporal contingency for outcome prediction, to appear in ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2015. http://www.labhealthinfo.uconn.edu/ Thank you!! Here are some references. Thank you for your attention.