One Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments. Denis Chigirev, Chris Moore, Greg Stephens & The Princeton EBC Team.

One Size Does Not Fit All: Regressor and Subject Specific Techniques for Predicting Experience in Natural Environments
Denis Chigirev, Chris Moore, Greg Stephens & The Princeton EBC Team

How do we learn in a very high dimensional setting (~35K voxels)?

LINEAR
- Look for linear projection(s): linear regression, ridge regression, linear SVM.
- How to control for complexity? Loss function (quadratic, linear, hinge) and prior (regularization).
- Advantage: pools together many weak signals.

NONLINEAR
- Create a "look-up table": nonlinear kernel methods, kernel ridge regression, RKHS, GP, nonlinear SVM.
- Needs a similarity measure between brain states (i.e. a kernel) and regularization.
- Assumes "clustering" of similar states and regressor continuity along paths of data points; the weights are given by the similarity measure.
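The contrast between the two columns can be made concrete with a small sketch: a linear ridge model versus a kernel ("look-up table") model fit to synthetic data. This is an illustration only, not the EBC pipeline; the sizes, kernel choice, and regularization strengths below are assumptions, and the voxel count is downscaled from ~35K to keep the example light.

```python
# Minimal sketch (not the EBC pipeline) contrasting the two families on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n_trs, n_voxels = 400, 2000                      # assumed sizes, downscaled for the sketch
X = rng.standard_normal((n_trs, n_voxels))       # "brain states", one row per TR
y = X[:, :50].sum(axis=1) + rng.standard_normal(n_trs)   # many weak voxel signals

X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

# LINEAR: a single projection, complexity controlled by an L2 prior (ridge).
linear = Ridge(alpha=1e3).fit(X_tr, y_tr)

# NONLINEAR: a "look-up table" -- predictions are similarity-weighted combinations
# of the training brain states (kernel ridge with an RBF kernel).
nonlinear = KernelRidge(kernel="rbf", gamma=1.0 / n_voxels, alpha=1.0).fit(X_tr, y_tr)

print("linear ridge  r =", np.corrcoef(linear.predict(X_te), y_te)[0, 1])
print("kernel ridge  r =", np.corrcoef(nonlinear.predict(X_te), y_te)[0, 1])
```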

How do we learn in a very high dimensional setting (~35K voxels)?

LOCAL
- Focus on informative areas: choose voxels by correlation thresholding, searchlight.
- Advantage: ignores areas that are mostly noise.
- Assumes that information is localized and that the feature selection method is stable.

GLOBAL
- Look for global modes: whole brain, PCA, Euclidean distance kernel, searchlight kernel without thresholding.
- Advantage: improves stability by pooling over larger areas.
- Disadvantage: correlated noisy areas that do not carry any information may bias the predictor.
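A minimal sketch of the "local" route, under the assumption that correlation thresholding means keeping voxels whose training-run correlation with the regressor exceeds a cutoff; the threshold value, helper name, and data sizes are invented for illustration, not taken from the EBC code.

```python
# Sketch of correlation-thresholding voxel selection followed by ridge (local route),
# contrasted with whole-brain ridge (global route). Illustrative only.
import numpy as np
from sklearn.linear_model import Ridge

def correlation_threshold_mask(X_train, y_train, r_min=0.2):
    """Boolean mask of voxels whose |correlation| with the regressor exceeds r_min."""
    Xc = X_train - X_train.mean(axis=0)
    yc = y_train - y_train.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    )
    return np.abs(r) > r_min

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2000))
y = X[:, :20].sum(axis=1) + rng.standard_normal(300)

mask = correlation_threshold_mask(X, y, r_min=0.2)   # LOCAL: feature selection first
local_model = Ridge(alpha=10.0).fit(X[:, mask], y)
global_model = Ridge(alpha=1e3).fit(X, y)            # GLOBAL: all voxels, heavier prior
print("voxels kept:", int(mask.sum()), "of", X.shape[1])
```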

Different methods emphasize different aspects of the learning problem:
- Local, linear: correlation thresholding & ridge; searchlight & ridge
- Local, nonlinear: searchlight RKHS
- Global, linear: PCA & ridge
- Global, nonlinear: Euclidean RKHS

Ridge Regression using ALL voxels

Regularization allows all ~30K voxels to be used: the centroids are well estimated (a 1st order statistic), but the covariance matrix is a 2nd order statistic and therefore requires regularization.
- Difference of means (centroids):
- Linear regression solution:
- Ridge regression solution:
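The equations that followed these labels were images in the original deck and did not survive the transcript; their standard forms, written here as a reconstruction in generic notation (X the TR-by-voxel data matrix, y the regressor, mu_+ and mu_- the class centroids), are:

```latex
% Standard forms of the three estimators named above (generic notation;
% the slide's own symbols were lost, so this is a reconstruction).
\begin{aligned}
\text{difference of means (centroids):}\quad & w \propto \mu_{+} - \mu_{-}\\
\text{linear regression:}\quad & \hat{w} = (X^{\top}X)^{-1}X^{\top}y\\
\text{ridge regression:}\quad & \hat{w}_{\lambda} = (X^{\top}X + \lambda I)^{-1}X^{\top}y
\end{aligned}
```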

Whole Brain Ridge Regression

Keeping only the large eigenvalues of the covariance matrix (i.e. PCA-type complexity control) is MUCH LESS effective than ridge regularization.
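The summary slide at the end mentions an "SVD trick" for handling the 30k x 30k covariance matrix. One standard way to obtain the whole-brain ridge solution without ever forming that matrix is to solve in the TR-by-TR dual space; the sketch below is a reconstruction of that idea under assumed sizes, not the team's actual implementation.

```python
# Ridge regression computed in the sample (TR) space, so the voxel-by-voxel
# covariance matrix is never formed. Standard dual identity:
#   (X^T X + lam I)^{-1} X^T y  ==  X^T (X X^T + lam I)^{-1} y
import numpy as np

def ridge_dual(X, y, lam):
    """Whole-brain ridge weights via the n x n Gram matrix (n = number of TRs)."""
    n = X.shape[0]
    G = X @ X.T                                   # n x n, e.g. a few hundred TRs
    alpha = np.linalg.solve(G + lam * np.eye(n), y)
    return X.T @ alpha                            # back to voxel space: one weight per voxel

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 30000))             # 200 TRs x 30K voxels (illustrative)
y = X[:, :10].sum(axis=1)
w = ridge_dual(X, y, lam=100.0)
print(w.shape)                                    # (30000,)
```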

Reproducing Kernel Hilbert Space (RKHS) [T. Poggio]

Instead of looking for linear projections (ridge regression, SVM with a linear kernel), use a measure of similarity between brain states to project a new brain state onto the existing ones in feature space. The prediction for a new brain state is a weighted sum over the N training brain states (N = number of TRs); the "support" coefficients are learned by solving a regularized linear system, where the regularization acts in feature space. (This is also known as kernel ridge regression; with a Gaussian kernel one recovers the mean GP solution.) We choose a kernel that is a function of the distance between brain states, and we use the Euclidean distance and the searchlight distance.
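The slide's displayed equations were also lost in the transcript. In standard kernel ridge regression notation (a reconstruction, assuming the Gaussian-of-distance kernel form that the GP remark suggests), with N the number of TRs, lambda the feature-space regularizer, d the chosen brain-state distance, and sigma a kernel width:

```latex
% Kernel ridge regression in standard notation (reconstruction of the slide's
% lost equations; the Gaussian kernel form and sigma are assumptions).
f(x) = \sum_{i=1}^{N} \alpha_i \, K(x, x_i), \qquad
\alpha = (K + \lambda I)^{-1} y, \qquad
K(x, x') = \exp\!\left( -\frac{d(x, x')^{2}}{2\sigma^{2}} \right)
```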

This framework allows similarity measures between brain states to be tested for their usefulness in prediction:

data -> similarity measure ("How similar are the brain states?": Euclidean distance, Mahalanobis, searchlight, earth mover's?) -> learning algorithm (SVM, RKHS, etc.; choice of regularization and loss) -> prediction

This makes it possible to assess independently the quality of the brain-state similarity measure and the quality of the learning procedure. In practice, the Euclidean measure (the default) performs relatively well.
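A hedged sketch of this modularity: the distance is the swappable piece, the learner stays fixed. The helper names, kernel width, and regularization value below are assumptions for illustration, not the EBC team's code.

```python
# Two-stage framework: plug any brain-state distance into a Gaussian kernel,
# then hand the precomputed kernel to the same learner (kernel ridge).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.kernel_ridge import KernelRidge

def gaussian_kernel_from_distance(D, sigma):
    """Turn a pairwise distance matrix into a similarity (kernel) matrix."""
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))

def fit_and_predict(X_train, y_train, X_test, distance="euclidean", sigma=50.0):
    D_train = cdist(X_train, X_train, metric=distance)   # swap in any distance here
    D_test = cdist(X_test, X_train, metric=distance)
    model = KernelRidge(kernel="precomputed", alpha=1.0)
    model.fit(gaussian_kernel_from_distance(D_train, sigma), y_train)
    return model.predict(gaussian_kernel_from_distance(D_test, sigma))

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 2000))
y = X[:, :20].sum(axis=1)
pred = fit_and_predict(X[:250], y[:250], X[250:], distance="euclidean")
print(np.corrcoef(pred, y[250:])[0, 1])
```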

Basics of Searchlight

Which pair of brain states is further apart? The Mahalanobis distance (illustrated on the slide with a "more different" and a "less different" pair) is one answer. Problem: applied to whole-brain states it amplifies poorly estimated dimensions. Solution: apply it locally to each 3x3x3 "supervoxel" and then sum the individual contributions, so that the distance between brain states is computed as a weighted average over supervoxels. We found that this solution is self-regularizing, i.e. one can take the complexity penalty to zero.
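In symbols (a reconstruction; the slide's own equations were lost), with Sigma the voxel covariance and s indexing the 3x3x3 supervoxels:

```latex
% Mahalanobis distance and its searchlight (supervoxel-summed) variant,
% reconstructed in generic notation; the specific weights w_s used on the
% slide are not recoverable from the transcript.
d_{M}(x, y) = \sqrt{(x - y)^{\top}\, \Sigma^{-1} (x - y)}, \qquad
d_{\text{searchlight}}(x, y) = \sum_{s} w_s \, d_{M}^{(s)}\!\big(x^{(s)}, y^{(s)}\big)
```

Here x^(s) denotes the restriction of brain state x to supervoxel s, d_M^(s) uses the locally estimated covariance of that supervoxel, and the weights w_s combine the local contributions into the whole-brain distance.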

Why might searchlight help? (Hint: stability!)

[Figure: four scatter plots of movie 1 (m1) vs. movie 2 (m2): voxel correlation with the feature, thresholded voxel correlation, searchlight correlation with the feature, thresholded searchlight correlation.]

The projection learned by linear ridge is only as good as the stability of the underlying voxel correlations with the regressor. Searchlight distance versus Euclidean distance, tested in RKHS.
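A sketch of the stability check the scatter plots illustrated: compute each feature's correlation with the regressor separately for the two movies and ask how well the two agree, for raw voxels versus spatially pooled features. The 27-voxel block averaging below is only a crude stand-in for true searchlight pooling, and all names and shapes are assumptions for illustration.

```python
# Stability of feature/regressor correlations across two movies,
# for raw voxels vs. a simple 27-voxel pooling stand-in. Illustrative only.
import numpy as np

def feature_regressor_corr(X, y):
    """Correlation of every column (feature) of X with the regressor y."""
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

def stability(X1, y1, X2, y2):
    """Across-movie agreement of the feature/regressor correlation maps."""
    return np.corrcoef(feature_regressor_corr(X1, y1),
                       feature_regressor_corr(X2, y2))[0, 1]

def pool_27(X):
    """Average non-overlapping blocks of 27 voxels (crude searchlight-style pooling)."""
    n = (X.shape[1] // 27) * 27
    return X[:, :n].reshape(X.shape[0], -1, 27).mean(axis=2)

# Usage with real data (hypothetical array names):
#   stability(m1_vox, y1, m2_vox, y2)                       # voxel-level stability
#   stability(pool_27(m1_vox), y1, pool_27(m2_vox), y2)     # pooled-feature stability
```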

Different methods emphasize different aspects of the learning problem:
- Local, linear: correlation thresholding with ridge complexity control (Chigirev et al., PBAIC 2006; implemented as part of a public MVPA Matlab toolbox).
- Local, nonlinear: weighted searchlight RKHS allows zooming in on areas of interest (future work!).
- Global, linear: an SVD trick allows the 30k x 30k covariance matrix to be computed; ridge regularization outperforms PCA as complexity control.
- Global, nonlinear: Euclidean RKHS (kernel ridge) may be slightly improved by using a global searchlight kernel as the similarity measure; it has a remarkable self-regularization property.

I would like to thank my collaborators: Chris Moore*, Greg Stephens, Greg Detre, and Michael Bannert, as well as Ken Norman and Jon Cohen for supporting the Princeton EBC Team.