Hidden Process Models with Applications to fMRI Data
Rebecca A. Hutchinson
March 24, 2010
Biostatistics and Biomathematics Seminar, Fred Hutchinson Cancer Research Center

2 Introduction
Hidden Process Models (HPMs):
– A probabilistic model for time series data.
– Designed for data generated by a collection of latent processes.
Motivating problem:
– Modeling mental processes (e.g. making a decision) in functional Magnetic Resonance Imaging (fMRI) time series.
Characteristics of potential domains:
– Processes with spatio-temporal signatures.
– Uncertainty about the temporal location of processes.
– High-dimensional, sparse, noisy data.

3 fMRI Data
[Figure: example voxel time series; axes are Signal Amplitude vs. Time (seconds), illustrating the hemodynamic response to neural activity.]
Features: 10,000 voxels, imaged every second.
Training examples: trials (task repetitions).


5 Study: Pictures and Sentences
Task: Decide whether the sentence describes the picture correctly; indicate with a button press.
13 normal participants, 40 trials per participant.
Sentences and pictures describe 3 symbols: *, +, and $, using 'above', 'below', 'not above', 'not below'.
Images are acquired every 0.5 seconds.
[Trial timeline figure: Read Sentence, View Picture, Fixation, Press Button, Rest; marks at t=0, 4 sec., and 8 sec.] (Keller01)

6 Motivation
To track mental processes over time.
– Estimate process hemodynamic responses.
– Estimate process timings.
Allowing processes that do not directly correspond to stimulus timing is a key contribution of HPMs!
To compare hypotheses of cognitive behavior.

7 Related Work
fMRI:
– General Linear Model (Dale99): must assume the timing of process onset to estimate the hemodynamic response.
– Computer models of human cognition (Just99, Anderson04): predict fMRI data rather than learning the parameters of processes from the data.
Machine Learning:
– Classification of windows of fMRI data (overview in Haynes06): does not typically model overlapping hemodynamic responses.
– Dynamic Bayes Networks (Murphy02, Ghahramani97): HPM assumptions/constraints can be encoded by extending factorial HMMs with links between the Markov chains.

8 Outline
Overview of HPMs
– Generative model
– Formalism
– Graphical model
– Algorithms
Synthetic data experiments
– Accurately estimate parameters
– Choose correct model from alternatives with different numbers of processes
Real data experiments
– Evaluation methodology
– Extensions to standard HPMs
– Model comparison via classification accuracy and data log-likelihood
Visualizing HPMs
Conclusions
– Summary of contributions
– Future work

9 Processes of the HPM:
Process 1: ReadSentence. Response signature W (shown for voxels v1, v2); duration d: 11 sec.; offsets Ω: {0,1}; P(Ω): {θ0, θ1}.
Process 2: ViewPicture. Response signature W (shown for voxels v1, v2); duration d: 11 sec.; offsets Ω: {0,1}; P(Ω): {θ0, θ1}.
Input stimulus: sentence at timing landmark λ1, picture at timing landmark λ2.
One configuration c of process instances i1, i2, …, ik. For example, process instance i2 has process π: 2 (ViewPicture), timing landmark: λ2, offset O: 1, and hence start time λ2 + O.
Predicted mean: the sum of the instances' response signatures, each placed at its start time; the observed signal is this mean plus per-voxel noise, N(0, σ1²) in v1 and N(0, σ2²) in v2.
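To make the generative story above concrete, here is a minimal NumPy sketch of one trial under a fixed configuration: each process instance adds its response signature W starting at landmark + offset, and independent per-voxel Gaussian noise goes on top. Names and shapes are illustrative assumptions, not code from the talk.

```python
import numpy as np

def generate_trial(instances, T, V, sigma, rng):
    """instances: list of (W, landmark, offset), with W a (d, V) response
    signature. Returns the (T, V) predicted mean plus N(0, sigma_v^2) noise."""
    mean = np.zeros((T, V))
    for W, landmark, offset in instances:
        start = landmark + offset                 # instance start time
        end = min(start + W.shape[0], T)
        mean[start:end] += W[: end - start]       # overlapping responses add
    return mean + rng.normal(0.0, sigma, size=(T, V))

rng = np.random.default_rng(0)
W_read, W_view = rng.normal(size=(11, 2)), rng.normal(size=(11, 2))
Y = generate_trial([(W_read, 1, 0), (W_view, 17, 1)], T=32, V=2,
                   sigma=np.array([1.0, 2.0]), rng=rng)
```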

10 HPM Formalism
HPM = ⟨Π, C, Σ⟩
Π = {π1, …, πK}, a set of processes (e.g. ReadSentence)
π = ⟨W, d, Ω, Θ⟩, a process
  W = response signature
  d = process duration
  Ω = allowable offsets
  Θ = multinomial parameters over values in Ω
C = {c1, …, cM}, a set of possible configurations
c = ⟨i1, …, iI⟩, a set of process instances
i = ⟨π, λ, O⟩, a process instance (e.g. ReadSentence(S1))
  π = process ID
  λ = timing landmark (e.g. stimulus presentation of S1)
  O = offset (takes values in Ω_π)
C = a latent variable indicating the correct configuration
Σ = ⟨σ1, …, σV⟩, standard deviation for each voxel
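A direct, hypothetical transcription of this formalism into code might look like the following sketch; the field names simply mirror the slide's symbols.

```python
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class Process:                 # pi = <W, d, Omega, Theta>
    W: np.ndarray              # (d, V) response signature
    d: int                     # process duration (in images)
    omega: List[int]           # allowable offsets (Omega)
    theta: List[float]         # multinomial over offsets (Theta)

@dataclass
class ProcessInstance:         # i = <pi, lambda, O>
    pi: int                    # process ID
    landmark: int              # timing landmark (e.g., stimulus onset)
    offset: int                # start time = landmark + offset

Configuration = Tuple[ProcessInstance, ...]   # c = <i1, ..., iI>
```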

11 HPMs: the graphical model
[Graphical model figure: for each process instance i1, …, iI, the latent Process Type π and Offset o, together with the observed Timing Landmark λ, determine the Start Time s; the instances generate the observed data Y(t,v), t=[1,T], v=[1,V]. Shaded nodes are observed; unshaded nodes are unobserved.]
The set C of configurations constrains the joint distribution on {π(k), o(k)} ∀k.

12 Encoding Experiment Design
Processes: ReadSentence = 1, ViewPicture = 2, Decide = 3.
Input stimulus: timing landmarks λ1 (sentence) and λ2 (picture).
Constraints encoded:
π(i1) = {1,2}
π(i2) = {1,2}
π(i1) ≠ π(i2)
o(i1) = 0
o(i2) = 0
π(i3) = 3
o(i3) = {1,2}
[Figure: Configurations 1–4, the four combinations of assigning ReadSentence/ViewPicture to the two landmarks and choosing the Decide offset.]
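As a sanity check, these constraints pin down exactly four configurations; a small illustrative enumeration (the constant names are assumptions):

```python
READ_SENTENCE, VIEW_PICTURE, DECIDE = 1, 2, 3

# i1 and i2 take the two stimulus processes in either order, both at offset 0;
# i3 is Decide at offset 1 or 2: exactly the four configurations above.
configurations = [
    ((p1, 0), (p2, 0), (DECIDE, o3))   # one (process ID, offset) per instance
    for (p1, p2) in [(READ_SENTENCE, VIEW_PICTURE),
                     (VIEW_PICTURE, READ_SENTENCE)]
    for o3 in (1, 2)
]
assert len(configurations) == 4
```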

13 Inference
Over C, the latent indicator of the correct configuration.
Choose the most likely configuration Cn for each trial (n = [1,N]):
  ĉn = argmax_c P(Cn = c | Yn) = argmax_c P(Yn | Cn = c) P(Cn = c)
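A minimal sketch of this step, assuming a helper `predict_mean(c)` that returns the (T, V) mean implied by configuration c; all names here are illustrative.

```python
import numpy as np

def most_likely_configuration(Y, configs, prior, predict_mean, sigma):
    """argmax_c P(C=c) * prod_{t,v} N(Y[t,v]; mean_c[t,v], sigma_v^2),
    computed in log space for numerical stability."""
    scores = []
    for c, p in zip(configs, prior):
        resid = Y - predict_mean(c)
        loglik = -0.5 * np.sum((resid / sigma) ** 2
                               + np.log(2 * np.pi * sigma ** 2))
        scores.append(np.log(p) + loglik)
    return configs[int(np.argmax(scores))]
```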

14 Learning
Parameters to learn:
– Response signature W for each process
– Timing distribution Θ for each process
– Standard deviation σ for each voxel
Expectation-Maximization (EM) algorithm:
– E step: estimate the probability distribution over C.
– M step: update estimates of W (using reweighted least squares), Θ, and σ (using standard MLEs) based on the E step.
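One EM iteration might look like the following simplified sketch. It assumes each configuration c has been pre-compiled into a binary design matrix design[c] that places the stacked response-signature rows in time (an assumption made for illustration; the talk does not specify this representation), and it elides the update for the timing multinomials.

```python
import numpy as np

def em_iteration(trials, design, prior, W, sigma):
    """trials: list of (T, V) arrays. design[c]: (T, D) 0/1 matrix placing
    the D stacked response-signature rows in time under configuration c.
    W: (D, V) stacked response signatures. sigma: (V,) per-voxel std."""
    N, C = len(trials), len(design)
    # E step: r[n, c] proportional to prior[c] * N(Y_n; X_c W, sigma^2)
    # (terms constant across c are dropped before normalizing).
    log_r = np.empty((N, C))
    for n, Y in enumerate(trials):
        for c in range(C):
            resid = Y - design[c] @ W
            log_r[n, c] = np.log(prior[c]) - 0.5 * np.sum((resid / sigma) ** 2)
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M step: responsibility-weighted least squares for the signatures W ...
    XtX = sum(r[n, c] * design[c].T @ design[c]
              for n in range(N) for c in range(C))
    XtY = sum(r[n, c] * design[c].T @ trials[n]
              for n in range(N) for c in range(C))
    W = np.linalg.solve(XtX, XtY)
    # ... and a standard MLE for the per-voxel noise standard deviation.
    ss = sum(r[n, c] * (trials[n] - design[c] @ W) ** 2
             for n in range(N) for c in range(C))
    sigma = np.sqrt(ss.sum(axis=0) / (N * trials[0].shape[0]))
    return W, sigma, r
```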

15 Synthetic Data
[Figure: approximate timings for ReadSentence, ViewPicture, and Decide.]

16 Synthetic Experiments
MSE of responses (averaged over all voxels, processes, timepoints): …
MSE of timing parameters (averaged over all processes, offsets): ~0.01.
Estimated standard deviations: … and … (true value is 2.5).

17 Model Selection
[Table: held-out log-likelihoods for candidate models; all numbers ×10^5.]

18 Synthetic Data Results
Tracking processes: HPMs can recover the parameters of the model used to generate the data.
Comparing models: HPMs can use held-out data log-likelihood to identify the model with the correct number of latent processes.

19 Evaluating HPMs on real data
No ground truth for the problems HPMs were designed for.
Can use data log-likelihood to compare models.
– Baseline: average of all training trials.
Can do classification of known entities (like the stimuli), but HPMs are not optimized for this.
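The baseline comparison can be made concrete with a short sketch (helper names are assumptions): held-out log-likelihood under the model minus the log-likelihood of predicting the average training trial for every test trial.

```python
import numpy as np

def gaussian_loglik(Y, mu, sigma):
    """Independent N(mu, sigma_v^2) log-likelihood over a (T, V) trial."""
    return -0.5 * np.sum(((Y - mu) / sigma) ** 2
                         + np.log(2 * np.pi * sigma ** 2))

def improvement_over_baseline(test_trials, train_trials, model_loglik, sigma):
    baseline_mu = np.mean(train_trials, axis=0)     # average training trial
    baseline = sum(gaussian_loglik(Y, baseline_mu, sigma) for Y in test_trials)
    model = sum(model_loglik(Y) for Y in test_trials)
    return model - baseline
```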

20 Models
HPM-GNB: ReadSentence and ViewPicture, duration = 8 sec. (no overlap)
– an approximation of a Gaussian Naïve Bayes classifier, with HPM assumptions and noise model
HPM-2: ReadSentence and ViewPicture, duration = 12 sec. (temporal overlap)
HPM-3: HPM-2 + Decide (offsets = [0,7] images following the second stimulus)
HPM-4: HPM-3 + PressButton (offsets = {−1,0} images following the button press)

21 Configurations for HPM-3
[Figure: the allowed configurations of process instances for HPM-3.]

22 Held-out log-likelihood (improvement over baseline)
1000 most active voxels per participant; 5-fold cross-validation per participant; mean over 13 participants.
Standard HPMs:
HPM-GNB  −293
HPM-2    …
HPM-3    …
HPM-4    …

23 Extension 1: Regularization
Subtract a term from the objective function penalizing deviations from:
– Temporal smoothness
– Spatial smoothness (based on adjacency matrix A)
– Other possibilities: spatial sparsity and spatial priors.
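One plausible form for such a penalty, as a sketch under assumptions (the slide does not give the exact functional form or the weights lam_t and lam_s):

```python
import numpy as np

def smoothness_penalty(W, A, lam_t, lam_s):
    """W: (d, V) response signature for one process. A: (V, V) 0/1 voxel
    adjacency matrix. Penalizes squared differences between consecutive
    timepoints (temporal) and between adjacent voxels (spatial)."""
    temporal = np.sum(np.diff(W, axis=0) ** 2)
    i, j = np.nonzero(np.triu(A, k=1))          # each voxel pair counted once
    spatial = np.sum((W[:, i] - W[:, j]) ** 2)
    return lam_t * temporal + lam_s * spatial
```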

24 Extension 2: Basis Functions
Re-parameterize process response signatures in terms of a basis set.

25 The basis set
Generated as in Hossein-Zadeh03:
– Create Q (10,000 x 24): 10,000 realizations of 24 timepoints of h(t), varying a in [0.05, 0.21] and b in [3, 7].
– Basis set = first 3 principal components of Q'Q.
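The recipe can be sketched as follows. The exact parameterization of h(t) is not given here, so the gamma-like shape below is an illustrative stand-in; only the procedure (10,000 realizations, 24 timepoints, first 3 principal components) follows the slide.

```python
import numpy as np

def hrf(t, a, b):
    # Illustrative stand-in for h(t); the exact parameterization used in
    # Hossein-Zadeh03 is not reproduced here.
    return t ** (30 * a) * np.exp(-t / b)

t = np.arange(24.0)                                   # 24 timepoints
Q = np.array([hrf(t, a, b)
              for a in np.linspace(0.05, 0.21, 100)   # 100 x 100 = 10,000
              for b in np.linspace(3.0, 7.0, 100)])   # realizations of h(t)

# First 3 principal components of Q'Q = top right singular vectors of Q.
_, _, Vt = np.linalg.svd(Q, full_matrices=False)
basis = Vt[:3].T                                      # (24, 3) basis set

# A response signature is then W[:, v] = basis @ coef[:, v] per voxel, so
# learning estimates 3 coefficients per voxel instead of 24 free values.
```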

26 Held-out log-likelihood (improvement over baseline)
1000 most active voxels per participant; 5-fold cross-validation per participant; mean over 13 participants.
           Standard  Regularized  Basis functions
HPM-GNB    …         …            …
HPM-2      …         …            …
HPM-3      …         …            …
HPM-4      …         …            …

27 Classification Accuracies
ReadSentence vs. ViewPicture for the first 2 processes.
1000 most active voxels per participant; 5-fold cross-validation per participant; mean over 13 participants.
           Standard  Regularized  Basis functions
HPM-GNB    …         …            …
HPM-2      …         …            …
HPM-3      …         …            …
HPM-4      …         …            …
GNB = 93.1

28 Interpretation and Visualization
Focus in on HPM-3 for a single participant, trained on all trials, all voxels.
Timing for the third (Decide) process in HPM-3 (values have been rounded):
Offset:  …
Stand    …
Reg      …
Basis    …

29 Standard
[Figure: visualization for the Standard HPM.]

30 Regularized
[Figure: visualization for the Regularized HPM.]

31 Basis functions
[Figure: visualization for the Basis function HPM.]

32 Time courses
[Figures: estimated time courses for the Standard, Regularized, and Basis function HPMs.]

33 Standard HPM: full brain, trained on all trials
[Figure: Trial 1 observed vs. Trial 1 predicted, RDLPFC, Z-slice 5.]

34 Standard HPM
[Figure: learned responses for ViewPicture, ReadSentence, and Decide.]

35 Regularized HPM: full brain, trained on all trials
[Figure: Trial 1 observed vs. Trial 1 predicted, RDLPFC, Z-slice 5.]

36 Regularized HPM
[Figure: learned responses for ViewPicture, ReadSentence, and Decide.]

37 Basis function HPM: full brain, trained on all trials
[Figure: Trial 1 observed vs. Trial 1 predicted, RDLPFC, Z-slice 5.]

38 Basis Function HPM
[Figure: learned responses for ViewPicture, ReadSentence, and Decide.]

39 Caveats
While visualizing these parameters can help us understand the model, it is important to remember that they are specific to the design choices of the particular HPM.
These are parameters, not the results of statistical significance tests.

40 Summary of Results
Synthetic data results:
– HPMs can recover the parameters of the model used to generate the data in an ideal situation.
– We can use held-out data log-likelihood to identify the model with the correct number of latent processes.
Real data results:
– Standard HPMs can overfit on real fMRI data.
– Regularization and HPMs parameterized with basis functions consistently outperform the baseline in terms of held-out data log-likelihood.
– Example comparison of 4 models.

41 Contributions
To our knowledge, HPMs are the first probabilistic model for fMRI data that can estimate the hemodynamic response for overlapping mental processes with unknown onset while simultaneously estimating a distribution over the timing of the processes.
Estimates for Decide!

42 Future Directions
– Combine regularization and basis functions.
– Develop a better noise model.
– Relax the linearity assumption.
– Automatically discover the number of latent processes.
– Learn process durations.
– Continuous offsets.
– Leverage DBN algorithms.

43 References
John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, 2004.
Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109–114, 1999.
Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–275, 1997.
John-Dylan Haynes and Geraint Rees. Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7:523–534, July 2006.
Gholam-Ali Hossein-Zadeh, Babak A. Ardekani, and Hamid Soltanian-Zadeh. A signal subspace approach for modeling the hemodynamic response function in fMRI. Magnetic Resonance Imaging, 21:835–843, 2003.
Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128–136, 1999. modeling4CAPS.htm.
Tim A. Keller, Marcel Adam Just, and V. Andrew Stenger. Reading span and the timecourse of cortical activation in sentence-picture verification. In Annual Convention of the Psychonomic Society, 2001.
Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan, November 2002.

44 Thank you!

45 Questions?

46 (end of talk)

47 [Backup figure: timing landmarks and process instances. ReadSentence Landmarks feed the ReadSentence Process (π=1, Ω={0}); ViewPicture Landmarks feed the ViewPicture Process (π=2, Ω={0}); Decide Landmarks feed the Decide Process (π=3, Ω={0,1,2}), with instances I3(1), I3(2), I3(3) and start times S3(1), S3(2), S3(3) generating the fMRI data Y3.]

48 [Backup figure: Decide Landmarks feed the Decide Process (π=3, Ω={0,1,2}) and PressButton Landmarks feed the PressButton Process (π=4, Ω={0,1,2}), generating the fMRI data.]