
Hidden Process Models for Analyzing fMRI Data
Rebecca Hutchinson
Joint work with Tom Mitchell
May 11, 2007, Student Seminar Series
In partial fulfillment of the Speaking Requirement
Carnegie Mellon University, Computer Science Department

2 Introduction
Hidden Process Models (HPMs):
– A new probabilistic model for time series data.
– Designed for data generated by a collection of latent processes.
Potential domains:
– Biological processes (e.g. synthesizing a protein) in gene expression time series.
– Human processes (e.g. walking through a room) in distributed sensor network time series.
– Cognitive processes (e.g. making a decision) in functional Magnetic Resonance Imaging (fMRI) time series.

3 [Figure: Process 1 and Process P, each a set of time courses over voxels d1 … dN, above a window of observed data over d1 … dN.]
Prior knowledge:
– An instance of Process 1 begins in this window.
– An instance of Process P begins in this window.
– An instance of either Process 1 OR Process P begins in this window.
– There are a total of 6 processes in this window of data.

4 [Figure: the same processes, now with Process 1 and Process P timings marked in the data window.]
More questions:
– Can we learn the parameters of these processes from the data (even when we don’t know when they occur)?
– Would a different set of processes model the data better?

5 Simple Case: Known Timing
If we know which processes occur when, we can estimate their shapes with the general linear model. The timings generate a convolution matrix X.
[Figure: X is a T × PD matrix; each row t = 1, 2, 3, 4, … marks which timepoints of the instances of p1, p2, p3 are active.]

6 Simple Case: Known Timing
[Figure: Y = XW — the T × D data matrix Y (D voxels) equals the convolution matrix X times the stacked response signatures W(1), W(2), W(3) of processes p1, p2, p3.]
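The known-timing estimate is an ordinary least-squares fit. Below is a minimal numpy sketch of this slide; the shapes, onsets, and random stand-in data are illustrative assumptions, not values from the talk, but the construction of the convolution matrix and the solve for W follow the slide.

```python
import numpy as np

T, V, D = 40, 3, 11                 # timepoints, voxels, response duration
P = 3                               # number of process types
onsets = [(0, 2), (1, 5), (2, 9)]   # (process id, start time) per instance

# Convolution matrix X (T x PD): row t marks which timepoint of which
# process response is active at time t.
X = np.zeros((T, P * D))
for p, t0 in onsets:
    for i in range(D):
        if t0 + i < T:
            X[t0 + i, p * D + i] = 1.0

Y = np.random.randn(T, V)           # stand-in for the observed fMRI data

# General linear model: least-squares estimate of the stacked
# response signatures, Y ~ X W.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
W_per_process = W.reshape(P, D, V)  # one (D x V) signature per process
```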

7 Challenge: Unknown Timing
[Figure: the same Y = XW decomposition, but with the row contents of X uncertain.]
Uncertainty about the processes essentially makes the convolution matrix a random variable.

8 Our Approach
– The model of each process contains a probability distribution over when it occurs relative to a known event (called a timing landmark).
– When predicting the underlying processes, use prior knowledge about timing to limit the hypothesis space.

9 fMRI Data
[Figure: the hemodynamic response — signal amplitude over time (seconds) following neural activity.]
Features: 10,000 voxels, imaged every second.
Training examples: trials (task repetitions).

11 Study: Pictures and Sentences
Task: decide whether the sentence describes the picture correctly; indicate with a button press.
13 normal subjects, 40 trials per subject.
Sentences and pictures describe 3 symbols: *, +, and $, using ‘above’, ‘below’, ‘not above’, ‘not below’.
Images are acquired every 0.5 seconds.
[Figure: trial timeline — first stimulus (read sentence or view picture) at t = 0 for 4 sec., second stimulus at 8 sec., then press button, rest, and fixation.]

12 Goals for fMRI
To track cognitive processes over time.
– Estimate process hemodynamic responses.
– Estimate process timings.
– Allowing processes that do not directly correspond to the stimulus timing is a key contribution of HPMs!
To compare hypotheses of cognitive behavior.

13 HPM Modeling Assumptions
– Model the latent time series at the process level.
– Process instances share parameters based on their process types.
– Use prior knowledge from the experiment design.
– Sum process responses linearly.

14 Processes of the HPM:
Process 1: ReadSentence — response signature W (voxels v1, v2); duration d: 11 sec.; offsets Ω: {0, 1}; P(Ω): {θ0, θ1}.
Process 2: ViewPicture — response signature W (voxels v1, v2); duration d: 11 sec.; offsets Ω: {0, 1}; P(Ω): {θ0, θ1}.
Input stimulus Δ: sentence, picture. Timing landmarks λ: λ1, λ2.
Process instance: π2 — process h: 2; timing landmark: λ2; offset O: 1 (start time: λ2 + O).
One configuration c of process instances π1, π2, …, πk (with prior γc).
Predicted mean: the sum of the instances’ response signatures, plus per-voxel noise N(0, σ1), N(0, σ2).

15 HPM Formalism
HPM = ⟨H, C, Φ, Σ⟩
H = ⟨h1, h2, …⟩, a set of processes (e.g. ReadSentence)
h = ⟨W, d, Ω, Θ⟩, a process
  W = response signature
  d = process duration
  Ω = allowable offsets
  Θ = multinomial parameters over values in Ω
C = ⟨c1, c2, …⟩, a set of configurations
c = ⟨π1, …, πk⟩, a set of process instances
π = ⟨h, λ, O⟩, a process instance (e.g. ReadSentence(S1))
  h = process ID
  λ = timing landmark (e.g. stimulus presentation of S1)
  O = offset (takes values in Ωh)
Φ = ⟨γ1, γ2, …⟩, priors over C
Σ = ⟨σ1, …, σV⟩, standard deviation for each voxel
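One way to transcribe this formalism into code is with plain records. A hedged sketch follows: the field names mirror the slide’s notation, while the types and shapes are assumptions.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Process:               # h = <W, d, Omega, Theta>
    W: np.ndarray            # response signature, shape (d, n_voxels)
    d: int                   # process duration in timepoints
    omega: List[int]         # allowable offsets (Omega)
    theta: np.ndarray        # multinomial P(offset) over omega (Theta)

@dataclass
class ProcessInstance:       # pi = <h, lambda, O>
    h: int                   # process ID
    landmark: int            # timing landmark (e.g. stimulus onset)
    offset: int              # start time = landmark + offset

@dataclass
class Configuration:         # c = a set of process instances
    instances: List[ProcessInstance]
    prior: float             # gamma_c, prior probability of c
```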

16 HPMs: the graphical model
[Figure: graphical model over process instances π1, …, πk — the process type h and offset o (unobserved) combine with the timing landmark λ (observed) to give the start time s; the observed data Yt,v (t = [1,T], v = [1,V]) depend on the instances and σ.]
The set C of configurations constrains the joint distribution on {h(k), o(k)} ∀k.

17 Encoding Experiment Design
Processes: ReadSentence = 1, ViewPicture = 2, Decide = 3.
Input stimulus Δ with timing landmarks λ1, λ2.
[Figure: Configurations 1–4, the four assignments consistent with the constraints below.]
Constraints encoded:
h(π1) ∈ {1, 2}
h(π2) ∈ {1, 2}
h(π1) ≠ h(π2)
o(π1) = 0
o(π2) = 0
h(π3) = 3
o(π3) ∈ {1, 2}
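For constraints like these, the four configurations can be enumerated mechanically. A small sketch, with the process IDs taken from the slide and the tuple layout assumed for illustration:

```python
from itertools import product

# Enumerate the configurations satisfying the slide's constraints:
# instances 1 and 2 are ReadSentence (1) and ViewPicture (2) in some
# order, both at offset 0; instance 3 is Decide (3) at offset 1 or 2.
# Each instance is written here as a (process_id, offset) pair.
configs = [[(h1, 0), (h2, 0), (3, o3)]
           for (h1, h2), o3 in product([(1, 2), (2, 1)], [1, 2])]
assert len(configs) == 4   # matches Configurations 1-4 on the slide
```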

18 Inference
Over configurations: choose the most likely configuration,
  c* = argmax_c P(C = c | Y, Δ, HPM)
where C = configuration, Y = observed data, Δ = input stimuli, HPM = model.
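A minimal sketch of that rule under the linear-Gaussian assumptions from slides 13–14. The representations here (a configuration as a prior plus a list of (process id, start time) pairs; processes as a list of response-signature arrays) are assumptions for illustration, not the talk’s own code.

```python
import numpy as np

def predicted_mean(config, processes, T, V):
    # Linearity assumption: the predicted mean is the sum of the
    # instances' response signatures, shifted to their start times.
    prior, instances = config
    mu = np.zeros((T, V))
    for p, t0 in instances:
        W = processes[p]                # (d, V) response signature
        end = min(t0 + W.shape[0], T)
        mu[t0:end] += W[: end - t0]
    return mu

def map_configuration(Y, configs, processes, sigma):
    # Pick argmax_c P(c) * N(Y; mean(c), sigma); Gaussian normalization
    # constants are shared across configurations, so they drop out.
    T, V = Y.shape
    def score(config):
        resid = Y - predicted_mean(config, processes, T, V)
        return np.log(config[0]) - 0.5 * np.sum((resid / sigma) ** 2)
    return max(configs, key=score)
```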

19 Learning
Parameters to learn:
– Response signature W for each process
– Timing distribution Θ for each process
– Standard deviation σ for each voxel
Expectation-Maximization (EM) algorithm to estimate W and Θ:
– E step: estimate a probability distribution over configurations.
– M step: update estimates of W (using reweighted least squares) and Θ (using standard MLEs) based on the E step.
– After convergence, use standard MLEs for σ.

20 Uncertain Timings
The convolution matrix models several choices for each time point.
[Figure: an S × PD convolution matrix with more rows than time points (T′ > T): each time point t = 1, 2, …, 18, … contributes several rows, one per timing choice, and each row is labeled with the configurations consistent with it (e.g. {3, 4} or {1, 2}).]

21 Uncertain Timings
Weight each row with probabilities from the E-step.
[Figure: the expanded S × PD convolution matrix and Y = XW, with row weights e1, e2, e3, e4, …; e.g. rows consistent with configurations {3, 4} get weight e1 = P(C=3 | Y, W_old, Θ_old, σ_old) + P(C=4 | Y, W_old, Θ_old, σ_old).]
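A runnable toy version of the EM procedure from slides 19–21, simplified to a single process with one unknown offset, so each configuration is just an offset choice. The data, shapes, and initialization are invented for illustration; the two ideas it demonstrates are the E-step posterior over configurations and the M-step reweighted least squares, where each candidate row enters scaled by the square root of its posterior weight.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V, D = 30, 2, 8
landmark, offsets = 5, [0, 1]

# Synthetic data: one process instance starting at landmark + offset 1.
true_W = rng.standard_normal((D, V))
Y = 0.1 * rng.standard_normal((T, V))
Y[landmark + 1 : landmark + 1 + D] += true_W

def design(offset):                    # convolution matrix for one offset
    X = np.zeros((T, D))
    X[landmark + offset : landmark + offset + D] = np.eye(D)
    return X

W = rng.standard_normal((D, V))        # response signature, to be learned
theta = np.array([0.5, 0.5])           # P(offset), to be learned
sigma = np.ones(V)                     # per-voxel noise

for _ in range(50):
    # E step: posterior over the two configurations (offset choices).
    logp = np.array([np.log(theta[i] + 1e-12)
                     - 0.5 * np.sum(((Y - design(o) @ W) / sigma) ** 2)
                     for i, o in enumerate(offsets)])
    post = np.exp(logp - logp.max())
    post /= post.sum()
    # M step: reweighted least squares for W -- each offset's rows are
    # scaled by sqrt of its posterior (slide 21's row weights) -- and
    # the standard multinomial MLE for theta.
    Xw = np.vstack([np.sqrt(p) * design(o) for p, o in zip(post, offsets)])
    Yw = np.vstack([np.sqrt(p) * Y for p in post])
    W, *_ = np.linalg.lstsq(Xw, Yw, rcond=None)
    theta = post

# After convergence: standard MLE for the per-voxel noise.
sigma = np.sqrt(sum(p * np.mean((Y - design(o) @ W) ** 2, axis=0)
                    for p, o in zip(post, offsets)))
```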

22 Learned HPM with 3 processes (S, P, D), and d = 13 sec.
[Figure: observed vs. predicted time courses. Instances of S and P are marked in the observed data, along with a possible D; the learned models for S, P, and D are shown, with the D start time chosen by the program as t+18.]

23 ViewPicture in Visual Cortex
[Figure: learned response signature; multinomial P(Offset) over the offsets Ω.]

24 ReadSentence in Visual Cortex
[Figure: learned response signature; multinomial P(Offset) over the offsets Ω.]

25 Decide in Visual Cortex
[Figure: learned response signature; multinomial P(Offset) over the offsets Ω.]

26 ViewPicture

27 ReadSentence

28 Decide
[Figure: multinomial probabilities on time points, in seconds following the second stimulus.]

29 Comparing Models
5-fold cross-validation, 1 subject.

HPM          | Avg. Test Set LL
PS           | … × 10^6
PSD          | … × 10^6
PS+S-D       | … × 10^6
PSD+D-       | … × 10^6
PSDB         | … × 10^6
PSDyDn       | … × 10^6
PSDyDnDc **  | … × 10^6
PSDyDnDcB    | … × 10^6

P = ViewPicture
S = ReadSentence
S+ = ReadAffirmativeSentence
S- = ReadNegatedSentence
D = Decide
D+ = DecideAfterAffirmative
D- = DecideAfterNegated
Dy = DecideYes
Dn = DecideNo
Dc = DecideConfusion
B = Button
** This HPM can also classify Dy vs. Dn with 92.0% accuracy; GNBC gets 53.9% (using the window from the second stimulus to the end of the trial).

30 Are we learning the right number of processes?
Use synthetic data where we know ground truth.
– Generate training and test sets with 2/3/4 processes.
– Train HPMs with 2/3/4 processes on each.
– For each test set, select the HPM with the highest data log-likelihood.

Processes in training/test data | Times the correct number was chosen
2 | 5/5
3 | 5/5
4 | 4/5
Total: 14/15 = 93.3%

31 Related Work
fMRI:
– General Linear Model (Dale99): must assume the timing of process onset to estimate the hemodynamic response.
– Computer models of human cognition (Just99, Anderson04): predict fMRI data rather than learning parameters of processes from the data.
Machine Learning:
– Classification of windows of fMRI data (Cox03, Haxby01, Mitchell04): does not typically model overlapping hemodynamic responses.
– Dynamic Bayes Networks (Murphy02, Ghahramani97): HPM assumptions/constraints are difficult to encode in DBNs.

32 Future Work
– Incorporate spatial prior knowledge, e.g. share parameters across voxels (extending Niculescu05).
– Smooth hemodynamic responses (e.g. Boynton96).
– Improve algorithm complexities.
– Apply to open cognitive science problems.

33 Conclusions
Take-away messages:
– HPMs are a probabilistic model for time series data generated by a collection of latent processes.
– In the fMRI domain, HPMs can simultaneously estimate the hemodynamic response and localize the timing of cognitive processes.

34 References
John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, 2004.
Geoffrey M. Boynton, Stephen A. Engel, Gary H. Glover, and David J. Heeger. Linear systems analysis of functional magnetic resonance imaging in human V1. The Journal of Neuroscience, 16(13):4207–4221, 1996.
David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19:261–270, 2003.
Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109–114, 1999.
Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–275, 1997.
James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425–2430, September 2001.
Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128–136, 1999. modeling4CAPS.htm.
Tom M. Mitchell et al. Learning to decode cognitive states from brain images. Machine Learning, 57:145–175, 2004.
Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan, November 2002.
Radu Stefan Niculescu. Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks. PhD thesis, Carnegie Mellon University, July 2005. CMU-CS.