Hidden Process Models with applications to fMRI data

Hidden Process Models with applications to fMRI data
Rebecca Hutchinson, Oregon State University
August 2, 2009, Joint Statistical Meetings, Washington DC

Introduction
Hidden Process Models (HPMs):
- A probabilistic model for time series data.
- Designed for data generated by a collection of latent processes.
Example domain:
- Modeling cognitive processes (e.g. making a decision) in functional Magnetic Resonance Imaging time series.
Characteristics of potential domains:
- Processes with spatial-temporal signatures.
- Uncertainty about temporal location of processes.
- High-dimensional, sparse, noisy data.

fMRI Data
Features: 5k-15k voxels, imaged every second.
Training examples: 10-40 trials (task repetitions).
[Figure: Hemodynamic Response. Signal amplitude plotted against time (seconds) following neural activity.]

Study: Pictures and Sentences
[Figure: trial timeline with segments Fixation, Read Sentence / View Picture, View Picture / Read Sentence, Press Button, Rest; time marks at t=0, 4 sec., 8 sec.]
Task: Decide whether sentence describes picture correctly, indicate with button press.
13 normal subjects, 40 trials per subject.
Sentences and pictures describe 3 symbols: *, +, and $, using ‘above’, ‘below’, ‘not above’, ‘not below’.
Images are acquired every 0.5 seconds.

Goals for fMRI
- To track cognitive processes over time.
  - Estimate hemodynamic response signatures.
  - Estimate process timings.
  Modeling processes that do not directly correspond to the stimuli timing is a key contribution of HPMs!
- To compare hypotheses of cognitive behavior.

v1 v1 v2 v2 1 2 1 2 v1 + N(0,s1) v2 + N(0,s2) Process 1: ReadSentence Response signature W: Duration d: 11 sec. Offsets W: {0,1} P(): {q0,q1} Process 2: ViewPicture Response signature W: Duration d: 11 sec. Offsets W: {0,1} P(): {q0,q1} Processes of the HPM: v1 v2 v1 v2 Input stimulus : sentence picture Timing landmarks : Process instance: 2 Process h: 2 Timing landmark: 2 Offset O: 1 (Start time: 2+ O) 1 2 One configuration c of process instances 1, 2, … k: 1 2  Predicted mean: v1 v2 + N(0,s1) + N(0,s2)

HPM Formalism
HPM = <H,C,Φ,Σ>
H = <h1,…,hH>, a set of processes (e.g. ReadSentence)
  h = <W,d,Ω,Θ>, a process
    W = response signature
    d = process duration
    Ω = allowable offsets
    Θ = multinomial parameters over values in Ω
C = <c1,…,cC>, a set of possible configurations
  c = <π1,…,πL>, a set of process instances
    π = <h,λ,O>, a process instance (e.g. ReadSentence(S1))
      h = process ID
      λ = timing landmark (e.g. stimulus presentation of S1)
      O = offset (takes values in Ωh)
C = a latent variable indicating the correct configuration
Σ = <σ1,…,σV>, standard deviation for each voxel
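The formalism can be made concrete with a small sketch of how a configuration might produce a predicted mean for one voxel. It assumes that each process instance starts at its landmark plus its offset and that overlapping responses add linearly; the function name, the tuple layout `(process_id, landmark, offset)`, and the toy signatures are illustrative and do not come from the slides.

```python
def predicted_mean(num_timepoints, instances, signatures):
    """Predicted mean time course for a single voxel (illustrative sketch).

    instances:  list of (process_id, landmark, offset) tuples, one per process instance.
    signatures: dict mapping process_id to that process's response values for this
                voxel, one value per time step of its duration.
    Assumption: responses of overlapping instances simply add.
    """
    mean = [0.0] * num_timepoints
    for process_id, landmark, offset in instances:
        start = landmark + offset  # start time = landmark + offset
        for tau, w in enumerate(signatures[process_id]):
            t = start + tau
            if 0 <= t < num_timepoints:  # ignore the part that runs past the scan
                mean[t] += w
    return mean


# Toy example: process 1 starts at t=0, process 2 at landmark 1 with offset 1.
toy_signatures = {1: [1.0, 2.0, 1.0], 2: [0.5, 0.5, 0.5]}
toy_config = [(1, 0, 0), (2, 1, 1)]
print(predicted_mean(6, toy_config, toy_signatures))
```

Under the model, the observed value for the voxel would be this mean plus zero-mean Gaussian noise with that voxel's standard deviation.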

HPMs: the graphical model
[Figure: graphical model with nodes Configuration c, Timing Landmark λ, Process Type h, Offset o, and Start Time s for each process instance π1,…,πk, plus Σ and the data Yt,v for t=[1,T], v=[1,V]. Legend: observed, unobserved.]
The set C of configurations constrains the joint distribution on {h(k),o(k)} ∀ k.

Encoding Experiment Design
Processes: ReadSentence = 1, ViewPicture = 2, Decide = 3.
Input stimulus and timing landmarks: λ1, λ2.
Constraints encoded:
- h(π1) = {1,2}
- h(π2) = {1,2}
- h(π1) != h(π2)
- o(π1) = 0
- o(π2) = 0
- h(π3) = 3
- o(π3) = {1,2}
[Figure: the resulting Configuration 1, Configuration 2, Configuration 3, and Configuration 4.]
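The constraints on this slide can be enumerated mechanically, which shows why exactly four configurations result. The sketch below is illustrative only: the tuple layout `(process_id, landmark_index, offset)` is invented here, and attaching the Decide instance to the second landmark is an assumption, since the slide text does not say which landmark it uses.

```python
from itertools import product

PROCESS_NAMES = {1: "ReadSentence", 2: "ViewPicture", 3: "Decide"}


def enumerate_configurations():
    """List every configuration allowed by the slide's constraints."""
    configurations = []
    # h(p1) and h(p2) each range over {1, 2}; o(p3) ranges over {1, 2}.
    for h1, h2, o3 in product((1, 2), (1, 2), (1, 2)):
        if h1 == h2:  # constraint: h(p1) != h(p2)
            continue
        configurations.append((
            (h1, 1, 0),   # first instance: landmark 1, offset fixed to 0
            (h2, 2, 0),   # second instance: landmark 2, offset fixed to 0
            (3, 2, o3),   # Decide: landmark 2 is an ASSUMPTION, offset 1 or 2
        ))
    return configurations


for i, config in enumerate(enumerate_configurations(), start=1):
    print(i, [(PROCESS_NAMES[h], lm, o) for h, lm, o in config])
```

Two orderings of the stimuli times two Decide offsets gives the four configurations.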

Inference
Over C, the latent indicator of the correct configuration.
Choose the most likely configuration, ĉ = argmax_c P(C = c | Y, D, HPM), where Y = observed data, D = input stimuli, and HPM = model.
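A minimal sketch of this inference step follows. It assumes independent Gaussian noise per voxel, a precomputed predicted mean for each candidate configuration, and a log prior for each configuration; all function and variable names are illustrative and not taken from the slides.

```python
import math


def log_likelihood(Y, mean, sigma):
    """Gaussian log-likelihood of data Y[t][v] given mean[t][v] and per-voxel sigma[v]."""
    total = 0.0
    for y_row, m_row in zip(Y, mean):
        for y, m, s in zip(y_row, m_row, sigma):
            total += -0.5 * math.log(2 * math.pi * s * s) - (y - m) ** 2 / (2 * s * s)
    return total


def posterior_over_configurations(Y, means, sigma, log_prior):
    """Normalized posterior P(C = c | Y) over the candidate configurations."""
    scores = [log_likelihood(Y, m, sigma) + lp for m, lp in zip(means, log_prior)]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data: 3 time points, 1 voxel, 2 candidate configurations with equal priors.
Y = [[1.0], [2.0], [0.0]]
means = [[[1.0], [2.0], [0.0]], [[0.0], [1.0], [2.0]]]
post = posterior_over_configurations(Y, means, [1.0], [math.log(0.5)] * 2)
best = post.index(max(post))  # index of the most likely configuration
print(post, best)
```

The same posterior is what an E step would compute, so the function can serve both purposes in a sketch like this.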

Learning
Parameters to learn:
- Response signature W for each process
- Timing distribution Θ for each process
- Standard deviation σ for each voxel
Expectation-Maximization (EM) algorithm to estimate W and Θ.
- E step: estimate a probability distribution over configurations.
- M step: update estimates of W (using reweighted least squares), Θ, and σ (using standard MLEs) based on the E step.
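As one illustration of the M step, the timing distribution of a process could be re-estimated from expected offset counts under the E-step posterior. This is the usual multinomial maximum-likelihood update, written here as a sketch for a single trial; the slides do not give the exact update, and the names and tuple layout `(process_id, landmark, offset)` are assumptions.

```python
def update_offset_distribution(configurations, posterior, process_id, offsets):
    """Re-estimate P(offset) for one process from posterior-weighted counts.

    configurations: list of configurations, each a tuple of
                    (process_id, landmark, offset) instances.
    posterior:      E-step probability of each configuration, in the same order.
    """
    counts = {o: 0.0 for o in offsets}
    for config, prob in zip(configurations, posterior):
        for h, _landmark, o in config:
            if h == process_id:
                counts[o] += prob  # expected number of times offset o was used
    total = sum(counts.values())
    if total == 0.0:
        raise ValueError("process never appears with positive posterior weight")
    return {o: c / total for o, c in counts.items()}


# Toy example: four configurations in which process 3 uses offset 1 or 2.
toy_configs = [
    ((1, 1, 0), (2, 2, 0), (3, 2, 1)),
    ((1, 1, 0), (2, 2, 0), (3, 2, 2)),
    ((2, 1, 0), (1, 2, 0), (3, 2, 1)),
    ((2, 1, 0), (1, 2, 0), (3, 2, 2)),
]
print(update_offset_distribution(toy_configs, [0.1, 0.4, 0.2, 0.3], 3, (1, 2)))
```

With several trials, the expected counts would be accumulated across trials before normalizing.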

Process Response Signatures
- Standard: Each process has a matrix of parameters, one for each point in space and time for the duration of the response (e.g. 24).
- Regularized: Same as standard, but learned with penalties for deviations from temporal and/or spatial smoothness.
- Basis functions: Each process has a small number (e.g. 3) of weights for each voxel, which are combined with a basis to get the response.
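The basis-function parameterization can be sketched as a weighted sum of basis time courses per voxel. The slides do not specify the basis or how it is combined, so the linear combination, the function name, and the toy numbers below are assumptions for illustration.

```python
def response_from_basis(basis, weights):
    """Build a response signature W[t][v] from basis time courses and voxel weights.

    basis:   K time courses, basis[k][t], each spanning the response duration.
    weights: K weights per voxel, weights[k][v].
    Assumption: the response is the linear combination sum_k basis[k][t] * weights[k][v].
    """
    num_basis = len(basis)
    duration = len(basis[0])
    num_voxels = len(weights[0])
    return [
        [sum(basis[k][t] * weights[k][v] for k in range(num_basis))
         for v in range(num_voxels)]
        for t in range(duration)
    ]


# Toy example: 2 basis functions over 3 time steps, 2 voxels.
toy_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_weights = [[2.0, 1.0], [3.0, 0.0]]
print(response_from_basis(toy_basis, toy_weights))
```

With K basis functions, each voxel needs only K parameters per process instead of one parameter per time step.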

Evaluation
Select 1000 most active voxels.
Compute improvement in test data log-likelihood as compared with predicting the mean training trial for all test trials (a baseline).
5 folds of cross-validation.
Average over 13 subjects.

            Standard   Regularized   Basis functions
HPM-GNB         -293          2590              2010
HPM-2          -1150          3910              3740
HPM-3          -2000          4960              4710
HPM-4          -4490          4810              4770
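The evaluation metric can be sketched as the test log-likelihood of the model's predictions minus that of the baseline that predicts the mean training trial. The sketch assumes Gaussian noise with the same per-voxel standard deviations for both model and baseline, which the slides do not state; all names are illustrative.

```python
import math


def log_likelihood(trial, mean, sigma):
    """Gaussian log-likelihood of trial[t][v] given mean[t][v] and per-voxel sigma[v]."""
    total = 0.0
    for y_row, m_row in zip(trial, mean):
        for y, m, s in zip(y_row, m_row, sigma):
            total += -0.5 * math.log(2 * math.pi * s * s) - (y - m) ** 2 / (2 * s * s)
    return total


def mean_trial(trials):
    """Element-wise average of the training trials (the baseline prediction)."""
    n = len(trials)
    num_t, num_v = len(trials[0]), len(trials[0][0])
    return [[sum(trial[t][v] for trial in trials) / n for v in range(num_v)]
            for t in range(num_t)]


def improvement_over_baseline(test_trials, model_means, train_trials, sigma):
    """Sum over test trials of log-likelihood(model) minus log-likelihood(baseline)."""
    baseline = mean_trial(train_trials)
    return sum(
        log_likelihood(trial, model_mean, sigma) - log_likelihood(trial, baseline, sigma)
        for trial, model_mean in zip(test_trials, model_means)
    )


# Toy example: one voxel, two time points, a model that predicts the test trial exactly.
train = [[[0.0], [2.0]], [[2.0], [4.0]]]
test = [[[0.0], [4.0]]]
print(improvement_over_baseline(test, test, train, [1.0]))
```

A positive value means the model explains the held-out trials better than the mean training trial; a negative value, as in the Standard column of the table, means it does worse.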

Interpretation and Visualization
Timing for the third (Decide) process in HPM-3 (values have been rounded):
Offset: 1 2 3 4 5 6 7 Stand. 0.3 0.08 0.1 0.05 0.2 0.15 Reg. Basis 0.5 0.03
For each subject, average response signatures for each voxel over time, plot result in each spatial location.
Compare time courses for the same voxel.

Standard

Regularized

Basis functions

Time courses
[Figure panels: Standard, Regularized, Basis functions, The basis set.]

Related Work
fMRI:
- General Linear Model (Dale99): Must assume timing of process onset to estimate hemodynamic response.
- Computer models of human cognition (Just99, Anderson04): Predict fMRI data rather than learning parameters of processes from the data.
Machine Learning:
- Classification of windows of fMRI data (Cox03, Haxby01, Mitchell04): Does not typically model overlapping hemodynamic responses.
- Dynamic Bayes Networks (Murphy02, Ghahramani97): HPM assumptions/constraints can be encoded by extending factorial HMMs with links between the Markov chains.

Conclusions
Take-away messages:
- HPMs are a probabilistic model for time series data generated by a collection of latent processes.
- In the fMRI domain, HPMs can simultaneously estimate the hemodynamic response and localize the timing of cognitive processes.
Future work:
- Automatically discover the number of latent processes.
- Learn process durations.
- Apply to open cognitive science problems.

References
John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, 2004. http://act-r.psy.cmu.edu/about/.
Geoffrey M. Boynton, Stephen A. Engel, Gary H. Glover, and David J. Heeger. Linear systems analysis of functional magnetic resonance imaging in human V1. The Journal of Neuroscience, 16(13):4207–4221, 1996.
David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19:261–270, 2003.
Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109–114, 1999.
Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–275, 1997.
James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425–2430, September 2001.
Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128–136, 1999. http://www.ccbi.cmu.edu/project 10modeling4CAPS.htm.
Tom M. Mitchell et al. Learning to decode cognitive states from brain images. Machine Learning, 57:145–175, 2004.
Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan, November 2002.
Radu Stefan Niculescu. Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks. PhD thesis, Carnegie Mellon University, July 2005. CMU-CS-05-147.