Hidden Process Models
Rebecca Hutchinson
Thesis Proposal, May 26, 2006
Carnegie Mellon University, Computer Science Department

2 Talk Outline
Motivation: fMRI (functional Magnetic Resonance Imaging) data.
Problem: new kind of probabilistic time series modeling.
Solution: Hidden Process Models (HPMs).
Results: preliminary experiments with HPMs.
Extensions: proposed improvements to HPMs.

3 fMRI Data: High-Dimensional and Sparse
Imaged once per second for minutes.
Only a few dozen trials (i.e. training examples).
10,000-15,000 voxels per image.

4 The Hemodynamic Response
[Figure: hemodynamic response curve, signal amplitude vs. time (seconds).]
Subject reads a word and indicates whether it is a noun or verb in less than a second.
fMRI measures an indirect, temporally blurred correlate of neural activity.
Also called the BOLD response: Blood Oxygen Level Dependent.

5 Study: Pictures and Sentences
Task: decide whether the sentence describes the picture correctly; indicate with a button press.
13 normal subjects, 40 trials per subject.
Sentences and pictures describe 3 symbols: *, +, and $, using 'above', 'below', 'not above', 'not below'.
Images are acquired every 0.5 seconds.
[Trial timeline: Read Sentence / View Picture (in either order), Press Button, then Fixation / Rest; intervals of 4 sec. and 8 sec., starting at t=0.]

6 Motivation
To track cognitive processes over time.
–Estimate process hemodynamic responses.
–Estimate process timings.
Allowing processes that do not directly correspond to the stimulus timing is a key contribution of HPMs!
To compare hypotheses of cognitive behavior.

7 The Thesis
It is possible to
–simultaneously
–estimate the parameters and timing of
–temporally and spatially overlapped,
–partially observed processes
–(using many features and a small number of noisy training examples).
We are developing a class of probabilistic models called Hidden Process Models (HPMs) for this task.

8 Related Work in fMRI
General Linear Model (GLM)
–Must assume timing of process onset to estimate the hemodynamic response
–Dale99
4-CAPS and ACT-R
–Predict fMRI data rather than learning parameters of processes from the data
–Anderson04, Just99

9 Related Work in Machine Learning
Classification of windows of fMRI data
–Does not typically estimate the hemodynamic response
–Cox03, Haxby01, Mitchell04
Dynamic Bayes Networks
–HPM assumptions/constraints are difficult to encode in DBNs
–Murphy02, Ghahramani97

10 HPM Modeling Assumptions
Model the latent time series at the process level.
Process instances share parameters based on their process types.
Use prior knowledge from the experiment design.
Sum process responses linearly.

11 HPM Formalism (Hutchinson06)
HPM = &lt;H, C, Γ, Σ&gt;
H = {h1, h2, ...}, a set of processes
  h = &lt;W, d, Ω, Θ&gt;, a process
    W = response signature
    d = process duration
    Ω = allowable offsets
    Θ = multinomial parameters over values in Ω
C = {c1, c2, ...}, a set of configurations
  c = {π1, π2, ..., πk}, a set of process instances
    π = &lt;h, λ, O&gt;, a process instance
      h = process ID
      λ = associated stimulus landmark
      O = offset (takes values in Ω(h))
Γ = priors over C
Σ = {σ1, ..., σV}, standard deviation for each voxel
Notation: parameter(entity), e.g. W(h) is the response signature of process h, and O(π) is the offset of process instance π.

12 Example: Processes of the HPM
Process 1: ReadSentence. Response signature W over voxels v1, v2; duration d: 11 sec.; offsets Ω: {0, 1}; P(Ω): {θ0, θ1}.
Process 2: ViewPicture. Response signature W over voxels v1, v2; duration d: 11 sec.; offsets Ω: {0, 1}; P(Ω): {θ0, θ1}.
Input stimulus Δ: sentence and picture presentations, with timing landmarks λ1, λ2.
Example process instance π2: process h = 2; timing landmark λ2; offset time O = 1 sec. (start time: λ2 + O).
One configuration c of process instances π1, π2, ..., πk (with prior γc).
Predicted mean: the sum of the instantiated processes' response signatures, each aligned to its start time, plus per-voxel noise N(0, σ1) and N(0, σ2).
[Figure: response signatures for v1 and v2 and the resulting predicted time series.]
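The generative step on this slide (responses summed linearly at their start times, plus Gaussian noise) can be written as a minimal sketch. This is illustrative only; the array layout, function and variable names, and the random example values are assumptions, not code from the proposal.

```python
import numpy as np

def predicted_mean(T, V, instances, W, d):
    """Sum each instantiated process's response signature, aligned to its start time.

    instances: list of (process_id, start_time) pairs for one configuration.
    W: dict mapping process_id -> (d[process_id] x V) response signature.
    d: dict mapping process_id -> duration in images.
    """
    mean = np.zeros((T, V))
    for h, start in instances:
        stop = min(start + d[h], T)
        mean[start:stop, :] += W[h][: stop - start, :]
    return mean

# Example: two 11-image processes in a 24-image window over 2 voxels,
# with process 1 starting at image 0 and process 2 at image 9.
W = {1: np.random.rand(11, 2), 2: np.random.rand(11, 2)}
d = {1: 11, 2: 11}
mu = predicted_mean(24, 2, [(1, 0), (2, 9)], W, d)
y = mu + np.random.normal(0.0, 0.5, size=mu.shape)  # per-voxel N(0, sigma_v) noise
```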

13 HPMs: the Graphical Model
[Graphical model: each process instance π1, ..., πk has a process type h, a timing landmark, an offset o, and a start time s; the data Y(t,v), t = [1,T], v = [1,V], are observed, while process types, offsets, and start times may be unobserved. A configuration node c selects among the allowed instantiations.]
The set C of configurations constrains the joint distribution on {h(k), o(k)} for all k.

14 Encoding Experiment Design
Processes: ReadSentence = 1, ViewPicture = 2, Decide = 3.
Input stimulus Δ with timing landmarks λ1, λ2.
Configurations 1-4: the allowed assignments of process instances π1, π2, π3 to landmarks and offsets.
Constraints encoded:
h(π1) = {1,2}
h(π2) = {1,2}
h(π1) != h(π2)
o(π1) = 0
o(π2) = 0
h(π3) = 3
o(π3) = {1,2}

15 Inference
Over configurations: choose the most likely configuration, i.e. the c maximizing
P(C=c | Y, Δ, HPM) ∝ P(Y | C=c, Δ, HPM) P(C=c | Δ, HPM)
where C = configuration, Y = observed data, Δ = input stimuli, HPM = model.
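A sketch of this configuration posterior, assuming the independent per-voxel Gaussian noise model from the formalism slide; the function names (config_log_posterior, and predicted_mean from the earlier sketch) are illustrative, not the proposal's code.

```python
import numpy as np
from scipy.stats import norm

def config_log_posterior(Y, configs, priors, W, d, sigma):
    """Unnormalized log-posterior over configurations: log prior + Gaussian log-likelihood.

    Y: (T x V) observed data; sigma: (V,) per-voxel standard deviations.
    configs: list of configurations, each a list of (process_id, start_time).
    priors: prior probability of each configuration (gamma_c).
    """
    T, V = Y.shape
    scores = []
    for c, prior in zip(configs, priors):
        mu = predicted_mean(T, V, c, W, d)   # forward model from the earlier sketch
        ll = norm.logpdf(Y, loc=mu, scale=sigma).sum()
        scores.append(np.log(prior) + ll)
    return np.array(scores)

# Most likely configuration:
# best = int(np.argmax(config_log_posterior(Y, configs, priors, W, d, sigma)))
```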

16 Learning
Parameters to learn:
–Response signature W for each process.
–Timing distribution Θ for each process.
–Standard deviation σ for each voxel.
Case 1: Known Configuration.
–Least squares problem to estimate W.
–Standard MLEs for Θ and σ.
Case 2: Unknown Configuration.
–Expectation-Maximization (EM) algorithm to estimate W and Θ.
  E step: estimate a probability distribution over configurations.
  M step: update estimates of W (using reweighted least squares) and Θ (using standard MLEs) based on the E step.
–Standard MLEs for σ.
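For the per-voxel standard deviations, one plausible form of the "standard MLE" under configuration uncertainty is to average squared residuals over configurations using the E-step probabilities. This sketch is an assumption about that step, not code from the proposal.

```python
import numpy as np

def sigma_mle(Y, config_means, config_probs):
    """Per-voxel standard deviation estimate under configuration uncertainty.

    Averages the squared residuals over configurations, weighted by the
    E-step posterior probabilities of those configurations.
    config_means: list of (T x V) predicted means, one per configuration.
    config_probs: posterior probability of each configuration.
    """
    var = np.zeros(Y.shape[1])
    for mu, p in zip(config_means, config_probs):
        var += p * ((Y - mu) ** 2).mean(axis=0)
    return np.sqrt(var)
```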

17 Case 1: Known Configuration
Following Dale99, use the GLM. The (known) configuration generates a T x D convolution matrix X, where D = Σ_h d(h).
Configuration: π1: h=1, start=1; π2: h=2, start=2; π3: h=3, start=2.
For this example, d(1) = d(2) = d(3) = 3.
[Figure: the T x D convolution matrix X, with rows t = 1, 2, 3, 4, ... and column blocks of widths d(1), d(2), d(3); the ones in each block mark when the corresponding process instance contributes.]

18 Case 1: Known Configuration
[Figure: the matrix equation Y = X W, where Y is the T x V data matrix, X is the T x D convolution matrix with column blocks of widths d(1), d(2), d(3), and W is the D x V matrix stacking the response signatures W(1), W(2), W(3).]
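A sketch of how such a design matrix and least-squares estimate could be built; the block ordering, zero-based start times, and names are assumptions for illustration only.

```python
import numpy as np

def convolution_matrix(T, instances, d):
    """Build the T x D design matrix for a known configuration.

    instances: list of (process_id, start_time) with zero-based start times.
    d: dict mapping process_id -> duration. Each process gets a block of
    d[h] columns, with ones marking which response element is active when.
    """
    order = sorted(d)                 # fixed column-block order over processes
    offsets, off = {}, 0
    for h in order:
        offsets[h] = off
        off += d[h]
    X = np.zeros((T, off))
    for h, start in instances:
        for j in range(d[h]):
            if start + j < T:
                X[start + j, offsets[h] + j] = 1.0
    return X

# Least-squares estimate of the stacked D x V response signatures:
# X = convolution_matrix(T, instances, d)
# W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```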

19 Case 2: Unknown Configuration
E step: use the inference equation to estimate a probability distribution over the set of configurations.
M step: use the probabilities computed in the E step to form weights for the least-squares procedure for estimating W.

20 Case 2: Unknown Configuration
The convolution matrix now models several choices for each time point: each observed time point contributes one row per subset of configurations consistent with it, so the expanded matrix has T' > T rows.
[Figure: the expanded convolution matrix with column blocks of widths d(1), d(2), d(3); rows t = 1, 2, ..., 18, ... appear multiple times, labeled with the configurations they correspond to: 3,4 / 1,2 / 3,4 / 1,2 / ...]

21 Case 2: Unknown Configuration … … … d(1)d(3) e1 e2 e3 e4 … d(2) Y = W 3,4 1,2 3,4 1,2 … Configurations:Weights: e1 = P(C=3|Y,W old,  old,  old ) + P(C=4|Y,W old,  old,  old ) Weight each row with probabilities from E-step.

22 Learned HPM with 3 processes (S, P, D), d = 13 sec.
[Figure: observed vs. reconstructed voxel time series with the learned response models for S (ReadSentence), P (ViewPicture), and D (Decide); the instances labeled P, P, S, S, D? mark where each process was inferred, and the D start time was chosen by the program as t+18.]

23 Results: Model Selection
Use cross-validation to choose a model.
–GNB = Gaussian Naïve Bayes
–HPM-2 = HPM with ViewPicture, ReadSentence
–HPM-3 = HPM-2 + Decide
[Table: for subjects A, B, and C, accuracy predicting picture vs. sentence (random = 0.5) and data log-likelihood for GNB, HPM-2, and HPM-3.]

24 Synthetic Data Results
Timing of the synthetic data mimics the real data, but we have ground truth.
Can be used to investigate the effects of
–signal-to-noise ratio
–number of voxels
–number of training examples
on
–training time
–cross-validated classification accuracy
–cross-validated data log-likelihood


26 Recall Motivation
To track cognitive processes over time.
–Estimate process hemodynamic responses.
–Estimate process timings.
Allowing processes that do not directly correspond to the stimulus timing is a key contribution of HPMs!
To compare hypotheses of cognitive behavior.

27 Proposed Work
Goals:
–Increase efficiency.
  fewer parameters
  better accuracy from fewer examples
  faster inference and learning
–Handle larger, more complex problems.
  more voxels
  more processes
  fewer assumptions
Research areas:
–Model Parameterization
–Timing Constraints
–Learning Under Uncertainty

28 Model Parameterization
Goals:
–Improve the biological plausibility of learned responses.
–Decrease the number of parameters to be estimated (improving sample complexity).
Tasks:
–Parameter sharing across voxels
–Parametric form for response signatures
–Temporally smoothed response signatures

29 Timing Constraints
Goals:
–Specify experiment design domain knowledge more efficiently.
–Improve the computational and sample complexities of the HPM algorithms.
Tasks:
–Formalize limitations in terms of fMRI experiment design.
–Improve the specification of timing constraints.
–Develop more efficient exact and/or approximate algorithms.

30 Learning Under Uncertainty
Goals:
–Relax the current modeling assumptions.
–Allow more types of uncertainty about the data.
Tasks:
–Learn process durations.
–Learn the number of processes in the model.

31 HPM Parameter Sharing (Niculescu05)
Special case: HPMs with known configuration.
Parameter reduction: from d(h) * V to d(h) + V.
Scaling parameter per voxel per process; no more voxel index on the response weights.
New mean for voxel v at time t: the sum over active process instances of s(h,v) times the shared response value W(h, t - start), i.e. one shared temporal response per process, scaled per voxel.
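A sketch of this shared-response mean under the sharing assumption above; the dictionary layout and names are illustrative only.

```python
import numpy as np

def shared_response_mean(T, V, instances, W_shared, s, d):
    """Predicted mean with parameter sharing: one temporal response per process,
    scaled by a per-(process, voxel) factor s[h][v].

    W_shared: dict mapping process_id -> (d[process_id],) shared response.
    s: dict mapping process_id -> (V,) scaling parameters.
    """
    mean = np.zeros((T, V))
    for h, start in instances:
        stop = min(start + d[h], T)
        # outer product: shared temporal shape times per-voxel scale
        mean[start:stop, :] += np.outer(W_shared[h][: stop - start], s[h])
    return mean
```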

32 Extension to Unknown Timing
Simplifying assumptions:
1. No clustering: all voxels share a response.
2. Voxels that share a response for one process share a response for all processes.
Algorithm notes:
–The residual is linear in the shared response parameters and in the scaling parameters, so minimize iteratively.
–Empirically, convergence occurs within 3-5 iterations.

33 Iterative M-step: Step 1
Using current estimates of S, re-estimate W.
Replace the ones of the convolution matrix with s(h,v); repeat for all v, stacking a T' x D block per voxel (v1, v2, ...), so the stacked data vector has shape T'V x 1.
The unknowns are a single column of parameters describing the shared responses W(1), W(2), W(3); there is no longer a voxel index on these weights.
[Figure: the stacked linear system, with column blocks of widths d(1), d(2), d(3) and entries s replacing the ones of the convolution matrix.]

34 Iterative M-step: Step 2
Using current estimates of W, re-estimate S.
Use the original-size convolution matrix with its ones replaced by the W estimates, and the original-size data matrix Y.
Each column of the unknown matrix holds the scaling parameters for one voxel (s(1,v), ..., s(H,v)); the parameter for each process is replicated over its duration, so these replicated parameter sets need to be constrained to be equal.
[Figure: the linear system Y = X S, with column blocks of widths d(1), d(2), d(3) in X and the scaling-parameter matrix S whose columns are the per-voxel scale vectors.]
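The two steps together form an alternating least-squares loop. The sketch below is a simplified, self-contained version under the known-configuration special case: it collapses the replicated scaling parameters into one parameter per (process, voxel) pair rather than constraining replicated copies to be equal, and all names are illustrative.

```python
import numpy as np

def fit_shared_responses(Y, instances, d, n_iters=5):
    """Alternating least squares for the shared-response model (known configuration).

    Step 1 fixes the per-voxel scales s and solves for the shared responses W;
    step 2 fixes W and solves for the scales, one column of S per voxel.
    """
    T, V = Y.shape
    procs = sorted(d)
    col, off = {}, 0
    for h in procs:
        col[h] = off
        off += d[h]
    W = np.ones(off)                          # stacked shared responses
    s = {h: np.ones(V) for h in procs}        # per-(process, voxel) scales
    for _ in range(n_iters):                  # 3-5 iterations typically suffice
        # Step 1: design over the vectorized data, with scales in place of ones.
        A = np.zeros((T * V, off))
        for h, start in instances:
            for j in range(d[h]):
                if start + j < T:
                    row0 = (start + j) * V
                    A[row0:row0 + V, col[h] + j] += s[h]
        W, *_ = np.linalg.lstsq(A, Y.reshape(-1), rcond=None)
        # Step 2: per-voxel design over processes, with W values in place of ones.
        B = np.zeros((T, len(procs)))
        for h, start in instances:
            for j in range(d[h]):
                if start + j < T:
                    B[start + j, procs.index(h)] += W[col[h] + j]
        S, *_ = np.linalg.lstsq(B, Y, rcond=None)
        s = {h: S[i] for i, h in enumerate(procs)}
    return W, s
```

Note that W and s are only identified up to a per-process scale factor, and building A replicates the design matrix for every voxel, which is exactly the memory concern raised on the next slide; a real implementation would exploit the sparsity of these matrices.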

35 Next Step?
Implement this approach.
Anticipated memory issues:
–Replicating the convolution matrix for each voxel in step 1.
–Working on exploiting the sparsity/structure of these matrices.
Add clustering back in.
Adapt for other parameterizations of response signatures.

36 Response Signature Parameters
Temporal smoothing
Gamma functions
Hemodynamic basis functions

37 Temporally Smooth Responses
Idea: add a regularizer to the loss function to penalize large jumps between time points.
–e.g. minimize ||Y - XW||^2 + λ Σ_t (W_t - W_{t-1})^2
–choose λ by cross-validation
–should be a straightforward extension to the optimization code
Concerns:
–this adds a parameter (λ) instead of reducing the number of parameters!
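A minimal sketch of this penalized least squares, via the standard trick of appending the difference operator to the design matrix. It is an illustration, not the proposed optimization code: for simplicity it penalizes differences between all adjacent rows of W, whereas a faithful version would restrict the penalty to time points within each process's block.

```python
import numpy as np

def smoothed_lstsq(X, Y, lam):
    """Least squares with a first-difference smoothness penalty.

    Minimizes ||Y - X W||^2 + lam * sum_t ||W_t - W_{t-1}||^2 by appending
    sqrt(lam) times a first-difference operator to the design matrix.
    """
    D = X.shape[1]
    diff = np.eye(D - 1, D, k=1) - np.eye(D - 1, D)     # rows encode W_t - W_{t-1}
    X_aug = np.vstack([X, np.sqrt(lam) * diff])
    Y_aug = np.vstack([Y, np.zeros((D - 1, Y.shape[1]))])
    W_hat, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
    return W_hat
```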

38 Gamma-shaped Responses
Idea: use a gamma function with 3 parameters for each process response signature (Boynton96).
–a controls the amplitude
–τ controls the width of the peak
–n controls the delay of the peak
Questions:
–Are gamma functions a reasonable modeling assumption?
–Details of how to fit the parameters in the M-step?
[Figure: gamma-shaped response curve, signal amplitude vs. seconds, annotated with a, τ, and n.]
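A sketch of a gamma-shaped response of the Boynton96 form; the particular parameter values in the example are illustrative assumptions.

```python
import numpy as np
from math import factorial

def boynton_gamma(t, a, tau, n):
    """Gamma-shaped response of the form used by Boynton96.

    a scales the amplitude, tau controls the width of the peak, and n
    (a small integer) controls the delay of the peak.
    """
    t = np.asarray(t, dtype=float)
    return a * (t / tau) ** (n - 1) * np.exp(-t / tau) / (tau * factorial(n - 1))

# Example: an 11-second response sampled every 0.5 s (parameter values illustrative).
# hrf = boynton_gamma(np.arange(0, 11, 0.5), a=1.0, tau=1.2, n=3)
```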

39 Hemodynamic Basis Functions
Idea: process response signatures are a weighted sum of basis functions.
–parameters are weights on n basis functions
–e.g. gammas with different sets of parameters
–learn process durations “for free” with variable-length basis functions
–share basis functions across voxels and processes
Questions:
–How to choose/learn the basis? (Dale99)
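A sketch of the basis-function parameterization; the basis settings and names are assumptions for illustration.

```python
import numpy as np

def response_from_basis(weights, basis):
    """Response signature as a weighted sum of basis functions.

    basis: (n_basis x T_resp) matrix, e.g. gamma functions with different
    (tau, n) settings sampled on the response time grid.
    weights: n_basis weights for one process (and one voxel, if not shared).
    """
    return np.asarray(weights) @ np.asarray(basis)

# Example, reusing boynton_gamma from the previous sketch (settings illustrative):
# t = np.arange(0, 11, 0.5)
# basis = np.stack([boynton_gamma(t, 1.0, tau, n) for tau, n in [(1.0, 2), (1.2, 3), (1.5, 4)]])
# resp = response_from_basis([0.5, 1.0, 0.2], basis)
```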

40 Schedule
August 2006
–Parameter sharing.
–Progress on model parameterization.
December 2006
–Improved expression of timing constraints.
–Corresponding updates to HPM algorithms.
June 2007
–Application of HPMs to an open cognitive science problem.
December 2007
–Projected completion.

41 References
John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111(4):1036–1060, 2004.
Geoffrey M. Boynton, Stephen A. Engel, Gary H. Glover, and David J. Heeger. Linear systems analysis of functional magnetic resonance imaging in human V1. The Journal of Neuroscience, 16(13):4207–4221, 1996.
David D. Cox and Robert L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19:261–270, 2003.
Anders M. Dale. Optimal experimental design for event-related fMRI. Human Brain Mapping, 8:109–114, 1999.
Zoubin Ghahramani and Michael I. Jordan. Factorial hidden Markov models. Machine Learning, 29:245–275, 1997.
James V. Haxby, M. Ida Gobbini, Maura L. Furey, Alumit Ishai, Jennifer L. Schouten, and Pietro Pietrini. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293:2425–2430, September 2001.
Rebecca A. Hutchinson, Tom M. Mitchell, and Indrayana Rustandi. Hidden Process Models. To appear at the International Conference on Machine Learning, 2006.
Marcel Adam Just, Patricia A. Carpenter, and Sashank Varma. Computational modeling of high-level cognition and brain function. Human Brain Mapping, 8:128–136, 1999. modeling4CAPS.htm.
Tom M. Mitchell et al. Learning to decode cognitive states from brain images. Machine Learning, 57:145–175, 2004.
Kevin P. Murphy. Dynamic Bayesian networks. To appear in Probabilistic Graphical Models, M. Jordan (ed.), November 2002.
Radu Stefan Niculescu. Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks. PhD thesis, Carnegie Mellon University, July 2005. CMU-CS.