
Improved characterization of neural and behavioral response properties using point-process state-space framework Anna Alexandra Dreyer Harvard-MIT Division of Health Sciences and Technology Speech and Hearing Bioscience and Technology Program Neurostatistics Research Laboratory, MIT PI: Emery Brown, M.D., Ph.D. September 27, 2007

Action potentials as binary events (Figure from the laboratory of Mark Ungless) Action potentials (spikes) are binary events. Cells use the timing and frequency of action potentials to communicate with neighboring cells. Most cells emit action potentials spontaneously, in the absence of stimulation. Models should begin with spikes to most accurately describe the response.

Point Process Framework: Definition of Conditional Intensity Function Given a recording interval [0, T), the counting process N(t) represents the number of spikes that have occurred on the interval [0, t). A model can be completely characterized by the conditional intensity function (CIF), which defines the instantaneous firing rate at every point in time as

$\lambda(t \mid H(t)) = \lim_{\Delta \to 0} \frac{P\{N(t+\Delta) - N(t) = 1 \mid H(t)\}}{\Delta},$

where H(t) represents the autoregressive (spiking) history up to time t. Brown et al., 2003; Daley and Vere-Jones, 2003; Brown, 2005
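The bin-wise reading of the CIF can be made concrete with a short simulation: over a small bin of width Δ, the probability of a spike is approximately λ(t | H(t))Δ. The sketch below is illustrative only; the function names and the refractory example CIF are my own assumptions, not part of the original analysis.

```python
# Minimal sketch: Bernoulli approximation of a point process driven by a CIF.
import numpy as np

rng = np.random.default_rng(0)

def simulate_point_process(cif, T=1.0, delta=1e-3):
    """Simulate spike times on [0, T) using P(spike in bin) ~ cif(t, history) * delta.

    `cif(t, spike_times)` is any user-supplied conditional intensity (spikes/s)
    that may depend on the spiking history observed so far.
    """
    spike_times = []
    for t in np.arange(0.0, T, delta):
        p = cif(t, spike_times) * delta          # first-order probability of one spike
        if rng.random() < p:
            spike_times.append(t)
    return np.asarray(spike_times)

# Example CIF: 50 spikes/s baseline, suppressed for 5 ms after each spike (refractoriness).
def example_cif(t, spike_times):
    if spike_times and (t - spike_times[-1]) < 0.005:
        return 0.0
    return 50.0

spikes = simulate_point_process(example_cif)
print(f"{len(spikes)} spikes in 1 s")
```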

Joint Probability of Spiking (Likelihood function) Discretize the interval [0, T) into B bins of width Δ. As Δ becomes increasingly small, λ(t_b | Ψ, H_b)Δ, where Ψ are the model parameters and H_b is the autoregressive history up to bin b, approaches the probability of seeing one event in a bin of width Δ. If we select a sufficiently small binwidth Δ, such that the probability of seeing more than one event in this binwidth approaches 0, the joint probability can be written as the product of independent Bernoulli events (Truccolo et al., 2005):

$P(n_{1:B} \mid \Psi) = \prod_{b=1}^{B} \left[\lambda(t_b \mid \Psi, H_b)\,\Delta\right]^{n_b} \left[1 - \lambda(t_b \mid \Psi, H_b)\,\Delta\right]^{1-n_b} + o(\Delta^{J}),$

where o(Δ^J) represents the probability of seeing two or more events on an interval (t_{b−1}, t_b]. Truccolo et al., 2005
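A minimal sketch of how this Bernoulli product is evaluated in practice as a log-likelihood; the helper name and toy data are assumptions for illustration, not the slides' code.

```python
# Bernoulli approximation to the point-process log-likelihood of a binned spike train.
import numpy as np

def bernoulli_log_likelihood(spike_counts, intensity, delta=1e-3):
    """log P(n_1..n_B | Psi) ~ sum_b [ n_b*log(lam_b*delta) + (1-n_b)*log(1-lam_b*delta) ].

    spike_counts : 0/1 array, one entry per bin of width `delta`
    intensity    : CIF evaluated in each bin (spikes/s), e.g. lambda(t_b | Psi, H_b)
    """
    n = np.asarray(spike_counts, dtype=float)
    p = np.clip(np.asarray(intensity) * delta, 1e-12, 1 - 1e-12)  # keep logs finite
    return np.sum(n * np.log(p) + (1.0 - n) * np.log(1.0 - p))

# Toy usage: constant 20 spikes/s intensity over 1 s of 1-ms bins, three spikes.
counts = np.zeros(1000); counts[[100, 400, 750]] = 1
print(bernoulli_log_likelihood(counts, np.full(1000, 20.0)))
```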

An example of using PP models to analyze auditory data: Experimental Paradigm Recordings of action potentials in response to 19 stimulus levels, with multiple repetitions of each stimulus. We need to develop an encoding model that characterizes the response to each stimulus level as well as the noise in the system. Inference: find the lowest stimulus level for which the response exceeds the system noise. Given new responses from the same cell, we need to decode the stimulus from which each response originated. Data from Lim and Anderson (2006)

Modeling example of cortical responses across stimulus levels Response characteristics include –Autoregressive components –Temporal and rate-dependent elements. To have adequate goodness-of-fit and predictive power, a model must capture these elements from the raw data. The current (rate-based) method does NOT capture typical autoregressive components.

Point process state space framework Instantaneous Firing Intensity Model:
–The firing intensity in each Δ = 1 ms bin, b, is modeled as a function of the past spiking history, H_{l,k,bΔ}, and of the effect of the stimulus (conditional firing intensity = stimulus effect × past spiking history effect).
–Observation equation: $\lambda_{l,k}(b\Delta \mid \theta_l, \gamma, H_{l,k,b\Delta}) = \exp\!\Big(\theta_{l,b} + \sum_{j=1}^{J}\gamma_j\, n_{l,k,b-j}\Big)$
–State equation: $\theta_{l+1} = \theta_l + \varepsilon_{l+1}$, where ε_{l+1} is a Gaussian random vector.
Computational methods developed with G. Czanner, U. Eden, E. Brown
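Under the reconstruction above, here is a hedged sketch of how the observation and state equations might be evaluated. The variable names, the exponential link, and the random-walk update are my assumptions about the model form, not the authors' implementation.

```python
# Sketch of an SS-GLM-style intensity: log-intensity = stimulus effect + history effect,
# with the stimulus-effect parameters following a Gaussian random walk across levels.
import numpy as np

rng = np.random.default_rng(1)

def conditional_intensity(theta_l, gamma, counts, b):
    """Observation equation for bin b (Delta = 1 ms):
    lambda_l(b) = exp( theta_{l,b} + sum_j gamma_j * n_{b-j} )."""
    J = len(gamma)
    history = counts[max(0, b - J):b][::-1]              # n_{b-1}, n_{b-2}, ...
    hist_effect = np.dot(gamma[:len(history)], history)  # past-spiking-history effect
    return np.exp(theta_l[b] + hist_effect)              # expected spikes per bin

def state_update(theta_l, state_cov):
    """State equation: theta_{l+1} = theta_l + eps_{l+1}, eps ~ N(0, state_cov)."""
    return theta_l + rng.multivariate_normal(np.zeros(len(theta_l)), state_cov)

# Toy usage: 100 one-ms bins, 3 history coefficients giving refractory-like suppression.
theta = np.full(100, np.log(0.02))          # ~20 spikes/s baseline stimulus effect
gamma = np.array([-2.0, -1.0, -0.5])
counts = np.zeros(100); counts[10] = 1
print(conditional_intensity(theta, gamma, counts, b=11))
print(state_update(theta, 0.01 * np.eye(100))[:3])
```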

Encoding and Decoding Methodology Estimation/Encoding/Inference –The Expectation-Maximization algorithm is used to estimate the model parameters –Monte Carlo techniques are used to estimate confidence bounds for the stimulus effect Goodness-of-fit –Kolmogorov-Smirnov (KS) and autocorrelation plots of the rescaled spike times Decoding and response-property inference

Expectation-Maximization algorithm Used to compute maximum likelihood (ML) parameter estimates in statistical models with hidden variables or missing data. The algorithm alternates between two steps: –an expectation (E) step, in which the expectation of the complete-data log-likelihood is computed given the current parameter estimates –a maximization (M) step, in which that expected log-likelihood is maximized with respect to the parameters. As the algorithm progresses, the initial parameter estimates are improved iteratively until they converge to the maximum likelihood estimator. Dempster et al., 1977; McLachlan and Krishnan, 1997; Pawitan, 2001
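To make the E-step / M-step structure concrete, here is a small, self-contained EM example for a two-component Poisson mixture of spike counts. This is a deliberately simpler model than the SS-GLM in these slides, and all names and data are illustrative.

```python
# EM for a two-component Poisson mixture (hidden component labels are the missing data).
import numpy as np
from scipy.stats import poisson

def em_poisson_mixture(counts, n_iter=100):
    lam = np.array([counts.mean() * 0.5, counts.mean() * 1.5])  # initial rate guesses
    w = np.array([0.5, 0.5])                                    # initial mixing weights
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from component k
        lik = np.stack([w[k] * poisson.pmf(counts, lam[k]) for k in range(2)])
        resp = lik / lik.sum(axis=0, keepdims=True)
        # M-step: maximize the expected complete-data log-likelihood
        lam = (resp @ counts) / resp.sum(axis=1)
        w = resp.mean(axis=1)
    return lam, w

rng = np.random.default_rng(2)
data = np.concatenate([rng.poisson(2, 300), rng.poisson(10, 200)])  # hidden 2-state data
print(em_poisson_mixture(data))
```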

SS-GLM model of stimulus effect The level-dependent stimulus effect captures many phenomena seen in the data –Increase of spiking with level –Spread of excitation in time It removes the effect of the autoregressive history, which is a property of the system (not of the stimulus). [Figure: estimated stimulus effect (spikes/s) as a function of time since stimulus onset (ms) and level number.]

Threshold inference based on all trials and all levels Define threshold as the first stimulus level for which we can be reasonably (probability > 0.95) certain that the response at that level differs from the noise, and continues to differ at all higher stimulus levels. For this example, the estimated threshold is level 8. Compare to the common rate-level-function methodology, which places threshold at level 11. Dreyer et al., 2007; Czanner et al., 2007
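One way to apply the stated rule in code, assuming per-level probabilities that the response exceeds the system noise are already available; the function name and the toy probabilities are hypothetical, not the original analysis.

```python
# Threshold = first level whose probability of exceeding the noise response is > 0.95
# and stays > 0.95 at every higher level.
import numpy as np

def threshold_level(p_exceeds_noise, criterion=0.95):
    """p_exceeds_noise[l] = P(response at level l differs from system noise)."""
    p = np.asarray(p_exceeds_noise)
    for l in range(len(p)):
        if np.all(p[l:] > criterion):
            return l + 1          # report 1-based stimulus level
    return None                   # no level meets the criterion

# Toy usage with 19 levels of made-up probabilities.
probs = np.concatenate([np.linspace(0.1, 0.9, 7), np.full(12, 0.99)])
print(threshold_level(probs))     # -> 8
```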

Goodness-of-fit assessment The KS plot lies close to the 45-degree line, indicating uniformity of the rescaled spike times. The autocorrelation plot shows that the Gaussian-transformed rescaled spike times are essentially uncorrelated, consistent with independence. In contrast, the KS plots for the underlying rate-based models show a very poor fit to the data. Johnson & Kotz, 1970; Brown et al., 2002; Box et al., 1994
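A hedged sketch of the time-rescaling check behind these KS and autocorrelation plots (Brown et al., 2002): under a correct model, the rescaled inter-spike intervals are Exp(1), so their uniform transform should lie on the 45-degree line. The function names and toy data are assumptions.

```python
# Time-rescaling goodness-of-fit: rescale spikes by the integrated fitted CIF.
import numpy as np

def rescaled_times(spike_bins, intensity, delta=1e-3):
    """z_k = integral of the fitted CIF between successive spikes (bin-sum approximation)."""
    cum = np.cumsum(np.asarray(intensity) * delta)      # Lambda(t) on the bin grid
    spike_cum = cum[np.asarray(spike_bins)]
    return np.diff(np.concatenate([[0.0], spike_cum]))  # Exp(1) under a correct model

def ks_statistic(z):
    u = np.sort(1.0 - np.exp(-z))                       # Uniform(0,1) under a correct model
    model_cdf = (np.arange(1, len(u) + 1) - 0.5) / len(u)
    return np.max(np.abs(u - model_cdf))                # distance from the 45-degree line

# Toy usage: constant 20 spikes/s model evaluated on a fake spike train.
intensity = np.full(5000, 20.0)                         # 5 s of 1-ms bins
spike_bins = np.arange(50, 5000, 50)                    # one spike every 50 ms
print(ks_statistic(rescaled_times(spike_bins, intensity)))
```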

Decoding based on a single trial New data are decoded using the estimated encoding parameters. Given a spike train n_{l*}, estimate the likelihood that it came from any stimulus s_{l'} in our encoding model. Calculate the likelihood for all stimuli s_{1:L}, and take the most likely level as the decoded stimulus.
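A sketch of this maximum-likelihood decoder using the same Bernoulli log-likelihood as above; the function names and toy models are illustrative, not the original code.

```python
# Score the observed spike train under each level's fitted intensity and pick the best.
import numpy as np

def decode_level(spike_counts, intensities_by_level, delta=1e-3):
    """Return the 1-based level maximizing log P(n_{l*} | s_{l'}) plus all log-likelihoods."""
    n = np.asarray(spike_counts, dtype=float)
    loglik = []
    for lam in intensities_by_level:                       # lam: fitted CIF for one level
        p = np.clip(np.asarray(lam) * delta, 1e-12, 1 - 1e-12)
        loglik.append(np.sum(n * np.log(p) + (1 - n) * np.log(1 - p)))
    loglik = np.array(loglik)
    return int(np.argmax(loglik)) + 1, loglik

# Toy usage: 3 candidate levels with rates 5, 20, 60 spikes/s; data generated at ~20 Hz.
rng = np.random.default_rng(3)
counts = (rng.random(1000) < 20 * 1e-3).astype(float)
models = [np.full(1000, r) for r in (5.0, 20.0, 60.0)]
print(decode_level(counts, models)[0])                     # most likely level index
```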

Single-trial threshold inference using ML-based decoding is more sensitive near threshold than an ROC analysis based on the number of spikes. The area under the ROC curve specifies the probability that, when two responses are drawn, one from a lower level and one from a higher level, the algorithm assigns the larger value to the draw from the higher level.
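The pairwise definition of the ROC area translates directly into code; this small illustrative function (not from the slides) counts ties as one half.

```python
# AUC as the probability that a higher-level response outranks a lower-level response.
import numpy as np

def auc_pairwise(lower_values, higher_values):
    lo = np.asarray(lower_values)[:, None]
    hi = np.asarray(higher_values)[None, :]
    return np.mean((hi > lo) + 0.5 * (hi == lo))   # average over all lower/higher pairs

# Toy usage with spike counts from a "below threshold" and an "above threshold" level.
print(auc_pairwise([2, 3, 1, 4, 2], [5, 4, 7, 6, 3]))
```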

Decoding across multiple trials improves performance
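A minimal sketch of why multiple trials help, assuming trials are conditionally independent so their log-likelihoods add before the decoded level is chosen; this combination rule is my assumption for illustration, not necessarily the authors' exact procedure.

```python
# Combine per-trial evidence by summing log-likelihoods, then decode.
import numpy as np

def decode_level_multi_trial(trial_logliks):
    """trial_logliks: array of shape (n_trials, n_levels) of per-trial log-likelihoods."""
    total = np.sum(np.asarray(trial_logliks), axis=0)   # summed evidence across trials
    return int(np.argmax(total)) + 1                    # 1-based decoded level

# Toy usage: 4 trials, 3 levels; level 2 is favored on most trials.
ll = np.array([[-10, -4, -9], [-8, -5, -7], [-9, -3, -8], [-7, -6, -5]])
print(decode_level_multi_trial(ll))                     # -> 2
```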

Neural Model Conclusions This methodology has potential for characterizing the behavior of any noisy system in which separating signal from noise is important for predicting responses to future stimuli.

Bayesian techniques – an alternative to frequentist estimation Use Bayesian sampling techniques to: –Estimate behavioral responses to auditory stimuli –Apply the methodology used for auditory encoding models to learning experiments, to discover the neural mechanisms that encode behavioral learning in the basal ganglia. In collaboration with B. Pfingst, A. Smith, A. Graybiel, E. Brown

Bayesian sampling methodology The goal is to compute the posterior probability density of the parameters and the hidden state given the data. Use Markov chain Monte Carlo (MCMC) methods to approximate this posterior by simulating Markov chains whose stationary distribution is the posterior. MCMC methods provide an approximate posterior probability density for the parameters, from which credible intervals (the Bayesian analogue of confidence intervals for unknown parameters) can be computed. Gilks et al., 1996; Congdon, 2003
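A hedged sketch of the simplest MCMC scheme, a random-walk Metropolis sampler applied to a toy Poisson-rate posterior; the actual analysis would target the SS-GLM posterior, and every name and number here is illustrative.

```python
# Random-walk Metropolis: the chain's stationary distribution is the target posterior;
# credible intervals are read off the sampled quantiles.
import numpy as np

rng = np.random.default_rng(4)

def metropolis(log_post, x0, n_samples=5000, step=0.5):
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n_samples):
        prop = x + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob min(1, posterior ratio)
            x, lp = prop, lp_prop
        chain.append(x)
    return np.array(chain)

# Toy posterior: rate of Poisson counts under a flat prior (log-posterior up to a constant).
counts = np.array([3, 5, 4, 6, 2])
log_post = lambda lam: -np.inf if lam <= 0 else np.sum(counts * np.log(lam) - lam)
chain = metropolis(log_post, x0=4.0)
print(np.percentile(chain[1000:], [2.5, 50, 97.5]))   # 95% credible interval and median
```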

References
Box GEP, Jenkins GM, Reinsel GC. Time Series Analysis, Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
Brown EN. Theory of point processes for neural systems. In: Chow CC, Gutkin B, Hansel D, Meunier C, Dalibard J, eds. Methods and Models in Neurophysics. Paris: Elsevier, 2005, Chapter 14.
Brown EN, Barbieri R, Eden UT, Frank LM. Likelihood methods for neural data analysis. In: Feng J, ed. Computational Neuroscience: A Comprehensive Approach. London: CRC, 2003, Chapter 9.
Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput. 2002; 14.
Congdon P. Applied Bayesian Modelling. Chichester, UK: John Wiley and Sons, 2003.
Czanner G, Dreyer AA, Eden UT, Wirth S, Lim HH, Suzuki W, Brown EN. Dynamic models of neural spiking activity. IEEE Conference on Decision and Control, December 2007.
Daley D, Vere-Jones D. An Introduction to the Theory of Point Processes. 2nd ed. New York: Springer-Verlag, 2003.
Dempster A, Laird N, Rubin D. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977; 39(1).
Dreyer AA, Czanner G, Eden UT, Lim HH, Anderson DJ, Brown EN. Enhanced auditory neural threshold detection using a point process state-space model analysis. Computational Systems Neuroscience Conference (COSYNE), February 2007.
Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. New York: Chapman and Hall/CRC, 1996.
Johnson A, Kotz S. Distributions in Statistics: Continuous Univariate Distributions. New York: Wiley, 1970.
Lim HH, Anderson DJ. Auditory cortical responses to electrical stimulation of the inferior colliculus: implications for an auditory midbrain implant. J. Neurophysiol. 2006; 96(3).
McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. New York: John Wiley & Sons, 1997.
Pawitan Y. In All Likelihood: Statistical Modeling and Inference Using Likelihood. New York: Oxford University Press, 2001.
Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. J. Neurophysiol. 2005; 93.