Real-time optimization of neurophysiology experiments
Jeremy Lewi 1, Robert Butera 1, Liam Paninski 2
1 Department of Bioengineering, Georgia Institute of Technology; 2 Department of Statistics, Columbia University

Neural encoding
The neural code: what is p(response | stimulus)?
Main question: how can we estimate p(r|x) from (sparse) experimental data?

Curse of dimensionality
Both stimuli and responses can be very high-dimensional.
Stimuli: images, sounds, time-varying behavior.
Responses: observations from single or multiple simultaneously recorded point processes.

Not all experiments are equally informative
[Figure: the set of possible p(r|x), and the smaller sets of possible p(r|x) remaining after experiment A vs. after experiment B.]
Goal: constrain the set of possible systems as much as possible.
How: maximize the mutual information I({experiment}; {possible systems}).

Adaptive optimal design of experiments
Assume: a parametric model p(r|x,θ) of responses r to stimulus x, and a prior distribution p(θ) on a finite-dimensional parameter space.
Goal: estimate θ from the data.
Usual approach: draw stimuli i.i.d. from a fixed p(x).
Adaptive approach: choose p(x) on each trial to maximize I(θ;{r,x}).
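In symbols, the adaptive rule picks each new stimulus to maximize the expected information about θ given everything observed so far, which is equivalent to minimizing the expected posterior entropy (a sketch of the criterion, using t to index trials):

```latex
x_{t+1} \;=\; \arg\max_{x}\; I\bigl(\theta;\, r \,\bigm|\, x,\ \{x_i, r_i\}_{i \le t}\bigr)
```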

Theory: info. max. is better
1. Info. max. is in general more efficient, and never worse, than random sampling [Paninski 2005].
2. Gaussian approximations are asymptotically accurate.

Computational challenges
1. Updating the posterior p(θ|x,r): high-dimensional posteriors are difficult to represent and manipulate.
2. Maximizing the mutual information I(r;θ|x): high-dimensional integration and high-dimensional optimization.
3. Computations need to be performed quickly (10 ms – 1 s): speed limits the number of trials.

Solution overview
1. Model responses using a 1-d GLM: computationally tractable.
2. Approximate the posterior as Gaussian: easy to work with even in high dimensions.
3. Reduce the optimization of the mutual information to a 1-d problem.

Neural model: GLM
We model the neuron using a generalized linear model (GLM) whose output is the expected firing rate. The nonlinear stage is the exponential function, which also ensures that the log likelihood is a concave function of θ.
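A minimal simulation sketch of this response model, assuming Poisson spiking with expected rate exp(θᵀx); the dimensionality, bin length, and variable names below are illustrative, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50                                             # stimulus dimensionality (illustrative)
theta_true = rng.standard_normal(d) / np.sqrt(d)   # unknown parameter vector
dt = 0.01                                          # bin length in seconds (illustrative)

def simulate_response(x, theta, dt):
    """Draw a spike count for stimulus x from an exponential-Poisson GLM."""
    rate = np.exp(theta @ x)                       # expected firing rate: exp(theta . x)
    return rng.poisson(rate * dt)                  # observed spike count in one bin

x = rng.standard_normal(d)                         # an example stimulus
r = simulate_response(x, theta_true, dt)           # its (random) response
```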

GLM: computationally tractable
1. The log likelihood is concave in θ.
2. The log likelihood depends on θ only through the one-dimensional projection θᵀx.

Updating the posterior
1. Approximate the posterior as Gaussian: the posterior is a product of log-concave functions, and the posterior distribution is asymptotically Gaussian.
2. Use a Laplace approximation to determine the parameters of the Gaussian, μ_t and C_t:
   μ_t = the peak of the posterior;
   C_t = the inverse of the negative Hessian of the log posterior, evaluated at the peak.
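Written out, the Laplace step is the standard one (a sketch in the notation above):

```latex
p(\theta \mid x_{1:t}, r_{1:t}) \;\approx\; \mathcal{N}(\theta;\, \mu_t, C_t),
\qquad
\mu_t = \arg\max_{\theta}\ \log p(\theta \mid x_{1:t}, r_{1:t}),
\qquad
C_t = \Bigl[-\nabla^2_{\theta} \log p(\theta \mid x_{1:t}, r_{1:t})\big|_{\theta=\mu_t}\Bigr]^{-1}.
```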

Updating the posterior (continued)
3. The update is rank 1: log posterior = log prior + log likelihood, and each trial's log likelihood contributes a rank-1 term to the Hessian.
4. Find the peak: Newton's method in 1-d.
5. Invert the Hessian: use the Woodbury lemma, for O(d²) time.
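A sketch of one such update for an exponential-Poisson GLM without spike-history terms (not the authors' code; the tolerance and iteration cap are illustrative):

```python
import numpy as np

def update_posterior(mu, C, x, r, dt):
    """One Laplace-approximation update of the Gaussian posterior N(mu, C).

    The log likelihood depends on theta only through rho = theta . x, so the
    new peak lies on the line mu + a * C @ x and can be found by a 1-d Newton
    iteration; the covariance update is rank one, so the Woodbury lemma gives
    the new covariance in O(d^2) time.
    """
    Cx = C @ x                       # O(d^2)
    s = float(x @ Cx)                # x' C x
    b = float(mu @ x)                # mu' x

    # 1-d Newton solve for a in: a = r - dt * exp(b + a * s)
    a = 0.0
    for _ in range(50):
        g = a - r + dt * np.exp(b + a * s)
        g_prime = 1.0 + dt * s * np.exp(b + a * s)
        step = g / g_prime
        a -= step
        if abs(step) < 1e-12:
            break

    mu_new = mu + a * Cx             # new posterior mean (the MAP estimate)
    lam = np.exp(b + a * s)          # firing rate at the peak
    c = lam * dt                     # curvature of the log likelihood in rho
    C_new = C - (c / (1.0 + c * s)) * np.outer(Cx, Cx)   # rank-1 Woodbury update
    return mu_new, C_new
```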

Choosing the optimal stimulus
Maximize the mutual information ⇔ minimize the expected posterior entropy.
The posterior is Gaussian: compute the expected determinant, and simplify using matrix perturbation theory.
Result: maximize an expression for the expected Fisher information.
Maximization strategy:
– impose a power constraint on the stimulus;
– perform an eigendecomposition of C_t;
– simplify using Lagrange multipliers;
– find the solution by a 1-d numerical optimization.
Bottleneck: the eigendecomposition, which takes O(d²) in practice.
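The eigendecomposition-plus-Lagrange-multiplier search is what makes this step fast; as a slower but self-contained illustration of the objective itself, the sketch below scores candidate stimuli on the power-constrained sphere by a Monte-Carlo estimate of the expected entropy decrease, assuming the exponential-Poisson GLM and Gaussian posterior above (function names and sample counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_info_gain(x, mu, C, dt, n_samples=200):
    """Monte-Carlo estimate of the expected posterior-entropy decrease for x.

    For the exponential-Poisson GLM the single-trial observed Fisher
    information is rank one, so the log-determinant update reduces to the
    scalar log(1 + exp(rho) * dt * x'Cx), with rho = theta . x Gaussian
    under the current posterior N(mu, C).
    """
    s = float(x @ (C @ x))            # posterior variance of rho
    b = float(mu @ x)                 # posterior mean of rho
    rho = rng.normal(b, np.sqrt(s), n_samples)
    return 0.5 * np.mean(np.log1p(np.exp(rho) * dt * s))

def best_random_stimulus(mu, C, dt, power, n_candidates=500):
    """Brute-force stand-in for the fast search: score random stimuli with
    ||x||^2 = power and keep the best one."""
    best_x, best_val = None, -np.inf
    for _ in range(n_candidates):
        x = rng.standard_normal(mu.shape[0])
        x *= np.sqrt(power) / np.linalg.norm(x)
        val = expected_info_gain(x, mu, C, dt)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```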

Running time (d = stimulus dimensionality)
1. Updating the posterior: O(d²)
2. Eigendecomposition: O(d²)
3. Choosing the stimulus: O(d)

Simulation setup
Compare: random vs. information-maximizing stimuli.
Objective: learn the parameters θ.

A Gabor receptive field (25×33 pixels, i.e. high-dimensional)
Info. max. converges to the true receptive field, and converges faster than random sampling.

Non-stationary parameters
Biological systems are non-stationary:
– degradation of the preparation
– fatigue
– attentive state
Use a Kalman-filter-type approach: model the slow changes in θ as a diffusion.
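A minimal sketch of the corresponding prediction step, assuming a random-walk drift θ_{t+1} = θ_t + ε with ε ~ N(0, qI); the diffusion scale q is illustrative, not a value from the slides:

```python
import numpy as np

def diffuse_posterior(mu, C, q=1e-4):
    """Kalman-filter-style prediction step for slowly drifting parameters.

    Under the random-walk model theta_{t+1} = theta_t + noise, noise ~ N(0, q*I),
    the posterior mean is unchanged and the covariance grows, so older data are
    gradually discounted and the estimated certainty never becomes overconfident.
    """
    return mu, C + q * np.eye(C.shape[0])
```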

Non-stationary parameters (simulation)
The θ_i follow a Gaussian curve whose center moves randomly over time.

Non-stationary parameters (continued)
Assuming θ is constant overestimates our certainty, which leads to poor choices of "optimal" stimuli.

Conclusions
1. Efficient implementation is achievable with:
   1. model-based approximations (the model is specific but reasonable);
   2. a Gaussian approximation of the posterior (justified by the theory);
   3. reduction of the optimization to a 1-d problem.
2. The assumptions are weaker than those typically required for system identification in high dimensions.
3. This efficiency could permit system identification in previously intractable systems.

References
1. A. Watson, et al., Perception and Psychophysics 33, 113 (1983).
2. M. Berry, et al., J. Neurosci. (1998).
3. L. Paninski, Neural Computation 17, 1480 (2005).
4. P. McCullagh, et al., Generalized Linear Models (Chapman and Hall, London, 1989).
5. L. Paninski, Network: Computation in Neural Systems 15, 243 (2004).
6. E. Simoncelli, et al., in The Cognitive Neurosciences, M. Gazzaniga, ed. (MIT Press, 2004), third edn.
7. M. Gu, et al., SIAM Journal on Matrix Analysis and Applications 15, 1266 (1994).
8. E. Chichilnisky, Network: Computation in Neural Systems 12, 199 (2001).
9. F. Theunissen, et al., Network: Computation in Neural Systems 12, 289 (2001).
10. L. Paninski, et al., Journal of Neuroscience 24, 8551 (2004).

Acknowledgements
This work was supported by the Department of Energy Computational Science Graduate Fellowship Program of the Office of Science and National Nuclear Security Administration in the Department of Energy under contract DE-FG02-97ER25308, and by the NSF IGERT Program in Hybrid Neural Microsystems at Georgia Tech via grant number DGE.

Spike history
[Figure: posterior mean after 500 trials, showing the stimulus filter and the spike-history filter.]

Previous work
System identification:
1. Minimize the variance of the parameter estimate: decide, among a menu of experiments, which to conduct [Flaherty 05].
2. Maximize the divergence of the predicted responses of competing models [Dunlop 06].
Optimal encoding:
1. Maximize the mutual information between input and output [Machens 02].
2. Maximize the response: hill-climbing to find stimuli to which monkey V1 neurons respond strongly [Foldiak 01]; efficient stimuli for cat auditory cortex [Nelken 01].
3. Minimize the stimulus reconstruction error [Edin 04].

Derivation of choosing the stimulus, I
We choose the stimulus by maximizing the conditional mutual information between the response and θ. Neglecting higher-order terms, we just need to maximize the expected information gain.

Derivation of choosing the stimulus, II
So we just need to minimize the expected entropy of the (Gaussian) posterior, and therefore to maximize the expected increase in the log determinant of its inverse covariance.
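One way to write out these steps, as a sketch under the exponential-Poisson GLM and Gaussian-posterior approximations used above (not taken verbatim from the slides):

```latex
% Entropy of the Gaussian posterior after the next trial, with the
% rank-one observed Fisher information of a single trial:
H\bigl[\mathcal{N}(\mu_{t+1}, C_{t+1})\bigr] \;=\; \tfrac{1}{2}\log\det\bigl(2\pi e\, C_{t+1}\bigr),
\qquad
C_{t+1}^{-1} \;=\; C_t^{-1} + e^{\theta^\top x}\,\Delta t\; x x^\top .

% The expected entropy decrease therefore reduces to a scalar expression,
\mathbb{E}\bigl[\Delta H\bigr]
  \;=\; \tfrac{1}{2}\,\mathbb{E}_{\theta}\!\left[\log\bigl(1 + e^{\theta^\top x}\,\Delta t\; x^\top C_t x\bigr)\right],
\qquad \theta \sim \mathcal{N}(\mu_t, C_t),

% which depends on the stimulus x only through the two scalars
% \mu_t^\top x and x^\top C_t x.
```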

Maximization
We maximize the above subject to a power constraint by breaking it into an inner and an outer problem. To maximize the expression, we express everything in terms of the eigenvectors of C_t; the working variables are the projections of the mean and of the stimulus onto those eigenvectors.
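A sketch of that change of variables; the symbols y and μ̃ are illustrative names for the projected stimulus and mean:

```latex
C_t = V \Lambda V^\top, \quad \Lambda = \mathrm{diag}(c_1,\dots,c_d),
\qquad y = V^\top x, \quad \tilde{\mu} = V^\top \mu_t .

% An orthogonal change of basis preserves the power constraint and turns
% the two scalars the objective depends on into simple sums:
\|x\|^2 = \|y\|^2 \le m,
\qquad
\mu_t^\top x = \tilde{\mu}^\top y,
\qquad
x^\top C_t x = \sum_i c_i\, y_i^2 .
```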

Maximization, II
We maximize the inner problem using Lagrange multipliers. To find the global maximum, we perform a 1-d search over the multiplier λ₁: for each λ₁ we compute F(y(λ₁)), and we then choose the stimulus that maximizes F(y(λ₁)).

Posterior Update: Math