2011 COURSE IN NEUROINFORMATICS MARINE BIOLOGICAL LABORATORY WOODS HOLE, MA GENERALIZED LINEAR MODELS Uri Eden BU Department of Mathematics and Statistics August 17, 2011

Outline
– Quick introduction to GLM theory
– GLM model for inhomogeneous Poisson spiking
– History-dependent GLM model of retinal neurons in culture
– A GLM model of learning in behavioral experiments

Simple linear regression How does the height of a son depend on the height of his father? [Figure: scatter plot of heights, x-axis: Father's Height (inches)]

Simple linear regression How does the height of a son depend on the height of his father? [Figure: the same scatter plot with three candidate regression lines labeled A, B, C] Which is the correct regression line?

Count data Linear regression methods are not well suited for count data. [Figure: spike counts vs. Time (msec)]

Binary data Linear regression methods are not well suited for binary data. [Figure: binary responses by trial and time (msec)]

Generalized Linear Models Linear regression models of the form
$y_i = \beta_0 + \sum_j \beta_j x_{ij} + \varepsilon_i, \quad \varepsilon_i \sim N(0,\sigma^2),$
are useful for relating continuous valued observations to a set of covariates. Many types of data cannot be described by a Gaussian additive noise model. Generalized linear models extend this simple class of models to these data types. Count data: $y_i \sim \text{Poisson}(\lambda_i)$. Binary data: $y_i \sim \text{Bernoulli}(p_i)$.

The Natural Exponential Family A probability model for the data $\{y_1,\ldots,y_n\}$ is in the natural exponential family if you can write its likelihood in the form
$L(\theta) = \prod_{i=1}^{n} \exp\{\theta\, y_i - b(\theta) + c(y_i)\}.$
Some common distributions in the exponential family include: Normal, Bernoulli, binomial, Poisson, gamma, beta, exponential, chi-square, lognormal, …

Generalized Linear Models Set the link function, $g(\cdot)$, applied to the mean of the data, to be a linear function of the covariates:
$g(E[y_i]) = \sum_j \beta_j x_{ij}.$
Differentiate the log likelihood with respect to the parameters, set equal to zero, and solve the resulting system of equations of the form
$\frac{\partial \log L(\beta)}{\partial \beta_j} = 0, \quad j = 1,\ldots,p.$
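A minimal sketch of what solving these likelihood equations looks like for a Poisson GLM with a log link, using Newton's method on the score equations; the design matrix X and counts y are simulated here purely for illustration:

```python
import numpy as np

def fit_poisson_glm_newton(X, y, n_iter=25, tol=1e-8):
    """Solve the Poisson GLM score equations X'(y - exp(X beta)) = 0 by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                        # linear predictor
        mu = np.exp(eta)                      # Poisson mean via the log link
        score = X.T @ (y - mu)                # gradient of the log likelihood
        hessian = -X.T @ (mu[:, None] * X)    # Hessian (negative Fisher information)
        step = np.linalg.solve(hessian, score)
        beta = beta - step                    # Newton update: beta - H^{-1} * score
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy example with simulated covariates (assumed, for illustration only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ true_beta))
print(fit_poisson_glm_newton(X, y))           # should land close to [0.5, 1.0]
```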

The Natural Exponential Family Poisson data:
$P(y_i = k) = \frac{\lambda^k e^{-\lambda}}{k!} = \exp\{k \log\lambda - \lambda - \log k!\},$
so the natural parameter is $\log\lambda$ and the link function is $g(\lambda) = \log\lambda$.

The Natural Exponential Family Binomial (Bernoulli) data:
$P(y_i) = p^{y_i}(1-p)^{1-y_i} = \exp\left\{y_i \log\tfrac{p}{1-p} + \log(1-p)\right\},$
so the natural parameter is $\log\frac{p}{1-p}$ and the link function is the logit, $g(p) = \log\frac{p}{1-p}$.

GLM Models for Spike Data

Distribution | Link Function | Equation
Binomial     | logit         | $\log\frac{p_k}{1-p_k} = \sum_j \beta_j x_{jk}$
Poisson      | log           | $\log\lambda_k = \sum_j \beta_j x_{jk}$

Fitting GLM As with ISI models, use maximum likelihood to obtain GLM parameters. In general it is not possible to obtain a closed form solution for the ML estimator or for its distribution. So, use your favorite numerical optimization technique (such as Newton’s method), or my favorite: MATLAB™
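In MATLAB the fit is a one-liner with glmfit (e.g., b = glmfit(x, y, 'poisson')). A minimal Python equivalent using statsmodels, with simulated counts standing in for real spike data, might look like this:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
X = sm.add_constant(x)                       # design matrix with an intercept column
y = rng.poisson(np.exp(0.2 + 0.8 * x))       # simulated counts (illustration only)

model = sm.GLM(y, X, family=sm.families.Poisson())   # log link is the canonical default
result = model.fit()                                  # iteratively reweighted least squares
print(result.params)         # ML estimates
print(result.bse)            # asymptotic standard errors
print(result.aic)            # AIC, for model comparison
```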

Stochastic Models [Diagram of model classes: Linear Regression, Generalized Linear Models (GLM), Neural Spiking Models]
Properties of GLM:
– Convex likelihood surface
– Estimators asymptotically have minimum MSE

Case 1: Inhomogeneous Poisson Model Construct an inhomogeneous Poisson spiking model for repeated trial data as a function of time. [Figure: spike raster, Trial vs. Time (ms)]

Case 1: Inhomogeneous Poisson Model Construct an inhomogeneous Poisson spiking model for repeated trial data as a function of time. [Figure: Firing Rate (Hz) vs. Time (ms)]

Case 1: Inhomogeneous Poisson Model For an inhomogeneous Poisson model for repeated trial data as a function of time:
– Polynomial model: $\lambda(t) = \exp\left(\sum_{j=0}^{p} \theta_j t^j\right)$, or equivalently: $\log\lambda(t) = \sum_{j=0}^{p} \theta_j t^j$
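A minimal sketch of fitting this polynomial model with a Poisson GLM, assuming the repeated-trial spikes have been binned and summed across trials; simulated data stand in for the recording here:

```python
import numpy as np
import statsmodels.api as sm

# Simulated repeated-trial spike counts (illustration only; replace with real binned data)
rng = np.random.default_rng(2)
dt, T, n_trials = 0.001, 1.0, 50                          # 1 ms bins, 1 s trials
t = np.arange(0, T, dt)
true_rate = 20 + 40 * np.exp(-((t - 0.4) / 0.1) ** 2)     # Hz
counts = rng.poisson(true_rate * dt, size=(n_trials, t.size)).sum(axis=0)

# Design matrix: polynomial in time (t already lies in [0, 1), which keeps it well scaled)
order = 5
X = np.column_stack([t ** j for j in range(order + 1)])

fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
rate_hat = np.exp(X @ fit.params) / (n_trials * dt)       # back to a per-trial rate in Hz
```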

Case 1: Inhomogeneous Poisson Model Inhomogeneous Poisson GLM using a 5th-order polynomial in time. [Figure: fitted Firing Rate (Hz) vs. Time (ms)]

Case 1: Inhomogeneous Poisson Model Inhomogeneous Poisson GLM using a 50th-order polynomial in time. [Figure: fitted Firing Rate (Hz) vs. Time (ms)]

Goodness-of-fit measures Akaike's Information Criterion:
$\mathrm{AIC} = -2\log L(\hat\theta) + 2p$
For maximum likelihood estimates it measures the trade-off between maximizing the likelihood (minimizing $-2\log L$) and the number of parameters the model requires. Selecting the (parsimonious) model that minimizes the AIC:
– Helps prevent overfitting
– Is asymptotically equivalent to complete leave-one-out cross-validation
– Asymptotically minimizes the KL distance between the selected model and the true unknown model
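A sketch of AIC-based order selection for the polynomial Poisson model, reusing the t and counts variables from the previous sketch:

```python
import numpy as np
import statsmodels.api as sm

def poly_design(t, order):
    """Polynomial-in-time design matrix (intercept through t**order)."""
    return np.column_stack([t ** j for j in range(order + 1)])

# `t` and `counts` as in the previous sketch (assumed already defined)
aic = {}
for order in range(1, 11):   # for much higher orders an orthogonal polynomial basis is preferable
    fit = sm.GLM(counts, poly_design(t, order), family=sm.families.Poisson()).fit()
    aic[order] = fit.aic     # equivalently: -2 * fit.llf + 2 * (order + 1)

best_order = min(aic, key=aic.get)
print(best_order, aic[best_order])
```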

Case 1: Inhomogeneous Poisson Model [Figure: AIC vs. model order]

Case 1: Inhomogeneous Poisson Model For an inhomogeneous Poisson model for repeated trial data as a function of time:
– Spline model: $\log\lambda(t) = \sum_j \theta_j g_j(t)$, where $g_j(t)$ are spline basis functions
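A sketch of the spline version, building a cubic B-spline basis with SciPy and reusing the binned counts from the polynomial sketch; the number of basis functions and the knot placement are arbitrary choices made only for illustration:

```python
import numpy as np
import statsmodels.api as sm
from scipy.interpolate import BSpline

def bspline_basis(t, n_basis=10, degree=3):
    """Cubic B-spline basis evaluated at times t, with equally spaced interior knots."""
    inner = np.linspace(t.min(), t.max(), n_basis - degree + 1)
    knots = np.concatenate([[inner[0]] * degree, inner, [inner[-1]] * degree])
    return np.column_stack([
        BSpline(knots, np.eye(n_basis)[j], degree, extrapolate=False)(t)
        for j in range(n_basis)
    ])

# `t`, `counts`, `n_trials`, `dt` as in the polynomial sketch above (assumed)
G = np.nan_to_num(bspline_basis(t, n_basis=10))   # NaN only arises outside the knot span
fit = sm.GLM(counts, G, family=sm.families.Poisson()).fit()
rate_hat = np.exp(G @ fit.params) / (n_trials * dt)   # fitted per-trial rate in Hz
```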

Case 1: Inhomogeneous Poisson Model Inhomogeneous Poisson GLM using a spline fit in time. [Figure: fitted Firing Rate (Hz) vs. Time (ms)]

Case 2: An Analysis of the Spiking Activity of Retinal Neurons in Culture (Iyengar and Liu, 1997) Retinal neurons are grown in culture under constant light and environmental conditions. The spontaneous spiking activity of these neurons is recorded. The objective is to develop a statistical model which accurately describes the stochastic structure of this activity.

Discrete Time Spike Train Data [Figure: a spike train partitioned into time bins with spike indicators $\Delta N_1, \Delta N_2, \ldots, \Delta N_K$]
$\Delta N_k$ is the spike indicator function in interval $k$.
$\lambda_k$ is the intensity of spiking at time $k$, which in the limit of small bins is given by the conditional intensity
$\lambda(t \mid H_t) = \lim_{\Delta \to 0} \frac{\Pr(\text{spike in } (t, t+\Delta] \mid H_t)}{\Delta}.$

GLM History Model The ISI distribution models we constructed assume that spiking depends only on the time since the last spike. Now, let the conditional intensity be a function of past spiking activity using a GLM:
$\log \lambda_k = \theta_0 + \sum_{j=1}^{p} \theta_j \,\Delta N_{k-j}$
How do we pick a model order $p$?
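A minimal sketch of the history-dependent GLM: bin the spike train into 1 ms indicators, build lagged spike-history covariates, and fit a Poisson GLM with a log link. The spike times here are a random placeholder, not the retinal data:

```python
import numpy as np
import statsmodels.api as sm

# Bin a spike train into 1 ms indicators dN_k (spike_times simulated here for illustration)
rng = np.random.default_rng(3)
dt, T = 0.001, 60.0
spike_times = np.sort(rng.uniform(0, T, size=600))            # placeholder spike train
dN = np.histogram(spike_times, bins=np.arange(0, T + dt, dt))[0].clip(max=1)

# Design matrix: intercept plus the previous p spike indicators (lags 1..p bins)
p = 14
lags = np.column_stack([np.roll(dN, j) for j in range(1, p + 1)])[p:]
X = sm.add_constant(lags)
y = dN[p:]

# Poisson GLM with log link: log lambda_k = theta_0 + sum_j theta_j * dN_{k-j}
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)        # theta_0 and the history coefficients
```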

Partial Correlogram

GLM Coefficients [Figure: Coefficient Value vs. Lag (msec)]

AIC Results [Figure: AIC vs. GLM Order]

KS Plots A graphical measure of goodness-of-fit, based on time rescaling, comparing an empirical and a model cumulative distribution function. If the model is correct, then the rescaled ISIs are independent, identically distributed random variables whose KS plot should produce a 45° line [Ogata, 1988].
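A sketch of the time-rescaling computation behind the KS plot, assuming a fitted intensity function rate_fn evaluated on a time grid; the rescaled ISIs are compared against uniform quantiles:

```python
import numpy as np

def ks_plot_points(spike_times, rate_fn, t_grid):
    """Time-rescaling KS plot coordinates: sorted rescaled ISIs vs. uniform quantiles."""
    lam = rate_fn(t_grid)
    # Integrated intensity on the grid (trapezoidal rule), then at the spike times
    cum = np.concatenate([[0.0],
                          np.cumsum(np.diff(t_grid) * 0.5 * (lam[1:] + lam[:-1]))])
    Lambda = np.interp(spike_times, t_grid, cum)
    z = 1.0 - np.exp(-np.diff(Lambda))       # rescaled ISIs; Uniform(0,1) if the model is correct
    z_sorted = np.sort(z)
    n = z_sorted.size
    uniform_quantiles = (np.arange(1, n + 1) - 0.5) / n
    ks_stat = np.max(np.abs(z_sorted - uniform_quantiles))
    return uniform_quantiles, z_sorted, ks_stat
```

Plotting z_sorted against uniform_quantiles should hug the 45° line when the model is adequate; approximate 95% bounds are about ±1.36/√n around that line.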

Kolmogorov-Smirnov Plots [Figure: KS plot, Model CDF vs. Uniform Quantiles]

KS Plots for Different Order GLMs [Figure: KS plots, Model CDF vs. Uniform Quantiles, for GLMs of different orders]

Correlation Function for Rescaled ISIs [Figure: Correlation vs. ISI Lag Order (rescaled to Gaussian)]

GLM Model Classes [Figure: KS plots, Model CDF vs. Uniform Quantiles, for different GLM model classes]

AIC and KS Statistics [Table: Order, AIC, and KS statistic for the Poisson and Binomial GLMs; KS statistics for the parametric models: Exponential, Gamma, Inverse Gaussian]

ISI Histogram [Figure: Probability Density vs. ISI (msec), with fitted Exponential, Gamma, Inverse Gaussian, and Order 50 GLM models]

Inferences and Conclusions Iyengar and Liu showed that a generalized inverse Gaussian model described these data well. The fit of the history-dependent GLM model improves appreciably on the fits of the exponential, gamma and inverse Gaussian models, most notably in terms of KS plots. Our analysis shows that the GLM model describes the essential stochastic features in the data. There is a significant history dependence in the retinal neural spiking data extending back 14 msec. There is another effect going back approximately 100 msec. The shorter time-scale phenomena may reflect intrinsic dynamics of the individual neuron, whereas the longer time-scale effects may also include network dynamics.

Remarks
1. Only 14 parameters are used to fit ~30,000 data points!
2. This type of strong history-dependent effect is something we have seen in neurons from a number of different brain regions, animal models and experimental protocols. It was all simply described by GLM fitting.
Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble and covariate effects. Journal of Neurophysiology, 2005, 93: 1074-1089.
Kass RE, Ventura V, Brown EN. Statistical issues in the analysis of neuronal data. Journal of Neurophysiology, 2005, 94: 8-25.

Relating Learning Dynamically to Neural Activity
Single cell recording in monkey hippocampus
Trial-and-error learning of association between picture and response
Wirth et al., Science, 2003; Smith et al., Neural Computation, 2003; Smith et al., Journal of Neuroscience, 2004

Behavioral Learning Experiment (Binary Regression) Monkeys were trained to saccade to one of four targets, based on displayed images. Each day the monkey had to learn new associations between images and targets. Single cell recording in monkey hippocampus

Behavioral Learning Data Model Question: Can we estimate a learning curve to characterize performance as a function of trial? Can we use it to establish a learning criterion? Logistic regression:
$\log\left(\frac{p_k}{1-p_k}\right) = \beta_0 + \beta_1 k,$
where $k$ is the trial number, $p_k$ is the probability of a correct response on trial $k$, and $\beta_0, \beta_1$ are the logistic regression coefficients.
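A minimal sketch of the logistic regression learning-curve fit, with simulated correct/incorrect outcomes standing in for the monkey's responses; the learning trial here is taken from the fitted curve's point estimate only, whereas the slide's criterion is based on when performance is reliably above chance:

```python
import numpy as np
import statsmodels.api as sm

# correct[k-1] = 1 if the response on trial k was correct (simulated here for illustration)
rng = np.random.default_rng(4)
n_trials = 60
trial = np.arange(1, n_trials + 1)
p_true = 1 / (1 + np.exp(-(-2.0 + 0.1 * trial)))       # a monotone learning curve
correct = rng.binomial(1, p_true)

X = sm.add_constant(trial)
fit = sm.GLM(correct, X, family=sm.families.Binomial()).fit()   # logit link by default

# Estimated learning curve and the first trial where it exceeds chance (0.25)
p_hat = fit.predict(X)
learning_trial = trial[np.argmax(p_hat > 0.25)] if np.any(p_hat > 0.25) else None
print(fit.params, learning_trial)
```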

Behavioral Learning Experiment Data [Figure: Probability of a Correct Response vs. Trial Number; red = incorrect, blue = correct responses; learning trial and chance performance marked]

Generalized Linear Model Summary [Table: Parameter, Value, p-value, AIC, and AIC difference for the constant and trial-number coefficients]
Inference There is a significant improvement in performance during the experiment. Performance is better than would be expected by chance (0.25) from trial 21 onward. This behavior is consistent with the animal learning the task. The logistic regression function provides an estimate of the animal's learning curve and a framework for defining the learning trial.

Remark Dynamic analyses of these data based on state-space models are more informative and do not require the monotonic assumption of the logistic regression model. Wirth et al. (2003), Smith et al. (2004), Smith et al. (2005)

Summary GLM provides a computationally tractable generalization of the Gaussian linear model to non-Gaussian regression models. Estimation is carried out using maximum likelihood, so this analysis has all the properties of maximum likelihood. AIC, deviance and parameter standard errors provide measures of goodness-of-fit and an inference framework analogous to regression. The framework can be applied to other exponential family models, and non-canonical link functions can also be used. GLM is a standard tool in Matlab, Minitab, R, S, SAS, Splus, and SPSS.

GLM Peristimulus Time Histogram Monkeys were trained to saccade to one of four targets, based on displayed images. Single cell recording in monkey hippocampus.

Spiking Data [Figure: spike raster, Trial vs. Time (sec), with peristimulus time histogram]

Model
Parameter vector: $\theta = (\theta_1, \ldots, \theta_J)$
Intensity model: $\log \lambda(t) = \sum_{j=1}^{J} \theta_j \, g_j(t)$
Basis functions $g_j(t)$:
– Indicator functions: $g_j(t) = 1$ if $t$ falls in the $j$-th time window, 0 otherwise (see the sketch below)
– Splines
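A sketch of the indicator-function (GLM PSTH) version of this model: the within-trial time axis is divided into equal windows, each window gets one indicator basis function, and all trials are stacked as observations. The binned spikes are simulated placeholders:

```python
import numpy as np
import statsmodels.api as sm

# Binned spikes: dN[r, k] = 1 if trial r has a spike in time bin k (simulated placeholder)
rng = np.random.default_rng(5)
n_trials, n_bins, dt = 50, 1000, 0.001
dN = rng.binomial(1, 0.02, size=(n_trials, n_bins))

# Indicator-function basis: g_j(t) = 1 if t falls in the j-th of 20 equal time windows
n_windows = 20
window = np.repeat(np.arange(n_windows), n_bins // n_windows)
G = np.equal.outer(window, np.arange(n_windows)).astype(float)   # shape (n_bins, n_windows)

# Stack trials: every bin of every trial is one observation with the same within-trial basis
X = np.tile(G, (n_trials, 1))
y = dN.reshape(-1)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
rate_hat = np.exp(fit.params) / dt     # one rate (Hz) per time window, i.e. a GLM PSTH
```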

Indicator Function Basis

Spline Function Basis

Goodness-of-Fit [Figure: KS plot (Model CDF vs. Uniform CDF) and ACF of rescaled times vs. Lag]

Adding History [Figure: KS plot, Model CDF vs. Uniform CDF]

Adding History

Stochastic Models [Diagram of model classes: Linear Regression, Generalized Linear Models (GLM), Neural Spiking Models]
Properties of GLM:
– Convex likelihood surface
– Estimators asymptotically have minimum MSE

GLM Neural Models By selecting an appropriate set of basis functions we can capture arbitrary functional relations. Analysis of relative contributions of components to spiking. Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. Journal of Neurophysiology, 2005, 93: 1074-1089.

Dynamic Peristimulus Time Histogram Monkeys were trained to saccade to one of four targets, based on displayed images. Single cell recording in monkey hippocampus. Acknowledgements: Gabriela Czanner

Motivation Characterize trial-by-trial changes in firing properties. Neither the PSTH nor the spike count per trial is adequate. [Figure: PSTH (bin size 10 ms)]

Model
Parameter vector for trial $r$: $\theta_r = (\theta_{r,1}, \ldots, \theta_{r,J})$
Intensity model: $\log \lambda_r(t) = \sum_{j=1}^{J} \theta_{r,j} \, g_j(t)$
Basis functions $g_j(t)$:
– Indicator functions
– Splines
Assume stochastic continuity between trials:
$\theta_{r+1} = \theta_r + \varepsilon_r, \quad \varepsilon_r \sim N(0, \sigma^2 I)$
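A generative sketch of the stochastic-continuity assumption: the basis coefficients follow a Gaussian random walk across trials, so neighboring trials have similar intensity functions. A smooth bump basis stands in for the indicator or spline basis, and only the simulation side is shown; the EM estimation of the state-space model from the slides is not included:

```python
import numpy as np

# theta_r evolves as a Gaussian random walk across trials r, so
# lambda_r(t) = exp(sum_j theta_{r,j} g_j(t)) changes smoothly from trial to trial.
rng = np.random.default_rng(6)
n_trials, n_bins, dt, n_basis = 50, 500, 0.002, 10

t = np.arange(n_bins) * dt
centers = np.linspace(t[0], t[-1], n_basis)
G = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.1) ** 2)   # smooth bump basis (assumed)

sigma = 0.05                                   # state noise controls trial-to-trial smoothness
theta = np.zeros((n_trials, n_basis))
theta[0] = rng.normal(0, 0.5, n_basis)
for r in range(1, n_trials):
    theta[r] = theta[r - 1] + rng.normal(0, sigma, n_basis)   # theta_r = theta_{r-1} + eps_r

rates = np.exp(G @ theta.T).T * 10.0           # trial-by-time intensity surface (arbitrary scale)
spikes = rng.poisson(rates * dt)               # simulated counts per (trial, bin)
```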

Results: Simulated Data T = 5 sec, K = 50 trials. Data simulated from a model with polynomial basis functions of 4th order. [Figure: true intensity function and contour plot of the true intensity function]

Results: Simulated Data [Figure panels: true firing intensity function; naïve estimate of the intensity (20×50 parameters); EM estimate with step functions (20 state parameters); EM estimate with splines (5 state parameters)]

Back to the Data

Results: Real Data [Figure panels: naïve estimate of intensity; EM estimate, state-space model with step functions; EM estimate, state-space model with splines]

Adding History

Conclusions We can construct and fit (using maximum likelihood) simple generalized linear models that capture the statistical properties of the spike train time series. We used the sample partial correlation function, the distribution of estimators and AIC to suggest the order of the model. AIC and the KS statistic are measures of goodness-of-fit between the model and the data.

KS and Cross-Validation [Figure: KS plot, Model Quantiles vs. Empirical Quantiles]