Applied Bayesian Inference: A Short Course
Lecture 3 – Computational Techniques
Kevin H. Knuth, Ph.D.
Center for Advanced Brain Imaging and Cognitive Neuroscience and Schizophrenia, The Nathan Kline Institute, NY
Outline
Exploring the Posterior Probability
Obtaining Answers to Questions
Cost Functions
MAP Estimates
Markov Chain Monte Carlo
More Examples
Bayesian Inference
Bayes' Theorem (Rev. Thomas Bayes) describes how our prior knowledge about a model, based on our prior information I, is modified by the acquisition of new information or data:

$$ \underbrace{p(\text{model} \mid \text{data}, I)}_{\text{posterior probability}} = \frac{\overbrace{p(\text{model} \mid I)}^{\text{prior probability}}\ \overbrace{p(\text{data} \mid \text{model}, I)}^{\text{likelihood}}}{\underbrace{p(\text{data} \mid I)}_{\text{evidence}}} $$
Exploring the Posterior
We have looked at one-dimensional problems, in which we are given data, assume a signal model, and form an estimate of a single parameter.
[Figure: posterior probability vs. phase angle]
Exploring the Posterior
And two-dimensional problems...
[Figure: log P over the two model parameters a and b]
Looking at the Mode
The mode of the probability density is the parameter value at which the peak occurs. This is often called the Maximum A Posteriori (MAP) estimate. It is often most easily found by looking for the parameter value at which the derivative of the probability (or log p) is zero. The variance can be estimated from the local curvature at the peak.

One-dimensional case:

$$ \left.\frac{d}{d\theta} \log p(\theta \mid d, I)\right|_{\hat{\theta}} = 0, \qquad \sigma^2 \approx -\left[\left.\frac{d^2}{d\theta^2} \log p(\theta \mid d, I)\right|_{\hat{\theta}}\right]^{-1} $$

Multi-dimensional case: the covariance matrix is approximated by the negative inverse Hessian of the log posterior at the peak,

$$ \Sigma \approx -\left[\nabla \nabla^{T} \log p(\boldsymbol{\theta} \mid d, I)\right]^{-1}_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}}. $$
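A minimal numerical sketch of this procedure (not from the original slides; the Gaussian toy posterior is illustrative): find the peak of log p on a grid, then estimate the variance from the local curvature by finite differences.

    import numpy as np

    def log_posterior(theta):
        # Illustrative toy posterior: Gaussian with mean 2.0 and std 0.5
        return -0.5 * ((theta - 2.0) / 0.5) ** 2

    theta = np.linspace(0.0, 4.0, 4001)
    logp = log_posterior(theta)

    i_map = np.argmax(logp)        # index of the peak
    theta_map = theta[i_map]       # MAP estimate

    h = theta[1] - theta[0]
    # Second derivative of log p at the peak, by central differences
    d2 = (logp[i_map + 1] - 2.0 * logp[i_map] + logp[i_map - 1]) / h**2
    sigma2 = -1.0 / d2             # curvature (Laplace) estimate of the variance

    print(theta_map, np.sqrt(sigma2))   # approximately 2.0 and 0.5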
Looking at the Mean
The mean is also commonly reported; for a symmetric density it is equal to the mode. The variance can be computed as the expected squared deviation from the mean, or confidence intervals can instead be defined to delineate regions of high probability.
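In symbols (standard definitions, supplied here since the slide's equations were not preserved):

$$ \langle \theta \rangle = \int \theta \, p(\theta \mid d, I)\, d\theta, \qquad \sigma^2 = \int \big(\theta - \langle \theta \rangle\big)^2 \, p(\theta \mid d, I)\, d\theta $$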
An Example with Difficulties
Stellar Distances from Parallax
Hajian, Knuth, Armstrong
Here we consider a "simple" one-dimensional problem: given a measurement of stellar parallax, infer the distance D to the star.
[Figure: parallax geometry defining the distance D]
What is the Answer??
If we are given a mean and standard deviation for the parallax, we can derive a Gaussian prior probability for the parallax angle $\theta$. The probability for the distance can then readily be found by the change-of-variables transform

$$ p(D \mid I) = p(\theta \mid I) \left| \frac{d\theta}{dD} \right|, \qquad \text{where} \quad \theta = \frac{1}{D} $$

(with the parallax in arcseconds and the distance in parsecs).
What is the Answer??
Resulting in

$$ p(D \mid I) \propto \frac{1}{D^2} \exp\!\left[-\frac{(1/D - \bar{\theta})^2}{2\sigma_\theta^2}\right] $$

Do we report the mode? The mean? The most probable distance estimate does not correspond to the best parallax estimate! The result could make a big difference for someone planning a trip.
[Figure: the asymmetric posterior density plotted against distance D]
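A short numerical sketch of this effect (the parallax value 0.1 ± 0.02 arcsec is illustrative, not from the lecture): compare the naive transformed estimate 1/θ̄ with the mode and mean of the distance posterior.

    import numpy as np

    theta_hat, sigma = 0.1, 0.02               # illustrative parallax (arcsec)

    D = np.linspace(2.0, 30.0, 20001)          # distance grid (parsecs)
    p = np.exp(-0.5 * ((1.0 / D - theta_hat) / sigma) ** 2) / D**2
    p /= np.trapz(p, D)                        # normalize numerically

    naive = 1.0 / theta_hat                    # transform of the parallax mode
    mode = D[np.argmax(p)]                     # mode of the distance posterior
    mean = np.trapz(D * p, D)                  # mean of the distance posterior

    print(naive, mode, mean)   # three different "answers" to the same question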
Warning!
Be careful: the mode of the posterior for a parameter will not, in general, correspond to the mode of the posterior for a transform of that parameter! Often researchers will compute one optimal parameter and then perform a transform, assuming the result is still optimal. A physics example of this difficulty is a comparison of the frequency spectrum and the wavelength spectrum of blackbody radiation.
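To make the blackbody example concrete (standard results, added here for illustration): the Planck spectrum per unit frequency peaks where $h\nu_{\max}/kT \approx 2.821$, while the spectrum per unit wavelength peaks where $hc/(\lambda_{\max} kT) \approx 4.965$, so $\nu_{\max}\,\lambda_{\max} \neq c$; the two "most probable" descriptions of the same radiation do not transform into one another.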
Cost Functions
To obtain a definite answer we must employ additional optimality criteria. By minimizing the expected squared error of our chosen distance estimate $\hat{D}$,

$$ \langle C \rangle = \int (\hat{D} - D)^2 \, p(D \mid I)\, dD, $$

we obtain the mean as the "best" estimate of the distance.
Cost Functions
Other optimality criteria lead to different "best" estimates. Minimizing the expected absolute value of the error results in the median, whereas maximizing the expected delta-function utility (equivalently, the posterior density itself) leads to the mode. The mean minimizes the expected squared deviation of our chosen answer from the correct solution, whereas the mode, in the least-squares approximation, minimizes the expected squared deviation between our predicted results and the data.
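In summary (a standard statement of these results; the explicit cost functions are supplied here because the slide's equations were not preserved):

$$ \hat{D}_{\text{mean}} = \arg\min_{\hat{D}} \int (\hat{D}-D)^2\, p(D)\, dD, \qquad \hat{D}_{\text{median}} = \arg\min_{\hat{D}} \int |\hat{D}-D|\, p(D)\, dD, \qquad \hat{D}_{\text{mode}} = \arg\max_{\hat{D}} \int \delta(\hat{D}-D)\, p(D)\, dD $$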
Marginalizing Problems Away
When one has a multi-dimensional posterior in which many of the parameters are uninteresting, we can marginalize over those nuisance parameters to reduce the dimensionality of the problem,

$$ p(\theta \mid d, I) = \int p(\theta, \phi \mid d, I)\, d\phi. $$

However, rarely can more than three or four parameters be marginalized over analytically. Conventional numerical integration techniques also often fail, due to the accumulation of round-off error.
Other Difficulties
How does one handle multiply-peaked distributions?
[Figure: a multiply-peaked posterior P(x | data, I) plotted against x]
Other Difficulties
Or other multi-dimensional difficulties...
Sampling from the Posterior
There is a way to sample from the posterior using a dynamical system that evolves according to the probability density: Markov chain Monte Carlo. These techniques
- allow exploration of high-dimensional parameter spaces;
- can actually increase in accuracy for higher-dimensional problems;
- converge slowly, with error of order $O(n^{-1/2})$ in the number of samples $n$.
Markov chain Monte Carlo
Start with a position X in the model parameter space M. Make a transition to a new point Y with transition probability T(Y | X). We want the relative occurrences of X and Y to be proportional to the ratio of their probabilities. We can control this by either accepting or rejecting the transition, with acceptance probability

$$ A(Y \mid X) = \min\!\left(1,\ \frac{p(Y)\, T(X \mid Y)}{p(X)\, T(Y \mid X)}\right). $$

[Figure: a transition from X to Y in the model parameter space M]
Strolling along Hypothesis Space
Metropolis-Hastings Algorithm: the transition is performed by incrementing X with a random vector. We employ a symmetric transition probability, T(Y | X) = T(X | Y), so the acceptance probability reduces to

$$ A(Y \mid X) = \min\!\left(1,\ \frac{p(Y)}{p(X)}\right). $$
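A minimal sketch of this symmetric (Metropolis) version in Python; the two-peaked toy target, step size, and seed are illustrative, not from the lecture:

    import numpy as np

    rng = np.random.default_rng(0)

    def log_p(x):
        # Toy target: a two-component Gaussian mixture (multiply peaked)
        return np.logaddexp(-0.5 * (x - 1.0) ** 2,
                            -0.5 * ((x - 5.0) / 0.5) ** 2)

    def metropolis(log_p, x0, step, n_samples):
        x, lp = x0, log_p(x0)
        samples, n_accept = [], 0
        for _ in range(n_samples):
            y = x + step * rng.normal()          # symmetric proposal T(Y|X)
            lp_y = log_p(y)
            # Accept with probability min(1, p(Y)/p(X)), computed in log space
            if np.log(rng.uniform()) < lp_y - lp:
                x, lp = y, lp_y
                n_accept += 1
            samples.append(x)
        return np.array(samples), n_accept / n_samples

    samples, rate = metropolis(log_p, x0=0.0, step=2.0, n_samples=50000)
    print(rate, samples.mean())   # acceptance rate; estimate of the target mean

Note that the min(1, ·) never needs to be taken explicitly: accepting whenever a uniform draw falls below the probability ratio implements it automatically.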
Markov chain Monte Carlo
If the step size for the transition is too small, almost every step is accepted, but it takes many steps for the samples to become independent. If the step size is too large, transitions are rarely accepted. It is best to keep the acceptance rate between 30% and 70%.
Running the Simulation
We evolve many (50) simulations simultaneously. In my MCMC I change one parameter at a time, as sketched below. The numerous simulations allow one to adjust the step size for each parameter to make sure that the acceptance rate is high enough. Bad runs (low-probability simulations) can be abandoned and good ones duplicated; duplicated runs will diverge in time.
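A sketch of the one-parameter-at-a-time update with crude per-parameter step adaptation (the 50-chain bookkeeping and the abandon/duplicate logic are omitted; all names and the toy target are illustrative). In practice such tuning belongs in a burn-in phase, since adapting the steps during sampling disturbs the stationary distribution.

    import numpy as np

    rng = np.random.default_rng(1)

    def log_p(theta):
        # Toy 3-parameter target: independent Gaussians of very different widths
        scales = np.array([1.0, 0.1, 10.0])
        return -0.5 * np.sum((theta / scales) ** 2)

    def one_at_a_time(log_p, theta0, steps, n_sweeps):
        theta, lp = theta0.copy(), log_p(theta0)
        accepts = np.zeros_like(steps)
        for sweep in range(n_sweeps):
            for k in range(theta.size):          # change one parameter at a time
                prop = theta.copy()
                prop[k] += steps[k] * rng.normal()
                lp_prop = log_p(prop)
                if np.log(rng.uniform()) < lp_prop - lp:
                    theta, lp = prop, lp_prop
                    accepts[k] += 1
            if (sweep + 1) % 100 == 0:           # adjust toward 30-70% acceptance
                rates = accepts / 100.0
                steps *= np.where(rates > 0.7, 1.5, 1.0)
                steps *= np.where(rates < 0.3, 0.5, 1.0)
                accepts[:] = 0.0
        return theta, steps

    theta, steps = one_at_a_time(log_p, np.zeros(3), np.ones(3), n_sweeps=2000)
    print(steps)    # each step size adapts toward its parameter's scale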
Simulated Annealing
Bring in the data slowly by writing the annealed posterior as

$$ p_\beta(\text{model} \mid \text{data}, I) \propto p(\text{model} \mid I)\, p(\text{data} \mid \text{model}, I)^\beta. $$

By varying $\beta$ from 0 to 1 we can slowly turn on the effect of the data.
[Figure: the annealed posterior at β = 0, β = 0.5, and β = 1]
Benefits of Annealing
Now, with the annealed posterior above, we define the function

$$ Z(\beta) = \int p(\text{model} \mid I)\, p(\text{data} \mid \text{model}, I)^\beta \, d\text{model} $$

so that

$$ \frac{d \log Z(\beta)}{d\beta} = \big\langle \log p(\text{data} \mid \text{model}, I) \big\rangle_\beta, $$

where the average is taken over samples drawn from the annealed posterior at the given β.
Estimating the Evidence
We find $Z(0) = \int p(\text{model} \mid I)\, d\text{model} = 1$ and $Z(1) = p(\text{data} \mid I)$, so that we have

$$ \log p(\text{data} \mid I) = \int_0^1 \big\langle \log p(\text{data} \mid \text{model}, I) \big\rangle_\beta \, d\beta, $$

and we can calculate the evidence using simulated annealing. This is sometimes called Thermodynamic Integration.
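A sketch of thermodynamic integration on a conjugate toy problem where the exact evidence is known (all numbers are illustrative): sample the annealed posterior at a ladder of β values with Metropolis, average log L at each rung, and integrate over β.

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy problem: prior theta ~ N(0, 1); one datum d = 2.0 with unit noise.
    d = 2.0
    def log_prior(t):
        return -0.5 * t * t                      # N(0,1), up to a constant
    def log_like(t):
        return -0.5 * (d - t) ** 2 - 0.5 * np.log(2.0 * np.pi)   # normalized

    def sample_annealed(beta, n=20000, step=1.5):
        # Metropolis sampling from p_beta, proportional to prior * likelihood^beta
        t, lp = 0.0, log_prior(0.0) + beta * log_like(0.0)
        out = np.empty(n)
        for i in range(n):
            u = t + step * rng.normal()
            lpu = log_prior(u) + beta * log_like(u)
            if np.log(rng.uniform()) < lpu - lp:
                t, lp = u, lpu
            out[i] = t
        return out

    betas = np.linspace(0.0, 1.0, 11)
    mean_loglike = [log_like(sample_annealed(b)).mean() for b in betas]

    log_Z = np.trapz(mean_loglike, betas)        # thermodynamic integration
    exact = -0.5 * np.log(2.0 * np.pi * 2.0) - d * d / 4.0   # log N(d; 0, 2)
    print(log_Z, exact)                          # should agree closely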
An MCMC Example
Estimating the BOLD Response
Knuth, Ardekani, Helpern, ISMRM 2001
Estimate the shape of the hemodynamic response function, or Blood Oxygenation Level Dependent (BOLD) response, in an event-related functional Magnetic Resonance Imaging (er-fMRI) experiment.
[Figure: a model hemodynamic response; amplitude (arb. units) vs. time (sec)]
Modeling the Experiment
We assume a parameterized form of the response: a unit-amplitude, normalized Gamma function with amplitude A, time dilation τ, and shape parameter α; the peak occurs at a time set by τ and α.
[Figure: the model response; amplitude (arb. units) vs. time (sec)]
Given a series of stimuli at times $\{t_1, t_2, \ldots, t_S\}$, we expect the total hemodynamic response to be the superposition

$$ r(t) = \sum_{i=1}^{S} h(t - t_i; A, \tau, \alpha). $$
Modeling the Experiment
We expect that there will be a baseline voxel intensity, as well as a possible linear drift of the intensity due to the magnetic field drifting during the experiment. The intensity of a voxel is modeled as

$$ v(t) = \sum_{i=1}^{S} h(t - t_i; A, \tau, \alpha) + a\,t + b + n(t), $$

where a is the linear drift, b is the baseline intensity, and n(t) is an unpredictable noise component.
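A sketch of this forward model in Python. The specific gamma-shaped form of h(t) below (unit amplitude at its peak, which falls at t = ατ) is an assumption, since the slide's exact expression was not preserved; the parameter values are merely illustrative.

    import numpy as np

    def h(t, A, tau, alpha):
        # Assumed gamma-shaped response, amplitude A at its peak (t = alpha*tau)
        t = np.asarray(t, dtype=float)
        out = np.zeros_like(t)
        pos = t > 0
        x = t[pos] / (alpha * tau)
        out[pos] = A * x**alpha * np.exp(alpha * (1.0 - x))
        return out

    def voxel_model(t, stim_times, A, tau, alpha, a, b):
        # Superposed stimulus responses + linear drift + baseline
        r = sum(h(t - ti, A, tau, alpha) for ti in stim_times)
        return r + a * t + b

    t = np.arange(0.0, 512.0, 2.0)           # TR = 2 s, 256 images
    stim = np.linspace(10.0, 490.0, 36)      # 36 illustrative stimulus times
    v = voxel_model(t, stim, A=3.4, tau=1.3, alpha=4.8, a=0.0, b=400.0)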
The Data
The experiment was a visual oddball experiment in which the subject pressed a button in response to the oddball. Data, v(t), were taken from a single voxel in the motor strip. The presentation times of S = 36 oddball stimuli are used in modeling the expected response. A total of 256 gradient-echo single-shot EPI images were obtained using a Siemens 1.5 T Magnetom Vision system with: TR = 2 s, TE = 60 ms, image size 64x64, 20 axial slices, 5 mm slice thickness, and FOV = 250 mm.
[Figure: voxel intensity vs. time (seconds) for the measured time series]
Assigning Probabilities
Using Bayes' Theorem, we write the posterior in terms of the priors and the likelihood. We assign a Gaussian for the likelihood, with noise standard deviation σ. The prior probabilities carry cutoffs: A, τ, and α are nonnegative, σ is positive, and the cutoff on the time-scale parameters reflects response time scales shorter than 30 sec.
Solving the Problem
The posterior is then proportional to the product of the priors and the Gaussian likelihood. We marginalize over the noise standard deviation σ to obtain a posterior over the remaining parameters. We still have a five-dimensional space (A, τ, α, a, b) to work in, and must use MCMC.
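A sketch of the resulting marginal log posterior, reusing voxel_model from the earlier sketch. Flat priors within the cutoffs and a Jeffreys prior for σ are assumed here; marginalizing the Gaussian likelihood over σ under those assumptions gives the familiar power-law form below, but the lecture's exact choices were not preserved.

    import numpy as np

    def log_posterior(params, t, v, stim_times):
        A, tau, alpha, a, b = params
        # Cutoffs (assumed): reject out-of-range parameters outright
        if A < 0 or tau <= 0 or alpha <= 0 or alpha * tau > 30.0:
            return -np.inf
        model = voxel_model(t, stim_times, A, tau, alpha, a, b)
        resid = v - model
        n = len(v)
        # Gaussian likelihood marginalized over sigma (Jeffreys prior):
        # posterior proportional to [sum of squared residuals]^(-n/2)
        return -0.5 * n * np.log(np.sum(resid ** 2))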
Looking at the Posterior
We can take some slices of the logarithm of the five-dimensional posterior through some likely parameter values...
[Figures: log probability vs. amplitude A (slice through τ = 1, α = 5, a = 0, b = 400), and log probability vs. baseline intensity b (slice through A = 1, τ = 1, α = 5, a = 0)]
Monte Carlo Results
The MCMC ran for 225,000 iterations; the last 25,000 were kept.
A = 3.38 ± 1.28 (arb. intensity units)
α = 4.84 ± 4.10 (unitless)
τ = 1.28 ± 1.73 s
a = … ± … (arb. int./sec)
b = … ± 0.517 (arb. int.)
peak time = 2.98 ± 0.27 s (but notice the OTHER histogram peak!)
Monte Carlo Results
A density plot of the 25,000 sampled waveforms gives a feel for the response and our uncertainty.
A = 3.38 ± 1.28 (arb. intensity units)
peak time = 2.98 ± 0.27 s
[Figure: density plot of the sampled response waveforms]