Perceptual Inference and Information Integration in Brain and Behavior PDP Class Jan 11, 2010.


How Neurons in Perceptual Systems Might Carry Out Perceptual ‘Inferences’

Each neuron (or collection of neurons) is treated as standing for a hypothesis about what is out there in the world:
– An oriented line segment at a particular point in space
– Something moving in a certain direction
– A monkey’s paw
Note that a given object or scene might be characterized by a number of hypotheses; there might or might not be a separate ‘grandmother’ hypothesis.
We treat the firing rate of each neuron as corresponding to the degree of belief in the hypothesis it participates in representing, given the available evidence, expressed mathematically as P(H|E).
Question: How can we compute P(H|E)?

Example

H = “it has just been raining”
E = “the ground is wet”
Assume we already believe:
– P(H) = .2; P(~H) = .8
– P(E|H) = .9; P(E|~H) = .01
Then what is P(H|E), the probability that it has just been raining, given that the ground is wet?

Theory of Perceptual Inference: How Can We Compute P(H|E)?

Bayes’ Rule provides a formula:
P(H|E) = p(E|H)p(H) / [p(E|H)p(H) + p(E|~H)p(~H)]
where
– P(H) is the prior probability of the hypothesis H
– P(E|H) is the probability of the evidence, given H
– P(~H) is the prior probability that the hypothesis is false (and is equal to 1 − P(H))
– P(E|~H) is the probability of the evidence, given that the hypothesis is false.
Bayes’ rule follows from the definition of conditional probability.

Derivation of Bayes’ Rule

p(H|E) = p(H&E)/p(E)
p(E|H) = p(H&E)/p(H)
So p(E|H)p(H) = p(H&E).
Substituting into the first line, we obtain:
p(H|E) = p(E|H)p(H)/p(E)
What is p(E)?
p(E) = p(H&E) + p(~H&E) = p(E|H)p(H) + p(E|~H)p(~H)

Example (continued)

H = “it has just been raining”; E = “the ground is wet”
Assume we believe:
– P(H) = .2; P(~H) = .8
– P(E|H) = .9; P(E|~H) = .01
Then what is P(H|E), the probability that it has just been raining, given that the ground is wet?
P(H|E) = (.9 × .2) / ((.9 × .2) + (.01 × .8)) = .18/.188 ≈ .96
What happens if we change our beliefs about P(H)? P(E|H)? P(E|~H)?
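As a check, the slide’s arithmetic can be reproduced in a few lines of Python (the function and variable names here are ours, not from the slides):

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for a binary hypothesis H given evidence E."""
    p_not_h = 1.0 - p_h
    # Total probability of the evidence: p(E) = p(E|H)p(H) + p(E|~H)p(~H)
    p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
    return p_e_given_h * p_h / p_e

# Rain example from the slide: P(H) = .2, P(E|H) = .9, P(E|~H) = .01
p = posterior(0.2, 0.9, 0.01)
print(round(p, 2))  # 0.96
```

Varying the three input probabilities here is an easy way to explore the “what happens if we change our beliefs” question above.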

How Should We Combine Two or More Sources of Evidence?

Two different sources of evidence E1 and E2 are conditionally independent given the state of H iff:
p(E1&E2|H) = p(E1|H)p(E2|H)
p(E1&E2|~H) = p(E1|~H)p(E2|~H)
Suppose p(H), p(E1|H), and p(E1|~H) are as before, and E2 = ‘the sky is blue’, with p(E2|H) = .02 and p(E2|~H) = .5.
Assuming conditional independence, we can substitute into Bayes’ rule to determine that:
p(H|E1&E2) = (.9 × .02 × .2) / ((.9 × .02 × .2) + (.01 × .5 × .8)) ≈ .47
In the case of N sources of evidence, all conditionally independent given H, we get:
p(E|H) = Π_j p(Ej|H), where E stands for the conjunction E1&…&EN.
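A minimal sketch of this combination rule, generalized to any number of conditionally independent cues (`math.prod` requires Python 3.8+; the names are ours):

```python
from math import prod

def posterior_multi(p_h, likes_h, likes_not_h):
    """Posterior p(H | E1 & ... & EN) for conditionally independent evidence.

    likes_h[j]     = p(E_j|H)
    likes_not_h[j] = p(E_j|~H)
    """
    num = p_h * prod(likes_h)              # p(H) * product of p(E_j|H)
    alt = (1.0 - p_h) * prod(likes_not_h)  # p(~H) * product of p(E_j|~H)
    return num / (num + alt)

# Slide's example: E1 = 'the ground is wet', E2 = 'the sky is blue'
p = posterior_multi(0.2, [0.9, 0.02], [0.01, 0.5])
print(round(p, 2))  # 0.47
```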

Conditional and Unconditional Independence

Two variables (here, x and y) are ‘(unconditionally) independent’ iff p(x&y) = p(x)p(y) for all x, y.
Two variables are ‘conditionally independent’ given a third (z) iff p(x&y|z) = p(x|z)p(y|z).
The variables x and y are unconditionally independent in one of the graphs above. In the other graph, they are conditionally independent given the ‘category’ they are chosen from (represented by the symbol used on each data point), but they are not unconditionally independent.

How This Relates to Neurons

It is common to consider a neuron to have an activation value corresponding to its instantaneous firing rate, or p(spike) per unit time.
The baseline firing rate of the neuron is thought to depend on a constant background input called its ‘bias’.
When other neurons are active, their influences are combined with the bias to yield a quantity called the ‘net input’.
The influence of a neuron j on another neuron i depends on the activation of j and the weight or strength of the connection to i from j. Connection weights can be positive (excitatory) or negative (inhibitory).
These influences are summed to determine the net input to neuron i:
net_i = bias_i + Σ_j a_j w_ij
where a_j is the activation of neuron j, and w_ij is the strength of the connection to unit i from unit j.
(Figure: neuron i receiving input from neuron j via connection weight w_ij.)

A Neuron’s Activation Can Reflect P(H|E)

The activation of neuron i given its net input net_i is assumed to be given by:
a_i = exp(net_i) / (1 + exp(net_i))
This function is called the ‘logistic function’.
It is easy to show that a_i = p(H_i|E) iff:
– a_j = 1 when E_j is present, or 0 when E_j is absent;
– w_ij = log(p(E_j|H_i)/p(E_j|~H_i));
– bias_i = log(p(H_i)/p(~H_i)).
In short, idealized neurons using the logistic activation function can compute the probability of the hypothesis they stand for, given the evidence represented in their inputs, if their weights and biases have the appropriate values.
(Figure: the logistic function, a_i as a function of net_i.)
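The claim is easy to verify numerically. The sketch below plugs the rain example’s probabilities into the weight and bias formulas above and checks that a logistic unit reproduces Bayes’ rule (variable names are ours):

```python
import math

# Rain example: P(H) = .2, P(E|H) = .9, P(E|~H) = .01
p_h, p_e_h, p_e_nh = 0.2, 0.9, 0.01

bias = math.log(p_h / (1 - p_h))   # log prior odds
w = math.log(p_e_h / p_e_nh)       # log likelihood ratio for evidence E

a_j = 1.0                          # evidence E is present
net = bias + w * a_j
a_i = math.exp(net) / (1 + math.exp(net))  # logistic activation

# Direct application of Bayes' rule for comparison
bayes = (p_e_h * p_h) / (p_e_h * p_h + p_e_nh * (1 - p_h))
print(abs(a_i - bayes) < 1e-9)  # True
```

The equivalence holds because the net input is exactly the log posterior odds, and the logistic function converts log odds back into a probability.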

Accurately Coding Probability in a Short Interval of Time

If p(spike per 10 msec) = p(H|E), then having a single neuron represent a hypothesis would make it difficult to get a clear estimate of p(H|E) within, say, 100 msec.
However, suppose many (say, 10,000) neurons each encode the same hypothesis, and suppose that they produce spikes independently of each other (but based on the same p(H|E)).
Then the number of spikes summed over the population would provide a very close approximation of p(H|E), even in an interval as brief as 10 msec.
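A small simulation illustrates the point. Here 10,000 hypothetical neurons each spike independently with probability p(H|E) in a single 10-msec bin, and the population spike count recovers that probability closely (all values are illustrative):

```python
import random

random.seed(0)                 # for reproducibility
p_spike = 0.957                # p(H|E), encoded as p(spike) per 10-msec bin
n_neurons = 10_000

# Each neuron independently spikes (or not) in one 10-msec bin
spikes = sum(random.random() < p_spike for _ in range(n_neurons))
estimate = spikes / n_neurons
print(f"population estimate = {estimate:.3f}  (true p = {p_spike})")
```

With 10,000 independent neurons the standard error of the estimate is about sqrt(p(1−p)/10000) ≈ 0.002, so one 10-msec bin already pins down p(H|E) quite well; a single neuron would need ~100 such bins (a full second) for comparable accuracy.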

Information Integration in Human Perception: The McGurk Effect (McGurk & MacDonald, 1976, Nature, 264)

First listen to the clip with your eyes closed. Then listen again with eyes open.
What you see appears to influence what you hear.
What you hear probably sounds like ‘ba’ by itself. What does it sound like when you watch the face? Most people hear ‘da’ or ‘tha’.
McGurk effect movie from USC.

Application of the Model to a McGurk Experiment

Massaro et al. (2001) performed an experiment in which subjects received auditory inputs ranging from a good “ba” sound to a good “da” sound, and visual speech inputs ranging from a good “ba” to a good “da”.
The results are consistent with the model we have been describing, with auditory and visual input treated as conditionally independent sources of evidence for the identity of the spoken syllable.
Note that when the auditory input is at either extreme, the visual input has little or no effect. These are examples of the ‘floor’ and ‘ceiling’ effects that are often found in experiments. The model explains why the effect of each variable is found only at moderate values of the other variable.

Choosing between N Alternatives

Often we are interested in cases where there are several alternative hypotheses (e.g., different directions of motion of a field of dots).
Here we have a situation in which the alternatives to a given hypothesis, say H1, are the other hypotheses H2, H3, etc.
In this case, the probability of a particular hypothesis given the evidence becomes:
P(H_i|E) = p(E|H_i)p(H_i) / Σ_i′ p(E|H_i′)p(H_i′)
The normalization implied here can be performed by computing net inputs as before, but now setting each unit’s activation according to:
a_i = exp(net_i) / Σ_i′ exp(net_i′)
This normalization effect is approximated by lateral inhibition mediated by inhibitory interneurons (shaded unit in the illustration).
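The normalized activation rule above is the familiar softmax function; a minimal sketch (the net-input values are hypothetical):

```python
import math

def softmax(nets):
    """Normalized activations: a_i = exp(net_i) / sum over i' of exp(net_i')."""
    exps = [math.exp(n) for n in nets]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical net inputs for four direction-of-motion hypotheses
nets = [2.0, 0.5, -1.0, 0.0]
acts = softmax(nets)
print(round(sum(acts), 6))  # 1.0 -- the activations form a probability distribution
```

Note that when there are only two alternatives, this reduces to the logistic function from the previous slide, with net = net_1 − net_2.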

‘Cue’ Integration in Monkeys

Salzman and Newsome (1994) combined two cues to the perception of motion:
– Partially coherent motion in a specific direction
– Direct electrical stimulation of neurons in area MT
They measured the probability of choosing each direction with and without stimulation at different levels of coherence (next slide).

Model used by S&N

S&N applied a model that is structurally identical to the one we have been discussing:
P_j = exp(y_j) / Σ_j′ exp(y_j′)
y_j = b_j + β_j z_j + α_d x
where:
– b_j = bias for direction j
– β_j = effect of micro-stimulation
– z_j = 1 if stimulation was applied, 0 otherwise
– α_d = support for direction j when motion is in that direction (d = 1) or in other, more disparate directions (d = 2, 3, 4, 5)
– x = motion coherence
Open circles above show the effect of presenting visual stimulation in one direction (at an intermediate coherence) together with electrical stimulation favoring a direction 135° away from the visual stimulus. The dip between the peaks rules out simple averaging of the directions cued by visual and electrical stimulation, but is approximately consistent with the Bayesian model (filled circles).
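A toy version of the S&N model, with made-up parameter values, reproduces the qualitative pattern: two peaks at the visually and electrically cued directions with a dip between them, rather than a single peak at their average. Here disparity d is counted from 0 (same direction) rather than 1, and the bias terms b_j are set to zero; all numbers are illustrative, not fitted values from the paper:

```python
import math

def softmax(ys):
    exps = [math.exp(y) for y in ys]
    s = sum(exps)
    return [e / s for e in exps]

n_dirs = 8                    # 8 motion directions, 45 degrees apart
motion_dir, stim_dir = 0, 3   # visual motion at 0 deg; microstimulation 135 deg away
x = 0.5                       # motion coherence (hypothetical value)
z = 1                         # microstimulation applied
beta = 2.0                    # microstimulation effect (hypothetical value)
# Support alpha_d by angular disparity d = 0..4 from the motion direction (hypothetical)
alpha = [2.0, 0.5, 0.0, -0.5, -1.0]

ys = []
for j in range(n_dirs):
    # Angular disparity between direction j and the motion direction, in 45-deg steps
    d = min(abs(j - motion_dir), n_dirs - abs(j - motion_dir))
    ys.append(alpha[d] * x + (beta * z if j == stim_dir else 0.0))

probs = softmax(ys)
# Peaks at the visual (j=0) and stimulated (j=3) directions, dip at j=1, 2
print(probs.index(max(probs)))  # 3
```

With these values the choice distribution is bimodal: directions 1 and 2, which lie between the two cued directions, get lower probability than either cue direction, matching the dip that rules out simple averaging.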