Some Basic Aspects of Perceptual Inference Under Uncertainty

Psych 209, Jan 9, 2013

Example
H = “it has just been raining”; E = “the ground is wet”.
What is the probability that H is true, given E?
Assume we already believe:
P(H) = .2; P(~H) = .8
P(E|H) = .9; P(E|~H) = .01
We want to calculate P(H|E). Can we derive a formula to do so?

Derivation of Bayes Formula
By the definition of conditional probability:
P(H|E) = P(H&E)/P(E)
P(E|H) = P(H&E)/P(H)
So P(E|H)P(H) = P(H&E). Substituting into the first line, we obtain
P(H|E) = P(E|H)P(H)/P(E)   (1)
What is P(E)?
P(E) = P(H&E) + P(~H&E) = P(E|H)P(H) + P(E|~H)P(~H)
Substituting this last expression into (1) gives Bayes formula:
P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]

Example
Assumptions: P(H) = .2; P(~H) = .8; P(E|H) = .9; P(E|~H) = .01
Then what is P(H|E), the probability that it has just been raining, given that the ground is wet?
P(H|E) = (.9 × .2) / ((.9 × .2) + (.01 × .8)) = .18 / (.18 + .008) ≈ .96
Visualization (on board).
What happens if we change our beliefs about P(H)? P(E|H)? P(E|~H)?
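A minimal Python sketch of this calculation (the function name `posterior` is my own; the numbers are those from the slide):

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for a binary hypothesis: returns P(H|E)."""
    p_not_h = 1.0 - p_h
    numerator = p_e_given_h * p_h
    p_e = numerator + p_e_given_not_h * p_not_h  # P(E), summed over H and ~H
    return numerator / p_e

# The rain example: P(H) = .2, P(E|H) = .9, P(E|~H) = .01
print(posterior(0.2, 0.9, 0.01))  # ~0.957
```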

Extension to N Alternatives

Posterior Ratios
The ratio p(hi|e)/p(hj|e) can be expressed as:
p(hi|e)/p(hj|e) = (p(hi)/p(hj)) (p(e|hi)/p(e|hj))
These ratios are indifferent to the number of alternatives.
Taking logs:
log(p(hi|e)/p(hj|e)) = log(p(hi)/p(hj)) + log(p(e|hi)/p(e|hj))
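As a sketch (the helper name is mine), the additivity of the log ratios is easy to check numerically:

```python
import math

def log_posterior_ratio(p_hi, p_hj, p_e_given_hi, p_e_given_hj):
    """log(p(hi|e)/p(hj|e)) = log prior ratio + log likelihood ratio."""
    return math.log(p_hi / p_hj) + math.log(p_e_given_hi / p_e_given_hj)

# Rain example, with hi = "rain" and hj = "no rain":
print(log_posterior_ratio(0.2, 0.8, 0.9, 0.01))  # log(.25) + log(90) ≈ 3.11
```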

Morton’s use of the logit

Odds Ratio Version of Bayes Formula
For the 2-alternative case we can re-express p(hi|e):
p(hi|e) = [(p(hi)/p(hj)) (p(e|hi)/p(e|hj))] / [(p(hi)/p(hj)) (p(e|hi)/p(e|hj)) + 1]
Using logs and exponentials:
p(hi|e) = exp[log(p(hi)/p(hj)) + log(p(e|hi)/p(e|hj))] / (exp[log(p(hi)/p(hj)) + log(p(e|hi)/p(e|hj))] + 1)
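In code, this odds-ratio form is just the logistic function applied to the summed log ratios; a sketch using the rain example (names are mine):

```python
import math

def posterior_from_log_odds(log_prior_ratio, log_likelihood_ratio):
    """p(hi|e) as the logistic of the log prior ratio plus the log likelihood ratio."""
    z = log_prior_ratio + log_likelihood_ratio
    return 1.0 / (1.0 + math.exp(-z))

print(posterior_from_log_odds(math.log(0.2 / 0.8), math.log(0.9 / 0.01)))  # ~0.957
```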

How Should we Combine Two or More Sources of Evidence?
Two different sources of evidence E1 and E2 are conditionally independent given the state of H, iff
p(E1&E2|H) = p(E1|H)p(E2|H) and p(E1&E2|~H) = p(E1|~H)p(E2|~H)
Suppose p(H), p(E1|H) and p(E1|~H) are as before, and E2 = ‘The sky is blue’; p(E2|H) = .02; p(E2|~H) = .5.
Assuming conditional independence, we can substitute into Bayes’ rule to determine that:
p(H|E1&E2) = (.9 × .02 × .2) / (.9 × .02 × .2 + .01 × .5 × .8) ≈ .47
For N sources of evidence, all conditionally independent given H, we get:
p(E|H) = Πj p(Ej|H)
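A sketch of the multi-cue calculation under the conditional-independence assumption (function and argument names are mine):

```python
def posterior_multi(p_h, likelihoods_h, likelihoods_not_h):
    """P(H | E1..EN) when the Ej are conditionally independent given H.

    likelihoods_h[j]     = p(Ej|H)
    likelihoods_not_h[j] = p(Ej|~H)
    """
    num = p_h          # becomes p(H) * prod_j p(Ej|H)
    alt = 1.0 - p_h    # becomes p(~H) * prod_j p(Ej|~H)
    for lh, lnh in zip(likelihoods_h, likelihoods_not_h):
        num *= lh
        alt *= lnh
    return num / (num + alt)

# "The ground is wet" plus "the sky is blue":
print(posterior_multi(0.2, [0.9, 0.02], [0.01, 0.5]))  # ~0.47
```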

Conditional Independence in the Generative Model of Letter Feature Displays
A letter is chosen for display.
Features are then chosen for display independently for each letter, but with noise.
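A toy generative sketch of that process (the letter-to-feature table and the noise level are invented for illustration):

```python
import random

# Hypothetical binary feature vectors for a few letters (invented for illustration).
LETTER_FEATURES = {
    "A": [1, 1, 0, 1],
    "H": [1, 0, 1, 1],
    "T": [0, 1, 1, 0],
}

def generate_display(noise=0.1):
    """Choose a letter, then emit each feature independently, flipping it with probability `noise`."""
    letter = random.choice(list(LETTER_FEATURES))
    features = [f if random.random() > noise else 1 - f
                for f in LETTER_FEATURES[letter]]
    return letter, features

print(generate_display())
```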

How this relates to connectionist units (or populations of neurons)
We treat the activation of the unit as corresponding to the instantaneous normalized firing rate of a neural population.
The baseline activation of the unit is thought to depend on a constant background input called its ‘bias’. When other units are active, their influences are combined with the bias to yield a quantity called the ‘net input’.
The influence of a unit j on another unit i depends on the activation of j and the weight or strength of the connection to i from j. Connection weights can be positive (excitatory) or negative (inhibitory).
These influences are summed to determine the net input to unit i:
neti = biasi + Σj aj wij
where aj is the activation of unit j, and wij is the strength of the connection to unit i from unit j.
[Figure: input from unit j reaches unit i via a connection with weight wij.]
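A minimal sketch of the net input computation (names are mine):

```python
def net_input(bias, activations, weights):
    """net_i = bias_i + sum_j a_j * w_ij for a single receiving unit i."""
    return bias + sum(a * w for a, w in zip(activations, weights))

# One input unit with activation 1.0 connected by a weight of 4.5,
# plus a bias of -1.4, gives a net input of 3.1:
print(net_input(-1.4, [1.0], [4.5]))
```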

A Unit’s Activation can Reflect P(H|E)
The activation of unit i given its net input neti is assumed to be given by:
ai = exp(neti) / (1 + exp(neti))
This function is called the ‘logistic function’. It is usually written in the numerically identical form:
ai = 1/[1 + exp(-neti)]
In the reading we showed that ai = p(Hi|E) iff:
aj = 1 when Ej is present, or 0 when Ej is absent;
wij = log(p(Ej|Hi)/p(Ej|~Hi));
biasi = log(p(Hi)/p(~Hi)).
This assumes the evidence is conditionally independent given the state of H.
[Figure: the logistic function, plotting ai against neti.]
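A sketch tying these pieces together for the rain example (one hypothesis unit, one evidence unit for “the ground is wet”; all names are mine):

```python
import math

def unit_activation(bias, activations, weights):
    """Logistic activation: a_i = 1 / (1 + exp(-net_i)), with net_i = bias_i + sum_j a_j * w_ij."""
    net = bias + sum(a * w for a, w in zip(activations, weights))
    return 1.0 / (1.0 + math.exp(-net))

bias_rain = math.log(0.2 / 0.8)   # log prior odds, log(p(H)/p(~H))
w_wet = math.log(0.9 / 0.01)      # log likelihood ratio, log(p(E|H)/p(E|~H))
print(unit_activation(bias_rain, [1.0], [w_wet]))  # ~0.957, matching P(H|E) above
```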

Choosing between N alternatives
Often we are interested in cases where there are several alternative hypotheses (e.g., different directions of motion of a field of dots). Here we have a situation in which the alternatives to a given H, say H1, are the other hypotheses, H2, H3, etc.
In this case, the probability of a particular hypothesis given the evidence becomes:
P(Hi|E) = p(E|Hi)p(Hi) / Σi' p(E|Hi')p(Hi')
The normalization implied here can be performed by computing net inputs as before, but now setting each unit’s activation according to:
ai = exp(neti) / Σi' exp(neti')
This normalization effect is approximated by lateral inhibition mediated by inhibitory interneurons (shaded unit in illustration).
[Figure: hypothesis units H receiving input from evidence units E, with an inhibitory interneuron (shaded).]
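A sketch of the normalized (softmax) activation rule (names are mine):

```python
import math

def normalized_activations(net_inputs):
    """a_i = exp(net_i) / sum_i' exp(net_i') over a set of hypothesis units."""
    exps = [math.exp(n) for n in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]

# This reproduces Bayes' rule over N alternatives when
# net_i = log(p(Hi)) + log(p(E|Hi)) (up to a constant shared by all units):
print(normalized_activations([1.0, 0.5, -0.5]))  # activations sum to 1
```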

‘Cue’ Integration in Monkeys
Salzman and Newsome (1994) combined two ‘cues’ to the perception of motion:
partially coherent motion in a specific direction, and
direct electrical stimulation of neurons in area MT.
They measured the probability of choosing each direction with and without stimulation at different levels of coherence (next slide).

Model used by S&N:
S&N applied a model that is structurally identical to the one we have been discussing:
Pj = exp(yj) / Σj' exp(yj')
yj = bj + mj zj + gd x
where:
bj = bias for direction j
mj = effect of micro-stimulation
zj = 1 if stimulation was applied, 0 otherwise
gd = support for j when motion is in that direction (d=1) or in other, more disparate directions (d=2,3,4,5)
x = motion coherence
Open circles in S&N’s figure show the effect of presenting visual stimulation in one direction (using an intermediate coherence) together with electrical stimulation favoring a direction 135° away from the visual stimulus. The dip between the peaks rules out simple averaging of the directions cued by visual and electrical stimulation, but is approximately consistent with the Bayesian model (filled circles).
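A sketch of this model in Python (the parameter values below are placeholders for illustration, not Salzman and Newsome’s fitted values; names are mine):

```python
import math

def direction_probs(biases, stim_effects, stim_on, direction_support, coherence):
    """P_j = exp(y_j) / sum_j' exp(y_j'), with y_j = b_j + m_j * z + g_d * x.

    biases[j]            = b_j, bias for direction j
    stim_effects[j]      = m_j, effect of micro-stimulation on direction j
    stim_on              = z, 1 if stimulation was applied, 0 otherwise
    direction_support[j] = g_d term for direction j given the displayed motion direction
    coherence            = x, motion coherence
    """
    ys = [b + m * stim_on + g * coherence
          for b, m, g in zip(biases, stim_effects, direction_support)]
    exps = [math.exp(y) for y in ys]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 8-direction example: visual motion supports direction 0,
# micro-stimulation favors direction 3 (135° away if directions are 45° apart).
probs = direction_probs(
    biases=[0.0] * 8,
    stim_effects=[0, 0, 0, 2.0, 0, 0, 0, 0],
    stim_on=1,
    direction_support=[3.0, 1.5, 0.3, 0.0, 0.0, 0.0, 0.3, 1.5],
    coherence=0.5,
)
print([round(p, 3) for p in probs])
```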