Some Basic Aspects of Perceptual Inference Under Uncertainty

Psych 209, Jan 9, 2013

Example
H = “it has just been raining”; E = “the ground is wet”.
What is the probability that H is true, given E?
Assume we already believe:
P(H) = .2; P(~H) = .8
P(E|H) = .9; P(E|~H) = .01
We want to calculate P(H|E). Can we derive a formula to do so?

Derivation of Bayes Formula
By the definition of conditional probability:
P(H|E) = P(H&E)/P(E)
P(E|H) = P(H&E)/P(H)
So P(E|H)P(H) = P(H&E). Substituting into the first line, we obtain
P(H|E) = P(E|H)P(H)/P(E)   (1)
What is P(E)?
P(E) = P(H&E) + P(~H&E) = P(E|H)P(H) + P(E|~H)P(~H)
Substituting this last expression into (1) gives Bayes formula:
P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]

Example
Assumptions: P(H) = .2; P(~H) = .8; P(E|H) = .9; P(E|~H) = .01
Then what is P(H|E), the probability that it has just been raining, given that the ground is wet?
P(H|E) = (.9 × .2) / ((.9 × .2) + (.01 × .8)) = .18 / (.18 + .008) ≈ .96
Visualization (on board).
What happens if we change our beliefs about P(H)? P(E|H)? P(E|~H)?
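A minimal Python sketch of this calculation (the function name `posterior` is my own; the numbers are those from the slide):

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for a binary hypothesis: returns P(H|E)."""
    p_not_h = 1.0 - p_h
    numerator = p_e_given_h * p_h
    p_e = numerator + p_e_given_not_h * p_not_h  # P(E), summed over H and ~H
    return numerator / p_e

# The rain example: P(H) = .2, P(E|H) = .9, P(E|~H) = .01
print(posterior(0.2, 0.9, 0.01))  # ~0.957
```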

Extension to N Alternatives

Posterior Ratios
The ratio p(hi|e)/p(hj|e) can be expressed as:
p(hi|e)/p(hj|e) = (p(hi)/p(hj)) (p(e|hi)/p(e|hj))
These ratios are indifferent to the number of alternatives.
Taking logs:
log(p(hi|e)/p(hj|e)) = log(p(hi)/p(hj)) + log(p(e|hi)/p(e|hj))
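As a sketch (the helper name is mine), the additivity of the log ratios is easy to check numerically:

```python
import math

def log_posterior_ratio(p_hi, p_hj, p_e_given_hi, p_e_given_hj):
    """log(p(hi|e)/p(hj|e)) = log prior ratio + log likelihood ratio."""
    return math.log(p_hi / p_hj) + math.log(p_e_given_hi / p_e_given_hj)

# Rain example, with hi = "rain" and hj = "no rain":
print(log_posterior_ratio(0.2, 0.8, 0.9, 0.01))  # log(.25) + log(90) ≈ 3.11
```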

Morton’s use of the logit

Odds Ratio Version of Bayes Formula
For the 2-alternative case we can re-express p(hi|e):
p(hi|e) = [(p(hi)/p(hj)) (p(e|hi)/p(e|hj))] / [(p(hi)/p(hj)) (p(e|hi)/p(e|hj)) + 1]
Using logs and exponentials:
p(hi|e) = exp[log(p(hi)/p(hj)) + log(p(e|hi)/p(e|hj))] / (exp[log(p(hi)/p(hj)) + log(p(e|hi)/p(e|hj))] + 1)
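In code, this odds-ratio form is just the logistic function applied to the summed log ratios; a sketch using the rain example (names are mine):

```python
import math

def posterior_from_log_odds(log_prior_ratio, log_likelihood_ratio):
    """p(hi|e) as the logistic of the log prior ratio plus the log likelihood ratio."""
    z = log_prior_ratio + log_likelihood_ratio
    return 1.0 / (1.0 + math.exp(-z))

print(posterior_from_log_odds(math.log(0.2 / 0.8), math.log(0.9 / 0.01)))  # ~0.957
```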

How Should we Combine Two or More Sources of Evidence?
Two different sources of evidence E1 and E2 are conditionally independent given the state of H, iff
p(E1&E2|H) = p(E1|H)p(E2|H) and p(E1&E2|~H) = p(E1|~H)p(E2|~H)
Suppose p(H), p(E1|H) and p(E1|~H) are as before, and E2 = ‘The sky is blue’; p(E2|H) = .02; p(E2|~H) = .5.
Assuming conditional independence, we can substitute into Bayes’ rule to determine that:
p(H|E1&E2) = (.9 × .02 × .2) / (.9 × .02 × .2 + .01 × .5 × .8) ≈ .47
For N sources of evidence, all conditionally independent given H, we get:
p(E|H) = Πj p(Ej|H)
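A sketch of the multi-cue calculation under the conditional-independence assumption (function and argument names are mine):

```python
def posterior_multi(p_h, likelihoods_h, likelihoods_not_h):
    """P(H | E1..EN) when the Ej are conditionally independent given H.

    likelihoods_h[j]     = p(Ej|H)
    likelihoods_not_h[j] = p(Ej|~H)
    """
    num = p_h          # becomes p(H) * prod_j p(Ej|H)
    alt = 1.0 - p_h    # becomes p(~H) * prod_j p(Ej|~H)
    for lh, lnh in zip(likelihoods_h, likelihoods_not_h):
        num *= lh
        alt *= lnh
    return num / (num + alt)

# "The ground is wet" plus "the sky is blue":
print(posterior_multi(0.2, [0.9, 0.02], [0.01, 0.5]))  # ~0.47
```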

Conditional Independence in the Generative Model of Letter Feature Displays
A letter is chosen for display.
Features are then chosen for display independently for each letter, but with noise.
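A toy generative sketch of that process (the letter-to-feature table and the noise level are invented for illustration):

```python
import random

# Hypothetical binary feature vectors for a few letters (invented for illustration).
LETTER_FEATURES = {
    "A": [1, 1, 0, 1],
    "H": [1, 0, 1, 1],
    "T": [0, 1, 1, 0],
}

def generate_display(noise=0.1):
    """Choose a letter, then emit each feature independently, flipping it with probability `noise`."""
    letter = random.choice(list(LETTER_FEATURES))
    features = [f if random.random() > noise else 1 - f
                for f in LETTER_FEATURES[letter]]
    return letter, features

print(generate_display())
```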

How this relates to connectionist units (or populations of neurons)
We treat the activation of the unit as corresponding to the instantaneous normalized firing rate of a neural population.
The baseline activation of the unit is thought to depend on a constant background input called its ‘bias’. When other units are active, their influences are combined with the bias to yield a quantity called the ‘net input’.
The influence of a unit j on another unit i depends on the activation of j and the weight or strength of the connection to i from j. Connection weights can be positive (excitatory) or negative (inhibitory).
These influences are summed to determine the net input to unit i:
neti = biasi + Σj aj wij
where aj is the activation of unit j, and wij is the strength of the connection to unit i from unit j.
[Figure: input from unit j reaches unit i via a connection with weight wij.]
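A minimal sketch of the net input computation (names are mine):

```python
def net_input(bias, activations, weights):
    """net_i = bias_i + sum_j a_j * w_ij for a single receiving unit i."""
    return bias + sum(a * w for a, w in zip(activations, weights))

# One input unit with activation 1.0 connected by a weight of 4.5,
# plus a bias of -1.4, gives a net input of 3.1:
print(net_input(-1.4, [1.0], [4.5]))
```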

A Unit’s Activation can Reflect P(H|E)
The activation of unit i given its net input neti is assumed to be given by:
ai = exp(neti) / (1 + exp(neti))
This function is called the ‘logistic function’. It is usually written in the numerically identical form:
ai = 1/[1 + exp(-neti)]
In the reading we showed that ai = p(Hi|E) iff:
aj = 1 when Ej is present, or 0 when Ej is absent;
wij = log(p(Ej|Hi)/p(Ej|~Hi));
biasi = log(p(Hi)/p(~Hi)).
This assumes the evidence is conditionally independent given the state of H.
[Figure: the logistic function, plotting ai against neti.]
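A sketch tying these pieces together for the rain example (one hypothesis unit, one evidence unit for “the ground is wet”; all names are mine):

```python
import math

def unit_activation(bias, activations, weights):
    """Logistic activation: a_i = 1 / (1 + exp(-net_i)), with net_i = bias_i + sum_j a_j * w_ij."""
    net = bias + sum(a * w for a, w in zip(activations, weights))
    return 1.0 / (1.0 + math.exp(-net))

bias_rain = math.log(0.2 / 0.8)   # log prior odds, log(p(H)/p(~H))
w_wet = math.log(0.9 / 0.01)      # log likelihood ratio, log(p(E|H)/p(E|~H))
print(unit_activation(bias_rain, [1.0], [w_wet]))  # ~0.957, matching P(H|E) above
```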

Choosing between N alternatives
Often we are interested in cases where there are several alternative hypotheses (e.g., different directions of motion of a field of dots). Here we have a situation in which the alternatives to a given H, say H1, are the other hypotheses, H2, H3, etc.
In this case, the probability of a particular hypothesis given the evidence becomes:
P(Hi|E) = p(E|Hi)p(Hi) / Σi' p(E|Hi')p(Hi')
The normalization implied here can be performed by computing net inputs as before, but now setting each unit’s activation according to:
ai = exp(neti) / Σi' exp(neti')
This normalization effect is approximated by lateral inhibition mediated by inhibitory interneurons (shaded unit in illustration).
[Figure: hypothesis units H receiving input from evidence units E, with an inhibitory interneuron (shaded).]
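A sketch of the normalized (softmax) activation rule (names are mine):

```python
import math

def normalized_activations(net_inputs):
    """a_i = exp(net_i) / sum_i' exp(net_i') over a set of hypothesis units."""
    exps = [math.exp(n) for n in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]

# This reproduces Bayes' rule over N alternatives when
# net_i = log(p(Hi)) + log(p(E|Hi)) (up to a constant shared by all units):
print(normalized_activations([1.0, 0.5, -0.5]))  # activations sum to 1
```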

‘Cue’ Integration in Monkeys
Salzman and Newsome (1994) combined two ‘cues’ to the perception of motion:
partially coherent motion in a specific direction, and
direct electrical stimulation of neurons in area MT.
They measured the probability of choosing each direction with and without stimulation at different levels of coherence (next slide).

Model used by S&N:
S&N applied a model that is structurally identical to the one we have been discussing:
Pj = exp(yj) / Σj' exp(yj')
yj = bj + mj zj + gd x
where:
bj = bias for direction j
mj = effect of micro-stimulation
zj = 1 if stimulation was applied, 0 otherwise
gd = support for j when motion is in that direction (d=1) or in other, more disparate directions (d=2,3,4,5)
x = motion coherence
Open circles in S&N’s figure show the effect of presenting visual stimulation in one direction (using an intermediate coherence) together with electrical stimulation favoring a direction 135° away from the visual stimulus. The dip between the peaks rules out simple averaging of the directions cued by visual and electrical stimulation, but is approximately consistent with the Bayesian model (filled circles).
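A sketch of this model in Python (the parameter values below are placeholders for illustration, not Salzman and Newsome’s fitted values; names are mine):

```python
import math

def direction_probs(biases, stim_effects, stim_on, direction_support, coherence):
    """P_j = exp(y_j) / sum_j' exp(y_j'), with y_j = b_j + m_j * z + g_d * x.

    biases[j]            = b_j, bias for direction j
    stim_effects[j]      = m_j, effect of micro-stimulation on direction j
    stim_on              = z, 1 if stimulation was applied, 0 otherwise
    direction_support[j] = g_d term for direction j given the displayed motion direction
    coherence            = x, motion coherence
    """
    ys = [b + m * stim_on + g * coherence
          for b, m, g in zip(biases, stim_effects, direction_support)]
    exps = [math.exp(y) for y in ys]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 8-direction example: visual motion supports direction 0,
# micro-stimulation favors direction 3 (135° away if directions are 45° apart).
probs = direction_probs(
    biases=[0.0] * 8,
    stim_effects=[0, 0, 0, 2.0, 0, 0, 0, 0],
    stim_on=1,
    direction_support=[3.0, 1.5, 0.3, 0.0, 0.0, 0.0, 0.3, 1.5],
    coherence=0.5,
)
print([round(p, 3) for p in probs])
```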