Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3
Lecture 5 Reminder of probability theory Bayes rule Bayesian networks
So why is Bayes rule relevant to Cognitive CV? It provides a well-founded methodology for reasoning with uncertainty. These methods are the basis for our model of perception guided by expectation. We can develop well-founded methods of learning rather than just being stuck with hand-coded models.
Bayes rule: dealing with uncertainty Rev. THOMAS BAYES Sources of uncertainty e.g.: – ignorance – complexity – physical randomness – vagueness Use probability theory to reason about uncertainty Be careful to understand what you mean by probability and use it consistently – frequency analysis – belief
Probability theory - reminder p(x): a single continuous value in the range [0,1]. Think of it either as “x is true in 0.7 of cases” (frequentist) or as “I believe x = true with probability 0.7” (belief). P(X): often (but not always) used to denote a distribution over a set of values, e.g. if X is discrete {x=true, x=false} then P(X) encompasses knowledge of both values; p(x=true) is then a single value.
Probability theory - reminder Joint probability Conditional probability
Probability theory - reminder Conditional independence Marginalising
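For reference, the four terms named on these reminder slides have the following standard definitions. This is a sketch in the lecture's p(·) notation; the variable names x, y and z are assumptions chosen for illustration.

```latex
\begin{align}
p(x, y) &= p(x \mid y)\, p(y) = p(y \mid x)\, p(x)
  && \text{(joint probability, product rule)} \\
p(x \mid y) &= \frac{p(x, y)}{p(y)}, \quad p(y) > 0
  && \text{(conditional probability)} \\
p(x, y \mid z) &= p(x \mid z)\, p(y \mid z)
  && \text{(conditional independence of $x$ and $y$ given $z$)} \\
p(x) &= \sum_{y} p(x, y)
  && \text{(marginalising over a discrete $y$)}
\end{align}
```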
Bayes rule – the basics [Figure: nodes Y and X] BAYES RULE
Bayes rule – the basics As an illustration, let’s look at the conditional probability of a hypothesis H based on some evidence E
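Written for a hypothesis H and evidence E, Bayes rule follows from equating the two ways of factoring the joint p(H, E). The labels on the terms are the conventional names, and the sum over h in the denominator assumes H is discrete.

```latex
\begin{align}
p(H \mid E)\, p(E) &= p(H, E) = p(E \mid H)\, p(H) \\[4pt]
\underbrace{p(H \mid E)}_{\text{posterior}}
  &= \frac{\overbrace{p(E \mid H)}^{\text{likelihood}}\;
           \overbrace{p(H)}^{\text{prior}}}
          {\underbrace{p(E)}_{\text{evidence}}},
  \qquad p(E) = \sum_{h} p(E \mid h)\, p(h)
\end{align}
```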
Bayes rule – example Consider a vision system used to detect zebra in static images. It has a “stripey area” operator to help it do this (the evidence E). Let p(h=zebra present) = 0.02 (prior established during training). Assume the “stripey area” operator is discrete valued (true/false). Let p(e=true|h=true) = 0.8 (it’s a fairly good detector). Let p(e=true|h=false) = 0.1 (there are non-zebra items with stripes in the data set – like the gate). Given e, we can establish p(h=true|e=true) …
Bayes rule – example p(h=true|e=true) = (0.8 × 0.02) / (0.8 × 0.02 + 0.1 × 0.98) = 0.016 / 0.114 ≈ 0.14. Note that this is an increase over the prior = 0.02 due to the evidence e
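The zebra calculation can be checked with a few lines of Python. This is a minimal sketch using the three numbers given in the example; the function name `posterior` and its parameter names are hypothetical, chosen only for illustration.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return p(h=true | e=true) for a binary hypothesis via Bayes rule."""
    # Evidence term p(e=true), obtained by marginalising over h.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


# Numbers from the zebra example: prior 0.02, hit rate 0.8, false-alarm rate 0.1.
p_zebra_given_stripes = posterior(prior=0.02, p_e_given_h=0.8, p_e_given_not_h=0.1)
print(round(p_zebra_given_stripes, 3))  # 0.14
```

Changing the prior while holding the detector fixed shows how strongly the posterior depends on how rare zebras are in the data set.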
Interpretation Despite our intuition, our detector does not seem very “good”. Remember, only 1 in 50 images had a zebra. That means that 49 out of 50 do not contain a zebra, and the detector is not 100% reliable, so some of these images will be incorrectly determined as having a zebra. In 1000 images we would expect 20 zebras, of which 16 trigger the detector, against 98 false alarms from the 980 zebra-free images. Failing to account for “negative” evidence properly is a typical failing of human intuitive reasoning.
Moving on … Human intuition is not very Bayesian (e.g. Kahneman et al., 1982), so be sure to apply Bayes theory correctly. Bayesian networks help us to organise our thinking clearly. Causality and Bayesian networks are related.
Bayesian networks [Figure: network with nodes A, B, C, D, E] A Bayesian network is a compact representation of the joint probability over a set of variables. Each variable is represented as a node, and each variable can be discrete or continuous. Conditional independence assumptions are encoded using a set of arcs. The set of nodes and arcs is referred to as a graph. The absence of an arc between two nodes implies that they are conditionally independent of each other. Different types of graph exist; the one shown is a Directed Acyclic Graph (DAG).
Bayesian networks - terminology [Figure: network with nodes A, B, C, D, E] A is called a root node and has a prior only. B, D and E are called leaf nodes. A “causes” B and “causes” C, so the value of A influences the values of B and C. A is the parent node of B and C; B and C are child nodes of A. To determine E, you need only to know C: E is conditionally independent of A given C.
Encoding conditional independence [Figure: nodes A, B, C] FACTORED REPRESENTATION
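In general, a Bayesian network factors the joint probability into one term per node, each conditioned only on that node's parents. The sketch below states the general form and then, as an illustration, applies it to the fragment described under terminology, where A is the parent of both B and C. The arc structure of the three-node figure on this slide is not known, so the second line is an assumed example rather than the slide's own equation.

```latex
\begin{align}
P(X_1, \dots, X_n) &= \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{parents}(X_i)\bigr) \\
P(A, B, C) &= P(A)\, P(B \mid A)\, P(C \mid A)
  && \text{(assumed arcs: } A \to B,\ A \to C\text{)}
\end{align}
```

Without the independence assumptions, the chain rule would require the full term P(C | A, B) in place of P(C | A); the missing arc between B and C is what allows the smaller term.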
Specifying the Conditional Probability Terms (1) For a discrete node C with discrete parents A and B, the conditional probability term P(C|A,B) can be represented as a value table
[Figure: node C with parents A ∈ {red, green, blue} and B ∈ {true, false}]
a=      b=   p(c=T|A,B)
red     T    0.2
red     F    0.1
green   T    0.6
green   F    0.3
blue    T    0.99
blue    F    0.05
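One simple way to hold such a value table in code is a dictionary keyed by the parent values. The sketch below uses the six entries from the table; the names `CPT_C` and `p_c_given_ab` are hypothetical. Only p(c=T|A,B) needs storing, because p(c=F|A,B) is its complement.

```python
# p(c=True | A=a, B=b), one entry per combination of parent values.
CPT_C = {
    ("red", True): 0.2,
    ("red", False): 0.1,
    ("green", True): 0.6,
    ("green", False): 0.3,
    ("blue", True): 0.99,
    ("blue", False): 0.05,
}


def p_c_given_ab(c, a, b):
    """Look up p(C=c | A=a, B=b) for a binary node C."""
    p_true = CPT_C[(a, b)]
    # C is binary, so the two outcomes must sum to one.
    return p_true if c else 1.0 - p_true


print(p_c_given_ab(True, "blue", True))  # 0.99
```

The table grows with the product of the parents' state counts (here 3 × 2 = 6 rows), which is one reason to keep the number of parents per node small.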
Specifying the Conditional Probability Terms (2) For a continuous node C with continuous parents A and B, the conditional probability term P(C|A,B) can be represented as a function [Figure: node C with parents A and B; plot of p(c|A,B) against A and B]
Specifying the Conditional Probability Terms (3) For a continuous node C with 1 continuous parent A and 1 discrete parent B, the conditional probability term P(C|A,B) can be represented as a set of functions (the continuous function is selected according to a “context” determined by B) [Figure: node C with continuous parent A and discrete parent B ∈ {true, false}; plot of p(c|A,B) against A]
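The "set of functions" idea can be sketched as one function per value of B, with A setting that function's parameters. The lecture does not say which functional form is used, so the linear-Gaussian form, the names `CONTEXTS` and `density_c_given_ab`, and every number below are assumptions made purely for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical parameters: one (slope, intercept, std) triple per value of B.
CONTEXTS = {
    True: (2.0, 0.0, 1.0),
    False: (-1.0, 5.0, 0.5),
}


def density_c_given_ab(c, a, b):
    """Density p(c | A=a, B=b): B selects the function, A sets its mean."""
    slope, intercept, std = CONTEXTS[b]
    return gaussian_pdf(c, mean=slope * a + intercept, std=std)


# The same value of A gives a different density over C in each context.
print(density_c_given_ab(2.0, 1.0, True))
print(density_c_given_ab(2.0, 1.0, False))
```

The purely continuous case (two continuous parents) would drop the dictionary and make the mean a function of both A and B.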
Directed Acyclic Graph (DAG) [Figure: network with nodes A, B, C, D, E and an additional dotted red arc] Arcs encode “causal” relationships between nodes. A DAG contains no directed cycles. The graph shown is also singly connected: there is no more than 1 path (regardless of arc direction) between any node and any other node. If we added the dotted red arc, we would have a loopy graph. Loopy graphs can be approximated by acyclic ones for inference, but this is outside the scope of this course.
Inference and Learning Inference – Calculating a probability over a set of nodes given the values of other nodes – The two most useful modes of inference are PREDICTIVE (from root to leaf) and DIAGNOSTIC (from leaf to root) Exact and approximate methods – Exact methods exist for Directed Acyclic Graphs (DAGs) – Approximations exist for other graph types
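The two modes can be illustrated on a small chain A → C → E, the path from root to leaf described under terminology. The sketch below does exact inference by brute-force enumeration of the joint, which is only practical for tiny networks. All variables are assumed binary and every probability is made up for illustration; the names `P_A`, `P_C_GIVEN_A`, `P_E_GIVEN_C`, `predictive` and `diagnostic` are hypothetical.

```python
VALUES = (True, False)

# Hypothetical parameters for the chain A -> C -> E (all nodes binary).
P_A = {True: 0.3, False: 0.7}           # prior p(a)
P_C_GIVEN_A = {True: 0.9, False: 0.2}   # p(c=True | a)
P_E_GIVEN_C = {True: 0.8, False: 0.1}   # p(e=True | c)


def bernoulli(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(a, c, e):
    """Factored joint p(a, c, e) = p(a) p(c|a) p(e|c)."""
    return P_A[a] * bernoulli(P_C_GIVEN_A[a], c) * bernoulli(P_E_GIVEN_C[c], e)


def predictive(e, a):
    """Root to leaf: p(E=e | A=a), summing out the hidden node C."""
    return sum(joint(a, c, e) for c in VALUES) / P_A[a]


def diagnostic(a, e):
    """Leaf to root: p(A=a | E=e), normalising over both values of A."""
    numerator = sum(joint(a, c, e) for c in VALUES)
    evidence = sum(joint(a2, c, e) for a2 in VALUES for c in VALUES)
    return numerator / evidence


print(round(predictive(True, True), 3))  # 0.73
print(round(diagnostic(True, True), 3))  # 0.566
```

The diagnostic result is Bayes rule applied across the network: observing e=true raises the belief in a=true from the prior 0.3 to about 0.57.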
Summary Bayes rule allows us to deal with uncertain data. Bayesian networks encode conditional independence. Simple DAGs can be used in causal and diagnostic modes.
Next time … Examples of inference using Bayesian Networks A lot of excellent reference material on Bayesian reasoning can be found at: