CHAPTER 5
Probability Theory (continued): Introduction to Bayesian Networks
Joint Probability
Marginal Probability
Conditional Probability
The Chain Rule I
Bayes’ Rule
More Bayes’ Rule
The Chain Rule II
Independence
Example: Independence
Example: Independence?
Conditional Independence
The Chain Rule III
Expectations
Expectations
Estimation
Estimation
Problems with maximum likelihood estimates:
- If I flip a coin once, and it’s heads, what’s the estimate for P(heads)?
- What if I flip it 50 times with 27 heads?
- What if I flip 10M times with 8M heads?
Basic idea:
- We have some prior expectation about parameters (here, the probability of heads)
- Given little evidence, we should skew toward the prior
- Given lots of evidence, we should listen to the data
How can we accomplish this? Stay tuned!
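One standard way to get this behavior, previewed here ahead of the slides’ “stay tuned,” is to blend the prior into the count via pseudocounts (Laplace-style smoothing). A minimal sketch; the pseudocount strength of 10 and the 0.5 prior are illustrative assumptions, not values from the lecture:

```python
# MLE vs. a prior-smoothed estimate of P(heads) for the three scenarios above.
# Smoothing adds imaginary "pseudo-flips" at the prior rate: with little data
# the estimate stays near the prior; with lots of data, the data wins.

def mle(heads, flips):
    return heads / flips

def smoothed(heads, flips, pseudo=10, prior=0.5):
    # pseudo = strength of the prior in imaginary flips (assumed value)
    return (heads + pseudo * prior) / (flips + pseudo)

for heads, flips in [(1, 1), (27, 50), (8_000_000, 10_000_000)]:
    print(f"{flips:>10,} flips, {heads:>9,} heads: "
          f"MLE = {mle(heads, flips):.3f}, smoothed = {smoothed(heads, flips):.3f}")
```

With one flip, the smoothed estimate barely moves from 0.5 (about 0.545 rather than the MLE’s 1.0); at ten million flips the prior’s pull is negligible and both estimates agree at 0.800.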
Lewis Carroll's Pillow Problem
Bayesian Networks: Big Picture
Two big problems with joint probability distributions:
- Unless there are only a few variables, the distribution is too big to represent explicitly (Why?)
- Hard to estimate anything empirically about more than a few variables at a time (Why?)
It is also hard to compute answers to queries of the form P(y | a) (Why?)
Bayesian networks are a technique for describing complex joint distributions (models) using a bunch of simple, local distributions:
- They describe how variables interact locally
- Local interactions chain together to give global, indirect interactions
- For about 10 min, we’ll be very vague about how these interactions are specified
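To make the size and query problems concrete: an explicit joint table has one row per full assignment, and answering even a single query means summing over all matching rows. A minimal brute-force sketch; the tiny three-variable uniform joint is an illustrative assumption:

```python
from itertools import product

# Brute-force P(Y=y | A=a) from an explicit joint table: sum the matching
# rows, then normalize. The table has 2**n rows for n Boolean variables,
# so both storage and query time blow up exponentially.
n = 3                                 # toy variables A, Y, Z
joint = {assign: 1 / 2**n for assign in product([0, 1], repeat=n)}  # uniform joint

def query(joint, y_idx, y_val, a_idx, a_val):
    num = sum(p for assign, p in joint.items()
              if assign[a_idx] == a_val and assign[y_idx] == y_val)
    den = sum(p for assign, p in joint.items() if assign[a_idx] == a_val)
    return num / den

print(query(joint, y_idx=1, y_val=1, a_idx=0, a_val=1))  # P(Y=1 | A=1) = 0.5
```

At n = 3 this is instant; at n = 50 the table alone would need about 10^15 entries.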
Graphical Model Notation
Example: Coin Flips
Example: Traffic
Example: Traffic II
Example: Alarm Network
Bayesian Network Semantics
Example: Alarm Network
Size of a Bayes’ Net
How big is a joint distribution over N Boolean variables? 2^N
How big is a Bayes’ net if each node has k parents? N · 2^k
Both give you the power to calculate P(X1, X2, …, XN)
Bayesian Networks = Huge space savings!
Also easier to elicit local CPTs
Also turns out to be faster to answer queries (future class)
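To see the gap numerically, a quick comparison of the two formulas above; the values of N and k are illustrative:

```python
# Explicit joint vs. Bayes' net parameter counts over N Boolean variables,
# assuming each node has at most k parents (N and k chosen for illustration).
N, k = 30, 3
joint_entries = 2 ** N        # one entry per complete assignment
bn_entries = N * 2 ** k       # one CPT row per parent assignment, per node
print(f"joint table: {joint_entries:,} entries")  # 1,073,741,824
print(f"Bayes' net:  {bn_entries:,} entries")     # 240
```

Thirty Boolean variables already push the joint past a billion entries, while the net needs only a few hundred numbers.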
Building the (Entire) Joint
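Presumably this slide assembles each joint entry as a product of the local conditionals, using the standard Bayes’ net factorization P(x1, …, xN) = Π_i P(xi | parents(Xi)). A minimal sketch over a two-node Rain → Traffic net, echoing the deck’s traffic examples; the CPT numbers are assumptions for illustration, not the lecture’s:

```python
# Each joint entry is a product of local conditionals: P(r, t) = P(r) * P(t | r).
# The Rain -> Traffic structure and the probabilities below are made up
# for illustration.
P_rain = {True: 0.1, False: 0.9}
P_traffic_given_rain = {True: {True: 0.8, False: 0.2},
                        False: {True: 0.1, False: 0.9}}

def joint(rain, traffic):
    return P_rain[rain] * P_traffic_given_rain[rain][traffic]

entries = {(r, t): joint(r, t) for r in (True, False) for t in (True, False)}
print(entries)                 # the four entries of the full joint
print(sum(entries.values()))   # sums to 1.0 (up to float rounding)
```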
Example: Traffic
Example: Reverse Traffic
Causality?
When Bayes’ nets reflect the true causal patterns:
- Often simpler (nodes have fewer parents)
- Often easier to think about
- Often easier to elicit from experts
BNs need not actually be causal:
- Sometimes no causal net exists over the domain
- E.g. consider the variables Traffic and RoofDrips
- End up with arrows that reflect correlation, not causation
What do the arrows really mean?
- Topology may happen to encode causal structure
- Topology really encodes conditional independencies
Creating Bayes’ Nets
So far: we talked about how any fixed Bayes’ net encodes a joint distribution
Next: how to represent a fixed distribution as a Bayes’ net
- Key ingredient: conditional independence
- The exercise we did in “causal” assembly of BNs was a kind of intuitive use of conditional independence
- Now we have to formalize the process
After that: how to answer queries (inference)
Conditional Independence