Probabilistic Reasoning We’ve looked at reasoning using logic expressions. The search space is exponential. Probabilistic reasoning uses other techniques that allow faster execution and estimate the solutions using probability theory. We’ll start with Enumeration, Markov Models, and Hidden Markov Models.
Example: Inference in Ghostbusters
A ghost is in the grid somewhere. Sensor readings tell how close a square is to the ghost:
  On the ghost: red
  1 or 2 away: orange
  3 or 4 away: yellow
  5+ away: green
Sensors are noisy, but we know P(Color | Distance):
  P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
  0.05         0.15            0.5             0.3
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Uncertainty
General situation:
  Observed variables (evidence): the agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
  Unobserved variables: the agent needs to reason about other aspects (e.g., where an object is or what disease is present)
  Model: the agent knows something about how the known variables relate to the unknown variables
Probabilistic reasoning gives us a framework for managing our beliefs and knowledge.
Where could this be used instead of Ghostbusters? computer vision
Probabilistic Models
A probabilistic model is a joint distribution over a set of random variables.
Probabilistic models:
  (Random) variables with domains
  Assignments are called outcomes
  Joint distributions: say how likely each assignment (outcome) is
  Normalized: sum to 1.0
  Ideally: only certain variables directly interact
versus constraint satisfaction problems:
  Variables with domains
  Constraints: state whether assignments are possible
Distribution over T, W:
  T      W      P
  hot    sun    0.4
  hot    rain   0.1
  cold   sun    0.2
  cold   rain   0.3
Constraint over T, W: a table over the same (T, W) assignments whose entries are T or F (possible or not possible) instead of probabilities; for example, hot with rain is marked F.
Events
An event is a set E of outcomes:
  P(E) = sum over (x1, ..., xn) in E of P(x1, ..., xn)
From a joint distribution, we can calculate the probability of any event:
  Probability that it’s hot AND sunny?
  Probability that it’s hot?
  Probability that it’s hot OR sunny?
Typically, the events we care about are partial assignments, like P(T=hot).
  T      W      P
  hot    sun    0.4
  hot    rain   0.1
  cold   sun    0.2
  cold   rain   0.3
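The three questions on this slide can be answered by summing the matching rows of the joint table. The sketch below is one way to do that; the function name event_probability and the tuple encoding of outcomes are illustrative choices, not part of the slides.

```python
# Joint distribution P(T, W) from the slide, keyed by (temperature, weather).
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def event_probability(joint, predicate):
    """Sum the probabilities of all outcomes for which predicate(outcome) is true."""
    return sum(p for outcome, p in joint.items() if predicate(outcome))


# An event is just a set of outcomes, described here by a predicate.
p_hot_and_sun = event_probability(joint, lambda o: o == ("hot", "sun"))
p_hot = event_probability(joint, lambda o: o[0] == "hot")
p_hot_or_sun = event_probability(joint, lambda o: o[0] == "hot" or o[1] == "sun")

print(p_hot_and_sun, p_hot, p_hot_or_sun)
```

Up to floating-point rounding, the three values are 0.4, 0.5 and 0.7.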
Marginal Distributions
Marginal distributions are sub-tables which eliminate variables.
Marginalization (summing out): combine collapsed rows by adding:
  P(t) = sum_w P(t, w)        P(w) = sum_t P(t, w)
Joint P(T, W):
  T      W      P
  hot    sun    0.4
  hot    rain   0.1
  cold   sun    0.2
  cold   rain   0.3
Marginal P(T):
  T      P
  hot    0.5
  cold   0.5
Marginal P(W):
  W      P
  sun    0.6
  rain   0.4
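Summing out can be written as a short loop over the joint table. This is a minimal sketch; the helper name marginal and its axis argument (0 for T, 1 for W) are assumptions made for the example.

```python
from collections import defaultdict

# Joint distribution P(T, W) from the slide, keyed by (temperature, weather).
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def marginal(joint, axis):
    """Sum out every variable except the one at position `axis` of the outcome tuple."""
    table = defaultdict(float)
    for outcome, p in joint.items():
        table[outcome[axis]] += p  # rows that collapse together are added
    return dict(table)


p_t = marginal(joint, 0)  # P(T)
p_w = marginal(joint, 1)  # P(W)
print(p_t, p_w)
```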
Conditional Distributions
Conditional distributions are probability distributions over some variables given fixed values of others.
Joint distribution P(T, W):
  T      W      P
  hot    sun    0.4
  hot    rain   0.1
  cold   sun    0.2
  cold   rain   0.3
Conditional distributions:
  P(W | T = hot):    sun 0.8    rain 0.2
  P(W | T = cold):   sun 0.4    rain 0.6
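Normalization
To get P(W | T = cold) from the joint:
  SELECT the joint probabilities matching the evidence
  NORMALIZE the selection (make it sum to one)
Joint P(T, W):
  T      W      P
  hot    sun    0.4
  hot    rain   0.1
  cold   sun    0.2
  cold   rain   0.3
Selected entries (T = cold):
  T      W      P
  cold   sun    0.2
  cold   rain   0.3
Normalized, P(W | T = cold):
  W      P
  sun    0.4
  rain   0.6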
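The select-then-normalize recipe maps directly onto two lines of code. A minimal sketch, assuming the joint table over (T, W) used on these slides; the function name condition_on_temperature is hypothetical.

```python
# Joint distribution P(T, W) from the slide, keyed by (temperature, weather).
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def condition_on_temperature(joint, t):
    """Return P(W | T = t) by selecting matching rows and normalizing them."""
    # SELECT: keep only the joint entries consistent with the evidence T = t.
    selected = {w: p for (temp, w), p in joint.items() if temp == t}
    # NORMALIZE: divide by the total so the selection sums to one.
    total = sum(selected.values())
    return {w: p / total for w, p in selected.items()}


p_w_given_cold = condition_on_temperature(joint, "cold")
p_w_given_hot = condition_on_temperature(joint, "hot")
print(p_w_given_cold, p_w_given_hot)
```

The normalizing constant here is P(T = cold) = 0.5, so the selected values 0.2 and 0.3 become 0.4 and 0.6.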
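Probability Recap
Conditional probability: P(x | y) = P(x, y) / P(y)
Product rule: P(x, y) = P(x | y) P(y)
Chain rule: P(x1, x2, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... = prod_{i=1..n} P(xi | x1, ..., xi-1)
X, Y independent if and only if: for all x, y: P(x, y) = P(x) P(y)
X and Y are conditionally independent given Z if and only if: for all x, y, z: P(x, y | z) = P(x | z) P(y | z)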
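As a quick illustration of the independence definition, the sketch below checks whether T and W are independent in the weather joint table from the earlier slides, by comparing each P(t, w) with P(t) P(w). The helper names and the tolerance are assumptions made for the example.

```python
# Joint distribution P(T, W) from the earlier slides.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def marginals(joint):
    """Return the two marginals P(T) and P(W) of a joint over (T, W)."""
    p_t, p_w = {}, {}
    for (t, w), p in joint.items():
        p_t[t] = p_t.get(t, 0.0) + p
        p_w[w] = p_w.get(w, 0.0) + p
    return p_t, p_w


def is_independent(joint, tol=1e-9):
    """True if P(t, w) == P(t) * P(w) for every entry, up to a small tolerance."""
    p_t, p_w = marginals(joint)
    return all(abs(p - p_t[t] * p_w[w]) <= tol for (t, w), p in joint.items())


independent = is_independent(joint)
print(independent)
```

Here P(hot, sun) = 0.4 while P(hot) P(sun) = 0.5 * 0.6 = 0.3, so T and W are not independent in this table.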
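Inference by Enumeration
General case:
  Evidence variables: E1, ..., Ek = e1, ..., ek
  Query* variable: Q
  Hidden variables: H1, ..., Hr
  (together these are all the variables X1, ..., Xn)
We want: P(Q | e1, ..., ek)
Step 1: Select the entries consistent with the evidence
Step 2: Sum out H to get the joint of query and evidence:
  P(Q, e1, ..., ek) = sum_{h1, ..., hr} P(Q, h1, ..., hr, e1, ..., ek)
Step 3: Normalize:
  Z = sum_q P(q, e1, ..., ek)
  P(Q | e1, ..., ek) = (1 / Z) P(Q, e1, ..., ek)
* Works fine with multiple query variables, too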
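Inference by Enumeration
  S        T      W      P
  summer   hot    sun    0.30
  summer   hot    rain   0.05
  summer   cold   sun    0.10
  summer   cold   rain   0.05
  winter   hot    sun    0.10
  winter   hot    rain   0.05
  winter   cold   sun    0.15
  winter   cold   rain   0.20
P(W)?
P(W | winter)?
P(W | winter, hot)?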
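The three queries can be answered with the select, sum out, normalize steps. The sketch below assumes the eight-row joint over (S, T, W) with rows ordered summer/winter, hot/cold, sun/rain and probabilities 0.30, 0.05, 0.10, 0.05, 0.10, 0.05, 0.15, 0.20; the function name enumerate_query and the dictionary encoding are illustrative, not from the slides.

```python
VARIABLES = ("S", "T", "W")

# Assumed joint P(S, T, W); the eight entries sum to 1.0.
joint = {
    ("summer", "hot", "sun"): 0.30,
    ("summer", "hot", "rain"): 0.05,
    ("summer", "cold", "sun"): 0.10,
    ("summer", "cold", "rain"): 0.05,
    ("winter", "hot", "sun"): 0.10,
    ("winter", "hot", "rain"): 0.05,
    ("winter", "cold", "sun"): 0.15,
    ("winter", "cold", "rain"): 0.20,
}


def enumerate_query(joint, variables, query, evidence):
    """Return P(query | evidence) as a dict, by select / sum out / normalize."""
    q = variables.index(query)
    totals = {}
    for outcome, p in joint.items():
        # Step 1: keep only entries consistent with the evidence.
        consistent = all(
            outcome[variables.index(var)] == val for var, val in evidence.items()
        )
        if consistent:
            # Step 2: sum out hidden variables by accumulating on the query value.
            totals[outcome[q]] = totals.get(outcome[q], 0.0) + p
    # Step 3: normalize so the result sums to one.
    z = sum(totals.values())
    return {value: p / z for value, p in totals.items()}


p_w = enumerate_query(joint, VARIABLES, "W", {})
p_w_winter = enumerate_query(joint, VARIABLES, "W", {"S": "winter"})
p_w_winter_hot = enumerate_query(joint, VARIABLES, "W", {"S": "winter", "T": "hot"})
print(p_w, p_w_winter, p_w_winter_hot)
```

Under the assumed table, P(W) is 0.65 sun and 0.35 rain, P(W | winter) is 0.5 and 0.5, and P(W | winter, hot) is 2/3 and 1/3. The loop touches every row of the joint, which is exactly the cost problem raised on the next slide.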
Inference by Enumeration Obvious problems: with n variables that each have d values, the worst-case time complexity is O(d^n), and the space complexity is O(d^n) to store the joint distribution.
Bayes’ Rule
Two ways to factor a joint distribution over two variables:
  P(x, y) = P(x | y) P(y) = P(y | x) P(x)
Dividing, we get:
  P(x | y) = P(y | x) P(x) / P(y)
Why is this at all helpful?
  Lets us build one conditional from its reverse
  Often one conditional is tricky but the other one is simple
  Foundation of many systems
In the running for most important AI equation!
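Inference with Bayes’ Rule
Example: diagnostic probability from causal probability:
  P(cause | effect) = P(effect | cause) P(cause) / P(effect)
Example: M: meningitis, S: stiff neck. Using Bayes’ rule with the example givens:
  P(m | s) = P(s | m) P(m) / P(s) = 0.00008 / 0.01007 ≈ 0.0079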
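The meningitis calculation can be reproduced in a few lines. The individual givens are not spelled out in the text, so the values below, P(m) = 0.0001, P(s | m) = 0.8 and P(s | not m) = 0.01, are an assumption, chosen because they reproduce the slide's 0.00008 / 0.01007 ≈ 0.0079. The function name bayes_posterior is hypothetical.

```python
def bayes_posterior(prior, likelihood_if_cause, likelihood_if_no_cause):
    """P(cause | effect) for a binary cause, via Bayes' rule.

    P(effect) is expanded over the two cases (cause present, cause absent).
    """
    p_effect = likelihood_if_cause * prior + likelihood_if_no_cause * (1.0 - prior)
    return likelihood_if_cause * prior / p_effect


# Assumed givens (not stated in the text; they reproduce the slide's result).
p_m = 0.0001            # P(m): prior probability of meningitis
p_s_given_m = 0.8       # P(s | m): stiff neck given meningitis
p_s_given_not_m = 0.01  # P(s | not m): stiff neck without meningitis

p_m_given_s = bayes_posterior(p_m, p_s_given_m, p_s_given_not_m)
print(round(p_m_given_s, 4))
```

The numerator is 0.8 * 0.0001 = 0.00008; the causal direction P(s | m) is the easy one to estimate, and Bayes' rule turns it into the diagnostic direction P(m | s).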
Ghostbusters, Revisited
We are given two distributions:
  Prior distribution over ghost location: P(G). Let’s say this is uniform.
  Sensor reading model: P(R | G). Given: we know what our sensors do.
    R = reading color measured at square (1,1)
    For example, we might know P(R = yellow | G = (1,1)) = 0.1
We can calculate the posterior distribution P(G | r) over ghost locations given a reading using Bayes’ rule:
  P(g | r) = P(r | g) P(g) / P(r), which is proportional to P(r | g) P(g)
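A sketch of this posterior update on a small grid follows. Several pieces are assumptions made only for illustration: the 4x4 grid, the use of Manhattan distance, and most of the sensor table. Only the "3-4" row comes from the P(Color | Distance = 3) values on the earlier slide, and P(yellow | on the ghost) = 0.1 matches the example value above; the other numbers are made up.

```python
# Hypothetical sensor model P(color | distance bucket). Only the "3-4" row and
# the value P(yellow | "0") = 0.1 come from the slides; the rest is invented.
SENSOR = {
    "0": {"red": 0.70, "orange": 0.15, "yellow": 0.10, "green": 0.05},
    "1-2": {"red": 0.15, "orange": 0.60, "yellow": 0.20, "green": 0.05},
    "3-4": {"red": 0.05, "orange": 0.15, "yellow": 0.50, "green": 0.30},
    "5+": {"red": 0.05, "orange": 0.05, "yellow": 0.20, "green": 0.70},
}


def bucket(distance):
    """Map a distance to the bucket names used in SENSOR."""
    if distance == 0:
        return "0"
    if distance <= 2:
        return "1-2"
    if distance <= 4:
        return "3-4"
    return "5+"


def manhattan(a, b):
    """Manhattan distance between two grid squares (an assumed distance measure)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def ghost_posterior(prior, sensor_square, reading):
    """P(G | r), computed as P(r | g) P(g) and then normalized."""
    unnormalized = {
        g: SENSOR[bucket(manhattan(sensor_square, g))][reading] * p
        for g, p in prior.items()
    }
    z = sum(unnormalized.values())  # this is P(r)
    return {g: v / z for g, v in unnormalized.items()}


# Uniform prior over a hypothetical 4x4 grid with squares (1,1) .. (4,4).
squares = [(x, y) for x in range(1, 5) for y in range(1, 5)]
prior = {g: 1.0 / len(squares) for g in squares}

belief = ghost_posterior(prior, sensor_square=(1, 1), reading="yellow")
print(belief[(1, 1)], belief[(4, 1)])
```

With a uniform prior the posterior is just the normalized likelihood, so after a yellow reading at (1,1) a square three steps away is five times as likely as (1,1) itself (0.5 versus 0.1).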
Now we have the machinery for: Markov Models
Reasoning over Time or Space
Often, we want to reason about a sequence of observations: speech recognition, robot localization, user attention, medical monitoring.
Need to introduce time (or space) into our models.
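Markov Models
Value of X at a given time is called the state.
  X1 -> X2 -> X3 -> X4 -> ...
Parameters: P(X1) and P(Xt | Xt-1), called transition probabilities or dynamics, specify how the state evolves over time (also, initial state probabilities).
Stationarity assumption: transition probabilities are the same at all times.
Joint Distribution of a Markov Model
  X1 -> X2 -> X3 -> X4
Joint distribution:
  P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X2) P(X4 | X3)
More generally:
  P(X1, X2, ..., XT) = P(X1) P(X2 | X1) P(X3 | X2) ... P(XT | XT-1)
                     = P(X1) prod_{t=2..T} P(Xt | Xt-1)
Chain Rule and Markov Models (chain of 4 states)
From the chain rule, every joint distribution over X1, X2, X3, X4 can be written as:
  P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X1, X2) P(X4 | X1, X2, X3)
Assuming that X3 is independent of X1 given X2, and that X4 is independent of X1, X2 given X3, results in the expression posited on the previous slide:
  P(X1, X2, X3, X4) = P(X1) P(X2 | X1) P(X3 | X2) P(X4 | X3)
Chain Rule and Markov Models (chain of T states)
From the chain rule, every joint distribution over X1, X2, ..., XT can be written as:
  P(X1, X2, ..., XT) = P(X1) prod_{t=2..T} P(Xt | X1, X2, ..., Xt-1)
Assuming that for all t: Xt is independent of X1, ..., Xt-2 given Xt-1, gives us the expression posited on the earlier slide:
  P(X1, X2, ..., XT) = P(X1) prod_{t=2..T} P(Xt | Xt-1)
Implied Conditional Independencies
We assumed: X3 is independent of X1 given X2, and X4 is independent of X1, X2 given X3.
Do we also have: X1 is independent of X3, X4 given X2? Yes!
Proof:
  P(X1 | X2, X3, X4) = P(X1, X2, X3, X4) / P(X2, X3, X4)
                     = P(X1) P(X2 | X1) P(X3 | X2) P(X4 | X3) / sum_{x1} P(x1) P(X2 | x1) P(X3 | X2) P(X4 | X3)
                     = P(X1, X2) / P(X2)
                     = P(X1 | X2)
Markov Models Recap
Explicit assumption for all t: Xt is independent of X1, ..., Xt-2 given Xt-1
Consequence, joint distribution can be written as:
  P(X1, X2, ..., XT) = P(X1) prod_{t=2..T} P(Xt | Xt-1)
Implied conditional independencies: past variables are independent of future variables given the present,
  i.e., if t1 < t2 < t3 or t1 > t2 > t3 then: Xt1 is independent of Xt3 given Xt2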
Example Markov Chain: Weather
States: X = {rain, sun}
Initial distribution: 1.0 sun
CPT P(Xt | Xt-1):
  Xt-1   Xt     P(Xt | Xt-1)
  sun    sun    0.9
  sun    rain   0.1
  rain   sun    0.3
  rain   rain   0.7
Two new ways of representing the same CPT (figures, one labeled "Engineering Representation"): each shows the transitions sun -> sun 0.9, sun -> rain 0.1, rain -> sun 0.3, rain -> rain 0.7.
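With the CPT in hand, the Markov-model joint P(x1, ..., xT) = P(x1) prod_t P(xt | xt-1) can be evaluated for any concrete weather sequence. A minimal sketch; the nested-dictionary layout TRANSITION[prev][cur] and the function name sequence_probability are choices made for this example.

```python
# Weather chain from the slide: initial distribution and CPT P(Xt | Xt-1).
INITIAL = {"sun": 1.0, "rain": 0.0}
TRANSITION = {  # TRANSITION[prev][cur] = P(Xt = cur | Xt-1 = prev)
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}


def sequence_probability(states, initial=INITIAL, transition=TRANSITION):
    """Joint probability of a state sequence: P(x1) times one CPT entry per step."""
    if not states:
        raise ValueError("need at least one state")
    p = initial[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= transition[prev][cur]
    return p


p_seq = sequence_probability(["sun", "sun", "rain", "rain"])
print(p_seq)
```

For the sequence sun, sun, rain, rain this multiplies 1.0 * 0.9 * 0.1 * 0.7. Any sequence that starts with rain has probability 0 under this initial distribution.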
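Example Markov Chain: Weather
Initial distribution: 1.0 sun (given)
What is the probability distribution after one step?
  P(X2 = sun) = P(X2 = sun | X1 = sun) P(X1 = sun) + P(X2 = sun | X1 = rain) P(X1 = rain)
              = 0.9 * 1.0 + 0.3 * 0.0 = 0.9
What happens for the second state?
Engineering (matrix form; columns correspond to Xt-1 = sun, rain and rows to Xt = sun, rain):
  [ 0.9  0.3 ] [ 1.0 ]   [ 0.9 ]
  [ 0.1  0.7 ] [ 0.0 ] = [ 0.1 ]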
Mini-Forward Algorithm
Question: What’s P(X) on some day t?
  X1 -> X2 -> X3 -> X4
  P(x1) = known
  P(xt) = sum_{xt-1} P(xt-1, xt) = sum_{xt-1} P(xt | xt-1) P(xt-1)
This is forward simulation.
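The mini-forward update can be sketched as repeated application of one step, P(xt) = sum over xt-1 of P(xt | xt-1) P(xt-1). The example uses the weather chain from the earlier slides; the function names forward_step and mini_forward are assumptions made for this illustration.

```python
# Weather chain CPT: TRANSITION[prev][cur] = P(Xt = cur | Xt-1 = prev).
TRANSITION = {
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}


def forward_step(belief, transition):
    """One update: P(xt) = sum over previous states of P(xt | prev) * P(prev)."""
    return {
        cur: sum(transition[prev][cur] * belief[prev] for prev in belief)
        for cur in belief
    }


def mini_forward(initial, transition, steps):
    """Return the list [P(X1), P(X2), ..., P(X_{steps+1})]."""
    beliefs = [dict(initial)]
    for _ in range(steps):
        beliefs.append(forward_step(beliefs[-1], transition))
    return beliefs


from_sun = mini_forward({"sun": 1.0, "rain": 0.0}, TRANSITION, steps=3)
from_rain = mini_forward({"sun": 0.0, "rain": 1.0}, TRANSITION, steps=3)
for belief in from_sun:
    print(belief)
```

Each step costs time proportional to the square of the number of states, independent of how many time steps came before, which is what makes this much cheaper than enumerating whole sequences.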
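Example Run of Mini-Forward Algorithm
Distributions are written as <P(sun), P(rain)>, using the weather CPT above.
From initial observation of sun:
  P(X1) = <1.0, 0.0>   P(X2) = <0.9, 0.1>   P(X3) = <0.84, 0.16>   P(X4) = <0.804, 0.196>   ...   P(X∞) = <0.75, 0.25>
From initial observation of rain:
  P(X1) = <0.0, 1.0>   P(X2) = <0.3, 0.7>   P(X3) = <0.48, 0.52>   P(X4) = <0.588, 0.412>   ...   P(X∞) = <0.75, 0.25>
From yet another initial distribution P(X1):
  P(X1) = <p, 1 - p>   ...   P(X∞) = <0.75, 0.25>
Stationary Distributions
For most chains:
  Influence of the initial distribution gets less and less over time.
  The distribution we end up in is independent of the initial distribution.
Stationary distribution:
  The distribution we end up with is called the stationary distribution P∞ of the chain.
  It satisfies:
    P∞(X) = P∞+1(X) = sum_x P(X | x) P∞(x)
Example: Stationary Distributions
Question: What’s P(X) at time t = infinity?
  X1 -> X2 -> X3 -> X4 -> ...
  Xt-1   Xt     P(Xt | Xt-1)
  sun    sun    0.9
  sun    rain   0.1
  rain   sun    0.3
  rain   rain   0.7
  P∞(sun) = P(sun | sun) P∞(sun) + P(sun | rain) P∞(rain) = 0.9 P∞(sun) + 0.3 P∞(rain)
  P∞(rain) = P(rain | sun) P∞(sun) + P(rain | rain) P∞(rain) = 0.1 P∞(sun) + 0.7 P∞(rain)
  so P∞(sun) = 3 P∞(rain)
Also: P∞(sun) + P∞(rain) = 1
  so P∞(sun) = 3/4 and P∞(rain) = 1/4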
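For a two-state chain the stationary equations have a closed-form solution: the probability flowing out of each state must balance, and the two probabilities must sum to one. The sketch below solves the weather chain that way and then checks that one more transition step leaves the result unchanged. The function name stationary_two_state is hypothetical, and the closed form only applies to two-state chains.

```python
# Weather chain CPT: TRANSITION[prev][cur] = P(Xt = cur | Xt-1 = prev).
TRANSITION = {
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}


def stationary_two_state(transition, a, b):
    """Solve pi[a] * P(b | a) = pi[b] * P(a | b) together with pi[a] + pi[b] = 1."""
    leave_a = transition[a][b]
    leave_b = transition[b][a]
    total = leave_a + leave_b
    if total == 0:
        raise ValueError("chain never changes state; stationary distribution is not unique")
    return {a: leave_b / total, b: leave_a / total}


pi = stationary_two_state(TRANSITION, "sun", "rain")

# Fixed-point check: applying the transition model once more should return pi.
next_pi = {cur: sum(TRANSITION[prev][cur] * pi[prev] for prev in pi) for cur in pi}
print(pi, next_pi)
```

The balance condition is the slide's first equation rearranged: 0.1 P∞(sun) = 0.3 P∞(rain). For chains with more states, one would instead solve the full linear system or iterate the forward update until it stops changing.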
Application of Stationary Distribution: Web Link Analysis
PageRank over a web graph:
  Each web page is a state
  Initial distribution: uniform over pages
  Transitions:
    With prob. c, uniform jump to a random page (dotted lines, not all shown)
    With prob. 1-c, follow a random outlink (solid lines)
Stationary distribution:
  Will spend more time on highly reachable pages
  E.g. many ways to get to the Acrobat Reader download page
  Somewhat robust to link spam
  Google 1.0 returned the set of pages containing all your keywords in decreasing rank; now all search engines use link analysis along with many other factors (rank actually getting less important over time)
Before: most search engines were merely based on how well a page matches your search words. Note: currently dominated by clickstreams.
Coming Up
Next up is Hidden Markov Models, then Particle Filters, then Bayesian Networks.
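The random-surfer chain described here can be simulated with the same forward update used for the weather example. The sketch below is only an illustration under stated assumptions: the four-page graph is made up, c = 0.15 is an arbitrary choice, and pages without outlinks are treated as jumping uniformly (a convention the slide does not specify). It is not a description of any search engine's actual implementation.

```python
def pagerank(links, c=0.15, iterations=100):
    """Approximate the stationary distribution of the random-surfer chain.

    links maps each page to the list of pages it links to; every linked page
    must itself be a key of links.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial distribution: uniform over pages
    for _ in range(iterations):
        # With probability c, jump to a uniformly random page.
        new_rank = {p: c / n for p in pages}
        for page, outlinks in links.items():
            # With probability 1 - c, follow a random outlink.
            # Assumption: a page with no outlinks jumps uniformly instead.
            targets = outlinks if outlinks else pages
            share = (1.0 - c) * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web graph: page "c" is reachable from every other page.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the highly reachable page "c" ends up with the largest share, while "d", which nothing links to, keeps only the c / n mass it receives from random jumps.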