Hidden Markov Models Lirong Xia

The “Markov”s we have learned so far
Markov decision processes (MDPs): the transition probability depends only on the (state, action) pair in the previous step
Reinforcement learning: the transition probabilities/rewards are unknown
Markov models
Hidden Markov models

Markov Models
A Markov model is a chain-structured Bayes’ net (BN)
The value of X at a given time is called the state
Parameters: the initial distribution p(X1) and the conditional probabilities p(Xt | Xt-1), called transition probabilities
The conditional probabilities are the same at every time step (stationarity)

Computing the stationary distribution
The stationary distribution is a fixed point of the transition update; for the sun/rain chain it satisfies:
p(X = sun) = p(X = sun | X-1 = sun) p(X = sun) + p(X = sun | X-1 = rain) p(X = rain)
p(X = rain) = p(X = rain | X-1 = sun) p(X = sun) + p(X = rain | X-1 = rain) p(X = rain)
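A minimal sketch of solving these fixed-point equations numerically by repeatedly applying the transition model (power iteration). The sun/rain transition probabilities used here are illustrative assumptions, since the slide does not state them:

```python
import numpy as np

# Assumed transition model for the sun/rain chain (not given on the slide):
# T[i][j] = p(X_t = j | X_{t-1} = i), states ordered [sun, rain]
T = np.array([[0.9, 0.1],    # from sun
              [0.3, 0.7]])   # from rain

p = np.array([0.5, 0.5])     # any starting distribution works
for _ in range(1000):
    p = p @ T                # p_new(j) = sum_i p(i) * T[i][j]

print(dict(zip(["sun", "rain"], p)))   # converges to {'sun': 0.75, 'rain': 0.25}
```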

Hidden Markov Models
Hidden Markov models (HMMs):
An underlying Markov chain over states X
Effects (observations) at each time step
Can be drawn as a Bayes’ net

Example
An HMM is defined by:
Initial distribution: p(X1)
Transitions: p(Xt | Xt-1)
Emissions: p(Et | Xt)
Rain/umbrella example:
Transitions: p(Rt = t | Rt-1 = t) = 0.7, p(Rt = t | Rt-1 = f) = 0.3
Emissions: p(Ut = t | Rt = t) = 0.9, p(Ut = t | Rt = f) = 0.2
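As a quick sketch, the same model can be written down in code like this (the uniform initial distribution is an assumption; the slide only labels it p(X1)):

```python
# Rain/umbrella HMM from the tables above, as plain Python dicts.
initial = {True: 0.5, False: 0.5}                 # p(Rain_1) -- assumed uniform

# transition[prev][next] = p(Rain_t = next | Rain_{t-1} = prev)
transition = {True:  {True: 0.7, False: 0.3},
              False: {True: 0.3, False: 0.7}}

# emission[x] = p(Umbrella_t = True | Rain_t = x)
emission = {True: 0.9, False: 0.2}
```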

Filtering / Monitoring
Filtering, or monitoring, is the task of tracking the belief state B(Xt) = p(Xt | e1:t) over time
We start with B(X1) in an initial setting, usually uniform
As time passes, or as we get observations, we update B(X)

Example: Robot Localization
Sensor model: never makes more than one mistake
Motion model: with small probability, the commanded action is not executed

HMM weather example: a question
[Transition diagram over states s (sunny), c (cloudy), r (rainy); the transition probabilities shown in the diagram are .6, .3, .1, .4, .3, .3, .2, .3, .5]
Emission model: p(w | s) = .1, p(w | c) = .3, p(w | r) = .8, where w means the labmate is wet
You have been stuck in the lab for three days (!)
On those days, your labmate was dry, wet, wet, respectively
What is the probability that it is now raining outside? p(X3 = r | E1 = d, E2 = w, E3 = w)

Filtering
(Same weather HMM as on the previous slide.)
Computationally efficient approach: first compute p(X1 = i, E1 = d) for all states i, then apply the recursion
p(Xt, e1:t) = p(et | Xt) Σxt-1 p(xt-1, e1:t-1) p(Xt | xt-1)
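A minimal sketch of this recursion for the three-state weather HMM. The assignment of the diagram's transition probabilities to specific state pairs, and the uniform p(X1), are assumptions made for illustration:

```python
states = ["s", "c", "r"]

# Assumed arrangement of the transition probabilities from the diagram:
# trans[prev][next] = p(X_t = next | X_{t-1} = prev)
trans = {"s": {"s": 0.6, "c": 0.3, "r": 0.1},
         "c": {"s": 0.4, "c": 0.3, "r": 0.3},
         "r": {"s": 0.2, "c": 0.3, "r": 0.5}}

p_wet = {"s": 0.1, "c": 0.3, "r": 0.8}            # p(E = wet | X)

def obs_prob(e, x):
    """p(E_t = e | X_t = x) for e in {'w', 'd'}."""
    return p_wet[x] if e == "w" else 1.0 - p_wet[x]

# p(X1 = i, E1 = d), assuming a uniform prior p(X1)
joint = {x: (1.0 / 3) * obs_prob("d", x) for x in states}

# p(X_t, e_{1:t}) = p(e_t | X_t) * sum_{x_{t-1}} p(x_{t-1}, e_{1:t-1}) p(X_t | x_{t-1})
for e in ["w", "w"]:
    joint = {x: obs_prob(e, x) * sum(joint[xp] * trans[xp][x] for xp in states)
             for x in states}

# Normalize to answer p(X3 | E1 = d, E2 = w, E3 = w)
z = sum(joint.values())
print({x: round(joint[x] / z, 3) for x in states})
```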

Today
Formal algorithm for filtering:
Elapse of time: compute p(Xt+1 | e1:t) from p(Xt | e1:t)
Observe: compute p(Xt+1 | e1:t+1) from p(Xt+1 | e1:t)
Renormalization
Introduction to sampling

Inference Recap: Simple Cases

Elapse of Time
Assume we have the current belief p(Xt-1 | evidence to t-1): B(Xt-1) = p(Xt-1 | e1:t-1)
Then, after one time step passes:
p(Xt | e1:t-1) = Σxt-1 p(Xt | xt-1) p(xt-1 | e1:t-1)
Or, compactly: B’(Xt) = Σxt-1 p(Xt | xt-1) B(xt-1)
With the “B” notation, be careful about which time step t the belief is about and what evidence it includes
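As a sketch, the elapse-of-time update in the same dict representation used for the umbrella model above:

```python
def elapse_time(belief, transition):
    """B'(x_t) = sum over x_{t-1} of p(x_t | x_{t-1}) * B(x_{t-1})."""
    states = belief.keys()
    return {x: sum(transition[xp][x] * belief[xp] for xp in states)
            for x in states}
```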

Observe and renormalization
Assume we have the current belief p(Xt | previous evidence): B’(Xt) = p(Xt | e1:t-1)
Then: p(Xt | e1:t) ∝ p(et | Xt) p(Xt | e1:t-1)
Or: B(Xt) ∝ p(et | Xt) B’(Xt)
Basic idea: beliefs are reweighted by the likelihood of the evidence
Need to renormalize B(Xt)
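And a matching sketch of the observe-and-renormalize step, where evidence_likelihood maps each state x to p(et | x):

```python
def observe(belief_prime, evidence_likelihood):
    """B(x_t) is proportional to p(e_t | x_t) * B'(x_t); renormalize at the end."""
    unnorm = {x: evidence_likelihood[x] * belief_prime[x] for x in belief_prime}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}
```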

Recap: The Forward Algorithm
We are given evidence at each time step and want to know Bt(X) = p(Xt | e1:t)
We can derive the following update, which combines the two steps above into one:
p(Xt, e1:t) = p(et | Xt) Σxt-1 p(Xt | xt-1) p(xt-1, e1:t-1)
We can normalize as we go if we want to have p(Xt | e1:t) at each time step, or just once at the end…
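Putting the pieces together, a sketch of the forward algorithm as an online loop, reusing elapse_time, observe, and the umbrella model dicts from the sketches above (the 0.5/0.5 prior remains an assumption):

```python
def forward(prior, transition, emission, evidence):
    """Return p(X_t | e_{1:t}) after processing the boolean observations in `evidence`.

    `prior` is p(X_1); the first observation is incorporated directly, and each
    later step alternates elapse-time and observe.
    """
    def likelihood(e):
        return {x: emission[x] if e else 1.0 - emission[x] for x in prior}

    belief = observe(prior, likelihood(evidence[0]))
    for e in evidence[1:]:
        belief = observe(elapse_time(belief, transition), likelihood(e))
    return belief

# Two days of umbrella sightings: p(Rain_2 | +u_1, +u_2) -- roughly 0.88 for True
print(forward(initial, transition, emission, [True, True]))
```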

Example HMM
Transitions: p(Rt = t | Rt-1 = t) = 0.7, p(Rt = t | Rt-1 = f) = 0.3
Emissions: p(Ut = t | Rt = t) = 0.9, p(Ut = t | Rt = f) = 0.2

Observe and time elapse
Want to know B(Rain2) = p(Rain2 | +u1, +u2)
Apply the updates in order: observe +u1, let time elapse, observe +u2, and renormalize

Online Belief Updates
Each time step, we start with the current belief p(Xt-1 | previous evidence), then:
Elapse of time: B’(Xt) = Σxt-1 p(Xt | xt-1) B(xt-1)
Observe: B(Xt) ∝ p(et | Xt) B’(Xt)
Renormalize B(Xt)
Problem: space is |X| and time is |X|² per time step; what if the state is continuous?

Continuous probability space
Real-world robot localization

Sampling

Approximate Inference
Sampling is a hot topic in machine learning, and it’s really simple
Basic idea:
Draw N samples from a sampling distribution S
Compute an approximate posterior probability
Show this converges to the true probability P
Why sample?
Learning: get samples from a distribution you don’t know
Inference: getting a sample is faster than computing the right answer (e.g., with variable elimination)

Prior Sampling
Bayes’ net over Cloudy, Sprinkler, Rain, WetGrass with CPTs:
p(+c) = 0.5
p(+s | +c) = 0.1, p(+s | -c) = 0.5
p(+r | +c) = 0.8, p(+r | -c) = 0.2
p(+w | +s, +r) = 0.99, p(+w | +s, -r) = 0.90
Samples: +c, -s, +r, +w and -c, +s, -r, +w

Prior Sampling (without evidence)
This process generates samples with probability S_PS(x1, …, xn) = Π_i p(xi | parents(Xi)) = p(x1, …, xn), i.e. the BN’s joint probability
Let the number of samples of an event be N_PS(x1, …, xn); then N_PS(x1, …, xn) / N → p(x1, …, xn) as N → ∞
I.e., the sampling procedure is consistent
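A minimal prior-sampling sketch for the network above. The CPT entries not listed on the slide (the WetGrass rows for -s) are assumptions chosen for illustration:

```python
import random

p_c = 0.5
p_s = {True: 0.1, False: 0.5}                       # p(+s | Cloudy)
p_r = {True: 0.8, False: 0.2}                       # p(+r | Cloudy)
p_w = {(True, True): 0.99, (True, False): 0.90,     # p(+w | Sprinkler, Rain)
       (False, True): 0.90, (False, False): 0.01}   # -s rows are assumed values

def bernoulli(p):
    return random.random() < p

def prior_sample():
    """Sample every variable in topological order from p(X | parents(X))."""
    c = bernoulli(p_c)
    s = bernoulli(p_s[c])
    r = bernoulli(p_r[c])
    w = bernoulli(p_w[(s, r)])
    return {"C": c, "S": s, "R": r, "W": w}

# Consistency: the fraction of samples matching an event approaches its probability
samples = [prior_sample() for _ in range(10000)]
print(sum(x["W"] for x in samples) / len(samples))   # estimate of p(+w)
```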

Example
We’ll get a bunch of samples from the BN:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
-c, -s, -r, +w
If we want p(W):
We have counts <+w: 4, -w: 1>
Normalize to get p(W) = <+w: 0.8, -w: 0.2>
This will get closer to the true distribution with more samples
Can estimate anything else, too: what about p(C | +w)? p(C | +r, +w)? p(C | -r, -w)?
Fast: can use fewer samples if there is less time (what’s the drawback?)

Rejection Sampling
Let’s say we want p(C):
No point keeping all samples around; just tally counts of C as we go
Let’s say we want p(C | +s):
Same thing: tally C outcomes, but ignore (reject) samples which don’t have S = +s
This is called rejection sampling
It is also consistent for conditional probabilities (i.e., correct in the limit)
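A sketch of rejection sampling for a query such as p(C | +s), reusing prior_sample() from above:

```python
def rejection_sample(query_var, evidence, n=10000):
    """Estimate p(query_var | evidence) by discarding inconsistent samples."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[query_var]] += 1
    total = sum(counts.values())
    return {val: c / total for val, c in counts.items()} if total else None

print(rejection_sample("C", {"S": True}))   # estimate of p(C | +s)
```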

Likelihood Weighting
Problem with rejection sampling:
If the evidence is unlikely, you reject a lot of samples
You don’t exploit your evidence as you sample
Consider p(B | +a)
Idea: fix the evidence variables and sample the rest
Problem: the sample distribution is not consistent!
Solution: weight each sample by the probability of the evidence given its parents

Likelihood Weighting
(Same Cloudy/Sprinkler/Rain/WetGrass network and CPTs as on the Prior Sampling slide.)
Samples: +c, +s, +r, +w, ……

Likelihood Weighting
Sampling distribution if z is sampled and e is fixed evidence: S_WS(z, e) = Π_i p(zi | parents(Zi))
Now, samples have weights: w(z, e) = Π_j p(ej | parents(Ej))
Together, the weighted sampling distribution is consistent: S_WS(z, e) · w(z, e) = p(z, e)
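A sketch of likelihood weighting on the same network for a query such as p(C | +w), reusing bernoulli and the CPT dicts from the prior-sampling sketch. For brevity it only supports fixing WetGrass as evidence:

```python
def weighted_sample(evidence):
    """Sample non-evidence variables, fix evidence, and weight by p(evidence | parents)."""
    weight = 1.0
    c = bernoulli(p_c)                     # Cloudy: sampled
    s = bernoulli(p_s[c])                  # Sprinkler: sampled
    r = bernoulli(p_r[c])                  # Rain: sampled
    if "W" in evidence:                    # WetGrass: fixed, contributes to the weight
        w = evidence["W"]
        weight *= p_w[(s, r)] if w else 1.0 - p_w[(s, r)]
    else:
        w = bernoulli(p_w[(s, r)])
    return {"C": c, "S": s, "R": r, "W": w}, weight

def likelihood_weighting(query_var, evidence, n=10000):
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        sample, weight = weighted_sample(evidence)
        totals[sample[query_var]] += weight
    z = sum(totals.values())
    return {val: t / z for val, t in totals.items()}

print(likelihood_weighting("C", {"W": True}))   # estimate of p(C | +w)
```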

Ghostbusters HMM
p(X1) = uniform (1/9 for each grid cell)
p(X | X’) = usually move clockwise, but sometimes move in a random direction or stay in place (the slide illustrates p(X | X’ = <1,2>) with values 1/2 and 1/6)
p(Rij | X) = same sensor model as before: red means close, green means far away

Example: Passage of Time
As time passes, uncertainty “accumulates” (belief snapshots shown at T = 1, T = 2, T = 5)
Transition model: ghosts usually go clockwise

Example: Observation
As we get observations, beliefs get reweighted and uncertainty “decreases” (compare before observation vs. after observation)