
1 Hidden Markov Models Lirong Xia

2 The “Markov”s we have learned so far
Markov decision process (MDP): the transition probability depends only on the (state, action) pair in the previous step
Reinforcement learning: an MDP with unknown transition probabilities/rewards
Markov models
Hidden Markov models

3 Markov Models
A Markov model is a chain-structured Bayes' net (BN)
The value of X at a given time is called the state
The conditional probabilities are the same at every time step (stationarity)
As a BN, the parameters are the initial distribution p(X1) and the transition probabilities p(Xt|Xt-1)

4 Computing the stationary distribution
At stationarity p(Xt) = p(Xt-1), so the stationary distribution satisfies:
p(X=sun) = p(Xt=sun|Xt-1=sun) p(X=sun) + p(Xt=sun|Xt-1=rain) p(X=rain)
p(X=rain) = p(Xt=rain|Xt-1=sun) p(X=sun) + p(Xt=rain|Xt-1=rain) p(X=rain)
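A minimal sketch of computing the stationary distribution by power iteration; the transition values (0.9 and 0.3) are illustrative assumptions, not taken from the slide:

```python
import numpy as np

# Illustrative (assumed) transition probabilities:
# p(sun_t | sun_{t-1}) = 0.9, p(sun_t | rain_{t-1}) = 0.3
T = np.array([[0.9, 0.1],    # row: previous state sun  -> (sun, rain)
              [0.3, 0.7]])   # row: previous state rain -> (sun, rain)

# Power iteration: push any initial distribution through T until it
# stops changing; the fixed point is the stationary distribution.
p = np.array([0.5, 0.5])
for _ in range(100):
    p = p @ T
print(p)   # -> [0.75 0.25]; check: 0.9*0.75 + 0.3*0.25 = 0.75
```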

5 Hidden Markov Models
Hidden Markov models (HMMs):
An underlying Markov chain over states X
Effects (observations) emitted at each time step
As a Bayes' net: a chain X1 → X2 → ... with an observed child Et for each state Xt

6 Example
An HMM is defined by:
Initial distribution: p(X1)
Transitions: p(Xt|Xt-1)
Emissions: p(Et|Xt)
For the rain/umbrella example:

Rt-1 | p(Rt=t)
 t   |  0.7
 f   |  0.3

Rt | p(Ut=t)
 t |  0.9
 f |  0.2

7 Filtering / Monitoring
Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time B(Xt) = p(Xt|e1:t) We start with B(X) in an initial setting, usually uniform As time passes, or we get observations, we update B(X)

8 Example: Robot Localization
Sensor model: the sensors never make more than one mistake
Motion model: with small probability, the robot may fail to execute its action

9 HMM weather example: a question
[Figure: state-transition diagram over the states s (sun), c (cloudy), r (rain), with transition probabilities labeled on the edges.]
Emissions: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8 (w = your labmate comes in wet)
You have been stuck in the lab for three days (!)
On those days, your labmate was dry, wet, wet, respectively
What is the probability that it is now raining outside?
p(X3 = r | E1 = d, E2 = w, E3 = w)

10 Filtering
[Figure: the same weather HMM as the previous slide, with emissions p(w|s) = .1, p(w|c) = .3, p(w|r) = .8.]
Computationally efficient approach: first compute p(X1 = i, E1 = d) for all states i, then recurse:
p(Xt, e1:t) = p(et|Xt) Σxt-1 p(xt-1, e1:t-1) p(Xt|xt-1)
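A minimal sketch of this recursion in Python, instantiated with the rain/umbrella model from slide 6 (the uniform prior p(X1) is an assumption; the slides don't specify it):

```python
import numpy as np

# State index 0 = rain, 1 = no rain.
T = np.array([[0.7, 0.3],    # p(R_t | R_{t-1}=t) from slide 6
              [0.3, 0.7]])   # p(R_t | R_{t-1}=f)
p_u = np.array([0.9, 0.2])   # p(U_t=t | R_t): raining, not raining

def forward(evidence, B=np.array([0.5, 0.5])):
    # B starts as the assumed uniform prior p(X1).
    for saw_umbrella in evidence:
        B = T.T @ B                               # elapse time
        like = p_u if saw_umbrella else 1 - p_u   # p(e_t | X_t)
        B = like * B                              # observe
        B = B / B.sum()                           # renormalize
    return B

print(forward([True, True]))   # ≈ [0.883, 0.117] = B(Rain2 | +u1, +u2)
```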

11 Today
Formal algorithm for filtering:
Elapse of time: compute p(Xt+1|e1:t) from p(Xt|e1:t)
Observe: compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t)
Renormalization
Introduction to sampling

12 Inference Recap: Simple Cases

13 Elapse of Time
Assume we have the current belief p(Xt-1 | evidence up to t-1):
B(Xt-1) = p(Xt-1|e1:t-1)
Then, after one time step passes:
p(Xt|e1:t-1) = Σxt-1 p(Xt|xt-1) p(xt-1|e1:t-1)
Or, compactly:
B'(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1)
With the "B" notation, be careful about which time step t the belief is about and what evidence it includes
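As a one-line sketch (T is a numpy transition matrix with rows indexed by xt-1, as in the filtering example above):

```python
import numpy as np

def elapse_time(B, T):
    # B'(x_t) = sum over x_{t-1} of p(x_t | x_{t-1}) * B(x_{t-1})
    return T.T @ B
```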

14 Observe and renormalization
Assume we have the current belief p(Xt | previous evidence):
B'(Xt) = p(Xt|e1:t-1)
Then, after observing et:
p(Xt|e1:t) ∝ p(et|Xt) p(Xt|e1:t-1)
Or: B(Xt) ∝ p(et|Xt) B'(Xt)
Basic idea: beliefs are reweighted by the likelihood of the evidence
Need to renormalize B(Xt)
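The matching observe step as a sketch (likelihood is the vector of p(et|xt) values over states):

```python
def observe(B_prime, likelihood):
    # B(x_t) ∝ p(e_t | x_t) * B'(x_t), then renormalize to sum to 1
    B = likelihood * B_prime
    return B / B.sum()
```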

15 Recap: The Forward Algorithm
We are given evidence at each time step and want to know B(Xt) = p(Xt|e1:t)
We can derive the following update, which combines the elapse-of-time and observe steps:
p(Xt, e1:t) = p(et|Xt) Σxt-1 p(Xt|xt-1) p(xt-1, e1:t-1)
We can normalize as we go, if we want to have p(Xt|e1:t) at each time step, or just once at the end…

16 Example HMM

Rt-1 | p(Rt=t)
 t   |  0.7
 f   |  0.3

Rt | p(Ut=t)
 t |  0.9
 f |  0.2

17 Observe and time elapse
Want to know: B(Rain2) = p(Rain2|+u1,+u2)
Starting from B(Rain1): observe +u1 and renormalize, let time elapse, then observe +u2 and renormalize
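Worked numerically with the helpers above, assuming a uniform initial belief p(R1) = <0.5, 0.5> (the slide does not state the prior):

```python
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])      # p(R_t | R_{t-1}) from slide 16
u = np.array([0.9, 0.2])        # p(+u | R_t): rain, no rain

B = np.array([0.5, 0.5])        # assumed uniform p(R1)
B = observe(B, u)               # see +u1 -> [0.818, 0.182]
B = elapse_time(B, T)           # -> [0.627, 0.373]
B = observe(B, u)               # see +u2 -> [0.883, 0.117]
print(B)                        # B(Rain2 | +u1, +u2)
```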

18 Online Belief Updates
At each time step, we start with the current belief p(Xt-1 | previous evidence):
Elapse of time: B'(Xt) = Σxt-1 p(Xt|xt-1) B(xt-1)
Observe: B(Xt) ∝ p(et|Xt) B'(Xt)
Renormalize B(Xt)
Problem: space is O(|X|) and time is O(|X|^2) per time step
What if the state is continuous?
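Putting the two updates together as a streaming loop, a sketch built on the elapse_time/observe helpers above (emission is a hypothetical dict mapping each observation to its likelihood vector p(e|x)):

```python
def online_filter(evidence_stream, B, T, emission):
    # Maintain B(X) online: one elapse + observe (+ renormalize)
    # per incoming observation.
    for e in evidence_stream:
        B = elapse_time(B, T)
        B = observe(B, emission[e])   # observe() renormalizes
        yield B
```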

19 Continuous probability space
Real-world robot localization

20 Sampling

21 Approximate Inference
Sampling is a hot topic in machine learning, and it's really simple
Basic idea:
Draw N samples from a sampling distribution S
Compute an approximate posterior probability
Show that this converges to the true probability P
Why sample?
Learning: get samples from a distribution you don't know
Inference: getting a sample can be faster than computing the right answer (e.g., with variable elimination)
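A toy sketch of the basic idea: estimate a probability by counting samples and watch the estimate converge (the 0.3 coin bias is an arbitrary illustrative value, not from the slides):

```python
import random

def estimate(N, p_heads=0.3):
    # Draw N samples and return the empirical frequency of heads.
    return sum(random.random() < p_heads for _ in range(N)) / N

for N in (10, 100, 10000):
    print(N, estimate(N))   # approaches 0.3 as N grows
```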

22 Prior Sampling

p(C):       +c: 0.5   -c: 0.5
p(S|+c):    +s: 0.1   -s: 0.9
p(S|-c):    +s: 0.5   -s: 0.5
p(R|+c):    +r: 0.8   -r: 0.2
p(R|-c):    +r: 0.2   -r: 0.8
p(W|+s,+r): +w: 0.99  -w: 0.01
p(W|+s,-r): +w: 0.90  -w: 0.10
p(W|-s,+r): +w: 0.90  -w: 0.10
p(W|-s,-r): +w: 0.01  -w: 0.99

Samples:
+c, -s, +r, +w
-c, +s, -r, +w

23 Prior Sampling (without evidence)
This process generates samples with probability
S_PS(x1,...,xn) = Π_i p(xi | parents(Xi)) = p(x1,...,xn)
i.e. the BN's joint probability
Let N_PS(x1,...,xn) be the number of samples of an event; then
N_PS(x1,...,xn) / N → p(x1,...,xn) as N → ∞
I.e., the sampling procedure is consistent
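A sketch of prior sampling for the BN on the previous slide, drawing each variable in topological order given its sampled parents:

```python
import random

def flip(p):
    # Sample True with probability p.
    return random.random() < p

def prior_sample():
    # Sample (C, S, R, W) from the cloudy/sprinkler/rain/wet-grass BN.
    c = flip(0.5)
    s = flip(0.1 if c else 0.5)
    r = flip(0.8 if c else 0.2)
    if s and r:
        w = flip(0.99)
    elif s or r:
        w = flip(0.90)
    else:
        w = flip(0.01)
    return c, s, r, w

samples = [prior_sample() for _ in range(10000)]
```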

24 Example
We'll get a bunch of samples from the BN:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
If we want p(W): we have counts <+w:4, -w:1>; normalize to get p(W) ≈ <+w:0.8, -w:0.2>
This will get closer to the true distribution with more samples
Can estimate anything else, too
What about p(C|+w)? p(C|+r,+w)? p(C|-r,-w)?
Fast: can use fewer samples if there is less time (what's the drawback?)

25 Rejection Sampling
Let's say we want p(C): there's no point keeping all the samples around; just tally counts of C as we go
Let's say we want p(C|+s): same thing, tally C outcomes, but ignore (reject) samples which don't have S=+s
This is called rejection sampling
It is also consistent for conditional probabilities (i.e., correct in the limit)
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
-c, -s, -r, +w
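A sketch on top of the prior_sample() helper above:

```python
def rejection_sample(N):
    # Estimate p(C | +s): tally C over prior samples, rejecting
    # every sample that doesn't match the evidence S = +s.
    counts = {True: 0, False: 0}
    for _ in range(N):
        c, s, r, w = prior_sample()
        if not s:
            continue          # reject: evidence doesn't match
        counts[c] += 1
    total = counts[True] + counts[False]
    return {c: n / total for c, n in counts.items()}

print(rejection_sample(100000))   # approximate p(C | +s)
```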

26 Likelihood Weighting
Problem with rejection sampling:
If the evidence is unlikely, you reject a lot of samples
You don't exploit your evidence as you sample
Consider p(B|+a)
Idea: fix the evidence variables and sample the rest
Problem: the sample distribution is then not consistent!
Solution: weight each sample by the probability of the evidence given its parents
-b, -a
+b, +a
-b, +a
+b, +a

27 Likelihood Weighting
[Same BN and CPTs as slide 22, with the evidence variables fixed while sampling.]
Samples: +c, +s, +r, +w, ……

28 Likelihood Weighting
Sampling distribution, if z is sampled and e is fixed evidence:
S_WS(z,e) = Π_i p(zi | parents(Zi))
Now, samples have weights:
w(z,e) = Π_j p(ej | parents(Ej))
Together, the weighted sampling distribution is consistent:
S_WS(z,e) · w(z,e) = Π_i p(zi|parents(Zi)) · Π_j p(ej|parents(Ej)) = p(z,e)
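A sketch for the BN above, taking S=+s and W=+w as the fixed evidence (an assumed choice; the slides fix the evidence variables but the exact set is reconstructed here). Sampled variables use their CPTs; evidence variables contribute p(e|parents) to the weight instead:

```python
def likelihood_weighted_sample():
    # Returns ((C, S, R, W), weight) with S=+s and W=+w fixed (assumed).
    weight = 1.0
    c = flip(0.5)                    # sample C from its CPT
    weight *= 0.1 if c else 0.5      # evidence S=+s: multiply by p(+s|C)
    r = flip(0.8 if c else 0.2)      # sample R given C
    weight *= 0.99 if r else 0.90    # evidence W=+w: multiply by p(+w|+s,R)
    return (c, True, r, True), weight

def lw_estimate(N):
    # Estimate p(C | +s, +w) from N weighted samples.
    totals = {True: 0.0, False: 0.0}
    for _ in range(N):
        (c, _, _, _), w = likelihood_weighted_sample()
        totals[c] += w
    z = totals[True] + totals[False]
    return {c: v / z for c, v in totals.items()}
```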

29 Ghostbusters HMM
p(X1) = uniform (1/9 in each cell of the grid)
p(X|X') = usually move clockwise, but sometimes move in a random direction or stay in place
p(Rij|X) = same sensor model as before: red means close, green means far away
[Grids: p(X1) is 1/9 everywhere; p(X|X'=<1,2>) puts 1/2 on the clockwise neighbor and 1/6 on the other reachable cells.]

30 Example: Passage of Time
As time passes, uncertainty "accumulates"
[Belief grids shown at increasing times, up to T = 5.]
Transition model: ghosts usually go clockwise

31 Example: Observation
As we get observations, beliefs get reweighted and uncertainty "decreases"
[Belief grids before and after the observation.]

