Hidden Markov Models
Lirong Xia
Tue, March 28, 2014


The "Markov"s we have learned so far
–Markov decision process (MDP): the transition probability only depends on the (state, action) pair in the previous step
–Reinforcement learning: unknown transition probabilities/rewards
–Markov models
–Hidden Markov models

Markov Models
A Markov model is a chain-structured BN
–Value of X at a given time is called the state
–Conditional probabilities are the same at every time step (stationarity)
–Parameters: the initial distribution p(X_1) and the transition probabilities p(X | X_{-1})

Computing the stationary distribution
At stationarity, the state distribution no longer changes from one step to the next, so it must satisfy:
p(X = sun) = p(X = sun | X_{-1} = sun) p(X = sun) + p(X = sun | X_{-1} = rain) p(X = rain)
p(X = rain) = p(X = rain | X_{-1} = sun) p(X = sun) + p(X = rain | X_{-1} = rain) p(X = rain)
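As a concrete sketch, the two balance equations plus normalization can be solved as a small linear system. The transition numbers below are illustrative assumptions, since the slide's actual figure is not reproduced in this transcript:

```python
import numpy as np

# Rows index the previous state, columns the next state: [sun, rain].
T = np.array([[0.9, 0.1],    # p(next | prev = sun)   (assumed values)
              [0.3, 0.7]])   # p(next | prev = rain)  (assumed values)

# The stationary distribution pi satisfies pi = pi @ T and sums to 1;
# stack the balance equations with the normalization constraint.
A = np.vstack([T.T - np.eye(2), np.ones((1, 2))])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(dict(zip(["sun", "rain"], pi)))  # {'sun': 0.75, 'rain': 0.25}
```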

Hidden Markov Models
Hidden Markov models (HMMs):
–Underlying Markov chain over the states X
–Effects (observations) E at each time step
–As a Bayes' net: a chain over the states X_t, with an observation E_t hanging off each state

Example
An HMM is defined by:
–Initial distribution: p(X_1)
–Transitions: p(X | X_{-1})
–Emissions: p(E | X)
Umbrella world, with rain R and umbrella U:
Transition model: p(R_t = t | R_{t-1} = t) = 0.7, p(R_t = t | R_{t-1} = f) = 0.3
Emission model: p(U_t = t | R_t = t) = 0.9, p(U_t = t | R_t = f) = 0.2
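For the sketches that follow, the umbrella HMM above can be written as plain Python data. The transition and emission tables are from the slide; the uniform initial distribution p(R_1) is an assumption:

```python
# States are True (rain) / False (no rain); observations are umbrella yes/no.
initial = {True: 0.5, False: 0.5}               # p(R_1)  (assumed uniform)
transition = {True:  {True: 0.7, False: 0.3},   # p(R_t | R_{t-1} = t)
              False: {True: 0.3, False: 0.7}}   # p(R_t | R_{t-1} = f)
emission = {True: 0.9, False: 0.2}              # p(U_t = t | R_t)
```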

Filtering / Monitoring
Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time:
B(X_t) = p(X_t | e_{1:t})
We start with B(X) in an initial setting, usually uniform.
As time passes, or as we get observations, we update B(X).

Example: Robot Localization
Sensor model: never more than one mistake
Motion model: may fail to execute the action with small probability

HMM weather example: a question
[Figure: three-state weather chain with states s (sunny), c (cloudy), r (rainy)]
You have been stuck in the lab for three days (!)
On those days, your labmate was dry, wet, wet, respectively.
What is the probability that it is now raining outside?
p(X_3 = r | E_1 = d, E_2 = w, E_3 = w)
Emission model: p(w|s) = 0.1, p(w|c) = 0.3, p(w|r) = 0.8

Filtering
Computationally efficient approach: first compute p(X_1 = i, E_1 = d) for all states i, then apply the recursion
p(X_t, e_{1:t}) = p(e_t | X_t) Σ_{x_{t-1}} p(x_{t-1}, e_{1:t-1}) p(X_t | x_{t-1})
(Same weather model: p(w|s) = 0.1, p(w|c) = 0.3, p(w|r) = 0.8)
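A minimal sketch of this recursion for the weather question above. The s/c/r transition probabilities live in the slide's figure and are not in this transcript, so the matrix below is an illustrative assumption; the emission model p(w|X) is from the slide:

```python
import numpy as np

states = ["s", "c", "r"]
T = np.array([[0.6, 0.3, 0.1],   # p(next | prev = s)  (assumed values)
              [0.3, 0.4, 0.3],   # p(next | prev = c)  (assumed values)
              [0.1, 0.3, 0.6]])  # p(next | prev = r)  (assumed values)
p_wet = np.array([0.1, 0.3, 0.8])  # p(E = w | X), from the slide
f = np.ones(3) / 3                 # p(X_1), assumed uniform

for t, e in enumerate(["d", "w", "w"]):          # dry, wet, wet
    if t > 0:
        f = f @ T                                # sum out x_{t-1}
    f = f * (p_wet if e == "w" else 1 - p_wet)   # multiply in p(e_t | X_t)
print(dict(zip(states, f / f.sum())))            # p(X_3 | d, w, w)
```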

Today
Formal algorithm for filtering:
–Elapse of time: compute p(X_{t+1} | e_{1:t}) from p(X_t | e_{1:t})
–Observe: compute p(X_{t+1} | e_{1:t+1}) from p(X_{t+1} | e_{1:t})
–Renormalization
Introduction to sampling

Inference Recap: Simple Cases

Elapse of Time
Assume we have the current belief p(X_{t-1} | evidence to t-1):
B(X_{t-1}) = p(X_{t-1} | e_{1:t-1})
Then, after one time step passes:
p(X_t | e_{1:t-1}) = Σ_{x_{t-1}} p(X_t | x_{t-1}) p(x_{t-1} | e_{1:t-1})
Or, compactly:
B'(X_t) = Σ_{x_{t-1}} p(X_t | x_{t-1}) B(x_{t-1})
With the "B" notation, be careful about:
–what time step t the belief is about
–what evidence it includes
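A minimal sketch of this step over a discrete state space (the dictionary representation and function name are ours):

```python
# B'(X_t) = sum over x_{t-1} of p(X_t | x_{t-1}) B(x_{t-1}).
# `B` maps each state to its belief; `transition[x]` maps a previous
# state x to its distribution over next states.
def elapse_time(B, transition):
    Bprime = {x: 0.0 for x in B}
    for x_prev, p_prev in B.items():
        for x_next, p_step in transition[x_prev].items():
            Bprime[x_next] += p_step * p_prev
    return Bprime
```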

Observe and renormalization
Assume we have the current belief p(X_t | previous evidence):
B'(X_t) = p(X_t | e_{1:t-1})
Then:
p(X_t | e_{1:t}) ∝ p(e_t | X_t) p(X_t | e_{1:t-1})
Or:
B(X_t) ∝ p(e_t | X_t) B'(X_t)
Basic idea: beliefs are reweighted by the likelihood of the evidence.
Need to renormalize B(X_t).
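The matching sketch for the observe step, with the renormalization folded in (again, the names are ours):

```python
# B(X_t) ∝ p(e_t | X_t) B'(X_t); `likelihood[x]` is p(e_t | X_t = x).
def observe(Bprime, likelihood):
    B = {x: likelihood[x] * p for x, p in Bprime.items()}
    Z = sum(B.values())                  # normalizing constant
    return {x: p / Z for x, p in B.items()}
```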

Recap: The Forward Algorithm
We are given evidence at each time and want to know:
B(X_t) = p(X_t | e_{1:t})
We can derive the following updates:
p(X_t, e_{1:t}) = p(e_t | X_t) Σ_{x_{t-1}} p(X_t | x_{t-1}) p(x_{t-1}, e_{1:t-1})
We can normalize as we go if we want to have p(x|e) at each time step, or just once at the end.
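Putting the two sketches together gives a forward-algorithm loop over the umbrella HMM data defined earlier; the "+u"/"-u" observation encoding is ours:

```python
likelihoods = {"+u": {True: 0.9, False: 0.2},   # p(+u | R_t)
               "-u": {True: 0.1, False: 0.8}}   # p(-u | R_t)

def forward(B, evidence):
    for e in evidence:
        B = elapse_time(B, transition)   # time elapses...
        B = observe(B, likelihoods[e])   # ...then evidence reweights
    return B

print(forward(dict(initial), ["+u", "+u"]))
# ≈ {True: 0.883, False: 0.117}, i.e. B(Rain_2 | +u_1, +u_2)
```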

Example HMM
Transition model: p(R_t = t | R_{t-1} = t) = 0.7, p(R_t = t | R_{t-1} = f) = 0.3
Emission model: p(U_t = t | R_t = t) = 0.9, p(U_t = t | R_t = f) = 0.2

Observe and time elapse
Want to know: B(Rain_2) = p(Rain_2 | +u_1, +u_2)
[Figure: the belief is updated by alternating time-elapse steps with observe-and-renormalize steps]

Online Belief Updates
Each time step, we start with p(X_{t-1} | previous evidence):
–Elapse of time: B'(X_t) = Σ_{x_{t-1}} p(X_t | x_{t-1}) B(x_{t-1})
–Observe: B(X_t) ∝ p(e_t | X_t) B'(X_t)
–Renormalize B(X_t)
Problem: space is O(|X|) and time is O(|X|^2) per time step
–What if the state is continuous?

Real-world robot localization
Continuous probability space

Sampling

Approximate Inference
Sampling is a hot topic in machine learning, and it's really simple.
Basic idea:
–Draw N samples from a sampling distribution S
–Compute an approximate posterior probability
–Show this converges to the true probability P
Why sample?
–Learning: get samples from a distribution you don't know
–Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

Prior Sampling
CPTs for the sprinkler network (Cloudy, Sprinkler, Rain, WetGrass):
p(C): +c 0.5, -c 0.5
p(S|C): +c → +s 0.1, -s 0.9; -c → +s 0.5, -s 0.5
p(R|C): +c → +r 0.8, -r 0.2; -c → +r 0.2, -r 0.8
p(W|S,R): [table values lost in this transcript]
Samples: +c, -s, +r, +w; -c, +s, -r, +w

Prior Sampling (without evidence)
This process generates samples with probability
S_PS(x_1, …, x_n) = Π_i p(x_i | parents(X_i)) = p(x_1, …, x_n)
i.e. the BN's joint probability.
Let N_PS(x_1, …, x_n) be the number of samples of an event. Then
lim_{N→∞} N_PS(x_1, …, x_n)/N = S_PS(x_1, …, x_n) = p(x_1, …, x_n)
I.e., the sampling procedure is consistent.
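A sketch of the sampling process itself on the sprinkler network. The p(W|S,R) numbers were lost from the slide above, so the wet-grass CPT here is an assumption (values commonly paired with this example):

```python
import random

def sample(p_true):
    """Draw True with probability p_true."""
    return random.random() < p_true

def prior_sample():
    """Sample (C, S, R, W) in topological order from the CPTs."""
    c = sample(0.5)
    s = sample(0.1 if c else 0.5)
    r = sample(0.8 if c else 0.2)
    p_w = {(True, True): 0.99, (True, False): 0.90,   # assumed p(W|S,R)
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, sample(p_w)

samples = [prior_sample() for _ in range(10000)]
print(sum(w for (_, _, _, w) in samples) / len(samples))  # estimate of p(+w)
```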

Example
We'll get a bunch of samples from the BN:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
If we want to know p(W):
–We have counts <+w: 4, -w: 1>
–Normalize to get p(W) = <+w: 0.8, -w: 0.2>
–This will get closer to the true distribution with more samples
–Can estimate anything else, too
–What about p(C|+w)? p(C|+r,+w)? p(C|-r,-w)?
–Fast: can use fewer samples if less time (what's the drawback?)

Rejection Sampling
Let's say we want p(C):
–No point keeping all samples around
–Just tally counts of C as we go
Let's say we want p(C|+s):
–Same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s
–This is called rejection sampling
–It is also consistent for conditional probabilities (i.e., correct in the limit)
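A sketch of rejection sampling for p(C | +s), reusing prior_sample() from the sketch above:

```python
def rejection_sample(n):
    counts = {True: 0, False: 0}
    for _ in range(n):
        c, s, r, w = prior_sample()
        if not s:                  # reject: inconsistent with evidence +s
            continue
        counts[c] += 1
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}

print(rejection_sample(10000))     # estimate of p(C | +s)
```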

Likelihood Weighting
Problem with rejection sampling:
–If the evidence is unlikely, you reject a lot of samples
–You don't exploit your evidence as you sample
–Consider p(B|+a): a sample like -b, -a is rejected, while +b, +a and -b, +a are kept
Idea: fix the evidence variables and sample the rest
Problem: the sample distribution is not consistent!
Solution: weight each sample by the probability of the evidence given its parents

Likelihood Weighting (example)
Same sprinkler network CPTs as in the Prior Sampling slide, but now the evidence variables are fixed and only the rest are sampled.
Samples: +c, +s, +r, +w, …

Likelihood Weighting
Sampling distribution if z is sampled and e is fixed evidence:
S_WS(z, e) = Π_i p(z_i | parents(Z_i))
Now, samples have weights:
w(z, e) = Π_i p(e_i | parents(E_i))
Together, the weighted sampling distribution is consistent:
S_WS(z, e) · w(z, e) = Π_i p(z_i | parents(Z_i)) · Π_i p(e_i | parents(E_i)) = p(z, e)
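A sketch of likelihood weighting on the sprinkler network for p(C | +s, +w): the evidence variables are fixed rather than sampled, and each sample is weighted by the evidence likelihoods. The p(W|S,R) values are the same assumed CPT as in the prior-sampling sketch, and sample() is reused from there:

```python
def likelihood_weighting(n):
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        c = sample(0.5)                      # sample non-evidence variable C
        weight = 0.1 if c else 0.5           # evidence S = +s: w *= p(+s | c)
        r = sample(0.8 if c else 0.2)        # sample non-evidence variable R
        weight *= 0.99 if r else 0.90        # evidence W = +w: p(+w | +s, r)
        totals[c] += weight
    z = sum(totals.values())
    return {c: v / z for c, v in totals.items()}

print(likelihood_weighting(10000))           # estimate of p(C | +s, +w)
```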

Ghostbusters HMM
–p(X_1) = uniform (1/9 for each cell of the 3×3 grid)
–p(X|X') = usually move clockwise, but sometimes move in a random direction or stay in place
–p(R_ij|X) = same sensor model as before: red means close, green means far away
[Figure: a uniform p(X_1) grid and a p(X|X' = ⟨cell⟩) grid with entries such as 1/2 and 1/6]

Example: Passage of Time
As time passes, uncertainty "accumulates".
Transition model: ghosts usually move clockwise.
[Figure: belief grids at T = 1, T = 2, and T = 5]

Example: Observation
As we get observations, beliefs get reweighted and uncertainty "decreases".
[Figure: beliefs before and after an observation]