CPSC 422, Lecture 15, Slide 1: Intelligent Systems (AI-2), Computer Science cpsc422, Lecture 15, Oct 14, 2015

CPSC 422, Lecture 15, Slide 2: Lecture Overview. Probabilistic temporal inferences: Filtering; Prediction; Smoothing (forward-backward); Most Likely Sequence of States (Viterbi).

Smoothing: compute the posterior distribution over a past state given all evidence to date, P(X_k | e_{0:t}) for 1 ≤ k < t. This lets you revise your estimates of past states in light of more recent evidence.

Smoothing (clicker question: which rule justifies each step? A. Bayes Rule, B. Cond. Independence, C. Product Rule)
P(X_k | e_{0:t}) = P(X_k | e_{0:k}, e_{k+1:t})   dividing up the evidence
= α P(X_k | e_{0:k}) P(e_{k+1:t} | X_k, e_{0:k})   using…
= α P(X_k | e_{0:k}) P(e_{k+1:t} | X_k)   using…
Here P(X_k | e_{0:k}) is the forward message from filtering up to state k, f_{0:k}, and P(e_{k+1:t} | X_k) is the backward message, b_{k+1:t}, computed by a recursive process that runs backwards from t.

Smoothing (answers)
P(X_k | e_{0:t}) = P(X_k | e_{0:k}, e_{k+1:t})   dividing up the evidence
= α P(X_k | e_{0:k}) P(e_{k+1:t} | X_k, e_{0:k})   using Bayes Rule
= α P(X_k | e_{0:k}) P(e_{k+1:t} | X_k)   by the Markov assumption on evidence
Here P(X_k | e_{0:k}) is the forward message from filtering up to state k, f_{0:k}, and P(e_{k+1:t} | X_k) is the backward message, b_{k+1:t}, computed by a recursive process that runs backwards from t.

Backward Message
P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1:t}, x_{k+1} | X_k)
= Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}, X_k) P(x_{k+1} | X_k)   Product Rule
= Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)   by the Markov assumption on evidence
= Σ_{x_{k+1}} P(e_{k+1}, e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
= Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)   because e_{k+1} and e_{k+2:t} are conditionally independent given x_{k+1}
The three factors in the last line are the sensor model, the recursive call, and the transition model. In message notation: b_{k+1:t} = BACKWARD(b_{k+2:t}, e_{k+1}).
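To make the recursion concrete, here is a minimal Python sketch of a single backward step (my illustration, not from the slides); the names are assumed conventions, with `T[i, j] = P(x_{k+1}=j | x_k=i)` and `obs_prob[j] = P(e_{k+1} | x_{k+1}=j)`.

```python
import numpy as np

def backward_step(b_next, T, obs_prob):
    """One backward-recursion step:
    b_{k+1:t}(x_k) = sum_{x_{k+1}} P(e_{k+1} | x_{k+1}) * b_{k+2:t}(x_{k+1}) * P(x_{k+1} | x_k)
    """
    # sensor model * recursive call, then summed against the transition model
    return T @ (obs_prob * b_next)
```

Starting from a vector of ones (see the initialization question below) and applying this step repeatedly from t down to k+1 yields each backward message b_{k+1:t}.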

Proof of equivalent statements

More Intuitive Interpretation (example with three states) (CPSC 422, Lecture 15, Slide 8):
P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(x_{k+1} | X_k) P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1})

Forward-Backward Procedure: Thus P(X_k | e_{0:t}) = α f_{0:k} × b_{k+1:t}, and this value can be computed by recursion through time, running forward from 0 to k and backward from t to k+1.

How is the backward message initialized? The backward phase is initialized with a fictitious, unspecified observation e_{t+1} at time t+1: b_{t+1:t} = P(e_{t+1} | X_t) = P(unspecified | X_t) = ?  A. 0  B. 0.5  C. 1

How is the backward message initialized? You will observe something for sure; it is only when you put constraints on the observations that the probability becomes less than 1. So the backward phase is initialized with the fictitious, unspecified observation e_{t+1} at time t+1: b_{t+1:t} = P(e_{t+1} | X_t) = P(unspecified | X_t) = 1.

Rain Example (network: Rain_0 → Rain_1 → Rain_2, with Umbrella_1 and Umbrella_2 observed; prior P(R_0 = true) = 0.5). Let's compute the probability of rain at t = 1, given umbrella observations at t = 1 and t = 2. From P(X_k | e_{1:t}) = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k) we have
P(R_1 | e_{1:2}) = P(R_1 | u_1, u_2) = α P(R_1 | u_1) P(u_2 | R_1)
where P(R_1 | u_1) is the forward message from filtering up to state 1 (the filtering to t = 1 that we did in lecture 14) and P(u_2 | R_1) is the backward message that propagates the evidence backward from time 2.

Rain Example (continued). From P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k):
P(u_2 | R_1) = Σ_{r_2} P(u_2 | r_2) P(e_{3:2} | r_2) P(r_2 | R_1)
= P(u_2 | r_2) P(e_{3:2} | r_2) P(r_2 | R_1) + P(u_2 | ¬r_2) P(e_{3:2} | ¬r_2) P(¬r_2 | R_1)
= (0.9 × 1 × ⟨0.7, 0.3⟩) + (0.2 × 1 × ⟨0.3, 0.7⟩) = ⟨0.69, 0.41⟩
where P(e_{3:2} | r_2) = 1 is the term corresponding to the fictitious unspecified observation sequence e_{3:2}.
Thus α P(R_1 | u_1) P(u_2 | R_1) = α ⟨0.818, 0.182⟩ × ⟨0.69, 0.41⟩ ≈ ⟨0.883, 0.117⟩.
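As a quick numeric check of this slide (my addition), the sketch below assumes the standard textbook umbrella-world parameters: P(r_{t+1} | r_t) = 0.7, P(r_{t+1} | ¬r_t) = 0.3, P(u_t | r_t) = 0.9, P(u_t | ¬r_t) = 0.2, and a uniform prior on R_0.

```python
import numpy as np

# Assumed umbrella-world parameters (standard textbook values)
T = np.array([[0.7, 0.3],               # T[i, j] = P(R_{t+1}=j | R_t=i)
              [0.3, 0.7]])              # rows/cols ordered (rain, no rain)
O_umbrella = np.array([0.9, 0.2])       # P(u_t | R_t = rain / no rain)
prior = np.array([0.5, 0.5])            # P(R_0)

def normalize(v):
    return v / v.sum()

# Forward message f_{1:1} = P(R_1 | u_1): one-step prediction, then update on u_1
f1 = normalize(O_umbrella * (T.T @ prior))     # ~ [0.818, 0.182]

# Backward message b_{2:2} = P(u_2 | R_1), with b_{3:2} initialized to ones
b2 = T @ (O_umbrella * np.ones(2))             # ~ [0.69, 0.41]

# Smoothing: P(R_1 | u_1, u_2) = alpha * f1 * b2
print(normalize(f1 * b2))                      # ~ [0.883, 0.117]
```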

Forward-Backward Algorithm. Simple smoothing estimates the posterior of a single state given past and future evidence, but the procedure can also be used to estimate the smoothed posterior for every state in the sequence. One way to do this (not the most efficient): compute and store the results of forward filtering over the whole sequence; compute and store the results of the backward procedure over the whole sequence; combine the two, as sketched in the code below.
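A minimal sketch of this store-then-combine scheme (my illustration, not the course's reference code), using the same assumed conventions as the snippets above: `T[i, j] = P(x_{t+1}=j | x_t=i)` and `obs[k]` is the vector of sensor probabilities for the k-th observation.

```python
import numpy as np

def forward_backward(prior, T, obs):
    """Smoothed posteriors P(X_k | e_{1:t}) for every k (hypothetical helper)."""
    n = len(obs)
    # Forward pass: store the filtering messages f_{1:k}
    f, state = [], prior
    for o in obs:
        state = o * (T.T @ state)                 # predict, then update on the evidence
        state = state / state.sum()
        f.append(state)
    # Backward pass: store the backward messages b_{k+1:t}, initialized to ones
    b = [np.ones(len(prior)) for _ in range(n)]
    for k in range(n - 2, -1, -1):
        b[k] = T @ (obs[k + 1] * b[k + 1])
    # Combine: each smoothed posterior is proportional to f * b
    return [fk * bk / (fk * bk).sum() for fk, bk in zip(f, b)]

# Umbrella example with the assumed parameters (two umbrella observations)
T = np.array([[0.7, 0.3], [0.3, 0.7]])
u = np.array([0.9, 0.2])
print(forward_backward(np.array([0.5, 0.5]), T, [u, u]))   # ~ [0.883, 0.117], [0.883, 0.117]
```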

CPSC 422, Lecture 15, Slide 16: Lecture Overview. Probabilistic temporal inferences: Filtering; Prediction; Smoothing (forward-backward); Most Likely Sequence of States (Viterbi).

Most Likely Sequence: Suppose that in the rain example we have the umbrella observation sequence [true, true, false, true, true]. Is [rain, rain, no-rain, rain, rain] the most likely state sequence? In this case you may have guessed right… but with more states and/or more observations, and complex transition and observation models, guessing is no longer feasible.

HMMs: most likely sequence (from CPSC 322, Lecture 32, Slide 18). Natural Language Processing, e.g., Speech Recognition: states are phonemes \ words, observations are the acoustic signal \ phonemes. Bioinformatics, e.g., Gene Finding: states are coding / non-coding regions, observations are DNA sequences. For these problems the critical inference is: find the most likely sequence of states given a sequence of observations.

Part-of-Speech (PoS) Tagging: Given a text in natural language, label (tag) each word with its syntactic category, e.g., noun, verb, pronoun, preposition, adjective, adverb, article, conjunction.
Input: Brainpower, not physical plant, is now a firm's chief asset.
Output: Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.
Tag meanings: NNP (proper noun, singular), RB (adverb), JJ (adjective), NN (noun, singular or mass), VBZ (verb, 3rd person singular present), DT (determiner), POS (possessive ending), . (sentence-final punctuation).
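To connect tagging to the HMM machinery in this lecture (my illustration with made-up toy numbers, not the course's model): the hidden states are the tags and the observations are the words, so tagging a sentence is exactly a most-likely-sequence query over these two components.

```python
import numpy as np

tags = ["DT", "NN", "VBZ", "JJ"]            # hidden states: PoS tags
# Transition model P(tag_{t+1} | tag_t), one row per current tag (toy numbers)
T = np.array([[0.05, 0.60, 0.05, 0.30],     # after DT
              [0.10, 0.20, 0.60, 0.10],     # after NN
              [0.40, 0.20, 0.05, 0.35],     # after VBZ
              [0.05, 0.80, 0.05, 0.10]])    # after JJ
# Sensor (emission) model P(word | tag), one vector per observed word (toy numbers)
emission = {
    "a":     np.array([0.70, 0.01, 0.00, 0.00]),
    "chief": np.array([0.00, 0.10, 0.00, 0.30]),
    "asset": np.array([0.00, 0.40, 0.00, 0.00]),
}
# Tagging "a chief asset" = find the most likely tag sequence given these observations,
# using the Viterbi recursion derived later in this lecture.
obs = [emission[w] for w in ["a", "chief", "asset"]]
```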

PoS tagging is very useful: as a basis for parsing in NL understanding; in information retrieval, for quickly finding names or other phrases for information extraction and for selecting important words from documents (e.g., nouns); for word-sense disambiguation, e.g., "I made her duck" (how many meanings does this sentence have?); and for speech synthesis, since knowing the PoS produces more natural pronunciations, e.g., content (noun) vs. content (adjective), object (noun) vs. object (verb).

Most Likely Sequence (Explanation). Most Likely Sequence: argmax_{x_{1:T}} P(X_{1:T} | e_{1:T}). Idea: find the most likely path to each state in X_T. As for filtering etc., we will develop a recursive solution.

Joint vs. Conditional Prob (clicker question). You have two binary random variables X and Y with the joint distribution below. How does argmax_x P(X | Y=t) compare with argmax_x P(X, Y=t)?  A. >  B. =  C. <  D. It depends
X  Y  P(X, Y)
t  t  0.4
f  t  0.2
t  f  0.1
f  f  0.3
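A tiny check of this question (my addition, not from the slides): since P(X | Y=t) = P(X, Y=t) / P(Y=t) and the divisor does not depend on X, both expressions pick the same maximizing value of X.

```python
joint = {('t', 't'): 0.4, ('f', 't'): 0.2, ('t', 'f'): 0.1, ('f', 'f'): 0.3}

p_y_t = sum(p for (x, y), p in joint.items() if y == 't')      # P(Y=t) = 0.6
joint_y_t = {x: joint[(x, 't')] for x in ('t', 'f')}           # P(X, Y=t)
cond_y_t = {x: joint[(x, 't')] / p_y_t for x in ('t', 'f')}    # P(X | Y=t)

# Normalizing by P(Y=t) rescales both entries by the same constant,
# so the argmax over X is unchanged (answer B: they are equal).
print(max(joint_y_t, key=joint_y_t.get), max(cond_y_t, key=cond_y_t.get))  # t t
```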

Most Likely Sequence: Formal Derivation (CPSC 422, Lecture 16, Slide 25). Suppose we want to find the most likely path to state x_{t+1} given e_{1:t+1}:
max_{x_1,...,x_t} P(x_1, ..., x_t, x_{t+1} | e_{1:t+1}), but this is proportional to the joint, so we can equivalently maximize
max_{x_1,...,x_t} P(x_1, ..., x_t, x_{t+1}, e_{1:t+1})
= max_{x_1,...,x_t} P(x_1, ..., x_t, x_{t+1}, e_{1:t}, e_{t+1})
= max_{x_1,...,x_t} P(e_{t+1} | e_{1:t}, x_1, ..., x_t, x_{t+1}) P(x_1, ..., x_t, x_{t+1}, e_{1:t})   Product Rule
= max_{x_1,...,x_t} P(e_{t+1} | x_{t+1}) P(x_1, ..., x_t, x_{t+1}, e_{1:t})   Markov assumption
= max_{x_1,...,x_t} P(e_{t+1} | x_{t+1}) P(x_{t+1} | x_1, ..., x_t, e_{1:t}) P(x_1, ..., x_t, e_{1:t})   Cond. Prob (Product Rule)
= max_{x_1,...,x_t} P(e_{t+1} | x_{t+1}) P(x_{t+1} | x_t) P(x_1, ..., x_{t-1}, x_t, e_{1:t})   Markov assumption
= P(e_{t+1} | x_{t+1}) max_{x_t} ( P(x_{t+1} | x_t) max_{x_1,...,x_{t-1}} P(x_1, ..., x_{t-1}, x_t, e_{1:t}) )   move factors outside the max
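The last line is the recurrence behind the Viterbi algorithm. Below is a minimal sketch (my illustration, not the course's code), using the same assumed conventions as the earlier snippets: `T[i, j] = P(x_{t+1}=j | x_t=i)`, and `obs[k]` holds the sensor probabilities for the k-th observation.

```python
import numpy as np

def viterbi(prior, T, obs):
    """Most likely state sequence argmax_{x_{1:T}} P(x_{1:T} | e_{1:T}) (sketch).

    m[j] = max over x_1..x_{t-1} of P(x_1, ..., x_{t-1}, x_t = j, e_{1:t}).
    """
    n, S = len(obs), len(prior)
    m = obs[0] * (T.T @ prior)              # message for t = 1 (x_0 is summed out)
    back = np.zeros((n, S), dtype=int)      # back[t][j]: best predecessor of state j at step t
    for t in range(1, n):
        scores = T * m[:, None]             # scores[i, j] = P(x_{t+1}=j | x_t=i) * m[i]
        back[t] = scores.argmax(axis=0)
        m = obs[t] * scores.max(axis=0)     # P(e_{t+1} | x_{t+1}) * max_{x_t}(...)
    path = [int(m.argmax())]                # best final state
    for t in range(n - 1, 0, -1):           # follow the back-pointers to recover the path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Umbrella world with the assumed parameters; observations [u, u, no-u, u, u]
T = np.array([[0.7, 0.3], [0.3, 0.7]])
u, no_u = np.array([0.9, 0.2]), np.array([0.1, 0.8])
print(viterbi(np.array([0.5, 0.5]), T, [u, u, no_u, u, u]))
# -> [0, 0, 1, 0, 0], i.e. [rain, rain, no-rain, rain, rain]
```

Under these assumed parameters the result matches the guess from the earlier slide for the observation sequence [true, true, false, true, true].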

CPSC 422, Lecture 15, Slide 26: Learning Goals for today's class. You can: describe the smoothing problem and derive a solution by manipulating probabilities; describe the problem of finding the most likely sequence of states (given a sequence of observations); derive its recursive solution (if time).

CPSC 422, Lecture 15, Slide 27: TODO for Fri. Keep working on Assignment 2 (new due date: Wed, Oct 21). Midterm: new date, Oct 26.