Presentation transcript:

Introduction to Sequence Models

Sequences
Many types of information involve sequences:
- Financial data (e.g., stock prices)
- DNA
- Robot motion
- Text: “Jack Flash sat on a candle stick.”

Sequence Models
Sequence models describe how an element of a sequence depends on previous (or sometimes following) elements. For instance, a financial model might try to predict a stock price tomorrow, given the stock prices for the past few weeks. As another example, a robot motion model tries to predict where a robot will be, given its current location and the commands given to the motors.

Types of Sequence Models
“Continuous-time” models describe situations where things change continuously, or smoothly, as a function of time.
- For instance: weather models, models from physics and engineering describing how gases or liquids behave over time, some financial models, …
- Typically, these involve differential equations.
- We won’t be talking about these.

Types of Sequence Models
“Discrete-time” models describe situations where the environment provides information periodically, rather than continuously.
- For instance, if stock prices are quoted once per day, or once per hour, or once per time period T, then it’s a discrete sequence of data.
- The price of a stock as it fluctuates all day, available at any time point, is a continuous sequence of data.
We’ll cover 2 examples of discrete-time sequence models:
- Hidden Markov Models (used in NLP, machine learning)
- Particle Filters (primarily used in robotics)

Hidden Markov Models
How students spend their time (observed once per time interval T): a diagram with three states, Sleep, Study, and Video games.
A Markov Model consists of:
- A set of states
- A set of transitions (edges) from one state to the next
- A conditional probability P(destination state | source state) for each transition

Quiz: Markov Models
How students spend their time (observed once per time interval T), with states Sleep, Study, and Video games and transition probabilities as shown in the diagram.
Suppose a student starts in the Study state.
- What is P(Study) in the next time step?
- What about P(Study) after two time steps?
- And P(Study) after three time steps?

Answer: Markov Models
Suppose a student starts in the Study state.
- What is P(Study) in the next time step? 0.4
- What about P(Study) after two time steps? Sum over all two-step paths back to Study, using the transition probabilities in the diagram: 0.4*0.4 + … = 0.25
- And P(Study) after three time steps? … complicated

Simpler Example
A two-state chain: from Sleep, the student stays asleep with probability 0.5 and moves to Study with probability 0.5; from Study, the student always moves back to Sleep. Suppose the student starts asleep.
- What is P(Sleep) after 1 time step?
- What is P(Sleep) after 2 time steps?
- What is P(Sleep) after 3 time steps?

Answer: Simpler Example
Suppose the student starts asleep.
- What is P(Sleep) after 1 time step? 0.5
- What is P(Sleep) after 2 time steps? 0.5*0.5 + 0.5*1 = 0.75
- What is P(Sleep) after 3 time steps? 0.5*0.5*0.5 + 0.5*1*0.5 + 0.5*0.5*1 + 0.5*0*1 = 0.625
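These step-by-step probabilities can be checked by repeatedly multiplying the state distribution by the transition matrix. A minimal sketch (the matrix encodes the two-state chain from this slide, with index 0 = Sleep and 1 = Study):

```python
import numpy as np

# Row i gives the transition probabilities out of state i:
# Sleep -> Sleep 0.5, Sleep -> Study 0.5; Study -> Sleep 1.0.
T = np.array([[0.5, 0.5],
              [1.0, 0.0]])

p = np.array([1.0, 0.0])  # the student starts asleep
for step in range(1, 4):
    p = p @ T  # one step of the chain
    print(f"P(Sleep) after {step} steps = {p[0]}")
# Prints 0.5, 0.75, 0.625, matching the hand computation above.
```

Summing over all paths by hand and multiplying by the transition matrix are the same computation; the matrix form just organizes the bookkeeping.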

Stationary Distribution
What happens after many, many time steps? We’ll make three assumptions about the transition probabilities:
1. It’s possible to get from any state to any other state.
2. On average, the number of time steps it takes to get from one state back to itself is finite.
3. There are no cycles (or periods).
Any Markov chains in this course will have these properties; in practice, most do anyway.

Stationary Distribution
What happens after many, many time steps? If those assumptions are true, then:
- After enough time steps, the probability of each state converges to a stationary distribution.
- This means that the probability at one time step is the same as the probability at the next time step, and the one after that, and the one after that, …

Stationary Distribution
Let’s compute the stationary distribution for this Markov chain. Let P_t be the probability distribution over states at time step t. For big enough t, P_t(Sleep) = P_{t-1}(Sleep).
P_t(Sleep) = P_{t-1}(Sleep)*0.5 + P_{t-1}(Study)*1
x = 0.5x + 1*(1-x)
1.5x = 1
x = 2/3
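The same answer falls out numerically: under the three assumptions above, repeatedly applying the transition matrix converges to the stationary distribution no matter where the chain starts. A sketch for the Sleep/Study chain:

```python
import numpy as np

# Sleep/Study chain from the slides (index 0 = Sleep, 1 = Study).
T = np.array([[0.5, 0.5],
              [1.0, 0.0]])

# Start in an arbitrary state and iterate until the distribution
# stops changing; it converges to the stationary distribution.
p = np.array([1.0, 0.0])
for _ in range(100):
    p = p @ T

print(p[0])  # -> 0.666..., i.e. the 2/3 computed algebraically
```

Starting from Study instead of Sleep gives the same limit, which is exactly what "stationary" means here.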

Quiz: Stationary Distribution
Compute the stationary distribution for this Markov chain: two states A and B, with P(A|A) = 0.75 and P(A|B) = 0.6.

Answer: Stationary Distribution
Compute the stationary distribution for this Markov chain.
P_t(A) = P_{t-1}(A) * 0.75 + P_{t-1}(B) * 0.6
x = 0.75x + 0.6(1-x)
0.85x = 0.6
x = 0.6 / 0.85 ≈ 0.71

Learning Markov Model Parameters
There are six probabilities associated with this two-state Markov model:
1. Initial state probabilities P_0(A) and P_0(B)
2. Transition probabilities P(A|A), P(B|A), P(A|B), and P(B|B)

Learning Markov Model Parameters
Here is a sequence of observations from our Markov model: BAAABABBAAA
Use maximum likelihood to estimate these parameters.
1. P_0(A) = 0/1, P_0(B) = 1/1
2. P(A|A) = 4/6 = 2/3, P(B|A) = 2/6 = 1/3
3. P(A|B) = 3/4, P(B|B) = 1/4
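The maximum-likelihood estimates are just transition counts divided by how often each state appears as a source. A short sketch (the helper name `ml_estimate` is ours, not from the slides):

```python
from collections import Counter

def ml_estimate(seq):
    """ML transition estimates for a Markov chain from one
    observation sequence: P(dest | source) = count(source -> dest)
    divided by count(source as a transition source)."""
    transitions = Counter(zip(seq, seq[1:]))  # counts of (source, dest) pairs
    sources = Counter(seq[:-1])               # how often each state is a source
    return {(s, d): transitions[(s, d)] / sources[s]
            for s in sources for d in set(seq)}

probs = ml_estimate("BAAABABBAAA")
print(probs[("A", "A")])  # P(A|A) = 4/6, as computed on the slide
print(probs[("B", "A")])  # P(A|B) = 3/4
```

Note the last character of the sequence is never a source, which is why the counts for A divide by 6 even though A appears 7 times.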

Quiz: Learning Markov Model Parameters
Here is a sequence of observations from our Markov model: AAABBBBBABBBA
Use maximum likelihood to estimate these parameters.

Answer: Learning Markov Model Parameters
Here is a sequence of observations from our Markov model: AAABBBBBABBBA
1. P_0(A) = 1/1, P_0(B) = 0/1
2. P(A|A) = 2/4, P(B|A) = 2/4
3. P(A|B) = 2/8 = 1/4, P(B|B) = 6/8 = 3/4

Restrictions on Markov Models
- The probability of a state depends only on the previous state, not any of the states before that (called the Markov assumption).
- Transition probabilities cannot change over time (called the stationary assumption).

Observations and Latent States
Markov models don’t get used much in AI, because they assume that you know exactly what state you are in at each time step. This is rarely true for AI agents. Instead, we will say that the agent has a set of possible latent states: states that are not observed, or known, to the agent. In addition, the agent has sensors that allow it to sense some aspects of the environment, i.e., to take measurements or observations.

Hidden Markov Models
Suppose you are the parent of a college student, and would like to know how studious your child is. You can’t observe them at all times, but you can periodically call, and see if your child answers.
The model has a chain of hidden states H1, H2, H3, … (each either Sleep or Study), and at each time step an observation O1, O2, O3, … (does the child answer the call or not?).

Hidden Markov Models
Here’s the same model, with probabilities in tables.

Initial state:
H1      P(H1)
Sleep   0.5
Study   0.5

Transitions (the same table applies at every step t):
H_t     P(H_{t+1} = Sleep | H_t)
Sleep   0.6
Study   0.5

Observations (the same table applies at every step t):
H_t     P(O_t = Answer | H_t)
Sleep   0.1
Study   0.8

Hidden Markov Models
HMMs (and MMs) are a special type of Bayes Net. Everything you have learned about BNs applies here: the graph is a chain of hidden states H1 → H2 → H3 → …, each with an observation O1, O2, O3, … as a child, using the same probability tables as on the previous slide.

Quick Review of BNs for HMMs
(Diagrams of the two-node fragments of the network: H1 → O1 and H1 → H2.)

Hidden Markov Models
(The one-step fragment of the model: H1 → O1, with the initial, transition, and observation tables from before.)

Hidden Markov Models
(The model extended to two steps: H1 → H2, with observations O1 and O2, and the same tables.)

Quiz: Hidden Markov Models
Suppose a parent calls twice, once at time step 1 and once at time step 2. The first time, the child does not answer, and the second time the child does. Now what is P(H2 = Sleep)?

Answer: Hidden Markov Models
(The answer slide works through the computation using the model’s tables.)