Presentation transcript:

Conditional & Joint Probability
A brief digression back to joint probability, i.e. the probability that both events O and H occur. Again, we can express joint probability in terms of their separate conditional and unconditional probabilities:

    P(O ∩ H) = P(O | H) · P(H)

This key result turns out to be exceedingly useful!

Conditional Probability
Converting expressions of joint probability
The intersection operator makes no assertion regarding order:

    P(O ∩ H) = P(H ∩ O)

We can therefore express everything only in terms of the reciprocal conditional and unconditional probabilities:

    P(O | H) · P(H) = P(H | O) · P(O)

This is usually expressed in a slightly rearranged form…

Conditional Probability
Bayes' theorem expresses the essence of inference

    P(H | O) = P(O | H) · P(H) / P(O)

We can think of this as allowing us to compute the probability of some hidden event H given that some observable event O has occurred, provided we know the probability of the observed event O assuming that the hidden event H has occurred. Bayes' theorem is a recipe for problems involving conditional probability.

Conditional Probability
Normalizing the probabilities
For convenience, we often replace the probability of the observed event O with the sum, over all possible values of H, of the joint probabilities of O and H:

    P(H | O) = P(O | H) · P(H) / Σ_H' P(O | H') · P(H')

Whew! But consider that if we now calculated P(H | O) for every H, the sum of these would be one, which is just how a probability should behave. This summing of the expression in the numerator "normalizes" the probabilities.

Conditional Probability
Bayes' theorem as a recipe for inference
Bayes' theorem is so important that each part of this recipe has a special name:

The posterior, P(H | O): think of this as the evidence for some specific model H given the set of observations O. We are making an inference about H on the basis of O.

The prior, P(H): our best guess about H before any observation is made. Often we will make neutral assumptions, resulting in an uninformative prior, but priors can also come from the posterior results of some earlier experiment. How best to choose priors is an essential element of Bayesian analysis.

The likelihood model, P(O | H): we have seen already that the probability of an observation given a hidden parameter is really a likelihood. Choosing a likelihood model is akin to proposing some process, H, by which the observation, O, might have come about.

The observable probability, P(O): generally, care must be taken to ensure that our observables have no uncertainty, otherwise they are really hidden!
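
To make the recipe concrete, here is a minimal Python sketch (not from the original slides) that computes posteriors from a prior and a likelihood model for a handful of hypotheses; the hypothesis names and all the numbers are invented purely for illustration.

    # Minimal Bayes' theorem sketch: posterior is likelihood * prior,
    # normalized by summing that numerator over all hypotheses H.
    # The hypotheses and probabilities below are made up for illustration.

    prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}          # P(H)
    likelihood = {"H1": 0.10, "H2": 0.40, "H3": 0.25}  # P(O | H) for one observed event O

    # P(O) = sum over H of P(O | H) * P(H)  (the normalizing denominator)
    p_obs = sum(likelihood[h] * prior[h] for h in prior)

    # P(H | O) = P(O | H) * P(H) / P(O)
    posterior = {h: likelihood[h] * prior[h] / p_obs for h in prior}

    print(posterior)                  # the posteriors...
    print(sum(posterior.values()))    # ...sum to 1.0, as a probability should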

The Backward Algorithm
Many paths give rise to the same sequence X
We would often like to know the total probability of some sequence:

    P(x) = Σ_π P(x, π)

But wait! Didn't we already solve this problem using the forward algorithm!? Well, yes, but we're going to solve it again by iterating backwards through the sequence instead of forwards. Sometimes the trip isn't about the destination. Stick with me!

The Backward Algorithm
Defining the backward variable

    b_k(i) = P(x_{i+1} … x_L | π_i = k)

"The backward variable for state k at position i": the probability of the rest of the sequence, from position i+1 to the end, given that the path at position i is in state k. Since we are effectively stepping "backwards" through the event sequence, this is formulated as a statement of conditional probability, rather than in terms of joint probability as the forward variables are.

The Backward Algorithm
What if we had in our possession all of the backward variables for position 1, the first position of the sequence?
We'll obtain these "first position" backward variables in a manner directly analogous to the method in the forward algorithm… To get the overall probability of the sequence, we would need only sum the backward variables for each state, after correcting for the probability of the initial transition from Start and for the emission of the first symbol:

    P(x) = Σ_k a_{Start,k} · e_k(x_1) · b_k(1)

(Here a_{Start,k} is the transition probability from the Start state to state k, and e_k(x_1) is the probability that state k emits the first symbol x_1.)

The Backward Algorithm
A recursive definition for backward variables
As always with a dynamic programming algorithm, we recursively define the variables, but this time in terms of their own values at later positions in the sequence:

    b_k(i) = P(x_{i+1} … x_L | π_i = k) = Σ_l a_{k,l} · e_l(x_{i+1}) · b_l(i+1)

(Here a_{k,l} is the transition probability from state k to state l.) The termination condition is satisfied and the basis case provided by virtue of the fact that sequences are finite, and in this case we must eventually (albeit implicitly) come to the End state.

The Backward Algorithm
If you understand forward, you already understand backward!

Initialization:
    b_k(L) = 1 for all states k

Recursion (i = L-1, …, 1):
    b_k(i) = Σ_l a_{k,l} · e_l(x_{i+1}) · b_l(i+1)

Termination:
    P(x) = Σ_k a_{Start,k} · e_k(x_1) · b_k(1)

We usually don't need the termination step, except to check that the result is the same as from forward… So why bother with the backward algorithm in the first place?
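
A minimal Python sketch of the backward recursion (not from the original slides). It assumes the model is given as plain dictionaries of transition probabilities a[k][l], emission probabilities e[k][symbol], and Start transitions a0[k]; these names are illustrative, and log-space scaling is ignored for clarity.

    def backward(x, states, a, e, a0):
        """Backward algorithm sketch.
        x: observed sequence (string or list of symbols)
        states: iterable of state names
        a[k][l]: transition probability from state k to state l
        e[k][s]: probability that state k emits symbol s
        a0[k]: transition probability from the Start state to state k
        Returns (b, px), where b[i][k] = b_k(i+1) using 0-based positions
        and px = P(x), the total probability of the sequence.
        """
        L = len(x)
        b = [{k: 0.0 for k in states} for _ in range(L)]

        # Initialization: b_k(L) = 1 for all states k
        for k in states:
            b[L - 1][k] = 1.0

        # Recursion (i = L-1, ..., 1): b_k(i) = sum_l a[k][l] * e[l][x_{i+1}] * b_l(i+1)
        for i in range(L - 2, -1, -1):
            for k in states:
                b[i][k] = sum(a[k][l] * e[l][x[i + 1]] * b[i + 1][l] for l in states)

        # Termination: P(x) = sum_k a0[k] * e[k][x_1] * b_k(1)
        px = sum(a0[k] * e[k][x[0]] * b[0][k] for k in states)
        return b, px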

The Backward Algorithm
The backward algorithm takes its name from its backwards iteration through the sequence: the probability of a sequence summed across all possible state paths.

Worked example: a two-state model (states "+" and "-", plus Start and End) scoring the sequence ACG, with emission probabilities:

    State "+":  A: 0.30   C: 0.25   G: 0.15   T: 0.30
    State "-":  A: 0.20   C: 0.35   G: 0.25   T: 0.20

Filling in the backward table column by column, from the last symbol G back to the first symbol A, and then applying the termination step, gives the total probability P(x) of the sequence.
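
As a usage example for the backward() sketch above, here is the slide's emission table wired into Python. The transition probabilities are not given on this slide, so the values below (including the 0.5/0.5 split out of Start) are placeholders chosen only to make the sketch run.

    # Usage example for the backward() sketch above.
    # Emission probabilities come from the slide; the transition values
    # below are placeholders, not the slide's actual model.
    states = ["+", "-"]
    e = {"+": {"A": 0.30, "C": 0.25, "G": 0.15, "T": 0.30},
         "-": {"A": 0.20, "C": 0.35, "G": 0.25, "T": 0.20}}
    a = {"+": {"+": 0.6, "-": 0.4},    # placeholder transitions
         "-": {"+": 0.4, "-": 0.6}}
    a0 = {"+": 0.5, "-": 0.5}          # placeholder Start transitions

    b, px = backward("ACG", states, a, e, a0)
    print(b)   # the backward variables b_k(i) for each position and state
    print(px)  # P(x): should match the forward algorithm's result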

The Decoding Problem
The most probable state path
The Viterbi algorithm will calculate the most probable state path (a.k.a. the MPSP), thereby allowing us to "decode" the state path in cases where the true state path is hidden:

    π* = argmax_π P(x, π)

The Viterbi algorithm generally does a good job of recovering the state path. For one simulated sequence, the true state path and the Viterbi path were:

    True state path:
        SFFFFFFFFFFFFFLLLLFFFFFFFFFFFFFFFFFFFFFF
        FFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLL
        LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLFFFFFFFFF

    Viterbi path:
        SFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
        FFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLL
        LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
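
For reference, a minimal Python sketch of the Viterbi recursion (the slides describe what the algorithm computes but do not show code). It reuses the same illustrative model dictionaries as the backward() sketch above and, for simplicity, works with raw probabilities rather than log probabilities.

    def viterbi(x, states, a, e, a0):
        """Viterbi algorithm sketch: returns the most probable state path
        pi* = argmax_pi P(x, pi) together with its joint probability.
        Uses the same a, e, a0 model dictionaries as backward() above.
        """
        L = len(x)
        v = [{k: 0.0 for k in states} for _ in range(L)]     # v_k(i): best score of a path ending in k
        ptr = [{k: None for k in states} for _ in range(L)]  # traceback pointers

        # Initialization: v_k(1) = a0[k] * e[k][x_1]
        for k in states:
            v[0][k] = a0[k] * e[k][x[0]]

        # Recursion: v_l(i+1) = e[l][x_{i+1}] * max_k v_k(i) * a[k][l]
        for i in range(1, L):
            for l in states:
                best_k = max(states, key=lambda k: v[i - 1][k] * a[k][l])
                v[i][l] = e[l][x[i]] * v[i - 1][best_k] * a[best_k][l]
                ptr[i][l] = best_k

        # Termination and traceback of the most probable state path
        last = max(states, key=lambda k: v[L - 1][k])
        path = [last]
        for i in range(L - 1, 0, -1):
            path.append(ptr[i][path[-1]])
        path.reverse()
        return path, v[L - 1][last]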

The Decoding Problem
Limitations of the most probable state path
…but the most probable state path might not always be the best choice for further inference on the sequence. There may be other paths, sometimes several, that result in probabilities nearly as good as the MPSP. The MPSP tells us about the probability of the entire path, but it doesn't actually tell us what the most probable state might be for some particular observation x_i. More specifically, we might want to know "the probability that observation x_i resulted from being in state k, given the observed sequence x":

    P(π_i = k | x)

This is the posterior probability of state k at position i when sequence x is known.

Calculating the Posterior Probabilities
The approach is a little bit indirect…
We can sometimes ask a slightly different or related question and see if that gets us closer to the result we seek. Maybe we can first say something about the probability of producing the entire observed sequence with observation x_i resulting from having been in state k:

    P(x, π_i = k) = P(x_1 … x_i, π_i = k) · P(x_{i+1} … x_L | x_1 … x_i, π_i = k)
                  = P(x_1 … x_i, π_i = k) · P(x_{i+1} … x_L | π_i = k)

(The second step uses the Markov property: once we condition on the state at position i, the rest of the sequence does not depend on the earlier observations.) Does anything here look like something we may have seen before??

Calculating the Posterior Probabilities
These terms are exactly our now familiar forward and backward variables!

    P(x, π_i = k) = P(x_1 … x_i, π_i = k) · P(x_{i+1} … x_L | π_i = k) = f_k(i) · b_k(i)

Calculating the Posterior Probabilities
Putting it all together using Bayes' theorem
We now have all of the necessary ingredients required to apply Bayes' theorem:

    P(π_i = k | x) = P(x | π_i = k) · P(π_i = k) / P(x) = P(x, π_i = k) / P(x) = f_k(i) · b_k(i) / P(x)

We can therefore find the posterior probability of being in state k for each position i in the sequence! We need only run both the forward and the backward algorithms to generate the values we need. Remember, we can get P(x) directly from either the forward or the backward algorithm. (Python tip: we probably want to store our forward and backward values as instance variables rather than method variables.) OK, but what can we really do with these posterior probabilities?
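
Here is a minimal sketch of the posterior computation in the same function-based style as the backward() example above (rather than the instance-variable style the slide suggests, and again without log-space scaling); a matching forward() is included so the sketch is self-contained.

    def forward(x, states, a, e, a0):
        """Forward algorithm sketch: f[i][k] = f_k(i+1) = P(x_1..x_{i+1}, pi_{i+1} = k),
        using the same model dictionaries as backward() above. Returns (f, px)."""
        L = len(x)
        f = [{k: 0.0 for k in states} for _ in range(L)]
        for k in states:
            f[0][k] = a0[k] * e[k][x[0]]
        for i in range(1, L):
            for l in states:
                f[i][l] = e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
        px = sum(f[L - 1][k] for k in states)
        return f, px

    def posteriors(x, states, a, e, a0):
        """Posterior state probabilities: P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
        f, px = forward(x, states, a, e, a0)
        b, _ = backward(x, states, a, e, a0)
        return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]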

Posterior Decoding
Making use of the posterior probabilities
There are two primary applications of the posterior state path probabilities. First, we can define an alternative state path to the most probable state path:

    π̂_i = argmax_k P(π_i = k | x)

This alternative to Viterbi decoding is most useful when we are interested in what the state might be at some particular point or points. It's possible that the overall path suggested by this might not be particularly likely; in some scenarios it might not even be a permitted path through the model!
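
A short sketch of this posterior decoding rule, built on the posteriors() function above:

    def posterior_path(x, states, a, e, a0):
        """Posterior decoding sketch: pick, at each position i, the state k
        that maximizes P(pi_i = k | x). Uses posteriors() from the sketch above."""
        post = posteriors(x, states, a, e, a0)
        return [max(states, key=lambda k: p[k]) for p in post]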

Posterior Decoding
Plotting the posterior probabilities
Since we know the individual probabilities at each position, we can easily plot the posterior probabilities for a sequence generated with myHMM.generate(60, ). Key: true path / Viterbi path / posterior path:

    True path:      SFFFFFFFFLLLLLLLLLLLFFLLLLLFFLLLLLLLLFFFFFFFFFFFFFFFFFFFLLLL
    Viterbi path:   SFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFFFFF
    Posterior path: FFFFFFFFFLLLLLLLLLLLLFFFFLLLLLLLLLLLLFFFFFFFFFFFFFFFFFFFFFFF

Posterior Decoding
Plotting the posterior probabilities with matplotlib
Assuming that we have list variables self._x containing the range of the sequence and self._y containing the posterior probabilities…

Note: you may need to convert the log_float probabilities back to normal floats; I found it more convenient to just define the __float__() method in log_float.

    from pylab import *  # this line at the beginning of the file
    ...

    class HMM(object):
        ...  # all your other stuff

        def show_posterior(self):
            # Plot the posterior probability at each position of the sequence
            if self._x and self._y:
                plot(self._x, self._y)
                show()
            return
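
For comparison, the same plot can be produced from the standalone function sketches above; the wiring below is only illustrative (it reuses the placeholder model dictionaries from the earlier usage example, not the slides' HMM class).

    # Illustrative wiring, not the slides' HMM class: plot, for one chosen state,
    # its posterior probability at each position, using the sketches above.
    from pylab import plot, show

    post = posteriors("ACG", states, a, e, a0)   # model dicts from the earlier usage example
    ys = [p["+"] for p in post]                  # posterior of state "+" at each position
    plot(range(1, len(ys) + 1), ys)
    show()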