11 - Markov Chains
Jim Vallandingham
Outline
- Irreducible Markov Chains
- Outline of Proof of Convergence to Stationary Distribution
- Convergence Example
- Reversible Markov Chains
- Monte Carlo Methods
- Hastings-Metropolis Algorithm
- Gibbs Sampling
- Simulated Annealing
- Absorbing Markov Chains
Stationary Distribution
As $n \to \infty$, $P^n$ approaches a matrix in which each row is the stationary distribution $\pi$.
Stationary Dist. Example
Stationary Dist. Example
Long-term averages:
- 24% of time spent in state E1
- 39% of time spent in state E2
- 21% of time spent in state E3
- 17% of time spent in state E4
Stationary Distribution
Any finite, aperiodic, irreducible Markov chain will converge to a stationary distribution, regardless of the starting distribution. The outline of the proof requires linear algebra (Appendix B.19).
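This claim is easy to check numerically. Below is a minimal Python sketch using an illustrative 2-state transition matrix (an assumption, not one of the slides' examples): the rows of $P^n$ all converge to the same vector.

```python
# Numerical check: rows of P^n converge to the stationary distribution.
# The matrix P here is an illustrative assumption, not the slides' example.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])  # each row sums to 1

Pn = np.linalg.matrix_power(P, 50)
print(Pn)
# Both rows approach pi = (0.8, 0.2); check: pi @ P = pi.
```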
L.A. : Eigenvalues
Let P be an $s \times s$ matrix. P has s eigenvalues $\lambda_1, \dots, \lambda_s$, found as the s solutions to $\det(P - \lambda I) = 0$. Assume all eigenvalues of P are distinct.
L.A. : Left & Right Eigenvectors
Corresponding to each eigenvalue $\lambda_i$ is a right eigenvector $v_i$ and a left eigenvector $u_i^T$, for which $P v_i = \lambda_i v_i$ and $u_i^T P = \lambda_i u_i^T$. Assume they are normalized so that $u_i^T v_i = 1$.
L.A. : Spectral Expansion
Can express P in terms of its eigenvectors and eigenvalues:
$P = \sum_{i=1}^{s} \lambda_i v_i u_i^T$
Called a spectral expansion of P.
L.A. : Spectral Expansion
If $\lambda_i$ is an eigenvalue of P with corresponding left and right eigenvectors $u_i^T$ and $v_i$, then $\lambda_i^n$ is an eigenvalue of $P^n$ with the same left and right eigenvectors $u_i^T$ and $v_i$.
L.A. : Spectral Expansion
Implies the spectral expansion of $P^n$ can be written as:
$P^n = \sum_{i=1}^{s} \lambda_i^n v_i u_i^T$
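As a sanity check, here is a short Python sketch that rebuilds $P^n$ from its eigendecomposition, on the same illustrative matrix as above (assuming distinct eigenvalues, so the expansion applies):

```python
# Verify P^n = sum_i lambda_i^n v_i u_i^T using numpy's eigendecomposition.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

lam, V = np.linalg.eig(P)   # columns of V are right eigenvectors v_i
U = np.linalg.inv(V)        # rows of V^{-1} are left eigenvectors u_i^T,
                            # automatically normalized so u_i^T v_i = 1
n = 5
Pn = sum(lam[i] ** n * np.outer(V[:, i], U[i, :]) for i in range(len(lam)))
print(np.allclose(Pn, np.linalg.matrix_power(P, n)))  # True
```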
Outline of Proof
Going back to the proof: P is the transition matrix of a finite, aperiodic, irreducible Markov chain. Then P has exactly one eigenvalue equal to 1, and all other eigenvalues have absolute value < 1.
Outline of Proof
Choose the left and right eigenvectors of $\lambda_1 = 1$. Requirements: $u_1^T P = u_1^T$ and $P v_1 = v_1$. Taking $v_1 = \mathbf{1}$, a column of 1's, satisfies $P v_1 = v_1$, since each row of P sums to 1. The normalization $u_1^T v_1 = 1$ then says the entries of $u_1$ sum to 1, so $u_1$ is a probability vector.
Outline of Proof
$u_1^T P = u_1^T$ is the same equation satisfied by the stationary distribution: $\pi P = \pi$. Also, it can be shown that there is a unique solution of this equation that also satisfies $\sum_i (u_1)_i = 1$, so $u_1^T = \pi$.
Outline of Proof
$P^n$ gives the n-step transition probabilities. The spectral expansion of $P^n$ is:
$P^n = v_1 u_1^T + \sum_{i=2}^{s} \lambda_i^n v_i u_i^T$
Only one eigenvalue equals 1; the rest have absolute value < 1, so as n increases $P^n$ approaches $v_1 u_1^T = \mathbf{1}\pi$, the matrix each of whose rows is $\pi$.
Convergence Example
The example transition matrix has one eigenvalue equal to 1; its remaining eigenvalues are all less than 1 in absolute value. Its left and right eigenvectors satisfy $u_i^T P = \lambda_i u_i^T$ and $P v_i = \lambda_i v_i$, and the left eigenvector for the eigenvalue 1 is the stationary distribution. Substituting these into the spectral expansion of $P^n$ shows every row of $P^n$ converging to that stationary distribution.
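The slide's numeric example does not survive in this text, but the same computation can be sketched in Python on the illustrative matrix used earlier: the stationary distribution is the normalized left eigenvector for the eigenvalue 1.

```python
# Recover the stationary distribution as the left eigenvector of eigenvalue 1.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

lam, U = np.linalg.eig(P.T)            # right eigenvectors of P^T are
                                       # left eigenvectors of P
k = np.argmin(np.abs(lam - 1.0))       # locate the eigenvalue 1
pi = np.real(U[:, k] / U[:, k].sum())  # normalize to a probability vector
print(pi)                              # [0.8 0.2]
```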
Reversible Markov Chains
Reversible Markov Chains
Typically we move forward in 'time' in a Markov chain: 1, 2, 3, …, t. What about moving backward in this chain: t, t-1, t-2, …, 1?
Reversible Markov Chains
(Figure: an ancestor evolving forward in time into Species A and Species B; tracing from a present-day species toward the ancestor means moving back in time.)
Reversible Markov Chains
Have a finite, irreducible, aperiodic Markov chain with stationary distribution $\pi$. During t transitions, the chain will move through states $X_1, X_2, \dots, X_t$. Reverse chain: define $Y_i = X_{t+1-i}$. Then the reverse chain will move through states $Y_1, Y_2, \dots, Y_t$, i.e. $X_t, X_{t-1}, \dots, X_1$.
Reversible Markov Chains
Want to show the structure determining the reverse chain sequence is also a Markov chain. A typical element $p^*_{ij}$ is found from the typical elements of P using:
$p^*_{ij} = \dfrac{\pi_j p_{ji}}{\pi_i}$
Reversible Markov Chains
Shown by using Bayes' rule to invert the conditional probability. Intuitively: the future is independent of the past, given the present, and equally the past is independent of the future, given the present.
Reversible Markov Chains
The stationary distribution of the reverse chain is still $\pi$. Follows from the stationary distribution property:
$\sum_i \pi_i p^*_{ij} = \sum_i \pi_j p_{ji} = \pi_j$
Reversible Markov Chains
A Markov chain is said to be reversible if $p^*_{ij} = p_{ij}$ for all i, j. This holds only if $\pi_i p_{ij} = \pi_j p_{ji}$ for all i, j (the detailed balance condition).
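A small Python sketch, again on the illustrative 2-state matrix: build the reverse chain from the formula above and test detailed balance (any irreducible 2-state chain happens to pass this test).

```python
# Build the reverse chain p*_ij = pi_j p_ji / pi_i and test detailed balance.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])                  # stationary distribution of P

P_rev = (pi[None, :] * P.T) / pi[:, None]  # entry (i, j) = pi_j p_ji / pi_i
print(P_rev)                               # equals P here: chain is reversible

flux = pi[:, None] * P                     # entry (i, j) = pi_i p_ij
print(np.allclose(flux, flux.T))           # True: detailed balance holds
```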
Monte Carlo Methods
Markov Chain Monte Carlo
- Class of algorithms for sampling from probability distributions
- Involves constructing a Markov chain whose stationary distribution is the desired distribution
- The state of the chain after a large number of steps is used as a sample of the desired distribution
- We discuss 2 algorithms: Gibbs Sampling and Simulated Annealing
Basic Problem
Find a transition matrix P whose stationary distribution is the target distribution. We know the Markov chain will converge to its stationary distribution regardless of the initial distribution; the question is how to find such a P.
Basic Idea
Construct a transition matrix Q, the "candidate-generating matrix", then modify it to have the correct stationary distribution. The modification involves inserting acceptance factors $a_{ij}$ so that $p_{ij} = q_{ij} a_{ij}$ for $i \ne j$. There are various ways of picking the a's.
Hastings-Metropolis
Goal: construct an aperiodic, irreducible Markov chain having a prescribed stationary distribution. Produces a correlated sequence of draws from a target density that may be difficult to sample using a classical independence method.
Hastings-Metropolis
Process: choose a set of constants $a_{ij}$ such that
$a_{ij} = \min\!\left(1, \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}}\right)$
and define $p_{ij} = q_{ij} a_{ij}$ for $i \ne j$, with $p_{ii} = 1 - \sum_{j \ne i} p_{ij}$. With probability $a_{ij}$ the proposed state change is accepted; otherwise it is rejected and the chain does not change value.
Hastings-Metropolis Example
Target $\pi = (0.4 \;\; 0.6)$, candidate-generating matrix
$Q = \begin{pmatrix} 0.5 & 0.5 \\ 0.9 & 0.1 \end{pmatrix}$
Hastings-Metropolis Example
Applying the acceptance factors to Q gives
$P = \begin{pmatrix} 0.5 & 0.5 \\ 0.33 & 0.67 \end{pmatrix}$
Hastings-Metropolis Example
$P^2 = \begin{pmatrix} 0.415 & 0.585 \\ 0.386 & 0.614 \end{pmatrix}, \quad P^{50} = \begin{pmatrix} 0.398 & 0.602 \\ 0.398 & 0.602 \end{pmatrix}$
By $P^{50}$ both rows have converged to $\pi = (0.4 \;\; 0.6)$, up to the rounding of the entries of P.
Algorithmic Description
Start with state E1, then iterate:
- Propose E' from q(Et, E')
- Calculate the ratio $a = \dfrac{\pi(E')\,q(E', E_t)}{\pi(E_t)\,q(E_t, E')}$
- If a > 1, accept: E(t+1) = E'
- Else accept with probability a; if rejected, E(t+1) = Et
A runnable sketch of this loop follows.
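Here is a minimal Python sketch of the loop above, run on the two-state example ($\pi = (0.4, 0.6)$ and the Q shown earlier); the long-run state frequencies should approach $\pi$.

```python
# Hastings-Metropolis on the two-state example: target pi = (0.4, 0.6),
# candidate matrix Q = [[0.5, 0.5], [0.9, 0.1]].
import random

pi = [0.4, 0.6]
q = [[0.5, 0.5],
     [0.9, 0.1]]

def step(state):
    proposal = 0 if random.random() < q[state][0] else 1  # draw from q(state, .)
    # Acceptance ratio a = pi_j q_ji / (pi_i q_ij)
    a = (pi[proposal] * q[proposal][state]) / (pi[state] * q[state][proposal])
    if random.random() < min(1.0, a):
        return proposal   # accept the state change
    return state          # reject: the chain does not change value

state, counts = 0, [0, 0]
for _ in range(100_000):
    state = step(state)
    counts[state] += 1
print([c / 100_000 for c in counts])  # approximately [0.4, 0.6]
```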
Gibbs Sampling
Gibbs Sampling Definitions
Let $Y = (Y_1, Y_2, \dots, Y_d)$ be the random vector of interest and let $\pi$ be the distribution of Y. Assume Y takes only finitely many possible values. We define a Markov chain whose states are the possible values of Y.
Gibbs Sampling
Process: enumerate the possible vectors in some order 1, 2, …, s, and identify vector j with the jth state in the chain. Then set $p_{ij} = 0$ if vectors i and j differ in more than one component. If they differ in at most one component, say the kth, $p_{ij}$ is the conditional probability of j's kth component given the values that i and j share in all other components.
Gibbs Sampling
Assume a joint distribution p(X, Y), and that we are looking to sample k values of X:
- Begin with a value y0
- Sample xi using p(X | Y = y_{i-1})
- Once xi is found, use it to find yi from p(Y | X = xi)
- Repeat k times
A sketch of this alternation appears below.
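A hedged Python sketch of the two-variable scheme above, using an assumed example target: a standard bivariate normal with correlation rho, whose conditionals p(X|Y) and p(Y|X) are both univariate normals.

```python
# Gibbs sampling for an assumed bivariate-normal target with correlation rho:
# X | Y = y  ~  N(rho * y, 1 - rho^2), and symmetrically for Y | X = x.
import random

rho = 0.8
sd = (1 - rho ** 2) ** 0.5   # standard deviation of each conditional

def gibbs(k, y0=0.0):
    samples, y = [], y0
    for _ in range(k):
        x = random.gauss(rho * y, sd)   # x_i ~ p(X | Y = y_{i-1})
        y = random.gauss(rho * x, sd)   # y_i ~ p(Y | X = x_i)
        samples.append((x, y))
    return samples

draws = gibbs(50_000)
print(sum(x for x, _ in draws) / len(draws))  # approx 0, the marginal mean of X
```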
Visual Example
Gibbs Sampling
Allows us to deal with univariate conditional distributions instead of a complex joint distribution. The chain has the joint distribution as its stationary distribution.
Why is this Hastings-Metropolis?
If we define the candidate-generating probabilities q(i, j) to be the Gibbs conditional probabilities, we can see that for Gibbs the acceptance ratio a is always 1: every proposed move is accepted.
Simulated Annealing
Simulated Annealing
Goal: find the (approximate) minimum of some positive function f defined on an extremely large number of states s, and to find those states where this function is minimized. The value of the function for state $E_j$ is $f(E_j)$.
Simulated Annealing
Process: construct a neighborhood of each state, a set of states "close" to that state. The variable in the Markov chain can move to a neighbor in one step; moves outside the neighborhood are not allowed.
Simulated Annealing
Requirements of neighborhoods:
- If Em is in the neighborhood of Ej, then Ej is in the neighborhood of Em
- The number of states N in a neighborhood is independent of that state
- Neighborhoods are linked so that the chain can eventually make it from any Ej to any Em
- If in state Ej, then the next move must be in the neighborhood of Ej
Simulated Annealing
Uses a positive parameter T. The aim is to have the stationary distribution of each Markov chain state be
$\pi_j = K e^{-f(E_j)/T}$
where K is a constant ensuring the probabilities sum to 1. States with a low value of f() then have high stationary probability, so the chain visits them often enough for them to become recognizable.
Simulated Annealing
Large T values: all states in the current state's neighborhood are chosen with roughly equal probability, and the stationary distribution of the chain tends to be uniform. Small T values: different states in a neighborhood have very different stationary probabilities; if T is too small the chain might get stuck at a local minimum of f.
Simulated Annealing
The art is in picking the T value. Want rapid movement from one neighborhood to another (large T), yet to pick out the states in each neighborhood with large stationary probabilities (small T). A common compromise, sketched below, is to start T large and lower it gradually.
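A minimal Python sketch under assumed choices: states 0..s-1 on a ring, each neighborhood being the two adjacent states, an illustrative quadratic f, and a simple geometric cooling schedule (none of these specifics come from the slides).

```python
# Simulated annealing sketch: minimize an illustrative f over states 0..s-1,
# where each state's neighborhood is its two neighbors on a ring.
import math
import random

s = 100
def f(j):
    return (j - 37) ** 2          # assumed toy objective, minimized at 37

def neighbors(j):
    return [(j - 1) % s, (j + 1) % s]

state, T = 0, 100.0
for _ in range(20_000):
    candidate = random.choice(neighbors(state))
    # Accept with probability min(1, e^{-(f(cand) - f(state))/T}), which
    # targets the stationary distribution pi_j proportional to e^{-f(E_j)/T}.
    if random.random() < math.exp(min(0.0, -(f(candidate) - f(state)) / T)):
        state = candidate
    T = max(0.01, T * 0.999)      # geometric cooling, floored to stay positive
print(state)                      # typically 37, the minimizer
```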
SA Example
Absorbing Markov Chains
Absorbing Markov Chains
- Absorbing state: a state which is impossible to leave ($p_{ii} = 1$)
- Transient state: a non-absorbing state in an absorbing chain
Absorbing Markov Chains
Questions to answer:
- Given the chain starts at a particular state, what is the expected number of steps before being absorbed?
- Given the chain starts at a particular state, what is the probability it will be absorbed by a particular absorbing state?
General Process
Use the explanation from Introduction to Probability (Grinstead & Snell): convert the matrix into canonical form, use that form to answer the two questions, and use a simple example throughout.
Canonical Form
Rearrange the states so that the transient states come first in P:
$P = \begin{pmatrix} Q & R \\ \mathbf{0} & I \end{pmatrix}$
where Q is a t x t matrix, R is a t x r matrix, I is the r x r identity matrix, and 0 is the r x t zero matrix (t = # of transient states, r = # of absorbing states).
Drunkard’s Walk Example
A man walks home from a bar with 4 blocks to walk, giving 5 states in total (corners 0 through 4). Absorbing states: corner 4 (home) and corner 0 (the bar). At each corner he has an equal probability of going forward or backward.
Drunkard’s Walk Example
With states ordered 0, 1, 2, 3, 4:
$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$
Drunkard’s Walk : Canonical Form
Ordering the transient states 1, 2, 3 before the absorbing states 0, 4:
$Q = \begin{pmatrix} 0 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1/2 & 0 \end{pmatrix}, \quad R = \begin{pmatrix} 1/2 & 0 \\ 0 & 0 \\ 0 & 1/2 \end{pmatrix}$
Fundamental Matrix
For an absorbing Markov chain P, the fundamental matrix for P is $N = (I - Q)^{-1}$. The entry $n_{ij}$ gives the expected number of times that the process is in the transient state $s_j$ if started in transient state $s_i$ (before being absorbed).
Proof
Proof
Let $s_i$ and $s_j$ be two transient states, and let $X^{(k)}$ be a random variable that is 1 if the chain is in state $s_j$ after k steps, and 0 otherwise.
Proof
The expected number of times the chain is in state $s_j$ in the first n steps, given that it starts in $s_i$, is
$E[X^{(0)} + X^{(1)} + \cdots + X^{(n)}] = q^{(0)}_{ij} + q^{(1)}_{ij} + \cdots + q^{(n)}_{ij}$
As n goes to infinity this approaches $(I + Q + Q^2 + \cdots)_{ij} = \left((I - Q)^{-1}\right)_{ij} = n_{ij}$.
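The series identity is easy to check numerically; a quick Python sketch using the drunkard's-walk Q defined above:

```python
# Check N = I + Q + Q^2 + ... against (I - Q)^{-1} for the drunkard's-walk Q.
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

partial = sum(np.linalg.matrix_power(Q, k) for k in range(200))  # truncated series
print(np.allclose(partial, np.linalg.inv(np.eye(3) - Q)))        # True
```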
Example: Fundamental Matrix
From the drunkard's-walk canonical form:
$N = (I - Q)^{-1} = \begin{pmatrix} 3/2 & 1 & 1/2 \\ 1 & 2 & 1 \\ 1/2 & 1 & 3/2 \end{pmatrix}$
Time to Absorption
Expected number of steps before the chain is absorbed: $t_i$ is the expected number of steps before absorption, given that the chain started in $s_i$. Then $t = Nc$, where t is the vector with elements $t_i$ and c is a column vector of 1's.
Proof
The sum of the ith row of N is the expected number of times the chain is in any transient state, for the given starting state $s_i$; that is, the expected time required before absorption. This is exactly what each value of t is.
Example: Time to Absorption
$t = Nc = \begin{pmatrix} 3 \\ 4 \\ 3 \end{pmatrix}$
Starting at corners 1, 2, or 3, the expected numbers of steps before absorption are 3, 4, and 3.
Absorption Probabilities
$b_{ij}$: the probability that the chain will be absorbed in absorbing state $s_j$ if it starts in transient state $s_i$. B is the t x r matrix with entries $b_{ij}$, given by $B = NR$, where R is the other component of the canonical matrix.
Proof
Absorption into $s_j$ happens by being in some transient state $s_k$ after n steps and then moving to $s_j$:
$b_{ij} = \sum_{n} \sum_{k} q^{(n)}_{ik} r_{kj} = \sum_{k} n_{ik} r_{kj} = (NR)_{ij}$
Example: Absorption Probabilities
$B = NR = \begin{pmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}$
(columns: absorbed at corner 0, the bar, and at corner 4, home). Starting at corner 1, for instance, the man reaches home with probability 1/4.
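All three quantities for the drunkard's walk can be reproduced in a few lines of Python:

```python
# Compute N = (I - Q)^{-1}, t = N c, and B = N R for the drunkard's walk
# (transient states 1, 2, 3; absorbing states 0 and 4).
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],            # columns: absorbed at corner 0, corner 4
              [0.0, 0.0],
              [0.0, 0.5]])

N = np.linalg.inv(np.eye(3) - Q)     # fundamental matrix
t = N @ np.ones(3)                   # expected steps to absorption
B = N @ R                            # absorption probabilities

print(N)   # [[1.5 1.  0.5] [1.  2.  1. ] [0.5 1.  1.5]]
print(t)   # [3. 4. 3.]
print(B)   # [[0.75 0.25] [0.5  0.5 ] [0.25 0.75]]
```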
Absorbing Markov Chains
Answers to the two questions:
- Given the chain starts at a particular state, the expected number of steps before being absorbed is given by $t = Nc$.
- Given the chain starts at a particular state, the probability it will be absorbed by a particular absorbing state is given by $B = NR$.
Interesting Markov Chain use
Sentence Creator
Feed text into a Markov chain to create a transition matrix that holds the probability of going from word i to word j in a sentence. Then start at a particular word in the chain and follow the transition distributions to create new sentences.
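A minimal Python sketch of the idea (the example text is an arbitrary stand-in, not the slides' corpus):

```python
# Word-level Markov chain: count word-to-word transitions, then random-walk.
import random
from collections import defaultdict

def build_chain(text):
    chain = defaultdict(list)
    words = text.split()
    for i in range(len(words) - 1):
        chain[words[i]].append(words[i + 1])  # empirical transition frequencies
    return chain

def generate(chain, start, length=10):
    out, word = [start], start
    for _ in range(length - 1):
        if word not in chain:                 # dead end: no observed successor
            break
        word = random.choice(chain[word])     # sample the next word
        out.append(word)
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat ran")
print(generate(chain, "the"))
```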
Sentence Creator: Dracula + Huckleberry Finn
This afternoon I don't know of humbug talky-talk, just set in, and perpetually violent. Then I saw, and looking tired them pens was a few minutes our sight.
End