11 - Markov Chains
Jim Vallandingham
Outline
- Irreducible Markov Chains
- Outline of Proof of Convergence to Stationary Distribution
- Convergence Example
- Reversible Markov Chains
- Monte Carlo Methods
- Hastings-Metropolis Algorithm
- Gibbs Sampling
- Simulated Annealing
- Absorbing Markov Chains
Stationary Distribution
As $n \to \infty$, $P^n$ approaches a matrix in which each row is the stationary distribution $\pi$.
Stationary Dist. Example
Stationary Dist. Example
Long-term averages:
- 24% of time spent in state E1
- 39% of time spent in state E2
- 21% of time spent in state E3
- 17% of time spent in state E4
Stationary Distribution
Any finite, aperiodic, irreducible Markov chain will converge to a stationary distribution, regardless of the starting distribution. The outline of the proof requires linear algebra (Appendix B.19).
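This claim is easy to check numerically. Below is a minimal Python sketch using an illustrative 2-state transition matrix (an assumption, not one of the slides' examples): the rows of $P^n$ all converge to the same vector.

```python
# Numerical check: rows of P^n converge to the stationary distribution.
# The matrix P here is an illustrative assumption, not the slides' example.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])  # each row sums to 1

Pn = np.linalg.matrix_power(P, 50)
print(Pn)
# Both rows approach pi = (0.8, 0.2); check: pi @ P = pi.
```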
L.A. : Eigenvalues
Let P be an $s \times s$ matrix. P has s eigenvalues $\lambda_1, \dots, \lambda_s$, found as the s solutions to $\det(P - \lambda I) = 0$. Assume all eigenvalues of P are distinct.
L.A. : Left & Right Eigenvectors
Corresponding to each eigenvalue $\lambda_i$ is a right eigenvector $v_i$ and a left eigenvector $u_i^T$, for which $P v_i = \lambda_i v_i$ and $u_i^T P = \lambda_i u_i^T$. Assume they are normalized so that $u_i^T v_i = 1$.
L.A. : Spectral Expansion
Can express P in terms of its eigenvectors and eigenvalues:
$P = \sum_{i=1}^{s} \lambda_i v_i u_i^T$
Called a spectral expansion of P.
L.A. : Spectral Expansion
If $\lambda_i$ is an eigenvalue of P with corresponding left and right eigenvectors $u_i^T$ and $v_i$, then $\lambda_i^n$ is an eigenvalue of $P^n$ with the same left and right eigenvectors $u_i^T$ and $v_i$.
L.A. : Spectral Expansion
Implies the spectral expansion of $P^n$ can be written as:
$P^n = \sum_{i=1}^{s} \lambda_i^n v_i u_i^T$
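As a sanity check, here is a short Python sketch that rebuilds $P^n$ from its eigendecomposition, on the same illustrative matrix as above (assuming distinct eigenvalues, so the expansion applies):

```python
# Verify P^n = sum_i lambda_i^n v_i u_i^T using numpy's eigendecomposition.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

lam, V = np.linalg.eig(P)   # columns of V are right eigenvectors v_i
U = np.linalg.inv(V)        # rows of V^{-1} are left eigenvectors u_i^T,
                            # automatically normalized so u_i^T v_i = 1
n = 5
Pn = sum(lam[i] ** n * np.outer(V[:, i], U[i, :]) for i in range(len(lam)))
print(np.allclose(Pn, np.linalg.matrix_power(P, n)))  # True
```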
Outline of Proof
Going back to the proof: P is the transition matrix of a finite, aperiodic, irreducible Markov chain. Then P has exactly one eigenvalue equal to 1, and all other eigenvalues have absolute value < 1.
Outline of Proof
Choose the left and right eigenvectors of $\lambda_1 = 1$. Requirements: $u_1^T P = u_1^T$ and $P v_1 = v_1$. Taking $v_1 = \mathbf{1}$, a column of 1's, satisfies $P v_1 = v_1$, since each row of P sums to 1. The normalization $u_1^T v_1 = 1$ then says the entries of $u_1$ sum to 1, so $u_1$ is a probability vector.
Outline of Proof
$u_1^T P = u_1^T$ is the same equation satisfied by the stationary distribution: $\pi P = \pi$. Also, it can be shown that there is a unique solution of this equation that also satisfies $\sum_i (u_1)_i = 1$, so $u_1^T = \pi$.
Outline of Proof
$P^n$ gives the n-step transition probabilities. The spectral expansion of $P^n$ is:
$P^n = v_1 u_1^T + \sum_{i=2}^{s} \lambda_i^n v_i u_i^T$
Only one eigenvalue equals 1; the rest have absolute value < 1, so as n increases $P^n$ approaches $v_1 u_1^T = \mathbf{1}\pi$, the matrix each of whose rows is $\pi$.
Convergence Example
The example transition matrix has one eigenvalue equal to 1; its remaining eigenvalues are all less than 1 in absolute value. Its left and right eigenvectors satisfy $u_i^T P = \lambda_i u_i^T$ and $P v_i = \lambda_i v_i$, and the left eigenvector for the eigenvalue 1 is the stationary distribution. Substituting these into the spectral expansion of $P^n$ shows every row of $P^n$ converging to that stationary distribution.
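The slide's numeric example does not survive in this text, but the same computation can be sketched in Python on the illustrative matrix used earlier: the stationary distribution is the normalized left eigenvector for the eigenvalue 1.

```python
# Recover the stationary distribution as the left eigenvector of eigenvalue 1.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

lam, U = np.linalg.eig(P.T)            # right eigenvectors of P^T are
                                       # left eigenvectors of P
k = np.argmin(np.abs(lam - 1.0))       # locate the eigenvalue 1
pi = np.real(U[:, k] / U[:, k].sum())  # normalize to a probability vector
print(pi)                              # [0.8 0.2]
```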
Reversible Markov Chains
Reversible Markov Chains
Typically we move forward in 'time' in a Markov chain: 1, 2, 3, …, t. What about moving backward in this chain: t, t-1, t-2, …, 1?
Reversible Markov Chains
(Figure: an ancestor evolving forward in time into Species A and Species B; tracing from a present-day species toward the ancestor means moving back in time.)
Reversible Markov Chains
Have a finite, irreducible, aperiodic Markov chain with stationary distribution $\pi$. During t transitions, the chain will move through states $X_1, X_2, \dots, X_t$. Reverse chain: define $Y_i = X_{t+1-i}$. Then the reverse chain will move through states $Y_1, Y_2, \dots, Y_t$, i.e. $X_t, X_{t-1}, \dots, X_1$.
Reversible Markov Chains
Want to show the structure determining the reverse chain sequence is also a Markov chain. A typical element $p^*_{ij}$ is found from the typical elements of P using:
$p^*_{ij} = \dfrac{\pi_j p_{ji}}{\pi_i}$
Reversible Markov Chains
Shown by using Bayes' rule to invert the conditional probability. Intuitively: the future is independent of the past, given the present, and equally the past is independent of the future, given the present.
Reversible Markov Chains
The stationary distribution of the reverse chain is still $\pi$. Follows from the stationary distribution property:
$\sum_i \pi_i p^*_{ij} = \sum_i \pi_j p_{ji} = \pi_j$
Reversible Markov Chains
A Markov chain is said to be reversible if $p^*_{ij} = p_{ij}$ for all i, j. This holds only if $\pi_i p_{ij} = \pi_j p_{ji}$ for all i, j (the detailed balance condition).
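A small Python sketch, again on the illustrative 2-state matrix: build the reverse chain from the formula above and test detailed balance (any irreducible 2-state chain happens to pass this test).

```python
# Build the reverse chain p*_ij = pi_j p_ji / pi_i and test detailed balance.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])                  # stationary distribution of P

P_rev = (pi[None, :] * P.T) / pi[:, None]  # entry (i, j) = pi_j p_ji / pi_i
print(P_rev)                               # equals P here: chain is reversible

flux = pi[:, None] * P                     # entry (i, j) = pi_i p_ij
print(np.allclose(flux, flux.T))           # True: detailed balance holds
```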
Monte Carlo Methods
Markov Chain Monte Carlo
- Class of algorithms for sampling from probability distributions
- Involves constructing a Markov chain whose stationary distribution is the desired distribution
- The state of the chain after a large number of steps is used as a sample of the desired distribution
- We discuss 2 algorithms: Gibbs Sampling and Simulated Annealing
Basic Problem
Find a transition matrix P whose stationary distribution is the target distribution. We know the Markov chain will converge to its stationary distribution regardless of the initial distribution; the question is how to find such a P.
Basic Idea
Construct a transition matrix Q, the "candidate-generating matrix", then modify it to have the correct stationary distribution. The modification involves inserting acceptance factors $a_{ij}$ so that $p_{ij} = q_{ij} a_{ij}$ for $i \ne j$. There are various ways of picking the a's.
Hastings-Metropolis
Goal: construct an aperiodic, irreducible Markov chain having a prescribed stationary distribution. Produces a correlated sequence of draws from a target density that may be difficult to sample using a classical independence method.
Hastings-Metropolis
Process: choose a set of constants $a_{ij}$ such that
$a_{ij} = \min\!\left(1, \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}}\right)$
and define $p_{ij} = q_{ij} a_{ij}$ for $i \ne j$, with $p_{ii} = 1 - \sum_{j \ne i} p_{ij}$. With probability $a_{ij}$ the proposed state change is accepted; otherwise it is rejected and the chain does not change value.
Hastings-Metropolis Example
Target $\pi = (0.4 \;\; 0.6)$, candidate-generating matrix
$Q = \begin{pmatrix} 0.5 & 0.5 \\ 0.9 & 0.1 \end{pmatrix}$
Hastings-Metropolis Example
Applying the acceptance factors to Q gives
$P = \begin{pmatrix} 0.5 & 0.5 \\ 0.33 & 0.67 \end{pmatrix}$
Hastings-Metropolis Example
$P^2 = \begin{pmatrix} 0.415 & 0.585 \\ 0.386 & 0.614 \end{pmatrix}, \quad P^{50} = \begin{pmatrix} 0.398 & 0.602 \\ 0.398 & 0.602 \end{pmatrix}$
By $P^{50}$ both rows have converged to $\pi = (0.4 \;\; 0.6)$, up to the rounding of the entries of P.
Algorithmic Description
Start with state E1, then iterate:
- Propose E' from q(Et, E')
- Calculate the ratio $a = \dfrac{\pi(E')\,q(E', E_t)}{\pi(E_t)\,q(E_t, E')}$
- If a > 1, accept: E(t+1) = E'
- Else accept with probability a; if rejected, E(t+1) = Et
A runnable sketch of this loop follows.
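Here is a minimal Python sketch of the loop above, run on the two-state example ($\pi = (0.4, 0.6)$ and the Q shown earlier); the long-run state frequencies should approach $\pi$.

```python
# Hastings-Metropolis on the two-state example: target pi = (0.4, 0.6),
# candidate matrix Q = [[0.5, 0.5], [0.9, 0.1]].
import random

pi = [0.4, 0.6]
q = [[0.5, 0.5],
     [0.9, 0.1]]

def step(state):
    proposal = 0 if random.random() < q[state][0] else 1  # draw from q(state, .)
    # Acceptance ratio a = pi_j q_ji / (pi_i q_ij)
    a = (pi[proposal] * q[proposal][state]) / (pi[state] * q[state][proposal])
    if random.random() < min(1.0, a):
        return proposal   # accept the state change
    return state          # reject: the chain does not change value

state, counts = 0, [0, 0]
for _ in range(100_000):
    state = step(state)
    counts[state] += 1
print([c / 100_000 for c in counts])  # approximately [0.4, 0.6]
```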
Gibbs Sampling
Gibbs Sampling Definitions
Let $Y = (Y_1, Y_2, \dots, Y_d)$ be the random vector of interest and let $\pi$ be the distribution of Y. Assume Y takes only finitely many possible values. We define a Markov chain whose states are the possible values of Y.
Gibbs Sampling
Process: enumerate the possible vectors in some order 1, 2, …, s, and identify vector j with the jth state in the chain. Then set $p_{ij} = 0$ if vectors i and j differ in more than one component. If they differ in at most one component, say the kth, $p_{ij}$ is the conditional probability of j's kth component given the values that i and j share in all other components.
Gibbs Sampling
Assume a joint distribution p(X, Y), and that we are looking to sample k values of X:
- Begin with a value y0
- Sample xi using p(X | Y = y_{i-1})
- Once xi is found, use it to find yi from p(Y | X = xi)
- Repeat k times
A sketch of this alternation appears below.
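A hedged Python sketch of the two-variable scheme above, using an assumed example target: a standard bivariate normal with correlation rho, whose conditionals p(X|Y) and p(Y|X) are both univariate normals.

```python
# Gibbs sampling for an assumed bivariate-normal target with correlation rho:
# X | Y = y  ~  N(rho * y, 1 - rho^2), and symmetrically for Y | X = x.
import random

rho = 0.8
sd = (1 - rho ** 2) ** 0.5   # standard deviation of each conditional

def gibbs(k, y0=0.0):
    samples, y = [], y0
    for _ in range(k):
        x = random.gauss(rho * y, sd)   # x_i ~ p(X | Y = y_{i-1})
        y = random.gauss(rho * x, sd)   # y_i ~ p(Y | X = x_i)
        samples.append((x, y))
    return samples

draws = gibbs(50_000)
print(sum(x for x, _ in draws) / len(draws))  # approx 0, the marginal mean of X
```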
Visual Example
Gibbs Sampling
Allows us to deal with univariate conditional distributions instead of a complex joint distribution. The chain has the joint distribution as its stationary distribution.
Why is this Hastings-Metropolis?
If we define the candidate-generating probabilities q(i, j) to be the Gibbs conditional probabilities, we can see that for Gibbs the acceptance ratio a is always 1: every proposed move is accepted.
Simulated Annealing
Simulated Annealing
Goal: find the (approximate) minimum of some positive function f defined on an extremely large number of states s, and to find those states where this function is minimized. The value of the function for state $E_j$ is $f(E_j)$.
Simulated Annealing
Process: construct a neighborhood of each state, a set of states "close" to that state. The variable in the Markov chain can move to a neighbor in one step; moves outside the neighborhood are not allowed.
Simulated Annealing
Requirements of neighborhoods:
- If Em is in the neighborhood of Ej, then Ej is in the neighborhood of Em
- The number of states N in a neighborhood is independent of that state
- Neighborhoods are linked so that the chain can eventually make it from any Ej to any Em
- If in state Ej, then the next move must be in the neighborhood of Ej
Simulated Annealing
Uses a positive parameter T. The aim is to have the stationary distribution of each Markov chain state be
$\pi_j = K e^{-f(E_j)/T}$
where K is a constant ensuring the probabilities sum to 1. States with a low value of f() then have high stationary probability, so the chain visits them often enough for them to become recognizable.
Simulated Annealing
Large T values: all states in the current state's neighborhood are chosen with roughly equal probability, and the stationary distribution of the chain tends to be uniform. Small T values: different states in a neighborhood have very different stationary probabilities; if T is too small the chain might get stuck at a local minimum of f.
Simulated Annealing
The art is in picking the T value. Want rapid movement from one neighborhood to another (large T), yet to pick out the states in each neighborhood with large stationary probabilities (small T). A common compromise, sketched below, is to start T large and lower it gradually.
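A minimal Python sketch under assumed choices: states 0..s-1 on a ring, each neighborhood being the two adjacent states, an illustrative quadratic f, and a simple geometric cooling schedule (none of these specifics come from the slides).

```python
# Simulated annealing sketch: minimize an illustrative f over states 0..s-1,
# where each state's neighborhood is its two neighbors on a ring.
import math
import random

s = 100
def f(j):
    return (j - 37) ** 2          # assumed toy objective, minimized at 37

def neighbors(j):
    return [(j - 1) % s, (j + 1) % s]

state, T = 0, 100.0
for _ in range(20_000):
    candidate = random.choice(neighbors(state))
    # Accept with probability min(1, e^{-(f(cand) - f(state))/T}), which
    # targets the stationary distribution pi_j proportional to e^{-f(E_j)/T}.
    if random.random() < math.exp(min(0.0, -(f(candidate) - f(state)) / T)):
        state = candidate
    T = max(0.01, T * 0.999)      # geometric cooling, floored to stay positive
print(state)                      # typically 37, the minimizer
```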
SA Example
Absorbing Markov Chains
Absorbing Markov Chains
- Absorbing state: a state which is impossible to leave ($p_{ii} = 1$)
- Transient state: a non-absorbing state in an absorbing chain
Absorbing Markov Chains
Questions to answer:
- Given the chain starts at a particular state, what is the expected number of steps before being absorbed?
- Given the chain starts at a particular state, what is the probability it will be absorbed by a particular absorbing state?
General Process
Use the explanation from Introduction to Probability (Grinstead & Snell): convert the matrix into canonical form, use that form to answer the two questions, and use a simple example throughout.
Canonical Form
Rearrange the states so that the transient states come first in P:
$P = \begin{pmatrix} Q & R \\ \mathbf{0} & I \end{pmatrix}$
where Q is a t x t matrix, R is a t x r matrix, I is the r x r identity matrix, and 0 is the r x t zero matrix (t = # of transient states, r = # of absorbing states).
Drunkard’s Walk Example
A man walks home from a bar with 4 blocks to walk, giving 5 states in total (corners 0 through 4). Absorbing states: corner 4 (home) and corner 0 (the bar). At each corner he has an equal probability of going forward or backward.
Drunkard’s Walk Example
With states ordered 0, 1, 2, 3, 4:
$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$
Drunkard’s Walk : Canonical Form
Ordering the transient states 1, 2, 3 before the absorbing states 0, 4:
$Q = \begin{pmatrix} 0 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1/2 & 0 \end{pmatrix}, \quad R = \begin{pmatrix} 1/2 & 0 \\ 0 & 0 \\ 0 & 1/2 \end{pmatrix}$
Fundamental Matrix
For an absorbing Markov chain P, the fundamental matrix for P is $N = (I - Q)^{-1}$. The entry $n_{ij}$ gives the expected number of times that the process is in the transient state $s_j$ if started in transient state $s_i$ (before being absorbed).
Proof
Proof
Let $s_i$ and $s_j$ be two transient states, and let $X^{(k)}$ be a random variable that is 1 if the chain is in state $s_j$ after k steps, and 0 otherwise.
Proof
The expected number of times the chain is in state $s_j$ in the first n steps, given that it starts in $s_i$, is
$E[X^{(0)} + X^{(1)} + \cdots + X^{(n)}] = q^{(0)}_{ij} + q^{(1)}_{ij} + \cdots + q^{(n)}_{ij}$
As n goes to infinity this approaches $(I + Q + Q^2 + \cdots)_{ij} = \left((I - Q)^{-1}\right)_{ij} = n_{ij}$.
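The series identity is easy to check numerically; a quick Python sketch using the drunkard's-walk Q defined above:

```python
# Check N = I + Q + Q^2 + ... against (I - Q)^{-1} for the drunkard's-walk Q.
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

partial = sum(np.linalg.matrix_power(Q, k) for k in range(200))  # truncated series
print(np.allclose(partial, np.linalg.inv(np.eye(3) - Q)))        # True
```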
Example: Fundamental Matrix
From the drunkard's-walk canonical form:
$N = (I - Q)^{-1} = \begin{pmatrix} 3/2 & 1 & 1/2 \\ 1 & 2 & 1 \\ 1/2 & 1 & 3/2 \end{pmatrix}$
Time to Absorption
Expected number of steps before the chain is absorbed: $t_i$ is the expected number of steps before absorption, given that the chain started in $s_i$. Then $t = Nc$, where t is the vector with elements $t_i$ and c is a column vector of 1's.
Proof
The sum of the ith row of N is the expected number of times the chain is in any transient state, for the given starting state $s_i$; that is, the expected time required before absorption. This is exactly what each value of t is.
Example: Time to Absorption
$t = Nc = \begin{pmatrix} 3 \\ 4 \\ 3 \end{pmatrix}$
Starting at corners 1, 2, or 3, the expected numbers of steps before absorption are 3, 4, and 3.
Absorption Probabilities
$b_{ij}$: the probability that the chain will be absorbed in absorbing state $s_j$ if it starts in transient state $s_i$. B is the t x r matrix with entries $b_{ij}$, given by $B = NR$, where R is the other component of the canonical matrix.
Proof
Absorption into $s_j$ happens by being in some transient state $s_k$ after n steps and then moving to $s_j$:
$b_{ij} = \sum_{n} \sum_{k} q^{(n)}_{ik} r_{kj} = \sum_{k} n_{ik} r_{kj} = (NR)_{ij}$
Example: Absorption Probabilities
$B = NR = \begin{pmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}$
(columns: absorbed at corner 0, the bar, and at corner 4, home). Starting at corner 1, for instance, the man reaches home with probability 1/4.
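All three quantities for the drunkard's walk can be reproduced in a few lines of Python:

```python
# Compute N = (I - Q)^{-1}, t = N c, and B = N R for the drunkard's walk
# (transient states 1, 2, 3; absorbing states 0 and 4).
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],            # columns: absorbed at corner 0, corner 4
              [0.0, 0.0],
              [0.0, 0.5]])

N = np.linalg.inv(np.eye(3) - Q)     # fundamental matrix
t = N @ np.ones(3)                   # expected steps to absorption
B = N @ R                            # absorption probabilities

print(N)   # [[1.5 1.  0.5] [1.  2.  1. ] [0.5 1.  1.5]]
print(t)   # [3. 4. 3.]
print(B)   # [[0.75 0.25] [0.5  0.5 ] [0.25 0.75]]
```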
Absorbing Markov Chains
Answers to the two questions:
- Given the chain starts at a particular state, the expected number of steps before being absorbed is given by $t = Nc$.
- Given the chain starts at a particular state, the probability it will be absorbed by a particular absorbing state is given by $B = NR$.
Interesting Markov Chain use
Sentence Creator
Feed text into a Markov chain to create a transition matrix that holds the probability of going from word i to word j in a sentence. Then start at a particular word in the chain and follow the transition distributions to create new sentences.
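A minimal Python sketch of the idea (the example text is an arbitrary stand-in, not the slides' corpus):

```python
# Word-level Markov chain: count word-to-word transitions, then random-walk.
import random
from collections import defaultdict

def build_chain(text):
    chain = defaultdict(list)
    words = text.split()
    for i in range(len(words) - 1):
        chain[words[i]].append(words[i + 1])  # empirical transition frequencies
    return chain

def generate(chain, start, length=10):
    out, word = [start], start
    for _ in range(length - 1):
        if word not in chain:                 # dead end: no observed successor
            break
        word = random.choice(chain[word])     # sample the next word
        out.append(word)
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat ran")
print(generate(chain, "the"))
```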
Sentence Creator: Dracula + Huckleberry Finn
This afternoon I don't know of humbug talky-talk, just set in, and perpetually violent. Then I saw, and looking tired them pens was a few minutes our sight.
End