The Rate of Concentration of the stationary distribution of a Markov Chain on the Homogenous Populations. Boris Mitavskiy and Jonathan Rowe, School of Computer Science, University of Birmingham.

Notation: $\Omega$ denotes a set, usually finite, called the search space; $f : \Omega \to (0, \infty)$ is the (positive-valued) fitness function.

How does an evolutionary computation algorithm work? An initial population $P = (x_1, x_2, \ldots, x_m) \in \Omega^m$ is chosen randomly.

Selection: a new population is formed by drawing $m$ individuals from the current population, with fitter individuals more likely to be drawn (e.g. fitness-proportional selection).

Recombination is just some probabilistic rule which sends a given population $P$ to a new population $P'$.

In our framework we only assume the following "weak purity" property (in the sense of Radcliffe) about recombination: a homogeneous population $(x, x, \ldots, x)$ is sent to itself with probability 1.

Recombination: in other words, homogeneous populations stay homogeneous with probability 1.

Mutation: for every individual $x_i$ of the population, select a mutation transformation $T_i \in \mathcal{M}$, where $\mathcal{M}$ is the family of mutation transformations.

Replace every individual $x_i$ of the population with the individual $T_i(x_i)$.

This once again gives us a new population, completing one generation of the algorithm.
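As a concrete illustration, here is a minimal Python sketch of one generation of such an algorithm. The specific operator choices below (fitness-proportional selection, identity recombination, uniform resampling as the non-identity mutation move) and the toy search space are assumptions made for the example; the framework itself only requires weak purity of recombination and a small positive mutation rate.

```python
import random

def one_generation(population, fitness, search_space, mu):
    """One generation of the EA sketched above.  The operator choices here
    (fitness-proportional selection, identity recombination, uniform
    resampling as the non-identity mutation move) are illustrative
    assumptions, not the authors' exact operators."""
    m = len(population)

    # Selection: draw m individuals, each with probability proportional to its fitness.
    weights = [fitness(x) for x in population]
    selected = random.choices(population, weights=weights, k=m)

    # Recombination: any rule is allowed as long as it is "weakly pure", i.e. a
    # homogeneous population is sent to itself with probability 1.  The identity
    # map used here satisfies this trivially.
    recombined = list(selected)

    # Mutation: independently for each individual, apply a non-identity
    # transformation with probability mu (here: resample uniformly from the
    # search space Omega), otherwise apply the identity transformation.
    return [random.choice(search_space) if random.random() < mu else x
            for x in recombined]

# Hypothetical usage on a tiny search space with fitness f(x) = x + 1.
Omega = [0, 1, 2, 3]
pop = [random.choice(Omega) for _ in range(6)]
for _ in range(30):
    pop = one_generation(pop, fitness=lambda x: x + 1, search_space=Omega, mu=0.05)
print(pop)   # with small mu, the population is homogeneous most of the time
```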

Quotients of Markov chains: $X$ is the state space of our irreducible Markov chain. Partition $X$ into equivalence classes $O_1, O_2, \ldots, O_k$. How do we define the transition probabilities among the equivalence classes?

Imagine that the chain runs for a very long time. Let $\pi$ denote the stationary distribution of this Markov chain. Suppose we run the original Markov chain for an extensive period of time. We are interested in computing the probability of reaching the class $O_j$ given that we are in the class $O_i$. An element $x \in O_i$ arises with frequency $\pi(x)$. On the other hand, a given $x$ inside of $O_i$ occurs with relative frequency $\pi(x) / \sum_{y \in O_i} \pi(y)$ among the states inside of $O_i$. We therefore obtain the following transition probability formula:

$$\hat{p}(O_i \to O_j) = \frac{\sum_{x \in O_i} \pi(x)\, p(x \to O_j)}{\sum_{x \in O_i} \pi(x)}$$

where $p(x \to O_j)$ is the probability of getting somewhere inside of $O_j$ starting from $x$. Computing $p(x \to O_j)$ is also quite easy: $p(x \to O_j) = \sum_{y \in O_j} p_{xy}$, where the $p_{xy}$ are the transition probabilities of the original chain. And so we finally obtain:

$$\hat{p}(O_i \to O_j) = \frac{\sum_{x \in O_i} \pi(x) \sum_{y \in O_j} p_{xy}}{\sum_{x \in O_i} \pi(x)}$$
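A minimal numerical sketch of this construction in Python (the three-state chain, its partition, and the helper name `quotient_matrix` are invented for illustration):

```python
import numpy as np

def quotient_matrix(P, pi, classes):
    """Quotient ("lumped") transition matrix of a chain with transition matrix P,
    stationary distribution pi, and a partition of the state indices into classes.
    Implements p_hat(O_i -> O_j) = sum_{x in O_i} pi(x) * sum_{y in O_j} P[x, y]
                                   / sum_{x in O_i} pi(x)."""
    k = len(classes)
    P_hat = np.zeros((k, k))
    for i, Oi in enumerate(classes):
        weight = pi[Oi].sum()
        for j, Oj in enumerate(classes):
            P_hat[i, j] = (pi[Oi] * P[np.ix_(Oi, Oj)].sum(axis=1)).sum() / weight
    return P_hat

# A made-up 3-state chain; states 0 and 1 form one class, state 2 the other.
P = np.array([[0.70, 0.20, 0.10],
              [0.30, 0.50, 0.20],
              [0.25, 0.25, 0.50]])

# Stationary distribution = normalised left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()

classes = [[0, 1], [2]]
P_hat = quotient_matrix(P, pi, classes)
print(P_hat)                 # a 2 x 2 stochastic matrix
print(P_hat.sum(axis=1))     # rows sum to 1
```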

It is not surprising then that the quotient Markov chain is also irreducible and its stationary distribution is coherent with the original one. The irreducibility is left as an exercise. Let $\hat{\pi}$ denote the distribution obtained from $\pi$ as follows: for every equivalence class $O_i$ we let $\hat{\pi}(O_i) = \sum_{x \in O_i} \pi(x)$. It can be verified by direct computation that $\hat{\pi}$ is the stationary distribution of the quotient chain.
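Continuing the numerical sketch above, this claim can be checked directly by lumping the stationary distribution and multiplying it by the quotient matrix:

```python
# Lump the original stationary distribution over the classes and verify that it
# is stationary for the quotient chain (pi_hat @ P_hat == pi_hat).
pi_hat = np.array([pi[O].sum() for O in classes])
print(np.allclose(pi_hat @ P_hat, pi_hat))   # True
```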

Although the transition probabilities of the quotient chain are defined in terms of the stationary distribution of the original chain, they may still give us some useful information:

Consider a partition into just two classes, $O_1$ and $O_2$. The quotient chain then has the transition matrix

$$\hat{P} = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix},$$

where $a = \hat{p}(O_1 \to O_2)$ and $b = \hat{p}(O_2 \to O_1)$.

The unique stationary distribution is then given by the formulas $\hat{\pi}(O_1) = \frac{b}{a+b}$ and $\hat{\pi}(O_2) = \frac{a}{a+b}$, where $a$ and $b$ are the off-diagonal entries above. And so $\frac{\hat{\pi}(O_2)}{\hat{\pi}(O_1)} = \frac{a}{b}$: an upper bound on $a$ together with a lower bound on $b$ gives an upper bound on how much stationary mass sits on $O_2$ relative to $O_1$.
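A quick numerical check of these two-state formulas (the values of $a$ and $b$ below are arbitrary):

```python
import numpy as np

# Two-class quotient chain with off-diagonal entries a and b (arbitrary values).
a, b = 0.03, 0.40
P_hat = np.array([[1 - a, a],
                  [b, 1 - b]])

pi_hat = np.array([b / (a + b), a / (a + b)])
print(np.allclose(pi_hat @ P_hat, pi_hat))   # True: pi_hat is the stationary distribution
print(pi_hat[1] / pi_hat[0], a / b)          # the mass ratio O_2 : O_1 equals a / b
```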

What does this say about Markov chains modelling EAs? Recall that $\mathcal{M}$, the family of mutation transformations, is just a family of functions on $\Omega$. In practice, the transformations in $\mathcal{M}$ are sampled independently with respect to some probability distribution, and mutation happens with a small rate $\mu$. This means that $\mu$ is very small but positive, and any individual can be turned into any other individual with positive probability. The Markov chain modelling an EA involving this type of mutation is irreducible and, hence, has a unique stationary distribution.

What happens to the stationary distribution of a Markov chain modelling such an EA as the "mutation rate" $\mu \to 0$? Let $H$ denote the set of homogeneous populations (i.e. populations consisting of repeated copies of the same individual). The state space of the Markov chain modelling the EA is the set of all populations; here we consider ordered populations, so that the state space is $\Omega^m$. Let's apply our previous general result to this Markov chain with $O_1 = H$ and $O_2 = \Omega^m \setminus H$:

Destroying a given homogeneous population requires applying a non-identity map to at least one of the individuals in the population. There are $m$ individuals in the population, and so it follows that $a \leq 1 - (1-\mu)^m \leq m\mu$. The probability of passing from a non-homogeneous population to a homogeneous one is at least as large as the probability of $m$ consecutive drawings of the fittest individual in the population. The probability of drawing the best individual is bounded below by a positive constant $c$ that does not depend on $\mu$ (for fitness-proportional selection one may take $c = 1/m$, since the fittest individual's fitness is at least the population average). Doing so independently $m$ times gives us the lower bound $c^m$ on the transition probability to a homogeneous population upon the completion of selection. Afterwards, with probability $(1-\mu)^m$ everyone stays the same, which means that the homogeneous population is preserved. Thus $b \geq c^m (1-\mu)^m$.

So, $a \leq m\mu$ and $b \geq c^m (1-\mu)^m$. It follows then that

$$\frac{\hat{\pi}(\Omega^m \setminus H)}{\hat{\pi}(H)} = \frac{a}{b} \leq \frac{m\mu}{c^m (1-\mu)^m} \longrightarrow 0 \quad \text{as } \mu \to 0,$$

so as the mutation rate shrinks, the stationary distribution concentrates on the homogeneous populations. How can we improve the bound?
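A small numerical illustration of how this bound behaves as $\mu$ decreases (the population size $m$ and the selection constant $c$ below are made-up values; only the trend matters):

```python
# m is the population size and c a mu-independent lower bound on the probability
# of drawing the fittest individual during selection; both values are made up.
m, c = 4, 0.5
for mu in [1e-1, 1e-2, 1e-3, 1e-4]:
    a_bound = m * mu                     # a <= 1 - (1 - mu)**m <= m * mu
    b_bound = c**m * (1 - mu)**m         # b >= c**m * (1 - mu)**m
    print(f"mu = {mu:g}: ratio bound = {a_bound / b_bound:.4f}")
```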

Well, a power of a Markov transition matrix has the same stationary distribution as the original matrix. We can therefore apply the general "quotient" inequality to a power of the Markov transition matrix instead of the original one. Combining this with the Markov inequality finally gives us the following: