Download presentation
Presentation is loading. Please wait.
Published byDella Gordon Modified over 6 years ago
1
ECE-517: Reinforcement Learning in Artificial Intelligence Lecture 4: Discrete-Time Markov Chains
September 1, 2010 Dr. Itamar Arel College of Engineering Department of Electrical Engineering & Computer Science The University of Tennessee Fall 2010
2
“States” can be labeled (0,)1,2,3,…
Simple DTMCs e 1 p q 1-q 1-p 1 a c b f d 2 “States” can be labeled (0,)1,2,3,… At every time slot a “jump” decision is made randomly based on the current state (Sometimes the arrow pointing back to the same state is omitted)
3
The walker flips a coin every time slot to decide which way to go
1-D Random Walk 1-p p X(t) Time is slotted The walker flips a coin every time slot to decide which way to go
4
Consider a queue at a supermarket In every time slot:
Single Server Queue Geom(q) Bernoulli(p) Consider a queue at a supermarket In every time slot: A customer arrives with probability p The HoL customer leaves with probability q We’d like to learn about the behavior of such a system
5
Birth-Death Chain 1 2 3 Our queue can be modeled by a Birth-Death Chain (a.k.a. Geom/Geom/1 queue) Want to know: Queue size distribution Average waiting time, etc. Markov Property The “Future” is independent of the “Past” given the “Present” In other words: Memoryless We’ve mentioned memoryless distributions: Exponential and Geometric Useful for modeling and analyzing real systems
6
Discrete Time Random Process (DTRP)
Random Process: An indexed family of random variables Let Xn be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common cdf FX(x). This sequence is called the i.i.d. random process. Example: Sequence of Bernoulli trials (flip of coin) In networking: traffic may obey a Bernoulli i.i.d. arrival Pattern In reality, some degree of dependency/correlation exists between consecutive elements in a DTRP Example: Correlated packet arrivals (video/audio stream)
7
Discrete Time Markov Chains
A sequence of random variables {Xn} is called a Markov Chain if it has the Markov property : States are usually labeled {(0,)1,2,…} State space can be finite or infinite Transition Probability Matrix Probability of transitioning from state i to state j We will assume the MC is homogeneous/stationary: independent of time Transition probability matrix: P = {pij} Two state MC:
8
Stationary Distribution
Define then pk+1 = pk P (p is a row vector) Stationary distribution: if the limit exists If p exists, we can solve it by These are called balance equations Transitions in and out of state i are balanced
9
General Comment & Conditions for p to Exist (I)
If we partition all the states into two sets, then transitions between the two sets must be “balanced”. This can be easily derived from the balance equations Definitions: State j is reachable by state i if State i and j commute if they are reachable by each other The Markov chain is irreducible if all states commute
10
Conditions for p to Exist (I) (cont’d)
Condition: The Markov chain is irreducible Counter-examples: Aperiodic Markov chain … Counter-example: p=1 1 2 3 4 1 2 3 1 1 1 1 2
11
Conditions for p to Exist (II)
For the Markov chain to be recurrent… All states i must be recurrent, i.e. Otherwise transient With regards to a recurrent MC … State i is recurrent if E(Ti)<1, where Ti is time between visits to state i Otherwise the state is considered null-recurrent
12
Solving for p: Example for two-state Markov Chain
1 1-q q p
13
Arrival w.p. p ; departure w.p. q Let u = p(1-q), d = q(1-p), r = u/d
Birth-Death Chain 1-u-d 1-u-d 1-u-d u 1-u u u u 1 2 3 d d d d Arrival w.p. p ; departure w.p. q Let u = p(1-q), d = q(1-p), r = u/d Balance equations:
14
Birth-Death Chain (cont’d)
Continuing this pattern, we observe that: p(i-1)u = p(i)d Equivalently, we can draw a bi-section between state i and state i-1 Therefore, we have where r = u/d. What we are interested in is the stationary distribution of the states, so …
15
Birth-Death Chain (cont’d)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.