Presentation on theme: "Haim Kaplan and Uri Zwick"— Presentation transcript:

1 Haim Kaplan and Uri Zwick
Introduction to Markov chains (part 1) Haim Kaplan and Uri Zwick Algorithms in Action Tel Aviv University Last updated: April

2 (Finite, Discrete time) Markov chain
A sequence X_0, X_1, X_2, … of random variables. Each X_j takes a value from some finite set S. The sequence satisfies the Markov property:
P(X_j = s_j | X_{j-1} = s_{j-1}, X_{j-2} = s_{j-2}, …, X_0 = s_0) = P(X_j = s_j | X_{j-1} = s_{j-1})

3 Homogeneous Markov chain
P(X_n = s_j | X_{n-1} = s_i) = P_ij(n). The chain is homogeneous if this probability does not depend on n:
P(X_n = s_j | X_{n-1} = s_i) = P_ij
Transition matrix:
P = [ P_11 P_12 P_13 P_14
      P_21 P_22 P_23 P_24
      P_31 P_32 P_33 P_34
      P_41 P_42 P_43 P_44 ]
We will also use P as the name of the Markov chain.
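Because the transition probabilities of a homogeneous chain do not depend on the step number, the whole chain is specified by one matrix, and simulating it takes a few lines. A minimal Python sketch, using the two-state rain/sun example that appears on a later slide (the probabilities below are read off that slide's edge labels and should be treated as illustrative):

```python
import random

STATES = ["rain", "sun"]
P = [[0.5, 0.5],   # from rain: stay with prob 1/2, switch with prob 1/2
     [0.1, 0.9]]   # from sun: switch with prob 1/10, stay with prob 9/10

def step(i, rng=random):
    """Sample the next state index given the current state index i."""
    return rng.choices(range(len(P[i])), weights=P[i])[0]

def walk(i, n, rng=random):
    """Simulate n steps of the chain starting from state index i."""
    path = [i]
    for _ in range(n):
        i = step(i, rng)
        path.append(i)
    return path
```

Note that the same `P` is used at every step; that is exactly what homogeneity buys us.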

4 Properties of the transition matrix
Stochastic: the sum of each row is 1, i.e. P · (1, 1, …, 1)^T = (1, 1, …, 1)^T.

5 Properties of the transition matrix
So 1 is an eigenvalue of P, with (1, 1, …, 1)^T as a (right) eigenvector. What about the other eigenvalues? Any eigenvalue λ must satisfy |λ| ≤ 1. This can be proved using the L_∞ norm: each entry of Pv is a convex combination of the entries of v, so ‖Pv‖_∞ ≤ ‖v‖_∞, and an eigenvalue with |λ| > 1 would violate this.

6 Random walk on a graph
We can think of a homogeneous Markov chain as a random walk on a graph, where each node corresponds to a state and there is an edge from state i to state j if P_ij > 0. (Figure: a 4-state example with its transition matrix; the edge probabilities shown are 1/2.)

7 More examples
A two-state rain/sun chain: from rain we stay at rain with probability 1/2 and move to sun with probability 1/2; from sun we stay at sun with probability 9/10 and move to rain with probability 1/10. Note: self-loops are allowed.

8 Random walk on a graph
Pick an outgoing edge uniformly at random (in the figure, each of the 4 outgoing edges has probability 1/4). The Web graph: nodes are pages; from page p we go to one of the pages that p links to, with equal probability. Let T denote this Markov chain.

9 The Web (with restarts)
Modify this Markov chain as follows: P = αT + (1−α)(1/n)E, where E is the all-1's matrix and 0 < α < 1. This is the chain Google uses for ranking pages.

10 Card shuffling (motivation)
We want to start the game with a uniformly random permutation of the deck. Each permutation should appear with probability 1/52! ≈ 1/2^225.6 ≈ 1/10^67.9. How do we draw such a permutation?

11 Card shuffling (motivation)
- Random Transpositions: pick two cards i and j uniformly at random, with replacement, and switch cards i and j; repeat.
- Top-in-at-Random: take the top card and insert it at one of the n positions in the deck, chosen uniformly at random; repeat.
- Riffle Shuffle: (defined on a later slide)
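The first two shuffles are easy to state as one-step transition functions on a deck. A minimal Python sketch (the function names and the list representation of the deck are mine, not the slides'):

```python
import random

def random_transposition(deck, rng=random):
    """Pick two positions uniformly at random, with replacement, and swap them."""
    d = list(deck)
    i = rng.randrange(len(d))
    j = rng.randrange(len(d))
    d[i], d[j] = d[j], d[i]      # if i == j the deck is unchanged
    return d

def top_in_at_random(deck, rng=random):
    """Remove the top card and reinsert it at one of the n possible
    positions, chosen uniformly at random."""
    d = list(deck)
    top = d.pop(0)
    d.insert(rng.randrange(len(d) + 1), top)
    return d
```

Repeating either function many times is exactly running the corresponding Markov chain on permutations.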

12 Card shuffling – Markov chain model
Each state is a permutation of a deck of cards We have a transition from a state 𝑠 1 to a state 𝑠 2 if we can obtain 𝑠 2 from 𝑠 1 by a step of the shuffle The probability of the transition can be derived from the definition of the shuffle

13 Card shuffling – Markov chain model
Top-in-at-random, 3-card example. (Figure: the six orderings of cards C1, C2, C3; each state has three outgoing transitions, each with probability 1/3.)

14 Card Shuffling - Riffle Shuffle
a. Split the deck into two parts according to the binomial distribution Bin(n, 1/2).
b. Drop the cards in a sequence where the next card comes from the left-hand stack L (resp. right-hand stack R) with probability |L| / (|L| + |R|) (resp. |R| / (|L| + |R|)); here |L| and |R| are the running sizes of the left and right stacks during the riffle.
Step b of the riffle shuffle is equivalent to taking a uniformly random interleaving of the two parts (exercise).
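The two steps above (binomial cut, then size-proportional drops) can be sketched directly as a plain Python function; this is one standard reading of the model, with names of my choosing:

```python
import random

def riffle(deck, rng=random):
    """One riffle shuffle: cut at a Bin(n, 1/2) position, then repeatedly
    drop the next card from the left or right stack with probability
    proportional to the current stack sizes."""
    n = len(deck)
    cut = sum(rng.random() < 0.5 for _ in range(n))   # Bin(n, 1/2)
    left, right = list(deck[:cut]), list(deck[cut:])
    out = []
    while left or right:
        # next card from left with probability |L| / (|L| + |R|)
        if rng.randrange(len(left) + len(right)) < len(left):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out
```

Cards that start in the same half keep their relative order, which is why step b amounts to a uniform interleaving of the two parts.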

15 Best shuffle?
Number of repetitions to sample an almost uniform permutation (say, within 20% of uniform)? 20% is just an arbitrary constant; the precise number does not really matter (wait a few slides).
- Random Transpositions: < 411 repetitions
- Top-in-at-Random: < 278 repetitions
- Riffle Shuffle: 8 repetitions

16 Independent sets
We have a fixed graph G = (V, E). Each state corresponds to an independent set of G.

17 Independent sets
Transitions: pick a vertex v uniformly at random and flip a fair coin. Heads: switch to I ∪ {v} if I ∪ {v} is an independent set. Tails: switch to I ∖ {v}. (Each such transition has probability 1/(2n), where n = |V|.)
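One transition of this chain takes a few lines of code. A sketch assuming the graph is given as an adjacency dict mapping each vertex to its set of neighbours (the representation and names are mine):

```python
import random

def independent_set_step(I, graph, rng=random):
    """One transition of the independent-set chain: pick a vertex v
    uniformly at random and flip a fair coin."""
    I = set(I)
    v = rng.choice(sorted(graph))
    if rng.random() < 0.5:                       # heads: try to add v
        if all(u not in I for u in graph[v]):    # only if I + v stays independent
            I.add(v)
    else:                                        # tails: remove v if present
        I.discard(v)
    return I
```

The probability of moving to any specific neighbouring set is (1/n) · (1/2) = 1/(2n), matching the label on the slide.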

18 q-colorings
We have a fixed graph G = (V, E). Each state corresponds to a valid q-coloring of G (a coloring by {1, …, q} in which the endpoints of each edge are colored differently).
Transitions: pick a vertex v uniformly at random, then pick a (new) color for v uniformly at random from the set of colors not used by a neighbor of v.

19 q-colorings
Transitions: pick a vertex v uniformly at random, then pick a (new) color for v uniformly at random from the set of colors not used by a neighbor of v. (Figure: an example with q = 5; the transition drawn has probability 1/(4n), the chosen vertex having four available colors.)
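One transition of the q-coloring chain can be sketched the same way, again assuming an adjacency dict and colors 1..q (the representation is mine, not the slides'):

```python
import random

def coloring_step(col, graph, q, rng=random):
    """One transition of the q-coloring chain: recolour a random vertex
    with a colour not used by any of its neighbours."""
    col = dict(col)
    v = rng.choice(sorted(graph))
    free = [c for c in range(1, q + 1)
            if all(col[u] != c for u in graph[v])]
    col[v] = rng.choice(free)   # free is nonempty whenever q > max degree
    return col
```

Starting from a valid coloring, every state visited is again a valid coloring, since the new colour is chosen among those not used by any neighbour.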

20 Walking on a Markov chain
What is the probability of going from state 4 to state 3 in exactly three steps? (Figure: a 4-state chain P with edge probabilities such as 1/4, 1/3, 3/4, 1/2, and 1/6; the full matrix appears only in the figure.)

21 Walking on a Markov chain
What is the probability of going from state 4 to state 3 in exactly three steps? (Figure: the first step of each path out of state 4 is drawn.)

22 Walking on a Markov chain
What is the probability of going from state 4 to state 3 in exactly three steps? (Figure: the tree of paths out of state 4 is extended to two steps.)

23 Walking on a Markov chain
What is the probability of going from state 4 to state 3 in exactly three steps? (Figure: the tree of paths is extended to three steps.)

24 Walking on a Markov chain
What is the probability of going from state 4 to state 3 in exactly three steps? (Figure: the three-step paths that end at state 3 are highlighted.)

25 Walking on a Markov chain
What is the probability of going from state 4 to state 3 in exactly three steps? (Figure: the probabilities along each such path are multiplied, and the products are summed.)

26 Multiplying the transition matrix
Let x^(0) = x = (x_1, x_2, x_3, x_4) be the distribution of the starting position. Then
(x_1, x_2, x_3, x_4) P = (x_1^(1), x_2^(1), x_3^(1), x_4^(1)).
For example, (0, 0, 0, 1) P = (1/4, 0, 3/4, 0).

27 Multiplying the transition matrix
(x_1^(1), x_2^(1), x_3^(1), x_4^(1)) P = (x_1^(2), x_2^(2), x_3^(2), x_4^(2)).
Continuing the example: (1/4, 0, 3/4, 0) P = (3/8, …, …, 1/8).

28 Multiplying the transition matrix
The general rule: x P^k = x^(k), where x^(k) is our distribution after k steps if the initial distribution is x. Entry ij of P^k gives the probability of moving from state i to state j in k steps.
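The rule x P^k = x^(k) is just repeated vector-matrix multiplication. A sketch, using the two-state rain/sun chain from an earlier slide as the example matrix (the 4-state matrix of the running example lives only in the figure):

```python
# rain/sun transition matrix from an earlier slide
P = [[0.5, 0.5],
     [0.1, 0.9]]

def step_distribution(x, P):
    """One step of the distribution: x' = x P (row vector times matrix)."""
    n = len(P)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

def after_k_steps(x, P, k):
    """x P^k: the distribution after k steps from initial distribution x."""
    for _ in range(k):
        x = step_distribution(x, P)
    return x
```

Starting from the point mass on the first state, `after_k_steps([1.0, 0.0], P, 1)` returns the first row of P, exactly as the general rule predicts.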

29 Stationary distribution
A distribution π such that πP = π; that is, π is a left eigenvector of P for the eigenvalue 1. If the initial distribution is π then it remains π after any number of steps.
Desired properties:
- A stationary distribution π exists (always)
- It is unique (irreducible)
- Convergence to it (irreducible + aperiodic)

30 Distance between distributions
Total variation distance between p_1 and p_2:
‖p_1 − p_2‖_TV ≡ (1/2) Σ_s |p_1(s) − p_2(s)| = (1/2) ‖p_1 − p_2‖_1.
I will just use |x| for ‖x‖_1.
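The definition translates directly into code. A sketch for distributions given as equal-length sequences of probabilities:

```python
def tv_distance(p1, p2):
    """Total variation distance: half the L1 distance between two
    distributions on the same finite state space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))
```

The distance is 0 for identical distributions and 1 for distributions with disjoint supports.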

31 Convergence
We want that lim_{k→∞} ‖x P^k − π‖_TV = 0 for all x. We denote this also by x P^k → π as k → ∞.

32 A simple situation
A simple special case in which there is a stationary distribution with the desired properties: assume that P has n distinct eigenvalues and 1 = λ_1 > λ_2 > … > λ_n > −1. So we have n corresponding eigenvectors that span R^n (eigenvectors of different eigenvalues are linearly independent).

33 Power iteration
Write the initial distribution in the eigenvector basis: (x_1, x_2, …, x_n) = c_1 v_1 + c_2 v_2 + … + c_n v_n. Then
(x_1, x_2, …, x_n) P^k = (c_1 v_1 + c_2 v_2 + … + c_n v_n) P^k
= c_1 λ_1^k v_1 + c_2 λ_2^k v_2 + … + c_n λ_n^k v_n
= λ_1^k ( c_1 v_1 + c_2 (λ_2/λ_1)^k v_2 + … + c_n (λ_n/λ_1)^k v_n ).
Since λ_1 = 1, this equals
c_1 v_1 + c_2 λ_2^k v_2 + … + c_n λ_n^k v_n.
Rearrange, take absolute value:

34 Power iteration
|(x_1, x_2, …, x_n) P^k − c_1 v_1| = |c_2 λ_2^k v_2 + … + c_n λ_n^k v_n| ≤ |c_2| |λ_2|^k |v_2| + … + |c_n| |λ_n|^k |v_n| → 0,
so lim_{k→∞} (x_1, x_2, …, x_n) P^k = c_1 v_1.
Since (x_1, x_2, …, x_n) P^k is a distribution for every k, so is c_1 v_1 = π. The limit is unique.
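This argument is exactly why power iteration works as an algorithm: multiply any starting distribution by P until it stops moving. A sketch (the tolerance and iteration cap are arbitrary choices of mine):

```python
def stationary(P, tol=1e-12, max_iter=100000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    x = [1.0 / n] * n                     # any starting distribution works
    for _ in range(max_iter):
        y = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# rain/sun chain from an earlier slide; solving pi P = pi by hand
# gives pi = (1/6, 5/6)
P = [[0.5, 0.5],
     [0.1, 0.9]]
pi = stationary(P)
```

The error shrinks by a factor |λ_2| per step, so convergence is geometric whenever |λ_2| < 1.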

35 Conditions for convergence to a stationary distribution

36 Irreducible Markov chain
A Markov chain is irreducible if for every pair (i, j) of states there exists an n such that P_ij^(n) > 0 ⇔ the transition graph representing the chain is strongly connected. This already guarantees a unique stationary distribution.

37 Aperiodic Markov chain
The period of a state s_i is d(s_i) = gcd{ n ≥ 1 : P^n_ii > 0 }, the gcd of the lengths of all closed walks from s_i to s_i.

38 Examples
(Figure: a 4-state chain.) d(s_4) = gcd{2, 3, …} = 1; d(s_2) = gcd{1, 2, …} = 1.

39 Examples
(Figure: a 4-cycle with transition probabilities 1/2.) d(s_4) = gcd{2, 4, …} = 2.

40 Aperiodic Markov chain
A state s_i is aperiodic if d(s_i) = 1. The Markov chain is aperiodic if d(s_i) = 1 for all i.
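The period can be computed directly from the definition by scanning boolean powers of the chain's adjacency structure. A sketch; the scan bound `max_n` is a pragmatic choice of mine that is ample for the small examples here:

```python
from math import gcd

def bool_mul(A, B):
    """Boolean matrix product: (AB)[i][j] is True iff a suitable walk exists."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(P, i, max_n=30):
    """gcd of the lengths n <= max_n of closed walks from state i to itself;
    max_n just needs to be large enough to expose the chain's cycles."""
    A = [[p > 0 for p in row] for row in P]
    power = A
    d = 0
    for length in range(1, max_n + 1):
        if power[i][i]:
            d = gcd(d, length)
        power = bool_mul(power, A)
    return d
```

On the two-state chain that deterministically alternates between its states, every return takes an even number of steps, so the period is 2; any state with a self-loop has period 1.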

41 The fundamental theorem
If P is irreducible and aperiodic then it has a unique stationary distribution π, and x P^k → π as k → ∞, for every initial distribution x.

42 The Web (with restarts)
P = αT + (1−α)(1/n)E, where E is the all-1's matrix and T is defined by the graph induced by the Web. This chain is irreducible and aperiodic (why?), so there is a unique stationary distribution π. The distribution π is used by Google for ranking pages: a page with higher probability under π is more important.
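Putting the pieces together, the ranking chain can be built and solved by power iteration. A sketch on a hypothetical three-page link graph (the page names, the value α = 0.85, and the assumption that every page has at least one outgoing link are all mine, not the slides'):

```python
def pagerank(links, alpha=0.85, iters=200):
    """Build P = alpha*T + (1 - alpha)*(1/n)*E and return the stationary
    distribution found by power iteration. links[p] lists the pages that
    page p links to (assumed nonempty for every p)."""
    pages = sorted(links)
    n = len(pages)
    idx = {p: k for k, p in enumerate(pages)}
    P = [[(1 - alpha) / n] * n for _ in range(n)]   # the (1-alpha)/n * E part
    for p, outs in links.items():
        for q in outs:                              # T: uniform over p's out-links
            P[idx[p]][idx[q]] += alpha / len(outs)
    x = [1.0 / n] * n
    for _ in range(iters):                          # power iteration
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(pages, x))

ranks = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```

In this toy graph, page "c" receives the least link mass and ends up with the lowest rank.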

43 Google’s page ranking
Crawl and collect all Web pages. Compute a reverse index: for each word, prepare a list of all pages containing it. Compute the stationary distribution π associated with the Web Markov chain. Query: use the reverse index to find all the answers, sort them by π, and present the results.

44 Computing a stationary distribution
πP = π
1) We can solve a system of linear equations.
2) We can use power iteration:
|(x_1, x_2, …, x_n) P^k − π| ≤ |c_2| |λ_2|^k |v_2| + … + |c_n| |λ_n|^k |v_n| → 0

45 The Web (with restarts)
P = αT + (1−α)(1/n)E. So what is the second largest eigenvalue of P? Call this eigenvalue λ (< 1). An easy calculation shows that |λ| ≤ α. We may expect faster convergence as α decreases, but we do not want to decrease α too much, as we want our chain to resemble the Web graph.

46 |λ| ≤ α
Observation: let v be a (left) eigenvector for an eigenvalue λ < 1 (vP = λv). Then v (1, 1, …, 1)^T = 0. Why?

47 |λ| ≤ α
Proof: v (1, 1, …, 1)^T = v P (1, 1, …, 1)^T = λ v (1, 1, …, 1)^T, using P (1, 1, …, 1)^T = (1, 1, …, 1)^T. Since λ < 1, this forces v (1, 1, …, 1)^T = 0.

48 |λ| ≤ α
P = αT + (1−α)(1/n)E. By the observation, vE = 0 (every row of E is (1, 1, …, 1), so vE is determined by v (1, 1, …, 1)^T = 0), hence vP = v(αT + (1−α)(1/n)E) = αvT.
Since v is an eigenvector of P with eigenvalue λ: λv = αvT, so (λ/α)v = vT ⇒ λ/α is an eigenvalue of T. But T is stochastic, so |λ/α| ≤ 1 ⇒ |λ| ≤ α.

49 Mixing time
d(t) = max_x ‖x P^t − π‖_TV. We can prove that d(t) is monotonically decreasing in t.
t_mix(ε) = min{ t : d(t) ≤ ε }, and t_mix ≡ t_mix(1/4) = min{ t : d(t) ≤ 1/4 }.
We can prove that t_mix(ε) ≤ ⌈log_2(1/ε)⌉ · t_mix.
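For a small chain, d(t) and t_mix can be computed exactly: the maximum over starting distributions x is attained at a point mass, so it suffices to look at the rows of P^t. A sketch, reusing the rain/sun matrix as a running example:

```python
def tv(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def t_mix(P, pi, eps=0.25, max_t=10000):
    """Smallest t with d(t) = max_i ||P^t[i] - pi||_TV <= eps."""
    Pt = [row[:] for row in P]
    for t in range(1, max_t + 1):
        if max(tv(row, pi) for row in Pt) <= eps:
            return t
        Pt = mat_mul(Pt, P)
    return None
```

On the rain/sun chain (stationary distribution (1/6, 5/6)) the distance shrinks by a factor |λ_2| = 0.4 each step, so t_mix(1/4) = 2, and tightening ε to 0.01 only costs a few more steps, in line with the logarithmic relation on the slide.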

50 Back to shuffling (n cards)
- Random Transpositions: t_mix ≤ 2n ln n
- Top-in-at-Random: t_mix ≤ n ln n + ln(4) n
- Riffle Shuffle: t_mix ≤ (3/2) log_2 n

