Tutorial 9 EM and Beta distribution

1 Tutorial 9 EM and Beta distribution
What is the expectation maximization algorithm? – Chuong B. Do & Serafim Batzoglou. Computational Biology course – lecture 8. Tal Shor

2 Basic EM example
Let there be 2 coins, A and B, with probabilities θ_A and θ_B of Heads respectively. We randomly select a coin (A or B) and flip it 10 times in a row; we do so 5 times. Let x = (x_1, …, x_5), x_i ∈ {0, …, 10}, be the number of Heads observed in trial i, and z = (z_1, …, z_5), z_i ∈ {A, B}, be the coin types.

3 Maximum Likelihood
If we know which coin was flipped in each trial and the number of Heads in each, we can use ML. Recall that L(θ) = θ^#H (1−θ)^#T, which gives the estimate θ̂ = #H / (#H + #T).

Trial 1 (Coin B): H T T T H H T H T H → 5H / 5T
Trial 2 (Coin A): H H H H T H H H H H → 9H / 1T
Trial 3 (Coin A): H T H H H H H T H H → 8H / 2T
Trial 4 (Coin B): H T H T T T H H T T → 4H / 6T
Trial 5 (Coin A): T H H H T H H H T H → 7H / 3T

Totals: Coin A: 24H / 6T; Coin B: 9H / 11T
θ̂_A = 24 / (24 + 6) = 0.8; θ̂_B = 9 / (9 + 11) = 0.45
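The ML estimates above can be checked with a short script (a minimal Python sketch; the flip counts are taken from the table):

```python
# Maximum-likelihood estimation when the coin identities are known.
# Each trial is (coin, number of heads out of 10 flips).
trials = [("B", 5), ("A", 9), ("A", 8), ("B", 4), ("A", 7)]

def ml_estimate(trials, flips_per_trial=10):
    counts = {"A": [0, 0], "B": [0, 0]}  # coin -> [heads, tails]
    for coin, heads in trials:
        counts[coin][0] += heads
        counts[coin][1] += flips_per_trial - heads
    # theta_hat = #H / (#H + #T) for each coin
    return {c: h / (h + t) for c, (h, t) in counts.items()}

print(ml_estimate(trials))  # {'A': 0.8, 'B': 0.45}
```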

4 Expectation maximization (EM)
Yet having all of this information is not a common scenario; more often we observe only part of it. For example, we may not know z — in other words, which coin was flipped in each trial. We still observe the outcomes (x), but we cannot assume anything about which coin produced them. (We see the same five trials as before, but without the coin labels.) In cases such as this, we use EM.

5 First EM iteration
Initial guess: θ_A^(0) = 0.6, θ_B^(0) = 0.5.

E-step — posterior probability that each trial came from coin A / coin B:
Trial 1 (5H/5T): P(A) = 0.45, P(B) = 0.55
Trial 2 (9H/1T): P(A) = 0.80, P(B) = 0.20
Trial 3 (8H/2T): P(A) = 0.73, P(B) = 0.27
Trial 4 (4H/6T): P(A) = 0.35, P(B) = 0.65
Trial 5 (7H/3T): P(A) = 0.65, P(B) = 0.35

Expected counts:
Coin A: 2.2H/2.2T + 7.2H/0.8T + 5.9H/1.5T + 1.4H/2.1T + 4.5H/1.9T ≈ 21.3H / 8.6T
Coin B: 2.8H/2.8T + 1.8H/0.2T + 2.1H/0.5T + 2.6H/3.9T + 2.5H/1.1T ≈ 11.7H / 8.4T

M-step:
θ_A^(1) = 21.3 / (21.3 + 8.6) ≈ 0.71
θ_B^(1) = 11.7 / (11.7 + 8.4) ≈ 0.58
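One E-step/M-step pass can be written out directly. This is a Python sketch using the binomial likelihoods implied by the model (the binomial coefficients cancel in the posterior):

```python
heads = [5, 9, 8, 4, 7]  # observed heads per 10-flip trial

def em_step(theta_a, theta_b, heads, n=10):
    """One EM iteration for the two-coin mixture."""
    hA = tA = hB = tB = 0.0
    for h in heads:
        # E-step: posterior probability that this trial used coin A
        la = theta_a**h * (1 - theta_a)**(n - h)
        lb = theta_b**h * (1 - theta_b)**(n - h)
        pa = la / (la + lb)
        # Accumulate expected head/tail counts for each coin
        hA += pa * h
        tA += pa * (n - h)
        hB += (1 - pa) * h
        tB += (1 - pa) * (n - h)
    # M-step: re-estimate each bias from its expected counts
    return hA / (hA + tA), hB / (hB + tB)

print(em_step(0.6, 0.5, heads))  # ≈ (0.713, 0.581)
```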

6 Second EM iteration
E-step with θ_A^(1) = 0.71, θ_B^(1) = 0.58:
Trial 1 (5H/5T): P(A) = 0.30, P(B) = 0.70
Trial 2 (9H/1T): P(A) = 0.81, P(B) = 0.19
Trial 3 (8H/2T): P(A) = 0.70, P(B) = 0.30
Trial 4 (4H/6T): P(A) = 0.20, P(B) = 0.80
Trial 5 (7H/3T): P(A) = 0.58, P(B) = 0.42

Expected counts:
Coin A: 1.5H/1.5T + 7.3H/0.8T + 5.6H/1.4T + 0.8H/1.2T + 4.1H/1.7T ≈ 19.3H / 6.6T
Coin B: 3.5H/3.5T + 1.7H/0.2T + 2.4H/0.6T + 3.2H/4.8T + 2.9H/1.3T ≈ 13.7H / 10.4T

M-step:
θ_A^(2) = 19.3 / (19.3 + 6.6) ≈ 0.75
θ_B^(2) = 13.7 / (13.7 + 10.4) ≈ 0.57

7 Results
This sequence converges after about 10 iterations at θ_A^(10) ≈ 0.80, θ_B^(10) ≈ 0.52, which is close to our ML results of θ_A = 0.80, θ_B = 0.45, even though we had much less information.

8 EM example – blood types
As you may recall from the 2nd tutorial, there are 4 blood types (phenotypes) – {O, A, B, AB} – and 6 blood-type genotypes – {o/o, o/a, a/a, o/b, b/b, a/b}. While the phenotype is a deterministic function of the genotype, the genotype cannot be determined from the phenotype alone.

9 Blood type model
Assume that the probabilities of a random individual carrying the a, b, or o allele are θ_a, θ_b, θ_o respectively. The genotype probabilities are then
θ_ab = 2θ_aθ_b;  θ_aa = θ_a²;  θ_ao = 2θ_aθ_o
θ_bb = θ_b²;  θ_bo = 2θ_bθ_o;  θ_oo = θ_o²
This gives the conditional probabilities
Pr(phe = A | Θ) = θ_aa + θ_ao = θ_a² + 2θ_oθ_a
Pr(phe = B | Θ) = θ_bb + θ_bo = θ_b² + 2θ_oθ_b
Pr(phe = AB | Θ) = θ_ab = 2θ_aθ_b
Pr(phe = O | Θ) = θ_oo = θ_o²
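These phenotype probabilities should sum to 1 whenever θ_a + θ_b + θ_o = 1, which is easy to check (a Python sketch; the allele frequencies below are made-up illustration values):

```python
def phenotype_probs(ta, tb, to):
    """Phenotype distribution implied by allele frequencies."""
    return {
        "A":  ta**2 + 2 * to * ta,
        "B":  tb**2 + 2 * to * tb,
        "AB": 2 * ta * tb,
        "O":  to**2,
    }

probs = phenotype_probs(0.3, 0.1, 0.6)  # hypothetical allele frequencies
print(probs, sum(probs.values()))       # the four probabilities sum to 1.0
```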

10 Expectation for an individual
Phenotype A comes from genotype a/o (prob 2θ_aθ_o) or a/a (prob θ_a²), so
  P(a/o | A) = 2θ_o / (2θ_o + θ_a),  P(a/a | A) = θ_a / (2θ_o + θ_a)
  Expected a count: 1·2θ_o/(2θ_o + θ_a) + 2·θ_a/(2θ_o + θ_a);  expected o count: 1·2θ_o/(2θ_o + θ_a)
Phenotype B comes from genotype b/o (prob 2θ_bθ_o) or b/b (prob θ_b²), so
  Expected b count: 1·2θ_o/(2θ_o + θ_b) + 2·θ_b/(2θ_o + θ_b);  expected o count: 1·2θ_o/(2θ_o + θ_b)
Phenotype AB comes from genotype a/b (prob 2θ_aθ_b): a count 1, b count 1.
Phenotype O comes from genotype o/o (prob θ_o²): o count 2.

11 Expectation formulas
Let n_A, n_B, n_O, n_AB be the numbers of individuals with the corresponding phenotypes.
E[#a] = n_A · (2θ_o + 2θ_a)/(2θ_o + θ_a) + n_AB · 1
E[#b] = n_B · (2θ_o + 2θ_b)/(2θ_o + θ_b) + n_AB · 1
E[#o] = n_A · 2θ_o/(2θ_o + θ_a) + n_B · 2θ_o/(2θ_o + θ_b) + n_O · 2

12 Maximization Formulas
πœƒ π‘Ž = 𝐸 #π‘Ž 2𝑛 ; πœƒ 𝑏 = 𝐸 #𝑏 2𝑛 ; πœƒ π‘œ = 𝐸 #π‘œ 2𝑛 ; Combining the E and M formulas, gives us the iterative update formula πœƒ π‘Ž 𝑖+1 = 𝑛 𝐴 Γ—2 πœƒ π‘œ 𝑖 + πœƒ π‘Ž 𝑖 2 πœƒ π‘œ 𝑖 + πœƒ π‘Ž 𝑖 + 𝑛 𝐴𝐡 Γ—1 2𝑛 πœƒ 𝑏 𝑖+1 = 𝑛 𝐡 Γ—2 πœƒ π‘œ 𝑖 + πœƒ 𝑏 𝑖 2 πœƒ π‘œ 𝑖 + πœƒ 𝑏 𝑖 + 𝑛 𝐴𝐡 Γ—1 2𝑛 πœƒ π‘œ 𝑖+1 =1βˆ’ πœƒ π‘Ž 𝑖+1 βˆ’ πœƒ 𝑏 𝑖+1


14 Beta Distribution example
In baseball there is a term called the "batting average" – the fraction of at-bats in which a player gets a hit. A .266 batting average is considered average, and .300 is considered excellent. We need a prior distribution that can reasonably range from .21 to .35 while maintaining a mean of about .27. Note that a season's worth of games is around 300 at-bats.

15 Prior distribution
Under those assumptions (our professional opinion), we can pick α = 81, β = 219, i.e. a Beta(81, 219) prior with mean 81/(81+219) = 0.27.
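We can sanity-check that choice numerically (a Python sketch, assuming SciPy is available; the parameters 81 and 219 are the slide's):

```python
from scipy.stats import beta

a, b = 81, 219
prior = beta(a, b)
print(prior.mean())                       # mean = 81/300 = 0.27
print(prior.cdf(0.35) - prior.cdf(0.21))  # most of the mass lies in [.21, .35]
```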

16 With evidence
We've followed Joe DiMaggio's performance for a season: he hit 100 times out of his 300 at-bats. As we've seen in class, his new batting-average distribution is Beta(81+100, 219+200) = Beta(181, 419). Notice the curve is now both narrower and shifted to the right (higher batting average) than before: we have a better sense of what the player's batting average is.
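The conjugate update is just adding hits and misses to α and β (a Python sketch, assuming SciPy):

```python
from scipy.stats import beta

prior = beta(81, 219)
post = beta(81 + 100, 219 + 200)  # Beta(181, 419) after 100 hits in 300 at-bats
print(post.mean())                # 181/600, pulled up from the 0.27 prior mean
print(post.std() < prior.std())   # the posterior is narrower than the prior
```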

17 Beta distribution example 2
Let there be 2 mints. The first, the government mint, creates 90% of the coins, and its coins are fair. The second, a pirate mint, creates the rest, with coins that land Heads 55% of the time. For a government (fair) coin we are quite confident, so our prior is Beta(450, 450). For a pirated coin our confidence is lower: Beta(55, 45).

18 Prior distribution Let θ be the probability of drawing Heads.
P(θ) = 0.9·Beta(450, 450) + 0.1·Beta(55, 45)
Say we check whether the coin is pirated by computing P(θ > 0.525). In this case we get P(θ < 0.525) = 0.903, meaning the coin is most likely from a government mint (close to the known 90%).

19 Posterior distribution We've flipped the coin 28 times and got 18 Heads and 10 Tails. Our updated distribution is
P(θ) = 0.9·Beta(450+18, 450+10) + 0.1·Beta(55+18, 45+10)
Given this outcome, P(θ < 0.525) = 0.838: even this modest number of tosses noticeably erodes our confidence that the coin is authentic.
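Updating each component with the observed 18H/10T and re-evaluating (a Python sketch, assuming SciPy; as on the slide, the 0.9/0.1 mixture weights are kept fixed rather than reweighted by the evidence):

```python
from scipy.stats import beta

H, T = 18, 10  # observed flips
prior_p = 0.9 * beta(450, 450).cdf(0.525) + 0.1 * beta(55, 45).cdf(0.525)
post_p = (0.9 * beta(450 + H, 450 + T).cdf(0.525)
          + 0.1 * beta(55 + H, 45 + T).cdf(0.525))
print(prior_p, post_p)  # the heads-heavy evidence lowers P(theta < 0.525)
```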

