Tutorial 9: EM and the Beta distribution
Based on "What is the expectation maximization algorithm?" by Chuong B. Do & Serafim Batzoglou; Computational Biology course, lecture 8; Tal Shor
Basic EM example
Let there be 2 coins, A and B, with probabilities θ_A and θ_B of landing Heads, respectively. We randomly select one of the coins and flip it 10 times in a row, and we repeat this 5 times. Let x = (x_1, …, x_5), x_i ∈ {0, …, 10}, be the number of Heads observed in round i, and let z = (z_1, …, z_5), z_i ∈ {A, B}, be the identity of the coin flipped in round i.
Maximum Likelihood
If we know which coin was flipped in each round and the number of Heads it produced, we can use ML. Recall that L(θ) = θ^#H (1 − θ)^#T, which gives the estimate θ̂ = #H / (#H + #T).

Round  Coin  Flips                 #H / #T
1      B     H T T T H H T H T H   5H / 5T
2      A     H H H H T H H H H H   9H / 1T
3      A     H T H H H H H T H H   8H / 2T
4      B     H T H T T T H H T T   4H / 6T
5      A     T H H H T H H H T H   7H / 3T

Totals: Coin A: 24H / 6T, Coin B: 9H / 11T
θ̂_A = 24 / (24 + 6) = 0.8,  θ̂_B = 9 / (9 + 11) = 0.45
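As a sanity check, the ML formula can be evaluated directly on the table above. A minimal Python sketch (variable names are my own):

```python
# ML estimates when coin identities are observed, computed from the
# slide's table: theta_hat = #H / (#H + #T), summed per coin.
rounds = [                       # (coin, heads, tails) for each of the 5 rounds
    ("B", 5, 5), ("A", 9, 1), ("A", 8, 2), ("B", 4, 6), ("A", 7, 3),
]
ml = {}
for coin in ("A", "B"):
    heads = sum(h for c, h, _ in rounds if c == coin)
    tails = sum(t for c, _, t in rounds if c == coin)
    ml[coin] = heads / (heads + tails)
print(ml)  # {'A': 0.8, 'B': 0.45}
```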
Expectation maximization (EM)
Yet having all of this information is not a common scenario; more often we have only part of it. For example, we may not know z, in other words which coin was flipped in each round. We do know the outcomes x, yet we cannot assume anything about which coin produced them. In such cases we use EM.
E-step (first iteration)
Start with initial guesses θ_A^(0) = 0.6, θ_B^(0) = 0.5. For each round, compute the posterior probability that each coin produced it, then split the round's observed Heads and Tails between the coins accordingly:

Round  #H/#T   P(A)   P(B)   Coin A counts   Coin B counts
1      5H/5T   0.45   0.55   2.2H / 2.2T     2.8H / 2.8T
2      9H/1T   0.80   0.20   7.2H / 0.8T     1.8H / 0.2T
3      8H/2T   0.73   0.27   5.9H / 1.5T     2.1H / 0.5T
4      4H/6T   0.35   0.65   1.4H / 2.1T     2.6H / 3.9T
5      7H/3T   0.65   0.35   4.5H / 1.9T     2.5H / 1.1T
Totals:                      21.3H / 8.6T    11.7H / 8.4T

M-step
θ_A^(1) = 21.3 / (21.3 + 8.6) ≈ 0.71,  θ_B^(1) = 11.7 / (11.7 + 8.4) ≈ 0.58
E-step (second iteration)
Using θ_A^(1) = 0.71, θ_B^(1) = 0.58:

Round  #H/#T   P(A)   P(B)   Coin A counts   Coin B counts
1      5H/5T   0.30   0.70   1.5H / 1.5T     3.5H / 3.5T
2      9H/1T   0.81   0.19   7.3H / 0.8T     1.7H / 0.2T
3      8H/2T   0.70   0.30   5.6H / 1.4T     2.4H / 0.6T
4      4H/6T   0.20   0.80   0.8H / 1.2T     3.2H / 4.8T
5      7H/3T   0.58   0.42   4.1H / 1.7T     2.9H / 1.3T
Totals:                      19.3H / 6.6T    13.7H / 10.4T

M-step
θ_A^(2) = 19.3 / (19.3 + 6.6) ≈ 0.75,  θ_B^(2) = 13.7 / (13.7 + 10.4) ≈ 0.57
Results
The series converges after about 10 iterations at
θ_A^(10) ≈ 0.80,  θ_B^(10) ≈ 0.52
which is close to our ML results of θ_A ≈ 0.80, θ_B ≈ 0.45, even though we used far less information.
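The whole procedure can be sketched in a few lines of Python (a minimal implementation of the E and M steps described above; function and variable names are my own):

```python
def em_two_coins(heads, flips, theta_a, theta_b, iterations=10):
    """EM for the two-coin mixture: heads[i] Heads out of `flips` tosses per round."""
    for _ in range(iterations):
        # E-step: posterior probability that each round came from coin A,
        # then split that round's counts between the coins accordingly.
        a_h = a_t = b_h = b_t = 0.0
        for h in heads:
            t = flips - h
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            p_a = like_a / (like_a + like_b)   # binomial coefficient cancels
            a_h += p_a * h
            a_t += p_a * t
            b_h += (1 - p_a) * h
            b_t += (1 - p_a) * t
        # M-step: re-estimate each bias from its expected Head/Tail counts
        theta_a = a_h / (a_h + a_t)
        theta_b = b_h / (b_h + b_t)
    return theta_a, theta_b

theta_a, theta_b = em_two_coins([5, 9, 8, 4, 7], 10, 0.6, 0.5)
```

Starting from (0.6, 0.5), one iteration reproduces the slides' ≈0.71 and ≈0.58, and ten iterations give ≈0.80 and ≈0.52.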
EM example: blood type
As you may recall from the 2nd tutorial, there are 4 blood types (phenotypes), {O, A, B, AB}, and 6 blood-type genotypes, {o/o, o/a, a/a, o/b, b/b, a/b}. While the phenotype is a deterministic function of the genotype, the genotype cannot be determined from the phenotype alone.
Blood type model
Assume that the probabilities that a random individual carries the a, b, or o allele are p_a, p_b, p_o respectively (with p_a + p_b + p_o = 1). The genotype probabilities are then
p_ab = 2 p_a p_b;  p_aa = p_a^2;  p_ao = 2 p_a p_o
p_bb = p_b^2;  p_bo = 2 p_b p_o;  p_oo = p_o^2
That way we get the conditional probabilities
Pr(ph = A | Θ) = p_aa + p_ao = p_a^2 + 2 p_a p_o
Pr(ph = B | Θ) = p_bb + p_bo = p_b^2 + 2 p_b p_o
Pr(ph = AB | Θ) = p_ab = 2 p_a p_b
Pr(ph = O | Θ) = p_oo = p_o^2
Expectation for an individual
Given an individual's phenotype, the expected number of copies of each allele:

Phenotype A: genotypes a/a (prob p_a^2) and a/o (prob 2 p_a p_o). Conditioned on phenotype A, Pr(a/a) = p_a / (p_a + 2p_o) and Pr(a/o) = 2p_o / (p_a + 2p_o), so
  E[a count] = 2 · p_a / (p_a + 2p_o) + 1 · 2p_o / (p_a + 2p_o);  E[o count] = 2p_o / (p_a + 2p_o)
Phenotype B: symmetric, with b in place of a:
  E[b count] = 2 · p_b / (p_b + 2p_o) + 1 · 2p_o / (p_b + 2p_o);  E[o count] = 2p_o / (p_b + 2p_o)
Phenotype AB: genotype a/b (prob 2 p_a p_b): a count = 1, b count = 1
Phenotype O: genotype o/o (prob p_o^2): o count = 2
Expectation formulas
Let N_A, N_B, N_O, N_AB be the numbers of individuals with the corresponding phenotypes. Then
E[#a] = N_A × (2p_a + 2p_o) / (p_a + 2p_o) + N_AB × 1
E[#b] = N_B × (2p_b + 2p_o) / (p_b + 2p_o) + N_AB × 1
E[#o] = N_A × 2p_o / (p_a + 2p_o) + N_B × 2p_o / (p_b + 2p_o) + N_O × 2
Maximization formulas
p_a = E[#a] / (2N);  p_b = E[#b] / (2N);  p_o = E[#o] / (2N)
Combining the E and M formulas gives the iterative update
p_a^(t+1) = [ N_A × (2p_a^(t) + 2p_o^(t)) / (p_a^(t) + 2p_o^(t)) + N_AB × 1 ] / (2N)
p_b^(t+1) = [ N_B × (2p_b^(t) + 2p_o^(t)) / (p_b^(t) + 2p_o^(t)) + N_AB × 1 ] / (2N)
p_o^(t+1) = 1 − p_a^(t+1) − p_b^(t+1)
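The update formulas can be iterated directly. A minimal Python sketch, using hypothetical phenotype counts (the 40/11/4/45 sample is my own illustration, not from the slides):

```python
def em_allele_freqs(n_a, n_b, n_ab, n_o, iterations=50):
    """EM for ABO allele frequencies from phenotype counts alone."""
    n = n_a + n_b + n_ab + n_o           # individuals, so 2n alleles in total
    p_a, p_b, p_o = 1 / 3, 1 / 3, 1 / 3  # uniform initial guess
    for _ in range(iterations):
        # E-step folded into the update: expected allele counts given phenotypes
        e_a = n_a * (2 * p_a + 2 * p_o) / (p_a + 2 * p_o) + n_ab
        e_b = n_b * (2 * p_b + 2 * p_o) / (p_b + 2 * p_o) + n_ab
        # M-step: normalize expected counts by the total allele count
        p_a = e_a / (2 * n)
        p_b = e_b / (2 * n)
        p_o = 1 - p_a - p_b
    return p_a, p_b, p_o

# Hypothetical sample: 40 type A, 11 type B, 4 type AB, 45 type O
p_a, p_b, p_o = em_allele_freqs(40, 11, 4, 45)
```

With these counts the iteration settles near p_a ≈ 0.25, p_b ≈ 0.08, p_o ≈ 0.67, matching the direct solution of the phenotype equations.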
Beta distribution example
In baseball there is a statistic called the batting average, the fraction of at-bats in which the player gets a hit. A .266 batting average is considered average, and .300 is considered excellent. We need a prior distribution that can reasonably range from .21 to .35 while maintaining a mean of about .27. Note that a season's worth of games is around 300 at-bats.
Prior distribution
Under those assumptions (our professional opinion) we can pick a Beta(α, β) prior with α = 81, β = 219, whose mean is α / (α + β) = 81/300 = 0.27.
With evidence
We have followed Joe DiMaggio's performance for a season: he hit 100 times in his 300 at-bats. As we have seen in class, his updated distribution is Beta(81 + 100, 219 + 200) = Beta(181, 419). Notice the curve is now both narrower and shifted to the right (higher batting average) than the prior: we have a better sense of the player's true batting average.
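The conjugate update above is one line of arithmetic: evidence simply adds to the Beta counts. A minimal sketch (helper names are illustrative):

```python
def beta_update(alpha, beta, hits, misses):
    """Conjugate Beta-binomial update: successes/failures add to the counts."""
    return alpha + hits, beta + misses

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

prior = (81, 219)                          # prior mean 81/300 = 0.27
posterior = beta_update(*prior, 100, 200)  # 100 hits, 200 misses in the season
```

The posterior is (181, 419), with mean 181/600 ≈ 0.302, slightly above the prior mean of 0.27.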
Beta distribution example 2
Let there be 2 mints. The first, the government mint, produces 90% of the coins, and its coins are fair. The second, a pirate mint, produces the rest, with coins that land Heads with probability 55%. For a government (fair) coin we are quite confident, so our prior is Beta(450, 450), with mean 0.5. For a pirate coin our confidence is lower: Beta(55, 45), with mean 0.55.
Prior distribution
Let θ be the probability of drawing Heads. Then
f(θ) = 0.9 · Beta(450, 450) + 0.1 · Beta(55, 45)
Say we check whether the coin is pirated by testing P(θ > 0.525). In this case we get P(θ < 0.525) = 0.903, meaning the coin is most likely from a government mint (close to the known 90%).
Posterior distribution
We flipped a coin 28 times and got 18 Heads and 10 Tails. Our updated density is
f(θ | data) = 0.9 · Beta(450 + 18, 450 + 10) + 0.1 · Beta(55 + 18, 45 + 10)
Given this outcome, P(θ < 0.525) = 0.838, noticeably eroding our confidence that the coin is genuine after only a modest number of tosses.
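One way to carry out this update in code is below. Note one refinement over the slide's formula: in the standard mixture update, each component's weight is also rescaled by how well that component predicted the data (its beta-binomial evidence), rather than staying fixed at 0.9/0.1. Names are my own:

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b), via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def mixture_posterior(components, heads, tails):
    """components: list of (weight, alpha, beta) for a Beta mixture prior.
    Each component gets the conjugate update; weights are rescaled by each
    component's marginal likelihood of the data (binomial coefficient cancels)."""
    updated = []
    for w, a, b in components:
        log_evidence = log_beta(a + heads, b + tails) - log_beta(a, b)
        updated.append((w * math.exp(log_evidence), a + heads, b + tails))
    total = sum(w for w, _, _ in updated)
    return [(w / total, a, b) for w, a, b in updated]

prior = [(0.9, 450, 450), (0.1, 55, 45)]
post = mixture_posterior(prior, 18, 10)   # 18 Heads, 10 Tails
```

The 18H/10T outcome fits the pirate component better, so its weight grows well past the prior 10%, in line with the drop of P(θ < 0.525) from 0.903 to 0.838 on the slide.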