Tutorial 9 EM and Beta distribution

1 Tutorial 9 EM and Beta distribution
What is the expectation maximization algorithm? – Chuong B. Do & Serafim Batzoglou. Computational Biology course – lecture 8. Tal Shor

2 Basic EM example
Let there be 2 coins, A and B, with probabilities θ_A and θ_B of Heads respectively. We randomly select a coin (A or B) and flip it 10 times in a row; we do so 5 times. Let x = (x_1, …, x_5), x_i ∈ {0, …, 10}, be the number of Heads observed in trial i, and z = (z_1, …, z_5), z_i ∈ {A, B}, be the coin types.

3 Maximum Likelihood
If we know which coin was flipped in each trial and the number of Heads in each, we can use ML. Recall that L(θ) = θ^#H (1−θ)^#T, which gives the estimate θ̂ = #H / (#H + #T).

Trial 1 (Coin B): H T T T H H T H T H → 5H / 5T
Trial 2 (Coin A): H H H H T H H H H H → 9H / 1T
Trial 3 (Coin A): H T H H H H H T H H → 8H / 2T
Trial 4 (Coin B): H T H T T T H H T T → 4H / 6T
Trial 5 (Coin A): T H H H T H H H T H → 7H / 3T

Totals: Coin A: 24H / 6T; Coin B: 9H / 11T
θ̂_A = 24 / (24 + 6) = 0.8; θ̂_B = 9 / (9 + 11) = 0.45
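The ML estimates above can be checked with a short script (a minimal Python sketch; the flip counts are taken from the table):

```python
# Maximum-likelihood estimation when the coin identities are known.
# Each trial is (coin, number of heads out of 10 flips).
trials = [("B", 5), ("A", 9), ("A", 8), ("B", 4), ("A", 7)]

def ml_estimate(trials, flips_per_trial=10):
    counts = {"A": [0, 0], "B": [0, 0]}  # coin -> [heads, tails]
    for coin, heads in trials:
        counts[coin][0] += heads
        counts[coin][1] += flips_per_trial - heads
    # theta_hat = #H / (#H + #T) for each coin
    return {c: h / (h + t) for c, (h, t) in counts.items()}

print(ml_estimate(trials))  # {'A': 0.8, 'B': 0.45}
```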

4 Expectation maximization (EM)
Yet having all of this information is not a common scenario; more often we observe only part of it. For example, we may not know z — in other words, which coin was flipped in each trial. We still observe the outcomes (x), but we cannot assume anything about which coin produced them. (We see the same five trials as before, but without the coin labels.) In cases such as this, we use EM.

5 First EM iteration
Initial guess: θ_A^(0) = 0.6, θ_B^(0) = 0.5.

E-step — posterior probability that each trial came from coin A / coin B:
Trial 1 (5H/5T): P(A) = 0.45, P(B) = 0.55
Trial 2 (9H/1T): P(A) = 0.80, P(B) = 0.20
Trial 3 (8H/2T): P(A) = 0.73, P(B) = 0.27
Trial 4 (4H/6T): P(A) = 0.35, P(B) = 0.65
Trial 5 (7H/3T): P(A) = 0.65, P(B) = 0.35

Expected counts:
Coin A: 2.2H/2.2T + 7.2H/0.8T + 5.9H/1.5T + 1.4H/2.1T + 4.5H/1.9T ≈ 21.3H / 8.6T
Coin B: 2.8H/2.8T + 1.8H/0.2T + 2.1H/0.5T + 2.6H/3.9T + 2.5H/1.1T ≈ 11.7H / 8.4T

M-step:
θ_A^(1) = 21.3 / (21.3 + 8.6) ≈ 0.71
θ_B^(1) = 11.7 / (11.7 + 8.4) ≈ 0.58
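One E-step/M-step pass can be written out directly. This is a Python sketch using the binomial likelihoods implied by the model (the binomial coefficients cancel in the posterior):

```python
heads = [5, 9, 8, 4, 7]  # observed heads per 10-flip trial

def em_step(theta_a, theta_b, heads, n=10):
    """One EM iteration for the two-coin mixture."""
    hA = tA = hB = tB = 0.0
    for h in heads:
        # E-step: posterior probability that this trial used coin A
        la = theta_a**h * (1 - theta_a)**(n - h)
        lb = theta_b**h * (1 - theta_b)**(n - h)
        pa = la / (la + lb)
        # Accumulate expected head/tail counts for each coin
        hA += pa * h
        tA += pa * (n - h)
        hB += (1 - pa) * h
        tB += (1 - pa) * (n - h)
    # M-step: re-estimate each bias from its expected counts
    return hA / (hA + tA), hB / (hB + tB)

print(em_step(0.6, 0.5, heads))  # ≈ (0.713, 0.581)
```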

6 Second EM iteration
E-step with θ_A^(1) = 0.71, θ_B^(1) = 0.58:
Trial 1 (5H/5T): P(A) = 0.30, P(B) = 0.70
Trial 2 (9H/1T): P(A) = 0.81, P(B) = 0.19
Trial 3 (8H/2T): P(A) = 0.70, P(B) = 0.30
Trial 4 (4H/6T): P(A) = 0.20, P(B) = 0.80
Trial 5 (7H/3T): P(A) = 0.58, P(B) = 0.42

Expected counts:
Coin A: 1.5H/1.5T + 7.3H/0.8T + 5.6H/1.4T + 0.8H/1.2T + 4.1H/1.7T ≈ 19.3H / 6.6T
Coin B: 3.5H/3.5T + 1.7H/0.2T + 2.4H/0.6T + 3.2H/4.8T + 2.9H/1.3T ≈ 13.7H / 10.4T

M-step:
θ_A^(2) = 19.3 / (19.3 + 6.6) ≈ 0.75
θ_B^(2) = 13.7 / (13.7 + 10.4) ≈ 0.57

7 Results
This sequence converges after about 10 iterations at θ_A^(10) ≈ 0.80, θ_B^(10) ≈ 0.52, which is close to our ML results of θ_A = 0.80, θ_B = 0.45, even though we had much less information.

8 EM example – blood types
As you may recall from the 2nd tutorial, there are 4 blood types (phenotypes) – {O, A, B, AB} – and 6 blood-type genotypes – {o/o, o/a, a/a, o/b, b/b, a/b}. While the phenotype is a deterministic function of the genotype, the genotype cannot be determined from the phenotype alone.

9 Blood type model
Assume that the probabilities of a random individual carrying the a, b, or o allele are θ_a, θ_b, θ_o respectively. The genotype probabilities are then
θ_ab = 2θ_aθ_b;  θ_aa = θ_a²;  θ_ao = 2θ_aθ_o
θ_bb = θ_b²;  θ_bo = 2θ_bθ_o;  θ_oo = θ_o²
This gives the conditional probabilities
Pr(phe = A | Θ) = θ_aa + θ_ao = θ_a² + 2θ_oθ_a
Pr(phe = B | Θ) = θ_bb + θ_bo = θ_b² + 2θ_oθ_b
Pr(phe = AB | Θ) = θ_ab = 2θ_aθ_b
Pr(phe = O | Θ) = θ_oo = θ_o²
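These phenotype probabilities should sum to 1 whenever θ_a + θ_b + θ_o = 1, which is easy to check (a Python sketch; the allele frequencies below are made-up illustration values):

```python
def phenotype_probs(ta, tb, to):
    """Phenotype distribution implied by allele frequencies."""
    return {
        "A":  ta**2 + 2 * to * ta,
        "B":  tb**2 + 2 * to * tb,
        "AB": 2 * ta * tb,
        "O":  to**2,
    }

probs = phenotype_probs(0.3, 0.1, 0.6)  # hypothetical allele frequencies
print(probs, sum(probs.values()))       # the four probabilities sum to 1.0
```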

10 Expectation for an individual
Phenotype A comes from genotype a/o (prob 2θ_aθ_o) or a/a (prob θ_a²), so
  P(a/o | A) = 2θ_o / (2θ_o + θ_a),  P(a/a | A) = θ_a / (2θ_o + θ_a)
  Expected a count: 1·2θ_o/(2θ_o + θ_a) + 2·θ_a/(2θ_o + θ_a);  expected o count: 1·2θ_o/(2θ_o + θ_a)
Phenotype B comes from genotype b/o (prob 2θ_bθ_o) or b/b (prob θ_b²), so
  Expected b count: 1·2θ_o/(2θ_o + θ_b) + 2·θ_b/(2θ_o + θ_b);  expected o count: 1·2θ_o/(2θ_o + θ_b)
Phenotype AB comes from genotype a/b (prob 2θ_aθ_b): a count 1, b count 1.
Phenotype O comes from genotype o/o (prob θ_o²): o count 2.

11 Expectation formulas
Let n_A, n_B, n_O, n_AB be the numbers of individuals with the corresponding phenotypes.
E[#a] = n_A · (2θ_o + 2θ_a)/(2θ_o + θ_a) + n_AB · 1
E[#b] = n_B · (2θ_o + 2θ_b)/(2θ_o + θ_b) + n_AB · 1
E[#o] = n_A · 2θ_o/(2θ_o + θ_a) + n_B · 2θ_o/(2θ_o + θ_b) + n_O · 2

12 Maximization Formulas
πœƒ π‘Ž = 𝐸 #π‘Ž 2𝑛 ; πœƒ 𝑏 = 𝐸 #𝑏 2𝑛 ; πœƒ π‘œ = 𝐸 #π‘œ 2𝑛 ; Combining the E and M formulas, gives us the iterative update formula πœƒ π‘Ž 𝑖+1 = 𝑛 𝐴 Γ—2 πœƒ π‘œ 𝑖 + πœƒ π‘Ž 𝑖 2 πœƒ π‘œ 𝑖 + πœƒ π‘Ž 𝑖 + 𝑛 𝐴𝐡 Γ—1 2𝑛 πœƒ 𝑏 𝑖+1 = 𝑛 𝐡 Γ—2 πœƒ π‘œ 𝑖 + πœƒ 𝑏 𝑖 2 πœƒ π‘œ 𝑖 + πœƒ 𝑏 𝑖 + 𝑛 𝐴𝐡 Γ—1 2𝑛 πœƒ π‘œ 𝑖+1 =1βˆ’ πœƒ π‘Ž 𝑖+1 βˆ’ πœƒ 𝑏 𝑖+1


14 Beta Distribution example
In baseball there is a term called the "batting average" – the fraction of at-bats in which a player gets a hit. A .266 batting average is considered average, and .300 is considered excellent. We need a prior distribution that can reasonably range from .21 to .35 while maintaining a mean of about .27. Note that a season's worth of games is around 300 at-bats.

15 Prior distribution
Under those assumptions (our professional opinion), we can pick α = 81, β = 219, i.e. a Beta(81, 219) prior with mean 81/(81+219) = 0.27.
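We can sanity-check that choice numerically (a Python sketch, assuming SciPy is available; the parameters 81 and 219 are the slide's):

```python
from scipy.stats import beta

a, b = 81, 219
prior = beta(a, b)
print(prior.mean())                       # mean = 81/300 = 0.27
print(prior.cdf(0.35) - prior.cdf(0.21))  # most of the mass lies in [.21, .35]
```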

16 With evidence
We've followed Joe DiMaggio's performance for a season: he hit 100 times out of his 300 at-bats. As we've seen in class, his new batting-average distribution is Beta(81+100, 219+200) = Beta(181, 419). Notice the curve is now both narrower and shifted to the right (higher batting average) than before: we have a better sense of what the player's batting average is.
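The conjugate update is just adding hits and misses to α and β (a Python sketch, assuming SciPy):

```python
from scipy.stats import beta

prior = beta(81, 219)
post = beta(81 + 100, 219 + 200)  # Beta(181, 419) after 100 hits in 300 at-bats
print(post.mean())                # 181/600, pulled up from the 0.27 prior mean
print(post.std() < prior.std())   # the posterior is narrower than the prior
```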

17 Beta distribution example 2
Let there be 2 mints. The first, the government mint, creates 90% of the coins, and its coins are fair. The second, a pirate mint, creates the rest, with coins that land Heads 55% of the time. For a government (fair) coin we are quite confident, so our prior is Beta(450, 450). For a pirated coin our confidence is lower: Beta(55, 45).

18 Prior distribution Let θ be the probability of drawing Heads.
P(θ) = 0.9·Beta(450, 450) + 0.1·Beta(55, 45)
Say we check whether the coin is pirated by computing P(θ > 0.525). In this case we get P(θ < 0.525) = 0.903, meaning the coin is most likely from a government mint (close to the known 90%).

19 Posterior distribution We've flipped the coin 28 times and got 18 Heads and 10 Tails. Our updated distribution is
P(θ) = 0.9·Beta(450+18, 450+10) + 0.1·Beta(55+18, 45+10)
Given this outcome, P(θ < 0.525) = 0.838: even this modest number of tosses noticeably erodes our confidence that the coin is authentic.
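Updating each component with the observed 18H/10T and re-evaluating (a Python sketch, assuming SciPy; as on the slide, the 0.9/0.1 mixture weights are kept fixed rather than reweighted by the evidence):

```python
from scipy.stats import beta

H, T = 18, 10  # observed flips
prior_p = 0.9 * beta(450, 450).cdf(0.525) + 0.1 * beta(55, 45).cdf(0.525)
post_p = (0.9 * beta(450 + H, 450 + T).cdf(0.525)
          + 0.1 * beta(55 + H, 45 + T).cdf(0.525))
print(prior_p, post_p)  # the heads-heavy evidence lowers P(theta < 0.525)
```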

