Published by Cori Agatha Terry. Modified over 9 years ago.
1 Learning – EM in ABO locus. Tutorial #08 © Ydo Wexler & Dan Geiger
2 Example: The ABO locus. A locus is a particular place on the chromosome. Each locus' state (called its genotype) consists of two alleles, one paternal and one maternal. Some loci (plural of locus) determine distinguishing features; the ABO locus, for example, determines blood type. The ABO locus has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}. The first two genotypes determine blood type A, the next two determine blood type B, then blood type AB, and finally blood type O. We wish to estimate the proportions of the six genotypes in a population. Suppose we randomly sampled N individuals and found that $N_{a/a}$ have genotype a/a, $N_{a/b}$ have genotype a/b, etc. Then the MLE is given by $\theta_{a/a} = N_{a/a}/N$, $\theta_{a/b} = N_{a/b}/N$, and so on for the other genotypes.
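As a small illustration of the counting estimate, the sketch below divides each genotype count by the sample size. The counts are hypothetical values chosen for the example; they do not come from the slides.

```python
# Hypothetical genotype counts, for illustration only (not from the slides).
counts = {"a/a": 10, "a/o": 30, "b/b": 5, "b/o": 20, "a/b": 15, "o/o": 20}


def genotype_mle(counts):
    """Return the MLE of each genotype proportion: its count over the total."""
    total = sum(counts.values())
    return {genotype: c / total for genotype, c in counts.items()}


theta = genotype_mle(counts)
for genotype, proportion in theta.items():
    print(genotype, proportion)
```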
3 The ABO locus (Cont.) However, testing individuals for their genotype is very expensive. Can we estimate the genotype proportions using the common, cheap blood test, whose outcome is one of the four blood types (A, B, AB, O)? The problem is that among individuals measured to have blood type A, we don't know how many have genotype a/a and how many have genotype a/o. So what can we do? We use the Hardy-Weinberg equilibrium rule, which tells us that in equilibrium the frequencies $\theta_a$, $\theta_b$, $\theta_o$ of the three alleles a, b, o in the population determine the frequencies of the genotypes as follows: $\theta_{a/b} = 2\theta_a\theta_b$, $\theta_{a/o} = 2\theta_a\theta_o$, $\theta_{b/o} = 2\theta_b\theta_o$, $\theta_{a/a} = \theta_a^2$, $\theta_{b/b} = \theta_b^2$, $\theta_{o/o} = \theta_o^2$. So now we have three parameters that we need to estimate.
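The Hardy-Weinberg rule can be sketched in a few lines: genotype frequencies follow from the allele frequencies, and blood-type frequencies follow by summing the genotypes that produce each type. The function names are chosen for this example only.

```python
def genotype_freqs(ta, tb, to):
    """Genotype frequencies implied by allele frequencies under Hardy-Weinberg."""
    return {
        "a/a": ta * ta,
        "a/o": 2 * ta * to,
        "b/b": tb * tb,
        "b/o": 2 * tb * to,
        "a/b": 2 * ta * tb,
        "o/o": to * to,
    }


def phenotype_freqs(ta, tb, to):
    """Blood-type frequencies: sum the genotypes that give each blood type."""
    g = genotype_freqs(ta, tb, to)
    return {
        "A": g["a/a"] + g["a/o"],
        "B": g["b/b"] + g["b/o"],
        "AB": g["a/b"],
        "O": g["o/o"],
    }


# Example allele frequencies (the same starting values used later in the deck).
geno = genotype_freqs(0.2, 0.2, 0.6)
pheno = phenotype_freqs(0.2, 0.2, 0.6)
print(geno)
print(pheno)
```

Because the allele frequencies sum to 1, both the six genotype frequencies and the four blood-type frequencies sum to 1 as well.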
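4 The Likelihood Function. Let X be a random variable with six values $x_{a/a}, x_{a/o}, x_{b/b}, x_{b/o}, x_{a/b}, x_{o/o}$ denoting the six genotypes. The parameters are $\theta = \{\theta_a, \theta_b, \theta_o\}$. The probability $P(X = x_{a/b} \mid \theta) = 2\theta_a\theta_b$. The probability $P(X = x_{o/o} \mid \theta) = \theta_o\theta_o$. And so on for the other four genotypes. What is the probability of Data = {B, A, B, B, O, A, B, A, O, B, AB}? The data contain 5 B, 3 A, 2 O and 1 AB, so $P(\text{Data} \mid \theta) = (\theta_b^2 + 2\theta_b\theta_o)^5 (\theta_a^2 + 2\theta_a\theta_o)^3 (\theta_o^2)^2 (2\theta_a\theta_b)$. Obtaining the maximum of this function yields the MLE. This can be done with a multidimensional Newton's algorithm.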
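To make the likelihood concrete, here is a minimal sketch that evaluates the log-likelihood of the eleven observations at a given $\theta$. It only evaluates the function; it does not implement the Newton maximization mentioned on the slide. The function names and the two trial parameter values are choices made for this example.

```python
import math
from collections import Counter

# The observed blood types from the slide.
data = ["B", "A", "B", "B", "O", "A", "B", "A", "O", "B", "AB"]


def phenotype_probs(ta, tb, to):
    """Blood-type probabilities under Hardy-Weinberg equilibrium."""
    return {
        "A": ta * ta + 2 * ta * to,
        "B": tb * tb + 2 * tb * to,
        "AB": 2 * ta * tb,
        "O": to * to,
    }


def log_likelihood(data, ta, tb, to):
    """Sum of log P(blood type | theta) over independent observations."""
    probs = phenotype_probs(ta, tb, to)
    counts = Counter(data)
    return sum(n * math.log(probs[ph]) for ph, n in counts.items())


ll_first = log_likelihood(data, 0.2, 0.2, 0.6)
ll_second = log_likelihood(data, 0.2, 0.35, 0.45)
print(ll_first, ll_second)
```

Comparing the two values shows which of the two trial parameter settings explains the data better; a maximizer would search over all valid $\theta$.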
5 The EM algorithm in Bayes' nets
E-step: go over the data; for each data element, sum the expectations of the hidden variables that you get from it.
M-step: for every hidden variable x, update your belief according to the expectation you calculated in the last E-step.
6 EM - ABO Example. The allele a/b/o is hidden; the blood type A / B / AB / O is observed. $\theta = \{\theta_a, \theta_b, \theta_o\}$ is the parameter we need to estimate. We choose a “reasonable” starting value $\theta = \{0.2, 0.2, 0.6\}$.
Data type   #people
A           100
B           200
AB          50
O           50
7 EM - ABO Example. E-step and M-step. The observed variable is the phenotype (observed blood type) of person i; the hidden variable is the genotype in one locus (the paternal allele).
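One way to write the two steps, consistent with the parameter values reported on the later slides, is sketched here. The symbols $G_i$ (paternal allele of person i), $Ph_i$ (phenotype of person i) and $N_a$ (number of paternal a alleles in the sample) are notation assumed for this sketch, not taken from the slides. The expressions for b and o are analogous.

```latex
\begin{align*}
\text{E-step:}\quad
  & E[N_a] = \sum_{i=1}^{n} P(G_i = a \mid Ph_i, \theta^{t}), \\
  & P(G_i = a \mid Ph_i = A, \theta)
      = \frac{\theta_a^2 + \theta_a\theta_o}{\theta_a^2 + 2\theta_a\theta_o}
      = \frac{\theta_a + \theta_o}{\theta_a + 2\theta_o}, \\
  & P(G_i = a \mid Ph_i = AB, \theta) = \tfrac{1}{2}, \qquad
    P(G_i = a \mid Ph_i \in \{B, O\}, \theta) = 0, \\
\text{M-step:}\quad
  & \theta_a^{t+1} = \frac{E[N_a]}{n}.
\end{align*}
```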
8 EM - ABO Example. E-step: we compute all the necessary elements.
9 EM - ABO Example. E-step (1st step), with $\theta^0 = \{0.2, 0.2, 0.6\}$ and n = 400 (data size). Data: A 100, B 200, AB 50, O 50.
10 EM - ABO Example. E-step (1st step, continued), with $\theta^0 = \{0.2, 0.2, 0.6\}$ and n = 400 (data size).
11 EM - ABO Example. E-step (1st step, continued), with $\theta^0 = \{0.2, 0.2, 0.6\}$ and n = 400 (data size).
12 EM - ABO Example. M-step (1st step), with $\theta^0 = \{0.2, 0.2, 0.6\}$ and n = 400 (data size).
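The first E-step and M-step can be reproduced numerically. The sketch below assumes the single-allele formulation described in the deck (one allele per person, the paternal one, is treated as hidden) and Hardy-Weinberg proportions; the function names are chosen for this example. It gives roughly (0.205, 0.348, 0.446), which agrees with the next estimate reported in the deck, {0.2, 0.35, 0.45}, up to rounding.

```python
# Blood-type counts from the deck (n = 400).
counts = {"A": 100, "B": 200, "AB": 50, "O": 50}


def e_step(counts, theta):
    """Expected number of paternal a, b and o alleles given the blood types."""
    ta, tb, to = theta
    # P(paternal allele is a | blood type A), by Bayes' rule under Hardy-Weinberg.
    p_a_given_A = (ta + to) / (ta + 2 * to)
    # P(paternal allele is b | blood type B).
    p_b_given_B = (tb + to) / (tb + 2 * to)
    # For blood type AB the paternal allele is a or b with probability 1/2 each.
    exp_a = counts["A"] * p_a_given_A + counts["AB"] * 0.5
    exp_b = counts["B"] * p_b_given_B + counts["AB"] * 0.5
    exp_o = (counts["A"] * (1 - p_a_given_A)
             + counts["B"] * (1 - p_b_given_B)
             + counts["O"])
    return exp_a, exp_b, exp_o


def m_step(expected):
    """New allele frequencies: expected allele counts divided by their total."""
    n = sum(expected)
    return tuple(e / n for e in expected)


theta0 = (0.2, 0.2, 0.6)
expected = e_step(counts, theta0)
theta1 = m_step(expected)
print(expected)
print(theta1)
```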
13 EM - ABO Example. E-step (2nd step), with $\theta^1 = \{0.2, 0.35, 0.45\}$.
14 EM - ABO Example. M-step (2nd step), with $\theta^1 = \{0.2, 0.35, 0.45\}$.
15 EM - ABO Example. E-step (3rd step), with $\theta^2 = \{0.21, 0.38, 0.41\}$.
16 EM - ABO Example. M-step (3rd step), with $\theta^2 = \{0.21, 0.38, 0.41\}$. The change in the parameters is now small enough.
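Repeating the update until the change is small can be sketched as a loop. The stopping tolerance `tol` and the cap `max_iter` are assumptions made for this example; the slides do not state how small a change counts as "small enough". The first two iterates agree with the deck's values {0.2, 0.35, 0.45} and {0.21, 0.38, 0.41} up to rounding.

```python
counts = {"A": 100, "B": 200, "AB": 50, "O": 50}


def em_update(counts, theta):
    """One combined E-step and M-step for the single-allele formulation."""
    ta, tb, to = theta
    n = sum(counts.values())
    p_a_given_A = (ta + to) / (ta + 2 * to)
    p_b_given_B = (tb + to) / (tb + 2 * to)
    new_a = (counts["A"] * p_a_given_A + counts["AB"] * 0.5) / n
    new_b = (counts["B"] * p_b_given_B + counts["AB"] * 0.5) / n
    new_o = 1.0 - new_a - new_b
    return new_a, new_b, new_o


def run_em(counts, theta, tol=1e-6, max_iter=1000):
    """Iterate until the largest parameter change drops below tol (assumed)."""
    history = []
    for _ in range(max_iter):
        new_theta = em_update(counts, theta)
        change = max(abs(x - y) for x, y in zip(new_theta, theta))
        theta = new_theta
        history.append(theta)
        if change < tol:
            break
    return theta, history


final_theta, history = run_em(counts, (0.2, 0.2, 0.6))
print(history[0])   # first update
print(history[1])   # second update
print(final_theta, len(history))
```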
17 Gene Counting. Until this point we looked at one of the two alleles from each individual. It is possible to use both alleles, with some modifications to the process. The next two slides are dedicated to this method.
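18 Gene Counting. Had we known the counts $n_{a/a}$ and $n_{a/o}$ (blood type A individuals), we could have estimated $\theta_a$ from n individuals as follows (and similarly estimated $\theta_b$ and $\theta_o$): $\theta_a = (2n_{a/a} + n_{a/o} + n_{a/b}) / (2n)$. Can we compute what $n_{a/a}$ and $n_{a/o}$ are expected to be? Using the current estimates of $\theta_a$ and $\theta_o$ we can, as follows: $n_{a/a} = n_A \, \theta_a^2 / (\theta_a^2 + 2\theta_a\theta_o)$ and $n_{a/o} = n_A \cdot 2\theta_a\theta_o / (\theta_a^2 + 2\theta_a\theta_o)$. We repeat these two steps until the parameters converge.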
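19 Gene Counting (example of EM)
Input: counts of each blood type $n_A, n_B, n_O, n_{AB}$ of n people.
Desired output: ML estimate of the allele frequencies $\theta_a, \theta_b, \theta_o$.
Initialization: set $\theta_a$, $\theta_b$, and $\theta_o$ to arbitrary values (say, 1/3).
Repeat
E-step (Expectation): compute the expected genotype counts $n_{a/a}, n_{a/o}, n_{b/b}, n_{b/o}$ from the current estimates.
M-step (Maximization): re-estimate $\theta_a, \theta_b, \theta_o$ from these expected counts.
Until $\theta_a$, $\theta_b$, and $\theta_o$ converge.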
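The gene-counting procedure can be sketched as follows. The function name, the tolerance `tol` and the iteration cap `max_iter` are assumptions made for this example; the expected genotype counts split each blood-type count in proportion to the Hardy-Weinberg genotype frequencies, and each allele is then counted over all 2n gene copies.

```python
def gene_counting(n_A, n_B, n_AB, n_O, tol=1e-9, max_iter=1000):
    """EM by gene counting; returns estimates of (theta_a, theta_b, theta_o)."""
    n = n_A + n_B + n_AB + n_O
    ta = tb = to = 1.0 / 3.0  # arbitrary starting values, as on the slide
    for _ in range(max_iter):
        # E-step: expected genotype counts within blood types A and B.
        n_aa = n_A * ta * ta / (ta * ta + 2 * ta * to) if n_A else 0.0
        n_ao = n_A - n_aa
        n_bb = n_B * tb * tb / (tb * tb + 2 * tb * to) if n_B else 0.0
        n_bo = n_B - n_bb
        # M-step: count each allele over the 2n gene copies in the sample.
        new_a = (2 * n_aa + n_ao + n_AB) / (2 * n)
        new_b = (2 * n_bb + n_bo + n_AB) / (2 * n)
        new_o = (2 * n_O + n_ao + n_bo) / (2 * n)
        change = max(abs(new_a - ta), abs(new_b - tb), abs(new_o - to))
        ta, tb, to = new_a, new_b, new_o
        if change < tol:
            break
    return ta, tb, to


# Blood-type counts used in the earlier example of the deck.
theta_a, theta_b, theta_o = gene_counting(n_A=100, n_B=200, n_AB=50, n_O=50)
print(theta_a, theta_b, theta_o)
```

With these formulas, one gene-counting update gives the same new allele frequencies as one update of the single-allele version, since $(2n_{a/a} + n_{a/o})/2 = n_A(\theta_a + \theta_o)/(\theta_a + 2\theta_o)$.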