Learning – EM in The ABO locus. Tutorial #8 © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.


1 Learning – EM in The ABO locus. Tutorial #8 © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.

2 Genotype statistics
Mendelian genetics:
- Locus: a particular location on a chromosome (genome).
- Each locus has two copies, called alleles (one paternal and one maternal).
- Each copy has several relevant states, called genotypes.
- The locus genotype is determined by the combined genotype of both copies.
- The locus genotype yields the phenotype (physical features).
We wish to estimate the distribution of all possible genotypes. Suppose we randomly sample N individuals and find the number $N_{s,t}$ of individuals with each locus genotype s/t. The MLE is given by:
$\hat{\theta}_{s/t} = N_{s,t} / N$
Sampling genotypes is costly; sampling phenotypes is cheap.
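The MLE above is just the observed proportion of each locus genotype. A minimal sketch, assuming the genotypes have actually been sampled; the sample below and the name `genotype_mle` are made up for illustration:

```python
from collections import Counter

def genotype_mle(sample):
    """MLE of each locus genotype frequency: its count divided by N."""
    counts = Counter(sample)
    n = len(sample)
    return {genotype: count / n for genotype, count in counts.items()}

# Hypothetical sample of N = 8 sequenced individuals (illustration only).
sample = ["a/o", "b/o", "b/b", "a/b", "o/o", "b/o", "a/o", "b/o"]
estimates = genotype_mle(sample)
```

The rest of the tutorial deals with the case where such a genotype sample is not available and only phenotypes are observed.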

3 Example: The ABO locus
The ABO locus determines blood type.
- It has six possible genotypes: {a/a, a/o, b/o, b/b, a/b, o/o}.
- They lead to four possible phenotypes: {A, B, AB, O}.
We wish to estimate the proportions of the six genotypes in a population.
- Sampling a genotype: sequence a genomic region.
- Sampling a phenotype: check for the presence of antibodies (a simple blood test).
Problem: the phenotype does not reveal the genotype (in the case of A and B).

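4 Example: The ABO locus
Problem: the phenotype does not reveal the genotype.
Assuming allele genotypes are distributed independently with probabilities $\theta_a, \theta_b, \theta_o$ (Hardy-Weinberg equilibrium), we determine the probabilities of the locus genotypes:
$\theta_{a/b} = 2\theta_a\theta_b$; $\theta_{a/o} = 2\theta_a\theta_o$; $\theta_{b/o} = 2\theta_b\theta_o$
$\theta_{a/a} = \theta_a^2$; $\theta_{b/b} = \theta_b^2$; $\theta_{o/o} = \theta_o^2$
- $\Theta$: the model parameter set, $\Theta = \{\theta_a, \theta_b, \theta_o\}$
- X: the (hidden) genotype variable, $\Pr[X = x \mid \Theta] = \theta_x$
- P: the (observed) phenotype variable, $\Pr[P = p \mid \Theta] = \sum_{x \to p} \theta_x$, summing over the genotypes x that yield phenotype p
e.g. $\Pr[P = A \mid \Theta] = \theta_{a/a} + \theta_{a/o} = \theta_a^2 + 2\theta_a\theta_o$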
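The Hardy-Weinberg probabilities on this slide can be sketched in a few lines of Python. The function names and the `PHENOTYPE_OF` table are illustrative choices, not part of the original slides; the parameter values are the initial guess used later in the tutorial:

```python
def genotype_probs(theta_a, theta_b, theta_o):
    """Locus genotype probabilities under Hardy-Weinberg equilibrium."""
    return {
        "a/a": theta_a ** 2,
        "b/b": theta_b ** 2,
        "o/o": theta_o ** 2,
        "a/b": 2 * theta_a * theta_b,
        "a/o": 2 * theta_a * theta_o,
        "b/o": 2 * theta_b * theta_o,
    }

# Which phenotype each locus genotype yields.
PHENOTYPE_OF = {
    "a/a": "A", "a/o": "A",
    "b/b": "B", "b/o": "B",
    "a/b": "AB", "o/o": "O",
}

def phenotype_probs(theta_a, theta_b, theta_o):
    """Pr[P = p | Theta]: sum of theta_x over the genotypes x that yield p."""
    probs = {"A": 0.0, "B": 0.0, "AB": 0.0, "O": 0.0}
    for genotype, p in genotype_probs(theta_a, theta_b, theta_o).items():
        probs[PHENOTYPE_OF[genotype]] += p
    return probs

pheno = phenotype_probs(0.2, 0.2, 0.6)
```

For these values, phenotype A gets $0.2^2 + 2 \cdot 0.2 \cdot 0.6 = 0.28$, and the four phenotype probabilities sum to 1.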

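5 Example: The ABO locus
Given a population phenotype sample: Data = {B, A, B, B, O, A, B, A, O, B, AB}, the likelihood of our parameter set $\Theta = \{\theta_a, \theta_b, \theta_o\}$ is:
$L(\Theta) = \Pr[\text{Data} \mid \Theta] = (\theta_a^2 + 2\theta_a\theta_o)^3 \, (\theta_b^2 + 2\theta_b\theta_o)^5 \, (2\theta_a\theta_b)^1 \, (\theta_o^2)^2$
The four factors correspond to phenotypes A, B, AB and O, and the exponents are their counts in the sample.
The maximum of this function yields the MLE. We use EM to obtain it.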
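As a sanity check on the likelihood, the sketch below counts the phenotypes in the sample and evaluates the log-likelihood at one parameter setting. Working in log space and the helper names are choices made here for illustration:

```python
import math
from collections import Counter

DATA = ["B", "A", "B", "B", "O", "A", "B", "A", "O", "B", "AB"]

def phenotype_probs(theta_a, theta_b, theta_o):
    """Phenotype probabilities under Hardy-Weinberg equilibrium."""
    return {
        "A": theta_a ** 2 + 2 * theta_a * theta_o,
        "B": theta_b ** 2 + 2 * theta_b * theta_o,
        "AB": 2 * theta_a * theta_b,
        "O": theta_o ** 2,
    }

def log_likelihood(theta, data):
    """log Pr[data | Theta]: each phenotype contributes count * log(prob)."""
    probs = phenotype_probs(*theta)
    counts = Counter(data)
    return sum(count * math.log(probs[p]) for p, count in counts.items())

counts = dict(Counter(DATA))
ll = log_likelihood((0.2, 0.2, 0.6), DATA)
```

The counts come out as 3 A, 5 B, 1 AB and 2 O, which are the exponents in the likelihood.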

6 EM algorithm
Start with some set of parameters $\Theta$. Iterate until convergence:
- E-step: calculate the expectations of the hidden variables implied by the data and $\Theta$.
- M-step: for every hidden variable X, use the expectations as statistics to yield the MLE $\Theta'$ given $\Theta$.
Hidden variables: the allele genotypes. If we knew the count of each allele genotype, we could calculate the MLE of $\Theta = \{\theta_a, \theta_b, \theta_o\}$. In the M-step we use the expected counts of the allele genotypes (given $\Theta$).

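7 EM algorithm – ABO example
E-step: E[#(x)] is the expected number of occurrences of genotype x in the maternal allele of each locus. If the dataset has n phenotypes $p_1, \dots, p_n$, then $\#(x) = \sum_i \mathbf{1}[X_i = x]$, where $\mathbf{1}[\cdot]$ is an indicator, $X_i$ is the hidden genotype and $p_i$ is the observed phenotype of individual i. By linearity of expectation:
$E[\#(x)] = \sum_{i=1}^{n} \Pr[X_i = x \mid p_i, \Theta]$
M-step:
$\theta'_x = E[\#(x)] / n$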

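8 E-step calculations
E-step: compute the joint probabilities $\Pr[X_i = x, p_i]$ of the hidden genotype x (of the paternal allele) and the observed phenotype $p_i$:

| | P = A | P = B | P = AB | P = O |
|---|---|---|---|---|
| X = a | $\theta_a(\theta_a + \theta_o)$ | 0 | $\theta_a\theta_b$ | 0 |
| X = b | 0 | $\theta_b(\theta_b + \theta_o)$ | $\theta_b\theta_a$ | 0 |
| X = o | $\theta_o\theta_a$ | $\theta_o\theta_b$ | 0 | $\theta_o^2$ |

Normalizing each column gives the conditional probabilities $\Pr[X_i = x \mid p_i]$:

| | P = A | P = B | P = AB | P = O |
|---|---|---|---|---|
| X = a | $(\theta_a + \theta_o)/(\theta_a + 2\theta_o)$ | 0 | ½ | 0 |
| X = b | 0 | $(\theta_b + \theta_o)/(\theta_b + 2\theta_o)$ | ½ | 0 |
| X = o | $\theta_o/(\theta_a + 2\theta_o)$ | $\theta_o/(\theta_b + 2\theta_o)$ | 0 | 1 |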
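The conditional probabilities follow from the joint ones by dividing by the phenotype probability. A small sketch of that normalization, assuming all three parameters are strictly positive; `allele_posterior` is a name chosen here:

```python
def allele_posterior(theta_a, theta_b, theta_o):
    """Pr[X = x | P = p] for one allele, obtained by normalizing the joint table."""
    joint = {
        "A": {"a": theta_a * (theta_a + theta_o), "b": 0.0, "o": theta_o * theta_a},
        "B": {"a": 0.0, "b": theta_b * (theta_b + theta_o), "o": theta_o * theta_b},
        "AB": {"a": theta_a * theta_b, "b": theta_b * theta_a, "o": 0.0},
        "O": {"a": 0.0, "b": 0.0, "o": theta_o ** 2},
    }
    posterior = {}
    for phenotype, row in joint.items():
        total = sum(row.values())  # equals Pr[P = phenotype]
        posterior[phenotype] = {x: p / total for x, p in row.items()}
    return posterior

post = allele_posterior(0.2, 0.2, 0.6)
```

For phenotypes AB and O the result does not depend on the parameters (½, ½, 0 and 0, 0, 1); for A and B it does.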

9 EM algorithm – ABO example
Data type (#people): A 100, B 200, AB 50, O 50.
$\Theta = \{\theta_a, \theta_b, \theta_o\}$ is the parameter set we need to estimate.
Our initial guess is $\Theta^0 = \{0.2, 0.2, 0.6\}$.

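10 EM algorithm – ABO example
$\Theta^0 = \{0.2, 0.2, 0.6\}$, n = 400 (data size).
Data type (#people): A 100, B 200, AB 50, O 50.
E-step (1st iteration), with one term per phenotype in the order A, B, AB, O:
$E[\#(a)] = 100 \cdot \frac{0.2 + 0.6}{0.2 + 2 \cdot 0.6} + 200 \cdot 0 + 50 \cdot \frac{1}{2} + 50 \cdot 0 \approx 82.14$
$E[\#(b)] = 100 \cdot 0 + 200 \cdot \frac{0.2 + 0.6}{0.2 + 2 \cdot 0.6} + 50 \cdot \frac{1}{2} + 50 \cdot 0 \approx 139.29$
$E[\#(o)] = 100 \cdot \frac{0.6}{0.2 + 2 \cdot 0.6} + 200 \cdot \frac{0.6}{0.2 + 2 \cdot 0.6} + 50 \cdot 0 + 50 \cdot 1 \approx 178.57$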

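11 EM algorithm – ABO example
$\Theta^0 = \{0.2, 0.2, 0.6\}$, n = 400 (data size).
Data type (#people): A 100, B 200, AB 50, O 50.
E-step (1st iteration), with one term per phenotype in the order A, B, AB, O:
$E[\#(a)] = 100 \cdot \frac{0.2 + 0.6}{0.2 + 2 \cdot 0.6} + 200 \cdot 0 + 50 \cdot \frac{1}{2} + 50 \cdot 0 \approx 82.14$
$E[\#(b)] = 100 \cdot 0 + 200 \cdot \frac{0.2 + 0.6}{0.2 + 2 \cdot 0.6} + 50 \cdot \frac{1}{2} + 50 \cdot 0 \approx 139.29$
$E[\#(o)] = 100 \cdot \frac{0.6}{0.2 + 2 \cdot 0.6} + 200 \cdot \frac{0.6}{0.2 + 2 \cdot 0.6} + 50 \cdot 0 + 50 \cdot 1 \approx 178.57$
M-step (1st iteration): divide each expected count by n = 400:
$\theta_a^1 = 82.14 / 400$, $\theta_b^1 = 139.29 / 400$, $\theta_o^1 = 178.57 / 400$, which gives $\Theta^1 \approx \{0.205, 0.348, 0.447\}$ up to rounding.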
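The first iteration can be reproduced numerically. The sketch below uses the conditional probabilities $\Pr[X_i = x \mid p_i]$ from the E-step calculations as weights on the phenotype counts; the function names `e_step` and `m_step` are chosen here for illustration:

```python
COUNTS = {"A": 100, "B": 200, "AB": 50, "O": 50}

def e_step(theta, counts):
    """Expected allele counts E[#(a)], E[#(b)], E[#(o)] given the current theta."""
    a, b, o = theta
    exp_a = counts["A"] * (a + o) / (a + 2 * o) + counts["AB"] * 0.5
    exp_b = counts["B"] * (b + o) / (b + 2 * o) + counts["AB"] * 0.5
    exp_o = (counts["A"] * o / (a + 2 * o)
             + counts["B"] * o / (b + 2 * o)
             + counts["O"])
    return exp_a, exp_b, exp_o

def m_step(expected, n):
    """New parameters: expected counts divided by the sample size."""
    return tuple(e / n for e in expected)

n = sum(COUNTS.values())
expected = e_step((0.2, 0.2, 0.6), COUNTS)
theta1 = m_step(expected, n)
```

The expected counts sum to n = 400, so the new parameters sum to 1 without any extra normalization.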

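12 EM algorithm – ABO example
$\Theta^1 = \{0.205, 0.348, 0.447\}$, n = 400 (data size).
Data type (#people): A 100, B 200, AB 50, O 50.
E-step (2nd iteration), with one term per phenotype in the order A, B, AB, O:
$E[\#(a)] = 100 \cdot \frac{0.205 + 0.447}{0.205 + 2 \cdot 0.447} + 200 \cdot 0 + 50 \cdot \frac{1}{2} + 50 \cdot 0 \approx 84$
$E[\#(b)] = 100 \cdot 0 + 200 \cdot \frac{0.348 + 0.447}{0.348 + 2 \cdot 0.447} + 50 \cdot \frac{1}{2} + 50 \cdot 0 \approx 153$
$E[\#(o)] = 100 \cdot \frac{0.447}{0.205 + 2 \cdot 0.447} + 200 \cdot \frac{0.447}{0.348 + 2 \cdot 0.447} + 50 \cdot 0 + 50 \cdot 1 \approx 163$
M-step (2nd iteration): divide each expected count by n = 400, giving $\Theta^2 \approx \{0.21, 0.38, 0.41\}$.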

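13 EM algorithm – ABO example
Data type (#people): A $n_A$, B $n_B$, AB $n_{AB}$, O $n_O$, with $n = n_A + n_B + n_{AB} + n_O$.
E-step + M-step: general update formula:
$\theta'_a = \frac{1}{n} \left[ n_A \frac{\theta_a + \theta_o}{\theta_a + 2\theta_o} + \frac{n_{AB}}{2} \right]$
$\theta'_b = \frac{1}{n} \left[ n_B \frac{\theta_b + \theta_o}{\theta_b + 2\theta_o} + \frac{n_{AB}}{2} \right]$
$\theta'_o = \frac{1}{n} \left[ n_A \frac{\theta_o}{\theta_a + 2\theta_o} + n_B \frac{\theta_o}{\theta_b + 2\theta_o} + n_O \right]$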
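Iterating the general update until the parameters stop changing gives a complete EM loop. This is a sketch under assumptions: the stopping rule (largest parameter change below `tol`) and the `max_iter` cap are choices made here, not taken from the slides:

```python
def em_update(theta, n_a, n_b, n_ab, n_o):
    """One combined E-step + M-step for the ABO allele frequencies."""
    a, b, o = theta
    n = n_a + n_b + n_ab + n_o
    new_a = (n_a * (a + o) / (a + 2 * o) + n_ab / 2) / n
    new_b = (n_b * (b + o) / (b + 2 * o) + n_ab / 2) / n
    new_o = (n_a * o / (a + 2 * o) + n_b * o / (b + 2 * o) + n_o) / n
    return new_a, new_b, new_o

def run_em(theta, counts, tol=1e-10, max_iter=1000):
    """Repeat the update until no parameter moves by more than tol."""
    for iteration in range(1, max_iter + 1):
        new_theta = em_update(theta, *counts)
        if max(abs(x - y) for x, y in zip(new_theta, theta)) < tol:
            return new_theta, iteration
        theta = new_theta
    return theta, max_iter

# Counts in the order (n_A, n_B, n_AB, n_O), starting from the initial guess.
theta_hat, iterations = run_em((0.2, 0.2, 0.6), (100, 200, 50, 50))
```

EM only guarantees convergence to a local maximum in general, so in practice it can be worth trying several starting points.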

14 EM algorithm – ABO example
Data type (#people): A 100, B 200, AB 50, O 50.
[Figure: $\theta_a, \theta_b, \theta_o$ plotted against the learning iteration; surviving labels: 0.20, 0.38, 0.42.]

15 EM algorithm – ABO example
Data type (#people): A 100, B 200, AB 50, O 50.
[Figure: $\theta_a, \theta_b, \theta_o$ plotted against the learning iteration; surviving labels: 0.20, 0.38, 0.42.]
Good convergence.
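One way to check convergence without the plot is to track the log-likelihood of the phenotype counts across iterations, since EM never decreases it. The sketch below assumes the Hardy-Weinberg phenotype probabilities and the allele-count update of this example; the number of iterations (20) is arbitrary:

```python
import math

COUNTS = {"A": 100, "B": 200, "AB": 50, "O": 50}

def log_likelihood(theta, counts):
    """Log-likelihood of the phenotype counts under Hardy-Weinberg."""
    a, b, o = theta
    probs = {"A": a * a + 2 * a * o, "B": b * b + 2 * b * o,
             "AB": 2 * a * b, "O": o * o}
    return sum(counts[p] * math.log(probs[p]) for p in counts)

def em_update(theta, counts):
    """One E-step + M-step on the allele frequencies."""
    a, b, o = theta
    n = sum(counts.values())
    new_a = (counts["A"] * (a + o) / (a + 2 * o) + counts["AB"] / 2) / n
    new_b = (counts["B"] * (b + o) / (b + 2 * o) + counts["AB"] / 2) / n
    new_o = (counts["A"] * o / (a + 2 * o)
             + counts["B"] * o / (b + 2 * o) + counts["O"]) / n
    return new_a, new_b, new_o

theta = (0.2, 0.2, 0.6)
trace = [log_likelihood(theta, COUNTS)]
for _ in range(20):
    theta = em_update(theta, COUNTS)
    trace.append(log_likelihood(theta, COUNTS))
```

A flat tail in `trace` is a practical sign that the iteration has settled.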

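16 Gene Counting
Current formulation: the hidden variables correspond to single allele genotypes.
Gene counting: the hidden variables correspond to whole locus genotypes.
If we knew the numbers of locus genotypes $n_{a/a}, n_{a/o}, n_{a/b}, n_{b/b}, n_{b/o}, n_{o/o}$, we could estimate all parameters by counting alleles among the 2n copies:
$\theta_a = \frac{2n_{a/a} + n_{a/o} + n_{a/b}}{2n}$, $\theta_b = \frac{2n_{b/b} + n_{b/o} + n_{a/b}}{2n}$, $\theta_o = \frac{2n_{o/o} + n_{a/o} + n_{b/o}}{2n}$
Instead, we estimate the expected values of these counts given some initial $\Theta$. Two of them are observed directly: $n_{a/b} = n_{AB}$ and $n_{o/o} = n_O$.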

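17 E-step calculations
E-step: compute the joint probabilities $\Pr[X_i = x, p_i]$ of the hidden locus genotype x and the observed phenotype $p_i$:
- $\Pr[X = a/b, P = AB] = 2\theta_a\theta_b$
- $\Pr[X = o/o, P = O] = \theta_o^2$
- $\Pr[X = a/o, P = A] = 2\theta_o\theta_a$
- $\Pr[X = a/a, P = A] = \theta_a^2$
- $\Pr[X = b/o, P = B] = 2\theta_o\theta_b$
- $\Pr[X = b/b, P = B] = \theta_b^2$
All other combinations have probability 0. Normalizing by the phenotype probability gives $\Pr[X_i = x \mid p_i]$:
- $\Pr[X = a/b \mid P = AB] = 1$ and $\Pr[X = o/o \mid P = O] = 1$
- $\Pr[X = a/a \mid P = A] = \theta_a/(\theta_a + 2\theta_o)$ and $\Pr[X = a/o \mid P = A] = 2\theta_o/(\theta_a + 2\theta_o)$
- $\Pr[X = b/b \mid P = B] = \theta_b/(\theta_b + 2\theta_o)$ and $\Pr[X = b/o \mid P = B] = 2\theta_o/(\theta_b + 2\theta_o)$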

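18 Gene Counting
EM algorithm for ABO:
E-step:
$E[n_{a/a}] = n_A \frac{\theta_a}{\theta_a + 2\theta_o}$, $E[n_{a/o}] = n_A \frac{2\theta_o}{\theta_a + 2\theta_o}$
$E[n_{b/b}] = n_B \frac{\theta_b}{\theta_b + 2\theta_o}$, $E[n_{b/o}] = n_B \frac{2\theta_o}{\theta_b + 2\theta_o}$
$n_{a/b} = n_{AB}$, $n_{o/o} = n_O$
M-step: same as slide 13.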
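A minimal sketch of one gene-counting iteration, assuming the expected locus genotype counts are obtained by splitting $n_A$ and $n_B$ in proportion to the Hardy-Weinberg genotype probabilities, followed by allele counting over the 2n copies. The function name is chosen here for illustration:

```python
def gene_counting_step(theta, n_a, n_b, n_ab, n_o):
    """One EM iteration with whole locus genotypes as the hidden variables."""
    a, b, o = theta
    n = n_a + n_b + n_ab + n_o
    # E-step: expected locus genotype counts given the phenotype counts.
    n_aa = n_a * a / (a + 2 * o)
    n_ao = n_a * 2 * o / (a + 2 * o)
    n_bb = n_b * b / (b + 2 * o)
    n_bo = n_b * 2 * o / (b + 2 * o)
    # M-step: count alleles among the 2n copies (a/b and o/o are observed).
    new_a = (2 * n_aa + n_ao + n_ab) / (2 * n)
    new_b = (2 * n_bb + n_bo + n_ab) / (2 * n)
    new_o = (2 * n_o + n_ao + n_bo) / (2 * n)
    return new_a, new_b, new_o

theta1 = gene_counting_step((0.2, 0.2, 0.6), 100, 200, 50, 50)
```

Starting from the initial guess {0.2, 0.2, 0.6} with the example data, this step lands on the same values as the single-allele update of slide 13, which is what "same as slide 13" suggests.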

