Learning – EM in the ABO Locus (Tutorial #8) © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.

Slide 1: Learning – EM in the ABO Locus (Tutorial #8). © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.

Slide 2: Genotype statistics
Mendelian genetics:
- A locus is a particular location on a chromosome (genome).
- Each locus has two copies, called alleles (one paternal and one maternal).
- Each copy can be in one of several relevant states, the allele genotypes.
- The locus genotype is determined by the combined genotype of both copies.
- The locus genotype yields the phenotype (physical features).
We wish to estimate the distribution of all possible genotypes. Suppose we randomly sample N individuals and count N_{s/t}, the number of individuals with locus genotype s/t. The MLE is given by the empirical frequencies:
θ_{s/t} = N_{s/t} / N
Sampling genotypes is costly; sampling phenotypes is cheap.
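If genotype samples were available, this MLE would be a one-liner. A minimal Python sketch (ours, not from the slides), with hypothetical counts for N = 10 individuals:

```python
def genotype_mle(genotype_counts):
    """MLE of genotype frequencies: observed count / sample size."""
    n = sum(genotype_counts.values())
    return {g: c / n for g, c in genotype_counts.items()}

# hypothetical sample of N = 10 individuals:
print(genotype_mle({"a/a": 1, "a/o": 2, "b/o": 3, "b/b": 1, "a/b": 1, "o/o": 2}))
```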

Slide 3: Example: The ABO locus
The ABO locus determines blood type. It has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}, which lead to four possible phenotypes: {A, B, AB, O}. We wish to estimate the proportions of the 6 genotypes in a population.
- Sampling a genotype requires sequencing a genomic region.
- Sampling a phenotype requires only checking the presence of antibodies (a simple blood test).
Problem: the phenotype does not reveal the genotype (in the case of A and B).

Slide 4: Example: The ABO locus
Problem: the phenotype does not reveal the genotype. Assuming allele genotypes are distributed independently with probabilities θ_a, θ_b, θ_o (Hardy-Weinberg equilibrium), we obtain the probabilities of the locus genotypes:
θ_{a/b} = 2θ_aθ_b ;  θ_{a/o} = 2θ_aθ_o ;  θ_{b/o} = 2θ_bθ_o
θ_{a/a} = θ_a² ;  θ_{b/b} = θ_b² ;  θ_{o/o} = θ_o²
Θ – the model parameter set: Θ = {θ_a, θ_b, θ_o}
X – the (hidden) genotype variable: Pr[X = x | Θ] = θ_x
P – the (observed) phenotype variable: Pr[P = p | Θ] = Σ_{x consistent with p} θ_x
e.g. Pr[P = A | Θ] = θ_{a/a} + θ_{a/o} = θ_a² + 2θ_aθ_o
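To make the Hardy-Weinberg mapping from allele frequencies to phenotype probabilities concrete, a minimal Python sketch (function and variable names are ours, not from the slides):

```python
def phenotype_probs(theta_a, theta_b, theta_o):
    """Phenotype distribution under Hardy-Weinberg equilibrium."""
    assert abs(theta_a + theta_b + theta_o - 1.0) < 1e-9
    return {
        "A":  theta_a**2 + 2 * theta_a * theta_o,  # genotype a/a or a/o
        "B":  theta_b**2 + 2 * theta_b * theta_o,  # genotype b/b or b/o
        "AB": 2 * theta_a * theta_b,               # genotype a/b
        "O":  theta_o**2,                          # genotype o/o
    }

# e.g. phenotype_probs(0.2, 0.2, 0.6)
#   -> {'A': 0.28, 'B': 0.28, 'AB': 0.08, 'O': 0.36}
```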

Slide 5: Example: The ABO locus
Given a population phenotype sample, Data = {B, A, B, B, O, A, B, A, O, B, AB}, the likelihood of our parameter set Θ = {θ_a, θ_b, θ_o} is (one factor per observation, grouped by phenotype; A appears 3 times, B 5 times, O twice, and AB once):
L(Θ) = Pr[Data | Θ] = (θ_a² + 2θ_aθ_o)³ · (θ_b² + 2θ_bθ_o)⁵ · (θ_o²)² · (2θ_aθ_b)
The maximum of this function yields the MLE; we use EM to obtain it.
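A short sketch of this likelihood computation in Python (working in log-space for numerical stability; names are ours):

```python
import math

def log_likelihood(counts, ta, tb, to):
    """Log-likelihood of phenotype counts under Theta = {ta, tb, to}."""
    probs = {"A": ta * ta + 2 * ta * to, "B": tb * tb + 2 * tb * to,
             "AB": 2 * ta * tb, "O": to * to}
    return sum(n * math.log(probs[p]) for p, n in counts.items())

# The sample {B,A,B,B,O,A,B,A,O,B,AB} as counts:
counts = {"A": 3, "B": 5, "AB": 1, "O": 2}
print(log_likelihood(counts, 0.2, 0.2, 0.6))  # evaluate one candidate Theta
```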

Slide 6: EM algorithm
Start with some initial set of parameters Θ. Iterate until convergence:
- E-step: calculate the expectations of the hidden variables implied by the data and the current Θ.
- M-step: use these expectations as sufficient statistics to obtain the MLE Θ' given Θ.
Here the hidden variables are the allele genotypes. If we knew the count of each allele genotype, we could calculate the MLE Θ = {θ_a, θ_b, θ_o} directly. In the M-step we therefore use the expected counts of the allele genotypes (given Θ). A generic sketch of the loop appears below.
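As a bridge to the ABO instantiation on the following slides, the generic shape of the loop in Python (a sketch; `e_step` and `m_step` are hypothetical placeholders, instantiated for ABO after slide 13):

```python
def em(theta, data, e_step, m_step, iters=20):
    """Generic EM skeleton: alternate expected statistics and re-estimation.
    e_step and m_step are placeholder callables supplied by the model."""
    for _ in range(iters):
        stats = e_step(theta, data)  # E-step: expected sufficient statistics
        theta = m_step(stats)        # M-step: MLE of theta from those statistics
    return theta
```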

Slide 7: EM algorithm – ABO example
E-step: E[#(x)] is the expected number of occurrences of allele genotype x (in, say, the maternal allele of each locus). If the dataset has n phenotypes p_1 … p_n, then
#(x) = Σ_i 1(X_i = x)
where 1(·) is an indicator, X_i is the hidden genotype and p_i the observed phenotype of individual i. By linearity of expectation:
E[#(x)] = Σ_i E[1(X_i = x)] = Σ_i Pr[X_i = x | p_i]
M-step: θ_x ← E[#(x)] / n

Slide 8: E-step calculations
E-step: compute Pr[X_i = x, p_i] for each hidden genotype x (of, say, the paternal allele) and observed phenotype p_i:

P = AB:  Pr[X=o, P=AB] = 0        Pr[X=a, P=AB] = θ_aθ_b         Pr[X=b, P=AB] = θ_bθ_a
P = O:   Pr[X=o, P=O]  = θ_o²     Pr[X=a, P=O]  = 0              Pr[X=b, P=O]  = 0
P = A:   Pr[X=o, P=A]  = θ_oθ_a   Pr[X=a, P=A]  = θ_a(θ_a+θ_o)   Pr[X=b, P=A]  = 0
P = B:   Pr[X=o, P=B]  = θ_oθ_b   Pr[X=a, P=B]  = 0              Pr[X=b, P=B]  = θ_b(θ_b+θ_o)

Normalizing each row gives the conditionals Pr[X_i = x | p_i]; e.g. for P = AB these are (0, ½, ½), and for P = O they are (1, 0, 0).
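The following Python sketch (ours, following the table above) computes these posteriors and the expected allele counts E[#(x)] of slide 7:

```python
def allele_posteriors(phenotype, ta, tb, to):
    """Pr[X = x | phenotype] for a single allele copy, x in {a, b, o}."""
    joint = {  # Pr[X = x, P = phenotype], from the table above
        "AB": {"a": ta * tb, "b": tb * ta, "o": 0.0},
        "O":  {"a": 0.0, "b": 0.0, "o": to * to},
        "A":  {"a": ta * (ta + to), "b": 0.0, "o": to * ta},
        "B":  {"a": 0.0, "b": tb * (tb + to), "o": to * tb},
    }[phenotype]
    z = sum(joint.values())  # = Pr[P = phenotype]
    return {x: p / z for x, p in joint.items()}

def expected_allele_counts(counts, ta, tb, to):
    """E[#(x)] = sum_i Pr[X_i = x | p_i], accumulated per phenotype class."""
    tot = {"a": 0.0, "b": 0.0, "o": 0.0}
    for phen, n_phen in counts.items():
        post = allele_posteriors(phen, ta, tb, to)
        for x in tot:
            tot[x] += n_phen * post[x]
    return tot

# With the data of slide 9 and Theta0 = {0.2, 0.2, 0.6}:
counts = {"A": 100, "B": 200, "AB": 50, "O": 50}
print(expected_allele_counts(counts, 0.2, 0.2, 0.6))
# ~ {'a': 82.1, 'b': 139.3, 'o': 178.6} -- the E-step numbers of slides 10-11
```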

Slide 9: EM algorithm – ABO example
Θ = {θ_a, θ_b, θ_o} is the parameter set we need to estimate. Our initial guess is Θ⁰ = {0.2, 0.2, 0.6}.

Data type   #people
A           100
B           200
AB           50
O            50

Slide 10: EM algorithm – ABO example
Θ⁰ = {0.2, 0.2, 0.6}, n = 400 (data size). Data: 100 A, 200 B, 50 AB, 50 O.
E-step (1st iteration), using the table of slide 8 with Θ⁰:
P = A:   Pr[X=a | A] = (θ_a+θ_o)/(θ_a+2θ_o) = 0.8/1.4 ≈ 0.571 ;  Pr[X=o | A] = θ_o/(θ_a+2θ_o) ≈ 0.429
P = B:   Pr[X=b | B] = 0.8/1.4 ≈ 0.571 ;  Pr[X=o | B] ≈ 0.429
P = AB:  Pr[X=a | AB] = Pr[X=b | AB] = 0.5
P = O:   Pr[X=o | O] = 1

Slide 11: EM algorithm – ABO example
Θ⁰ = {0.2, 0.2, 0.6}, n = 400 (data size). Data: 100 A, 200 B, 50 AB, 50 O.
E-step (1st iteration), expected allele counts:
E[#(a)] = 100·0.571 + 50·0.5 ≈ 82.1
E[#(b)] = 200·0.571 + 50·0.5 ≈ 139.3
E[#(o)] = 100·0.429 + 200·0.429 + 50·1 ≈ 178.6
(the three sum to n = 400)
M-step (1st iteration): θ_x ← E[#(x)] / n:
Θ¹ = {82.1/400, 139.3/400, 178.6/400} ≈ {0.205, 0.348, 0.447}

Slide 12: EM algorithm – ABO example
Θ¹ = {0.205, 0.348, 0.447}, n = 400 (data size). Data: 100 A, 200 B, 50 AB, 50 O.
E-step (2nd iteration): E[#(a)] ≈ 84.3, E[#(b)] ≈ 153.0, E[#(o)] ≈ 162.7
M-step (2nd iteration): Θ² ≈ {0.211, 0.383, 0.407}

Slide 13: EM algorithm – ABO example
Combining the E-step and M-step yields the general update formula. For data with phenotype counts n_A, n_B, n_AB, n_O (and n = n_A + n_B + n_AB + n_O):
θ_a ← [ n_A·(θ_a+θ_o)/(θ_a+2θ_o) + n_AB/2 ] / n
θ_b ← [ n_B·(θ_b+θ_o)/(θ_b+2θ_o) + n_AB/2 ] / n
θ_o ← [ n_A·θ_o/(θ_a+2θ_o) + n_B·θ_o/(θ_b+2θ_o) + n_O ] / n

Data type   #people
A           n_A
B           n_B
AB          n_AB
O           n_O
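A compact Python sketch of the full loop, folding the E-step into the update formula above (function and variable names are ours):

```python
def em_abo(counts, theta=(0.2, 0.2, 0.6), iters=20):
    """EM for ABO allele frequencies via the general update formula above.
    counts: phenotype counts with keys 'A', 'B', 'AB', 'O'."""
    n = sum(counts.values())
    ta, tb, to = theta
    for _ in range(iters):
        # simultaneous update: the RHS uses the current (ta, tb, to)
        ta, tb, to = (
            (counts["A"] * (ta + to) / (ta + 2 * to) + counts["AB"] / 2) / n,
            (counts["B"] * (tb + to) / (tb + 2 * to) + counts["AB"] / 2) / n,
            (counts["A"] * to / (ta + 2 * to)
             + counts["B"] * to / (tb + 2 * to)
             + counts["O"]) / n,
        )
    return ta, tb, to

# The slides' data: 100 A, 200 B, 50 AB, 50 O; Theta0 = {0.2, 0.2, 0.6}
print(em_abo({"A": 100, "B": 200, "AB": 50, "O": 50}))
```

The first iteration reproduces Θ¹ ≈ {0.205, 0.348, 0.447} from slide 11. In practice one would stop when the change in Θ (or in the log-likelihood) drops below a tolerance rather than after a fixed number of iterations.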

Slide 14: EM algorithm – ABO example
[Plot: the estimates θ_a, θ_b, θ_o against the learning iteration, for the data above (100 A, 200 B, 50 AB, 50 O).]

Slide 15: EM algorithm – ABO example
[Plot: θ_a, θ_b, θ_o against the learning iteration over a longer run, showing good convergence.]

Slide 16: Gene counting
In the current formulation the hidden variable corresponds to a single allele genotype; in gene counting it corresponds to the whole locus genotype. If we knew the locus-genotype counts n_{a/a}, n_{a/o}, n_{a/b}, n_{b/b}, n_{b/o}, n_{o/o}, we could estimate all parameters directly (each of the n individuals contributes two alleles, hence the 2n):
θ_a = (2·n_{a/a} + n_{a/o} + n_{a/b}) / 2n
θ_b = (2·n_{b/b} + n_{b/o} + n_{a/b}) / 2n
θ_o = (2·n_{o/o} + n_{a/o} + n_{b/o}) / 2n
Note that n_{a/b} = n_AB and n_{o/o} = n_O, since the AB and O phenotypes determine the genotype. Instead of observing these counts, we estimate their expected values given some initial Θ.

Slide 17: E-step calculations
E-step: compute Pr[X_i = x, p_i], where the hidden genotype x is now a whole locus genotype and p_i is the observed phenotype:
Pr[X = a/b, P = AB] = 2θ_aθ_b   →  Pr[X = a/b | P = AB] = 1
Pr[X = o/o, P = O]  = θ_o²      →  Pr[X = o/o | P = O] = 1
Pr[X = a/a, P = A]  = θ_a²  ;  Pr[X = a/o, P = A] = 2θ_aθ_o
Pr[X = b/b, P = B]  = θ_b²  ;  Pr[X = b/o, P = B] = 2θ_bθ_o
Normalizing per phenotype gives Pr[X_i = x | p_i], e.g. Pr[X = a/a | P = A] = θ_a/(θ_a+2θ_o).

Slide 18: Gene counting
The EM algorithm for ABO in the gene-counting formulation:
E-step: expected locus-genotype counts given Θ:
E[n_{a/a}] = n_A·θ_a/(θ_a+2θ_o) ;  E[n_{a/o}] = n_A·2θ_o/(θ_a+2θ_o)
E[n_{b/b}] = n_B·θ_b/(θ_b+2θ_o) ;  E[n_{b/o}] = n_B·2θ_o/(θ_b+2θ_o)
E[n_{a/b}] = n_AB ;  E[n_{o/o}] = n_O
M-step: plug these expected counts into the estimates of slide 16. The resulting update is the same as the formula on slide 13.
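A Python sketch of the gene-counting variant (ours); as the slide notes, it performs the same update as `em_abo` above:

```python
def em_abo_gene_counting(counts, theta=(0.2, 0.2, 0.6), iters=20):
    """Gene-counting EM: the hidden variable is the whole locus genotype."""
    n = sum(counts.values())
    ta, tb, to = theta
    for _ in range(iters):
        # E-step: expected locus-genotype counts given the current theta
        n_aa = counts["A"] * ta / (ta + 2 * to)
        n_ao = counts["A"] * 2 * to / (ta + 2 * to)
        n_bb = counts["B"] * tb / (tb + 2 * to)
        n_bo = counts["B"] * 2 * to / (tb + 2 * to)
        n_ab, n_oo = counts["AB"], counts["O"]  # AB and O determine the genotype
        # M-step: allele-counting estimates of slide 16 (2n alleles in total)
        ta = (2 * n_aa + n_ao + n_ab) / (2 * n)
        tb = (2 * n_bb + n_bo + n_ab) / (2 * n)
        to = (2 * n_oo + n_ao + n_bo) / (2 * n)
    return ta, tb, to

print(em_abo_gene_counting({"A": 100, "B": 200, "AB": 50, "O": 50}))
```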