Learning – EM in the ABO Locus. Tutorial #08. © Ydo Wexler & Dan Geiger


2 Example: The ABO locus
A locus is a particular place on the chromosome. Each locus’s state (called its genotype) consists of two alleles – one paternal and one maternal. Some loci (plural of locus) determine distinguishing features; the ABO locus, for example, determines blood type. It has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}: the first two determine blood type A, the next two blood type B, then blood type AB, and finally blood type O. We wish to estimate the proportions of the six genotypes in the population.
Suppose we randomly sampled N individuals and found that N_{a/a} have genotype a/a, N_{a/b} have genotype a/b, etc. Then the MLE is given by the observed frequencies:
θ_{a/a} = N_{a/a}/N,  θ_{a/o} = N_{a/o}/N,  …,  θ_{o/o} = N_{o/o}/N
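For instance, if 60 of N = 300 sampled individuals were found to have genotype a/o, the estimate would be θ_{a/o} = 60/300 = 0.2.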

3 The ABO locus (cont.)
However, testing individuals for their genotype is very expensive. Can we estimate the genotype proportions using the common, cheap blood test, whose outcome is one of the four blood types (A, B, AB, O)?
The problem is that among individuals measured to have blood type A, we don’t know how many have genotype a/a and how many have genotype a/o. So what can we do?
We use the Hardy-Weinberg equilibrium rule, which tells us that in equilibrium the frequencies of the three alleles θ_a, θ_b, θ_o in the population determine the genotype frequencies as follows:
θ_{a/b} = 2θ_a·θ_b,  θ_{a/o} = 2θ_a·θ_o,  θ_{b/o} = 2θ_b·θ_o,  θ_{a/a} = θ_a²,  θ_{b/b} = θ_b²,  θ_{o/o} = θ_o²
So now we have only three parameters to estimate.
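For example, with θ_a = θ_b = 0.2 and θ_o = 0.6, the genotype frequencies are θ_{a/a} = θ_{b/b} = 0.04, θ_{a/b} = 0.08, θ_{a/o} = θ_{b/o} = 0.24, and θ_{o/o} = 0.36, which sum to 1.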

4 The Likelihood Function
Let X be a random variable with six values x_{a/a}, x_{a/o}, x_{b/b}, x_{b/o}, x_{a/b}, x_{o/o} denoting the six genotypes. The parameters are θ = {θ_a, θ_b, θ_o}. The probability P(X = x_{a/b} | θ) = 2θ_a·θ_b, the probability P(X = x_{o/o} | θ) = θ_o·θ_o, and so on for the other four genotypes.
What is the probability of Data = {B, A, B, B, O, A, B, A, O, B, AB}? Each blood type sums the probabilities of its genotypes, e.g. P(A | θ) = θ_a² + 2θ_a·θ_o, so
P(Data | θ) = (θ_a² + 2θ_a·θ_o)³ · (θ_b² + 2θ_b·θ_o)⁵ · (θ_o²)² · (2θ_a·θ_b)
Obtaining the maximum of this function yields the MLE. This can be done with a multidimensional Newton’s algorithm.
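To make this concrete, here is a small Python sketch (not part of the original slides) that maximizes this likelihood numerically; the softmax reparametrization, which keeps (θ_a, θ_b, θ_o) on the probability simplex, is a choice made here rather than the Newton method mentioned above:

    # Sketch: direct numerical ML for the ABO likelihood above.
    import numpy as np
    from scipy.optimize import minimize

    # Counts in Data = {B,A,B,B,O,A,B,A,O,B,AB}
    n_A, n_B, n_O, n_AB = 3, 5, 2, 1

    def neg_log_likelihood(u):
        e = np.exp(u - u.max())          # softmax: map R^3 onto the simplex
        ta, tb, to = e / e.sum()
        return -(n_A * np.log(ta**2 + 2*ta*to)
                 + n_B * np.log(tb**2 + 2*tb*to)
                 + n_O * np.log(to**2)
                 + n_AB * np.log(2*ta*tb))

    res = minimize(neg_log_likelihood, x0=np.zeros(3), method="Nelder-Mead")
    e = np.exp(res.x - res.x.max())
    print(e / e.sum())                   # ML estimates of (theta_a, theta_b, theta_o)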

5 The EM algorithm in Bayes’ nets
E-step: go over the data; for each data element, compute and sum the expectations of the hidden variables given that element.
M-step: for every hidden variable x, update your belief (the parameters) according to the expectations you calculated in the last E-step.
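In code, this loop has roughly the following shape (a generic sketch; the callback names, tolerance, and iteration cap are illustrative, not from the slides):

    # Generic EM skeleton: e_step and m_step are problem-specific callbacks.
    def em(data, theta, e_step, m_step, tol=1e-4, max_iter=100):
        for _ in range(max_iter):
            expectations = e_step(data, theta)  # expected hidden-variable counts
            new_theta = m_step(expectations)    # re-estimate the parameters
            if max(abs(n - o) for n, o in zip(new_theta, theta)) < tol:
                return new_theta                # small enough change: converged
            theta = new_theta
        return theta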

6 EM - ABO Example
Hidden variable: the allele, a/b/o. Observed variable: the blood type, A/B/AB/O.

    blood type | #people
    A          | 100
    B          | 200
    AB         | 50
    O          | 50

θ = {θ_a, θ_b, θ_o} is the parameter set we need to estimate. We choose a “reasonable” starting point θ⁰ = {0.2, 0.2, 0.6}.

7 EM - ABO Example
Let Y_i denote the phenotype (observed blood type) of person i, and let X_i denote the hidden genotype in one locus (paternal).
E-step: for every person i and every allele x ∈ {a, b, o}, compute P(X_i = x | Y_i, θ).
M-step: set each allele frequency to its normalized expected count, θ_x ← (1/n) Σ_i P(X_i = x | Y_i, θ).

8 EM - ABO Example
E-step: we compute all the necessary elements – the posterior of the hidden allele given each observed blood type:
P(X = a | A) = (θ_a² + θ_a·θ_o) / (θ_a² + 2θ_a·θ_o) = (θ_a + θ_o) / (θ_a + 2θ_o),  P(X = o | A) = θ_o / (θ_a + 2θ_o)
P(X = b | B) = (θ_b + θ_o) / (θ_b + 2θ_o),  P(X = o | B) = θ_o / (θ_b + 2θ_o)
P(X = a | AB) = P(X = b | AB) = 1/2,  P(X = o | O) = 1
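The whole iteration on the following slides can be reproduced with a short Python script (a sketch based on the formulas above; the 1e-3 stopping threshold is an assumption):

    # EM for the ABO data, tracking one hidden (paternal) allele per person.
    counts = {"A": 100, "B": 200, "AB": 50, "O": 50}
    n = sum(counts.values())                      # 400
    theta = {"a": 0.2, "b": 0.2, "o": 0.6}        # theta^0

    for step in range(1, 20):
        ta, tb, to = theta["a"], theta["b"], theta["o"]
        # E-step: expected number of people whose hidden allele is a, b, o.
        exp_a = counts["A"] * (ta + to) / (ta + 2*to) + counts["AB"] * 0.5
        exp_b = counts["B"] * (tb + to) / (tb + 2*to) + counts["AB"] * 0.5
        exp_o = (counts["A"] * to / (ta + 2*to)
                 + counts["B"] * to / (tb + 2*to)
                 + counts["O"])
        # M-step: normalize the expected allele counts.
        new = {"a": exp_a / n, "b": exp_b / n, "o": exp_o / n}
        delta = max(abs(new[x] - theta[x]) for x in new)
        theta = new
        if delta < 1e-3:
            break

    print(step, theta)   # converges near theta = {a: 0.21, b: 0.38, o: 0.41}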

9-11 EM - ABO Example
θ⁰ = {0.2, 0.2, 0.6}, n = 400 (data size)

    blood type | #people
    A          | 100
    B          | 200
    AB         | 50
    O          | 50

E-step (1st step): with θ⁰,
P(X = a | A) = 0.8/1.4 ≈ 0.57,  P(X = o | A) = 0.6/1.4 ≈ 0.43
P(X = b | B) ≈ 0.57,  P(X = o | B) ≈ 0.43
P(X = a | AB) = P(X = b | AB) = 0.5,  P(X = o | O) = 1
Expected allele counts: E[#a] = 100·0.57 + 50·0.5 ≈ 82.1,  E[#b] = 200·0.57 + 50·0.5 ≈ 139.3,  E[#o] = 100·0.43 + 200·0.43 + 50 ≈ 178.6

12 EM - ABO Example
M-step (1st step): normalize the expected allele counts by n = 400:
θ¹_a = 82.1/400 ≈ 0.2,  θ¹_b = 139.3/400 ≈ 0.35,  θ¹_o = 178.6/400 ≈ 0.45
so θ¹ = {0.2, 0.35, 0.45}.

13 EM - ABO Example
E-step (2nd step): repeat the E-step computation, now with θ¹ = {0.2, 0.35, 0.45}.

14 EM - ABO Example
M-step (2nd step): normalizing the new expected counts from θ¹ = {0.2, 0.35, 0.45} yields θ² = {0.21, 0.38, 0.41}.

15 EM - ABO Example
E-step (3rd step): repeat the E-step computation, now with θ² = {0.21, 0.38, 0.41}.

16 EM - ABO Example
M-step (3rd step): the update from θ² = {0.21, 0.38, 0.41} is a small enough change, so we stop.

17 Gene Counting
Until this point we looked at only one of the two alleles of each individual. It is possible to use both alleles, with some modifications to the process. The next two slides describe this method.

18 Gene Counting
Had we known the counts n_{a/a} and n_{a/o} (the two genotypes behind blood type A), we could have estimated θ_a from n individuals (2n alleles) as follows, and similarly θ_b and θ_o:
θ_a = (2·n_{a/a} + n_{a/o} + n_{a/b}) / 2n
Can we compute what n_{a/a} and n_{a/o} are expected to be? Using the current estimates of θ_a and θ_o we can, as follows:
E[n_{a/a}] = n_A · θ_a² / (θ_a² + 2θ_a·θ_o),  E[n_{a/o}] = n_A · 2θ_a·θ_o / (θ_a² + 2θ_a·θ_o)
We repeat these two steps until the parameters converge.
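For example, with θ_a = 0.2, θ_o = 0.6 and n_A = 100: E[n_{a/a}] = 100 · 0.04/0.28 ≈ 14.3 and E[n_{a/o}] ≈ 85.7, so with n_{a/b} = 50 and 2n = 800 alleles the updated estimate is θ_a = (2·14.3 + 85.7 + 50)/800 ≈ 0.205.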

19 Gene Counting (an example of EM)
Input: counts n_A, n_B, n_AB, n_O of each blood type among n people.
Desired output: ML estimates of the allele frequencies θ_a, θ_b, θ_o.
Initialization: set θ_a, θ_b, and θ_o to arbitrary values (say, 1/3 each).
Repeat:
  E-step (expectation): compute the expected genotype counts E[n_{a/a}], E[n_{a/o}], E[n_{b/b}], E[n_{b/o}] from the current parameters, as on the previous slide.
  M-step (maximization): re-estimate θ_a, θ_b, θ_o by allele counting over these expected counts.
Until θ_a, θ_b, and θ_o converge.
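A runnable Python sketch of this procedure (the function name and the 1e-4 convergence threshold are choices of this sketch, not from the slides):

    # Gene-counting EM: estimate ABO allele frequencies from blood-type counts.
    def gene_counting(nA, nB, nAB, nO, tol=1e-4, max_iter=1000):
        n = nA + nB + nAB + nO
        ta = tb = to = 1.0 / 3.0                       # initialization
        for _ in range(max_iter):
            # E-step: expected genotype counts under current frequencies.
            n_aa = nA * ta**2 / (ta**2 + 2*ta*to)
            n_ao = nA * 2*ta*to / (ta**2 + 2*ta*to)
            n_bb = nB * tb**2 / (tb**2 + 2*tb*to)
            n_bo = nB * 2*tb*to / (tb**2 + 2*tb*to)
            # M-step: count alleles (2n in total) and normalize.
            new_ta = (2*n_aa + n_ao + nAB) / (2*n)
            new_tb = (2*n_bb + n_bo + nAB) / (2*n)
            new_to = (2*nO + n_ao + n_bo) / (2*n)
            if max(abs(new_ta - ta), abs(new_tb - tb), abs(new_to - to)) < tol:
                return new_ta, new_tb, new_to
            ta, tb, to = new_ta, new_tb, new_to
        return ta, tb, to

    print(gene_counting(100, 200, 50, 50))   # ~ (0.21, 0.38, 0.41)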