Linkage Disequilibrium Mapping - Natural Population
Put Markers and Trait Data into the box below, OR

Linkage Disequilibrium Mapping - Natural Population
Initial values of p11, p10, p01:

Linkage Disequilibrium Mapping - Natural Population

Mixture model-based likelihood
[Slide shows an example data table: Sample; Height (cm, y); marker genotypes m1, m2, m3, …]

Association between marker and QTL
- Marker: Prob(M) = p, Prob(m) = 1 - p
- QTL: Prob(Q) = q, Prob(q) = 1 - q

Four haplotypes:
Prob(MQ) = p11 = pq + D
Prob(Mq) = p10 = p(1-q) - D
Prob(mQ) = p01 = (1-p)q - D
Prob(mq) = p00 = (1-p)(1-q) + D

where
p = p11 + p10
q = p11 + p01
D = p11 p00 - p10 p01

Estimate p, q, D AND μ2, μ1, μ0
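As a quick numeric check of these identities (illustrative values, not from the slides), take p = 0.6, q = 0.5, D = 0.05:

p11 = 0.6 × 0.5 + 0.05 = 0.35
p10 = 0.6 × 0.5 - 0.05 = 0.25
p01 = 0.4 × 0.5 - 0.05 = 0.15
p00 = 0.4 × 0.5 + 0.05 = 0.25

The four frequencies sum to 1, p = p11 + p10 = 0.60, q = p11 + p01 = 0.50, and p11 p00 - p10 p01 = 0.0875 - 0.0375 = 0.05 = D, as required.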

Mixture model-based likelihood

L(y, M|Ω) = ∏_{i=1}^{n} [π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]

Sample   Height (cm, y)   Marker genotype M   QTL genotype (prior prob.)
                                              QQ        Qq        qq
1        184              MM (2)              π_{2|i}   π_{1|i}   π_{0|i}
2        185              MM (2)              π_{2|i}   π_{1|i}   π_{0|i}
3        180              Mm (1)              π_{2|i}   π_{1|i}   π_{0|i}
4        182              Mm (1)              π_{2|i}   π_{1|i}   π_{0|i}
5        167              Mm (1)              π_{2|i}   π_{1|i}   π_{0|i}
6        169              Mm (1)              π_{2|i}   π_{1|i}   π_{0|i}
7        165              mm (0)              π_{2|i}   π_{1|i}   π_{0|i}
8        166              mm (0)              π_{2|i}   π_{1|i}   π_{0|i}
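For example, for sample 1 (marker genotype MM, coded 2), the priors come from the conditional table on the next slide:

π_{2|i} = p11²/p²,   π_{1|i} = 2 p11 p10/p²,   π_{0|i} = p10²/p²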

Joint and conditional (π_{j|i}) genotype probabilities between marker and QTL

Joint probabilities:

        QQ            Qq                         qq            Obs
MM      p11²          2 p11 p10                  p10²          n2
Mm      2 p11 p01     2(p11 p00 + p10 p01)       2 p10 p00     n1
mm      p01²          2 p01 p00                  p00²          n0

Dividing the MM, Mm and mm rows by the marker genotype probabilities p², 2p(1-p) and (1-p)², respectively, gives the conditional probabilities π_{j|i}.
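A useful consistency check (simple algebra, not on the original slide): each conditional row sums to 1. For the MM row,

(p11² + 2 p11 p10 + p10²)/p² = (p11 + p10)²/p² = p²/p² = 1,

since p = p11 + p10; the Mm and mm rows sum to 1 in the same way, using 2(p11 + p10)(p01 + p00) = 2p(1-p) and p01 + p00 = 1 - p.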

Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed):

L(y, M|Ω) = ∏_{i=1}^{n} [π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]
          = ∏_{i=1}^{n2} [π_{2|2} f2(yi) + π_{1|2} f1(yi) + π_{0|2} f0(yi)]   conditional on marker genotype 2 (the n2 MM individuals)
          × ∏_{i=1}^{n1} [π_{2|1} f2(yi) + π_{1|1} f1(yi) + π_{0|1} f0(yi)]   conditional on marker genotype 1 (the n1 Mm individuals)
          × ∏_{i=1}^{n0} [π_{2|0} f2(yi) + π_{1|0} f1(yi) + π_{0|0} f0(yi)]   conditional on marker genotype 0 (the n0 mm individuals)

Normal distributions of phenotypic values for each QTL genotype group:

f2(yi) = 1/(2πσ²)^{1/2} exp[-(yi - μ2)²/(2σ²)],   μ2 = μ + a
f1(yi) = 1/(2πσ²)^{1/2} exp[-(yi - μ1)²/(2σ²)],   μ1 = μ + d
f0(yi) = 1/(2πσ²)^{1/2} exp[-(yi - μ0)²/(2σ²)],   μ0 = μ - a
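A minimal VB sketch of this density, mirroring the expression used in the EM program below (the function name and signature are illustrative, not part of the original program):

Function NormalDensity(y As Double, mu As Double, s2 As Double) As Double
    ' f(y) = 1 / Sqr(2*pi*s2) * Exp(-(y - mu)^2 / (2*s2))
    Const PI As Double = 3.14159265358979
    NormalDensity = 1 / Sqr(2 * PI * s2) * Exp(-(y - mu) ^ 2 / (2 * s2))
End Function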

Differentiating the log-likelihood with respect to each unknown parameter, setting the derivatives equal to zero, and solving the resulting log-likelihood equations:

L(y, M|Ω) = ∏_{i=1}^{n} [π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]
log L(y, M|Ω) = Σ_{i=1}^{n} log[π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]

Define the posterior probabilities (Bayes' theorem, with the π_{j|i} as priors):

Π_{2|i} = π_{2|i} f2(yi) / [π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]   (1)
Π_{1|i} = π_{1|i} f1(yi) / [π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]   (2)
Π_{0|i} = π_{0|i} f0(yi) / [π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]   (3)

μ2 = Σ_{i=1}^{n} Π_{2|i} yi / Σ_{i=1}^{n} Π_{2|i}   (4)
μ1 = Σ_{i=1}^{n} Π_{1|i} yi / Σ_{i=1}^{n} Π_{1|i}   (5)
μ0 = Σ_{i=1}^{n} Π_{0|i} yi / Σ_{i=1}^{n} Π_{0|i}   (6)
σ² = (1/n) Σ_{i=1}^{n} [Π_{2|i}(yi - μ2)² + Π_{1|i}(yi - μ1)² + Π_{0|i}(yi - μ0)²]   (7)

Incomplete (observed) data

Posterior probabilities:

        QQ         Qq         qq         Obs
MM      Π_{2|2i}   Π_{1|2i}   Π_{0|2i}   n2
Mm      Π_{2|1i}   Π_{1|1i}   Π_{0|1i}   n1
mm      Π_{2|0i}   Π_{1|0i}   Π_{0|0i}   n0

p11 = 1/(2n) {Σ_{i=1}^{n2} [2Π_{2|2i} + Π_{1|2i}] + Σ_{i=1}^{n1} [Π_{2|1i} + φ Π_{1|1i}]}        (8)
p10 = 1/(2n) {Σ_{i=1}^{n2} [2Π_{0|2i} + Π_{1|2i}] + Σ_{i=1}^{n1} [Π_{0|1i} + (1-φ) Π_{1|1i}]}    (9)
p01 = 1/(2n) {Σ_{i=1}^{n0} [2Π_{2|0i} + Π_{1|0i}] + Σ_{i=1}^{n1} [Π_{2|1i} + (1-φ) Π_{1|1i}]}    (10)
p00 = 1/(2n) {Σ_{i=1}^{n0} [2Π_{0|0i} + Π_{1|0i}] + Σ_{i=1}^{n1} [Π_{0|1i} + φ Π_{1|1i}]}        (11)

where φ = p11 p00 / (p11 p00 + p10 p01) is the probability that a double heterozygote Mm/Qq carries the haplotype pair MQ/mq rather than Mq/mQ; this is the only genotype whose haplotype composition is ambiguous.
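Continuing the illustrative numbers from above (p11 = 0.35, p10 = 0.25, p01 = 0.15, p00 = 0.25):

φ = (0.35 × 0.25) / (0.35 × 0.25 + 0.25 × 0.15) = 0.0875 / 0.1250 = 0.7,

so an Mm/Qq individual contributes 70% of its posterior weight Π_{1|1i} to the coupling haplotypes MQ and mq, and 30% to Mq and mQ.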

EM algorithm
(1) Give initial values Ω(0) = (μ2, μ1, μ0, σ², p11, p10, p01, p00)(0).
(2) Calculate Π_{2|i}(1), Π_{1|i}(1) and Π_{0|i}(1) using Eqs. (1)-(3) (E step).
(3) Calculate Ω(1) from Π_{2|i}(1), Π_{1|i}(1) and Π_{0|i}(1) using Eqs. (4)-(11) (M step).
(4) Repeat (2) and (3) until convergence.

PROGRAM: given initial values μ2, μ1, μ0, σ² (stored as mu(1), mu(2), mu(3), s2) and p11, p10, p01, p00:

Do While (Abs(mu(1) - omu(1)) + Abs(p00 - p00old) > tol)   ' tol = convergence tolerance (its value was lost in transcription)
    kkk = kkk + 1                                          ' count the iterations
    p00old = p00                                           ' keep the old value of p00
    prob(1, 1) = p11 ^ 2 / p ^ 2                           ' prior conditional probability π_{2|2}
    prob(1, 2) = 2 * p11 * p10 / p ^ 2                     ' π_{1|2}
    prob(1, 3) = p10 ^ 2 / p ^ 2                           ' π_{0|2}
    prob(2, 1) = 2 * p11 * p01 / (2 * p * q)               ' π_{2|1}
    prob(2, 2) = 2 * (p11 * p00 + p10 * p01) / (2 * p * q) ' π_{1|1}
    prob(2, 3) = 2 * p10 * p00 / (2 * p * q)               ' π_{0|1}
    prob(3, 1) = p01 ^ 2 / q ^ 2                           ' π_{2|0}
    prob(3, 2) = 2 * p01 * p00 / q ^ 2                     ' π_{1|0}
    prob(3, 3) = p00 ^ 2 / q ^ 2                           ' π_{0|0}

    For j = 1 To 3
        omu(j) = mu(j) : cmu(j) = 0 : cpi(j) = 0 : bpi(j) = 0
        For i = 1 To 3
            nnn(i, j) = 0                                  ' 3-by-3 matrix to store Π_{2|2}, Π_{1|2}, Π_{0|2}, ..., Π_{0|0}
        Next
    Next j
    cs2 = 0
    ll = 0
    For i = 1 To N
        sss = 0
        For j = 1 To 3                                     ' f2(yi), f1(yi), f0(yi)
            f(j) = 1 / Sqr(2 * 3.14159265 * s2) * Exp(-(y(i) - mu(j)) ^ 2 / 2 / s2)
            sss = sss + prob(datas(i, mrk), j) * f(j)      ' [π_{2|i} f2(yi) + π_{1|i} f1(yi) + π_{0|i} f0(yi)]
        Next j
        ll = ll + Log(sss)                                 ' accumulate the log-likelihood
        For j = 1 To 3
            bpi(j) = prob(datas(i, mrk), j) * f(j) / sss   ' FORMULAS (1)-(3)
            cmu(j) = cmu(j) + bpi(j) * datas(i, nmrk)      ' numerator of FORMULAS (4)-(6); datas(i, nmrk) is the trait value y(i)
            cpi(j) = cpi(j) + bpi(j)                       ' denominator of FORMULAS (4)-(6)
            cs2 = cs2 + bpi(j) * (y(i) - mu(j)) ^ 2        ' FORMULA (7)
            nnn(datas(i, mrk), j) = nnn(datas(i, mrk), j) + bpi(j)  ' FORMULAS (8)-(11)
        Next j
    Next i

    ' Update μ2, μ1, μ0: formulas (4)-(6)
    For j = 1 To 3
        mu(j) = cmu(j) / cpi(j)
    Next j
    ' Update σ²: formula (7)
    s2 = cs2 / N
    ' Update p11, p10, p01, p00: formulas (8)-(11)
    phi = p11 * p00 / (p11 * p00 + p10 * p01)
    p11 = (2 * nnn(1, 1) + nnn(1, 2) + nnn(2, 1) + phi * nnn(2, 2)) / 2 / N
    p10 = (2 * nnn(1, 3) + nnn(1, 2) + nnn(2, 3) + (1 - phi) * nnn(2, 2)) / 2 / N
    p01 = (2 * nnn(3, 1) + nnn(2, 1) + nnn(3, 2) + (1 - phi) * nnn(2, 2)) / 2 / N
    p00 = (2 * nnn(3, 3) + nnn(2, 3) + nnn(3, 2) + phi * nnn(2, 2)) / 2 / N
    p = p11 + p10
    q = 1 - p
Loop
LR = 2 * (ll - ll0)   ' likelihood-ratio statistic; ll0 is the log-likelihood under the no-QTL (null) model, computed outside this excerpt
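For reference (standard likelihood-ratio testing, not spelled out on these slides): the existence of a QTL, i.e. H0: D = 0, is tested with

LR = 2 [log L(Ω) - log L(Ω0)],

where both log-likelihoods are evaluated at their respective maximum-likelihood estimates, and LR is compared against a chi-square or permutation-based critical threshold.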

Linkage Disequilibrium Mapping - Natural Population, Binary Trait
Put Markers and Trait Data into the box below, OR

Linkage Disequilibrium Mapping - Natural Population, Binary Trait
Initial values of p11, p10, p01:
Initial values of f2, f1, f0:

Linkage Disequilibrium Mapping - Natural Population, Binary Trait
Initial values of f2, f1, f0:

Linkage Disequilibrium Mapping - Natural Population, Binary Trait

log L(Ω|y) = Σ_{j=0}^{2} Σ_{i=1}^{nj} log[ π_{2|ij} Pr{yij=1|Gij=2, Ω}^{yij} Pr{yij=0|Gij=2, Ω}^{(1-yij)}
                                         + π_{1|ij} Pr{yij=1|Gij=1, Ω}^{yij} Pr{yij=0|Gij=1, Ω}^{(1-yij)}
                                         + π_{0|ij} Pr{yij=1|Gij=0, Ω}^{yij} Pr{yij=0|Gij=0, Ω}^{(1-yij)} ]
           = Σ_{j=0}^{2} Σ_{i=1}^{nj} log[ π_{2|ij} f2^{yij} (1-f2)^{(1-yij)} + π_{1|ij} f1^{yij} (1-f1)^{(1-yij)} + π_{0|ij} f0^{yij} (1-f0)^{(1-yij)} ]

Ω = (p11, p10, p01, p00, f2, f1, f0)   (6 free parameters, since p00 = 1 - p11 - p10 - p01)
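The EM updates keep the same form as in the continuous-trait case; only the M step for the genotype means changes. The penetrance update implied by the program below is

fj = Σ_{i} Π_{j|i} yi / Σ_{i} Π_{j|i},   j = 2, 1, 0,

i.e. the posterior-weighted proportion of affected individuals in each QTL genotype class, while p11, p10, p01 and p00 are updated exactly as in formulas (8)-(11).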

    For j = 1 To 3
        omu(j) = mu(j) : cmu(j) = 0 : cpi(j) = 0 : bpi(j) = 0
        For i = 1 To 3
            nnn(i, j) = 0                                  ' 3-by-3 matrix to store Π_{2|2}, Π_{1|2}, Π_{0|2}, ..., Π_{0|0}
        Next
    Next j
    ll = 0
    For i = 1 To N
        sss = 0
        For j = 1 To 3
            ' Bernoulli penetrance replaces the normal density: mu(j) now holds fj
            f(j) = mu(j) ^ datas(i, nmrk) * (1 - mu(j)) ^ (1 - datas(i, nmrk))
            sss = sss + prob(datas(i, mrk), j) * f(j)
        Next j
        ll = ll + Log(sss)                                 ' accumulate the log-likelihood
        For j = 1 To 3
            bpi(j) = prob(datas(i, mrk), j) * f(j) / sss   ' FORMULAS (1)-(3)
            cmu(j) = cmu(j) + bpi(j) * datas(i, nmrk)      ' numerator of the penetrance update
            cpi(j) = cpi(j) + bpi(j)                       ' denominator of the penetrance update
            nnn(datas(i, mrk), j) = nnn(datas(i, mrk), j) + bpi(j)  ' FORMULAS (8)-(11)
        Next j
    Next i
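The transcript ends before the M-step updates for the binary model. A minimal sketch of what follows, assuming the same structure as the continuous-trait program above:

    ' Update the penetrances f2, f1, f0 (stored in mu(1..3)); there is no σ² update in the binary model
    For j = 1 To 3
        mu(j) = cmu(j) / cpi(j)
    Next j
    ' p11, p10, p01, p00 are then updated from nnn(,) exactly as in formulas (8)-(11)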