Linkage Disequilibrium Mapping - Natural Population


Linkage Disequilibrium Mapping - Natural Population
(Interface screenshot: put markers and trait data into the box below, OR …)

Linkage Disequilibrium Mapping - Natural Population
(Interface screenshot: initial values of p11, p10, p01.)

Mixture model-based likelihood

Sample   Height (cm, y)   m1   m2   m3   …
1        184              1    1    2
2        185              2    2    0
3        180              0    1    1
4        182              1    2    2
5        167              2    0    1
6        169              1    2    1
7        165              2    1    2
8        166              0    0    0

Association between marker and QTL
- Marker: Prob(M) = p, Prob(m) = 1 - p
- QTL: Prob(Q) = q, Prob(q) = 1 - q

Four haplotypes:
Prob(MQ) = p11 = pq + D
Prob(Mq) = p10 = p(1 - q) - D
Prob(mQ) = p01 = (1 - p)q - D
Prob(mq) = p00 = (1 - p)(1 - q) + D

so that p = p11 + p10, q = p11 + p01, and D = p11 p00 - p10 p01.

Estimate p, q, D AND the genotypic means μ2, μ1, μ0.
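These identities are easy to check numerically. The sketch below uses illustrative values of p, q, and D (not estimates from any data set) and verifies that the allele frequencies and the disequilibrium coefficient are recovered from the four haplotype frequencies:

```python
# Numeric check of the haplotype-frequency identities (illustrative p, q, D).
p, q, D = 0.6, 0.4, 0.05
p11 = p * q + D              # Prob(MQ)
p10 = p * (1 - q) - D        # Prob(Mq)
p01 = (1 - p) * q - D        # Prob(mQ)
p00 = (1 - p) * (1 - q) + D  # Prob(mq)

assert abs((p11 + p10) - p) < 1e-12               # p = p11 + p10
assert abs((p11 + p01) - q) < 1e-12               # q = p11 + p01
assert abs((p11 * p00 - p10 * p01) - D) < 1e-12   # D = p11*p00 - p10*p01
```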

Mixture model-based likelihood

L(y, M | Ω) = ∏i=1..n [ω2|i f2(yi) + ω1|i f1(yi) + ω0|i f0(yi)]

Sample   Height (cm, y)   Marker (M)   QQ      Qq      qq     ← prior prob. of QTL genotype
1        184              MM (2)       ω2|i    ω1|i    ω0|i
2        185              MM (2)       ω2|i    ω1|i    ω0|i
3        180              Mm (1)       ω2|i    ω1|i    ω0|i
4        182              Mm (1)       ω2|i    ω1|i    ω0|i
5        167              Mm (1)       ω2|i    ω1|i    ω0|i
6        169              Mm (1)       ω2|i    ω1|i    ω0|i
7        165              mm (0)       ω2|i    ω1|i    ω0|i
8        166              mm (0)       ω2|i    ω1|i    ω0|i

Joint and conditional (j|i) genotype probabilities between marker and QTL

         QQ            Qq                       qq           Obs
MM       p11²          2 p11 p10                p10²         n2
Mm       2 p11 p01     2(p11 p00 + p10 p01)     2 p10 p00    n1
mm       p01²          2 p01 p00                p00²         n0

Conditional probabilities: divide each MM entry by p², each Mm entry by 2p(1-p), and each mm entry by (1-p)².
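The table above translates directly into the 3×3 prior matrix ω. The Python sketch below (with illustrative haplotype frequencies, not fitted values) builds the joint table and divides each row by its marker-genotype frequency; each resulting row is a probability distribution over QQ/Qq/qq:

```python
# Build omega_{j|i} = P(QTL genotype | marker genotype) from haplotype freqs.
# Rows: MM, Mm, mm. Columns: QQ, Qq, qq. Values are illustrative.
p11, p10, p01, p00 = 0.35, 0.25, 0.15, 0.25
p = p11 + p10                                   # marker allele frequency P(M)

joint = [
    [p11**2,     2*p11*p10,               p10**2],     # MM row
    [2*p11*p01,  2*(p11*p00 + p10*p01),   2*p10*p00],  # Mm row
    [p01**2,     2*p01*p00,               p00**2],     # mm row
]
marker_freq = [p**2, 2*p*(1-p), (1-p)**2]       # P(MM), P(Mm), P(mm)

omega = [[cell / mf for cell in row] for row, mf in zip(joint, marker_freq)]
for row in omega:                               # each row sums to 1
    assert abs(sum(row) - 1) < 1e-12
```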

Conditional probabilities of the QTL genotypes (missing) based on marker genotypes (observed)

L(y, M | Ω) = ∏i=1..n [ω2|i f2(yi) + ω1|i f1(yi) + ω0|i f0(yi)]
            = ∏i=1..n2 [ω2|2 f2(yi) + ω1|2 f1(yi) + ω0|2 f0(yi)]   ← conditional on marker class 2 (n2 individuals)
            × ∏i=1..n1 [ω2|1 f2(yi) + ω1|1 f1(yi) + ω0|1 f0(yi)]   ← conditional on marker class 1 (n1 individuals)
            × ∏i=1..n0 [ω2|0 f2(yi) + ω1|0 f1(yi) + ω0|0 f0(yi)]   ← conditional on marker class 0 (n0 individuals)

Normal distributions of phenotypic values for each QTL genotype group

f2(yi) = 1/(2πσ²)^½ exp[-(yi - μ2)²/(2σ²)],   μ2 = μ + a
f1(yi) = 1/(2πσ²)^½ exp[-(yi - μ1)²/(2σ²)],   μ1 = μ + d
f0(yi) = 1/(2πσ²)^½ exp[-(yi - μ0)²/(2σ²)],   μ0 = μ - a
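Written as code (the values of μ, a, d, and σ² below are illustrative assumptions):

```python
# The three component densities, transcribed directly from the formulas above.
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu, a, d, s2 = 175.0, 9.0, 3.0, 25.0
mu2, mu1, mu0 = mu + a, mu + d, mu - a   # means for QQ, Qq, qq

# each density peaks at its own mean, and all share the same peak height
# because they share the same variance sigma^2
assert normal_pdf(mu2, mu2, s2) > normal_pdf(mu2 + 1.0, mu2, s2)
assert abs(normal_pdf(mu1, mu1, s2) - normal_pdf(mu0, mu0, s2)) < 1e-15
```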

Differentiating log L with respect to each unknown parameter, setting the derivatives equal to zero, and solving the log-likelihood equations:

L(y, M | Ω) = ∏i=1..n [ω2|i f2(yi) + ω1|i f1(yi) + ω0|i f0(yi)]
log L(y, M | Ω) = Σi=1..n log[ω2|i f2(yi) + ω1|i f1(yi) + ω0|i f0(yi)]

Define the posterior probabilities
Π2|i = ω2|i f2(yi) / [ω2|i f2(yi) + ω1|i f1(yi) + ω0|i f0(yi)]    (1)
Π1|i = ω1|i f1(yi) / [ω2|i f2(yi) + ω1|i f1(yi) + ω0|i f0(yi)]    (2)
Π0|i = ω0|i f0(yi) / [ω2|i f2(yi) + ω1|i f1(yi) + ω0|i f0(yi)]    (3)

Then the solutions are
μ2 = Σi=1..n (Π2|i yi) / Σi=1..n Π2|i    (4)
μ1 = Σi=1..n (Π1|i yi) / Σi=1..n Π1|i    (5)
μ0 = Σi=1..n (Π0|i yi) / Σi=1..n Π0|i    (6)
σ² = (1/n) Σi=1..n [Π2|i (yi - μ2)² + Π1|i (yi - μ1)² + Π0|i (yi - μ0)²]    (7)

Incomplete (observed) data — posterior probabilities

      QQ       Qq       qq       Obs
MM    Π2|2i    Π1|2i    Π0|2i    n2
Mm    Π2|1i    Π1|1i    Π0|1i    n1
mm    Π2|0i    Π1|0i    Π0|0i    n0

With φ = p11 p00 / (p11 p00 + p10 p01), the haplotype-frequency updates are

p11 = 1/(2n) {Σi=1..n2 [2Π2|2i + Π1|2i] + Σi=1..n1 [Π2|1i + φ Π1|1i]}        (8)
p10 = 1/(2n) {Σi=1..n2 [2Π0|2i + Π1|2i] + Σi=1..n1 [Π0|1i + (1-φ) Π1|1i]}    (9)
p01 = 1/(2n) {Σi=1..n0 [2Π2|0i + Π1|0i] + Σi=1..n1 [Π2|1i + (1-φ) Π1|1i]}    (10)
p00 = 1/(2n) {Σi=1..n0 [2Π0|0i + Π1|0i] + Σi=1..n1 [Π0|1i + φ Π1|1i]}        (11)

EM algorithm
(1) Give initial values Ω(0) = (μ2, μ1, μ0, σ², p11, p10, p01, p00)(0).
(2) E step: calculate Π2|i(1), Π1|i(1) and Π0|i(1) using Eqs. (1)-(3).
(3) M step: calculate Ω(1) from Π2|i(1), Π1|i(1) and Π0|i(1) using Eqs. (4)-(11).
(4) Repeat (2) and (3) until convergence.
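The full iteration fits in a few dozen lines of Python. The sketch below uses the eight-individual height example from the earlier slide; the initial parameter values are illustrative assumptions, not recommendations, and a fixed iteration count stands in for a proper convergence test:

```python
# EM for LD mapping in a natural population (sketch following Eqs. 1-11).
import math

y   = [184, 185, 180, 182, 167, 169, 165, 166]   # height (cm)
mrk = [2, 2, 1, 1, 1, 1, 0, 0]                   # marker genotype: 2=MM, 1=Mm, 0=mm
n   = len(y)

# step (1): initial values Omega(0) (illustrative)
p11, p10, p01, p00 = 0.3, 0.2, 0.2, 0.3
mu = {2: 180.0, 1: 175.0, 0: 170.0}              # mu2, mu1, mu0
s2 = 25.0                                        # sigma^2

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

for _ in range(500):
    p = p11 + p10                                # marker allele frequency P(M)
    # prior conditional probabilities omega_{j|i}; columns ordered [QQ, Qq, qq]
    omega = {
        2: [p11**2 / p**2, 2*p11*p10 / p**2, p10**2 / p**2],
        1: [p11*p01 / (p*(1-p)),
            (p11*p00 + p10*p01) / (p*(1-p)),
            p10*p00 / (p*(1-p))],
        0: [p01**2 / (1-p)**2, 2*p01*p00 / (1-p)**2, p00**2 / (1-p)**2],
    }
    # step (2), E step: posteriors Pi_{j|i} (Eqs. 1-3)
    post = []
    for yi, mi in zip(y, mrk):
        f = [normal_pdf(yi, mu[g], s2) for g in (2, 1, 0)]
        mix = sum(w * fj for w, fj in zip(omega[mi], f))
        post.append([w * fj / mix for w, fj in zip(omega[mi], f)])
    # step (3), M step: genotypic means and variance (Eqs. 4-7)
    for k, g in enumerate((2, 1, 0)):
        sk = sum(post[i][k] for i in range(n))
        mu[g] = sum(post[i][k] * y[i] for i in range(n)) / sk
    s2 = sum(post[i][k] * (y[i] - mu[g]) ** 2
             for i in range(n) for k, g in enumerate((2, 1, 0))) / n
    # haplotype frequencies (Eqs. 8-11); nnn[m][k] sums posteriors per marker class
    nnn = {m: [sum(post[i][k] for i in range(n) if mrk[i] == m)
               for k in range(3)] for m in (2, 1, 0)}
    phi = p11 * p00 / (p11 * p00 + p10 * p01)
    p11 = (2*nnn[2][0] + nnn[2][1] + nnn[1][0] + phi*nnn[1][1]) / (2*n)
    p10 = (2*nnn[2][2] + nnn[2][1] + nnn[1][2] + (1-phi)*nnn[1][1]) / (2*n)
    p01 = (2*nnn[0][0] + nnn[0][1] + nnn[1][0] + (1-phi)*nnn[1][1]) / (2*n)
    p00 = (2*nnn[0][2] + nnn[0][1] + nnn[1][2] + phi*nnn[1][1]) / (2*n)

D = p11 * p00 - p10 * p01                        # linkage disequilibrium estimate
```

The haplotype updates use φ computed from the previous iteration's frequencies, as in the slides, so the four assignments can be made in any order.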

PROGRAM: given initial μ2, μ1, μ0, σ², p11, p10, p01, p00
' mu(1), mu(2), mu(3) hold the genotypic means; s2 holds sigma^2
' q = 1 - p is the marker allele frequency P(m), set at the bottom of the loop

Do While (Abs(mu(1) - omu(1)) + Abs(p00 - p00old) > 0.00001)
    kkk = kkk + 1                                              ' count iterations
    p00old = p00                                               ' keep old value of p00
    ' prior conditional probabilities omega(QTL genotype | marker genotype)
    prob(1, 1) = p11 ^ 2 / p ^ 2                               ' omega 2|2
    prob(1, 2) = 2 * p11 * p10 / p ^ 2                         ' omega 1|2
    prob(1, 3) = p10 ^ 2 / p ^ 2                               ' omega 0|2
    prob(2, 1) = 2 * p11 * p01 / (2 * p * q)                   ' omega 2|1
    prob(2, 2) = 2 * (p11 * p00 + p10 * p01) / (2 * p * q)     ' omega 1|1
    prob(2, 3) = 2 * p10 * p00 / (2 * p * q)                   ' omega 0|1
    prob(3, 1) = p01 ^ 2 / q ^ 2                               ' omega 2|0
    prob(3, 2) = 2 * p01 * p00 / q ^ 2                         ' omega 1|0
    prob(3, 3) = p00 ^ 2 / q ^ 2                               ' omega 0|0

    ' reset the accumulators for this iteration
    For j = 1 To 3
        omu(j) = mu(j) : cmu(j) = 0 : cpi(j) = 0 : bpi(j) = 0
        For i = 1 To 3
            nnn(i, j) = 0              ' 3x3 matrix accumulating Pi 2|2, 2|1, ..., 0|0
        Next i
    Next j
    cs2 = 0
    ll = 0

    For i = 1 To N
        ' mixture density [omega2|i f2(yi) + omega1|i f1(yi) + omega0|i f0(yi)]
        sss = 0
        For j = 1 To 3
            f(j) = 1 / Sqr(2 * 3.1415926 * s2) * Exp(-(y(i) - mu(j)) ^ 2 / 2 / s2)
            sss = sss + prob(datas(i, mrk), j) * f(j)
        Next j
        ll = ll + Log(sss)                                     ' accumulate log-likelihood
        For j = 1 To 3
            bpi(j) = prob(datas(i, mrk), j) * f(j) / sss       ' FORMULA (1-3)
            cmu(j) = cmu(j) + bpi(j) * y(i)                    ' numerators of FORMULA (4-6)
            cpi(j) = cpi(j) + bpi(j)                           ' denominators of FORMULA (4-6)
            cs2 = cs2 + bpi(j) * (y(i) - mu(j)) ^ 2            ' FORMULA (7)
            nnn(datas(i, mrk), j) = nnn(datas(i, mrk), j) + bpi(j)   ' FORMULA (8-11)
        Next j
    Next i

    ' update mu2, mu1, mu0 — FORMULA (4-6)
    For j = 1 To 3
        mu(j) = cmu(j) / cpi(j)
    Next j
    ' update sigma^2 — FORMULA (7)
    s2 = cs2 / N
    ' update p11, p10, p01, p00 — FORMULA (8-11)
    phi = p11 * p00 / (p11 * p00 + p10 * p01)
    p11 = (2 * nnn(1, 1) + nnn(1, 2) + nnn(2, 1) + phi * nnn(2, 2)) / 2 / N
    p10 = (2 * nnn(1, 3) + nnn(1, 2) + nnn(2, 3) + (1 - phi) * nnn(2, 2)) / 2 / N
    p01 = (2 * nnn(3, 1) + nnn(2, 1) + nnn(3, 2) + (1 - phi) * nnn(2, 2)) / 2 / N
    p00 = (2 * nnn(3, 3) + nnn(2, 3) + nnn(3, 2) + phi * nnn(2, 2)) / 2 / N
    p = p11 + p10                  ' marker allele frequency P(M)
    q = 1 - p                      ' marker allele frequency P(m)
Loop

LR = 2 * (ll - ll0)                ' likelihood-ratio statistic against the no-QTL (null) model

Linkage Disequilibrium Mapping - Natural Population: Binary Trait
(Interface screenshot: put markers and trait data into the box below, OR …)

Linkage Disequilibrium Mapping - Natural Population: Binary Trait
(Interface screenshot: initial values of p11, p10, p01 and of f2, f1, f0.)

Linkage Disequilibrium Mapping - Natural Population: Binary Trait
(Interface screenshot: initial values of f2, f1, f0.)

Linkage Disequilibrium Mapping - Natural Population: Binary Trait
(Interface screenshot: "For Marker:" selection.)

Linkage Disequilibrium Mapping - Natural Population: Binary Trait

log L(Ω|y) = Σj=0..2 Σi=1..nj log[ω2|ij Pr{yij=1|Gij=2,Ω}^yij Pr{yij=0|Gij=2,Ω}^(1-yij)
                                + ω1|ij Pr{yij=1|Gij=1,Ω}^yij Pr{yij=0|Gij=1,Ω}^(1-yij)
                                + ω0|ij Pr{yij=1|Gij=0,Ω}^yij Pr{yij=0|Gij=0,Ω}^(1-yij)]
           = Σj=0..2 Σi=1..nj log[ω2|ij f2^yij (1-f2)^(1-yij) + ω1|ij f1^yij (1-f1)^(1-yij) + ω0|ij f0^yij (1-f0)^(1-yij)]

Ω = (p11, p10, p01, p00, f2, f1, f0), with p00 = 1 - p11 - p10 - p01, so six free parameters.
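The binary-trait likelihood differs from the quantitative case only in replacing the normal densities with Bernoulli penetrances f2, f1, f0 = Pr(y=1 | QTL genotype). A minimal Python sketch of the log-likelihood; the ω rows, penetrances, and data below are all illustrative assumptions:

```python
# Binary-trait mixture log-likelihood (Bernoulli penetrances).
import math

f = {2: 0.8, 1: 0.5, 0: 0.2}                 # penetrances f2, f1, f0 (illustrative)
omega = {2: [0.70, 0.25, 0.05],              # P(QQ/Qq/qq | MM), illustrative
         1: [0.30, 0.50, 0.20],              # P(QQ/Qq/qq | Mm)
         0: [0.05, 0.25, 0.70]}              # P(QQ/Qq/qq | mm)
data = [(2, 1), (2, 1), (1, 1), (1, 0), (0, 0), (0, 0)]   # (marker genotype, y)

loglik = 0.0
for m, yv in data:
    # mixture of Bernoulli terms f_g^y (1-f_g)^(1-y), weighted by omega
    mix = sum(w * (f[g] ** yv) * ((1 - f[g]) ** (1 - yv))
              for w, g in zip(omega[m], (2, 1, 0)))
    loglik += math.log(mix)
assert loglik < 0.0                          # each mixture term is a probability < 1
```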

For j = 1 To 3
    omu(j) = mu(j) : cmu(j) = 0 : cpi(j) = 0 : bpi(j) = 0
    For i = 1 To 3
        nnn(i, j) = 0              ' 3x3 matrix accumulating Pi 2|2, 2|1, ..., 0|0
    Next i
Next j
ll = 0

For i = 1 To N
    ' mixture of Bernoulli penetrances: mu(j) now holds f2, f1, f0,
    ' and datas(i, nmrk) is the binary trait yij; no sigma^2 in this model
    sss = 0
    For j = 1 To 3
        f(j) = mu(j) ^ datas(i, nmrk) * (1 - mu(j)) ^ (1 - datas(i, nmrk))
        sss = sss + prob(datas(i, mrk), j) * f(j)
    Next j
    ll = ll + Log(sss)                                     ' accumulate log-likelihood
    For j = 1 To 3
        bpi(j) = prob(datas(i, mrk), j) * f(j) / sss       ' posterior probabilities
        cmu(j) = cmu(j) + bpi(j) * datas(i, nmrk)          ' numerators of the f updates
        cpi(j) = cpi(j) + bpi(j)                           ' denominators of the f updates
        nnn(datas(i, mrk), j) = nnn(datas(i, mrk), j) + bpi(j)   ' FORMULA (8-11)
    Next j
Next i