Linkage Disequilibrium Mapping of Complex Binary Diseases

Presentation transcript:

Linkage Disequilibrium Mapping of Complex Binary Diseases

Two types of complex traits
  Quantitative traits: continuous variation
  Dichotomous traits: discontinuous variation
    o Binary, e.g., presence (1) or absence (0) of a disease
    o Multiple outcomes, e.g., none, moderate or severe disease

Special topic for Rebecca and Amy's project

Consider a natural population

One marker with two alleles M and m: Prob(M) = $p$, Prob(m) = $1-p$
One QTL (affecting a binary trait) with two alleles A and a: Prob(A) = $q$, Prob(a) = $1-q$

Four haplotypes:
\[
\begin{aligned}
\text{Prob}(MA) &= p_{11} = pq + D \\
\text{Prob}(Ma) &= p_{10} = p(1-q) - D \\
\text{Prob}(mA) &= p_{01} = (1-p)q - D \\
\text{Prob}(ma) &= p_{00} = (1-p)(1-q) + D
\end{aligned}
\]
so that
\[
p = p_{11} + p_{10}, \qquad q = p_{11} + p_{01}, \qquad D = p_{11}p_{00} - p_{10}p_{01}.
\]
D is the linkage disequilibrium between the marker and the underlying QTL.

Data structure

Sample   Binary (y_i)   Marker (j)
1        1              MM (2)
2        1              Mm (1)
3        1              Mm (1)
4        1              mm (0)
5        0              MM (2)
6        0              Mm (1)
7        0              Mm (1)
8        0              mm (0)

Arrange the data in a 2 x 3 contingency table

Counts:
                Marker genotype
                2       1       0       Total
Affected (1)    n_12    n_11    n_10    n_1.
Normal (0)      n_02    n_01    n_00    n_0.
Total           n_.2    n_.1    n_.0    n

Frequencies (g_lj = n_lj / n):
                Marker genotype
                2       1       0       Total
Affected (1)    g_12    g_11    g_10    g_1.
Normal (0)      g_02    g_01    g_00    g_0.
Total           g_.2    g_.1    g_.0    1

Independence test

\[
\chi^2_{df=2} = \sum_{l=0}^{1}\sum_{j=0}^{2} \frac{(n_{lj} - m_{lj})^2}{m_{lj}}
             = n \sum_{l=0}^{1}\sum_{j=0}^{2} \frac{(g_{lj} - g_{l\cdot}\,g_{\cdot j})^2}{g_{l\cdot}\,g_{\cdot j}}
\]
where $m_{lj}$ is the expected value of $n_{lj}$, $m_{lj} = n\,g_{l\cdot}\,g_{\cdot j}$.

H0: $g_{lj} = g_{l\cdot}\,g_{\cdot j}$
H1: $g_{lj} \neq g_{l\cdot}\,g_{\cdot j}$

Under H0, $\chi^2_{df=2}$ follows a central chi-square distribution for a large sample size n, with df = (2-1) x (3-1) = 2.
If H0 is rejected, there is a significant D.
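The independence test can be sketched in a few lines of Python. This is a minimal illustration, not the course program: the function name `independence_test` and the counts are hypothetical, and the p-value uses the fact that the chi-square survival function with 2 degrees of freedom is exactly exp(-x/2).

```python
import math

def independence_test(table):
    """Pearson chi-square test on a 2 x 3 table of counts n_lj.

    Rows are affected (1) and normal (0); columns are marker genotypes 2, 1, 0.
    Assumes every row and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]          # n_l.
    col_tot = [sum(col) for col in zip(*table)]    # n_.j
    chi2 = 0.0
    for l, row in enumerate(table):
        for j, n_lj in enumerate(row):
            m_lj = row_tot[l] * col_tot[j] / n     # expected count under H0
            chi2 += (n_lj - m_lj) ** 2 / m_lj
    # For df = 2 the upper-tail probability is exactly exp(-chi2 / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Hypothetical counts, for illustration only.
counts = [[30, 50, 20],
          [15, 45, 40]]
chi2, p_value = independence_test(counts)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A small p-value here signals marker-disease association, which the slides interpret as evidence of non-zero D.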

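Regression analysis

                          Marker model              QTL model
Sample   Binary (y_ij)    Marker (j)   #M (T_ij)    Probability of two A's
1        1                MM (2)       2            $\omega_{2|2} = p_{11}^2$
2        1                Mm (1)       1            $\omega_{2|1} = 2p_{11}p_{01}$
3        1                Mm (1)       1            $\omega_{2|1} = 2p_{11}p_{01}$
4        1                mm (0)       0            $\omega_{2|0} = p_{01}^2$
5        0                MM (2)       2            $\omega_{2|2} = p_{11}^2$
6        0                Mm (1)       1            $\omega_{2|1} = 2p_{11}p_{01}$
7        0                Mm (1)       1            $\omega_{2|1} = 2p_{11}p_{01}$
8        0                mm (0)       0            $\omega_{2|0} = p_{01}^2$

$p_{11} = pq + D$, $p_{01} = (1-p)q - D$

The entries in the last column are joint probabilities; dividing by the marker genotype frequency ($p^2$, $2p(1-p)$ or $(1-p)^2$) gives the conditional probability.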

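Joint and conditional ($\omega_{k|ij}$) genotype probabilities between marker and QTL

Joint probabilities:
       AA (2)             Aa (1)                                aa (0)             Obs
MM     $p_{11}^2$         $2p_{11}p_{10}$                       $p_{10}^2$         n_2
Mm     $2p_{11}p_{01}$    $2(p_{11}p_{00} + p_{10}p_{01})$      $2p_{10}p_{00}$    n_1
mm     $p_{01}^2$         $2p_{01}p_{00}$                       $p_{00}^2$         n_0

Conditional probabilities (joint probability divided by the marker genotype frequency):
       AA (2)                          Aa (1)                                             aa (0)                          Obs
MM     $p_{11}^2 / p^2$                $2p_{11}p_{10} / p^2$                              $p_{10}^2 / p^2$                n_2
Mm     $2p_{11}p_{01} / [2p(1-p)]$     $2(p_{11}p_{00} + p_{10}p_{01}) / [2p(1-p)]$       $2p_{10}p_{00} / [2p(1-p)]$     n_1
mm     $p_{01}^2 / (1-p)^2$            $2p_{01}p_{00} / (1-p)^2$                          $p_{00}^2 / (1-p)^2$            n_0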
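As a rough illustration of the joint and conditional table, the sketch below builds the haplotype frequencies from p, q and D and then normalises each marker-genotype row. The function names and the numeric inputs are hypothetical; genotypes are coded 2, 1, 0 as in the slides.

```python
def haplotype_freqs(p, q, D):
    """Haplotype frequencies (p11, p10, p01, p00) from allele frequencies and LD."""
    p11 = p * q + D
    p10 = p * (1 - q) - D
    p01 = (1 - p) * q - D
    p00 = (1 - p) * (1 - q) + D
    return p11, p10, p01, p00

def conditional_qtl_probs(p11, p10, p01, p00):
    """Return omega[j][k] = Pr(QTL genotype k | marker genotype j)."""
    joint = {
        2: {2: p11 ** 2, 1: 2 * p11 * p10, 0: p10 ** 2},                      # MM
        1: {2: 2 * p11 * p01, 1: 2 * (p11 * p00 + p10 * p01), 0: 2 * p10 * p00},  # Mm
        0: {2: p01 ** 2, 1: 2 * p01 * p00, 0: p00 ** 2},                      # mm
    }
    omega = {}
    for j, row in joint.items():
        marker_freq = sum(row.values())   # equals p^2, 2p(1-p) or (1-p)^2
        omega[j] = {k: v / marker_freq for k, v in row.items()}
    return omega

# Hypothetical values, for illustration only.
omega = conditional_qtl_probs(*haplotype_freqs(p=0.6, q=0.3, D=0.05))
for j in (2, 1, 0):
    print(j, {k: round(v, 4) for k, v in omega[j].items()})
```

Each row sums to 1, and with D = 0 every row reduces to the Hardy-Weinberg proportions of the QTL, so the marker then carries no information about the QTL genotype.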

Statistical models

Marker model:
\[
y_{ij} = a + bT_{ij} + \varepsilon_{ij}
\]
The least squares approach can be used to estimate a and b. The size of b reflects the marker effect, which confounds the QTL effect with the marker-QTL LD.
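A minimal least squares sketch for the marker model follows, using the closed-form estimates for simple linear regression. The function name `fit_marker_model` is hypothetical, and the data are the eight illustrative samples from the data-structure slide, which are perfectly balanced and therefore give b = 0.

```python
def fit_marker_model(T, y):
    """Least squares estimates (a, b) for y = a + b*T + error."""
    n = len(y)
    T_bar = sum(T) / n
    y_bar = sum(y) / n
    sxy = sum((t - T_bar) * (v - y_bar) for t, v in zip(T, y))
    sxx = sum((t - T_bar) ** 2 for t in T)
    b = sxy / sxx            # slope: apparent marker effect
    a = y_bar - b * T_bar    # intercept
    return a, b

# Binary phenotype and number of M alleles for samples 1-8.
y = [1, 1, 1, 1, 0, 0, 0, 0]
T = [2, 1, 1, 0, 2, 1, 1, 0]
a, b = fit_marker_model(T, y)
print(a, b)
```

Because b mixes the QTL effect with the LD, a value near zero can mean either no QTL effect or no LD, which is one motivation for the mixture model on the later slides.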

The phenotype of sample i within marker genotype group j is modeled by
\[
y_{ij} =
\begin{cases}
1 & \text{if } z_{ij} \ge \gamma \\
0 & \text{if } z_{ij} < \gamma
\end{cases}
\]
where $\gamma$ is the threshold for the underlying liability of the trait, z, which is formulated as
\[
z_{ij} = \sum_{k=0}^{2} \xi_{ik}\,\mu_k + e_{ij}
\]
$\mu_k$ = the genotypic value of QTL genotype k
$\xi_{ik}$ = the (1/0) indicator variable showing whether sample i carries QTL genotype k
$e_{ij}$ = normally distributed residual with mean 0 and variance 1

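The conditional probability of $y_{ij} = 1$ given sample i's QTL genotype (say $G_{ij} = k$) is obtained by
\[
\begin{aligned}
f_k &= \Pr(y_{ij} = 1 \mid G_{ij} = k, \gamma) \\
    &= \Pr(z_{ij} \ge \gamma \mid G_{ij} = k, \gamma) \\
    &= 1 - \Pr(z_{ij} < \gamma \mid G_{ij} = k, \gamma) \\
    &= 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\!\left[-\frac{(z - \mu_k)^2}{2}\right] dz
\end{aligned}
\]
$f_k$ is called the penetrance of QTL genotype k.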
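The penetrance integral is one minus a normal cumulative distribution function, so it can be evaluated without explicit numerical integration. The sketch below uses the standard library's `statistics.NormalDist`; the function name `penetrance` and the example values are hypothetical.

```python
from statistics import NormalDist

def penetrance(mu_k, gamma):
    """f_k = Pr(z >= gamma) for a liability z ~ N(mu_k, 1)."""
    return 1.0 - NormalDist(mu=mu_k, sigma=1.0).cdf(gamma)

# Illustrative genotypic values with the threshold placed at 0.
for k, mu_k in ((2, 1.0), (1, 0.0), (0, -1.0)):
    print(k, round(penetrance(mu_k, gamma=0.0), 4))
```

Only the difference between the genotypic value and the threshold enters the formula, so shifting both by the same amount leaves the penetrance unchanged.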

[Figure: landscape of F-values as a function of q and D; axes: F, q, D]

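Maximum likelihood analysis: Mixture model

\[
\begin{aligned}
\log L(\Omega \mid y) = \sum_{j=0}^{2}\sum_{i=1}^{n_j} \log \big[\,
  & \omega_{2|ij}\,\Pr\{y_{ij}=1 \mid G_{ij}=2, \gamma\}^{y_{ij}}\,\Pr\{y_{ij}=0 \mid G_{ij}=2, \gamma\}^{1-y_{ij}} \\
 +\; & \omega_{1|ij}\,\Pr\{y_{ij}=1 \mid G_{ij}=1, \gamma\}^{y_{ij}}\,\Pr\{y_{ij}=0 \mid G_{ij}=1, \gamma\}^{1-y_{ij}} \\
 +\; & \omega_{0|ij}\,\Pr\{y_{ij}=1 \mid G_{ij}=0, \gamma\}^{y_{ij}}\,\Pr\{y_{ij}=0 \mid G_{ij}=0, \gamma\}^{1-y_{ij}} \,\big] \\
= \sum_{j=0}^{2}\sum_{i=1}^{n_j} \log \big[\,
  & \omega_{2|ij}\, f_2^{\,y_{ij}} (1-f_2)^{1-y_{ij}}
  + \omega_{1|ij}\, f_1^{\,y_{ij}} (1-f_1)^{1-y_{ij}}
  + \omega_{0|ij}\, f_0^{\,y_{ij}} (1-f_0)^{1-y_{ij}} \,\big]
\end{aligned}
\]
$\Omega = (p_{11}, p_{10}, p_{01}, p_{00}, f_2, f_1, f_0)$ (6 free parameters, because the four haplotype frequencies sum to 1)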
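The mixture log-likelihood can be evaluated directly, as in the sketch below. The data layout (a list of `(j, y)` pairs, with marker genotype j coded 2, 1, 0 and phenotype y coded 0/1) and the function name `log_likelihood` are assumptions made for illustration.

```python
import math

def log_likelihood(data, p11, p10, p01, p00, f2, f1, f0):
    """Mixture log-likelihood for binary phenotypes given marker genotypes."""
    p = p11 + p10                                  # frequency of marker allele M
    marker = {2: p * p, 1: 2 * p * (1 - p), 0: (1 - p) ** 2}
    joint = {
        2: {2: p11 ** 2, 1: 2 * p11 * p10, 0: p10 ** 2},
        1: {2: 2 * p11 * p01, 1: 2 * (p11 * p00 + p10 * p01), 0: 2 * p10 * p00},
        0: {2: p01 ** 2, 1: 2 * p01 * p00, 0: p00 ** 2},
    }
    f = {2: f2, 1: f1, 0: f0}
    ll = 0.0
    for j, y in data:
        mix = 0.0
        for k in (2, 1, 0):
            omega = joint[j][k] / marker[j]        # Pr(QTL genotype k | marker j)
            mix += omega * (f[k] if y == 1 else 1.0 - f[k])
        ll += math.log(mix)
    return ll

# Hypothetical data: the eight samples from the data-structure slide.
data = [(2, 1), (1, 1), (1, 1), (0, 1), (2, 0), (1, 0), (1, 0), (0, 0)]
print(log_likelihood(data, 0.25, 0.25, 0.25, 0.25, 0.7, 0.5, 0.3))
```

When the three penetrances are equal, the mixture collapses to an ordinary Bernoulli likelihood and the haplotype frequencies drop out, which is the situation described by the null hypothesis of the later test.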

EM algorithm

Define
\[
\Pi_{2|ij} = \frac{\omega_{2|ij}\, f_2^{\,y_{ij}} (1-f_2)^{1-y_{ij}}}
{\omega_{2|ij}\, f_2^{\,y_{ij}} (1-f_2)^{1-y_{ij}} + \omega_{1|ij}\, f_1^{\,y_{ij}} (1-f_1)^{1-y_{ij}} + \omega_{0|ij}\, f_0^{\,y_{ij}} (1-f_0)^{1-y_{ij}}} \tag{1}
\]
\[
\Pi_{1|ij} = \frac{\omega_{1|ij}\, f_1^{\,y_{ij}} (1-f_1)^{1-y_{ij}}}
{\omega_{2|ij}\, f_2^{\,y_{ij}} (1-f_2)^{1-y_{ij}} + \omega_{1|ij}\, f_1^{\,y_{ij}} (1-f_1)^{1-y_{ij}} + \omega_{0|ij}\, f_0^{\,y_{ij}} (1-f_0)^{1-y_{ij}}} \tag{2}
\]
\[
\Pi_{0|ij} = \frac{\omega_{0|ij}\, f_0^{\,y_{ij}} (1-f_0)^{1-y_{ij}}}
{\omega_{2|ij}\, f_2^{\,y_{ij}} (1-f_2)^{1-y_{ij}} + \omega_{1|ij}\, f_1^{\,y_{ij}} (1-f_1)^{1-y_{ij}} + \omega_{0|ij}\, f_0^{\,y_{ij}} (1-f_0)^{1-y_{ij}}} \tag{3}
\]
as the posterior probabilities of QTL genotypes given marker genotypes for sample i.

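Population genetic parameters

Posterior probabilities:
       AA               Aa               aa               Obs
MM     $\Pi_{2|2i}$     $\Pi_{1|2i}$     $\Pi_{0|2i}$     n_.2
Mm     $\Pi_{2|1i}$     $\Pi_{1|1i}$     $\Pi_{0|1i}$     n_.1
mm     $\Pi_{2|0i}$     $\Pi_{1|0i}$     $\Pi_{0|0i}$     n_.0

\[
p_{11} = \frac{1}{2n}\left\{ \sum_{i=1}^{n_{\cdot 2}} \left[2\Pi_{2|2i} + \Pi_{1|2i}\right] + \sum_{i=1}^{n_{\cdot 1}} \left[\Pi_{2|1i} + \phi\,\Pi_{1|1i}\right] \right\} \tag{4}
\]
\[
p_{10} = \frac{1}{2n}\left\{ \sum_{i=1}^{n_{\cdot 2}} \left[2\Pi_{0|2i} + \Pi_{1|2i}\right] + \sum_{i=1}^{n_{\cdot 1}} \left[\Pi_{0|1i} + (1-\phi)\,\Pi_{1|1i}\right] \right\} \tag{5}
\]
\[
p_{01} = \frac{1}{2n}\left\{ \sum_{i=1}^{n_{\cdot 0}} \left[2\Pi_{2|0i} + \Pi_{1|0i}\right] + \sum_{i=1}^{n_{\cdot 1}} \left[\Pi_{2|1i} + (1-\phi)\,\Pi_{1|1i}\right] \right\} \tag{6}
\]
\[
p_{00} = \frac{1}{2n}\left\{ \sum_{i=1}^{n_{\cdot 0}} \left[2\Pi_{0|0i} + \Pi_{1|0i}\right] + \sum_{i=1}^{n_{\cdot 1}} \left[\Pi_{0|1i} + \phi\,\Pi_{1|1i}\right] \right\} \tag{7}
\]
Here $\phi = p_{11}p_{00} / (p_{11}p_{00} + p_{10}p_{01})$ is the probability that a double heterozygote (Mm, Aa) carries the haplotype pair MA and ma rather than Ma and mA.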

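Quantitative genetic parameters

\[
f_2 = \frac{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{2|ij}\, y_{ij}}{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{2|ij}} \tag{8}
\]
\[
f_1 = \frac{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{1|ij}\, y_{ij}}{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{1|ij}} \tag{9}
\]
\[
f_0 = \frac{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{0|ij}\, y_{ij}}{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{0|ij}} \tag{10}
\]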

EM algorithm

(1) Give initial values $\Omega^{(0)} = (p_{11}, p_{10}, p_{01}, p_{00}, f_2, f_1, f_0)^{(0)}$,
(2) Calculate $\Pi_{2|ij}^{(1)}$, $\Pi_{1|ij}^{(1)}$ and $\Pi_{0|ij}^{(1)}$ using Eqs. 1-3,
(3) Calculate $\Omega^{(1)}$ using $\Pi_{2|ij}^{(1)}$, $\Pi_{1|ij}^{(1)}$ and $\Pi_{0|ij}^{(1)}$ based on Eqs. 4-10,
(4) Repeat (2) and (3) until convergence.
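The four steps above can be sketched as a short Python loop. This is an illustrative reading of Eqs. 1-10, not the program mentioned at the end of the slides: the function name `em_ld_binary`, the `(j, y)` data layout, the starting values and the counts are all assumptions. The E-step uses joint marker-QTL probabilities instead of conditional ones, which gives the same posteriors because the marker genotype frequency cancels in the ratio. The weight `phi` for double heterozygotes is the probability of the MA/ma haplotype pair.

```python
def em_ld_binary(data, theta, max_iter=500, tol=1e-8):
    """EM estimates of (p11, p10, p01, p00, f2, f1, f0).

    data:  list of (j, y); j = marker genotype (2, 1, 0), y = phenotype (0/1).
    theta: initial values, with haplotype frequencies strictly inside (0, 1).
    """
    p11, p10, p01, p00, f2, f1, f0 = theta
    n = len(data)
    for _ in range(max_iter):
        f = {2: f2, 1: f1, 0: f0}
        joint = {
            2: {2: p11 ** 2, 1: 2 * p11 * p10, 0: p10 ** 2},
            1: {2: 2 * p11 * p01, 1: 2 * (p11 * p00 + p10 * p01), 0: 2 * p10 * p00},
            0: {2: p01 ** 2, 1: 2 * p01 * p00, 0: p00 ** 2},
        }
        phi = p11 * p00 / (p11 * p00 + p10 * p01)
        h11 = h10 = h01 = h00 = 0.0            # expected haplotype counts
        num = {2: 0.0, 1: 0.0, 0: 0.0}
        den = {2: 0.0, 1: 0.0, 0: 0.0}
        for j, y in data:
            # E-step (Eqs. 1-3): posterior probability of each QTL genotype.
            w = {k: joint[j][k] * (f[k] if y == 1 else 1.0 - f[k]) for k in (2, 1, 0)}
            total = sum(w.values())
            post = {k: w[k] / total for k in (2, 1, 0)}
            # Accumulate the sums used in Eqs. 4-7.
            if j == 2:
                h11 += 2 * post[2] + post[1]
                h10 += 2 * post[0] + post[1]
            elif j == 1:
                h11 += post[2] + phi * post[1]
                h10 += post[0] + (1 - phi) * post[1]
                h01 += post[2] + (1 - phi) * post[1]
                h00 += post[0] + phi * post[1]
            else:
                h01 += 2 * post[2] + post[1]
                h00 += 2 * post[0] + post[1]
            # Accumulate the sums used in Eqs. 8-10.
            for k in (2, 1, 0):
                num[k] += post[k] * y
                den[k] += post[k]
        # M-step: update all seven parameters.
        new = (h11 / (2 * n), h10 / (2 * n), h01 / (2 * n), h00 / (2 * n),
               num[2] / den[2], num[1] / den[1], num[0] / den[0])
        change = max(abs(a - b) for a, b in zip(new, (p11, p10, p01, p00, f2, f1, f0)))
        p11, p10, p01, p00, f2, f1, f0 = new
        if change < tol:
            break
    return p11, p10, p01, p00, f2, f1, f0

# Hypothetical counts: affected MM/Mm/mm = 30/50/20, normal MM/Mm/mm = 15/45/40.
data = ([(2, 1)] * 30 + [(1, 1)] * 50 + [(0, 1)] * 20 +
        [(2, 0)] * 15 + [(1, 0)] * 45 + [(0, 0)] * 40)
start = (0.30, 0.1625, 0.20, 0.3375, 0.7, 0.5, 0.3)
print([round(v, 4) for v in em_ld_binary(data, start)])
```

With a single marker the likelihood surface can be flat in some directions, so the estimates may depend on the starting values; trying several starting points is a reasonable precaution.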

Three genotypic values

$\mu_2 = \mu + a$ for AA
$\mu_1 = \mu + d$ for Aa
$\mu_0 = \mu - a$ for aa

With the MLEs of $\mu_k$, we can estimate $\mu$, a and d.

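How to estimate $\mu_k$?

\[
\begin{aligned}
f_2 &= 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\!\left[-\frac{(z - \mu_2)^2}{2}\right] dz \\
f_1 &= 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\!\left[-\frac{(z - \mu_1)^2}{2}\right] dz \\
f_0 &= 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\!\left[-\frac{(z - \mu_0)^2}{2}\right] dz
\end{aligned}
\]
We can use numerical approaches to estimate $\mu_2$, $\mu_1$ and $\mu_0$.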
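One numerical route is to invert the normal distribution function: since f_k = 1 - Phi(gamma - mu_k), it follows that mu_k = gamma - Phi^{-1}(1 - f_k). The sketch below does this with the standard library and then solves mu_2 = mu + a, mu_1 = mu + d, mu_0 = mu - a for mu, a and d. The function name and the default gamma = 0 are assumptions; because only gamma - mu_k enters the penetrance, the threshold has to be fixed at some value before the genotypic values can be recovered.

```python
from statistics import NormalDist

def genotypic_values(f2, f1, f0, gamma=0.0):
    """Recover (mu, a, d) from penetrances, assuming a fixed threshold gamma.

    Each penetrance must lie strictly between 0 and 1.
    """
    std = NormalDist()  # standard normal
    # f_k = 1 - Phi(gamma - mu_k)  =>  mu_k = gamma - Phi^{-1}(1 - f_k)
    mu2, mu1, mu0 = (gamma - std.inv_cdf(1.0 - f) for f in (f2, f1, f0))
    mu = (mu2 + mu0) / 2.0   # overall mean (midpoint of the two homozygotes)
    a = (mu2 - mu0) / 2.0    # additive effect
    d = mu1 - mu             # dominance effect
    return mu, a, d

# Hypothetical penetrance estimates, for illustration only.
print(genotypic_values(0.8, 0.5, 0.2))
```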

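Hypothesis test

H0: $f_2 = f_1 = f_0$
H1: at least one equality does not hold

\[
LR = -2\left[\log L(\hat{\Omega}_0 \mid y, M, D) - \log L(\hat{\Omega}_1 \mid y, M, D)\right]
\]
for D in the interval
\[
\left[\max\{-pq,\; -(1-p)(1-q)\},\;\; \min\{p(1-q),\; (1-p)q\}\right],
\]
which keeps all four haplotype frequencies non-negative.

$\hat{\Omega}_0$ = MLE under H0
$\hat{\Omega}_1$ = MLE under H1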
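The admissible range of D follows from requiring that each haplotype frequency defined earlier (p11 = pq + D, p10 = p(1-q) - D, p01 = (1-p)q - D, p00 = (1-p)(1-q) + D) stays non-negative. The sketch below computes that range and the LR statistic from two maximised log-likelihoods. The function names and the numeric inputs are hypothetical, and no reference distribution for the LR is assumed here.

```python
def d_bounds(p, q):
    """Admissible interval for D given marker and QTL allele frequencies."""
    lower = max(-p * q, -(1 - p) * (1 - q))   # from p11 >= 0 and p00 >= 0
    upper = min(p * (1 - q), (1 - p) * q)     # from p10 >= 0 and p01 >= 0
    return lower, upper

def likelihood_ratio(loglik_h0, loglik_h1):
    """LR = -2 [log L under H0 - log L under H1]."""
    return -2.0 * (loglik_h0 - loglik_h1)

# Hypothetical values, for illustration only.
print(d_bounds(0.6, 0.3))
print(likelihood_ratio(-120.0, -115.0))
```

Evaluating the LR on a grid of D values inside this interval gives a profile of the test statistic over D.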

[Figure: profile of LR as a function of D across its admissible interval; horizontal axis: D, vertical axis: LR]

Dr Ma will write the program.