1 Linkage Disequilibrium Mapping of Complex Binary Diseases
Two types of complex traits:
- Quantitative traits: continuous variation
- Dichotomous traits: discontinuous variation
  o Binary, e.g., presence (1) or absence (0) of a disease
  o Multiple outcomes, e.g., none, moderate, or severe disease
Special topic for Rebecca and Amy's project

2 Consider a natural population
One marker with two alleles M and m: Prob(M) = $p$, Prob(m) = $1-p$
One QTL (affecting a binary trait) with two alleles A and a: Prob(A) = $q$, Prob(a) = $1-q$
Four haplotypes:
  Prob(MA) = $p_{11} = pq + D$
  Prob(Ma) = $p_{10} = p(1-q) - D$
  Prob(mA) = $p_{01} = (1-p)q - D$
  Prob(ma) = $p_{00} = (1-p)(1-q) + D$
so that
  $p = p_{11} + p_{10}$,  $q = p_{11} + p_{01}$,  $D = p_{11}p_{00} - p_{10}p_{01}$
$D$ is the linkage disequilibrium between the marker and the underlying QTL.
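The relations on this slide can be checked numerically. The sketch below is only an illustration: the function names and the example values of p, q, and D are assumptions made here, not part of the slides.

```python
def haplotype_freqs(p, q, D):
    """Return (p11, p10, p01, p00) from allele frequencies p, q and LD coefficient D."""
    p11 = p * q + D
    p10 = p * (1 - q) - D
    p01 = (1 - p) * q - D
    p00 = (1 - p) * (1 - q) + D
    freqs = (p11, p10, p01, p00)
    if min(freqs) < 0:
        raise ValueError("D is outside the range allowed by p and q")
    return freqs


def ld_coefficient(p11, p10, p01, p00):
    """Recover D from the four haplotype frequencies."""
    return p11 * p00 - p10 * p01


# Illustrative values only (not taken from the slides).
p11, p10, p01, p00 = haplotype_freqs(p=0.6, q=0.3, D=0.05)
D_back = ld_coefficient(p11, p10, p01, p00)
```

The four frequencies sum to 1, their margins return p and q, and `D_back` returns the D that was put in.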

3 Data structure
Sample   Binary ($y_i$)   Marker ($j$)
1        1                MM (2)
2        1                Mm (1)
3        1                Mm (1)
4        1                mm (0)
5        0                MM (2)
6        0                Mm (1)
7        0                Mm (1)
8        0                mm (0)

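4 Arrange the data in a 2 x 3 contingency table
Counts:
                Marker genotype
                2                1                0                Total
Affected (1)    $n_{12}$         $n_{11}$         $n_{10}$         $n_{1\cdot}$
Normal (0)      $n_{02}$         $n_{01}$         $n_{00}$         $n_{0\cdot}$
Total           $n_{\cdot 2}$    $n_{\cdot 1}$    $n_{\cdot 0}$    $n$

Proportions ($g_{lj} = n_{lj}/n$):
                Marker genotype
                2                1                0                Total
Affected (1)    $g_{12}$         $g_{11}$         $g_{10}$         $g_{1\cdot}$
Normal (0)      $g_{02}$         $g_{01}$         $g_{00}$         $g_{0\cdot}$
Total           $g_{\cdot 2}$    $g_{\cdot 1}$    $g_{\cdot 0}$    1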

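5 Independence test
$$\chi^2_{df=2} = \sum_{l=0}^{1}\sum_{j=0}^{2} \frac{(n_{lj}-m_{lj})^2}{m_{lj}} = n\sum_{l=0}^{1}\sum_{j=0}^{2} \frac{(g_{lj}-g_{l\cdot}g_{\cdot j})^2}{g_{l\cdot}g_{\cdot j}}$$
where $m_{lj}$ is the expected value of $n_{lj}$ under independence, $m_{lj} = n\,g_{l\cdot}\,g_{\cdot j}$.
H0: $g_{lj} = g_{l\cdot}g_{\cdot j}$ for all $l, j$
H1: $g_{lj} \ne g_{l\cdot}g_{\cdot j}$ for at least one cell
Under H0, $\chi^2_{df=2}$ is central $\chi^2$-distributed for a large sample size $n$, with df = (2-1) x (3-1) = 2.
If H0 is rejected, there is a significant D.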
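A minimal sketch of this test in plain Python, assuming every row and column total is non-zero. The function names and the second table are hypothetical; the first table is the 8-sample data set of slide 3. For df = 2 the chi-square upper tail has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def chi2_independence(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for l, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[l] * col_tot[j] / n  # m_lj = n * g_l. * g_.j
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df


def chi2_pvalue_df2(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Rows: affected, normal. Columns: marker genotypes 2 (MM), 1 (Mm), 0 (mm).
slide3_table = [[1, 2, 1], [1, 2, 1]]        # counts from the slide 3 data
toy_table = [[30, 50, 20], [20, 50, 30]]     # hypothetical counts

stat0, df0 = chi2_independence(slide3_table)
stat1, df1 = chi2_independence(toy_table)
p1 = chi2_pvalue_df2(stat1)
```

In the slide 3 data the two rows are identical, so the statistic is 0 and there is no evidence of association.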

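6 Regression analysis
        Marker model                                       QTL model
Sample  Binary ($y_{ij}$)  Marker ($j$)  #M ($T_{ij}$)     Prob. of two A's (AA) given the marker genotype
1       1                  MM (2)        2                 $\omega_{2|2} = p_{11}^2 / p^2$
2       1                  Mm (1)        1                 $\omega_{2|1} = 2p_{11}p_{01} / [2p(1-p)]$
3       1                  Mm (1)        1                 $\omega_{2|1} = 2p_{11}p_{01} / [2p(1-p)]$
4       1                  mm (0)        0                 $\omega_{2|0} = p_{01}^2 / (1-p)^2$
5       0                  MM (2)        2                 $\omega_{2|2} = p_{11}^2 / p^2$
6       0                  Mm (1)        1                 $\omega_{2|1} = 2p_{11}p_{01} / [2p(1-p)]$
7       0                  Mm (1)        1                 $\omega_{2|1} = 2p_{11}p_{01} / [2p(1-p)]$
8       0                  mm (0)        0                 $\omega_{2|0} = p_{01}^2 / (1-p)^2$
$p_{11} = pq + D$,  $p_{01} = (1-p)q - D$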

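7 Joint and conditional ($\omega_{k|ij}$) genotype probabilities between marker and QTL
Joint probabilities:
      AA (2)              Aa (1)                               aa (0)              Obs
MM    $p_{11}^2$          $2p_{11}p_{10}$                      $p_{10}^2$          $n_2$
Mm    $2p_{11}p_{01}$     $2(p_{11}p_{00}+p_{10}p_{01})$       $2p_{10}p_{00}$     $n_1$
mm    $p_{01}^2$          $2p_{01}p_{00}$                      $p_{00}^2$          $n_0$

Conditional probabilities (joint probability divided by the marker genotype frequency):
      AA (2)                      Aa (1)                                      aa (0)                      Obs
MM    $p_{11}^2/p^2$              $2p_{11}p_{10}/p^2$                         $p_{10}^2/p^2$              $n_2$
Mm    $2p_{11}p_{01}/[2p(1-p)]$   $2(p_{11}p_{00}+p_{10}p_{01})/[2p(1-p)]$    $2p_{10}p_{00}/[2p(1-p)]$   $n_1$
mm    $p_{01}^2/(1-p)^2$          $2p_{01}p_{00}/(1-p)^2$                     $p_{00}^2/(1-p)^2$          $n_0$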
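The conditional probabilities can be computed directly from the four haplotype frequencies. This is a small sketch with a hypothetical function name and hypothetical input values. Each row is normalised by its own total, which equals the marker genotype frequency.

```python
def conditional_qtl_probs(p11, p10, p01, p00):
    """Return omega[j][k]: probability of QTL genotype k (2=AA, 1=Aa, 0=aa)
    given marker genotype j (2=MM, 1=Mm, 0=mm)."""
    joint = {
        2: {2: p11 ** 2, 1: 2 * p11 * p10, 0: p10 ** 2},
        1: {2: 2 * p11 * p01, 1: 2 * (p11 * p00 + p10 * p01), 0: 2 * p10 * p00},
        0: {2: p01 ** 2, 1: 2 * p01 * p00, 0: p00 ** 2},
    }
    omega = {}
    for j, row in joint.items():
        total = sum(row.values())  # p^2, 2p(1-p) or (1-p)^2
        omega[j] = {k: v / total for k, v in row.items()}
    return omega


# Hypothetical haplotype frequencies (they sum to 1).
omega = conditional_qtl_probs(0.23, 0.37, 0.07, 0.33)
```

Each row of `omega` sums to 1, as a conditional distribution must.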

8 Statistical models
Marker model:
$$y_{ij} = a + bT_{ij} + \varepsilon_{ij}$$
The least squares approach can be used to estimate $a$ and $b$. The size of $b$ reflects the marker effect, which is confounded by the QTL effect and the marker-QTL LD.
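As a small illustration, the least squares estimates for the marker model can be computed by hand from the 8-sample data set of slide 3. The function name is an assumption of this sketch.

```python
def least_squares(T, y):
    """Ordinary least squares fit of y = a + b*T; returns (a, b)."""
    n = len(T)
    mean_T = sum(T) / n
    mean_y = sum(y) / n
    sxx = sum((t - mean_T) ** 2 for t in T)
    sxy = sum((t - mean_T) * (v - mean_y) for t, v in zip(T, y))
    b = sxy / sxx
    a = mean_y - b * mean_T
    return a, b


# Slide 3 data: number of M alleles and disease status for samples 1-8.
T = [2, 1, 1, 0, 2, 1, 1, 0]
y = [1, 1, 1, 1, 0, 0, 0, 0]
a_hat, b_hat = least_squares(T, y)
```

For this toy data set the marker genotypes are distributed identically among affected and normal samples, so the slope is zero and the intercept is the overall disease frequency.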

9 The phenotype of sample $i$ within marker genotype group $j$ is modeled by
$$y_{ij} = \begin{cases} 1 & \text{if } z_{ij} \ge \gamma \\ 0 & \text{if } z_{ij} < \gamma \end{cases}$$
where $\gamma$ is the threshold for the underlying liability $z$ of the trait, which is formulated as
$$z_{ij} = \sum_{k=0}^{2} \xi_{ik}\mu_k + e_{ij}$$
$\mu_k$ = the genotypic value of QTL genotype $k$
$\xi_{ik}$ = the (1/0) indicator variable for sample $i$ (1 if sample $i$ carries QTL genotype $k$, 0 otherwise)
$e_{ij}$ = normally distributed residual with mean 0 and variance 1

10 The conditional probability of $y_{ij} = 1$ given sample $i$'s QTL genotype (say $G_{ij} = k$) is obtained by
$$f_k = \Pr(y_{ij}=1 \mid G_{ij}=k, \Theta) = \Pr(z_{ij} \ge \gamma \mid G_{ij}=k, \Theta) = 1 - \Pr(z_{ij} < \gamma \mid G_{ij}=k, \Theta)$$
$$= 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\left[-\frac{(z-\mu_k)^2}{2}\right] dz$$
$f_k$ is called the penetrance of QTL genotype $k$.
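Because the liability has unit variance, the integral on this slide is a standard normal distribution function, so the penetrance is one minus that function evaluated at the threshold minus the genotypic value. A minimal sketch, where the names `penetrance`, `mu_k`, and `gamma` and the default threshold of 0 are assumptions made for illustration:

```python
from statistics import NormalDist


def penetrance(mu_k, gamma=0.0):
    """Pr(liability >= gamma) when liability ~ Normal(mu_k, 1)."""
    return 1.0 - NormalDist().cdf(gamma - mu_k)


# Hypothetical genotypic values on the liability scale.
f_low = penetrance(-1.0)
f_mid = penetrance(0.0)
f_high = penetrance(1.0)
```

A larger genotypic value moves more of the liability distribution above the threshold and so gives a higher penetrance.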

11 F-values as a function of q and D
[Figure: landscape of F over q and D]

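12 Maximum likelihood analysis: Mixture model
$$\log L(\Theta|y) = \sum_{j=0}^{2}\sum_{i=1}^{n_j} \log\Big[ \omega_{2|ij}\Pr\{y_{ij}=1|G_{ij}=2,\Theta\}^{y_{ij}}\Pr\{y_{ij}=0|G_{ij}=2,\Theta\}^{1-y_{ij}}$$
$$\qquad + \omega_{1|ij}\Pr\{y_{ij}=1|G_{ij}=1,\Theta\}^{y_{ij}}\Pr\{y_{ij}=0|G_{ij}=1,\Theta\}^{1-y_{ij}} + \omega_{0|ij}\Pr\{y_{ij}=1|G_{ij}=0,\Theta\}^{y_{ij}}\Pr\{y_{ij}=0|G_{ij}=0,\Theta\}^{1-y_{ij}} \Big]$$
$$= \sum_{j=0}^{2}\sum_{i=1}^{n_j} \log\Big[ \omega_{2|ij}f_2^{y_{ij}}(1-f_2)^{1-y_{ij}} + \omega_{1|ij}f_1^{y_{ij}}(1-f_1)^{1-y_{ij}} + \omega_{0|ij}f_0^{y_{ij}}(1-f_0)^{1-y_{ij}} \Big]$$
$\Theta = (p_{11}, p_{10}, p_{01}, p_{00}, f_2, f_1, f_0)$ (6 free parameters, since the four haplotype frequencies sum to 1)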
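The mixture log-likelihood can be evaluated with a few lines of Python. This sketch assumes the data are given as (y, j) pairs, where j is the number of M alleles, and that the mixture proportions are the conditional QTL genotype probabilities of slide 7. The function names and parameter values are hypothetical.

```python
import math


def conditional_probs(p11, p10, p01, p00):
    """omega[j][k]: probability of QTL genotype k given marker genotype j."""
    joint = {
        2: {2: p11 ** 2, 1: 2 * p11 * p10, 0: p10 ** 2},
        1: {2: 2 * p11 * p01, 1: 2 * (p11 * p00 + p10 * p01), 0: 2 * p10 * p00},
        0: {2: p01 ** 2, 1: 2 * p01 * p00, 0: p00 ** 2},
    }
    return {j: {k: v / sum(row.values()) for k, v in row.items()}
            for j, row in joint.items()}


def log_likelihood(data, haplo, f):
    """Mixture log-likelihood; haplo = (p11, p10, p01, p00), f = {2: f2, 1: f1, 0: f0}."""
    omega = conditional_probs(*haplo)
    total = 0.0
    for y, j in data:
        mix = sum(omega[j][k] * (f[k] if y == 1 else 1.0 - f[k]) for k in (2, 1, 0))
        total += math.log(mix)
    return total


# Slide 3 data as (y, j) pairs; the parameter values are illustrative only.
data = [(1, 2), (1, 1), (1, 1), (1, 0), (0, 2), (0, 1), (0, 1), (0, 0)]
haplo = (0.23, 0.37, 0.07, 0.33)
ll_null = log_likelihood(data, haplo, {2: 0.5, 1: 0.5, 0: 0.5})
ll_alt = log_likelihood(data, haplo, {2: 0.9, 1: 0.5, 0: 0.1})
```

When the three penetrances are equal, the mixture collapses to a single Bernoulli likelihood that no longer depends on the haplotype frequencies, which is a convenient sanity check.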

13 EM algorithm
Define
$$\Pi_{2|ij} = \frac{\omega_{2|ij}f_2^{y_{ij}}(1-f_2)^{1-y_{ij}}}{\omega_{2|ij}f_2^{y_{ij}}(1-f_2)^{1-y_{ij}} + \omega_{1|ij}f_1^{y_{ij}}(1-f_1)^{1-y_{ij}} + \omega_{0|ij}f_0^{y_{ij}}(1-f_0)^{1-y_{ij}}}$$  (1)
$$\Pi_{1|ij} = \frac{\omega_{1|ij}f_1^{y_{ij}}(1-f_1)^{1-y_{ij}}}{\omega_{2|ij}f_2^{y_{ij}}(1-f_2)^{1-y_{ij}} + \omega_{1|ij}f_1^{y_{ij}}(1-f_1)^{1-y_{ij}} + \omega_{0|ij}f_0^{y_{ij}}(1-f_0)^{1-y_{ij}}}$$  (2)
$$\Pi_{0|ij} = \frac{\omega_{0|ij}f_0^{y_{ij}}(1-f_0)^{1-y_{ij}}}{\omega_{2|ij}f_2^{y_{ij}}(1-f_2)^{1-y_{ij}} + \omega_{1|ij}f_1^{y_{ij}}(1-f_1)^{1-y_{ij}} + \omega_{0|ij}f_0^{y_{ij}}(1-f_0)^{1-y_{ij}}}$$  (3)
as the posterior probabilities of the QTL genotypes given the marker genotype and phenotype of sample $i$.

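14 Population genetic parameters
Posterior probabilities:
      AA              Aa              aa              Obs
MM    $\Pi_{2|2i}$    $\Pi_{1|2i}$    $\Pi_{0|2i}$    $n_{\cdot 2}$
Mm    $\Pi_{2|1i}$    $\Pi_{1|1i}$    $\Pi_{0|1i}$    $n_{\cdot 1}$
mm    $\Pi_{2|0i}$    $\Pi_{1|0i}$    $\Pi_{0|0i}$    $n_{\cdot 0}$

$$p_{11} = \frac{1}{2n}\Big\{\sum_{i=1}^{n_{\cdot 2}}[2\Pi_{2|2i}+\Pi_{1|2i}] + \sum_{i=1}^{n_{\cdot 1}}[\Pi_{2|1i}+\phi\,\Pi_{1|1i}]\Big\}$$  (4)
$$p_{10} = \frac{1}{2n}\Big\{\sum_{i=1}^{n_{\cdot 2}}[2\Pi_{0|2i}+\Pi_{1|2i}] + \sum_{i=1}^{n_{\cdot 1}}[\Pi_{0|1i}+(1-\phi)\,\Pi_{1|1i}]\Big\}$$  (5)
$$p_{01} = \frac{1}{2n}\Big\{\sum_{i=1}^{n_{\cdot 0}}[2\Pi_{2|0i}+\Pi_{1|0i}] + \sum_{i=1}^{n_{\cdot 1}}[\Pi_{2|1i}+(1-\phi)\,\Pi_{1|1i}]\Big\}$$  (6)
$$p_{00} = \frac{1}{2n}\Big\{\sum_{i=1}^{n_{\cdot 0}}[2\Pi_{0|0i}+\Pi_{1|0i}] + \sum_{i=1}^{n_{\cdot 1}}[\Pi_{0|1i}+\phi\,\Pi_{1|1i}]\Big\}$$  (7)
where $\phi = p_{11}p_{00}/(p_{11}p_{00}+p_{10}p_{01})$ is the probability that a double heterozygote (Mm, Aa) carries the haplotype pair with frequencies $p_{11}$ and $p_{00}$ rather than the pair with frequencies $p_{10}$ and $p_{01}$.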

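15 Quantitative genetic parameters
$$f_2 = \frac{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{2|ij}\,y_{ij}}{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{2|ij}}$$  (8)
$$f_1 = \frac{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{1|ij}\,y_{ij}}{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{1|ij}}$$  (9)
$$f_0 = \frac{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{0|ij}\,y_{ij}}{\sum_{j=0}^{2}\sum_{i=1}^{n_j} \Pi_{0|ij}}$$  (10)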

16 EM algorithm
(1) Give initial values $\Theta^{(0)} = (p_{11}, p_{10}, p_{01}, p_{00}, f_2, f_1, f_0)^{(0)}$,
(2) Calculate $\Pi_{2|ij}^{(1)}$, $\Pi_{1|ij}^{(1)}$ and $\Pi_{0|ij}^{(1)}$ using Eqs. 1-3,
(3) Calculate $\Theta^{(1)}$ using $\Pi_{2|ij}^{(1)}$, $\Pi_{1|ij}^{(1)}$ and $\Pi_{0|ij}^{(1)}$ based on Eqs. 4-10,
(4) Repeat (2) and (3) until convergence.
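The four steps can be sketched in plain Python as follows. This is an illustrative reading of Eqs. 1-10, not the course program. The function name, the (y, j) data layout, the toy counts, and the starting values are all assumptions. The weight `phi` applied to double heterozygotes is taken to be p11*p00 / (p11*p00 + p10*p01), which is how this sketch interprets the (1 - ...) factor in Eqs. 5-6. The result can depend on the starting values.

```python
def em_binary_ld(data, theta0, max_iter=500, tol=1e-8):
    """data: list of (y, j), y in {0, 1}, j = number of M alleles (2, 1, 0).
    theta0: initial (p11, p10, p01, p00, f2, f1, f0), all strictly inside (0, 1)."""
    p11, p10, p01, p00, f2, f1, f0 = theta0
    n = len(data)
    for _ in range(max_iter):
        f = {2: f2, 1: f1, 0: f0}
        # Joint marker-QTL genotype probabilities; the marker genotype frequency
        # cancels when the posterior is normalised, so joint weights suffice.
        joint = {
            2: {2: p11 ** 2, 1: 2 * p11 * p10, 0: p10 ** 2},
            1: {2: 2 * p11 * p01, 1: 2 * (p11 * p00 + p10 * p01), 0: 2 * p10 * p00},
            0: {2: p01 ** 2, 1: 2 * p01 * p00, 0: p00 ** 2},
        }
        phi = p11 * p00 / (p11 * p00 + p10 * p01)  # assumed phase probability
        h11 = h10 = h01 = h00 = 0.0
        num = {2: 0.0, 1: 0.0, 0: 0.0}
        den = {2: 0.0, 1: 0.0, 0: 0.0}
        for y, j in data:
            # E-step (Eqs. 1-3): posterior probability of each QTL genotype.
            w = {k: joint[j][k] * (f[k] if y == 1 else 1.0 - f[k]) for k in (2, 1, 0)}
            tot = sum(w.values())
            post = {k: w[k] / tot for k in w}
            # Expected haplotype counts (numerators of Eqs. 4-7).
            if j == 2:
                h11 += 2 * post[2] + post[1]
                h10 += 2 * post[0] + post[1]
            elif j == 1:
                h11 += post[2] + phi * post[1]
                h10 += post[0] + (1 - phi) * post[1]
                h01 += post[2] + (1 - phi) * post[1]
                h00 += post[0] + phi * post[1]
            else:
                h01 += 2 * post[2] + post[1]
                h00 += 2 * post[0] + post[1]
            for k in (2, 1, 0):
                num[k] += post[k] * y
                den[k] += post[k]
        # M-step (Eqs. 4-10).
        old = (p11, p10, p01, p00, f2, f1, f0)
        p11, p10, p01, p00 = (h / (2 * n) for h in (h11, h10, h01, h00))
        f2, f1, f0 = (num[k] / den[k] for k in (2, 1, 0))
        new = (p11, p10, p01, p00, f2, f1, f0)
        if max(abs(a - b) for a, b in zip(new, old)) < tol:
            break
    return p11, p10, p01, p00, f2, f1, f0


# Hypothetical data: 160 samples as (disease status, number of M alleles).
toy_data = ([(1, 2)] * 30 + [(0, 2)] * 10 + [(1, 1)] * 40 +
            [(0, 1)] * 40 + [(1, 0)] * 10 + [(0, 0)] * 30)
est = em_binary_ld(toy_data, theta0=(0.3, 0.2, 0.2, 0.3, 0.8, 0.5, 0.2))
```

A useful check is that the estimated p11 + p10 reproduces the observed frequency of allele M, because the marker genotypes are fully observed.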

17 Three genotypic values
$\mu_2 = \mu + a$  for AA
$\mu_1 = \mu + d$  for Aa
$\mu_0 = \mu - a$  for aa
With the MLEs of $\mu_k$, we can estimate $\mu$, $a$ and $d$.
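Solving the three equations on this slide for the overall mean, the additive effect, and the dominance effect gives the following worked step, stated in the slide's own symbols:

```latex
\begin{aligned}
\mu &= \tfrac{1}{2}(\mu_2 + \mu_0), \\
a   &= \tfrac{1}{2}(\mu_2 - \mu_0), \\
d   &= \mu_1 - \tfrac{1}{2}(\mu_2 + \mu_0).
\end{aligned}
```

Substituting the MLEs of the three genotypic values into these expressions gives the corresponding estimates.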

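18 How to estimate $\mu_k$?
$$f_2 = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\left[-\frac{(z-\mu_2)^2}{2}\right] dz$$
$$f_1 = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\left[-\frac{(z-\mu_1)^2}{2}\right] dz$$
$$f_0 = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\gamma} \exp\left[-\frac{(z-\mu_0)^2}{2}\right] dz$$
We can use numerical approaches to estimate $\mu_2$, $\mu_1$ and $\mu_0$.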
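One numerical route is to invert the standard normal distribution function. Each penetrance determines only the difference between the genotypic value and the threshold, so this sketch assumes the threshold is fixed at a known value (0 by default). The function name, the threshold name `gamma`, and the example penetrances are hypothetical.

```python
from statistics import NormalDist


def genotypic_value(f_k, gamma=0.0):
    """Solve f_k = 1 - Phi(gamma - mu_k) for mu_k, with Phi the N(0, 1) cdf."""
    if not 0.0 < f_k < 1.0:
        raise ValueError("penetrance must lie strictly between 0 and 1")
    return gamma - NormalDist().inv_cdf(1.0 - f_k)


# Hypothetical penetrance estimates for QTL genotypes AA, Aa, aa.
f_hat = {2: 0.8, 1: 0.5, 0: 0.2}
mu_hat = {k: genotypic_value(v) for k, v in f_hat.items()}
```

A penetrance of 0.5 maps to a genotypic value equal to the threshold, and higher penetrances map to larger genotypic values.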

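19 Hypothesis test
H0: $f_2 = f_1 = f_0$
H1: at least one equality does not hold
$$LR = -2[\log L(\hat\Theta_0 \mid y, M, D) - \log L(\hat\Theta_1 \mid y, M, D)]$$
for $D$ in the interval $[\max\{-pq, -(1-p)(1-q)\},\ \min\{p(1-q), (1-p)q\}]$, which keeps all four haplotype frequencies of slide 2 non-negative.
$\hat\Theta_0$ = MLE under H0
$\hat\Theta_1$ = MLE under H1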
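The admissible range of D follows from requiring the four haplotype frequencies defined on slide 2 (pq + D, p(1-q) - D, (1-p)q - D, (1-p)(1-q) + D) to be non-negative. The sketch below computes that range and an evenly spaced grid of D values over which an LR profile could be evaluated. The function names, the grid size, and the example p and q are assumptions.

```python
def d_bounds(p, q):
    """Smallest and largest D that keep all four haplotype frequencies >= 0."""
    lower = max(-p * q, -(1 - p) * (1 - q))
    upper = min(p * (1 - q), (1 - p) * q)
    return lower, upper


def d_grid(p, q, steps=20):
    """Evenly spaced D values covering the admissible interval."""
    lower, upper = d_bounds(p, q)
    return [lower + (upper - lower) * s / steps for s in range(steps + 1)]


# Illustrative allele frequencies only.
lo, hi = d_bounds(0.6, 0.3)
grid = d_grid(0.6, 0.3)
```

Each grid value can then be held fixed while the remaining parameters are estimated under H0 and H1.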

20 LR as a function of D
[Figure: profile of LR over D, with the D axis bounded by $\max\{-pq, -(1-p)(1-q)\}$ and $\min\{p(1-q), (1-p)q\}$]

21 Dr Ma will write the program.

