A Transmission/Disequilibrium Test for Ordinal Traits in Nuclear Families and a Unified Approach for Association Studies
Heping Zhang, Xueqin Wang and Yuanqing Ye, Department of Epidemiology and Public Health, Yale University
Presented at the Workshop on Genomics, NUS, November 14, 2005
Outline
Data structure
TDT, Q-TDT, S-TDT, etc.
O-TDT for ordinal traits
Simulations
Data analysis
Discussion and conclusion
Data Structure
[Figure: n nuclear families]
Linkage Analysis: Null Hypothesis
To test for linkage, the null hypothesis is that the marker locus is not linked to any trait locus.
[Figure: marker and trait loci, linked vs. unlinked]
Linkage Analysis: Recombination Fraction
[Figure: marker and trait loci]
Coefficient of Linkage Disequilibrium
[Figure: marker and trait loci; allele frequency vs. haplotype frequency]
$\delta = \mathrm{Freq}(A, D) - \mathrm{Freq}(A)\,\mathrm{Freq}(D)$
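As a worked example, the coefficient can be computed directly from haplotype frequencies. The sketch below reuses the haplotype frequencies from the simulation design later in this talk (AD 0.2, Ad 0.1, aD 0.1, ad 0.6). It takes A as the marker allele and D as the trait allele. The function name `ld_coefficient` is illustrative.

```python
def ld_coefficient(hap_freq, marker_allele="A", trait_allele="D"):
    """delta = Freq(A, D) - Freq(A) * Freq(D) for two diallelic loci.

    hap_freq maps two-character haplotypes (marker allele, trait allele),
    such as "AD", to their population frequencies.
    """
    if abs(sum(hap_freq.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Marginal allele frequencies are sums over haplotypes carrying the allele.
    p_marker = sum(f for h, f in hap_freq.items() if h[0] == marker_allele)
    p_trait = sum(f for h, f in hap_freq.items() if h[1] == trait_allele)
    p_joint = hap_freq.get(marker_allele + trait_allele, 0.0)
    return p_joint - p_marker * p_trait

freqs = {"AD": 0.2, "Ad": 0.1, "aD": 0.1, "ad": 0.6}
delta = ld_coefficient(freqs)
print(round(delta, 4))  # 0.11
```

Here Freq(A) = Freq(D) = 0.3, so δ = 0.2 − 0.09 = 0.11. This indicates positive disequilibrium between A and D.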
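Null Hypothesis: Linkage Disequilibrium
The TDT tests for linkage in the presence of association, or for association in the presence of linkage (Spielman et al. 1993; Ewens and Spielman 1995). The null hypothesis that the haplotype relative risk (Falk and Rubinstein, 1987) equals 1 is
$H_0: \delta(1 - 2\theta) = 0,$
where δ is the coefficient of linkage disequilibrium and θ is the recombination fraction between the marker and trait loci. That is, there is either no linkage (θ = 1/2) or no association (δ = 0).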
Transmission/Disequilibrium Test (TDT)
Eliminates the confounding effects caused by population stratification/admixture and other factors.
It is a McNemar's test.
TDT-McNemar Test
Suppose two heterozygous parents and an affected child are genotyped at the marker.
[Figure: father Aa and mother Aa with their affected child; each parent's alleles marked as transmitted (Trans) or nontransmitted (Nontrans)]
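TDT-McNemar Test: Father
Suppose two heterozygous parents and an affected child are genotyped at the marker. The father (Aa) transmits A and does not transmit a. His 2×2 table has transmitted alleles as rows and nontransmitted alleles as columns:
Trans A: Nontrans A = 0, Nontrans a = 1
Trans a: Nontrans A = 0, Nontrans a = 0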
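TDT-McNemar Test: Mother
Suppose two heterozygous parents and an affected child are genotyped at the marker. The mother (Aa) transmits A and does not transmit a. Her 2×2 table has transmitted alleles as rows and nontransmitted alleles as columns:
Trans A: Nontrans A = 0, Nontrans a = 1
Trans a: Nontrans A = 0, Nontrans a = 0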
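TDT-McNemar Test: Both Parents
Suppose two heterozygous parents and an affected child are genotyped at the marker. Adding the father's and mother's tables gives the family's 2×2 table of transmitted (rows) versus nontransmitted (columns) alleles. Each heterozygous parent contributes one count.
[Table: combined counts not recovered]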
TDT-McNemar Test
Combinations of transmitted and nontransmitted marker alleles A and a among 2n parents of n affected children
[Table: rows = transmitted allele (A, a, total); columns = nontransmitted allele (A, a, total); grand total 2n; cell counts not recovered]
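The table above feeds McNemar's test through its two off-diagonal cells. Following the usual notation of Spielman et al. (1993), let b be the number of parents who transmit A and not a, and c the number who transmit a and not A. The TDT is (b − c)²/(b + c), referred to a χ² distribution with 1 degree of freedom. In the sketch below, the counts are made up for illustration and the function names are hypothetical.

```python
import math

def tdt_statistic(b, c):
    """McNemar/TDT chi-square from the discordant cells: (b - c)^2 / (b + c)."""
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    return (b - c) ** 2 / (b + c)

def chi2_1df_pvalue(x):
    """Upper-tail p-value of chi-square(1): P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

b, c = 60, 40  # hypothetical counts of A-transmitting and a-transmitting parents
stat = tdt_statistic(b, c)
print(stat)                            # 4.0
print(round(chi2_1df_pvalue(stat), 4))  # 0.0455
```

Only heterozygous parents are informative. Homozygous parents land on the diagonal and drop out of the statistic.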
Further Developments
Q-TDT proposed by Allison (1997)
Q-TDT further investigated by Rabinowitz (1997)
S-TDT (Spielman and Ewens 1998)
FBAT (Lunetta et al. 2000; Rabinowitz and Laird 2000)
Many other extensions
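General Test Statistic
Assume that there are n nuclear families and that the i-th family has $n_i$ siblings, i = 1, …, n. For the j-th child in the i-th family, the trait value is $y_{ij}$ and the marker genotype is $g_{ij}$. $X_{ij}$ is the number of A alleles in $g_{ij}$. The linkage/association test statistic can be constructed as
$U = \sum_{i=1}^{n} \sum_{j=1}^{n_i} T_{ij}\,[X_{ij} - E_0(X_{ij})], \qquad \chi^2 = U^2 / \mathrm{Var}_0(U),$
where $T_{ij}$ is a weight of the phenotype $y_{ij}$, and $E_0$ and $\mathrm{Var}_0$ are the conditional expectation and variance under the null hypothesis.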
Example
For a sample of affected child-parent triads, let $T_{ij} = 1$. The statistic is then the TDT introduced by Spielman et al. (1993).
For a sample of nuclear families with quantitative trait values, let $T_{ij} = y_{ij} - \bar{y}$, where $\bar{y}$ is the average of the trait values. The statistic is then the Q-TDT introduced by Rabinowitz (1997).
What about ordinal traits?
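Setting the weight to 1 for every affected child recovers the TDT. The sketch below checks this for child-parent triads with both parents genotyped. It assumes the FBAT-style form U = Σ T·(X − E[X | parents]) with Var(U) = Σ T²·Var(X | parents), as in Rabinowitz and Laird (2000). Genotypes are coded as the number of A alleles (0, 1, 2). With one child per family, no within-family covariance terms are needed. The function names and toy trios are hypothetical.

```python
def transmit_prob_A(n_A):
    """Probability that a parent carrying n_A copies of A (0, 1, 2) transmits A."""
    return n_A / 2.0

def child_moments(father, mother):
    """Null E and Var of a child's A count given both parental A counts."""
    pf, pm = transmit_prob_A(father), transmit_prob_A(mother)
    return pf + pm, pf * (1 - pf) + pm * (1 - pm)

def general_statistic(trios, weight):
    """U^2 / Var(U) with U = sum of T * (X - E[X | parents]) over one-child families."""
    u = var_u = 0.0
    for father, mother, child, trait in trios:
        expected, variance = child_moments(father, mother)
        t = weight(trait)
        u += t * (child - expected)
        var_u += t * t * variance
    if var_u == 0:
        raise ValueError("no informative families")
    return u * u / var_u

# (father, mother, child, trait) as A-allele counts; every child is affected (trait 1).
trios = [(1, 1, 2, 1), (1, 0, 1, 1), (1, 2, 1, 1), (1, 1, 1, 1), (1, 0, 0, 1)]
stat = general_statistic(trios, weight=lambda y: 1.0)
```

These trios give b = 4 A-transmissions and c = 3 a-transmissions from heterozygous parents. Both the McNemar form (b − c)²/(b + c) and `stat` equal 1/7. For quantitative trios, passing `weight=lambda y: y - y_bar` (with `y_bar` the sample mean) gives the Q-TDT-type weighting described on this slide.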
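TDT for Ordinal Traits
For each child, the weight is built from the counts of children whose trait values are greater than or less than that child's trait value y. With this weight, the test statistic for ordinal traits is [equation not recovered]. Under the null hypothesis, the statistic asymptotically follows a χ² distribution with 1 degree of freedom.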
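The exact weight on this slide was not recovered. One plausible form, which is an assumption and not confirmed by the slide, uses the counts just described. For a child with trait value y, it sets T = (#children with trait < y − #children with trait > y) / N. This is a centered, midrank-type score. Up to sign and scale, it matches the score of a proportional odds model at the null for a single covariate (the Wilcoxon-type score). The function name is hypothetical.

```python
from collections import Counter

def ordinal_weights(traits):
    """Hypothetical centered rank weight for each child: (#below - #above) / N.

    "Below" and "above" count children in the whole sample whose ordinal
    trait value is strictly less than or greater than the child's own value.
    """
    n = len(traits)
    counts = Counter(traits)
    below, running = {}, 0
    for level in sorted(counts):  # cumulative counts over ordered levels
        below[level] = running
        running += counts[level]
    weights = []
    for y in traits:
        above = n - below[y] - counts[y]
        weights.append((below[y] - above) / n)
    return weights

w = ordinal_weights([1, 2, 2, 3, 1, 3, 3])
```

The weights sum to zero, so they play the same centering role as $y_{ij} - \bar{y}$ in the Q-TDT. They depend only on the ordering of the levels, not on their numeric labels.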
Model and Method
Diallelic marker with possible alleles A and a.
Assume that there is a trait-increasing allele D, and we use d to denote the wild-type allele(s).
Consider a trait taking values in the ordinal responses 1, …, K.
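A proportional odds model for this setting can be written as logit P(Y ≤ k | g) = α_k − β·N_D(g), k = 1, …, K − 1. Here N_D(g) is the number of D alleles in the trait-locus genotype g. The single β shared across all cut points is the proportionality assumption. The sketch below turns cut points and β into category probabilities and draws one trait value. The parametrization, the numerical values, and the function names are illustrative assumptions, not the talk's settings.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def po_category_probs(alphas, beta, n_d):
    """P(Y = k | genotype), k = 1..K, under logit P(Y <= k) = alpha_k - beta * n_d."""
    if any(a2 <= a1 for a1, a2 in zip(alphas, alphas[1:])):
        raise ValueError("cut points must be strictly increasing")
    cum = [logistic(a - beta * n_d) for a in alphas] + [1.0]
    # Differences of cumulative probabilities give the category probabilities.
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

def sample_ordinal(rng, alphas, beta, n_d):
    """Draw one ordinal trait value in 1..K for a genotype with n_d copies of D."""
    probs = po_category_probs(alphas, beta, n_d)
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]

alphas = [-0.5, 0.5, 1.5]  # K = 4, illustrative cut points
for n_d in (0, 1, 2):      # genotypes dd, dD, DD
    print(n_d, [round(p, 3) for p in po_category_probs(alphas, 1.0, n_d)])
```

With β > 0, each extra D allele shifts probability toward higher categories. The cumulative odds ratio between adjacent genotypes is exp(β) at every cut point.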
Two Common Assumptions
(1) The trait and marker loci are closely linked such that, given the family's genotypes at a trait locus, the family's phenotypes and marker genotypes are independent;
(2) Given disease genotypes, the traits of the family members are conditionally independent.
Conditional Likelihood
[Equations for the conditional likelihood and the score function not recovered]
Score Statistic
After plugging in the estimates of the nuisance parameters, the score function under the null hypothesis is [equation not recovered], where [definitions not recovered].
Expectation and Variance
Following the idea of Rabinowitz and Laird (2000), we can compute or estimate the conditional expectation and conditional variance of the score, given the observed trait values, under the null hypothesis in three cases:
(a) both parents' marker genotypes are available;
(b) only one parent's marker genotype is available; and
(c) neither parent's marker genotype is available.
Both Parents Genotyped
When both parents' genotypes are observed, the children's genotypes are conditionally independent. Conditional expectation and variance of a child's number of A alleles:
Parental genotypes: Expectation, Variance
(AA, AA): 2, 0
(AA, Aa): 3/2, 1/4
(AA, aa): 1, 0
(Aa, Aa): 1, 1/2
(Aa, aa): 1/2, 1/4
(aa, aa): 0, 0
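The table entries follow from Mendelian transmission. Each parent independently passes on one of its two alleles with probability 1/2. The child's A count is therefore determined by the four equally likely pairs of transmitted alleles. The enumeration below reproduces every row with exact fractions. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def mendelian_moments(father, mother):
    """Exact E and Var of a child's number of A alleles.

    father, mother: genotype strings such as "Aa"; each of a parent's two
    alleles is transmitted with probability 1/2, independently of the other parent.
    """
    dist = {}
    for a, b in product(father, mother):  # 4 equally likely allele pairs
        x = (a == "A") + (b == "A")
        dist[x] = dist.get(x, 0) + Fraction(1, 4)
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var

for pair in [("AA", "AA"), ("AA", "Aa"), ("AA", "aa"),
             ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]:
    e, v = mendelian_moments(*pair)
    print(pair, e, v)
```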
One Parent Genotyped
Columns: genotype of the typed parent; set of distinct genotypes observed among the children; conditional probability of each child genotype (AA, Aa, aa); joint conditional genotype distribution of two sibs.
Parent AA:
{AA}: 1; P{AA, AA} = 1
{Aa}: 1; P{Aa, Aa} = 1
{AA, Aa}: P(AA) = P(Aa) = 1/2; P{AA, Aa} = …, P{AA, AA} = P{Aa, Aa} = …
Parent aa:
{Aa}: 1; P{Aa, Aa} = 1
{aa}: 1; P{aa, aa} = 1
{Aa, aa}: …; P{Aa, aa} = …, P{Aa, Aa} = P{aa, aa} = …
Parent Aa:
{AA}: 1; P{AA, AA} = 1
{Aa}: 1; P{Aa, Aa} = 1
{aa}: 1; P{aa, aa} = 1
{AA, Aa}: …; P{AA, AA} = …, P{Aa, Aa} = …, P{AA, Aa} = …
One Parent Genotyped (continued)
Parent Aa (continued):
{Aa, aa}: …; P{Aa, Aa} = …, P{aa, aa} = …, P{Aa, aa} = …
{AA, aa}: …; P{AA, AA} = P{aa, aa} = …, P{AA, Aa}/2 = P{Aa, aa}/2 = …, P{AA, aa} = …, P{Aa, Aa} = …
{AA, Aa, aa}: …
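When only one parent is typed, the conditional distribution comes from conditioning on the set of distinct genotypes observed among the children (Rabinowitz and Laird 2000). In the simplest case, the observed set identifies the missing parent. For example, a typed parent AA with offspring set {AA, Aa} forces the other parent to be Aa. In that case, the joint distribution of two sibs can be found by brute-force enumeration. The sketch assumes a sibship of m children and a mating type already pinned down this way. It is not a general implementation for cases where several mating types are compatible. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def offspring_dist(father, mother):
    """Mendelian distribution of one child's genotype, e.g. {'AA': 1/2, 'Aa': 1/2}."""
    dist = {}
    for a, b in product(father, mother):
        g = "".join(sorted(a + b))  # 'aA' -> 'Aa' because uppercase sorts first
        dist[g] = dist.get(g, 0) + Fraction(1, 4)
    return dist

def pair_distribution(father, mother, observed_set, m):
    """P{genotypes of sibs 1 and 2} given that the m sibs show exactly observed_set."""
    single = offspring_dist(father, mother)
    joint, total = {}, Fraction(0)
    for sibs in product(single, repeat=m):  # all ordered sibship configurations
        if set(sibs) != set(observed_set):
            continue
        p = Fraction(1)
        for g in sibs:
            p *= single[g]
        total += p
        key = tuple(sorted(sibs[:2]))  # unordered genotype pair of sibs 1 and 2
        joint[key] = joint.get(key, 0) + p
    return {k: v / total for k, v in joint.items()}

d = pair_distribution("AA", "Aa", {"AA", "Aa"}, m=3)
print(d)
```

For three sibs, this gives P{AA, AA} = P{Aa, Aa} = 1/6 and P{AA, Aa} = 2/3. Each child is AA or Aa with conditional probability 1/2, consistent with the 1/2 in the table.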
No Parental Genotype
Columns: set of distinct genotypes observed among the children; conditional probability of each child genotype (AA, Aa, aa); joint conditional genotype distribution of two sibs.
{AA}: 1; P{AA, AA} = 1
{Aa}: 1; P{Aa, Aa} = 1
{aa}: 1; P{aa, aa} = 1
{AA, Aa}: …; P{AA, AA} = …, P{Aa, Aa} = …, P{AA, Aa} = …
{Aa, aa}: …; P{Aa, Aa} = …, P{aa, aa} = …, P{Aa, aa} = …
{AA, aa}: …; P{AA, AA} = P{aa, aa} = P{AA, Aa}/2 = P{Aa, aa}/2 = …, P{AA, aa} = …, P{Aa, Aa} = …
{AA, Aa, aa}: …
Simulation Studies
Assess the type I error of our score test at specific nominal levels (0.05, 0.01, and …) to validate the asymptotic behavior of the test statistic.
Compare the power of our test with that of other test statistics.
Choose the number of ordinal levels K = 3, 4, or 5.
Simulation Design
Generate the parents' genotypes given the haplotype frequencies (marker allele A/a, trait allele D/d):
AD: 0.2
Ad: 0.1
aD: 0.1
ad: 0.6
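Assuming random mating, each parent can be generated as two haplotypes drawn independently from the table above. The sketch keeps the haplotypes phased, because the offspring step needs to know which marker allele sits with which trait allele. The function name and the seed are illustrative.

```python
import random

HAPLOTYPE_FREQS = {"AD": 0.2, "Ad": 0.1, "aD": 0.1, "ad": 0.6}

def sample_parent(rng, freqs=HAPLOTYPE_FREQS):
    """Draw a phased parent: two (marker, trait) haplotypes sampled independently."""
    haps = list(freqs)
    weights = [freqs[h] for h in haps]
    return tuple(rng.choices(haps, weights=weights, k=2))

rng = random.Random(2005)
families = [(sample_parent(rng), sample_parent(rng)) for _ in range(200)]

# Sanity check: empirical haplotype frequencies over all 4 * 200 parental haplotypes.
counts = {h: 0 for h in HAPLOTYPE_FREQS}
for father, mother in families:
    for hap in father + mother:
        counts[hap] += 1
print({h: round(c / 800, 3) for h, c in counts.items()})
```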
Simulation Design (continued)
Given the parental genotypes, generate the offspring genotypes assuming unlinked (null) or linked (1 cM apart, alternative) trait and marker loci.
Conditional on the trait genotype, use the proportional odds model to generate the ordinal trait.
Generate 200 or 400 families.
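A sketch of the transmission step follows. It assumes the Haldane map function to convert the 1 cM distance into a recombination fraction; the talk does not say which map function was used. Under Haldane, θ = (1 − e^{−2d})/2 with d in Morgans, so 1 cM gives θ ≈ 0.0099, and unlinked loci correspond to θ = 1/2. Each parent passes the trait allele of one of its two haplotypes. The marker allele comes from the same haplotype unless a recombination occurs. The function names are hypothetical.

```python
import math
import random

def haldane_theta(cm):
    """Recombination fraction from map distance in centimorgans (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))

def gamete(rng, parent, theta):
    """Transmit one (marker, trait) haplotype from a phased parent such as ("AD", "ad")."""
    source = rng.randrange(2)  # haplotype supplying the trait allele
    trait_allele = parent[source][1]
    # With probability theta the marker allele comes from the other haplotype.
    marker_source = source if rng.random() >= theta else 1 - source
    return parent[marker_source][0] + trait_allele

def child(rng, father, mother, theta):
    """A child's two transmitted haplotypes (paternal, maternal)."""
    return gamete(rng, father, theta), gamete(rng, mother, theta)

theta_linked = haldane_theta(1.0)  # about 0.0099 under the Haldane assumption
theta_null = 0.5                   # unlinked loci
rng = random.Random(1)
kid = child(rng, ("AD", "ad"), ("Ad", "aD"), theta_linked)
```

The child's trait-locus genotype (for example "dD") can then be read from the second characters of the two haplotypes and passed to the trait model.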
Three Models to Generate Trait Values
(a) A proportional odds model is used to generate an ordinal trait;
(b) A non-proportional odds model is also used to generate an ordinal trait, to assess the robustness of our score test to the proportionality assumption;
(c) A Gaussian model is used to generate a quantitative trait, to evaluate the performance of O-TDT for quantitative traits.
Ordinal Traits Generated from a Proportional Odds Model (a)
Type I Errors Based on 10,000 Replications (a)
[Table: empirical type I error rates of Q-TDT, O-TDT, and TDT by number of families and number of levels K; values not recovered]
Figure: Power comparison (a)
Non-Proportional Odds Model (b): Conditional and Marginal Distributions of the Ordinal Trait
K = 3
Conditional: P(Y ≤ 1 | dd) = .7, P(Y ≤ 2 | dd) = .9; P(Y ≤ 1 | dD) = .3, P(Y ≤ 2 | dD) = .6; P(Y ≤ 1 | DD) = .1, P(Y ≤ 2 | DD) = .5
Marginal: P(Y = 1) = 0.478, P(Y = 2) = 0.260, P(Y = 3) = 0.262
K = 4
Conditional: P(Y ≤ 1 | dd) = .7, P(Y ≤ 2 | dd) = .8, P(Y ≤ 3 | dd) = .9; P(Y ≤ 1 | dD) = .3, P(Y ≤ 2 | dD) = .5, P(Y ≤ 3 | dD) = .7; P(Y ≤ 1 | DD) = .1, P(Y ≤ 2 | DD) = .35, P(Y ≤ 3 | DD) = .6
Marginal: P(Y = 1) = 0.478, P(Y = 2) = 0.155, P(Y = 3) = 0.156, P(Y = 4) = 0.211
K = 5
Conditional: P(Y ≤ 1 | dd) = .7, P(Y ≤ 2 | dd) = .77, P(Y ≤ 3 | dd) = .85, P(Y ≤ 4 | dd) = .92; P(Y ≤ 1 | dD) = .2, P(Y ≤ 2 | dD) = .45, P(Y ≤ 3 | dD) = .65, P(Y ≤ 4 | dD) = .8; P(Y ≤ 1 | DD) = .05, P(Y ≤ 2 | DD) = .35, P(Y ≤ 3 | DD) = .55, P(Y ≤ 4 | DD) = .75
Marginal: P(Y = 1) = 0.431, P(Y = 2) = 0.166, P(Y = 3) = 0.141, P(Y = 4) = 0.115, P(Y = 5) = 0.146
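The marginal probabilities on this slide can be reproduced from the conditional cumulative probabilities. This requires assuming the trait-locus genotypes are in Hardy-Weinberg proportions with P(D) = 0.3, the frequency of D implied by the simulation haplotype table (0.2 + 0.1). That genotype-frequency assumption is ours, but it matches every marginal value shown to within rounding (±0.001). The function names are illustrative.

```python
# Cumulative probabilities P(Y <= k | genotype), k = 1..K-1, copied from the slide.
CUMULATIVE = {
    3: {"dd": [.7, .9], "dD": [.3, .6], "DD": [.1, .5]},
    4: {"dd": [.7, .8, .9], "dD": [.3, .5, .7], "DD": [.1, .35, .6]},
    5: {"dd": [.7, .77, .85, .92], "dD": [.2, .45, .65, .8],
        "DD": [.05, .35, .55, .75]},
}

def hwe_genotype_freqs(p_d):
    """Hardy-Weinberg genotype frequencies at the trait locus (an assumption)."""
    q = 1.0 - p_d
    return {"dd": q * q, "dD": 2 * p_d * q, "DD": p_d * p_d}

def marginal_distribution(cum_by_genotype, geno_freqs):
    """P(Y = k) = sum over g of P(g) * [P(Y <= k | g) - P(Y <= k-1 | g)]."""
    n_cuts = len(next(iter(cum_by_genotype.values())))
    marginal_cum = [sum(geno_freqs[g] * cum[k] for g, cum in cum_by_genotype.items())
                    for k in range(n_cuts)] + [1.0]
    return [marginal_cum[0]] + [marginal_cum[k] - marginal_cum[k - 1]
                                for k in range(1, len(marginal_cum))]

freqs = hwe_genotype_freqs(0.3)  # dd 0.49, dD 0.42, DD 0.09
for K, table in CUMULATIVE.items():
    print(K, [round(p, 4) for p in marginal_distribution(table, freqs)])
```

For K = 3, for example, P(Y = 1) = 0.7·0.49 + 0.3·0.42 + 0.1·0.09 = 0.478.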
Type I Errors Based on 10,000 Replications (b)
[Table: empirical type I error rates of Q-TDT, O-TDT, and TDT by number of families and number of levels K; values not recovered]
Figure: Power comparison (b)
Performance for Quantitative Traits (c)
Our test can serve as a unified test for any type of trait. For a quantitative trait, the weights in our test are functions of the quantiles of the trait values. Simulations show that our test is competitive with, but slightly less powerful than, Q-TDT.
Type I Errors for Quantitative Traits Based on 100,000 Replications (c)
[Table: empirical type I error rates of Q-TDT and O-TDT by number of families; values not recovered]
Power: Quantitative Trait
Data are simulated as in the type I error experiments, except for the following. Given the genotype at the trait locus, the quantitative trait follows a normal distribution with unit variance and mean proportional to the number of trait-increasing alleles. Namely,
$Y \mid G \sim N(\beta\, N_D(G),\ 1),$
where $N_D(G)$ is the number of D alleles in the trait-locus genotype G and β is the effect size.
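The quantitative-trait generator described above fits in a few lines. Given the trait-locus genotype, Y is normal with unit variance and mean proportional to the number of D alleles. The effect size `beta` is a hypothetical placeholder, since the value used in the talk was not recovered. The function name is also illustrative.

```python
import random

def quantitative_trait(rng, trait_genotype, beta):
    """Draw Y ~ Normal(beta * #D, 1) for a trait-locus genotype such as 'dD'."""
    n_d = trait_genotype.count("D")
    return rng.gauss(beta * n_d, 1.0)

rng = random.Random(7)
ys = {g: [quantitative_trait(rng, g, beta=0.5) for _ in range(20000)]
      for g in ("dd", "dD", "DD")}
means = {g: sum(v) / len(v) for g, v in ys.items()}
print({g: round(m, 2) for g, m in means.items()})
```

With β = 0.5, the genotype means should be close to 0, 0.5, and 1.0, one step of β per D allele.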
Figure: Power comparison (c)
Data (Dr. Ming Li)
Goal: identify candidate SNPs through association analysis.
Nicotine dependence was measured in 313 families with 1,396 subjects.
12 SNPs in the GPR51 gene (suggested by Framingham Heart Study samples) were genotyped.
One ordinal trait with 8 levels was assessed by the Fagerström Test for Nicotine Dependence (FTND).
FBAT was also used for comparison.
FTND
1. How many cigarettes a day do you usually smoke? (0-3 points)
2. How soon after you wake up do you smoke your first cigarette? (0-3 points)
3. Do you smoke more during the first two hours of the day than during the rest of the day? (0, 1)
4. Which cigarette would you most hate to give up? (0, 1)
5. Do you find it difficult to refrain from smoking in places where it is forbidden, such as public buildings, on airplanes or at work? (0, 1)
6. Do you still smoke even when you are so ill that you are in bed most of the day? (0, 1)
TOTAL POINTS =
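The FTND total is the sum of the six item scores. Two items are scored 0 to 3 and four are scored 0 or 1, so the total ranges from 0 to 10. Below is a small scoring helper that checks each item's range before summing. The item keys are hypothetical labels for the six questions, and the mapping from answers to points is taken as given.

```python
# Maximum points per item, in questionnaire order (keys are hypothetical labels).
ITEM_MAX = {
    "cigarettes_per_day": 3,
    "time_to_first_cigarette": 3,
    "smokes_more_in_morning": 1,
    "hardest_to_give_up": 1,
    "hard_to_refrain_in_forbidden_places": 1,
    "smokes_when_ill": 1,
}

def ftnd_total(item_points):
    """Sum FTND item points after checking that every item is present and in range."""
    missing = set(ITEM_MAX) - set(item_points)
    unknown = set(item_points) - set(ITEM_MAX)
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    for item, points in item_points.items():
        if not 0 <= points <= ITEM_MAX[item]:
            raise ValueError(f"{item}: {points} outside 0..{ITEM_MAX[item]}")
    return sum(item_points.values())

example = {
    "cigarettes_per_day": 2, "time_to_first_cigarette": 3,
    "smokes_more_in_morning": 1, "hardest_to_give_up": 1,
    "hard_to_refrain_in_forbidden_places": 0, "smokes_when_ill": 0,
}
print(ftnd_total(example))  # 7
```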
GPR51 Gene
G protein-coupled receptor 51 (located at 5q24 in the rat genome and 9q22.33 in the human genome)
Combines with GABA-B1 to form functional GABA-B receptors
Inhibits high-voltage-activated calcium ion channels
Results
[Table: results for seven SNPs (rs IDs truncated) from PBAT-GEE, O-TDT, and Q-TDT, each for the pooled, AA, and EA samples; values not recovered]
Discussion and Conclusion
We propose a score test statistic for linkage analysis. Although it is derived from a proportional odds model for ordinal traits, power comparisons reveal that it can serve as a unified approach for dichotomous, quantitative, and ordinal traits.
The score-based Q-TDT yields lower power than O-TDT for ordinal traits. The difference ranges from a few percent to tens of percent, depending on the distribution of the ordinal trait.