A Transmission/disequilibrium Test for Ordinal Traits in Nuclear Families and a Unified Approach for Association Studies Heping Zhang, Xueqin Wang and.

Presentation transcript:

A Transmission/disequilibrium Test for Ordinal Traits in Nuclear Families and a Unified Approach for Association Studies. Heping Zhang, Xueqin Wang and Yuanqing Ye, Department of Epidemiology and Public Health, Yale University. Presented at the Workshop on Genomics, NUS, November 14, 2005.

Outline: Data structure; TDT, Q-TDT, S-TDT, etc.; O-TDT for ordinal traits; Simulations; Data analysis; Discussion and conclusion.

Data Structure: n families. [Figure: family diagrams, not preserved.]

Linkage Analysis – Null Hypothesis: To test for linkage, the null hypothesis is that the marker locus is not linked to any trait locus. [Figure: marker and trait locus, linked versus unlinked.]

Linkage Analysis - Recombination Fraction. [Figure: marker and trait locus.]

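Coefficient of Linkage Disequilibrium: For a marker allele $A$ and a trait-locus allele $D$, the coefficient compares the haplotype frequency with the product of the allele frequencies: $\delta = \mathrm{Freq}(A, D) - \mathrm{Freq}(A)\,\mathrm{Freq}(D)$.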
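As a small worked example, the coefficient can be computed directly from haplotype frequencies. The sketch below assumes the usual definition (haplotype frequency minus the product of the two allele frequencies) and borrows the four haplotype frequencies from the simulation design later in the talk; the function name `ld_coefficient` and the allele labels are illustrative, not taken from the slides.

```python
def ld_coefficient(hap_freq, marker_allele="A", trait_allele="D"):
    """Freq(marker, trait) - Freq(marker) * Freq(trait) for two-locus haplotypes.

    hap_freq maps two-character haplotypes (marker allele, then trait allele)
    to their population frequencies.
    """
    # Marginal allele frequencies are sums over the haplotypes carrying the allele.
    p_marker = sum(f for h, f in hap_freq.items() if h[0] == marker_allele)
    p_trait = sum(f for h, f in hap_freq.items() if h[1] == trait_allele)
    return hap_freq[marker_allele + trait_allele] - p_marker * p_trait


# Haplotype frequencies from the simulation design slide.
haplotypes = {"AD": 0.2, "Ad": 0.1, "aD": 0.1, "ad": 0.6}
delta = ld_coefficient(haplotypes)
print(round(delta, 4))  # 0.2 - 0.3 * 0.3 = 0.11
```

A value of zero would mean the marker and trait alleles are in linkage equilibrium; here the two loci are positively associated.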

Null Hypothesis – Linkage Disequilibrium: The TDT tests for linkage in the presence of association, or for association in the presence of linkage (Spielman et al. 1993; Ewens and Spielman 1995). The null hypothesis that the haplotype relative risk (Falk and Rubinstein, 1987) equals 1 is: [formula not preserved]

Transmission/Disequilibrium Test (TDT): Eliminates the confounding effects caused by population stratification/admixture and other factors. It is a McNemar's test.

TDT-McNemar Test: Suppose two heterozygous parents and an affected child are genotyped. [Figure, built up over four slides: for the father and then the mother (each Aa), the transmitted and nontransmitted marker alleles are identified and each parent is entered into a 2×2 table of transmitted by nontransmitted allele.]

TDT-McNemar Test: Combinations of transmitted and nontransmitted marker alleles A and a among 2n parents of n affected children. [Table: transmitted allele (rows A, a, total) by nontransmitted allele (columns A, a, total); cell entries not preserved.]
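The McNemar form of the TDT uses only the off-diagonal cells of this table, that is, the heterozygous parents. A minimal sketch follows; because the cell labels are not preserved in the transcript, the names `b` (parents transmitting A but not a) and `c` (parents transmitting a but not A) are assumptions, and the counts are hypothetical.

```python
def tdt_statistic(b, c):
    """McNemar chi-square for the TDT.

    b: heterozygous parents who transmitted A (nontransmitted allele a)
    c: heterozygous parents who transmitted a (nontransmitted allele A)
    Homozygous parents fall on the diagonal of the table and carry no information.
    """
    if b + c == 0:
        raise ValueError("no heterozygous (informative) parents")
    return (b - c) ** 2 / (b + c)


chi2 = tdt_statistic(b=60, c=40)  # hypothetical counts
print(chi2)  # 4.0
```

Under the null hypothesis the statistic is compared with a chi-square distribution with one degree of freedom, so 4.0 would exceed the 0.05 critical value of about 3.84.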

Further Developments: Q-TDT proposed by Allison (1997); Q-TDT further investigated by Rabinowitz (1997); S-TDT (Spielman and Ewens 1998); FBAT (Lunetta et al. 2000; Rabinowitz and Laird 2000); many other extensions.

General Test Statistic: Assume that there are $n$ nuclear families. In the $i$-th family there are $n_i$ siblings, $i = 1, \ldots, n$. For the $j$-th child in the $i$-th family, the trait value is $y_{ij}$ and the genotype is $g_{ij}$; $X_{ij}$ is the number of copies of allele $A$ in the genotype $g_{ij}$. The linkage/association test statistic can be constructed as follows: [formula not preserved], where $w_{ij}$ is a weight of the phenotype.

Example: For a sample of affected child-parent triads, let the weight be [definition not preserved]; then the statistic is the TDT introduced by Spielman et al. (1993). For a sample of nuclear families with quantitative trait values, let $w_{ij} = y_{ij} - \bar{y}$, where $\bar{y}$ is the average of the trait values; then the statistic is the Q-TDT introduced by Rabinowitz (1997). What about ordinal traits?
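The slide's formula is not preserved, so the following is only a sketch under an assumed FBAT-style form: each child's count of allele A is centred at its conditional expectation given the parental genotypes, multiplied by a phenotype weight, summed, and standardized. It covers only the case where both parents are genotyped, and the names `MOMENTS` and `weighted_statistic` are hypothetical.

```python
from math import sqrt

# (expectation, variance) of a child's count of allele A given both parental
# genotypes, as tabulated on the "Both Parents Genotyped" slide.
MOMENTS = {
    ("AA", "AA"): (2.0, 0.0), ("AA", "Aa"): (1.5, 0.25), ("AA", "aa"): (1.0, 0.0),
    ("Aa", "Aa"): (1.0, 0.5), ("Aa", "aa"): (0.5, 0.25), ("aa", "aa"): (0.0, 0.0),
}


def weighted_statistic(children):
    """children: iterable of (weight, allele_count, parental_genotype_pair).

    Assumed form: sum of weight * (count - expectation), divided by the square
    root of sum of weight**2 * variance (children treated as conditionally
    independent given both parents).
    """
    score = 0.0
    variance = 0.0
    for weight, allele_count, parents in children:
        mean, var = MOMENTS[parents]
        score += weight * (allele_count - mean)
        variance += weight ** 2 * var
    if variance == 0:
        raise ValueError("no informative families")
    return score / sqrt(variance)


# Hypothetical affected-child triads with Aa x aa parents and unit weights:
# 60 children received A from the heterozygous parent, 40 received a.
triads = [(1.0, 1, ("Aa", "aa"))] * 60 + [(1.0, 0, ("Aa", "aa"))] * 40
z = weighted_statistic(triads)
print(z ** 2)  # 4.0, the same value as (60 - 40)**2 / (60 + 40)
```

With unit weights on affected children the squared statistic reproduces the McNemar value for these counts; replacing the weights by trait deviations from the mean gives the quantitative-trait variant described on the slide.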

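TDT for Ordinal Traits: Let [symbols not preserved] be the counts of children whose trait values are greater than or less than $y$, and [definition not preserved]; the test statistic for ordinal traits is [formula not preserved]. Under the null hypothesis, the statistic follows [distribution not preserved].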

Model and Method: Consider a di-allelic marker with possible alleles A and a. Assume that there is a trait-increasing allele D, and use d to denote the wild-type allele(s). Consider a trait taking values in the ordinal responses 1,…, K.

Two Common Assumptions: (1) The trait and marker loci are closely linked such that, given the family’s genotypes at a trait locus, the family’s phenotypes and marker genotypes are independent. (2) Given disease genotypes, the traits of the family members are conditionally independent.

Conditional Likelihood. The score function: [formulas not preserved]

Score Statistic: After plugging in the estimates for the nuisance parameters, the score function under the null hypothesis is [formula and definitions not preserved].

Expectation and Variance: Following the idea of Rabinowitz and Laird (2000), we can compute or estimate the conditional expectation and the conditional variance given the observed trait values under the null hypothesis in the following three cases: (a) marker information is available for both parents; (b) marker information is available for only one parent; and (c) no parental marker information is available.

Expectation and Variance (continued): [formulas not preserved]

Both Parents Genotyped: When both parents’ genotypes are observed, the children’s genotypes are conditionally independent. Expectation and variance of a child’s count of allele A, by parental genotypes:
Parental genotypes | Expectation | Variance
(AA, AA) | 2 | 0
(AA, Aa) | 3/2 | 1/4
(AA, aa) | 1 | 0
(Aa, Aa) | 1 | 1/2
(Aa, aa) | 1/2 | 1/4
(aa, aa) | 0 | 0
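The tabulated moments follow from Mendelian transmission: each parent passes on one of its two alleles with probability 1/2, independently of the other parent. A short sketch that recomputes them exactly (the function name `child_allele_moments` is illustrative):

```python
from fractions import Fraction
from itertools import product


def child_allele_moments(father, mother):
    """Mean and variance of the child's count of allele A given both parents.

    Genotypes are two-character strings such as "AA", "Aa", or "aa".
    """
    quarter = Fraction(1, 4)  # each (paternal allele, maternal allele) pair
    dist = {}
    for from_father, from_mother in product(father, mother):
        count = (from_father == "A") + (from_mother == "A")
        dist[count] = dist.get(count, 0) + quarter
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var


for pair in [("AA", "AA"), ("AA", "Aa"), ("AA", "aa"),
             ("Aa", "Aa"), ("Aa", "aa"), ("aa", "aa")]:
    print(pair, *child_allele_moments(*pair))
```

Using `Fraction` keeps the results as exact rationals, so they can be compared with the table without rounding.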

One Parent Genotyped. Values shown as … are not preserved.
Parental genotype | Children’s possible genotypes | Cond. probability (AA, Aa, aa) | Joint conditional genotype distribution of two sibs
AA | {AA} | 1 | P{AA, AA} = 1
AA | {Aa} | 1 | P{Aa, Aa} = 1
AA | {AA, Aa} | 1/2 … | P{AA, Aa} = …; P{AA, AA} = P{Aa, Aa} = …
aa | {Aa} | 1 | P{Aa, Aa} = 1
aa | {aa} | 1 | P{aa, aa} = 1
aa | {Aa, aa} | … | P{Aa, aa} = …; P{Aa, Aa} = P{aa, aa} = …
Aa | {AA} | 1 | P{AA, AA} = 1
Aa | {Aa} | 1 | P{Aa, Aa} = 1
Aa | {aa} | 1 | P{aa, aa} = 1
Aa | {AA, Aa} | … | P{AA, AA} = …; P{Aa, Aa} = …; P{AA, Aa} = …

One Parent Genotyped (continued). Values shown as … are not preserved.
Parental genotype | Children’s possible genotypes | Cond. probability (AA, Aa, aa) | Joint conditional genotype distribution of two sibs
Aa | {Aa, aa} | … | P{Aa, Aa} = …; P{aa, aa} = …; P{Aa, aa} = …
Aa | {AA, aa}; {AA, Aa, aa} | … | P{AA, AA} = P{aa, aa} = P{AA, Aa}/2 = P{Aa, aa}/2 = …; P{AA, aa} = …; P{Aa, Aa} = …

No Parental Genotype. Values shown as … are not preserved.
Children’s possible genotypes | Cond. probability (AA, Aa, aa) | Joint conditional genotype distribution of two sibs
{AA} | 1 | P{AA, AA} = 1
{Aa} | 1 | P{Aa, Aa} = 1
{aa} | 1 | P{aa, aa} = 1
{AA, Aa} | … | P{AA, AA} = …; P{Aa, Aa} = …; P{AA, Aa} = …
{Aa, aa} | … | P{Aa, Aa} = …; P{aa, aa} = …; P{Aa, aa} = …
{AA, aa}; {AA, Aa, aa} | … | P{AA, AA} = P{aa, aa} = P{AA, Aa}/2 = P{Aa, aa}/2 = …; P{AA, aa} = …; P{Aa, Aa} = …

Simulation Studies: Assess the type I error of our score test with respect to specific nominal levels (0.05, 0.01, and [value not preserved]) to validate the asymptotic behavior of the test statistic. Compare the power of our test with other test statistics. Choose the ordinal level K = 3, 4, or 5.

Simulation Design: Generate the parents’ genotypes given the haplotype frequencies:
Haplotype | Frequency
AD | 0.2
Ad | 0.1
aD | 0.1
ad | 0.6

Simulation Design (continued): Given the parental genotypes, generate the offspring genotypes assuming that the trait and marker loci are unlinked (null hypothesis) or linked at 1 cM (alternative). Conditional on the trait genotype, use the proportional odds model to generate the ordinal trait. 200 or 400 families are generated.
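The design can be sketched in a few functions. This is an illustration under stated assumptions, not the authors' code: the recombination fraction is passed as `theta` (0.5 for unlinked loci, roughly 0.01 for 1 cM), and the proportional odds cutpoints and effect size `beta` are hypothetical because the slides' parameter values are not preserved.

```python
import math
import random

HAPLOTYPE_FREQ = {"AD": 0.2, "Ad": 0.1, "aD": 0.1, "ad": 0.6}  # from the slide


def draw_parent(rng):
    """A parent carries two haplotypes drawn independently from the population."""
    return rng.choices(list(HAPLOTYPE_FREQ), weights=list(HAPLOTYPE_FREQ.values()), k=2)


def gamete(parent, theta, rng):
    """Transmit one haplotype; with probability theta the marker allele comes
    from one parental haplotype and the trait allele from the other."""
    first, second = rng.sample(parent, 2)
    if rng.random() < theta:
        return first[0] + second[1]
    return first


def ordinal_trait(n_d, cutpoints, beta, rng):
    """Proportional odds: logit P(Y <= k | g) = cutpoints[k-1] - beta * n_d,
    where n_d is the number of D alleles and cutpoints are increasing."""
    u = rng.random()
    for k, alpha in enumerate(cutpoints, start=1):
        if u < 1.0 / (1.0 + math.exp(-(alpha - beta * n_d))):
            return k
    return len(cutpoints) + 1


def simulate_family(n_children, theta, cutpoints, beta, rng):
    father, mother = draw_parent(rng), draw_parent(rng)
    children = []
    for _ in range(n_children):
        g1, g2 = gamete(father, theta, rng), gamete(mother, theta, rng)
        marker_count = (g1[0] == "A") + (g2[0] == "A")
        n_d = (g1[1] == "D") + (g2[1] == "D")
        children.append((marker_count, ordinal_trait(n_d, cutpoints, beta, rng)))
    return father, mother, children


rng = random.Random(2005)
families = [simulate_family(2, 0.01, [-0.5, 0.5], 1.0, rng) for _ in range(200)]
print(families[0])
```

Each child is returned as a pair (count of marker allele A, ordinal trait value), which is the information a family-based test needs together with the parental marker genotypes.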

Three Models to Generate Trait Values: (a) A proportional odds model is used to generate an ordinal trait; (b) a non-proportional odds model is also used to generate an ordinal trait, to assess the robustness of our score test with respect to the proportionality assumption; (c) a Gaussian model is used to generate a quantitative trait, to evaluate the performance of O-TDT for quantitative traits.

Ordinal Traits Generated from a Proportional Odds Model (a): [formula not preserved]

Type I Errors Based on 10,000 Replications (a): [Table: empirical type I error rates of Q-TDT, O-TDT, and TDT by number of families and K, in three column groups; numeric entries not preserved.]

Figure: Power comparison (a)

Non-Proportional Odds Model (b): Conditional and marginal distributions for the ordinal trait.
K = 3
P(Y ≤ 1 | dd) = .7, P(Y ≤ 2 | dd) = .9
P(Y ≤ 1 | dD) = .3, P(Y ≤ 2 | dD) = .6
P(Y ≤ 1 | DD) = .1, P(Y ≤ 2 | DD) = .5
Marginal: P(Y = 1) = 0.478, P(Y = 2) = 0.260, P(Y = 3) = 0.262
K = 4
P(Y ≤ 1 | dd) = .7, P(Y ≤ 2 | dd) = .8, P(Y ≤ 3 | dd) = .9
P(Y ≤ 1 | dD) = .3, P(Y ≤ 2 | dD) = .5, P(Y ≤ 3 | dD) = .7
P(Y ≤ 1 | DD) = .1, P(Y ≤ 2 | DD) = .35, P(Y ≤ 3 | DD) = .6
Marginal: P(Y = 1) = 0.478, P(Y = 2) = 0.155, P(Y = 3) = 0.156, P(Y = 4) = 0.211
K = 5
P(Y ≤ 1 | dd) = .7, P(Y ≤ 2 | dd) = .77, P(Y ≤ 3 | dd) = .85, P(Y ≤ 4 | dd) = .92
P(Y ≤ 1 | dD) = .2, P(Y ≤ 2 | dD) = .45, P(Y ≤ 3 | dD) = .65, P(Y ≤ 4 | dD) = .8
P(Y ≤ 1 | DD) = .05, P(Y ≤ 2 | DD) = .35, P(Y ≤ 3 | DD) = .55, P(Y ≤ 4 | DD) = .75
Marginal: P(Y = 1) = 0.431, P(Y = 2) = 0.166, P(Y = 3) = 0.141, P(Y = 4) = 0.115, P(Y = 5) = 0.146
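The marginal distribution can be checked by averaging the conditional cumulative probabilities over the trait genotypes. The sketch below assumes Hardy-Weinberg genotype proportions with Freq(D) = 0.3 (the sum of the AD and aD haplotype frequencies in the simulation design) and reads the conditional values as cumulative probabilities P(Y ≤ k | genotype); the function name `marginal` is illustrative.

```python
FREQ_D = 0.3  # assumed: Freq(AD) + Freq(aD) from the simulation design
GENOTYPE_PROB = {  # assumed Hardy-Weinberg proportions at the trait locus
    "dd": (1 - FREQ_D) ** 2,
    "dD": 2 * FREQ_D * (1 - FREQ_D),
    "DD": FREQ_D ** 2,
}


def marginal(cumulative):
    """cumulative[g] lists P(Y <= k | g) for k = 1..K-1; returns P(Y = k), k = 1..K."""
    n_cut = len(next(iter(cumulative.values())))
    cum = [sum(GENOTYPE_PROB[g] * cumulative[g][k] for g in cumulative)
           for k in range(n_cut)]
    cum.append(1.0)  # P(Y <= K) = 1
    return [c - prev for c, prev in zip(cum, [0.0] + cum[:-1])]


k3 = marginal({"dd": [0.7, 0.9], "dD": [0.3, 0.6], "DD": [0.1, 0.5]})
k5 = marginal({
    "dd": [0.7, 0.77, 0.85, 0.92],
    "dD": [0.2, 0.45, 0.65, 0.8],
    "DD": [0.05, 0.35, 0.55, 0.75],
})
print([round(p, 3) for p in k3])
print([round(p, 3) for p in k5])
```

Under these assumptions the computed marginals agree with the values listed on the slide to within rounding in the third decimal place.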

Type I Errors Based on 10,000 Replications (b): [Table: empirical type I error rates of Q-TDT, O-TDT, and TDT by number of families and K, in three column groups; numeric entries not preserved.]

Figure: Power comparison (b)

Performance for Quantitative Traits (c): Our test can serve as a unified test for any trait. For a quantitative trait, the weights in our test are functions of quantiles. Simulations show that our test is competitive with, but slightly less powerful than, Q-TDT.

Type I Errors for Quantitative Traits Based on 100,000 Replications (c): [Table: empirical type I error rates of Q-TDT and O-TDT by number of families, in three column groups; numeric entries not preserved.]

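Power: Quantitative Trait. Data are simulated as in the experiments for assessing type I error, with one exception: given the genotype at the trait locus, the quantitative trait follows a normal distribution with unit variance and mean proportional to the number of trait-increasing alleles. Namely, $Y \mid g \sim N(\beta\, n_D(g),\, 1)$, where $n_D(g)$ is the number of trait-increasing alleles in genotype $g$ and $\beta$ denotes the proportionality constant (its value is not preserved).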

Figure: Power comparison (c)

Data (Dr. Ming Li): The aim is to identify candidate SNPs through association analysis. Nicotine dependence was measured in 313 families with 1,396 subjects. 12 SNPs were genotyped for the GPR51 gene (suggested from Framingham Heart Study samples). One ordinal trait with 8 levels was assessed by the Fagerström Test for Nicotine Dependence (FTND). FBAT was also used for comparison.

FTND:
1. How many cigarettes a day do you usually smoke? (0-3 points)
2. How soon after you wake up do you smoke your first cigarette? (0-3 points)
3. Do you smoke more during the first two hours of the day than during the rest of the day? (0, 1)
4. Which cigarette would you most hate to give up? (0, 1)
5. Do you find it difficult to refrain from smoking in places where it is forbidden, such as public buildings, on airplanes or at work? (0, 1)
6. Do you still smoke even when you are so ill that you are in bed most of the day? (0, 1)
Total points = sum of the six item scores.
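Scoring the questionnaire amounts to summing the six item scores after checking that each lies in its allowed range. A minimal sketch, assuming the items are supplied in the order listed on the slide; the names `ITEM_MAX` and `ftnd_total` are illustrative.

```python
ITEM_MAX = [3, 3, 1, 1, 1, 1]  # maximum points per item, in the slide's order


def ftnd_total(item_scores):
    """Sum the six FTND item scores after validating each item's range."""
    if len(item_scores) != len(ITEM_MAX):
        raise ValueError("expected six item scores")
    for position, (score, top) in enumerate(zip(item_scores, ITEM_MAX), start=1):
        if not 0 <= score <= top:
            raise ValueError(f"item {position} must be between 0 and {top}")
    return sum(item_scores)


print(ftnd_total([2, 1, 0, 1, 0, 1]))  # 5
```

The validation step is a design choice for catching data-entry errors early; how the total is grouped into the ordinal levels used in the analysis is not described in the transcript.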

GPR51 Gene: G protein-coupled receptor 51 (on 5q24 in the rat genome and 9q22.33 in the human genome). It combines with GABA-B1 to form functional GABA-B receptors and inhibits high-voltage-activated calcium ion channels.

Results: [Table: results by SNP ID for PBAT-GEE, O-TDT, and Q-TDT, each in the pooled, AA, and EA samples; SNP identifiers and entries not preserved.]

Discussion and Conclusion: We propose a score test statistic for linkage analysis. Although it is derived from a proportional odds model for ordinal traits, power comparisons reveal that it can serve as a unified approach for dichotomous, quantitative, and ordinal traits. The score-based Q-TDT yields lower power than O-TDT for ordinal traits; the difference ranges from a few percent to tens of percent, depending on the distribution of the ordinal traits.