1
Family-Based Association Tests
“If you cannot get rid of the family skeleton, you may as well make it dance” (G.B. Shaw)
2
Outline
Overview
Trios: Transmission Disequilibrium Test (TDT)
Discordant sibships: Conditional logistic regression
General Pedigree: FBAT test
Comparisons and extensions
3
Family-based designs
Discordant sibpairs, sibships
Affected offspring and their parents
Trios (2 parents, child): a common design
Complex nuclear families
Extended pedigrees
Leftovers from linkage (next lecture)
4
Family-based vs. Case-control
5
Family-based vs. Case-control
Family-based:
  Completely robust to population substructure
  Robust to HWE failure
  More powerful for very rare, highly penetrant diseases (e.g., arguments coming back for sequencing)
  Pseudo-controls (e.g., longevity study…), but much harder to recruit (esp. late-onset diseases; children generally not difficult)
Case-control:
  Adjusting for PCs/AIMs does well in practice, now
  Test for HWE in controls
  More powerful in most other situations
  More careful selection of good controls (sort of)
6
Family-based vs. Case-control
Family-based:
  Detect genotyping error (Mendel error)
  More complex analysis (but doable)
Case-control:
  Cryptics, maybe
  Standard regression methods
7
Mendel’s laws
Recall the playing cards example...
One allele from each parent for each gene
Many family-based tests are based on this, rather than on estimating allele frequencies (as in case-control)
8
Mendelian transmission: Ex
E.g., parents are Aa, Aa:
P(offspring=AA | Mother=Aa, Father=Aa)=?
P(offspring=Aa | Mother=Aa, Father=Aa)=?
P(offspring=aa | Mother=Aa, Father=Aa)=?
9
Mendelian transmission: Ex
E.g., parents are Aa, Aa:
P(offspring=AA | Mother=Aa, Father=Aa)=1/4
P(offspring=Aa | Mother=Aa, Father=Aa)=1/2
P(offspring=aa | Mother=Aa, Father=Aa)=1/4
Conditioning on parents...
10
Mendelian transmission: Ex
E.g., parents are AA, Aa:
P(offspring=AA | Mother=AA, Father=Aa)=?
P(offspring=Aa | Mother=AA, Father=Aa)=?
P(offspring=aa | Mother=AA, Father=Aa)=?
11
Mendelian transmission: Ex
E.g., parents are AA, Aa:
P(offspring=AA | Mother=AA, Father=Aa)=1/2
P(offspring=Aa | Mother=AA, Father=Aa)=1/2
P(offspring=aa | Mother=AA, Father=Aa)=0
12
Mendelian transmission: Ex
E.g., parents are AA, AA:
P(offspring=AA | Mother=AA, Father=AA)=?
P(offspring=Aa | Mother=AA, Father=AA)=?
P(offspring=aa | Mother=AA, Father=AA)=?
13
Mendelian transmission: Ex
E.g., parents are AA, AA:
P(offspring=AA | Mother=AA, Father=AA)=1
P(offspring=Aa | Mother=AA, Father=AA)=0
P(offspring=aa | Mother=AA, Father=AA)=0
Homozygote parents are “non-informative” (no variation in offspring’s conditional genotype distribution)
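The conditional probabilities on the preceding slides can be reproduced by enumerating the four equally likely allele pairs. The sketch below is illustrative only: the function name `offspring_distribution` and the two-character genotype strings are assumptions made for this example, not part of the lecture.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(mother, father):
    """Return P(offspring genotype | parents) under Mendelian segregation.

    Genotypes are two-character strings over the alleles 'A' and 'a'
    (e.g. 'AA', 'Aa', 'aa'). Each parent transmits one of its two
    alleles with probability 1/2, independently of the other parent.
    """
    dist = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for m_allele, f_allele in product(mother, father):
        # Sort so that 'aA' and 'Aa' are counted as the same genotype.
        child = "".join(sorted(m_allele + f_allele))
        dist[child] += Fraction(1, 4)
    return dist


for parents in [("Aa", "Aa"), ("AA", "Aa"), ("AA", "AA")]:
    print(parents, offspring_distribution(*parents))
```

A family is informative exactly when this distribution puts mass on more than one genotype, which requires at least one heterozygous parent.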
14
Outline
Overview
Trios: Transmission Disequilibrium Test (TDT)
Discordant sibships: Conditional logistic regression
General Pedigree: FBAT test
Comparisons and extensions
15
Trios: Transmission Disequilibrium Test (TDT)
Test based on transmissions from parents to offspring
Assumptions
  Parents’ and offspring genotypes known
  Dichotomous phenotype (though see Q-TDT), only affected offspring
Count transmissions from heterozygote parents, and compare to expected transmissions
  Expectation comes from Mendel’s laws of segregation (previous slides), not from a control group
  Test for over/under-transmission of alleles in cases (intuition…)
Conditional test, conditioning on:
  Offspring affection status
  Parental genotypes (conditions out allele frequencies, which is what case-control testing is based on)
Spielman et al., AJHG 1993
16
Trios: Transmission Disequilibrium Test (TDT)
Counts of parents by transmitted and non-transmitted allele:
                      Non-transmitted A    Non-transmitted a
Transmitted A         w                    x
Transmitted a         y                    z
w: AA parents (transmit one A, do not transmit the other A)
x: Aa parents that transmit A, do not transmit a
y: Aa parents that transmit a, do not transmit A
z: aa parents (transmit one a, do not transmit the other a)
17
Possible Parental Configurations
AA-AA, AA-Aa, AA-aa, Aa-AA, Aa-Aa, Aa-aa, aa-AA, aa-Aa, aa-aa
Configurations that differ only in the order of the parents are symmetric for what we will do next (e.g., AA-Aa == Aa-AA)
Six possible configurations: AA-AA, AA-Aa, AA-aa, Aa-Aa, Aa-aa, aa-aa
18
Both parents homozygous
AA-AA | AA
[Table: transmitted vs. non-transmitted parental allele]
Offspring genotype is deterministic, no variation, not informative!
19
Both parents homozygous
aa-aa | aa
[Table: transmitted vs. non-transmitted parental allele]
Offspring genotype is deterministic, no variation, not informative!
20
Both parents homozygous
AA-aa | Aa
[Table: transmitted vs. non-transmitted parental allele]
Offspring genotype is deterministic, no variation, not informative!
21
One parent heterozygous
AA-Aa | AA, Aa (Pr = 1/2 each)
[Tables: transmitted vs. non-transmitted parental allele]
Variation from one parent
22
One parent heterozygous
Aa-aa | Aa, aa (Pr = 1/2 each)
[Tables: transmitted vs. non-transmitted parental allele]
Variation from one parent
23
Both parents heterozygous
Aa-Aa | AA, Aa, aa (Pr = 1/4, 1/2, 1/4)
[Tables: transmitted vs. non-transmitted parental allele]
Variation from both parents
24
Trios: Transmission Disequilibrium Test (TDT)
Counts of parents by transmitted and non-transmitted allele:
                      Non-transmitted A    Non-transmitted a
Transmitted A         w                    x
Transmitted a         y                    z
w: AA parents (transmit one A, do not transmit the other A)
x: Aa parents that transmit A, do not transmit a
y: Aa parents that transmit a, do not transmit A
z: aa parents (transmit one a, do not transmit the other a)
25
Transmission Disequilibrium Test (TDT)
[Table: transmitted vs. non-transmitted parental allele]
No variation in w or z (recall homozygous parents are non-informative)
$(x-y)^2/(x+y) \sim \chi^2_1$; it’s just a special case of McNemar’s test
Think of it as testing: is there an excess of the A allele in the affected offspring beyond what would happen by Mendel's laws?
26
Transmission Disequilibrium Test (TDT)
Insulin Dependent Diabetes Mellitus (IDDM)
[Table: transmitted vs. non-transmitted parental allele]
Example from the text: 94 families, 78 parents transmit allele A, 46 transmit allele a
$(78-46)^2/(78+46) = 8.26$, p-value = 0.004
Spielman et al., 1993
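The IDDM numbers can be checked in a few lines. This is a minimal sketch rather than any package's implementation: the function name `tdt` is an assumption for this example, and the p-value uses the identity that a 1-df chi-square tail probability equals `erfc(sqrt(s/2))`.

```python
import math


def tdt(x, y):
    """TDT (McNemar-type) statistic and its 1-df chi-square p-value.

    x: heterozygous parents transmitting A
    y: heterozygous parents transmitting a
    """
    if x + y == 0:
        raise ValueError("no heterozygous (informative) parents")
    stat = (x - y) ** 2 / (x + y)
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = tdt(78, 46)
print(round(stat, 2), round(p, 3))  # prints: 8.26 0.004
```

Only x and y enter the statistic; the homozygous-parent counts w and z never appear, matching the point that those parents are non-informative.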
27
Limitations of TDT
Only affected offspring
Only dichotomous phenotypes
Bi-allelic markers
Additive genetic model
No missing parents
Incorporating siblings assumes no linkage (more next time)
Can’t do multiple markers, multiple phenotypes
28
Key features of the TDT
Random variable in the analysis is the offspring genotype
Parental genotypes fixed
Trait fixed (condition on affected offspring)
29
Outline
Overview
Trios: Transmission Disequilibrium Test (TDT)
Discordant sibships: Conditional logistic regression
General Pedigree: FBAT test
Extra
30
Discordant sibships: Conditional logistic regression
P(Y1=1 | Y1+Y2=1, g1, g2, …)
Matching the sibs within a family conditions on the fact that they have discordant phenotypes
Standard model for disease as in logistic regression, just matching based on family strata
Can also use the FBAT framework
  Similar power for main effects
  Greater power for GxE (Witte, AJE 1999; Chatterjee et al., Gen Epi 2005; Hoffmann et al., Biometrics 2011)
You will go through an example in the homework
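For the simplest case of one affected and one unaffected sib per family and a single genotype covariate, the conditional likelihood contribution of a pair is exp(βg_aff) / (exp(βg_aff) + exp(βg_unaff)), which depends only on the within-pair difference d = g_aff − g_unaff. The sketch below fits β by Newton-Raphson under those assumptions. The function name, the additive allele-count coding, and the fixed iteration count are illustrative choices; a real analysis would use an established conditional logistic regression routine.

```python
import math


def fit_discordant_pairs(diffs, n_iter=25):
    """Conditional logistic regression for 1:1 discordant sib pairs.

    diffs: list of d = g_affected - g_unaffected (A-allele counts).
    Returns (beta_hat, standard_error); exp(beta_hat) is the
    per-allele odds ratio. Pairs with d == 0 carry no information.
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        # p = conditional probability that the affected sib is the case.
        p = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        score = sum(d * (1.0 - pi) for d, pi in zip(diffs, p))
        info = sum(d * d * pi * (1.0 - pi) for d, pi in zip(diffs, p))
        if info == 0:
            raise ValueError("no informative pairs")
        beta += score / info  # Newton-Raphson update
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: 30 pairs where the affected sib carries one more A,
# 15 where the unaffected sib does, and 10 concordant-genotype pairs.
beta, se = fit_discordant_pairs([1] * 30 + [-1] * 15 + [0] * 10)
print(math.exp(beta), se)
```

When every difference is ±1 the estimate has the closed form log(n₊/n₋), the familiar matched-pair odds ratio. If all informative pairs point the same way the estimate diverges, which this sketch does not guard against.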
31
Outline
Overview
Trios: Transmission Disequilibrium Test (TDT)
Discordant sibships: Conditional logistic regression
General Pedigree: FBAT test
Comparisons and extensions
32
FBAT: More general methodology
Maintains the general principles of the TDT
Other genetic models (dominant, recessive, …)
Additional siblings, extended pedigrees, missing parents
Multiple markers (haplotypes)
Test statistic intuition: covariance between offspring trait and genotype
33
FBAT: Extending TDT to more general families
For the moment, assume parents are genotyped
Let i index families, j offspring
Score test of f({offspring genotype}_ij | trait_ij, parents_i); use Mendel’s laws, Bayes’ rule
$U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$
Assume the trait is continuous or binary
Assume the offset is the mean (continuous) or the population prevalence (dichotomous)
Condition on parents (avoids specification of the allele distribution)
Condition on offspring phenotypes (avoids specification of the trait distribution)
34
FBAT: Extending the TDT to more general families (cont.)
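$U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$
Intuition: like a sample covariance between trait and genotype
$Z_{FBAT} = U / \sqrt{\operatorname{var}(U)} \sim N(0,1)$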
35
FBAT: Extending the TDT to more general families (cont.)
$U = \sum_{i,j} (\text{trait}_{ij} - \text{offset}) \times (\{\text{offspring genotype}\}_{ij} - E[\{\text{offspring genotype}\}_{ij} \mid \text{parents}_i])$
Let $o_{ij}$ = {offspring genotype}_ij
Let $P_i$ = parents_i
$E[o_{ij} \mid P_i] = X(AA)\,P(o_{ij}=AA \mid P_i) + X(Aa)\,P(o_{ij}=Aa \mid P_i) + X(aa)\,P(o_{ij}=aa \mid P_i)$
Essentially using Mendel’s laws, as we calculated earlier
36
FBAT computations
X = additive coding of A alleles (AA=2, Aa=1, aa=0)

Parents AA, aa (AA-aa | Aa):
E(X|P) = 2*P(AA|P) + 1*P(Aa|P) + 0*P(aa|P) = 2*0 + 1*1 + 0*0 = 1
Child:  X    Pr(X)   X-E(X|P)
        1    1       0
Uninformative families still contribute nothing!

Parents Aa, Aa (Aa-Aa | AA,Aa,aa):
E(X|P) = 2*(1/4) + 1*(1/2) + 0*(1/4) = 1
Child:  X    Pr(X)   X-E(X|P)
        0    1/4     -1
        1    1/2     0
        2    1/4     +1
(Over/under-transmissions)
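The same bookkeeping can be written as a small function. This is a sketch under simplifying assumptions that are narrower than the FBAT software itself: one offspring per family, both parents genotyped, additive coding of the A allele (0, 1 or 2 copies), and var(U) taken from Mendel's laws. The names `mendel_moments` and `fbat_z` are made up for this example.

```python
import math


def mendel_moments(p1, p2):
    """Mean and variance of the offspring A-allele count X given parents.

    p1, p2: parental genotypes coded as number of A alleles (0, 1 or 2).
    A heterozygous parent transmits A with probability 1/2 (variance 1/4);
    a homozygous parent transmits p/2 copies of A with no variance.
    """
    mean = p1 / 2 + p2 / 2
    var = sum(0.25 for p in (p1, p2) if p == 1)
    return mean, var


def fbat_z(trios, offset=0.0):
    """Z statistic for trios given as (p1, p2, child, trait) tuples."""
    u = 0.0
    var_u = 0.0
    for p1, p2, child, trait in trios:
        mean, var = mendel_moments(p1, p2)
        t = trait - offset
        u += t * (child - mean)   # (trait - offset) * (X - E[X|P])
        var_u += t * t * var      # families with var == 0 add nothing
    if var_u == 0:
        raise ValueError("no informative families")
    return u / math.sqrt(var_u)


# Hypothetical affected trios: (parent 1, parent 2, child, trait)
trios = [(1, 1, 2, 1), (2, 1, 2, 1), (1, 0, 0, 1), (2, 0, 1, 1)]
print(fbat_z(trios))
```

In this toy data the heterozygous parents transmit A three times and a once, so the TDT statistic is (3−1)²/(3+1) = 1, and the squared Z from the sketch takes the same value. The AA-aa family contributes zero to both U and var(U).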
37
Seem familiar? FBAT=TDT
If Y = affection status (1=affected, 0=unaffected) and offset=0, then FBAT == TDT
Similarly, conditional logistic regression is roughly equivalent to the TDT in terms of power for main effects
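A short derivation may make the equivalence concrete. It is a sketch under assumptions not spelled out on the slide: trios with both parents genotyped, additive coding of the A allele, all offspring affected (so trait minus offset equals 1), and var(U) computed from Mendel's laws. Here x and y are the TDT counts of heterozygous parents transmitting A and a. Each heterozygous parent contributes +1/2 or −1/2 to the offspring's deviation from its expected genotype, with variance 1/4; homozygous parents contribute nothing.

```latex
\begin{aligned}
U &= \sum_{i} \bigl(X_i - E[X_i \mid P_i]\bigr)
   = \tfrac{1}{2}\,x - \tfrac{1}{2}\,y = \frac{x - y}{2}, \\
\operatorname{var}(U) &= \frac{x + y}{4}, \\
Z_{FBAT}^2 &= \frac{U^2}{\operatorname{var}(U)} = \frac{(x - y)^2}{x + y}.
\end{aligned}
```

The last line is the TDT statistic, referred to a chi-square distribution with 1 degree of freedom.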
38
FBAT offset for dichotomous traits
If all offspring are affected, then the offset does not matter
For rare diseases, affecteds are most informative
For more common diseases, can get some information from unaffecteds
Using the population prevalence as the offset allows one to gain a little information from unaffecteds
39
Offset choice
[Figure] Disease prevalence K=0.05, allele frequency of the disease gene p=0.05, attributable fraction of the disease due to carrying at least one disease gene AF=0.3, significance level $\alpha = 10^{-4}$, and sample size 100
Lange and Laird (2002)
[Figure] Disease prevalence K=0.3, allele frequency of the disease gene p=0.143, attributable fraction of the disease due to carrying at least one disease gene AF=0.25, significance level $\alpha = 0.01$, and sample size 100
40
Offset choice
41
FBAT offset for continuous traits
The trait mean (optimal choice is E(Y), which depends on ascertainment)
Residual from the trait adjusted for covariates
  e.g., regress BMI on gender, use the residual
Suppose Y is your phenotype of interest, Z a covariate
Linear regression: $Y = \beta_0 + \beta_1 Z$
Compute the residual $R = Y - (\beta_0 + \beta_1 Z)$
Use R as the trait in FBAT
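The residual step can be sketched with ordinary least squares on a single covariate. The function name `residual_trait` and the toy BMI values are assumptions for illustration; with several covariates one would use a regression routine from a statistics package instead.

```python
def residual_trait(y, z):
    """Residuals R = Y - (b0 + b1 * Z) from a least-squares fit of Y on Z."""
    n = len(y)
    if n != len(z) or n < 2:
        raise ValueError("y and z must have the same length (at least 2)")
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    if szz == 0:
        raise ValueError("covariate has no variation")
    # Slope and intercept of the fitted line.
    b1 = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
    b0 = y_bar - b1 * z_bar
    return [yi - (b0 + b1 * zi) for yi, zi in zip(y, z)]


# Hypothetical data: BMI (Y) and gender coded 0/1 (Z).
bmi = [22.0, 24.0, 30.0, 28.0]
gender = [0, 0, 1, 1]
print(residual_trait(bmi, gender))
```

With a binary covariate the residual is just each person's deviation from their group mean. The residuals sum to zero, so using them as the trait amounts to an offset equal to the (adjusted) trait mean.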
42
Continuous vs. Dichotomous trait
Modeling as a continuous trait is more powerful
With highly selected traits, dichotomizing may be preferable
  Using the mean for the offset is a poor choice here
  Results are very sensitive to the offset choice
  Dichotomizing will lose power compared to the best offset choice
43
Offset general comments
A very poor offset choice gives poor power
More complicated, slightly more efficient offsets are also available
44
Childhood Asthma Management Program (CAMP) example
696 trios
Bi-allelic locus in the IL13 gene
Five groups of 22 quantitative phenotypes
45
DeMeo Gen Epi, 2006
46
Can also do a multi-marker (gene-based) test...
DeMeo Gen Epi, 2006
47
Obesity GWAS example
BMI follow-up for 24 years
86,604 SNPs
694 participants
One of the first GWAS successes
48
GWAS example uses clever screening approach, longitudinal phenotype data...
49
Obesity example: Longitudinal phenotype
50
Obesity example: Screening based on “conditional mean model”
Prioritizes SNPs based on modeling X imputed from parental genotypes (PBAT software)
f(X,P) = f(X|P) f(P)
Screening is not robust to population substructure, but the later testing is (so it doesn’t matter)
51
Obesity example: Results
52
Screening based on “conditional power”...
Started with analyzing only the “top k” SNPs (Lange et al.)
Criticized for not looking at all SNPs, and in practice...
Prior distribution for type I error (Ionita-Laza et al., AJHG 2007)
Bayesian (Naylor et al., Gen Epi 2010)
53
Critiques
Only modeling the offspring conditional on parents, not using the parents?
Other models do use them, but are not robust to population stratification (though one could adjust for covariates…)
Parents are used in the conditional mean model screening approach
54
Outline
Overview
Trios: Transmission Disequilibrium Test (TDT)
Discordant sibships: Conditional logistic regression
General Pedigree: FBAT test
Comparisons and extensions
55
Power of FBAT, CACO, Rare disease
In your book, but not necessarily a fair comparison...
200 trios (600 genotypes)
200 DSP (discordant sib pairs; 400 genotypes)
200 sibtrios (3 offspring, no parents, 600 genotypes)
200 cases/200 controls (400 genotypes)
OR=1.5
56
Power of common disease
200 trios (600 genotypes)
200 DSP (discordant sib pairs; 400 genotypes)
200 sibtrios (3 offspring, no parents, 600 genotypes)
200 cases/200 controls (400 genotypes)
OR=1.5
57
Final thoughts
FBAT also extended to:
  X chromosome
  Survival analysis
  Multi-marker
  Multi-phenotype
  Haplotypes
  Missing parents
  Gene-environment/gene-gene interactions
  Meta-analysis
58
Final thoughts
Other likelihood approaches: Schaid 1989, Cordell 2000, Dudbridge 2010 (UNPHASED software)
Other approaches by Allison 1997 and Abecasis for quantitative traits
Also simultaneous modeling of family and case-control data (Sage/Mendel software)
If the sample is large enough, maybe cryptics?
59
Software FBAT http://www.biostat.harvard.edu/fbat/fbat.htm
PBAT P2BAT Dudbridge's UNPHASED ased/ Clayton's software