Association analysis Shaun Purcell Boulder Twin Workshop 2004
Overview
–Candidate gene association
–Haplotypes and linkage disequilibrium
–Linkage and association
–Family-based association
What is association?
Categorical traits
–disease susceptibility genes
Continuous traits
–quantitative trait loci, QTL
Disease traits
        Case    Control
AA      n1      n2
Aa      n3      n4
aa      n5      n6
Is there a difference in allele/genotype frequency between cases and controls?
Disease traits
        Case    Control    HWE freq.
AA      30      25         p^2
Aa      50      50         2p(1-p)
aa                         (1-p)^2
Is there a difference in allele/genotype frequency between cases and controls?
Test for independence, p-value
Disease traits
General model (2 df)
        Case        Control
AA      n1          n2
Aa      n3          n4
aa      n5          n6
Additive model (1 df)
        Case        Control
A       2n1 + n3    2n2 + n4
a       2n5 + n3    2n6 + n4
Dominant model for A (1 df)
        Case        Control
A*      n1 + n3     n2 + n4
aa      n5          n6
Effect sizes calculated as odds ratios
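The three tables above can each be tested with a Pearson chi-square test of independence. The sketch below is a minimal illustration, not the workshop's software: the function names and the genotype counts are hypothetical, and the p-value helper only covers the 1 df and 2 df cases that arise here.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic for a table of counts (list of rows)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (obs - expected) ** 2 / expected
    return chi2

def chi2_pvalue(chi2, df):
    """Upper-tail p-value, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def allele_table(geno):
    """Collapse genotype counts to allele counts (additive model)."""
    (n1, n2), (n3, n4), (n5, n6) = geno
    return [[2 * n1 + n3, 2 * n2 + n4], [2 * n5 + n3, 2 * n6 + n4]]

def dominant_table(geno):
    """Pool AA and Aa against aa (dominant model for A)."""
    (n1, n2), (n3, n4), (n5, n6) = geno
    return [[n1 + n3, n2 + n4], [n5, n6]]

# Hypothetical counts: rows AA, Aa, aa; columns case, control.
geno = [[30, 20], [50, 50], [20, 30]]
print("general :", pearson_chi2(geno), chi2_pvalue(pearson_chi2(geno), 2))
alleles = allele_table(geno)
print("additive:", pearson_chi2(alleles), chi2_pvalue(pearson_chi2(alleles), 1))
dom = dominant_table(geno)
print("dominant:", pearson_chi2(dom), chi2_pvalue(pearson_chi2(dom), 1))
```

With these made-up counts the general and additive statistics happen to coincide, but the additive test spends only 1 df and so gives the smaller p-value.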
Relative risk
        D+    D-
E+      a     b
E-      c     d
Risk in E+ = a / (a + b)
Risk in E- = c / (c + d)
Relative risk of exposure = (a / (a + b)) / (c / (c + d))
Odds ratio
        D+    D-
E+      a     b
E-      c     d
Odds of exposure in D+ = a / c
Odds of exposure in D- = b / d
Odds ratio = (a / c) / (b / d) = ad / bc
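Both effect-size measures follow directly from the four cell counts of the 2x2 table. A minimal sketch, using the slide's a, b, c, d labels and hypothetical counts:

```python
def relative_risk(a, b, c, d):
    """Risk in exposed (E+) divided by risk in unexposed (E-)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of exposure in D+ divided by odds of exposure in D-."""
    return (a / c) / (b / d)

# Hypothetical counts: a = E+ D+, b = E+ D-, c = E- D+, d = E- D-.
a, b, c, d = 20, 80, 10, 90
print("RR =", relative_risk(a, b, c, d))
print("OR =", odds_ratio(a, b, c, d))
```

The odds ratio is further from 1 than the relative risk here; the two converge when the disease is rare in both exposure groups.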
Quantitative traits
(Figure: trait values by genotype AA, Aa, aa)
Data table columns: ID, Y, G, A, D
Y = aA + dD + e
Some web resources
–BGIM: introductory tutorials on twin analysis, primer on maximum likelihood, Mx language
–GxE moderator models
–Power calculation
–Case/control association tools
Relative risk
Genotype    P(D|G)     RR
AA          P(D|AA)    P(D|AA) / P(D|aa)
Aa          P(D|Aa)    P(D|Aa) / P(D|aa)
aa          P(D|aa)    1
P(D|AA) / P(D|aa) labelled RR(AA)
P(D|Aa) / P(D|aa) labelled RR(Aa)
Genetic models
Model             RR(Aa)    RR(AA)
General           x         y
Multiplicative    x         x^2
Dominant          x         x
Recessive         1.000     x
No effect         1.000     1.000
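The model table amounts to a set of constraints on the two genotype relative risks. The sketch below encodes those constraints and counts free parameters, which gives the degrees of freedom for comparing nested models. Function and dictionary names are hypothetical, and the df count assumes a single sample with allele frequency handled identically under both models.

```python
def genotype_rr(model, x=1.0, y=1.0):
    """Return (RR(Aa), RR(AA)) relative to aa under a named genetic model."""
    if model == "general":
        return (x, y)
    if model == "multiplicative":
        return (x, x * x)
    if model == "dominant":
        return (x, x)
    if model == "recessive":
        return (1.0, x)
    if model == "none":
        return (1.0, 1.0)
    raise ValueError("unknown model: " + model)

# Number of free relative-risk parameters per model.
FREE_PARAMS = {"general": 2, "multiplicative": 1, "dominant": 1,
               "recessive": 1, "none": 0}

def lrt_df(alternate, null):
    """Degrees of freedom for a likelihood-ratio test of null within alternate."""
    return FREE_PARAMS[alternate] - FREE_PARAMS[null]

print(genotype_rr("multiplicative", x=1.5))   # heterozygote risk squared for AA
print(lrt_df("general", "none"), lrt_df("general", "multiplicative"))
```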
Tests
Test                                          Alternate         Null
Any effect?                                   General           No effect
Any effect assuming a multiplicative gene?    Multiplicative    No effect
Any effect assuming a dominant gene?          Dominance         No effect
Any effect assuming a recessive gene?         Recessive         No effect
Can we assume a multiplicative effect?        General           Multiplicative
Can we assume a dominant effect?              General           Dominance
Can we assume a recessive effect?             General           Recessive
Multiple samples
Constrain frequencies across samples
Constrain effects across samples
–Can test genetic models with effects and/or frequencies constrained to be equal
–Can perform tests of homogeneity of effects and/or frequencies across samples
An example
2 case/control samples
Population frequency 5%
Sample 1
        Case    Control
AA      17      11
Aa      35      59
aa      24      40
Sample 2
        Case    Control
AA      37      10
Aa      67      43
aa      20      37
Homogeneous effects across samples
Homogeneous allele frequencies across samples
Model    p    RR(Aa)    RR(AA)    -2LL
Gen
Mult
Dom
Rec
None
Heterogeneous effects across samples
Homogeneous allele frequencies across samples
Model    p    RR(Aa)    RR(AA)    -2LL
Gen
Mult
Dom
Rec
None
TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=========================================================
Gen vs None  (2 df) :          p =
Mult vs None (1 df) :          p =
Dom vs None  (1 df) :          p =
Rec vs None  (1 df) :          p =
Gen vs Mult  (1 df) : 0.056    p =
Gen vs Dom   (1 df) : 9.784    p =
Gen vs Rec   (1 df) :          p =

TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
===========================================================
Gen vs None  (4 df) :          p =
Mult vs None (2 df) :          p =
Dom vs None  (2 df) :          p =
Rec vs None  (2 df) :          p =
Gen vs Mult  (2 df) : 1.764    p =
Gen vs Dom   (2 df) : 9.925    p =
Gen vs Rec   (2 df) :          p =

TESTS OF EQUAL EFFECTS -- ASSUMING EQ FREQS
===========================================
w/ Gen model  (2 df) : 6.645   p =
w/ Mult model (1 df) : 4.938   p =
w/ Dom model  (1 df) : 6.505   p =
w/ Rec model  (1 df) : 1.215   p = 0.270
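Each line of this output pairs a likelihood-ratio chi-square with its degrees of freedom, so the p-value is the upper tail of the chi-square distribution. The sketch below is a hypothetical helper, not part of the tool that produced the output; it uses closed forms that cover df = 1 and any even df, which is enough for the 1, 2 and 4 df tests listed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (df = 1 or even df)."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("this sketch only handles df = 1 or even df")

# The one row of the output with both statistic and p-value shown:
# "w/ Rec model (1 df) : 1.215  p = 0.270"
print(round(chi2_sf(1.215, 1), 3))
```

The same function can be applied to the other statistics in the listing with the df given on each line.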
Indirect association
(Figure: a QTL flanked by genotyped markers and ungenotyped markers)
Recombination
Homologous chromosomes in one parent: paternal chromosome, maternal chromosome
Recombination event during meiosis
Recombinant gamete transmitted, harboring mutation
Recombination
Homologous chromosomes in one parent: paternal chromosome, maternal chromosome
No recombination event during meiosis
Nonrecombinant gamete transmitted, not harboring mutation
Linkage: affected sib pairs
Paternal chromosome, maternal chromosome
First affected offspring: no recombination
Second affected offspring: recombinant gamete
IBD sharing from this one parent (0 or 1): 1, 0
Association analysis Mutation occurs on a ‘red’ chromosome
Association analysis
Association due to ‘linkage disequilibrium’
Haplotypes
        A     a
M       AM    aM
m       Am    am
This individual has aa and Mm genotypes and am and aM haplotypes
Haplotypes
        A     a
M       AM    aM
m       Am    am
This individual has Aa and Mm genotypes and AM and am haplotypes
… but given only genotype data, consistent with Am/aM as well as AM/am
Haplotypes
        A     a
M       AM    aM
m       Am    am
This individual has AA and Mm genotypes and AM and Am haplotypes
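The three cases above differ in how many haplotype pairs are consistent with the observed genotypes: only the double heterozygote is ambiguous. A minimal sketch that enumerates the possibilities for two biallelic loci (the function name and the two-character genotype strings are illustrative conventions, not from the slides):

```python
def consistent_haplotype_pairs(geno_a, geno_m):
    """Enumerate unordered haplotype pairs consistent with two genotypes.

    geno_a and geno_m are two-character strings such as "Aa" and "Mm".
    """
    a1, a2 = geno_a
    m1, m2 = geno_m
    pairs = set()
    # Pair the second-locus alleles with the first-locus alleles both ways.
    for x, y in ((m1, m2), (m2, m1)):
        pairs.add(tuple(sorted((a1 + x, a2 + y))))
    return pairs

for ga, gm in (("aa", "Mm"), ("Aa", "Mm"), ("AA", "Mm")):
    print(ga, gm, sorted(consistent_haplotype_pairs(ga, gm)))
```

Only the Aa/Mm individual yields two candidate pairs, which is why haplotypes generally have to be estimated rather than read off unphased genotype data.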
Equilibrium haplotype frequencies
        A     a
M       pr    ps    p
m       qr    qs    q
        r     s
Linkage disequilibrium
        A         a
M       pr + D    ps - D    p
m       qr - D    qs + D    q
        r         s
D_MAX = min(ps, qr) if D > 0; min(pr, qs) if D < 0
D' = D / D_MAX
r^2 = D^2 / (pqrs)
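The LD measures can be computed from the four haplotype frequencies. The sketch below follows the table's layout (p, q are the M and m frequencies; r, s are the A and a frequencies; D is the excess of the AM haplotype over pr) and uses the standard definition r^2 = D^2 / (pqrs). The function name and example frequencies are hypothetical, and both loci are assumed to be polymorphic so that no denominator is zero.

```python
def ld_measures(f_AM, f_aM, f_Am, f_am):
    """Return (D, D', r^2) from the four haplotype frequencies."""
    p = f_AM + f_aM          # frequency of M
    q = f_Am + f_am          # frequency of m
    r = f_AM + f_Am          # frequency of A
    s = f_aM + f_am          # frequency of a
    D = f_AM - p * r
    # The bound keeps every haplotype frequency non-negative.
    d_max = min(p * s, q * r) if D >= 0 else min(p * r, q * s)
    d_prime = D / d_max      # carries the sign of D
    r2 = D * D / (p * q * r * s)
    return D, d_prime, r2

print(ld_measures(0.5, 0.0, 0.0, 0.5))   # complete LD
print(ld_measures(0.4, 0.1, 0.1, 0.4))   # partial LD
print(ld_measures(0.25, 0.25, 0.25, 0.25))  # equilibrium
```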
Haplotype analysis
1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait
Haplotype    Freq.    Odds Ratio
AAGG         40%      1.00*
AAGT         30%      2.21
CGCG         25%      1.07
AGCT         5%       0.92
* baseline, fixed to 1.00
Linkage and association (figure)
Linkage panels: sib correlation against IBD at the QTL; sib correlation against IBD at the marker, related to the QTL through RF
Association panels: trait against QTL genotype (aa, Aa, AA); trait against marker genotype, related to the QTL through LD
Variance Components
Means (ASSOCIATION):                    M1     M2
Variance-covariance matrix (LINKAGE):   V1     C21
                                        C12    V2
Variance Components
Means:                        M1 + bG1          M2 + bG2
Variance-covariance matrix:   V1                C21 + q(π̂ - ½)
                              C12 + q(π̂ - ½)    V2
LINKAGE: q = regression coef., π̂ = IBD sharing (0, ½, 1)
ASSOCIATION: b = regression coef., G = individual’s genotype
Components of a Genetic Theory
POPULATION MODEL
–Allele & genotype frequencies
–Demographics & population history
–Linkage disequilibrium, haplotype structure
TRANSMISSION MODEL
–Mendelian segregation
–Identity by descent & genetic relatedness
PHENOTYPE MODEL
–Biometrical model of quantitative traits
–Additive & dominance components
(Figure: genotypes G passed down the generations over time, with phenotypes P arising from genotypes)
Linkage without association
Genotypes in figure: 3/5, 2/6, 3/2, 5/2; 3/5, 2/6, 3/6, 5/6
Both families are ‘linked’ with the marker… …but a different allele is involved.
Linkage and association
Genotypes in figure: 3/6, 2/4, 3/2, 6/2; 3/5, 2/6, 3/6, 5/6; 4/6, 2/6, 6/6, 6/6, 6/6, 6/6
All families are ‘linked’ with the marker… … and allele 6 is ‘associated’ with disease
Linkage is just association within families
Association without linkage
Genotypes in figure (controls and cases): 3/6, 2/4, 3/2, 6/2, 3/5, 2/5, 3/6, 5/6, 4/6, 2/6, 6/6, 6/6, 2/2, 3/4, 5/2
Allele 6 is more common in the GREEN population
The disease is more common in the GREEN population
… a ‘spurious association’
TDT
Transmission disequilibrium test
–test for linkage and association
(Figure: parent-offspring trios with genotypes AA, Aa, aa)
TDT
“A” disease allele
Mating     Offspring
AA x Aa    AA
AA x Aa    Aa
aa x Aa    Aa
aa x Aa    aa
Model columns: Additive, Dominant, Recessive
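In its standard allelic form the TDT compares how often heterozygous (Aa) parents transmit A versus a to affected offspring, using a McNemar-type statistic with 1 df. The sketch below shows that standard statistic; it is not taken from the slides, and the function names and the transmission counts are hypothetical.

```python
import math

def count_transmissions(alleles_from_het_parents):
    """Count A and a transmissions from a list like ["A", "a", "A", ...]."""
    t_A = sum(1 for x in alleles_from_het_parents if x == "A")
    t_a = sum(1 for x in alleles_from_het_parents if x == "a")
    return t_A, t_a

def tdt(t_A, t_a):
    """TDT chi-square (1 df) and p-value from transmission counts."""
    n = t_A + t_a
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (t_A - t_a) ** 2 / n
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical data: 60 transmissions of A and 40 of a from Aa parents.
chi2, p_value = tdt(60, 40)
print(chi2, round(p_value, 4))
```

Homozygous parents are uninformative and contribute nothing, which is why only the Aa parent in each mating above matters for the test.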
Between and within components Sib1 Sib2 Sib1 = B - W Sib2 = B + W
Between and within components
Fulker et al (1999)
Genotypes      Scores
S1    S2       S1    S2    B      W      S1     S2
AA    AA       1     1     1      0      B+W    B-W
AA    Aa       1     0     0.5    0.5    B+W    B-W
AA    aa       1     -1    0      1      B+W    B-W
Note: W = S1 – B
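The decomposition in the table can be written in a few lines. The sketch assumes the additive scores AA = 1, Aa = 0, aa = -1, with B the sib-pair mean and W each sib's deviation from it; the function name is hypothetical.

```python
SCORE = {"AA": 1.0, "Aa": 0.0, "aa": -1.0}   # assumed additive coding

def between_within(geno1, geno2):
    """Return (B, W1, W2) for a sib pair; W2 is always -W1."""
    s1, s2 = SCORE[geno1], SCORE[geno2]
    b = (s1 + s2) / 2.0      # between-family component
    return b, s1 - b, s2 - b

for pair in (("AA", "AA"), ("AA", "Aa"), ("AA", "aa")):
    print(pair, between_within(*pair))
```

Because W sums to zero within every pair, it is unaffected by differences in allele frequency between families.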
Parental genotypes
Use parental genotypes to generate B
Examples
–AA from AAxAA: W = 0
–Aa from AAxAa: W = -0.5
–Aa from AaxAa: W = 0
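When parents are genotyped, B can be taken as the expected offspring score given the mating, which under the additive coding AA = 1, Aa = 0, aa = -1 equals the mean of the two parental scores. The sketch below reproduces the three examples above under that assumption; the function name is hypothetical.

```python
SCORE = {"AA": 1.0, "Aa": 0.0, "aa": -1.0}   # assumed additive coding

def within_from_parents(offspring, father, mother):
    """Return (B, W): B is the midparent score, W the offspring deviation."""
    b = (SCORE[father] + SCORE[mother]) / 2.0
    return b, SCORE[offspring] - b

print(within_from_parents("AA", "AA", "AA"))   # W = 0
print(within_from_parents("Aa", "AA", "Aa"))   # W = -0.5
print(within_from_parents("Aa", "Aa", "Aa"))   # W = 0
```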
assoc.mx
–Sibling pair sample
–B and W components precalculated in input file
–Single SNP genotype
–Quantitative trait
assoc.dat s1 s2 g1 g2 b w1 w2
! Mx script for QTL association: sib pairs, univariate
Group 1 :
 Calc NG=2
 Begin Matrices;
 ! ** Parameters
  B Full 1 1 free   ! association : between component
  W Full 1 1 free   ! association : within component
  M Full 1 1 free   ! mean
  S Full 1 1 free   ! Shared residual variance
  N Full 1 1 free   ! Nonshared residual variance
 ! ** Definition variables **
  C Full 1 1        ! association : between
  X Full 1 1        ! association : within, sib 1
  Y Full 1 1        ! association : within, sib 2
 End Matrices;
 ! ** Uncomment for B=W model
 ! Equate W B
 ! Starting values
 Matrix B 0
 Matrix W 0
 Matrix M 0
 Matrix S 0.5
 Matrix N 0.5
End

Group2 : Data Group
 Data NI=7 NO=0
 RE file=assoc.dat
 Labels Sib1 Sib2 g1 g2 b w1 w2
 Select Sib1 Sib2 b w1 w2 /
 Definition b w1 w2 /
 Matrices = Group 1
 Means M + B*C + W*X | M + B*C + W*Y /
 Covariance S + N | S _
            S | S + N /
 Specify C b /
 Specify X w1 /
 Specify Y w2 /
End
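For readers without Mx, the model the script specifies can be sketched in Python: each sib pair is bivariate normal with means M + B*b + W*w and covariance matrix [[S+N, S], [S, S+N]]. The function below returns one pair's contribution to -2 log-likelihood. It is an illustrative re-expression, not Mx's implementation, and the function name and example values are hypothetical.

```python
import math

def minus2ll_pair(y1, y2, b, w1, w2, B, W, M, S, N):
    """-2 log-likelihood of one sib pair under the between/within model.

    y1, y2 : trait values; b, w1, w2 : definition variables from the data file.
    B, W, M, S, N : parameters named as in the Mx script.
    """
    mu1 = M + B * b + W * w1
    mu2 = M + B * b + W * w2
    var = S + N               # diagonal of the covariance matrix
    cov = S                   # off-diagonal (shared residual variance)
    det = var * var - cov * cov
    r1, r2 = y1 - mu1, y2 - mu2
    # Quadratic form r' Sigma^-1 r for a 2x2 matrix with equal variances.
    quad = (var * r1 * r1 - 2.0 * cov * r1 * r2 + var * r2 * r2) / det
    return 2.0 * math.log(2.0 * math.pi) + math.log(det) + quad

# One hypothetical pair evaluated at the script's starting values.
print(minus2ll_pair(0.3, -0.1, b=0.5, w1=0.5, w2=-0.5,
                    B=0.0, W=0.0, M=0.0, S=0.5, N=0.5))
```

Summing this quantity over all pairs and minimising it with respect to B, W, M, S and N gives the -2LL values that the nested model comparisons are based on.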
Models
B & W      B Full 1 1 free    W Full 1 1 free    !Equate W B
B = W      B Full 1 1 free    W Full 1 1 free    Equate W B
B          B Full 1 1 free    W Full 1 1         !Equate W B
B=W=0      B Full 1 1         W Full 1 1         !Equate W B
Tests
Test                         H_A      H_0
Standard association test    B = W    B=W=0
Test of stratification       B & W    B = W
Robust association test      B & W    B
assoc.mx
Model    B    W    -2LL    df
B & W
B = W
B
B=W=0
Test of total association
H_A: B=W    H_0: B=W=0
Δ-2LL = 58.29, df = 1, p < 1e-14
Test of stratification
H_A: B & W    H_0: B = W
Δ-2LL = 1.09, df = 1, p = 0.29
Test of within association
H_A: B & W    H_0: B
Δ-2LL = 23.06, df = 1, p < 1e-6
Implementation
QTDT
–Abecasis et al (2001) AJHG
–extends between/within model to general pedigrees
–multiple alleles
–covariates
–combined test of linkage and association
–discrete as well as quantitative traits
Linkage
–families
–detectable over large distances, >10 cM
–large effects: OR >3, variance >10%
Association
–unrelateds or families
–detectable over small distances, <1 cM
–small effects: OR <2, variance <1%