Association analysis
Shaun Purcell, Boulder Twin Workshop 2004


Overview
Candidate gene association
Haplotypes and linkage disequilibrium
Linkage and association
Family-based association

What is association?
Categorical traits
 – disease susceptibility genes
Continuous traits
 – quantitative trait loci (QTL)

Disease traits
Genotype   Case   Control
AA         n1     n2
Aa         n3     n4
aa         n5     n6
Is there a difference in allele/genotype frequency between cases and controls?

Disease traits
Genotype   Case   Control   HWE frequency
AA         30     25        $p^2$
Aa         50     50        $2p(1-p)$
aa         …      …         $(1-p)^2$
Is there a difference in allele/genotype frequency between cases and controls?
Test for independence between genotype and case/control status, and report a p-value.
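A minimal Python sketch of the test for independence on a genotype table, assuming a Pearson chi-squared test (one common choice; the slide does not name the test). The counts are hypothetical, and the closed-form p-value exp(-x/2) is specific to the 2 df of a 3 × 2 table.

```python
import math


def genotype_chi2(cases, controls):
    """Pearson chi-squared test of independence for a genotype x status table.

    cases, controls: genotype counts ordered (AA, Aa, aa).
    Returns (statistic, p-value) on 2 degrees of freedom.
    """
    n_case, n_ctrl = sum(cases), sum(controls)
    n_total = n_case + n_ctrl
    stat = 0.0
    for obs_case, obs_ctrl in zip(cases, controls):
        genotype_total = obs_case + obs_ctrl
        for observed, group_total in ((obs_case, n_case), (obs_ctrl, n_ctrl)):
            # Expected count under independence: row total * column total / n
            expected = group_total * genotype_total / n_total
            stat += (observed - expected) ** 2 / expected
    # Upper tail of chi-squared with 2 df has the closed form exp(-x / 2)
    return stat, math.exp(-stat / 2)


# Hypothetical counts, ordered (AA, Aa, aa)
stat, p = genotype_chi2(cases=(40, 45, 15), controls=(25, 50, 25))
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```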

Disease traits
General model (2 df)
          Case        Control
AA        n1          n2
Aa        n3          n4
aa        n5          n6
Additive model (1 df)
          Case        Control
A         2n1 + n3    2n2 + n4
a         2n5 + n3    2n6 + n4
Dominant model for A (1 df), where A* = AA or Aa
          Case        Control
A*        n1 + n3     n2 + n4
aa        n5          n6
Effect sizes calculated as odds ratios
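A sketch of how the genotype table collapses under the additive and dominant models, with odds ratios computed as ad/(bc). The function names and counts are hypothetical. Counting alleles treats the two alleles carried by one person as independent observations, which is reasonable only under Hardy-Weinberg equilibrium.

```python
def allelic_table(cases, controls):
    """Additive (allelic) model: two alleles per person.

    cases, controls: genotype counts ordered (AA, Aa, aa).
    Returns ((A, a) among cases, (A, a) among controls).
    """
    def alleles(counts):
        n_AA, n_Aa, n_aa = counts
        return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa

    return alleles(cases), alleles(controls)


def dominant_table(cases, controls):
    """Dominant model for A: carriers A* (AA or Aa) versus aa."""
    def collapse(counts):
        n_AA, n_Aa, n_aa = counts
        return n_AA + n_Aa, n_aa

    return collapse(cases), collapse(controls)


def odds_ratio(case_counts, control_counts):
    """Odds ratio (a/c) / (b/d) = ad / (bc) for level 1 versus level 2."""
    a, c = case_counts      # cases: level 1, level 2
    b, d = control_counts   # controls: level 1, level 2
    return (a * d) / (b * c)


cases, controls = (40, 45, 15), (25, 50, 25)  # hypothetical counts
print(odds_ratio(*allelic_table(cases, controls)))   # A versus a
print(odds_ratio(*dominant_table(cases, controls)))  # A* versus aa
```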

Relative risk
        D+    D-
E+      a     b
E-      c     d
Risk in E+ = a / (a + b)
Risk in E- = c / (c + d)
Relative risk of exposure = (a / (a + b)) / (c / (c + d))

Odds ratio
        D+    D-
E+      a     b
E-      c     d
Odds of exposure in D+ = a / c
Odds of exposure in D- = b / d
Odds ratio = (a / c) / (b / d) = ad / (bc)
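The relative risk and odds ratio formulas as a small Python sketch, using the a, b, c, d layout of the 2 × 2 tables; the example counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Rows: exposure E+ (a, b) and E- (c, d); columns: D+ then D-."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """(a/c) / (b/d): odds of exposure in D+ over odds of exposure in D-."""
    return (a / c) / (b / d)


# Hypothetical table with a common outcome: the two measures diverge
print(relative_risk(30, 70, 10, 90), odds_ratio(30, 70, 10, 90))
# Hypothetical rare outcome: the odds ratio approximates the relative risk
print(relative_risk(3, 997, 1, 999), odds_ratio(3, 997, 1, 999))
```

For the rare outcome the relative risk is 3.0 and the odds ratio is about 3.006, which is why odds ratios from case/control samples are often read as approximate relative risks.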

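Quantitative traits
[Table: one row per individual (ID 1 to 10) with trait value Y, genotype G (AA, Aa or aa) and the genotype codings A and D]
$Y = aA + dD + e$, where a and d are the additive and dominance effects and e is the residual.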
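A sketch of estimating a and d, under an assumed coding that the slide does not show: A = 1, 0, -1 and D = 0, 1, 0 for AA, Aa, aa, plus an intercept mu for the mean. With three parameters for three genotype classes the model is saturated, so the least-squares fit reproduces the genotype means and no matrix algebra is needed. The data are hypothetical.

```python
from statistics import mean

# Assumed coding (not shown on the slide): (A, D) for each genotype
CODING = {"AA": (1, 0), "Aa": (0, 1), "aa": (-1, 0)}


def fit_additive_dominance(genotypes, y):
    """Least-squares fit of Y = mu + a*A + d*D + e.

    Three parameters for three genotype classes, so the fit reproduces
    the three genotype means; every class must be present in the data.
    """
    means = {
        g: mean(value for geno, value in zip(genotypes, y) if geno == g)
        for g in CODING
    }
    mu = (means["AA"] + means["aa"]) / 2  # midpoint of the two homozygotes
    a = (means["AA"] - means["aa"]) / 2   # additive effect
    d = means["Aa"] - mu                  # dominance deviation
    return mu, a, d


def predict(genotype, mu, a, d):
    """Fitted mean for a genotype under the assumed coding."""
    A, D = CODING[genotype]
    return mu + a * A + d * D


# Hypothetical data
genotypes = ["AA", "AA", "Aa", "Aa", "aa", "aa"]
y = [12.0, 14.0, 11.0, 11.0, 8.0, 8.0]
print(fit_additive_dominance(genotypes, y))
```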

Some web resources
 – BGIM: introductory tutorials on twin analysis, primer on maximum likelihood, Mx language
 – GxE moderator models
 – Power calculation
 – Case/control association tools

Relative risk
Genotype   P(D|G)     RR
AA         P(D|AA)    P(D|AA) / P(D|aa)
Aa         P(D|Aa)    P(D|Aa) / P(D|aa)
aa         P(D|aa)    1
P(D|AA) / P(D|aa) is labelled RR(AA); P(D|Aa) / P(D|aa) is labelled RR(Aa).

Genetic models
Model            RR(Aa)   RR(AA)
General          x        y
Multiplicative   x        x²
Dominant         x        x
Recessive        1.00     x
No effect        1.00     1.00
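A sketch of the genetic-model table as a lookup, plus the conversion from relative risks back to genotype-specific risks P(D|G) given a baseline risk P(D|aa). The function names and example values are hypothetical.

```python
def genotype_relative_risks(model, x, y=None):
    """(RR(Aa), RR(AA)) relative to aa, following the genetic-model table."""
    if model == "general":
        if y is None:
            raise ValueError("the general model needs both x and y")
        return x, y
    table = {
        "multiplicative": (x, x * x),
        "dominant": (x, x),
        "recessive": (1.0, x),
        "none": (1.0, 1.0),
    }
    return table[model]


def genotype_risks(baseline, rr_het, rr_hom):
    """P(D|aa), P(D|Aa), P(D|AA) given the baseline risk P(D|aa)."""
    return baseline, baseline * rr_het, baseline * rr_hom


# Hypothetical: baseline risk 1%, multiplicative model with x = 2
rr_het, rr_hom = genotype_relative_risks("multiplicative", 2.0)
print(genotype_risks(0.01, rr_het, rr_hom))
```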

Tests
Test                                          Alternate        Null
Any effect?                                   General          No effect
Any effect assuming a multiplicative gene?    Multiplicative   No effect
Any effect assuming a dominant gene?          Dominance        No effect
Any effect assuming a recessive gene?         Recessive        No effect
Can we assume a multiplicative effect?        General          Multiplicative
Can we assume a dominant effect?              General          Dominance
Can we assume a recessive effect?             General          Recessive
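A sketch of how each test in the table becomes a likelihood-ratio test: the statistic is the drop in -2LL from the null to the alternative model, compared with a chi-squared distribution whose df is the difference in free genotype-effect parameters (2 for General, 1 for Multiplicative, Dominant and Recessive, 0 for No effect). When effects are estimated separately in k samples the df scale by k, which matches the (4 df) and (2 df) lines of the second output block for two samples. The helper names and -2LL values are hypothetical; chi2_sf uses the closed forms available for integer df.

```python
import math

# Free genotype-effect parameters per model, read off the genetic-model table
N_PARAMS = {"general": 2, "multiplicative": 1, "dominant": 1,
            "recessive": 1, "none": 0}


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: the 1-df tail plus one correction term per extra 2 df
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(alt, null, m2ll_alt, m2ll_null, n_samples=1):
    """Compare nested models by the drop in -2LL; returns (stat, df, p)."""
    df = (N_PARAMS[alt] - N_PARAMS[null]) * n_samples
    stat = m2ll_null - m2ll_alt
    return stat, df, chi2_sf(stat, df)


# Hypothetical -2LL values for "Any effect?" (General versus No effect)
print(likelihood_ratio_test("general", "none", 1000.0, 1010.0))
```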

Multiple samples
Constrain frequencies across samples
Constrain effects across samples
 – Can test genetic models with effects and/or frequencies constrained to be equal
 – Can perform tests of homogeneity of effects and/or frequencies across samples

An example: 2 case/control samples
Population frequency 5%
Sample 1   Case   Control
AA         17     11
Aa         35     59
aa         24     40
Sample 2   Case   Control
AA         37     10
Aa         67     43
aa         20     37
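The slides test homogeneity of effects with likelihood models on the genotype scale. As a simpler, swapped-in alternative (Woolf's test for a common odds ratio, which is not the method used in the slides, so its numbers will not match the output reported for this example), here is a sketch on the allelic odds ratio of each sample, reading the example counts as two-digit case/control counts per genotype. Function names are hypothetical.

```python
import math


def allele_counts(genotype_counts):
    """(AA, Aa, aa) counts -> (A, a) allele counts, two alleles per person."""
    n_AA, n_Aa, n_aa = genotype_counts
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa


def woolf_test(sample1, sample2):
    """Woolf test that two samples share a common allelic odds ratio (1 df).

    Each sample is (case genotype counts, control genotype counts).
    Returns (OR in sample 1, OR in sample 2, statistic, p-value).
    """
    log_ors, weights = [], []
    for cases, controls in (sample1, sample2):
        a, c = allele_counts(cases)     # A and a among cases
        b, d = allele_counts(controls)  # A and a among controls
        log_ors.append(math.log(a * d / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse variance
    pooled = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    stat = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_ors))
    p = math.erfc(math.sqrt(stat / 2))  # chi-squared upper tail, 1 df
    return math.exp(log_ors[0]), math.exp(log_ors[1]), stat, p


sample1 = ((17, 35, 24), (11, 59, 40))  # (cases, controls), each (AA, Aa, aa)
sample2 = ((37, 67, 20), (10, 43, 37))
or1, or2, stat, p = woolf_test(sample1, sample2)
print(f"OR1 = {or1:.2f}, OR2 = {or2:.2f}, chi2 = {stat:.2f}, p = {p:.3f}")
```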

Homogeneous effects across samples
Homogeneous allele frequencies across samples
Model   p   RR(Aa)   RR(AA)   -2LL
Gen
Mult
Dom
Rec
None

Heterogeneous effects across samples
Homogeneous allele frequencies across samples
Model   p   RR(Aa)   RR(AA)   -2LL
Gen
Mult
Dom
Rec
None

TESTS OF GENETIC MODELS -- ASSUMING EQ EFFECTS & EQ FREQS
=========================================================
Gen  vs None (2 df) : …      p = …
Mult vs None (1 df) : …      p = …
Dom  vs None (1 df) : …      p = …
Rec  vs None (1 df) : …      p = …
Gen  vs Mult (1 df) : 0.056  p = …
Gen  vs Dom  (1 df) : 9.784  p = …
Gen  vs Rec  (1 df) : …      p = …
TESTS OF GENETIC MODELS -- ASSUMING UNEQ EFFECTS & EQ FREQS
===========================================================
Gen  vs None (4 df) : …      p = …
Mult vs None (2 df) : …      p = …
Dom  vs None (2 df) : …      p = …
Rec  vs None (2 df) : …      p = …
Gen  vs Mult (2 df) : 1.764  p = …
Gen  vs Dom  (2 df) : 9.925  p = …
Gen  vs Rec  (2 df) : …      p = …
TESTS OF EQUAL EFFECTS -- ASSUMING EQ FREQS
===========================================
w/ Gen model  (2 df) : 6.645  p = …
w/ Mult model (1 df) : 4.938  p = …
w/ Dom model  (1 df) : 6.505  p = …
w/ Rec model  (1 df) : 1.215  p = 0.270

Indirect association
[Figure: a QTL with nearby genotyped and ungenotyped markers]

Recombination
[Figure: paternal and maternal chromosomes, the homologous pair in one parent]
Recombination event during meiosis: the recombinant gamete transmitted harbors the mutation.

Recombination
[Figure: paternal and maternal chromosomes, the homologous pair in one parent]
No recombination event during meiosis: the nonrecombinant gamete transmitted does not harbor the mutation.

Linkage: affected sib pairs
[Figure: paternal and maternal chromosomes of one parent]
First affected offspring: no recombination
Second affected offspring: recombinant gamete
IBD sharing from this one parent: 0 or 1

Association analysis
Mutation occurs on a ‘red’ chromosome


Association analysis
Association due to ‘linkage disequilibrium’

Haplotypes
[Diagram: grid of the four two-locus haplotypes AM, aM, Am, am]
This individual has aa and Mm genotypes and am and aM haplotypes.

Haplotypes
This individual has Aa and Mm genotypes and AM and am haplotypes … but given only genotype data, this is consistent with Am/aM as well as AM/am.

Haplotypes
This individual has AA and Mm genotypes and AM and Am haplotypes.

Equilibrium haplotype frequencies (allele frequencies A = p, a = q, M = r, m = s)
        M      m
A       pr     ps     p
a       qr     qs     q
        r      s

Linkage disequilibrium
        M          m
A       pr + D     ps - D     p
a       qr - D     qs + D     q
        r          s
$D_{\max} = \min(ps,\, qr)$ if $D > 0$; $D_{\max} = \min(pr,\, qs)$ if $D < 0$
$D' = D / D_{\max}$
$r^2 = D^2 / (pqrs)$
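A sketch of the LD measures for two biallelic loci, with p, q, r, s as in the haplotype table. Here D' keeps the sign of D; many tools report |D'| instead. The function name and frequencies are hypothetical.

```python
def ld_measures(f_AM, p, r):
    """D, D' and r^2 for two biallelic loci.

    f_AM: frequency of the AM haplotype; p: frequency of A; r: frequency of M.
    """
    q, s = 1 - p, 1 - r
    d = f_AM - p * r  # deviation from the equilibrium frequency pr
    if d == 0:
        return 0.0, 0.0, 0.0
    # D_max is the largest |D| that keeps all four haplotype frequencies >= 0
    d_max = min(p * s, q * r) if d > 0 else min(p * r, q * s)
    r2 = d * d / (p * q * r * s)
    return d, d / d_max, r2


print(ld_measures(0.3, 0.5, 0.4))  # hypothetical frequencies
```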

Haplotype analysis
1. Estimate haplotypes from genotypes
2. Associate haplotypes with trait
Haplotype   Freq.   Odds ratio
AAGG        40%     1.00*
AAGT        30%     2.21
CGCG        25%     1.07
AGCT        5%      0.92
* baseline, fixed to 1.00
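Step 1 is commonly done with the expectation-maximization (EM) algorithm. A minimal sketch for two biallelic loci (the four-SNP haplotypes in the table need the same idea with more phase configurations), assuming each genotype is given as the number of copies of allele A and of allele M. Only double heterozygotes are phase-ambiguous, as in the Aa/Mm example. This is an illustrative implementation with hypothetical names, not the software used for the slide.

```python
from collections import Counter

HAPLOTYPES = ("AM", "Am", "aM", "am")


def resolved_pair(n_A, n_M):
    """Haplotype pair for a genotype whose phase is unambiguous.

    n_A, n_M: number of copies (0, 1, 2) of allele A and allele M.
    Valid unless both loci are heterozygous (n_A == n_M == 1).
    """
    h1 = ("A" if n_A >= 1 else "a") + ("M" if n_M >= 1 else "m")
    h2 = ("A" if n_A == 2 else "a") + ("M" if n_M == 2 else "m")
    return h1, h2


def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes."""
    freq = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values
    n_chromosomes = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = Counter()
        for n_A, n_M in genotypes:
            if n_A == 1 and n_M == 1:
                # E-step: split the double heterozygote between AM/am and Am/aM
                cis = freq["AM"] * freq["am"]
                trans = freq["Am"] * freq["aM"]
                p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts["AM"] += p_cis
                counts["am"] += p_cis
                counts["Am"] += 1 - p_cis
                counts["aM"] += 1 - p_cis
            else:
                for h in resolved_pair(n_A, n_M):
                    counts[h] += 1
        # M-step: expected haplotype counts over the number of chromosomes
        new_freq = {h: counts[h] / n_chromosomes for h in HAPLOTYPES}
        change = max(abs(new_freq[h] - freq[h]) for h in HAPLOTYPES)
        freq = new_freq
        if change < tol:
            break
    return freq


# Hypothetical sample: two AM/AM, two am/am and one double heterozygote
print(em_haplotype_frequencies([(2, 2), (2, 2), (0, 0), (0, 0), (1, 1)]))
```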

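[Figure with two panels, Linkage and Association. Association: trait by QTL genotype (aa, Aa, AA); trait by marker genotype, connected to the QTL genotype through LD. Linkage: sib correlation by IBD at the QTL; sib correlation by IBD at the marker, connected to IBD at the QTL through the recombination fraction (RF).]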

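Variance components
Means: $(M_1,\; M_2)$
Variance-covariance matrix: $\begin{pmatrix} V_1 & C_{21} \\ C_{12} & V_2 \end{pmatrix}$
ASSOCIATION is modelled on the means; LINKAGE is modelled on the covariances.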

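Variance components
Means: $(M_1 + bG_1,\; M_2 + bG_2)$
Variance-covariance matrix: $\begin{pmatrix} V_1 & C_{21} + q(\hat{\pi} - \tfrac{1}{2}) \\ C_{12} + q(\hat{\pi} - \tfrac{1}{2}) & V_2 \end{pmatrix}$
LINKAGE: q = regression coefficient; $\hat{\pi}$ = IBD sharing (0, ½, 1)
ASSOCIATION: b = regression coefficient; G = individual's genotype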
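A sketch of the -2 log-likelihood of one sib pair under this combined model, assuming equal variances V1 = V2 and a symmetric covariance C12 = C21; the function name and parameter values are hypothetical. A full fit would sum this over pairs and minimize over the parameters, which is what a maximum-likelihood program such as Mx does.

```python
import math


def sib_pair_m2ll(y, g, pi_hat, mu, b, var, cov, q):
    """-2 log-likelihood of one sib pair under the bivariate normal model.

    y: (y1, y2) trait values        g: (G1, G2) genotype scores
    pi_hat: IBD proportion (0, 0.5 or 1) at the tested locus
    mu, b: mean and association regression coefficient
    var: V1 = V2 (assumed equal)    cov: C12 = C21 (assumed symmetric)
    q: linkage regression coefficient
    """
    m1, m2 = mu + b * g[0], mu + b * g[1]  # association models the means
    c = cov + q * (pi_hat - 0.5)           # linkage models the covariance
    det = var * var - c * c
    if det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    r1, r2 = y[0] - m1, y[1] - m2
    # Quadratic form using the closed-form inverse of a 2 x 2 matrix
    quad = (var * r1 * r1 - 2 * c * r1 * r2 + var * r2 * r2) / det
    return 2 * math.log(2 * math.pi) + math.log(det) + quad


# Hypothetical pair: genotype score 1 for both sibs, sharing both alleles IBD
print(sib_pair_m2ll(y=(1.2, 0.9), g=(1, 1), pi_hat=1.0,
                    mu=0.0, b=0.5, var=1.0, cov=0.3, q=0.2))
```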

Components of a Genetic Theory
POPULATION MODEL
 – Allele & genotype frequencies
 – Demographics & population history
 – Linkage disequilibrium, haplotype structure
TRANSMISSION MODEL
 – Mendelian segregation
 – Identity by descent & genetic relatedness
PHENOTYPE MODEL
 – Biometrical model of quantitative traits
 – Additive & dominance components

Linkage without association
Family 1: parents 3/5 and 2/6; offspring 3/2 and 5/2
Family 2: parents 3/5 and 2/6; offspring 3/6 and 5/6
Both families are ‘linked’ with the marker… but a different allele is involved.

Linkage and association
[Figure: several nuclear families with marker genotypes]
All families are ‘linked’ with the marker… and allele 6 is ‘associated’ with disease.
Linkage is just association within families.

Association without linkage
[Figure: families and a case/control sample drawn from two populations, with marker genotypes]
Allele 6 is more common in the GREEN population.
The disease is more common in the GREEN population.
… a ‘spurious association’

TDT
Transmission disequilibrium test
 – test for linkage and association
[Figure: parent-offspring trios with genotypes]

TDT
“A” = disease allele
[Figure: matings AA × Aa and aa × Aa with offspring genotypes AA, Aa and aa, under additive, dominant and recessive models]
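A sketch of the classic TDT statistic, which counts transmissions from heterozygous Aa parents only: b is the number of times A is transmitted to an affected child, c the number of times a is transmitted, and (b - c)²/(b + c) is compared with a chi-squared distribution on 1 df. The function name and counts are hypothetical.

```python
import math


def tdt(b, c):
    """Transmission disequilibrium test from heterozygous (Aa) parents.

    b: transmissions of A to affected offspring; c: transmissions of a.
    Homozygous parents carry no information and are not counted.
    Returns (chi-squared statistic, p-value) on 1 df.
    """
    if b + c == 0:
        raise ValueError("no transmissions from heterozygous parents")
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


print(tdt(60, 40))  # hypothetical counts
```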

Between and within components
Sib1 = B + W
Sib2 = B - W

Between and within components (Fulker et al., 1999)
S1    S2    Score S1   Score S2   B     W     Model S1   Model S2
AA    AA    1          1          1     0     B + W      B - W
AA    Aa    1          0          0.5   0.5   B + W      B - W
AA    aa    1          -1         0     1     B + W      B - W
Note: W = S1 - B

Parental genotypes
Use parental genotypes to generate B
Examples
 – AA from AA × AA: W = 0
 – Aa from AA × Aa: W = -0.5
 – Aa from Aa × Aa: W = 0
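A sketch of the B and W calculations, assuming the additive scores AA = 1, Aa = 0, aa = -1 implied by the between/within table. With this coding the expected offspring score is the mean of the two parental scores, which reproduces the three parental examples. Function names are hypothetical.

```python
# Assumed additive scores, consistent with the between/within table
SCORE = {"AA": 1, "Aa": 0, "aa": -1}


def bw_from_sibs(g1, g2):
    """Sib-pair decomposition: B = mean score of the pair, W_i = S_i - B."""
    s1, s2 = SCORE[g1], SCORE[g2]
    b = (s1 + s2) / 2
    return b, s1 - b, s2 - b


def bw_from_parents(child, father, mother):
    """Parent-based B: expected offspring score = mean of parental scores."""
    b = (SCORE[father] + SCORE[mother]) / 2
    return b, SCORE[child] - b


print(bw_from_sibs("AA", "Aa"))           # B = 0.5, W = 0.5 and -0.5
print(bw_from_parents("Aa", "AA", "Aa"))  # B = 0.5, W = -0.5
```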

assoc.mx
 – Sibling pair sample
 – B and W components precalculated in input file
 – Single SNP genotype
 – Quantitative trait

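assoc.dat
Columns: s1 s2 g1 g2 b w1 w2 (trait values of sib 1 and sib 2, their genotypes, the between component, and the within components for sib 1 and sib 2)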

! Mx script for QTL association: sib pairs, univariate
Group 1 : Calc NG=2
Begin Matrices;
! ** Parameters
 B Full 1 1 free ! association : between component
 W Full 1 1 free ! association : within component
 M Full 1 1 free ! mean
 S Full 1 1 free ! Shared residual variance
 N Full 1 1 free ! Nonshared residual variance
! ** Definition variables **
 C Full 1 1 ! association : between
 X Full 1 1 ! association : within, sib 1
 Y Full 1 1 ! association : within, sib 2
End Matrices;
! ** Uncomment for B=W model
! Equate W B
! Starting values
Matrix B 0
Matrix W 0
Matrix M 0
Matrix S 0.5
Matrix N 0.5
End

Group2 : Data Group
Data NI=7 NO=0
RE file=assoc.dat
Labels Sib1 Sib2 g1 g2 b w1 w2
Select Sib1 Sib2 b w1 w2 /
Definition b w1 w2 /
Matrices = Group 1
Means M + B*C + W*X | M + B*C + W*Y /
Covariance S + N | S _
           S | S + N /
Specify C b
Specify X w1
Specify Y w2
End

Models
B & W:
 B Full 1 1 free
 W Full 1 1 free
 !Equate W B
B = W:
 B Full 1 1 free
 W Full 1 1 free
 Equate W B
B:
 B Full 1 1 free
 W Full 1 1
 !Equate W B
B = W = 0:
 B Full 1 1
 W Full 1 1
 !Equate W B

Tests
Test                        H_A      H_0
Standard association test   B = W    B = W = 0
Test of stratification      B & W    B = W
Robust association test     B & W    B

assoc.mx
Model       B    W    -2LL    df
B & W
B = W
B
B = W = 0
Test of total association: H_A: B = W; H_0: B = W = 0
Δ-2LL = 58.29, df = 1, p < 1e-13

assoc.mx
Test of stratification: H_A: B & W; H_0: B = W
Δ-2LL = 1.09, df = 1, p = 0.29

assoc.mx
Test of within association: H_A: B & W; H_0: B
Δ-2LL = 23.06, df = 1, p < 1e-5

Implementation
QTDT
 – Abecasis et al (2001) AJHG
 – extends between/within model to general pedigrees
 – multiple alleles
 – covariates
 – combined test of linkage and association
 – discrete as well as quantitative traits

Linkage                                    Association
families                                   unrelateds or families
detectable over large distances (>10 cM)   detectable over small distances (<1 cM)
large effects (OR > 3, variance > 10%)     small effects (OR < 2, variance < 1%)