Genome-wide Association Studies John S. Witte. Association Studies Hirschhorn & Daly, Nat Rev Genet 2005 Candidate Gene or GWAS.


1 Genome-wide Association Studies John S. Witte

2 Association Studies Hirschhorn & Daly, Nat Rev Genet 2005 Candidate Gene or GWAS

3 Affymetrix Array Genome-wide Association Studies Altshuler & Clark, Science 2005

4 Genome-wide Association Studies (GWAS)

5 GWAS+ Strategy. Discovery: Multi-stage GWAS+. Confirmation / Characterization: Follow-up Genotyping+. Clarification: Sequencing+. [Figure axes: # Markers, # Samples, Time]

7 One- and Two-Stage GWA Designs. [Figure: grids of SNPs (1, 2, 3, …, M) by samples (1, 2, 3, …, N). One-stage design: all SNPs genotyped on all samples. Two-stage design: Stage 1 covers a fraction of the samples; Stage 2 covers a fraction of the markers.]

8 Replication-based analysis vs. joint analysis. [Figure: SNPs-by-samples grids for the one-stage design and the two-stage design (Stage 1, Stage 2), shown under replication-based analysis and under joint analysis]

9 Multistage Designs. Joint analysis has more power than replication-based analysis. The p-value threshold in Stage 1 must be liberal. Lower cost, but no gain in power. CaTS power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
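The cost argument for multistage designs can be made concrete with a small sketch. It assumes cost is proportional to the number of genotypes produced, and the parameter names and the example fractions are hypothetical illustrations, not values from the slides.

```python
def genotypes_one_stage(n_samples, n_markers):
    """Genotypes needed when every marker is typed on every sample."""
    return n_samples * n_markers


def genotypes_two_stage(n_samples, n_markers, frac_samples_stage1, frac_markers_stage2):
    """Genotypes needed for a two-stage design.

    Stage 1: all markers on a fraction of the samples.
    Stage 2: only the markers carried forward, on the remaining samples.
    Both fractions are assumed design parameters (hypothetical names).
    """
    stage1 = frac_samples_stage1 * n_samples * n_markers
    stage2 = (1 - frac_samples_stage1) * n_samples * frac_markers_stage2 * n_markers
    return stage1 + stage2


if __name__ == "__main__":
    one = genotypes_one_stage(2000, 500_000)
    two = genotypes_two_stage(2000, 500_000, 0.3, 0.05)
    print(f"one-stage: {one:,} genotypes; two-stage: {two:,.0f} genotypes")
```

The sketch only counts genotypes; how much power each design has is what a dedicated calculator such as CaTS addresses.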

10 QC Steps. Filter SNPs and individuals: low minor allele frequency (MAF), low call rates. Test for Hardy-Weinberg equilibrium (HWE) among controls and within ethnic groups; use a conservative alpha level. Check for relatedness using identity-by-state calculations.
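A minimal sketch of the per-SNP QC quantities listed above, using only the Python standard library. The genotype coding (0/1/2 copies of one allele, `None` for a missing call), the function names, and the thresholds are assumptions for illustration; the HWE check here is the simple 1-df chi-square test, whereas an exact test is often preferred in practice.

```python
import math


def snp_qc(genotypes):
    """Return (call_rate, maf, hwe_p) for one SNP.

    genotypes: one entry per individual (e.g. controls only for the HWE
    check), coded 0/1/2 copies of an arbitrarily chosen allele, or None
    when the call is missing. This coding is an assumption of the sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    call_rate = len(called) / len(genotypes)

    n = len(called)
    observed = [called.count(0), called.count(1), called.count(2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    maf = min(p, 1 - p)

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return call_rate, maf, hwe_p


def passes_qc(genotypes, min_maf=0.01, min_call_rate=0.95, hwe_alpha=1e-6):
    """Apply illustrative thresholds (hypothetical values, not from the slides)."""
    call_rate, maf, hwe_p = snp_qc(genotypes)
    return maf >= min_maf and call_rate >= min_call_rate and hwe_p >= hwe_alpha
```

The small default `hwe_alpha` reflects the slide's advice to use a conservative alpha level, since many SNPs are tested at once.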

11 Analysis of GWAS. Most common approach: look at each SNP one at a time. Possibly add in multi-marker information. Further investigate / report top SNPs only. Or backwards replication… [Figure: P-values]

12 GWAS Analysis. Most commonly a trend test: log-additive model, fit by logistic regression. Adjust for potential population stratification.
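One way to illustrate the single-SNP trend test is the Cochran-Armitage test on a 2x3 table of genotype counts, sketched below with the standard library only. This is a stand-in for the logistic-regression version mentioned on the slide: it uses additive weights (0, 1, 2) and does not adjust for population stratification. The function name and example counts are hypothetical.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for one SNP.

    cases, controls: genotype counts for 0, 1 and 2 copies of the test allele.
    Returns (chi-square statistic with 1 df, p-value).
    """
    n_cases, n_controls = sum(cases), sum(controls)
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    n_total = n_cases + n_controls
    totals = [r + s for r, s in zip(cases, controls)]

    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * n for w, n in zip(weights, totals))
    sum_w2n = sum(w * w * n for w, n in zip(weights, totals))

    t = n_total * sum_wr - n_cases * sum_wn
    var = (n_cases * n_controls / n_total) * (n_total * sum_w2n - sum_wn ** 2)
    if var == 0:
        return 0.0, 1.0  # monomorphic SNP: no information about trend
    chi2 = t * t / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    chi2, p = trend_test(cases=(30, 50, 20), controls=(50, 40, 10))
    print(f"chi2 = {chi2:.3f}, p = {p:.4g}")
```

In a real analysis, covariates such as ancestry principal components would typically enter a logistic regression rather than this unadjusted table test.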

13 Quantile-Quantile (QQ) Plot
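As a sketch of what a QQ plot displays, the function below pairs each observed p-value with its expected value under the null hypothesis of no association, both on the -log10 scale. The plotting position (i + 0.5) / n is one common convention and is an assumption here; the function name is hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10(p), smallest p-value first.

    Under the null, sorted p-values from n tests are expected to fall near
    (i + 0.5) / n for i = 0..n-1. All p-values must be greater than 0.
    """
    observed = sorted(p_values)
    n = len(observed)
    return [
        (-math.log10((i + 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed)
    ]


if __name__ == "__main__":
    for expected, observed in qq_points([0.5, 0.01, 0.1]):
        print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Points that rise above the diagonal only in the upper tail suggest true associations, while departure along the whole line is usually read as a sign of stratification or other systematic bias.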

14 Example: GWAS of Prostate Cancer (http://cgems.cancer.gov). Multiple prostate cancer loci on 8q24 (Witte, Nat Genet 2007). [Figure: association results by chromosome]

15 Prostate Cancer Replications (Witte, Nat Rev Genet 2009). Modest ORs.

Chr Reg    SNP          Alleles  Freq Cntrl  Freq Case  OR    p value     Nearby Genes / Fcn
2p15       rs721048     G/A      0.19        0.21       1.15  7.7x10^-9   EHBP1: endocytic trafficking
3p12       rs2660753    C/T      0.10        0.12       1.30  2.7x10^-8   Intergenic
6q25       rs9364554    C/T      0.29        0.33       1.21  5.5x10^-10  SLC22A3: drugs and toxins
7q21       rs6465657    T/C      0.46        0.50       1.19  1.1x10^-9   LMTK2: endosomal trafficking
8q24 (2)   rs16901979   C/A      0.04        0.06       1.52  1.1x10^-12  Intergenic
8q24 (3)   rs6983267    T/G      0.50        0.56       1.25  9.4x10^-13  Intergenic
8q24 (1)   rs1447295    C/A      0.10        0.14       1.42  6.4x10^-18  Intergenic
10q11      rs10993994   C/T      0.38        0.46       1.38  8.7x10^-29  MSMB: suppressor prop.
10q26      rs4962416    T/C      0.27        0.32       1.18  2.7x10^-8   CTBP2: antiapoptotic activity
11q13      rs7931342    T/G      0.51        0.56       1.21  1.7x10^-12  Intergenic
17q12      rs4430796    G/A      0.49        0.55       1.22  1.4x10^-11  HNF1B: suppressor properties
17q24      rs1859962    T/G      0.46        0.51       1.20  2.5x10^-10  Intergenic
19q13      rs2735839    A/G      0.83        0.87       1.37  1.5x10^-18  KLK2/KLK3: PSA
Xp11       rs5945619    T/C      0.36        0.41       1.29  1.5x10^-9   NUDT10, NUDT11: apoptosis

17 SNPs Missed in Replication? (Witte, Nat Rev Genet, 2009). [Same table of replicated prostate cancer loci as slide 15, with the annotation:] 24,223 smallest P-value!

18 Prostate Cancer. Manolio et al. Clin Invest 2008; www.genome.gov/gwastudies

19 Population Attributable Risks for GWAS (Jorgenson & Witte, 2009). [Figure comparators: smoking & lung cancer; BRCA1 & breast cancer]
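For orientation, a population attributable risk can be computed from an exposure (or risk-genotype) frequency and a relative risk with Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)). Whether the cited paper uses exactly this form is an assumption of this sketch, and the input values below are illustrative, not taken from the slide.

```python
def population_attributable_risk(exposure_freq, relative_risk):
    """Levin's formula: fraction of cases attributable to the exposure.

    exposure_freq: proportion of the population exposed (or carrying the
    risk genotype). relative_risk: risk in exposed relative to unexposed;
    for rare diseases an odds ratio is often used as an approximation.
    """
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Illustrative inputs only: a common variant with a modest effect
    # versus a rare variant with a large effect.
    print(population_attributable_risk(0.50, 1.2))
    print(population_attributable_risk(0.002, 10.0))
```

The two example calls show why a common variant with a modest OR can carry a larger attributable risk than a rare variant with a large effect.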

20 Limitations of GWAS. Not very predictive (Witte, Nat Rev Genet 2009). Example: AUC for breast cancer risk: Gail model = 58%; SNPs = 58.9%; Gail + SNPs = 61.8% (Wacholder et al. NEJM 2010).

21 Limitations of GWAS. Not very predictive. Explain little heritability. Focus on common variation. Many associated variants are not causal.

22 Where’s the Heritability? (McCarthy et al., 2008). Many more of these? See: NEJM, April 30, 2009. Common disease rare variant (CDRV) hypothesis: diseases are due to multiple rare variants with intermediate penetrances (allelic heterogeneity).

23 Will GWAS results explain more heritability? Possibly, if… 1. Causal SNPs have not yet been detected due to power / practical issues (e.g., not yet included in replication studies). 2. Causal SNPs have stronger effects: an associated SNP may only serve as a marker for multiple different causal SNPs.

24 Imputation of SNP Genotypes. Estimate unmeasured or missing genotypes, based on measured SNPs and external info (e.g., haplotype structure of HapMap). Increases GWAS power. Allows combining data across different platforms (e.g., Affy & Illumina) for replication / meta-analysis.

25 Imputation Example Study Sample HapMap/ 1K genomes Gonçalo Abecasis

26 Identify Match with Reference Gonçalo Abecasis

27 Phase chromosomes, impute missing genotypes Gonçalo Abecasis http://www.sph.umich.edu/csg/abecasis/MACH
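The match-and-fill idea in the imputation example can be sketched with a deliberately simplified toy: pick the single reference haplotype that best matches the observed alleles and copy its alleles into the missing positions. This is not the algorithm used by MACH or similar tools, which model each chromosome as a mosaic of reference haplotypes with a hidden Markov model. The function name, the `?` missing-data symbol, and the example haplotypes are hypothetical.

```python
def impute_haplotype(study, reference_panel, missing="?"):
    """Fill missing alleles in one phased study haplotype.

    study: string of alleles with `missing` at untyped positions.
    reference_panel: list of fully typed haplotype strings of equal length.
    Toy nearest-match rule (an assumption of this sketch, not MACH's HMM).
    """
    if any(len(ref) != len(study) for ref in reference_panel):
        raise ValueError("all haplotypes must have the same length")

    def mismatches(ref):
        # Count disagreements at the positions that were actually typed.
        return sum(1 for s, r in zip(study, ref) if s != missing and s != r)

    best = min(reference_panel, key=mismatches)  # ties go to the first match
    return "".join(r if s == missing else s for s, r in zip(study, best))


if __name__ == "__main__":
    panel = ["ACGTA", "ATGCA", "GCGTT"]
    print(impute_haplotype("A?GT?", panel))
```

Real imputation also reports a probability for each filled-in genotype, which downstream association tests can take into account.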

28 Imputation Application: TCF7L2 gene region & T2D from the WTCCC data (Marchini, Nature Genetics 2007; http://www.stats.ox.ac.uk/~marchini/#software). [Figure: association by chromosomal position; observed genotypes in black, imputed genotypes in red]

29 Genome-wide Sequence Studies. Trade-off between number of samples, depth, and genomic coverage. (Gonçalo Abecasis)

Sample Size   Depth   MAF 0.5-1%   MAF 2-5%
1,000         20x     perfect
2,000         10x     r^2=0.98     r^2=0.995
4,000         5x      r^2=0.90     r^2=0.98

30 Near-term Design Choices. For example, between: 1. Sequencing few subjects with extreme phenotypes (e.g., 200 cases, 200 controls, 4x coverage), then follow-up in a larger population. 2. 10M SNP chip based on 1,000 genomes; 5K cases, 5K controls. Which design will work best…?

31 Polygenic Models (ISC / Purcell et al. Nature 2009). Many weak associations combine to risk? Score model: $x_j = \sum_{i=1}^{m} \ln(OR_i) \times SNP_{ij}$, where – ln(OR_i) = ‘score’ for SNP i from ‘discovery’ sample; – SNP_ij = # of alleles (0,1,2) for SNP i, person j in ‘validation’ sample; – large number of SNPs (m). Is x_j associated with disease?
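The score model can be sketched in a few lines: each person's score is the sum over SNPs of the discovery-sample log odds ratio times that person's allele count in the validation sample. The function name and the example odds ratios and genotypes are hypothetical.

```python
import math


def polygenic_scores(odds_ratios, allele_counts):
    """Compute x_j = sum_i ln(OR_i) * SNP_ij for every person j.

    odds_ratios: per-SNP ORs estimated in the 'discovery' sample.
    allele_counts: one list per person in the 'validation' sample, giving
    0, 1 or 2 copies of the scored allele at each SNP, in the same order.
    """
    weights = [math.log(odds_ratio) for odds_ratio in odds_ratios]
    scores = []
    for person in allele_counts:
        if len(person) != len(weights):
            raise ValueError("each person needs one allele count per SNP")
        scores.append(sum(w * g for w, g in zip(weights, person)))
    return scores


if __name__ == "__main__":
    ors = [1.2, 0.9, 1.5]                # hypothetical discovery-sample ORs
    validation = [[2, 0, 1], [0, 0, 0]]  # hypothetical genotypes
    print(polygenic_scores(ors, validation))
```

The resulting scores would then be tested for association with case status in the validation sample, for example by logistic regression.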

32 Purcell / ISC et al. Nature 2009 Application of Model

33 Application to CGEMs PCa GWAS (Witte & Hoffman 2010). 1,172 cases, 1,157 controls from PLCO Trial; oversampled more aggressive cases. Illumina 550K array. Analyzed PCa overall & stratified by disease aggressiveness. Split into halves, resampling: – one as ‘discovery’ sample; – other as ‘validation’. LD filter: r^2 = 0.5.

34 Results for Prostate Cancer

35 Nat Rev Cancer 2010;10:205-212 Common Polygenic Model for Prostate and Breast Cancer? - CGEMs GWAS data on prostate and breast cancer. - Use one cancer as ‘discovery’ sample, the other as ‘validation’.

36 Results for PCa & BrCa

37 Complex diseases: Many causes = many causal pathways! [Figure nodes: genetic susceptibility, diet, physical activity, obesity, diabetes, hypertension, hyperlipidemia, atherosclerosis, vulnerable plaques, MI]

38 Pathways. Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways. Example: BioCarta: http://www.biocarta.com/. May be interested in potential joint and/or interaction effects of multiple genes in one pathway.

39 Moving Beyond Genome. Transcriptome: all messenger RNA molecules (‘transcripts’). Proteome: all proteins in a cell or organism. Metabolome: all metabolites in a biological organism (end products of its gene expression). Systems Biology.

