Genome-wide association studies (GWAS) Thomas Hoffmann.

1 Genome-wide association studies (GWAS) Thomas Hoffmann

2 Outline for GWAS
Review / Overview
Design
Analysis
– QC
– Prostate cancer example
– Imputation
– Replication & Meta-analysis
Advanced analysis intro (more next lecture)
– Limitations & “missing heritability”
– Gene/pathway tests
– Polygenic models

4 Manolio et al., Clin Invest 2008


6 Recap: Association studies Hirschhorn & Daly, Nat Rev Genet 2005 Candidate Gene or GWAS (guilt by association)

7 GWAS Microarray Affymetrix, Assay ~ 0.7 - 5M SNPs (keeps increasing)

8 Genotype calls Good calls! Bad calls!

10 Genome-wide association studies (GWAS)

11 One- and two-stage GWA designs (figure: samples x SNPs grids; the one-stage design genotypes n markers on n samples, the two-stage design splits samples and SNPs across Stage 1 and Stage 2)

12 Replication-based analysis vs. joint analysis (figure: samples x SNPs grids for Stage 1 and Stage 2 under each analysis)

13 Multistage Designs
Joint analysis has more power than replication
p-value in Stage 1 must be liberal
Lower cost, but do not gain power
CaTs power calculator:

14 Genome-wide Sequence Studies
Trade-off between number of samples, depth, and genomic coverage.

Sample Size   Depth   MAF 0.5-1%    MAF 2-5%
1,000         20x     “perfect”
2,000         10x     r^2 = 0.98    r^2 = 0.995
4,000         5x      r^2 = 0.90    r^2 = 0.98

Goncalo Abecasis
More later in “Next generation sequencing” (NGS) lecture
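A quick arithmetic check on the table above: the three designs appear to hold total sequencing effort fixed while trading depth for sample size. The helper name `total_coverage` is hypothetical and only illustrates that product.

```python
# (sample size, depth) pairs taken from the slide's table
designs = [(1000, 20), (2000, 10), (4000, 5)]

def total_coverage(samples, depth):
    """Total sequencing effort: samples times per-sample depth."""
    return samples * depth

efforts = [total_coverage(n, d) for n, d in designs]
for (n, d), effort in zip(designs, efforts):
    print(f"{n:>5} samples at {d:>2}x -> {effort:,}x in total")
```

Under this reading, each row spends the same 20,000x, so the r^2 columns show what is lost in genotype accuracy as the budget is spread over more people.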

15 Near-term sequencing design choices
For example, between:
1. Sequencing few subjects with extreme phenotypes: e.g., 200 cases, 200 controls, 4x coverage. Then follow-up in larger population.
2. 10M SNP chip based on 1,000 genomes. 5K cases, 5K controls.
Which design will work best…?
More later in “Next generation sequencing” (NGS) lecture

16 Design choices
GWAS Microarray
– Only assay SNPs designed into array (0.7-5 million)
– Much cheaper (so many more subjects)
– Genotypes currently more reliable
GWAS Sequencing
– “De novo” discovery (particularly good for rare variants)
– More expensive, though costs are falling (so many fewer subjects)
– Need much more extensive IT support
– Lots of interesting interpretation problems (field rapidly evolving)

17 Design choices
Exome Microarray
– Only assay SNPs designed into array (~300K + custom); in exons only, and that could affect protein-coding function
– Cheapest (so many more subjects)
– Genotypes currently more reliable (some question about rarest, but preliminary results good)
Exome Sequencing
– “De novo” discovery (particularly good for rare variants); percentage of exons only
– More expensive than microarrays, less expensive than GWAS sequencing
– Need more extensive IT support
– Lots of interesting interpretation problems

18 Size of study Visscher, AJHG 2012

19 Size of study Visscher, AJHG 2012

20 Biggest studies...
GWAS Microarray: 100,000 people in the Kaiser RPGEH, still to be analyzed (Hoffmann et al., Genomics, 2011a&b)
Sequencing: 1000 Genomes Project (though not disease focused, low coverage issues)
Exome Sequencing: GO ESP (12,031 subjects, for exome microarray design)

22 QC Steps
Filter SNPs and Individuals
– MAF, Low call rates
Test for HWE among controls & within ethnic groups. Use conservative alpha-level.
Check for relatedness. Identity-by-state calculations.
Check genotype gender
Filter Mendelian inheritance errors (family-based, or potentially cryptic relatedness, if large enough sample)
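The per-SNP parts of these QC steps can be sketched in a few lines. This is a minimal illustration, not the lecture's pipeline: the function names are hypothetical, genotypes are assumed to be coded 0/1/2 with `None` for a failed call, and the HWE check is a plain 1-df chi-square goodness-of-fit test that assumes the SNP is polymorphic.

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing calls; genotypes are 0/1/2 or None."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_freq(genotypes):
    """Minor allele frequency among called genotypes."""
    called = [g for g in genotypes if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square HWE test from genotype counts (e.g., in controls only)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

snp = [0, 1, None, 2]
print(call_rate(snp), minor_allele_freq(snp))
print(hwe_chi2_pvalue(25, 50, 25))
```

A SNP would then be dropped when its call rate or MAF falls below a chosen cutoff, or when its HWE p-value in controls falls below a conservative alpha; the cutoffs themselves are study decisions and are not specified here.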

23 Check for relatedness, e.g., HapMap Pemberton et al., AJHG 2010

24 GWAS analysis
Most common approach: look at each SNP one-at-a-time.
Possibly add in multi-marker information.
Further investigate / report top SNPs only. Or backwards replication…
P-values

25 GWAS analysis
Additive coding of SNP most common, just a covariate in a regression framework
– Dichotomous phenotype: logistic regression
– Continuous phenotype: linear regression
Correct for multiple comparisons
– e.g., Bonferroni, 1 million gives α = 5x10^-8
– more next time
Adjust for potential population stratification
– principal components (PC’s), on best performing SNPs
– software usually does LD filter (e.g. Eigensoft)
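To make the single-SNP test concrete, here is a small sketch of logistic regression with additive (0/1/2) coding, fitted by Newton-Raphson, together with the Bonferroni threshold. It is an illustration under simplifying assumptions: no PCs or other covariates are included, the function names are hypothetical, and real analyses would use an established package.

```python
import math

def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / n_tests

def logistic_additive(genotypes, cases, iterations=25):
    """Fit logit P(y=1) = a + b*g, where g is the 0/1/2 allele count.

    Returns the per-allele odds ratio exp(b) and a two-sided Wald p-value.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        grad_a = grad_b = h00 = h01 = h11 = 0.0
        for g, y in zip(genotypes, cases):
            p = 1.0 / (1.0 + math.exp(-(a + b * g)))
            w = p * (1.0 - p)
            grad_a += y - p
            grad_b += (y - p) * g
            h00 += w          # information matrix entries
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        a += (h11 * grad_a - h01 * grad_b) / det
        b += (h00 * grad_b - h01 * grad_a) / det
    se_b = math.sqrt(h00 / det)   # from the inverse information matrix
    z = b / se_b
    return math.exp(b), math.erfc(abs(z) / math.sqrt(2))

print(bonferroni_alpha(0.05, 1_000_000))  # about 5e-08
```

A SNP would be declared genome-wide significant when its p-value falls below the Bonferroni level; adjusting for stratification would add the PCs as further covariates in the same model.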

26 Adjusting for PC’s (recap) Balding, Nature Reviews Genetics 2010

27 Adjusting for PC's Li et al., Science 2008

28 Adjusting for PC's Razib, Current Biology 2008

29 Adjusting for PC's Wang, BMC Proc 2009

30 QQ-plots and PC adjustment Wang, BMC Proc 2009

31 Quantile-quantile (QQ) plot

32 Example: GWAS of Prostate Cancer. Multiple prostate cancer loci on 8q24 (plot axis: chromosome). Witte, Nat Genet 2007

33 Prostate Cancer Replications (Witte, Nat Rev Genet 2009). Modest ORs.

Chr Reg    SNP         Alleles  Freq Cntrl  Freq Case  OR      p value        Nearby Genes / Fcn
2p15       rs721048    G/A      [lost]      [lost]     [lost]  [lost]x10^-9   EHBP1: endocytic trafficking
3p12       rs2660753   C/T      0.10        0.12       1.30    2.7x10^-8      Intergenic
6q25       rs9364554   C/T      0.29        0.33       1.21    5.5x10^-10     SLC22A3: drugs and toxins
7q21       rs6465657   T/C      0.46        0.50       1.19    1.1x10^-9      LMTK2: endosomal trafficking
8q24 (2)   rs16901979  C/A      0.04        0.06       1.52    1.1x10^-12     Intergenic
8q24 (3)   rs6983267   T/G      0.50        0.56       1.25    9.4x10^-13     Intergenic
8q24 (1)   rs1447295   C/A      0.10        0.14       1.42    6.4x10^-18     Intergenic
10q11      rs10993994  C/T      0.38        0.46       1.38    8.7x10^-29     MSMB: suppressor prop.
10q26      rs4962416   T/C      0.27        0.32       1.18    2.7x10^-8      CTBP2: antiapoptotic activity
11q13      rs7931342   T/G      0.51        0.56       1.21    1.7x10^-12     Intergenic
17q12      rs4430796   G/A      0.49        0.55       1.22    1.4x10^-11     HNF1B: suppressor properties
17q24      rs1859962   T/G      0.46        0.51       1.20    2.5x10^-10     Intergenic
19q13      rs2735839   A/G      0.83        0.87       1.37    1.5x10^-18     KLK2/KLK3: PSA
Xp11       rs5945619   T/C      0.36        0.41       1.29    1.5x10^-9      NUDT10, NUDT11: apoptosis

35 SNPs Missed in Replication? Same replication table as slide 33 (Witte, Nat Rev Genet, 2009). 24,223 smallest P-value!

36 Population Attributable Risks for GWAS Jorgenson & Witte, 2009 Smoking & lung cancer BRCA1 & Breast cancer

37 Imputation of SNP Genotypes
Combine data from different platforms (e.g., Affy & Illumina) (for replication / meta-analysis).
Estimate unmeasured or missing genotypes.
Based on measured SNPs and external info (e.g., haplotype structure of HapMap).
Increase GWAS power (impute and analyze all), e.g., sick sinus syndrome: most significant was a 1000 Genomes imputed SNP (Holm et al., Nature Genetics, 2011)
HapMap as reference, now 1000 Genomes Project?

38 Imputation Example: Study Sample, HapMap / 1K genomes (Gonçalo Abecasis)

39 Identify Match with Reference (Gonçalo Abecasis)

40 Phase chromosomes, impute missing genotypes (Gonçalo Abecasis)
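The idea on these three slides (match the study haplotype to a reference, then copy over the untyped positions) can be shown with a deliberately simplified toy. Real imputation software works probabilistically on phased haplotypes and lets the best match change along the chromosome; the nearest-match rule, the function name, and the example sequences below are all assumptions made for illustration.

```python
def impute_from_reference(study_hap, reference_haps):
    """Fill None entries of study_hap from the closest reference haplotype."""
    def mismatches(ref):
        # Count disagreements at positions the study actually typed
        return sum(s is not None and s != r for s, r in zip(study_hap, ref))
    best = min(reference_haps, key=mismatches)
    return [r if s is None else s for s, r in zip(study_hap, best)]

reference = ["ACGTA", "ACCTG", "TCGAA"]   # made-up reference panel
study = ["A", None, "C", None, "G"]       # None marks untyped SNPs
imputed = impute_from_reference(study, reference)
print("".join(imputed))
```

Here the second reference haplotype agrees with every typed position, so its alleles fill the two gaps.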

41 Imputation Application: TCF7L2 gene region & T2D from the WTCCC data (plot axis: chromosomal position). Observed genotypes black, imputed genotypes red. Marchini, Nature Genetics 2007

42 Replication
To replicate:
– Association test for replication sample significant at 0.05 alpha level
– Same mode of inheritance
– Same direction
– Sufficient sample size for replication
A non-replication is not necessarily a false positive
– LD structures, different populations (e.g., flip-flop)
– covariates, phenotype definition, underpowered

43 Meta-analysis
Combine multiple studies to increase power
Either combine p-values (Fisher’s test), or z-scores (better)
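Both combination rules fit in a few lines. The sketch below assumes independent studies, two-sided p-values, and square-root-of-sample-size weights for the z-scores; the weighting choice and function names are assumptions, not something the slide specifies.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)   # test statistic divided by 2
    # Closed-form chi-square survival function for even degrees of freedom
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def weighted_z_meta(z_scores, sample_sizes):
    """Combine signed z-scores with sqrt(n) weights; returns (z, two-sided p)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    num = sum(w * zi for w, zi in zip(weights, z_scores))
    z = num / math.sqrt(sum(w * w for w in weights))
    return z, math.erfc(abs(z) / math.sqrt(2))

print(fisher_combined_p([0.01, 0.04]))
print(weighted_z_meta([2.0, 2.0], [100, 100]))
```

One reason the z-score route is usually preferred is that z-scores keep the direction of effect, so two studies pointing in opposite directions cancel rather than reinforce each other as they would when only p-values are combined.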

44 (Meta-analysis) Example: GWAS of Prostate Cancer. Multiple prostate cancer loci on 8q24 (plot axis: chromosome). Witte, Nat Genet 2007

45 Replication & Meta-analysis

46 Meta-analysis

48 Limitations of GWAS
Not very predictive
Witte, Nat Rev Genet 2009
Example: AUC for Breast Cancer Risk
– 58%: Gail model (# first degree relatives w bc, age menarche, age first live birth, number of previous biopsies) + age, study, entry year
– 58.9%: SNPs
– 61.8%: Combined
Wacholder et al., NEJM 2010
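For readers unfamiliar with the metric, AUC is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, so 50% is chance and 100% is perfect separation. A minimal sketch of that definition follows; the function name and scores are made up for illustration.

```python
def auc(case_scores, control_scores):
    """Share of case/control pairs where the case scores higher (ties count 1/2)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

print(auc([0.6, 0.4], [0.5, 0.3]))  # 0.75
```

Read this way, the AUCs on the slide (58% to 61.8%) mean the models rank a case above a control only somewhat more often than a coin flip would.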

49 Limitations of GWAS
Not very predictive
Explain little heritability
Focus on common variation
Many associated variants are not causal

50 Where's the heritability? Visscher, AJHG 2011

51 Where’s the heritability?
McCarthy et al., 2008
Many more of these? See: NEJM, April 30, 2009
Common disease rare variant (CDRV) hypothesis: diseases due to multiple rare variants with intermediate penetrances (allelic heterogeneity)

52 Will GWAS results explain more heritability?
Possibly, if…
1. Causal SNPs not yet detected due to power / practical issues (e.g., not yet included in replication studies).
2. Stronger effects for causal SNPs: Associated SNP may only serve as a marker for multiple different causal SNPs.

53 Gene/pathway-based tests
Various ways of collapsing the genotype information in multiple genes
– Less multiple comparison adjustment
$\mathrm{logit}(\Pr(y=1 \mid x, c)) = \alpha + \beta x + \gamma c$
e.g. $= \alpha + \beta_1 x_1 + \dots + \beta_{12} x_{12} + \{\text{PC's, other covariates}\}$
– y: disease status
– x is a vector of genotypes (e.g., a gene, or a pathway)
– c is a vector of covariates
$H_0: \beta = 0$

54 Gene/pathway-based tests
$\mathrm{logit}(\Pr(y=1 \mid x, c)) = \alpha + \beta x + \gamma c$
e.g. $= \alpha + \beta_1 x_1 + \dots + \beta_{12} x_{12} + \gamma_1 \mathrm{PC}_1 + \dots + \gamma_k \mathrm{PC}_k + \dots$
One example: “Kernel” machine [Question from last time]
– Simplest case of linear “kernel” reduces to linear/logistic regression (model above)
– More complicated function of genotypes can be tested, e.g., interactions, etc.
– Gory details: Variance components score test [h(x) in paper] (Wu et al., AJHG 2010)
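As a rough picture of what the linear-kernel case computes, consider a variance-component score statistic of the form Q = (y - ȳ)' K (y - ȳ) with K = XX'. With no covariates this equals the sum over SNPs of squared per-SNP score contributions. The sketch below is an assumption-laden simplification: the null model has only an intercept, scaling constants are ignored, and the null distribution needed for a p-value is omitted (see Wu et al. for the actual test). The function name is hypothetical.

```python
def linear_kernel_q(genotype_rows, phenotypes):
    """Q = r' X X' r for residuals r = y - mean(y); rows are people, columns SNPs."""
    n = len(phenotypes)
    y_bar = sum(phenotypes) / n
    residuals = [y - y_bar for y in phenotypes]
    n_snps = len(genotype_rows[0])
    # Per-SNP score contribution: sum over people of genotype times residual
    scores = [
        sum(row[j] * r for row, r in zip(genotype_rows, residuals))
        for j in range(n_snps)
    ]
    return sum(s * s for s in scores), scores

genotypes = [[0, 1], [1, 0], [2, 1], [0, 0]]   # made-up data: 4 people, 2 SNPs
status = [0, 1, 1, 0]
q, per_snp = linear_kernel_q(genotypes, status)
print(q, per_snp)
```

The gene-level statistic pools evidence across all SNPs in the set into one test, which is where the reduced multiple-comparison burden comes from.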

55 Pathways - how to define? Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways. Example: BioCarta: May be interested in potential joint and/or interaction effects of multiple genes in one pathway.

56 Polygenic Models: Many weak associations combine to risk?
Score model (use all GWAS SNPs): $x_j = \sum_{i=1}^{m} \ln(\mathrm{OR}_i)\,\mathrm{SNP}_{ij}$, where
– ln(OR_i) = ‘score’ for SNP i from ‘discovery’ sample
– SNP_ij = # of alleles (0,1,2) for SNP i, person j in ‘validation’ sample.
– Large number of SNPs (m)
Is x_j associated with disease?
ISC / Purcell et al. Nature 2009
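The score for each person in the validation sample is a weighted allele count: each SNP's allele count (0, 1, or 2) is multiplied by the log odds ratio estimated in the discovery sample, and the products are summed over SNPs. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
import math

def polygenic_scores(odds_ratios, allele_counts):
    """Score per person: sum over SNPs of ln(OR_i) times allele count.

    odds_ratios: per-SNP ORs from the discovery sample.
    allele_counts: one list of 0/1/2 counts per person in the validation sample.
    """
    weights = [math.log(or_i) for or_i in odds_ratios]
    return [sum(w * g for w, g in zip(weights, person)) for person in allele_counts]

discovery_ors = [2.0, 0.5]                    # made-up discovery estimates
validation = [[1, 1], [2, 0], [0, 0]]         # made-up validation genotypes
scores = polygenic_scores(discovery_ors, validation)
print(scores)
```

The scores would then be tested for association with disease status in the validation sample, for example as the single predictor in a logistic regression.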

57 Purcell / ISC et al. Nature 2009 Application of Model

58 Application to CGEMs PCa GWAS
1,172 cases, 1,157 controls from PLCO Trial
Oversampled more aggressive cases.
Illumina 550K array.
PCa & stratified by disease aggressiveness.
Split into halves, resampling:
– one as ‘discovery’ sample;
– other as ‘validation’.
LD filter: r^2 = 0.5.
Witte & Hoffmann, OMICs 2010

59 Results for Prostate Cancer

60 Common Polygenic Model for Prostate and Breast Cancer?
– CGEMs GWAS data on prostate and breast cancer.
– Use one cancer as ‘discovery’ sample, the other as ‘validation’.
Nat Rev Cancer 2010;10:205-212

61 Results for PCa & BrCa

62 Complex diseases: Many causes = many causal pathways! (diagram nodes: genetic susceptibility, diet, physical activity, obesity, diabetes, hypertension, hyperlipidemia, atherosclerosis, vulnerable plaques, MI)

63 Moving Beyond Genome
Transcriptome: all messenger RNA molecules (‘transcripts’)
Proteome: all proteins in cell or organism
Metabolome: all metabolites in a biological organism (end products of its gene expression).
Systems Biology
