Download presentation
Presentation is loading. Please wait.
Published byMeghan Jennings Modified over 8 years ago
1
Genome-wide association studies (GWAS) Thomas Hoffmann
2
Outline for GWAS Review / Overview Design Analysis – QC – Prostate cancer example – Imputation – Replication & Meta-analysis Advanced analysis intro (more next lecture) – Limitations & “missing heritability” – Gene/pathway tests – Polygenic models
3
Outline for GWAS Review / Overview Design Analysis – QC – Prostate cancer example – Imputation – Replication & Meta-analysis Advanced analysis intro (more next lecture) – Limitations & “missing heritability” – Gene/pathway tests – Polygenic models
4
Manolio et al., Clin Invest 2008
6
Recap: Association studies Hirschhorn & Daly, Nat Rev Genet 2005 Candidate Gene or GWAS (guilt by association)
7
GWAS Microarray Affymetrix, http://www.affymetrix.com Assay ~ 0.7 - 5M SNPs (keeps increasing)
8
Genotype calls Good calls! Bad calls!
9
Outline for GWAS Review / Overview Design Analysis – QC – Prostate cancer example – Imputation – Replication & Meta-analysis Advanced analysis intro (more next lecture) – Limitations & “missing heritability” – Gene/pathway tests – Polygenic models
10
Genome-wide assocation studies (GWAS)
11
SNPs One-Stage Design Stage 1 Stage 2 n samples n markers Two-Stage Design SNPs Samples One- and two-stage GWA designs
12
SNPs Replication-based analysis SNPs Stage 1 Stage 2 One-Stage Design Joint analysis SNPs Stage 1 Stage 2 Two-Stage Design Samples 1 2 1 2
13
Multistage Designs Joint analysis has more power than replication p-value in Stage 1 must be liberal Lower cost—do not gain power CaTs power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
14
Genome-wide Sequence Studies Trade off between number of samples, depth, and genomic coverage. MAF Sample SizeDepth0.5-1%2-5% 1,00020x“perfect” 2,00010xr 2 =0.98r 2 =0.995 4,0005xr 2 =0.90r 2 =0.98 Goncalo Abecasis More later in “Next generation sequencing” (NGS) lecture
15
Near-term sequencing design choices For example, between: 1.Sequencing few subjects with extreme phenotypes: e.g., 200 cases, 200 controls, 4x coverage. Then follow- up in larger population. 2.10M SNP chip based on 1,000 genomes. 5K cases, 5K controls. Which design will work best…? More later in “Next generation sequencing” (NGS) lecture
16
Design choices GWAS Microarray –Only assay SNPs designed into array (0.7-5 million) –Much cheaper (so many more subjects) –Genotypes currently more reliable GWAS Sequencing –“De novo” discovery (particularly good for rare variants) –More expensive (but costs are falling) (many less subjects) –Need much more expansive IT support –Lots of interesting interpretation problems (field rapidly evolving)
17
Design choices Exome Microarray –Only assay SNPs designed into array (~300K+custom); in exons only and that could affect protein coding function –Cheapest (so many more subjects) –Genotypes currently more reliable (some question about rarest, but preliminary results good) Exome Sequencing –“De novo” discovery (particularly good for rare variants); %age of exons only –More expensive than microarrays, less expensive than gwas sequencing –Need more expansive IT support –Lots of interesting interpretation problems
18
Size of study Visscher, AJHG 2012,
19
Size of study Visscher, AJHG 2012,
20
Biggest studies... GWAS Microarray: 100,000 People in the Kaiser RPGEH, still to be analyzed (Hoffmann et al., Genomics, 2011a&b) Sequencing: 1000 Genomes Project (though not disease focused, low coverage issues) Exome Sequencing: GO ESP (12,031 subjects, for exome microarray design)
21
Outline for GWAS Review / Overview Design Analysis – QC – Prostate cancer example – Imputation – Replication & Meta-analysis Advanced analysis intro (more next lecture) – Limitations & “missing heritability” – Gene/pathway tests – Polygenic models
22
QC Steps Filter SNPs and Individuals – MAF, Low call rates Test for HWE among controls & within ethnic groups. Use conservative alpha-level. Check for relatedness. Identity-by-state calculations. Check genotype gender Filter Mendelian inhertance (family-based, or potentially cryptics, if large enough sample)
23
Check for relatedness, e.g., HapMap Pemberton et al., AJHG 2010
24
GWAS analysis Most common approach: look at each SNP one-at-a-time. Possibly add in multi-marker information. Further investigate / report top SNPs only. Or backwards replication… P-values
25
GWAS analysis Additive coding of SNP most common, just a covariate in a regression framework Dichotomous phenotype: logistic regression Continuous phenotype: linear regression Correct for multiple comparisons e.g., Bonferroni, 1 million gives =5x10 -8 more next time Adjust for potential population stratification principal components (PC’s), on best performing SNPs software usually does LD filter (e.g. Eigensoft)
26
Adjusting for PC’s (recap) Balding, Nature Reviews Genetics 2010
27
Adjusting for PC's Li et al., Science 2008
28
Adjusting for PC's Razib, Current Biology 2008
29
Adjusting for PC's Wang, BMC Proc 2009
30
QQ-plots and PC adjustment Wang, BMC Proc 2009
31
Quantile-quantile (QQ) plot
32
http://cgems.cancer.gov chromosome Example: GWAS of Prostate Cancer Witte, Nat Genet 2007 Multiple prostate cancer loci on 8q24
33
LocusA FreqAssociation Chr RegSNPCntrlCaseORp valueNearby Genes / Fcn 2p15rs721048G/A0.190.211.157.7x10 -9 EHBP1: endocytic trafficking 3p12rs2660753C/T0.100.121.302.7x10 -8 Intergenic 6q25rs9364554C/T0.290.331.215.5x10 -10 SLC22A3: drugs and toxins. 7q21rs6465657T/C0.460.501.191.1x10 -9 LMTK2: endosomal trafficking 8q24 (2)rs16901979C/A0.040.061.521.1x10 -12 Intergenic 8q24 (3)rs6983267T/G0.500.561.259.4x10 -13 Intergenic 8q24 (1)rs1447295C/A0.100.141.426.4x10 -18 Intergenic 10q11rs10993994C/T0.380.461.388.7x10 -29 MSMB: suppressor prop. 10q26rs4962416T/C0.270.321.182.7x10 -8 CTBP2: antiapoptotic activity 11q13rs7931342T/G0.510.561.211.7x10 -12 Intergenic 17q12rs4430796G/A0.490.551.221.4x10 -11 HNF1B: suppressor properties 17q24rs1859962T/G0.460.511.202.5x10 -10 Intergenic 19q13rs2735839A/G0.830.871.371.5x10 -18 KLK2/KLK3: PSA Xp11rs5945619T/C0.360.411.291.5x10 -9 NUDT10, NUDT11: apoptosis Prostate Cancer Replications Witte, Nat Rev Genet 2009 Modest ORs
34
LocusA FreqAssociation Chr RegSNPCntrlCaseORp valueNearby Genes / Fcn 2p15rs721048G/A0.190.211.157.7x10 -9 EHBP1: endocytic trafficking 3p12rs2660753C/T0.100.121.302.7x10 -8 Intergenic 6q25rs9364554C/T0.290.331.215.5x10 -10 SLC22A3: drugs and toxins. 7q21rs6465657T/C0.460.501.191.1x10 -9 LMTK2: endosomal trafficking 8q24 (2)rs16901979C/A0.040.061.521.1x10 -12 Intergenic 8q24 (3)rs6983267T/G0.500.561.259.4x10 -13 Intergenic 8q24 (1)rs1447295C/A0.100.141.426.4x10 -18 Intergenic 10q11rs10993994C/T0.380.461.388.7x10 -29 MSMB: suppressor prop. 10q26rs4962416T/C0.270.321.182.7x10 -8 CTBP2: antiapoptotic activity 11q13rs7931342T/G0.510.561.211.7x10 -12 Intergenic 17q12rs4430796G/A0.490.551.221.4x10 -11 HNF1B: suppressor properties 17q24rs1859962T/G0.460.511.202.5x10 -10 Intergenic 19q13rs2735839A/G0.830.871.371.5x10 -18 KLK2/KLK3: PSA Xp11rs5945619T/C0.360.411.291.5x10 -9 NUDT10, NUDT11: apoptosis Prostate Cancer Replications Witte, Nat Rev Genet 2009 Modest ORs
35
LocusA FreqAssociation Chr RegSNPCntrlCaseORp valueNearby Genes / Fcn 2p15rs721048G/A0.190.211.157.7x10 -9 EHBP1: endocytic trafficking 3p12rs2660753C/T0.100.121.302.7x10 -8 Intergenic 6q25rs9364554C/T0.290.331.215.5x10 -10 SLC22A3: drugs and toxins. 7q21rs6465657T/C0.460.501.191.1x10 -9 LMTK2: endosomal trafficking 8q24 (2)rs16901979C/A0.040.061.521.1x10 -12 Intergenic 8q24 (3)rs6983267T/G0.500.561.259.4x10 -13 Intergenic 8q24 (1)rs1447295C/A0.100.141.426.4x10 -18 Intergenic 10q11rs10993994C/T0.380.461.388.7x10 - 29 MSMB: suppressor prop. 10q26rs4962416T/C0.270.321.182.7x10 -8 CTBP2: antiapoptotic activity 11q13rs7931342T/G0.510.561.211.7x10 -12 Intergenic 17q12rs4430796G/A0.490.551.221.4x10 -11 HNF1B: suppressor properties 17q24rs1859962T/G0.460.511.202.5x10 -10 Intergenic 19q13rs2735839A/G0.830.871.371.5x10 -18 KLK2/KLK3: PSA Xp11rs5945619T/C0.360.411.291.5x10 -9 NUDT10, NUDT11: apoptosis SNPs Missed in Replication? Witte, Nat Rev Genet, 2009 24,223 smallest P-value!
36
Population Attributable Risks for GWAS Jorgenson & Witte, 2009 Smoking & lung cancer BRCA1 & Breast cancer
37
Imputation of SNP Genotypes Combine data from different platforms (e.g., Affy & Illumina) (for replication / meta-analysis). Estimate unmeasured or missing genotypes. Based on measured SNPs and external info (e.g., haplotype structure of HapMap). Increase GWAS power (impute and analyze all), e.g. Sick sinus syndrome, most significant was 1000 Genomes imputed SNP (Holm et al., Nature Genetics, 2011) HapMap as reference, now 1000 Genomes Project?
38
Imputation Example Study Sample HapMap/ 1K genomes Gonçalo Abecasis http://www.shapeit.fr/, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html http://faculty.washington.edu/browning/beagle/beagle.html http://www.sph.umich.edu/csg/abecasis/MACH/download/
39
Identify Match with Reference Gonçalo Abecasis http://www.shapeit.fr/, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html http://faculty.washington.edu/browning/beagle/beagle.html http://www.sph.umich.edu/csg/abecasis/MACH/download/
40
Phase chromosomes, impute missing genotypes Gonçalo Abecasis http://www.shapeit.fr/, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html http://faculty.washington.edu/browning/beagle/beagle.html http://www.sph.umich.edu/csg/abecasis/MACH/download/
41
Imputation Application Chromosomal Position Marchini Nature Genetics2007 http://www.stats.ox.ac.uk/~marchini/#software TCF7L2 gene region & T2D from the WTCCC data Observed genotypes black Imputed genotypes red.
42
Replication To replicate: – Association test for replication sample significant at 0.05 alpha level – Same mode of inheritance – Same direction – Sufficient sample size for replication Non-replications not necessarily a false positive – LD structures, different populations (e.g., flip-flop) – covariates, phenotype definition, underpowered
43
Meta-analysis Combine multiple studies to increase power Either combine p-values (Fisher’s test), or z-scores (better)
44
http://cgems.cancer.gov chromosome (Meta-analysis) Example: GWAS of Prostate Cancer Witte, Nat Genet 2007 Multiple prostate cancer loci on 8q24
45
Replication & Meta-analysis
46
Meta-analysis
47
Outline for GWAS Review / Overview Design Analysis – QC – Prostate cancer example – Imputation – Replication & Meta-analysis Advanced analysis intro (more next lecture) – Limitations & “missing heritability” – Gene/pathway tests – Polygenic models
48
Limitations of GWAS Not very predictive Witte, Nat Rev Genet 2009 Example: AUC for Breast Cancer Risk 58%: Gail model (# first degree relatives w bc, age menarche, age first live birth, number of previous biopsies) + age, study, entry year 58.9%: SNPs 61.8%: Combined Wacholder et al., NEJM 2010
49
Limitations of GWAS Not very predictive Explain little heritability Focus on common variation Many associated variants are not causal
50
Where's the heritability? Visccher, AJHG 2011
51
Where’s the heritability? McCarthy et al., 2008 Many more of these? See: NEJM, April 30, 2009 Common disease rare variant (CDRV) hypothesis: diseases due to multiple rare variants with intermediate penetrances (allelic heterogeneity)
52
Will GWAS results explain more heritability? Possibly, if… 1.Causal SNPs not yet detected due to power / practical issues (e.g., not yet included in replication studies). 2.Stronger effects for causal SNPs: Associated SNP may only serve as a marker for multiple different causal SNPs.
53
Gene/pathway-based tests Various ways of collapsing the genotype information in multiple genes – Less multiple comparison adjustment logit (Prob(y=1| x, c)) = + x + c e.g.= + x 1 +…+ x 12 +{PC’s, other covariates} – y: disease status) – x is a vector of genotypes (e.g., a gene, or a pathway) – c is a vector of covariates H 0 : =0
54
Gene/pathway-based tests logit (Prob(y=1| x, c)) = + x + c e.g.= + x 1 +…+ x 12 + PC +…+ PC +... One example: “Kernel” machine [Question from last time] – Simplest case of linear “kernel” reduces to linear/logistic regression (model above) – More complicated function of genotypes can be tested, e.g., interactions, etc. – Gory details: Variance components score test [h(x) in paper] (Wu et al., AJHG 2010)
55
Pathways - how to define? Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways. Example: BioCarta: http://www.biocarta.com/ May be interested in potential joint and/or interaction effects of multiple genes in one pathway.
56
Many weak associations combine to risk? Score model (use all GWAS SNPs): where – ln(OR i ) = ‘score’ for SNP i from ‘discovery’ sample – SNP ij = # of alleles (0,1,2) for SNP i, person j in ‘validation’ sample. – Large number of SNPs (m) x j associated with disease? Polygenic Models ISC / Purcell et al. Nature 2009
57
Purcell / ISC et al. Nature 2009 Application of Model
58
Application to CGEMs PCa GWAS 1,172 cases, 1,157 controls from PLCO Trial Oversampled more aggressive cases. Illumina 550K array. PCa & stratified by disease aggressiveness. Split into halves, resampling: – one as ‘discovery’ sample; – other as ‘validation’. LD filter: r 2 = 0.5. Witte & Hoffmann, OMICs 2010
59
Results for Prostate Cancer
60
Nat Rev Cancer 2010;10:205-212 Common Polygenic Model for Prostate and Breast Cancer? - CGEMs GWAS data on prostate and breast cancer. - Use one cancer as ‘discovery’ sample, the other as ‘validation’.
61
Results for PCa & BrCa
62
Complex diseases Diabetes Obesity Diet Physical activity Hypertension Hyperlipidemia Vulnerable plaques Atherosclerosis MI Genetic susceptibility Complex diseases: Many causes = many causal pathways!
63
Moving Beyond Genome Transcriptome: All messenger RNA molecules (‘transcripts’) Proteome: All proteins in cell or organism Metabolome: all metabolites in a biological organism (end products of its gene expression). Systems Biology
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.