Published by Dylan Arnold Byrd. Modified over 9 years ago.
1
Genome-wide association studies (GWAS)
Thomas Hoffmann
2
Outline
– GWAS Overview
– Design
– Microarray
– Sequencing
– Which to use and other considerations
– QC
– Analysis
– Population stratification adjustment
– Imputation
– Replication & Meta-analysis
– Limitations, missing heritability, and beyond
3
Manolio et al., J Clin Invest 2008
5
Genetic association studies
– Hirschhorn & Daly, Nat Rev Genet 2005
– Candidate gene or GWAS (guilt by association)
6
GWAS Microarray
– Affymetrix, http://www.affymetrix.com
– Assay ~0.7-5M SNPs (keeps increasing)
7
Genotype calls Good calls! Bad calls!
9
Genome-wide association studies (GWAS)
10
One- and two-stage GWA designs
[Figure: samples-by-SNPs diagrams (n samples, n markers) for the One-Stage Design and for the Two-Stage Design, split into Stage 1 and Stage 2]
11
[Figure: samples-by-SNPs diagrams for the One-Stage Design and the Two-Stage Design, with Stage 1 and Stage 2 combined by replication-based analysis or by joint analysis]
12
Multistage Designs
– Joint analysis has more power than replication
– The p-value threshold in Stage 1 must be liberal
– Lower cost, but no gain in power
– CaTS power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
14
Genome-wide Sequence Studies
Trade-off between number of samples, depth, and genomic coverage.

Sample size   Depth   MAF 0.5-1%     MAF 2-5%
1,000         20x           “perfect”
2,000         10x     r^2 = 0.98     r^2 = 0.995
4,000         5x      r^2 = 0.90     r^2 = 0.98

Goncalo Abecasis
15
Near-term sequencing design choices
For example, between:
1. Sequencing few subjects with extreme phenotypes, e.g., 200 cases, 200 controls, 4x coverage. Then follow up in a larger population.
2. A 10M SNP chip based on 1000 Genomes, with 5K cases and 5K controls.
Which design will work best...?
17
Design choices
GWAS Microarray
– Only assays SNPs designed into the array (0.7-5 million)
– Much cheaper (so many more subjects)
GWAS Sequencing
– “De novo” discovery (particularly good for rare variants)
– More expensive, though costs are falling (so many fewer subjects)
– Needs much more extensive IT support
– Lots of interesting interpretation problems (field rapidly evolving)
18
Design choices
Exome Microarray
– Only assays SNPs designed into the array (~300K + custom), in exons only, that could affect protein-coding function
– Cheapest (so many more subjects)
Exome Sequencing
– “De novo” discovery (particularly good for rare variants); percentage of exons only
– More expensive than microarrays, less expensive than GWAS sequencing
– Needs more extensive IT support
– Lots of interesting interpretation problems
19
Size of study
Visscher, AJHG 2012
22
QC Steps
– Remove SNPs with a low call rate (e.g., <97%)
  – Call rate: proportion of genotypes actually called by the software
  – If it is low, the clusters are not well defined (artifacts)
– Remove those with low minor allele frequency?
  – Rarer variants are more likely to be artifacts, and tests on them are underpowered
  – Exome arrays: rare variants are the whole point!
– Remove SNPs / individuals with too much missing data
23
QC Steps (2)
– Remove SNPs that fail Hardy-Weinberg equilibrium
– Suppose a SNP has alleles A and B, and A has allele frequency p. Under random mating:
  – AA has frequency p*p
  – AB has frequency 2*p*(1-p)
  – BB has frequency (1-p)*(1-p)
– Test for this (e.g., chi-squared test)
– In practice, do this within homogeneous populations (more later)
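As a rough sketch of the chi-squared test mentioned above, the code below compares observed genotype counts with the counts expected under random mating. The function name and inputs are illustrative assumptions. The p-value uses the 1-degree-of-freedom chi-squared tail, which can be written with the complementary error function. The sketch assumes at least one genotyped sample.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared test of Hardy-Weinberg proportions for one SNP.

    Returns (chi2, p_value). Assumes n_aa + n_ab + n_bb > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With counts in exact Hardy-Weinberg proportions, such as `hwe_chi_square(25, 50, 25)`, the statistic is zero. A sample with no heterozygotes at all, such as `hwe_chi_square(50, 0, 50)`, gives a very large statistic, which is the kind of SNP this QC step would flag.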
24
QC Steps
– Check genotype gender
– Filter on Mendelian inheritance errors (family-based studies, or potentially cryptic relatives if the sample is large enough)
– Check for relatedness...
25
Check for relatedness, e.g., HapMap Pemberton et al., AJHG 2010
27
GWAS analysis
– Most common approach: look at each SNP one at a time
  – Additive coding of the SNP is most common, e.g., # of A alleles
  – The SNP is just a covariate in a regression framework
    – Dichotomous phenotype: logistic regression
    – Continuous phenotype: linear regression
  – e.g., BMI = β1 × SNP + β2 × Age + ...
– Further investigate / report top SNPs only
– Adjust for population stratification...
– P-values
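A minimal sketch of the one-SNP-at-a-time idea for a continuous phenotype: code each genotype additively as the number of A alleles, then estimate the regression slope of the phenotype on that count. For brevity the sketch fits the SNP alone. Covariates such as age or principal components, which the slide includes, would require multiple regression in a statistics package. The genotype strings and function names are assumptions for illustration.

```python
from statistics import mean


def additive_code(genotype, counted_allele="A"):
    """Additive coding: number of copies of the counted allele (0, 1, or 2)."""
    return genotype.count(counted_allele)


def snp_slope(dosages, phenotype):
    """Least-squares slope of phenotype on allele count (no other covariates)."""
    x_bar, y_bar = mean(dosages), mean(phenotype)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, phenotype))
    return sxy / sxx


# Toy data, invented for the example: BMI rises 1.5 units per A allele.
genotypes = ["AA", "AB", "BB", "AB"]
dosages = [additive_code(g) for g in genotypes]
bmi = [20 + 1.5 * d for d in dosages]
slope = snp_slope(dosages, bmi)
```

In a real scan the same fit would be repeated for every SNP, and the slope's p-value, not computed here, is what gets ranked and reported.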
28
What is population stratification? Balding, Nature Reviews Genetics 2010
29
Adjusting for PC's Li et al., Science 2008
30
Adjusting for PC's Razib, Current Biology 2008
31
Adjusting for PC's Wang, BMC Proc 2009
32
Aside: “random” mating? Sebro, Gen Epi, 2010
33
Multiple comparison correction
– If you conduct 20 tests at α = 0.05, expect about one to be significant by chance alone (http://xkcd.com/882/)
– If you conduct 1 million tests...
– Correct for multiple comparisons, e.g., Bonferroni: 0.05 / 1 million tests gives α = 5x10^-8
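The arithmetic behind these two bullets, as a small sketch (the function names are made up for the example):

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null tests that reach significance by chance."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test significance level that keeps the family-wise level at family_alpha."""
    return family_alpha / n_tests


by_chance = expected_false_positives(20, 0.05)   # about 1 test
gwas_alpha = bonferroni_threshold(1_000_000)     # about 5e-8
```

Bonferroni treats the tests as independent, so for correlated SNPs it is on the conservative side.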
34
QQ-plots and PC adjustment Wang, BMC Proc 2009
35
Example: GWAS of Prostate Cancer
– Multiple prostate cancer loci on 8q24 (Witte, Nat Genet 2007)
– [Figure: results plotted by chromosome; http://cgems.cancer.gov]
37
Imputation of SNP Genotypes
– Combine data from different platforms (e.g., Affy & Illumina) for replication / meta-analysis
– Estimate unmeasured or missing genotypes
– Based on measured SNPs and external info (e.g., haplotype structure of HapMap)
– Increase GWAS power (impute and analyze all), e.g., for sick sinus syndrome the most significant SNP was a 1000 Genomes imputed SNP (Holm et al., Nature Genetics, 2011)
– HapMap as reference, now 1000 Genomes Project?
38
Imputation Example Li et al., Ann Rev Genom Human Genet, 2009
40
Imputation Application
– TCF7L2 gene region & T2D from the WTCCC data
– Observed genotypes in black, imputed genotypes in red (x-axis: chromosomal position)
– Marchini, Nature Genetics 2007; http://www.stats.ox.ac.uk/~marchini/#software
42
Replication
– To replicate:
  – Association test in the replication sample significant at the 0.05 alpha level
  – Same mode of inheritance
  – Same direction of effect
  – Sufficient sample size for replication
– A failure to replicate does not necessarily mean the original finding was a false positive
  – LD structures, different populations (e.g., flip-flop)
  – Covariates, phenotype definition, underpowered replication sample
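Two of the replication criteria above, significance at the 0.05 level and same direction of effect, are easy to check mechanically. The sketch below does only that. It assumes effects are given as signed estimates (for example log odds ratios), the function name is hypothetical, and it does not verify mode of inheritance or whether the replication sample is adequately powered.

```python
def replicates(discovery_effect, replication_effect, replication_p, alpha=0.05):
    """True if the replication effect has the same sign and p < alpha."""
    same_direction = discovery_effect * replication_effect > 0
    return same_direction and replication_p < alpha
```

For example, `replicates(0.14, 0.10, 0.01)` is true, while an opposite-signed effect such as `replicates(0.14, -0.10, 0.01)` is not counted as a replication even though it is significant (the flip-flop case).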
43
Prostate Cancer Replications (Witte, Nat Rev Genet 2009): modest ORs

Chr region  SNP         Alleles  Freq (Cntrl)  Freq (Case)  OR    p value     Nearby genes / function
2p15        rs721048    G/A      0.19          0.21         1.15  7.7x10^-9   EHBP1: endocytic trafficking
3p12        rs2660753   C/T      0.10          0.12         1.30  2.7x10^-8   Intergenic
6q25        rs9364554   C/T      0.29          0.33         1.21  5.5x10^-10  SLC22A3: drugs and toxins
7q21        rs6465657   T/C      0.46          0.50         1.19  1.1x10^-9   LMTK2: endosomal trafficking
8q24 (2)    rs16901979  C/A      0.04          0.06         1.52  1.1x10^-12  Intergenic
8q24 (3)    rs6983267   T/G      0.50          0.56         1.25  9.4x10^-13  Intergenic
8q24 (1)    rs1447295   C/A      0.10          0.14         1.42  6.4x10^-18  Intergenic
10q11       rs10993994  C/T      0.38          0.46         1.38  8.7x10^-29  MSMB: suppressor prop.
10q26       rs4962416   T/C      0.27          0.32         1.18  2.7x10^-8   CTBP2: antiapoptotic activity
11q13       rs7931342   T/G      0.51          0.56         1.21  1.7x10^-12  Intergenic
17q12       rs4430796   G/A      0.49          0.55         1.22  1.4x10^-11  HNF1B: suppressor properties
17q24       rs1859962   T/G      0.46          0.51         1.20  2.5x10^-10  Intergenic
19q13       rs2735839   A/G      0.83          0.87         1.37  1.5x10^-18  KLK2/KLK3: PSA
Xp11        rs5945619   T/C      0.36          0.41         1.29  1.5x10^-9   NUDT10, NUDT11: apoptosis
45
SNPs Missed in Replication? (Witte, Nat Rev Genet, 2009)
[Same prostate cancer replication table as on the preceding slide, annotated: 24,223 smallest P-value!]
46
Meta-analysis
– Combine multiple studies to increase power
– Either combine p-values (Fisher's method) or z-scores (better)
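Both ways of combining studies can be sketched in a few lines. Fisher's method sums -2·ln(p) across studies and refers the total to a chi-squared distribution with 2k degrees of freedom, which has a closed-form tail for even degrees of freedom. The z-score version here weights each study by the square root of its sample size. That weighting is one common choice and is an assumption of this example, not something the slide specifies. Function names are illustrative.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: combined p-value from k independent p-values."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)   # statistic / 2
    # Chi-squared upper tail with 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def weighted_z(z_scores, sample_sizes):
    """Combine signed z-scores with sqrt(sample size) weights (assumed scheme).

    Returns (combined_z, two_sided_p).
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * zi for w, zi in zip(weights, z_scores))
    combined = numerator / math.sqrt(sum(w * w for w in weights))
    p_value = 2 * (1 - NormalDist().cdf(abs(combined)))
    return combined, p_value
```

One reason the z-score approach is often preferred is that it keeps the direction of effect: two studies with opposite signs cancel rather than reinforce each other, whereas Fisher's method sees only the p-values.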
47
(Meta-analysis) Example: GWAS of Prostate Cancer
– Multiple prostate cancer loci on 8q24 (Witte, Nat Genet 2007)
– [Figure: results plotted by chromosome; http://cgems.cancer.gov]
48
Replication & Meta-analysis
49
Meta-analysis
51
Limitations of GWAS
– Not very predictive (Witte, Nat Rev Genet 2009)
– Example: AUC for breast cancer risk (Wacholder et al., NEJM 2010)
  – 58%: Gail model (# of first-degree relatives with breast cancer, age at menarche, age at first live birth, number of previous biopsies) + age, study, entry year
  – 58.9%: SNPs
  – 61.8%: Combined
52
Limitations of GWAS
– Not very predictive
– Explain little heritability
– Focus on common variation
– Many associated variants are not causal
53
Where's the heritability?
Visscher, AJHG 2011
54
Where’s the heritability?
– McCarthy et al., 2008
– Many more of these? See: NEJM, April 30, 2009
– Common disease rare variant (CDRV) hypothesis: diseases are due to multiple rare variants with intermediate penetrances (allelic heterogeneity)
55
Where's the heritability?
– Power & sample size issues?
– Polygenic models?
– Gene-gene interactions, gene-environment interactions?
– Rare variants?