Published by Lesley Stinchcomb. Modified over 9 years ago.
1
Genome-wide Association Studies John S. Witte
2
Association Studies: Candidate Gene or GWAS. Hirschhorn & Daly, Nat Rev Genet 2005
3
Genome-wide Association Studies. [Figure: Affymetrix array.] Altshuler & Clark, Science 2005
4
Genome-wide Association Studies (GWAS)
5
GWAS+ Strategy
– Clarification: Sequencing+
– Confirmation / Characterization: Follow-up Genotyping+
– Discovery: Multi-stage GWAS+
[Figure axes: # Markers, # Samples, Time]
7
One- and Two-Stage GWA Designs
[Figure: samples 1, 2, 3, …, N by SNPs 1, 2, 3, …, M. One-stage design: a single samples-by-SNPs block. Two-stage design: the same block divided into Stage 1 and Stage 2 by samples and markers.]
8
[Figure: SNPs-by-samples diagrams for the one-stage design and for the two-stage design (Stage 1, Stage 2), comparing replication-based analysis with joint analysis.]
9
Multistage Designs
– Joint analysis has more power than replication.
– The p-value threshold in Stage 1 must be liberal.
– Multistage designs lower cost; they do not gain power.
http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
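The cost side of the two-stage design can be illustrated with simple genotype counts. This is a minimal sketch, not the CaTS calculator: the function names, the fractions, and the example numbers are all hypothetical, and cost is assumed to be proportional to the number of genotypes produced.

```python
def genotypes_one_stage(n_samples, n_markers):
    """Every sample is genotyped on every marker."""
    return n_samples * n_markers


def genotypes_two_stage(n_samples, n_markers,
                        frac_samples_stage1, frac_markers_stage2):
    """Stage 1: a fraction of samples on all markers.
    Stage 2: the remaining samples on the fraction of markers carried forward."""
    stage1 = frac_samples_stage1 * n_samples * n_markers
    stage2 = ((1 - frac_samples_stage1) * n_samples
              * frac_markers_stage2 * n_markers)
    return stage1 + stage2


# Hypothetical example: 2,000 samples, 500,000 SNPs,
# half the samples in Stage 1, top 1% of SNPs carried to Stage 2.
one = genotypes_one_stage(2000, 500_000)
two = genotypes_two_stage(2000, 500_000, 0.5, 0.01)
print(f"one-stage: {one:,.0f} genotypes")
print(f"two-stage: {two:,.0f} genotypes ({two / one:.1%} of one-stage)")
```

Under these assumed numbers the two-stage design needs roughly half the genotyping of the one-stage design, which is the cost saving the slide refers to.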
10
QC Steps
– Filter SNPs and individuals: MAF, low call rates.
– Test for HWE among controls and within ethnic groups. Use a conservative alpha level.
– Check for relatedness using identity-by-state calculations.
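The HWE check for a single SNP can be sketched as below. This uses the 1-df chi-square approximation, which is an assumption here: the slide does not say which HWE test was used, and an exact test is a common alternative. The function name and the example genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value), with the p-value from a 1-df chi-square.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip(observed, expected) if e > 0)
    # Survival function of a 1-df chi-square: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts for one SNP.
chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

In a QC pipeline this would be run in controls only, within each ethnic group, and SNPs would be dropped only below a conservative alpha level, as the slide states.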
11
Analysis of GWAS
– Most common approach: look at each SNP one at a time.
– Possibly add in multi-marker information.
– Further investigate / report top SNPs only.
– Or backwards replication…
[Figure: P-values]
12
GWAS Analysis
– Most commonly a trend test.
– Log-additive model, logistic regression.
– Adjust for potential population stratification.
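A single-SNP trend test can be sketched as follows. The slide only says "trend test"; the Cochran-Armitage form with additive weights (0, 1, 2) is assumed here, and the function name and example counts are hypothetical. No adjustment for population stratification is included.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    case_counts / control_counts: counts for 0, 1, 2 copies of the test allele.
    Returns (chi2, p_value), with the p-value from a 1-df chi-square.
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n = n_cases + n_controls

    t_r = sum(t * r for t, r in zip(weights, case_counts))
    t_n = sum(t * m for t, m in zip(weights, totals))
    t2_n = sum(t * t * m for t, m in zip(weights, totals))

    denominator = n_cases * n_controls * (n * t2_n - t_n ** 2)
    if denominator == 0:
        raise ValueError("test undefined: no variation in genotype or status")
    chi2 = n * (n * t_r - n_cases * t_n) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: (0, 1, 2) copies of the risk allele.
chi2, p_value = armitage_trend_test((30, 50, 20), (50, 40, 10))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

With additive weights this test is closely related to the score test from a logistic regression with a log-additive genotype term, which is why the slide lists the two together.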
13
Quantile-Quantile (QQ) Plot
14
Example: GWAS of Prostate Cancer (http://cgems.cancer.gov). Witte, Nat Genet 2007
[Figure: results plotted by chromosome.] Multiple prostate cancer loci on 8q24
15
Prostate Cancer Replications: Modest ORs. Witte, Nat Rev Genet 2009

Chr Reg   SNP         Alleles  Freq Cntrl  Freq Case  OR    p value      Nearby Genes / Fcn
2p15      rs721048    G/A      0.19        0.21       1.15  7.7x10^-9    EHBP1: endocytic trafficking
3p12      rs2660753   C/T      0.10        0.12       1.30  2.7x10^-8    Intergenic
6q25      rs9364554   C/T      0.29        0.33       1.21  5.5x10^-10   SLC22A3: drugs and toxins
7q21      rs6465657   T/C      0.46        0.50       1.19  1.1x10^-9    LMTK2: endosomal trafficking
8q24 (2)  rs16901979  C/A      0.04        0.06       1.52  1.1x10^-12   Intergenic
8q24 (3)  rs6983267   T/G      0.50        0.56       1.25  9.4x10^-13   Intergenic
8q24 (1)  rs1447295   C/A      0.10        0.14       1.42  6.4x10^-18   Intergenic
10q11     rs10993994  C/T      0.38        0.46       1.38  8.7x10^-29   MSMB: suppressor prop.
10q26     rs4962416   T/C      0.27        0.32       1.18  2.7x10^-8    CTBP2: antiapoptotic activity
11q13     rs7931342   T/G      0.51        0.56       1.21  1.7x10^-12   Intergenic
17q12     rs4430796   G/A      0.49        0.55       1.22  1.4x10^-11   HNF1B: suppressor properties
17q24     rs1859962   T/G      0.46        0.51       1.20  2.5x10^-10   Intergenic
19q13     rs2735839   A/G      0.83        0.87       1.37  1.5x10^-18   KLK2/KLK3: PSA
Xp11      rs5945619   T/C      0.36        0.41       1.29  1.5x10^-9    NUDT10, NUDT11: apoptosis
17
SNPs Missed in Replication? Witte, Nat Rev Genet, 2009
[Table: the prostate cancer replication loci from slide 15, repeated.]
24,223 smallest P-value!
18
Manolio et al., Clin Invest 2008. www.genome.gov/gwastudies Prostate Cancer
19
Population Attributable Risks for GWAS. Jorgenson & Witte, 2009
[Figure labels: Smoking & lung cancer; BRCA1 & breast cancer]
20
Limitations of GWAS
– Not very predictive (Witte, Nat Rev Genet 2009)
Example: AUC for breast cancer risk (Wacholder et al., NEJM 2010)
– Gail = 58%
– SNPs = 58.9%
– Gail + SNPs (G + S) = 61.8%
21
Limitations of GWAS
– Not very predictive
– Explain little heritability
– Focus on common variation
– Many associated variants are not causal
22
Where’s the Heritability? McCarthy et al., 2008
Many more of these? See: NEJM, April 30, 2009
Common disease rare variant (CDRV) hypothesis: diseases are due to multiple rare variants with intermediate penetrances (allelic heterogeneity)
23
Will GWAS results explain more heritability? Possibly, if…
1. Causal SNPs not yet detected due to power / practical issues (e.g., not yet included in replication studies).
2. Stronger effects for causal SNPs: an associated SNP may only serve as a marker for multiple different causal SNPs.
24
Imputation of SNP Genotypes
– Estimate unmeasured or missing genotypes.
– Based on measured SNPs and external info (e.g., haplotype structure of HapMap).
– Increase GWAS power.
– Allow for combining data across different platforms (e.g., Affy & Illumina) for replication / meta-analysis.
25
Imputation Example. [Figure: study sample alongside HapMap / 1K genomes reference.] Gonçalo Abecasis
26
Identify Match with Reference Gonçalo Abecasis
27
Phase chromosomes, impute missing genotypes Gonçalo Abecasis http://www.sph.umich.edu/csg/abecasis/MACH
28
Imputation Application: TCF7L2 gene region & T2D from the WTCCC data
[Figure: x-axis chromosomal position; observed genotypes in black, imputed genotypes in red.]
Marchini, Nature Genetics 2007. http://www.stats.ox.ac.uk/~marchini/#software
29
Genome-wide Sequence Studies
Trade-off between number of samples, depth, and genomic coverage.

Sample Size  Depth  MAF 0.5-1%   MAF 2-5%
1,000        20x    perfect
2,000        10x    r^2 = 0.98   r^2 = 0.995
4,000        5x     r^2 = 0.90   r^2 = 0.98

Gonçalo Abecasis
30
Near-term Design Choices
For example, between:
1. Sequencing few subjects with extreme phenotypes: e.g., 200 cases, 200 controls, 4x coverage. Then follow-up in larger population.
2. 10M SNP chip based on 1,000 genomes. 5K cases, 5K controls.
Which design will work best…?
31
Polygenic Models. ISC / Purcell et al. Nature 2009
Many weak associations combine to risk?
Score model: x_j = \sum_{i=1}^{m} \ln(OR_i) \cdot SNP_{ij}, where
– ln(OR_i) = ‘score’ for SNP i from ‘discovery’ sample
– SNP_ij = # of alleles (0, 1, 2) for SNP i, person j in ‘validation’ sample.
– Large number of SNPs (m)
Is x_j associated with disease?
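The score model can be sketched in a few lines: each person's score is the sum of their allele counts weighted by ln(OR) from the discovery sample. This is a minimal illustration; the function name and the example odds ratios and genotypes are hypothetical, and real pipelines (for example the one used by ISC / Purcell et al.) may normalize or handle missing genotypes differently.

```python
import math


def polygenic_scores(odds_ratios, genotypes):
    """Compute x_j = sum_i ln(OR_i) * SNP_ij for each person j.

    odds_ratios: per-SNP ORs estimated in the 'discovery' sample.
    genotypes:   one row per person in the 'validation' sample,
                 each row holding allele counts (0, 1, 2) per SNP.
    """
    weights = [math.log(odds) for odds in odds_ratios]
    scores = []
    for row in genotypes:
        if len(row) != len(weights):
            raise ValueError("genotype row length must match number of SNPs")
        scores.append(sum(w * g for w, g in zip(weights, row)))
    return scores


# Hypothetical discovery ORs for three SNPs and three validation subjects.
ors = [1.15, 1.30, 0.90]
validation_genotypes = [
    [0, 0, 0],
    [1, 2, 0],
    [2, 1, 2],
]
for person, score in enumerate(polygenic_scores(ors, validation_genotypes)):
    print(f"person {person}: x_j = {score:.3f}")
```

The final step on the slide, asking whether x_j is associated with disease, would then be a regression of case status on the score in the validation sample.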
32
Purcell / ISC et al. Nature 2009 Application of Model
33
Application to CGEMs PCa GWAS. Witte & Hoffman 2010
– 1,172 cases, 1,157 controls from PLCO Trial
– Oversampled more aggressive cases.
– Illumina 550K array.
– PCa & stratified by disease aggressiveness.
– Split into halves, resampling: one as ‘discovery’ sample; other as ‘validation’.
– LD filter: r^2 = 0.5.
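The split-into-halves resampling step can be sketched as below. The slide does not say how the split was drawn, so a simple random split that ignores case status and aggressiveness is assumed here; the function names, the seed, and the number of repeats are hypothetical.

```python
import random


def split_halves(sample_ids, rng):
    """Randomly split the samples into a discovery half and a validation half."""
    ids = list(sample_ids)
    rng.shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def resample_splits(sample_ids, n_repeats, seed=0):
    """Repeat the random split n_repeats times with a reproducible generator."""
    rng = random.Random(seed)
    return [split_halves(sample_ids, rng) for _ in range(n_repeats)]


# 1,172 cases + 1,157 controls = 2,329 subjects, indexed 0..2328.
subjects = range(1172 + 1157)
for discovery, validation in resample_splits(subjects, n_repeats=3):
    print(len(discovery), "discovery,", len(validation), "validation")
```

Each repeat would estimate ln(OR) weights in the discovery half and evaluate the resulting score in the validation half, and results would be summarized across repeats.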
34
Results for Prostate Cancer
35
Common Polygenic Model for Prostate and Breast Cancer? Nat Rev Cancer 2010;10:205-212
– CGEMs GWAS data on prostate and breast cancer.
– Use one cancer as ‘discovery’ sample, the other as ‘validation’.
36
Results for PCa & BrCa
37
Complex diseases: Many causes = many causal pathways!
[Figure nodes: Diabetes, Obesity, Diet, Physical activity, Hypertension, Hyperlipidemia, Vulnerable plaques, Atherosclerosis, MI, Genetic susceptibility]
38
Pathways
– Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways.
– Example: BioCarta: http://www.biocarta.com/
– May be interested in potential joint and/or interaction effects of multiple genes in one pathway.
39
Moving Beyond Genome
– Transcriptome: all messenger RNA molecules (‘transcripts’)
– Proteome: all proteins in a cell or organism
– Metabolome: all metabolites in a biological organism (end products of its gene expression).
Systems Biology