1
Genomewide Association Studies
2
1. History
–Linkage vs. Association
–Power/Sample Size
2. Human Genetic Variation: SNPs
3. Direct vs. Indirect Association
–Linkage Disequilibrium
4. SNP selection, Coverage, Study Designs
5. Genotyping Platforms
6. Early (recent) GWA Studies
3
Risch and Merikangas 1996
Sample size for association < sample size for linkage
4
Risch and Merikangas 1996
5
Sample Size Required
Linkage Analysis with affected sib pairs
Transmission Disequilibrium Test (TDT)
TDT with affected sib pairs
6
Affected Sib Pair Linkage Analysis
2 siblings/family
Both sibs affected
IBD at the marker locus
Expect 50% on average
7
Identity By Descent
[Figure: sib-pair pedigree with parental alleles A and a; Sibling 1 and Sibling 2 sharing 2, 1, or 0 alleles]
Alleles IBD      Frequency
2                25%
1                50%
0                25%
8
Identity By Descent
Alleles IBD      Frequency
2                25%
1                50%
0                25%
Expected number of alleles IBD = 2 × 25% + 1 × 50% + 0 × 25% = 1 allele = 50% sharing
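The expected-sharing arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the names `IBD_NULL` and `expected_ibd` are illustrative and not part of the original deck.

```python
# IBD sharing distribution for a sib pair as given on the slide:
# P(2 alleles IBD) = 25%, P(1) = 50%, P(0) = 25%.
IBD_NULL = {2: 0.25, 1: 0.50, 0: 0.25}


def expected_ibd(dist):
    """Return (expected number of alleles IBD, expected proportion shared)."""
    mean_alleles = sum(k * p for k, p in dist.items())
    # A sib pair can share at most 2 alleles at a locus.
    return mean_alleles, mean_alleles / 2


mean_alleles, sharing = expected_ibd(IBD_NULL)
print(mean_alleles, sharing)
```

Passing a different distribution (for example one with more weight on 2 alleles IBD) shows how sharing above 50% would arise.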
9
Risch and Merikangas 1996
10
Sample Size Calculation
Effect Size
Exposure Frequency
Identity By Descent (IBD_M)
Sample Size Required
11
Sample Size Calculation
Effect Size
Exposure Frequency
Identity By Descent (IBD_M)
Sample Size Required
High IBD sharing
Low IBD sharing
12
TDT
Transmitted alleles vs. non-transmitted alleles
[Figure: trio pedigree with marker alleles M1 and M2]
13
TDT
Transmitted alleles vs. non-transmitted alleles
                 Non-Transmitted Allele
Transmitted      M1       M2
M1               n11      n12
M2               n21      n22
$\mathrm{TDT} = \dfrac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$
Asymptotically $\chi^2$ with 1 degree of freedom
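The TDT statistic defined on this slide can be sketched in Python as follows. The function name `tdt` is illustrative. The p-value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def tdt(n12, n21):
    """TDT statistic and asymptotic p-value (chi-square, 1 df).

    n12 and n21 are the off-diagonal cells of the transmitted vs.
    non-transmitted allele table; the diagonal cells do not enter.
    """
    if n12 + n21 == 0:
        raise ValueError("no informative transmissions (n12 + n21 = 0)")
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# A table with a single informative transmission: n12 = 1, n21 = 0
stat, p = tdt(1, 0)
print(round(stat, 2), round(p, 2))  # 1.0 0.32
```

With only one informative transmission the statistic cannot reach significance, which is why many trios are needed.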
14
TDT
Transmitted alleles vs. non-transmitted alleles
[Figure: trio pedigree with marker alleles M1 and M2]
15
TDT
For this one Trio:
                 Non-Transmitted Allele
Transmitted      M1       M2
M1               0        1
M2               0        1
$\mathrm{TDT} = \dfrac{(1 - 0)^2}{1 + 0} = 1$
p-value = 0.32
16
TDT
For one hundred Trios:
                 Non-Transmitted Allele
Transmitted      M1       M2
M1               150      50
M2               45       155
$\mathrm{TDT} = \dfrac{(50 - 45)^2}{50 + 45} = \dfrac{25}{95} \approx 0.26$
p-value ≈ 0.61
17
Risch and Merikangas 1996 TDT
18
Linkage
–Good for Large Effect Sizes
Genomewide Association
–Good for Modest Effect Sizes
–Not good for rare disease alleles
19
Two Hypotheses
Common Disease-Common Variant
–Common variants
–Small to modest effects
Rare Variant
–Rare variants
–Larger effects
20
Allele Frequency and Sample Size
21
GWA Issues
Cost
–Sample Size
  Effect Size
  Disease Allele Frequency
Multiple Testing
–SNP selection
  How many?
  Which SNPs?
  Available Genotyping Platforms
22
Types of Variants
Single Nucleotide Polymorphism (SNP)
Insertion/Deletion (indel)
Microsatellite or Short Tandem Repeat (STR)
23
What is a SNP?
Chromosome 1
AAGTCAGTCTAGGATCGGG
TTCAGTCAGATCCTAGCCC
Chromosome 2
TTCAGTCAGATCCCAGCCC
AAGTCAGTCTAGGGTCGGG
SNP
24
What is an insertion/deletion?
Chromosome 1
AAGTCAGTCTAGGATCGGG
TTCAGTCAGATCCTAGCCC
Chromosome 2
TTCAGTCAGATCCCTAGCCC
AAGTCAGTCTAGGGATCGGG
Insertion/Deletion
25
What is a microsatellite?
Chromosome 1
AAGTGTCGTCGTCGTCTCGGG
TTCACAGCAGCAGCAGAGCCC
Chromosome 2
TTCACAGCAGCAGAGCCC
AAGTGTCGTCGTCTCGGG
3 vs. 4 trinucleotide repeats
26
Relative frequency of each type of variant
27
The Number of SNPs in the Human Genome
28
How many SNPs?
6 billion humans
12 billion chromosomes
1% frequency SNP
120 million copies of the minor allele
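The count on this slide is a two-step multiplication, sketched below. The function name `minor_allele_copies` is illustrative, and the calculation assumes two copies of each chromosome per person.

```python
def minor_allele_copies(population, maf, ploidy=2):
    """Return (number of chromosomes, copies of the minor allele).

    Assumes each person carries `ploidy` copies of the chromosome and
    that `maf` is the minor allele frequency across all chromosomes.
    """
    chromosomes = population * ploidy
    return chromosomes, round(chromosomes * maf)


chromosomes, copies = minor_allele_copies(6_000_000_000, 0.01)
print(f"{chromosomes:,} chromosomes, {copies:,} minor-allele copies")
```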
29
Ethnic/Racial Variation in SNP frequency
30
Rare SNPs across populations
31
How many of these SNPs have we found?
dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/
–10,430,753 SNPs
–4,868,126 are “validated”
32
What Risch and Merikangas proposed:
5 genetic polymorphisms per gene
100,000 genes (1996)
= 500,000 genotypes per subject
Candidate Gene Study Design
–All genes are candidates
Direct or Sequence-based approach
–Causal variant is one of the variants tested
33
Direct vs. Indirect
Sequence-based vs. Map-based
34
Indirect Association relies on LD Decay
Variants that are close will have high LD
Variants that are far apart will have low LD
Indirect Association is a form of Positional Cloning
35
LD Decay
$E(D_t) = D_1 (1 - \theta)^t$
where $D_t$ is the current amount of LD, $\theta$ is the recombination fraction, and $t$ is the number of generations
If $\theta = 0.5$, LD decays at a rate of 50% per generation
If $\theta < 0.5$, LD decay is slower
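The decay formula on this slide can be explored numerically. The sketch below assumes the symbol missing from the extracted text is the recombination fraction θ between the two loci, and that the leading D term is the starting amount of LD. The name `expected_ld` is illustrative.

```python
def expected_ld(d_initial, theta, generations):
    """Expected LD after `generations`, given starting LD `d_initial`.

    theta is assumed to be the recombination fraction between the two
    loci (0 <= theta <= 0.5); LD shrinks by a factor (1 - theta) each
    generation.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    return d_initial * (1.0 - theta) ** generations


# Unlinked loci (theta = 0.5) lose half their LD every generation,
# while tightly linked loci (theta = 0.01) keep most of it.
for theta in (0.5, 0.01):
    print(theta, [round(expected_ld(0.25, theta, t), 4) for t in (1, 10, 100)])
```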
36
LD Decay over time
37
Observed LD Decay
38
Linkage Disequilibrium
Haplotypes: AB, ab, Ab, aB
$r^2 = \dfrac{(p_{AB}\,p_{ab} - p_{Ab}\,p_{aB})^2}{p_A\,p_a\,p_B\,p_b}$
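The r² formula on this slide can be computed directly from the four haplotype frequencies. This is a minimal sketch. The function name `r_squared` and the example frequencies are illustrative, and the allele frequencies are derived as sums of the haplotype frequencies.

```python
def r_squared(p_AB, p_Ab, p_aB, p_ab):
    """r^2 between two biallelic loci from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are the margins of the haplotype table.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    denom = p_A * p_a * p_B * p_b
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB * p_ab - p_Ab * p_aB
    return d ** 2 / denom


print(r_squared(0.5, 0.0, 0.0, 0.5))      # only AB and ab present: complete LD
print(r_squared(0.25, 0.25, 0.25, 0.25))  # all four equally common: no LD
```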
39
Indirect Association and LD
Sample size required for Direct Association: $n$
Sample size for Indirect Association = $n / r^2$
For $r^2 = 0.8$, increase is 25%
For $r^2 = 0.5$, increase is 100%
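The n / r² rule on this slide, and the two percentage increases it quotes, can be reproduced with a short sketch. The function names and the direct-association sample size of 1,000 are illustrative.

```python
def indirect_sample_size(n_direct, r2):
    """Sample size needed when testing a marker in LD (r^2) with the causal variant."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


def percent_increase(r2):
    """Percent increase in sample size relative to direct association."""
    return (indirect_sample_size(1.0, r2) - 1.0) * 100.0


for r2 in (0.8, 0.5):
    print(r2, indirect_sample_size(1000, r2), round(percent_increase(r2)))
```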
40
Coverage
Percent of all SNPs captured by genotyped SNPs
More genotyped SNPs = better coverage
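One way to make the coverage definition concrete is to count a SNP as captured when it is genotyped itself or has r² at or above a cutoff with at least one genotyped SNP. The sketch below takes that as an assumption. The function name `coverage`, the 0.8 default cutoff, and the small r² matrix are hypothetical illustrations, not data from the deck.

```python
def coverage(r2_matrix, genotyped, cutoff=0.8):
    """Fraction of SNPs captured by the genotyped set.

    r2_matrix is a symmetric list of lists of pairwise r^2 values;
    genotyped is a set of row/column indices. A SNP counts as captured
    if it is genotyped or has r^2 >= cutoff with a genotyped SNP.
    """
    n = len(r2_matrix)
    if n == 0:
        raise ValueError("need at least one SNP")
    captured = sum(
        1
        for i in range(n)
        if i in genotyped or any(r2_matrix[i][j] >= cutoff for j in genotyped)
    )
    return captured / n


# Hypothetical pairwise r^2 values for four SNPs
R2 = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
print(coverage(R2, {0}))          # 0.5
print(coverage(R2, {0, 2}))       # 0.75
print(coverage(R2, {0, 2}, 0.5))  # 1.0
```

Adding a genotyped SNP or loosening the cutoff both raise coverage in this toy example.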
41
Diminishing Marginal Returns (Wang and Todd 2003)
[Figure labels: $r^2$ = 0.5, $r^2$ = 0.8; 600,000 SNPs, 1,500,000 SNPs]
42
Number of SNPs needed to capture all SNPs
Depends on:
–Population studied
–Minor allele frequency of causal SNP
–Level of LD ($r^2$) used as a cutoff
1.4 million selected SNPs for
–Caucasians/Asians
–Minor allele frequency of 5% and above
–$r^2$ = 0.8
43
The HapMap Project
Initial Goal:
–600,000 SNPs for indirect association
–LD information between SNPs
Phase 1: 1 million SNPs
Phase 2: additional 2.9 million SNPs
44
HapMap
270 subjects
45 Chinese
45 Japanese
90 Yoruban and 90 European-American
–30 Trios
–2 parents, 1 child
45
HapMap
SNPs from dbSNP were genotyped
Looked for 1 SNP every 5 kb
SNP Validation
–Polymorphic
–Frequency
Haplotype Estimation
–Haplotype tagging SNPs
46
Haplotype Tagging
47
Two approaches
Positional cloning
–Expand LD mapping to entire genome
–Tool: HapMap SNPs
Candidate gene or Gene-based
–Expand the number of genes to all genes
–25,000 genes
–Tools: jSNPs, SeattleSNPs, NIEHSSNPs
48
Genome-wide Association
LD Based
Gene Based
49
Potentially Functional Regions of a Gene
[Figure labels: cis regulator (?), promoter, transcription regulation, amino acid coding, RNA processing]
50
Comparison of Gene-based and Positional Cloning Designs
Positional Cloning
–Agnostic (no biological knowledge needed)
–Regulatory regions
–SNP sets currently incomplete
–Expensive
Gene-based
–Efficient: Fewer SNPs need to be genotyped
–May miss regulatory regions
–Not all SNPs are known
51
Genotyping Platforms
Affymetrix 500K
–Randomly distributed SNPs
Illumina 250K
–“Gene-based”
ParAllele 20K
–Nonsynonymous SNPs
–Code for an amino acid change
52
Multistage Study Designs
53
One- and Two-Stage GWA Designs
[Figure: samples (1, 2, 3, ..., N) vs. SNPs (1, 2, 3, ..., M); panels: One-Stage Design; Two-Stage Design (Stage 1, Stage 2)]
54
[Figure: SNPs vs. samples; panels: One-Stage Design; Two-Stage Design (Stage 1, Stage 2) with replication-based analysis and with joint analysis]
55
Multistage Designs
Joint analysis has more power than replication-based analysis
p-value threshold in Stage 1 must be liberal
Lower cost, but no gain in power
56
GWA studies have been published
Myocardial Infarction
–65K Gene-based SNPs
Age-related Macular Degeneration
–Affymetrix 100K
Parkinson’s Disease
–Perlegen 200K chip
–1,793 SNPs in second stage
57
Myocardial Infarction
58
Functional SNP approach
59
Myocardial Infarction
60
Macular Degeneration
62
Macular Degeneration Results
63
Macular Degeneration
64
Small Sample
Sparse SNP set
Large Effect Size
High Minor Allele Frequency (>20%)
Under a previous linkage peak
Missed other loci
65
Parkinson’s Disease
66
Tier 1:
–443 Discordant sib pairs
–198,345 SNPs
Tier 2:
–332 case-control pairs
–1,793 SNPs
–11 positives at p < 0.01
–Expect 18 positives under the Null
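The expected count of false positives quoted for Tier 2 follows from multiplying the number of SNPs tested by the significance threshold. The sketch below assumes every tested SNP is null. The function name `expected_null_positives` is illustrative.

```python
def expected_null_positives(n_tests, alpha):
    """Expected number of tests with p < alpha when every null is true."""
    return n_tests * alpha


expected = expected_null_positives(1793, 0.01)
observed = 11
print(f"expected about {expected:.1f}, observed {observed}")
```

Observing fewer positives than expected by chance gives no evidence that the second stage replicated anything.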
67
Parkinson’s second stage