
1 Genomewide Association Studies

2
1. History
   – Linkage vs. Association
   – Power/Sample Size
2. Human Genetic Variation: SNPs
3. Direct vs. Indirect Association
   – Linkage Disequilibrium
4. SNP selection, Coverage, Study Designs
5. Genotyping Platforms
6. Early (recent) GWA Studies

3 Risch and Merikangas 1996 Sample Size Association < Sample Size for Linkage

4 Risch and Merikangas 1996

5 Sample Size Required
• Linkage Analysis with affected sib pairs
• Transmission Disequilibrium Test (TDT)
• TDT with affected sib pairs

6 Affected Sib Pair Linkage Analysis
• 2 siblings/family
• Both sibs affected
• IBD at the marker locus
• Expect 50% on average

7 Identity By Descent
[Figure: sibling 1 vs. sibling 2 allele combinations and the number of alleles shared IBD (2, 1, 1, 0)]
Alleles IBD    Frequency
2              25%
1              50%
0              25%

8 Identity By Descent
Alleles IBD    Frequency
2              25%
1              50%
0              25%
Expected number of alleles IBD = 2 × 25% + 1 × 50% + 0 × 25% = 1 allele = 50% sharing
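The expected-sharing arithmetic on this slide can be reproduced in a few lines. This is only a sketch: the function and variable names are illustrative, and the distribution is the 25% / 50% / 25% split given on the slide for a locus unlinked to disease.

```python
# IBD sharing distribution for a sib pair, as given on the slide:
# number of alleles shared IBD -> probability.
ibd_distribution = {2: 0.25, 1: 0.50, 0: 0.25}


def expected_ibd(distribution):
    """Return the expected number of alleles shared identical by descent."""
    return sum(alleles * prob for alleles, prob in distribution.items())


expected_alleles = expected_ibd(ibd_distribution)
# Each sib carries two alleles, so divide by 2 to get the sharing proportion.
expected_sharing = expected_alleles / 2

print(expected_alleles, expected_sharing)
```

Affected sib pair linkage analysis looks for marker loci where observed sharing exceeds this 50% baseline.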

9 Risch and Merikangas 1996

10 Sample Size Calculation
[Table columns: Effect Size; Exposure Frequency; Identity By Descent (IBD_M); Sample Size Required]

11 Sample Size Calculation
[Table columns: Effect Size; Exposure Frequency; Identity By Descent (IBD_M); Sample Size Required. Annotations: High IBD sharing; Low IBD sharing]

12 TDT
Transmitted alleles vs. non-transmitted alleles
[Figure: trio pedigree labeled with marker alleles M1 and M2]

13 TDT
Transmitted alleles vs. non-transmitted alleles

               Non-Transmitted Allele
Transmitted    M1      M2
M1             n11     n12
M2             n21     n22

$\mathrm{TDT} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$
Asymptotically $\chi^2$ with 1 degree of freedom

14 TDT
Transmitted alleles vs. non-transmitted alleles
[Figure: trio pedigree labeled with marker alleles M1 and M2]

15 TDT
For this one Trio:

               Non-Transmitted Allele
Transmitted    M1      M2
M1             0       1
M2             0       1

$\mathrm{TDT} = \frac{(1 - 0)^2}{1 + 0} = 1$
p-value = 0.32
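The single-trio calculation can be checked with a short script. This is a minimal sketch using only the standard library; the function names are illustrative, and the p-value relies on the fact that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(x / 2)).

```python
import math


def tdt_statistic(n12, n21):
    """TDT statistic from the two discordant cells of the transmission table.

    n12: times M1 was transmitted while M2 was not.
    n21: times M2 was transmitted while M1 was not.
    """
    if n12 + n21 == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (n12 - n21) ** 2 / (n12 + n21)


def chi2_1df_pvalue(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# The one-trio example: n12 = 1, n21 = 0.
stat = tdt_statistic(1, 0)
p_value = chi2_1df_pvalue(stat)
print(round(stat, 2), round(p_value, 2))
```

Only the off-diagonal cells enter the statistic, because transmissions from homozygous parents carry no information about preferential transmission.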

16 TDT
For one hundred Trios:

               Non-Transmitted Allele
Transmitted    M1      M2
M1             150     50
M2             45      155

$\mathrm{TDT} = \frac{(50 - 45)^2}{50 + 45} = \frac{25}{95} \approx 0.26$
p-value ≈ 0.61

17 Risch and Merikangas 1996 TDT

18
• Linkage
  – Good for Large Effect Sizes
• Genomewide Association
  – Good for Modest Effect Sizes
  – Not good for rare disease alleles

19 Two Hypotheses
• Common Disease-Common Variant
  – Common variants
  – Small to modest effects
• Rare Variant
  – Rare variants
  – Larger effects

20 Allele Frequency and Sample Size

21 GWA Issues
• Cost
  – Sample Size
    • Effect Size
    • Disease Allele Frequency
• Multiple Testing
  – SNP selection
    • How many?
    • Which SNPs?
• Available Genotyping Platforms

22 Types of Variants
• Single Nucleotide Polymorphism (SNP)
• Insertion/Deletion (indel)
• Microsatellite or Short Tandem Repeat (STR)

23 What is a SNP?
Chromosome 1:  AAGTCAGTCTAGGATCGGG
               TTCAGTCAGATCCTAGCCC
Chromosome 2:  AAGTCAGTCTAGGGTCGGG
               TTCAGTCAGATCCCAGCCC
SNP: A vs. G at position 14

24 What is an insertion/deletion?
Chromosome 1:  AAGTCAGTCTAGGATCGGG
               TTCAGTCAGATCCTAGCCC
Chromosome 2:  AAGTCAGTCTAGGGATCGGG
               TTCAGTCAGATCCCTAGCCC
Insertion/Deletion: one extra base pair on chromosome 2

25 What is a microsatellite?
Chromosome 1:  AAGTGTCGTCGTCGTCTCGGG
               TTCACAGCAGCAGCAGAGCCC
Chromosome 2:  AAGTGTCGTCGTCTCGGG
               TTCACAGCAGCAGAGCCC
3 vs. 4 trinucleotide repeats

26 Relative frequency of each type of variant

27 The Number of SNPs in the Human Genome

28 How many SNPs?
• 6 billion humans
• 12 billion chromosomes
• 1% frequency SNP
• 120 million copies of the minor allele
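The count on this slide follows from two multiplications, sketched here with illustrative names. The population size and the 1% frequency are the slide's figures; treating every person as carrying two copies of the chromosome is the only added assumption.

```python
def minor_allele_copies(population, allele_frequency, ploidy=2):
    """Return (total chromosomes, expected copies of the minor allele)."""
    chromosomes = population * ploidy
    return chromosomes, chromosomes * allele_frequency


chromosomes, copies = minor_allele_copies(6_000_000_000, 0.01)
print(f"{chromosomes:,} chromosomes, about {copies:,.0f} minor-allele copies")
```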

29 Ethnic/Racial Variation in SNP frequency

30 Rare SNPs across populations

31 How many of these SNPs have we found?
• dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/
  – 10,430,753 SNPs
  – 4,868,126 are “validated”

32 What Risch and Merikangas proposed:
• 5 genetic polymorphisms per gene
• 100,000 genes (1996)
• = 500,000 genotypes per subject
• Candidate Gene Study Design
  – All genes are candidates
• Direct or Sequence-based approach
  – Causal variant is one of the variants tested

33 Direct vs. Indirect Sequence-based vs. Map-based

34 Indirect Association relies on LD Decay
• Variants that are close will have high LD
• Variants that are far apart will have low LD
• Indirect Association is a form of Positional Cloning

35 LD Decay
$E(D_t) = D_1 (1 - \theta)^t$
where $D_t$ is the current amount of LD, $t$ is the number of generations, and $\theta$ is the recombination fraction
• If $\theta = 0.5$, LD decays at a rate of 50% per generation
• If $\theta < 0.5$, LD decay is slower
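A small sketch of the decay formula may help show how strongly the rate depends on the recombination parameter. It assumes the symbol dropped from the slide text is the recombination fraction (called theta here) and treats the slide's D_1 as the starting LD value; the function name and the example values are illustrative only.

```python
def expected_ld(d_initial, theta, generations):
    """Expected LD after a number of generations: d_initial * (1 - theta) ** t.

    d_initial: starting LD value (the slide's D_1).
    theta: recombination fraction between the two loci, between 0 and 0.5.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    return d_initial * (1.0 - theta) ** generations


# Unlinked loci (theta = 0.5) lose half their LD each generation;
# tightly linked loci keep it far longer.
for theta in (0.5, 0.1, 0.01):
    trajectory = [expected_ld(0.25, theta, t) for t in (0, 1, 5, 10)]
    print(theta, [round(d, 4) for d in trajectory])
```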

36 LD Decay over time

37 Observed LD Decay

38 Linkage Disequilibrium
Haplotypes: AB, ab, Ab, aB
$r^2 = \frac{(p_{AB}\, p_{ab} - p_{Ab}\, p_{aB})^2}{p_A\, p_a\, p_B\, p_b}$
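The r-squared formula can be evaluated directly from the four haplotype frequencies. The sketch below uses illustrative names and derives the allele frequencies as marginal sums of the haplotype frequencies, which is an assumption about how the inputs are supplied rather than something stated on the slide.

```python
def ld_r_squared(p_AB, p_Ab, p_aB, p_ab):
    """Squared correlation (r^2) between two biallelic loci."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are the marginal sums of the haplotype frequencies.
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    denominator = p_A * p_a * p_B * p_b
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB * p_ab - p_Ab * p_aB
    return d ** 2 / denominator


print(ld_r_squared(0.5, 0.0, 0.0, 0.5))      # only AB and ab haplotypes exist
print(ld_r_squared(0.25, 0.25, 0.25, 0.25))  # alleles associate at random
```

The first call describes two loci in complete LD and the second describes linkage equilibrium.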

39 Indirect Association and LD
• Sample size required for Direct Association: $n$
• Sample size for Indirect Association: $n / r^2$
• For $r^2 = 0.8$, increase is 25%
• For $r^2 = 0.5$, increase is 100%
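The two percentages on this slide follow from the n / r^2 rule, as this sketch shows. The function names and the direct-association sample size of 1,000 are hypothetical values chosen for illustration.

```python
def indirect_sample_size(n_direct, r_squared):
    """Sample size needed when testing a marker in LD with the causal variant."""
    if not 0.0 < r_squared <= 1.0:
        raise ValueError("r_squared must be in (0, 1]")
    return n_direct / r_squared


def percent_increase(r_squared):
    """Percentage by which the sample size grows relative to direct association."""
    return (indirect_sample_size(1.0, r_squared) - 1.0) * 100.0


for r2 in (1.0, 0.8, 0.5):
    print(r2, round(indirect_sample_size(1000, r2)), round(percent_increase(r2)))
```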

40 Coverage
• Percent of all SNPs captured by genotyped SNPs
• More genotyped SNPs = better coverage

41 Diminishing Marginal Returns (Wang and Todd 2003)
[Figure annotations: $r^2 = 0.5$, $r^2 = 0.8$; 600,000 SNPs, 1,500,000 SNPs]

42 Number of SNPs needed to capture all SNPs
• Depends on:
  – Population studied
  – Minor allele frequency of causal SNP
  – Level of LD ($r^2$) used as a cutoff
• 1.4 million selected SNPs for
  – Caucasians/Asians
  – 5% and above
  – $r^2 = 0.8$

43 The HapMap Project
• Initial Goal:
  – 600,000 SNPs for indirect association
  – LD information between SNPs
• Phase 1: 1 million SNPs
• Phase 2: additional 2.9 million SNPs

44 HapMap
• 270 subjects
• 45 Chinese
• 45 Japanese
• 90 Yoruban and 90 European-American
  – 30 Trios
  – 2 parents, 1 child

45 HapMap
• SNPs from dbSNP were genotyped
• Looked for 1 SNP every 5 kb
• SNP Validation
  – Polymorphic
  – Frequency
• Haplotype Estimation
  – Haplotype tagging SNPs

46 Haplotype Tagging

47 Two approaches
• Positional cloning
  – Expand LD mapping to entire genome
  – Tool: HapMap SNPs
• Candidate gene or Gene-based
  – Expand the number of genes to all genes
  – 25,000 genes
  – Tools: jSNPs, SeattleSNPs, NIEHSSNPs

48 Genome-wide Association LD Based Gene Based

49 Potentially Functional Regions of a Gene
[Figure labels: cis regulator (?); promoter; transcription regulation; RNA processing; amino acid coding]

50 Comparison of Gene-based and Positional Cloning Designs
• Positional Cloning
  – Agnostic (no biological knowledge needed)
  – Regulatory regions
  – SNP sets currently incomplete
  – Expensive
• Gene-based
  – Efficient: fewer SNPs need to be genotyped
  – May miss regulatory regions
  – Not all SNPs are known

51 Genotyping Platforms
• Affymetrix 500K
  – Randomly distributed SNPs
• Illumina 250K
  – “Gene-based”
• ParAllele 20K
  – Nonsynonymous SNPs
  – Code for an amino acid change

52 Multistage Study Designs

53 One- and Two-Stage GWA Designs
[Figure: One-Stage Design: all M SNPs (1, 2, 3, ..., M) genotyped on all N samples (1, 2, 3, ..., N). Two-Stage Design: Stage 1 uses a fraction of the samples; Stage 2 follows up a fraction of the markers.]

54
[Figure: One-Stage Design compared with Two-Stage Design (Stage 1, Stage 2) under replication-based analysis and under joint analysis; axes: SNPs, Samples]

55 Multistage Designs
• Joint analysis has more power than replication
• p-value in Stage 1 must be liberal
• Lower cost, but no gain in power

56 GWA studies have been published
• Myocardial Infarction
  – 65K Gene-based SNPs
• Age-related Macular Degeneration
  – Affymetrix 100K
• Parkinson’s Disease
  – Perlegen 200K chip
  – 1,793 SNPs in second stage

57 Myocardial Infarction

58 Functional SNP approach

59 Myocardial Infarction

60 Macular Degeneration

61

62 Macular Degeneration Results

63 Macular Degeneration

64
• Small Sample
• Sparse SNP set
• Large Effect Size
• High Minor Allele Frequency (>20%)
• Under a previous linkage peak
• Missed other loci

65 Parkinson’s Disease

66
• Tier 1:
  – 443 Discordant sib pairs
  – 198,345 SNPs
• Tier 2:
  – 332 case-control pairs
  – 1,793 SNPs
  – 11 positives at p < 0.01
  – Expect 18 positives under the Null
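The "expect 18 positives" figure is simply the number of second-stage tests multiplied by the significance threshold. The sketch below uses illustrative names and assumes every SNP is null, as the slide's comparison does.

```python
def expected_null_positives(n_tests, alpha):
    """Expected number of tests with p < alpha when every null hypothesis is true."""
    return n_tests * alpha


second_stage_snps = 1793
alpha = 0.01
observed_positives = 11

expected = expected_null_positives(second_stage_snps, alpha)
print(f"expected under the null: {expected:.2f}, observed: {observed_positives}")
```

Because the observed count falls below the null expectation, the second stage shows no excess of associations beyond what chance would produce.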

67 Parkinson’s second stage

