SNP Resources and Applications SeattleSNPs PGA Debbie Nickerson Department of Genome Sciences


1 SNP Resources and Applications SeattleSNPs PGA Debbie Nickerson Department of Genome Sciences debnick@u.washington.edu http://pga.gs.washington.edu

2 Strategies for Genetic Analysis
Families - Linkage Studies: Simple Inheritance; Single Gene, Rare Variants. Polymorphic Markers: ~1,000 Short Tandem Repeat Markers and now 3,000 SNPs
Populations - Association Studies: Complex Inheritance; Multiple Genes, Common Variants. Polymorphic Markers: > 500,000 - 1,000,000 Single Nucleotide Polymorphisms (SNPs)
(Figure: C/C and C/T genotypes in cases and controls. Cases: 40% T, 60% C. Controls: 15% T, 85% C.)
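The case/control allele frequencies on this slide are the raw material of an association test. A minimal sketch follows, assuming a plain Pearson chi-square on a 2x2 table of allele counts; the sample size (100 cases and 100 controls, so 200 chromosomes per group) is hypothetical and chosen only to reproduce the 40%/15% T-allele frequencies shown.

```python
def allelic_chi_square(case_t, case_c, control_t, control_c):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table."""
    table = [[case_t, case_c], [control_t, control_c]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_t + control_t, case_c + control_c]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: 200 chromosomes per group.
# Cases: 40% T -> 80 T, 120 C. Controls: 15% T -> 30 T, 170 C.
chi2 = allelic_chi_square(80, 120, 30, 170)
print(f"chi-square = {chi2:.2f}")
```

A statistic above 3.84 corresponds to p < 0.05 at one degree of freedom; a genome-wide study would need a far stricter threshold because of the number of SNPs tested.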

3 Complex inheritance/disease
(Figure: Variant, Gene, Disease, with Many Other Genes and Environment.)
Diseases: Diabetes, Heart Disease, Schizophrenia, Obesity, Multiple Sclerosis, Celiac Disease, Cancer, Asthma, Autism
Two hypotheses: 1- common disease/common variants 2- common disease/many rare variants

4 Genetic Strategy - New Insights
(Figure: allele frequency, HIGH to LOW, against effect size, WEAK to STRONG; labels: LINKAGE, ASSOCIATION, Genome-wide Sequencing.)
Ardlie, Kruglyak & Seielstad (2002) Nat. Rev. Genet. 3: 299-309
Zondervan & Cardon (2004) Nat. Rev. Genet. 5: 89-100

5 Finding SNPs - Strategies

6 Total sequence variation in humans
Population size: 6 x 10^9 (diploid)
Mutation rate: 2 x 10^-8 per bp per generation
Expected “hits”: 240 for each bp
- Every variant compatible with life exists in the population, BUT most are vanishingly rare in the population!
Compare 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409: 928-933 (2001)
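The "240 hits per bp" figure follows directly from the numbers on the slide: each of the 6 x 10^9 diploid individuals carries two copies of every base pair, and each copy mutates at 2 x 10^-8 per generation. A short worked check, with variable names chosen here for illustration:

```python
population_size = 6e9        # individuals (from the slide)
copies_per_individual = 2    # diploid genome
mutation_rate = 2e-8         # per bp per generation (from the slide)

# New mutations expected at any single base pair, per generation, across the population.
expected_hits_per_bp = population_size * copies_per_individual * mutation_rate

# Pairwise difference rate implied by "1 SNP per 1331 bp".
pairwise_difference_rate = 1 / 1331

print(f"expected hits per bp: {expected_hits_per_bp:.0f}")
print(f"pairwise difference rate: {pairwise_difference_rate:.2e} per bp")
```

The contrast between the two numbers is the point of the slide: almost every site is hit somewhere in the population, yet two randomly chosen genomes differ at fewer than one site per thousand.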

7 SNP Discovery: HapMap and others
Random Shotgun Sequencing: Genomic DNA (multiple individuals) -> Random Shotgun -> Sequence and align (reference sequence)
Draft Human Genome: GTTACGCCAATACAGGATCCAGGAGATTACC
Aligned reads: TACGCCTATA, TCAAGGAGAT
Generate more SNPs - Sources of SNPs: Perlegen SNP data; Sequence chromatograms from Celera project; HapMap Random Shotgun
dbSNP 127 - 11.8 Million SNPs and 5.7 Million SNPs Validated

8 Finding SNPs: Sequence-based SNP Mining
Sequence Overlap - SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(G/C difference between the two sequences)
DNA sequencing sources: mRNA -> cDNA Library -> EST Overlap; Genomic -> BAC Library -> BAC Overlap; Genomic -> RRS Library -> Shotgun Overlap
Error sources: RT errors, Sequencing Quality
Validated SNPs - two independent discoveries

9 SNP discovery is dependent on your sample population size
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(2 chromosomes)
(Figure: Fraction of SNPs Discovered, 0.0 to 1.0, against Minor Allele Frequency (MAF), 0.0 to 0.5; curves for 2 and 8 chromosomes.)

10 Candidate Gene Resource

11 SNP Discovery in SeattleSNPs
Generate SNP data from complete genomic resequencing (i.e., 5’ regulatory, exon, intron, 3’ regulatory sequence)
(Figure: PCR amplicons across the gene, 5’ to 3’; example cSNPs: Arg-Cys, Val-Val.)
Complete analysis: cSNPs, Linkage Disequilibrium and Haplotype Data

12 Increasing Sample Size Improves SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(2 chromosomes)
(Figure: Fraction of SNPs Discovered, 0.0 to 1.0, against Minor Allele Frequency (MAF), 0.0 to 0.5; curves for 2, 8, 16, 24, 48 and 96 chromosomes; labels: HapMap, based on ~6 chromosomes; SeattleSNPs.)
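The shape of these discovery curves can be approximated with a simple sampling argument. The sketch below assumes chromosomes are drawn independently from a population in which the minor allele has frequency p, and that a SNP counts as "discovered" when both alleles appear at least once in the sample; this is an illustrative model, not necessarily the calculation behind the slide's figure.

```python
def discovery_probability(maf, n_chromosomes):
    """Probability that both alleles of a SNP appear among n sampled chromosomes."""
    only_major = (1.0 - maf) ** n_chromosomes  # every sampled chromosome carries the major allele
    only_minor = maf ** n_chromosomes          # every sampled chromosome carries the minor allele
    return 1.0 - only_major - only_minor


for n in (2, 8, 16, 24, 48, 96):
    row = ", ".join(
        f"MAF {p:.2f}: {discovery_probability(p, n):.3f}" for p in (0.01, 0.05, 0.10, 0.50)
    )
    print(f"{n:>2} chromosomes -> {row}")
```

Under this model two chromosomes find only half of even the most common SNPs (MAF 0.5), while 48 chromosomes find over 90% of SNPs at 5% MAF, which is the qualitative message of the slide.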

13 SNPs in the Average Gene
Average Gene Size - 25 kb
Compare 2 haploid genomes - 1 SNP in 1,000 bp
~150 SNPs (1 per 200 bp) - 15,000,000 SNPs
~50 SNPs > 0.05 MAF (1 per 600 bp) - 6,000,000 SNPs (33-40%)
~5 coding SNPs (half change the amino acid sequence)
Crawford et al. Ann Rev Genomics Hum Genet 2005; 6: 287-312

14 SeattleSNPs panel - HapMap Integration (~4 million SNPs)
(Figure legend: SeattleSNPs discovery (1/188 bp); HapMap SNPs (~1/1000 bp).)
High Density Genic Coverage (SeattleSNPs)
Low Density Genome Coverage (HapMap)

15 Sequence Variation and the HapMap

16 Summary: The Current State of SNP Resources
- Random SNP discovery generates many SNPs (HapMap)
- Random approaches to SNP discovery have reached limits of discovery and validation (~50% of the common SNPs)
- Resequencing approaches continue to catalog important variants (rare and common not captured by the HapMap)
- SeattleSNPs has generated SNP data across >300 key candidate genes

17 NHLBI - Candidate Genes and Medical Resequencing http://rsng.nhlbi.nih.gov/scripts/index.cfm

18 Typing SNPs: Approaches

19 HapMap Project: Genotype validated SNPs in dbSNP
Genotype SNPs in four populations: Initially 1 Million -> Now 4 Million
- CEPH (CEU) (Europe - n = 90, trios)
- Yoruban (YRI) (Africa - n = 90, trios)
- Japanese (JPT) (Asian - n = 45)
- Chinese (CHB) (Asian - n = 45)
To produce a genome-wide map of common variation

20 Genotyping Adds Value to SNPs
HapMap Genotyping:
- Confirms a SNP as “real” and “informative”
- Determines Minor Allele Frequency (MAF) - common or rare
- Determines MAF in different populations
- Detection of SNP correlations (Linkage Disequilibrium and Haplotypes)
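As an illustration of how genotype calls turn into a minor allele frequency, here is a minimal sketch. It assumes genotypes are coded as the number of alternate alleles per individual (0, 1 or 2, with None for a missing call); the coding, the function names and the 5% common/rare cut-off are choices made for this example.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as alternate-allele counts (0, 1, 2); None marks missing data."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_frequency = sum(called) / (2 * len(called))  # two chromosomes per individual
    return min(alt_frequency, 1.0 - alt_frequency)


def classify(maf, common_threshold=0.05):
    """Label a SNP as common or rare using an assumed 5% MAF cut-off."""
    return "common" if maf >= common_threshold else "rare"


# Hypothetical genotype calls for one SNP in two populations.
population_a = [0, 0, 1, 0, 2, 1, None, 0]
population_b = [0] * 29 + [1]

for name, calls in (("A", population_a), ("B", population_b)):
    maf = minor_allele_frequency(calls)
    print(f"population {name}: MAF = {maf:.3f} ({classify(maf)})")
```

The same SNP can be common in one population and rare in another, which is why the slide lists per-population MAF as a separate benefit of genotyping.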

21 Genotype correlations among SNPs decrease the number of SNPs that need to be genotyped

22 An Example of SNP Correlation in the Human IL1A Gene
IL1A in Europeans: 18.5 kb, 50 SNPs, 46 common SNPs (> 10% MAF)
(Figure legend: Homozygote common; Heterozygote; Homozygote alternative allele; Missing Data.)
Carlson et al. (2004) Am J Hum Genet. 74: 106-120.

23 46 Common SNPs reduce to 3 SNPs - Select one SNP per bin using LDSelect
Threshold LD: r^2
– Bin 1: 22 sites
– Bin 2: 18 sites
– Bin 3: 5 sites
Genotype 1 SNP from each bin
TagSNP, chosen for biological intuition or ease of assay design
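The binning idea can be sketched in a few lines. This is a simplified greedy procedure in the spirit of the slide, not the LDSelect program itself: r^2 is approximated here as the squared Pearson correlation of genotype dosages (0/1/2), the 0.8 threshold is an assumption, and the SNP names and genotypes are made up.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic site carries no LD information
    return cov * cov / (var_x * var_y)


def greedy_ld_bins(genotypes, threshold=0.8):
    """Group SNPs into bins; each bin is (tag SNP, members it captures at r^2 >= threshold)."""
    snps = list(genotypes)
    linked = {s: {s} for s in snps}
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            linked[a].add(b)
            linked[b].add(a)
    unbinned = set(snps)
    bins = []
    while unbinned:
        # Pick the unbinned SNP that captures the most unbinned SNPs (ties: input order).
        tag = max((s for s in snps if s in unbinned),
                  key=lambda s: len(linked[s] & unbinned))
        members = linked[tag] & unbinned
        bins.append((tag, sorted(members)))
        unbinned -= members
    return bins


# Hypothetical dosages for five SNPs in six individuals.
example = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 1, 0, 2, 1, 0],
    "snp4": [0, 0, 1, 1, 2, 2],
    "snp5": [0, 0, 1, 1, 2, 2],
}
bins = greedy_ld_bins(example)
for tag, members in bins:
    print(tag, "tags", members)
```

In practice any SNP in a bin that exceeds the threshold with all other members could serve as the tag, which is what leaves room to choose by biological intuition or ease of assay design.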

24 Common Variants - LD (Association) Patterns - Not the same in all genes for all populations
(Figure panels: All SNPs; SNPs > 10% MAF; African-American; European-American.)

25 How do I pick TagSNPs?

26 TagSNPs for any gene - Use GVS http://gvs.gs.washington.edu/GVS/

27 TagSNPs in any Gene

28 TagSNPs for a gene for typing multiple populations

29

30 TagSNPs in a pathway of genes

31 Human Association Studies

32 C-Reactive Protein (CRP)
- Pentamer belonging to pentraxin family
- Acute-phase protein produced by the liver in response to cytokine production (IL-6, IL-1, tumor necrosis factor)
- Non-specific response to inflammation, infection, tissue damage
Well designed candidate gene studies have provided significant insights and these have been replicated in genome-wide association studies

33 CRP Analysis
- CRP is an independent risk factor for CVD
- CRP levels are heritable (~40% in FHS)
- Several reported SNPs alter CRP levels

34 tagSNP selection for CRP
6 “cosmopolitan” tagSNPs: “Promoter” SNPs (790, 1440); Intron SNP (1919); 3’ UTR SNP (3006); Downstream SNPs (3872, 5237)
1 rare synonymous SNP: Synonymous SNP (2667)

35 Association between CRP SNPs and Serum CRP Levels
CARDIA - Carlson et al Am J Hum Genet 77: 64-77, 2005
NHANES - Crawford et al Circulation 114: 2458-65, 2006
CHS - Lange et al JAMA 296: 2703-11, 2006
Framingham - Larson et al Circulation 113: 1415-23, 2006
Other - Szalai et al J. Mol Med 83: 440-7, 2005

36 High CRP Associated with SNPs in USF1 Binding Site
USF1 (Upstream Stimulating Factor)
– Polymorphism at 1421 alters another USF1 binding site
Positions marked: 1420, 1430, 1440
H1-4 gcagctacCACGTGcacccagatggcCACTCGtt
H7-8 gcagctacCACGTGcacccagatggcCACTAGtt
H5 gcagctacCACGTGcacccagatggcCACTTGtt
H6 gcagctacCACATGcacccagatggcCACTTGtt
SNP Alters Expression In Vitro
Altered Gel Shift In Vitro
Genome-wide studies lead to regional and candidate gene studies

37 Genome-Wide Association Studies

38 Genome-Wide Platforms
Affymetrix - Quasi-Random SNPs: 100,000 or 500,000 SNPs
Illumina - TagSNPs: 100,000, 317,000, 550,000, 650,000Y SNPs
1 Million Products are here!

39 Genome-wide Tour de force Nature 447: 661-678 Read all the supplemental materials too!

40 Applying HapMap - Will it work? YES!!
- Hits: Macular Degeneration, Obesity, Cardiac Repolarization, Inflammatory Bowel Disease, Diabetes T1 and T2, Coronary Artery Disease, Rheumatoid Arthritis, Breast Cancer, Colon Cancer ...
- There are misses as well, and it is unclear why - Phenotype, Coverage, Environmental Contexts? Example of a miss - Hypertension
- There are lots more hits in these data sets - sample size, low proxy coverage with other SNPs ...
- Analysis of associations between phenotype(s) and even individual sites is daunting, and this will just be the first stage; it does not even consider multi-site interactions

41 How robust are the new genome-wide platforms? How well do they capture common SNPs?

42 LD-based coverage of Sequence Variation MAF > 0.05 Bhangale et al, unpublished

43 How can I get more information about a reference SNP (rs) identified from an association study?

44 http://gvs.gs.washington.edu/GVS/ Searching for Genomic Information with an RS number

45 Structural Variation

46 Structural Variants Identified in the HapMap
Conrad, et al. (Nature Genetics 38: 75-81, 2006)
Hinds, et al. (Nature Genetics 38: 82-85, 2006)
McCarroll, et al. (Nature Genetics 38: 86-92, 2006)
Structural Variation - Large Insertion-Deletion Events: ~1,500 indels
Lots more of them - this was only a start

47 New Variation to Consider - Structural Variation
Types of Structural Variants: Insertions/Deletions, Inversions, Duplications, Translocations
Size: Large-scale (>100 kb), Intermediate-scale (500 bp–100 kb), Fine-scale (1–500 bp)
More than 10% of the genome sequence
Nature 447: 161-165, 2007

48 A Human Genome Structural Variation Project
Goal: Complete characterization of normal pattern of structural variation in 62 human genomes
Populations: CEPH, Yoruba, Japanese & Chinese
- Genomes have dense SNP maps (HapMap)
- Select most genetically diverse individuals
- 62 additional human genome projects underway
Nature 447: 161-165, 2007

49 Sequence-Based Resolution of Structural Variation
Human Genomic DNA -> Genomic Library (1 million clones) -> Sequence ends of genomic inserts & Map to human genome (Build35)
End-pair signatures (figure): Concordant ><; Deletion ><; Insertion ><; Inversions <<
Fosmid Dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 fosmid pairs BEST pairs (8.8X genome coverage)
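The end-pair signatures on this slide lend themselves to a simple classification rule. The sketch below is an illustration under stated assumptions: the accepted span window (32-48 kb, roughly a 40 kb fosmid insert) and the function and label names are hypothetical, and real pipelines require multiple supporting clones before calling a variant.

```python
def classify_end_pair(left_pos, right_pos, left_strand, right_strand,
                      min_span=32_000, max_span=48_000):
    """Classify one clone from where and how its two end reads map to the reference.

    left_pos / right_pos: reference coordinates of the two end reads (same chromosome).
    left_strand / right_strand: '+' or '-'. A concordant pair points inward: '+' then '-'.
    min_span / max_span: assumed window of acceptable insert sizes, in bp.
    """
    if left_strand == right_strand:
        return "inversion"    # both ends on the same strand: one end lies in inverted sequence
    if (left_strand, right_strand) != ("+", "-"):
        return "other"        # outward-facing pair; not covered by this sketch
    span = abs(right_pos - left_pos)
    if span > max_span:
        return "deletion"     # ends map too far apart: the sample lacks reference sequence
    if span < min_span:
        return "insertion"    # ends map too close: the sample carries extra sequence
    return "concordant"


# Hypothetical mapped clones.
for clone in [(1_000, 41_000, "+", "-"), (1_000, 91_000, "+", "-"),
              (1_000, 11_000, "+", "-"), (1_000, 41_000, "+", "+")]:
    print(clone, "->", classify_end_pair(*clone))
```

Deletion and insertion are named relative to the reference: a clone whose ends map 90 kb apart cannot physically hold that much DNA, so the sampled genome must be missing sequence that the reference contains.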

50 Kidd, Cooper, and Eichler - unpublished

51 Detection of Indels in Genotype Data X-linked SNP Unknown indel Carlson et al, Hum. Mol. Genet. 15: 1931-1937, 2006

52 http://gvs.gs.washington.edu/GVS/ Searching for Genomic Information with an RS number

53 DNA Sequencing the ultimate genotyping platform?

54 Rare Variant Versus Common Variant
Both could play a role
Rare Variants - Sequence Individuals
Common Variants - Genotype a Smaller Set of Variants to Explore Correlations

55 High Density Lipoprotein (HDL)
Sequencing Known Candidate Genes for Functional Variation From Individuals at the Tails of the Trait Distribution
(Figure: distribution of Individuals from Low HDL to High HDL.)

56 ABCA1 and HDL-C
Observed excess of rare, nonsynonymous variants in low HDL-C samples at ABCA1
Demonstrated functional relevance in cell culture
– Cohen et al, Science 305, 869-872, 2004
Many examples emerging
Common Disease Rare Variants

57 Personalized Human Genome Sequencing Solexa - an example

58 New Technologies
- 1 Gigabyte of Sequence
- Problem is to Target - Genes or Regions
- Short reads - 30-35 bp - quality?
- Variation discovery needs ~20-fold coverage
- Needs to be fairly uniform
- Provide 30-50 Mb of baseline
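The relationship between yield, coverage and target size on this slide is simple division. A short worked sketch, assuming the "1 Gigabyte of Sequence" means roughly 10^9 sequenced bases per run and that coverage is perfectly uniform (the slide itself notes uniformity is a requirement, not a given):

```python
def targetable_megabases(yield_bases, fold_coverage):
    """Target size (Mb) that a given sequence yield covers at the requested depth."""
    return yield_bases / fold_coverage / 1e6


def reads_required(yield_bases, read_length):
    """Number of reads of a given length needed to produce the yield."""
    return yield_bases / read_length


run_yield = 1e9  # assumed: ~1 billion bases per run

target_at_20x = targetable_megabases(run_yield, 20)
print(f"target at 20-fold coverage: {target_at_20x:.0f} Mb")
print(f"target at 33-fold coverage: {targetable_megabases(run_yield, 33):.0f} Mb")
print(f"reads needed at 35 bp: {reads_required(run_yield, 35) / 1e6:.1f} million")
```

At 20-fold coverage the run covers about 50 Mb, and about 30 Mb at 33-fold, which is consistent with the 30-50 Mb range quoted on the slide and explains why targeting genes or regions is the stated problem.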

59 Human Genome Variation - Summary
- SeattleSNPs and HapMap - Common variation sources - SeattleSNPs offers insights into coverage
- New Genotyping Platforms - Very successful but more coverage will be coming
- Many associated genome regions are being identified
- Other variants of interest emerging - structural variation
- Paradigm Shift in Sequencing Technology

60 Acknowledgements
UW: Mark Rieder, Alex Reiner, Greg Cooper, Peggy Robertson, Tushar Bhangale
FHCRC: Chris Carlson
Vanderbilt: Dana Crawford
Stanford: Shelley Force-Aldred, Rick Myers
CARDIA: David Siscovick, Dale Williams, Beth Lewis, Kiang Liu, Carlos Irribaren, Myriam Fornage, Cashell Jaquish, Eric Boerwinkle
NHLBI - SeattleSNPs

