1
SNP Resources and Applications
SeattleSNPs PGA
Debbie Nickerson, Department of Genome Sciences
debnick@u.washington.edu
http://pga.gs.washington.edu
2
Strategies for Genetic Analysis
Families: Linkage Studies. Simple Inheritance; Single Gene, Rare Variants. ~1,000 Short Tandem Repeat Markers and now 3,000 SNPs.
Populations: Association Studies. Complex Inheritance; Multiple Genes, Common Variants. Polymorphic Markers: > 500,000 - 1,000,000 Single Nucleotide Polymorphisms (SNPs).
Association example (allele frequencies): Cases 40% T, 60% C; Controls 15% T, 85% C.
[Figure: individual genotypes (C/C, C/T) shown for cases and controls]
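The case/control allele frequencies on this slide can be turned into a simple allelic association test. The sketch below assumes 100 chromosomes sampled in each group (the slide gives only percentages, so these counts are hypothetical) and computes an odds ratio and a Pearson chi-square statistic by hand. The function name `allelic_test` is illustrative only.

```python
def allelic_test(case_t, case_c, ctrl_t, ctrl_c):
    """Return (odds_ratio, chi_square) for a 2x2 table of allele counts."""
    odds_ratio = (case_t * ctrl_c) / (case_c * ctrl_t)
    n = case_t + case_c + ctrl_t + ctrl_c
    observed = [[case_t, case_c], [ctrl_t, ctrl_c]]
    row_totals = [case_t + case_c, ctrl_t + ctrl_c]
    col_totals = [case_t + ctrl_t, case_c + ctrl_c]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under no association between allele and status
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed[i][j] - expected) ** 2 / expected
    return odds_ratio, chi_square

# Hypothetical counts: 100 chromosomes per group, matching 40% T vs 15% T.
odds_ratio, chi_square = allelic_test(40, 60, 15, 85)
print(f"odds ratio = {odds_ratio:.2f}, chi-square (1 df) = {chi_square:.2f}")
```

With real data the chi-square value depends strongly on the number of chromosomes sampled, which is why the assumed counts matter more than the percentages alone.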
3
Complex inheritance/disease
Variant Gene -> Disease, with contributions from Many Other Genes and Environment
Examples: Diabetes, Heart Disease, Schizophrenia, Obesity, Multiple Sclerosis, Celiac Disease, Cancer, Asthma, Autism
Two hypotheses:
1- common disease/common variants
2- common disease/many rare variants
4
Genetic Strategy - New Insights
[Figure: allele frequency (HIGH, LOW) versus effect size (WEAK, STRONG); regions labeled LINKAGE, ASSOCIATION, Genome-wide Sequencing]
Ardlie, Kruglyak & Seielstad (2002) Nat. Rev. Genet. 3: 299-309
Zondervan & Cardon (2004) Nat. Rev. Genet. 5: 89-100
5
Finding SNPs - Strategies
6
Total sequence variation in humans
Population size: 6x10^9 (diploid)
Mutation rate: 2x10^-8 per bp per generation
Expected “hits”: 240 for each bp
- Every variant compatible with life exists in the population, BUT most are vanishingly rare in the population!
Compare 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409:928-933 (2001)
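The figure of 240 expected "hits" per base pair follows from multiplying the numbers on this slide, counting two chromosomes per diploid individual. A minimal sketch of that arithmetic (variable names are illustrative):

```python
population_size = 6e9        # individuals, from the slide
chromosomes_per_person = 2   # diploid: two copies of every base pair
mutation_rate = 2e-8         # per bp per generation, from the slide

# New mutations expected at any single base pair across the whole population
expected_hits_per_bp = population_size * chromosomes_per_person * mutation_rate
print(round(expected_hits_per_bp))
```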
7
SNP Discovery: HapMap and others
Generate more SNPs - Sources of SNPs: Perlegen SNP data; Sequence chromatograms from Celera project; HapMap Random Shotgun
[Figure: Genomic DNA (multiple individuals) -> Random Shotgun Sequencing -> Sequence and align to the Draft Human Genome (reference sequence); example reads TACGCCTATA and TCAAGGAGAT shown against GTTACGCCAATACAGGATCCAGGAGATTACC]
dbSNP 127 - 11.8 Million SNPs and 5.7 Million SNPs Validated
8
Finding SNPs: Sequence-based SNP Mining
Sequence Overlap - SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
DNA SEQUENCING
mRNA -> cDNA Library -> EST Overlap
Genomic -> BAC Library -> BAC Overlap
Genomic -> RRS Library -> Shotgun Overlap
RT errors; Sequencing Quality
G/C
Validated SNPs - two independent discoveries
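The overlap idea can be illustrated with the two example sequences from this slide: wherever two aligned reads disagree, that position is a candidate SNP. This is only a sketch. It assumes the reads are already aligned and of equal length, and it ignores the base-quality filtering that real pipelines need. The function name `candidate_snps` is hypothetical.

```python
def candidate_snps(read_a, read_b):
    """List (0-based index, base_a, base_b) where two aligned reads differ."""
    if len(read_a) != len(read_b):
        raise ValueError("reads must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(read_a, read_b)) if a != b]

read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"
print(candidate_snps(read_1, read_2))
```

A single mismatch could equally be a sequencing or RT error, which is why the slide counts a SNP as validated only after two independent discoveries.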
9
SNP discovery is dependent on your sample population size
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(2 chromosomes)
[Figure: Fraction of SNPs Discovered (0.0 to 1.0) versus Minor Allele Frequency (MAF, 0.0 to 0.5); curves labeled 2 and 8 chromosomes]
10
Candidate Gene Resource
11
SNP Discovery in SeattleSNPs
[Figure: gene drawn 5’ to 3’, covered by PCR amplicons; example coding SNPs Arg-Cys and Val-Val]
Generate SNP data from complete genomic resequencing (i.e., 5’ regulatory, exon, intron, 3’ regulatory sequence)
Complete analysis: cSNPs, Linkage Disequilibrium and Haplotype Data
12
Increasing Sample Size Improves SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(2 chromosomes)
[Figure: Fraction of SNPs Discovered (0.0 to 1.0) versus Minor Allele Frequency (MAF, 0.0 to 0.5); curves labeled 2, 8, 16, 24, 48 and 96 chromosomes; annotations: HapMap (based on ~ 6 chromosomes), SeattleSNPs]
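The shape of these curves can be approximated with a simple sampling argument: a SNP is discovered only if both alleles appear at least once among the chromosomes sequenced. The sketch below assumes independent sampling of chromosomes from a population with a fixed minor allele frequency. It is an illustrative model and not necessarily the calculation behind the slide's figure.

```python
def detection_probability(maf, n_chromosomes):
    """Chance that both alleles appear at least once among n sampled chromosomes."""
    return 1.0 - maf ** n_chromosomes - (1.0 - maf) ** n_chromosomes

for n in (2, 8, 16, 24, 48, 96):
    row = ", ".join(
        f"MAF {maf:.2f}: {detection_probability(maf, n):.2f}"
        for maf in (0.01, 0.05, 0.10, 0.50)
    )
    print(f"{n:>2} chromosomes -> {row}")
```

Under this model two chromosomes find only half of even the most common SNPs (MAF 0.5), while 96 chromosomes find nearly all SNPs with MAF of 0.05 or more.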
13
SNPs in the Average Gene
Average Gene Size - 25 kb
Compare 2 haploid - 1 in 1,000 bp
~ 150 SNPs (200 bp) - 15,000,000 SNPs
~ 50 SNPs > 0.05 MAF (600 bp) - 6,000,000 SNPs (33-40%)
~ 5 coding SNPs (half change the amino acid sequence)
Crawford et al Ann Rev Genomics Hum Genet 2005;6:287-312
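The per-gene and genome-wide counts here are related by simple division. The sketch below reads "(200 bp)" and "(600 bp)" as one SNP per that many base pairs and assumes a 3x10^9 bp genome; both are interpretations and do not appear on the slide. The slide's own figures are rounded, so this reproduces them only approximately.

```python
GENE_SIZE_BP = 25_000            # average gene size, from the slide
GENOME_SIZE_BP = 3_000_000_000   # assumed haploid genome size (not on the slide)

def expected_snps(region_bp, bp_per_snp):
    """Expected SNP count if SNPs occur on average once every bp_per_snp bases."""
    return region_bp / bp_per_snp

for label, spacing in (("all SNPs", 200), ("SNPs > 0.05 MAF", 600)):
    per_gene = expected_snps(GENE_SIZE_BP, spacing)
    genome_wide = expected_snps(GENOME_SIZE_BP, spacing)
    print(f"{label}: ~{per_gene:.0f} per gene, ~{genome_wide:,.0f} genome-wide")
```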
14
SeattleSNPs panel HapMap Integration (~4 million SNPs)
[Figure legend: SeattleSNPs discovery (1/188 bp); HapMap SNPs (~1/1000 bp)]
High Density Genic Coverage (SeattleSNPs)
Low Density Genome Coverage (HapMap)
15
Sequence Variation and the HapMap
16
Summary: The Current State of SNP Resources
Random SNP discovery generates many SNPs (HapMap)
Random approaches to SNP discovery have reached limits of discovery and validation (~ 50% of the common SNPs)
Resequencing approaches continue to catalog important variants (rare and common not captured by the HapMap)
SeattleSNPs has generated SNP data across >300 key candidate genes
17
NHLBI - Candidate Genes and Medical Resequencing http://rsng.nhlbi.nih.gov/scripts/index.cfm
18
Typing SNPs: Approaches
19
HapMap Project: Genotype validated SNPs in the dbSNP
Genotype SNPs in Four populations: Initially 1 Million -> Now 4 Million
CEPH (CEU) (Europe - n = 90, trios)
Yoruban (YRI) (Africa - n = 90, trios)
Japanese (JPT) (Asian - n = 45)
Chinese (HCB) (Asian - n = 45)
To produce a genome-wide map of common variation
20
Genotyping Adds Value to SNPs
HapMap Genotyping:
Confirms a SNP as “real” and “informative”
Determines Minor Allele Frequency (MAF) - common or rare
Determines MAF in different populations
Detection of SNP correlations (Linkage Disequilibrium and Haplotypes)
21
Genotype correlations among SNPs decrease the number of SNPs that need to be genotyped
22
An Example of SNP Correlation in the Human IL1A Gene
IL1A in Europeans: 18.5 kb, 50 SNPs, 46 common SNPs (> 10% MAF)
[Figure legend: Homozygote common; Heterozygote; Homozygote alternative allele; Missing Data]
Carlson et al. (2004) Am J Hum Genet. 74: 106-120.
23
46 Common SNPs reduces to 3 SNPs - Select one SNP per bin using LDSelect
Threshold LD: r^2
–Bin 1: 22 sites
–Bin 2: 18 sites
–Bin 3: 5 sites
Genotype 1 SNP from each bin
TagSNP, chosen for biological intuition or ease of assay design
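The binning idea can be sketched as a greedy procedure: compute pairwise r^2, pick the SNP that exceeds the threshold with the most still-unbinned SNPs, make a bin from it and its partners, and repeat. The code below is a simplified illustration in that spirit and not the LDSelect program itself. The toy genotypes (coded 0/1/2), the 0.8 threshold, and the function names are all assumptions, since the slide does not give the threshold value.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site: treat as uncorrelated
    return sxy * sxy / (sxx * syy)

def greedy_bins(genotypes, threshold=0.8):
    """Return a list of (tag_snp, members) bins chosen greedily."""
    names = list(genotypes)
    # For each SNP, the set of SNPs it tags at the threshold (itself included)
    partners = {name: {name} for name in names}
    for a, b in combinations(names, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            partners[a].add(b)
            partners[b].add(a)
    unbinned = set(names)
    bins = []
    while unbinned:
        candidates = [name for name in names if name in unbinned]
        # Pick the SNP that tags the most SNPs not yet assigned to a bin
        tag = max(candidates, key=lambda name: len(partners[name] & unbinned))
        members = partners[tag] & unbinned
        bins.append((tag, sorted(members)))
        unbinned -= members
    return bins

# Hypothetical genotypes for 8 individuals at 5 SNPs
toy_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp2": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp3": [2, 1, 0, 2, 1, 0, 2, 1],
    "snp4": [0, 0, 1, 1, 0, 0, 1, 1],
    "snp5": [0, 0, 1, 1, 0, 0, 1, 1],
}
bins = greedy_bins(toy_genotypes)
for tag, members in bins:
    print(f"tag {tag} covers {members}")
```

Here the tag is simply the first SNP found with the most partners. In practice, as the slide notes, any SNP in a bin that meets the threshold with the others can be chosen based on biology or assay design.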
24
Common Variants - LD (Association) Patterns - Not the same in all genes for all populations
[Figure panels: All SNPs; SNPs > 10% MAF; African-American; European-American]
25
How do I pick TagSNPs?
26
TagSNPs for any gene - Use GVS http://gvs.gs.washington.edu/GVS/
27
TagSNPs in any Gene
28
TagSNPs for a gene for typing multiple populations
30
TagSNPs in a pathway of genes
31
Human Association Studies
32
C-Reactive Protein (CRP)
Pentamer belonging to pentraxin family
Acute-phase protein produced by the liver in response to cytokine production (IL-6, IL-1, tumor necrosis factor)
Non-specific response to inflammation, infection, tissue damage
Well designed candidate gene studies have provided significant insights and these have been replicated in genome-wide association studies
33
CRP Analysis
CRP is an independent risk factor for CVD
CRP levels are heritable (~40% in FHS)
Several reported SNPs alter CRP levels
34
tagSNP selection for CRP
6 “cosmopolitan” tagSNPs, 1 rare synonymous SNP
“Promoter” SNPs (790, 1440)
Intron SNP (1919)
Synonymous SNP (2667)
3’ UTR SNP (3006)
Downstream SNPs (3872, 5237)
35
Association between CRP SNPs and Serum CRP Levels
CARDIA - Carlson et al Am J Hum Genet 77: 64-77, 2005
NHANES - Crawford et al Circulation 114: 2458-65, 2006
CHS - Lange et al JAMA 296: 2703-11, 2006
Framingham - Larson et al Circulation 113: 1415-23, 2006
Other - Szalai et al J. Mol Med 83: 440-7, 2005
36
High CRP Associated with SNPs in USF1 Binding Site
USF1 (Upstream Stimulating Factor)
–Polymorphism at 1421 alters another USF1 binding site
Position markers: 1420, 1430, 1440
H1-4  gcagctacCACGTGcacccagatggcCACTCGtt
H7-8  gcagctacCACGTGcacccagatggcCACTAGtt
H5    gcagctacCACGTGcacccagatggcCACTTGtt
H6    gcagctacCACATGcacccagatggcCACTTGtt
SNP Alters Expression In Vitro
Altered Gel Shift in Vitro
Genome-wide studies lead to regional and candidate gene studies
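The haplotype alignment on this slide can be scanned column by column to locate the variable sites. In the sketch below the coordinate of the first base (1410) is an assumption, inferred so that the variable columns land on the positions named on the slides (1421 and 1440). The function name `variable_sites` is illustrative.

```python
haplotypes = {
    "H1-4": "gcagctacCACGTGcacccagatggcCACTCGtt",
    "H7-8": "gcagctacCACGTGcacccagatggcCACTAGtt",
    "H5":   "gcagctacCACGTGcacccagatggcCACTTGtt",
    "H6":   "gcagctacCACATGcacccagatggcCACTTGtt",
}
FIRST_POSITION = 1410  # assumed coordinate of the first base shown

def variable_sites(seqs, first_position):
    """Map each variable coordinate to the sorted alleles seen there."""
    sites = {}
    for offset, column in enumerate(zip(*seqs.values())):
        alleles = sorted({base.upper() for base in column})
        if len(alleles) > 1:
            sites[first_position + offset] = alleles
    return sites

sites = variable_sites(haplotypes, FIRST_POSITION)
for position, alleles in sites.items():
    print(position, "/".join(alleles))
```

Under that coordinate assumption the scan reports a two-allele site inside the first capitalized motif and a three-allele site inside the second.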
37
Genome-Wide Association Studies
38
Genome-Wide Platforms
Affymetrix (Random SNPs): 100,000 or 500,000 Quasi-Random SNPs
Illumina (TagSNPs): 100,000, 317,000, 550,000, 650,000Y SNPs
1 Million Products are here!
39
Genome-wide Tour de force Nature 447: 661-678 Read all the supplemental materials too!
40
Applying HapMap - Will it work? YES!!
Hits: Macular Degeneration, Obesity, Cardiac Repolarization, Inflammatory Bowel Disease, Diabetes T1 and T2, Coronary Artery Disease, Rheumatoid Arthritis, Breast Cancer, Colon Cancer …..
- There are misses as well, unclear why - Phenotype, Coverage, Environmental Contexts? Example of a miss - Hypertension
- There are lots more hits in these data sets - sample size, low proxy coverage with other SNPs …..
- Analysis of associations between phenotype(s) and even individual sites is daunting and this will just be the first stage, and this does not even consider multi-site interactions
41
How robust are the new genome-wide platforms? How well do they capture common SNPs?
42
LD-based coverage of Sequence Variation MAF > 0.05 Bhangale et al, unpublished
43
How can I get more information about a reference SNP (rs) identified from an association study?
44
http://gvs.gs.washington.edu/GVS/ Searching for Genomic Information with an RS number
45
Structural Variation
46
Structural Variants Identified in the HapMap
Structural Variation - Large Insertion-Deletion Events
~ 1,500 indels
Lots more of them - this was only a start
Conrad, et al. (Nature Genetics 38:75-81, 2006)
Hinds, et al. (Nature Genetics 38:82-85, 2006)
McCarroll, et al. (Nature Genetics 38:86-92, 2006)
47
New Variation to Consider - Structural Variation
Types of Structural Variants: Insertions/Deletions, Inversions, Duplications, Translocations
Size: Large-scale (>100 kb), Intermediate-scale (500 bp–100 kb), Fine-scale (1–500 bp)
More than 10% of the genome sequence
Nature 447: 161-165, 2007
48
A Human Genome Structural Variation Project
Goal: Complete characterization of normal pattern of structural variation in 62 human genomes
Genomes have dense SNP maps (HapMap): CEPH, Yoruba, Japanese & Chinese
Select most genetically diverse individuals
62 additional human genome projects underway
Nature 447:161-165, 2007
49
Sequence-Based Resolution of Structural Variation
Human Genomic DNA -> Genomic Library (1 million clones) -> Sequence ends of genomic inserts & Map to human genome (Build35)
[Figure: end-pair mapping signatures: Concordant, Insertion, Deletion, Inversions]
Fosmid Dataset:
1,122,408 fosmid pairs preprocessed (15.5X genome coverage)
639,204 fosmid pairs BEST pairs (8.8X genome coverage)
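The coverage figures for the fosmid dataset are physical (clone) coverage: the number of end pairs times the insert length, divided by genome size. The sketch below assumes a 40 kb insert and a 2.9x10^9 bp genome; neither value is stated on the slide, and they were chosen as plausible round numbers for illustration.

```python
ASSUMED_INSERT_BP = 40_000          # typical fosmid insert size (assumption)
ASSUMED_GENOME_BP = 2_900_000_000   # approximate genome size (assumption)

def physical_coverage(n_pairs, insert_bp=ASSUMED_INSERT_BP,
                      genome_bp=ASSUMED_GENOME_BP):
    """Fold coverage of the genome by cloned inserts."""
    return n_pairs * insert_bp / genome_bp

all_pairs = physical_coverage(1_122_408)   # preprocessed pairs, from the slide
best_pairs = physical_coverage(639_204)    # BEST pairs, from the slide
print(f"{all_pairs:.1f}X and {best_pairs:.1f}X")
```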
50
Kidd, Cooper, and Eichler - unpublished
51
Detection of Indels in Genotype Data
[Figure panels: X-linked SNP; Unknown indel]
Carlson et al, Hum. Mol. Genet. 15: 1931-1937, 2006
52
http://gvs.gs.washington.edu/GVS/ Searching for Genomic Information with an RS number
53
DNA Sequencing - the ultimate genotyping platform?
54
Rare Variant Versus Common Variant
Both could play a role
Rare Variant - Sequence Individuals
Common Variants - Genotype a Smaller Set of Variants to Explore Correlations
55
High Density Lipoprotein (HDL)
Sequencing Known Candidate Genes for Functional Variation From Individuals at the Tails of the Trait Distribution
[Figure: distribution of Individuals by HDL level, with Low HDL and High HDL tails marked]
56
ABCA1 and HDL-C
Observed excess of rare, nonsynonymous variants in low HDL-C samples at ABCA1
Demonstrated functional relevance in cell culture
–Cohen et al, Science 305, 869-872, 2004
Many examples emerging
Common Disease Rare Variants
57
Personalized Human Genome Sequencing Solexa - an example
58
New Technologies
1 Gigabase of Sequence
Problem is to Target - Genes or Regions
Short reads - 30-35 bp - quality?
Variation discovery needs ~ 20-fold coverage
Needs to be fairly uniform
Provide 30-50 Mb of baseline
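The link between run yield, coverage, and the amount of target sequence is a one-line division. The sketch below assumes the run yield quoted above corresponds to about 10^9 bases and that coverage is perfectly uniform, which gives the upper end of the 30-50 Mb range. Uneven coverage would push the usable target size lower.

```python
RUN_YIELD_BASES = 1_000_000_000   # assumed: roughly 10^9 bases per run
COVERAGE_NEEDED = 20              # fold coverage for variation discovery, from the slide

def target_size_bp(run_yield, coverage):
    """Largest target that one run can cover at the requested depth."""
    return run_yield / coverage

def reads_per_run(run_yield, read_length):
    """Approximate number of reads produced at a given read length."""
    return run_yield // read_length

target = target_size_bp(RUN_YIELD_BASES, COVERAGE_NEEDED)
print(f"target size: {target / 1e6:.0f} Mb")
for read_length in (30, 35):
    print(f"{read_length} bp reads: about {reads_per_run(RUN_YIELD_BASES, read_length):,} per run")
```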
59
Human Genome Variation - Summary
SeattleSNPs and HapMap - Common variation sources - SeattleSNPs offers insights into coverage
New Genotyping Platforms - Very successful but more coverage will be coming
Many associated genome regions are being identified
Other variants of interest emerging - structural variation
Paradigm Shift in Sequencing Technology
60
Acknowledgements
UW: Mark Rieder, Alex Reiner, Greg Cooper, Peggy Robertson, Tushar Bhangale
FHCRC: Chris Carlson
Vanderbilt: Dana Crawford
Stanford: Shelley Force-Aldred, Rick Myers
CARDIA: David Siscovick, Dale Williams, Beth Lewis, Kiang Liu, Carlos Irribaren, Myriam Fornage, Cashell Jaquish, Eric Boerwinkle
NHLBI - SeattleSNPs