2
SNP Resources: Finding SNPs, Databases and Data Extraction
Debbie Nickerson, debnick@u.washington.edu
SeattleSNPs
3
Complex inheritance/disease
[Figure labels: Variant, Gene, Disease, Many Other Genes, Environment]
Example diseases: Diabetes, Heart Disease, Schizophrenia, Obesity, Multiple Sclerosis, Celiac Disease, Cancer, Asthma, Autism
Two hypotheses:
1. Common disease/common variant?
2. Common disease/many rare variants?
4
Human Genetic Variation
[Figure: genomic variation by frequency and size, on a size scale from 1 bp to 1 chr. Labels: single nucleotide polymorphisms; small indels; copy-number variants; structural variation (duplications, deletions, insertions, inversions); cytogenetic. Annotations: "Abundant"; "Gene-rich, e.g. immune response, drug metabolism".]
5
Total sequence variation in humans
Population size: 6x10^9 (diploid)
Mutation rate: 2x10^-8 per bp per generation
Expected “hits”: 240 for each bp
Every variant compatible with life exists in the population
BUT: Most are vanishingly rare
Compare 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409: 928-933 (2001)
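The expected number of "hits" per base pair is the number of chromosome copies in the population multiplied by the per-base mutation rate. A minimal sketch of that arithmetic, using only the figures on the slide (the function and parameter names are illustrative):

```python
def expected_hits_per_bp(population_size, mutation_rate, ploidy=2):
    """Expected new mutations per base pair per generation across a population."""
    # Each diploid individual carries `ploidy` copies of every autosomal base.
    chromosome_copies = population_size * ploidy
    return chromosome_copies * mutation_rate


hits = expected_hits_per_bp(population_size=6e9, mutation_rate=2e-8)
print(f"Expected hits per bp: {hits:.0f}")  # 6e9 x 2 x 2e-8 = 240
```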
6
Building Maps of Single Nucleotide Polymorphisms (SNPs)
ATTCGGCATGAA
ATTCGGGATGAA
Developed in two overlapping phases:
1) SNP Discovery
2) SNP Genotyping
7
Finding SNPs: Sequence-based SNP Mining
RANDOM Sequence Overlap - SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(G/C)
[Figure: DNA sequencing sources used for overlap-based discovery: genomic RRS library (shotgun overlap), BAC library (BAC overlap), mRNA cDNA library (EST overlap), and random shotgun reads aligned to the reference.]
> 11 million SNPs
Validated: 5.6 million SNPs
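Overlap-based discovery comes down to comparing overlapping sequences and flagging the positions where they disagree. The sketch below applies that idea to the two example reads from the slide. It assumes the reads are already aligned, ungapped, and of equal length; real pipelines also align reads and filter on base quality, which is not shown. The function name is illustrative.

```python
def find_mismatches(seq_a, seq_b):
    """Return (0-based offset, base_a, base_b) for each mismatch between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (offset, base_a, base_b)
        for offset, (base_a, base_b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if base_a != base_b
    ]


# The two overlapping reads shown on the slide.
read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"

for offset, base_1, base_2 in find_mismatches(read_1, read_2):
    print(f"Candidate SNP at offset {offset}: {base_1}/{base_2}")
```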
8
Increasing Sample Size Improves SNP Discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
[Figure: fraction of SNPs discovered (0.0 to 1.0) versus minor allele frequency (MAF, 0.0 to 0.5), with curves for 2, 8, 16, 24, 48, and 96 chromosomes. Annotations: HapMap, based on ~6-8 chromosomes (random); candidate gene sequencing; new 1000 Genomes program.]
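One way to see why more chromosomes find more SNPs is a simple sampling model. A biallelic SNP is detected when both alleles appear among the sampled chromosomes, which happens with probability 1 - (1 - p)^n - p^n for minor allele frequency p and n chromosomes. This is a standard textbook approximation that assumes independent sampling; it is not necessarily how the curves on the slide were generated.

```python
def discovery_probability(maf, n_chromosomes):
    """Probability that both alleles of a biallelic SNP appear among n sampled chromosomes."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if n_chromosomes < 2:
        return 0.0  # a single chromosome can never reveal a difference
    all_major = (1.0 - maf) ** n_chromosomes  # every sampled copy carries the major allele
    all_minor = maf ** n_chromosomes          # every sampled copy carries the minor allele
    return 1.0 - all_major - all_minor


# Sample sizes taken from the slide's figure; the MAF values are illustrative.
for n in (2, 8, 16, 24, 48, 96):
    row = ", ".join(
        f"MAF {maf:.2f}: {discovery_probability(maf, n):.2f}"
        for maf in (0.01, 0.05, 0.20, 0.50)
    )
    print(f"{n:>2} chromosomes -> {row}")
```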
9
Genotype - Phenotype Studies
You have a candidate gene/region/pathway of interest and samples ready to study:
What SNPs are available?
How do I find the common SNPs?
What is the validation/quality of the SNPs?
Are these SNPs informative in my population/samples?
What information can I download?
How do I pick the “best” SNPs? - Dana Crawford
10
Minimal SNP information for genotyping/characterization
What is the SNP? Flanking sequence and alleles. FASTA format:
>snp_name
ACCGAGTAGCCAG [A/G] ACTGGGATAGAAC
dbSNP reference SNP # (rs #)
Where is the SNP mapped? Exon, promoter, UTR, etc.
How was it discovered? Method
What assurances do you have that it is real? Validated how?
What population: African, European, etc.?
What is the allele frequency of each SNP? Common (>5%), rare
Are other SNPs associated (redundant)?
Is genotyping data for control populations available?
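The FASTA-style record above, with the alleles bracketed between the flanking sequences, is easy to split into its parts before assay design. The following is a minimal sketch for the exact layout shown on the slide. The record type, function name, and accepted character set are assumptions, and other sources may format alleles differently.

```python
import re
from typing import NamedTuple, Tuple


class SnpRecord(NamedTuple):
    name: str
    left_flank: str
    alleles: Tuple[str, str]
    right_flank: str


# Flank, bracketed allele pair, flank; "-" is allowed so simple indels also parse.
_RECORD = re.compile(r"^([ACGTN]*)\[([ACGT-]+)/([ACGT-]+)\]([ACGTN]*)$")


def parse_snp_record(text):
    """Parse a '>name' header followed by 'FLANK [A/G] FLANK' sequence lines."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("record must start with a '>' header line")
    name = lines[0][1:].strip()
    sequence = "".join(lines[1:]).replace(" ", "").upper()
    match = _RECORD.match(sequence)
    if match is None:
        raise ValueError("expected flanking sequence around one [allele1/allele2] pair")
    left, allele_1, allele_2, right = match.groups()
    return SnpRecord(name, left, (allele_1, allele_2), right)


record = parse_snp_record(">snp_name\nACCGAGTAGCCAG [A/G] ACTGGGATAGAAC")
print(record.name, record.alleles, len(record.left_flank), len(record.right_flank))
```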
11
Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website
2. Other web applications: GVS, HapMap Genome Browser
3. Entrez Gene - dbSNP - Entrez SNP
13
Finding SNPs: SeattleSNPs Candidate Genes
pga.gs.washington.edu
14
Example - PCSK9
15
Finding SNPs: SeattleSNPs Candidate Genes
19
AD ED
20
SNP_pos Ind_ID allele1 allele2
Repeat for all individuals
Repeat for next SNP
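A four-column listing like this (SNP position, individual ID, and two alleles, one row per individual per SNP) can be loaded and summarized with a few lines of code. The sketch below assumes whitespace-separated columns, biallelic SNPs, and that any non-ACGT symbol (such as N) marks a missing call. The sample rows and individual IDs are invented for illustration.

```python
from collections import Counter, defaultdict

VALID_BASES = {"A", "C", "G", "T"}


def parse_genotype_rows(text):
    """Return {snp_position: {individual_id: (allele1, allele2)}} from four-column rows."""
    genotypes = defaultdict(dict)
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        position, individual, allele_1, allele_2 = fields
        genotypes[int(position)][individual] = (allele_1.upper(), allele_2.upper())
    return dict(genotypes)


def minor_allele_frequency(calls):
    """MAF over all called alleles at one SNP; 0.0 if the site is monomorphic or uncalled."""
    counts = Counter(
        allele for pair in calls.values() for allele in pair if allele in VALID_BASES
    )
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


# Hypothetical rows in the layout shown on the slide.
sample_rows = """
1001 D001 A G
1001 D002 A A
1001 D003 A A
1001 D004 N N
2050 D001 C C
2050 D002 C C
"""

table = parse_genotype_rows(sample_rows)
for position, calls in sorted(table.items()):
    print(position, f"MAF = {minor_allele_frequency(calls):.3f}")
```

With a 5% threshold for "common", as on the earlier slide, the first SNP here would count as common and the second as monomorphic in this sample.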
21
PolyPhen = Polymorphism Phenotyping
Structural protein characteristics and evolutionary comparison
SIFT = Sorting Intolerant From Tolerant
Evolutionary comparison of non-synonymous SNPs
22
Finding SNPs: SeattleSNPs Candidate Genes pga.gs.washington.edu
23
Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website
2. Other web applications: GVS, HapMap Genome Browser
3. Entrez Gene - dbSNP - Entrez SNP
24
GVS: Genome Variation Server
http://gvs.gs.washington.edu/GVS/
Provides rapid analysis of 4.5 million genotyped SNPs from dbSNP and the HapMap
Mapped to human genome build 36 (hg18)
Displays genotype data in text and image formats
Displays tagSNPs or clusters of informative SNPs in text and image formats
Displays linkage disequilibrium (LD) in text and image formats
Online tutorial provided at OpenHelix.com
25
http://gvs.gs.washington.edu/GVS/ LDLR
27
GVS: Genome Variation Server
28
Table of genotypes Image of visual genotypes
29
GVS: Genome Variation Server Genotypes displayed in prettybase table and visual genotype graphic
30
GVS: Genome Variation Server
31
Dense genotypes around a candidate gene can be integrated with broader HapMap genotypes
[Figure legend: SeattleSNPs discovery (1/200 bp); HapMap SNPs (~1/1000 bp)]
High Density Genic Coverage (SeattleSNPs)
Low Density Genome Coverage (HapMap)
32
GVS: Genome Variation Server Dense genotypes around a candidate gene can be integrated with lower-density HapMap genotypes
33
GVS: Genome Variation Server
A. Common samples - combined variations
B. Combined samples - common variations
C. Combined samples - combined variations
34
GVS: Genome Variation Server
A. Common samples - combined variations
35
GVS: Genome Variation Server
B. Combined samples - common variations
[Figure labels: Combined samples; HapMap; SeattleSNPs]
36
GVS: Genome Variation Server
C. Combined samples - combined variations
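The three merge options can be read as set operations over two data sets. This sketch assumes "common" means the intersection and "combined" means the union, which is an interpretation of the slide labels and not a documented GVS definition. The data structure, function name, and IDs are hypothetical.

```python
def merge_datasets(first, second, samples="combined", variations="combined"):
    """Merge two data sets, taking the intersection ('common') or union ('combined') per axis."""

    def pick(mode, set_a, set_b):
        if mode == "common":
            return set_a & set_b
        if mode == "combined":
            return set_a | set_b
        raise ValueError(f"unknown mode: {mode!r}")

    return {
        "samples": pick(samples, first["samples"], second["samples"]),
        "variations": pick(variations, first["variations"], second["variations"]),
    }


# Hypothetical sample and variation identifiers.
seattlesnps = {"samples": {"S1", "S2", "S3"}, "variations": {"v1", "v2", "v3", "v4"}}
hapmap = {"samples": {"S2", "S3", "S4"}, "variations": {"v2", "v4", "v5"}}

option_a = merge_datasets(seattlesnps, hapmap, samples="common", variations="combined")
option_b = merge_datasets(seattlesnps, hapmap, samples="combined", variations="common")
option_c = merge_datasets(seattlesnps, hapmap, samples="combined", variations="combined")

for label, merged in (("A", option_a), ("B", option_b), ("C", option_c)):
    print(label, sorted(merged["samples"]), sorted(merged["variations"]))
```

Under this reading, options B and C leave some sample/variation pairs with no genotype, because not every sample was typed at every site.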
39
Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website
2. Other web applications: GVS, HapMap Genome Browser
3. Entrez Gene - dbSNP - Entrez SNP
40
www.hapmap.org
41
Finding SNPs: HapMap Browser
42
1. HapMap data sets are useful because individual genotype data in deeply sampled populations can be used to determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage disequilibrium)
2. Data are specific to the HapMap project (not all dbSNP); HapMap data is available in dbSNP
3. Visualization of data and direct access to SNP data, individual genotypes, and LD analysis are possible in the browser, and formats can be saved for Haploview
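TagSNP selection rests on pairwise linkage disequilibrium, most often summarized as r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. A minimal sketch of that calculation follows. It assumes phased haplotypes are already available; unphased genotypes would first need haplotype frequency estimation, which is not shown. The example haplotypes are invented.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs, given phased haplotypes as (allele_snp1, allele_snp2) pairs."""
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0]  # use the first haplotype's alleles as the reference pair
    p_a = sum(1 for hap in haplotypes if hap[0] == ref_a) / n
    p_b = sum(1 for hap in haplotypes if hap[1] == ref_b) / n
    p_ab = sum(1 for hap in haplotypes if hap[0] == ref_a and hap[1] == ref_b) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator


# Hypothetical phased haplotypes for two SNPs.
perfectly_linked = [("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]
independent = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]

print(f"linked: r^2 = {ld_r_squared(perfectly_linked):.2f}")
print(f"independent: r^2 = {ld_r_squared(independent):.2f}")
```

Two SNPs with r^2 near 1 are largely redundant, so genotyping one of them as a tagSNP captures most of the information in the other.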
43
Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. SeattleSNPs - Candidate gene website
2. Other web applications: GVS, HapMap Genome Browser
3. Entrez Gene - dbSNP - Entrez SNP
44
NCBI - Database Resource www.ncbi.nlm.nih.gov PCSK9
45
Finding SNPs using NCBI databases http://www.ncbi.nlm.nih.gov/
47
Default View cSNPs
48
Finding SNPs using NCBI databases http://www.ncbi.nlm.nih.gov/
50
PCSK9
53
Finding SNPs - Entrez SNP Summary
1. dbSNP is useful for investigating detailed information on a small number of SNPs, and it’s good for a picture of the gene
2. Entrez SNP is a direct, fast database for querying SNP data
3. Data from Entrez SNP can be retrieved in batches for many SNPs
4. Entrez SNP data can be “limited” to specific subsets of SNPs and formatted in plain text for easy parsing and manipulation
5. More detailed queries can be formed using specific “field tags” for retrieving SNP data
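Field-tagged queries like those in point 5 can also be sent programmatically. The sketch below only builds a search URL for NCBI's E-utilities `esearch` endpoint, which the slides do not mention; it is shown here as one possible route for batch retrieval. The field tags in the example term (`[GENE]`, `[ORGN]`) are assumptions, so check the current Entrez SNP help for the tags actually supported. No network request is made.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_snp_search_url(term, retmax=100):
    """Build an esearch URL for the SNP database; `term` may contain Entrez field tags."""
    params = {"db": "snp", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Assumed field tags; PCSK9 is the example gene used elsewhere in the slides.
query = "PCSK9[GENE] AND human[ORGN]"
url = build_snp_search_url(query)
print(url)
```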
54
Summary
Finding SNPs: Databases and Extraction
Reviewing candidate genes using views and resources in SeattleSNPs
Integration of dense, gene-centric SNP maps with genomic HapMap SNPs: GVS
HapMap viewer
NCBI databases through the Entrez portal
- Entrez Gene, dbSNP, Entrez SNP
- many ways to retrieve and format data
55
Genome Variation Server: GVS
GWAS: Asthma
Moffatt et al., Nature 448: 470-473, 2007
58
New Variation to Consider - Structural Variation
Types of structural variants: insertions/deletions, inversions, duplications, translocations
Size:
Large-scale (>100 kb)
Intermediate-scale (500 bp-100 kb)
Fine-scale (1-500 bp)
More than 10% of the genome sequence
Nature 447: 161-165, 2007
59
Detection of Outliers of the Distribution X-linked SNP Unknown SNP
60
Genetic Strategy - New Insights
[Figure: axes are allele frequency (high, low) and effect size (weak, strong). Labels: LINKAGE, ASSOCIATION, ??]
Common Disease - Many Rare Variants
Ardlie, Kruglyak & Seielstad (2002) Nat. Rev. Genet. 3: 299-309
61
High Density Lipoprotein (HDL)
Sequencing Known Candidate Genes for Functional Variation From Individuals at the Tails of the Trait Distribution
[Figure: trait distribution of individuals, with the low HDL and high HDL tails marked.]
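Choosing individuals from the tails of a trait distribution can be sketched as a simple ranking step. The cutoff fraction below is an assumption, since the slide does not state how the tails were defined, and the trait values and IDs are invented for illustration.

```python
def select_tails(trait_values, fraction=0.05):
    """Return (low_tail_ids, high_tail_ids): the lowest and highest `fraction` of individuals."""
    if not 0.0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(trait_values, key=trait_values.get)  # individual IDs, ascending by trait value
    n_tail = max(1, int(len(ranked) * fraction))
    return ranked[:n_tail], ranked[-n_tail:]


# Hypothetical HDL measurements keyed by individual ID.
hdl = {"ID1": 28, "ID2": 61, "ID3": 45, "ID4": 33,
       "ID5": 82, "ID6": 50, "ID7": 74, "ID8": 39}

low_hdl, high_hdl = select_tails(hdl, fraction=0.25)
print("Low HDL tail: ", low_hdl)
print("High HDL tail:", high_hdl)
```

The candidate genes would then be sequenced in both groups and the variant counts compared between tails.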
62
ABCA1 and HDL-C
Observed excess of rare, nonsynonymous variants in low HDL-C samples at ABCA1
Demonstrated functional relevance in cell culture
Cohen et al., Science 305: 869-872, 2004
Many examples emerging
Common Disease - Rare Variants
63
Personalized Human Genome Sequencing Solexa - an example