NCBI FieldGuide
NCBI Molecular Biology Resources: A Field Guide, Part 1
January 12, 2007
The NCBI Entrez System
NCBI Sequence Databases
– Primary data: GenBank
– Derivative data: RefSeq, Gene
Protein Structure and Function
Sequence Polymorphisms and Phenotypes
** Intermission **
NCBI Genomic Resources
BLAST
NCBI Resources
The National Center for Biotechnology Information (Bethesda, MD)
Created in 1988 as part of the National Library of Medicine at NIH to:
– serve as a national resource for molecular biology information (biological information obtained directly from organisms)
– gather data both nationally and internationally
– develop new information technologies to aid in understanding the fundamental molecular and genetic processes that control health and disease
Data sources: traditional literature and data obtained from the direct study of organisms
The information landscape in biological and medical research has grown far beyond the literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. (Figure 1 from Geer RC. Broad issues to consider for library involvement in bioinformatics. J Med Libr Assoc. Jul;94(3):286–98, E152–5.)
NCBI:
– accepts submissions of bibliographic records and of primary research data (for example, the nucleotide sequence of the colon cancer gene MLH1)
– organizes the information into databases, maintains them, and makes them available to the world
– develops software to retrieve and analyze the data
– conducts basic research to make new biological discoveries using the databases and software tools
What does NCBI do?
– NCBI accepts submissions of primary data.
– NCBI develops tools to analyze these data.
– NCBI uses these tools to create derivative databases based on the primary data.
– NCBI provides free search, linking, and retrieval of these data, primarily through the Entrez system.
[Figure: Web access to NCBI data. A query can be text (Entrez), a sequence (BLAST), a protein structure (VAST), or a small-molecule structure (PubChem).]
The NCBI ftp Site
30,000 files and 620 gigabytes per day
Help for Programmers
NCBI Toolbox: in-house source code that programmers can use to incorporate NCBI-like functionality into their own programs. Three main parts: data model, data encoding, and programming libraries. Examples: BLAST, Cn3D, Sequin, and data format conversion scripts.
E-Utilities: guidelines for the Entrez "URL calls" used to access data, designed for use in scripts. Examples: ESearch, EPost, ESummary, EFetch, and ELink. Caution: overuse may result in a blocked IP address!
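An E-Utility "URL call" is an ordinary HTTP GET whose query string carries the database and the Entrez query. The sketch below shows only the URL construction: it percent-encodes the query (brackets, quotes, and other reserved characters cannot appear raw in a URL) and prefixes the ESearch script. Assumption: the base address is the current public E-utilities endpoint, which may differ from the one in use when these slides were written; no HTTP request is made here.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: current public E-utilities base URL. */
#define EUTILS_BASE "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

/* Percent-encode an Entrez query: letters, digits and "-._~" are copied,
   a space becomes '+', and any other byte becomes %XX.
   Returns the encoded length, or -1 if out is too small. */
int url_encode(const char *in, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    if (outsize == 0) return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            if (n + 1 >= outsize) return -1;
            out[n++] = (char)c;
        } else if (c == ' ') {
            if (n + 1 >= outsize) return -1;
            out[n++] = '+';
        } else {
            if (n + 3 >= outsize) return -1;
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n] = '\0';
    return (int)n;
}

/* Build an ESearch URL for query `term` in Entrez database `db`.
   Returns the URL length, or -1 if url is too small. */
int build_esearch_url(const char *db, const char *term, char *url, size_t urlsize)
{
    char encoded[1024];
    int n;
    if (url_encode(term, encoded, sizeof encoded) < 0) return -1;
    n = snprintf(url, urlsize, EUTILS_BASE "esearch.fcgi?db=%s&term=%s", db, encoded);
    return (n < 0 || (size_t)n >= urlsize) ? -1 : n;
}
```

For example, the query M17755[primary accession] in the nucleotide database becomes `...esearch.fcgi?db=nucleotide&term=M17755%5Bprimary+accession%5D`. Given the caution above, a script that issues many such calls should pace its requests according to NCBI's published usage guidelines.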
Global Entrez Search Page
Example query: All[Filter]
What is Entrez?
– a system of 31 linked databases
– a text search engine
– a tool for finding biologically linked data
– a retrieval engine
– a virtual workspace for manipulating large datasets
Entrez Databases
– Each record is assigned a UID, a unique integer identifier used for internal tracking (for Nucleotide, the GI number).
– Each record is given a Document Summary (DocSum), a summary of the record's content.
– Each record is assigned links to biologically related UIDs.
– Each record is indexed by data fields: [author], [title], [organism], and many others.
Linking in Entrez
Follow links to related data in the same database or in others!
Hard links: curated links based on biology
– nucleotide → taxonomy (based on organism identifier)
– protein → domain relatives (based on domain assignment)
– domains → pubmed (based on supporting literature)
– pcsubstance → structures/mmdb (based on source information)
Soft links: pre-computed analyses
– nucleotide → related sequences (BLAST neighbors)
– protein → conserved domains (CDD/RPS-BLAST search)
– pccompound → pccompound (structure-based neighboring)
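The same links can be followed from a script with ELink, which takes a source database (`dbfrom`), a target database (`db`), and one or more UIDs (`id`). This sketch only formats the request URL for a single UID. Assumptions: the current E-utilities base URL, and the UID 12345 used in the usage example, which is hypothetical rather than the UID of any record discussed here.

```c
#include <stdio.h>

/* ASSUMPTION: current public E-utilities base URL. */
#define EUTILS_BASE "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

/* Build an ELink URL that follows links from record `uid` in database
   `dbfrom` to database `db`, e.g. nucleotide -> taxonomy (a hard link).
   Returns the URL length, or -1 if url is too small. */
int build_elink_url(const char *dbfrom, const char *db, unsigned long uid,
                    char *url, size_t size)
{
    int n = snprintf(url, size, EUTILS_BASE "elink.fcgi?dbfrom=%s&db=%s&id=%lu",
                     dbfrom, db, uid);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With `dbfrom=nucleotide`, `db=taxonomy`, and the hypothetical UID 12345, the result is `...elink.fcgi?dbfrom=nucleotide&db=taxonomy&id=12345`.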
Entrez: Database Integration
[Figure: PubMed abstracts, nucleotide sequences, protein sequences, 3-D structures, genomes, and taxonomy are connected by hard links. Neighbors within each database are computed by word weight (related articles), BLAST (related sequences; BLink, domains), VAST (related structures), and phylogeny (taxonomy).]
Links: Database Integration at NCBI
Link types available from each Entrez database:
– Gene: HomoloGene; mRNAs and genome; all CDS products; protein function; SNPs and indels; source organism; literature
– Nucleotide: gene locus; BLASTn neighbors; CDS product; 3D DNA and 3D RNA structures; SNPs and indels; source organism; literature
– Protein: gene locus; cDNA transcript; BLASTp neighbors; 3D proteins; function; SNPs and indels; source organism; literature
– Structure: DNA sequence; protein sequence; VAST neighbors; protein function; SNP (BLASTp); source organism; literature
– CDD: gene loci; proteins with the CD; 3D templates; CDART; broadest taxon; literature
– SNP: gene locus; DNA sequence; protein sequence; 3D template; source organism; literature
– Taxonomy: genes, sequences, structures, CD spans, and SNPs for the taxon; common tree
– PubMed: gene loci, sequences, structures, CDs, and SNPs in the article; related articles
Types of Databases
Primary databases
– original submissions by experimentalists
– content controlled by the submitter
– examples: GenBank, dbSNP, GEO, PubChem Substance, and PubChem BioAssay
Derivative databases
– built from primary data
– content controlled by a third party (NCBI)
– examples: RefSeq, RefSNP, GEO DataSets, and PubChem Compound
An Entrez Database: Nucleotide
GenBank: primary data (98.2%)
– original submissions by experimentalists
– submitters retain editorial control of records
– archival in nature
RefSeq: derivative data (1.8%)
– curated by NCBI staff
– NCBI retains editorial control of records
– record content is updated continually
Literature Databases
NM_000249: links to PubMed and Books
Books Link
A part of the NCBI Bookshelf:
Part 1. The Databases
Part 2. Data Flow and Processing
Part 3. Querying and Linking the Data
Part 4. User Support
PubMed Central
PubMed Central (PMC) is a digital archive of life sciences journal literature. Integrated into the Entrez retrieval system, PMC provides free and unrestricted access to the full text of over 160 life sciences journals, with more to come.
NCBI Journal Database
Detailed journal information
OMIM
– a catalog of genes involved in human disease processes
– detailed clinical and reference information
– curated and maintained at Johns Hopkins
– links to PubMed and sequence databases
Primary vs. Derivative Databases
[Figure: Sequences from labs and sequencing centers enter GenBank (traditional divisions PRI, ROD, PLN, MAM, BCT, INV, VRT, PHG, VRL; bulk divisions EST, GSS, HTG, STS), where records are updated ONLY by submitters. Curators and algorithms build derivative databases from GenBank (UniGene, UniSTS, and RefSeq through the Genes and Genomes pipelines and the Annotation pipeline), which NCBI updates continually.]
What is GenBank?
NCBI's primary sequence database
– a nucleotide-only sequence database
– archival in nature
– each record is assigned a stable accession number
GenBank data
– direct submissions (traditional records)
– batch submissions (EST, GSS, STS)
– ftp accounts (genome data)
Three collaborating databases
– GenBank
– DNA Data Bank of Japan (DDBJ)
– European Molecular Biology Laboratory (EMBL) Database
The International Sequence Database Collaboration
[Figure: GenBank (NCBI, NIH), EMBL (EBI), and DDBJ (NIG, CIB) each accept submissions and updates (at NCBI through Sequin, BankIt, and ftp), exchange data with one another, and offer retrieval through Entrez, SRS, and getentry, respectively.]
GenBank Releases
– full release every two months
– incremental and cumulative updates daily
– available only through the internet
Release 156, October (non-WGS): >150,000 species; 245 gigabytes in 1032 files
The Growth of GenBank (Release 152)
Non-WGS: 59.8 billion bases
WGS: 63.2 billion bases
GenBank Divisions
Traditional divisions: direct submissions (Sequin/BankIt); accurate (~1 error per 10,000 bp); well characterized; organized by taxonomy
– PRI Primate
– ROD Rodent
– MAM Mammalian
– VRT Other Vertebrate
– INV Invertebrate
– PLN Plant and Fungal
– BCT Bacterial/Archaeal
– VRL Viral
– PHG Phage
– SYN Synthetic
– UNA Unannotated
Bulk divisions: batch submissions (ftp/ ) from sequencing projects; inaccurate; poorly characterized; organized by sequence type
– EST Expressed Sequence Tag
– GSS Genome Survey Sequence
– HTG High Throughput Genomic
– HTC High Throughput cDNA
– STS Sequence Tagged Site
– PAT Patent sequences
– CON Constructed entries
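The division list above is effectively a lookup table: each three-letter code (the one printed on a record's LOCUS line, such as PLN) belongs to either the traditional or the bulk group. The sketch below encodes exactly the codes and groupings from this slide; the type and function names are hypothetical, not part of any NCBI library.

```c
#include <stddef.h>
#include <string.h>

/* Grouping used on the slide: traditional (direct submissions, organized
   by taxonomy) vs. bulk (batch submissions, organized by sequence type). */
typedef enum { DIV_TRADITIONAL, DIV_BULK } DivisionKind;

struct division { const char *code; const char *name; DivisionKind kind; };

static const struct division DIVISIONS[] = {
    {"PRI", "Primate", DIV_TRADITIONAL},
    {"ROD", "Rodent", DIV_TRADITIONAL},
    {"MAM", "Mammalian", DIV_TRADITIONAL},
    {"VRT", "Other Vertebrate", DIV_TRADITIONAL},
    {"INV", "Invertebrate", DIV_TRADITIONAL},
    {"PLN", "Plant and Fungal", DIV_TRADITIONAL},
    {"BCT", "Bacterial/Archaeal", DIV_TRADITIONAL},
    {"VRL", "Viral", DIV_TRADITIONAL},
    {"PHG", "Phage", DIV_TRADITIONAL},
    {"SYN", "Synthetic", DIV_TRADITIONAL},
    {"UNA", "Unannotated", DIV_TRADITIONAL},
    {"EST", "Expressed Sequence Tag", DIV_BULK},
    {"GSS", "Genome Survey Sequence", DIV_BULK},
    {"HTG", "High Throughput Genomic", DIV_BULK},
    {"HTC", "High Throughput cDNA", DIV_BULK},
    {"STS", "Sequence Tagged Site", DIV_BULK},
    {"PAT", "Patent sequences", DIV_BULK},
    {"CON", "Constructed entries", DIV_BULK},
};

/* Return the table entry for a division code, or NULL if it is unknown. */
const struct division *find_division(const char *code)
{
    for (size_t i = 0; i < sizeof DIVISIONS / sizeof DIVISIONS[0]; i++)
        if (strcmp(DIVISIONS[i].code, code) == 0)
            return &DIVISIONS[i];
    return NULL;
}
```

A linear scan is adequate for 18 entries. A program reading many records could use the `kind` field to treat bulk-division sequences as less reliable, as the slide suggests.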
Entrez Nucleotide Subsets
CoreNucleotide + EST + GSS = TOTAL
A Traditional GenBank Record: The Flatfile Format
The flatfile has three parts: the header, the feature table, and the sequence.

LOCUS       AY bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA, complete cds.
ACCESSION   AY
VERSION     AY GI:
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:
FEATURES             Location/Qualifiers
     source
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene
                     /gene="AFS1"
     CDS
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO "
                     /db_xref="GI: "
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     NHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSNLSRDVVHS
                     LELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWW
                     ANLGIADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGS
                     EEELKHFTNAVDRWDSRETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLT
                     KVWADFCKALLVEAEWYNKSHIPTLEEYLRNGCISSSVSVLLVHSFFSITHEGTKEMA
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
     1801 aataaatagc agcaaaagtt tgcggttcag ttcgtcatgg ataaattaat ctttacagtt
     1861 tgtaacgttg ttgccaaaga ttatgaataa aaagttgtag tttgtcgttt aaaaaaaaaa
     1921 aaaaaaaaaa a
//
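The sequence section of the flatfile is easy to read programmatically: after the ORIGIN line, each line holds a starting base position followed by blocks of ten bases, and "//" ends the record. This minimal sketch collects just the bases. It assumes the text passed in contains a single record and that nothing after the ORIGIN keyword on its own line is sequence. It is an illustration, not a full flatfile parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the bases from the ORIGIN section of one GenBank flatfile record
   into seq (NUL-terminated). Position numbers and whitespace are skipped;
   "//" ends the record. Returns the number of bases, or -1 if there is no
   ORIGIN line or seq is too small. */
long origin_to_sequence(const char *text, char *seq, size_t seqsize)
{
    const char *p = strstr(text, "ORIGIN");
    size_t n = 0;
    if (p == NULL || seqsize == 0) return -1;
    p = strchr(p, '\n');                    /* skip the rest of the ORIGIN line */
    for (; p != NULL && *p != '\0' && !(p[0] == '/' && p[1] == '/'); p++) {
        if (isalpha((unsigned char)*p)) {   /* bases are letters; digits are positions */
            if (n + 1 >= seqsize) return -1;
            seq[n++] = *p;
        }
    }
    seq[n] = '\0';
    return (long)n;
}
```

Applied to the first two ten-base blocks of the record above plus two more bases, it returns "ttcttgtatcccaaacatctcg" (22 bases).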
An Example Record: M17755
Indexing for the Nucleotide UID (field: indexed terms)
[primary accession] M17755
[title] Homo sapiens thyroid peroxidase (TPO) mRNA…
[organism] Homo sapiens
[sequence length] 3060
[modification date] 1999/04/26
[properties] biomol mrna; gbdiv pri; srcdb genbank
M17755: Feature Table
The feature table gives the CDS position in bp and the protein accession, and supplies indexed terms such as TPO [gene name], thyroiditis [text word], and thyroid peroxidase [protein name].
Sequence: 99.99% Accurate
The sequence itself is not indexed… Use BLAST for that!
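The 99.99% figure and the "~1 error per 10,000 bp" quoted for the traditional divisions are the same error rate. As a rough illustration, applying it to the 3060 bp of M17755 gives the expected number of errors per record. This assumes errors are independent and evenly spread, which is a simplification.

```latex
\begin{aligned}
p_{\text{err}} &= \frac{1}{10\,000} = 10^{-4},
  \qquad 1 - p_{\text{err}} = 0.9999 = 99.99\% \\
E[\text{errors}] &= L \cdot p_{\text{err}} = 3060 \times 10^{-4} \approx 0.31
\end{aligned}
```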
Entrez Protein
Sources: GenPept (DDBJ, EMBL, GenBank), RefSeq, Swiss-Prot, PDB, PIR, PRF, and Third Party Annotation
4969 Total
Protein Sources and Links
PIR, RefSeq, SWISS-PROT, GenPept
NM_ ; M17755; no mRNA!
Sequence Revisions
– The version and GI change only if the sequence changes.
– The accession number always retrieves the most recent version.
– Revision history dates show when a version was first seen at NCBI, not when it was first seen at GenBank!
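Because the version changes only when the sequence changes, a script that stores identifiers often has to separate the accession from the version number. The sketch below splits an "ACCESSION.VERSION" string and reports version 0 for a bare accession, meaning "no version given" (so a lookup by that accession returns the most recent version, as described above). The function name and the version ".2" in the usage example are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Split "ACCESSION.VERSION" (e.g. "NM_000547.2") into acc and *version.
   A bare accession ("M17755") yields *version = 0, meaning "unversioned".
   Returns 0 on success, -1 on malformed input or if acc is too small. */
int split_accession_version(const char *s, char *acc, size_t accsize, int *version)
{
    const char *dot = strrchr(s, '.');
    size_t len = dot ? (size_t)(dot - s) : strlen(s);
    if (len == 0 || len >= accsize) return -1;
    memcpy(acc, s, len);
    acc[len] = '\0';
    *version = 0;
    if (dot != NULL) {
        char *end;
        long v = strtol(dot + 1, &end, 10);
        if (end == dot + 1 || *end != '\0' || v <= 0) return -1;  /* need digits only */
        *version = (int)v;
    }
    return 0;
}
```

For the hypothetical "NM_000547.2" this yields the accession "NM_000547" and version 2; for "M17755" it yields version 0.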
Update without a Sequence Change
June 15, 1989! GenBank came to NCBI in 1992!
Update with a Sequence Change
GenBank File Formats
– ASN.1 (the raw data)
– XML
– FASTA
– flat file
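FASTA is the simplest of these formats: a definition line starting with ">" followed by the bare sequence wrapped at a fixed width. The sketch below formats one record into a buffer. Assumption: a default width of 70 characters per line, a common choice in NCBI FASTA output that is not stated on the slide.

```c
#include <stdio.h>
#include <string.h>

/* Format one FASTA record into out: ">defline\n" followed by seq wrapped
   at `width` characters per line (70 if width is 0).
   Returns the number of characters written, or -1 if out is too small. */
long format_fasta(char *out, size_t outsize, const char *defline,
                  const char *seq, size_t width)
{
    size_t len = strlen(seq), pos;
    int n;
    if (width == 0) width = 70;          /* ASSUMPTION: default line width */
    n = snprintf(out, outsize, ">%s\n", defline);
    if (n < 0 || (size_t)n >= outsize) return -1;
    pos = (size_t)n;
    for (size_t i = 0; i < len; i += width) {
        size_t chunk = (len - i < width) ? len - i : width;
        n = snprintf(out + pos, outsize - pos, "%.*s\n", (int)chunk, seq + i);
        if (n < 0 || (size_t)n >= outsize - pos) return -1;
        pos += (size_t)n;
    }
    return (long)pos;
}
```

For instance, formatting the ten bases "ACGTACGTAC" with a width of 4 gives three sequence lines: ACGT, ACGT, AC.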
NCBI Toolbox
Excerpt from asn2ff.c:

    /************************************************************************
    *
    *   asn2ff.c
    *      convert an ASN.1 entry to flat file format, using the FFPrintArray.
    *
    **************************************************************************/
    /* #include: header name not preserved on the slide */
    #include "asn2ff.h"
    #include "asn2ffp.h"
    #include "ffprint.h"
    /* #include: header name not preserved on the slide */
    #ifdef ENABLE_ID1
    /* #include: header name not preserved on the slide */
    #endif

    FILE *fpl;   /* output stream for the flat file */

    /* Command-line arguments: description, default, min, max, optional,
       flag letter, type, ... */
    Args myargs[] = {
        {"Filename for asn.1 input", "stdin", NULL, NULL, TRUE, 'a', ARG_FILE_IN, 0.0, 0, NULL},
        {"Input is a Seq-entry", "F", NULL, NULL, TRUE, 'e', ARG_BOOLEAN, 0.0, 0, NULL},
        {"Input asnfile in binary mode", "F", NULL, NULL, TRUE, 'b', ARG_BOOLEAN, 0.0, 0, NULL},
        {"Output Filename", "stdout", NULL, NULL, TRUE, 'o', ARG_FILE_OUT, 0.0, 0, NULL},
        {"Show Sequence?", "T", NULL, NULL, TRUE, 'h', ARG_BOOLEAN, 0.0, 0, NULL},
        /* ... remaining arguments not shown on the slide ... */
    };

Toolbox sources:
    ftp> open
    ftp> cd toolbox
    ftp> cd ncbi_tools
Text Queries in Entrez
General form: term1[limit] OP term2[limit] OP …
where limit = an Entrez indexing field (organism, author, …) and OP = a Boolean operator: AND, OR, NOT.
Complex queries: ((A[limit1] OR B[limit2]) AND C[limit3]) NOT D[limit4]
Ranges: 1:200[MW]
Wildcards: cancer[title] vs. cancer*[title]
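Scripts that generate queries often assemble them from term/field pairs. The sketch below joins clauses as "(term1[field1] OP term2[field2] …)". A clause with no field is copied as-is, so an already-built group can be nested, which gives the complex-query pattern shown above. The `struct clause` type and `join_clauses` function are hypothetical helpers, not an NCBI API.

```c
#include <stdio.h>
#include <string.h>

/* One search clause: a term and the Entrez field (limit) it is restricted
   to. A NULL or empty field means "insert the term unchanged". */
struct clause { const char *term; const char *field; };

/* Join clauses as "(term1[field1] OP term2[field2] ...)".
   Returns the query length, or -1 if out is too small. */
int join_clauses(const struct clause *c, size_t count, const char *op,
                 char *out, size_t size)
{
    size_t pos;
    int n = snprintf(out, size, "(");
    if (n < 0 || (size_t)n >= size) return -1;
    pos = (size_t)n;
    for (size_t i = 0; i < count; i++) {
        const char *sep = (i > 0) ? " " : "";   /* spaces around OP */
        const char *o = (i > 0) ? op : "";      /* no OP before the first clause */
        if (c[i].field != NULL && c[i].field[0] != '\0')
            n = snprintf(out + pos, size - pos, "%s%s%s%s[%s]",
                         sep, o, sep, c[i].term, c[i].field);
        else
            n = snprintf(out + pos, size - pos, "%s%s%s%s", sep, o, sep, c[i].term);
        if (n < 0 || (size_t)n >= size - pos) return -1;
        pos += (size_t)n;
    }
    n = snprintf(out + pos, size - pos, ")");
    if (n < 0 || (size_t)n >= size - pos) return -1;
    return (int)(pos + (size_t)n);
}
```

Joining {thyroid peroxidase, title} and {human, orgn} with AND yields "(thyroid peroxidase[title] AND human[orgn])". Passing that result as a field-less clause together with {srcdb refseq, properties} yields "((thyroid peroxidase[title] AND human[orgn]) AND srcdb refseq[properties])".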
Entrez Tabs
– Limits: a simple form for applying commonly used Entrez limits
– Preview/Index: access to the full indexing of each Entrez database; aids in constructing complex queries
– History: access to previous searches in the current Entrez database
– Clipboard: a temporary storage area for selected records
– Details: the detailed parsing of the current Entrez query, with any errors and any terms without matches
Programming Entrez: E-Utilities
– ESearch: Entrez query → UID list or History
– EPost: UID list → History
– ESummary: UID list or History → document summaries
– EFetch: UID list or History → formatted data
– ELink: UID list or History → UID list or History
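The "History" output lets one utility hand its results to the next without the script downloading the UID list. ESearch called with `usehistory=y` returns a `WebEnv` string and a `query_key`, and EFetch then retrieves the stored set using those two values. The sketch below builds both URLs. Assumptions: the current E-utilities base URL, a term that is already URL-encoded, and the WebEnv value "NCID_1_example" in the usage example, which is a hypothetical placeholder for the value read from the ESearch response.

```c
#include <stdio.h>

/* ASSUMPTION: current public E-utilities base URL. */
#define EUTILS_BASE "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

/* Step 1: ESearch that stores its UID list on the History server.
   encoded_term must already be URL-encoded. Returns length or -1. */
int esearch_history_url(const char *db, const char *encoded_term,
                        char *url, size_t size)
{
    int n = snprintf(url, size, EUTILS_BASE "esearch.fcgi?db=%s&term=%s&usehistory=y",
                     db, encoded_term);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Step 2: EFetch of the stored set, identified by the WebEnv and
   query_key taken from the ESearch response. Returns length or -1. */
int efetch_history_url(const char *db, const char *webenv, int query_key,
                       const char *rettype, char *url, size_t size)
{
    int n = snprintf(url, size,
                     EUTILS_BASE "efetch.fcgi?db=%s&query_key=%d&WebEnv=%s&rettype=%s&retmode=text",
                     db, query_key, webenv, rettype);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Keeping the set on the server means a large result can be fetched in pieces without posting thousands of UIDs back to NCBI.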
Finding Primary Sequences
Search Entrez CoreNucleotide:
– 94.8% GenBank (primary data)
– 5.2% RefSeq (curated data)
Possible queries we've seen so far:
– M17755[primary accession]
– TPO[gene name]
– thyroid peroxidase[title]
– thyroiditis[text word]
– Homo sapiens[organism]
– thyroid peroxidase[protein name]
– 3060[sequence length]
– 1999/04/26[modification date]
– biomol mrna[properties]
– gbdiv pri[properties]
– srcdb genbank[properties]
A Starting Query
Find nucleotide records for human thyroid peroxidase.
Query: human thyroid peroxidase
Entrez translation: (("Homo sapiens"[Organism] OR human[All Fields]) AND thyroid peroxidase[All Fields]) → 276 records
Query with a field limit: human[organism] AND thyroid peroxidase
Entrez translation: ("Homo sapiens"[Organism] AND thyroid peroxidase[All Fields]) → 262 records
The field limit removes 14 records that aren't human sequences!!
Limit by Title and Database
#1: thyroid peroxidase AND human[orgn] → 262
#2: thyroid peroxidase[title] AND human[orgn] → 55
#3: #2 AND srcdb refseq[properties] → 5
#4: #2 AND srcdb ddbj/embl/genbank[properties] → 50
Entrez Nucleotide contains GenBank (primary data; srcdb ddbj/embl/genbank[properties]) and RefSeq (srcdb refseq[properties]).
Limit by Biomolecule Type
Genomic DNA: biomol genomic[prop]; mRNA/cDNA: biomol mrna[prop]
#1: thyroid peroxidase AND human[orgn] → 262
#2: thyroid peroxidase[title] AND human[orgn] → 55
#3: #2 AND srcdb refseq[properties] → 5
#4: #2 AND srcdb ddbj/embl/genbank[properties] → 50
#5: #4 AND biomol genomic[prop] → 26 (genomic DNA)
#6: #4 AND biomol mrna[prop] → 24 (mRNA/cDNA)
Limit by Protein Name
thyroid peroxidase[protein name] AND human[orgn] AND gbdiv pri[prop] AND biomol mrna[prop]
Limiting "thyroid peroxidase" to [title] gives 24 records; limiting it to [protein name] gives 5 records.
Entrez Document Summaries
Click the accession to view the record. The Links menu lists links to other Entrez databases computed for M17755.
Viewing M17755
GenBank Sequences for Human TPO
Which one is the best sequence?
RefSeq: NCBI's Derivative Sequence Database
RefSeq benefits:
– non-redundant
– explicitly linked nucleotide and protein sequences
– updated to reflect current sequence data and biology
– validated by hand
– format consistency
– distinct accession series
– stewardship by NCBI staff and collaborators
RefSeq: NCBI's Derivative Sequence Database (nucleotide and protein accession prefixes)
– Curated transcripts and proteins: NM_ / NP_; NR_ (non-coding RNA)
– Model transcripts and proteins: XM_ / XP_; XR_ (non-coding RNA)
– Assembled genomic regions (contigs): NT_ (BAC clones); NW_ (WGS)
– Other genomic sequence: NG_ (complex regions, pseudogenes); NZ_ABCD (WGS), with proteins ZP_
– Chromosome records in Entrez Genome: NC_ (chromosome; microbial or organelle genome)
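Because each RefSeq class has its own prefix, the accession alone tells a script whether it holds a curated record (NM_/NP_) or a model that may be replaced with the next genome build (XM_/XP_). This sketch maps the prefixes listed above to short descriptions taken from the slide. The type and function names are hypothetical, and the example accession "XP_123" in the test is made up.

```c
#include <stddef.h>
#include <string.h>

struct refseq_prefix { const char *prefix; const char *molecule; const char *meaning; };

/* Prefixes and meanings as listed on the slide. */
static const struct refseq_prefix REFSEQ_PREFIXES[] = {
    {"NM_", "mRNA",    "curated transcript"},
    {"NR_", "RNA",     "curated non-coding RNA"},
    {"NP_", "protein", "curated protein"},
    {"XM_", "mRNA",    "model transcript"},
    {"XR_", "RNA",     "model non-coding RNA"},
    {"XP_", "protein", "model protein"},
    {"NT_", "genomic", "assembled contig (BAC clones)"},
    {"NW_", "genomic", "assembled contig (WGS)"},
    {"NG_", "genomic", "complex region or pseudogene"},
    {"NZ_", "genomic", "WGS"},
    {"ZP_", "protein", "protein (WGS)"},
    {"NC_", "genomic", "chromosome, or microbial or organelle genome"},
};

/* Return the entry whose 3-character prefix starts acc, or NULL
   (e.g. for a GenBank accession such as M17755). */
const struct refseq_prefix *classify_refseq(const char *acc)
{
    for (size_t i = 0; i < sizeof REFSEQ_PREFIXES / sizeof REFSEQ_PREFIXES[0]; i++)
        if (strncmp(acc, REFSEQ_PREFIXES[i].prefix, 3) == 0)
            return &REFSEQ_PREFIXES[i];
    return NULL;
}
```

A NULL result means the accession is not a RefSeq. For the model classes, the next slide's advice (BLAST the XM against RefSeq to find a replacement) applies.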
NM/NP Records in Entrez
NM_000547 (variant 1): COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from M and AW. On Feb 25, 2003 this sequence version replaced gi:
NM_175719 (variant 2): COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from J , AW and M . An EST completes the 3' end.
Annotating the Gene
[Figure: GenBank sequences are the basis for RefSeq. Genomic DNA (NC, NT, NW) is scanned to produce model mRNAs (XM, XR) and model proteins (XP), which are compared with curated mRNAs (NM, NR) and curated proteins (NP).]
The Perils of the XM
XM records are models based only on genomic sequence, and are subject to revision or removal with each new build of that genome. BLAST the XM against the RefSeq database to look for a replacement:
Query= gi| |ref|XM_ | Mus musculus expressed sequence AA (AA553001), mRNA
gi| |ref|NM_ | Mus musculus DNA segment, Chr 4, Wayne State University 114, expressed (D4Wsu114e), mRNA
Length=1898
Score = bits (1867), Expect = 0
Identities = 1870/1871 (99%), Gaps = 0/1871 (0%)
Strand=Plus/Plus
Entrez Gene and RefSeq
– Entrez Gene is the central repository for the gene information available at NCBI, and often provides links to sites beyond NCBI.
– Entrez Gene includes records for organisms that have NCBI Reference Sequences (RefSeqs).
– Entrez Gene records contain the RefSeq mRNAs, proteins, and genomic DNA (if known) for a gene locus, plus links to other Entrez databases.
– NCBI RefSeqs are based on primary sequence data in GenBank.
[Figure: GenBank, RefSeq, Gene, Nucleotide]
Entrez Gene: RefSeq Annotations
NM/NP Records in Entrez Gene
Entrez Gene: RefSeq Graphics (NM, NP)
Getting the Annotation Details
Genomic sequence: ACCESSION NC_ REGION:
Genome Annotation in Entrez Nucleotide
[Figure: GenBank components (clones, WGS) are the assembly components of NT/NW contigs; the contigs are the genome components of NC records; NM/XM mRNAs are annotated on them.]
Genome Annotation Links
– the curated mRNA NM_
– the genomic contig on chromosome 2 transcribing NM_
– human chromosome 2
– the 18 contigs of the chromosome 2 assembly
Searching Entrez Gene
– RefSeq status and variants (reviewed RefSeqs with transcript variants): srcdb refseq reviewed[prop] AND has transcript variants[prop]
– Gene symbol (human thyroid peroxidase, TPO): tpo[sym] AND human[organism]
– Disease and Gene Ontology (membrane proteins linked to cancer): integral to plasma membrane[gene ontology] AND cancer[dis]
– Chromosome and links (genes on human chromosome 2 with OMIM links): 2[chromosome] AND gene omim[filter] AND human[organism]
– Protein name (topoisomerase genes from Archaea): topoisomerase[gene/protein name] AND archaea[organism]
Third Party Annotation (TPA) Database
NCBI now accepts submissions of new annotations of existing GenBank sequences. Submissions must be published in a peer-reviewed journal. TPA facilitates the annotation of sequences by experts.
Examples of sequences appropriate for TPA:
– annotation of features on gene and/or mRNA sequences
– assembled "full-length" genes and/or mRNAs
What should not be submitted to TPA?
– synthetic constructs (such as cloning vectors) that use well-characterized, publicly available genes, promoters, or terminators
– updates or changes to existing sequence data
– sequence annotations without experimental evidence
Linking Protein Sequence, Structure, and Function
[Figure: A protein sequence links to Conserved Domains (CDD), which relate sequence to function (Pfam, SMART) or to structure + function (curated cd models), and to Structure (MMDB), where VAST relates structure to structure.]
Entrez Structure (MMDB: Molecular Modeling Database)
Derived from experimentally determined PDB records. MMDB adds value to PDB records by:
– adding explicit chemical bonding information
– validating and indexing the sequences
– annotating 3D domains and secondary structure
– adding links to CDD, Taxonomy, and PubMed
– converting PDB data to ASN.1
Structure neighbors are determined by the Vector Alignment Search Tool (VAST).
Structure Summary Page
Conserved Domains; VAST neighbors for chain C (domain 0); VAST neighbors for domain 2; Cn3D
Related Structures
VAST: Structure Neighbors
The Vector Alignment Search Tool locates the SSEs (secondary structure elements) in each 3D domain and represents them as individual vectors (example: human IL-4). VAST uses 3D domains only! Whole polypeptides are assigned 3D domain 0 (zero).
VAST Neighbors
1D2V, 1Q4G: neighbors are computed for 3D domains! (Cn3D)
Submitting a PDB File to VAST (New! Redesigned interface)
This is the best way to convert PDB into MMDB format!
Structure + Function
– VAST finds proteins that have similar 3D folds.
– CD-Search finds proteins that have similar sequences and similar functions.
– Curated CDs = VAST + CD-Search: proteins that have similar 3D folds, similar sequences, and similar functions.
Protein Links: Domains
Click on a colored bar to align your sequence to the CD.
CDD Record: Heme Peroxidases
The query is aligned to the domain (red = high conservation, blue = low conservation).
Curated CD Record: EGF
Annotated features; launch Cn3D; phylogenetic tree of aligned sequences; launch CDTree (new)
Entrez PubChem
– PC Substance: primary database of chemical samples
– PC Compound: derived database of known chemicals from PC Substance records
– PC BioAssay: primary database of bioactivity screens of samples in PC Substance
Links from Structure
N-acetylglucosamine, heme, mannose, fucose
Sequence Polymorphisms and Phenotypes: SNP and OMIM
SNP (general polymorphisms)
– primary database of submitted SNPs
– curated database of reference SNPs
– contains more than just SNPs: true SNPs, MNPs (multiple nucleotide polymorphisms), insertions, deletions, microsatellites, mixed, and no variation (constant)
OMIM (human phenotypes)
– clinical literature database, curated at Johns Hopkins University
– links human genes and genetic disorders to human disease
– lists allelic variants that have clinical consequences
Variations in SNP are not necessarily in OMIM, and vice versa!
Linking to SNP
Entrez Gene (TPO) links to SNP; links to SNP are also available from Nucleotide and Protein.
Entrez SNP
Primary data: ss# (submitted SNP)
SNP UID: rs# (reference SNP)
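Since the Entrez SNP UID is the reference SNP number, a script can turn an "rs" identifier into the UID expected by ESummary or EFetch by dropping the prefix. This sketch does that and rejects anything else, including ss# identifiers for submitted SNPs. The function name and the number 12345 in the usage example are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* Convert a reference SNP id such as "rs12345" (or "RS12345") to the
   numeric Entrez SNP UID 12345. Returns 0 for anything that is not an
   rs id: submitted-SNP ids ("ss..."), empty strings, or trailing junk. */
unsigned long rs_to_uid(const char *id)
{
    char *end;
    unsigned long uid;
    if (tolower((unsigned char)id[0]) != 'r' || tolower((unsigned char)id[1]) != 's'
        || !isdigit((unsigned char)id[2]))
        return 0;
    uid = strtoul(id + 2, &end, 10);
    return (*end == '\0') ? uid : 0;
}
```

rs_to_uid("rs12345") returns 12345, while rs_to_uid("ss12345") returns 0.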
Find Non-synonymous SNPs
#7 AND coding nonsynon[Function Class] (Function Class limit)
Non-synonymous TPO SNPs
Link to Map Viewer; view all SNPs in the locus; link to related 3D structures
GeneView in dbSNP
Links to OMIM
From Entrez Gene (TPO)
OMIM Record
Explore a Disease SNP: 799
Curated CD Record
Launch Cn3D; phylogenetic tree of aligned sequences; launch CDTree; Cn3D view: E799