NCBI Molecular Biology Resources A Field Guide August 2-3, 2005 University of Massachusetts
NCBI Resources The NCBI Entrez System NCBI Sequence Databases Primary data: GenBank Derivative data: RefSeq, Gene, Genome Beyond Refseq: UniGene, Trace Archive NCBI Genomic Resources ** Intermission ** BLAST Protein Structure and Function Sequence polymorphisms and phenotypes
The National Institutes of Health Bethesda, MD
The National Center for Biotechnology Information Created as a part of NLM in 1988 Establish public databases Perform research in computational biology Develop software tools for sequence analysis Disseminate biomedical information
Web Access Text Entrez Sequence BLAST Structure VAST
NCBI Web Traffic Users per day Christmas and New Year’s Day
The NCBI ftp site 30,000 files per day 620 Gigabytes per day
What does NCBI do? NCBI accepts submissions of primary data. NCBI develops tools to analyze these data. NCBI uses these tools to create derivative databases based on the primary data. NCBI provides free search, link, and retrieval of these data, primarily through the Entrez system.
Types of Databases. Primary Databases: original submissions by experimentalists; content controlled by the submitter; examples: GenBank, SNP, GEO, PubChem Substance. Derivative Databases: built from primary data; content controlled by third party (NCBI); examples: RefSeq, TPA, RefSNP, UniGene, Protein, Structure, Conserved Domain, PubChem Compound. Primary databases serve as repositories of sequences submitted by experimentalists (GenBank). Derivative databases are sources of edited/curated sequences (RefSeq: reference sequences; UniGene: genes compared to genetic loci on genomes).
Primary vs. Derivative Databases [Figure: sequencing centers and labs submit sequences to GenBank (divisions EST, STS, GSS, HTG, INV, VRT, PHG, VRL, PRI, ROD, PLN, MAM, BCT), which is updated ONLY by submitters. Algorithms, the annotation pipeline, the LocusLink and Genomes pipelines, and curators produce UniGene, UniSTS, and RefSeq, which are updated continually by NCBI.]
What is Entrez? A system of 29 linked databases A text search engine A tool for finding biologically linked data A retrieval engine A virtual workspace for manipulating large datasets
The Entrez System: Text Searches
Entrez Databases Each record is assigned a UID unique integer identifier for internal tracking GI number for Nucleotide Each record is given a Document Summary a summary of the record’s content (DocSum) Each record is assigned links to biologically related UIDs Each record is indexed by data fields [author], [title], [organism], and many others
Entrez Taxonomy The backbone of NCBI [organism]
An Entrez Database - Nucleotide GenBank: Primary Data (97.9%) original submissions by experimentalists submitters retain editorial control of records archival in nature RefSeq: Derivative Data (2.1%) curated by NCBI staff NCBI retains editorial control of records record content is updated continually
Entrez Nucleotide
Primary Data
  DDBJ / EMBL / GenBank     56,865,268
Derivative Data
  RefSeq                     1,226,084
  PDB                            5,973
  Third Party Annotation         4,650
Total                       58,101,975
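The 97.9% / 2.1% split quoted for Entrez Nucleotide can be re-derived from the counts on this slide. The short Python sketch below only does that arithmetic; the dictionary keys are labels chosen for illustration.

```python
# Record counts copied from the Entrez Nucleotide slide.
counts = {
    "DDBJ/EMBL/GenBank": 56_865_268,
    "RefSeq": 1_226_084,
    "PDB": 5_973,
    "Third Party Annotation": 4_650,
}

total = sum(counts.values())

# Share of each source as a percentage of all Entrez Nucleotide records.
share = {name: 100 * n / total for name, n in counts.items()}

for name, pct in share.items():
    print(f"{name:<24}{counts[name]:>12,}  {pct:6.2f}%")
print(f"{'Total':<24}{total:>12,}")
```

Rounded to one decimal place, the primary data come out at 97.9% and RefSeq at 2.1%, matching the figures quoted for the Nucleotide database.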
What is GenBank? NCBI’s Primary Sequence Database: a nucleotide-only sequence database; archival in nature; each record is assigned a stable accession number. GenBank Data: direct submissions (traditional records); batch submissions (EST, GSS, STS); ftp accounts (genome data). Three collaborating databases: GenBank; DNA Database of Japan (DDBJ); European Molecular Biology Laboratory (EMBL) Database.
The International Sequence Database Collaboration [Figure: three databases that each accept submissions and updates: GenBank (NCBI, NIH; submissions via Sequin, BankIt, and ftp; access via Entrez), EMBL (EBI; access via SRS), and DDBJ (CIB, NIG; access via getentry).]
GenBank Releases ftp://ftp.ncbi.nih.gov/genbank/ Release 148, June 2005: 45,236,251 records; 49,398,852,122 nucleotides; >140,000 species; 172 gigabytes; 785 files. Full release every two months; incremental and cumulative updates daily; available only through the internet. GenBank is treated like a software product, with releases (full updates) every ~2 months. Originally it was distributed on CDs, but it eventually became much too large to fit, so an FTP site was set up to provide access to continually updated files.
The Growth of GenBank Release 148: 45.2 million records 49.4 billion nucleotides Average doubling time ≈ 14 months* Doubling time is currently less than 1 year and still accelerating.
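A doubling time T means the database size follows N(t) = N0 * 2^(t/T). The sketch below applies that formula to the Release 148 nucleotide count with the 14-month average quoted on the slide. It is an illustrative extrapolation only: the function name is made up for this example, and since the slide notes the doubling time is shortening, a fixed T would understate later growth.

```python
def project_size(current: float, doubling_months: float, months_ahead: float) -> float:
    """Exponential growth with a fixed doubling time: N(t) = N0 * 2 ** (t / T)."""
    return current * 2 ** (months_ahead / doubling_months)


nucleotides_r148 = 49_398_852_122  # Release 148, June 2005

# One and two doubling periods ahead, assuming a constant 14-month doubling time.
after_14_months = project_size(nucleotides_r148, 14, 14)
after_28_months = project_size(nucleotides_r148, 14, 28)

print(f"{after_14_months:,.0f} nucleotides after 14 months")
print(f"{after_28_months:,.0f} nucleotides after 28 months")
```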
GenBank Divisions
Traditional divisions: PRI (28) Primate; ROD (14) Rodent; PLN (13) Plant and Fungal; BCT (10) Bacterial/Archaeal; INV (7) Invertebrate; VRT (7) Other Vertebrate; VRL (4) Viral; MAM (2) Mammalian; PHG (1) Phage; SYN (1) Synthetic; UNA (1) Unannotated.
Traditional records: direct submissions (Sequin/BankIt); accurate (~1 error per 10,000 bp); well characterized; organized by taxonomy.
Bulk divisions: EST (349) Expressed Sequence Tag; GSS (120) Genome Survey Sequence; HTG (62) High Throughput Genomic; HTC (6) High Throughput cDNA; STS (5) Sequence Tagged Site.
Bulk records: from sequencing projects; batch submissions (ftp/email); inaccurate; poorly characterized; organized by sequence type.
A Traditional GenBank Record LOCUS AY182241 1931 bp mRNA linear PLN 04-MAY-2004 DEFINITION Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA, complete cds. ACCESSION AY182241 VERSION AY182241.2 GI:32265057 KEYWORDS . SOURCE Malus x domestica (cultivated apple) ORGANISM Malus x domestica Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus. REFERENCE 1 (bases 1 to 1931) AUTHORS Pechous,S.W. and Whitaker,B.D. TITLE Cloning and functional expression of an (E,E)-alpha-farnesene synthase cDNA from peel tissue of apple fruit JOURNAL Planta 219, 84-94 (2004) REFERENCE 2 (bases 1 to 1931) TITLE Direct Submission JOURNAL Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab, USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD 20705, USA REFERENCE 3 (bases 1 to 1931) JOURNAL Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab, REMARK Sequence update by submitter COMMENT On Jun 26, 2003 this sequence version replaced gi:27804758. 
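The LOCUS line packs several of the header facts into one row: locus name, length, molecule type, topology, division, and date. The following is a minimal Python sketch that pulls those fields out of the example line. It splits on whitespace, which is a simplification that works for this line and is not a full flat file parser; the `Locus` type and `parse_locus` function are names invented for this illustration.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    length_bp: int
    molecule: str
    topology: str
    division: str
    date: str


def parse_locus(line: str) -> Locus:
    """Split a LOCUS line of the form shown on the slide into named fields."""
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS line: {line!r}")
    _, name, length, _, molecule, topology, division, date = tokens
    return Locus(name, int(length), molecule, topology, division, date)


locus = parse_locus("LOCUS AY182241 1931 bp mRNA linear PLN 04-MAY-2004")
print(locus)
```

The division code (PLN here) is the same three-letter code used in the list of traditional GenBank divisions.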
FEATURES Location/Qualifiers source 1..1931 /organism="Malus x domestica" /mol_type="mRNA" /cultivar="'Law Rome'" /db_xref="taxon:3750" /tissue_type="peel" gene 1..1931 /gene="AFS1" CDS 54..1784 /note="terpene synthase" /codon_start=1 /product="(E,E)-alpha-farnesene synthase" /protein_id="AAO22848.2" /db_xref="GI:32265058" /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE NHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSNLSRDVVHS LELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWW ANLGIADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGS EEELKHFTNAVDRWDSRETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLT KVWADFCKALLVEAEWYNKSHIPTLEEYLRNGCISSSVSVLLVHSFFSITHEGTKEMA DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI LSLLFQPLVN" ORIGIN 1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat 61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg 121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt 181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga 241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt 1801 aataaatagc agcaaaagtt tgcggttcag ttcgtcatgg ataaattaat ctttacagtt 1861 tgtaacgttg ttgccaaaga ttatgaataa aaagttgtag tttgtcgttt aaaaaaaaaa 1921 aaaaaaaaaa a // Header The Flatfile Format Feature Table Sequence
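Feature locations such as `CDS 54..1784` are 1-based and inclusive, so the span length is end - start + 1. The sketch below works that out for the AFS1 coding region and derives the protein length it implies, assuming the complete CDS ends in a stop codon that is not translated. The helper name is illustrative, and only simple `start..end` locations are handled.

```python
def span_length(location: str) -> int:
    """Length in bases of a simple 1-based, inclusive 'start..end' location."""
    start, end = (int(part) for part in location.split(".."))
    if start > end:
        raise ValueError(f"start after end in {location!r}")
    return end - start + 1


cds_bp = span_length("54..1784")

# A complete coding region should be a whole number of codons.
codons, remainder = divmod(cds_bp, 3)

# Assumption: the final codon is a stop codon and contributes no residue.
protein_aa = codons - 1

print(cds_bp, codons, remainder, protein_aa)
```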
An Example Record – M17755: Indexing for Nucleotide UID 4680720
Field                  Indexed Terms
[primary accession]    M17755
[title]                Homo sapiens thyroid peroxidase (TPO) mRNA…
[organism]             Homo sapiens
[sequence length]      3060
[modification date]    1999/04/26
[properties]           biomol mrna; gbdiv pri; srcdb genbank
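One way to picture field indexing is as an inverted index: each (field, term) pair maps to the set of UIDs whose records carry that term. The toy Python model below loads the M17755 terms from this slide and answers a two-field query by set intersection. It is only a sketch of the idea, not a description of how Entrez is implemented; all function names are hypothetical.

```python
from collections import defaultdict

# (field, term) -> set of UIDs indexed under that term.
index = defaultdict(set)


def add_record(uid: int, fields: dict) -> None:
    """Index one record under every term of every field (case-insensitive)."""
    for field, terms in fields.items():
        for term in terms:
            index[(field, term.lower())].add(uid)


def search(field: str, term: str) -> set:
    """Return the UIDs indexed under term[field]; empty set if none."""
    return index.get((field, term.lower()), set())


add_record(4680720, {
    "primary accession": ["M17755"],
    "organism": ["Homo sapiens"],
    "sequence length": ["3060"],
    "modification date": ["1999/04/26"],
    "properties": ["biomol mrna", "gbdiv pri", "srcdb genbank"],
})

# AND between two field-limited terms is a set intersection.
hits = search("organism", "homo sapiens") & search("properties", "gbdiv pri")
print(hits)
```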
M17755: Feature Table TPO [gene name] CDS position in bp thyroiditis [text word] thyroid peroxidase [protein name] protein accession
Sequence: 99.99% Accurate The sequence itself is not indexed… Use BLAST for that!
Entrez Protein
GenPept (DDBJ, EMBL, GenBank)    4,444,405
RefSeq                           1,753,167
PIR                                222,395
Swiss Prot                         189,005
PDB                                 68,621
PRF                                 12,079
Third Party Annotation               4,219
Total                            6,693,891
Protein Sources and Links: PIR (no mRNA!); RefSeq (NM_000547); SWISS-PROT (no mRNA!); GenPept (M17755)
Sequence Revisions. First seen at NCBI, not first seen at GenBank! Version and GI change only if the sequence changes. The accession number always retrieves the most recent version.
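The accession/version rule can be sketched in a few lines of Python: an identifier such as AY182241.2 splits into a stable accession and a version number, and looking up the bare accession returns the highest version. The GI values come from the apple AFS1 record shown earlier; pairing the replaced GI 27804758 with version 1 is an assumption made for this illustration, and the function names are hypothetical.

```python
def split_accession(accession_version: str):
    """Split 'AY182241.2' into ('AY182241', 2); version is None if absent."""
    accession, dot, version = accession_version.partition(".")
    return accession, (int(version) if dot else None)


# (accession.version, GI) pairs. Version 1's GI is assumed to be the one
# the record's COMMENT says was replaced.
records = [
    ("AY182241.1", 27804758),
    ("AY182241.2", 32265057),
]


def latest(accession: str, records):
    """Return the (accession.version, GI) pair with the highest version."""
    candidates = [
        (split_accession(acc_ver)[1], acc_ver, gi)
        for acc_ver, gi in records
        if split_accession(acc_ver)[0] == accession
    ]
    _, acc_ver, gi = max(candidates)
    return acc_ver, gi


print(latest("AY182241", records))
```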
Update without a Sequence Change June 15, 1989! GenBank came to NCBI in 1992!
Update with a Sequence Change
GenBank File Formats ASN.1 – The Raw Data flat file XML (4 flavors) FASTA
NCBI Toolbox
Toolbox Sources
ftp> open ftp.ncbi.nih.gov
.
ftp> cd toolbox
ftp> cd ncbi_tools
ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools
Excerpt from asn2ff.c:
/************************************************************************
*
* asn2ff.c
* convert an ASN.1 entry to flat file format, using the FFPrintArray.
**************************************************************************/
#include <accentr.h>
#include "asn2ff.h"
#include "asn2ffp.h"
#include "ffprint.h"
#include <subutil.h>
#include <objall.h>
#include <objcode.h>
#include <lsqfetch.h>
#include <explore.h>
#ifdef ENABLE_ID1
#include <accid1.h>
#endif
FILE *fpl;
Args myargs[] = {
  {"Filename for asn.1 input","stdin",NULL,NULL,TRUE,'a',ARG_FILE_IN,0.0,0,NULL},
  {"Input is a Seq-entry","F", NULL ,NULL ,TRUE,'e',ARG_BOOLEAN,0.0,0,NULL},
  {"Input asnfile in binary mode","F",NULL,NULL,TRUE,'b',ARG_BOOLEAN,0.0,0,NULL},
  {"Output Filename","stdout", NULL,NULL,TRUE,'o',ARG_FILE_OUT,0.0,0,NULL},
  {"Show Sequence?","T", NULL ,NULL ,TRUE,'h',ARG_BOOLEAN,0.0,0,NULL},
Text Searches in Entrez. Query form: term1 term2. If no [limit] is specified, each term is tested in turn: Organism? then [organism]. Journal? then [journal]. User compounds? then search as phrase. Author? then [author]. Else [All Fields]. Explicit form: term1[limit] OP term2[limit] OP …, where limit = Entrez indexing field (organism, author, …) and OP = AND, OR, NOT.
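The explicit `term[limit] OP term[limit]` form is easy to assemble programmatically. The following is a small Python sketch with two helper functions, both hypothetical names for this example; it only builds the query string and does not contact Entrez.

```python
def limit(term: str, field: str = "") -> str:
    """Attach an Entrez field limit: limit('human', 'orgn') -> 'human[orgn]'."""
    return f"{term}[{field}]" if field else term


def combine(op: str, *clauses: str) -> str:
    """Join clauses with one of the Boolean operators AND, OR, NOT."""
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {op}")
    return f" {op} ".join(clauses)


query = combine(
    "AND",
    limit("thyroid peroxidase", "title"),
    limit("human", "orgn"),
)
print(query)  # thyroid peroxidase[title] AND human[orgn]
```

A term passed without a field is left bare, in which case Entrez applies the automatic field matching described on the slide.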
Entrez Tabs Limits Provides a simple form for applying commonly used Entrez limits Allows access to the full indexing of each Entrez database and aids in constructing complex queries Preview/Index History Provides access to previous searches in the current Entrez database Clipboard A temporary storage area for selected records Details Displays the detailed parsing of the current Entrez query, and lists errors and terms without matches
Programming Entrez: E-Utilities http://www.ncbi.nih.gov/entrez/query/static/eutils_help.html
Utility     Input                  Output
ESearch     Entrez query           UID list or History
ESummary    UID list or History    Document summaries
EFetch      UID list or History    Formatted data
ELink       UID list or History    UID list or History
EPost       UID list               History
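As a hedged illustration of the ESearch step (Entrez query in, UID list or History out), the Python sketch below builds a request URL with the standard library and stops short of sending it. The base URL and the parameter names `db`, `term`, `retmax`, and `usehistory` are taken from the current public E-utilities documentation rather than from this presentation, so treat them as assumptions and check them against the help page above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base URL (not given in the slides); verify against the E-utilities help.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, use_history: bool = True, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request URL for an Entrez query."""
    params = {"db": db, "term": term, "retmax": retmax}
    if use_history:
        # Ask the server to keep the result set on the History server.
        params["usehistory"] = "y"
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = esearch_url("nucleotide", "thyroid peroxidase[title] AND human[orgn]")
print(url)
```

Keeping results on the History server is what lets a later ESummary, EFetch, or ELink call refer to the result set instead of resending a long UID list.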
Finding Primary Sequences Search Entrez Nucleotide 97.9% GenBank (primary data) 2.1% RefSeq (curated data) Possible queries we’ve seen so far… M17755 [primary accession] TPO [gene name] thyroid peroxidase [title] thyroiditis [text word] Homo sapiens [organism] thyroid peroxidase [protein name] 3060 [sequence length] 1999/04/26 [modification date] biomol mrna [properties] gbdiv pri [properties] srcdb genbank [properties]
A Starting Query: find nucleotide records for human thyroid peroxidase. Query: human thyroid peroxidase, parsed as (("Homo sapiens"[Organism] OR human[All Fields]) AND thyroid peroxidase[All Fields]): 309 records. Field Limit! Query: human[organism] AND thyroid peroxidase, parsed as ("Homo sapiens"[Organism] AND thyroid peroxidase[All Fields]): 298 records. 11 records aren’t human sequences!!
Limit by Title and Database Entrez Nucleotide GenBank srcdb ddbj/embl/genbank[properties] RefSeq srcdb refseq[properties] #1: thyroid peroxidase AND human[orgn] 298 #2: thyroid peroxidase[title] AND human[orgn] 169 #3: #2 AND srcdb refseq[properties] 5 #4: #2 AND srcdb ddbj/embl/genbank[properties] 164 primary data
Limit by GenBank Division EST Division gbdiv est[prop] Primate Division gbdiv pri[prop] #1: thyroid peroxidase AND human[orgn] 298 #2: thyroid peroxidase[title] AND human[orgn] 169 #3: #2 AND srcdb refseq[properties] 5 #4: #2 AND srcdb ddbj/embl/genbank[properties] 164 #5: #4 AND gbdiv est[prop] 20 #6: #4 AND gbdiv pri[prop] 144 (traditional GenBank records)
Limit by Biomolecule Type Genomic DNA biomol genomic[prop] cDNA biomol mrna[prop] #1: thyroid peroxidase AND human[orgn] 298 #2: thyroid peroxidase[title] AND human[orgn] 169 #3: #2 AND srcdb refseq[properties] 5 #4: #2 AND srcdb ddbj/embl/genbank[properties] 164 #5: #4 AND gbdiv est[prop] 20 #6: #4 AND gbdiv pri[prop] 144 #7: #6 AND biomol genomic[prop] 26 (genomic DNA) #8: #6 AND biomol mrna[prop] 118 (mRNA / cDNA)
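Each History entry that refers to an earlier `#n` is shorthand for a longer query. The Python sketch below expands those references recursively, taking #6 as `#4 AND gbdiv pri[prop]` as on the division slide, and checks that the record counts partition as expected at each split. The `history` dictionary and `expand` function are illustrative names only.

```python
import re

# History entries from the slides, keyed by search number.
history = {
    2: "thyroid peroxidase[title] AND human[orgn]",
    4: "#2 AND srcdb ddbj/embl/genbank[properties]",
    6: "#4 AND gbdiv pri[prop]",
    8: "#6 AND biomol mrna[prop]",
}


def expand(query: str) -> str:
    """Replace every #n reference with its parenthesized, fully expanded query."""
    return re.sub(
        r"#(\d+)",
        lambda m: "(" + expand(history[int(m.group(1))]) + ")",
        query,
    )


full_query = expand(history[8])
print(full_query)

# Record counts from the slides: each split partitions its parent set.
assert 5 + 164 == 169    # RefSeq + GenBank = title-limited set (#3 + #4 = #2)
assert 20 + 144 == 164   # EST + PRI divisions = GenBank set (#5 + #6 = #4)
assert 26 + 118 == 144   # genomic + mRNA = PRI set (#7 + #8 = #6)
```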
Limit by Protein Name: thyroid peroxidase[protein name] AND human[orgn] AND gbdiv pri[prop] AND biomol mrna[prop]. 118 records with [title]; 4 records with [protein name].
Entrez Document Summaries Links menu Click the accession to view the record Links to other Entrez databases computed for M17755
Entrez Links for GI 4680720 Gene annotation based on M17755 Full text online articles about M17755 All polymorphisms in the TPO gene DNA/RNA sequences similar to M17755 Graphical view of TPO gene annotation Human phenotypes involving TPO Microarray datasets for M17755 Protein translation of M17755 Literature abstracts about M17755 Sequence polymorphisms in M17755 Source organism of M17755 STS markers in the TPO gene TPO links beyond NCBI
Viewing M17755
GenBank Sequences for Human TPO Which one is the best sequence???
RefSeq: NCBI’s Derivative Sequence Database RefSeq Benefits Non-redundant Explicitly linked nucleotide and protein sequences Updated to reflect current sequence data and biology Validated by hand Format consistency Distinct accession series Stewardship by NCBI staff and collaborators ftp://ftp.ncbi.nih.gov/refseq/release
RefSeq: NCBI’s Derivative Sequence Database
Curated transcripts and proteins: NM_123456 (nucleotide), NP_123456 (protein), NR_123456 (non-coding RNA)
Model transcripts and proteins: XM_123456 (nucleotide), XP_123456 (protein), XR_123456 (non-coding RNA)
Assembled Genomic Regions (contigs): NT_123456 (BAC clones), NW_123456 (WGS)
Other Genomic Sequence: NG_123456 (complex regions, pseudogenes), NZ_ABCD12345678 (WGS), ZP_123456
Chromosome records in Entrez Genome: NC_123456 (chromosome; microbial or organelle genome)
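Because RefSeq uses a distinct accession series, the two-letter prefix alone tells you what kind of record you are looking at. A minimal Python lookup built from the categories on this slide might look like the following; the description strings and the function name are wording chosen for this example.

```python
# Prefix -> record type, following the categories listed on the slide.
REFSEQ_PREFIXES = {
    "NM": "curated transcript",
    "NP": "curated protein",
    "NR": "curated non-coding RNA",
    "XM": "model transcript",
    "XP": "model protein",
    "XR": "model non-coding RNA",
    "NT": "assembled genomic contig (BAC clones)",
    "NW": "assembled genomic contig (WGS)",
    "NG": "other genomic sequence (complex regions, pseudogenes)",
    "NZ": "other genomic sequence (WGS)",
    "ZP": "listed under other genomic sequence",
    "NC": "chromosome; microbial or organelle genome",
}


def classify(accession: str) -> str:
    """Describe an accession by its RefSeq prefix; any .version suffix is ignored."""
    prefix, underscore, _ = accession.partition("_")
    if not underscore:
        return "not a RefSeq-style accession"
    return REFSEQ_PREFIXES.get(prefix.upper(), "unknown RefSeq prefix")


for acc in ("NM_000547", "XM_496543", "NC_000002", "M17755"):
    print(acc, "->", classify(acc))
```

GenBank accessions such as M17755 have no underscore, which is a quick way to tell primary records from RefSeq records in a mixed result list.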
Creating NM Records Genome annotation Longest mRNA NMs must have cDNA support
NM/NP Records in Entrez NM_000547: variant 1 COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from M17755.2 and AW874082.1. On Feb 25, 2003 this sequence version replaced gi:21361188. EST that completes 3’ end NM_175719: variant 2 COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from J02970.1, AW874082.1 and M17755.2. Nucleotide Protein
Annotating the Gene [Figure: RefSeq and GenBank sequences are scanned against genomic DNA (NC, NT, NW) to place model mRNA (XM, XR) and model protein (XP), or curated mRNA (NM, NR) and curated protein (NP).]
Entrez Gene and RefSeq (Gene, GenBank, RefSeq, Nucleotide). Entrez Gene is the central repository for information about a gene available at NCBI, and often provides links to sites beyond NCBI. Entrez Gene includes records for organisms that have NCBI Reference Sequences (RefSeqs). Entrez Gene records contain RefSeq mRNAs, proteins, and genomic DNA (if known) for a gene locus, plus links to other Entrez databases. NCBI RefSeqs are based on primary sequence data in GenBank.
Entrez Gene: RefSeq Annotations
NM/NP Records in Entrez Gene
Entrez Gene RefSeq Graphics NM NP
What about LOC440844? Entrez Gene
BLAST Results for XM_496543 Is there any GenBank support for this mRNA? srcdb ddbj/embl/genbank[prop] AND biomol mrna[prop] no full-length hit
The Perils of the XM XM records are models based only on genomic sequence, and are subject to revision or removal with each new build of that genome. BLAST the XM against the RefSeq database to look for a replacement: Query= gi|20850420|ref|XM_124429.1| Mus musculus expressed sequence AA553001 (AA553001), mRNA gi|19527087|ref|NM_133873.1| Mus musculus DNA segment, Chr 4, Wayne State University 114, expressed (D4Wsu114e), mRNA Length=1898 Score = 3701.55 bits (1867), Expect = 0 Identities = 1870/1871 (99%), Gaps = 0/1871 (0%) Strand=Plus/Plus
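When screening many XM records this way, the identity line of each BLAST hit is the part worth extracting. The Python sketch below parses the `Identities = matches/aligned` field from the alignment summary shown on the slide and computes the exact percent identity. The regular expression is written for this text layout only and is not a general BLAST report parser.

```python
import re

# Alignment summary copied from the slide.
summary = (
    "Score = 3701.55 bits (1867), Expect = 0 "
    "Identities = 1870/1871 (99%), Gaps = 0/1871 (0%) Strand=Plus/Plus"
)

match = re.search(r"Identities = (\d+)/(\d+)", summary)
if match is None:
    raise ValueError("no Identities field found")

identical, aligned = int(match.group(1)), int(match.group(2))
mismatches = aligned - identical
percent_identity = 100 * identical / aligned

print(f"{identical}/{aligned} identical, {mismatches} mismatch, "
      f"{percent_identity:.2f}% identity")
```

The computed value is just under 100%, while the report prints 99%, so the integer in parentheses should not be read as a rounded figure.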
Eukaryotic NM/XM Records Bos taurus: 37541 Oryza sativa (japonica cultivar-group): 36836 Danio rerio: 30577 Homo sapiens: 29261 Arabidopsis thaliana: 28953 Mus musculus: 27033 Rattus norvegicus: 23975 Pan troglodytes: 21810 Caenorhabditis elegans: 21124 Drosophila melanogaster: 19412 Aspergillus nidulans FGSC A4: 18951 Gallus gallus: 18120 Canis familiaris: 16891 Anopheles gambiae str. PEST: 15328 Plasmodium chabaudi: 14747 Candida albicans SC5314: 13672 Dictyostelium discoideum: 13570 Ustilago maydis 521: 13044 Plasmodium berghei: 11778 Gibberella zeae PH-1: 11640 Magnaporthe grisea 70-15: 11109 Neurospora crassa: 10079 Aspergillus fumigatus Af293: 9923 Entamoeba histolytica HM-1:IMSS: 9772 Cryptococcus neoformans var. neoformans JEC21: 6594 Giardia lamblia ATCC 50803: 6569 Yarrowia lipolytica CLIB99: 6521 Debaryomyces hansenii CBS767: 6318 Apis mellifera: 6292 Kluyveromyces lactis NRRL Y-1140: 5327 Candida glabrata CBS138: 5181 Schizosaccharomyces pombe 972h-: 5035 Eremothecium gossypii: 4718 Theileria parva: 4079 Xenopus tropicalis: 4069 Cryptosporidium hominis: 3886 Cryptosporidium parvum: 3396 Sus scrofa: 938 Trypanosoma brucei: 599 Ovis aries: 253 Strongylocentrotus purpuratus: 215 Felis catus: 162 Plasmodium yoelii yoelii: 105 Takifugu rubripes: 7 Ciona intestinalis: 3 Trypanosoma cruzi: 3
Genome Annotation in Entrez Nucleotide GenBank Components (clones, WGS) NT/NW Contigs NC Genome Assembly NM/XM Master mRNA Components Components
Genome Annotation Links curated mRNA genomic contig on human chromosome 2 containing NM_000547 human chromosome 2 the 21 contigs of the chromosome 2 assembly
Getting the Annotation Details Genomic sequence ACCESSION NC_000002 REGION: 1396242..1525502
Getting the Annotation Details ACCESSION NC_000002 REGION: 1396242..1525502 exon-intron structure These flat files contain all annotations in the gene and the full, explicit sequence
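A REGION line gives the slice of the chromosome record being displayed, using 1-based inclusive coordinates. The Python sketch below parses the line for the TPO region of NC_000002, computes its length, and converts a chromosome coordinate to a position within the slice. The conversion assumes the regional flat file renumbers the sequence from 1, which is worth confirming against an actual record; the function names are illustrative.

```python
import re


def parse_region(text: str):
    """Return (start, end, length) from a 'REGION: start..end' annotation."""
    m = re.search(r"REGION:\s*(\d+)\.\.(\d+)", text)
    if m is None:
        raise ValueError(f"no REGION found in {text!r}")
    start, end = int(m.group(1)), int(m.group(2))
    return start, end, end - start + 1


def to_region_coordinate(chromosome_pos: int, region_start: int) -> int:
    """Map a chromosome position to a 1-based position within the region.

    Assumes the regional record numbers its first base as 1.
    """
    return chromosome_pos - region_start + 1


start, end, length = parse_region("ACCESSION NC_000002 REGION: 1396242..1525502")
print(start, end, length)
print(to_region_coordinate(end, start))  # last base of the region
```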
Searching Entrez Gene Gene symbol: human thyroid peroxidase (TPO) tpo [sym] AND human [organism] Protein name: topoisomerase genes from Archaea topoisomerase[gene/protein name] AND archaea [organism] Chromosome and Links: genes on human chromosome 2 with OMIM links 2 [chromosome] AND gene omim [filter] AND human [organism] RefSeq status and variants: Reviewed RefSeqs with transcript variants srcdb refseq reviewed[prop] AND has transcript variants[prop] Disease and Gene Ontology: Membrane proteins linked to cancer integral to plasma membrane[gene ontology] AND cancer [dis]
Gene Links in Entrez Microarray datasets for TPO Gene homologs for TPO DNA and RNA sequences for TPO Phenotypes involving TPO Protein sequences for TPO Literature abstracts about TPO Sequence polymorphisms in TPO Species whose genome has this TPO gene STS markers in the TPO gene ESTs aligned to the TPO gene
Third Party Annotation (TPA) Database NCBI now accepts the submission of new annotations of existing GenBank sequences. Submissions must be published in a peer-reviewed journal. Facilitates the annotation of sequences by experts. Examples of sequences appropriate for TPA are: Annotation of features on gene and/or mRNA sequences Assembled “full length” genes and/or mRNAs What should not be submitted to TPA? Synthetic constructs (such as cloning vectors) that use well-characterized, publicly available genes, promoters, or terminators Updates or changes to existing sequence data Sequence annotations without experimental evidence
Beyond RefSeq If your organism does not have RefSeqs… UniGene : gene-based clusters of cDNAs and ESTs WGS sequences in Entrez Nucleotide (wgs[prop]) Trace Archive
What is UniGene? A gene-oriented view of sequence entries. MegaBLAST-based automated sequence clustering. New: now informed by genome hits. Nonredundant set of gene-oriented clusters. Each cluster represents a unique gene. Information on tissue types and map locations. Includes known genes and uncharacterized ESTs. Useful for gene discovery and selection of mapping reagents. Clusters of ESTs are built automatically by sequence similarity.
Organisms in UniGene Top Ten 1. Human 2. Rice 3. Mouse 4. Cow 5. Wheat 6. Zebrafish 7. Pig 8. Chicken 9. Frog (X. laevis) 10. Frog (X. tropicalis)
Finding UniGene Clusters by link by Entrez search
UniGene Cluster for TPO
Entrez GEO and Entrez GEO Datasets. Record types: GPL: platform descriptions. GSM: raw/processed spot intensities from a single slide/chip. GSE: grouping of slide/chip data, “a single experiment”. GDS: grouping of experiments. Sources: submitted by experimentalists; submitted by manufacturer*; curated by NCBI.
Linking to GEO
GEO Datasets
Whole Genome Shotgun Projects (Traditional GenBank Divisions). 300+ projects: Viruses; Bacteria; Environmental sequences; Archaea; 73 Eukaryotes, featuring: Cow, Chicken, Rat, Mouse, Dog, Chimpanzee, Human; Pufferfish (2), Zebrafish; Honeybee, Anopheles, Fruit Flies (4), Silkworm; Nematode (C. briggsae); Yeasts (9), Aspergillus (3); Rice. WGS is a preliminary route to a whole genome. WGS sequences go into the traditional GenBank divisions.
Trace Archive
Short-tailed opossum traces
Viewing Simple Genomes All are RefSeq NC records in Entrez Genome Full chromosomal sequences are provided Genes are annotated The annotation can be shown graphically and linked to sequence records
mutL
Viewing Complex Genomes NCBI Map Viewer Map Viewer Home Page Shows all supported organisms Provides links to genomic BLAST Genome Overview Page Provides links to individual chromosomes Shows hits on a genome graphically Chromosome Viewing Page Allows interactive views of annotation details Provides numerous maps unique to each genome
Map Viewer Home Page
Genome Overview Page Search the maps Genomic BLAST Species-specific help!
Chromosome Viewing Page Map Summary Add or remove maps Master Map with exploded content Genes UniGene Contigs Zooming Controls Ideogram
Map Summary TPO’s contig!
Map Content (map content varies greatly by species!) Sequence Maps: Core assembly; Annotation evidence; Clones & Markers; Polymorphisms; Links & Features. Genetic Maps: Cytogenetic maps; Linkage maps; Radiation hybrid maps. Maps: Assembly, Contig, Component, Transcript, Gene.
View the Assembly near TPO
Assembly of Chr. 2 NT_033000 1255072 1563756
Assembly of Chromosome 2
Zooming
View of TPO Links to Entrez Nucleotide Links to Entrez Gene Links to Tools and Data Gap in assembly
Map Content (map content varies greatly by species!) Sequence Maps: Core assembly; Annotation evidence; Clones & Markers; Polymorphisms; Links & Features. Genetic Maps: Cytogenetic maps; Linkage maps; Radiation hybrid maps. Maps: Ab initio (model), GenBank DNA, EST, UniGene, Gene.
Annotation Evidence: GenBank records not used in assembly; UniGene Clusters; Ab initio models; Aligned ESTs
Entrez Homologene Homologs by protein BLAST