1
NCBI Molecular Biology Resources
A Field Guide
August 2-3, 2005
University of Massachusetts
2
NCBI Resources
The NCBI Entrez System
NCBI Sequence Databases
  Primary data: GenBank
  Derivative data: RefSeq, Gene, Genome
  Beyond RefSeq: UniGene, Trace Archive
NCBI Genomic Resources
** Intermission **
BLAST
Protein Structure and Function
Sequence polymorphisms and phenotypes
3
The National Institutes of Health
Bethesda, MD
4
The National Center for Biotechnology Information
Created as a part of NLM in 1988
  Establish public databases
  Perform research in computational biology
  Develop software tools for sequence analysis
  Disseminate biomedical information
5
Web Access
  Text: Entrez
  Sequence: BLAST
  Structure: VAST
6
NCBI Web Traffic
[Chart: users per day, with Christmas and New Year's Day marked]
7
The NCBI ftp site
  30,000 files per day
  620 Gigabytes per day
8
What does NCBI do?
  NCBI accepts submissions of primary data
  NCBI develops tools to analyze these data
  NCBI uses these tools to create derivative databases based on the primary data
  NCBI provides free search, link, and retrieval of these data, primarily through the Entrez system
9
Types of Databases
Primary Databases
  Original submissions by experimentalists
  Content controlled by the submitter
  Examples: GenBank, SNP, GEO, PubChem Substance
Derivative Databases
  Built from primary data
  Content controlled by third party (NCBI)
  Examples: RefSeq, TPA, RefSNP, UniGene, Protein, Structure, Conserved Domain, PubChem Compound
Primary databases serve as a repository of experimentalist sequences (GenBank). Derivative databases are sources of edited/curated sequences (RefSeq: reference sequences; UniGene: genes compared to genetic loci on genomes).
10
Primary vs. Derivative Databases
[Diagram: sequencing centers and labs submit sequences to GenBank (divisions EST, STS, GSS, HTG, INV, VRT, PHG, VRL, PRI, ROD, PLN, MAM, BCT), which is updated ONLY by submitters. Algorithms and curators build UniGene, UniSTS and RefSeq (Annotation Pipeline; LocusLink and Genomes Pipelines), which are updated continually by NCBI.]
11
What is Entrez?
  A system of 29 linked databases
  A text search engine
  A tool for finding biologically linked data
  A retrieval engine
  A virtual workspace for manipulating large datasets
12
The Entrez System: Text Searches
13
Entrez Databases
Each record is assigned a UID
  unique integer identifier for internal tracking
  GI number for Nucleotide
Each record is given a Document Summary
  a summary of the record’s content (DocSum)
Each record is assigned links to biologically related UIDs
Each record is indexed by data fields
  [author], [title], [organism], and many others
14
Entrez Taxonomy The backbone of NCBI [organism]
15
An Entrez Database - Nucleotide
GenBank: Primary Data (97.9%)
  original submissions by experimentalists
  submitters retain editorial control of records
  archival in nature
RefSeq: Derivative Data (2.1%)
  curated by NCBI staff
  NCBI retains editorial control of records
  record content is updated continually
16
Entrez Nucleotide
Primary Data
  DDBJ / EMBL / GenBank     56,865,268
Derivative Data
  RefSeq                    ,226,084
  PDB                       ,973
  Third Party Annotation    ,650
Total                       ,101,975
17
What is GenBank? NCBI’s Primary Sequence Database
  Nucleotide only sequence database
  Archival in nature
  Each record is assigned a stable accession number
GenBank Data
  Direct submissions (traditional records)
  Batch submissions (EST, GSS, STS)
  ftp accounts (genome data)
Three collaborating databases
  GenBank
  DNA Database of Japan (DDBJ)
  European Molecular Biology Laboratory (EMBL) Database
18
The International Sequence Database Collaboration
[Diagram: GenBank (NCBI, NIH), EMBL (EBI) and DDBJ (CIB, NIG) each receive submissions and updates and exchange them with one another. Tool labels: Entrez, Sequin, BankIt, ftp, SRS, getentry.]
19
ftp://ftp.ncbi.nih.gov/genbank/
GenBank Releases
Release 148, June 2005
  45,236,251 Records
  49,398,852,122 Nucleotides
  >140,000 Species
  172 Gigabytes files
full release every two months
incremental and cumulative updates daily
available only through internet
GenBank, as a product, is treated like a software product with releases (full updates) every ~2 months. Originally it was put out on CDs, but it eventually became much too large to fit, so an FTP site was set up to provide access to continually updated files.
20
The Growth of GenBank
Release 148: 45.2 million records, 49.4 billion nucleotides
Average doubling time ≈ 14 months*
The doubling time is currently less than 1 year, and growth is still accelerating.
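As a rough worked example of what a 14-month doubling time implies, the sketch below projects the nucleotide count under simple exponential growth. The function name `project_size` and the assumption of a constant doubling time are illustrative only; the slide itself notes that the doubling time is shrinking.

```python
def project_size(current: float, months_ahead: float,
                 doubling_months: float = 14.0) -> float:
    """Project a size `months_ahead` months from now.

    Assumes constant exponential growth: size doubles every `doubling_months`.
    """
    return current * 2 ** (months_ahead / doubling_months)


# Nucleotide count reported for GenBank Release 148 (June 2005).
release_148_nucleotides = 49_398_852_122

for months in (14, 28):
    projected = project_size(release_148_nucleotides, months)
    print(f"{months} months ahead: {projected / 1e9:.1f} billion nucleotides")
```

With a doubling time under one year, as the slide suggests, passing a smaller `doubling_months` gives a correspondingly steeper projection.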
21
GenBank Divisions

Traditional
  PRI (28) Primate
  ROD (14) Rodent
  PLN (13) Plant and Fungal
  BCT (10) Bacterial/Archaeal
  INV (7) Invertebrate
  VRT (7) Other Vertebrate
  VRL (4) Viral
  MAM (2) Mammalian
  PHG (1) Phage
  SYN (1) Synthetic
  UNA (1) Unannotated
  Direct Submissions (Sequin/BankIt)
  Accurate (~1 error per 10,000 bp)
  Well characterized
  Organized by taxonomy

Bulk
  EST (349) Expressed Sequence Tag
  GSS (120) Genome Survey Sequence
  HTG (62) High Throughput Genomic
  HTC (6) High Throughput cDNA
  STS (5) Sequence Tagged Site
  From sequencing projects
  Batch submissions (ftp/ )
  Inaccurate
  Poorly characterized
  Organized by sequence type
22
A Traditional GenBank Record
The Flatfile Format

Header
LOCUS       AY bp mRNA linear PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY GI:
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;
            eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1 (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, (2004)
REFERENCE   2 (bases 1 to 1931)
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD 20705, USA
REFERENCE   3 (bases 1 to 1931)
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:

Feature Table
FEATURES             Location/Qualifiers
     source
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene
                     /gene="AFS1"
     CDS
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO "
                     /db_xref="GI: "
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     NHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSNLSRDVVHS
                     LELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWW
                     ANLGIADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGS
                     EEELKHFTNAVDRWDSRETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLT
                     KVWADFCKALLVEAEWYNKSHIPTLEEYLRNGCISSSVSVLLVHSFFSITHEGTKEMA
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"

Sequence
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
     1801 aataaatagc agcaaaagtt tgcggttcag ttcgtcatgg ataaattaat ctttacagtt
     1861 tgtaacgttg ttgccaaaga ttatgaataa aaagttgtag tttgtcgttt aaaaaaaaaa
     1921 aaaaaaaaaa a
//
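The three parts of the flat file (header, feature table, sequence) can be separated by looking for the `FEATURES` and `ORIGIN` keywords and the `//` terminator. The following is a minimal sketch, not NCBI's own parser; the function names `split_flatfile` and `sequence_string` and the toy record are hypothetical.

```python
def split_flatfile(record: str) -> dict:
    """Split one flat-file record into header, features and sequence lines."""
    sections = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in record.splitlines():
        if line.startswith("FEATURES"):
            current = "features"      # feature table starts here
        elif line.startswith("ORIGIN"):
            current = "sequence"      # sequence block starts here
        elif line.startswith("//"):
            break                     # end-of-record marker
        sections[current].append(line)
    return sections


def sequence_string(sequence_lines: list) -> str:
    """Join sequence lines into one string, dropping ORIGIN and position numbers."""
    bases = []
    for line in sequence_lines[1:]:          # skip the ORIGIN keyword line
        bases.extend(line.split()[1:])       # first token is the position number
    return "".join(bases)


# A toy record (not a real GenBank entry) in the same layout.
toy_record = (
    "LOCUS       TOY1\n"
    "ACCESSION   TOY1\n"
    "FEATURES             Location/Qualifiers\n"
    "     gene            1..12\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)

parts = split_flatfile(toy_record)
print(len(parts["header"]), len(parts["features"]))
print(sequence_string(parts["sequence"]))
```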
23
An Example Record – M17755
Indexing for Nucleotide UID

Field                   Indexed Terms
[primary accession]     M17755
[title]                 Homo sapiens thyroid peroxidase (TPO) mRNA…
[organism]              Homo sapiens
[sequence length]       3060
[modification date]     1999/04/26
[properties]            biomol mrna
                        gbdiv pri
                        srcdb genbank
24
M17755: Feature Table
  TPO [gene name]
  thyroiditis [text word]
  thyroid peroxidase [protein name]
  Callouts: CDS position in bp; protein accession
25
Sequence: 99.99% Accurate The sequence itself is not indexed…
Use BLAST for that!
26
Entrez Protein
  GenPept (DDBJ, EMBL, GenBank)    4,444,405
  RefSeq                           ,753,167
  PIR                              ,395
  Swiss-Prot                       ,005
  PDB                              ,621
  PRF                              ,079
  Third Party Annotation           ,219
  Total                            ,693,891
27
Protein Sources and Links
PIR: no mRNA!
RefSeq: NM_000547
SWISS-PROT: no mRNA!
GenPept: M17755
28
Sequence Revisions
  First seen at NCBI, not first seen at GenBank!
  Version and GI change only if the sequence changes
  The accession number always retrieves the most recent version
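A small sketch of how the accession and version parts of an identifier can be told apart, assuming the usual `accession.version` form. The function name `split_accession` and the identifier `AB000001.2` are hypothetical examples, not records discussed in the slides.

```python
def split_accession(identifier: str):
    """Split 'ACCESSION.VERSION' into (accession, version).

    A bare accession (no '.version' suffix) returns version None, which
    corresponds to asking for the most recent version of the record.
    """
    accession, _, version = identifier.partition(".")
    return accession, int(version) if version else None


print(split_accession("AB000001.2"))  # hypothetical versioned identifier
print(split_accession("M17755"))      # bare accession: latest version
```

Because the version changes only when the sequence changes, two identifiers with the same accession and the same version refer to the same sequence, even if other parts of the record were updated in between.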
29
Update without a Sequence Change
June 15, 1989! GenBank came to NCBI in 1992!
30
Update with a Sequence Change
31
GenBank File Formats
  ASN.1 – The Raw Data
  flat file
  XML (4 flavors)
  FASTA
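Of these formats, FASTA is the simplest to read programmatically: a `>` definition line followed by sequence lines. A minimal sketch follows; the function name `read_fasta` and the toy records are hypothetical.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into {definition line: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            name = line[1:]               # definition line without '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


toy_fasta = ">seq1 toy example\nACGT\nACGT\n>seq2\nTTGA\n"
sequences = read_fasta(toy_fasta)
print(sequences)
```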
32
NCBI Toolbox

Toolbox Sources
ftp> open ftp.ncbi.nih.gov
.
ftp> cd toolbox
ftp> cd ncbi_tools
ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools

Excerpt from asn2ff.c:

/************************************************************************
*
* asn2ff.c
* convert an ASN.1 entry to flat file format, using the FFPrintArray.
**************************************************************************/
#include <accentr.h>
#include "asn2ff.h"
#include "asn2ffp.h"
#include "ffprint.h"
#include <subutil.h>
#include <objall.h>
#include <objcode.h>
#include <lsqfetch.h>
#include <explore.h>
#ifdef ENABLE_ID1
#include <accid1.h>
#endif

FILE *fpl;

Args myargs[] = {
  {"Filename for asn.1 input","stdin",NULL,NULL,TRUE,'a',ARG_FILE_IN,0.0,0,NULL},
  {"Input is a Seq-entry","F", NULL ,NULL ,TRUE,'e',ARG_BOOLEAN,0.0,0,NULL},
  {"Input asnfile in binary mode","F",NULL,NULL,TRUE,'b',ARG_BOOLEAN,0.0,0,NULL},
  {"Output Filename","stdout", NULL,NULL,TRUE,'o',ARG_FILE_OUT,0.0,0,NULL},
  {"Show Sequence?","T", NULL ,NULL ,TRUE,'h',ARG_BOOLEAN,0.0,0,NULL},
33
Text Searches in Entrez
term1 term2
If no [limit] is specified…
  Organism? [organism]
  Journal? [journal]
  User compounds? search as phrase
  Author? [author]
  else [All Fields]
term1[limit] OP term2[limit] OP …
  where limit = Entrez indexing field (organism, author, …)
        OP = AND, OR, NOT
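The `term[limit] OP term[limit]` pattern can be assembled as plain strings. The sketch below is only an illustration of the syntax described on this slide; the helper names `limit` and `combine` are hypothetical and are not part of any NCBI library.

```python
VALID_OPERATORS = {"AND", "OR", "NOT"}


def limit(term: str, field: str = "") -> str:
    """Attach an Entrez field limit to a term, e.g. human[organism]."""
    return f"{term}[{field}]" if field else term


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with a Boolean operator (AND, OR, NOT)."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(clauses)


query = combine("AND",
                limit("human", "organism"),
                limit("thyroid peroxidase", "title"))
print(query)
```

A term passed without a field is left bare, so Entrez applies the automatic interpretation listed above.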
34
Entrez Tabs
  Limits: Provides a simple form for applying commonly used Entrez limits
  Preview/Index: Allows access to the full indexing of each Entrez database and aids in constructing complex queries
  History: Provides access to previous searches in the current Entrez database
  Clipboard: A temporary storage area for selected records
  Details: Displays the detailed parsing of the current Entrez query, and lists errors and terms without matches
35
Programming Entrez: E-Utilities
Utility     Input                  Output
ESearch     Entrez query           UID list or History
ESummary    UID list or History    Document summaries
EFetch      UID list or History    Formatted data
ELink       UID list or History    UID list or History
EPost       UID list               History
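Each E-utility is called as a URL with query parameters. The sketch below only builds the URLs and makes no network request. The base URL and the parameter names (`db`, `term`, `usehistory`, `id`, `rettype`, `retmode`) are assumptions taken from NCBI's current public E-utilities documentation, not from these 2005 slides, and the helper `eutils_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the E-utilities service (not stated in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility: str, **params: str) -> str:
    """Build the URL for one E-utility call, e.g. esearch or efetch."""
    return f"{EUTILS_BASE}{utility}.fcgi?{urlencode(params)}"


# ESearch: Entrez query in, UID list (or History) out.
search_url = eutils_url(
    "esearch",
    db="nucleotide",
    term="thyroid peroxidase[title] AND human[organism]",
    usehistory="y",
)

# EFetch: UID or accession in, formatted data out (here a GenBank flat file).
fetch_url = eutils_url(
    "efetch", db="nucleotide", id="M17755", rettype="gb", retmode="text"
)

print(search_url)
print(fetch_url)
```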
36
Finding Primary Sequences
Search Entrez Nucleotide
  97.9% GenBank (primary data)
  2.1% RefSeq (curated data)
Possible queries we’ve seen so far…
  M17755 [primary accession]
  TPO [gene name]
  thyroid peroxidase [title]
  thyroiditis [text word]
  Homo sapiens [organism]
  thyroid peroxidase [protein name]
  3060 [sequence length]
  1999/04/26 [modification date]
  biomol mrna [properties]
  gbdiv pri [properties]
  srcdb genbank [properties]
37
A Starting Query
Find nucleotide records for human thyroid peroxidase

human thyroid peroxidase    (309 records)
  (("Homo sapiens"[Organism] OR human[All Fields]) AND thyroid peroxidase[All Fields])

Field Limit!
human[organism] AND thyroid peroxidase    (298 records)
  ("Homo sapiens"[Organism] AND thyroid peroxidase[All Fields])

11 records aren’t human sequences!!
38
Limit by Title and Database
Entrez Nucleotide
  GenBank: srcdb ddbj/embl/genbank[properties]
  RefSeq: srcdb refseq[properties]

#1: thyroid peroxidase AND human[orgn]
#2: thyroid peroxidase[title] AND human[orgn]
#3: #2 AND srcdb refseq[properties]                 5
#4: #2 AND srcdb ddbj/embl/genbank[properties]      164 (primary data)
39
Limit by GenBank Division
  EST Division: gbdiv est[prop]
  Primate Division: gbdiv pri[prop]

#1: thyroid peroxidase AND human[orgn]
#2: thyroid peroxidase[title] AND human[orgn]
#3: #2 AND srcdb refseq[properties]                 5
#4: #2 AND srcdb ddbj/embl/genbank[properties]      164
#5: #4 AND gbdiv est[prop]
#6: #4 AND gbdiv pri[prop]                          (traditional GenBank records)
40
Limit by Biomolecule Type
  Genomic DNA: biomol genomic[prop]
  cDNA: biomol mrna[prop]

#1: thyroid peroxidase AND human[orgn]
#2: thyroid peroxidase[title] AND human[orgn]
#3: #2 AND srcdb refseq[properties]                 5
#4: #2 AND srcdb ddbj/embl/genbank[properties]      164
#5: #2 AND gbdiv est[prop]
#6: #2 AND gbdiv pri[prop]
#7: #6 AND biomol genomic[prop]                     (genomic DNA)
#8: #6 AND biomol mrna[prop]                        (mRNA / cDNA)
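History references such as `#6` stand for the full text of an earlier search. To see what a numbered search means once every reference is resolved, the references can be substituted recursively. This is a sketch of the idea only; the function `expand` and the dictionary layout are hypothetical, and Entrez resolves history numbers itself on the server.

```python
import re


def expand(history: dict, number: int) -> str:
    """Replace every '#n' in a history entry with the parenthesized query it names."""
    def substitute(match: re.Match) -> str:
        return "(" + expand(history, int(match.group(1))) + ")"

    return re.sub(r"#(\d+)", substitute, history[number])


# Entries taken from the search history listed on this slide.
history = {
    2: "thyroid peroxidase[title] AND human[orgn]",
    6: "#2 AND gbdiv pri[prop]",
    8: "#6 AND biomol mrna[prop]",
}

print(expand(history, 8))
```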
41
Limit by Protein Name
thyroid peroxidase[protein name] AND human[orgn] AND gbdiv pri[prop] AND biomol mrna[prop]
  118 records with [title]
  4 records with [protein name]
42
Entrez Document Summaries
Links menu Click the accession to view the record Links to other Entrez databases computed for M17755
43
Entrez Links for GI 4680720
  Gene annotation based on M17755
  Full text online articles about M17755
  All polymorphisms in the TPO gene
  DNA/RNA sequences similar to M17755
  Graphical view of TPO gene annotation
  Human phenotypes involving TPO
  Microarray datasets for M17755
  Protein translation of M17755
  Literature abstracts about M17755
  Sequence polymorphisms in M17755
  Source organism of M17755
  STS markers in the TPO gene
  TPO links beyond NCBI
44
Viewing M17755
45
GenBank Sequences for Human TPO
Which one is the best sequence???
46
RefSeq: NCBI’s Derivative Sequence Database
RefSeq Benefits
  Non-redundant
  Explicitly linked nucleotide and protein sequences
  Updated to reflect current sequence data and biology
  Validated by hand
  Format consistency
  Distinct accession series
  Stewardship by NCBI staff and collaborators
ftp://ftp.ncbi.nih.gov/refseq/release
47
RefSeq: NCBI’s Derivative Sequence Database
                                         Nucleotide                              Protein
Curated transcripts and proteins         NM_                                     NP_123456
                                         NR_ (non-coding RNA)
Model transcripts and proteins           XM_                                     XP_123456
                                         XR_ (non-coding RNA)
Assembled Genomic Regions (contigs)      NT_ (BAC clones)
                                         NW_ (WGS)
Other Genomic Sequence                   NG_ (complex regions, pseudogenes)
                                         NZ_ABCD (WGS)                           ZP_123456
Chromosome records in Entrez Genome      NC_ (chromosome; microbial or organelle genome)
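Because each RefSeq series has its own two-letter prefix, the kind of record can be read directly off the accession. The lookup below is a sketch built only from the prefixes listed on this slide; the names `REFSEQ_PREFIXES` and `classify_refseq` are hypothetical.

```python
# Prefix meanings as listed on the slide.
REFSEQ_PREFIXES = {
    "NM": "curated mRNA",
    "NP": "curated protein",
    "NR": "curated non-coding RNA",
    "XM": "model mRNA",
    "XP": "model protein",
    "XR": "model non-coding RNA",
    "NT": "assembled genomic contig (BAC clones)",
    "NW": "assembled genomic contig (WGS)",
    "NG": "other genomic sequence (complex regions, pseudogenes)",
    "NZ": "other genomic sequence (WGS)",
    "ZP": "protein paired with an NZ_ (WGS) record",
    "NC": "chromosome; microbial or organelle genome",
}


def classify_refseq(accession: str):
    """Return the slide's description for a RefSeq accession, or None."""
    prefix, underscore, _ = accession.partition("_")
    if not underscore:
        return None  # no underscore: not a RefSeq-style accession
    return REFSEQ_PREFIXES.get(prefix.upper())


for accession in ("NM_000547", "XM_496543", "M17755"):
    print(accession, "->", classify_refseq(accession))
```

The underscore is what distinguishes a RefSeq accession from a GenBank one such as M17755, which is why the bare GenBank accession returns `None`.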
48
Creating NM Records
  Genome annotation
  Longest mRNA
  NMs must have cDNA support
49
NM/NP Records in Entrez
NM_000547: variant 1
  COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from M and AW
  On Feb 25, 2003 this sequence version replaced gi:
  Callout: EST that completes 3’ end
NM_175719: variant 2
  COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from J , AW and M
Links: Nucleotide, Protein
50
Annotating the Gene
[Diagram: genomic DNA (NC, NT, NW) is scanned to produce model mRNA (XM, XR) and model protein (XP). Models are compared (= ! = ?) with GenBank sequences, leading to curated mRNA (NM, NR) and curated protein (NP) in RefSeq.]
51
Entrez Gene and RefSeq
[Diagram labels: Gene, GenBank, RefSeq, Nucleotide]
  Entrez Gene is the central depository for information about a gene available at NCBI, and often provides links to sites beyond NCBI
  Entrez Gene includes records for organisms that have NCBI Reference Sequences (RefSeqs)
  Entrez Gene records contain RefSeq mRNAs, proteins, and genomic DNA (if known) for a gene locus, plus links to other Entrez databases
  NCBI RefSeqs are based on primary sequence data in GenBank
52
Entrez Gene: RefSeq Annotations
53
NM/NP Records in Entrez Gene
54
Entrez Gene RefSeq Graphics
NM NP
55
What about LOC440844? Entrez Gene
56
BLAST Results for XM_496543
Is there any GenBank support for this mRNA?
  srcdb ddbj/embl/genbank[prop] AND biomol mrna[prop]
  no full-length hit
57
The Perils of the XM
XM records are models based only on genomic sequence, and are subject to revision or removal with each new build of that genome.
BLAST the XM against the RefSeq database to look for a replacement:

Query= gi| |ref|XM_ | Mus musculus expressed sequence AA (AA553001), mRNA
gi| |ref|NM_ | Mus musculus DNA segment, Chr 4, Wayne State University 114, expressed (D4Wsu114e), mRNA
Length=1898
Score = bits (1867), Expect = 0
Identities = 1870/1871 (99%), Gaps = 0/1871 (0%)
Strand=Plus/Plus
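The `Identities` line shows how close the XM model is to the candidate NM replacement. As a small worked example, the sketch below pulls the two counts out of that line and computes the percentage. The function name `parse_identities` is hypothetical, and the remark that the report shows a whole-number percentage rounded down is an inference from this one line (1870/1871 printed as 99%).

```python
import re


def parse_identities(line: str):
    """Extract (matches, aligned_length) from a BLAST 'Identities = a/b' line."""
    found = re.search(r"Identities\s*=\s*(\d+)/(\d+)", line)
    if found is None:
        raise ValueError("no Identities field in line")
    return int(found.group(1)), int(found.group(2))


report_line = "Identities = 1870/1871 (99%), Gaps = 0/1871 (0%)"
matches, aligned = parse_identities(report_line)

exact_percent = 100 * matches / aligned    # fractional percentage
whole_percent = 100 * matches // aligned   # whole number, rounded down

print(f"{matches} of {aligned} positions match")
print(f"exact: {exact_percent:.2f}%  whole-number: {whole_percent}%")
```

One mismatch in 1871 aligned bases is consistent with the NM record being the curated replacement for the XM model.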
58
Eukaryotic NM/XM Records
Bos taurus:
Oryza sativa (japonica cultivar-group):
Danio rerio:
Homo sapiens:
Arabidopsis thaliana:
Mus musculus:
Rattus norvegicus:
Pan troglodytes:
Caenorhabditis elegans:
Drosophila melanogaster:
Aspergillus nidulans FGSC A4:
Gallus gallus:
Canis familiaris:
Anopheles gambiae str. PEST:
Plasmodium chabaudi:
Candida albicans SC5314:
Dictyostelium discoideum:
Ustilago maydis 521:
Plasmodium berghei:
Gibberella zeae PH-1:
Magnaporthe grisea 70-15:
Neurospora crassa:
Aspergillus fumigatus Af293:
Entamoeba histolytica HM-1:IMSS:
Cryptococcus neoformans var. neoformans JEC21: 6594
Giardia lamblia ATCC 50803:
Yarrowia lipolytica CLIB99:
Debaryomyces hansenii CBS767:
Apis mellifera:
Kluyveromyces lactis NRRL Y-1140: 5327
Candida glabrata CBS138:
Schizosaccharomyces pombe 972h-: 5035
Eremothecium gossypii:
Theileria parva:
Xenopus tropicalis:
Cryptosporidium hominis:
Cryptosporidium parvum:
Sus scrofa:
Trypanosoma brucei:
Ovis aries:
Strongylocentrotus purpuratus:
Felis catus:
Plasmodium yoelii yoelii:
Takifugu rubripes:
Ciona intestinalis:
Trypanosoma cruzi:
59
Genome Annotation in Entrez Nucleotide
GenBank Components (clones, WGS) NT/NW Contigs NC Genome Assembly NM/XM Master mRNA Components Components
60
Genome Annotation Links
  curated mRNA
  genomic contig on human chromosome 2 containing NM_000547
  human chromosome 2
  the 21 contigs of the chromosome 2 assembly
61
Getting the Annotation Details
Genomic sequence ACCESSION NC_ REGION:
62
Getting the Annotation Details
ACCESSION NC_ REGION:
exon-intron structure
These flat files contain all annotations in the gene and the full, explicit sequence
63
Searching Entrez Gene
Gene symbol: human thyroid peroxidase (TPO)
  tpo [sym] AND human [organism]
Protein name: topoisomerase genes from Archaea
  topoisomerase[gene/protein name] AND archaea [organism]
Chromosome and Links: genes on human chromosome 2 with OMIM links
  2 [chromosome] AND gene omim [filter] AND human [organism]
RefSeq status and variants: Reviewed RefSeqs with transcript variants
  srcdb refseq reviewed[prop] AND has transcript variants[prop]
Disease and Gene Ontology: Membrane proteins linked to cancer
  integral to plasma membrane[gene ontology] AND cancer [dis]
64
Gene Links in Entrez
  Microarray datasets for TPO
  Gene homologs for TPO
  DNA and RNA sequences for TPO
  Phenotypes involving TPO
  Protein sequences for TPO
  Literature abstracts about TPO
  Sequence polymorphisms in TPO
  Species whose genome has this TPO gene
  STS markers in the TPO gene
  ESTs aligned to the TPO gene
65
Third Party Annotation (TPA) Database
NCBI now accepts the submission of new annotations of existing GenBank sequences.
Submissions must be published in a peer-reviewed journal.
Facilitates the annotation of sequences by experts.
Examples of sequences appropriate for TPA are:
  Annotation of features on gene and/or mRNA sequences
  Assembled “full length” genes and/or mRNAs
What should not be submitted to TPA?
  Synthetic constructs (such as cloning vectors) that use well-characterized, publicly available genes, promoters, or terminators
  Updates or changes to existing sequence data
  Sequence annotations without experimental evidence
66
Beyond RefSeq
If your organism does not have RefSeqs…
  UniGene: gene-based clusters of cDNAs and ESTs
  WGS sequences in Entrez Nucleotide (wgs[prop])
  Trace Archive
67
What is UniGene?
A gene-oriented view of sequence entries
  MegaBlast based automated sequence clustering
  New! Now informed by genome hits
Nonredundant set of gene oriented clusters
  Each cluster a unique gene
  Information on tissue types and map locations
  Includes known genes and uncharacterized ESTs
  Useful for gene discovery and selection of mapping reagents
Clusters of ESTs based on automatic similarity. Each cluster represents a gene.
68
Organisms in UniGene
Top Ten
  1. Human
  2. Rice
  3. Mouse
  4. Cow
  5. Wheat
  6. Zebrafish
  7. Pig
  8. Chicken
  9. Frog (X. laevis)
  10. Frog (X. tropicalis)
69
Finding UniGene Clusters
by link by Entrez search
70
UniGene Cluster for TPO
71
Entrez GEO and Entrez GEO Datasets
  GPL: Platform descriptions
  GSM: Raw/processed spot intensities from a single slide/chip
  GSE: Grouping of slide/chip data, “a single experiment”
  GDS: Grouping of experiments
Diagram labels: Submitted by Experimentalists; Submitted by Manufacturer*; Curated by NCBI
72
Linking to GEO
73
GEO Datasets
74
Whole Genome Shotgun Projects
Traditional GenBank Divisions
300+ projects
  Viruses
  Bacteria
  Environmental sequences
  Archaea
  73 Eukaryotes featuring:
    Cow, Chicken, Rat, Mouse, Dog, Chimpanzee, Human
    Pufferfish (2), Zebrafish
    Honeybee, Anopheles, Fruit Flies (4), Silkworm
    Nematode (C. briggsae)
    Yeasts (9), Aspergillus (3)
    Rice
WGS is a preliminary route to a whole genome. WGS sequences go into the traditional GenBank divisions.
75
Trace Archive
76
Short-tailed opossum traces
77
Viewing Simple Genomes
  All are RefSeq NC records in Entrez Genome
  Full chromosomal sequences are provided
  Genes are annotated
  The annotation can be shown graphically and linked to sequence records
79
mutL
80
Viewing Complex Genomes
NCBI Map Viewer
Map Viewer Home Page
  Shows all supported organisms
  Provides links to genomic BLAST
Genome Overview Page
  Provides links to individual chromosomes
  Shows hits on a genome graphically
Chromosome Viewing Page
  Allows interactive views of annotation details
  Provides numerous maps unique to each genome
81
Map Viewer Home Page
82
Genome Overview Page
  Search the maps
  Genomic BLAST
  Species-specific help!
83
Chromosome Viewing Page
  Map Summary
  Add or remove maps
  Master Map with exploded content
  Genes
  UniGene
  Contigs
  Zooming Controls
  Ideogram
84
Map Summary TPO’s contig!
85
Map Content
Map content varies greatly by species!
Sequence Maps
  Core assembly
  Annotation evidence
  Clones & Markers
  Polymorphisms
  Links & Features
Genetic Maps
  Cytogenetic maps
  Linkage maps
  Radiation hybrid maps
Maps shown (core assembly): Assembly, Contig, Component, Transcript, Gene
86
View the Assembly near TPO
87
Assembly of Chr. 2 NT_033000
88
Assembly of Chromosome 2
89
Zooming
90
View of TPO
  Links to Entrez Nucleotide
  Links to Entrez Gene
  Links to Tools and Data
  Gap in assembly
91
Map Content
Map content varies greatly by species!
Sequence Maps
  Core assembly
  Annotation evidence
  Clones & Markers
  Polymorphisms
  Links & Features
Genetic Maps
  Cytogenetic maps
  Linkage maps
  Radiation hybrid maps
Maps shown (annotation evidence): Ab initio (model), GenBank DNA, EST, UniGene, Gene
92
Annotation Evidence
  GenBank records not used in assembly
  UniGene Clusters
  Ab initio models
  Aligned ESTs
93
Entrez HomoloGene
Homologs by protein BLAST