The Reference Sequence database: a non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa

1 The Reference Sequence database: a non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa. The collection includes sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq represents a single, naturally occurring molecule from one organism. RefSeq biological sequences (also known as RefSeqs) are derived from GenBank records but differ in that each RefSeq is a synthesis of information, not an archived unit of primary research data. Similar to a review article in the literature, a RefSeq represents the consolidation of information by a particular group at a particular time.

2 The RefSeq accession number format and molecule types

| Accession prefix | Molecule type | Comment |
|---|---|---|
| AC_ | Genomic | Complete genomic molecule, alternate assembly |
| NC_ | Genomic | Complete genomic molecule, reference assembly |
| NG_ | Genomic | Incomplete genomic region |
| NT_ | Genomic | Contig or scaffold, clone-based or WGS (a) |
| NW_ | Genomic | Contig or scaffold, primarily WGS (a) |
| NS_ | Genomic | Environmental sequence |
| NZ_ (b) | Genomic | Unfinished WGS |
| NM_ | mRNA | |
| NR_ | RNA | |
| XM_ (c) | mRNA | Predicted model |
| XR_ (c) | RNA | Predicted model |
| AP_ | Protein | Annotated on AC_ alternate assembly |
| NP_ | Protein | |
| YP_ (c) | Protein | |
| XP_ (c) | Protein | Predicted model |
| ZP_ (c) | Protein | Predicted model, annotated on NZ_ genomic records |

(a) Whole Genome Shotgun sequence data. (b) An ordered collection of WGS for a genome. (c) Computed.
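The prefix table on this slide can be turned into a small lookup. The following is a minimal sketch, not an NCBI tool: the function name `classify_refseq` and the regular expression are illustrative assumptions, and the assumed accession shape (two-letter prefix, underscore, identifier, optional `.version`) is a simplification.

```python
import re

# Prefix -> (molecule type, comment), transcribed from the slide's table.
REFSEQ_PREFIXES = {
    "AC": ("Genomic", "Complete genomic molecule, alternate assembly"),
    "NC": ("Genomic", "Complete genomic molecule, reference assembly"),
    "NG": ("Genomic", "Incomplete genomic region"),
    "NT": ("Genomic", "Contig or scaffold, clone-based or WGS"),
    "NW": ("Genomic", "Contig or scaffold, primarily WGS"),
    "NS": ("Genomic", "Environmental sequence"),
    "NZ": ("Genomic", "Unfinished WGS"),
    "NM": ("mRNA", ""),
    "NR": ("RNA", ""),
    "XM": ("mRNA", "Predicted model"),
    "XR": ("RNA", "Predicted model"),
    "AP": ("Protein", "Annotated on AC_ alternate assembly"),
    "NP": ("Protein", ""),
    "YP": ("Protein", ""),
    "XP": ("Protein", "Predicted model"),
    "ZP": ("Protein", "Predicted model, annotated on NZ_ genomic records"),
}

# Assumed accession shape: PREFIX_identifier with an optional ".version".
_ACCESSION = re.compile(r"^([A-Z]{2})_([A-Z0-9]+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return prefix details for a RefSeq-style accession, or None."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        return None  # no underscore-separated prefix, so not RefSeq-style
    prefix, _identifier, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        return None  # prefix is not listed in the table
    molecule_type, comment = REFSEQ_PREFIXES[prefix]
    return {
        "prefix": prefix + "_",
        "molecule_type": molecule_type,
        "comment": comment,
        "version": int(version) if version is not None else None,
    }
```

For example, `classify_refseq("NM_000546.5")` reports an mRNA record with version 5, while a GenBank-style accession such as `M11313` has no underscore-separated prefix and yields `None`.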

3 Flat File Format and Annotated Features: RefSeq records appear similar in format to the GenBank records from which they are derived.

4 Features of a RefSeq record

5 RefSeq records may also be displayed in a graphical format

6 RefSeq status codes

| Code | Description |
|---|---|
| GENOME ANNOTATION | The RefSeq record is provided via automated processing and is not subject to individual review or revision between builds. |
| INFERRED | The RefSeq record has been predicted by genome sequence analysis, but it is not yet supported by experimental evidence. The record may be partially supported by homology data. |
| PREDICTED | The RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted. |
| PROVISIONAL | The RefSeq record has not yet been subject to individual review. The initial sequence-to-gene name associations have been established by outside collaborators or NCBI staff. |
| REVIEWED | The RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information. |
| VALIDATED | The RefSeq record has undergone an initial review to provide the preferred sequence standard. The record has not yet been subject to final review, at which time additional functional information may be provided. |
| WGS | The RefSeq record is provided to represent a collection of whole genome shotgun sequences. These records are not subject to individual review or revisions between genome updates. |
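As a rough illustration of how these status codes might be read programmatically, the sketch below looks for one of the seven codes at the start of a record's COMMENT text. It assumes the comment begins with the code followed by the word `REFSEQ` (for example `REVIEWED REFSEQ: ...`); that layout and the function name `status_from_comment` are assumptions for this example, not something stated on the slide.

```python
# The seven status codes listed on this slide.
STATUS_CODES = (
    "GENOME ANNOTATION",
    "INFERRED",
    "PREDICTED",
    "PROVISIONAL",
    "REVIEWED",
    "VALIDATED",
    "WGS",
)


def status_from_comment(comment):
    """Return the status code that opens a COMMENT block, or None.

    Assumes the comment text starts with "<CODE> REFSEQ".
    """
    # Collapse line breaks and repeated spaces from the flat file layout.
    text = " ".join(comment.split()).upper()
    for code in STATUS_CODES:
        if text.startswith(code + " REFSEQ"):
            return code
    return None
```

A caller could use the result to keep only records at a chosen level of curation, for instance `REVIEWED` or `VALIDATED`.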

7

8 Using Entrez Limits to restrict a query to RefSeq
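The same restriction can usually be written directly in the query text instead of being set through the Limits page. The sketch below assumes the `srcdb_refseq[prop]` property filter, which is commonly used with the Entrez nucleotide and protein databases; the helper name `restrict_to_refseq` is hypothetical.

```python
def restrict_to_refseq(term):
    """Wrap an Entrez query so that only RefSeq records match.

    Assumes the target database supports the srcdb_refseq[prop] filter.
    """
    term = term.strip()
    if not term:
        # With no other criteria, the filter alone selects RefSeq records.
        return "srcdb_refseq[prop]"
    # Parenthesize the original query so its own AND/OR logic is preserved.
    return f"({term}) AND srcdb_refseq[prop]"
```

For example, `restrict_to_refseq("BRCA1 AND human[orgn]")` produces `(BRCA1 AND human[orgn]) AND srcdb_refseq[prop]`.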

9 Gene (http://www.ncbi.nlm.nih.gov/gene) maintains information about genes from genomes of interest to the RefSeq group.

10 Entrez Gene is accessed like any other Entrez database:

| Find genes by... | Search text |
|---|---|
| free text | human muscular dystrophy |
| partial name and multiple species | transporter[title] AND ("Drosophila melanogaster"[orgn] OR "Mus musculus"[orgn]) |
| chromosome and symbol | (II[chr] OR 2[chr]) AND adh*[sym] |
| associated sequence accession number | M11313[accn] |
| gene name (symbol) | BRCA1[sym] |
| publication (PubMed ID) | 11331580[PMID] |
| Gene Ontology (GO) terms or identifiers | "cell adhesion"[GO] or 10030[GO] |
| genes with variants of medical interest | gene_snp_clin[filter] |
| chromosome and species | Y[CHR] AND human[ORGN] |
| Enzyme Commission (EC) numbers | 1.9.3.1[EC] |
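The fielded search syntax on this slide can also be assembled in code. The sketch below builds search terms and a request URL for the E-utilities `esearch` endpoint with `db=gene`; it only constructs the URL and makes no network call. The helper names `field` and `gene_esearch_url` are hypothetical, and `retmax=20` is an arbitrary choice for this example.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(value, tag):
    """Format one fielded term, e.g. field("BRCA1", "sym") -> BRCA1[sym]."""
    # Quote multi-word values so they are searched as a phrase.
    if " " in value and not value.startswith('"'):
        value = f'"{value}"'
    return f"{value}[{tag}]"


def gene_esearch_url(term, retmax=20):
    """Build an esearch URL that runs `term` against the Gene database."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# Two of the slide's examples, rebuilt from their parts.
go_term = field("cell adhesion", "GO")
y_human = f"{field('Y', 'CHR')} AND {field('human', 'ORGN')}"
url = gene_esearch_url(field("BRCA1", "sym"))
```

Keeping term construction separate from URL construction means the same fielded strings can be pasted into the web search box or sent through the API.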

