Published by Tracy Woods. Modified over 9 years ago.
1
Accessing information on molecular sequences
Bio 224
Dr. Tom Peavy
Sept 1, 2010
2
What is an accession number?
An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.
Examples (all for retinol-binding protein, RBP4):
X02775       GenBank genomic DNA sequence
NT_030059    Genomic contig
rs7079946    dbSNP (single nucleotide polymorphism)
N91759.1     An expressed sequence tag (1 of 170)
NM_006744    RefSeq DNA sequence (from a transcript)
NP_007635    RefSeq protein
AAC02945     GenBank protein
Q28369       SwissProt protein
1KT7         Protein Data Bank structure record
The examples cover DNA, RNA, and protein records.
3
NCBI's RefSeq project: accessions for genomic, mRNA, and protein sequences

Accession         Molecule   Method            Note
AC_123456         Genomic    Mixed             Alternate complete genomic
AP_123456         Protein    Mixed             Protein products; alternate
NC_123456         Genomic    Mixed             Complete genomic molecules
NG_123456         Genomic    Mixed             Incomplete genomic regions
NM_123456         mRNA       Mixed             Transcript products; mRNA
NM_123456789      mRNA       Mixed             Transcript products; 9-digit
NP_123456         Protein    Mixed             Protein products
NP_123456789      Protein    Curation          Protein products; 9-digit
NR_123456         RNA        Mixed             Non-coding transcripts
NT_123456         Genomic    Automated         Genomic assemblies
NW_123456         Genomic    Automated         Genomic assemblies
NZ_ABCD12345678   Genomic    Automated         Whole genome shotgun data
XM_123456         mRNA       Automated         Transcript products
XP_123456         Protein    Automated         Protein products
XR_123456         RNA        Automated         Transcript products
YP_123456         Protein    Auto. & Curated   Protein products
ZP_12345678       Protein    Automated         Protein products
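The prefix table above lends itself to a simple lookup. The following is a minimal sketch, not an NCBI tool: the function name classify_refseq is hypothetical, and the dictionary is transcribed from the slide. Where the slide lists two methods for one prefix (NP_ appears as both Mixed and Curation), the sketch keeps only the first.

```python
# RefSeq prefix table transcribed from the slide: prefix -> (molecule, method).
# Assumption: for NP_ the slide shows "Mixed" (6-digit) and "Curation" (9-digit);
# only "Mixed" is kept here to keep the lookup one-to-one.
REFSEQ_PREFIXES = {
    "AC": ("Genomic", "Mixed"),
    "AP": ("Protein", "Mixed"),
    "NC": ("Genomic", "Mixed"),
    "NG": ("Genomic", "Mixed"),
    "NM": ("mRNA", "Mixed"),
    "NP": ("Protein", "Mixed"),
    "NR": ("RNA", "Mixed"),
    "NT": ("Genomic", "Automated"),
    "NW": ("Genomic", "Automated"),
    "NZ": ("Genomic", "Automated"),
    "XM": ("mRNA", "Automated"),
    "XP": ("Protein", "Automated"),
    "XR": ("RNA", "Automated"),
    "YP": ("Protein", "Auto. & Curated"),
    "ZP": ("Protein", "Automated"),
}


def classify_refseq(accession):
    """Return (molecule, method) for a RefSeq-style accession, or None.

    Hypothetical helper: it inspects only the two-letter prefix before the
    underscore, so it does not validate the digits that follow.
    """
    prefix, sep, _rest = accession.strip().upper().partition("_")
    if not sep:
        return None  # no underscore, so not a RefSeq-style accession
    return REFSEQ_PREFIXES.get(prefix)


print(classify_refseq("NM_006744"))  # RefSeq transcript for RBP4
print(classify_refseq("X02775"))     # GenBank accession, not RefSeq
```

Accessions without an underscore, such as the GenBank record X02775, fall outside the RefSeq scheme and return None.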
4
Six ways to access DNA and protein sequences
1) Entrez Gene with RefSeq database (NCBI)
2) UniGene
3) Nucleotide or Protein databases (NCBI)
4) European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
5) ExPASy Sequence Retrieval System (separate from NCBI)
6) UCSC Genome Browser
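The slides describe browsing these resources through their web pages. As a complement to option 3, here is a rough sketch of how a record in the NCBI Nucleotide or Protein database could be requested programmatically through NCBI's E-utilities efetch endpoint. E-utilities is not mentioned in the slides, and the helper name build_fasta_url is hypothetical; the sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that asks for one record in FASTA format.

    Hypothetical helper. Use db="nuccore" for nucleotide records and
    db="protein" for protein records.
    """
    params = {
        "db": db,
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# RefSeq transcript and protein accessions for RBP4 from the earlier slide.
print(build_fasta_url("NM_006744"))
print(build_fasta_url("NP_007635", db="protein"))
```

The resulting URL can be opened in a browser or passed to urllib.request.urlopen. NCBI asks users of this service to keep their request rate modest.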
5
What is an EST?
Expressed Sequence Tag (EST)
“A short strand of DNA that is part of a cDNA molecule and can act as an identifier of a gene.”
In essence, an EST is the product of a single-pass DNA sequencing reaction on a particular cDNA.
6
UniGene: unique genes via ESTs
UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
UniGene clusters contain many ESTs, which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library.
UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene, you get information on its abundance (how many ESTs fall in its cluster) and its regional distribution (which libraries those ESTs came from).
Pages 20-21
7
Cluster sizes in UniGene
This is a gene with 1 EST associated; the cluster size is 1.
8
Cluster sizes in UniGene
This is a gene (or 1 cluster) with 10 ESTs associated; the cluster size is 10.
Note: HTC = high-throughput cDNAs
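Cluster size is simply the number of ESTs assigned to a cluster. A minimal sketch of that count follows, assuming an EST-to-cluster mapping is already in hand; the identifiers are made up for illustration and are not real UniGene or GenBank records, and the function name cluster_sizes is hypothetical.

```python
from collections import Counter


def cluster_sizes(est_to_cluster):
    """Count how many ESTs are assigned to each cluster.

    est_to_cluster maps an EST identifier to a cluster identifier.
    """
    return Counter(est_to_cluster.values())


# Made-up identifiers, for illustration only.
assignments = {
    "est_01": "cluster_A",
    "est_02": "cluster_B",
    "est_03": "cluster_B",
    "est_04": "cluster_B",
}

for cluster, size in sorted(cluster_sizes(assignments).items()):
    print(cluster, size)
```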
9
FASTA format
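A FASTA record consists of a header line beginning with ">" followed by one or more lines of sequence. The sketch below reads such text into (header, sequence) pairs. The function name parse_fasta is hypothetical and the two records are made up for illustration; for real work an established library parser would be a safer choice.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records, for illustration only.
example = """>seq1 hypothetical DNA record
ATGGCCAAG
TTGACC
>seq2 hypothetical protein record
MKWVTF
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```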
10
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene
Orthologous genes for various model species can be easily identified using this site (a curated database).