Accessing information on molecular sequences Bio 224 Dr. Tom Peavy Sept 1, 2010.



What is an accession number?

An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.

Examples (all for retinol-binding protein, RBP4):

DNA
X02775     GenBank genomic DNA sequence
NT_030059  Genomic contig
Rs         dbSNP (single nucleotide polymorphism)

RNA
N          An expressed sequence tag (1 of 170)
NM_006744  RefSeq DNA sequence (from a transcript)

Protein
NP_007635  RefSeq protein
AAC02945   GenBank protein
Q28369     SwissProt protein
1KT7       Protein Data Bank structure record

NCBI’s RefSeq project: accessions for genomic, mRNA, and protein sequences

Accession  Molecule  Method           Note
AC_        Genomic   Mixed            Alternate complete genomic
AP_        Protein   Mixed            Protein products; alternate
NC_        Genomic   Mixed            Complete genomic molecules
NG_        Genomic   Mixed            Incomplete genomic regions
NM_        mRNA      Mixed            Transcript products; mRNA
NM_        mRNA      Mixed            Transcript products; 9-digit
NP_        Protein   Mixed            Protein products;
NP_        Protein   Curation         Protein products; 9-digit
NR_        RNA       Mixed            Non-coding transcripts
NT_        Genomic   Automated        Genomic assemblies
NW_        Genomic   Automated        Genomic assemblies
NZ_ABCD    Genomic   Automated        Whole genome shotgun data
XM_        mRNA      Automated        Transcript products
XP_        Protein   Automated        Protein products
XR_        RNA       Automated        Transcript products
YP_        Protein   Auto. & Curated  Protein products
ZP_        Protein   Automated        Protein products
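The RefSeq prefixes can be turned into a small lookup. The sketch below is only an illustration: the dictionary is transcribed from the Molecule column of the slide, and the names REFSEQ_MOLECULE and refseq_molecule are made up for this example rather than taken from any NCBI tool.

```python
import re

# Prefix -> molecule type, transcribed from the RefSeq slide.
# "NZ_" stands in for the "NZ_ABCD" row (whole genome shotgun data).
REFSEQ_MOLECULE = {
    "AC_": "Genomic", "NC_": "Genomic", "NG_": "Genomic",
    "NT_": "Genomic", "NW_": "Genomic", "NZ_": "Genomic",
    "NM_": "mRNA", "XM_": "mRNA",
    "NR_": "RNA", "XR_": "RNA",
    "AP_": "Protein", "NP_": "Protein", "XP_": "Protein",
    "YP_": "Protein", "ZP_": "Protein",
}


def refseq_molecule(accession):
    """Return the molecule type for a RefSeq-style accession, or None.

    Accessions without a two-letter prefix followed by an underscore
    (for example GenBank or SwissProt identifiers) return None.
    """
    match = re.match(r"^([A-Z]{2}_)", accession.strip().upper())
    if match is None:
        return None
    return REFSEQ_MOLECULE.get(match.group(1))


# Accessions taken from the RBP4 examples earlier in the slides.
for acc in ["NM_006744", "NP_007635", "NT_030059", "X02775"]:
    print(acc, refseq_molecule(acc))
```

The underscore is the quick visual cue: the RefSeq accessions in the RBP4 examples carry one, while the GenBank, SwissProt, and PDB identifiers do not.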

Six ways to access DNA and protein sequences
1) Entrez Gene with RefSeq database (NCBI)
2) UniGene
3) Nucleotide or Protein databases (NCBI)
4) European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
5) ExPASy Sequence Retrieval System (separate from NCBI)
6) UCSC Genome Browser
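The slides describe browsing these resources through their web pages. As a complement, the NCBI Nucleotide and Protein databases can also be queried by URL through NCBI's E-utilities service, which the slides do not mention. The sketch below only builds an EFetch request URL and does not contact the network; the function name efetch_fasta_url is hypothetical, and the parameter choices (db, rettype, retmode) are one plausible configuration, so check the current E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an EFetch URL requesting one record as FASTA text.

    db="nuccore" targets the Nucleotide database; db="protein"
    targets the Protein database.
    """
    params = {
        "db": db,
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


# RefSeq accessions from the RBP4 examples in the slides.
print(efetch_fasta_url("NM_006744"))
print(efetch_fasta_url("NP_007635", db="protein"))
```

Opening either printed URL in a browser should return the record in FASTA format, assuming the accession is still current.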

What is an EST?

Expressed Sequence Tag: “A short strand of DNA that is part of a cDNA molecule and can act as an identifier of a gene.” In essence, an EST is the product of a single-pass DNA sequencing reaction on a particular cDNA.

UniGene: unique genes via ESTs

UniGene at NCBI: UniGene clusters contain many ESTs, which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library, and UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene, you get information on its abundance and its regional distribution. (Pages 20-21)

Cluster sizes in UniGene: this is a gene with 1 EST associated; the cluster size is 1.

Cluster sizes in UniGene: this is a gene (or 1 cluster) with 10 ESTs associated; the cluster size is 10. Note: HTC = high-throughput cDNAs.
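Cluster size is simply the number of ESTs assigned to a cluster, and counting those ESTs by source library gives the kind of abundance and distribution information described for UniGene. The sketch below uses entirely made-up records (the EST labels and tissue names are hypothetical, not real UniGene data), so it shows the bookkeeping only.

```python
from collections import Counter

# Hypothetical cluster: (EST label, tissue of the source cDNA library).
# These records are invented for illustration only.
EXAMPLE_CLUSTER = [
    ("est_01", "liver"), ("est_02", "liver"), ("est_03", "liver"),
    ("est_04", "liver"), ("est_05", "liver"), ("est_06", "liver"),
    ("est_07", "kidney"), ("est_08", "kidney"), ("est_09", "kidney"),
    ("est_10", "brain"),
]


def cluster_size(ests):
    """Cluster size is the number of ESTs in the cluster."""
    return len(ests)


def tissue_distribution(ests):
    """Count how many ESTs in the cluster come from each tissue."""
    return Counter(tissue for _, tissue in ests)


print(cluster_size(EXAMPLE_CLUSTER))
print(tissue_distribution(EXAMPLE_CLUSTER).most_common())
```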

FASTA format
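In FASTA format, each record begins with a header line that starts with ">", followed by one or more lines of sequence. The reader below is a minimal sketch of that layout; the function name parse_fasta and the two toy records are made up for illustration and are not real database entries.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is returned without the leading ">", and sequence
    lines belonging to one record are joined into a single string.
    """
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy input: invented sequences, not real records.
EXAMPLE = """>toy_record_1 made-up nucleotide sequence
ATGGCC
TTAGGA
>toy_record_2 made-up protein sequence
MKWV
"""

for name, seq in parse_fasta(EXAMPLE):
    print(name, len(seq), seq)
```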

Orthologous genes for various model species can be easily identified using this site (curated database)