NCBI FieldGuide A Minimal Guide to NCBI Nucleotide Resources.


Types of Databases
Primary Databases
- Original submissions by experimentalists
- Content controlled by the submitter
- Examples: GenBank, SNP, GEO
Derivative Databases
- Built from primary data
- Content controlled by third party (NCBI)
- Examples: RefSeq, TPA, RefSNP, UniGene, GEO Datasets, NCBI Protein, Structure, Conserved Domain

Accessing the Data: Entrez
all[filter]

International Sequence Database Collaboration
[Diagram: three partner databases, each receiving submissions and updates. GenBank (NCBI, NIH; retrieval via Entrez), EMBL (EBI; retrieval via SRS), DDBJ (CIB, NIG; retrieval via getentry).]

GenBank: NCBI’s Primary Sequence Database
- ftp://ftp.ncbi.nih.gov/genbank/
- ftp://genbank.sdsc.edu/pub
- ftp://bio-mirror.net/biomirror/genbank
Release 142 June ,532,003 Records
- 40,325,321,348 nucleotides
- >140,000 species
- 153 gigabytes
- 634 files
- Full release every two months
- Incremental and cumulative updates daily
- Available only through the internet
- Release notes: gbrel.txt

A GenBank Record
LOCUS       NM_ bp mRNA linear PRI 07-APR-2003
DEFINITION  Homo sapiens interleukin 3 (colony-stimulating factor, multiple) (IL3), mRNA.
ACCESSION   NM_
VERSION     NM_ GI:
KEYWORDS    .

GenBank Record: Feature Table
/protein_id=“ NP_ ”
/db_xref=“GI:
GenPept identifiers

GenBank Record (cont.)

Sequence Revision History

NM_ Sequence Revision History: choose records

Display and Save Options

FASTA format (NCBI)

Abstract Syntax Notation: ASN.1
[Diagram labels: ASN.1, GenBank, GenPept, FASTA Nucleotide, FASTA Protein]

Bulk Divisions
- Expressed Sequence Tag (EST): 1st-pass single-read cDNA
- Genome Survey Sequence (GSS): 1st-pass single-read gDNA
- High Throughput Genomic (HTG): incomplete sequences of genomic clones
- Sequence Tagged Site (STS): PCR-based mapping reagents
Batch submissions ( and ftp)
Inaccurate
Poorly characterized

NCBI’s Derivative Sequence Databases

Primary vs. Derivative Databases
[Diagram: GenBank, fed by sequencing centers and labs, is updated ONLY by submitters. Its divisions are EST, STS, GSS, HTG, PRI, ROD, PLN, MAM, BCT, INV, VRT, PHG, VRL. UniGene, UniSTS, and RefSeq are built by algorithms, curators, and the RefSeq pipelines (LocusLink and Genomes Pipelines; Annotation Pipeline) and are updated continually by NCBI.]

Why Make Reference Sequences?
Entrez Protein query: topoisomerase II alpha[title] AND human[organism]
[Figure labels: AAC77388; splice variant; Δ = 5 aa; P11388; RefSeq protein]

RefSeq Benefits
- Non-redundant, best representative
- Updates to reflect current sequence data and biology
- Distinct, stable accession series
- Genomes, transcripts, proteins

Reference Sequence: RefSeq
Accession   Sequence Type
NM_         mRNA
NP_         protein, from NM_
NR_         non-coding RNA
XM_         predicted mRNA
XP_         predicted protein
XR_         predicted non-coding RNA
ZP_         predicted from NZ_
NC_         genomic, e.g., chromosomes
NG_         genomic, incomplete region
NT_         genomic, BAC assembly
NW_         genomic, WGS assembly
NZ_ABCD     genomic, WGS collection
Key: blue = curated RefSeq
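The accession prefixes in this table lend themselves to a simple lookup. The sketch below is illustrative only: the table contents are transcribed from the slide, while the function name `classify_refseq` and the placeholder accession `NM_000000` are assumptions made for the example, not NCBI identifiers or tools.

```python
# Prefix table transcribed from the slide above. The function name and the
# return convention (None for unknown prefixes) are illustrative assumptions.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NP_": "protein, from NM_",
    "NR_": "non-coding RNA",
    "XM_": "predicted mRNA",
    "XP_": "predicted protein",
    "XR_": "predicted non-coding RNA",
    "ZP_": "predicted from NZ_",
    "NC_": "genomic, e.g., chromosomes",
    "NG_": "genomic, incomplete region",
    "NT_": "genomic, BAC assembly",
    "NW_": "genomic, WGS assembly",
    "NZ_": "genomic, WGS collection",
}


def classify_refseq(accession):
    """Return the slide's sequence type for an accession, or None if the
    first three characters are not one of the RefSeq prefixes listed."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix)


# "NM_000000" is a made-up placeholder, not a real record.
print(classify_refseq("NM_000000"))
# A GenBank protein accession has no RefSeq prefix, so the lookup yields None.
print(classify_refseq("AAC77388"))
```

Because RefSeq prefixes always contain an underscore in the third position, a three-character lookup is enough to separate them from GenBank-style accessions.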

RefSeq Status Codes
- REVIEWED: by NCBI staff or by a collaborator. Some RefSeq records may incorporate expanded sequence and annotation information including additional publications and features.
- VALIDATED: in an initial review to provide the preferred sequence standard; not yet subjected to final review, at which time additional functional information may be provided.
- PROVISIONAL: the record has not yet been subject to individual review and is thought to be well supported and to represent a valid transcript and protein.
- PREDICTED: may represent an ab initio prediction or may be partially supported by other transcript data; the protein is predicted.
- INFERRED: by genome sequence analysis.
- MODEL: provided via automated processing and not subjected to individual review or revision between builds.

Third Party Annotation (TPA) Database
- Annotations of existing GenBank sequences
- Allows for community annotation of genomes
- Direct submissions
  - BankIt
  - Sequin

Other Databases at the NCBI
dbSNP
- nucleotide polymorphisms
GEO (Gene Expression Omnibus)
- microarray and other expression data
GEO DataSets
- curated reports of GEO data
- collections of biologically and mathematically comparable GEO Samples
Structure
- imported structures (PDB)
- Cn3D viewer, NCBI curation
CDD (Conserved Domain Database)
- protein families (COGs and KOGs)
- single domains (PFAM, SMART, CD)

NCBI’s SNP Database
- Primary and derivative (RefSNP)
- Single nucleotide polymorphisms
- Repeat polymorphisms
- Insertion-deletion polymorphisms
- 24 species
- Over 11 million refSNPs (rsXXXXXXX)

RefSNP
- Non-redundant
- Computational analysis
- BLAST hits to genome, mRNA, protein

Using Entrez
An integrated database search and retrieval system

Entrez: Database Integration
[Diagram: linked databases are PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure, Genomes, and Taxonomy. Link labels: Word weight, BLAST, VAST, Phylogeny.]

Home Page: Global Entrez Portal
hfe

Global Entrez Search: HFE

Entrez Nucleotide: HFE
218 records
Not HFE [Title]

Smarter Query
hfe[title] AND human[orgn]
39 records
Curated HFE splice variants (11 total)

hfe[title] AND human[orgn] (cont.)
Primary data

Finding Primary Sequences
Entrez Nucleotide
- 99+% GenBank (primary data)
  srcdb ddbj/embl/genbank[properties] = 39,849,856 records
- <1% RefSeq (curated data)
  srcdb refseq[properties] = 304,945 records
Useful search terms in [Properties]:
- srcdb: source database (e.g., srcdb genbank[prop])
- gbdiv: GenBank division (e.g., gbdiv est[prop])
- biomol: biomolecule type (e.g., biomol mrna[prop])
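Property-limited searches like these can also be sent programmatically. The sketch below only builds a request URL for NCBI's E-utilities ESearch endpoint and does not contact the network. The helper names `esearch_url` and `term_of` are assumptions made for this example, and the property terms are copied from the slide as written, so their spelling may not match what the current Entrez system accepts.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nucleotide", retmax=20):
    """Build (but do not send) an ESearch URL for an Entrez query string.

    Hypothetical helper: the term is passed through unchanged apart from
    URL encoding.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def term_of(url):
    """Recover the decoded query term from a URL built by esearch_url."""
    return parse_qs(urlparse(url).query)["term"][0]


# Query text taken from the slides; property spelling is as shown there.
url = esearch_url("hfe[title] AND human[orgn] AND srcdb refseq[prop]")
print(url)
print(term_of(url))
```

Keeping URL construction separate from the HTTP call makes it easy to inspect exactly what will be sent before running a search.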

Database Queries
#    Query                                   Records
#1   hfe                                     116
#2   hfe[title] AND human[orgn]              42
#3   #2 AND srcdb refseq[prop]               11
#4   #2 AND srcdb ddbj/embl/genbank[prop]    31
#5   #2 AND gbdiv pri[prop]                  29
#6   #2 AND gbdiv est[prop]                  2
Primate division: gbdiv pri[prop]
EST division: gbdiv est[prop]
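In a search history like this one, a reference such as #2 stands for the full text of an earlier query. A minimal sketch of that substitution follows. The function `expand_history` is a hypothetical helper written for illustration, not part of Entrez, and it assumes each query refers only to lower-numbered entries.

```python
import re


def expand_history(queries):
    """Expand '#n' references in a search history.

    queries maps a history number to its query text. Entries are processed
    in numeric order, so each reference must point to an earlier entry.
    Referenced queries are wrapped in parentheses to preserve grouping.
    """
    expanded = {}
    for num in sorted(queries):
        def substitute(match):
            return "(" + expanded[int(match.group(1))] + ")"
        expanded[num] = re.sub(r"#(\d+)", substitute, queries[num])
    return expanded


history = {
    1: "hfe",
    2: "hfe[title] AND human[orgn]",
    3: "#2 AND srcdb refseq[prop]",
    5: "#2 AND gbdiv pri[prop]",
}
for num, text in expand_history(history).items():
    print(num, text)
```

The parentheses matter once a referenced query contains OR or NOT, since the expanded text would otherwise group differently from the original history entry.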

Molecule Queries
#    Query                           Records
#1   hfe                             116
#2   hfe[title] AND human[orgn]      42
#3   #2 AND biomol mrna[prop]        29
#4   #2 AND biomol genomic[prop]     13
Genomic DNA: biomol genomic[prop]
cDNA: biomol mrna[prop]

More Queries…
RefSeq status, variants: reviewed RefSeqs with transcript variants
  srcdb refseq reviewed[prop] AND has transcript variants[prop]
Gene symbol: human hemochromatosis (HFE)
  hfe[sym] AND human[organism]
Disease and Gene Ontology: membrane proteins linked to cancer
  integral to plasma membrane[gene ontology] AND cancer[dis]
Chromosome, Links: genes on human chromosome 2 with OMIM links
  2[chromosome] AND gene omim[filter] AND human[organism]
Protein name: topoisomerase genes from Archaea
  topoisomerase[gene/protein name] AND archaea[organism]
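All of these queries share one shape: a value, a field tag in square brackets, and Boolean AND between terms. The sketch below composes such strings. The helper names `field_term` and `all_of` are hypothetical, introduced only for this example, and the field tags are those shown on the slide.

```python
def field_term(value, field):
    """Format a single Entrez term as value[field]."""
    return f"{value}[{field}]"


def all_of(*terms):
    """Join terms with the Boolean AND operator."""
    return " AND ".join(terms)


# Reproduces two of the queries listed above.
gene_query = all_of(field_term("hfe", "sym"), field_term("human", "organism"))
chromosome_query = all_of(
    field_term("2", "chromosome"),
    field_term("gene omim", "filter"),
    field_term("human", "organism"),
)
print(gene_query)
print(chromosome_query)
```

Building queries from parts rather than by hand avoids unbalanced brackets and makes it simple to swap the organism or field across a batch of searches.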

Other Entrez Databases
UniSTS: markers on the Genethon map of human chromosome 12
  Genethon[Map Name] AND human[organism] AND 12[chromosome]
UniGene: rat clusters that have at least one mRNA
  rat[organism] NOT 0[mrna count]
Structure: structures of bacterial kinases with resolutions below 2 Å
  bacteria[organism] AND kinase AND :002.00[resolution]
SNP: uniquely mapped microsatellites on human chr2
  microsat[SNP Class] AND 1[Map Weight] AND 2[Chromosome] AND human[orgn]

Search by Sequence

Related Sequences
[Figure labels: Most similar; Least similar]

Search by Sequence: protein

BLink (BLAST Link)

BLink Output

BLink → Multiple sequence alignment