The National Center for Biotechnology Information (NCBI), a primary resource for molecular biology information (www.ncbi.nih.gov): Database Resources

NCBI Mission

… to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease.

What does this involve? Creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics; facilitating the use of such databases and software by the research and medical community; coordinating efforts to gather biotechnology information both nationally and internationally; and performing research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules.

What is a Database?

A database is a model or representation of some aspect of the real world: an organized collection of data, which may contain many different types of data. It is coherent, consistent, and designed for a specific purpose, and it comes with a computational system for managing and querying the data.

A database is a collection of information organized so that a computer program can quickly select the desired pieces of data; in effect, an electronic filing system. Traditional databases are organized by fields, records, and files. A field is a single piece of information, a record is one complete set of fields, and a file is a collection of records. For example, a telephone book is analogous to a file: it contains a list of records, each of which consists of three fields (name, address, and telephone number).
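The field/record/file hierarchy maps directly onto ordinary program data structures. The sketch below is illustrative only: `PhoneRecord` and `lookup` are hypothetical names, and the names and numbers are made up, not data from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneRecord:
    """One record: a complete set of three fields."""
    name: str       # field 1
    address: str    # field 2
    telephone: str  # field 3


# A "file" is a collection of records (hypothetical entries).
phone_book: list[PhoneRecord] = [
    PhoneRecord("Ada Smith", "12 Elm St", "555-0101"),
    PhoneRecord("Ben Jones", "34 Oak Ave", "555-0102"),
]


def lookup(book: list[PhoneRecord], name: str) -> PhoneRecord | None:
    """Select the desired record by matching one of its fields."""
    for record in book:
        if record.name == name:
            return record
    return None


print(lookup(phone_book, "Ben Jones").telephone)  # prints 555-0102
```

A linear scan like this is fine for a phone book; a database management system adds indexes so that the same selection stays fast for millions of records.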

To access information in a database, you need a database management system (DBMS): a collection of programs that lets you enter, organize, and select data in a database. Most molecular biology databases are built on relational database management systems (RDBMS).

Relational Database

A relational database is like a large spreadsheet: each field is a column and each row is an entry. Relational databases use a set of tables to organize data. Each entry must be unambiguously identified. Names are not reliable identifiers (e.g. a name may reflect an incorrectly assigned gene function), so unique IDs (UIDs) are used instead; in GenBank these are called accession numbers.

UID        Name              Sequence       Quality Value
BU039022   PP_LEa0001A01f    CATACAAAT …    35
BU039057   PP_LEa0001B17f    TACGGCTAC …    28
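The table above can be expressed in SQLite with the accession declared as the primary key, so the database itself enforces that each UID identifies exactly one entry. This is a minimal sketch using Python's built-in `sqlite3` module; the table and column names are assumptions, only the sequence prefixes shown on the slide are stored, and the second `BU039022` row is a hypothetical conflicting submission.

```python
import sqlite3

# In-memory SQLite database standing in for a sequence table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sequence (
           accession TEXT PRIMARY KEY,  -- unique ID (UID)
           name      TEXT NOT NULL,
           sequence  TEXT NOT NULL,     -- only the prefixes shown on the slide
           quality   INTEGER
       )"""
)
rows = [
    ("BU039022", "PP_LEa0001A01f", "CATACAAAT", 35),
    ("BU039057", "PP_LEa0001B17f", "TACGGCTAC", 28),
]
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?, ?)", rows)

# Inserting a second entry with an existing accession fails:
# the UID must identify exactly one entry.
try:
    conn.execute(
        "INSERT INTO sequence VALUES (?, ?, ?, ?)",
        ("BU039022", "hypothetical_other_name", "GGGG", 10),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # prints True
```

Names such as `PP_LEa0001A01f` could also be declared `UNIQUE`, but only the accession is treated as the stable identifier here, matching the point that names are not reliable.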

Achieving consistency: repeated information is stored in a single place, so only one copy needs to be updated.

The slide's schema links four tables: a Sequence table (UID, Definition, Locus, Accession, Taxonomy ID*, Sequence); a Taxonomy table (Taxonomy ID*, Genus, Species); a Ref Index** table (UID, Medline ID); and a table of literature references keyed by Medline ID (Medline ID, Authors, Title, Journal).

* May be referred to by a secondary ID.
** May be referred to indirectly via an index.
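The consistency argument can be sketched with SQLite: taxonomy is stored once and each sequence row holds only a `taxonomy_id` key, so correcting a species name is a single-row update that every linked sequence then sees. The table layout follows the slide loosely; the accessions, genus, and species values are hypothetical placeholders, not real records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxonomy (
        taxonomy_id INTEGER PRIMARY KEY,
        genus       TEXT,
        species     TEXT
    );
    CREATE TABLE sequence (
        uid         INTEGER PRIMARY KEY,
        accession   TEXT UNIQUE,
        taxonomy_id INTEGER REFERENCES taxonomy(taxonomy_id)
    );
    -- Hypothetical rows: three sequences share one taxonomy entry.
    INSERT INTO taxonomy VALUES (1, 'Genusname', 'oldspecies');
    INSERT INTO sequence VALUES (1, 'SEQ0001', 1), (2, 'SEQ0002', 1), (3, 'SEQ0003', 1);
    """
)

# A reclassification touches one row in taxonomy, not one row per sequence.
cur = conn.execute(
    "UPDATE taxonomy SET species = ? WHERE taxonomy_id = ?", ("newspecies", 1)
)
print(cur.rowcount)  # prints 1

# Every sequence now reports the corrected name through the shared key.
names = conn.execute(
    """SELECT s.accession, t.genus || ' ' || t.species
       FROM sequence AS s
       JOIN taxonomy AS t ON s.taxonomy_id = t.taxonomy_id
       ORDER BY s.accession"""
).fetchall()
print(names)
```

If the genus and species were copied into every sequence row instead, the same correction would have to find and rewrite all three rows, and missing one would leave the database inconsistent.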

The language used is SQL (Structured Query Language). It is relatively easy to understand (it reads almost like English), relatively consistent across RDBMSs, and supplies a set of commands to define tables, insert data, and make queries. A query has the form SELECT some fields FROM some table WHERE some condition is met, e.g. SELECT accession, sequence FROM sequence WHERE accession = 'BU…', which returns the row BU… CATACAAATACTGCTACHTAAATC …. More complex queries require two or more tables to be joined to produce a result.
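Both kinds of query can be run against a small in-memory SQLite database. This is a sketch under stated assumptions: the table names (`sequence`, `ref_index`, `citation`), the accessions, sequences, and paper titles are all hypothetical. The first query has the single-table SELECT … FROM … WHERE shape; the second joins three tables to find the papers linked to one sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence  (uid INTEGER PRIMARY KEY, accession TEXT UNIQUE, sequence TEXT);
    CREATE TABLE ref_index (uid INTEGER, medline_id INTEGER);  -- links sequences to papers
    CREATE TABLE citation  (medline_id INTEGER PRIMARY KEY, title TEXT, journal TEXT);

    -- Hypothetical example rows.
    INSERT INTO sequence  VALUES (1, 'SEQ0001', 'ACGTACGT'), (2, 'SEQ0002', 'TTGACCA');
    INSERT INTO citation  VALUES (100, 'Paper A', 'Journal X'), (200, 'Paper B', 'Journal Y');
    INSERT INTO ref_index VALUES (1, 100), (1, 200), (2, 200);
    """
)

# Single-table query: SELECT some fields FROM some table WHERE a condition holds.
# The '?' placeholder passes the value separately from the SQL text.
row = conn.execute(
    "SELECT accession, sequence FROM sequence WHERE accession = ?", ("SEQ0001",)
).fetchone()
print(row)  # prints ('SEQ0001', 'ACGTACGT')

# Multi-table query: which papers are linked to SEQ0001? This needs a join
# from sequence through ref_index to citation.
titles = conn.execute(
    """SELECT c.title
       FROM sequence AS s
       JOIN ref_index AS i ON i.uid = s.uid
       JOIN citation  AS c ON c.medline_id = i.medline_id
       WHERE s.accession = ?
       ORDER BY c.title""",
    ("SEQ0001",),
).fetchall()
print(titles)  # prints [('Paper A',), ('Paper B',)]
```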

Most public molecular biology databases do not let users query the underlying RDBMS directly with SQL. An ill-formed query can overload or crash the system, and SQL may still be too complex for many biologists. Instead, the database provides a search interface: the user enters a phrase, and the system identifies which part of the database should be searched. Queries submitted through the web interface are then translated into SQL.
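The translation step can be sketched as a function that maps a free-text phrase onto one of a few fixed, parameterized SQL templates. The user never writes SQL, so an ill-formed or malicious phrase cannot change the query's structure. Everything here is hypothetical: the `build_query` name, the accession-like pattern (real accession formats are more varied), and the table rows.

```python
from __future__ import annotations

import re
import sqlite3

# Hypothetical pattern: 1-2 letters followed by 5-6 digits, e.g. BU039022.
ACCESSION_RE = re.compile(r"^[A-Z]{1,2}\d{5,6}$")


def build_query(phrase: str) -> tuple[str, tuple[str, ...]]:
    """Translate a free-text phrase into one of two fixed SQL templates."""
    term = phrase.strip()
    if ACCESSION_RE.match(term.upper()):
        # Looks like an identifier: exact match on the key column.
        return "SELECT accession FROM sequence WHERE accession = ?", (term.upper(),)
    # Otherwise treat the phrase as words to find in the description.
    return (
        "SELECT accession FROM sequence WHERE definition LIKE ? ORDER BY accession",
        (f"%{term}%",),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequence (accession TEXT PRIMARY KEY, definition TEXT)")
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?)",
    [  # hypothetical rows
        ("ZZ000001", "hypothetical kinase"),
        ("ZZ000002", "hypothetical kinase fragment"),
    ],
)

for phrase in ["zz000002", "kinase", "x'; DROP TABLE sequence; --"]:
    sql, params = build_query(phrase)
    print(phrase, "->", conn.execute(sql, params).fetchall())
```

The last phrase is treated as ordinary search text: it matches nothing and the table survives, because the phrase is only ever bound as a parameter value.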

Relational Database: Example GenBank Query

What Constitutes a Good Database?

Broad coverage of the chosen topic; up-to-date information gathering; curation; support staff; commitment to the future; and a good query interface.

Issues for Molecular Biological Databases

Annotation, archival quality, updates, and redundancy.

Issues for Molecular Biological Databases: Annotation

Annotation means adding biological information, largely textual descriptions, to a genome sequence.

Correctness: many genes are incorrectly annotated. A function may be assigned to a novel gene on the basis of a similar sequence that is itself incorrectly annotated, so the error propagates throughout the database. Errors of this kind are routine.

Quality: Was the curation done by an expert or a non-expert? Who provided it? Is there any biological verification? What vocabulary is used? Has there been any peer review?
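Error propagation can be made concrete with a toy annotation-transfer loop. Each new gene copies the function of its most similar already-annotated sequence and is then stored, so it becomes a possible source for later genes. This is a deliberately simplified sketch: similarity is plain percent identity on equal-length, ungapped sequences (a stand-in for real similarity search such as BLAST), and all sequences and labels are hypothetical.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes ungapped sequences of equal length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def transfer_annotation(query: str, annotated: dict[str, str]) -> tuple[str, str]:
    """Return (function, source) copied from the most similar annotated sequence."""
    source = max(annotated, key=lambda seq: identity(query, seq))
    return annotated[source], source


# Hypothetical database: the kinase entry's function was assigned incorrectly.
database = {"ACGTACGTAC": "kinase (wrong)", "TTTTGGGGCC": "transporter"}
provenance: dict[str, str] = {}

# Each new gene is annotated from whatever is already in the database and is
# then stored, so later genes can inherit from earlier *inferred* annotations.
new_genes = ["ACGTACGTAA", "ACGTACGTTA", "ACGTACTTTA"]
for gene in new_genes:
    database[gene], provenance[gene] = transfer_annotation(gene, database)

for gene in new_genes:
    print(gene, database[gene], "<- copied from", provenance[gene])
```

The last gene inherits the wrong label from the second gene, not from the original entry, so correcting the original entry alone would leave the two copies downstream of it unchanged.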

Issues for Molecular Biological Databases: Archival Quality, Updates, and Redundancy

Archival quality: Is the database archival or curated? Can the same data be recovered later? The primary key (each accession number) must never be overwritten, and the best databases record any changes to the data.

Updates: How often is the database updated? Major databases take direct submissions, and only the direct submitter can make changes to an entry, even if you can prove it is wrong. When is a sequence finished? How is annotation updated as more knowledge becomes available?

Redundancy: This is a major issue. How do we deal with redundant entries without losing potentially valuable information? It also relates to archival quality.
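The archival rules above (never overwrite an accession, keep earlier data recoverable, let only the submitter change an entry) can be sketched as an append-only store. The `accession.version` identifiers are loosely modelled on GenBank's accession.version convention; the `ArchivalStore` class, the accession `XY000001`, and the submitter names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArchivalStore:
    """Hypothetical append-only store: accessions are never overwritten, only re-versioned."""

    versions: dict[str, list[str]] = field(default_factory=dict)  # accession -> all sequences
    submitters: dict[str, str] = field(default_factory=dict)  # accession -> original submitter

    def submit(self, accession: str, sequence: str, submitter: str) -> str:
        if accession in self.versions:
            raise ValueError(f"{accession} already exists; an accession is never reused")
        self.versions[accession] = [sequence]
        self.submitters[accession] = submitter
        return f"{accession}.1"

    def update(self, accession: str, sequence: str, submitter: str) -> str:
        if accession not in self.versions:
            raise KeyError(accession)
        # Only the direct submitter may change the entry.
        if self.submitters[accession] != submitter:
            raise PermissionError(f"only {self.submitters[accession]} may update {accession}")
        self.versions[accession].append(sequence)  # earlier versions stay retrievable
        return f"{accession}.{len(self.versions[accession])}"

    def get(self, versioned_id: str) -> str:
        accession, version = versioned_id.rsplit(".", 1)
        return self.versions[accession][int(version) - 1]


store = ArchivalStore()
v1 = store.submit("XY000001", "ACGTAC", submitter="lab_a")  # hypothetical entry
v2 = store.update("XY000001", "ACGTACGG", submitter="lab_a")

try:
    store.update("XY000001", "TTTT", submitter="lab_b")
    rejected = False
except PermissionError:
    rejected = True

print(v1, store.get(v1))  # prints XY000001.1 ACGTAC
print(v2, store.get(v2))  # prints XY000001.2 ACGTACGG
print(rejected)  # prints True
```

Because old versions are appended rather than replaced, a result published against `XY000001.1` can still be reproduced after the entry changes, which is the "can the same data be recovered later?" test.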

NCBI and GenBank

GenBank is the genetic sequence database of all publicly available DNA sequences and their derived protein sequences, with annotations describing the biological information they contain. GenBank is hosted at NCBI, and researchers submit their sequences to GenBank. NCBI provides analysis and retrieval resources for the data in GenBank (and in many other NCBI-hosted databases).

NCBI Databases

Nucleotide Database, EST (dbEST), GSS (dbGSS), Protein Database, Structure Database, Genome, 3D Domains, Conserved Domains, UniSTS, Gene, UniGene, HomoloGene, Reference Sequence (RefSeq), SNP (dbSNP), dbVar (large-scale genomic variation), dbGaP (integration of genotype and phenotype), PopSet Database, Taxonomy Database, GEO Profiles, GEO DataSets, Cancer Chromosomes, Epigenomics, PubMed Central, Journals, MeSH, Bookshelf, OMIM Database.

Retrieving Data from NCBI using Entrez

Entrez is a text-based retrieval system that integrates all the information resources available at NCBI, such as:
1. Scientific literature
2. DNA and protein sequence databases
3. 3D protein structure and protein domain data
4. Population study datasets
5. Expression data
6. Assemblies of complete genomes
7. Taxonomic information
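Besides the web interface, Entrez can be queried programmatically through NCBI's E-utilities. The sketch below only builds ESearch and EFetch request URLs with the standard library and does not send them. The helper names `esearch_url` and `efetch_url` are hypothetical. The base URL and the `db`, `term`, `retmax`, `id`, `rettype`, and `retmode` parameters come from E-utilities, but check the current E-utilities documentation before relying on them.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch request: returns UIDs of records matching `term` in `db`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db: str, ids: list[str], rettype: str = "gb") -> str:
    """URL for an EFetch request: returns full records (GenBank flat file by default)."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


print(esearch_url("nucleotide", "BRCA1 AND human[Organism]"))
print(efetch_url("nucleotide", ["BU039022"]))
```

Fetching these URLs (for example with `urllib.request.urlopen`) requires network access and is subject to NCBI's usage guidelines. Separating URL construction from the network call keeps the query logic easy to inspect and test offline.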


Create an account on (or log in to) the My NCBI portal.

Understanding GenBank records

Go to … and click on the links on the left to get a description of what each term means. Copy the descriptions into a Word document and, when you have finished, save the document on your Drupal web site.

Entrez Sequences Help