Genomes Databases and Open Access Bibliographic Resources Antonio Basílio de Miranda Laboratório de Genômica Funcional e Bioinformática Instituto Oswaldo.


Genomes Databases and Open Access Bibliographic Resources Antonio Basílio de Miranda Laboratório de Genômica Funcional e Bioinformática Instituto Oswaldo Cruz Fundação Oswaldo Cruz Rio de Janeiro - Brazil

Outline
• General introduction and overview of complete genome sequences
• Genomes databases and where to find them
• Comparative Genomics Databases
• Other Omics resources
• Bibliographic/Open access resources

Why use databases?
• In the genomic era we have billions of data points that need to be stored, curated and made accessible for analysis and knowledge discovery.
• Databases are essential resources for both experimental and computational biologists.
• We have crossed the terabyte threshold of genomic data.

And what is a database system? From Oxford Dictionary:
• Database: an organized body of related information.
• Database system, DataBase Management System (DBMS): a software system that facilitates the creation, maintenance and use of an electronic database.

Common database models: Hierarchical, Network, Relational, Object-relational, Object
Other models: Associative, Concept-oriented, Entity-Attribute-Value, Multi-dimensional, Semantic data model, Semi-structured, Star schema, XML database
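To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, accessions and sequences are all hypothetical and chosen only for illustration; they do not reproduce the schema of any real sequence database.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory relational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE
        );
        CREATE TABLE sequence (
            accession   TEXT PRIMARY KEY,
            organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
            kind        TEXT NOT NULL CHECK (kind IN ('nucleotide', 'protein')),
            residues    TEXT NOT NULL
        );
        """
    )
    # Toy rows: made-up accessions and sequences, not real records.
    conn.execute(
        "INSERT INTO organism (organism_id, name) VALUES (1, 'Example organism')"
    )
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?, ?, ?)",
        [
            ("DEMO0001", 1, "nucleotide", "ATGGCCTAA"),
            ("DEMO0002", 1, "protein", "MA"),
        ],
    )
    conn.commit()
    return conn


def sequences_for(conn, organism_name):
    """Join the two tables to list (accession, kind) for one organism."""
    rows = conn.execute(
        """
        SELECT s.accession, s.kind
        FROM sequence AS s
        JOIN organism AS o ON o.organism_id = s.organism_id
        WHERE o.name = ?
        ORDER BY s.accession
        """,
        (organism_name,),
    )
    return rows.fetchall()
```

Storing the organism once and referring to it by key is what the relational model adds over a flat file: the name is not repeated (and possibly misspelled) in every sequence record.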

What is stored:
• Nucleotide sequences
• Protein sequences
• Genomes
• Patterns
• Structures
• Etc.

Some problems:
• Different data formats and technologies
• Different types of data
• Size
• Redundancy
• “Hereditary” mistakes
• Inconsistent annotations

Different formats – C. trachomatis pyruvate kinase
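The slide does not say which formats were shown, so the following is only an assumed illustration: a minimal reader for FASTA, one widely used plain-text sequence format. The record names and sequences are invented toy data, not the C. trachomatis pyruvate kinase entry.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}.

    The identifier is the first whitespace-delimited word after '>'.
    Sequence lines are joined and upper-cased.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Invented example input (hypothetical identifiers and sequences).
DEMO_FASTA = """>demo_seq_1 hypothetical record
ATGGCC
TAA
>demo_seq_2
ggatcc
"""
```

Each format needs its own reader of this kind, which is one reason why different formats appear in the list of problems above.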

Completely sequenced genomes – a timeline
• 1977: first viral genome (5386 base pairs; encoding 11 genes). Sanger et al. sequence bacteriophage φX174.
• 1981: human mitochondrial genome. 16,500 base pairs (encodes 13 proteins, 2 rRNA, 22 tRNA).
• 1986: chloroplast genome. 156,000 base pairs (most are 120 kb to 200 kb).
• 1995: first genome of a free-living organism, the bacterium Haemophilus influenzae, by TIGR; 1830 kb, 1713 genes.

• 1996: first archaeal genome: Methanococcus jannaschii DSM 2661, by TIGR; 1664 kb, 1773 genes.
• 1997: first eukaryotic genome: Saccharomyces cerevisiae S288C; international collaboration; 16 chromosomes; 12,057 kb, ~6000 genes.
• 1998: first multicellular organism: the nematode Caenorhabditis elegans; 97 Mb; ~19,000 genes.
• 1999: first human chromosome: chromosome 22 (49 Mb, 673 genes).
• 2000: fruit fly Drosophila melanogaster (137 Mb; ~13,000 genes).

• 2000: first plant genome: Arabidopsis thaliana (115.428 Mb; genes)
• 2001: draft sequence of the human genome (3300 Mb; ~28000 genes)
• 2002: Plasmodium falciparum (22.9 Mb; 5334 genes)
• 2002: mouse genome (2700 Mb; ~28000 genes)
• 2004: draft genome of the fish Tetraodon nigroviridis (x Mb; ~28000 genes)
• 2005: dog (41 Mb, genes) and chicken genomes ( genes)

• 2007: James Watson’s genome is sequenced.
• 2007: Craig Venter publishes the results of his own sequenced genome.
• October 2013: deadline for the X Prize Foundation challenge to sequence 100 human genomes for less than $10,000 each.

Genome projects: 827 published; 1842 bacteria, 90 archaea, 936 eukaryotes, 130 metagenomes

Genome sequencing projects
Several web-based resources track completely sequenced genomes and their reference publications, including:
• GOLD - Genomes Online Database

How big are genome sizes?
• Viral genomes: 1 kb to 360 kb (Canarypox virus). Note: Mimivirus: 1.2 Mb (Top 100 largest viral genome sequences)
• Bacterial genomes: 0.5 Mb to 13 Mb
• Eukaryotic genomes: 8 Mb to 670 Gb
• Database of Genome sizes:
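The size ranges mix kb, Mb and Gb, so comparing them is easier once everything is in base pairs. The sketch below is only an illustration: the helper names are hypothetical, and the ranges are copied from this slide rather than from any database.

```python
# Multipliers for the units used on the slides.
UNIT_TO_BP = {"bp": 1, "kb": 10**3, "Mb": 10**6, "Gb": 10**9}


def to_bp(value, unit):
    """Convert a size such as (1.2, 'Mb') to a whole number of base pairs."""
    return int(round(value * UNIT_TO_BP[unit]))


# Ranges as quoted on the slide (lower bound, upper bound), in base pairs.
SLIDE_RANGES = {
    "viral": (to_bp(1, "kb"), to_bp(360, "kb")),
    "bacterial": (to_bp(0.5, "Mb"), to_bp(13, "Mb")),
    "eukaryotic": (to_bp(8, "Mb"), to_bp(670, "Gb")),
}


def groups_spanning(size_bp):
    """Return the groups whose quoted size range contains size_bp."""
    return [
        group
        for group, (low, high) in SLIDE_RANGES.items()
        if low <= size_bp <= high
    ]
```

With these ranges, `groups_spanning(to_bp(12057, "kb"))` (the yeast genome size from the timeline) falls in both the bacterial and the eukaryotic range, which shows that genome size alone does not identify the kind of organism.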

Genome size and database increase

BIOLOGICAL DATABASE CATEGORIES
• Databases of nucleic acid sequences (RNA, DNA)
• Databases of protein sequences
• Databases of protein motifs and protein domains
• Databases of structures
• Databases of genomes
• Databases of genes
• Databases of expression profiles
• Databases of SNPs and mutations
• Databases of metabolic pathways and protein associations
• Databases of taxonomy
• …

Can we find a list of ‘clean’ databases?

The NAR database issue
The 2008 update includes 1078 databases, 110 more than the previous one.
• 98 new databases
• updates of 84 existing databases
• 25 obsolete databases removed!
The complete database list and summaries are available online on the Nucleic Acids Research web site.

NAR database category list
• Nucleotide Sequence Databases
• RNA sequence databases
• Protein sequence databases
• Structure Databases
• Genomics Databases (non-vertebrate)
• Metabolic and Signalling Pathways
• Human and other Vertebrate Genomes
• Human Genes and Diseases
• Microarray Data and other Gene Expression Databases
• Proteomics Resources
• Other Molecular Biology Databases
• Organelle databases
• Plant databases
• Immunological databases

Genomics Databases (non-vertebrate)
– MGD - Mouse Genome Database
– TIGR Gene Indices
– Genome annotation terms, ontologies and nomenclature
– Taxonomy and identification
– General genomics databases
– Viral genome databases
– Prokaryotic genome databases
– Unicellular eukaryotes genome databases
– Fungal genome databases
– Invertebrate genome databases

Three types of genome databases:
• Databases which collect data of all sequenced genomes (Entrez_Genomes; EBI_genomes)
• Databases which collect data of a category of organisms with sequenced genomes (Microbial Genomes at TIGR)
• Databases specific for one organism with a sequenced genome (FlyBase, MGD, Ensembl)

What kind of information do you find there?
Genome databases contain genomic information collected from many sources.
– Genome assembly
– Gene predictions
– Known genes, mRNA, ESTs, proteins
– Genetic maps, markers and polymorphisms
– Gene expression and phenotypes
– Annotations
– Interspecies homologues

Resources for genomes
There are two main resources for genomes:
• EBI: European Bioinformatics Institute
• NCBI: National Center for Biotechnology Information
Many other resources come from sequencing institutions:
• Sanger: The Wellcome Trust Sanger Institute
• TIGR: The Institute for Genomic Research
• Genolevures: http://cbi.labri.fr/Genolevures/index.php

Databases by phylogenetic groups
• Eukaryotic genomes:
• Bacteria, fungi genomes: =11:Fungi|12:
• Insects: p=11:|12:Insects
• Plant genomes:

The Entrez System
Entrez: PopSet, Structure, PubMed, Books, 3D Domains, Taxonomy, GEO/GDS, UniGene, Nucleotide, Protein, Genome, OMIM, CDD/CDART, Journals, SNP, UniSTS, PubMed Central
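The slides present Entrez through its web interface. As an assumption beyond the slides, the same databases can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL and does not contact the server; the function name and default values are illustrative.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (not mentioned on the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database.

    db     -- Entrez database name, e.g. 'pubmed', 'protein', 'taxonomy'
    term   -- free-text query
    retmax -- maximum number of identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"
```

For example, `esearch_url("pubmed", "genome databases", retmax=5)` yields a URL that can be opened in a browser or fetched with any HTTP client. NCBI's usage guidelines on request rates apply to automated queries.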

Mouse assembly (figure): RefSeq Contig, BAC, WGS, Other GenBank, RefSeq Transcript, UniGene Transcript

Maps and Options

Some common features of genomic databases:
• Possibility to download all the sequences of the genome or part of them (chromosomes, clones, genes, CDS, ...)
• Most of them have a corresponding protein resource (the set of proteins obtained by translating all CDS: conceptual translation)
• Example: Entrez-Genome of the NCBI, GenPept
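Conceptual translation, as mentioned above, can be sketched in a few lines. This is a simplified illustration that assumes the standard genetic code and a CDS given on the coding strand in frame 1; real pipelines also handle alternative genetic codes, ambiguous bases and partial codons. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate a coding sequence into protein, stopping at the first stop codon.

    Trailing bases that do not form a full codon are ignored.
    Codons with unknown bases are translated as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Applying a function like this to every annotated CDS of a genome gives the kind of derived protein set that the slide describes.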

Comparative genomics
Analyses of the genetic material of different species help us understand the similarities and differences between genomes, their evolution and the evolution of their genes.
• Intra-genomic comparisons help us understand the degree of duplication (genome regions; genes) and gene organization.
• Inter-genomic comparisons help us understand the degree of similarity between genomes and the degree of conservation between genes.
• Understanding gene and genome evolution
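One simple way to put a number on the "degree of conservation between genes" is the percent identity of two aligned sequences. The slides do not prescribe a measure, so this is an assumed, minimal illustration: it expects two sequences that are already aligned to equal length, with '-' marking gaps, and it skips gapped columns.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Other conventions exist (for example, counting gap columns in the denominator), so values from different tools are not always directly comparable.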

Internet resources for whole-genome comparative analysis and associated tools
• UCSC Genome Bioinformatics: http://genome.ucsc.edu/
• Ensembl: http://
• MapViewer: http://
• VISTA Genome Browser: http://pipeline.lbl.gov/
• K-BROWSER
• Comparative Regulatory Genomics: http://corg.molgen.mpg.de/
• GALA: http://
• EnsMart: http://
• ETOPE: http://
• PipMaker and MultiPipMaker: http://
• VISTA server: http://www-gsd.lbl.gov/vista/
• MAVID server: http://baboon.math.berkeley.edu/mavid/
• zPicture server: http://zpicture.dcode.org/
• rVISTA server: http://rvista.dcode.org/
• COGs: http://

NCBI Homo sapiens Genome: Statistics -- Build 36 version 2 Protein coding genes: 21,541

General considerations:
• Organism-specific databases can be more up-to-date than general databases.
• Genome databases are not a one-stop shop for all information; other databases like UniProt are still needed!

Bibliographic Databases and Open Access resources

PubMed: access to more than 12 million papers since 1950 (3790 journals).
Simple and advanced literature search with keywords, author name, MeSH terms, journals, single citation, ...
Some papers are free from the journal website or through the publishers.
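An advanced search combining keywords, author name, MeSH terms and journal can be written as a single query string with PubMed field tags. The helper below is a hypothetical sketch: it only assembles the string, and the example values are placeholders.

```python
def pubmed_query(keywords=None, author=None, mesh=None, journal=None):
    """Combine search criteria into one PubMed query using field tags.

    All criteria are joined with AND; criteria left as None are skipped.
    """
    parts = []
    if keywords:
        parts.append(f"({keywords})")
    if author:
        parts.append(f"{author}[Author]")
    if mesh:
        parts.append(f"{mesh}[MeSH Terms]")
    if journal:
        parts.append(f"{journal}[Journal]")
    return " AND ".join(parts)
```

For example, `pubmed_query(keywords="genome database", author="Sanger F")` returns `(genome database) AND Sanger F[Author]`, which can be pasted into the PubMed search box.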

Free access journals
• Authors pay so that readers can get the papers free
• The BMC initiative
• The PLoS initiative
• Other initiatives: some journals give immediate free online access and others after a few (1-12) months from publication

The HINARI initiative
The Health InterNetwork Access to Research Initiative (HINARI) provides free or very low cost online access to the major journals in biomedical and related social sciences to local, not-for-profit institutions in developing countries.
HINARI was launched in January 2002, with some 1500 journals from 6 major publishers. 22 additional publishers joined in May 2002, increasing the total number of journals. Today more than 70 publishers are offering their content in HINARI and others will soon be joining the programme.

Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz
• Wim Maurits Degrave: Senior Researcher
• Antonio Basílio de Miranda: Associate Researcher
• Nicolas Carels: Visiting Researcher
• Fábio Faria da Mota: Visiting Researcher
• Thomas Dan Otto: Visiting Researcher
• Marcos Catanho: PhD student (BCM – IOC)
• Ana Carolina Guimarães: PhD student (BCM – IOC)
• Flávio Engelke: Master's student (PCM - UERJ)
• Monete Rajão: Master's student (BCS – IOC)
• Erica Ramos Cardoso: PIBITI scholarship holder