IST Computational Biology, 2010-09-28
Information Retrieval: Biological Databases 2
Pedro Fernandes, Instituto Gulbenkian de Ciência, Oeiras, PT

Sizing Biological Information
This week (20 Sept. 2010) the EMBL Database contained 195,945,264 nucleotide sequence entries.

Release 2010_09 of UniProtKB/Swiss-Prot (10 Aug. 2010): 998 sequences have been added since release 2010_08, the sequence data of 160 existing entries has been updated, and the annotations of existing entries have been revised. Entries are classified by protein existence (PE): evidence at protein level, evidence at transcript level, inferred from homology, predicted, uncertain.
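Protein existence levels are also carried in UniProtKB FASTA headers, as a PE= field with codes 1 to 5 in the order listed above. The sketch below assumes headers of the form >db|Accession|EntryName Description ... PE=n SV=m and shows how a per-level percentage breakdown could be recomputed from a set of headers. The function names are illustrative, and the headers used to exercise it are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# UniProtKB protein existence (PE) codes as they appear in FASTA headers.
PE_LEVELS = {
    1: "Evidence at protein level",
    2: "Evidence at transcript level",
    3: "Inferred from homology",
    4: "Predicted",
    5: "Uncertain",
}


def parse_uniprot_header(header: str) -> dict:
    """Split '>db|Accession|EntryName Description ... PE=n SV=m' into fields."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name, _, description = rest.partition(" ")
    match = re.search(r"\bPE=(\d)\b", description)
    pe: Optional[int] = int(match.group(1)) if match else None
    return {
        "reviewed": db == "sp",  # 'sp' = Swiss-Prot (reviewed), 'tr' = TrEMBL (unreviewed)
        "accession": accession,
        "entry_name": entry_name,
        "pe_level": PE_LEVELS.get(pe) if pe is not None else None,
    }


def pe_percentages(headers: list) -> dict:
    """Share of entries (in %) at each protein existence level."""
    counts = Counter(parse_uniprot_header(h)["pe_level"] for h in headers)
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}
```

Parsing only the header keeps the sketch independent of sequence length; on a real release the same loop would simply run over every header line of the FASTA file.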

Protein Structures (RCSB PDB), entries by experimental method
Method                 Entries
X-ray
NMR                    8,588
Electron microscopy    306
Hybrid                 26
Other                  147
Total

Data Deluge: Where From?
Sequencing (NGS, SMS); microarray experiments; parallelized drug screening and testing; other sources.

Gene Ontology: Towards Consistent Descriptions
Consistent, effective searches need uniform terminology: a controlled vocabulary with hierarchical relations between terms.
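Hierarchical relations are what make ontology-based searches consistent: a gene annotated to a specific term is implicitly annotated to every more general term above it. The sketch below illustrates this on a toy, simplified is_a fragment using GO-style term names. It is not the real GO graph, which has more intermediate terms and other relation types such as part_of. The gene names are hypothetical.

```python
from collections import deque

# Toy is_a fragment, simplified for illustration (not the full GO graph,
# which has more intermediate terms and also part_of / regulates relations).
IS_A = {
    "glucose metabolic process": ["carbohydrate metabolic process"],
    "carbohydrate metabolic process": ["metabolic process"],
    "metabolic process": ["biological_process"],
    "biological_process": [],
}


def ancestors(term: str, graph: dict) -> set:
    """All terms reachable from `term` by following is_a links upwards."""
    seen = set()
    queue = deque(graph.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(graph.get(parent, []))
    return seen


def genes_for(query: str, annotations: dict, graph: dict) -> list:
    """Genes annotated to `query` itself or to any more specific term below it."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(t == query or query in ancestors(t, graph) for t in terms)
    )
```

With a flat keyword search, a query for "carbohydrate metabolic process" would miss a gene annotated only to "glucose metabolic process"; walking the hierarchy recovers it.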

Specialized Search Tools
Searching on specific fields is relatively easy, and keywords allow indexed searching on text fields. Searching sequence data is more complex. Similarity search with BLAST is a fast way of finding similar sequences, and some nucleotide and protein sequence databases are formatted for BLAST.
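BLAST gets its speed by starting from short word matches ("seeds"). The query is broken into words of length k, and database sequences are scanned for hits. The minimal sketch below covers only exact-word seeding. Real BLAST also uses neighbourhood words scored with a substitution matrix for proteins, then extends seeds into alignments and scores them statistically. The database and query here are hypothetical.

```python
from collections import defaultdict


def query_lookup(query: str, k: int) -> dict:
    """Map each length-k word of the query to the offsets where it occurs."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i)
    return table


def find_seeds(query: str, database: dict, k: int) -> list:
    """Scan every database sequence and report exact k-word hits.

    Returns (query offset, subject id, subject offset) tuples.
    """
    table = query_lookup(query, k)
    seeds = []
    for seq_id, subject in database.items():
        for j in range(len(subject) - k + 1):
            for i in table.get(subject[j:j + k], []):
                seeds.append((i, seq_id, j))
    return seeds
```

Each seed fixes a diagonal (subject offset minus query offset); in a full search, only seeds on promising diagonals would be extended, which is what keeps the method fast compared with aligning the query against every sequence.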

Interoperability
Adherence to standards; minimal experiment descriptions; ontological concerns; integration; warehousing.

Bibliographic Databases
PubMed (MEDLINE), searched with Entrez. Data mining in text. Tagged text to avoid loss of information (Utopia Documents).
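Entrez searches of PubMed can also be run programmatically through NCBI's E-utilities. The minimal sketch below only builds an ESearch request URL and makes no network call. The db, term, retmax and retmode parameters belong to ESearch, while the function name esearch_url and the example query are illustrative.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build an ESearch URL that asks for matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)
```

To run the search, the URL could be opened with urllib.request.urlopen and the response parsed with json.loads. The matching PubMed IDs are listed under esearchresult, in idlist. NCBI asks clients to keep their request rate low.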

Medical Subject Headings
MeSH is part of the NLM/PubMed effort and is itself a searchable database. It provides a controlled vocabulary, disambiguation and term relationships. Spelling: Hemoglobin or Haemoglobin? Context: NMR spectroscopy or imaging?
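A controlled vocabulary resolves both problems in the list above. Spelling variants map to one preferred heading, and an ambiguous term maps to several headings, so the user has to choose. The sketch below uses a hypothetical, hand-made excerpt of such a mapping; the real entry-term tables come from the MeSH database itself. The [MeSH Terms] field tag is PubMed search syntax.

```python
# Hypothetical, hand-made excerpt mapping free-text entry terms to MeSH-style
# preferred headings; the real mapping comes from the MeSH database itself.
ENTRY_TERMS = {
    "hemoglobin": ["Hemoglobins"],
    "haemoglobin": ["Hemoglobins"],  # British spelling, same heading
    "nmr": ["Magnetic Resonance Spectroscopy", "Magnetic Resonance Imaging"],
}


def to_headings(text: str) -> list:
    """Return the preferred heading(s) for a free-text term (empty if unknown)."""
    return ENTRY_TERMS.get(text.strip().lower(), [])


def needs_context(text: str) -> bool:
    """True when a term maps to several headings and the user must pick one."""
    return len(to_headings(text)) > 1


def pubmed_query(text: str) -> str:
    """Turn an unambiguous term into a PubMed field-tagged query string."""
    headings = to_headings(text)
    if len(headings) != 1:
        raise ValueError(f"{text!r} is unknown or ambiguous: {headings}")
    return f'"{headings[0]}"[MeSH Terms]'
```

Both spellings end up as the same query, so a search no longer depends on which spelling the author of a paper happened to use.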

More on Bibliography
Web of Knowledge; b-on; institutional repositories; PubCrawler (alerts).

Structural Protein Databases
Primary data: atomic coordinates from X-ray diffraction, NMR, etc. Composition from UniProtKB. Properties from annotations.
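Coordinates in the legacy PDB file format are stored in fixed-width columns, so a record can be read by slicing. The PDB has since moved to PDBx/mmCIF as its primary archive format. The minimal sketch below reads the main fields of one ATOM or HETATM record using the column positions of the PDB format specification. The sample record is illustrative and is not taken from a real entry.

```python
from typing import Optional


def parse_atom_line(line: str) -> Optional[dict]:
    """Parse one ATOM/HETATM record of the legacy fixed-column PDB format."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return None  # other record types (HEADER, REMARK, ...) are skipped
    return {
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "residue": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],               # column 22: chain identifier
        "res_seq": int(line[22:26]),     # columns 23-26: residue sequence number
        "x": float(line[30:38]),         # columns 31-38: X coordinate in Angstroms
        "y": float(line[38:46]),         # columns 39-46: Y coordinate
        "z": float(line[46:54]),         # columns 47-54: Z coordinate
    }


# A single illustrative ATOM record (values are made up for the example).
SAMPLE = "ATOM      1  N   MET A   1      38.198  19.582  28.396  1.00 44.52           N"
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in this format can run together when values are wide.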

Specialized Databases
Binding sites; SNPs.

Classification of Proteins
CATH: Class, Architecture, Topology, Homologous superfamily. SCOP: Structural Classification of Proteins.
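CATH writes a superfamily as four dot-separated numbers, one per level, so each prefix of the code names a node higher up the hierarchy. The minimal sketch below expands a code into its levels and finds the deepest level two domains share. The codes in the test are examples in CATH's dotted notation only; no claim is made about which superfamilies they denote.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_levels(code: str) -> dict:
    """Expand a four-part CATH code into the node it names at each level."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a C.A.T.H code, got {code!r}")
    return {name: ".".join(parts[: i + 1]) for i, name in enumerate(CATH_LEVELS)}


def shared_levels(a: str, b: str) -> list:
    """Names of the leading levels at which two CATH codes agree."""
    shared = []
    for name, x, y in zip(CATH_LEVELS, a.split("."), b.split(".")):
        if x != y:
            break
        shared.append(name)
    return shared
```

Two domains that agree down to Topology share a fold but are not classified as homologous; agreement at the fourth level is what places them in the same superfamily.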

Integrated Databases
Built to aggregate other databases, they provide a common search and calculate cross-linking tables. InterPro, for example, results from integrating several derivative databases such as PRINTS, PROSITE, SMART, ProDom, Pfam and TIGRFAMs.
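InterPro groups signatures from its member databases into integrated entries when they describe the same family or domain. Signatures not yet assigned to an entry are called unintegrated. The toy sketch below builds the resulting cross-linking table for each protein. All signature, entry and protein identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical member-database signatures and the integrated entry each maps to.
SIGNATURE_TO_ENTRY = {
    "PFAM:sigA": "ENTRY_1",
    "PROSITE:sigB": "ENTRY_1",
    "SMART:sigC": "ENTRY_2",
}


def integrate(hits: dict) -> dict:
    """Group each protein's member-database hits under integrated entries.

    Signatures with no integrated entry are kept under the key "unintegrated".
    """
    table = {}
    for protein, signatures in hits.items():
        grouped = defaultdict(list)
        for sig in signatures:
            grouped[SIGNATURE_TO_ENTRY.get(sig, "unintegrated")].append(sig)
        table[protein] = dict(grouped)
    return table
```

Because two signatures from different member databases collapse into one entry, a single search against the integrated resource returns one consolidated answer instead of separate, partly redundant hits.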

Knowledge Bases
UniProt (Swiss-Prot/PIR/TrEMBL); Ensembl (genome-centered); GeneCards (gene-centered).

[Figures: GeneCards pages, including expression data]

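Clinical Databases
OMIM: Online Mendelian Inheritance in Man, covering human diseases. HGMD: mutations and associated human diseases. dbSNP: single-nucleotide polymorphisms (by convention, a variant counts as a SNP when its frequency in the population exceeds 1%).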
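The 1% convention for SNPs is usually applied to the minor allele frequency (MAF) at a site. The minimal sketch below computes MAF from allele counts and applies the >1% cut-off. The threshold follows the list above, and the allele counts used in the test are hypothetical.

```python
def minor_allele_frequency(allele_counts: dict) -> float:
    """Frequency of the second most common allele at one site (0.0 if monomorphic)."""
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) < 2:
        return 0.0
    ranked = sorted(allele_counts.values(), reverse=True)
    return ranked[1] / total


def is_common_snp(allele_counts: dict, threshold: float = 0.01) -> bool:
    """Apply the conventional '>1%' population frequency cut-off."""
    return minor_allele_frequency(allele_counts) > threshold
```

For example, 40 G alleles among 2,000 chromosomes give a MAF of 0.02, above the cut-off. Five T alleles among 2,000 give 0.0025, which would be treated as a rare variant.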

The Synchronization Issue
Many copies of public databases exist, which makes version control an issue. Content updates in primary and derived databases affect integration, inconsistencies are slow to resolve, and indexes need frequent recalculation.
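One way to limit index recalculation on a local copy is incremental synchronization. The mirror keeps a checksum per entry from the last update and re-fetches and re-indexes only entries that were added, removed or changed in the primary database. The sketch below works under that assumption, with SHA-256 as an arbitrary choice of fingerprint. Entry IDs and texts are hypothetical.

```python
import hashlib


def checksum(entry_text: str) -> str:
    """Content fingerprint of one database entry."""
    return hashlib.sha256(entry_text.encode("utf-8")).hexdigest()


def sync_plan(mirror_checksums: dict, primary_entries: dict) -> dict:
    """Decide which entries of a local copy must be re-fetched and re-indexed.

    mirror_checksums: entry id -> checksum recorded at the last synchronization
    primary_entries:  entry id -> current entry text in the primary database
    """
    mirror_ids, primary_ids = set(mirror_checksums), set(primary_entries)
    changed = [
        entry_id
        for entry_id in mirror_ids & primary_ids
        if checksum(primary_entries[entry_id]) != mirror_checksums[entry_id]
    ]
    return {
        "added": sorted(primary_ids - mirror_ids),
        "removed": sorted(mirror_ids - primary_ids),
        "changed": sorted(changed),
    }
```

Only the three lists need re-indexing, so the cost of an update scales with how much changed rather than with the size of the whole database.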

Purifying Content
Efforts are under way to improve the content of derived databases, for example manual curation of genomic databases for specific groups such as eukaryotes, human and plants.

HAVANA
Manual annotation of the human genome, chromosome by chromosome.

ENCODE
A project to identify the functional elements of the human genome in fine detail.