Interoperation of Molecular Biology Databases
Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International, Menlo Park, CA

SRI International Bioinformatics

Main Message
- Interoperation of molecular-biology databases is a challenging problem of critical importance
- DOE should initiate a program in interoperation of molecular-biology databases:
  - Pursue both the warehouse approach and the multidatabase approach
  - Major progress is possible within 5 years

Motivations
- Important biological problems require access to multiple bioinformatics databases
- Different problems require different sets of databases
- Hundreds of bioinformatics databases exist:
  - Nucleic Acids Research 32 (2004), Database issue
  - Nucleic Acids Research database list:
    - 350 databases listed in 2002
    - 560 databases listed in 2004
- Applications of integration include:
  - Complex queries
  - Comparison of overlapping sources
  - Data mining

Bioinformatics Databases
- Tremendous progress in point-and-click access for biologist users
- Less progress toward providing a computable, interoperable infrastructure for large-scale data mining
- Every large-scale mining/learning problem requires time-consuming crafting of input/training datasets

Warehouse Approach vs Multidatabase Approach
- Multidatabase query approaches assume databases are in a queryable DBMS
- Most sites that do operate DBMSs do not allow remote query access because of security and server-load concerns
- Users want to control data stability
- Users want to control the hardware applied to a problem
- Internet bandwidth limits query throughput
- Users need to capture, integrate, and publish locally produced data of different types
- Replicating and refreshing very large sources is expensive
- The multidatabase and warehouse approaches are complementary

SRI BioWarehouse Project
- Goal: create a toolkit for constructing bioinformatics database warehouses that integrate sets of bioinformatics databases into one physical DBMS

BioWarehouse Approach
- The warehouse schema defines many bioinformatics datatypes
- Create loaders for public bioinformatics DBs:
  - Parse the file format for the DB
  - Apply semantic transformations
  - Insert the database into warehouse tables
- Oracle and MySQL implementations
- Warehouse query access mechanisms:
  - SQL queries via JDBC, Lisp, Perl, ODBC, OAA
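The three loader steps (parse, transform, insert) can be sketched in Python. This is a minimal illustration under stated assumptions, not BioWarehouse code: the flat-file format is a hypothetical, much-simplified imitation of Swiss-Prot/ENZYME-style two-letter line codes, SQLite (the standard-library `sqlite3` module) stands in for the Oracle and MySQL back ends, and the `Protein` table and its columns are invented for the example. The final `SELECT` stands in for the SQL access offered through JDBC, ODBC, Perl, and the other mechanisms listed.

```python
import sqlite3

# Hypothetical, much-simplified flat-file format (NOT a real Swiss-Prot/ENZYME file).
SAMPLE = """\
ID   P00001
DE   Example protein one
EC   1.1.1.1
//
ID   P00002
DE   Example protein two
EC   EC 2.7.1.-
//
"""


def parse_entries(text):
    """Step 1: parse the source file format into one dict per entry."""
    entry = {}
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            if entry:
                yield entry
            entry = {}
        elif line.strip():
            # Two-letter line code, then the value starting at column 6.
            entry[line[:2]] = line[5:].strip()
    if entry:  # tolerate a missing final terminator
        yield entry


def transform(entry):
    """Step 2: semantic transformation into warehouse conventions (here: EC-number cleanup)."""
    ec = entry.get("EC")
    if ec and ec.startswith("EC "):
        ec = ec[3:]  # e.g. "EC 2.7.1.-" -> "2.7.1.-"
    return {"accession": entry["ID"], "name": entry.get("DE"), "ec": ec}


def load(conn, text):
    """Step 3: insert the transformed records into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Protein ("
        " WID INTEGER PRIMARY KEY, Accession TEXT UNIQUE, Name TEXT, ECNumber TEXT)")
    rows = [transform(e) for e in parse_entries(text)]
    conn.executemany(
        "INSERT INTO Protein (Accession, Name, ECNumber) VALUES (:accession, :name, :ec)",
        rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
print(load(conn, SAMPLE))  # 2
# Query access: plain SQL against the warehouse table.
print(conn.execute("SELECT Accession, ECNumber FROM Protein ORDER BY Accession").fetchall())
# [('P00001', '1.1.1.1'), ('P00002', '2.7.1.-')]
```

Keeping each step a separate function mirrors the slide's split: a new source needs its own parser, while the transformation and insertion steps target one shared schema.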

Warehouse Schema
- Manages many bioinformatics datatypes simultaneously:
  - Pathways, Reactions, Chemicals
  - Proteins, Genes, Replicons
  - Sequences, Sequence Features
  - Organisms, Taxonomic relationships
  - Computations (sequence matches)
  - Citations, Controlled vocabularies
  - Links to external databases
- Each type of warehouse object is implemented through one or more relational tables (currently 43)
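To make "one or more relational tables" per object type concrete, the following sketch stores a pathway as a row in a `Pathway` table plus its ordered steps in a separate `PathwayLink` table, and reassembles the pathway with a join. The table names, columns, and WID values are hypothetical and are not taken from the actual BioWarehouse schema; SQLite is used only so the example runs without a server. The three EC numbers are the first three steps of glycolysis (hexokinase, glucose-6-phosphate isomerase, 6-phosphofructokinase).

```python
import sqlite3


def build_demo(conn):
    """Create a hypothetical two-table pathway representation and load a tiny example."""
    conn.executescript("""
        CREATE TABLE Reaction (WID INTEGER PRIMARY KEY, ECNumber TEXT);
        CREATE TABLE Pathway  (WID INTEGER PRIMARY KEY, Name TEXT);
        -- A pathway's ordered steps live in a separate link table.
        CREATE TABLE PathwayLink (
            PathwayWID  INTEGER REFERENCES Pathway(WID),
            ReactionWID INTEGER REFERENCES Reaction(WID),
            Step        INTEGER
        );
    """)
    conn.executemany("INSERT INTO Reaction VALUES (?, ?)",
                     [(1, "2.7.1.1"), (2, "5.3.1.9"), (3, "2.7.1.11")])
    conn.execute("INSERT INTO Pathway VALUES (10, 'glycolysis, first three steps')")
    conn.executemany("INSERT INTO PathwayLink VALUES (?, ?, ?)",
                     [(10, 1, 1), (10, 2, 2), (10, 3, 3)])


def pathway_ec_numbers(conn, pathway_wid):
    """Reassemble one pathway object by joining across its tables, in step order."""
    rows = conn.execute(
        "SELECT r.ECNumber FROM PathwayLink AS l "
        "JOIN Reaction AS r ON r.WID = l.ReactionWID "
        "WHERE l.PathwayWID = ? ORDER BY l.Step",
        (pathway_wid,))
    return [ec for (ec,) in rows]


conn = sqlite3.connect(":memory:")
build_demo(conn)
print(pathway_ec_numbers(conn, 10))  # ['2.7.1.1', '5.3.1.9', '2.7.1.11']
```

A variable-length structure such as a pathway cannot live in a single fixed-width row, which is one reason an object type can span several tables.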

Warehouse Schema
- Manages multiple datasets simultaneously:
  - Dataset = single version of a database
  - Allows version comparison
  - Multiple software tools or experiments require access to different versions
- Each dataset is a warehouse entity
- Every warehouse object is registered in a dataset
- Different databases storing the same biological datatypes are coerced into the same warehouse tables
- The design of most datatypes was inspired by multiple databases
- Representational tricks to decrease schema bloat:
  - Single space of primary keys
  - Single set of satellite tables, such as for synonyms, citations, comments, etc.
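The dataset and key-space ideas can be illustrated with a toy in-memory warehouse. This is a sketch under assumptions, not the real BioWarehouse design: the class `MiniWarehouse`, the tables `DataSet`, `Protein`, `Gene`, and `SynonymTable`, and all column names are hypothetical. It shows every object registered in a dataset, one counter supplying primary keys for every table (a single key space), one satellite synonym table shared by all object types, and a version comparison between two datasets.

```python
import itertools
import sqlite3


class MiniWarehouse:
    """Hypothetical toy warehouse: datasets, one global key space, a shared satellite table."""

    # Object tables and the column holding each object's main identifier.
    OBJECT_TABLES = {"Protein": "Accession", "Gene": "Name"}

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        # Single space of primary keys: every row in every table draws its WID from here.
        self._wids = itertools.count(1)
        self.conn.executescript("""
            CREATE TABLE DataSet (WID INTEGER PRIMARY KEY, Name TEXT, Version TEXT);
            CREATE TABLE Protein (WID INTEGER PRIMARY KEY,
                                  DataSetWID INTEGER REFERENCES DataSet(WID), Accession TEXT);
            CREATE TABLE Gene    (WID INTEGER PRIMARY KEY,
                                  DataSetWID INTEGER REFERENCES DataSet(WID), Name TEXT);
            -- One satellite table for all object types; OtherWID may point into any table.
            CREATE TABLE SynonymTable (OtherWID INTEGER, Syn TEXT);
        """)

    def add_dataset(self, name, version):
        """Register one version of a source database as a dataset entity."""
        wid = next(self._wids)
        self.conn.execute("INSERT INTO DataSet VALUES (?, ?, ?)", (wid, name, version))
        return wid

    def add_object(self, table, dataset_wid, ident, synonyms=()):
        """Insert an object registered in a dataset, plus its synonyms."""
        if table not in self.OBJECT_TABLES:  # whitelist, since the name is interpolated
            raise ValueError(f"unknown object table: {table}")
        wid = next(self._wids)
        col = self.OBJECT_TABLES[table]
        self.conn.execute(
            f"INSERT INTO {table} (WID, DataSetWID, {col}) VALUES (?, ?, ?)",
            (wid, dataset_wid, ident))
        self.conn.executemany("INSERT INTO SynonymTable VALUES (?, ?)",
                              [(wid, s) for s in synonyms])
        return wid

    def synonyms(self, wid):
        """Look up synonyms for any object, whatever table it lives in."""
        rows = self.conn.execute("SELECT Syn FROM SynonymTable WHERE OtherWID = ?", (wid,))
        return sorted(s for (s,) in rows)

    def proteins_added(self, new_ds, old_ds):
        """Version comparison: protein accessions in new_ds that are absent from old_ds."""
        rows = self.conn.execute(
            "SELECT Accession FROM Protein WHERE DataSetWID = ? "
            "EXCEPT SELECT Accession FROM Protein WHERE DataSetWID = ?",
            (new_ds, old_ds))
        return sorted(a for (a,) in rows)


wh = MiniWarehouse()
v1 = wh.add_dataset("ExampleDB", "1.0")
v2 = wh.add_dataset("ExampleDB", "2.0")
wh.add_object("Protein", v1, "P00001")
wh.add_object("Protein", v2, "P00001")
p = wh.add_object("Protein", v2, "P00002", synonyms=["exampleB"])
g = wh.add_object("Gene", v2, "exaB", synonyms=["exaB-alt"])
print(wh.proteins_added(v2, v1))  # ['P00002']
print(wh.synonyms(p), wh.synonyms(g))  # ['exampleB'] ['exaB-alt']
```

Because no two rows anywhere share a WID, a satellite row's `OtherWID` identifies its owner unambiguously, so one synonym table can serve every datatype instead of one synonym table per datatype.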

Current Databases Supported by BioWarehouse
- BioCyc: 15 genomes and metabolic networks
- Swiss-Prot, TrEMBL: 1.3M proteins
- ENZYME
- KEGG
- NCBI Taxonomy
- CMR: 105 genomes, 250K genes, 250K proteins
- Applications:
  - DARPA BioSpice program on biological simulation
  - Study of sequence coverage of known enzymes

Summary
- Interoperation of molecular-biology databases is a challenging problem of critical importance
- DOE should initiate a program in interoperation of molecular-biology databases:
  - Pursue both the warehouse approach and the multidatabase approach
  - Major progress is possible within 5 years