The LIBI Federated database

The LIBI Federated database “GRID Computing” Tutorial 17 September 2008

Agenda
- Bioinformatic data integration issues in grid environments
- Data federation in the LIBI platform
- Tutorial goals: a simple case study for querying the federated DB
- Designing a data abstraction model on bioinformatic information: the DDQB application

Issues Concerning Data Integration
In the bioinformatics domain, a growing number of grid applications manage data at very large scales of both size and distribution. The complexity of data management on a grid arises from the scale, dynamism, autonomy, heterogeneity and distribution of the data sources.
Mission to accomplish: provide an IT layer that lets grid applications access data without having to deal with the issues listed above (large-scale distribution, dynamism, heterogeneity, etc.).
Viable approaches:
- Data Federation: data are logically integrated
- Data Warehouse: data are physically integrated

Data Federation vs Data Warehouse
Data Warehouse (DW):
- Data warehousing, that is, ‘cleaning up’ data and placing it into a centralized repository, works well in situations where data are relatively static and data types are not too different.
- Moving data into a warehouse can limit the specialized search capabilities available through the original data source.
- Building and maintaining an enterprise-wide warehouse on the scale required by most large research organizations, with hundreds of data sources, can be both costly and risky.
- Data warehouse centralization clashes with the basic grid concepts of data replication and distribution according to monitored statistics.
Data Federation (DF):
- Data federation allows current data to be accessed from multiple heterogeneous, distributed data sources simultaneously, with a single query.
For bioinformatics problems, Data Federation seems the most promising solution.
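The "single query over several sources" idea can be sketched with Python's built-in sqlite3 module, using ATTACH DATABASE as a small-scale stand-in for federation. This is only an analogy, not the LIBI setup: the schema names, table layouts and rows below are toy assumptions, and the real platform federates DB2, MySQL and web-service sources rather than SQLite databases.

```python
import sqlite3


def build_federated_connection():
    """Open one connection that sees two independent databases."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate in-memory database; the data stay in
    # their own store and are only logically combined at query time.
    conn.execute("ATTACH DATABASE ':memory:' AS mitores")
    conn.execute("ATTACH DATABASE ':memory:' AS utref")

    # Toy tables and rows (assumptions, not the real MitoRes/UTRef schemas).
    conn.execute(
        "CREATE TABLE mitores.gene (gene_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE utref.utr "
        "(accession TEXT PRIMARY KEY, gene_id INTEGER, utr_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO mitores.gene VALUES (?, ?)",
        [(1, "TOP1"), (2, "TOP2A")],
    )
    conn.executemany(
        "INSERT INTO utref.utr VALUES (?, ?, ?)",
        [("U0001", 1, "5'UTR"), ("U0002", 1, "3'UTR")],
    )
    conn.commit()
    return conn


def utrs_for_gene(conn, gene_name):
    """Join across the two attached databases with a single SQL statement."""
    return conn.execute(
        "SELECT g.name, u.accession, u.utr_type "
        "FROM mitores.gene AS g "
        "JOIN utref.utr AS u ON u.gene_id = g.gene_id "
        "WHERE g.name = ? "
        "ORDER BY u.accession",
        (gene_name,),
    ).fetchall()
```

Calling utrs_for_gene(build_federated_connection(), "TOP1") returns the two toy UTR rows without copying either table into a central store, which is the property the federation approach relies on.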

Data Federation Layer: LIBI Federated Database
Name: LIBI Federated Database
Institution: IBM Innovation Lab – Bari
Service: IBM-DB2@213.26.249.183
Content: Relational DB federating MitoRes, UTRef, UTRSite, Pubmed, GenBank, OMIM, Uniprot, HmtDB, EMBL_CDS
Description: This database federates local and remote resources to provide a uniform, standard interface to access data. The federated DB consists of a rationalized composition of the ITB-owned DBs (MitoRes, UTRef, UTRSite) stored locally, the HmtDB resource owned by the Biology Dept. of Bari University, and the public NCBI DBs accessed remotely (Pubmed, GenBank, OMIM). The Uniprot DB, replicated locally, has been federated as well. Federation provides inter-database relationship features that let users discover and extract relevant pieces of information with a single query, even when they originate from different DBs. Besides this advantage for the LIBI platform and its end users, the federated DB gives developers and service consumers a homogeneous, standard interface (the SQL language) to access data stored locally and/or remotely.
Availability: For LIBI users and services
Access: DB2 DRDA service available at 213.26.249.183
Main components: IBM DB2, WebSphere Federator Server (with relational & non-relational wrappers)
Custom components: EMBL/FASTA wrapper
Average user access: Tens of concurrent users
DBMS: IBM DB2 9.1
Query language: SQL
Local DBs: MitoRes, UTRef, UTRSite, Uniprot, EMBL_CDS
Remote DBs: Pubmed, GenBank, OMIM, HmtDB

Accessing Federated Data: WebSphere Federator Server and related activities
WebSphere Federator Server enables federated databases to access heterogeneous and distributed (local and remote) data sources. It provides a unified data-management interface for query and insert: the SQL language.
Other features of the federated DB:
- DB user-defined functions have been built to perform bioinformatics-specific data handling and analyses
- A graphical web interface for accessing federated DB information has been made available (DDQB)
- We are developing new interfaces to expose federated DB capabilities as Web Services
Collaborations:
- SPACI developed a DB2 wrapper for GRelC, to access the federated DB from grid environments
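As an illustration of what a bioinformatics-specific user-defined function callable from SQL might look like, here is a minimal sketch using Python's sqlite3 module. The slides do not say which functions were actually built for the LIBI database, so REVCOMP and GC_CONTENT are hypothetical names chosen for this example, and SQLite stands in for DB2.

```python
import sqlite3

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, or None for NULL."""
    if seq is None:
        return None
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases, or None for NULL/empty input."""
    if not seq:
        return None
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def connect_with_udfs():
    """Open an in-memory database with both functions registered for SQL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical SQL-level names; each function takes one argument.
    conn.create_function("REVCOMP", 1, reverse_complement)
    conn.create_function("GC_CONTENT", 1, gc_content)
    return conn
```

Once registered, the functions can be used inside ordinary queries, for example conn.execute("SELECT REVCOMP(sequence) FROM some_table"), so sequence handling stays behind the same SQL interface as the rest of the data access.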

Data sources integrated in the Federated LIBI Schema
DBName          Connection/type
GenBank         Web Services
PubMed          Web Services
OMIM            Web Services
Species 2000    Web Services
HmtDB           DB2
MitoRes         MySQL
UTRef           MySQL
UTRSite         MySQL
Uniprot
Legend: relational database, flat file, web services

Overall Federated DB Schema
[Schema diagram covering UTRef, MitoRes, UTRSite, GenBank, OMIM, PUBMED, HmtDB, Uniprot and EMBL_CDS]

TUTORIAL: a simple case study involving federated DB (1/4)
Need: Studying the regulation of the expression of topoisomerase I at the mRNA level.
A biologist needs to retrieve information about the regulation of expression of topoisomerase I. To this end, she decides to investigate whether there are UTR sequences responsible for regulating the expression of the corresponding gene, and to retrieve information about the regulatory motif.
Pieces of information to retrieve:
- MitoRes entry accession number
- MitoRes entry product description
- MitoRes gene name
- UTRef accession number
- UTRef UTR type
- UTRSite accession number
- UTRSite standard name
Involved databases: MitoRes, UTRef, UTRSite

TUTORIAL: a simple case study involving federated DB (2/4)
[Schema diagram covering UTRef, MitoRes, UTRSite, GenBank, OMIM, PUBMED, HmtDB, Uniprot and EMBL_CDS]

TUTORIAL: a simple case study involving federated DB (3/4)
[Schema diagram restricted to the involved databases: UTRef, MitoRes and UTRSite]

TUTORIAL: a simple case study involving federated DB (4/4)
Need: Studying the regulation of the expression of topoisomerase I at the mRNA level.
A biologist needs to retrieve information about the regulation of expression of topoisomerase I. To this end, she decides to investigate whether there are UTR sequences responsible for regulating the expression of the corresponding gene, and to retrieve information about the regulatory motif.
Pieces of information to retrieve:
- MitoRes entry accession number
- MitoRes entry product description
- UTRef accession number
- UTRef type
- UTRSite accession number
- UTRSite standard name
Involved databases: MitoRes, UTRef, UTRSite
Under the cover: SQL query

    SELECT DISTINCT
        "t1"."DESCRIPTION" AS "Description",
        "t2"."MITONUC_ID" AS "ID",
        "t3"."UTRDB_ID" AS "UTRef ID",
        "t3"."TYPE" AS "UTR type",
        "t4"."NAME" AS "Gene name",
        "t5"."UTRSITEID" AS "UTRSite ID",
        "t5"."STANDARDNAME" AS "Standard name"
    FROM "LIBI"."MITONUC_GENE" "t2"
        LEFT JOIN "LIBI"."MITONUC_GENE_PRODUCT" "t6" ON "t2"."GENE_ID" = "t6"."GENE_ID"
        RIGHT JOIN "DDQB"."MITONUC_PRODUCT_DDQB" "t1" ON "t6"."PRODUCT_ID" = "t1"."PRODUCT_ID"
        LEFT JOIN "LIBI"."MITONUC_GENE_MRNA" "t7" ON "t2"."GENE_ID" = "t7"."GENE_ID"
        RIGHT JOIN "DDQB"."MITONUC_MRNA_DDQB" "t8" ON "t7"."MRNA_ID" = "t8"."MRNA_ID"
        LEFT JOIN "DDQB"."MITONUC_UTR_VIEW" "t3" ON "t8"."MRNA_ID" = "t3"."MRNA_ID"
        LEFT JOIN "LIBI"."UTREF_UTR" "t9" ON "t3"."UTRDB_ID" = "t9"."ACCESSION"
        LEFT JOIN "LIBI"."UTREF_SIGNAL" "t10" ON "t9"."ACCESSION" = "t10"."ACCESSION"
        LEFT JOIN "DDQB"."UTRSITE_UTRSITE_DDQB" "t5" ON "t10"."UTRSITEID" = "t5"."UTRSITEID"
        LEFT JOIN "LIBI"."MITONUC_GENE_GENE_NAME" "t11" ON "t2"."GENE_ID" = "t11"."GENE_ID"
        RIGHT JOIN "LIBI"."MITONUC_GENE_NAME" "t4" ON "t11"."GENE_NAME_ID" = "t4"."GENE_NAME_ID"
    WHERE UPPER("t1"."DESCRIPTION") LIKE '%TOPOISOMERASE I%'
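The shape of this query (a chain of outer joins from product to UTR to regulatory motif, filtered by a case-insensitive substring match) can be tried on toy data with Python's sqlite3 module. The three tables, their columns and their rows below are simplified assumptions for illustration, not the LIBI schema, and only LEFT JOINs are used to keep the sketch portable.

```python
import sqlite3

# Simplified stand-in for the tutorial query: product -> UTR -> motif.
QUERY = """
SELECT DISTINCT p.description, u.utrdb_id, u.type, s.standard_name
FROM product AS p
LEFT JOIN utr AS u ON u.product_id = p.product_id
LEFT JOIN utr_signal AS s ON s.utrdb_id = u.utrdb_id
WHERE UPPER(p.description) LIKE ?
ORDER BY p.description
"""


def build_toy_db():
    """Create an in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE utr (utrdb_id TEXT PRIMARY KEY, product_id INTEGER, type TEXT);
        CREATE TABLE utr_signal (utrdb_id TEXT, standard_name TEXT);
        INSERT INTO product VALUES (1, 'DNA topoisomerase I');
        INSERT INTO product VALUES (2, 'DNA topoisomerase II alpha');
        INSERT INTO product VALUES (3, 'Cytochrome c oxidase');
        INSERT INTO utr VALUES ('U1', 1, '3''UTR');
        INSERT INTO utr VALUES ('U2', 2, '5''UTR');
        INSERT INTO utr_signal VALUES ('U1', 'Hypothetical motif A');
        """
    )
    return conn


def search_products(conn, pattern):
    """Run the simplified query with an upper-case LIKE pattern."""
    return conn.execute(QUERY, (pattern,)).fetchall()
```

On this toy data the pattern '%TOPOISOMERASE I%' also returns the "topoisomerase II" product, because the substring match does not stop at the end of the name; the left joins keep that row even though it has no motif. A tighter pattern such as '%TOPOISOMERASE I' only helps when the description ends with the enzyme name, so the right filter depends on how descriptions are actually written in the source database.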

Modeling a data abstraction in bioinformatics: DDQB
IBM Data Discovery and Query Builder (DDQB) is a search tool with a graphical interface that enables users with various levels of expertise to configure queries easily and draw on the full range of available information. With DDQB, researchers can query the federated DB not in terms of its physical fields but in terms of more abstract entities arranged into taxonomies developed specifically for LIBI users, so that queries become tasks closer to their research subjects than to informatics activities.
For the DDQB tutorial: http://213.26.249.183/
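The idea of querying by abstract entities rather than physical fields can be sketched as a small lookup from user-facing labels to physical columns. This is not how DDQB is implemented internally, which the slides do not describe; the mapping structure and the build_select_list helper are assumptions for illustration, while the column names and labels are the ones that appear in the tutorial query.

```python
# Hypothetical mapping from abstract, user-facing labels to physical columns.
# Column names come from the tutorial query; the mapping itself is an
# assumption made for this sketch.
ABSTRACT_FIELDS = {
    "Description": '"DDQB"."MITONUC_PRODUCT_DDQB"."DESCRIPTION"',
    "UTR type": '"DDQB"."MITONUC_UTR_VIEW"."TYPE"',
    "Gene name": '"LIBI"."MITONUC_GENE_NAME"."NAME"',
    "Standard name": '"DDQB"."UTRSITE_UTRSITE_DDQB"."STANDARDNAME"',
}


def build_select_list(labels):
    """Translate abstract labels into a SQL SELECT list with readable aliases."""
    parts = []
    for label in labels:
        if label not in ABSTRACT_FIELDS:
            raise KeyError(f"unknown abstract field: {label}")
        # The physical column is hidden behind the label the user picked.
        parts.append(f'{ABSTRACT_FIELDS[label]} AS "{label}"')
    return "SELECT " + ", ".join(parts)
```

A user who picks "Gene name" and "UTR type" never has to know which schema or table holds them; a complete query builder would also have to derive the FROM and JOIN clauses, which this sketch leaves out.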