1
The LIBI Federated database
“GRID Computing” Tutorial 17 September 2008
2
Agenda
- Bioinformatic data integration issues in grid environments
- Data federation in the LIBI platform
- Tutorial goals:
  - A simple case study for querying the federated DB
  - Designing a data abstraction model for bioinformatic information: the DDQB application
3
Issues Concerning Data Integration
In the bioinformatics domain, a growing number of grid applications manage data at very large scales of both size and distribution. The complexity of data management on a grid arises from the scale, dynamism, autonomy, heterogeneity and distribution of the data sources.

Goal: provide an IT layer that lets grid applications access data without having to deal with the issues listed above (large-scale distribution, dynamism, heterogeneity, etc.).

Viable approaches:
- Data federation: data are logically integrated
- Data warehouse: data are physically integrated
4
Data Federation vs Data Warehouse
Data warehouse (DW)
- Data warehousing, that is, 'cleaning up' data and placing them in a centralized repository, works well where data are relatively static and data types are not too different.
- Moving data into a warehouse can limit the specialized search capabilities available through the original data source.
- Building and maintaining an enterprise-wide warehouse on the scale required by most large research organizations, with hundreds of data sources, can be both costly and risky.
- Warehouse centralization clashes with the basic grid concepts of data replication and distribution according to monitored statistics.

Data federation (DF)
- Data federation allows current data to be accessed from multiple, heterogeneous, geographically dispersed data sources simultaneously, with a single query.

For bioinformatic problems, data federation seems the most promising solution.
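The "logical integration" idea can be illustrated with a toy analogue. The sketch below is not how the LIBI platform works: it uses SQLite's ATTACH DATABASE as a stand-in for a federation layer, and the file names, table names and rows (genes.db, utrs.db, gene, utr, the U000x accessions) are all made up for illustration. It only shows that two physically separate databases can be joined in one SQL statement without copying the data into a central store.

```python
import os
import sqlite3
import tempfile


def make_source(path, ddl, insert_sql, rows):
    """Create one stand-alone SQLite file that plays the role of a data source."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def federated_query(gene_path, utr_path):
    """Join two physically separate databases with a single SQL statement."""
    hub = sqlite3.connect(":memory:")  # the hub itself stores no data
    try:
        hub.execute("ATTACH DATABASE ? AS genes", (gene_path,))
        hub.execute("ATTACH DATABASE ? AS utrs", (utr_path,))
        return hub.execute(
            """
            SELECT g.name, u.accession, u.utr_type
            FROM genes.gene AS g
            JOIN utrs.utr AS u ON u.gene_id = g.gene_id
            ORDER BY g.name, u.accession
            """
        ).fetchall()
    finally:
        hub.close()


def run_demo():
    """Build two hypothetical sources in a temp directory and query them together."""
    with tempfile.TemporaryDirectory() as workdir:
        gene_path = os.path.join(workdir, "genes.db")
        utr_path = os.path.join(workdir, "utrs.db")
        make_source(
            gene_path,
            "CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT)",
            "INSERT INTO gene VALUES (?, ?)",
            [(1, "TOP1"), (2, "TOP2A")],
        )
        make_source(
            utr_path,
            "CREATE TABLE utr (accession TEXT PRIMARY KEY, gene_id INTEGER, utr_type TEXT)",
            "INSERT INTO utr VALUES (?, ?, ?)",
            [("U0001", 1, "5'UTR"), ("U0002", 1, "3'UTR"), ("U0003", 2, "3'UTR")],
        )
        return federated_query(gene_path, utr_path)


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

In a warehouse design, the two files would first be cleaned and loaded into one repository; here each source stays where it is and only the query spans both.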
5
Data Federation Layer LIBI Federated Database
Name: LIBI Federated Database
Institution: IBM Innovation Lab – Bari
Service content: Relational DB federating MitoRes, UTRef, UTRSite, Pubmed, GenBank, OMIM, Uniprot, HmtDB, EMBL_CDS
Description: This database federates local and remote resources to provide a uniform, standard interface to access data. The federated DB consists of a rationalized composition of ITB-owned DBs (MitoRes, UTRef, UTRSite) stored locally, the HmtDB resource owned by the Biology Dept. of Bari University, and public NCBI DBs accessed remotely (Pubmed, GenBank, OMIM). The Uniprot DB, replicated locally, has been federated as well. Federation provides inter-database relationship features that let users discover and extract relevant information with a single query, even when it originates from different DBs. Besides this advantage for the LIBI platform and its end users, the federated DB gives developers and service consumers a homogeneous, standard interface (the SQL language) to data stored locally and/or remotely.
Availability: For LIBI users and services
Access: DB2 DRDA service available at
Main components: IBM DB2, WebSphere Federator Server (with relational & non-relational wrappers)
Custom components: EMBL/FASTA wrapper
Average user access: Tens of concurrent users
DBMS: IBM DB2 9.1
Query language: SQL
Local DBs: MitoRes, UTRef, UTRSite, Uniprot, EMBL_CDS
Remote DBs: Pubmed, GenBank, OMIM, HmtDB
6
Accessing Federated Data: WebSphere Federator Server and related activities
WebSphere Federator Server enables federated databases to access heterogeneous and distributed (local and remote) data sources. It provides a unified data-management interface for query and insert: the SQL language.

Other features of the federated DB:
- User-defined DB functions have been built for bioinformatics-specific data handling and analysis.
- A graphical web interface to the federated DB information (DDQB) has been made available.
- New interfaces are being developed to expose federated DB capabilities as Web Services.

Collaborations:
- SPACI developed a DB2 wrapper for GRelC, to access the federated DB from grid environments.
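The slides do not say which user-defined functions were built, so the following is only a sketch of the general idea: a domain-specific function registered with the database engine and then called from plain SQL. It uses Python's sqlite3 `create_function` rather than DB2, and the `gc_content` function, the `utr` table and its rows are hypothetical.

```python
import sqlite3


def gc_content(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    if not sequence:
        return None  # NULL or empty sequence: no meaningful value
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def open_demo_db():
    """Create an in-memory DB with a hypothetical UTR table and the UDF registered."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name gc_content (1 argument).
    conn.create_function("gc_content", 1, gc_content)
    conn.execute("CREATE TABLE utr (accession TEXT PRIMARY KEY, sequence TEXT)")
    conn.executemany(
        "INSERT INTO utr VALUES (?, ?)",
        [("U0001", "GGCC"), ("U0002", "ATGC"), ("U0003", "ATAT")],
    )
    return conn


def gc_rich(conn, threshold):
    """Return accessions whose sequence has GC content >= threshold."""
    cursor = conn.execute(
        "SELECT accession FROM utr WHERE gc_content(sequence) >= ? ORDER BY accession",
        (threshold,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    db = open_demo_db()
    print(gc_rich(db, 0.5))
    db.close()
```

The point of this design is that the analysis runs inside the query, so clients that only speak SQL can use it without extra tooling.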
7
Data sources integrated in the Federated LIBI Schema
DB name        Data source location, connection/type
GenBank        Web Services
PubMed         Web Services
OMIM           Web Services
Species 2000   Web Services
HmtDB          DB2
MitoRes        MySQL
UTRef          MySQL
UTRSite        MySQL
Uniprot

Legend: relational database, flat file, web services
8
Overall Federated DB Schema
[Schema diagram: UTRef, MitoRes, UTRSite, GenBank, OMIM, PUBMED, HmtDB, Uniprot, EMBL_CDS]
9
TUTORIAL: a simple case study involving federated DB (1/4)
Need: studying the regulation of the expression of topoisomerase I at the mRNA level.

A biologist needs to retrieve information about the regulation of expression of topoisomerase I. To this end she decides to investigate whether there are UTR sequences responsible for regulating the expression of the corresponding gene, and to retrieve information about the regulatory motif.

Pieces of information to retrieve:
- MitoRes entry accession number
- MitoRes entry product description
- MitoRes gene name
- UTRef accession number
- UTRef UTR type
- UTRSite accession number
- UTRSite standard name

Involved databases: MitoRes, UTRef, UTRSite
10
TUTORIAL: a simple case study involving federated DB (2/4)
[Schema diagram: UTRef, MitoRes, UTRSite, GenBank, OMIM, PUBMED, HmtDB, Uniprot, EMBL_CDS]
11
TUTORIAL: a simple case study involving federated DB (3/4)
[Schema diagram: UTRef, MitoRes, UTRSite]
12
TUTORIAL: a simple case study involving federated DB (4/4)
Need: studying the regulation of the expression of topoisomerase I at the mRNA level.

A biologist needs to retrieve information about the regulation of expression of topoisomerase I. To this end she decides to investigate whether there are UTR sequences responsible for regulating the expression of the corresponding gene, and to retrieve information about the regulatory motif.

Pieces of information to retrieve:
- MitoRes entry accession number
- MitoRes entry product description
- UTRef accession number
- UTRef type
- UTRSite accession number
- UTRSite standard name

Involved databases: MitoRes, UTRef, UTRSite

Under the cover, the SQL query:

    SELECT DISTINCT
        "t1"."DESCRIPTION"  AS "Description",
        "t2"."MITONUC_ID"   AS "ID",
        "t3"."UTRDB_ID"     AS "UTRef ID",
        "t3"."TYPE"         AS "UTR type",
        "t4"."NAME"         AS "Gene name",
        "t5"."UTRSITEID"    AS "UTRSite ID",
        "t5"."STANDARDNAME" AS "Standard name"
    FROM "LIBI"."MITONUC_GENE" "t2"
    LEFT JOIN "LIBI"."MITONUC_GENE_PRODUCT" "t6"
        ON "t2"."GENE_ID" = "t6"."GENE_ID"
    RIGHT JOIN "DDQB"."MITONUC_PRODUCT_DDQB" "t1"
        ON "t6"."PRODUCT_ID" = "t1"."PRODUCT_ID"
    LEFT JOIN "LIBI"."MITONUC_GENE_MRNA" "t7"
        ON "t2"."GENE_ID" = "t7"."GENE_ID"
    RIGHT JOIN "DDQB"."MITONUC_MRNA_DDQB" "t8"
        ON "t7"."MRNA_ID" = "t8"."MRNA_ID"
    LEFT JOIN "DDQB"."MITONUC_UTR_VIEW" "t3"
        ON "t8"."MRNA_ID" = "t3"."MRNA_ID"
    LEFT JOIN "LIBI"."UTREF_UTR" "t9"
        ON "t3"."UTRDB_ID" = "t9"."ACCESSION"
    LEFT JOIN "LIBI"."UTREF_SIGNAL" "t10"
        ON "t9"."ACCESSION" = "t10"."ACCESSION"
    LEFT JOIN "DDQB"."UTRSITE_UTRSITE_DDQB" "t5"
        ON "t10"."UTRSITEID" = "t5"."UTRSITEID"
    LEFT JOIN "LIBI"."MITONUC_GENE_GENE_NAME" "t11"
        ON "t2"."GENE_ID" = "t11"."GENE_ID"
    RIGHT JOIN "LIBI"."MITONUC_GENE_NAME" "t4"
        ON "t11"."GENE_NAME_ID" = "t4"."GENE_NAME_ID"
    WHERE UPPER("t1"."DESCRIPTION") LIKE '%TOPOISOMERASE I%'
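To make the shape of this query easier to follow, here is a much-reduced sketch of the same chain (product description, then UTR, then regulatory motif) on toy data. It is an assumption-laden simplification, not the LIBI schema: the tables `product`, `utr`, `utr_signal` and `utrsite`, their columns, and all rows are hypothetical, and SQLite stands in for DB2. It also shows one behaviour of the filter worth knowing: the pattern '%TOPOISOMERASE I%' matches any description containing that substring, including "topoisomerase II".

```python
import sqlite3

# Simplified analogue of the slide's query: product -> UTR -> signal -> motif.
QUERY = """
SELECT DISTINCT p.description, p.gene_name, u.accession, u.utr_type,
       s.utrsite_id, s.standard_name
FROM product AS p
LEFT JOIN utr AS u ON u.product_id = p.product_id
LEFT JOIN utr_signal AS us ON us.accession = u.accession
LEFT JOIN utrsite AS s ON s.utrsite_id = us.utrsite_id
WHERE UPPER(p.description) LIKE ?
ORDER BY p.gene_name, u.accession
"""


def build_demo_db():
    """Create an in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, gene_name TEXT, description TEXT);
        CREATE TABLE utr (accession TEXT PRIMARY KEY, product_id INTEGER, utr_type TEXT);
        CREATE TABLE utr_signal (accession TEXT, utrsite_id TEXT);
        CREATE TABLE utrsite (utrsite_id TEXT PRIMARY KEY, standard_name TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [
            (1, "TOP1", "DNA topoisomerase I"),
            (2, "TOP2A", "DNA topoisomerase II alpha"),
            (3, "ACTB", "Actin, cytoplasmic 1"),
        ],
    )
    conn.executemany(
        "INSERT INTO utr VALUES (?, ?, ?)",
        [("U0001", 1, "3'UTR"), ("U0002", 2, "5'UTR")],
    )
    conn.execute("INSERT INTO utr_signal VALUES ('U0001', 'S0001')")
    conn.execute("INSERT INTO utrsite VALUES ('S0001', 'Example motif')")
    return conn


def find_utr_motifs(conn, pattern):
    """Run the simplified query with the given upper-case LIKE pattern."""
    return conn.execute(QUERY, (pattern,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in find_utr_motifs(db, "%TOPOISOMERASE I%"):
        print(row)
    db.close()
```

The LEFT JOINs keep a product in the result even when no UTR or motif is linked to it, in which case those columns come back as NULL. Whether the broader substring match is wanted depends on the biologist's intent; if only topoisomerase I is of interest, the result set would need a stricter filter or a manual check.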
13
Modeling a data abstraction in bioinformatics: DDQB
IBM Data Discovery and Query Builder (DDQB) is a search tool with a graphical interface that enables users with various levels of expertise to configure queries and draw on the full range of information assets. With DDQB, researchers can query the federated DB not in terms of its physical fields but in terms of more abstract entities, arranged into taxonomies developed specifically for LIBI users, so that queries become tasks closer to their research subjects than to informatics activities.

For the DDQB tutorial: