Data Access & Integration in the ISPIDER Proteomics Grid N. Martin – A. Poulovassilis – L. Zamboulis

Slides:



Advertisements
Similar presentations
DIMNet Workshop 7 & 8/10/2002 AutoMed: Automatic generation of Mediator tools for heterogeneous database integration Alex Poulovassilis (Birkbeck College)
Advertisements

Using AutoMed Metadata in Data Warehousing Environments Hao FanAlexandra Poulovassilis School of Computer Science & Information Systems Birkbeck college,
19 January 2007 Data Quality Meeting Alex Poulovassilis.
Intelligent Technologies Module: Ontologies and their use in Information Systems Part II Alex Poulovassilis November/December 2009.
October 2007 Data integration architectures and methodologies for the Life Sciences Alexandra Poulovassilis, Birkbeck, U. of London.
December 2009 Data Integration in Grid Environments Alex Poulovassilis, Birkbeck, U. of London.
SeLeNe Kick-off Meeting 15-16/11/2002 SeLeNe-related Research At Birkbeck Alex Poulovassilis and Peter T.Wood Database and Web Technologies Group School.
OMII-UK Steven Newhouse, Director. © 2 OMII-UK aims to provide software and support to enable a sustained future for the UK e-Science community and its.
Schema Matching and Query Rewriting in Ontology-based Data Integration Zdeňka Linková ICS AS CR Advisor: Július Štuller.
CVRG Presenter Disclosure Information Tahsin Kurc, PhD Center for Comprehensive Informatics Emory University CardioVascular Research Grid Core Infrastructure.
Data Access & Integration in the ISPIDER Proteomics Grid L. Zamboulis, H. Fan, K. Bellhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard,
BiodiversityWorld GRID Workshop NeSC, Edinburgh – 30 June and 1 July 2005 Resource wrappers, web services, grid services Jaspreet Singh School of Computer.
 Goals Unambiguous description of how the investigation was performed Consistent annotation, powerful queries and data integration  Details NOT model.
Slides thanks to Steve Lynden Amy Krause EPCC Distributed Query Processing with OGSA-DQP Principles and Architectures for Structured Data Integration:
BYU 2003BYU Data Extraction Group Combining the Best of Global-as-View and Local-as-View for Data Integration Li Xu Brigham Young University Funded by.
1 An Introduction to OGSA-DAI Konstantinos Karasavvas 13 th September 2005.
State of the Nation in Data Integration for Bioinformatics Carole Goble and Robert Stevens Presented by: Daya Wimalasuriya.
The my Grid project aims to provide middleware layers that make the Information Grid appropriate for the needs of bioinformatics. my Grid is building high.
14-18 March 2004 EDBT'04 : Service-Based Distributed Query Processing for the Grid (M N Alpdemir) 1 Title, places, people, funding, projects Manchester.
1 Overview of Database Federation and IBM Garlic Project Presented by Xiaofen He.
Provenance in my Grid Jun Zhao School of Computer Science The University of Manchester, U.K. 21 October, 2004.
Scientific Workflows Scientific workflows describe structured activities arising in scientific problem-solving. Conducting experiments involve complex.
The Functional Genomics Experiment Model (FuGE) Andy Jones School of Computer Science and Faculty of Life Sciences, University of Manchester.
OGSA Test Grid Dave Berry, Research Manager NeSC Review, 18 th March 2003.
Taverna and my Grid Basic overview and Introduction Tom Oinn
Designing, Executing, Reusing and Sharing Workflows: Taverna and myExperiment Supporting the in silico Experiment Life Cycle Katy Wolstencroft Paul Fisher.
Introduction to MDA (Model Driven Architecture) CYT.
Proteome data integration characteristics and challenges K. Belhajjame 1, R. Cote 4, S.M. Embury 1, H. Fan 2, C. Goble 1, H. Hermjakob, S.J. Hubbard 1,
Taverna and my Grid Open Workflow for Life Sciences Tom Oinn
MyGrid: Personalised e-Biology on the Grid Professor Carole Goble Contact e-Science.
MyGrid: Personalised e-Biology on the Grid Professor Carole Goble Contact
Information System Development Courses Figure: ISD Course Structure.
Data Integration by Bi-Directional Schema Transformation Rules Data Integration by Bi-Directional Schema Transformation Rules By Peter McBrien and Alexandria.
Taverna Workflows for Systems Biology Katy Wolstencroft School of Computer Science University of Manchester.
OGSA-DAI-RDF & Its Ontology Interfaces Isao Kojima and Masahiro Kimoto Data Grid Team, Grid Technology Research Center
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
Interoperability & Knowledge Sharing Advisor: Dr. Sudha Ram Dr. Jinsoo Park Kangsuk Kim (former MS Student) Yousub Hwang (Ph.D. Student)
Capture, integration, and sharing of functional genomic data Steve Oliver Professor of Genomics School of Biological Sciences University of Manchester.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
Data access and integration with OGSA-DAI: OGSA-DQP Steven Lynden University of Manchester.
A Systemic Approach for Effective Semantic Access to Cultural Content Ilianna Kollia, Vassilis Tzouvaras, Nasos Drosopoulos and George Stamou Presenter:
Technology behind using Taverna in caGrid caGrid user meeting Stian Soiland-Reyes, myGrid University of Manchester, UK
Moby Web Services Iván Párraga García MSc on Bioinformatics for Health Sciences May 2006.
Aberdeen, 28/1/2003 AutoMed: Automatic generation of Mediator tools for heterogeneous data integration Alex Poulovassilis School of Computer Science and.
Metadata Mòrag Burgon-Lyon University of Glasgow.
1.Registration block send request of registration to super peer via PRP. Process re-registration will be done at specific period to info availability of.
OGSA-DQP:Service-Based Distributed Query Processing on the Grid M.Nedim Alpdemir Department of Computer Science University of Manchester.
LeGE WS 16 th December 2002 SeLeNe : Self e-Learning Networks Alex Poulovassilis, Birkbeck, Univ. of London One-year Accompanying Measure for IST V.1.9.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
RUS: Resource Usage Service Steven Newhouse James Magowan
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
Visit to HP Labs, 22/10/2002 Heterogeneous information integration Alex Poulovassilis Database and Web Technologies Group School of Computer Science and.
Data Integration in Bioinformatics Using OGSA-DAI The BioDA Project Shirley Crompton, Brian Matthews (CCLRC) Alex Gray, Andrew Jones, Richard White (Cardiff.
Neil Chue Hong Project Manager, EPCC OGSA-DAI Requirements Gathering Exercise 2 nd DIALOGUE workshop eSI, 9-10.
Using DAML+OIL Ontologies for Service Discovery in myGrid Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood
© Geodise Project, University of Southampton, Integrating Data Management into Engineering Applications Zhuoan Jiao, Jasmin.
EnVisioning Data Integration SME forum 2009, Vienna Henning Hermjakob Henning Hermjakob
Example projects using metadata and thesauri: the Biodiversity World Project Richard White Cardiff University, UK
Portals and my Grid Stefan Rennick Egglestone Mixed Reality Laboratory University of Nottingham.
OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction OGSA-DQP is a service based distributed.
OGSA-DAI 简介及其它在 China-VO DAS 系统中的应用 杨阳 中国虚拟天文台研发团队 Chinese Virtual Observatory.
MyGrid: Personalised Bioinformatics on the Information Grid Robert Stevens, Alan Robinson & Carole Goble University of Manchester & EBI, UK myGrid project.
A Grid Data Integration Service (OGSA-DQP) Paul Watson, University of Newcastle-upon-Tyne based on the work of… Norman Paton, Tasos Gounaris,
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
Database Access and Integration Services Working Group
1st International Conference on Semantics, Knowledge and Grid
Grid Based Data Integration with Automatic Wrapper Generation
Scientific Workflows Lecture 15
Presentation transcript:

Data Access & Integration in the ISPIDER Proteomics Grid N. Martin – A. Poulovassilis – L. Zamboulis

Project Details Members Birkbeck College European Bioinformatics Institute University of Manchester University College London

Problem Definition Biological repositories In separate locations: interoperability problems Rapidly updated/modified/evolved Overlapping data Need processing power

ISPIDER Objectives

Middleware (1/2) my Grid: collection of services/components allowing high-level integration of data/applications Taverna Workbench AutoMed heterogeneous data integration system OGSA-DAI OGSA-DQP

Middleware (2/2) AutoMed toolkit: heterogeneous data integration system - developed by Birkbeck College/Imperial College Subsumes traditional data integration approaches Handles various data models – easily extensible Virtual/materialised/ hybrid integration Data warehousing tools Schema evolution

GAV & LAV Approaches Global-As-View (GAV) approach: describe GS constructs with view definitions over LS i constructs Local-As-View (LAV) approach: describe LS i constructs with view definitions over GS constructs

GAV Example student(id,name,left,degree) = [{x,y,z,w} |  x,y,z,w,_  ug   x,_,_,_,_  phd   x,y,z,w,_  phd  w = ‘phd’] monitors(sno,id) = [{x,y} |  x,_,_,_,y  ug   x,_,_,_,_  phd   x,y  supervises ] staff(sno,sname,dept) = [{x,y,z} |  x,y,z,w,_  tutor   x,_,_  supervisor   x,y,z  supervisor ]

Both-As-View (BAV) Schema transformation approach For each pair (LS i,GS): incrementally modify LS i /GS to match GS/LS i

BAV Example Transformation pathway consists of primitive transformations Pathway contains both GAV & LAV definitions Transformations are automatically reversible Metadata in AutoMed Repository

Schema Evolution Example No need to redefine pathway Instead, simply describe the evolution as a pathway Automatic in most cases

Interoperability Sources wrapped with OGSA-DAI AutoMed wrappers extract source metadata Integration using AutoMed Queries submitted: Reformulated using AutoMed metadata Submitted to OGSA- DQP

Schema extraction AutoMed wrapper requests the schema of the data source using an OGSA-DAI service The service replies with the source schema encoded in XML The AutoMed wrapper creates the corresponding schema in the AutoMed repository

Query Processing Query is submitted to AutoMed’s GQP: Reformulated Optimised AutoMed-DQP Wrapper: IQL  OQL Submits OQL to OGSA- DQP OQL result  IQL result

Query Processing OGSA-DQP: Evaluates OQL query using QES Sends OQL result back to AutoMed DQP Wrapper: OQL result  IQL result

Future Work Workflow integration AutoMed toolkit Taverna Workbench Integration of services with XML input/output LAV Mappings from XML to one or more RDFS ontologies

Future Work AutoMed extensions: Web/Grid Services for AutoMed Data warehousing Materialised/hybrid integration Data provenance Incremental view maintenance Schema evolution

Summary ISPIDER aims to: Create an integrated platform of proteomic resources Use existing resources – produce new ones Create clients for querying, visualisation, etc. ISPIDER is using: my Grid – middleware for in silico experiments in biology OGSA-DQP – service-based distributed query processor AutoMed – heterogeneous data integration system

Project Members Birkbeck College Nigel Martin Alex Poulovassilis Lucas Zamboulis (R.A.) Hao Fan (former R.A.) European Bioinformatics Institute Rolf Apweiler Henning Hermjakob Weimin Zhu Chris Taylor Phil Jones Nisha Vinod University of Manchester Simon Hubbard Steve Oliver Suzanne Embury Norman Paton Carol Goble Robert Stevens Khalid Belhajjame (R.A.) Jennifer Siepen (R.A.) U.C.L. David Jones Christine Orengo Melissa Pentony (R.A.)