Data Access & Integration in the ISPIDER Proteomics Grid L. Zamboulis, H. Fan, K. Bellhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard,

Slides:



Advertisements
Similar presentations
DIMNet Workshop 7 & 8/10/2002 AutoMed: Automatic generation of Mediator tools for heterogeneous database integration Alex Poulovassilis (Birkbeck College)
Advertisements

Using AutoMed Metadata in Data Warehousing Environments Hao FanAlexandra Poulovassilis School of Computer Science & Information Systems Birkbeck college,
Intelligent Technologies Module: Ontologies and their use in Information Systems Revision lecture Alex Poulovassilis November/December 2009.
19 January 2007 Data Quality Meeting Alex Poulovassilis.
October 2007 Data integration architectures and methodologies for the Life Sciences Alexandra Poulovassilis, Birkbeck, U. of London.
December 2009 Data Integration in Grid Environments Alex Poulovassilis, Birkbeck, U. of London.
SeLeNe Kick-off Meeting 15-16/11/2002 SeLeNe-related Research At Birkbeck Alex Poulovassilis and Peter T.Wood Database and Web Technologies Group School.
OMII-UK Steven Newhouse, Director. © 2 OMII-UK aims to provide software and support to enable a sustained future for the UK e-Science community and its.
Intersection Schemas as a Dataspace Integration Technique 8/21/20141 Richard BrownlowAlex Poulovassilis.
An Overview of OGSA-DAI Kostas Tourlas
Information Integration Using Logical Views Jeffrey D. Ullman.
Data Access & Integration in the ISPIDER Proteomics Grid N. Martin – A. Poulovassilis – L. Zamboulis
BiodiversityWorld GRID Workshop NeSC, Edinburgh – 30 June and 1 July 2005 Resource wrappers, web services, grid services Jaspreet Singh School of Computer.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
Slides thanks to Steve Lynden Amy Krause EPCC Distributed Query Processing with OGSA-DQP Principles and Architectures for Structured Data Integration:
1 Draft of a Matchmaking Service Chuang liu. 2 Matchmaking Service Matchmaking Service is a service to help service providers to advertising their service.
1 An Introduction to OGSA-DAI Konstantinos Karasavvas 13 th September 2005.
2005Integration-intro1 Data Integration Systems overview The architecture of a data integration system:  Components and their interaction  Tasks  Concepts.
State of the Nation in Data Integration for Bioinformatics Carole Goble and Robert Stevens Presented by: Daya Wimalasuriya.
14-18 March 2004 EDBT'04 : Service-Based Distributed Query Processing for the Grid (M N Alpdemir) 1 Title, places, people, funding, projects Manchester.
Accessing Biodiversity Resources in Computational Environments from Workflow Application J. S. Pahwa, R. J. White, A. C. Jones, M. Burgess, W. A. Gray,
1 Overview of Database Federation and IBM Garlic Project Presented by Xiaofen He.
Managing Information Quality in e-Science using Semantic Web technology Alun Preece, Binling Jin, Edoardo Pignotti Department of Computing Science, University.
The Functional Genomics Experiment Model (FuGE) Andy Jones School of Computer Science and Faculty of Life Sciences, University of Manchester.
The Integration of Peer-to-peer and the Grid to Support Scientific Collaboration Tran Vu Pham, Lydia MS Lau & Peter M Dew {tranp, llau &
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
WSRF Supported Data Access Service (VO-DAS)‏ Chao Liu, Haijun Tian, Dan Gao, Yang Yang, Yong Lu China-VO National Astronomical Observatories, CAS, China.
Introduction to MDA (Model Driven Architecture) CYT.
DAIS Grid1 Database Access and Integration Services on the Grid * * Authors: N. Paton, M. Atkinson, V.
Proteome data integration characteristics and challenges K. Belhajjame 1, R. Cote 4, S.M. Embury 1, H. Fan 2, C. Goble 1, H. Hermjakob, S.J. Hubbard 1,
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Data Integration by Bi-Directional Schema Transformation Rules Data Integration by Bi-Directional Schema Transformation Rules By Peter McBrien and Alexandria.
1 XML Based Networking Method for Connecting Distributed Anthropometric Databases 24 October 2006 Huaining Cheng Dr. Kathleen M. Robinette Human Effectiveness.
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
OGSA-DAI in OMII-Europe Neil Chue Hong EPCC, University of Edinburgh.
1 1 EPCC 2 Curtin Business School & Edinburgh University Management School Michael J. Jackson 1 Ashley D. Lloyd 2 Terence M. Sloan 1 Enabling Access to.
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
Grids - the near future Mark Hayes NIEeS Summer School 2003.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
OGSA-DAI.
Data access and integration with OGSA-DAI: OGSA-DQP Steven Lynden University of Manchester.
A Systemic Approach for Effective Semantic Access to Cultural Content Ilianna Kollia, Vassilis Tzouvaras, Nasos Drosopoulos and George Stamou Presenter:
Technology behind using Taverna in caGrid caGrid user meeting Stian Soiland-Reyes, myGrid University of Manchester, UK
Aberdeen, 28/1/2003 AutoMed: Automatic generation of Mediator tools for heterogeneous data integration Alex Poulovassilis School of Computer Science and.
Scaling Heterogeneous Databases and Design of DISCO Anthony Tomasic Louiqa Raschid Patrick Valduriez Presented by: Nazia Khatir Texas A&M University.
Metadata Mòrag Burgon-Lyon University of Glasgow.
GBIF Data Access and Database Interoperability 2003 Work Programme Overview Donald Hobern, GBIF Programme Officer for Data Access and Database Interoperability.
The Global Land Cover Facility is sponsored by NASA and the University of Maryland.The GLCF is a founding member of the Federation of Earth Science Information.
Information Integration BIRN supports integration across complex data sources – Can process wide variety of structured & semi-structured sources (DBMS,
User Profiling using Semantic Web Group members: Ashwin Somaiah Asha Stephen Charlie Sudharshan Reddy.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
CIS 210 Systems Analysis and Development Week 8 Part II Designing Distributed and Internet Systems,
Issues in Ontology-based Information integration By Zhan Cui, Dean Jones and Paul O’Brien.
Visit to HP Labs, 22/10/2002 Heterogeneous information integration Alex Poulovassilis Database and Web Technologies Group School of Computer Science and.
Data Integration in Bioinformatics Using OGSA-DAI The BioDA Project Shirley Crompton, Brian Matthews (CCLRC) Alex Gray, Andrew Jones, Richard White (Cardiff.
Neil Chue Hong Project Manager, EPCC OGSA-DAI Requirements Gathering Exercise 2 nd DIALOGUE workshop eSI, 9-10.
© Geodise Project, University of Southampton, Integrating Data Management into Engineering Applications Zhuoan Jiao, Jasmin.
OGSA-DQP Steven Lynden University of Manchester. Data access & integration with OGSA-DAI: GGF 17 2 Introduction OGSA-DQP is a service based distributed.
OGSA-DAI 简介及其它在 China-VO DAS 系统中的应用 杨阳 中国虚拟天文台研发团队 Chinese Virtual Observatory.
ACGT Architecture and Grid Infrastructure Juliusz Pukacki ‏ EGEE Conference Budapest, 4 October 2007.
OGSA-DAI.
A Grid Data Integration Service (OGSA-DQP) Paul Watson, University of Newcastle-upon-Tyne based on the work of… Norman Paton, Tasos Gounaris,
System Software Laboratory Databases and the Grid by Paul Watson University of Newcastle Grid Computing: Making the Global Infrastructure a Reality June.
Norman Paton University of Manchester
Grid Metadata Management
Database Access and Integration Services Working Group
Grid Based Data Integration with Automatic Wrapper Generation
Supporting High-Performance Data Processing on Flat-Files
SDMX IT Tools SDMX Registry
Presentation transcript:

Data Access & Integration in the ISPIDER Proteomics Grid L. Zamboulis, H. Fan, K. Bellhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard, S. M. Embury, N. W. Paton

Overview The ISPIDER project Data Access & Integration of Proteomics Resources Challenges Middleware Proteomics resources & global schema System architecture & query processing Future Work

ISPIDER Project Goals: Build an integrated platform of proteomic resources Use existing resources – produce new ones Create clients for querying, visualisation, etc.

ISPIDER Objective: develop an integrated platform of proteome-related resources, using existing standards Benefits: Access to increased breadth of information More reliable analyses Integration brings added value

Challenges Proteomics repositories in disparate locations  need for distributed solution: common access, distributed query processing  need for integration: overlapping data, different representations Data/schemas constantly updated/evolve  need virtual or hybrid integration  need schema evolution support

Middleware (1/2) OGSA-DAI: middleware exposing data sources on Grids via web services open-source and extensible uniform access to relational & XML data sources supports a variety of operations, e.g. querying/updating, data transformation, data delivery OGSA-DQP: service-based distributed query processor supports querying of relational OGSA-DAI data sources offers implicit parallelism for data-intensive requests

Middleware (2/2) AutoMed: heterogeneous data transformation and integration system subsumes traditional data integration approaches handles various data models – easily extensible virtual/materialised/hybrid integration schema evolution data warehousing tools

Data Integration Approaches Global-As-View (GAV) approach: describe GS constructs with view definitions over LS i constructs Local-As-View (LAV) approach: describe LS i constructs with view definitions over GS constructs

Both-As-View (BAV) Approach Schema transformation approach For each pair (LS i,GS): incrementally modify LS i /GS to match GS/LS i

BAV Example Transformation pathway consists of primitive transformations Pathway contains both GAV & LAV definitions Transformations are automatically reversible Metadata in AutoMed Repository

Proteomics Resources PEDRo collection of descriptions of experimental data sets in proteomics has been used as a format for exchanging proteomics data gpmDB contains a large number of proteins and peptide identifications initially designed to assist in the validation of peptide MS/MS spectra and protein coverage patterns PepSeeker developed as part of the ISPIDER project comprehensive resource of peptide/protein identifications PRIDE centralised, standards compliant, public proteomics repository contains protein/peptide identifications + evidence supporting them

Global Schema Trade-off between: being able to answer specific user queries a full integration Properties: Based on PEDRo’s peptide/ protein identification section and … expanded with information unique in other resources Entities identified by LSIDs

System Architecture Sources wrapped with OGSA-DAI AutoMed toolkit wraps OGSA-DAI resources Integration of OGSA- DAI resources Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP

System Architecture Sources wrapped with OGSA-DAI AutoMed toolkit wraps OGSA-DAI resources Integration of OGSA- DAI resources Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP

System Architecture Sources wrapped with OGSA-DAI AutoMed toolkit wraps OGSA-DAI resources Integration of OGSA- DAI resources Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP

System Architecture Sources wrapped with OGSA-DAI AutoMed toolkit wraps OGSA-DAI resources Integration of OGSA- DAI resources Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP

System Architecture Sources wrapped with OGSA-DAI AutoMed toolkit wraps OGSA-DAI resources Integration of OGSA- DAI resources Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP

Query Processing Query is submitted to AutoMed’s GQP: Reformulated Optimised AutoMed-DQP Wrapper: IQL  OQL OGSA-DQP evaluates OQL queries OQL result  IQL result

Query Processing Query is submitted to AutoMed’s GQP: Reformulated Optimised AutoMed-DQP Wrapper: IQL  OQL OGSA-DQP evaluates OQL queries OQL result  IQL result

Summary Proteomics repositories in disparate locations  need for distributed solution  need for integration Data/schemas constantly updated/evolve  need virtual or hybrid integration  support schema evolution

Future Work Schema evolution Evaluation of AutoMed advantage Expose AutoMed functionality to the Grid AutoMed and Taverna integration

Future Work Taverna: tool for Web Service orchestration in workflows Related services may be incompatible Current solution involves writing custom code for every pair of WS Use AutoMed toolkit for semi- automatic integration of XML Web Services mappings from WS to ontologies automatic integration

ISPIDER Project Members Birkbeck College Nigel Martin Alex Poulovassilis Lucas Zamboulis (R.A.) Hao Fan (former R.A.) European Bioinformatics Institute Rolf Apweiler Henning Hermjakob Weimin Zhu Chris Taylor Phil Jones Nisha Vinod University of Manchester Simon Hubbard Steve Oliver Suzanne Embury Norman Paton Carol Goble Robert Stevens Khalid Belhajjame (R.A.) Jennifer Siepen (R.A.) U.C.L. David Jones Christine Orengo Melissa Pentony (R.A.)