Data Integration in Bioinformatics Using OGSA-DAI The BioDA Project Shirley Crompton, Brian Matthews (CCLRC) Alex Gray, Andrew Jones, Richard White (Cardiff.

Presentation transcript:

Data Integration in Bioinformatics Using OGSA-DAI The BioDA Project Shirley Crompton, Brian Matthews (CCLRC) Alex Gray, Andrew Jones, Richard White (Cardiff University)

Overview Bioinformatics Data Access and Integration Requirements –Generic BioDA Workshop and Questionnaire –BDWorld-specific OGSA-DAI exemplar

The BioDA Project
Independent Evaluation of OGSA-DAI
– the suitability of the software in its present form
– how to leverage OGSA-DAI in a bioinformatics GRID
OGSA-DAI Product Improvement
– Feedback to the DAIT Team
Knowledge Dissemination
– Evaluation Report
– Publications/Presentations
– Workshop on OGSA-DAI for the bioinformatics eResearch community

Bioinformatics
“The application and development of computing and mathematics to the management, analysis and understanding of data to solve biological questions.” Attwood, TK and Parry-Smith, DJ 1999
Data Management
Data Analysis

Grid Computing... “... flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources…” Foster, Kesselman and Tuecke, 2001

1st BioDA Workshop
Objectives
– to examine the bioinformatics community’s needs for data access and integration (DAI) on the grid, and
– to explore the application of OGSA-DAI, a middleware developed expressly to address the DAI requirements of eScience projects

The BioDA Survey

The Results
17 key requirements. The top of the list includes:
– schema integration
– schema mapping
– mixed language query
– complex join across databases
– provenance data
– flexible resource discovery
– RDF database access

The BioDA Exemplar: The BioDiversity World
Aim: to create a GRID-based problem solving environment that enables collaborative exploration and analysis of global biodiversity patterns, using workflows and rich data sources from around the world.
Example applications would be modelling species distributions against climate change, conservation prioritisation, and linking evolutionary changes to past climates.

BDWorld (Source: BDWorld)
[Figure: BDWorld diagram. Labelled components: User; Local tools; Problem Solving Environment user interface; Problem Solving Environment (broker agents, facilitator agents, presentation agents); BDGrid; Ontology (metadata, intelligent links, resource and analytic tool descriptions, maintenance tools); Taxonomic index (Species 2000 & ITIS Catalogue of Life); Thematic data source; Abiotic data source; GSD; Analytic tools; Proxies.]

BDWorld Data Resources: Key Issues
geographically distributed and autonomous
– heterogeneous in structure and data standards
– mainly read via HTTP/XML protocols using custom wrappers
SQL queries are limited to the EBI EMBL store and BDWorld cache databases
potentially resource-intensive to harvest
– a single taxon name may resolve into a large number of ‘accepted’ taxon names
– the same query is repeated on different data collections
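The harvesting cost described on this slide can be illustrated with a small sketch. It is not BDWorld code: the synonym table, the two in-memory collections and the function names are all hypothetical, chosen only to show how one submitted name multiplies into many queries once it resolves to several accepted names and each name is looked up in every collection.

```python
from itertools import product

# Hypothetical synonym table: one submitted name resolves to several accepted names.
ACCEPTED_NAMES = {
    "Name A": ["Accepted A1", "Accepted A2", "Accepted A3"],
}

# Hypothetical data collections; each one has to be queried separately.
COLLECTIONS = {
    "collection_1": {"Accepted A1": ["record 1"], "Accepted A3": ["record 2"]},
    "collection_2": {"Accepted A2": ["record 3"]},
}


def plan_queries(name):
    """Return every (collection, accepted name) pair that must be queried."""
    accepted = ACCEPTED_NAMES.get(name, [name])
    return list(product(COLLECTIONS, accepted))


def harvest(name):
    """Run each planned query and gather whatever records are found."""
    records = []
    for collection, accepted in plan_queries(name):
        records.extend(COLLECTIONS[collection].get(accepted, []))
    return records


if __name__ == "__main__":
    print(len(plan_queries("Name A")), "queries planned")
    print(harvest("Name A"))
```

With three accepted names and two collections, a single submitted name already produces six queries, which is why the slide flags harvesting as potentially resource-intensive.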

Resource Wrapping (Source: BDWorld)
[Figure: resource wrapping diagram. Labelled components: User; Workflow enactment engine; BDWorld-GRID Interface (BGI) with BGI API (shown twice); The GRID; Wrapper; Remote Resource.]

Implications for BioDA
Abstraction layer (BGI) → proprietary invocation mechanism
– InvokeOperation(ResourceHandler, Operation, XmlDataCollection)
Prepared search statements are defined in each individual data resource wrapper
BGI protocols → BDW communication objects
– search parameters and results are passed as XmlDataCollection
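The invocation style named on this slide can be sketched as follows. This is an illustration under assumptions, not the BGI API: only the three-argument shape InvokeOperation(ResourceHandler, Operation, XmlDataCollection) comes from the slide. The Python class names, the registry, the "demo-resource" handle and the "searchByName" operation are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class XmlDataCollection:
    """Stand-in for the BDW communication object: named string payloads."""
    items: Dict[str, str] = field(default_factory=dict)


class ResourceWrapper:
    """Hypothetical wrapper holding the prepared searches for one data resource."""

    def __init__(self, operations: Dict[str, Callable[[XmlDataCollection], XmlDataCollection]]):
        self._operations = operations

    def invoke(self, operation: str, params: XmlDataCollection) -> XmlDataCollection:
        if operation not in self._operations:
            raise KeyError(f"unsupported operation: {operation}")
        return self._operations[operation](params)


# Hypothetical registry mapping a resource handle to its wrapper.
WRAPPERS: Dict[str, ResourceWrapper] = {}


def invoke_operation(resource_handle: str, operation: str,
                     params: XmlDataCollection) -> XmlDataCollection:
    """Single entry point: callers never touch a wrapper directly."""
    return WRAPPERS[resource_handle].invoke(operation, params)


def _search_by_name(params: XmlDataCollection) -> XmlDataCollection:
    # A "prepared search": the wrapper decides how the parameter is used.
    name = params.items["name"]
    return XmlDataCollection({"result": f"<taxon><name>{name}</name></taxon>"})


WRAPPERS["demo-resource"] = ResourceWrapper({"searchByName": _search_by_name})

if __name__ == "__main__":
    reply = invoke_operation("demo-resource", "searchByName",
                             XmlDataCollection({"name": "Example taxon"}))
    print(reply.items["result"])
```

Because both parameters and results travel in the same container type, any replacement back end has to accept and produce that container, which is the constraint the slide draws attention to.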

BioDA Exemplar
Two main possibilities within BDW:
1. Augment BGI to support queries that are included in workflows and sent directly to OGSA-DAI enabled databases. Distributed query processing facilities could assist in planning the execution and distribution of the data-orientated parts of a workflow. (For the current status of OGSA-DQP see Section 4.)
– Very major revision to BDW protocols; also,
– many resources of interest are simply not exposed as databases.
2. Provide facilities within individual wrappers that benefit from OGSA-DAI.

OGSA-DAI Prototype (What we’d have liked)
[Figure: prototype flow diagram. Labelled components: OGSA-DAI Client; OGSA-DAI R5 GDS; BDWQueryActivity; Wrapper Module; Wrapper; Web DBs; activities deliverFromURL(xsl), deliverFromURL(url), XSLTransform, deliverToURL/GFTP.]
Numbered steps in the figure:
1. BGI InvokeOperation()
2. Create GDS and query
3. Invoke wrapper
4. Query
5. Download URL
6. Download url
7. url
8. XSL transform to BDW format
9. To WF unit

Key Issues Encountered
Complex client-side coding to orchestrate the application flow
– requires several GDS perform requests…
Difficult to synchronise
– remote web databases have different response times (or do not respond at all!)
Different series of data transformations apply to different data resources
BDW protocols specify that data is returned as a BDW XmlDataCollection object
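The synchronisation problem on this slide (sources that answer at different speeds, or not at all) can be sketched with the Python standard library. This is a minimal illustration, not the project's client code: the source names and the query_all helper are hypothetical, and the "remote databases" are simulated by local functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def query_all(sources, timeout_s):
    """Query every source in parallel.

    Returns (results, failed): results maps source name to its reply;
    failed lists sources that raised or did not answer within timeout_s.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn): name for name, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout_s)

    results = {}
    failed = [futures[f] for f in not_done]      # too slow, or never answered
    for f in done:
        try:
            results[futures[f]] = f.result()
        except Exception:                         # the source answered with an error
            failed.append(futures[f])

    # Stop waiting; a slow worker thread still runs to completion in the background.
    pool.shutdown(wait=False)
    return results, sorted(failed)


def fast_source():
    return "<r/>"


def slow_source():
    time.sleep(0.5)                               # simulates a slow remote database
    return "<late/>"


def broken_source():
    raise ConnectionError("no response")


if __name__ == "__main__":
    print(query_all({"fast": fast_source, "slow": slow_source,
                     "broken": broken_source}, timeout_s=0.1))
```

A single deadline keeps the caller responsive, at the price of silently dropping slow sources; whether that trade-off is acceptable depends on the workflow, and the slides do not say which policy the prototype used.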

OGSA-DAI Prototype (What we ended up doing)
[Figure: prototype flow diagram. Labelled components: OGSA-DAI Client; OGSA-DAI R5 GDS; BDWQueryActivity; Wrapper Module; Wrapper; Web DBs; Cache File.]
Numbered steps in the figure:
1. BGI InvokeOperation()
2. Create GDS and query
3. Invoke wrapper/s
4. Query, transform
5. Write cache file
6. return XmlRemoteData
7. return XmlDataCollection
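Steps 4 to 7 of this slide can be sketched as follows. This is an illustration under stated assumptions rather than the prototype itself: the slide names XmlRemoteData and XmlDataCollection but does not define them, so the sketch assumes that the first acts as a pointer to the cache file and the second is the in-memory result. The element names (rows, rec, collection, item) and all function names are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def transform_to_common_format(raw_xml):
    """Hypothetical transform: map source-specific elements to one common layout."""
    source = ET.fromstring(raw_xml)
    out = ET.Element("collection")
    for rec in source.iter("rec"):
        ET.SubElement(out, "item").text = rec.text
    return ET.tostring(out, encoding="unicode")


def run_wrapper(query_fn, cache_dir):
    """Steps 4-6: query and transform, write the cache file, return a reference."""
    transformed = transform_to_common_format(query_fn())
    cache_file = Path(cache_dir) / "result.xml"
    cache_file.write_text(transformed, encoding="utf-8")
    # Assumption: the remote-data object is modelled as a pointer to the cache file.
    return {"cache_path": str(cache_file)}


def load_collection(remote_data):
    """Step 7: resolve the reference into an in-memory list of items."""
    text = Path(remote_data["cache_path"]).read_text(encoding="utf-8")
    return [item.text for item in ET.fromstring(text).iter("item")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ref = run_wrapper(lambda: "<rows><rec>a</rec><rec>b</rec></rows>", tmp)
        print(load_collection(ref))
```

Doing the query and the transformation inside the wrapper, then handing back only a reference, moves the orchestration out of the client, which matches the simplification the slide describes.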

Conclusion
Highlighted key bioinformatics eScience project requirements for OGSA-DAI
– support for metadata-driven, two-step access to data and data integration…
Reviewed BDWorld DAI requirements
– uniform access to disparate, heterogeneous data resources, including anonymous access to web information systems
Reviewed the BDWorld OGSA-DAI exemplar and the issues encountered