A Grid Data Integration Service (OGSA-DQP)
www.neresc.ac.uk

Paul Watson, University of Newcastle-upon-Tyne, based on the work of:
- Norman Paton, Tasos Gounaris, Alvaro Fernandes and Rizos Sakellariou, University of Manchester
- Jim Smith, Arijit Mukherjee and Paul Watson, University of Newcastle-upon-Tyne

2 The Problem
- Many Grid applications would benefit from access to distributed data.
- Data sources are scattered and autonomous.
- Integration is often done by a tedious manual process or, more recently, by hand-coded workflows.
- We are interested in how to simplify the process of querying distributed data, focussing initially on information held in relational databases.

3 Distributed Query Processing
- Queries are expressed in OQL, which allows computations to be included in the query.
- A single query may reference data at multiple sites; the data locations may be transparent to the query author.
- Example:

    select p.proteinId, Blast(p.sequence)
    from protein p, proteinTerm t
    where t.termId = 'S92' and p.proteinId = t.proteinId
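To make the meaning of the example query concrete, here is a minimal Python sketch that evaluates it on in-memory rows using a hash join. It assumes rows are dictionaries keyed by the attribute names used in the query, and `blast` is a hypothetical stand-in for the Blast computation. This is not OGSA-DQP code; in OGSA-DQP the two tables could live at different sites and the evaluation would be distributed.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def blast(sequence: str) -> str:
    # Hypothetical stand-in for the Blast call in the query; a real system
    # would invoke a remote analysis service here.
    return f"blast({sequence})"


def evaluate_query(protein: List[Row], protein_term: List[Row],
                   term_id: str = "S92") -> List[Tuple[str, str]]:
    """Evaluate, on local lists of rows:

    select p.proteinId, Blast(p.sequence)
    from protein p, proteinTerm t
    where t.termId = 'S92' and p.proteinId = t.proteinId
    """
    # Build phase: apply the selection to proteinTerm and hash the surviving
    # rows on the join key, counting matches so duplicate rows are preserved.
    build: Dict[str, int] = {}
    for t in protein_term:
        if t["termId"] == term_id:
            build[t["proteinId"]] = build.get(t["proteinId"], 0) + 1

    # Probe phase: look up each protein row and call Blast once per match.
    results: List[Tuple[str, str]] = []
    for p in protein:
        for _ in range(build.get(p["proteinId"], 0)):
            results.append((p["proteinId"], blast(p["sequence"])))
    return results


proteins = [{"proteinId": "P1", "sequence": "MKV"},
            {"proteinId": "P2", "sequence": "GAT"}]
terms = [{"proteinId": "P1", "termId": "S92"},
         {"proteinId": "P2", "termId": "X01"}]
print(evaluate_query(proteins, terms))  # -> [('P1', 'blast(MKV)')]
```

Building the hash table on the filtered `proteinTerm` side keeps it small, because the `termId` selection discards most rows before the join.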

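4 Query Compiler
[Figure: the compilation pipeline OQL Parser -> Logical Optimiser -> Physical Optimiser -> Partitioner -> Scheduler -> Evaluator. The logical and physical optimisers form the single-node optimiser; the partitioner and scheduler form the multi-node optimiser.]
OGSA-DQP automatically compiles the query and executes it on a set of Grid nodes, in parallel where possible.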
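The logical optimiser rewrites the algebraic form of a query before any physical choices are made. One standard rewrite is selection push-down, shown here as a minimal Python sketch. The `Scan`, `Select` and `Join` classes are hypothetical and do not reflect OGSA-DQP's internal plan representation; predicates are kept as plain text and tagged with the table they refer to.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Select:
    table: str       # table the predicate refers to
    predicate: str   # kept as text, e.g. "termId = 'S92'"
    child: "Plan"


@dataclass(frozen=True)
class Join:
    key: str
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Select, Join]


def tables(plan: Plan) -> Set[str]:
    """Return the base tables referenced beneath a plan node."""
    if isinstance(plan, Scan):
        return {plan.table}
    if isinstance(plan, Select):
        return tables(plan.child)
    return tables(plan.left) | tables(plan.right)


def push_down_selections(plan: Plan) -> Plan:
    """Move each selection below a join onto the join input owning its table."""
    if isinstance(plan, Select):
        child = push_down_selections(plan.child)
        if isinstance(child, Join):
            if plan.table in tables(child.left):
                left = push_down_selections(Select(plan.table, plan.predicate, child.left))
                return Join(child.key, left, child.right)
            if plan.table in tables(child.right):
                right = push_down_selections(Select(plan.table, plan.predicate, child.right))
                return Join(child.key, child.left, right)
        return Select(plan.table, plan.predicate, child)
    if isinstance(plan, Join):
        return Join(plan.key, push_down_selections(plan.left),
                    push_down_selections(plan.right))
    return plan


# Selection above the join, as a naive translation of the example query.
query = Select("proteinTerm", "termId = 'S92'",
               Join("proteinId", Scan("protein"), Scan("proteinTerm")))
# After the rewrite the selection sits directly above the proteinTerm scan.
print(push_down_selections(query))
```

Applying the filter next to the scan means only matching `proteinTerm` rows ever reach the join, which matters even more when the join runs on a different node from the data.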

5 Execution Plan

    select p.proteinId, Blast(p.sequence)
    from protein p, proteinTerm t
    where t.termId = 'S92' and p.proteinId = t.proteinId

- The plan is split into a set of partitions.
- Grid resources are acquired to execute the partitions in parallel where possible, required and affordable.
[Figure: physical plan with operators table_scan(protein), table_scan(proteinTerm) with termId = 'S92', hash_join(proteinId), op_call(Blast), reduce and exchange; partitions are annotated 1, 2, 3-8 and 9,10.]
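One way to picture how a plan is split into partitions is to cut the operator tree at each exchange operator, since an exchange is where tuples move between nodes. The sketch below assumes that model. The `Op` class and the tree, including where the exchanges are placed, are a hypothetical reading of the example plan rather than the plan OGSA-DQP actually produces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)


def partition(root: Op) -> List[List[str]]:
    """Split an operator tree into partitions at exchange operators.

    Each partition lists its operators top-down. The exchange itself is
    treated as the boundary and is not placed in either partition.
    """
    partitions: List[List[str]] = []

    def walk(op: Op, current: List[str]) -> None:
        current.append(op.name)
        for child in op.children:
            if child.name == "exchange":
                below: List[str] = []  # a new partition starts under the exchange
                partitions.append(below)
                for grandchild in child.children:
                    walk(grandchild, below)
            else:
                walk(child, current)

    top: List[str] = []
    partitions.append(top)
    walk(root, top)
    return partitions


# Hypothetical exchange placement for the example query.
plan = Op("reduce", [
    Op("exchange", [
        Op("op_call(Blast)", [
            Op("hash_join(proteinId)", [
                Op("exchange", [Op("table_scan(protein)")]),
                Op("exchange", [Op("table_scan(proteinTerm) termId=S92")]),
            ]),
        ]),
    ]),
])
for i, ops in enumerate(partition(plan), start=1):
    print(i, ops)
```

Each resulting partition is a self-contained unit of work that can be allocated its own Grid resources, and partitions that do not depend on each other (here the two scans) can run at the same time.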

6 Evaluation on the Grid
- OGSA-DQP builds on OGSA-DAI: it accesses relational databases (Oracle, DB2, MySQL) wrapped by OGSA-DAI.
- Data is streamed between nodes, with flow control.
- All services are OGSI-compliant and built on GT3 (Globus Toolkit 3).
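Flow control on a stream between nodes means a sender cannot run arbitrarily far ahead of its receiver. A common way to achieve this is a bounded buffer that blocks the sender when it is full. The sketch below models the two ends as Python threads sharing a `queue.Queue` created with `maxsize`; it is an in-process illustration under that assumption, not the mechanism OGSA-DQP uses between Grid services.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # end-of-stream marker, distinct from any real tuple


def send(channel: queue.Queue, tuples: Iterable[object]) -> None:
    for t in tuples:
        channel.put(t)  # blocks while the buffer is full: back-pressure
    channel.put(_END)


def receive(channel: queue.Queue, out: List[object]) -> None:
    while True:
        t = channel.get()
        if t is _END:
            return
        out.append(t)


def stream(tuples: Iterable[object], buffer_size: int = 4) -> List[object]:
    """Stream tuples from a producer thread to a consumer thread through a
    bounded buffer, so the producer can never be more than buffer_size
    tuples ahead of the consumer."""
    channel: queue.Queue = queue.Queue(maxsize=buffer_size)
    out: List[object] = []
    producer = threading.Thread(target=send, args=(channel, tuples))
    consumer = threading.Thread(target=receive, args=(channel, out))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return out


print(stream(range(10), buffer_size=2))  # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The buffer size trades memory for tolerance of bursty producers; a size of one forces strict lock-step between sender and receiver.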

7 Execution on the Grid

8 Mutual Benefit
- The Grid needs DQP:
  - declarative, high-level resource integration with implicit parallelism
  - cost-based optimisation
- DQP needs the Grid:
  - systematic access to remote data and computational resources
  - dynamic resource discovery and allocation
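As an illustration of cost-based optimisation, the sketch below chooses how many nodes to allocate to an expensive operator such as `op_call(Blast)` by minimising estimated response time. The cost model (a fixed overhead per node plus work divided evenly across nodes) and all parameter values are hypothetical; these slides do not describe OGSA-DQP's own cost model. The `max_nodes` cap stands in for a limit on how many resources are affordable.

```python
from typing import Tuple


def estimated_time(nodes: int, tuples: int, cost_per_tuple: float,
                   overhead_per_node: float) -> float:
    """Hypothetical cost model: work is divided evenly across nodes, but
    each node adds a fixed set-up and communication overhead."""
    return overhead_per_node * nodes + tuples * cost_per_tuple / nodes


def choose_parallelism(tuples: int, cost_per_tuple: float,
                       overhead_per_node: float,
                       max_nodes: int) -> Tuple[int, float]:
    """Return the node count with the lowest estimated time, and that time,
    considering only node counts from 1 up to max_nodes."""
    best = min(range(1, max_nodes + 1),
               key=lambda n: estimated_time(n, tuples, cost_per_tuple,
                                            overhead_per_node))
    return best, estimated_time(best, tuples, cost_per_tuple, overhead_per_node)


# 1000 tuples at 0.5 time units each, 5.0 units of overhead per node.
print(choose_parallelism(1000, 0.5, 5.0, max_nodes=16))  # -> (10, 100.0)
print(choose_parallelism(1000, 0.5, 5.0, max_nodes=4))   # -> (4, 145.0)
```

With these numbers, nodes beyond ten cost more in overhead than they save in work, and when only four nodes are affordable the optimiser simply uses all four.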

9 Summary
- DQP is a potentially important technology for the Grid.
- OGSA-DQP supports:
  - declarative expression of queries
  - location transparency
  - access to both data and computational resources
  - dynamic deployment on Grid resources
  - implicit parallelism
- First release made in September 2003, available for download.
- Dynamic adaptation is now being investigated, for fault tolerance, performance and cost.

10 Experiences and Issues
- Remote service deployment is not yet available for Grids, but some work…
  - PhD project at Newcastle (Chris Fowler): dynamically deploy individual services remotely; initial prototype by the end of November 2003; working on security issues; WS only.
  - GridShed project (Newcastle + BT): design of hosting environments for Grids; install execution images on nodes as required.

11 Experiences & Issues
- DQP vs workflow? For what space of problems is each better?
- DQP advantages?
  - declarative expression of intent
  - cost-based choice of execution plans
  - implicit parallelisation
- Investigating with bioinformatics applications in the myGrid project: DQP with workflows, and workflows with DQP.

12 Projects/Sponsors
- Projects: OGSA-DAI, Polar, Polar*, myGrid
- Sponsors