OGSA-DQP - A Service-Based Distributed Query Processor for The Grid
Arijit Mukherjee, University of Newcastle
2 Motivation behind OGSA-DQP
1) High-level Data Access and Integration services are needed if data-intensive distributed applications running on heterogeneous platforms are to benefit from the Grid.
2) Emerging standards for Data Access: OGSA-DAI supports exposure of data resources onto Grids.
3) DQP is an approach to deliver (1) given the availability of (2).
3 Service-based in what sense?
OGSA-DQP is service-based in two orthogonal senses:
- It supports querying over data storage and analysis resources that are factored out as services, hence resource virtualisation via SOA.
- The construction of distributed query plans and their execution over the Grid are themselves factored out as services.
4 OGSA-DQP Goals
- To benefit from homogeneous access to heterogeneous data sources [OGSA-DAI].
- To benefit from Grid abstractions for on-demand allocation of the resources required for a task [Condor, OMII, GT*].
- To provide transparent, implicit support for parallelism and distribution [Polar*].
- To orchestrate the composition of data retrieval and analysis services.
- To expose this orchestration capability as a Grid data service.
5 OGSA-DQP Approach
OGSA-DQP uses a middleware approach. It can be seen as a mediator over OGSA-DAI wrappers.
It promises bottom lines regarding:
- efficiency: leave it to schedule in parallel;
- effectiveness: leave it to orchestrate your services;
- usability: use it as a Grid data service.
[Figure: a Query goes into OGSA-DQP and Results come out; OGSA-DQP sits over OGSA-DAI, which wraps the DBMS data sources.]
6 OGSA-DQP Innovations
- OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator. This allows runtime circumstances to be taken into account when the optimiser decides how to partition and schedule.
- OGSA-DQP uses a parallel physical algebra; most mediator-based query processors do not.
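As a rough illustration of taking runtime circumstances into account, the sketch below picks the least-loaded evaluators at scheduling time. The load figures, the evaluator names and the `allocate_evaluators` helper are assumptions made for illustration; they are not part of OGSA-DQP.

```python
import heapq


def allocate_evaluators(current_load, partitions_needed):
    """Return the names of the least-loaded evaluators, one per partition.

    current_load maps a (hypothetical) evaluator name to a load figure
    sampled at scheduling time; lower means more spare capacity.
    """
    if partitions_needed > len(current_load):
        raise ValueError("not enough evaluators for the requested partitions")
    # Order by load first, then by name so that ties are broken predictably.
    chosen = heapq.nsmallest(
        partitions_needed,
        current_load.items(),
        key=lambda item: (item[1], item[0]),
    )
    return [name for name, _load in chosen]


# Made-up snapshot of evaluator load, for illustration only.
load_snapshot = {"qes-a": 0.9, "qes-b": 0.2, "qes-c": 0.5, "qes-d": 0.2}
print(allocate_evaluators(load_snapshot, 2))
```

Because the snapshot is taken when the query is scheduled, two submissions of the same query may be placed on different evaluators.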
7 OGSA-DQP Architecture
Extends OGSA-DAI with two new services:
Grid Distributed Query Service (GDQS)
- Exposed to the client
- Finds and retrieves service descriptions
- Parses, compiles, optimises and schedules the query execution plans over a union of distributed data resources
Query Evaluation Service (QES)
- Not exposed to the client
- Implements the physical query algebra
- Implements the query execution model and semantics
- Evaluates a partition of the query execution plan generated by the GDQS
- Interacts with other QESs/GDSs/Web Services
8 Example Query Plan
select p.proteinId, Blast(p.sequence)
from p in protein, t in proteinTerm
where t.termId=GO: and p.proteinId=t.proteinId

(a) Single-node logical plan
select (p.proteinId, blast)
  operation call (blast(p.sequence))
    join (p.proteinId=t.proteinId)
      select (proteinId)
        scan (proteinTerms) (termId=ABCD)
      select (proteinId, sequence)
        scan (proteins)

(b) Single-node physical plan
select (p.proteinId, blast)
  operation_call (blast(p.sequence))
    hash_join (p.proteinId=t.proteinId)
      select (proteinId)
        table_scan (proteinTerms) (termId=ABCD)
      select (proteinId, sequence)
        table_scan (proteins)

(c) Partitioned plan
select (p.proteinId, blast)
  operation_call (blast(p.sequence))
    exchange
      hash_join (p.proteinId=t.proteinId)
        select (proteinId)
          table_scan (proteinTerms) (termId=ABCD)
        select (proteinId, sequence)
          table_scan (proteins)
Numeric labels in the figure: 4, 5; 3, 6; 2, 3; 6.
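The single-node physical plan can be mimicked on toy data with a few generator functions. This is a minimal sketch, not the OGSA-DQP implementation: the table contents, the term identifiers and the `blast` stand-in are all made up for illustration, and the operator names simply mirror the labels in the plan.

```python
from collections import defaultdict

# Toy stand-ins for the two tables; every value is invented for illustration.
proteins = [
    {"proteinId": "P1", "sequence": "MKV"},
    {"proteinId": "P2", "sequence": "GAT"},
    {"proteinId": "P3", "sequence": "TTC"},
]
protein_terms = [
    {"proteinId": "P1", "termId": "T1"},
    {"proteinId": "P2", "termId": "T2"},
    {"proteinId": "P3", "termId": "T1"},
]


def blast(sequence):
    """Hypothetical placeholder for the remote analysis service call."""
    return "hits-for-" + sequence


def table_scan(table, predicate=lambda row: True):
    """Yield the rows of a table that satisfy the predicate."""
    return (row for row in table if predicate(row))


def project(rows, columns):
    """Keep only the named columns of each row."""
    return ({c: row[c] for c in columns} for row in rows)


def hash_join(build_rows, probe_rows, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[key]].append(row)
    for row in probe_rows:
        for match in buckets.get(row[key], []):
            yield {**match, **row}


def operation_call(rows, func, arg, out):
    """Call func on one column of each row and store the result in a new column."""
    for row in rows:
        yield {**row, out: func(row[arg])}


def run_plan(term_id):
    terms = project(
        table_scan(protein_terms, lambda r: r["termId"] == term_id),
        ["proteinId"],
    )
    prots = project(table_scan(proteins), ["proteinId", "sequence"])
    joined = hash_join(terms, prots, "proteinId")
    called = operation_call(joined, blast, "sequence", "blast")
    return list(project(called, ["proteinId", "blast"]))


print(run_plan("T1"))
```

The filter on the term table is applied before the join, so only proteins annotated with the requested term ever reach the (expensive) operation call.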
9 Another Example
10 OGSA-DQP: Query Evaluation
Query installation stage:
- As many QE services are used as there are partitions specified.
- Each partition is sent to the QE service it is scheduled for.
Query evaluation stage:
- Each QES evaluates its partition using an iterator model.
- Queries execute under pipelined and partitioned parallelism.
- Results are conveyed to the client.
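The iterator model and partitioned parallelism mentioned above can be sketched as follows. This is an illustrative toy under stated assumptions, not the QES code: the `Scan` and `Select` classes, the open/next/close protocol with `None` as the end-of-input marker, and the use of threads in place of remote evaluators are all choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


class Scan:
    """Leaf operator: returns rows from an in-memory list, one per next() call."""

    def __init__(self, rows):
        self.rows = rows
        self.position = 0

    def open(self):
        self.position = 0

    def next(self):
        if self.position >= len(self.rows):
            return None  # end of input
        row = self.rows[self.position]
        self.position += 1
        return row

    def close(self):
        pass


class Select:
    """Filter operator: pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


def drain(root):
    """Pull every row out of an operator tree through its root."""
    root.open()
    results = []
    while True:
        row = root.next()
        if row is None:
            break
        results.append(row)
    root.close()
    return results


def evaluate_partitioned(rows, predicate, copies):
    """Run the same operator tree over disjoint slices of the input in parallel."""
    slices = [rows[i::copies] for i in range(copies)]  # round-robin split
    plans = [Select(Scan(s), predicate) for s in slices]
    with ThreadPoolExecutor(max_workers=copies) as pool:
        partial_results = list(pool.map(drain, plans))
    return [row for part in partial_results for row in part]


print(sorted(evaluate_partitioned(list(range(10)), lambda r: r % 2 == 0, 3)))
```

Each `next()` call pulls a single row through the whole tree, which is what makes the evaluation pipelined; running several copies of the tree over different slices is the partitioned part.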
11 OGSA-DQP Execution Flow
[Figure: execution flow. Components shown: Client; GDQS containing the Polar* query optimiser engine (OQL parser, logical optimiser, physical optimiser, partitioner, scheduler); GDS-1, GDS-2 and a Web Service; QESs holding partition 1, partition 2 and partition 3. Arrows are labelled: resource list, WSDL, schema, query, results.]
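The partitioner and scheduler steps named in the execution flow can be illustrated with a deliberately simplified sketch. It assumes a plan that has been flattened into a leaf-to-root list of operator names, cuts it wherever an `exchange` appears, and assigns the resulting partitions to evaluators round-robin. The function names, the evaluator names and the round-robin policy are assumptions for illustration; real plans are trees and the actual scheduling policy is not described in these slides.

```python
def partition_plan(operators):
    """Cut a leaf-to-root operator list into partitions at each 'exchange'."""
    partitions = [[]]
    for op in operators:
        if op == "exchange":
            partitions.append([])  # an exchange marks a partition boundary
        else:
            partitions[-1].append(op)
    return [p for p in partitions if p]  # drop empty partitions


def schedule(partitions, evaluators):
    """Pair each partition with an evaluator, cycling through the evaluators."""
    if not evaluators:
        raise ValueError("at least one evaluator is required")
    return [
        (evaluators[i % len(evaluators)], partition)
        for i, partition in enumerate(partitions)
    ]


# Hypothetical flattened plan and evaluator names, for illustration only.
plan = [
    "table_scan", "select", "exchange",
    "hash_join", "exchange",
    "operation_call", "select",
]
for evaluator, partition in schedule(partition_plan(plan), ["qes-1", "qes-2"]):
    print(evaluator, partition)
```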
12 What we provide
- Resource virtualisation through a service-oriented architecture:
  - Data resource discovery using service registries;
  - Computational resource discovery via Index Services (not implemented yet);
  - Reliance on GDSs for metadata and data access.
- Coarse-grained services with document-oriented interfaces.
- By acquiring and manipulating data in a data-flow architecture that is constructed dynamically, OGSA-DQP constructs, on the fly, a lightweight Distributed Query Processing Engine.
13 Timeline
- Release 1.0 in September 2003
- Improved Release 2.0 in July 2004 (based on GT3.2 and OGSA-DAI 4.0)
- Around 700 downloads
- New release coming soon! What's new:
  - Based on OGSA-DAI R7.0
  - GDQS closer to OGSA-DAI
  - GQES refactored as QES (WS-I)
14 Working on…
- More friendly (!): use SQL.
- More portable: support Cygwin and Solaris for the compiler/optimiser. Cygwin: DONE.
- Better performance: we are working on it. Some bottlenecks removed.
- More functional: semi-structured data; streams. Working on incorporating XML DBs.
- More dynamic: use Index Services; dynamically install services. QES is DynaSOAr-READY.
- More application test-beds: sensor networks.
- More adaptive: queries may be long running and the environment is constantly changing, so static optimisation is likely to become stale fast. Monitor, assess and respond (e.g., switch operators/algorithms, spawn more copies, relocate). Ongoing.
- More widely deployable: as OGSA-DAI. Introduce Virtual Machines. Looking into it.
15 Where to find out more
Papers
- M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. Service Based Distributed Querying on the Grid. 1st International Conference on Service Oriented Computing, 2003, LNCS 2910.
- M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. OGSA-DQP: A Service for Distributed Querying on the Grid. In Proceedings of Advances in Database Technology - EDBT 2004, LNCS 2992.
- M N Alpdemir, A Mukherjee, A Gounaris, A A A Fernandes, N W Paton, P Watson, J Smith. An Experience Report on Designing and Building OGSA-DQP: A Service Based Distributed Query Processor for the Grid. GGF9 Workshop on Designing and Building Grid Services.
- M N Alpdemir, A Mukherjee, A Gounaris, N W Paton, P Watson, A A A Fernandes, J Smith. OGSA-DQP: A Service-Based Distributed Query Processor for the Grid. 2nd UK e-Science All Hands Meeting.
- J Smith, A Gounaris, P Watson, N W Paton, A A A Fernandes, R Sakellariou. Distributed Query Processing on the Grid. GRID 2002, LNCS 2536.
(papers available at )
Software
16 People and Partners
- Prof. Paul Watson
- Dr. Jim Smith
- Arijit Mukherjee
- Prof. Norman Paton
- Dr. Alvaro A A Fernandes
- Dr. Rizos Sakellariou
- Anastasios Gounaris
- Steven Lynden
Thank You ?