Presentation is loading. Please wait.

Presentation is loading. Please wait.

OGSA-DQP:Service-Based Distributed Query Processing on the Grid M.Nedim Alpdemir Department of Computer Science University of Manchester.

Similar presentations


Presentation on theme: "OGSA-DQP:Service-Based Distributed Query Processing on the Grid M.Nedim Alpdemir Department of Computer Science University of Manchester."— Presentation transcript:

1 OGSA-DQP:Service-Based Distributed Query Processing on the Grid M.Nedim Alpdemir Department of Computer Science University of Manchester

2 People, Places, Funding Manchester M Nedim Alpdemir Anastasios Gounaris Norman W Paton Alvaro A A Fernandes Rizos Sakellariou Newcastle upon Tyne Arijit Mukherjee Jim Smith Paul Watson

3 Outline Requirements, Motivation Background OGSA-DQP Overview OGSA-DQP Details – How does it work OGSA-DQP Details – A query Example Summary

4 Story: Ad-hoc data integration

5 Story: Middleware to the rescue

6 Motivation Pull by applications: overwhelming amounts of semantically complex data in very diverse, structurally dissimilar, and autonomous, geographically dispersed data sources requiring computationally demanding analysis. Push from technologies and infrastructure: Web service impetus Grid services and protocols that enable: dynamic resource discovery dynamic resource allocation and use Data access standards

7 The Grid needs DQP: Declarative, high- level resource integration with implicit parallelism. DQP-based solutions should in principle run faster than those manually coded. DQP needs the Grid: Systematic access to remote data and computational resources. Dynamic resource discovery and allocation. Mutual Benefits

8 Background: mediator/wrapper middleware A well-known approach to data integration is to use mediator/wrapper middleware. wrappers reconcile differences and impose a global schema. Often there is only one mediator in one fixed location, which is limiting. DBMS data mediator DBMS data QueryResults wrapper

9 OGSA-DQP adopts mediator/wrapper OGSA-DQP uses a middleware approach. It can be seen as a mediator over OGSA-DAI wrappers. It promises bottom-lines regarding: efficiency: “leave to it to schedule in parallel”; effectiveness: “leave to it to orchestrate your services”; usability: “use it as a Grid data service”. DBMS data OGSA-DQP DBMS data QueryResults OGSA-DAI

10 Background: Service-based approaches Virtualisation of Resources A convenient cooperation model for Distributed Systems (e.g. Grid) facilitate Leads to

11 Background: Grid Database Service (GDS) Databases are made available on the Grid through integration with other Grid services, and provision of standard interfaces Build upon OGSA to deliver high-level data management functionality for the Grid. Seek to provide two classes of components: Data access components Data integration components

12 DQP design issues A DQP engine (DQPE) is a distributed data-flow engine. How to map distributed query processing concepts and techniques to a service- based infrastructure? What are DQPE components internally realized as? What is the DQPE externally exposed as? How to partition functionalities and responsibilities among components? What interaction discipline to impose on components? Which control flows and which data flows among components? What component lifetime management policies to adopt?

13 Innovations OGSA-DQP dynamically allocates evaluators to do work on behalf of the mediator. This allows for runtime circumstances to be taken into account when the optimiser decides how to partition and schedule. OGSA-DQP uses a parallel physical algebra: most mediator-based query processors do not.

14 OGSA-DQP overview: A Service- based framework Service-based in two orthogonal sense: data storageanalysis services Supports querying over data storage and analysis resources made available as services Constructionexecution Construction of distributed query plans and their execution over the grid are factored out as services Uses the emerging standard for GDSs to provide consistent access to database metadata and to interact with databases on the Grid. A query may refer to database (GDS) and computational services.

15 OGSA-DQP Overview: Two Separate Grid Services Grid Distributed Query Services (GDQSs) that: interact with clients; find and retrieve service descriptions; parse, compile, partition and schedule the query execution over a union of distributed data sources. The query plan is an orchestration of GQESs Grid Query Evaluation Services (GQESs) that: implement the physical query algebra; implement the query execution model and semantics; run a partition of a query execution plan generated by a GDQS; interact with other GQESs/GDSs/WSs but not with clients.

16 OGSA-DQP Overview: An illustration http://phoebus.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory http://rpc676.cs.man.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory http://mygrid.ncl.cs.ac.uk:8080/ogsa/services/ogsadai/GridDataServiceFactory http://phoebus.cs.man.ac.uk:9090/axis/services/EntropyAnalyserService?WSDL varchar 12 varchar 12 varchar 12 id 130.88.192.230 http://phoebus.cs.man.ac.uk:9090/ogsa/services/ogsadai/GridDataServiceFactory http://130.88.198.195:9090/ogsa/services/ogsadai/dqp/GridQueryEvaluationFactory/hash-11025450- 1076603541049 goterm goterm.OID string goterm.id string goterm.type string goterm.name goterms http://130.88.192.230:9090/ogsa/services/ogsadai/GridDataServiceFactory/hash- 31056514-1076603576481 goterms LIKE...

17 Details: Setting up GDQS Setup phase involves: Importing schemas of participating data sources Results in a global schema (currently no schema integration support) Importing WSDL documents of participating analysis services Service metadata obtained (e.g. Port types, operations, types etc.) Collecting computational resource metadata (implicit) Set-up strategy depends on the life-time model of GDQS and GDSs GDQS instance is created per-client But it can serve multiple-queries This model avoids complexity of multi-user interactions while ensures that the set-up cost is not high

18 Details: How is a query compiled? logical optimiser physical optimiser partitionerscheduler evaluators parser single-node optimiser multi-node optimiser query query results

19 what is logical optimisation? Plan is expressed using a logical algebra. By heuristic-based application of equivalence laws; Multiple equivalent plans are generated. scan termId=GO:0005942 (proteinTerm) scan (protein) reduce join (proteinId) op_call (Blast) reduce

20 what is physical optimisation? Plan is expressed using a physical algebra, With logical operators replaced by physical operators. By cost-based ranking of equivalent plans, a probably, acceptably efficient execution plan is chosen. index_scan termId=GO:0005942 (proteinTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce

21 what is query partitioning? Plan is expressed in a parallel algebra, Where parallel algebra = physical algebra + data exchange. Exchange operators are placed where movement of data between evaluator nodes is required. index_scan termId=GO:0005942 (proteinTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce exchange

22 what is query scheduling? Plan is expressed by decorating parallel algebra expression with allocated Grid nodes. Using a heuristic algorithm based on memory use and network costs, an allocation of sub- plans to nodes is decided. index_scan termId=GO:0005942 (proteinTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce exchange 3,4 2 15

23 how is a query evaluated? Query installation stage: As many GQES instances are created as there are partitions specified. Each partition is sent to the GQES instance it is scheduled for. Query evaluation stage: Each GQES evaluates its partition using an iterator model. Queries execute under pipelined and partitioned parallelism. Results are conveyed to client.

24 Details: A query Example Given two DBMSs and one analysis tool (e.g., a WS): proteinTerm to a GO Gene Ontology running as a remote mySQL DB, protein to a GIMS Genome Warehouse running as a remote ODMG-compliant DB, Blast (sequence alignment scoring); We can obtain alignment scores for a sequence against proteins of a certain kind: select p.proteinId, Blast(p.sequence) from protein p, proteinTerm t where t.termId = ‘GO:0005942’ and p.proteinId = t.proteinId Then, OGSA-DQP acts as an enactor of a declarative orchestration of services on the Grid: index_scan termId=GO:0005942 (proteinTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce exchange 3,4 2 1 5

25 Details: what parallelisms do I benefit from? 3 4 partitioned parallelism pipelined parallelism table_scan termId=GO:0005942 (proteinTerm) table_scan (protein) reduce hash_join (proteinId) op_call (Blast) reduce exchange 3,4 2 15

26 Details: What goes on behind the scene?

27 summary 1. High-level data access and integration services are needed if applications that have data with complex structure and complex semantics are to benefit from the Grid. 2. Standards for data access are emerging, and middleware products that are reference implementations of such standards are already available. 3. Distributed query processing technology is one approach to delivering (1.) given the availability of (2.). 4. OGSA-DQP is a service-based distributed query processor for the Grid that is: Exposed as a service; Implemented as an orchestration of services. Improves on Grid portals when only retrieval and analysis is involved.

28 where to find out more OGSA-DQP Grid middleware to query distributed data sources www.ogsadai.org.uk/dqp OGSA-DAI Grid middleware to interface with data(bases) www.ogsadai.org.uk/


Download ppt "OGSA-DQP:Service-Based Distributed Query Processing on the Grid M.Nedim Alpdemir Department of Computer Science University of Manchester."

Similar presentations


Ads by Google