1
OLAP Query Processing in Grids
DMG 2007
Nelson Kotowski
Federal University of Rio de Janeiro, Brazil
Alexandre A. B. Lima
University of Grande Rio, Brazil
Esther Pacitti, Patrick Valduriez
INRIA and University of Nantes, France
Marta Mattoso
2
Agenda
OLAP in Grids
Database clusters
GParGRES
Preliminary experimental results
Conclusion
3
OLAP using Grids
Problem: how to fulfill OLAP needs within the current grid software infrastructure?
  Grid services?
  Adapting database cluster techniques to grids?
[Figure: Grid. Figure thanks to Peter Kacsuk and Gergely Sipos]
4
Using Database Clusters in Grids
[Figure labels: Clients, Middleware, DBMS, PC Cluster]
A sequential “black-box” DBMS runs at each node
The approach is based on database replication
The middleware coordinates parallel query execution
Applications and databases are easily migrated from sequential environments
Both inter- and intra-query parallelism can be exploited
5
Inter-query Parallelism
Improves overall system throughput
Good for OLTP applications
Not adequate for OLAP
[Figure: queries Q1 to Q4 sent to DBMS Nodes 1 to 4]
Notes: Database clusters (DBCs) use two parallelism techniques to increase system throughput or to reduce query processing time. Inter-query parallelism is the ability to process more than one query simultaneously. It is widely used in OLTP applications, which are characterized by serving many users at the same time. In the figure, four queries are received and sent at the same time to four distinct DBMSs.
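A minimal sketch of inter-query parallelism as described on this slide, assuming a fully replicated database so that any node can answer any query. The node names and the `run_on_node` helper are hypothetical stand-ins for real DBMS connections; they are not part of ParGRES.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Hypothetical node names; each one stands for a DBMS holding a full replica.
NODES = ["node1", "node2", "node3", "node4"]


def run_on_node(node, query):
    """Placeholder for sending `query` to the DBMS at `node` (assumption)."""
    return f"{query} executed on {node}"


def dispatch(queries, nodes=NODES):
    """Assign whole queries to nodes round-robin and run them concurrently."""
    assignment = list(zip(queries, cycle(nodes)))
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(run_on_node, node, q) for q, node in assignment]
        # Collect results in submission order.
        return [f.result() for f in futures]


results = dispatch(["Q1", "Q2", "Q3", "Q4"])
```

Each query still runs sequentially on a single node, which is why this technique raises throughput without shortening any individual query.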
6
Intra-query Parallelism
Reduces individual query execution time
Required for high-performance OLAP
[Figure: Virtual Partitioning. Q1 is split into subqueries Q11, Q12, Q13 and Q14 across DBMS Nodes 1 to 4; Q2, Q3 and Q4 are also shown]
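Virtual partitioning can be sketched as splitting a key range into contiguous sub-ranges and issuing one range-restricted subquery per node. This is an illustration under assumptions, not the ParGRES implementation: the function names and the even split are hypothetical, and the `orders` table and `o_orderkey` column are borrowed from the TPC-H query used later in the slides.

```python
def virtual_partitions(low, high, n):
    """Split the half-open key range [low, high) into n contiguous sub-ranges."""
    if n <= 0 or high <= low:
        raise ValueError("need n > 0 and a non-empty range")
    size, extra = divmod(high - low, n)
    bounds, start = [], low
    for i in range(n):
        # The first `extra` sub-ranges take one additional key each.
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def to_subqueries(template, bounds):
    """Instantiate one subquery per sub-range."""
    return [template.format(lo=lo, hi=hi) for lo, hi in bounds]


TEMPLATE = (
    "select count(*) from orders "
    "where o_orderkey >= {lo} and o_orderkey < {hi}"
)
parts = virtual_partitions(1, 101, 4)
subqueries = to_subqueries(TEMPLATE, parts)
```

Because every node holds a full replica, no data is physically moved; only the predicate differs between subqueries.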
7
ParGRES
Database cluster middleware developed by our research group
Optimized for OLAP support
Provides inter- and intra-query parallelism
Offers high performance for heavy-weight query processing over large databases, using inexpensive components in a non-intrusive way
  Making no changes to database applications
  Keeping the same DBMS
  Keeping the same logical database schema
Shows super-linear speedup
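For reference, the usual definition of speedup behind the "super-linear" claim, where $T_1$ is the execution time on one node and $T_n$ the time on $n$ nodes (these symbols are introduced here and do not appear in the slides):

```latex
S(n) = \frac{T_1}{T_n}, \qquad
\text{linear: } S(n) = n, \qquad
\text{super-linear: } S(n) > n
```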
8
GParGRES
9
GParGRES: a Database Grid Middleware
Middleware that provides
  Transparent access to distributed databases in a grid
  Intra-query parallelism during heavy-weight query processing
Based on ParGRES
  Assumes that grid nodes are PC clusters running ParGRES instances
Intra-query parallelism is achieved through virtual partitioning
Two levels of query splitting
  Grid-level splitting: implemented by GParGRES
  Node-level splitting: implemented by ParGRES
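The two levels of splitting can be sketched as a range split applied twice: once across clusters (the role the slide assigns to GParGRES) and once across the nodes inside each cluster (the role assigned to ParGRES). The cluster names, node counts, and the choice of giving every cluster an equal share are assumptions made for illustration; the slides do not say how grid-level ranges are sized.

```python
def split_range(low, high, n):
    """Split [low, high) into n contiguous, near-equal sub-ranges."""
    step, extra = divmod(high - low, n)
    out, start = [], low
    for i in range(n):
        end = start + step + (1 if i < extra else 0)
        out.append((start, end))
        start = end
    return out


def two_level_split(low, high, nodes_per_cluster):
    """Grid level: one range per cluster. Node level: split each range again."""
    grid_ranges = split_range(low, high, len(nodes_per_cluster))
    plan = {}
    for (cluster, n_nodes), (lo, hi) in zip(nodes_per_cluster.items(), grid_ranges):
        plan[cluster] = {
            "grid_range": (lo, hi),                      # grid-level split
            "node_ranges": split_range(lo, hi, n_nodes),  # node-level split
        }
    return plan


# Hypothetical grid: two clusters with 4 and 2 nodes.
plan = two_level_split(0, 1200, {"clusterA": 4, "clusterB": 2})
```

A real scheduler would probably weight the grid-level ranges by cluster size or speed; the equal split here only keeps the sketch short.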
10
GParGRES: Architecture
11
GParGRES: Architecture
Concentrates metadata concerning GParGRES services, such as the state of each FS and DQS instance, and ParGRES execution in the nodes
12
GParGRES: Architecture
GParGRES entry point, responsible for creating new instances of DQS
13
GParGRES: Architecture
Manages global query execution. Receives the query and splits it into subqueries by using virtual partitioning to implement intra-query parallelism. It also performs final result composition
14
GParGRES: Architecture
Grid Local Query Service (GLQS) – local component responsible for receiving subqueries from DQS and passing them to the local ParGRES instance
15
GParGRES: Architecture
16
GParGRES: a Database Grid Middleware
17
GParGRES: a Database Grid Middleware
18
GParGRES: a Database Grid Middleware
19
GParGRES: a Database Grid Middleware
20
GParGRES: a Database Grid Middleware
select o_orderpriority, count(*) from orders where o_orderdate >= date ' ' group by o_orderpriority;
21
GParGRES: a Database Grid Middleware
create table temp_result_1 ( o_orderpriority varchar(2), order_count integer);
22
GParGRES: a Database Grid Middleware
select o_orderpriority, count(*) from orders where o_orderdate >= date ' ' and o_orderkey >= ? and o_orderkey < ? group by o_orderpriority;
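A small sketch of how the two `?` placeholders in the subquery above might be bound to a key range. It assumes an in-memory SQLite database in place of PostgreSQL, a toy `orders` table with hypothetical rows, and a date passed as a third parameter because the date literal is elided in the slide. The `order by` clause is added only to make the output order deterministic.

```python
import sqlite3

# Toy stand-in for the TPC-H orders table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (o_orderkey integer, o_orderpriority text, o_orderdate text)"
)
rows = [(k, "1-URGENT" if k % 2 else "2-HIGH", "1995-01-01") for k in range(1, 9)]
conn.executemany("insert into orders values (?, ?, ?)", rows)

SUBQUERY = """
select o_orderpriority, count(*) from orders
where o_orderdate >= ? and o_orderkey >= ? and o_orderkey < ?
group by o_orderpriority
order by o_orderpriority
"""


def run_subquery(conn, min_date, lo, hi):
    """Run the range-restricted subquery for keys in [lo, hi)."""
    return conn.execute(SUBQUERY, (min_date, lo, hi)).fetchall()


# Two virtual partitions covering keys 1..8; the date value is hypothetical.
partials = [run_subquery(conn, "1994-01-01", lo, hi) for lo, hi in [(1, 5), (5, 9)]]
```

Each element of `partials` is the partial aggregate one node would return for its sub-range.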
23
GParGRES: a Database Grid Middleware
24
GParGRES: a Database Grid Middleware
25
GParGRES: a Database Grid Middleware
26
GParGRES: a Database Grid Middleware
insert into temp_result_1 values (?,?);
27
GParGRES: a Database Grid Middleware
select o_orderpriority, sum(order_count) from temp_result_1 group by o_orderpriority;
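The composition step shown in the last three statements (create `temp_result_1`, insert partial results, aggregate them) can be sketched end to end as follows. The partial results are hypothetical, SQLite stands in for the DBMS, and the `compose` function name is an assumption; an `order by` is added for deterministic output.

```python
import sqlite3


def compose(partials):
    """Merge per-partition (priority, count) rows into the final result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table temp_result_1 "
        "(o_orderpriority varchar(2), order_count integer)"
    )
    for partial in partials:
        conn.executemany("insert into temp_result_1 values (?, ?)", partial)
    # count(*) in the subqueries becomes sum(order_count) in the composition.
    final = conn.execute(
        "select o_orderpriority, sum(order_count) from temp_result_1 "
        "group by o_orderpriority order by o_orderpriority"
    ).fetchall()
    conn.close()
    return final


# Hypothetical partial results returned by two subqueries.
partials = [
    [("1-URGENT", 3), ("2-HIGH", 1)],
    [("1-URGENT", 2), ("3-MEDIUM", 4)],
]
final = compose(partials)
```

The rewrite from `count(*)` to `sum(order_count)` is what keeps the composed answer equal to the original query's answer; other aggregates would need their own composition rule.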
28
GParGRES: a Database Grid Middleware
29
GParGRES: Preliminary Experimental Results
A preliminary GParGRES prototype has been implemented in Java
  Simple versions of DQS and GLQS (using ParGRES components) were implemented
Experimental setup
  Two clusters from Grid’5000
    Parasol cluster: 64 nodes, each with 2 Opteron 2.2 GHz CPUs, 2 GB RAM and a 73 GB HD
    Paraquad cluster: 64 nodes, each with 2 dual-core Xeon 2.33 GHz CPUs, 4 GB RAM and a 160 GB HD
  Kadeploy: generates customized images of operating systems and applications
  PostgreSQL 8.2.4
  ParGRES
  TPC-H database and queries
    SF = 1
30
GParGRES: Preliminary Experimental Results (cont.)
Two kinds of experiments
  Isolated clusters
  Mixed configuration
31
GParGRES: Preliminary Experimental Results (cont.)
Isolated cluster - Parasol
32
GParGRES: Preliminary Experimental Results (cont.)
Isolated cluster - Paraquad
33
GParGRES: Preliminary Experimental Results (cont.)
Mixed Configuration
34
GParGRES – Implementation Issues
Goals
  To implement all components as grid services
  WSRF-compliant components: RS, FS and GLQS
When running in a grid managed by Globus Toolkit 4, RS can be implemented by the Web Services Monitoring and Discovery System (WS MDS)
Techniques employed in OGSA-DAI will help in implementing some components (e.g. FS)
35
Related Work
OGSA-DAI: Open Grid Services Architecture - Data Access and Integration
OGSA-DQP: Open Grid Services Architecture - Distributed Query Processing
New data models for grid warehouses
  Wehrle et al. propose a data model for distributing and querying a data warehouse in computing grids
  The warehouse is formed by data “chunks”
  Special structures are needed (e.g. X-Tree)
Notes:
  Data resources
    Any object that can be a data source
    In general, the focus is on databases
  Data services
    Common interface to data resources
    Expose the capabilities of data resources
    SQL queries, XPath queries
    Data transformations
36
Conclusion
GParGRES is a grid service for OLAP query processing
It provides transparent inter- and intra-query processing with
  No need for application migration
  No need for database schema migration
  DBMS independence
GParGRES exploits successful techniques implemented in ParGRES
Two levels of query splitting
  Grid-level splitting: implemented by GParGRES
  Node-level splitting: implemented by ParGRES
Components are WSRF-compliant, easing compatibility with existing grid solutions
Preliminary results obtained in Grid’5000 show good performance
37
Future Work
Integration with OGSA-DAI
Support for partial database replication
Support for top-k queries
  Extension of best position algorithms
38
A different view of the Grid
Thanks!
Kandinsky, the Grid, 1923, Albertina Museum, Vienna