Igor EPIMAKHOV Abdelkader HAMEURLAIN Franck MORVAN


GeoLoc: Robust Resource Allocation Method for Query Optimization in Data Grid Systems
Igor EPIMAKHOV, Abdelkader HAMEURLAIN, Franck MORVAN
Baltic DB&IS'2012

Table of contents: Introduction; Existing Methods Classification; Contributions; Allocation Space; Allocation Algorithm; Performance Evaluation; Conclusion

Introduction: Data Grid characteristics: heterogeneity, dynamicity, large scale

Introduction: Query processing: parsing, query rewrite, resource discovery, resource allocation, query execution

Introduction: Problem
  Input:
    Set of (dependent) query operations
    Set of nodes
    Distribution of relations
    Dynamic and static characteristics of the Data Grid
  Objective:
    Select the optimal subset of nodes on which to allocate resources for the query operations

Existing Methods Classification: Control structure: centralized, hierarchical, decentralized

Existing Methods Classification: Algorithms: heuristic, exact

Existing Methods Classification: Strategies
  Static: Resource Allocation, Execution
  Dynamic: Resource Allocation, Execution
  Hybrid: Resource Allocation, Execution with Dynamic Reallocation

Existing Methods Classification: Cooperation type: classic; incentive-based (economic / reputation)

Contributions
  Allocation Space restriction
  Algorithm of resource allocation
  Parallelism: pipeline, intra-operation, inter-operation
  Distributed and duplicated relations

Allocation Space: source nodes, nearest nodes

Allocation Algorithm: Assumptions
  Each relation is distributed in N equal parts
  Joins use the Hybrid Hash Join algorithm
  Results are retransferred from the nodes
  Memory is used to reduce I/O operations

Allocation Algorithm: Stage 1. Definition of Allocation Space
  Input:
    (1) All nodes with fragments of the queried relations
    All nodes nearest to (1)
  Overall Node Bandwidth (figure labels): CPU, NET, I/O
  Algorithm:
    Selection of source nodes on the basis of their performance
    Placement of Scan operations
    Generation of the Allocation Space (source nodes + nearest nodes)
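
The three steps of Stage 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slide does not say how CPU, NET and I/O are combined into the Overall Node Bandwidth, so the sketch assumes the slowest resource bounds the node; it also assumes "nearest nodes" means nodes near the selected source nodes. The names `Node`, `define_allocation_space` and `nearest`, and every bandwidth figure, are hypothetical. Only the fragment placement (R1 on n1 and n2, and so on) comes from the example slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu: float  # lines/sec each resource can sustain (invented figures)
    net: float
    io: float

    @property
    def bandwidth(self) -> float:
        # Assumption: the slowest of CPU, NET and I/O bounds the node.
        return min(self.cpu, self.net, self.io)


def define_allocation_space(fragments, nodes, nearest):
    """fragments: fragment name -> names of nodes holding a replica.
    nodes: node name -> Node.
    nearest: node name -> names of nearby nodes (hypothetical map)."""
    # Select source nodes by performance and place one Scan per fragment.
    scans = {
        frag: max(holders, key=lambda name: nodes[name].bandwidth)
        for frag, holders in fragments.items()
    }
    # Allocation Space = source nodes + their nearest nodes.
    space = set(scans.values())
    for source in scans.values():
        space.update(nearest.get(source, ()))
    return scans, space


# Fragment placement from the example slides; bandwidths and the
# `nearest` map are invented for illustration only.
fragments = {"R1": ["n1", "n2"], "R2": ["n3", "n4"],
             "S1": ["n5", "n6"], "S2": ["n7", "n8"]}
nodes = {
    name: Node(name, cpu, net, io)
    for name, cpu, net, io in [
        ("n1", 2500, 2000, 3000), ("n2", 1500, 1800, 2000),
        ("n3", 1200, 1600, 1400), ("n4", 2200, 2000, 2600),
        ("n5", 1700, 1900, 1500), ("n6", 2400, 2000, 2100),
        ("n7", 2000, 2300, 2800), ("n8", 1900, 1000, 2500),
    ]
}
nearest = {"n1": ["n10", "n11"], "n4": ["n17", "n18"],
           "n6": ["n12", "n13"], "n7": ["n25", "n26"]}

scans, space = define_allocation_space(fragments, nodes, nearest)
print(scans)
print(sorted(space))
```

With these invented figures the scans land on n1, n4, n6 and n7, the same source nodes as in the example slides; a different bandwidth model would select different nodes.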

Allocation Algorithm: Stage 2. Generation of execution plan
  Input:
    Logical query plan
    Generated Allocation Space (AS)
  Idea: parity in bandwidth between Scan and Join operations
  Algorithm:
    BEGIN
      FOR each join DO
        Compute the time to read and transfer the source relations, Tscan_exec
        DO
          Choose the most efficient node Neff from the AS for placing the join operation
          Add Neff to the join allocation plan, Pjoin
          Estimate the execution time of the join, Tjoin_exec
        WHILE (Tjoin_exec > Tscan_exec)
        Add Pjoin to the query allocation plan, Pquery
      ENDFOR
    END
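
The Stage 2 loop can be sketched in Python as below. The cost model is an assumption made only so the sketch runs: execution time is taken to be the number of lines divided by the summed bandwidth of the participating nodes, which is not the estimator used by the authors. The function name `allocate_joins`, the rule that a node serves at most one join, the stop on an exhausted Allocation Space, and all numbers are likewise hypothetical.

```python
def allocate_joins(joins, allocation_space):
    """joins: list of (join name, lines to process, aggregate scan bandwidth).
    allocation_space: node name -> bandwidth in lines/sec.
    Returns the query allocation plan: join name -> list of node names."""
    free = dict(allocation_space)  # assumption: each node serves one join
    p_query = {}
    for name, lines, scan_bandwidth in joins:
        # Assumed cost model: time = lines / aggregate bandwidth.
        t_scan_exec = lines / scan_bandwidth
        p_join = []
        join_bandwidth = 0.0
        while True:  # emulates DO ... WHILE (Tjoin_exec > Tscan_exec)
            if not free:
                break  # Allocation Space exhausted (assumed stopping rule)
            n_eff = max(free, key=free.get)  # most efficient remaining node
            p_join.append(n_eff)
            join_bandwidth += free.pop(n_eff)
            t_join_exec = lines / join_bandwidth
            if t_join_exec <= t_scan_exec:
                break
        p_query[name] = p_join
    return p_query


# Invented numbers: one join over 12000 lines fed by scans at 2000 lines/sec.
plan = allocate_joins(
    joins=[("J1", 12000, 2000)],
    allocation_space={"a": 1000, "b": 800, "c": 500, "d": 300},
)
print(plan)  # {'J1': ['a', 'b', 'c']}
```

In this toy run the loop keeps adding nodes until the join keeps pace with the scans: a and b together (1800 lines/sec) are still slower than the scans, and adding c (2300 lines/sec in total) reaches parity, so d is left unused.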

Allocation Algorithm: Example
  Query: R ⋈ S, with R = R1 ∪ R2 and S = S1 ∪ S2
  Fragment placement:
    R1: n1, n2
    R2: n3, n4
    S1: n5, n6
    S2: n7, n8
  (Figure: grid nodes n1 to n8)

Allocation Algorithm: Example
  Query: R ⋈ S, with R = R1 ∪ R2 and S = S1 ∪ S2
  Allocation space: n1, n4, n6, n7, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n20, n21, n22, n23, n24, n25, n26
  (Figure: grid nodes n1 to n8 and n10 to n26)

Allocation Algorithm: Example
  Allocation space: n1, n4, n6, n7, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n20, n21, n22, n23, n24, n25, n26
  Resulting execution plan:
    Scans: n1, n4, n7, n6
    Joins: n18, n25, n10, n26, n13, n12, n19
  Source nodes: n1, n4, n7, n6
    Nodes' bandwidth: 2000 lines/sec
  Nodes allocated for Join: n18, n25, n10, n26, n13, n12, n19
    Nodes' bandwidth (in the order listed on the slide): 1790, 1920, 1650, 2000, 1500, 1300, 900 lines/sec

Performance Evaluation: Experimental conditions
  Data Grid simulator
  6000 heterogeneous nodes
  Simple, average and complex queries
  Distributed and duplicated relations
  Comparison: Method GeoLoc, Method Gounaris2004

Performance Evaluation: Optimization Time

Performance Evaluation: Response Time

Conclusion
  The proposed method is:
    Efficient
    Scalable
    Adapted to heterogeneous, decentralized Data Grids
  Perspective: adaptation to the dynamicity of Data Grids

Thank you for your attention!