Igor EPIMAKHOV Abdelkader HAMEURLAIN Franck MORVAN GeoLoc: Robust Resource Allocation Method for Query Optimization in Data Grid Systems Baltic DB&IS'2012
Table of contents Introduction Existing methods classification Contributions Allocation Space Allocation Algorithm Performance Evaluation Conclusion
Introduction Data Grid Heterogeneity Dynamicity Large Scale
Introduction Query processing: Parsing, Query rewrite, Resource discovery, Resource allocation, Query execution
Introduction Problem Input: Set of query operations (dependent); Set of nodes; Distribution of relations; Dynamic and static characteristics of the Data Grid Objective: Select the optimal subset of nodes to allocate resources for query operations
Existing Methods Classification Control structure: Centralized Hierarchical Decentralized
Existing Methods Classification Algorithms: Heuristic Exact
Existing Methods Classification Strategies: Static (Resource Allocation, then Execution); Dynamic (Resource Allocation during Execution); Hybrid (Resource Allocation, then Execution with Dynamic Reallocation)
Existing Methods Classification Cooperation type: Classic Incentive-based Economic / Reputation
Contributions Allocation Space Restriction Algorithm of Resource Allocation Parallelism: pipeline, intra-operation, inter-operation Distributed and duplicated relations
Allocation Space Source nodes Nearest nodes
Allocation Algorithm Assumptions: Each relation is distributed in N equal parts; Hybrid Hash Join algorithm; Results are transferred back from the nodes; Memory is used to reduce I/O operations
Allocation Algorithm Stage 1. Definition of Allocation Space Input: (1) all nodes with fragments of queried relations; (2) all nodes nearest to (1) Overall Node Bandwidth: CPU, NET, I/O Algorithm: Selection of source nodes on the basis of their performance; Placement of Scan operations; Generation of Allocation Space (source nodes + nearest nodes)
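Stage 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the overall node bandwidth is the minimum of the CPU, NET and I/O rates, and that "nearest" means closest by Euclidean distance between hypothetical node coordinates; the slides specify neither. The names `Node`, `select_source_nodes`, `allocation_space` and the parameter `k` are illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu: float  # processing rate, lines/sec (assumed unit)
    net: float  # network rate, lines/sec (assumed unit)
    io: float   # disk rate, lines/sec (assumed unit)
    pos: tuple[float, float]  # hypothetical coordinates used for "nearness"

    @property
    def bandwidth(self) -> float:
        # Assumption: the slowest resource bounds the overall node bandwidth.
        return min(self.cpu, self.net, self.io)


def select_source_nodes(fragments: dict[str, list[Node]]) -> dict[str, Node]:
    """For each fragment, pick the best-performing node among its replicas.

    The Scan operation for the fragment is placed on the chosen node.
    """
    return {
        frag: max(holders, key=lambda n: n.bandwidth)
        for frag, holders in fragments.items()
    }


def allocation_space(sources: list[Node], all_nodes: list[Node], k: int) -> list[Node]:
    """Return the source nodes plus the k nearest nodes of each source."""
    space = {n.name: n for n in sources}  # dict preserves insertion order
    for src in sources:
        others = [n for n in all_nodes if n.name != src.name]
        others.sort(key=lambda n: math.dist(src.pos, n.pos))
        for n in others[:k]:
            space.setdefault(n.name, n)  # skip nodes already in the space
    return list(space.values())
```

Restricting the candidate set this way keeps the later placement step from examining every node of the grid, which is the purpose of the Allocation Space restriction named in the Contributions slide.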
Allocation Algorithm Stage 2. Generation of execution plan
Input: Query logical plan; generated Allocation Space (AS)
Idea: Parity in bandwidth between Scan and Join operations
Algorithm:
BEGIN
  FOR each join DO
    Compute the time to read and transfer the source relations, Tscan_exec
    DO
      Choose the most efficient node Neff from the AS for placing the join operation
      Add Neff to the join allocation plan, Pjoin
      Estimate the execution time of the join, Tjoin_exec
    WHILE (Tjoin_exec > Tscan_exec)
    Add Pjoin to the query allocation plan, Pquery
  ENDFOR
END
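The Stage 2 loop can be sketched in Python as follows. This is an illustration under a deliberately simple cost model that the slides do not give: execution time is taken as the number of lines divided by the aggregate bandwidth of the nodes involved. It also assumes that a node is used by at most one join and that the loop stops when the Allocation Space is exhausted. The names `Join`, `allocate_query`, `rows` and `scan_bandwidth` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Join:
    name: str
    rows: int              # lines read from the source relations (hypothetical)
    scan_bandwidth: float  # aggregate lines/sec of the Scan nodes feeding the join


def allocate_query(joins, space):
    """Greedy placement of join operations.

    space maps node name -> overall bandwidth in lines/sec.
    Returns a dict mapping join name -> list of allocated node names.
    """
    free = dict(space)  # nodes not yet allocated to any join
    p_query = {}
    for join in joins:
        # Assumed cost model: time = lines / aggregate bandwidth.
        t_scan = join.rows / join.scan_bandwidth
        p_join = []
        while free:
            n_eff = max(free, key=free.get)  # most efficient remaining node
            free.pop(n_eff)
            p_join.append(n_eff)
            t_join = join.rows / sum(space[n] for n in p_join)
            if t_join <= t_scan:
                break  # join keeps pace with the scans: parity reached
        p_query[join.name] = p_join
    return p_query
```

For example, `allocate_query([Join("J1", 1000, 150.0)], {"a": 100, "b": 80, "c": 50})` allocates `a` and `b` to `J1`: node `a` alone is slower than the scans, and adding `b` brings the aggregate join bandwidth above the scan bandwidth.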
Allocation Algorithm Example Query: R ⋈ S, R = R1 ∪ R2, S = S1 ∪ S2 Fragment placement: R1: n1, n2; R2: n3, n4; S1: n5, n6; S2: n7, n8 [Figure: node layout, n1 to n8]
Allocation Algorithm Example Query: R ⋈ S, R = R1 ∪ R2, S = S1 ∪ S2 Allocation space: n1, n4, n6, n7, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n20, n21, n22, n23, n24, n25, n26 [Figure: node layout]
Allocation Algorithm Example Allocation space: n1, n4, n6, n7, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n20, n21, n22, n23, n24, n25, n26 Resulting Execution Plan Scans: n1, n4, n7, n6 Joins: n18, n25, n10, n26, n13, n12, n19 Source Nodes: n1, n4, n7, n6; Nodes' Bandwidth: 2000 lines/sec Nodes allocated for Join: n18, n25, n10, n26, n13, n12, n19; Nodes' Bandwidth (as listed): 1790, 1920, 1650, 2000, 1500, 1300, 900 lines/sec
Performance Evaluation Experimental conditions Data Grid simulator 6000 heterogeneous nodes Simple, Average and Complex queries Distributed and duplicated relations Comparison Method GeoLoc Method Gounaris2004
Performance Evaluation Optimization Time
Performance Evaluation Response Time
Conclusion Proposed method is: Efficient Scalable Adapted to heterogeneous decentralized Data Grid Perspective: Adaptation to the Dynamicity of Data Grid
Thank you for your attention!