Query Evaluation Techniques for Cluster Database Systems Andrey V. Lepikhov, Leonid B. Sokolinsky South Ural State University Russia 22 September 2010.

Slides:



Advertisements
Similar presentations
Distributed DBMS©M. T. Özsu & P. Valduriez Ch.14/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Advertisements

Revisiting Co-Processing for Hash Joins on the Coupled CPU- GPU Architecture School of Computer Engineering Nanyang Technological University 27 th Aug.
Dependence Precedence. Precedence & Dependence Can we execute a 1000 line program with 1000 processors in one step? What are the issues to deal with in.
Query Task Model (QTM): Modeling Query Execution with Tasks 1 Steffen Zeuch and Johann-Christoph Freytag.
LIBRA: Lightweight Data Skew Mitigation in MapReduce
T h e G a s L a w s. T H E G A S L A W S z B o y l e ‘ s L a w z D a l t o n ‘ s L a w z C h a r l e s ‘ L a w z T h e C o m b i n e d G a s L a w z B.
Fall 2008Parallel Query Optimization1. Fall 2008Parallel Query Optimization2 Bucket Sizes and I/O Costs Bucket B does not fit in the memory in its entirety,
Efficient Processing Regular Queries In Shared-Nothing Parallel Database Systems Using Tree- And Structural Indexes (ADBIS 2007, Bulgaria) Vu Le Anh, Attilla.
Development of Parallel Simulator for Wireless WCDMA Network Hong Zhang Communication lab of HUT.
Parallel Database Systems
Parallel Database Systems The Future Of High Performance Database Systems David Dewitt and Jim Gray 1992 Presented By – Ajith Karimpana.
Query Execution Professor: Dr T.Y. Lin Prepared by, Mudra Patel Class id: 113.
1 One Torus to Rule Them All: Multi-dimensional Queries in P2P Systems Prasanna Ganesan Beverly Yang Hector Garcia-Molina Stanford University.
1 Minggu 12, Pertemuan 23 Introduction to Distributed DBMS (Chapter , 22.6, 3rd ed.) Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Optimization and evaluation of parallel I/O in BIPS3D parallel irregular application Performance Modelling, Evaluation, and optimization of Parallel and.
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
VLDB Revisiting Pipelined Parallelism in Multi-Join Query Processing Bin Liu and Elke A. Rundensteiner Worcester Polytechnic Institute
Scheduling with Optimized Communication for Time-Triggered Embedded Systems Slide 1 Scheduling with Optimized Communication for Time-Triggered Embedded.
Flow Algorithms for Two Pipelined Filtering Problems Anne Condon, University of British Columbia Amol Deshpande, University of Maryland Lisa Hellerstein,
Science Advisory Committee Meeting - 20 September 3, 2010 Stanford University 1 04_Parallel Processing Parallel Processing Majid AlMeshari John W. Conklin.
Query Execution Professor: Dr T.Y. Lin Prepared by, Mudra Patel Class id: 113.
Parallel Computation in Biological Sequence Analysis Xue Wu CMSC 838 Presentation.
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 17 Client-Server Processing, Parallel Database Processing,
Chapter 5 Parallel Join 5.1Join Operations 5.2Serial Join Algorithms 5.3Parallel Join Algorithms 5.4Cost Models 5.5Parallel Join Optimization 5.6Summary.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
PMIT-6102 Advanced Database Systems
An approach for solving the Helmholtz Equation on heterogeneous platforms An approach for solving the Helmholtz Equation on heterogeneous platforms G.
CONGRESSIONAL SAMPLES FOR APPROXIMATE ANSWERING OF GROUP-BY QUERIES Swarup Acharya Phillip Gibbons Viswanath Poosala ( Information Sciences Research Center,
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
Institute for Mathematical Modeling RAS 1 Dynamic load balancing. Overview. Simulation of combustion problems using multiprocessor computer systems For.
MapReduce – An overview Medha Atre (May 7, 2008) Dept of Computer Science Rensselaer Polytechnic Institute.
1 Fast Failure Recovery in Distributed Graph Processing Systems Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor.
Exploiting Data Parallelism in SELinux Using a Multicore Processor Bodhisatta Barman Roy National University of Singapore, Singapore Arun Kalyanasundaram,
1 中華大學資訊工程學系 Ching-Hsien Hsu ( 許慶賢 ) Localization and Scheduling Techniques for Optimizing Communications on Heterogeneous.
InCoB August 30, HKUST “Speedup Bioinformatics Applications on Multicore- based Processor using Vectorizing & Multithreading Strategies” King.
Integrated Maximum Flow Algorithm for Optimal Response Time Retrieval of Replicated Data Nihat Altiparmak, Ali Saman Tosun The University of Texas at San.
Multiprossesors Systems.. What are Distributed Databases ? “ A Logically interrelated collection of shared data ( and a description of this data) physically.
Pipelined and Parallel Computing Data Dependency Analysis for 1 Hongtao Du AICIP Research Mar 9, 2006.
Query Execution Section 15.1 Shweta Athalye CS257: Database Systems ID: 118 Section 1.
Task Graph Scheduling for RTR Paper Review By Gregor Scott.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
LOGO Development of the distributed computing system for the MPD at the NICA collider, analytical estimations Mathematical Modeling and Computational Physics.
Parallelizing Video Transcoding Using Map-Reduce-Based Cloud Computing Speaker : 童耀民 MA1G0222 Feng Lao, Xinggong Zhang and Zongming Guo Institute of Computer.
Chap 7: Consistency and Replication
Design Issues of Prefetching Strategies for Heterogeneous Software DSM Author :Ssu-Hsuan Lu, Chien-Lung Chou, Kuang-Jui Wang, Hsiao-Hsi Wang, and Kuan-Ching.
Lecturer : Assoc. Prof. Dang Tran Khah Presenter: Tran Thach Lam 1.
Mapping the Data Warehouse to a Multiprocessor Architecture
Ten Thousand SQLs Kalmesh Nyamagoudar 2010MCS3494.
Parallel Evaluation of Conjunctive Queries Paraschos Koutris and Dan Suciu University of Washington PODS 2011, Athens.
Hierarchical Load Balancing for Large Scale Supercomputers Gengbin Zheng Charm++ Workshop 2010 Parallel Programming Lab, UIUC 1Charm++ Workshop 2010.
Zeta: Scheduling Interactive Services with Partial Execution Yuxiong He, Sameh Elnikety, James Larus, Chenyu Yan Microsoft Research and Microsoft Bing.
Large-Scale and Cost-Effective Video Services CS587x Lecture Department of Computer Science Iowa State University.
Genetic algorithms for task scheduling problem J. Parallel Distrib. Comput. (2010) Fatma A. Omara, Mona M. Arafa 2016/3/111 Shang-Chi Wu.
Grid Activities in CMS Asad Samar (Caltech) PPDG meeting, Argonne July 13-14, 2000.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 18: Parallel.
COMP7500 Advanced Operating Systems I/O-Aware Load Balancing Techniques Dr. Xiao Qin Auburn University
1 Parallel Datacube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation Ruoming Jin Ge Yang Gagan Agrawal The Ohio State University.
INTRODUCTION TO HIGH PERFORMANCE COMPUTING AND TERMINOLOGY.
| presented by Vasileios Zois CS at USC 09/20/2013 Introducing Scalability into Smart Grid 1.
BAHIR DAR UNIVERSITY Institute of technology Faculty of Computing Department of information technology Msc program Distributed Database Article Review.
Auburn University
Parallel Databases.
Parallel Tasks Decomposition
Edinburgh Napier University
Ge Yang Ruoming Jin Gagan Agrawal The Ohio State University
Chapter 15 QUERY EXECUTION.
On Spatial Joins in MapReduce
Akshay Tomar Prateek Singh Lohchubh
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Chapter 01: Introduction
Presentation transcript:

Query Evaluation Techniques for Cluster Database Systems Andrey V. Lepikhov, Leonid B. Sokolinsky South Ural State University Russia 22 September ADBIS 2010

Outline 22 September 2010 ADBIS  Motivation  Problem Statement  Background  Partial mirroring method  Results  Future work

Motivation 22 September Top500 Cluster: 84.8% MPP: 14.8% Others: 0.4% ADBIS 2010

Problem Statement 22 September  Not expensive parallel hardware needs not expensive parallel database management system  Today we have no such chip parallel database management system ADBIS 2010

Background 22 September ADBIS 2010

Exchange operator 22 September 2010 ADBIS  p: port  ψ : distributing function

Parallel plan for query Q = R  S 22 September 2010 ADBIS 20107

Query processing in cluster system 22 September ADBIS 2010

The problem 22 September 2010 ADBIS  Load balancing

Partial mirroring method  Fragmentation strategy  Replication strategy 10

Fragmentation strategy 22 September 2010 ADBIS FRAGMENTATION  Source relation P0P0 P1P1 PnPn Relation is divided into fragments distributed among cluster nodes Each fragment is divided into sequence of segments with an equal length Segment is the minimal unit of replication...

Replication strategy 22 September 2010 ADBIS Disk D i Disk D j ρ j = 50% RiRi

Load balancing method 22 September 2010 ADBIS D1D1 D2D2 t1t1 D1D1 D2D2 t2t2 D1D1 D2D2 - scheduled to process D1D1 D2D2 t4t4 - processed t3t3 P1P1 P2P2 P1P1 P2P2 P1P1 P1P1 P2P2 P2P2 - not used

Parallel agent with two input streams 22 September 2010 ADBIS

Load balancing algorithm 22 September 2010 ADBIS

Parameters of experiments 22 September 2010 ADBIS

Speedup versus replication factor 22 September 2010 ADBIS

Speedup versus skew factor θ 22 September 2010 ADBIS corresponds to the ”80-20” rule (80 percents of tuples of the relation will be stored in 20 percents of fragments) 0 corresponds to the uniform distribution

Future Work 22 September 2010 ADBIS  To incorporate the proposed technique of parallel query execution into open source PostgreSQL DBMS.  To extend this approach on GRID DBMS for clusters with multicor processors.

22 September 2010 ADBIS  Thank you