Generalized Hash Teams for Join and Group-By. Alfons Kemper, Donald Kossmann, Christian Wiesner. Universität Passau, Germany.

Presentation transcript:

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams, VLDB '99

Outline
- Motivating Example
- Standard Hash Teams
- Generalized Hash Teams for Joins
- Generalized Hash Teams for Joins/Grouping
- False Drops Analysis
- Application Examples (TPC-D)
- Performance Evaluation

Traditional Join Plan
[Diagram: join plan over relations R, S and T, with two joins on attribute A producing Result]

Traditional Hash Team Join Plan [Graefe, Bunker, Cooper: VLDB 98]
[Diagram: R, S and T joined on attribute A as a team, partitioned on R.A, S.A and T.A, producing Result]

Generalized Hash Teams
[Diagram: relations R, S and T with join attributes A and B]

Generalized Hash Teams
[Diagram: relations R, S and T with join attributes A and B. Annotations: partition on B (odd: yellow, even: green); 6 mod 5 = 1]
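The partitioning idea on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes S carries both join attributes (A, B) while R carries only A, picks the partition as B mod 2 (even/odd), and picks the bitmap position as A mod 5, mirroring the slide's "6 mod 5 = 1". All function names and sample rows are hypothetical.

```python
def partition_with_bitmaps(s_rows, n_parts, n_bits):
    """Partition S on B and record, per partition, which A hash slots occur."""
    parts = [[] for _ in range(n_parts)]
    bitmaps = [[False] * n_bits for _ in range(n_parts)]
    for a, b in s_rows:
        p = b % n_parts                    # direct partitioning on B
        parts[p].append((a, b))
        bitmaps[p][a % n_bits] = True      # remember the A slot for partition p
    return parts, bitmaps


def partition_by_bitmaps(r_rows, bitmaps, n_bits):
    """Place each R tuple in every partition whose bitmap has its A slot set."""
    parts = [[] for _ in bitmaps]
    for row in r_rows:
        a = row[0]
        for p, bitmap in enumerate(bitmaps):
            if bitmap[a % n_bits]:
                parts[p].append(row)
    return parts


# Made-up sample data: S(A, B) and R(A, payload).
S = [(6, 2), (3, 7), (9, 4)]
R = [(6, "r1"), (3, "r2"), (1, "r3"), (7, "r4")]

s_parts, bitmaps = partition_with_bitmaps(S, n_parts=2, n_bits=5)
r_parts = partition_by_bitmaps(R, bitmaps, n_bits=5)
```

In this toy run the R tuple with A = 1 shares bitmap slot 1 with A = 6 and therefore lands in partition 0 without a join partner, while the tuple with A = 7 hits no set bit and is not placed in any partition.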

Generalized Hash Team for Grouping/Aggregation
select c.City, sum(o.Value)
from Customer c, Order o
where c.C# = o.C#
group by c.City
[Diagram: a plan that partitions (Ptn) Customer and Order on C# and then on City before the aggregation (Agg), next to the join and grouping team, which uses Ptn on City for Customer, bitmaps (BM), and Ptn on BM for Order before Agg]

Group_City(Customer ⋈_C# Order)
- Customer: partition on City and generate bitmaps for C#
- Order: partition with bitmaps for C#
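A small Python sketch may help make the two partitioning steps and the teamed join/aggregation concrete. It is only an illustration under assumptions: the partition count, bitmap size, the stand-in hash on City, and all names and sample rows are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

N_PARTS = 2   # number of partitions (assumed)
N_BITS = 8    # bits per partition bitmap (assumed)


def city_partition(city):
    # Deterministic stand-in for a hash function on City (assumption).
    return sum(city.encode()) % N_PARTS


def team_group_by_city(customers, orders):
    # Step 1: partition Customer on City and set a bit for each C#.
    cust_parts = [[] for _ in range(N_PARTS)]
    bitmaps = [[False] * N_BITS for _ in range(N_PARTS)]
    for c_no, city in customers:
        p = city_partition(city)
        cust_parts[p].append((c_no, city))
        bitmaps[p][c_no % N_BITS] = True

    # Step 2: partition Order by probing the C# bitmaps.
    order_parts = [[] for _ in range(N_PARTS)]
    for c_no, value in orders:
        for p in range(N_PARTS):
            if bitmaps[p][c_no % N_BITS]:
                order_parts[p].append((c_no, value))

    # Step 3: join and aggregate partition by partition. The exact match
    # on C# discards false drops, and every City lives in one partition.
    result = {}
    for p in range(N_PARTS):
        city_of = dict(cust_parts[p])          # build on Customer
        sums = defaultdict(int)
        for c_no, value in order_parts[p]:     # probe with Order
            if c_no in city_of:
                sums[city_of[c_no]] += value
        result.update(sums)
    return result


# Made-up rows: Customer(C#, City) and Order(C#, Value).
customers = [(1, "Passau"), (2, "Munich"), (9, "Passau")]
orders = [(1, 10), (2, 5), (9, 7), (1, 3), (17, 100)]
totals = team_group_by_city(customers, orders)
```

The order for the non-existent customer 17 collides with customer 1 in the bitmap and is copied into a partition, but the exact join on C# filters it out, so the sums per City are unaffected.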

Group_City(Customer ⋈_C# Order ⋈_O# Lineitem)
- Customer: partition on City and generate bitmaps for C#
- Order: partition with bitmaps for C# and generate bitmaps for O#
- Lineitem: partition with bitmaps for O#
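The cascade over three relations can be sketched as two reusable steps: one that partitions directly and emits bitmaps, and one that partitions via incoming bitmaps and optionally emits new ones. This is a hedged sketch; the helper names, the bitmap size, the fixed City-to-partition mapping, and the sample rows are all assumptions made for illustration.

```python
N_PARTS = 2
N_BITS = 16


def new_bitmaps():
    return [[False] * N_BITS for _ in range(N_PARTS)]


def partition_on_key(rows, part_of, gen_key):
    """Directly partition rows; set bits for gen_key(row) in each partition."""
    parts, out = [[] for _ in range(N_PARTS)], new_bitmaps()
    for row in rows:
        p = part_of(row)
        parts[p].append(row)
        out[p][gen_key(row) % N_BITS] = True
    return parts, out


def partition_on_bitmaps(rows, probe_key, bitmaps, gen_key=None):
    """Indirectly partition rows via bitmaps; optionally emit new bitmaps."""
    parts, out = [[] for _ in range(N_PARTS)], new_bitmaps()
    for row in rows:
        for p in range(N_PARTS):
            if bitmaps[p][probe_key(row) % N_BITS]:
                parts[p].append(row)
                if gen_key is not None:
                    out[p][gen_key(row) % N_BITS] = True
    return parts, out


# Made-up rows: Customer(C#, City), Order(O#, C#), Lineitem(O#, price).
customer = [(1, "Passau"), (2, "Munich")]
order = [(100, 1), (101, 2), (102, 1)]
lineitem = [(100, 5), (101, 7), (102, 1), (100, 2)]

cities = {"Munich": 0, "Passau": 1}   # assumed City -> partition mapping
c_parts, bm_c = partition_on_key(customer, lambda c: cities[c[1]], lambda c: c[0])
o_parts, bm_o = partition_on_bitmaps(order, lambda o: o[1], bm_c, lambda o: o[0])
l_parts, _ = partition_on_bitmaps(lineitem, lambda l: l[0], bm_o)
```

After these three steps every Lineitem row sits in the partition of the City its order's customer belongs to, so join and grouping can proceed one partition at a time.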

False Drops
[Diagram: relations R, S and T with join attributes A and B]

Overlapping Partitions
- Generic case (T, S, R): partition on B and generate bitmaps for A; partition based on the bitmaps for A
- Example (Customer ⋈_C# Order ⋈_O# Lineitem): partition on C# and generate bitmaps for O#; partition with bitmaps

Applicability of Generalized Hash Teams
- for partitioning hierarchical structures (A, B): partition on B, partition on bitmaps for A
- but it is also correct for non-strict hierarchies (A, B), although performance deteriorates
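One way to see why performance deteriorates for non-strict hierarchies is to count how many partitions each R tuple is copied into. The sketch below is an illustration only: it uses exact sets instead of hashed bitmaps to isolate the effect of the hierarchy from hash collisions, and the function names and data are hypothetical.

```python
def partitions_for_a(s_rows, n_parts):
    """Map each A value to the set of partitions (chosen by B) it occurs in."""
    where = {}
    for a, b in s_rows:
        where.setdefault(a, set()).add(b % n_parts)
    return where


def replication_factor(r_values, where):
    """Average number of partitions each matching R tuple is copied into."""
    copies = [len(where[a]) for a in r_values if a in where]
    return sum(copies) / len(copies)


R_A = [1, 2, 3, 4]                                # A values of R (made up)
strict = [(1, 10), (2, 10), (3, 11), (4, 11)]     # each A under one B
non_strict = strict + [(1, 11), (3, 10)]          # A=1 and A=3 under two Bs

strict_factor = replication_factor(R_A, partitions_for_a(strict, 2))
loose_factor = replication_factor(R_A, partitions_for_a(non_strict, 2))
```

With the strict data every R tuple is written once; with the non-strict data the tuples for A = 1 and A = 3 are written to both partitions. The join result stays correct because each matching pair still meets in the partition of its S tuple, but more data is written and read.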

Non-strict hierarchy (A, B)
[Diagram: relations R, S and T with join attributes A and B, and the resulting partitions of T, S and R]

False Drops Estimation
- b: cardinality of the bitmaps
- n: number of partitions
- probability that some s sets a bit leading to a false drop of an r into a particular partition: [formula not preserved in the transcript]
- total number of false drops: [formula not preserved in the transcript]
- conservative approximation: [formula not preserved in the transcript]
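The slide's formulas are not available here, so the following is only one plausible way to work through such an estimate with standard Bloom-filter reasoning; it should not be read as the authors' formulas. It assumes (none of this is on the slide) that |R| and |S| denote the relation cardinalities, that the |S| tuples spread evenly over the n partitions with |S| ≥ n, and that A values hash uniformly into the b bit positions.

```latex
% Probability that at least one of the |S|/n tuples of a particular
% partition sets the bit that a given r probes (assumed model):
p = 1 - \left(1 - \frac{1}{b}\right)^{|S|/n}

% Each r can falsely drop into any of the other n - 1 partitions:
F = |R| \, (n - 1) \, p

% Bernoulli's inequality, (1 - 1/b)^k \ge 1 - k/b for k \ge 1, gives an upper bound:
F \le |R| \, (n - 1) \, \frac{|S|}{n \, b} \le \frac{|R| \, |S|}{b}
```

Under this model the number of false drops shrinks roughly in proportion to 1/b, which is why larger bitmaps reduce the overlap between partitions.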

Implementation Details: Fine Tuning the Partitioning
Bitmaps, Bloom-Filter [Bratbergsengen] [Valduriez]

Implementation Details: Teaming up Join and Grouping
Group_City(Customer ⋈_C# Order)
- Customer: partition on City and generate bitmaps for C#
- Order: partition with bitmaps for C#

Teaming Up Join and Grouping: Build Phase
[Diagram]

Teaming Up Join and Grouping: Probe Phase
[Diagram]

Performance Comparison: Group_City(Customer ⋈_C# Order)
[Plot; axis label: Memory [MB]]

False Drops Estimation and Measurement
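A comparison of estimated and measured false drops can be mimicked on synthetic data. The sketch below is not the experiment from the talk: the estimate uses an assumed uniform-hashing model, 1 - (1 - 1/b)^(|S|/n) per foreign partition, and the data, sizes, seed, and function names are all hypothetical.

```python
import random


def estimate_false_drops(r_card, s_card, n, b):
    """Expected false drops under the uniform-hashing assumption."""
    p = 1 - (1 - 1 / b) ** (s_card / n)
    return r_card * (n - 1) * p


def measure_false_drops(s_rows, r_values, n, b):
    """Count copies of R tuples placed in partitions without a join partner."""
    bitmaps = [[False] * b for _ in range(n)]
    partners = [set() for _ in range(n)]
    for a, part in s_rows:
        bitmaps[part][a % b] = True
        partners[part].add(a)
    return sum(
        1
        for a in r_values
        for part in range(n)
        if bitmaps[part][a % b] and a not in partners[part]
    )


rng = random.Random(0)                     # fixed seed for repeatability
n, b, card = 4, 1024, 400
keys = rng.sample(range(10**6), card)      # distinct A values (synthetic)
s_rows = [(a, rng.randrange(n)) for a in keys]
measured = measure_false_drops(s_rows, keys, n, b)
estimated = estimate_false_drops(card, card, n, b)
```

Rerunning with different values of b shows how both numbers shrink as the bitmaps grow; how closely they agree depends on how well the synthetic data matches the uniformity assumption.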

Performance Comparison: Group_City(Customer ⋈_C# Order ⋈_O# Lineitem)
[Plot; axis label: Memory [MB]]

False Drops Estimation and Measurement

Conclusion and Future Work
- Look-Ahead Partitioning for Joins and Grouping
- Applicable for hierarchical data structures
- Correctness does not depend on strict hierarchies
- Applicable for several TPC-D (TPC-H and TPC-R) queries: e.g., Q5, Q10, Q18
- Combining Generalized Hash Teams and Order Preserving Hash Joins (OHJ)

TPC-D Q5
SELECT N_NAME, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
FROM CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION, REGION
WHERE C_CUSTKEY = O_CUSTKEY
  AND O_ORDERKEY = L_ORDERKEY
  AND L_SUPPKEY = S_SUPPKEY
  AND C_NATIONKEY = S_NATIONKEY
  AND S_NATIONKEY = N_NATIONKEY
  AND N_REGIONKEY = R_REGIONKEY
  AND R_NAME = '[region]'
  AND O_ORDERDATE >= DATE '[date]'
  AND O_ORDERDATE < DATE '[date]' + INTERVAL 1 YEAR
GROUP BY N_NAME
ORDER BY REVENUE DESC;

TPC-D Q10
SELECT C_CUSTKEY, C_NAME, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE,
       C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
FROM CUSTOMER, ORDER, LINEITEM, NATION
WHERE C_CUSTKEY = O_CUSTKEY
  AND L_ORDERKEY = O_ORDERKEY
  AND O_ORDERDATE >= DATE '[date]'
  AND O_ORDERDATE < DATE '[date]' + INTERVAL 3 MONTH
  AND L_RETURNFLAG = 'R'
  AND C_NATIONKEY = N_NATIONKEY
GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT
ORDER BY REVENUE DESC;

Indirectly Partitioning a Hierarchical Structure
[Diagram: Lineitem, Order and Customer linked via O#, C# and City, split into Partition 1, Partition 2 and Partition 3]