M.Kersten Dec 31, 2004. Cracking the database store: The far side of the Moon. Martin Kersten, Stefan Manegold, Centre for Mathematics and Computer Science.

Presentation transcript:

Cracking the database store: The far side of the Moon. Martin Kersten, Stefan Manegold. Centre for Mathematics and Computer Science, Amsterdam.

The Moon: the dark side of the moon

The Moon: the far side of the moon. Database research tends to look at just one side of the moon.

Duality issues in science
- Physics: matter and anti-matter
- Mathematics: a graph and its dual graph
- Biology: the DNA string of pairs
- Computer science: ???
- Database technology: ??
What is the dual architecture for query-dominant settings?

Outline
- Database processing problem: the far side of a DBMS architecture
- Cracking the store issues: keeping track of decisions, optimizer issues
- A multi-step query benchmark: you can’t improve what you can’t measure
- Realization & evaluation: legacy technology blocks progress …?
- Outlook

The moon

DBMS architecture [diagram: SQL mgr, Qry mgr, Table mgr]: create table

DBMS architecture [diagram: SQL mgr, Qry mgr, Table mgr]: insert into table

DBMS architecture [diagram: SQL mgr, Qry mgr, Table mgr; labels: optimize, scan]: select * from table where pred

DBMS architecture [diagram: SQL mgr, Qry mgr, Table mgr; label: scan]: create index on table

DBMS architecture [diagram: SQL mgr, Qry mgr, Table mgr; labels: optimize, scan]: select * from table where pred

DBMS architecture [diagram: SQL mgr, Qry mgr, Table mgr; label: scan]: insert into table

DBMS architecture [diagram: SQL mgr, Qry mgr, Table mgr; labels: optimize, scan]: select * from table where pred
Observations:
- The DBA decides on the indices
- Maintenance cost is taken during update
- Queries have ‘uniform’ good access

DBMS architecture [two diagrams side by side, each with SQL mgr, Qry mgr, Table mgr]: create table

DBMS architecture [two diagrams side by side]: insert into table (both)

DBMS architecture [two diagrams side by side; labels: scan, Optimize access, Optimize access & Reorganize table]: select * from table where pred (both)

DBMS architecture [two diagrams side by side; labels: Q1, answer, rest]: create index on table; select * from table where pred

DBMS architecture [two diagrams side by side; labels: Q1, answer, rest, optimize, Optimize & reorganize]: select * from table where pred (both)

DBMS architecture [two diagrams side by side; labels: scan, Q1, optimize]: select * from table (both)

DBMS architecture [two diagrams side by side; labels: scan, Q1]: insert into table (both)

DBMS architecture
Observations (conventional architecture):
- The DBA decides on the indices
- Maintenance cost is taken during update
- Queries have ‘uniform’ good access
Observations (cracking architecture):
- The DBA does not decide on the indices
- Maintenance cost is taken during query
- Updates have ‘uniform’ good access

This is crazy
- Reorganization is utterly expensive
- This ultimately leads to 1-tuple tables (partitions)
- Better to have many (update) users pay less than one (query) user a lot
- It defeats the role of a query optimizer
- It does not fit the Volcano-style query processor
- It just doesn’t work that way

What if it isn’t crazy?
- The database hotspot is properly indexed with fast access and incrementally faster cracking
- It simplifies the query optimizer to finding the right piece; query tracks are carved in the database
- Natural fragmentation appears for use in a grid setting
- It supports incremental construction using ordinary distributed database techniques

Cracking the database store
Research hypothesis:
- It is feasible to take database cracking as a basis for physical database organization
- It can be made performance competitive
CIDR contribution:
- How to keep track of the database parts?
- What are the optimizer issues?
- Can we measure performance improvements?
- Simulation using a micro-benchmark?
- How expensive is it to save a result in a new table?
- What kernel extensions are required?

Micro-benchmark: simulation results confirm the theoretical expectation

Cracker lineage
Cracking can be aligned with the relational algebra operators:
- Psi-cracking produces two vertical fragments for each projection
- Phi-cracking produces two horizontal fragments for each selection
- Diamond-cracking produces the derived fragmentation for each join
- Omega-cracking produces a horizontal fragmentation based on the grouping attributes
- …
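The two simplest crackers listed above can be illustrated with a minimal Python sketch. It is only a toy model: the row layout, the `oid` key column, and the function names `phi_crack` and `psi_crack` are assumptions made for illustration, not the MonetDB implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, int]


def phi_crack(rows: List[Row], pred: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Selection cracking: split a table into two horizontal fragments,
    the rows that satisfy the predicate and the rest."""
    hits = [r for r in rows if pred(r)]
    rest = [r for r in rows if not pred(r)]
    return hits, rest


def psi_crack(rows: List[Row], attrs: Iterable[str], key: str = "oid") -> Tuple[List[Row], List[Row]]:
    """Projection cracking: split a table into two vertical fragments.
    Both fragments keep the (assumed) key column so they can be glued back."""
    wanted = set(attrs)
    projected = [{k: v for k, v in r.items() if k == key or k in wanted} for r in rows]
    remainder = [{k: v for k, v in r.items() if k == key or k not in wanted} for r in rows]
    return projected, remainder


# Hypothetical relation R(oid, a, k)
R = [{"oid": i, "a": a, "k": k} for i, (a, k) in enumerate([(3, 1), (12, 2), (7, 3), (25, 4)])]

# select * from R where R.a < 10
answer, rest = phi_crack(R, lambda r: r["a"] < 10)

# select a from R
a_fragment, other_fragment = psi_crack(R, ["a"])
```

In this sketch the union of `answer` and `rest` reconstructs `R`, and joining `a_fragment` with `other_fragment` on `oid` does the same for the vertical split, which is the gluing work the later slides refer to.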

Cracker lineage
- Select * from R where R.a<10
- Select * from R,S where R.k=S.k and R.a<5
- Select * from S where S.b>25

Cracker lineage
Arbitrary cracking of an n-ary relation results in an exponential number of pieces:
- Every projection produces 2 pieces
- Every selection produces >= 2 pieces
- Every equi-join produces 4 pieces
- Every aggregate produces K pieces
Cracking the database store calls for optimization decisions:
- To limit the number of fragments
- To reduce the reorganization cost
- To avoid cracker administration overhead
This optimization issue is still an open area for research. How to measure progress?

A multi-step query benchmark
You can’t improve what you can’t measure
Requirements:
- Simple database structure
- Scalable
- Controllable generation of multi-query sequences
Examples:
- Homerun
- Walker
- Strolling

A multi-step query benchmark
Sequences are controlled by length and contraction factor.
Homerun: [figure not preserved in the transcript]
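A sequence controlled by length and contraction factor could be generated along the following lines. This is a hedged sketch only: the slide does not define Homerun precisely, so the sketch assumes it means a series of range queries that zoom in on a target value, each range narrower than the previous one by the contraction factor. The function name `homerun`, its parameters, and the column `R.a` are hypothetical.

```python
from typing import List, Tuple


def homerun(lo: float, hi: float, target: float,
            length: int, contraction: float) -> List[Tuple[float, float]]:
    """Return `length` ranges [lo, hi) that shrink around `target`.
    Each step multiplies the range width by `contraction` (assumed 0 < c < 1)."""
    if not 0.0 < contraction < 1.0:
        raise ValueError("contraction factor must lie strictly between 0 and 1")
    if not lo <= target <= hi:
        raise ValueError("target must lie inside the initial range")
    ranges = []
    for _ in range(length):
        ranges.append((lo, hi))
        # Pull both ends towards the target by the contraction factor.
        lo = target - (target - lo) * contraction
        hi = target + (hi - target) * contraction
    return ranges


steps = homerun(lo=0.0, hi=1000.0, target=400.0, length=4, contraction=0.5)

# Render each step as a range query on a hypothetical column R.a
queries = [f"select * from R where R.a >= {a} and R.a < {b}" for a, b in steps]
```

With a contraction factor of 0.5, every query in this sketch covers half the range of its predecessor, so later queries touch pieces that earlier queries have already carved out.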

Micro-benchmark

System        Cost in milliseconds/K    Fixed cost in milliseconds
MonetDB/SQL   0.34 N                    44
MySQL         25.1 N                    238
PostgreSQL    10.6 N                    1230
Commercial    39.0 N                    800

Keeping the query result in a new table is often too expensive. A light-weight index structure is needed!
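To compare the systems at a given result size, the figures on this slide can be plugged into a simple linear cost model. The sketch below rests on an assumption about how to read the slide: that each system has a per-K cost multiplied by N (with N taken as the result size in thousands of tuples) plus a fixed cost, giving cost = per_k * N + fixed in milliseconds. The slide does not state this formula, and the name `materialization_cost_ms` is hypothetical.

```python
from typing import Dict, Tuple

# (per-K cost in ms, fixed cost in ms), copied from the slide
COSTS: Dict[str, Tuple[float, float]] = {
    "MonetDB/SQL": (0.34, 44.0),
    "MySQL": (25.1, 238.0),
    "PostgreSQL": (10.6, 1230.0),
    "Commercial": (39.0, 800.0),
}


def materialization_cost_ms(system: str, n_thousands: float) -> float:
    """Assumed linear model: variable cost per K tuples plus a fixed cost."""
    per_k, fixed = COSTS[system]
    return per_k * n_thousands + fixed


# Estimated cost of saving a 100K-tuple result in a new table, per system
estimates = {name: materialization_cost_ms(name, 100) for name in COSTS}
```

Under this reading, the fixed cost dominates for small results and the per-K cost for large ones, which is consistent with the slide's conclusion that storing every query result as a new table is often too expensive.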

Realization & evaluation
- Cracking produces a lot of fragments to be glued together using union and join
- MySQL, PostgreSQL, and similar systems call for a large investment to handle lengthy joins
- A cracker index with supportive operations is a necessity!
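The idea of a cracker index can be sketched in a few lines of Python. This is not the MonetDB code: it assumes the commonly described "crack-in-two" scheme, in which a column is physically reorganized around each query bound and a small sorted index remembers where each bound ended up. The class name `CrackerColumn` and its methods are hypothetical.

```python
import bisect
from typing import Iterable, List


class CrackerColumn:
    """A column that is reorganized incrementally by the queries it answers."""

    def __init__(self, values: Iterable[int]) -> None:
        self.values: List[int] = list(values)  # copy, reorganized in place
        self.pivots: List[int] = []            # sorted crack values
        self.bounds: List[int] = []            # bounds[i]: first position holding a value >= pivots[i]

    def crack(self, pivot: int) -> int:
        """Make all values < pivot precede all values >= pivot.
        Only the one piece that can contain the pivot is touched."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.bounds[i]              # already cracked on this value
        lo = self.bounds[i - 1] if i > 0 else 0
        hi = self.bounds[i] if i < len(self.bounds) else len(self.values)
        piece = self.values[lo:hi]
        small = [v for v in piece if v < pivot]
        large = [v for v in piece if v >= pivot]
        self.values[lo:hi] = small + large
        pos = lo + len(small)
        self.pivots.insert(i, pivot)
        self.bounds.insert(i, pos)
        return pos

    def select_less_than(self, high: int) -> List[int]:
        """Answer `a < high` as one contiguous slice."""
        return self.values[: self.crack(high)]

    def select_range(self, low: int, high: int) -> List[int]:
        """Answer `low <= a < high` as one contiguous slice."""
        start = self.crack(low)
        end = self.crack(high)
        return self.values[start:end]


col = CrackerColumn([13, 4, 9, 2, 12, 7, 1, 19, 3, 6])
first = col.select_less_than(10)    # cracks the whole column once
second = col.select_range(5, 10)    # only re-cracks the piece holding values < 10
```

Each query in this sketch returns a contiguous slice, so no union of fragments is needed for range selections on the cracked column, and repeated queries on the same bounds cost only an index lookup.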

Realization & evaluation
- Realization of a cracker index in MonetDB/SQL: about 5 pages of C
- Homerun experiment
- Strolling experiment
- Cracker index works! Cumulative cost: below sorting, better than naive

Future research
- Cracking becomes an integral part of the MonetDB 5.0 experimentation platform to control resource management
- It is the basis for organically distributed databases
- Many, many implementation and optimization issues: When to stop cracking? When to fuse pieces that become too small? …

Conclusions
- Cracking a database store is a paradigm wide open for further detailed investigation
- It complements current technology
The far side of the moon

Conclusions
MonetDB 4.4 is available:
- Fully functional SQL DBMS
- ODBC, JDBC, Perl, Python, …
- Embedded version
- XQuery: official release scheduled for March ’05
- Available on SourceForge
The far side of the moon
