Data Partitioning in VLDB Tal Olier

Why am I here? Tal Olier – ~15 years in various software development positions, all of them involving database practice. I work at HP Software, I love working there, and I came to tell you this; the lecture is just an excuse to get me into the building :)

Agenda
- RDBMS in short (basic terms)
- SQL reminder
- A bit about (RDBMS) architecture
- Performance - access paths
- What is table join
- VLDB - the size factor
- VLDB - industry practice
- How joins are executed
- Summary

Relational Database Management System

A little history
- The relational model was invented in 1970
- By Edgar Frank "Ted" Codd
- At IBM labs
- Oracle was the first to reach the market

Basics – a table
- Rows
- Columns
- Primary key

Emp_id  Emp_name  Salary
1       Dany      10,000
2       Yosi      20,000
3       Moshe     30,000
4       Eli       40,000

Basics – a relation
- A foreign key (constraint)
- A reference
  - Source table
  - Source column/s
  - Target table
  - Target column/s

People example
- People: name, height, smoking, father
- Books read: title, author
- Schedule details: from, to, activity
- Resume details: from, to, salary

People example

Structured Query Language

Query language
- SQL – Structured Query Language
- Declarative (vs. procedural)
- Requires internal optimization

SELECT query structure
- SELECT
- FROM … JOIN
- WHERE
- GROUP BY
- HAVING
- ORDER BY
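As a minimal sketch of how these clauses fit together, the query below runs against an in-memory SQLite database through Python's built-in sqlite3 module. The table layout, names, and salaries are assumptions made up for illustration; they are not taken from the slides.

```python
import sqlite3

# Hypothetical schema and data, used only to exercise every clause once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT,
                           salary INTEGER, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering'), (3, 'Marketing');
    INSERT INTO employee VALUES
        (1, 'Dany', 10000, 1), (2, 'Yosi', 20000, 2), (3, 'Moshe', 30000, 2),
        (4, 'Eli', 40000, 3), (5, 'Rina', 5000, 3), (6, 'Shira', 15000, 3);
""")

query = """
    SELECT d.dept_name, COUNT(*) AS headcount, SUM(e.salary) AS total_salary
    FROM employee e
    JOIN department d ON e.dept_id = d.dept_id   -- combine the two tables
    WHERE e.salary >= 10000                      -- filter rows before grouping
    GROUP BY d.dept_name                         -- one result row per department
    HAVING COUNT(*) > 1                          -- filter groups after grouping
    ORDER BY total_salary DESC                   -- sort the final result
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

WHERE removes individual rows before grouping, while HAVING removes whole groups afterwards; that is why the two filters appear at different places in the statement.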

SQL modules
- DML (+ SELECT) – Data manipulation language
- DDL – Data definition language
- TC – Transaction controls (commit/rollback)
- DCL – Data control language (grant/revoke)
- PE – Procedural extensions

A bit about architecture

Database server (diagram)
- Processes: client process, server process
- Memory: buffer cache, log cache, other cache
- I/O system: data files, log files
- Everything is blocks

IO bound vs. CPU bound
- CPU – what is it consumed for?
- IO – what is it consumed for?

Performance?

FTS – full table scan Scan the whole table – from top to bottom

B-tree index
- B-tree – a high fan-out (many keys per node) keeps the tree height small
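To make the fan-out argument concrete, here is a rough back-of-the-envelope sketch. It assumes every level multiplies the reachable keys by the same fan-out; the function name and the fan-out of 200 are illustrative assumptions, not figures from the talk or from any particular database.

```python
def estimated_index_height(num_keys: int, fanout: int) -> int:
    """Return the number of levels needed so that fanout ** levels >= num_keys."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("num_keys must be >= 1 and fanout must be >= 2")
    levels = 1
    capacity = fanout  # keys reachable with the current number of levels
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels

# One billion keys: a binary tree versus a wide tree (fan-out 200 is an assumed value).
binary_height = estimated_index_height(1_000_000_000, 2)
wide_height = estimated_index_height(1_000_000_000, 200)
print(binary_height, wide_height)
```

Since each level typically costs one block read, the difference between the two heights is roughly the difference in I/O per lookup.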

B+ tree
- The leaves are organized in a doubly linked list
- B+ tree – allows searching through all values by searching the leaf level only

Database index
- Data is sorted according to the index columns
- The leaves contain pointers to rows in the table
- Searching for one value in the tree – O(log n)
- Smaller index height in B+ trees
- Index (database) operations:
  - Add/remove values
  - Index seek
  - Index scan
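The access paths above can be sketched in a few lines. This is a simplified model, not how a real engine stores data: a Python list stands in for the table, a sorted list of (key, row id) pairs stands in for the index leaf level, and bisect stands in for the tree descent. All names and rows are assumptions for illustration.

```python
import bisect

# Heap table: rows in arrival order; the list position acts as the row id.
table = [
    {"emp_id": 3, "emp_name": "Moshe", "salary": 30000},
    {"emp_id": 1, "emp_name": "Dany", "salary": 10000},
    {"emp_id": 4, "emp_name": "Eli", "salary": 40000},
    {"emp_id": 2, "emp_name": "Yosi", "salary": 20000},
]

# Index on salary: sorted (key, row_id) pairs, like the leaf level of a B+ tree.
salary_index = sorted((row["salary"], rid) for rid, row in enumerate(table))

def full_table_scan(predicate):
    """Read every row and keep the ones that match."""
    return [row for row in table if predicate(row)]

def index_seek(key):
    """Binary-search to the first entry for key, then follow pointers to the rows."""
    pos = bisect.bisect_left(salary_index, (key, -1))
    result = []
    while pos < len(salary_index) and salary_index[pos][0] == key:
        result.append(table[salary_index[pos][1]])
        pos += 1
    return result

def index_range_scan(low, high):
    """Seek to low, then walk the sorted leaf entries until the key exceeds high."""
    pos = bisect.bisect_left(salary_index, (low, -1))
    result = []
    while pos < len(salary_index) and salary_index[pos][0] <= high:
        result.append(table[salary_index[pos][1]])
        pos += 1
    return result

print(index_seek(30000))
print([row["emp_name"] for row in index_range_scan(15000, 35000)])
```

The seek touches only the matching entries, while the full table scan reads every row regardless of how many match.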

Index seek/scan …

Join (logical)

Inner join
- Uses a join predicate to match rows from two tables, A and B
- Each row in table A is compared to each row in table B to find the pairs of rows that satisfy the join predicate
- Then the column values of each matched pair are combined into a result row

department
dept_id  dept_name
1        Sales
2        Engineering
3        Marketing

employee
emp_name  dept_id
Rina      1
Moshe     2
Shira     2
Yossi     null

Cartesian product
emp_dept_id  emp_name  dept_dept_id  dept_name
1            Rina      1             Sales
1            Rina      2             Engineering
1            Rina      3             Marketing
2            Moshe     1             Sales
2            Moshe     2             Engineering
2            Moshe     3             Marketing
2            Shira     1             Sales
2            Shira     2             Engineering
2            Shira     3             Marketing
null         Yossi     1             Sales
null         Yossi     2             Engineering
null         Yossi     3             Marketing

Equi join
- An inner join that uses an equality comparison in the join predicate
- Example:
  select * from employee emp join department dept on emp.dept_id = dept.dept_id

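Equi join
Rows of the Cartesian product where emp_dept_id = dept_dept_id satisfy the join predicate (marked OK):

emp_dept_id  emp_name  dept_dept_id  dept_name
1            Rina      1             Sales        OK
1            Rina      2             Engineering
1            Rina      3             Marketing
2            Moshe     1             Sales
2            Moshe     2             Engineering  OK
2            Moshe     3             Marketing
2            Shira     1             Sales
2            Shira     2             Engineering  OK
2            Shira     3             Marketing
null         Yossi     1             Sales
null         Yossi     2             Engineering
null         Yossi     3             Marketing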
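The same employee and department rows can be loaded into an in-memory SQLite database to compare the Cartesian product with the equi join. This is a sketch using Python's built-in sqlite3 module; the column types and the ORDER BY are assumptions added to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE employee (emp_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering'), (3, 'Marketing');
    INSERT INTO employee VALUES ('Rina', 1), ('Moshe', 2), ('Shira', 2), ('Yossi', NULL);
""")

# Cartesian product: every employee row paired with every department row.
cartesian = conn.execute("""
    SELECT emp.dept_id, emp.emp_name, dept.dept_id, dept.dept_name
    FROM employee emp CROSS JOIN department dept
""").fetchall()

# Equi join: keep only the pairs where the two dept_id values are equal.
equi = conn.execute("""
    SELECT emp.emp_name, dept.dept_name
    FROM employee emp JOIN department dept ON emp.dept_id = dept.dept_id
    ORDER BY emp.emp_name
""").fetchall()

print(len(cartesian))
print(equi)
```

Yossi does not appear in the equi join result because a null dept_id is never equal to any department id.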

RDBMS – summary in a nutshell Tables References Joins Indexes Blocks I/O

Very Large Data Base

RDBMS – summary in a nutshell Tables References Indexes Blocks I/O

VLDB – a table – size factor

Use case: Sales Information
Table:
- Customer name
- Order number
- Order date and time
- List of items, amount and prices

Order details ( )

Remove 2007’s orders …

Order details kept in 4 tables 2007 … 2008 … 2009 … 2010 …

… 4 tables – remove 2007’s data 2007 … 2008 … 2009 … 2010 …

Union view
  select * from t2007
  union all
  select * from t2008
  union all
  select * from t2009
  union all
  select * from t2010
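A small sketch of the table-per-year approach, again using Python's sqlite3 module. The view name orders, the column list, and the sample rows are assumptions for illustration; only the table names t2007 to t2010 come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
years = [2007, 2008, 2009, 2010]

# One physical table per year, each holding a single hypothetical order.
for year in years:
    conn.execute(f"CREATE TABLE t{year} (order_no INTEGER, order_date TEXT, customer TEXT)")
    conn.execute(f"INSERT INTO t{year} VALUES (?, ?, ?)",
                 (year * 10 + 1, f"{year}-06-15", "hypothetical customer"))

def create_union_view(connection, view_years):
    """(Re)create the view that presents the yearly tables as one table."""
    connection.execute("DROP VIEW IF EXISTS orders")
    body = " UNION ALL ".join(f"SELECT * FROM t{y}" for y in view_years)
    connection.execute(f"CREATE VIEW orders AS {body}")

create_union_view(conn, years)
count_before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Removing 2007's data: redefine the view, then drop the whole table (no row-by-row delete).
years.remove(2007)
create_union_view(conn, years)
conn.execute("DROP TABLE t2007")

count_after = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
oldest = conn.execute("SELECT MIN(order_date) FROM orders").fetchone()[0]
print(count_before, count_after, oldest)
```

The price of this design is that the view has to be maintained by hand every time a yearly table is added or removed.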

Order details kept in 4 tables and a view 2007 … 2008 … 2009 … 2010 …

Partitioned table 2007 … 2008 … 2009 … 2010 …

Get back to: Remove 2007’s data? 2007 … 2008 … 2009 … 2010 …

Impact on index behavior … 2007 … 2008 … 2009 … 2010 …

Partitioned index (local index) … 2007 … 2008 … 2009 … 2010 …

Local indexes
- Each index is bound to its partition
- Dropping a partition drops its index as well
- Smaller index heights
- The index is always usable
- Harder to maintain uniqueness with it

Partitioned table - concepts
- Partition column is the key for dividing the data
- Performance – only relevant partitions are used
- Add/drop partition – DDL
- Local index – index is bound to a partition
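These concepts can be modelled with a toy in-memory structure. It is only a sketch of the idea, not how a database implements partitioning; the class name, method names, and sample rows are hypothetical.

```python
class PartitionedOrders:
    """Toy model: one row list and one local index per partition (year)."""

    def __init__(self):
        self.partitions = {}     # year -> list of rows
        self.local_indexes = {}  # year -> {order_no: row}

    def add_partition(self, year):
        self.partitions[year] = []
        self.local_indexes[year] = {}

    def insert(self, row):
        year = row["year"]  # the partition column decides where the row goes
        self.partitions[year].append(row)
        self.local_indexes[year][row["order_no"]] = row

    def drop_partition(self, year):
        # Dropping the partition removes its rows and its local index in one step.
        del self.partitions[year]
        del self.local_indexes[year]

    def find_order(self, order_no, year=None):
        """Return (row, number of local indexes probed)."""
        # With the partition key only one local index is probed;
        # without it every partition's index has to be checked.
        years = [year] if year is not None else list(self.partitions)
        probed = 0
        for y in years:
            probed += 1
            row = self.local_indexes[y].get(order_no)
            if row is not None:
                return row, probed
        return None, probed

orders = PartitionedOrders()
for order_no, year in enumerate([2007, 2008, 2009, 2010], start=1):
    orders.add_partition(year)
    orders.insert({"order_no": order_no, "year": year})

print(orders.find_order(3, year=2009))  # probes a single partition
print(orders.find_order(4))             # probes every partition
orders.drop_partition(2007)
print(orders.find_order(1))             # the 2007 order is gone
```

The lookup without a year also hints at why uniqueness is harder with local indexes: checking that an order number is unique means consulting every partition's index.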

Star schema

Data tables block: Table-A, Table-B, Table-C

Let’s get back to our partitioned table 2007 … 2008 … 2009 … 2010 …

Dimension referencing

Fact table
Year  Cust_id  …
…     …        …

Dimension table
Cust_id  Name                 Serial signature  …
1715     Bank of America Ltd  /FFAA             …
…        …                    …                 …

Making fact tables thin 2007 … 2008 … 2009 … 2010 … Dimension

Join (physical)

Join
To perform a join, the optimizer needs to make the following decisions:
- Access path – how to access each table
- Join order – if more than two tables/views are joined, which join to do first
- Join method – for each pair of row sources, how to perform the join

Join methods – nested loop
- One input is the outer loop, the other input is the inner loop
- The inner loop is executed for each row in the outer loop
- Effective when
  - The outer loop is small
  - The inner loop is pre-indexed
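A minimal sketch of the nested loop method, using the employee and department rows from the logical join slides. The function names are hypothetical, and a Python dict stands in for a unique index on the inner input.

```python
employee = [("Rina", 1), ("Moshe", 2), ("Shira", 2), ("Yossi", None)]
department = [(1, "Sales"), (2, "Engineering"), (3, "Marketing")]

def nested_loop_join(outer, inner):
    """Plain nested loop: the whole inner input is scanned for every outer row."""
    result = []
    for emp_name, emp_dept_id in outer:
        for dept_id, dept_name in inner:
            # A null (None) key never matches anything.
            if emp_dept_id is not None and emp_dept_id == dept_id:
                result.append((emp_name, dept_name))
    return result

def indexed_nested_loop_join(outer, inner_index):
    """Nested loop with a pre-indexed inner input: one lookup per outer row."""
    result = []
    for emp_name, emp_dept_id in outer:
        dept_name = inner_index.get(emp_dept_id)
        if dept_name is not None:
            result.append((emp_name, dept_name))
    return result

# Assumed unique index on department.dept_id.
department_index = {dept_id: dept_name for dept_id, dept_name in department}

plain = nested_loop_join(employee, department)
indexed = indexed_nested_loop_join(employee, department_index)
print(plain)
print(indexed)
```

The plain version does len(outer) * len(inner) comparisons, which is why the method pays off mainly when the outer input is small and the inner input can be probed through an index.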

Join methods – hash
- The smaller of the two inputs is named the build input
- The second is the probe input
- A hash table is built from the build input
- Each row in the build input is put in the appropriate bucket
- The entire probe input is scanned

Join methods – hash (cont.)
- For each probe row the hash value is calculated
- The corresponding hash bucket is scanned to find matching rows from the build input
- Good for joining large amounts of data
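The build and probe phases described above can be sketched as follows. This is a simplified in-memory version with hypothetical names: a Python dict plays the role of the hash table and each key's list plays the role of a bucket, whereas a real engine's buckets may hold several distinct keys and may spill to disk.

```python
from collections import defaultdict

department = [(1, "Sales"), (2, "Engineering"), (3, "Marketing")]      # build input (smaller)
employee = [("Rina", 1), ("Moshe", 2), ("Shira", 2), ("Yossi", None)]  # probe input

def hash_join(build_input, probe_input, build_key, probe_key):
    """Return (build_row, probe_row) pairs whose join keys are equal."""
    # Build phase: put every build row into the bucket for its key.
    buckets = defaultdict(list)
    for row in build_input:
        key = row[build_key]
        if key is not None:
            buckets[key].append(row)

    # Probe phase: scan the whole probe input once and look up each row's bucket.
    result = []
    for row in probe_input:
        key = row[probe_key]
        if key is None:
            continue  # null keys never match
        for match in buckets.get(key, []):
            result.append((match, row))
    return result

pairs = hash_join(department, employee, build_key=0, probe_key=1)
joined = [(emp[0], dept[1]) for dept, emp in pairs]
print(joined)
```

Each input is read only once, which is the reason the method scales well to large inputs as long as the build side fits in memory.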

Join methods – merge
- There is no concept of a driving table
- Both input sources are sorted according to the join key (or an already sorted source, such as an index, is used)
- The sorted lists are merged together
- The merge itself is very fast, but it can be expensive to sort the sources
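A sketch of the sort and merge steps, assuming both inputs fit in memory and that rows with a null key are discarded before sorting. The function name and tuple layout are hypothetical.

```python
employee = [("Shira", 2), ("Rina", 1), ("Yossi", None), ("Moshe", 2)]
department = [(3, "Marketing"), (1, "Sales"), (2, "Engineering")]

def merge_join(left, right, left_key, right_key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    # Sort phase (skipped in practice when a sorted source such as an index exists).
    left = sorted((r for r in left if r[left_key] is not None), key=lambda r: r[left_key])
    right = sorted((r for r in right if r[right_key] is not None), key=lambda r: r[right_key])

    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and left[i][left_key] == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result

pairs = merge_join(employee, department, left_key=1, right_key=0)
merged = [(emp[0], dept[1]) for emp, dept in pairs]
print(merged)
```

Once the inputs are sorted, each side is advanced only forwards, so the merge is linear in the input sizes; the sorting is where the cost sits.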

Summary
- It’s all about I/O
- Star schema – facts and dimensions
- Partitions + local indexes
- SQL joins (probably hash)

Q & A