Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.

Slides:



Advertisements
Similar presentations
Implementation of Relational Operations (Part 2) R&G - Chapters 12 and 14.
Advertisements

1 Lecture 23: Query Execution Friday, March 4, 2005.
Join Processing in Database Systems with Large Main Memories ACM Transactions on Database Systems Vol. 11, No. 3, Sep 1986 Leonard D. Shapiro Donghui Zhang,
Lecture-19 ETL Detail: Data Cleansing
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Data Warehousing 1 Lecture-25 Need for Speed: Parallelism Methodologies Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-5 Types & Typical Applications of DWH Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
1 Chapter 10 Query Processing: The Basics. 2 External Sorting Sorting is used in implementing many relational operations Problem: –Relations are typically.
Lecture 24: Query Execution Monday, November 20, 2000.
1  Simple Nested Loops Join:  Block Nested Loops Join  Index Nested Loops Join  Sort Merge Join  Hash Join  Hybrid Hash Join Evaluation of Relational.
SPRING 2004CENG 3521 Join Algorithms Chapter 14. SPRING 2004CENG 3522 Schema for Examples Similar to old schema; rname added for variations. Reserves:
Query Execution :Nested-Loop Joins Rohit Deshmukh ID 120 CS-257 Rohit Deshmukh ID 120 CS-257.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
DWH-Ahsan Abdullah 1 Data Warehousing Lab Lect-1 DTS: Introduction Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Lecture-33 DWH Implementation: Goal Driven Approach (1)
Chapter 5 Parallel Join 5.1Join Operations 5.2Serial Join Algorithms 5.3Parallel Join Algorithms 5.4Cost Models 5.5Parallel Join Optimization 5.6Summary.
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Evaluation of Relational Operations. Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
Lecture-1 Introduction and Background
DWH-Ahsan Abdullah 1 Data Warehousing Lab Lect-2 Lab Data Set Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Ahsan Abdullah 1 Data Warehousing Lecture-12 Relational OLAP (ROLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Ahsan Abdullah 1 Data Warehousing Lecture-17 Issues of ETL Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Ahsan Abdullah 1 Data Warehousing Lecture-11 Multidimensional OLAP (MOLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
Data Warehousing 1 Lecture-24 Need for Speed: Parallelism Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-37 Case Study: Agri-Data Warehouse Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
1 Data Warehousing Lecture-13 Dimensional Modeling (DM) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research.
Ahsan Abdullah 1 Data Warehousing Lecture-7De-normalization Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-4 Introduction and Background Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
Ahsan Abdullah 1 Data Warehousing Lecture-18 ETL Detail: Data Extraction & Transformation Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. &
Ahsan Abdullah 1 Data Warehousing Lecture-9 Issues of De-normalization Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
RELATIONAL JOIN Advanced Data Structures. Equality Joins With One Join Column External Sorting 2 SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid.
1 Data Warehousing Lecture-14 Process of Dimensional Modeling Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Ahsan Abdullah 1 Data Warehousing Lecture-20 Data Duplication Elimination & BSN Method Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-2 Introduction and Background Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
Ahsan Abdullah 1 Data Warehousing Lecture-10 Online Analytical Processing (OLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
Data Warehousing Lecture-31 Supervised vs. Unsupervised Learning Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Ahsan Abdullah 1 Data Warehousing Lecture-16 Extract Transform Load (ETL) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
1 Data Warehousing Lecture-15 Issues of Dimensional Modeling Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
1 CPS216: Data-intensive Computing Systems Operators for Data Access (contd.) Shivnath Babu.
Data Warehousing Lecture-30 What can Data Mining do? Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-29 Brief Intro. to Data Mining Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
CS4432: Database Systems II Query Processing- Part 3 1.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-22 DQM: Quantifying Data Quality Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations – Join Chapter 14 Ramakrishnan and Gehrke (Section 14.4)
Ahsan Abdullah 1 Data Warehousing Lecture-6Normalization Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Lecture 17: Query Execution Tuesday, February 28, 2001.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
Database Management Systems 1 Raghu Ramakrishnan Evaluation of Relational Operations Chpt 14.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Implementation of Database Systems, Jarek Gryz1 Evaluation of Relational Operations Chapter 12, Part A.
CS 540 Database Management Systems
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Evaluation of Relational Operations Chapter 14, Part A (Joins)
1 Lecture 23: Query Execution Monday, November 26, 2001.
Ahsan Abdullah 1 Data Warehousing Lecture-8 De-normalization Techniques Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Chapter 10 The Basics of Query Processing. Copyright © 2005 Pearson Addison-Wesley. All rights reserved External Sorting Sorting is used in implementing.
DWH-Ahsan Abdullah 1 Data Warehousing Lecture-21 Introduction to Data Quality Management (DQM) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof.
Lecture-3 Introduction and Background
Lecture-32 DWH Lifecycle: Methodologies
Naveed Iqbal, Assistant Professor (Lecture Slides Week # 15)
Evaluation of Relational Operations
Lecture-38 Case Study: Agri-Data Warehouse
Lecture-35 DWH Implementation: Pitfalls, Mistakes, Keys
Digital Signal Processing
Presentation transcript:

Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research National University of Computers & Emerging Sciences, Islamabad

Data Warehousing 2 Need for Speed: Join Techniques

Data Warehousing 3 Background

4 About Nested-Loop Join

Data Warehousing 5 FOR i = 1 to N DO BEGIN /* N rows in T1*/ IF i th row of T1 qualifies THEN BEGIN For j = 1 to M DO BEGIN /* M rows in T2*/ IF the i th row of T1 matches to j th row of T2 on join key THEN BEGIN IF the i th row of T1 matches to j th row of T2 on join key THEN BEGIN IF the j th row of T2 qualifies THEN BEGIN IF the j th row of T2 qualifies THEN BEGIN produce output row produce output row END END Nested-Loop Join: Code GOES TO GRAPHICS

Data Warehousing 6 “What is the average GPA of undergraduate male students?” For each qualifying row of Personal table, Academic table is examined for matching rows. Student Personal TableStudent Academic Table Nested-Loop Join: Working Example Results Search Results Search Results Search GOES TO GRAPHICS

Data Warehousing 7 Nested-Loop Join: Order of Tables

Data Warehousing 8 Nested-Loop Join: Cost Formula Join cost = Join cost = Cost of accessing Table_A + # of qualifying rows in Table_A  Blocks of Table_B to be scanned for each qualifying row OR Join cost = Join cost = Blocks accessed for Table_A + Blocks accessed for Table_A  Blocks accessed for Table_B GOES TO GRAPHICS

Data Warehousing 9 Nested-Loop Join: Cost of reorder Table_A = 500 blocks and Table_B = 700 blocks. Qualifying blocks for Table_A QB(A) = 50 Qualifying blocks for Table_B QB(B) = 100 Join cost A&B =  700 = 35,500 I/Os Join cost B&A =  500 = 50,700 I/Os i.e. an increase in I/O of about 43%. GOES TO GRAPHICS

Data Warehousing 10 Nested-Loop Join: Variants

Data Warehousing 11 Sort-Merge Join

Data Warehousing 12 Sort-Merge Join: Process

Data Warehousing Table_A Table_B Sort-Merge Join Example

Data Warehousing 14 Sort-Merge Join: Note

Data Warehousing 15 Hash-Based join

Data Warehousing 16 Hash-Based Join: Working

Data Warehousing 17 Hash-Based Join: Example Table_B on disk Disk Original Relation Table_A hash function h Join Result... Table_B M N N Table_A in main memory MAIN MEMORY GOES TO GRAPHICS

Data Warehousing 18 Hash-Based Join: Large “small” Table

Data Warehousing 19 Hash-Based Join: Partition Skew

Data Warehousing 20 Hash-Based Join: Intrinsic Skew