Slides adapted from Donghui Zhang, UC Riverside

JOIN PROCESSING
E0 261
Prasad Deshpande, Jayant Haritsa
Computer Science and Automation, Indian Institute of Science

Today’s Paper
Leonard D. Shapiro, "Join Processing in Database Systems with Large Main Memories," ACM Transactions on Database Systems, Vol. 11, No. 3, Sep 1986.
Computing joins in systems with "large" amounts of main memory: from √|S| to |S| pages.
Basic issues:
- IO and computational costs
- How to use the available memory to minimize these costs

External Sort-Merge Join
Average run length = 2·√|S| (replacement selection with a buffer of |M| = √|S| pages produces runs averaging 2|M| pages).
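The "runs twice as long as memory" effect comes from replacement selection. The sketch below is a minimal in-memory simulation, assuming records are comparable Python values and that `memory_size` (a hypothetical parameter counted in records, standing in for |M| pages) is at least 1. Each record written is replaced by the next input record; if the new record is smaller than the one just written, it is tagged for the next run.

```python
import heapq
import random

_END = object()  # sentinel marking an exhausted input


def replacement_selection(records, memory_size):
    """Split `records` into sorted runs using replacement selection.

    `memory_size` (>= 1) is how many records the in-memory heap holds,
    a stand-in for |M| pages. Returns a list of sorted runs.
    """
    it = iter(records)
    # Each heap entry is (run_number, key). Fill memory first.
    heap = []
    for rec in it:
        heap.append((0, rec))
        if len(heap) == memory_size:
            break
    heapq.heapify(heap)

    runs = []
    current_run = -1
    while heap:
        run_no, key = heapq.heappop(heap)
        if run_no != current_run:  # smallest tag advanced: start a new run
            runs.append([])
            current_run = run_no
        runs[-1].append(key)
        nxt = next(it, _END)
        if nxt is not _END:
            # A record >= the key just written can still join the current run;
            # a smaller one must wait for the next run.
            tag = run_no if nxt >= key else run_no + 1
            heapq.heappush(heap, (tag, nxt))
    return runs


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(100_000)]
    runs = replacement_selection(data, 1_000)
    # On random input the average run length tends toward 2 * memory_size.
    print(len(runs), len(data) / len(runs))
```

On already-sorted input the whole relation comes out as a single run, which is why replacement selection is never worse than filling memory and sorting it.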

External Sort-Merge Join (cont.)
Optimization: omit the final pass of merge sort by pipelining the sorted output directly into the join.
If the buffer size |M| ≥ √|S|, R and S can be sorted and joined by reading each relation twice (once to form runs, once to merge and join).
E.g. page size = 8KB, each relation has 10,000 pages (80MB), buffer size = 100 pages (<1MB): since √10,000 = 100, two passes are enough.
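The example's arithmetic can be checked with a rough calculator. This is a simplified sketch, assuming runs average 2|M| pages (replacement selection), that merging needs one buffer page per run of R and S together, and ignoring output buffers; `two_pass_sort_merge` is a hypothetical helper. Under those assumptions the IO cost is 3(|R| + |S|) pages: read the inputs, write the runs, read the runs back.

```python
import math


def two_pass_sort_merge(r_pages, s_pages, mem_pages):
    """Rough two-pass sort-merge feasibility and IO estimate (a simplification).

    Assumes runs average 2 * mem_pages pages and that the merge phase needs one
    buffer page per run of R and S; output buffers are ignored.
    """
    runs_r = math.ceil(r_pages / (2 * mem_pages))
    runs_s = math.ceil(s_pages / (2 * mem_pages))
    fits = runs_r + runs_s <= mem_pages
    # Pass 1 reads R and S and writes runs; pass 2 reads the runs and feeds
    # the merge directly into the join, so sorted output is never written.
    io_pages = 3 * (r_pages + s_pages)
    return fits, runs_r + runs_s, io_pages


page_kb, pages, mem = 8, 10_000, 100
print(pages * page_kb, "KB per relation;", mem * page_kb, "KB of buffer")
print(two_pass_sort_merge(pages, pages, mem))  # (True, 100, 60000)
print(math.isqrt(pages) <= mem)  # True: buffer >= sqrt(|S|)
```

With 100 buffer pages each relation yields 50 runs, so all 100 runs can be merged at once; halving the buffer would double the run count and break the two-pass bound.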

Cost of Sort-Merge
Make use of any extra memory beyond √|S| to save IO.

Classic Hash Join
Works when the smaller relation R fits in memory.
- Build an in-memory hash table for the smaller relation R.
- For each record in the larger relation S, probe the hash table.
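The two phases of the classic hash join map directly onto a dictionary build and a lookup loop. A minimal sketch, assuming relations are lists of tuples and that the hypothetical `r_key` / `s_key` functions extract the join attribute; a Python dict plays the role of the in-memory hash table.

```python
from collections import defaultdict


def classic_hash_join(r, s, r_key, s_key):
    """In-memory hash join; assumes the smaller relation `r` fits in memory.

    `r` and `s` are lists of tuples; `r_key` / `s_key` pick the join attribute.
    """
    # Build phase: hash table on the smaller relation R.
    table = defaultdict(list)
    for rec in r:
        table[r_key(rec)].append(rec)
    # Probe phase: one lookup per record of the larger relation S.
    out = []
    for rec in s:
        for match in table.get(s_key(rec), []):
            out.append(match + rec)
    return out


R = [(1, "a"), (2, "b")]
S = [(1, "x"), (1, "y"), (3, "z")]
print(classic_hash_join(R, S, lambda t: t[0], lambda t: t[0]))
```

Each relation is read exactly once, which is why this is the cheapest case; the algorithms that follow only add work when R does not fit.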

Simple Hash Join
for each logical bucket j:
    for each record r in R:
        if r is in bucket j then insert r into the hash table
    for each record s in S:
        if s is in bucket j then probe the hash table
Classic hash join is a special case, with one bucket.
Optimization: write the tuples not in bucket j to disk, so later passes re-read only those tuples instead of all of R and S.
Works well when memory is large (nearly as large as |R|).
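The pass structure above, including the optimization of writing passed-over tuples to disk, can be simulated in a few lines. A sketch under stated assumptions: `mem_records` is a hypothetical memory budget counted in records, lists stand in for the on-disk passed-over files, and the number of passes is taken as ⌈|R| / mem_records⌉ with no hash-table overhead. Hash buckets are not guaranteed to be equal-sized, so a bucket may exceed the budget (the overflow slide addresses that).

```python
import math
from collections import defaultdict


def simple_hash_join(r, s, r_key, s_key, mem_records):
    """Simple hash join sketch: one pass per hash bucket of R.

    Returns (result, number_of_passes). Tuples outside bucket j are kept in
    passed-over lists, standing in for files written to disk.
    """
    n_buckets = max(1, math.ceil(len(r) / mem_records))  # number of passes

    def bucket(k):
        return hash(k) % n_buckets

    out = []
    for j in range(n_buckets):
        table = defaultdict(list)
        r_passed, s_passed = [], []  # tuples "written to disk" for later passes
        for rec in r:
            if bucket(r_key(rec)) == j:
                table[r_key(rec)].append(rec)
            else:
                r_passed.append(rec)
        for rec in s:
            if bucket(s_key(rec)) == j:
                out.extend(m + rec for m in table.get(s_key(rec), []))
            else:
                s_passed.append(rec)
        # The next pass re-reads only what this pass skipped.
        r, s = r_passed, s_passed
    return out, n_buckets
```

Usage: `simple_hash_join(R, S, key, key, 4)` on a 10-record R makes ⌈10 / 4⌉ = 3 passes. When `mem_records >= len(R)` there is a single bucket and the function degenerates into the classic hash join.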

Simple Hash
Number of passes = ⌈F·|R| / |M|⌉, where F ≥ 1 accounts for hash-table space overhead.

Cost of Simple Hash

GRACE Hash Join
partition R into n buckets so that each bucket fits in memory;
partition S into n buckets (using the same hash function);
for each bucket j:
    for each record r in Rj: insert r into a hash table
    for each record s in Sj: probe the hash table
Works well when memory is small.
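A compact sketch of the two GRACE phases, assuming tuple records, hypothetical `r_key` / `s_key` extractors, and in-memory lists standing in for the partition files a DBMS would write to disk. The essential point it illustrates is that R and S are split with the same partitioning function, so every matching pair lands in the same (Rj, Sj) pair.

```python
from collections import defaultdict


def grace_hash_join(r, s, r_key, s_key, n_partitions):
    """GRACE hash join sketch: partition both inputs, then join partition pairs.

    Lists stand in for on-disk partition files. Both relations use the same
    partitioning function so matching keys meet in R_j / S_j.
    """
    def part(k):
        return hash(k) % n_partitions

    # Phase 1: partition R and S (in a DBMS each partition is written to disk).
    r_parts = [[] for _ in range(n_partitions)]
    s_parts = [[] for _ in range(n_partitions)]
    for rec in r:
        r_parts[part(r_key(rec))].append(rec)
    for rec in s:
        s_parts[part(s_key(rec))].append(rec)

    # Phase 2: for each j, build a hash table on R_j and probe it with S_j.
    # The dict hashes keys again internally, playing the role of the second
    # hash function used inside each bucket.
    out = []
    for r_j, s_j in zip(r_parts, s_parts):
        table = defaultdict(list)
        for rec in r_j:
            table[r_key(rec)].append(rec)
        for rec in s_j:
            out.extend(m + rec for m in table.get(s_key(rec), []))
    return out
```

Every tuple of R and S is written once and read back once, which is the IO cost GRACE pays regardless of how much memory is available beyond one partition.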

Grace Hash

Cost of Grace Hash

Hybrid Hash Join
A hybrid of simple hash join and GRACE:
- When partitioning R, keep the records of the first bucket in memory as a hash table.
- When partitioning S, probe that hash table directly with the records of the first bucket.
- Saving: R1 and S1 never need to be written to disk or read back into memory.
Works well for both large and small memory.
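The only change from GRACE is what happens to the first partition, as this sketch shows. Assumptions as before: tuple records, hypothetical key extractors, lists standing in for disk files. Here partition 0 in the code corresponds to the slide's first bucket (R1 / S1), and the function also reports how many tuples were spilled so the saving is visible.

```python
from collections import defaultdict


def hybrid_hash_join(r, s, r_key, s_key, n_partitions):
    """Hybrid hash join sketch: R's first partition stays in memory.

    Partitions 1..n-1 of R and S are spilled (lists stand in for disk files)
    and joined afterward as in GRACE. Returns (result, spilled_tuples).
    """
    def part(k):
        return hash(k) % n_partitions

    table0 = defaultdict(list)  # in-memory hash table for R's first partition
    r_disk = [[] for _ in range(n_partitions)]
    s_disk = [[] for _ in range(n_partitions)]
    for rec in r:
        p = part(r_key(rec))
        if p == 0:
            table0[r_key(rec)].append(rec)
        else:
            r_disk[p].append(rec)

    out = []
    for rec in s:
        p = part(s_key(rec))
        if p == 0:  # probe immediately: S's first partition never touches disk
            out.extend(m + rec for m in table0.get(s_key(rec), []))
        else:
            s_disk[p].append(rec)

    # Remaining partitions: build on R_j, probe with S_j.
    for j in range(1, n_partitions):
        table = defaultdict(list)
        for rec in r_disk[j]:
            table[r_key(rec)].append(rec)
        for rec in s_disk[j]:
            out.extend(m + rec for m in table.get(s_key(rec), []))
    spilled = sum(map(len, r_disk)) + sum(map(len, s_disk))
    return out, spilled
```

With `n_partitions=1` nothing is spilled and the function behaves like the classic hash join; with more partitions it behaves like GRACE except for the tuples of the first bucket.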

Hybrid Hash Join

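Hybrid Hash Join Cost
q = |R1| / |R|, the fraction of R whose hash table stays in memory (with uniform hashing, the same fraction of S is joined without any disk IO).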
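The IO side of the hybrid cost follows from q. If a fraction q of both R and S is joined in memory, the remaining (1 − q)(|R| + |S|) pages are written once and read back once, giving (|R| + |S|)(3 − 2q) page IOs. This sketch counts page IOs only, ignoring CPU cost and writing the result; `hybrid_io_pages` is a hypothetical name.

```python
def hybrid_io_pages(r_pages, s_pages, q):
    """Page IOs for hybrid hash join, ignoring result output and CPU cost.

    q is the fraction of R (and, assuming uniform hashing, of S) kept in memory.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = r_pages + s_pages
    spilled = (1 - q) * total
    return total + 2 * spilled  # read inputs once; write and re-read spilled part


for q in (0.0, 0.5, 1.0):
    print(q, hybrid_io_pages(1_000, 10_000, q))
```

At q = 0 this reduces to GRACE's 3(|R| + |S|), and at q = 1 to a single read of each relation, the classic in-memory case; any q > 0 strictly reduces IO relative to GRACE.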

Comparison
- Hybrid dominates simple hash
- Hybrid dominates GRACE hash
- GRACE dominates sort-merge
In terms of computation cost.

Handle Partition Overflow
Case 1, overflow on disk: an R partition is larger than the memory size (the size of S partitions does not matter, since S records are only streamed past the R hash table).
- Solution (a): create many small partitions first, then combine them into memory-sized partitions before the join.
- Solution (b): recursive partitioning.
Case 2, overflow in memory: the in-memory hash table of R becomes too large.
- Solution: revise the partitioning scheme and keep a smaller partition in memory.
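Solution (b) for on-disk overflow can be sketched as a recursive GRACE step. Assumptions, all hypothetical: `mem_records` is the memory budget in records, each recursion level salts the hash with its level number so it splits tuples differently, and a `max_level` guard stops recursing because tuples sharing one heavily duplicated key can never be separated by hashing (a real system would switch strategy there). Only the size of the R partition is tested, matching the note that S partition sizes do not matter.

```python
from collections import defaultdict


def partition(records, key, n, level):
    """Split records into n partitions; salting with `level` gives each
    recursion level a different hash function (an assumed convention)."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash((level, key(rec))) % n].append(rec)
    return parts


def join_with_overflow(r, s, r_key, s_key, mem_records, n=4, level=0, max_level=8):
    """Join R and S, re-partitioning any R partition that exceeds memory
    (together with its S partner). Only R's size is checked."""
    if len(r) <= mem_records or level >= max_level:
        table = defaultdict(list)
        for rec in r:
            table[r_key(rec)].append(rec)
        return [m + rec for rec in s for m in table.get(s_key(rec), [])]
    out = []
    for r_j, s_j in zip(partition(r, r_key, n, level), partition(s, s_key, n, level)):
        out.extend(join_with_overflow(r_j, s_j, r_key, s_key, mem_records,
                                      n, level + 1, max_level))
    return out
```

Because R and S are split with the same salted function at every level, matching tuples always stay in corresponding partitions, so the recursion never loses join results.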

Conclusions
- Addressed the equi-join problem in the external-memory environment.
- With the decreasing cost of memory, hash-based joins outperform nested-loop and sort-merge joins.
- Proposed three hash-based algorithms (simple hash join, GRACE join, and hybrid hash join), of which hybrid hash join is the best.

END JOINS E0 261