CS 4432query processing - lecture 161 CS4432: Database Systems II Lecture #16 Join Processing Algorithms Professor Elke A. Rundensteiner.

Slides:



Advertisements
Similar presentations
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
1 40T1 60T2 30T3 10T4 20T5 10T6 60T7 40T8 20T9 R S C C R JOIN S?
Lecture 8 Join Algorithms. Intro Until now, we have used nested loops for joining data – This is slow, n^2 comparisons How can we do better? – Sorting.
CS 540 Database Management Systems
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
1 Lecture 23: Query Execution Friday, March 4, 2005.
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Query Execution Professor: Dr T.Y. Lin Prepared by, Mudra Patel Class id: 113.
15.3 Nested-Loop Joins By: Saloni Tamotia (215). Introduction to Nested-Loop Joins  Used for relations of any side.  Not necessary that relation fits.
Cs4432optimization1 CS4432: Database Systems II Lecture #18 Query Optimizer – Wrap Up Professor Elke A. Rundensteiner.
Midterm Review Lecture 14b. 14 Lectures So Far 1.Introduction 2.The Relational Model 3.Disks and Files 4.Relational Algebra 5.File Org, Indexes 6.Relational.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Lecture 24: Query Execution Monday, November 20, 2000.
Query Processing and Optimization
Query Execution 15.5 Two-pass Algorithms based on Hashing By Swathi Vegesna.
Nested Loops Joins Book Section of chapter 15.3 Submitted to : Prof. Dr. T.Y. LIN Submitted by: Saurabh Vishal.
Query Execution :Nested-Loop Joins Rohit Deshmukh ID 120 CS-257 Rohit Deshmukh ID 120 CS-257.
Query Execution Professor: Dr T.Y. Lin Prepared by, Mudra Patel Class id: 113.
CS186 Final Review Query Optimization.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Using the Disk, and Disk Optimizations Professor Elke A. Rundensteiner.
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
15.3 Nested-Loop Joins - Medha Pradhan - ID: CS 257 Section 2 - Spring 2008.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 242 Database Systems II Query Execution.
CPS216: Advanced Database Systems Notes 06:Query Execution (Sort and Join operators) Shivnath Babu.
CPS216: Advanced Database Systems Notes 07:Query Execution Shivnath Babu.
Query Execution Section 15.1 Shweta Athalye CS257: Database Systems ID: 118 Section 1.
CPS216: Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
CPS216: Advanced Database Systems Notes 07:Query Execution (Sort and Join operators) Shivnath Babu.
CS4432: Database Systems II Query Processing- Part 3 1.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
Lecture 24 Query Execution Monday, November 28, 2005.
CS4432: Database Systems II Query Processing- Part 2.
CPS216: Advanced Database Systems Notes 07:Query Execution (Sort and Join operators) Shivnath Babu.
CSCE Database Systems Chapter 15: Query Execution 1.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations – Join Chapter 14 Ramakrishnan and Gehrke (Section 14.4)
Query Processing CS 405G Introduction to Database Systems.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
CS 540 Database Management Systems
Fan Qi Database Lab 1, com1 #01-08 CS3223 Tutorial 6.
1 Lecture 23: Query Execution Monday, November 26, 2001.
Fan Qi Database Lab 1, com1 #01-08 CS3223 Tutorial 5.
Query Processing COMP3017 Advanced Databases Nicholas Gibbins
CS4432: Database Systems II Query Processing- Part 1 1.
CS 540 Database Management Systems
CS 440 Database Management Systems
Query Processing Exercise Session 4.
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Lecture 16: Relational Operators
Database Management Systems (CS 564)
Database Management Systems (CS 564)
Yan Huang - CSCI5330 Database Implementation – Access Methods
Example R1 R2 over common attribute C
Query Execution Two-pass Algorithms based on Hashing
(Two-Pass Algorithms)
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
CS222: Principles of Data Management Lecture #10 External Sorting
Chapter 12 Query Processing (1)
Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Lecture 11: B+ Trees and Query Execution
Lecture 24: Query Execution
Lecture 20: Query Execution
CS4432: Database Systems II
Presentation transcript:

CS 4432query processing - lecture 161 CS4432: Database Systems II Lecture #16 Join Processing Algorithms Professor Elke A. Rundensteiner

CS 4432query processing - lecture 162 Join : R1 R2, R2 R1 –Iteration Join –Sort-Merge Join –Index-Join –Hash Join

CS 4432query processing - lecture 163 Factors that affect performance (1)Tuples of relation stored physically together? (2)Buffer-aware data movement (3)Relations sorted by join attribute? (3)Indexes exist?

CS 4432query processing - lecture 164 Example R1 R2 over common attribute C T(R1) = 10,000 T(R2) = 5,000 S(R1) = S(R2) = 1/10 block Memory available = 101 blocks  Metric: # of IOs (ignoring writing of result)

CS 4432query processing - lecture 165 Join : R1 R2, R2 R1 –Iteration (nested loops) –Merge join –Join with index –Hash join

CS 4432query processing - lecture 166 Iteration join (nested-loop join) for each r  R1 do for each s  R2 do if r.C = s.C then output r,s pair

CS 4432query processing - lecture 167 Example 1(a) Iteration Join R1 R2 Relations not contiguous Recall T(R1) = 10,000 T(R2) = 5,000 S(R1) = S(R2) =1/10 block MEM=101 blocks Cost: for each R1 tuple: [Read 1 R1 tuple + Read R2] Total =10,000 [1+5000]=50,010,000 IOs

CS 4432query processing - lecture 168 Can we do better? Use our memory (1)Read 100 blocks of R1 (2)Read all of R2 (using 1 block) + join (3)Repeat until done

CS 4432query processing - lecture 169 Cost: for each R1 chunk: Read chunk: 1000 IOs Read R2: 5000 IOs 6000 Total= 10,000 x ( ) =60,000 IOs 1,000

CS 4432query processing - lecture 1610 Does Reverse Join Order Matter?  Reverse join order: R2 R1 Total = 5000 x ( ,000) = x 11,000 = 55,000 IOs

CS 4432query processing - lecture 1611 Relations contiguous Example Iteration Join R2 R1 Cost For each R2 chunk: Read chunk: 100 IOs Read R1: 1000 IOs 1,100 Total= 5 chunks x 1,100 = 5,500 IOs

CS 4432query processing - lecture 1612 So far Iterate R2 R1 50,010,000 Be memory access aware: Iterate R2 R1 55,000 Blocked-access: Iterate R2 R1 5,500 contiguous not contiguous

CS 4432query processing - lecture 1613 Join : R1 R2, R2 R1 –Iteration (nested loops) –Sort-Merge join –Join with index –Hash join

CS 4432query processing - lecture 1614 Merge join (conceptually) (1) if R1 and R2 not sorted, sort them (2) i  1; j  1; While (i  T(R1))  (j  T(R2)) do if R1{ i }.C = R2{ j }.C then outputTuples else if R1{ i }.C > R2{ j }.C then j  j+1 else if R1{ i }.C < R2{ j }.C then i  i+1

CS 4432query processing - lecture 1615 Procedure Output-Tuples While (R1{ i }.C = R2{ j }.C)  (i  T(R1)) do [jj  j; while (R1{ i }.C = R2{ jj }.C)  (jj  T(R2)) do [output pair R1{ i }, R2{ jj }; jj  jj+1 ] i  i+1 ]

CS 4432query processing - lecture 1616 Example i R1{i}.CR2{j}.Cj

CS 4432query processing - lecture 1617 Example : Merge Join Both R1, R2 ordered by C; relations contiguous Memory R1 R2 ….. R1 R2 Total cost: Read R1 cost + read R2 cost = = 1,500 IOs

CS 4432query processing - lecture 1618 Example Merge Join R1, R2 not ordered, but contiguous --> Need to sort R1, R2 first…. HOW?

CS 4432query processing - lecture 1619 One way to sort: Merge Sort (i) For each 100 blk chunk of R: - Read chunk - Sort in memory - Write to disk sorted chunks Memory R1 R2...

CS 4432query processing - lecture 1620 (ii) Read all chunks + merge + write out Sorted file Memory Sorted Chunks...

CS 4432query processing - lecture 1621 Cost: Sort Each tuple is read,written, read, written so... Sort cost R1: 4 x 1,000 = 4,000 Sort cost R2: 4 x 500 = 2,000

CS 4432query processing - lecture 1622 Example Merge Join R1,R2 contiguous, but unordered Total cost = sort cost + join cost = 6, ,500 = 7,500 IOs

CS 4432query processing - lecture 1623 Comparison ? Iteration cost = 5,500 IOs Merge cost = 7,500 IOs Conclusion : so merge join does not pay off? True sometimes? or always ???

CS 4432query processing - lecture 1624 Where sort-merge join beats the iteration join. Next a case

CS 4432query processing - lecture 1625 For : R1 = 10,000 blocks contiguous R2 = 5,000 blocks not ordered Iterate: 5000 x (100+10,000) = 50 x 10, = 505,000 IOs Merge join: 5(10,000+5,000) = 75,000 IOs Merge Join (with sort) WINS!