Published by Lauren Kelley. Modified over 9 years ago.
1
Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques
Virtual University of Pakistan
Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com
2
Data Warehousing 2 Need for Speed: Join Techniques
3
Data Warehousing 3 Background
4
Data Warehousing 4 About Nested-Loop Join
5
Data Warehousing 5 Nested-Loop Join: Code
FOR i = 1 to N DO BEGIN /* N rows in T1 */
    IF the i-th row of T1 qualifies THEN BEGIN
        FOR j = 1 to M DO BEGIN /* M rows in T2 */
            IF the i-th row of T1 matches the j-th row of T2 on join key THEN BEGIN
                IF the j-th row of T2 qualifies THEN BEGIN
                    produce output row
                END
            END
        END
    END
END
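The nested-loop pseudocode on this slide can be sketched as runnable Python. This is a minimal illustration, not the lecture's own code: the function name `nested_loop_join`, the two predicate parameters, and the sample rows are all assumptions made for the example.

```python
def nested_loop_join(t1, t2, key, t1_qualifies, t2_qualifies):
    """Join two lists of dict rows using the nested-loop method."""
    output = []
    for outer in t1:                      # N rows in T1
        if not t1_qualifies(outer):
            continue                      # non-qualifying outer rows never scan T2
        for inner in t2:                  # M rows in T2, rescanned per qualifying outer row
            if outer[key] == inner[key] and t2_qualifies(inner):
                output.append({**outer, **inner})   # produce output row
    return output


# Hypothetical sample rows, invented for illustration only.
personal = [
    {"id": 1, "gender": "M"},
    {"id": 2, "gender": "F"},
    {"id": 3, "gender": "M"},
]
academic = [
    {"id": 1, "level": "UG", "gpa": 3.0},
    {"id": 2, "level": "UG", "gpa": 3.5},
    {"id": 3, "level": "PG", "gpa": 3.8},
]

rows = nested_loop_join(
    personal, academic, "id",
    t1_qualifies=lambda r: r["gender"] == "M",
    t2_qualifies=lambda r: r["level"] == "UG",
)
print(rows)
```

Filtering the outer table before entering the inner loop is what keeps the number of inner scans proportional to the qualifying outer rows rather than to all N rows.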
6
Data Warehousing 6 Nested-Loop Join: Working Example
“What is the average GPA of undergraduate male students?”
For each qualifying row of the Personal table, the Academic table is examined for matching rows.
[Figure: Student Personal Table and Student Academic Table; entries labelled 298, 62, and 440 are each shown with a search and its results.]
7
Data Warehousing 7 Nested-Loop Join: Order of Tables
8
Data Warehousing 8 Nested-Loop Join: Cost Formula
Join cost = Cost of accessing Table_A + (# of qualifying rows in Table_A × Blocks of Table_B to be scanned for each qualifying row)
OR
Join cost = Blocks accessed for Table_A + (Blocks accessed for Table_A × Blocks accessed for Table_B)
9
Data Warehousing 9 Nested-Loop Join: Cost of reorder
Table_A = 500 blocks and Table_B = 700 blocks.
Qualifying blocks for Table_A: QB(A) = 50
Qualifying blocks for Table_B: QB(B) = 100
Join cost A&B = 500 + 50 × 700 = 35,500 I/Os
Join cost B&A = 700 + 100 × 500 = 50,700 I/Os
i.e. an increase in I/O of about 43%.
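The arithmetic on this slide can be checked with a few lines of Python. The sketch below assumes the cost is the outer table's blocks plus the outer table's qualifying blocks times the inner table's blocks, which is how the slide's figures combine; the function name `nested_loop_cost` is hypothetical.

```python
def nested_loop_cost(outer_blocks, outer_qualifying_blocks, inner_blocks):
    """I/O cost: read the outer table once, then scan the inner table
    once per qualifying block of the outer table."""
    return outer_blocks + outer_qualifying_blocks * inner_blocks


cost_ab = nested_loop_cost(500, 50, 700)    # Table_A as the outer table
cost_ba = nested_loop_cost(700, 100, 500)   # Table_B as the outer table
increase_pct = (cost_ba - cost_ab) / cost_ab * 100

print(cost_ab, cost_ba, round(increase_pct))
```

Placing the table with fewer qualifying blocks on the outside is the cheaper order here, because the multiplied term dominates the total.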
10
Data Warehousing 10 Nested-Loop Join: Variants
11
Data Warehousing 11 Sort-Merge Join
12
Data Warehousing 12 Sort-Merge Join: Process
13
Data Warehousing 13 Sort-Merge Join Example
Table_A (sorted join keys): 1 1 2 2 2 4 5 5 5 6 6 6 6 6 7 8
Table_B (sorted join keys): 1 3 3 4 4 4 5 5 6 6 6 6 7 7 7 7
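A sort-merge join over these two key sequences can be sketched in Python as follows. This is an illustrative sketch only: it assumes the digit strings on the slide are single-digit join keys, with the first sequence belonging to Table_A and the second to Table_B, and the function name `sort_merge_join` is hypothetical.

```python
from collections import Counter


def sort_merge_join(a_keys, b_keys):
    """Return one join key per matching (A row, B row) pair."""
    a = sorted(a_keys)                    # sort phase
    b = sorted(b_keys)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):      # merge phase
        if a[i] < b[j]:
            i += 1                        # key only in A so far, advance A
        elif a[i] > b[j]:
            j += 1                        # key only in B so far, advance B
        else:
            key = a[i]
            i_end = i
            while i_end < len(a) and a[i_end] == key:
                i_end += 1                # end of the run of equal keys in A
            j_end = j
            while j_end < len(b) and b[j_end] == key:
                j_end += 1                # end of the run of equal keys in B
            # every A row in the run pairs with every B row in the run
            result.extend([key] * ((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return result


table_a = [int(c) for c in "1122245556666678"]
table_b = [int(c) for c in "1334445566667777"]

matches = sort_merge_join(table_a, table_b)
print(len(matches), Counter(matches))
```

Keys 2, 3, and 8 appear in only one of the two sequences, so they contribute no output; duplicate keys on both sides (such as 6) multiply.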
14
Data Warehousing 14 Sort-Merge Join: Note
15
Data Warehousing 15 Hash-Based join
16
Data Warehousing 16 Hash-Based Join: Working
17
Data Warehousing 17 Hash-Based Join: Example
[Figure: the original relation Table_A on disk is passed through hash function h into main memory; Table_B on disk is read against Table_A in main memory to produce the join result.]
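The build-then-probe idea in this diagram can be sketched in Python. This is a minimal in-memory illustration under stated assumptions: Python's built-in dict hashing stands in for the hash function h, Table_A is assumed small enough to fit in main memory, and the function name `hash_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(small, large, key):
    """Join two lists of dict rows by hashing the smaller one in memory."""
    buckets = defaultdict(list)
    for row in small:                     # build phase: hash Table_A into memory
        buckets[row[key]].append(row)
    output = []
    for row in large:                     # probe phase: stream Table_B past the hash table
        for match in buckets.get(row[key], []):
            output.append({**match, **row})
    return output


# Hypothetical sample rows, invented for illustration only.
table_a = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
table_b = [{"k": 2, "b": "p"}, {"k": 2, "b": "q"}, {"k": 3, "b": "r"}]

joined = hash_join(table_a, table_b, "k")
print(joined)
```

Each table is read once, which is the appeal of the method when the smaller table really does fit in memory.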
18
Data Warehousing 18 Hash-Based Join: Large “small” Table
19
Data Warehousing 19 Hash-Based Join: Partition Skew
20
Data Warehousing 20 Hash-Based Join: Intrinsic Skew