Data Warehousing 1 Lecture-28 Need for Speed: Join Techniques Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research National University of Computers & Emerging Sciences, Islamabad
Data Warehousing 2 Need for Speed: Join Techniques
Data Warehousing 3 Background
Data Warehousing 4 About Nested-Loop Join
Data Warehousing 5 Nested-Loop Join: Code
    FOR i = 1 to N DO BEGIN                /* N rows in T1 */
        IF the i-th row of T1 qualifies THEN BEGIN
            FOR j = 1 to M DO BEGIN        /* M rows in T2 */
                IF the i-th row of T1 matches the j-th row of T2 on join key THEN BEGIN
                    IF the j-th row of T2 qualifies THEN BEGIN
                        produce output row
                    END
                END
            END
        END
    END
Data Warehousing 6 Nested-Loop Join: Working Example
“What is the average GPA of undergraduate male students?”
For each qualifying row of the Personal table, the Academic table is examined for matching rows.
[Figure: Student Personal Table and Student Academic Table; each search of the Academic table returns results.]
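The nested-loop pseudocode and the GPA query above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `nested_loop_join`, the column names (`sid`, `gender`, `level`, `gpa`) and the sample rows are hypothetical and do not come from the lecture.

```python
def nested_loop_join(t1, t2, key, t1_qualifies, t2_qualifies):
    """Join two lists of dict rows with a plain nested loop."""
    output = []
    for outer in t1:                      # N rows in T1
        if not t1_qualifies(outer):
            continue                      # non-qualifying outer rows never trigger an inner scan
        for inner in t2:                  # M rows in T2, rescanned for each qualifying outer row
            if outer[key] == inner[key] and t2_qualifies(inner):
                output.append({**outer, **inner})
    return output


# Hypothetical sample rows, for illustration only.
personal = [
    {"sid": 1, "gender": "M"},
    {"sid": 2, "gender": "F"},
    {"sid": 3, "gender": "M"},
]
academic = [
    {"sid": 1, "level": "UG", "gpa": 3.0},
    {"sid": 2, "level": "UG", "gpa": 3.5},
    {"sid": 3, "level": "PG", "gpa": 3.8},
]

rows = nested_loop_join(
    personal, academic, "sid",
    t1_qualifies=lambda r: r["gender"] == "M",   # male students
    t2_qualifies=lambda r: r["level"] == "UG",   # undergraduates
)
average_gpa = sum(r["gpa"] for r in rows) / len(rows)
```

With these sample rows only student 1 is both male and undergraduate, so the join yields a single row and the average is that student's GPA.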
Data Warehousing 7 Nested-Loop Join: Order of Tables
Data Warehousing 8 Nested-Loop Join: Cost Formula
Join cost = Cost of accessing Table_A + (# of qualifying rows in Table_A × Blocks of Table_B to be scanned for each qualifying row)
OR
Join cost = Blocks accessed for Table_A + (Qualifying blocks of Table_A × Blocks accessed for Table_B)
Data Warehousing 9 Nested-Loop Join: Cost of reorder
Table_A = 500 blocks and Table_B = 700 blocks.
Qualifying blocks for Table_A: QB(A) = 50
Qualifying blocks for Table_B: QB(B) = 100
Join cost A&B = 500 + 50 × 700 = 35,500 I/Os
Join cost B&A = 700 + 100 × 500 = 50,700 I/Os
i.e. an increase in I/O of about 43%.
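The effect of table order on cost can be checked with a few lines of Python. The helper name `nested_loop_cost` is hypothetical; the block counts are the ones given on the slide, and the formula assumed is outer blocks plus qualifying outer blocks times inner blocks.

```python
def nested_loop_cost(outer_blocks, outer_qualifying, inner_blocks):
    """I/O cost: read the outer table once, then scan the inner table
    once per qualifying block of the outer table."""
    return outer_blocks + outer_qualifying * inner_blocks


cost_ab = nested_loop_cost(500, 50, 700)    # Table_A as the outer table
cost_ba = nested_loop_cost(700, 100, 500)   # Table_B as the outer table
increase_pct = (cost_ba - cost_ab) / cost_ab * 100

print(cost_ab, cost_ba, round(increase_pct))  # prints: 35500 50700 43
```

Placing the table with fewer qualifying blocks on the outside keeps the number of inner scans low, which is where most of the I/O goes.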
Data Warehousing 10 Nested-Loop Join: Variants
Data Warehousing 11 Sort-Merge Join
Data Warehousing 12 Sort-Merge Join: Process
Data Warehousing 13 Sort-Merge Join Example
[Figure: Table_A and Table_B]
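A minimal sketch of a sort-merge join, assuming the textbook form of the technique: sort both inputs on the join key, then advance through them in step. The function name `sort_merge_join`, the dict-based rows and the sample data are hypothetical and are not taken from the lecture.

```python
def sort_merge_join(table_a, table_b, key):
    """Equi-join two lists of dict rows by sorting on `key` and merging."""
    a = sorted(table_a, key=lambda r: r[key])   # sort phase
    b = sorted(table_b, key=lambda r: r[key])
    out = []
    i = j = 0
    while i < len(a) and j < len(b):            # merge phase
        ka, kb = a[i][key], b[j][key]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of rows sharing this key on each side,
            # then pair every row of one run with every row of the other.
            i_end = i
            while i_end < len(a) and a[i_end][key] == ka:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][key] == ka:
                j_end += 1
            for ra in a[i:i_end]:
                for rb in b[j:j_end]:
                    out.append({**ra, **rb})
            i, j = i_end, j_end
    return out


# Hypothetical sample rows, for illustration only.
table_a = [{"k": 3, "a": "x"}, {"k": 1, "a": "y"}, {"k": 3, "a": "z"}]
table_b = [{"k": 3, "b": "p"}, {"k": 2, "b": "q"}, {"k": 1, "b": "r"}]
joined = sort_merge_join(table_a, table_b, "k")
```

After sorting, each input is read once during the merge, apart from the re-pairing needed within runs of duplicate keys.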
Data Warehousing 14 Sort-Merge Join: Note
Data Warehousing 15 Hash-Based join
Data Warehousing 16 Hash-Based Join: Working
Data Warehousing 17 Hash-Based Join: Example
[Figure labels: Disk, Original Relation, Table_A, hash function h, Table_A in main memory, Table_B on disk, Join Result]
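The figure's arrangement (Table_A hashed into main memory, Table_B read from disk) can be sketched as a simple build-and-probe hash join. This assumes Table_A fits in memory and uses Python's built-in dictionary hashing in place of the hash function h; the function name `hash_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(table_a, table_b, key):
    """Equi-join: build an in-memory hash table on table_a, probe with table_b."""
    # Build phase: hash every row of the smaller table on the join key.
    buckets = defaultdict(list)
    for ra in table_a:
        buckets[ra[key]].append(ra)

    # Probe phase: stream the larger table once, looking up each row's key.
    out = []
    for rb in table_b:
        for ra in buckets.get(rb[key], []):
            out.append({**ra, **rb})
    return out


# Hypothetical sample rows, for illustration only.
table_a = [{"k": 1, "a": "y"}, {"k": 3, "a": "x"}]
table_b = [{"k": 3, "b": "p"}, {"k": 2, "b": "q"}, {"k": 3, "b": "s"}]
joined = hash_join(table_a, table_b, "k")
```

Each table is read once here, which is why the choice of which table to hold in memory matters when the "small" table is not small enough.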
Data Warehousing 18 Hash-Based Join: Large “small” Table
Data Warehousing 19 Hash-Based Join: Partition Skew
Data Warehousing 20 Hash-Based Join: Intrinsic Skew