Join Processing in Database Systems with Large Main Memories (part 2)

Slides:



Advertisements
Similar presentations
Equality Join R X R.A=S.B S : : Relation R M PagesN Pages Relation S Pr records per page Ps records per page.
Advertisements

CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Join Processing in Database Systems with Large Main Memories ACM Transactions on Database Systems Vol. 11, No. 3, Sep 1986 Leonard D. Shapiro Donghui Zhang,
Join Processing in Databases Systems with Large Main Memories
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Multiprocessing Memory Management
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Querying Large Databases Rukmini Kaushik. Purpose Research for efficient algorithms and software architectures of query engines.
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Virtual Memory The memory space of a process is normally divided into blocks that are either pages or segments. Virtual memory management takes.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
CS 540 Database Management Systems
DATABASE OPERATORS AND SOLID STATE DRIVES Geetali Tyagi ( ) Mahima Malik ( ) Shrey Gupta ( ) Vedanshi Kataria ( )
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
Chapter 2 Memory and process management
Memory COMPUTER ARCHITECTURE
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
Lecture 16: Data Storage Wednesday, November 6, 2006.
External Sorting Chapter 13
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Hash-Based Indexes Chapter 11
Chapter 12: Query Processing
Evaluation of Relational Operations
Chapter 8: Main Memory.
Chapter 15 QUERY EXECUTION.
Database Management Systems (CS 564)
Evaluation of Relational Operations: Other Operations
Operating System Concepts
Implementation of Relational Operations (Part 2)
Relational Operations
Lecture#12: External Sorting (R&G, Ch13)
External Sorting The slides for this text are organized into chapters. This lecture covers Chapter 11. Chapter 1: Introduction to Database Systems Chapter.
CS222P: Principles of Data Management Notes #12 Joins!
Database Query Execution
Module 2: Computer-System Structures
Hash-Based Indexes Chapter 10
Main Memory Background Swapping Contiguous Allocation Paging
CS222: Principles of Data Management Notes #12 Joins!
External Sorting Chapter 13
Selected Topics: External Sorting, Join Algorithms, …
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Module 2: Computer-System Structures
Chapter 12 Query Processing (1)
Implementation of Relational Operations
Slides adapted from Donghui Zhang, UC Riverside
Lecture 13: Query Execution
Evaluation of Relational Operations: Other Techniques
General External Merge Sort
Wednesday, 5/8/2002 Hash table indexes, physical operators
Database Implementation Issues
Chapter 2: Computer-System Structures
Chapter 2: Computer-System Structures
Module 2: Computer-System Structures
Department of Computer Science
Module 2: Computer-System Structures
External Sorting Chapter 13
The Gamma Database Machine Project
Evaluation of Relational Operations: Other Techniques
CS222/CS122C: Principles of Data Management UCI, Fall Notes #11 Join!
CS222P: Principles of Data Management UCI, Fall 2018 Notes #11 Join!
Presentation transcript:

Join Processing in Database Systems with Large Main Memories (part 2) Yongjoo (Young) Park [1] Query Optimization for Parallel Execution, Ganguly et al. [2] Logic per track devices, Slotnick [3] Implementing a relational database by means of specialzed hardware, Bobb [4] Database System Concepts, 5ed, p.861-862 [5] Netezza_Appliance_Architecture_WP.pdf 11/12/2018 EECS 584, Fall 2009

Contents Comparison of Four Join Methods Partition Overflow Other Tools Memory Management Strategy 11/12/2018 EECS 584, Fall 2009

First Comparison of Four Join Methods Partition Overflow Other Tools Merge, Simple Hash, GRACE, Hybrid Partition Overflow Other Tools Memory Management Strategy 11/12/2018 EECS 584, Fall 2009

Assumptions Made No relations ordered or indexed Minimum memory size for Sort-merge for Hash Fixed real memory size 11/12/2018 EECS 584, Fall 2009

CPU + I/O Times Try to guess what’s what among.. GRACE Simple hash Sort Merge Hybrid hash Students are expected to answer that each index corresponds to which hashing methods 11/12/2018 EECS 584, Fall 2009

CPU + I/O Times Answer is Sort-merge Simple hash GRACE Hybrid hash 11/12/2018 EECS 584, Fall 2009

CPU + I/O Times Large memory area Simple hash dominates Hybrid acts like Simple hash 11/12/2018 EECS 584, Fall 2009

CPU + I/O Times Small memory area GRACE dominates Hybrid generally acts like GRACE It also use remaining memory 11/12/2018 EECS 584, Fall 2009

Comparisons Made In Paper 1st and 2nd are covered here Hybrid > Simple Hybrid > GRACE Hybrid > Merge GRACE > Merge 1st and 2nd are covered here 11/12/2018 EECS 584, Fall 2009

Hybrid vs. Simple 11/12/2018 EECS 584, Fall 2009

GRACE and Hybrid They behave slightly diff, when there’s remaining memory Comparison on next slides with: |R| = 1000, F = 1.4, |M| = 100 11/12/2018 EECS 584, Fall 2009

GRACE and Hybrid GRACE (1st phase): … Used as output buffer Each = 38 11/12/2018 EECS 584, Fall 2009

GRACE and Hybrid GRACE (1st phase modified): … Used to save passed-over Used as output buffer = 38 = 62 … 11/12/2018 EECS 584, Fall 2009

GRACE and Hybrid Hybrid (1st phase): … Used as Output buffer Used to build hash table = 86 = 14 … 11/12/2018 EECS 584, Fall 2009

GRACE and Hybrid GRACE: Hybrid: Saved = 62 blocks Saved = 86 blocks 11/12/2018 EECS 584, Fall 2009

Other Advantage of Hash over Merge Your thinking? 11/12/2018 EECS 584, Fall 2009

Other Advantage of Hash over Merge Good response time Easier to pipeline to next join [1] Wait until J1 finish, if Merge join Not ordered R1 R2 R3 J2 J1 11/12/2018 EECS 584, Fall 2009

Hybrid Always Win? Scenario that Sort-merge is better than Hybrid? If either or both relations are sorted. Partition overflow 11/12/2018 EECS 584, Fall 2009

Sort-merge Worth to Consider Scenario that Sort-merge is better than Hybrid Either relation is sorted Hard to estimate data distribution 11/12/2018 EECS 584, Fall 2009

Second Comparison of Four Join Methods Partition Overflow Other Tools Memory Management Strategy 11/12/2018 EECS 584, Fall 2009

Partition Overflow Previous assumption Ideal hash function that partition into same size of sets GRACE handle overflow by ‘tuning’ Begin with small partitions Combine them when size is known 11/12/2018 EECS 584, Fall 2009

Other Approach to Prevent O/F Hash function with random property Uniformly distribute data Store statistics about distribution System catalog in previous paper Use of histogram These are Not Enough! 11/12/2018 EECS 584, Fall 2009

On Disk Overflow Partition is too large to fit in memory Reprocess the partition into smaller ones One that exactly fit in memory, plus add smaller one to another partition Relation S should use same function too 11/12/2018 EECS 584, Fall 2009

In Memory Overflow Currently hashed subset is too large Reassign some buckets onto disk Use different hash function Relation S should use same function too 11/12/2018 EECS 584, Fall 2009

Third Comparison of Four Join Methods Partition Overflow Other Tools Disk filters Bobb array Semijoin Memory Management Strategy 11/12/2018 EECS 584, Fall 2009

Database Filter Mechanism to process records as they come off disk Use per-track wired logic circuits [2] Company Netezza use similar approach with FPGAs [5] “The Netezza architecture is based on … to filter out extraneous data as early in the data stream as possible and as fast as data streams off the disk.” 11/12/2018 EECS 584, Fall 2009

Bobb Array Researched as an extension to CAFS Content Addressable File Store (CAFS), company product More on Wikipedia : ) Disk-head-attached logic device [3] Highly dependent on layout of data 11/12/2018 EECS 584, Fall 2009

Semijoin How to calculate Construct Join and , which is semijoin of and Join semijoin to Good in distributed environment and with low selectivity of [4] 11/12/2018 EECS 584, Fall 2009

Fourth Comparison of Four Join Methods Partition Overflow Other Tools Memory Management Strategy Hot set + VM Why bad utilization 11/12/2018 EECS 584, Fall 2009

Challenges in Memory Mgmt No fixed amount of memory OS needs to consider future processes What if several processes request 11/12/2018 EECS 584, Fall 2009

Hot Set + Virtual Memory All requests in virtual memory Compete for real memory Hot set wired down to real memory Hot set size is estimated by the access planner 11/12/2018 EECS 584, Fall 2009

Good Case If Only two sets and are constructed Read back in opposite order All blocks in real memory are processed before paged out 11/12/2018 EECS 584, Fall 2009

Bad Case If Only blocks are read directly is ratio of ‘in memory’ See next slide for calculation 11/12/2018 EECS 584, Fall 2009

Bad Case C1, C2, C3 in C C1 were on disk, and processed C2 were in memory, and processed C3 were in memory, and not processed 11/12/2018 EECS 584, Fall 2009

Thank You 11/12/2018 EECS 584, Fall 2009