CS561 - XJoin: A Reactively-Scheduled Pipelined Join Operator
IEEE Bulletin, 2000, by Tolga Urhan and Michael J. Franklin

Goal of XJoin
Efficiently evaluate equi-joins in online query processing over distributed data sources.
Optimization objectives:
- Small memory footprint
- Fast delivery of initial results
- Hiding intermittent delays in data arrival

Outline
- Hash Join History
- Motivation of XJoin
- Challenges in Developing XJoin
- Three Stages of XJoin
- Preventing Duplicates
- Experimental Results
- Conclusion

Classic Hash Join
Two phases, build and probe. Only one table is hashed in memory.
1. Build: each R tuple is inserted into an in-memory hash table, bucketed by its join key (key1, key2, …).
2. Probe: each S tuple (S tuple 1, S tuple 2, …) looks up its join key in that table, and the matching R tuples are output.
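The two phases can be sketched in Python as a minimal in-memory illustration. This is not code from the slides. The function name `classic_hash_join` and the key-extractor parameters `r_key`/`s_key` are hypothetical. Because the generator must consume all of R before it yields anything, no result can appear until the build phase has finished.

```python
from collections import defaultdict

def classic_hash_join(r, s, r_key, s_key):
    """Equi-join R and S: build a hash table on R, then probe it with S."""
    # Phase 1 (build): hash every R tuple on its join key.
    table = defaultdict(list)
    for r_tuple in r:
        table[r_key(r_tuple)].append(r_tuple)
    # Phase 2 (probe): look up each S tuple's key and emit all matching pairs.
    for s_tuple in s:
        for r_tuple in table.get(s_key(s_tuple), []):
            yield (r_tuple, s_tuple)

R = [(1, "r1"), (2, "r2"), (2, "r3")]
S = [(2, "s1"), (3, "s2")]
print(list(classic_hash_join(R, S, r_key=lambda t: t[0], s_key=lambda t: t[0])))
# [((2, 'r2'), (2, 's1')), ((2, 'r3'), (2, 's1'))]
```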

Hybrid Hash Join
One table is hashed into partitions. Some partitions are kept in memory and the rest are written to disk (G. Graefe, "Query Evaluation Techniques for Large Databases", ACM Computing Surveys, 1993).
[Figure: R tuples hashed into buckets i..j on disk and buckets n..m in memory; S tuples 1, 2, 3, … probe the memory-resident buckets.]
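The following is a minimal sketch of the hybrid idea. It assumes a fixed number of hash partitions, of which the first `memory_partitions` stay in memory. Python lists stand in for disk files, and the names `hybrid_hash_join`, `num_partitions` and `memory_partitions` are hypothetical. A real implementation would size the memory-resident partitions to the available buffer space rather than use a fixed count.

```python
from collections import defaultdict

def hybrid_hash_join(r, s, key, num_partitions=4, memory_partitions=1):
    """Hybrid hash join sketch. Partitions 0..memory_partitions-1 of R stay in
    memory; the others are spilled (Python lists stand in for disk files)."""
    def partition(t):
        return hash(key(t)) % num_partitions

    in_memory = defaultdict(list)   # join key -> R tuples of memory partitions
    r_spill = defaultdict(list)     # partition -> spilled R tuples ("disk")
    s_spill = defaultdict(list)     # partition -> spilled S tuples ("disk")

    # Build: memory partitions of R go into a hash table; the rest spill.
    for t in r:
        p = partition(t)
        if p < memory_partitions:
            in_memory[key(t)].append(t)
        else:
            r_spill[p].append(t)

    # Probe: S tuples of memory partitions are joined immediately; the rest spill.
    for t in s:
        p = partition(t)
        if p < memory_partitions:
            for m in in_memory.get(key(t), []):
                yield (m, t)
        else:
            s_spill[p].append(t)

    # Then join each spilled R/S partition pair with an in-memory hash join.
    for p, r_tuples in r_spill.items():
        table = defaultdict(list)
        for t in r_tuples:
            table[key(t)].append(t)
        for t in s_spill.get(p, []):
            for m in table.get(key(t), []):
                yield (m, t)
```

Whatever the split between memory and "disk", the join produces the same set of pairs. Only the order and timing of the results change.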

Symmetric Hash Join (Pipelined)
Both tables are hashed, and both hash tables are kept in main memory only (A. Wilschut, P. M. G. Apers, "Dataflow Query Execution in a Parallel Main-Memory Environment", Distributed and Parallel Databases).
[Figure: a tuple from Source R is inserted into the R hash table (build) and probes the S hash table (probe); a tuple from Source S is inserted into the S hash table and probes the R hash table; matches go to OUTPUT.]
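The pipelined behavior can be sketched with a hypothetical `SymmetricHashJoin` class that, as on the slide, keeps everything in memory. Each arriving tuple is inserted into its own source's table and then probes the other table. Every matching pair is therefore emitted exactly once, by whichever tuple of the pair arrives second. Both tables grow without bound, which is the memory problem listed next.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Pipelined symmetric hash join: one in-memory hash table per input."""

    def __init__(self, key):
        self.key = key
        self.tables = {"R": defaultdict(list), "S": defaultdict(list)}

    def insert(self, source, t):
        """Process one arriving tuple from 'R' or 'S'; return new (R, S) results."""
        other = "S" if source == "R" else "R"
        k = self.key(t)
        # Build: add the tuple to its own source's hash table.
        self.tables[source][k].append(t)
        # Probe: match it against everything seen so far from the other source.
        matches = self.tables[other].get(k, [])
        if source == "R":
            return [(t, m) for m in matches]
        return [(m, t) for m in matches]
```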

Problems of SHJ
Rather memory-intensive:
- Won't work for large input streams.
- Won't allow many joins to be processed in a pipeline (or even in parallel).

New Problems in Online Query Processing over Distributed Data Sources
Data access is unpredictable because of link congestion, load imbalance, etc. There are three classes of delays:
- Initial delay: the first tuple from a remote source takes longer than usual to arrive.
- Slow delivery: data arrives at a constant but slower-than-expected rate.
- Bursty arrival: data arrives in a fluctuating manner.

Question: Why are delays undesirable?
- They prolong the time until the first output.
- They slow processing if the operator waits for data to arrive before acting.
- If data arrives too fast, the operator must avoid losing any of it.
- Time is wasted if the operator sits idle while no data is coming.
- They are unpredictable, so no single strategy will work.

Motivation of XJoin
- Produce results incrementally when available: tuples are returned as soon as they are produced.
- Exploit available main memory as long as possible: favor main-memory joins when possible.
- Allow progress when one or more sources experience delays: background processing runs on previously received tuples, so results are produced even when both inputs are stalled.

XJoin Design
Tuples are stored in partitions (as in hash join). Each partition has:
- A memory-resident (m-r) portion
- A disk-resident (d-r) portion

[Figure: sources A and B each have partitions 1..n, and each partition is split into a memory-resident and a disk-resident portion. Tuple A with hash(Tuple A) = 1 goes to memory-resident partition 1 of source A. Tuple B with hash(Tuple B) = n goes to memory-resident partition n of source B. Memory-resident partition k is flushed to its disk-resident portion.]

Challenges in Developing XJoin
- Manage the flow of tuples between memory and secondary storage (when and how to do it).
- Control background processing when inputs are delayed (the reactive scheduling idea).
- Provide both quick initial results and good overall throughput.
- Ensure the full answer is produced.
- Ensure duplicate tuples are not produced.

XJoin Stages
XJoin proceeds in 3 stages (separate threads):
1. M:M (memory-to-memory)
2. M:D (memory-to-disk)
3. D:D (disk-to-disk)

1st Stage: Memory-to-Memory Join
[Figure: Tuple A with hash(record A) = i is inserted into partition i of source A and probes partition i of source B. Tuple B with hash(record B) = j is inserted into partition j of source B and probes partition j of source A. Matches go to the output.]

1st Stage: Memory-to-Memory Join
Join processing continues as long as:
- memory permits, and
- one of the inputs is producing tuples.
If memory is full, one partition is picked, flushed to disk, and appended to the end of its disk-resident portion.
If no new input arrives, stage 1 is blocked and stage 2 starts.
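Stage 1 can be sketched under three simplifying assumptions. Memory is measured in tuples (`memory_limit`). The partition chosen for flushing is the largest memory-resident one; the slides only say that "one partition is picked". Python lists stand in for the disk-resident portions. The class and method names are hypothetical.

```python
class XJoinStage1:
    """Stage 1 (memory-to-memory) of XJoin with a tuple-count memory budget."""

    def __init__(self, key, num_partitions=4, memory_limit=6):
        self.key = key
        self.n = num_partitions
        self.memory_limit = memory_limit
        # Per source and partition: a memory-resident and a disk-resident portion.
        self.mem = {s: [[] for _ in range(num_partitions)] for s in ("A", "B")}
        self.disk = {s: [[] for _ in range(num_partitions)] for s in ("A", "B")}

    def _tuples_in_memory(self):
        return sum(len(part) for parts in self.mem.values() for part in parts)

    def _flush_one_partition(self):
        # Assumption: flush the largest memory-resident partition, appending its
        # tuples to the end of that partition's disk-resident portion.
        source, i = max(((s, i) for s in self.mem for i in range(self.n)),
                        key=lambda si: len(self.mem[si[0]][si[1]]))
        self.disk[source][i].extend(self.mem[source][i])
        self.mem[source][i] = []

    def arrive(self, source, t):
        """Insert tuple t from source 'A' or 'B'; return stage-1 results (A, B)."""
        other = "B" if source == "A" else "A"
        i = hash(self.key(t)) % self.n
        if self._tuples_in_memory() >= self.memory_limit:
            self._flush_one_partition()
        self.mem[source][i].append(t)
        # Probe only the memory-resident portion of the other source's partition i.
        matches = [m for m in self.mem[other][i] if self.key(m) == self.key(t)]
        return [(t, m) if source == "A" else (m, t) for m in matches]
```

With a small `memory_limit`, an arriving tuple can force a flush. A later tuple whose partner was flushed then produces no stage-1 result, and that pair is left for stages 2 and 3.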

Why Stage 1?
In-memory operations are much faster and cheaper than on-disk operations, so stage 1 ensures that results are produced as soon as possible.

Question:
- What does the 2nd stage do?
- When does the 2nd stage start?
- Hint: what happens when the input data (tuples) is too large for memory?
Answer: the 2nd stage performs a memory-to-disk join. It starts when both inputs are blocked.

Stage 2
[Figure: the disk-resident portion of partition i of source A (DP_iA) is read from disk and probes the memory-resident portion of partition i of source B (MP_iB); matches go to the output.]

2nd Stage: Memory-to-Disk Join
Activated when the 1st stage is blocked. Performs 3 steps:
1. Choose a partition from one source, based on throughput and partition size.
2. Use the tuples from its d-r portion to probe the m-r portion of the other source's corresponding partition, and output matches until the d-r portion is completely processed.
3. Check whether either input has resumed producing tuples. If yes, resume the 1st stage. If no, choose another d-r portion and continue the 2nd stage.
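The three steps can be sketched as a single function. This is an illustration, not XJoin's code. The partition-selection policy (step 1) and the input check (step 3) are passed in as the hypothetical callbacks `choose_partition` and `input_resumed`. Each d-r portion is processed at most once per call. The timestamp-based duplicate checks described under Preventing Duplicates are omitted.

```python
def memory_to_disk_stage(disk_parts, mem_parts, key, choose_partition, input_resumed):
    """Stage 2 sketch (duplicate checks omitted).

    disk_parts[i]: disk-resident portion of partition i of one source (DP_iA).
    mem_parts[i]:  memory-resident portion of partition i of the other source (MP_iB).
    choose_partition(candidates): step-1 policy; returns an index from candidates.
    input_resumed(): step-3 check; True if either input has new tuples.
    """
    results = []
    processed = set()
    while True:
        # Step 1: pick a not-yet-processed partition with a non-empty d-r portion.
        candidates = [i for i, part in enumerate(disk_parts)
                      if part and i not in processed]
        if not candidates:
            break
        i = choose_partition(candidates)
        # Step 2: probe MP_iB with every tuple of DP_iA (MP_iB hashed on the key).
        table = {}
        for b in mem_parts[i]:
            table.setdefault(key(b), []).append(b)
        for a in disk_parts[i]:
            results.extend((a, b) for b in table.get(key(a), []))
        processed.add(i)
        # Step 3: only after the whole d-r portion is done, check for new input.
        if input_resumed():
            break  # hand control back to stage 1
    return results
```

Because step 3 runs only after a whole d-r portion has been scanned, a long portion delays the return to stage 1. This is the overhead discussed next.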

Controlling the 2nd Stage
The cost of the 2nd stage is hidden when both inputs experience delays.
Tradeoffs?
- What are the benefits of using the second stage?
  - It produces results when input sources are stalled.
  - It allows varying input rates.
- What is the disadvantage?
  - The second stage must finish a d-r portion before checking for new input (overhead).
To address this tradeoff, use an activation threshold: pick a partition that is likely to produce many result tuples right now.
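One way to realize the activation threshold uses a hypothetical cost model that is not taken from the slides. Estimate the output of probing partition i as |DP_i| × |MP_i| × selectivity, and activate stage 2 only for the best partition, and only if its estimate reaches the threshold. The model ignores pairs that were already joined and input throughput. `pick_partition` and its parameters are hypothetical names.

```python
def pick_partition(disk_sizes, mem_sizes, selectivity, threshold):
    """Pick the partition whose 2nd-stage probe is expected to produce the most
    results, or None if no estimate reaches the activation threshold.

    Hypothetical cost model: estimate_i = |DP_i| * |MP_i| * selectivity, i.e.
    disk-resident tuples of one source times memory-resident tuples of the other.
    """
    best, best_estimate = None, 0.0
    for i, (d, m) in enumerate(zip(disk_sizes, mem_sizes)):
        estimate = d * m * selectivity
        if estimate >= threshold and estimate > best_estimate:
            best, best_estimate = i, estimate
    return best
```

Raising `threshold` makes stage 2 more conservative: it runs less often, but each run is more likely to be worth the time spent away from new input.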

3rd Stage: Disk-to-Disk Join
Clean-up stage:
- Assumes that all data for both inputs has arrived
- Assumes that the 1st and 2nd stages have completed
Why is this step necessary?
- Completeness of the answer: make sure that all result tuples are produced.
- Reason: some tuples in the disk-resident portions may not have had a chance to join each other.
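The clean-up join can be sketched under two assumptions. First, by this point each partition's tuples (disk-resident plus any still in memory) can be read as one list. Second, duplicate detection is supplied as a predicate `already_joined`, a hypothetical parameter. In XJoin this is where the timestamp checks apply.

```python
def disk_to_disk_stage(parts_a, parts_b, key, already_joined):
    """Stage 3 (clean-up) sketch: join every tuple of partition i of A with every
    tuple of partition i of B, skipping pairs an earlier stage already produced.

    parts_a[i], parts_b[i]: all tuples of partition i (disk- and memory-resident).
    already_joined(a, b): duplicate test supplied by the caller.
    """
    results = []
    for part_a, part_b in zip(parts_a, parts_b):
        table = {}
        for b in part_b:
            table.setdefault(key(b), []).append(b)
        for a in part_a:
            for b in table.get(key(a), []):
                if not already_joined(a, b):
                    results.append((a, b))
    return results
```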

Preventing Duplicates
When could duplicates be produced?
- In both the 2nd and 3rd stages, which may perform overlapping work.
How are they prevented?
- XJoin prevents duplicates with timestamps.
When is this done?
- During processing, when trying to join two tuples.

Time Stamping: Part 1
Two fields are added to each tuple:
- Arrival TimeStamp (ATS): the time when the tuple first arrived in memory
- Departure TimeStamp (DTS): the time when the tuple was flushed to disk
[ATS, DTS] is the interval during which the tuple was in memory.
When were two tuples joined in the 1st stage?
- When their [ATS, DTS] intervals overlap, e.g., Tuple A's DTS falls within Tuple B's [ATS, DTS].
Tuples that meet this overlap condition are not considered for joining in the 2nd or 3rd stage.

Detecting Tuples Joined in the 1st Stage
- Overlapping: B1 arrived after A and before A was flushed to disk, so A and B1 were joined in the first stage.
- Non-overlapping: B2 arrived after A was flushed to disk, so A and B2 were not joined in the first stage.
[Figure: timelines comparing Tuple A's [ATS, DTS] interval with those of B1 and B2.]
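The overlap test can be written directly from the two timestamps. This minimal sketch assumes distinct numeric timestamps and uses `math.inf` as the DTS of a tuple that is still in memory. `StampedTuple` and `joined_in_stage1` are hypothetical names. The test is symmetric, so it also covers the case where B was flushed before A.

```python
import math
from dataclasses import dataclass

@dataclass
class StampedTuple:
    key: int
    ats: float              # Arrival TimeStamp: when the tuple entered memory
    dts: float = math.inf   # Departure TimeStamp: when it was flushed (inf = still in memory)

def joined_in_stage1(a: StampedTuple, b: StampedTuple) -> bool:
    """True if a and b were memory-resident at the same time, i.e. their
    [ATS, DTS] intervals overlap, so stage 1 already joined them.
    Assumes all timestamps are distinct (no ties at interval endpoints)."""
    return a.ats < b.dts and b.ats < a.dts
```

With illustrative timestamps A = [10, 50], B1 = [20, 80] and B2 = [60, 90], the test reports that A and B1 were joined in stage 1 and that A and B2 were not, matching the figure.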

Time Stamping: Part 2
For each partition, keep track of:
- ProbeTS: the time when a 2nd-stage probe was done
- DTS_last: the DTS of the last tuple of the disk-resident portion
Several such probes may occur, so keep an ordered history of these probe descriptors.
Meaning: all tuples with a DTS up to and including DTS_last were joined in stage 2 with all tuples that were in main memory at time ProbeTS.

Detecting Tuples Joined in the 2nd Stage
All A tuples in Partition 2 up to DTS_last = 350 were joined with the m-r tuples that arrived before Partition 2's ProbeTS and were still in memory at that time.
[Figure: Tuple A's and Tuple B's [ATS, DTS] intervals, with DTS_last and ProbeTS taken from the history list of Partition 2; the overlap shows that the pair was joined in stage 2.]
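The probe history can be checked in the same style. The names `ProbeRecord` and `joined_in_stage2` are hypothetical. A pair counts as already produced by stage 2 if, for some recorded probe, the disk-resident tuple had been flushed no later than DTS_last and the other tuple was memory-resident at ProbeTS. Timestamps are assumed numeric, with `math.inf` as the DTS of a tuple still in memory.

```python
import math
from dataclasses import dataclass

@dataclass
class ProbeRecord:
    dts_last: float   # DTS of the last disk-resident tuple scanned by the probe
    probe_ts: float   # time at which the stage-2 probe was performed

def joined_in_stage2(disk_tuple_dts, mem_tuple_ats, mem_tuple_dts, history):
    """True if some recorded stage-2 probe already joined the pair.

    disk_tuple_dts: DTS of the tuple from the probing partition's disk-resident portion.
    mem_tuple_ats, mem_tuple_dts: [ATS, DTS] of the tuple from the other source
    (DTS = math.inf if it is still in memory).
    """
    return any(
        disk_tuple_dts <= rec.dts_last and mem_tuple_ats <= rec.probe_ts < mem_tuple_dts
        for rec in history
    )
```

With the slide's DTS_last = 350 and an assumed ProbeTS of 400, a disk tuple flushed at 300 and a memory tuple that arrived at 380 were already joined. A disk tuple flushed at 360 was not.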

Experiments
- HHJ (Hybrid Hash Join)
- XJoin (with 2nd stage and with caching)
- XJoin (without 2nd stage)
- XJoin (with aggressive usage of 2nd stage)

Case 1: Slow Network (Both Sources Are Slow)

Case 1: Slow Network (Both Sources Are Slow, Bursty)
- XJoin improves the delivery time of initial answers, which improves interactive performance.
- Reactive background processing is an effective way to exploit intermittent delays and sustain the output rate.
- The 2nd stage is very useful when there is time for it.

Case 2: Fast Network (Both Sources Are Fast)

Case 2: Fast Network (Both Sources Are Fast)
- All XJoin variants deliver initial results earlier.
- XJoin can also deliver the overall result in about the same time as HHJ.
- HHJ delivers the second half of the result faster than XJoin.
- The 2nd stage cannot be used too aggressively if new data is arriving continuously.

Conclusion
XJoin:
- Can be conservative on space (small footprint)
- Can produce initial results as early as possible
- Can hide intermittent data delays
- Can be used together with online query processing to manage data streams (limited)

How to Further Optimize XJoin?
- Resume stage 1 as soon as data arrives
- Remove no-longer-joining tuples in a timely manner
- Other ideas? …

References
- Urhan, Tolga and Franklin, Michael J. "XJoin: Getting Fast Answers From Slow and Bursty Networks."
- Urhan, Tolga and Franklin, Michael J. "XJoin: A Reactively-Scheduled Pipelined Join Operator." IEEE Data Engineering Bulletin, 2000.
- Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. "Adaptive Query Processing: Technology in Evolution." IEEE Data Engineering Bulletin, 2000.
- Avnur, Ron and Hellerstein, Joseph M. "Eddies: Continuously Adaptive Query Processing."
- Babu, Shivnath and Widom, Jennifer. "Continuous Queries Over Data Streams."

Stream: New Query Context
Challenges faced by XJoin:
- Potentially unbounded growth of the join state
- Indefinite delay of some join results
Solutions:
- Exploit semantic constraints to remove no-longer-joining data in a timely manner
- Constraints: sliding windows, punctuations

Punctuation
A punctuation is a predicate on stream elements that evaluates to false for every element following the punctuation.
Example: a student stream (ID, Name, Age) carries tuples for Edward, Justin and Janet (18), then the punctuation <*, *, (0, 18]>, then Anna (20), …. The punctuation means that no more tuples will arrive for students whose age is less than or equal to 18.

An Example
Open Stream (item_id | seller_id | open_price | timestamp):
  1080 | jsmith | … | Nov …
  … | melissa | … | Nov …
  …
Bid Stream (item_id | bidder_id | bid_price | timestamp):
  1080 | pclover | … | Nov …
  … | smartguy | … | Nov …
  … | richman | … | Nov …
  …
Query: For each item that has at least one bid, return its bid-increase value.
  Select O.item_id, Sum(B.bid_price - O.open_price)
  From Open O, Bid B
  Where O.item_id = B.item_id
  Group by O.item_id
Plan: the Open Stream and the Bid Stream feed a Join on item_id (Out 1: item_id), followed by a Group-by on item_id (sum(…)) (Out 2: item_id, sum). A punctuation in the Bid stream announces: "No more bids for item 1080!"
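The role of the punctuation in this query can be sketched as follows. The class `BidIncrease` and its methods are hypothetical, and everything is kept in memory. Each item's Open tuple is assumed to arrive before any of its bids. A Bid-stream punctuation for an item closes that item's group, so its sum can be emitted and the state for the item dropped.

```python
class BidIncrease:
    """Join Open and Bid on item_id, then SUM(bid_price - open_price) per item."""

    def __init__(self):
        self.open_price = {}   # join state for the Open stream: item_id -> open_price
        self.sums = {}         # group-by state: item_id -> running bid increase

    def on_open(self, item_id, open_price):
        self.open_price[item_id] = open_price

    def on_bid(self, item_id, bid_price):
        # Join with the item's Open tuple (assumed to have arrived already) and
        # fold the joined tuple into the item's group.
        increase = bid_price - self.open_price[item_id]
        self.sums[item_id] = self.sums.get(item_id, 0) + increase

    def on_bid_punctuation(self, item_id):
        """'No more bids for item_id': the group is final, so emit and purge it."""
        self.open_price.pop(item_id, None)   # no future bid can join this Open tuple
        if item_id in self.sums:
            return (item_id, self.sums.pop(item_id))
        return None   # no bids: the query returns nothing for this item
```

For example, take a hypothetical open price of 100 and bids of 110 and 125 for item 1080. The punctuation "No more bids for item 1080!" then emits (1080, 35), without waiting for the end of the stream.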

PJoin Execution Logic
[Figure: the state of each stream (S_a for Stream A, S_b for Stream B) is a hash-table join state with a memory-resident and a disk-resident portion, together with a punctuation set (PS_a, PS_b) and a purge candidate pool. Tuple t_a arrives from Stream A with Hash(t_a) = 1. A punctuation <10 is shown in a punctuation set.]

PJoin Execution Logic
[Figure: the same join states (S_a, S_b), punctuation sets (PS_a, PS_b), and purge candidate pool; here punctuation p_a arrives from Stream A with Hash(p_a) = 1.]
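The purge logic can be sketched in a much-simplified, memory-only form. The class `PJoinSketch` and its parameters are hypothetical. There is no disk-resident portion, and a punctuation is modeled as a set of join-key values. A punctuation from Stream A for key k guarantees that no later A tuple has key k. B tuples with key k can therefore never produce new results and are purged from S_b. Setting `purge_threshold=1` approximates eager purge; larger values purge lazily after several punctuations have accumulated.

```python
from collections import defaultdict

class PJoinSketch:
    """Symmetric hash join whose state is purged using punctuations (in memory only)."""

    def __init__(self, key, purge_threshold=1):
        self.key = key
        self.state = {"A": defaultdict(list), "B": defaultdict(list)}  # S_a, S_b
        self.punct = {"A": set(), "B": set()}                          # PS_a, PS_b
        self.purge_threshold = purge_threshold
        self.pending = 0   # punctuations received since the last purge

    def on_tuple(self, source, t):
        """Probe the other stream's state, then store t if it can still join."""
        other = "B" if source == "A" else "A"
        k = self.key(t)
        out = [(t, m) if source == "A" else (m, t)
               for m in self.state[other].get(k, [])]
        # Sketch's choice: if the other stream already punctuated k, t has no
        # future partners, so it is not stored.
        if k not in self.punct[other]:
            self.state[source][k].append(t)
        return out

    def on_punctuation(self, source, keys):
        """Record 'no more tuples with these keys' from source; purge if due."""
        self.punct[source].update(keys)
        self.pending += 1
        if self.pending >= self.purge_threshold:
            self.purge()

    def purge(self):
        """Drop tuples that match a punctuation from the opposite stream."""
        for source, other in (("A", "B"), ("B", "A")):
            for k in self.punct[source]:
                self.state[other].pop(k, None)
        self.pending = 0

    def state_size(self):
        return sum(len(ts) for table in self.state.values() for ts in table.values())
```

Dropping an arriving tuple whose key the other stream has already punctuated is this sketch's own design choice. It keeps the state from filling with tuples that can no longer join.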

PJoin vs. XJoin: Memory Overhead
Tuple inter-arrival time: 2 milliseconds. Punctuation inter-arrival: 40 tuples per punctuation.

PJoin vs. XJoin: Tuple Output Rate
Tuple inter-arrival time: 2 milliseconds. Punctuation inter-arrival: 30 tuples per punctuation.

Conclusion
- The memory requirement of the PJoin state is almost insignificant compared to XJoin's.
- XJoin's growing join state increases its probe cost, which lowers its tuple output rate.
- Eager purge is the best strategy for minimizing the join state.
- Lazy purge with an appropriate purge threshold provides a significant advantage in tuple output rate.