CS561 - XJoin 1 XJoin: A Reactively-Scheduled Pipelined Join Operator
IEEE Data Engineering Bulletin, 2000
by Tolga Urhan and Michael J. Franklin
CS561 - XJoin 2 Goal of XJoin
Efficiently evaluate equi-joins in online query processing over distributed data sources.
Optimization objectives:
- Small memory footprint
- Fast delivery of initial results
- Hiding intermittent delays in data arrival
CS561 - XJoin 3 Outline
- Hash Join History
- Motivation of XJoin
- Challenges in Developing XJoin
- Three Stages of XJoin
- Preventing Duplicates
- Experimental Results
- Conclusion
CS561 - XJoin 4 Classic Hash Join
- Two phases: build and probe
- Only one table is hashed in memory
[Figure: (1) Build: R tuples are hashed into an in-memory table under key1 through key5. (2) Probe: S tuples 1 through 5 probe the table.]
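The build and probe phases can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper; the function name `classic_hash_join`, the `key` parameter, and the sample tuples are assumptions made for the example.

```python
from collections import defaultdict

def classic_hash_join(r_tuples, s_tuples, key):
    """Two-phase hash join: build a table on R, then probe it with S."""
    table = defaultdict(list)
    for r in r_tuples:                    # 1. Build: hash every R tuple
        table[key(r)].append(r)
    for s in s_tuples:                    # 2. Probe: look up each S tuple
        for r in table.get(key(s), []):
            yield (r, s)

# Hypothetical sample data: (join key, payload)
R = [(1, "r1"), (2, "r2"), (2, "r2b")]
S = [(2, "s2"), (3, "s3")]
result = list(classic_hash_join(R, S, key=lambda t: t[0]))
```

No result can be produced until all of R has been read, which is the blocking behavior the later slides try to avoid.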
CS561 - XJoin 5 Hybrid Hash Join
- One table is hashed both to disk and to memory (partitions)
- G. Graefe, “Query Evaluation Techniques for Large Databases”. ACM
[Figure: R tuples are held in buckets i through j and buckets n through m, split between disk and memory; S tuples 1, 2, 3, 4, … probe them.]
CS561 - XJoin 6 Symmetric Hash Join (Pipelined)
- Both tables are hashed (both kept in main memory only)
- A. Wilschut, P. M.G. Apers, “Dataflow Query Execution in a Parallel Main-Memory Environment”, DPD
[Figure: Source R and Source S each feed a hash table (keys n through m for R tuples, keys i through j for S tuples). Each arriving tuple is inserted into its own table (BUILD) and probes the other table (PROBE); matches go to OUTPUT.]
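A minimal Python sketch of the symmetric build-and-probe loop follows. It assumes tuples arrive as a single interleaved sequence of `(source, tuple)` pairs; the function name and the sample arrivals are illustrative, not taken from the cited paper.

```python
from collections import defaultdict

def symmetric_hash_join(arrivals, key):
    """arrivals: iterable of (source, tuple), source is 'R' or 'S', in arrival order."""
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    for source, tup in arrivals:
        other = "S" if source == "R" else "R"
        k = key(tup)
        tables[source][k].append(tup)            # BUILD: insert into own table
        for match in tables[other].get(k, []):   # PROBE: look up the other table
            if source == "R":
                yield (tup, match)
            else:
                yield (match, tup)

# Hypothetical interleaved arrivals: (join key, payload)
arrivals = [("R", (1, "r1")), ("S", (1, "s1")), ("S", (2, "s2")),
            ("R", (2, "r2")), ("R", (1, "r1b"))]
shj_result = list(symmetric_hash_join(arrivals, key=lambda t: t[0]))
```

Each result is emitted as soon as its second tuple arrives, but both tables grow without bound in memory.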
CS561 - XJoin 7 Problems of SHJ
- Rather memory intensive
- Won’t work for large input streams
- Won’t allow many joins to be processed in a pipeline (or even in parallel)
CS561 - XJoin 8 New Problems in Online Query Processing over Distributed Data Sources
Data access is unpredictable due to link congestion, load balancing, etc.
Three classes of delays:
- Initial Delay: the first tuple arrives from the remote source more slowly than usual
- Slow Delivery: data arrives at a constant rate, but slower than expected
- Bursty Arrival: data arrives at a fluctuating rate
CS561 - XJoin 9 Question: Why are delays undesirable?
- They prolong the time to the first output
- They slow processing if the operator waits for data to arrive before acting
- If data arrives too fast, you want to avoid losing any of it
- Time is wasted if the operator sits idle while no data is coming
- Delays are unpredictable, so one single strategy won’t work
CS561 - XJoin 10 Motivation of XJoin
- Produce results incrementally when available
  - Tuples are returned as soon as they are produced
- Exploit available main memory as long as possible
  - Favor main-memory join when possible
- Allow progress to be made when one or more sources experience delays
  - Background processing is performed on previously received tuples, so results are produced even when both inputs are stalled
CS561 - XJoin 11 XJoin Design
Tuples are stored in partitions (as in hash join). Each partition has:
- A memory-resident (m-r) portion
- A disk-resident (d-r) portion
CS561 - XJoin 12 XJoin Design
[Figure: SOURCE-A and SOURCE-B each have partitions 1 through n with a memory-resident portion in MEMORY and a disk-resident portion on DISK. An arriving Tuple A with hash(Tuple A) = 1 goes to partition 1 of source A, and an arriving Tuple B with hash(Tuple B) = n goes to partition n of source B. A memory-resident partition k is flushed to its disk-resident portion.]
CS561 - XJoin 13 Challenges in Developing XJoin
- Manage the flow of tuples between memory and secondary storage (when and how to do it)
- Control background processing when inputs are delayed (the reactive scheduling idea)
- Provide both quick initial results and good overall throughput
- Ensure the full answer is produced
- Ensure duplicate tuples are not produced
CS561 - XJoin 14 XJoin Stages
XJoin proceeds in 3 stages (separate threads):
- Stage 1: memory-to-memory (M : M)
- Stage 2: memory-to-disk (M : D)
- Stage 3: disk-to-disk (D : D)
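CS561 - XJoin 15 1st Stage: Memory-to-Memory Join
[Figure: in MEMORY, Tuple A from SOURCE-A with hash(record A) = i is inserted into partition i of source A and probes partition i of source B. Tuple B from SOURCE-B with hash(record B) = j is inserted into partition j of source B and probes partition j of source A. Matches go to Output.]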
CS561 - XJoin 16 1st Stage: Memory-to-Memory Join
Join processing continues as long as:
- Memory permits, and
- One of the inputs is producing tuples
If memory is full, one partition is picked to be flushed to disk and appended to the end of its disk-resident portion.
If there is no new input, stage 1 is blocked and stage 2 starts.
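The insert, probe, and flush behavior of the 1st stage can be sketched as follows. This is a simplified illustration under stated assumptions: the class names `Partition` and `Stage1`, the tuple-count memory limit, and the "flush the largest memory-resident portion" policy are choices made for the example (the slide only says that one partition is picked), and a Python list stands in for the disk file.

```python
class Partition:
    def __init__(self):
        self.memory = []   # memory-resident portion: (key, payload) pairs
        self.disk = []     # disk-resident portion (a list stands in for a file)

class Stage1:
    def __init__(self, num_partitions, memory_limit):
        self.parts = {s: [Partition() for _ in range(num_partitions)]
                      for s in ("A", "B")}
        self.memory_limit = memory_limit   # assumed limit, counted in tuples
        self.in_memory = 0

    def insert(self, source, key, payload):
        """Insert a new tuple and return the matches found in memory."""
        if self.in_memory >= self.memory_limit:
            self._flush_one()
        other = "B" if source == "A" else "A"
        i = hash(key) % len(self.parts[source])
        # Probe the memory-resident portion of the other source's partition.
        matches = [(payload, p) for k, p in self.parts[other][i].memory if k == key]
        self.parts[source][i].memory.append((key, payload))
        self.in_memory += 1
        return matches

    def _flush_one(self):
        # Assumed policy: flush the partition with the most memory-resident tuples.
        candidates = [p for side in self.parts.values() for p in side]
        victim = max(candidates, key=lambda p: len(p.memory))
        victim.disk.extend(victim.memory)  # append to end of disk-resident portion
        self.in_memory -= len(victim.memory)
        victim.memory.clear()

join = Stage1(num_partitions=2, memory_limit=2)
first = join.insert("A", 1, "a1")
second = join.insert("B", 1, "b1")   # finds a1 in memory
third = join.insert("A", 2, "a2")    # memory full: a1 is flushed to disk
fourth = join.insert("B", 1, "b2")   # a1 is on disk, so this match is missed
```

In this run the pair (a1, b2) is not produced by the 1st stage because a1 was flushed before b2 arrived; finding such pairs is left to the later stages.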
CS561 - XJoin 17 Why Stage 1?
- In-memory operations are much faster and cheaper than on-disk operations
- Thus this stage guarantees that results are produced as soon as possible
CS561 - XJoin 18 Question: What does the 2nd Stage do? When does the 2nd Stage start?
Hint: What occurs when the input data (tuples) is too large for memory?
Answer:
- The 2nd Stage performs the Memory-to-Disk join
- It starts when both inputs are blocked
CS561 - XJoin 19 Stage 2
[Figure: the disk-resident portion of partition i of source A (DP iA) is read from DISK and probes the memory-resident portion of partition i of source B (MP iB) in MEMORY. Matches go to Output.]
CS561 - XJoin 20 2nd Stage: Memory-to-Disk Join
Activated when the 1st Stage is blocked. Performs 3 steps:
1. Choose a partition from one source according to throughput and partition size
2. Use tuples from its d-r portion to probe the m-r portion of the other source and output matches, until the d-r portion is completely processed
3. Check whether either input has resumed producing tuples. If yes, resume the 1st Stage. If no, choose another d-r portion and continue the 2nd Stage.
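The three steps can be sketched as a loop. This is an illustration under stated assumptions: the function name `stage2`, the dictionary layout, the `input_resumed` callback, and the scoring rule in step 1 (size of the d-r portion times size of the matching m-r portion, as a crude stand-in for expected output) are all hypothetical, and duplicate detection with timestamps is left out.

```python
def stage2(disk_parts, mem_parts_other, input_resumed):
    """disk_parts, mem_parts_other: dict partition_id -> list of (key, payload).
    input_resumed: zero-argument callable, True once an input delivers tuples again."""
    outputs = []
    remaining = set(disk_parts)
    # Step 3 is the loop condition: stop as soon as an input has resumed.
    while remaining and not input_resumed():
        # Step 1: choose the partition with the highest (assumed) output estimate.
        pid = max(remaining,
                  key=lambda p: len(disk_parts[p]) * len(mem_parts_other.get(p, [])))
        remaining.discard(pid)
        # Step 2: probe the other source's m-r portion with the whole d-r portion.
        index = {}
        for k, payload in mem_parts_other.get(pid, []):
            index.setdefault(k, []).append(payload)
        for k, payload in disk_parts[pid]:
            for match in index.get(k, []):
                outputs.append((payload, match))
    return outputs

# Hypothetical state: source A on disk, source B in memory.
disk_a = {0: [(2, "a2")], 1: [(1, "a1"), (1, "a1b")]}
mem_b = {0: [(2, "b2")], 1: [(1, "b1")]}
resumed = iter([False, True]).__next__   # inputs resume after one partition
stage2_out = stage2(disk_a, mem_b, resumed)
```

Because the check for new input happens only between partitions, a large d-r portion delays the return to the 1st stage.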
CS561 - XJoin 21 Controlling 2nd Stage
The cost of the 2nd Stage is hidden when both inputs experience delays.
Tradeoffs?
- What are the benefits of using the second stage?
  - Produces results when input sources are stalled
  - Allows varying input rates
- What is the disadvantage?
  - The second stage must complete a d-r portion before checking for new input (overhead)
- To address the tradeoff, use an activation threshold:
  - Pick a partition likely to produce many tuples right now
CS561 - XJoin 22 3rd Stage: Disk-to-Disk Join
Clean-up stage:
- Assume that all data for both inputs has arrived
- Assume that the 1st and 2nd stages have completed
Why is this step necessary?
- Completeness of the answer: make sure that all result tuples are produced
- Reason: some tuples in disk-resident portions may not have had a chance to join each other
CS561 - XJoin 23 Preventing Duplicates
- When could duplicates be produced?
  - In both the 2nd and 3rd stages, which may perform overlapping work
- How is this addressed?
  - XJoin prevents duplicates with timestamps
- When is this addressed?
  - During processing, when trying to join two tuples
CS561 - XJoin 24 Time Stamping: Part 1
Two fields are added to each tuple:
- Arrival TimeStamp (ATS): the time when the tuple first arrived in memory
- Departure TimeStamp (DTS): the time when the tuple was flushed to disk
[ATS, DTS] is the interval during which the tuple was in memory.
When were two tuples joined in the 1st stage?
- When their [ATS, DTS] intervals overlap, for example when Tuple A’s DTS is within Tuple B’s [ATS, DTS]
- Tuples that meet this overlap condition are not considered for joining in the 2nd or 3rd stage
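The overlap test can be written as a one-line interval check. A minimal sketch: the `Stamped` class, the use of infinity as the DTS of a tuple that is still in memory, and the timestamp values are assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Stamped:
    key: int
    ats: float              # Arrival TimeStamp
    dts: float = math.inf   # Departure TimeStamp; inf while still in memory (assumption)

def joined_in_stage1(a, b):
    """True if a and b were in memory at the same time, i.e. their
    [ATS, DTS] intervals overlap, so the 1st stage already joined them."""
    return max(a.ats, b.ats) < min(a.dts, b.dts)

# Hypothetical timestamps
a = Stamped(key=7, ats=100, dts=200)
b1 = Stamped(key=7, ats=150, dts=300)   # arrived before a was flushed
b2 = Stamped(key=7, ats=250, dts=400)   # arrived after a was flushed
```

Here `joined_in_stage1(a, b1)` holds, so the pair is skipped in later stages, while `(a, b2)` still has to be joined.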
CS561 - XJoin 25 Detecting Tuples Joined in 1st Stage
- Overlapping: tuples joined in the first stage. B1 arrived after A and before A was flushed to disk.
- Non-overlapping: tuples not joined in the first stage. B2 arrived after A and after A was flushed to disk.
[Figure: timelines showing the [ATS, DTS] intervals of Tuple A and Tuple B for both cases.]
CS561 - XJoin 26 Time Stamping: Part 2
For each partition, keep track of:
- ProbeTS: the time when a 2nd stage probe was done
- DTSlast: the DTS of the last tuple of the disk-resident portion
Several such probes may occur, so keep an ordered history of these probe descriptors.
Meaning: all tuples with a DTS up to and including DTSlast were joined in stage 2 with all tuples that were in main memory at time ProbeTS.
CS561 - XJoin 27 Detecting Tuples Joined in 2nd Stage
All A tuples in Partition 2 up to DTSlast 350 were joined with m-r tuples that arrived before Partition 2’s ProbeTS.
[Figure: the history list of (DTSlast, ProbeTS) entries for Partition 2, shown next to the ATS and DTS of Tuple A and Tuple B to illustrate the overlap.]
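The history check can be sketched as follows. This is an illustration, not the paper's code: the function name, the argument layout, and all timestamp values except the DTSlast of 350 are assumptions. A pair counts as already joined when some history entry covers A's DTS and its ProbeTS falls inside B's time in memory.

```python
def joined_in_stage2(a_dts, b_ats, b_dts, history):
    """history: ordered list of (dts_last, probe_ts) entries for A's partition.
    The pair was joined in the 2nd stage if A was already on disk and covered
    by a probe (a_dts <= dts_last) while B was in memory at that probe."""
    return any(a_dts <= dts_last and b_ats <= probe_ts <= b_dts
               for dts_last, probe_ts in history)

# DTSlast 350 follows the slide; the ProbeTS of 550 is a hypothetical value.
history_p2 = [(350, 550)]
already = joined_in_stage2(a_dts=300, b_ats=500, b_dts=600, history=history_p2)
late_b = joined_in_stage2(a_dts=300, b_ats=560, b_dts=600, history=history_p2)
late_a = joined_in_stage2(a_dts=400, b_ats=500, b_dts=600, history=history_p2)
```

Only the first pair is skipped: in the second case B arrived after the probe, and in the third case A was flushed after the probe's DTSlast.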
CS561 - XJoin 28 Experiments
Algorithms compared:
- HHJ (Hybrid Hash Join)
- XJoin (with 2nd stage and with caching)
- XJoin (without 2nd stage)
- XJoin (with aggressive usage of 2nd stage)
CS561 - XJoin 29 Case 1: Slow Network Both Sources Are Slow
CS561 - XJoin 30 Case 1: Slow Network Both Sources Are Slow (Bursty)
- XJoin improves the delivery time of initial answers, and with it interactive performance
- Reactive background processing is an effective way to exploit intermittent delays and sustain the output rate
- The 2nd stage is very useful if there is time for it
CS561 - XJoin 31 Case 2: Fast Network Both Sources Are Fast
CS561 - XJoin 32 Case 2: Fast Network Both Sources Are Fast
- All XJoin variants deliver initial results earlier
- XJoin can also deliver the overall result in the same time as HHJ
- HHJ delivers the 2nd half of the result faster than XJoin
- The 2nd stage cannot be used too aggressively if new data is coming in continuously
CS561 - XJoin 33 Conclusion
XJoin:
- Can be conservative on space (small footprint)
- Can produce initial results as early as possible
- Can hide intermittent data delays
- Can be used in conjunction with online query processing to manage data streams (limited)
CS561 - XJoin 34 How to Further Optimize XJoin?
- Resume Stage 1 as soon as data arrives
- Remove no-longer-joining tuples in a timely manner
- Other ideas? …
CS561 - XJoin 35 References
- Urhan, Tolga and Franklin, Michael J. “XJoin: Getting Fast Answers From Slow and Bursty Networks.”
- Urhan, Tolga and Franklin, Michael J. “XJoin: A Reactively-Scheduled Pipelined Join Operator.”
- Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. “Adaptive Query Processing: Technology in Evolution.” IEEE Data Engineering Bulletin.
- Hellerstein and Avnur, Ron. “Eddies: Continuously Adaptive Query Processing.”
- Babu and Widom, Jennifer. “Continuous Queries Over Data Streams.”
CS561 - XJoin 36 Stream: New Query Context
Challenges faced by XJoin:
- Potentially unbounded growth of the join state
- Indefinite delay of some join results
Solutions:
- Exploit semantic constraints to remove no-longer-joining data in a timely manner
- Constraints: sliding windows, punctuations
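CS561 - XJoin 37 Punctuation
A punctuation is a predicate on stream elements that evaluates to false for every element following the punctuation.
[Figure: a student stream with columns ID, Name, Age. Tuples for Edward, Justin, and Janet (age 18) are followed by the punctuation (*, *, (0, 18]), meaning no more tuples for students whose age is less than or equal to 18. The next tuple is Anna (age 20), …]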
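One way to see why punctuations help a join: once one stream promises that no more tuples with a join value in some range will arrive, the tuples of the other stream in that range can never match anything new and can be dropped from the join state. The sketch below illustrates this idea under stated assumptions; the helper names `range_punctuation` and `purge`, and the sample state, are hypothetical.

```python
def range_punctuation(low, high):
    """Predicate for the half-open range (low, high], as in (0, 18]."""
    return lambda value: low < value <= high

def purge(state, punctuation, attr):
    """Keep only tuples whose join attribute is not covered by the punctuation."""
    return [t for t in state if not punctuation(attr(t))]

# Stream B has announced: no more tuples with join value in (0, 18].
no_more_b = range_punctuation(0, 18)

# Hypothetical join state of stream A: (payload, join value)
state_a = [("t1", 17), ("t2", 18), ("t3", 20)]
state_a_after = purge(state_a, no_more_b, attr=lambda t: t[1])
```

After the purge only `("t3", 20)` remains in the state of stream A, since 17 and 18 fall inside the punctuated range.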
CS561 - XJoin 38 An Example
Query: For each item that has at least one bid, return its bid-increase value.
  Select O.item_id, Sum (B.bid_price - O.open_price)
  From Open O, Bid B
  Where O.item_id = B.item_id
  Group by O.item_id
Open Stream:
  item_id | seller_id | open_price | timestamp
  1080 | jsmith | | Nov :03:
  | melissa | | Nov :10:00
  …
Bid Stream:
  item_id | bidder_id | bid_price | timestamp
  1080 | pclover | | Nov :27:
  | smartguy | | Nov :30:
  | richman | | Nov :52:00
  …
[Figure: query plan. Open Stream and Bid Stream feed Join (item_id), which produces Out 1 (item_id). Group-by item_id (sum(…)) then produces Out 2 (item_id, sum). A punctuation on the Bid Stream states: No more bids for item 1080!]
CS561 - XJoin 39 PJoin Execution Logic
[Figure: a tuple t_a arrives on Stream A with Hash(t_a) = 1. The state of Stream A (S_a) and the state of Stream B (S_b) are each a hash table whose join state has a memory-resident portion and a disk-resident portion. Also shown are the punctuation sets PS_a and PS_b (with an entry <10) and the purge candidate pool.]
CS561 - XJoin 40 PJoin Execution Logic
[Figure: a punctuation p_a arrives on Stream A with Hash(p_a) = 1. The state of Stream A (S_a) and the state of Stream B (S_b) are each a hash table whose join state has a memory-resident portion and a disk-resident portion. Also shown are the punctuation sets PS_a and PS_b (with an entry <10) and the purge candidate pool.]
CS561 - XJoin 41 PJoin vs. XJoin: Memory Overhead Tuple inter-arrival: 2 milliseconds Punctuation inter-arrival: 40 tuples/punctuation
CS561 - XJoin 42 PJoin vs. XJoin: Tuple Output Rate Tuple inter-arrival: 2 milliseconds Punctuation inter-arrival: 30 tuples/punctuation
CS561 - XJoin 43 Conclusion
- The memory requirement for the PJoin state is almost insignificant compared to XJoin’s
- The growing join state of XJoin increases probe cost, which lowers the tuple output rate
- Eager purge is the best strategy for minimizing the join state
- Lazy purge with an appropriate purge threshold provides a significant advantage in tuple output rate