XJoin: Faster Query Results Over Slow and Bursty Networks. IEEE Bulletin, 2000, by T. Urhan and M. Franklin. Based on a talk prepared by Asima Silva & Leena Razzaq.

1 XJoin: Faster Query Results Over Slow and Bursty Networks. IEEE Bulletin, 2000, by T. Urhan and M. Franklin. Based on a talk prepared by Asima Silva & Leena Razzaq

2 Motivation
Data delivery issues:
– unpredictable delays from some remote data sources
– wide-area networks with possibly slow communication links, congestion, failures, and overload
Goal:
– not only the overall query processing time matters
– it also matters when the initial data is delivered
– and the overall throughput and output rate throughout query processing

3 Overview
– Hash join history
– Three classes of delays
– Motivation of XJoin
– Challenges of developing XJoin
– Three stages of XJoin
– Handling duplicates
– Experimental results

4 Hash Join: only one table is hashed.
1. Build: R tuples are hashed into buckets by key (key1, key2, …).
2. Probe: S tuples (S tuple 1, 2, 3, …) probe the buckets.
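A minimal Python sketch of the two phases, assuming both inputs are lists of tuples and that the join key is extracted by a caller-supplied function (the name `hash_join` is illustrative). No output can appear until the whole build input R has been read.

```python
from collections import defaultdict

def hash_join(r_tuples, s_tuples, r_key, s_key):
    """Classic in-memory hash join: build on R, probe with S."""
    # 1. BUILD: hash every R tuple on its join key.
    table = defaultdict(list)
    for r in r_tuples:
        table[r_key(r)].append(r)
    # 2. PROBE: look up each S tuple's key and emit the matching pairs.
    out = []
    for s in s_tuples:
        for r in table.get(s_key(s), []):   # .get does not insert empty buckets
            out.append((r, s))
    return out

R = [(1, "a"), (2, "b"), (2, "c")]
S = [(2, "x"), (3, "y")]
print(hash_join(R, S, r_key=lambda t: t[0], s_key=lambda t: t[0]))
# [((2, 'b'), (2, 'x')), ((2, 'c'), (2, 'x'))]
```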

5 Hybrid Hash Join: one table is hashed into partitions, some kept on disk and some in memory (G. Graefe, "Query Evaluation Techniques for Large Databases", ACM Computing Surveys, 1993).
[Figure: R tuples in buckets i … j reside on disk, R tuples in buckets n … m reside in memory, and S tuples probe the partitions.]
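The partitioning idea can be sketched as follows. This is a simplified illustration, not Graefe's formulation: Python lists stand in for disk files, and the parameters `n_parts` and `mem_parts` (how many partitions stay memory-resident) are hypothetical.

```python
from collections import defaultdict

def hybrid_hash_join(r_tuples, s_tuples, key, n_parts=4, mem_parts=1):
    """Hybrid hash join sketch: R partitions 0..mem_parts-1 stay in memory as
    hash tables; the remaining partitions are spilled to lists standing in for
    disk files and joined in a second pass."""
    def part(t):
        return hash(key(t)) % n_parts

    mem = defaultdict(lambda: defaultdict(list))   # partition -> key -> R tuples
    r_disk = defaultdict(list)                     # partition -> spilled R tuples
    s_disk = defaultdict(list)                     # partition -> spilled S tuples
    out = []
    # Build: in-memory partitions get a hash table, the rest go to "disk".
    for r in r_tuples:
        p = part(r)
        if p < mem_parts:
            mem[p][key(r)].append(r)
        else:
            r_disk[p].append(r)
    # Probe: S tuples of in-memory partitions are joined at once, others are spilled.
    for s in s_tuples:
        p = part(s)
        if p < mem_parts:
            out.extend((r, s) for r in mem[p].get(key(s), []))
        else:
            s_disk[p].append(s)
    # Second pass: join each spilled R/S partition pair with an ordinary hash join.
    for p in range(mem_parts, n_parts):
        table = defaultdict(list)
        for r in r_disk[p]:
            table[key(r)].append(r)
        for s in s_disk[p]:
            out.extend((r, s) for r in table.get(key(s), []))
    return out
```

Only the spilled partitions pay for a second pass; matches for the memory-resident partitions are produced during the probe itself.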

6 Symmetric Hash Join (Pipeline): both tables are hashed, and both hash tables are kept in main memory only (Z. Ives, D. Florescu, M. Friedman, A. Levy, D. Weld, "An Adaptive Query Execution System for Data Integration", SIGMOD 99).
[Figure: each R tuple from Source R is inserted (build) into the R hash table and probes the S hash table; each S tuple from Source S is inserted into the S hash table and probes the R hash table; matches go to the output.]
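A sketch of the symmetric hash join as a Python generator, assuming arrivals come as an interleaved sequence of `("R", tuple)` / `("S", tuple)` pairs (an illustrative input format). Because each tuple probes as soon as it arrives, results stream out before either input ends; the cost is that both tables must stay in memory, which is the problem slide 7 raises.

```python
from collections import defaultdict

def symmetric_hash_join(stream, key):
    """Symmetric (pipelined) hash join over an interleaved arrival stream.

    Each arriving tuple is inserted into its own side's hash table and then
    probes the opposite table. Results are always yielded as (R tuple, S tuple)."""
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    other = {"R": "S", "S": "R"}
    for side, t in stream:
        k = key(t)
        tables[side][k].append(t)                       # BUILD into own table
        for match in tables[other[side]].get(k, []):    # PROBE the other table
            yield (t, match) if side == "R" else (match, t)
```

For the arrivals R(1,a), S(1,x), S(2,y), R(2,b), R(1,c), the first result ((1,a),(1,x)) is available right after the second arrival.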

7 Problem of SHJ: it is memory intensive:
– it won't work for large input streams
– it won't allow many joins to be processed in a pipeline (or even in parallel)

8 New Problems: Three Delays
– Initial delay: the first tuple arrives from the remote source more slowly than usual (we still want the initial answer out quickly)
– Slow delivery: data arrives at a constant but slower-than-expected rate (we still want good overall throughput by the end)
– Bursty arrival: data arrives in a fluctuating manner (how to avoid sitting idle during periods of low input rates)

9 Question: Why are delays undesirable?
– They prolong the time to the first output
– They slow processing if we wait for data to arrive before acting
– If data arrives too fast, we want to avoid losing any of it
– Sitting idle while no data is arriving wastes time
– They are unpredictable, so no single strategy will work

10 Challenges for XJoin
– Manage the flow of tuples between memory and secondary storage (when and how to do it)
– Control background processing when inputs are delayed (the reactive scheduling idea)
– Ensure the full answer is produced
– Ensure duplicate tuples are not produced
– Provide both quick initial output and good overall throughput

11 Motivation of XJoin
– Produces results incrementally as they become available: tuples are returned as soon as they are produced, which is good for online processing
– Allows progress to be made when one or more sources experience delays: background processing is performed on previously received tuples, so results are produced even when both inputs are stalled

12 Stages (run in different threads): M:M (memory-to-memory), M:D (memory-to-disk), D:D (disk-to-disk)

13 [Figure: XJoin. Tuple A from SOURCE-A with hash(Tuple A) = 1 and Tuple B from SOURCE-B with hash(Tuple B) = n are inserted into the memory-resident partitions 1 … n of their source; when memory fills, a partition k is flushed to the disk-resident partitions 1 … n of that source.]

14 1st Stage of XJoin: Memory-to-Memory Join
Tuples are stored in partitions, each with:
– a memory-resident (m-r) portion
– a disk-resident (d-r) portion
Join processing continues as usual:
– if space permits, join memory to memory
– if memory is full, pick one partition as a victim, flush it to disk, and append it to the end of its disk partition
The 1st Stage runs as long as one of the inputs is producing tuples. If there is no new input, block Stage 1 and start Stage 2.
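The following sketch models Stage 1 for two sources A and B. It is a simplification under stated assumptions: a single logical clock supplies both ATS and flush (DTS) times, memory is bounded by a hypothetical tuple count `mem_limit`, and the victim is the largest memory-resident partition (the slide does not say how the victim is chosen). Only the memory-resident portion of the other source is probed, so pairs whose partners are already on disk are left for the later stages.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass

INF = float("inf")

@dataclass
class Tup:
    key: int
    val: str
    ats: int            # Arrival TimeStamp
    dts: float = INF    # Departure TimeStamp (INF while still in memory)

class Stage1:
    """Memory-to-memory stage of an XJoin-style operator (illustrative sketch)."""

    def __init__(self, n_parts=4, mem_limit=6):
        self.n_parts, self.mem_limit = n_parts, mem_limit   # hypothetical knobs
        self.mem = {s: defaultdict(list) for s in "AB"}     # m-r portions
        self.disk = {s: defaultdict(list) for s in "AB"}    # d-r portions
        self.clock = itertools.count(1)                     # shared logical clock
        self.out = []

    def _mem_size(self):
        return sum(len(ts) for side in self.mem.values() for ts in side.values())

    def _flush_victim(self):
        # Victim policy is an assumption: the largest m-r partition (first one on ties).
        side, p = max(((s, p) for s in "AB" for p in self.mem[s]),
                      key=lambda sp: len(self.mem[sp[0]][sp[1]]))
        now = next(self.clock)
        for t in self.mem[side][p]:
            t.dts = now                                   # record departure time
        self.disk[side][p].extend(self.mem[side][p])      # append to the d-r portion
        self.mem[side][p] = []

    def insert(self, side, key, val):
        """Handle one arriving tuple from source `side` ("A" or "B")."""
        if self._mem_size() >= self.mem_limit:
            self._flush_victim()
        t = Tup(key, val, ats=next(self.clock))
        p = hash(key) % self.n_parts
        other = "B" if side == "A" else "A"
        for m in self.mem[other][p]:                      # probe the other m-r portion
            if m.key == key:
                self.out.append((t, m) if side == "A" else (m, t))
        self.mem[side][p].append(t)                       # then insert
```

With `n_parts=2, mem_limit=2`, inserting a1, b1, a2, b2 (all with key 1) produces (a1, b1) and (a2, b1) in Stage 1. By the time b2 arrives both A tuples have been flushed, so (a1, b2) and (a2, b2) must come from Stage 2 or 3.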

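15 [Figure: 1st Stage, memory-to-memory join. Tuple A from SOURCE-A (hash(record A) = i) is inserted into partition i of source A and probes partition i of source B; Tuple B from SOURCE-B (hash(record B) = j) is inserted into partition j of source B and probes partition j of source A; matches go to the output.]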

16 Why Stage 1? Use memory whenever possible, since it is the fastest. Use newly arriving data while it is already in memory. Don't stop to fetch data from disk in order to join new data.

17 Question:
– What does the Second Stage do?
– When does the Second Stage start?
– Hints: XJoin proposes a memory management technique. What happens when the input data (tuples) is too large for memory?
– Answer: the Second Stage joins memory to disk. It starts when both inputs are blocked.

18 2nd Stage of XJoin
Activated when the 1st Stage is blocked. Performs 3 steps:
1. Chooses a partition of one source, according to throughput and partition size.
2. Uses the tuples of that partition's d-r portion to probe the m-r portion of the other source and outputs matches, until the d-r portion is completely processed.
3. Checks whether either input has resumed producing tuples. If yes, resume the 1st Stage. If no, choose another d-r portion and continue the 2nd Stage.

19 [Figure: Stage 2, disk-to-memory joins. The disk-resident portion DP_iA of partition i of source A probes the memory-resident portion MP_iB of partition i of source B; matches go to the output.]

20 Controlling the 2nd Stage
The cost of the 2nd Stage is hidden when both inputs experience delays.
Tradeoff:
– What are the benefits of using the second stage? It produces results while the input sources are stalled, and it copes with variable input rates.
– What is the disadvantage? The second stage must finish a d-r portion before checking for new input (overhead).
To address the tradeoff, use an activation threshold: pick a partition likely to produce many tuples right now.
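The reactive loop of slides 18 to 20 might look like this for one direction (disk portions of source A probing memory portions of source B). Assumptions, not from the talk: partitions are scored by the estimated output |DP_iA| * |MP_iB|, `threshold` is a hypothetical activation threshold on that score, each partition is processed at most once per call, and duplicate detection is delegated to an `already_joined` predicate.

```python
from collections import namedtuple

Tup = namedtuple("Tup", "key val")

def stage2(disk, mem, input_resumed, threshold=1, already_joined=lambda d, m: False):
    """Reactive Stage 2 sketch.

    disk, mem: dicts mapping partition id -> list of tuples (A's d-r, B's m-r).
    input_resumed: callable polled before starting each partition."""
    out, done = [], set()
    while not input_resumed():
        # Estimate each unprocessed partition's yield as |DP_iA| * |MP_iB|.
        scores = {p: len(disk[p]) * len(mem.get(p, []))
                  for p in disk if p not in done}
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break                      # not worth activating Stage 2 right now
        # Finish the whole d-r portion before checking the inputs again.
        for d in disk[best]:
            for m in mem.get(best, []):
                if d.key == m.key and not already_joined(d, m):
                    out.append((d, m))
        done.add(best)
    return out
```

Raising `threshold` makes Stage 2 more conservative, which matches the fast-network finding that the 2nd stage should not be used too aggressively.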

21 3rd Stage of XJoin: Disk-to-Disk Join
Clean-up stage:
– assumes that all data for both inputs has arrived
– assumes that the first and second stages have completed
– makes sure that all tuples belonging in the result are produced
Why is this step necessary? For completeness of the answer.
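A sketch of the clean-up pass, assuming it receives, for each partition, every tuple of both sources (the real operator reads these from the disk-resident portions) and an `already_joined` predicate standing in for the timestamp checks of slides 23 to 26. The function name `stage3` is illustrative.

```python
from collections import defaultdict, namedtuple

Tup = namedtuple("Tup", "key val")

def stage3(parts_a, parts_b, already_joined=lambda a, b: False):
    """Disk-to-disk clean-up sketch, run after both inputs have ended.

    parts_a, parts_b: partition id -> all tuples of that partition.
    Every matching pair not produced by an earlier stage is emitted."""
    out = []
    for p in sorted(set(parts_a) | set(parts_b)):
        table = defaultdict(list)              # build on A's partition
        for a in parts_a.get(p, []):
            table[a.key].append(a)
        for b in parts_b.get(p, []):           # probe with B's partition
            for a in table.get(b.key, []):
                if not already_joined(a, b):   # skip pairs from Stage 1 or 2
                    out.append((a, b))
    return out
```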

22 Handling Duplicates
When could duplicates be produced? In all 3 stages, since multiple stages may perform overlapping work.
How to address it: XJoin prevents duplicates with timestamps.
When to address it: during processing, since output is produced continuously.

23 Time Stamping: part 1
Two fields are added to each tuple:
– Arrival TimeStamp (ATS): when the tuple first arrived in memory
– Departure TimeStamp (DTS): when the tuple was flushed to disk
[ATS, DTS] indicates when the tuple was in memory.
When did two tuples get joined in the 1st Stage? When their [ATS, DTS] intervals overlap, e.g. Tuple A's DTS falls within Tuple B's [ATS, DTS].
Tuples that meet this overlap condition are not considered for joining by the 2nd or 3rd stages.

24 [Figure: Detecting tuples joined in the 1st stage. Each tuple is drawn as an interval from its ATS to its DTS. Overlapping: B1 arrived after A and before A was flushed to disk, so the tuples were joined in the first stage. Non-overlapping: B2 arrived after A was flushed to disk, so the tuples were not joined in the first stage.]
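The overlap rule reduces to an interval-intersection test. A minimal sketch, assuming integer timestamps from a clock that never assigns the same value twice, and `float("inf")` as the DTS of a tuple that is still memory-resident (the name `joined_in_stage1` is illustrative):

```python
from dataclasses import dataclass

INF = float("inf")

@dataclass
class Tup:
    key: int
    ats: int             # Arrival TimeStamp
    dts: float = INF     # Departure TimeStamp; INF while still memory-resident

def joined_in_stage1(a: Tup, b: Tup) -> bool:
    """Stage 1 joined a and b iff both were in memory at the same time,
    i.e. their [ATS, DTS] residence intervals overlap."""
    return a.ats < b.dts and b.ats < a.dts
```

For Tuple A with [ATS, DTS] = [1, 5], a tuple B1 arriving at 3 overlaps it, while B2 arriving at 7 does not.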

25 Time Stamping: part 2
For each partition, keep track of:
– ProbeTS: the time when a 2nd stage probe was done
– DTSlast: the latest DTS of all the tuples that were on disk at that time
Several such probes may occur, so keep an ordered history of these probe descriptors.
Usage: all tuples with DTS up to and including DTSlast were joined in Stage 2 with all tuples that were in main memory (between their ATS and DTS) at time ProbeTS.

26 [Figure: Detecting tuples joined in the 2nd stage. Tuples A and B are drawn as [ATS, DTS] intervals next to the history lists of (ProbeTS, DTSlast) entries for Partitions 1, 2, and 3. All tuples before and including DTSlast were joined in Stage 2 at time ProbeTS: all A tuples in Partition 2 up to DTSlast = 250 were joined with the m-r tuples that arrived before Partition 2's ProbeTS.]
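The Stage 2 bookkeeping can be checked in a similar way. This sketch covers one direction only (A's disk portion probing B's memory portion in one partition); the full operator would also consult the history kept for the opposite direction. It assumes a unique integer clock and `float("inf")` as the DTS of tuples still in memory; `joined_in_stage2` is an illustrative name.

```python
from dataclasses import dataclass

INF = float("inf")

@dataclass
class Tup:
    key: int
    ats: int             # Arrival TimeStamp
    dts: float = INF     # Departure TimeStamp; INF while still memory-resident

def joined_in_stage2(a: Tup, b: Tup, history) -> bool:
    """`history` is the partition's ordered list of (ProbeTS, DTSlast) pairs.
    A probe joined every A tuple already on disk (DTS <= DTSlast) with every
    B tuple that was memory-resident at ProbeTS."""
    return any(a.dts <= dts_last and b.ats < probe_ts < b.dts
               for probe_ts, dts_last in history)
```

With a hypothetical history entry (ProbeTS = 300, DTSlast = 250), an A tuple flushed at 250 counts as joined with a B tuple that arrived at 260 and is still in memory, but not with one that arrived at 310.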

27 Experiments compare:
– HHJ (Hybrid Hash Join)
– XJoin (with 2nd stage and with caching)
– XJoin (without 2nd stage)
– XJoin (with aggressive usage of 2nd stage)

28 Case 1: Slow Network. Both sources are slow (bursty).
– XJoin improves the delivery time of initial answers, giving interactive performance.
– The reactive background processing is an effective way to exploit intermittent delays and keep up a continued output rate.
– Shows that the 2nd stage is very useful if there is time for it.

29 Slow Network: both sources are slow

30 Case 2: Fast Network. Both sources are fast.
– All XJoin variants deliver initial results earlier.
– XJoin can also deliver the overall result in the same time as HHJ.
– HHJ delivers the 2nd half of the result faster than XJoin.
– The 2nd stage cannot be used too aggressively if new data is coming in continuously.

31 Case 2: Fast Network: both sources are fast

32 Conclusion
– Can be conservative on space (small footprint)
– Can be used in conjunction with online query processing to manage the streams
– Resumes Stage 1 as soon as data arrives
– Dynamically chooses techniques for producing results

33 References
– Urhan, Tolga and Franklin, Michael J. "XJoin: Getting Fast Answers From Slow and Bursty Networks."
– Urhan, Tolga and Franklin, Michael J. "XJoin: A Reactively-Scheduled Pipelined Join Operator."
– Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. "Adaptive Query Processing: Technology in Evolution." IEEE Data Engineering Bulletin.
– Avnur, Ron and Hellerstein, Joseph M. "Eddies: Continuously Adaptive Query Processing."
– Babu and Widom, Jennifer. "Continuous Queries Over Data Streams."