XJoin: Getting Fast Answers From Slow and Bursty Networks. T. Urhan, M. J. Franklin. IACS, CSD, University of Maryland. Presented by: Abdelmounaam Rezgui. CS-TR-3994.



The Problem: How can we improve the interactive performance of queries over widely distributed data sources?

The Problem [Figure: relations R and S, tuples, Source A, Source B]

Why is the response time unpredictable? Remote sources, intermediate sites, and communication links are vulnerable to overloading, congestion, and failures. The consequences are significant and unpredictable delays, and unresponsive, unusable systems.

Different classes of delays
- Initial delay: a longer than expected wait to receive the first tuple.
- Slow delivery: data arrive at a fairly constant but slower than expected rate.
- Bursty arrival: bursts of data followed by long periods of no arrivals.

Some join variants: Nested Loops Join, Block Nested Loops Join, Index Nested Loops Join, Sort-Merge Join, Classic Hash Join, Simple Hash Join, Grace Hash Join, Hybrid Hash Join (HHJ), TID Hash Join, Symmetric Hash Join (SHJ), XJoin.

Query Scrambling reacts to data delivery problems by rescheduling query operators on the fly and restructuring the query execution plan. It improves the response time for the entire query but may slow down the return of some initial results. (To be presented on November 22, 1999.)

Traditional query processing techniques
- Reduce the memory requirements.
- Reduce disk I/O.
- Optimize delivery of the entire query result, whereas online users would like to receive initial results as soon as possible.
- Slow and bursty delivery of data from remote sources can stall query execution.

XJoin: Fundamental principles
- Improves the interactive performance by producing results incrementally (as they become available).
- Allows progress to be made even when one or more sources experience delays (delays are exploited to produce more tuples earlier).

XJoin: The key idea. When inputs are delayed, run background processing on the previously received tuples.

XJoin: The challenges
- Managing the flow of tuples between memory and secondary storage.
- Controlling the background processing.
- Producing the full answer (all the tuples are produced).
- Ensuring that no duplicate tuples are generated.

SHJoin (Symmetric Hash Join) [Figure: Source 1 and Source 2 feed Hash table 1 and Hash table 2; each arriving tuple is matched against the other source's hash table.]

SHJoin requires the hash tables for both of its inputs to be memory resident, which is unacceptable for complex queries.
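The symmetric hash join described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `symmetric_hash_join`, the `("A", tuple)` arrival encoding, and the key-extractor arguments are all assumptions made for the example. Note that both hash tables only ever grow, which is the memory requirement the slide objects to.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals, key_a, key_b):
    """Yield (a, b) result pairs as soon as both matching tuples have arrived.

    `arrivals` is an iterable of ("A", tuple) or ("B", tuple) in arrival order
    (a hypothetical encoding chosen for this sketch).
    """
    table_a = defaultdict(list)  # in-memory hash table for source A
    table_b = defaultdict(list)  # in-memory hash table for source B
    for source, tup in arrivals:
        if source == "A":
            key = key_a(tup)
            table_a[key].append(tup)            # insert into own table
            for match in table_b.get(key, ()):  # probe the other table
                yield (tup, match)
        else:
            key = key_b(tup)
            table_b[key].append(tup)
            for match in table_a.get(key, ()):
                yield (match, tup)


arrivals = [("A", (1, "a1")), ("B", (1, "b1")), ("B", (2, "b2")),
            ("A", (2, "a2")), ("A", (1, "a3"))]
results = list(symmetric_hash_join(arrivals, lambda t: t[0], lambda t: t[0]))
```

Each result is emitted at the moment the later of its two tuples arrives, regardless of which source that tuple came from.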

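XJoin. Partitioning: each input is partitioned into a number of partitions based on a hash function. Each partition $i$ of source A, $P_{iA}$, consists of a memory-resident portion $MP_{iA}$ and a disk-resident portion $DP_{iA}$: $P_{iA} = MP_{iA} \cup DP_{iA}$ and $MP_{iA} \cap DP_{iA} = \emptyset$.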

[Figure: tuples from Source A and Source B are hashed into partitions 1, ..., k, ..., n (for example, hash(Tuple A) = 1 and hash(Tuple B) = n). Each source has memory-resident partitions in memory and disk-resident partitions on disk; memory-resident partitions are flushed to disk.]
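The partitioning and flushing shown in the figure can be sketched as follows. This is only an illustration under stated assumptions: the class `PartitionedSource`, the helper `receive`, the tuple-count `capacity`, and the choice to flush the largest memory-resident partition are introduced for the example and are not taken from the slides; the "disk" is a plain Python list standing in for a file.

```python
class PartitionedSource:
    """Tuples of one source, split into memory- and disk-resident portions."""

    def __init__(self, num_partitions):
        self.memory = [[] for _ in range(num_partitions)]  # MP_i
        self.disk = [[] for _ in range(num_partitions)]    # DP_i (stand-in for a file)

    def insert(self, key, tup):
        """Hash the join key to a partition and store the tuple in memory."""
        i = hash(key) % len(self.memory)
        self.memory[i].append(tup)
        return i

    def in_memory(self):
        return sum(len(p) for p in self.memory)

    def flush_largest(self):
        """Move the largest memory-resident portion to disk (assumed policy)."""
        i = max(range(len(self.memory)), key=lambda p: len(self.memory[p]))
        self.disk[i].extend(self.memory[i])
        self.memory[i] = []
        return i


def receive(source, key, tup, capacity):
    """Insert a tuple, flushing one partition first if memory is full."""
    if source.in_memory() >= capacity:
        source.flush_largest()
    return source.insert(key, tup)


src = PartitionedSource(num_partitions=2)
for k in [0, 2, 4, 1, 3]:
    receive(src, k, (k,), capacity=4)
```

A tuple lives in exactly one of the two portions at any time, so the memory-resident and disk-resident portions of a partition stay disjoint.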

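Stage 1: Memory-to-memory joins [Figure: a Tuple A arriving from Source A with hash(record A) = i is inserted into memory-resident partition i of source A and probes partition i of source B; likewise, a Tuple B arriving from Source B with hash(record B) = j is inserted into partition j of source B and probes partition j of source A. Matches go to the output.]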

Stage 2: Disk-to-memory joins [Figure: the disk-resident portion $DP_{iA}$ of partition i of source A is read from disk and probes the memory-resident portion $MP_{iB}$ of partition i of source B. Matches go to the output.]

Stage 3: Clean-up
- Stage 1 fails to join tuples that were not in memory at the same time.
- Stage 2 fails to join two tuples if one of them is not in memory when the other is brought in from disk.
- Stage 3 joins all the partitions (memory-resident and disk-resident portions) of the two sources.
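A toy run makes the gap left by stage 1 concrete. The sketch below uses a single partition, skips stage 2, and in the clean-up step uses a set of already-produced pairs as a simplified stand-in for duplicate detection; the event encoding and the function names `stage1` and `stage3` are assumptions made for this example.

```python
def stage1(events):
    """Memory-to-memory join over a single partition.

    events: ("arrive", source, key, name) or ("flush", source).
    """
    memory = {"A": [], "B": []}
    disk = {"A": [], "B": []}
    produced = []
    for event in events:
        if event[0] == "flush":
            src = event[1]
            disk[src].extend(memory[src])  # tuples leave memory
            memory[src] = []
            continue
        _, src, key, name = event
        other = "B" if src == "A" else "A"
        for other_key, other_name in memory[other]:  # probe memory only
            if other_key == key:
                pair = (name, other_name) if src == "A" else (other_name, name)
                produced.append(pair)
        memory[src].append((key, name))
    return memory, disk, produced


def stage3(memory, disk, produced):
    """Clean-up: join everything, skipping pairs that were already produced."""
    seen = set(produced)  # simplified duplicate check for this sketch
    all_a = disk["A"] + memory["A"]
    all_b = disk["B"] + memory["B"]
    return [(a, b) for ka, a in all_a for kb, b in all_b
            if ka == kb and (a, b) not in seen]


events = [("arrive", "A", 1, "a1"), ("flush", "A"),
          ("arrive", "B", 1, "b1"), ("arrive", "A", 1, "a2")]
memory, disk, early = stage1(events)
late = stage3(memory, disk, early)
```

Here `a1` was flushed before `b1` arrived, so only the clean-up step pairs them, while `(a2, b1)` is produced early and not repeated.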

Handling duplicates: Timestamps [Figure: each Tuple X is stored with two timestamps, ATS and DTS. Example: Tuple X 99235, Counter 51.]

Detecting tuples joined in the 1st stage [Figure: if the ATS-to-DTS intervals of Tuple A and Tuple B overlap, the tuples were joined in the first stage; if the intervals do not overlap, they were not joined in the first stage.]
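The overlap test in the figure amounts to an interval-intersection check. The sketch below assumes that ATS is the time a tuple arrived in memory and DTS the time it was flushed to disk, and that a tuple still in memory has an infinite DTS; the `Stamped` class and `joined_in_stage1` name are hypothetical, and timestamps are assumed to be distinct counter values.

```python
import math
from dataclasses import dataclass


@dataclass
class Stamped:
    name: str
    ats: float             # arrival timestamp (assumed meaning of ATS)
    dts: float = math.inf  # departure timestamp; infinity while still in memory


def joined_in_stage1(a, b):
    """True if a and b were memory resident at the same time."""
    return a.ats <= b.dts and b.ats <= a.dts


a = Stamped("a", ats=10, dts=50)
b = Stamped("b", ats=30, dts=80)
c = Stamped("c", ats=60)  # not flushed yet
```

With these values, `a` and `b` overlap (30 falls inside 10 to 50), `a` and `c` do not (`c` arrived after `a` left memory), and `b` and `c` do.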

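Detecting tuples joined in the 2nd stage [Figure: the ATS and DTS values of Tuple A and Tuple B are checked against the history list for the corresponding partitions, whose entries record DTS_last and ProbeTS; an overlap means the tuples were joined in the second stage.]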
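One way to read the history-list check is sketched below. It assumes that each history entry `(dts_last, probe_ts)` means "at time `probe_ts`, stage 2 read all disk-resident tuples with DTS up to `dts_last` and probed the other source's memory-resident portion with them". That interpretation, the `Stamped` tuple, and the function name are assumptions for illustration; the slide itself only shows the figure.

```python
from typing import NamedTuple


class Stamped(NamedTuple):
    name: str
    ats: float  # arrival timestamp
    dts: float  # departure timestamp (float("inf") if still in memory)


def joined_in_stage2(disk_tuple, mem_tuple, history):
    """True if some recorded stage-2 run already paired these two tuples."""
    for dts_last, probe_ts in history:
        was_read = disk_tuple.dts <= dts_last                   # on disk for that run
        was_probed = mem_tuple.ats <= probe_ts <= mem_tuple.dts  # in memory at probe time
        if was_read and was_probed:
            return True
    return False


history = [(40, 100)]  # one stage-2 run: DTS_last = 40, ProbeTS = 100
a_early = Stamped("a_early", ats=5, dts=30)    # flushed before DTS_last
a_late = Stamped("a_late", ats=50, dts=70)     # flushed after DTS_last
b_in = Stamped("b_in", ats=60, dts=150)        # in memory at ProbeTS
b_after = Stamped("b_after", ats=120, dts=float("inf"))  # arrived after ProbeTS
```

Only the pair `(a_early, b_in)` is covered by the recorded run; the other combinations would still have to be joined later.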

Optimization 1: Adding a cache
- Stage 2 joins $DP_{iA}$ and $MP_{iB}$.
- Tuples of $DP_{iA}$ are discarded after use.
- The idea: retain some tuples of $DP_{iA}$ (cached). They could be used by a subsequent run of stage 2 joining $DP_{iB}$ and $MP_{iA}$.

[Figure: first run of stage 2 and second run of stage 2, showing partitions i of Source A and Source B in memory and on disk, the cache, and the probe, insert, and output steps.]

Optimization 2: Controlling Stage 2
- Overhead incurred by Stage 2 is hidden only when both inputs experience delays.
- Reduce the aggressiveness of Stage 2.
- Dynamic activation threshold (e.g., )

Experiment Environment
- PREDATOR, an object-relational DBMS.
- XJoin operator added.
- Query optimizer extended to account for XJoin and to provide some of the statistics and calculations required by XJoin.

Arrival Patterns: two have been chosen.
Fig. 1: Bursty arrival. Avg. rate: 23.5 KB/s
Fig. 2: Fast arrival. Avg. rate: KB/s

- tuple Wisconsin benchmark relations.
- Each tuple: 288 bytes.
- Unique unclustered integer join attribute.
- Result cardinality:
- Sun Ultra 5 WS: Solaris 2.6; 128 MB of real memory; disk space (approx.): 4 GB; disk and memory pages: 8 KB.
- Storage manager buffer size: 800 KB.

Results. Experiment 1: Basic performance of XJoin
- Memory space allocated to the join operators: 3 MB.
- Input relations: 28.8 MB each.
- Activation threshold (of stage 2):
- delay scenarios

Case 1: Slow Network (both sources are slow)
- XJoin improves the delivery time of initial answers.
- The reactive background processing is an effective way to exploit delays.
- The use of a cache can further improve performance.

Case 2: Mixed Network (slow build/fast probe and fast build/slow probe)
- XJoin variants perform better (compared with Case 1).
- XJoin variants with the 2nd stage perform better.

Case 3: Fast Network (both sources are fast)
- XJoin variants deliver initial results earlier.
- HHJ delivers the 2nd half of the result faster than XJoin-NoCache and XJoin.
- XJoin-No2nd delivers the last 60% of the result faster than the other XJoin variants.

Experiment 2: Controlling the 2nd stage
- The 2nd stage improves interactive performance with slow and bursty data sources.
- It degrades the overall response time in the case of fast, reliable sources.
Fig. 7: Slow relations. Fig. 8: Fast relations.

Stage 2 should be employed less aggressively (less often). The remedy: a dynamic activation threshold.

XJoin-Dyn
- Is aggressive in the early stages of the query.
- Becomes less aggressive as more of the results are produced.
- Starts with a low activation threshold (0.01) and then linearly increases it to
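A linear schedule of this kind can be sketched as below. Only the starting value 0.01 comes from the slide; the final value is not given in the transcript, so it is left as a required parameter `end`. Driving the schedule by the fraction of results produced so far, and the function name itself, are assumptions made for this example.

```python
def activation_threshold(fraction_done, start, end):
    """Linearly interpolate the stage-2 activation threshold.

    fraction_done: share of the result produced so far, clamped to [0, 1]
                   (assumed driver of the schedule).
    start, end:    threshold at the beginning and at the end of the query.
    """
    f = min(max(fraction_done, 0.0), 1.0)
    return start + (end - start) * f
```

With a low `start`, stage 2 is triggered readily early on; as the threshold rises toward `end`, stage 2 runs less often, which matches the behaviour the slide describes for XJoin-Dyn.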

Experiment 3: The effect of memory size. Recall that the prime motivation for designing XJoin was the huge memory requirements of the symmetric hash join. XJoin reduces the memory requirements but adds overhead (disk I/O and duplicate detection).

Size of the input relations: 8.6 MB. Three different memory allocations:
- 3 MB (neither of the inputs fits into memory)
- 10 MB (one input fits into memory)
- 20 MB (both inputs fit into memory)
Fig. 9: Slow Network, Varying memory. Fig. 10: Fast Network, Varying memory.

XJoin performs better in both interactive performance and completion time.

Experiment 4: Impact of query complexity
- 2 to 6 relations (1 to 5 joins)
- 3 MB to each join operator
Fig. 11. Tuple production rates of XJoin and HHJ (secs), Slow Network

Experiment 4: Impact of query complexity. Fig. 12. Tuple production rates of XJoin and HHJ (secs), Fast Network. XJoin delivers the initial results faster.

Conclusions: XJoin is an effective query processing technique for providing fast query responses to users in the presence of slow and bursty remote sources.

XJoin lowers the memory requirements (through partitioning), improves the interactive performance, and reacts to delays by taking advantage of silent periods to produce more tuples faster.

Perspectives: What do you think about PJoin, a multithreaded parallel XJoin using the Cilk language?