Distributed Query Processing using different Semijoin operations.

Presentation Outline:
1. Overview.
2. Semijoin operation.
3. Different semijoin operations.
   a. 2-way semijoin.
   b. Hash-semijoin.

1.1 What is a distributed database system? A distributed database system is characterized by the distribution of its system components: hardware, control, and data. For this research, a distributed system is a collection of independent computers interconnected via point-to-point communication lines.

1.2 Node Characteristics: Each computer, known as a node in the network, has processing capability and data storage capability, and can operate autonomously in the system. Each node contains a version of a distributed DBMS.

1.3 What is distributed query processing? The retrieval of data from different sites in a network is known as distributed query processing.

1.4 Phases of distributed query processing with a semijoin operator.
1. Initial local processing: selections and projections are processed at each site.
2. Semijoin processing: a semijoin program is derived from the remaining join operations and executed to reduce the size of the relations in a cost-effective way.
3. Final processing: all relations involved are transmitted to the final site (qs, the query site), and all joins are performed there.

2.1 Semijoin: A semijoin from Ri to Rj on attribute A is denoted Rj ⋉ Ri. It is used to reduce the data transmission cost. Computing steps:
1) Project Ri on attribute A to obtain Ri[A], and ship this projection (the semijoin projection) from the site of Ri to the site of Rj.
2) Reduce Rj to Rj’ by eliminating the tuples whose value of attribute A does not match any value in Ri[A].
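The two computing steps can be sketched in a few lines of Python. This is only an illustration: representing a relation as a list of dicts, the function name `semijoin`, and the toy data are assumptions made for the sketch, and both "sites" live in one process, so nothing is actually shipped.

```python
def semijoin(r_i, r_j, attr):
    """Reduce r_j by the semijoin from r_i on attr (Rj ⋉ Ri).

    Relations are lists of dicts; this representation is an assumption
    of the sketch, not something the slides prescribe.
    """
    # Step 1: project Ri on A. This set is what would be shipped
    # from the site of Ri to the site of Rj.
    projection = {t[attr] for t in r_i}
    # Step 2: keep only the tuples of Rj whose A value appears in Ri[A].
    reduced = [t for t in r_j if t[attr] in projection]
    return projection, reduced


# Toy data (hypothetical): R1 contributes only its A values here.
r1 = [{"A": 1}, {"A": 2}, {"A": 3}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

shipped, r2_reduced = semijoin(r1, r2, "A")
print(sorted(shipped), r2_reduced)
```

Only the projection crosses the network in this scheme, which is why the semijoin pays off when Ri[A] is small relative to the tuples it removes from Rj.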

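2.2 Example: semijoin s from R1 to R2 on attribute A (R2 ⋉ R1).
Site 1 holds R1(A, B): three tuples with A values 1, 2, 3 (size 6).
Site 2 holds R2(A, C):
A  C
3  7
4  8
5  9
Projection: R1[A] = {1, 2, 3} is shipped from site 1 to site 2: Ship(3).
Reduce: R2’ = {(3, 7)}.
Final processing: R2’ is shipped to qs, Ship(2), and R1 is shipped to qs, Ship(6).
Benefit: B(s) = 6 - 2 = 4.
Cost: C(s) = 3.
Cost-effectiveness: D(s) = B(s) - C(s) = 1 > 0.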
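The benefit and cost arithmetic of this example can be checked with a small helper. The function name `semijoin_profit` is hypothetical, and sizes are assumed to be counted in attribute values, which is the unit the Ship(...) figures in the example appear to use.

```python
def semijoin_profit(projection_size, size_before, size_after):
    """Return (benefit, cost, profit) for one semijoin.

    benefit = reduction in the size of the reduced relation,
    cost    = size of the shipped semijoin projection,
    profit  = D(s) = B(s) - C(s).
    Sizes are counted in attribute values (an assumption of this sketch).
    """
    benefit = size_before - size_after
    cost = projection_size
    return benefit, cost, benefit - cost


# Numbers from the example: |R1[A]| = 3, S(R2) = 6, S(R2') = 2.
benefit, cost, profit = semijoin_profit(3, 6, 2)
print(benefit, cost, profit)
is_cost_effective = profit > 0
```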

3.a.1 Definition of 2-way semijoin. The 2-way semijoin is an extended version of the semijoin. Definition: a 2-way semijoin t of Ri and Rj on attribute A consists of the two semijoins on A, one from Ri to Rj and one from Rj to Ri: t = {Rj ⋉ Ri, Ri ⋉ Rj}. So t reduces Ri and Rj to Ri’ and Rj’ respectively.

3.a.2 Properties of 2-way semijoin. Computing steps:
1) Send Ri[A] from site i to site j.
2) Reduce Rj to Rj’ by eliminating the tuples whose value of attribute A does not match any value in Ri[A]; at the same time, partition Ri[A] into Ri[A]m (the values that match a value in Rj[A]) and Ri[A]nm (= Ri[A] - Ri[A]m).
3) Send the smaller of Ri[A]m and Ri[A]nm back to site i.
4) Reduce Ri to Ri’ using Ri[A]m (or Ri[A]nm).
Evaluation:
Benefit: B(t) = [S(Ri) - S(Ri’)] + [S(Rj) - S(Rj’)]
Cost: C(t) = S(Ri[A]) + min[S(Ri[A]m), S(Ri[A]nm)]
If the benefit exceeds the cost (D(t) > 0), t is called a cost-effective 2-way semijoin.
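The four steps and the evaluation formulas can be sketched as follows. This is a single-process illustration under stated assumptions: relations are lists of dicts, sizes are counted in attribute values, and the B values "b1" and "b2" in the toy data are placeholders invented for the sketch.

```python
def size(relation):
    """Size of a relation in attribute values (assumed unit)."""
    return sum(len(t) for t in relation)


def two_way_semijoin(r_i, r_j, attr):
    """Return (Ri', Rj', benefit, cost) for the 2-way semijoin on attr."""
    # Step 1: Ri[A] is sent from site i to site j.
    proj_i = {t[attr] for t in r_i}
    # Step 2: reduce Rj and partition Ri[A] into matching / non-matching.
    proj_j = {t[attr] for t in r_j}
    r_j_reduced = [t for t in r_j if t[attr] in proj_i]
    matched = proj_i & proj_j        # Ri[A]m
    unmatched = proj_i - matched     # Ri[A]nm
    # Steps 3 and 4: send the smaller part back and reduce Ri with it.
    if len(matched) <= len(unmatched):
        sent_back = matched
        r_i_reduced = [t for t in r_i if t[attr] in matched]
    else:
        sent_back = unmatched
        r_i_reduced = [t for t in r_i if t[attr] not in unmatched]
    benefit = (size(r_i) - size(r_i_reduced)) + (size(r_j) - size(r_j_reduced))
    cost = len(proj_i) + len(sent_back)
    return r_i_reduced, r_j_reduced, benefit, cost


# Toy data; "b1" and "b2" are hypothetical placeholder values.
r1 = [{"A": 1, "B": "b1"}, {"A": 2, "B": "b2"}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

r1_red, r2_red, benefit, cost = two_way_semijoin(r1, r2, "A")
print(r1_red, r2_red, benefit, cost, benefit - cost)
```

Either partition is enough to reduce Ri, because site i already knows Ri[A]: keeping the matching values and dropping the non-matching ones give the same Ri’.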

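3.a.3 2-way semijoin example. Site 1 holds R1(A, B) with A values 1, 2, 3; site 2 holds R2(A, C) = {(3, 7), (4, 8), (5, 9)}.
Projection: R1[A] = {1, 2, 3} is shipped from site 1 to site 2: Ship(3).
Reduce: R2’ = {(3, 7)}.
Partition: R1[A]m = {3}, R1[A]nm = {1, 2}.
R1[A]m, the smaller part, is shipped back to site 1: Ship(1).
Reduce: R1’ = {(3, 6)}.
Final processing: the reduced relations, each of size 2, are shipped to qs: Ship(2).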

3.a.4 Semijoin vs. 2-way semijoin.
- The 2-way semijoin is an extended version of the semijoin.
- It has more reduction power than the semijoin.
- It propagates reduction effects further than the semijoin does.

3.b.1 Hash-semijoin operator. Main idea: use a search filter that represents the semijoin projection as a small bit array. Definition: the hash-semijoin of Ri and Rj is denoted Rj ∝ Ri. It is computed as follows: the semijoin projection of Ri is represented as a bit array; this bit array is shipped to the site of Rj; finally, the tuples of Rj are screened by the search filter.
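A minimal sketch of the search filter, assuming a single hash function and a fixed-length bit array. The names `build_filter` and `hash_semijoin`, the list-of-dicts representation, the array lengths, and the toy student data are all assumptions made for illustration.

```python
def build_filter(r_i, attr, num_bits, hash_fn):
    """Represent the semijoin projection Ri[attr] as a bit array."""
    bits = [0] * num_bits
    for t in r_i:
        bits[hash_fn(t[attr]) % num_bits] = 1
    return bits


def hash_semijoin(bits, r_j, attr, hash_fn):
    """Screen the tuples of Rj with the search filter."""
    n = len(bits)
    return [t for t in r_j if bits[hash_fn(t[attr]) % n]]


def identity(x):
    # H(x) = x, a deliberately simple hash for integer keys.
    return x


r1 = [{"S#": 1, "Name": "Cindy"}, {"S#": 3, "Name": "Jemal"},
      {"S#": 4, "Name": "Sunny"}, {"S#": 8, "Name": "Maggie"}]
r2 = [{"S#": 2, "Phone": 222}, {"S#": 3, "Phone": 333},
      {"S#": 4, "Phone": 444}, {"S#": 5, "Phone": 555},
      {"S#": 6, "Phone": 666}]

# A 16-bit array is large enough here that no two keys collide.
wide = build_filter(r1, "S#", 16, identity)
exact = hash_semijoin(wide, r2, "S#", identity)

# With only 4 bits, keys 1 and 5 share a slot, so tuple 5 slips through.
narrow = build_filter(r1, "S#", 4, identity)
approx = hash_semijoin(narrow, r2, "S#", identity)
print([t["S#"] for t in exact], [t["S#"] for t in approx])
```

The second run shows the trade-off behind the bit array: a shorter filter is cheaper to ship, but hash collisions can let non-matching tuples through. A matching tuple is never dropped, so the final join result is unaffected.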

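3.b.2 Hash-semijoin example.
R1(S#, Name):
S#  Name
1   Cindy
3   Jemal
4   Sunny
8   Maggie
R2(S#, Phone):
S#  Phone
2   222
3   333
4   444
5   555
6   666
Projection: S#(R1) = {1, 3, 4, 8}.
The projection is hashed with H(x) = x into the bit array Bij, and Bij is shipped to the site of R2: Ship(Bij).
Reduce: R2’ = {(3, 333), (4, 444)}.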

3.b.3 Semijoin vs. hash-semijoin.
Advantages:
- The hash-semijoin is more cost-effective than the semijoin.
- The search filter in the hash-semijoin achieves considerable savings in the cost of a semijoin operation.
Limitations:
- It only works on an execution tree.
- It is tightly tied to the hash functions used.