Distributed Query Processing using different Semijoin operations.

1 Distributed Query Processing using different Semijoin operations.

2 Presentation Outline:
1. Overview.
2. Semijoin operation.
3. Different semijoin operations.
   a. 2-way semijoin.
   b. Hash semijoin.

3 1.1 What is distributed database system?
A distributed database system is characterized by the distribution of its hardware, control, and data components. For this research, a distributed system is a collection of independent computers interconnected via point-to-point communication lines.

4 1.2 Node Characteristics:
Each computer, known as a node in the network, has a processing capability and a data storage capability, and can operate autonomously in the system. Each node contains a version of a distributed DBMS.

5 1.3 What is distributed query processing?
The retrieval of data from different sites in a network is known as distributed query processing.

6 1.4 Phases of distributed query processing with a semijoin operator.
1. Initial local processing: selections and projections are processed at each site.
2. Semijoin processing: a semijoin program is derived from the remaining join operations and executed to reduce the size of the relations in a cost-effective way.
3. Final processing: all relations involved are transmitted to the final site and all joins are performed there (qs: query site).

7 2.1 Semijoin:
A semijoin from Ri to Rj on attribute A can be denoted Rj ⋉ Ri. It is used to reduce the data transmission cost.
Computing steps:
1) Project Ri on attribute A (Ri[A]) and ship this projection (a semijoin projection) from the site of Ri to the site of Rj;
2) Reduce Rj to Rj’ by eliminating the tuples whose attribute A value does not match any value in Ri[A].
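The two computing steps can be sketched in a few lines of Python. This is only an illustration under simple assumptions: relations are modeled as lists of dictionaries, both "sites" live in one process, and the function name `semijoin` and the sample tuples are hypothetical, not taken from the slides.

```python
def semijoin(r_j, r_i, attr):
    """Reduce r_j by r_i on attr (the slide's Rj ⋉ Ri).

    Returns the semijoin projection (what would be shipped to the
    site of Rj) and the reduced relation Rj'.
    """
    # Step 1: project Ri on the join attribute; this set is the
    # semijoin projection that would be shipped to the site of Rj.
    projection = {row[attr] for row in r_i}
    # Step 2: keep only the tuples of Rj whose attribute value
    # matches some value in the projection.
    reduced = [row for row in r_j if row[attr] in projection]
    return projection, reduced


# Hypothetical sample relations, for illustration only.
r_i = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
r_j = [{"A": 2, "C": "p"}, {"A": 5, "C": "q"}]

projection, r_j_reduced = semijoin(r_j, r_i, "A")
print(projection)   # values of A present in Ri
print(r_j_reduced)  # tuples of Rj that survive the reduction
```

Only the projection crosses the network in this scheme, which is why the operation can pay off when Ri[A] is much smaller than the tuples it eliminates from Rj.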

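8 2.2 Example:
Example (semijoin s: R1 -A-> R2), with R1 at Site 1, R2 at Site 2, and qs the query site.
R1 (attributes A, B) at Site 1: three tuples with A = 1, 2, 3.
R2 (attributes A, C) at Site 2: (3, 7), (4, 8), (5, 9).
Projection: R1[A] = {1, 2, 3}, sent from Site 1 to Site 2: Ship(3).
Reduce: R2’ = {(3, 7)}, sent to qs: Ship(2). R1 is sent to qs: Ship(6).
Benefit(s) = 6 - 2 = 4
Cost(s) = 3
Cost-effectiveness: D(s) = B(s) - C(s) = 4 - 3 = 1 > 0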
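The arithmetic of this example can be checked with a short Python sketch. It assumes that size is counted in attribute values (tuples times attributes), which is consistent with the Ship(3), Ship(2), and Ship(6) figures on the slide; the helper name `size` is hypothetical. Only R1[A] and R2 are needed, so the B column of R1 is left out.

```python
def size(rows):
    # Assumed size measure: total number of attribute values,
    # i.e. number of tuples times attributes per tuple.
    return sum(len(row) for row in rows)


# Data as read from the example slide.
r1_a = [(1,), (2,), (3,)]          # projection R1[A]
r2 = [(3, 7), (4, 8), (5, 9)]      # R2 with attributes (A, C)

# Reduce R2 using the shipped projection.
keys = {row[0] for row in r1_a}
r2_reduced = [row for row in r2 if row[0] in keys]

cost = size(r1_a)                      # shipping R1[A]
benefit = size(r2) - size(r2_reduced)  # values no longer shipped to qs
net = benefit - cost                   # D(s) = B(s) - C(s)

print(r2_reduced, benefit, cost, net)
```

A positive `net` corresponds to the slide's condition D(s) > 0 for a cost-effective semijoin.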

9 3.a.1 Definition of 2 way semijoin.
2-way semijoin: an extended version of the semijoin.
Definition: A 2-way semijoin (t) of Ri and Rj on attribute A can be denoted Ri <-A-> Rj = {Ri -A-> Rj, Rj -A-> Ri}. So t reduces Ri and Rj to Ri’ and Rj’ respectively.

10 3.a.2 Properties of 2 way semijoin.
Computing steps:
1) Send Ri[A] from site i to site j;
2) Reduce Rj to Rj’ by eliminating the tuples whose attribute A value does not match any value in Ri[A], and at the same time partition Ri[A] into Ri[A]m (the values that match some value in Rj[A]) and Ri[A]nm (Ri[A] - Ri[A]m);
3) Send the smaller of Ri[A]m and Ri[A]nm back to site i;
4) Reduce Ri to Ri’ using Ri[A]m (or Ri[A]nm).
Evaluation:
Benefit: B(t) = [S(Ri) - S(Ri’)] + [S(Rj) - S(Rj’)]
Cost: C(t) = S(Ri[A]) + min[S(Ri[A]m), S(Ri[A]nm)]
If the benefit exceeds the cost (D(t) > 0), t is called a cost-effective 2-way semijoin.
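The four steps and the cost formula can be sketched in Python as follows. This is a single-process illustration, not a distributed implementation: relations are lists of dictionaries, sizes are counted in attribute values, and the function name `two_way_semijoin` is hypothetical. The sample R2 follows the example slide; the B values of R1 are placeholders, since the slide's values are not fully legible in this transcript.

```python
def two_way_semijoin(r_i, r_j, attr):
    """Return (Ri', Rj', cost) for a 2-way semijoin on attr."""
    # Step 1: Ri[A] is sent from site i to site j.
    proj_i = {row[attr] for row in r_i}
    # Step 2: reduce Rj and partition Ri[A] into matched / unmatched.
    r_j_reduced = [row for row in r_j if row[attr] in proj_i]
    matched = proj_i & {row[attr] for row in r_j}
    unmatched = proj_i - matched
    # Step 3: send the smaller partition back to site i.
    # Step 4: reduce Ri with whichever partition was received.
    if len(matched) <= len(unmatched):
        sent_back = matched
        r_i_reduced = [row for row in r_i if row[attr] in matched]
    else:
        sent_back = unmatched
        r_i_reduced = [row for row in r_i if row[attr] not in unmatched]
    # C(t) = S(Ri[A]) + min[S(Ri[A]m), S(Ri[A]nm)]
    cost = len(proj_i) + len(sent_back)
    return r_i_reduced, r_j_reduced, cost


def size(rows):
    # Assumed size measure: total number of attribute values.
    return sum(len(row) for row in rows)


r1 = [{"A": 1, "B": "b1"}, {"A": 2, "B": "b2"}, {"A": 3, "B": "b3"}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

r1_red, r2_red, cost = two_way_semijoin(r1, r2, "A")
benefit = (size(r1) - size(r1_red)) + (size(r2) - size(r2_red))
print(r1_red, r2_red, benefit, cost)
```

Sending back the smaller partition is what keeps the extra cost of the return trip bounded by half of S(Ri[A]).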

11 3.a.3 2-way semijoin example.
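R1 (attributes A, B) at Site 1: three tuples with A = 1, 2, 3.
R2 (attributes A, C) at Site 2: (3, 7), (4, 8), (5, 9).
Projection: R1[A] = {1, 2, 3}, sent from Site 1 to Site 2: Ship(3).
Reduce: R2’ = {(3, 7)}.
Partition: R1[A]m = {3}, R1[A]nm = {1, 2}. The smaller set, R1[A]m, is sent back to Site 1: Ship(1).
Reduce: R1’ = {(3, 6)}.
Shipment to qs: Ship(2).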

12 3.a.4 Semijoin Vs 2-way semijoin.
- The 2-way semijoin is an extended version of the semijoin.
- It has more reduction power than the semijoin.
- Its reduction effects propagate further than those of the semijoin.

13 3.b.1 Hash-semijoin operator.
Main idea: use a search filter that represents the semijoin projection as a small bit array.
Definition: The hash-semijoin of Ri and Rj is denoted Rj ∝ Ri. It is computed as follows:
1) The semijoin projection of Ri is represented as a bit array;
2) This bit array is shipped to the site of Rj;
3) The tuples of Rj are screened by the search filter.
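A minimal Python sketch of the search filter follows. It assumes a single hash function of the form x mod m over integer keys and a plain list as the bit array; the names `build_filter` and `hash_semijoin` and the filter sizes are hypothetical. The data mirrors the S#/Name and S#/Phone tables of the hash-semijoin example slide, as far as they can be read from the transcript.

```python
def build_filter(rows, attr, num_bits):
    """Represent the semijoin projection of rows[attr] as a bit array."""
    bits = [0] * num_bits
    for row in rows:
        # Assumed hash function: key modulo the filter size.
        bits[row[attr] % num_bits] = 1
    return bits


def hash_semijoin(r_j, bits, attr):
    """Screen the tuples of Rj with the shipped bit array."""
    return [row for row in r_j if bits[row[attr] % len(bits)]]


r1 = [{"S#": 1, "Name": "Cindy"}, {"S#": 3, "Name": "Jemal"},
      {"S#": 4, "Name": "Sunny"}, {"S#": 8, "Name": "Maggie"}]
r2 = [{"S#": 2, "Phone": 222}, {"S#": 3, "Phone": 333},
      {"S#": 4, "Phone": 444}, {"S#": 5, "Phone": 555},
      {"S#": 6, "Phone": 666}]

# A filter wide enough to avoid collisions on this data.
exact = hash_semijoin(r2, build_filter(r1, "S#", 16), "S#")

# A narrower filter: keys 1 and 5 collide, so S# 5 slips through.
loose = hash_semijoin(r2, build_filter(r1, "S#", 4), "S#")

print(exact)
print(loose)
```

The narrower filter is cheaper to ship but can let through tuples that do not actually join; such false positives are discarded when the final join is performed, so the result stays correct while the reduction becomes weaker.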

14 3.b.2 hash semijoin example.
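Hash function: H(x) = x.
R1 (S#, Name): (1, Cindy), (3, Jemal), (4, Sunny), (8, Maggie).
R2 (S#, Phone): (2, 222), (3, 333), (4, 444), (5, 555), (6, 666).
Projection: S#(R1) = {1, 3, 4, 8}, hashed into the bit array Bij, which is shipped to the other site: Ship(Bij).
Reduce: the screened relation is {(3, 333), (4, 444)}.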

15 3.b.3 Semijoin Vs Hash Semijoin.
Advantages:
- Hash-semijoin is more cost-effective than semijoin.
- The search filter in the hash-semijoin achieves considerable savings in the cost of a semijoin operation.
Limitations:
- It only works on an execution tree.
- It is tightly coupled to the hash functions used.

