Distributed Query Processing using different Semijoin operations.


Distributed Query Processing using different Semijoin operations. Presented by: Jamal Uddin Ahamed. Friday, March 12, 2004.

Presentation Outline:
1. Overview.
2. Semijoin operation.
3. Different semijoin operations.
   a. 2-way semijoin.
   b. Hash semijoin.
   c. Domain-specific semijoin.
   d. Composite semijoin.
4. References.
5. Questions and answers.

1.1 What is a distributed database system? A distributed database system is characterized by the distribution of its system components: hardware, control, and data. For this research, a distributed system is a collection of independent computers interconnected via point-to-point communication lines.

1.2 Node characteristics: Each computer, known as a node in the network, has a processing capability and a data storage capability, and can operate autonomously in the system. Each node contains a version of a distributed DBMS.

1.3 What is distributed query processing? The retrieval of data from different sites in a network is known as distributed query processing.

1.4 Phases of distributed query processing with a semijoin operator.
1. Initial local processing: selections and projections are processed at each site.
2. Semijoin processing: a semijoin program is derived from the remaining join operations and executed to reduce the size of the relations in a cost-effective way.
3. Final processing: all relations involved are transmitted to the final site, and all joins are performed there.

2.1 Semijoin: A semijoin from Ri to Rj on attribute A is denoted Rj ⋉ Ri. It is used to reduce the data transmission cost. Computing steps:
1. Project Ri on attribute A to obtain Ri[A] (the semijoin projection), and ship this projection from the site of Ri to the site of Rj.
2. Reduce Rj to Rj' by eliminating the tuples whose value of attribute A does not match any value in Ri[A].

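2.2 Example (semijoin s: R1 -A-> R2). R1(A, B) is stored at site 1 and R2(A, C) at site 2; the result is needed at the query site qs.
R2 = {(3, 7), (4, 8), (5, 9)}, so S(R2) = 6.
Projection at site 1: R1[A] = {1, 2, 3}, shipped to site 2 at cost 3.
Reduction at site 2: R2' = {(3, 7)}, so S(R2') = 2. Shipping R2' to qs costs 2, instead of 6 for the unreduced R2.
Benefit: B(s) = 6 - 2 = 4. Cost: C(s) = 3. The semijoin is cost-effective because D(s) = B(s) - C(s) = 1 > 0.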
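The two computing steps and the benefit/cost evaluation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are lists of dicts, size is counted as the number of attribute values (which matches the figures 6, 2 and 3 in the example), and the B values of R1 are assumed because they are not needed for the result. The names semijoin and size are hypothetical helpers, not part of any system described in the slides.

```python
def semijoin(r_i, r_j, attr):
    """Semijoin from r_i to r_j on attr: returns (R_i[A], reduced R_j)."""
    # Step 1: project r_i on the join attribute (this is what gets shipped).
    projection = {row[attr] for row in r_i}
    # Step 2: keep only the tuples of r_j whose attr value appears in the projection.
    reduced = [row for row in r_j if row[attr] in projection]
    return projection, reduced


def size(rows):
    """Assumed size measure: total number of attribute values."""
    return sum(len(row) for row in rows)


# R1 at site 1 (B values are assumed), R2 at site 2 (values from the example).
r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

projection, r2_reduced = semijoin(r1, r2, "A")
benefit = size(r2) - size(r2_reduced)   # transmission saved by shipping R2' instead of R2
cost = len(projection)                  # cost of shipping R1[A]
profit = benefit - cost                 # D(s); positive means cost-effective

print(r2_reduced, benefit, cost, profit)
```

With the example data this gives R2' = {(3, 7)}, a benefit of 4, a cost of 3 and D(s) = 1.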

3.a.1 Definition of 2-way semijoin. The 2-way semijoin is an extended version of the semijoin. Definition: a 2-way semijoin t of Ri and Rj on attribute A is denoted Ri <-A-> Rj = {Ri -A-> Rj, Rj -A-> Ri}, so t reduces Ri and Rj to Ri' and Rj' respectively.

3.a.2 Properties of 2-way semijoin. Computing steps:
1. Send Ri[A] from site i to site j.
2. Reduce Rj to Rj' by eliminating the tuples whose value of attribute A does not match any value in Ri[A]. At the same time, partition Ri[A] into Ri[A]m (the values that match a value in Rj[A]) and Ri[A]nm (= Ri[A] - Ri[A]m).
3. Send the smaller of Ri[A]m and Ri[A]nm back to site i.
4. Reduce Ri to Ri' using Ri[A]m (or Ri[A]nm).
Evaluation:
Benefit: B(t) = [S(Ri) - S(Ri')] + [S(Rj) - S(Rj')]
Cost: C(t) = S(Ri[A]) + min[S(Ri[A]m), S(Ri[A]nm)]
If the benefit exceeds the cost (D(t) = B(t) - C(t) > 0), t is called a cost-effective 2-way semijoin.

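3.a.3 2-way semijoin example. R1(A, B) is stored at site 1 and R2 = {(3, 7), (4, 8), (5, 9)} at site 2.
1. Projection: R1[A] = {1, 2, 3} is shipped to site 2 (cost 3).
2. Reduce and partition at site 2: R2' = {(3, 7)}, R1[A]m = {3}, R1[A]nm = {1, 2}.
3. R1[A]m is the smaller set, so it is shipped back to site 1 (cost 1).
4. Reduce at site 1: R1' = {(3, 6)}.
The reduced relations, each of size 2, are then shipped to the query site qs.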
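The four computing steps of the 2-way semijoin can be sketched as follows. This is a minimal single-process illustration, not a distributed implementation: relations are lists of dicts, size counts attribute values, and the B values 4 and 5 in R1 are assumptions (only the tuple (3, 6) is confirmed by the example). The function name two_way_semijoin is hypothetical.

```python
def two_way_semijoin(r_i, r_j, attr):
    """Return (reduced r_i, reduced r_j, transmission cost) for a 2-way semijoin."""
    # Step 1: send R_i[A] from site i to site j.
    proj_i = {row[attr] for row in r_i}
    # Step 2: reduce R_j and partition R_i[A] into matching / non-matching values.
    r_j_reduced = [row for row in r_j if row[attr] in proj_i]
    proj_j = {row[attr] for row in r_j}
    matched = proj_i & proj_j          # R_i[A]m
    unmatched = proj_i - matched       # R_i[A]nm
    # Step 3: send the smaller partition back to site i.
    # Step 4: reduce R_i with whichever partition was sent.
    if len(matched) <= len(unmatched):
        sent_back = matched
        r_i_reduced = [row for row in r_i if row[attr] in matched]
    else:
        sent_back = unmatched
        r_i_reduced = [row for row in r_i if row[attr] not in unmatched]
    cost = len(proj_i) + len(sent_back)
    return r_i_reduced, r_j_reduced, cost


def size(rows):
    """Assumed size measure: total number of attribute values."""
    return sum(len(row) for row in rows)


r1 = [{"A": 1, "B": 4}, {"A": 2, "B": 5}, {"A": 3, "B": 6}]  # B = 4, 5 assumed
r2 = [{"A": 3, "C": 7}, {"A": 4, "C": 8}, {"A": 5, "C": 9}]

r1_reduced, r2_reduced, cost = two_way_semijoin(r1, r2, "A")
benefit = (size(r1) - size(r1_reduced)) + (size(r2) - size(r2_reduced))
print(r1_reduced, r2_reduced, benefit, cost, benefit - cost)
```

Either partition is enough to reduce R1: the matching values say which tuples to keep, the non-matching values say which to drop, so the cheaper one is sent.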

3.a.4 Semijoin vs. 2-way semijoin. The 2-way semijoin is an extended version of the semijoin. It has more reduction power than the semijoin because it reduces both relations, and its reduction effects propagate further than those of the semijoin.

3.b.1 Hash-semijoin operator. Main idea: use a search filter that represents the semijoin projection as a small bit array. Definition: the hash-semijoin of Ri and Rj is denoted Rj ∝ Ri. It is computed as follows:
1. The semijoin projection of Ri is represented as a bit array.
2. This bit array is shipped to the site of Rj.
3. The tuples of Rj are screened by the search filter.

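3.b.2 Hash-semijoin example. Hash function: h(x) = x.
R1 (S#, Name): (1, Cindy), (3, Jemal), (4, Sunny), (8, Maggie).
R2 (S#, Phone): (2, 222), (3, 333), (4, 444), (5, 555), (6, 666).
Projection: R1[S#] = {1, 3, 4, 8} is hashed into the bit array Bij, which has bits 1, 3, 4 and 8 set.
Ship(Bij): the bit array is shipped to the site of R2.
Reduce: R2' = {(3, 333), (4, 444)}.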
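A rough Python sketch of the bit-array filter, using the data from the example. The array length m and the reduction of the hash value modulo m are assumptions added for illustration (the slide only gives h(x) = x), and build_filter and screen are hypothetical names. The second call shows a general property of such filters: when the array is too small, different values collide and some non-matching tuples pass the screen.

```python
def build_filter(rows, attr, m, h=lambda x: x):
    """Represent the semijoin projection rows[attr] as a bit array of length m."""
    bits = [0] * m
    for row in rows:
        bits[h(row[attr]) % m] = 1
    return bits


def screen(bits, rows, attr, h=lambda x: x):
    """Keep the tuples whose hashed attr value hits a set bit in the filter."""
    m = len(bits)
    return [row for row in rows if bits[h(row[attr]) % m]]


r1 = [{"S#": 1, "Name": "Cindy"}, {"S#": 3, "Name": "Jemal"},
      {"S#": 4, "Name": "Sunny"}, {"S#": 8, "Name": "Maggie"}]
r2 = [{"S#": 2, "Phone": 222}, {"S#": 3, "Phone": 333}, {"S#": 4, "Phone": 444},
      {"S#": 5, "Phone": 555}, {"S#": 6, "Phone": 666}]

# Large enough array: every S# value gets its own bit, so the screen is exact.
exact = screen(build_filter(r1, "S#", m=16), r2, "S#")
# Small array (assumed m=4): 5 collides with 1, so tuple 5 is a false positive.
lossy = screen(build_filter(r1, "S#", m=4), r2, "S#")

print([row["S#"] for row in exact])
print([row["S#"] for row in lossy])
```

False positives only make the reduction weaker; they never drop a tuple that belongs in the join, so the final join result stays correct.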

3.b.3 Semijoin vs. hash-semijoin. Advantages: the hash-semijoin is more cost-effective than the semijoin, because the search filter achieves considerable savings in the cost of a semijoin operation. Limitations: it only works on an execution tree, and it is tightly coupled to the choice of hash function.

3.c.1 What is a horizontally partitioned database? A distributed database system is horizontally partitioned (or fragmented) if its relations can be split horizontally into several disjoint sets of tuples, which are called horizontal fragments.

3.c.2 Horizontally partitioned database system (example). The relation EMP (E-no, E-name, D-no) is split on D-no into two fragments.
EMP: (101, johnson, 01), (103, jordan, 03), (105, erving, [D-no missing]), (109, jabbar, 12), (110, sampson, 14), (141, chang, 16).
EMP1 (1 ≤ D-no ≤ 10): (101, johnson, 01), (103, jordan, 03), (105, erving, [D-no missing]).
EMP2 (11 ≤ D-no ≤ 20): (109, jabbar, 12), (110, sampson, 14), (141, chang, 16).

3.c.3 Horizontally partitioned database system (properties). A fragmented relation Ri can be reconstructed by taking the union of all its fragments: Ri = ∪k Rik. The join and union operations can be interchanged for fragmented relations (join distributes over union): a join between two fragmented relations R1 and R2 is equivalent to the union of the joins between each fragment of R1 and each fragment of R2. Mathematically:
(∪k R1k) [A=B] (∪m R2m) = ∪k,m (R1k [A=B] R2m)
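The identity above can be checked on a small case. The sketch below is an illustration with made-up fragments (the attribute names A, X, B, Y and all values are assumptions): it joins the reconstructed relations directly, then takes the union of the joins over every pair of fragments, and compares the two results as sets of tuples.

```python
def join(r, s, a, b):
    """Equijoin of r and s on r.a = s.b (attribute names assumed distinct)."""
    return [{**x, **y} for x in r for y in s if x[a] == y[b]]


def as_set(rows):
    """Order-independent, hashable view of a relation."""
    return {frozenset(row.items()) for row in rows}


# Hypothetical horizontal fragments of R1(A, X) and R2(B, Y).
r1_frags = [[{"A": 1, "X": "a"}, {"A": 2, "X": "b"}], [{"A": 3, "X": "c"}]]
r2_frags = [[{"B": 1, "Y": "p"}, {"B": 3, "Y": "q"}], [{"B": 2, "Y": "r"}, {"B": 4, "Y": "s"}]]

# Left-hand side: reconstruct each relation by union, then join.
r1 = [row for frag in r1_frags for row in frag]
r2 = [row for frag in r2_frags for row in frag]
whole = as_set(join(r1, r2, "A", "B"))

# Right-hand side: union of the joins between every pair of fragments.
piecewise = set()
for f1 in r1_frags:
    for f2 in r2_frags:
        piecewise |= as_set(join(f1, f2, "A", "B"))

print(len(whole), whole == piecewise)
```

In this data the matches for A = 2 and A = 3 sit in different fragment pairs, which is why every pair has to be considered.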

3.c.4 Why can't we use a regular semijoin between two fragments to reduce the size of the fragments? Consider a join Ri [A=B] Rj between two fragmented relations Ri and Rj. We want to reduce the size of Rik, a fragment of Ri, by a semijoin before it is sent to the final processing site. We cannot perform a regular semijoin on A=B between Rik and a single fragment Rjm of Rj without considering the other fragments of Rj, because the join operation dictates that no tuple of a relation can be eliminated before it has been compared with all tuples of the other joining relation that may contribute to the join.

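Example: two relations, each split into two horizontal fragments. EMP (E-no, E-name, D-no) is fragmented on D-no and sal (E-no, Sal, D-no) is fragmented on E-no.
EMP1 (1 ≤ D-no ≤ 10): (101, johnson, 01), (103, jordan, 03), (135, erving, [D-no missing]).
EMP2 (11 ≤ D-no ≤ 20): (109, jabbar, 12), (110, sampson, 14), (141, chang, 16).
sal1 (E-no between 101 and 105): (101, 1000, 12), (102, 2000, 03), (105, 3000, 11).
sal2 (E-no between 105 and 110): (107, 1000, 12), ([E-no missing], 2000, 03), (110, 3000, 11).
D-no values listed for EMP: 01, 03, 12, 14, 16.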

3.c.5 Definition of domain-specific semijoin. The domain-specific semijoin operation Rik (A=B] Rjm, where A and B are the joining attributes and Rik and Rjm are fragments of the joining relations Ri and Rj respectively, is defined as follows:
Rik (A=B] Rjm = {r | r ∈ Rik; r.A ∈ Rjm[B] ∪ (Dom[Rj.B] - Dom[Rjm.B])}
where Rik is the restricted fragment and Rjm is the restricting fragment. Ri is called the restricted relation and Rj the restricting relation of the domain-specific semijoin. In other words, a tuple of Rik is kept if its A value matches a value in Rjm[B], or if its A value lies outside the domain of Rjm and may therefore still match a tuple in another fragment of Rj.
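The definition translates almost directly into code. The sketch below is an illustration under assumptions: the fragments and domains are loosely modeled on the EMP/sal example but the values are made up, domains are given explicitly as Python ranges, and domain_specific_semijoin is a hypothetical name. It also runs a regular semijoin against the same single fragment to show the tuple that would be lost.

```python
def domain_specific_semijoin(r_ik, a, r_jm, b, dom_j, dom_jm):
    """Keep r in r_ik if r.a is in r_jm[b] or outside the domain of fragment r_jm."""
    keep = {row[b] for row in r_jm} | (set(dom_j) - set(dom_jm))
    return [row for row in r_ik if row[a] in keep]


# Restricting fragment: EMP1 holds the EMP tuples with D-no in 1..10 (values assumed).
emp1 = [{"E-no": 101, "D-no": 1}, {"E-no": 103, "D-no": 3}]
dom_emp = range(1, 21)    # Dom[EMP.D-no], assumed to be 1..20
dom_emp1 = range(1, 11)   # Dom[EMP1.D-no], assumed to be 1..10

# Restricted fragment: a fragment of sal (values assumed).
sal1 = [{"E-no": 101, "Sal": 1000, "D-no": 12},
        {"E-no": 102, "Sal": 2000, "D-no": 3},
        {"E-no": 104, "Sal": 1500, "D-no": 5}]

reduced = domain_specific_semijoin(sal1, "D-no", emp1, "D-no", dom_emp, dom_emp1)

# A regular semijoin against EMP1 alone would also drop D-no 12,
# even though that tuple may match a tuple in EMP2.
emp1_values = {row["D-no"] for row in emp1}
regular = [row for row in sal1 if row["D-no"] in emp1_values]

print([row["D-no"] for row in reduced])
print([row["D-no"] for row in regular])
```

Only the tuple with D-no 5 is eliminated: it falls inside EMP1's domain, so no other fragment could contain a match, and EMP1 itself has none.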

3.d.1 Definition of composite semijoin. Composite semijoin: a semijoin in which the projection and the transmission involve multiple columns (attributes).

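3.d.2 Example of composite semijoin. Figure: relations R1 and R2, each with the join attributes A1 and A2 and a non-join attribute. The values shown are A1 in {1, 2, 3} and A2 in {aa, bb, cc}, paired differently in the two relations. Result of the composite semijoin on (A1, A2): the single tuple (1, aa, -). Slide note: "No false loop!"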
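A small Python sketch of the difference between a composite semijoin and two single-attribute semijoins. The rows below are assumed for illustration (the pairing of A1 and A2 values is not fully recoverable from the slide), and composite_semijoin is a hypothetical name. The point it shows is that matching each attribute separately can keep tuples that do not match on the attribute pair.

```python
def composite_semijoin(r_i, r_j, attrs):
    """Reduce r_j by the projection of r_i on several attributes at once."""
    projection = {tuple(row[a] for a in attrs) for row in r_i}
    return [row for row in r_j if tuple(row[a] for a in attrs) in projection]


# Assumed rows: both relations use the same A1 and A2 values, paired differently.
r1 = [{"A1": 1, "A2": "aa"}, {"A1": 1, "A2": "bb"}, {"A1": 2, "A2": "cc"}]
r2 = [{"A1": 1, "A2": "cc"}, {"A1": 1, "A2": "aa"}, {"A1": 2, "A2": "bb"}]

# Two single-attribute semijoins, applied one after the other.
separate = composite_semijoin(r1, composite_semijoin(r1, r2, ["A1"]), ["A2"])
# One composite semijoin on the attribute pair.
composite = composite_semijoin(r1, r2, ["A1", "A2"])

print(len(separate), composite)
```

Here the single-attribute semijoins eliminate nothing, while the composite semijoin leaves only the tuple (1, aa).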

3.d.3 Semijoin vs. composite semijoin. Using composite semijoins in a query processing algorithm is likely to result in a substantial reduction in response time (RT). Composite semijoins should not always be used: if one results in a greater RT, ignore it. A strategy that considers composite semijoins in this way is at least as good as one without them.

References:
Hyunchul Kang and Nick Roussopoulos. Using 2-way semijoin in distributed query processing.
Judy Tseng and Arbee Chen. Improving distributed query processing by hash-semijoins.
Jason Chen and Victor Li. Domain Specific Semijoin: A new operation for distributed query processing.
William Perrizo and Chun Chen. Composite Semijoin in distributed query processing.

Comments & Questions?? Thank You!